Top 30 NLP Use Cases in 2024: Comprehensive Guide


Natural language processing (NLP) is a subfield of AI and linguistics that enables computers to understand, interpret and manipulate human language. 

Although machines face challenges in understanding human language, the global NLP market was estimated at ~$5B in 2018 and is expected to reach ~$43B by 2025. And this exponential growth can mostly be attributed to the vast use cases of NLP in every industry.

You may be familiar with many day-to-day NLP applications such as autocorrection, translation, or chatbots. However, NLP has numerous impactful applications that business leaders are not aware of. Therefore, we compiled a comprehensive list of NLP use cases and applications and categorized them according to relevant industries and business functions:

General applications

1. Translation

One of the top use cases of natural language processing is translation. The first NLP-based translation machine was presented in the 1950s by Georgetown University and IBM, and was able to automatically translate 60 Russian sentences into English. Today, translation applications leverage NLP and machine learning to understand and produce accurate translations of global languages in both text and voice formats.

2. Autocorrect

NLP is used to identify a misspelled word by cross-matching it to a set of relevant words in the language dictionary used as a training set. The misspelled word is then fed to a machine learning algorithm that calculates the word’s deviation from the correct one in the training set. It then adds, removes, or replaces letters from the word, and matches it to a word candidate which fits the overall meaning of a sentence.
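To make the candidate-matching step concrete, here is a minimal sketch that uses Python's standard-library difflib as a crude stand-in for a trained deviation model; the vocabulary is invented, and a production autocorrect would also weigh sentence context:

```python
import difflib

# Toy vocabulary standing in for the dictionary used as a training set.
VOCAB = ["receive", "believe", "separate", "necessary", "language"]

def suggest(word, vocab=VOCAB, n=3):
    """Return candidate corrections ranked by string similarity.

    difflib's similarity ratio is a crude stand-in for the deviation
    score a trained model would compute; a real system would also pick
    the candidate that best fits the surrounding sentence.
    """
    return difflib.get_close_matches(word.lower(), vocab, n=n, cutoff=0.8)

print(suggest("recieve"))  # ['receive']
print(suggest("langage"))  # ['language']
```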

3. Autocomplete

Autocomplete, or sentence completion, combines NLP with machine learning algorithms (e.g. supervised learning, recurrent neural networks (RNNs), or latent semantic analysis (LSA)) to predict the most likely next word or phrase that completes a sentence's meaning.
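A toy bigram model shows the core idea of next-word prediction; real autocomplete systems use RNNs or transformer language models trained on vastly larger corpora:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count word -> next-word transitions (a bigram language model).
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def complete(word, k=2):
    """Return the k most likely next words after `word`."""
    return [w for w, _ in bigrams[word].most_common(k)]

print(complete("the"))  # ['cat', 'mat']: 'cat' follows 'the' most often
```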

4. Conversational AI

Conversational AI is the technology that enables automatic conversation between computers and humans. It is at the heart of chatbots and virtual assistants like Siri or Alexa. Conversational AI applications rely on NLP and intent recognition to understand user queries, draw on their training data, and generate a relevant response.

Chatbots have numerous applications in different industries as they facilitate conversations with customers and automate various rule-based tasks, such as answering FAQs or making hotel reservations.

5. Automated speech/voice recognition

Voice recognition, also known as automatic speech recognition (ASR) and speech to text (STT), is a type of software that converts human speech from its analog form (acoustic sound waves) to a digital form that can be recognized by machines. ASR works by:

  • Splitting the audio of a speech recording into individual sounds (tokens),
  • Analyzing each sound,
  • Using algorithms (NLP, deep learning, Hidden Markov Models, N-grams) to find the most probable word fit in that language,
  • Converting the sounds into text.

Today, smartphones integrate speech recognition with their systems to conduct voice searches (e.g. Siri) or provide more accessibility around texting. 
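As a shallow but concrete illustration of the steps above, the sketch below delegates the whole pipeline to the third-party SpeechRecognition package, which wraps several ASR engines; the file name is a placeholder:

```python
import speech_recognition as sr  # pip install SpeechRecognition

recognizer = sr.Recognizer()

# "clip.wav" is a placeholder for any short mono WAV recording.
with sr.AudioFile("clip.wav") as source:
    audio = recognizer.record(source)  # read the whole file into memory

# The acoustic and language modeling steps described above happen
# inside the engine; here we delegate to Google's free web API.
try:
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Speech was unintelligible")
```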


6. Automatic text summarization

Automatic text summarization is the process of shortening long texts or paragraphs and generating a concise summary that conveys the intended message. There are two main methods to summarize texts:

  • Extractive summary: In this method, the output text will be a combination of meaningful sentences extracted directly from the original text.
  • Abstractive summary: This method is more advanced, as the output is a new text. The aim is to understand the general meaning of sentences, interpret the context, and generate new sentences based on the overall meaning.

In both methods, NLP is used in the text interpretation steps, which are:

  • Cleaning the text of filler words
  • Segmenting the text into shorter sentences (tokens)
  • Creating a similarity matrix that represents relations between different tokens
  • Calculating sentence ranks based on semantic similarity
  • Selecting sentences with top ranks to generate the summary (either extractive or abstractive)
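A minimal extractive sketch of the last three steps above, using scikit-learn and NumPy; the sentences are invented, and centrality is scored simply as the sum of a sentence's similarities to the others:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "NLP enables computers to process human language.",
    "Text summarization shortens long documents automatically.",
    "Extractive methods select the most central sentences of a text.",
    "The weather was pleasant on the day of the demo.",
]

# Similarity matrix -> sentence ranks -> top-ranked sentences.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
sim = cosine_similarity(tfidf)
np.fill_diagonal(sim, 0.0)          # ignore self-similarity
scores = sim.sum(axis=1)            # a sentence's centrality
top = np.argsort(scores)[::-1][:2]  # keep the two highest-ranked
print(" ".join(sentences[i] for i in sorted(top)))
```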


7. Language models

Language models are AI models which rely on NLP and deep learning to generate human-like text and speech as an output. Language models are used for machine translation, part-of-speech (PoS) tagging, optical character recognition (OCR) , handwriting recognition, etc.

Some of the most famous language models are the GPT series of transformer models developed by OpenAI, and LaMDA by Google. These models were trained on large datasets crawled from the internet and web sources to automate tasks that require language understanding and technical sophistication. For instance, GPT-3 has been shown to produce lines of code based on human instructions.

“I got GPT-3 to start writing my SQL queries for me. P.S. these work against my *actual* database!” (faraaz.eth, @faraaz, July 22, 2020)

To find the right AI data partner for your NLP projects, check out the following articles:

  • Top 12 AI Data Collection Services & Selection Criteria
  • Data Crowdsourcing Platform: 10+ Companies & Criteria

Retail & e-commerce use cases

8. Customer service chatbot

Chatbots in customer service can:

  • Answer FAQs.
  • Schedule appointments.
  • Book tickets.
  • Process and track orders.
  • Cross-sell.
  • Onboard new users.

To explore more use cases, feel free to read our in-depth article about chatbot use cases in customer service .

9. In-store bot

Several retail shops use NLP-based virtual assistants in their stores to guide customers in their shopping journey. A virtual assistant can be in the form of a mobile application which the customer uses to navigate the store or a touch screen in the store which can communicate with customers via voice or text. In-store bots act as shopping assistants, suggest products to customers, help customers locate the desired product, and provide information about upcoming sales or promotions.

To learn more about retail chatbots, click here.

10. Market intelligence

Marketers can rely on web scraping to extract web data (e.g. blogs, social media posts, news websites) as well as product data (reviews, rankings, comments), and combine it with NLP capabilities to analyze consumer sentiment, detect market trends, and optimize their marketing strategies.
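For instance, once reviews have been scraped, a lexicon-based sentiment pass is a common first step; a minimal sketch using NLTK's VADER analyzer, with invented review strings:

```python
import nltk  # pip install nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

# Invented reviews standing in for scraped product data.
reviews = [
    "Great quality, fast shipping, would buy again!",
    "The product broke after two days. Very disappointed.",
]

sia = SentimentIntensityAnalyzer()
for review in reviews:
    # compound ranges from -1 (most negative) to +1 (most positive)
    score = sia.polarity_scores(review)["compound"]
    print(f"{score:+.2f}  {review}")
```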

11. Semantic based search

Semantic search refers to a search method that aims to not only find keywords but also understand the context of the search query and suggest fitting responses. Many online retail and e-commerce websites rely on NLP-powered semantic search engines to leverage long-tail search strings (e.g. women white pants size 38), understand the shopper’s intent, and improve the visibility of numerous products. Retailers claim that on average, e-commerce sites with a semantic search bar experience a mere 2% cart abandonment rate, compared to the 40% rate on sites with non-semantic search. 

Read our article on the Top 10 eCommerce Technologies with Applications & Examples to find out more about the eCommerce technologies that can help your business to compete with industry giants.

Healthcare use cases

12. Dictation

To document clinical procedures and results, physicians dictate the process to a voice recorder or a medical stenographer, to be transcribed later into text and entered into EMR and EHR systems. NLP can be used to analyze these voice records and convert them to text, to be fed into EMRs and patients' records.

13. Clinical documentation

In 2017, it was estimated that primary care physicians spend ~6 hours on EHR data entry during a typical 11.4-hour workday. NLP can be used in combination with optical character recognition (OCR) to extract healthcare data from EHRs, physicians' notes, or medical forms, to be fed to data entry software (e.g. RPA bots). This significantly reduces the time spent on data entry and improves data quality by eliminating manual transcription errors.

14. Clinical trial matching

NLP can be used to interpret the description of clinical trials and check unstructured doctors’ notes and pathology reports, to recognize individuals who would be eligible to participate in a given clinical trial. The algorithm used to develop such an NLP model would use medical records and research papers as training data to be able to recognize medical terminology and synonyms, interpret the general context of a trial, generate a list of criteria for trial eligibility, and evaluate participants’ applications accordingly.

A team at Columbia University developed an open-source tool called DQueST which can read trials on ClinicalTrials.gov and then generate plain-English questions such as “What is your BMI?” to assess users’ eligibility. An initial evaluation revealed that after 50 questions, the tool could filter out 60–80% of trials that the user was not eligible for, with an accuracy of a little more than 60%.

15. Computational phenotyping

Phenotyping is the process of analyzing a patient's observable physical or biochemical characteristics (phenotype); unlike genotyping, it does not rely on genetic data from DNA sequencing. Computational phenotyping uses structured data (EHR fields, diagnoses, medication prescriptions) and unstructured data (physicians' vocal records which summarize patients' medical history, immunizations, allergies, radiology images, and laboratory test results, as well as progress notes and discharge reports). Computational phenotyping enables patient diagnosis categorization, novel phenotype discovery, clinical trial screening, pharmacogenomics, drug-drug interaction (DDI) detection, etc.

In this case, NLP is used for keyword search: rule-based systems search for specific keywords (e.g. pneumonia in the right lower lobe) through the unstructured data, filter the noise, check for abbreviations or synonyms, and match keywords to underlying events previously defined by rules.
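A minimal sketch of such a rule-based matcher; the rule name, patterns, and note text are hypothetical:

```python
import re

# Hypothetical rule: a clinical event mapped to keywords, synonyms,
# and abbreviations that may appear in free-text notes.
RULES = {
    "pneumonia_right_lower_lobe": [
        r"pneumonia in the right lower lobe",
        r"\brll\b.*pneumonia",
        r"right lower lobe infiltrate",
    ],
}

def match_events(note):
    """Return rule-defined events detected in a free-text note."""
    note = note.lower()
    return [
        event
        for event, patterns in RULES.items()
        if any(re.search(p, note) for p in patterns)
    ]

print(match_events("CXR shows RLL consolidation consistent with pneumonia"))
# ['pneumonia_right_lower_lobe']
```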

16. Computer assisted coding (CAC)

Computer Assisted Coding (CAC) tools are a type of software that screens medical documentation and produces medical codes for specific phrases and terminologies within the document. NLP-based CAC tools can analyze and interpret unstructured healthcare data to extract features (e.g. medical facts) that support the assigned codes.

17. Clinical diagnosis

NLP is used to build medical models that can recognize disease criteria based on standard clinical terminology and medical word usage. IBM Watson, a cognitive NLP solution, was used at MD Anderson Cancer Center to analyze patients' EHR documents and suggest treatment recommendations, reportedly with 90% accuracy. However, Watson faced a challenge when deciphering physicians' handwriting and generated incorrect responses due to shorthand misinterpretations. According to project leaders, Watson could not reliably distinguish the acronym for acute lymphoblastic leukemia, “ALL”, from the physician's shorthand for allergy, “ALL”.

18. Virtual therapists

Virtual therapists (therapist chatbots) are an application of conversational AI in healthcare. NLP is used to train the algorithm on mental health conditions and evidence-based guidelines, to deliver cognitive behavioral therapy (CBT) for patients with depression, post-traumatic stress disorder (PTSD), and anxiety. In addition, virtual therapists can be used to converse with autistic patients to improve their social skills and job interview skills. For example, Woebot, which we listed among successful chatbots, provides CBT, mindfulness, and dialectical behavior therapy (DBT).

Banking use cases

19. Stock price prediction

NLP is used in combination with KNN classification algorithms to assess real-time web-based financial news, to facilitate ‘news-based trading’, where analysts seek to isolate financial news that affects stock prices and market activity. To extract real-time web data, analysts can rely on:

  • Web scraping/crawling tools
  • Web scraping APIs
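As a toy illustration of the KNN-based classification described above (headlines and labels are invented; this is not a real trading signal):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Invented headlines labeled with the subsequent price move.
headlines = [
    "Company beats earnings expectations, raises guidance",
    "Regulator opens probe into accounting practices",
    "Record quarterly revenue and strong outlook",
    "CEO resigns amid fraud allegations",
]
moves = ["up", "down", "up", "down"]

# TF-IDF features feeding a KNN classifier, as described above.
model = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
model.fit(headlines, moves)
print(model.predict(["Revenue surges on strong demand"]))  # likely ['up']
```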

To learn how web scraping is used in finance, read In-Depth Guide to Web Scraping for Finance .

20. Credit scoring

Credit scoring is a statistical analysis performed by lenders, banks, and financial institutions to determine the creditworthiness of an individual or a business.

NLP can assist in credit scoring by extracting relevant data from unstructured documents, such as loan documentation and records of income, investments, and expenses, and feeding it to credit scoring software to determine the credit score.

In addition, modern credit scoring software utilizes NLP to extract information from personal profiles (e.g. social media accounts, mobile applications) and utilize machine learning algorithms to weigh these features and assess creditworthiness.

Conversational banking can also support credit scoring: conversational AI tools analyze customers' answers to specific questions regarding their risk attitudes.

Insurance use cases

21. Insurance claims management

NLP can be used in combination with OCR to analyze insurance claims. For example, IBM Watson has been used to comb through structured and unstructured text data to detect the information needed to process insurance claims and feed it to an ML algorithm, which labels the data according to the sections of the claim application form and the terminology commonly entered in each section.

Finance department use cases

22. Financial reporting

NLP can be combined with machine learning algorithms to identify significant data in unstructured financial statements, invoices, or payment documentation, extract it, and feed it to an automation solution, such as an RPA bot, to generate financial reports.

23. Financial auditing

NLP enables the automation of financial auditing by:

  • Screening financial documents of an organization
  • Classifying financial statement content
  • Identifying document similarities and differences

In turn, this enables the detection of deviations and anomalies in financial statements.

24. Fraud detection

NLP can be combined with ML and predictive analytics to detect fraud and misrepresented information in unstructured financial documents. For instance, a study revealed that NLP linguistic models were able to detect deceptive emails, which were identified by a “reduced frequency of first-person pronouns and exclusive words, and elevated frequency of negative emotion words and action verbs”. The researchers used an SVM classifier algorithm to analyze linguistic features of annual reports, including voice (active versus passive) and readability, detecting an association between these features and fraudulent financial statements.

HR use cases

25. Resume evaluation

NLP can be used in combination with classification machine learning algorithms to screen candidates’ resumes, extract relevant keywords (education, skills, previous roles), and classify candidates based on their profile match to a certain position in an organization. Additionally, NLP can be used to summarize resumes of candidates who match specific roles to help recruiters skim through resumes faster and focus on specific requirements of the job.
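A minimal sketch of the keyword-matching step; the skill list and resume text are invented, and a production screener would layer classification and ranking models on top:

```python
# Hypothetical skill requirements for a single position.
JOB_SKILLS = {"python", "sql", "machine learning", "nlp"}

def skill_score(resume_text):
    """Return the fraction of required skills found, plus the matches."""
    text = resume_text.lower()
    hits = {skill for skill in JOB_SKILLS if skill in text}
    return len(hits) / len(JOB_SKILLS), sorted(hits)

resume = "Data analyst with 3 years of Python and SQL; built NLP pipelines."
score, matched = skill_score(resume)
print(f"match: {score:.0%}, skills found: {matched}")  # match: 75% ...
```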


26. Recruiting chatbot

Recruiting chatbots, also known as hiring assistants, are used to automate the communication between recruiters and candidates. Recruiting chatbots use NLP for:

  • Screening candidate resumes,
  • Scheduling interviews,
  • Answering candidates' questions about the position,
  • Building candidate profiles,
  • Facilitating candidate onboarding.

27. Interview assessment

Many large enterprises, especially during the COVID-19 pandemic, use interviewing platforms to conduct interviews with candidates. These platforms enable candidates to record videos, answer questions about the job, and upload files such as certificates or reference letters. NLP is particularly useful for interview platforms to analyze candidate sentiment, screen uploaded documents, check references, detect specific keywords that can reflect positive or negative behavior during the interview, and transcribe and summarize the video for archiving purposes.

28. Employee sentiment analysis

NLP can be used to detect employees' job satisfaction, motivation, friction areas, and difficulties, as well as racial and sexual bias. NLP is used to screen feedback surveys, public emails, and employee comments on social media and job employment websites. This enables HR teams to better detect conflict areas, identify potentially successful employees, recognize training requirements, keep employees engaged, and optimize the work culture.

Feel free to read our article on HR technology trends to learn more about other technologies that shape the future of HR management.

Cybersecurity use cases

29. Spam detection

NLP models can be used for text classification in order to detect spam-related words, sentences, and sentiment in emails, text messages, and social media messaging applications. Spam detection NLP models typically follow these steps:

  • Data cleaning and preprocessing: removing filler and stop words.
  • Tokenization: splitting text into smaller units (tokens) such as words and sentences.
  • Part-of-speech (PoS) tagging: labeling each word with its corresponding part-of-speech tag, based on its context and definition.

The processed data will be fed to a classification algorithm (e.g. decision tree, KNN, random forest) to classify the data into spam or ham (i.e. non-spam email).
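A minimal end-to-end sketch with scikit-learn; the messages are invented, and a Naive Bayes classifier stands in for the tree-based options named above (the vectorizer covers the tokenization step):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set; a usable model needs thousands of examples.
messages = [
    "WIN a FREE prize now, click here",
    "Limited offer, claim your reward today",
    "Are we still meeting for lunch tomorrow?",
    "Please review the attached quarterly report",
]
labels = ["spam", "spam", "ham", "ham"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(messages, labels)

print(clf.predict(["Claim your free prize now"]))  # ['spam']
```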


30. Data exfiltration prevention

Data exfiltration is a security breach that involves unauthorized data copying or transfer from one device to another. To exfiltrate data, attackers use techniques such as domain name system (DNS) tunneling (i.e. abusing the DNS queries sent from a user's computer (DNS client) to a DNS server) and phishing emails that lead users to hand personal information to hackers. NLP can be used to flag anomalous DNS queries, malicious language, and text anomalies in order to detect malware and prevent data exfiltration.

For more on NLP

To explore what natural language processing is and what its products are, feel free to read our in-depth articles:

  • Top 5 Expectations Regarding the Future of NLP
  • In-Depth Guide to NLP: What it is, How it Works & Top Use Cases
  • Natural Language Understanding: in-Depth Guide
  • NLU vs NLP: Main Differences & Use Cases Comparison
  • Natural Language Generation (NLG): What it is & How it works

If you believe your business will benefit from a conversational AI solution, scroll down our data-driven lists of:

  • Chatbot platforms
  • Voice bot platforms
  • NLP service providers


This article was drafted by former AIMultiple industry analyst Alamira Jouman Hajjar.



Exploring Fairness in Machine Learning for International Development

Case Study on Natural Language Processing: Identifying and Mitigating Unintended Demographic Bias in Machine Learning for NLP


Slides: Case Study on Natural Language Processing: Identifying and Mitigating Unintended Demographic Bias in Machine Learning for NLP (PDF, 1.3 MB)

Learning Objectives

  • Explore a case study on bias in NLP.
  • Demonstrate techniques to mitigate word embedding bias.

Natural language processing is used across multiple domains, including education, employment, social media, and marketing. There are many sources of unintended demographic bias in the NLP pipeline. The NLP pipeline is the collection of steps from collecting data to making decisions based on the model results.

Unintended demographic bias

Key definitions for this course:

  • Unintended: bias has an adverse side effect, but it is not deliberately learned.
  • Demographic: the bias is some form of inequality between demographic groups.
  • Bias: artifact of the NLP pipeline that causes unfairness.

There are two types of unintended demographic bias, sentiment bias and toxicity bias. Sentiment bias refers to an artifact of the ML pipeline that causes unfairness in sentiment analysis algorithms. Toxicity bias refers to an artifact of the ML pipeline that causes unfairness in toxicity prediction algorithms.

Whether a phrase is classified as toxic or non-toxic can be determined by a single word, and specific nationalities or groups are often marginalized as a result. For example, “I am American” may be classified as non-toxic, whereas “I am Mexican” may be classified as toxic.

Bias introduction

Bias can be introduced at many phases of the NLP pipeline, including the word corpus, word embedding, dataset, ML algorithm, and decision steps. Unfairness can then result from applying these results to society.

Measuring word embedding bias

Word embeddings encode text into vector spaces where the distance between words captures important semantic meaning. This allows for analogies such as man is to woman as king is to queen. However, research shows that biased word embeddings trained on Google News articles will complete the analogy man is to computer programmer as woman is to homemaker.

A method to quantify word embedding bias is demonstrated in the slides. Biased word embeddings are used to initialize a set of unbiased labeled word sentiments. A logistic regression classifier is trained using this dataset and predicts negative sentiment for a set of identities, for example “American” or “Canadian.”

The probabilities for negative sentiment can be compared to a uniform distribution to generate a relative negative sentiment bias (RNSB) score.
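A minimal sketch of the RNSB idea using scikit-learn and NumPy; random vectors stand in for real pre-trained embeddings, so the printed score is meaningless and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Random vectors stand in for real pre-trained word embeddings
# (in practice, load GloVe or word2vec vectors instead).
def vec(word):
    return rng.normal(size=50)

pos_words, neg_words = ["good", "happy"], ["bad", "awful"]
X = np.array([vec(w) for w in pos_words + neg_words])
y = [1, 1, 0, 0]  # 1 = positive sentiment

clf = LogisticRegression().fit(X, y)

# Predicted probability of negative sentiment for each identity term.
identities = ["American", "Canadian", "Mexican"]
p_neg = clf.predict_proba(np.array([vec(w) for w in identities]))[:, 0]
p_neg = p_neg / p_neg.sum()  # normalize into a distribution

# RNSB: KL divergence from the uniform distribution (0 = no bias).
uniform = np.full(len(identities), 1 / len(identities))
rnsb = float(np.sum(p_neg * np.log(p_neg / uniform)))
print(f"RNSB score: {rnsb:.4f}")
```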

Mitigating word embedding bias

Adversarial learning can be used to debias word embeddings. Different identity terms can be more or less correlated with negative or positive sentiment. In datasets, “American,” “Mexican,” or “German” can have stronger correlations with negative sentiment subspaces, which can cause downstream unfairness. We would ideally have balanced correlations between positive and negative sentiment subspaces for each group to prevent any effects of bias. Adversarial learning algorithms can be used to mitigate these biases.

Key Takeaways

  • There is no silver bullet for debiasing NLP applications.
  • Bias can come from all stages of the ML pipeline and implementing mitigation strategies at each stage is essential to addressing bias.

 Contributions

Content presented by Audace Nakeshimana (MIT).

The research for this project was conducted in collaboration with Christopher Sweeney and Maryam Najafian (MIT). The content for this presentation was created by Audace Nakeshimana, Christopher Sweeney, and Maryam Najafian (MIT).


Case Study: Using Natural Language Processing for Healthcare Summaries

A leading healthcare organization recently engaged Manceps to help them bring machine learning solutions to their case preparation process. By using natural language processing and state-of-the-art language models to integrate their wealth of data into a scalable system, the company was able to automatically structure complex case files into single-page medical narratives.

In most complex medical claims, insurers and patients have the right to request a medical review of prescribed treatment from an independent reviewer. Our client is such a reviewer, acting as a mediator between payers and providers for medical necessity reviews and preauthorizations. Once our client receives the details of the case, the organization must then validate (or overturn) the insurer's decision.

Validating treatment plans is just one of many ways that this organization helps at the intersection of the insurer, physician, and patient. In addition to providing an appeal mechanism, our client can also provide treatment pre-authorizations as outsourced by insurance providers.

The problem

When a case is brought before this healthcare organization, it receives an upload through their application portal of hundreds, if not thousands, of pertinent medical document pages that it will need to interpret in order to render a verdict.

For liability purposes, this information tends to be overwhelmingly comprehensive. Not only will the organization receive information about the case, such as the patient's medical records and test results, but it will also receive documentation relating to the insurance company, its policies, and other extenuating details.

Further complicating matters, the information can come in a variety of formats such as printed text, scanned handwritten notes, images, and/or computer-generated EHR dumps, all of which can have inaccuracies or otherwise be incomplete.

It is the job of our client and its clinical staff to transform this poorly organized data into a decision, one that must be made quickly and accurately.

Our solution

Manceps built a scalable, containerized data engineering system to structure their patients' files through natural language processing (NLP), summarize each case, and drastically reduce the number of hours their in-house medical team had to spend evaluating case files.

Step 1: Organize the data

Our first step was to organize the crush of content they receive and convert it into a normalized, structured data set that our artificial intelligence system could eventually interpret.

To do this, we built a service that extracts embedded and scanned language through digital extraction and OCR (optical character recognition), respectively, in order to process every word on every page into something that could be read, tagged, and understood by our AI system.

During this process, we also built an exhaustive set of intelligent validators to guarantee the accuracy of the case materials, ensuring that all the records were accurately associated with the correct patient and the case at hand.

Step 2: Add natural language processing capabilities

The core challenge of any NLP project is that people understand sequences of words while computers understand sequences of numbers. By translating words, sentences, and language into numbers, or vectors, as data scientists call them, computers are able to map the relationships words have to one another.

These word relationships are the key to understanding language. Only by associating the word “leopard” with the words “wild”, “cat”, and “spots” can humans begin to understand what a leopard is. It is in this way that natural language processing becomes natural language understanding. Instead of associating the word “leopard” with the word “cat” in a holistic way, however, computers do this mathematically, converting words into a veritable constellation of numerical understanding.

The most important part of any NLP implementation is finding the right language model for translating text into such vectors, while maintaining a common link between the two distinct entities. Fortunately, state-of-the-art pre-trained language models are available to perform these tasks with deep-learning-powered language processing.

Once we had built our data pipeline to properly extract and stream text, we were able to do two things with it: provide indexed text for dynamic end-user interaction and funnel language embeddings to power our ML models' training and inference.

This enabled our deep learning models to understand whether particular sections or sentences of the case file were relevant to the medical procedures under review. Relevant information was then sent back and forth across the system to different stakeholders.

By layering the language model onto our client's data, our machine learning system could now understand the story of the case file and begin to summarize it.

Step 3: Summarize the case

Pragmatically speaking, using natural language processing to summarize dense text requires two steps. The first is to extract relevant information. The second is to rewrite that extracted information into a coherent narrative. Because the source material was exceedingly long for this project, Manceps performed multiple passes to produce the best results.

Pre-extraction. First, our system dug through the original case file and extracted the 500 most important sentences, based on the set priorities.

Extraction. At the extraction phase, our system then reduced the word count further. It chose 10 of the 500 sentences to serve as the most concise summary possible. In this case, we tuned the system to prioritize comprehensively capturing all information contained in the source material, even if that meant repeating information.

Generation. Finally, once the system had reduced the case file down to a single page, we used natural language generation tools to rewrite those 10 sentences into a completely summarized, totally comprehensive narrative.

Our system has already saved this organization thousands of hours. By automatically organizing and summarizing case file information, its physicians are now able to quickly understand case elements so they can make informed, medically accurate, and timely determinations.

For health care companies, the stakes of getting this right couldn't be higher. If our system were to miss a crucial part of a patient's case, the consequences could be serious. By trusting Manceps to build this mission-critical system for them, this medical organization could serve more cases, more quickly, at a fraction of the cost.
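The extract-then-generate pattern described above can be sketched generically with an off-the-shelf summarization model; this is a stand-in, not the client's actual system, and the extracted sentences are invented:

```python
from transformers import pipeline  # pip install transformers torch

# Pretend output of the extraction phase: the most important sentences.
extracted = (
    "The patient was prescribed a 12-week course of physical therapy. "
    "The insurer denied coverage, citing a lack of documented medical "
    "necessity. Imaging results show a partial tear consistent with "
    "the proposed treatment plan."
)

# Generation phase: a pre-trained seq2seq model rewrites the extracted
# sentences into a single fluent narrative.
summarizer = pipeline("summarization")
summary = summarizer(extracted, max_length=60, min_length=20)
print(summary[0]["summary_text"])
```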


Natural language processing in healthcare

Suresh Rangasamy is a consultant in McKinsey’s New York office, Rosanne Nadenichek is a consultant in the Southern California office, Mahi Rayasam is an associate partner in the Detroit office, and Alex Sozdatelev is a partner in the Chicago office.

The authors would also like to thank Manuk Garg, Jessica Lamb, Vipul Khanna, Suman Mandal and Soumyodeep Mukherjee for their help and guidance in the preparation of this infographic.


  • Review Article
  • Open access
  • Published: 08 April 2022

Natural language processing applied to mental illness detection: a narrative review

Tianlin Zhang, Annika M. Schoene, Shaoxiong Ji & Sophia Ananiadou

npj Digital Medicine, volume 5, Article number: 46 (2022)


Subjects: Disease prevention, Psychiatric disorders

Mental illness is highly prevalent nowadays, constituting a major cause of distress in people’s life with impact on society’s health and well-being. Mental illness is a complex multi-factorial disease associated with individual risk factors and a variety of socioeconomic, clinical associations. In order to capture these complex associations expressed in a wide variety of textual data, including social media posts, interviews, and clinical notes, natural language processing (NLP) methods demonstrate promising improvements to empower proactive mental healthcare and assist early diagnosis. We provide a narrative review of mental illness detection using NLP in the past decade, to understand methods, trends, challenges and future directions. A total of 399 studies from 10,467 records were included. The review reveals that there is an upward trend in mental illness detection NLP research. Deep learning methods receive more attention and perform better than traditional machine learning methods. We also provide some recommendations for future studies, including the development of novel detection methods, deep learning paradigms and interpretable models.


Introduction

Mental illnesses, also called mental health disorders, are highly prevalent worldwide, and have been one of the most serious public health concerns 1 . There are many different mental illnesses, including depression, suicidal ideation, bipolar disorder, autism spectrum disorder (ASD), anxiety disorder, schizophrenia, etc., any of which can have a negative influence on an individual’s physical health and well-being with the problem exacerbated due to Covid-19 2 . According to the latest statistics, millions of people worldwide suffer from one or more mental disorders 1 . If mental illness is detected at an early stage, it can be beneficial to overall disease progression and treatment.

There are different text types in which people express their mood, such as posts on social media platforms, transcripts of interviews, and clinical notes including descriptions of patients' mental states. In recent years, natural language processing (NLP), a branch of artificial intelligence (AI) technologies, has played an essential role in supporting the analysis and management of large scale textual data and facilitating various tasks such as information extraction, sentiment analysis 3 , emotion detection, and mental health surveillance 4 , 5 , 6 . Detecting mental illness from text can be cast as a text classification or sentiment analysis task, where we can leverage NLP techniques to automatically identify early indicators of mental illness to support early detection, prevention and treatment.

Existing reviews mainly introduce the computational methods for mental illness detection, and they mostly focus on specific mental illnesses (suicide 7 , 8 , 9 , depression 10 , 11 , 12 ) or specific data sources (social media 13 , 14 , 15 , non-clinical texts 16 ). To the best of our knowledge, there is no recent review of NLP techniques applied to mental illness detection from textual sources. We present a broader scope of mental illness detection using NLP that covers a decade of research, different types of mental illness and a variety of data sources. Our review aims to provide a comprehensive overview of the latest trends and recent NLP methodologies used for text-based mental illness detection, and also points at the future challenges and directions. Our review seeks to answer the following questions:

What are the main NLP trends and approaches for mental illness detection?

Which features have been used for mental health detection in traditional machine learning-based models?

Which neural architectures have been commonly used to detect mental illness?

What are the main challenges and future directions in NLP for mental illness?

Search methodology

Search strategy

A comprehensive search was conducted in multiple scientific databases for articles written in English and published between January 2012 and December 2021. The databases include PubMed, Scopus, Web of Science, DBLP computer science bibliography, IEEE Xplore, and ACM Digital Library.

The search query we used was based on four sets of keywords shown in Table 1 . For mental illness, 15 terms were identified, related to general terms for mental health and disorders (e.g., mental disorder and mental health), and common specific mental illnesses (e.g., depression, suicide, anxiety). For data source, we searched for general terms about text types (e.g., social media, text, and notes) as well as for names of popular social media platforms, including Twitter and Reddit. The methods and detection sets refer to NLP methods used for mental illness identification.

The keywords in each set were combined using the Boolean operator “OR”, and the four sets were combined using the Boolean operator “AND”. We conducted the searches in December 2021.

Filtering strategy

A total of 10,467 bibliographic records were retrieved from six databases, of which 7536 were retained after removing duplicates. Then, we used RobotAnalyst 17 , a tool that minimizes the human workload involved in the screening phase of reviews by prioritizing the most relevant articles for mental illness based on relevancy feedback and active learning 18 , 19 .

Each of the 7536 records was screened based on title and abstract. Records were removed if the following exclusion criteria were met: (1) the full text was not available in English; (2) the abstract was not relevant to mental illness detection; (3) the method did not use textual experimental data, but speech or image data.

After the screening process, 611 records were retained for further review. An additional manual full-text review was conducted to retain only articles focusing on the description of NLP methods. The final inclusion criteria were established as follows:

Articles must study textual data such as contents from social media, electronic health records or transcription of interviews.

They must focus on NLP methods for mental illness detection, including machine learning-based methods (in this paper, the machine learning methods refer to traditional feature engineering-based machine learning) and deep learning-based methods. We exclude review and data analysis papers.

They must provide a methodology contribution by (1) proposing a new feature extraction method, a neural architecture, or a novel NLP pipeline; or (2) applying the learning methods to a specific mental health detection domain or task.

Following the full-text screening process, 399 articles were selected. The flow diagram of the article selection process is shown in Fig. 1 .

Figure 1: Flow diagram of the article selection process. Six databases (PubMed, Scopus, Web of Science, DBLP computer science bibliography, IEEE Xplore, and ACM Digital Library) were searched. The flowchart lists reasons for excluding studies from the data extraction and quality assessment.

Data extraction

For each selected article, we extracted the following types of metadata and other information:

Year of publication.

The aim of research.

The dataset used, including type of mental illness (e.g., depression, suicide, and eating disorder), language, and data sources (e.g., Twitter, electronic health records (EHRs) and interviews).

The NLP method (e.g., machine learning and deep learning) and types of features used (e.g., semantic, syntactic, and topic).

We show in Fig. 2 the number of publications retrieved and the methods used in our review, reflecting the trends of the past 10 years. We can observe that: (1) there is an upward trend in NLP-driven mental illness detection research, suggesting the great research value and prospects of automatic mental illness detection from text; and (2) deep learning-based methods have increased in popularity in the last couple of years.

Figure 2: The trend in the number of articles using machine learning-based and deep learning-based methods for detecting mental illness from 2012 to 2021.

In the following subsections, we provide an overview of the datasets and the methods used. In section Datasets, we introduce the different types of datasets, covering different mental illness applications, languages and sources. Section NLP methods used to extract data provides an overview of the approaches and summarizes the features used for NLP development.

In order to better train mental illness detection models, reliable and accurate datasets are necessary. There are several sources from which we can collect text data related to mental health, including social media posts, screening surveys, narrative writing, interviews and EHRs. At the same time, for different detection tasks, the datasets also differ in the types of illness they focus on and language. We show a comprehensive mapping of each method with its associated application using a Sankey diagram (Fig. 3 ).

Figure 3: Sankey diagram mapping methods to applications. The different methods and their associated applications are represented as flows. Nodes are represented as rectangles whose height represents their value; the width of each curved line is proportional to its value.

Data sources

Figure 4 illustrates the distribution of the different data sources. It can be seen that, among the 399 reviewed papers, social media posts (81%) constitute the majority of sources, followed by interviews (7%), EHRs (6%), screening surveys (4%), and narrative writing (2%).

Figure 4: Pie chart depicting the percentage of each textual data source.

Social media posts

The use of social media has become increasingly popular for people to express their emotions and thoughts 20 . In addition, people with mental illness often share their mental states or discuss mental health issues with others through these platforms by posting text messages, photos, videos and other links. Prominent social media platforms are Twitter, Reddit, Tumblr, Chinese microblogs, and other online forums. We briefly introduce some popular social media platforms.

Twitter. Twitter is a popular social networking service with over 300 million monthly active users, in which users can post their tweets (the posts on Twitter) or retweet others' posts. Researchers can collect tweets using available Twitter application programming interfaces (APIs). For example, Sinha et al. created a manually annotated dataset to identify suicidal ideation in Twitter 21 . Hu et al. used a rule-based approach to label users' depression status from Twitter 22 . However, Twitter normally does not allow the texts of downloaded tweets to be publicly shared, only the tweet identifiers, some of which may disappear over time, so many datasets of actual tweets are not made publicly available 23 .

Reddit. Reddit is also a popular social media platform for publishing posts and comments. The difference between Reddit and other data sources is that posts are grouped into different subreddits according to their topics (e.g., depression and suicide). Because of Reddit's open policy, its datasets are publicly available. Yates et al. established a depression dataset named “Reddit Self-reported Depression Diagnosis” (RSDD) 24 , which contains about 9k depressed users and 100k control users. Similarly, the CLEF eRisk 2019 shared task 25 also proposed an anorexia and self-harm detection task based on the Reddit platform.

Online forums. People can discuss their mental health conditions and seek mental help from online forums (also called online communities). There are various forms of online forums, such as chat rooms, discussion rooms (recoveryourlife, endthislife). For example, Saleem et al. designed a psychological distress detection model on 512 discussion threads downloaded from an online forum for veterans 26 . Franz et al. used the text data from TeenHelp.org, an Internet support forum, to train a self-harm detection system 27 .

Electronic health records

EHRs, a rich source of secondary health care data, have been widely used to document patients’ historical medical records 28 . EHRs often contain several different data types, including patients’ profile information, medications, diagnosis history, images. In addition, most EHRs related to mental illness include clinical notes written in narrative form 29 . Therefore, it is appropriate to use NLP techniques to assist in disease diagnosis on EHRs datasets, such as suicide screening 30 , depressive disorder identification 31 , and mental condition prediction 32 .

Interviews

Some work has been carried out to detect mental illness by interviewing users and then analyzing the linguistic information extracted from transcribed clinical interviews 33 , 34 . The main datasets include the DAIC-WoZ depression database 35 that involves transcriptions of 142 participants, the AViD-Corpus 36 with 48 participants, and the schizophrenia identification corpus 37 collected from 109 participants.

Screening surveys

In order to evaluate participants' mental health conditions, some researchers administer questionnaires for clinician diagnosis or self-assessment. After participants are asked to fill in a survey from crowd-sourcing platforms (like Crowd Flower, Amazon's Mechanical Turk) or online platforms, the data is collected and labeled. There are different survey contents to measure different psychiatric symptoms. For depression, the PHQ-9 (Patient Health Questionnaire) 38 or Beck Depression Inventory (BDI) questionnaire 39 are widely used for assessing the severity of depressive symptoms. The Center for Epidemiologic Studies Depression Scale (CES-D) questionnaire 40 with 20 multiple-choice questions is also designed for testing depression. For suicide ideation, there are many questionnaires such as the Holmes-Rahe Social Readjustment Rating Scale (SRRS) 41 or the Depressive Symptom Inventory-Suicide Subscale (DSI-SS) 42 .

Narrative writing

There are other types of texts written for specific experiments, as well as narrative texts that are not published on social media platforms, which we classify as narrative writing. For example, in one study, children were asked to write a story about a time that they had a problem or fought with other people, where researchers then analyzed their personal narrative to detect ASD 43 . In addition, a case study on Greek poetry of the 20th century was carried out for predicting suicidal tendencies 44 .

Types of mental illness

There are many applications for the detection of different types of mental illness, where depression (45%) and suicide (20%) account for the largest proportion; stress, anorexia, eating disorders, PTSD, bipolar disorder, anxiety, ASD and schizophrenia have corresponding datasets and have been analyzed using NLP (Fig. 5 ). This shows that there is a demand for NLP technology in different mental illness detection applications.

Figure 5: Pie chart depicting the percentage of each mental illness type.

Datasets in English dominate (81%), followed by datasets in Chinese (10%) and Arabic (1.5%). When using non-English datasets, the main difference lies in the pre-processing pipeline, such as word segmentation, sentence splitting and other language-dependent text processing, while the methods and model architectures are language-agnostic.

NLP methods used to extract data

Machine learning methods

Traditional machine learning methods such as support vector machine (SVM), Adaptive Boosting (AdaBoost), Decision Trees, etc. have been used for NLP downstream tasks. Figure 3 shows that 59% of the methods used for mental illness detection are based on traditional machine learning, typically following a pipeline approach of data pre-processing, feature extraction, modeling, optimization, and evaluation.

In order to train a good ML model, it is important to select the main contributing features, which also help us to find the key predictors of illness. Table 2 shows an overview of commonly used features in machine learning. We further classify these features into linguistic features, statistical features, domain knowledge features, and other auxiliary features. The most frequently used features are mainly based on basic linguistic patterns (Part-of-Speech (POS) 45 , 46 , 47 , Bag-of-words (BoW) 48 , 49 , 50 , Linguistic Inquiry and Word Count (LIWC) 51 , 52 , 53 ) and statistics (n-gram 54 , 55 , 56 , term frequency-inverse document frequency (TF-IDF) 57 , 58 , 59 , length of sentences or passages 60 , 61 , 62 ) because these features can be easily obtained through text processing tools and are widely used in many NLP tasks. Furthermore, emotion and topic features have been shown empirically to be effective for mental illness detection 63 , 64 , 65 . Domain specific ontologies, dictionaries and social attributes in social networks also have the potential to improve accuracy 65 , 66 , 67 , 68 . Research conducted on social media data often leverages other auxiliary features to aid detection, such as social behavioral features 65 , 69 , user’s profile 70 , 71 , or time features 72 , 73 .

Machine learning models have been designed based on a combination of various extracted features. The majority of the papers based on machine learning methods used supervised learning, where they described one or more methods employed to detect mental illness: SVM 26 , 74 , 75 , 76 , 77 , Adaptive Boosting (AdaBoost) 71 , 78 , 79 , 80 , k-Nearest Neighbors (KNN) 38 , 81 , 82 , 83 , Decision Tree 84 , 85 , 86 , 87 , Random Forest 75 , 88 , 89 , 90 , Logistic Model Tree (LMT) 47 , 91 , 92 , Naive Bayes (NB) 64 , 86 , 93 , 94 , Logistic Regression 37 , 95 , 96 , 97 , XGBoost 38 , 55 , 98 , 99 , and some ensemble models combining several methods 75 , 100 , 101 , 102 . The advantage of such supervised learning lies in the model's ability to learn patterns from labeled data, thus ensuring better performance. However, labeling a large amount of data at a high quality level is time-consuming and challenging, although there are methods that help reduce the human annotation burden 103 . Thus, we need to use other methods which do not rely on labeled data or need only a small amount of data to train a classifier.
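As a concrete illustration of this pipeline, a minimal scikit-learn sketch with invented posts and labels; a real study would use a large annotated corpus, richer features, and proper evaluation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented snippets; real studies use large annotated corpora.
posts = [
    "I can't get out of bed and nothing seems worth doing",
    "Everything has felt hopeless lately",
    "Had a great run this morning, full of energy",
    "Excited to see my friends this weekend",
]
labels = [1, 1, 0, 0]  # 1 = at-risk, 0 = control

# TF-IDF feature extraction feeding an SVM: the classic
# feature-engineering pipeline described above.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(posts, labels)
print(model.predict(["feeling hopeless and exhausted"]))  # [1]
```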

Unsupervised learning methods discover patterns from unlabeled data, for example by clustering 55 , 104 , 105 or by using LDA topic models 27 . However, in most cases, these unsupervised models are applied to extract additional features for developing supervised learning classifiers 56 , 85 , 106 , 107 . Across all papers, few papers 108 , 109 used semi-supervised learning (models trained from large unlabeled data as additional information), including the statistical model ssToT (semi-supervised topic modeling over time) 108 and classic semi-supervised algorithms (YATSI 110 and LLGC 111 ).

Deep learning methods

As mentioned above, machine learning-based models rely heavily on feature engineering and feature extraction. Using deep learning frameworks allows models to capture valuable features automatically without feature engineering, which helps achieve notable improvements 112 . Advances in deep learning methods have brought breakthroughs in many fields including computer vision 113 , NLP 114 , and signal processing 115 . For the task of mental illness detection from text, deep learning techniques have recently attracted more attention and shown better performance compared to machine learning ones 116 .

Deep learning-based frameworks mainly contain two layers: an embedding layer and a classification layer. By using an embedding layer, the inputs are embedded from sparse one-hot encoded vectors (where only one member of a vector is ‘1’ and all others are ‘0’, leading to the sparsity) into dense vectors which can preserve semantic and syntactic information, such that deep learning models can be better trained 117 . There are many different embedding techniques, such as ELMo, GloVe word embeddings 118 , word2vec 119 and contextual language encoder representations (e.g., bidirectional encoder representations from transformers (BERT) 120 and ALBERT 121 ).

According to the structure of the classification layer, we have divided the deep learning-based methods into the following categories for this review: convolutional neural network (CNN)-based methods (17%), recurrent neural network (RNN)-based methods (36%), transformer-based methods (17%) and hybrid methods (30%) that combine multiple neural networks with different structures, as shown in Table 3.

CNN-based methods. The standard CNN structure is composed of a convolutional layer and a pooling layer, followed by a fully-connected layer. Some studies 122 , 123 , 124 , 125 , 126 , 127 utilized standard CNN to construct classification models, and combined other features such as LIWC, TF-IDF, BOW, and POS. In order to capture sentiment information, Rao et al. proposed a hierarchical MGL-CNN model based on CNN 128 . Lin et al. designed a CNN framework combined with a graph model to leverage tweet content and social interaction information 129 .

RNN-based methods. The architecture of RNNs allows previous outputs to be used as inputs, which is beneficial when using sequential data such as text. Generally, long short-term memory (LSTM) 130 and gated recurrent unit (GRU) 131 network models, which can deal with the vanishing gradient problem 132 of the traditional RNN, are effectively used in the NLP field. There are many studies (e.g., 133 , 134 ) based on LSTM or GRU, and some of them 135 , 136 exploited an attention mechanism 137 to find significant word information from text. Some also used a hierarchical attention network based on LSTM or GRU structure to better exploit the different-level semantic information 138 , 139 .
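A minimal Keras sketch of the embedding-plus-LSTM stack described above; the hyperparameters are arbitrary and no training data is shown:

```python
import tensorflow as tf  # pip install tensorflow

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 10_000, 128, 200

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    # Embedding layer: dense vectors instead of sparse one-hot inputs.
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # LSTM layer: captures long-range sequential dependencies in a post.
    tf.keras.layers.LSTM(64),
    # Classification layer: probability that the text indicates risk.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```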

Moreover, many other deep learning strategies have been introduced, including transfer learning, multi-task learning, reinforcement learning and multiple instance learning (MIL). Rutowski et al. made use of transfer learning to pre-train a model on an open dataset, and the results illustrated the effectiveness of pre-training 140 , 141 . Ghosh et al. developed a deep multi-task method 142 that modeled emotion recognition as a primary task and depression detection as a secondary task. The experimental results showed that multi-task frameworks can improve the performance of all tasks when jointly learning. Reinforcement learning was also used in depression detection 143 , 144 to enable the model to pay more attention to useful information rather than noisy data by selecting indicator posts. MIL is a machine learning paradigm which aims to learn features from bags' labels of the training set instead of individual labels. Wongkoblap et al. used MIL to predict users with depression 145 , 146 .

Transformer-based methods. Recently, transformer architectures 147 have proved able to model long-range dependencies through self-attention alone, dispensing with recurrence. Wang et al. proposed the C-Attention network 148, which uses a transformer encoder block with multi-head self-attention and convolution processing. Zhang et al. also presented their TransformerRNN with multi-head self-attention 149. Additionally, many researchers leveraged transformer-based pre-trained language representation models, including BERT 150, 151, DistilBERT 152, RoBERTa 153, ALBERT 150, BioClinical BERT for clinical notes 31, XLNet 154, and GPT 155. The usage and development of these BERT-based models prove the potential value of large-scale pre-trained models in the application of mental illness detection.
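
As an illustration of how such pre-trained models are typically applied, the following sketch (assuming the Hugging Face transformers library; the checkpoint name, example texts, and labels are placeholders) fine-tunes a BERT encoder for binary detection:

```python
# Minimal sketch of fine-tuning a pre-trained BERT encoder
# (assuming the Hugging Face `transformers` library).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Toy batch; real studies would use annotated posts or clinical notes.
batch = tokenizer(["I can't sleep and feel hopeless",
                   "Had a great walk this morning"],
                  padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1, 0])                # hypothetical labels

outputs = model(**batch, labels=labels)      # returns loss and logits
outputs.loss.backward()                      # one illustrative training step
```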

Hybrid-based methods. Some methods combine several neural networks for mental illness detection. For example, hybrid frameworks of CNN and LSTM models 156, 157, 158, 159, 160 are able to capture both local features and long-range dependencies, and outperform individual CNN or LSTM classifiers. Sawhney et al. proposed STATENet 161, a time-aware model that contains an individual tweet transformer and a Plutchik-based emotion 162 transformer to jointly learn linguistic and emotional patterns. Inspired by the improved performance of sub-emotion representations 163, Aragon et al. presented a deep emotion attention model 164 consisting of sub-emotion embeddings, a CNN, a GRU, and an attention mechanism, and Lara et al. proposed the Deep Bag of Sub-Emotions (DeepBoSE) model 165. Furthermore, Sawhney et al. introduced the PHASE model 166, which learns the chronological emotional progression of a user with a new time-sensitive emotion LSTM, and later a model based on Hyperbolic Graph Convolution Networks 167, which learns the chronological emotional spectrum of a user by using BERT fine-tuned for emotions together with a heterogeneous social network graph.
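
A minimal sketch of the hybrid idea, assuming PyTorch and purely illustrative sizes: convolutions first extract local n-gram features, and an LSTM then models longer-range dependencies over them:

```python
# Minimal sketch of a hybrid CNN-LSTM classifier (assuming PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNLSTM(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128,
                 n_filters=64, hidden=64, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(n_filters, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed, seq)
        x = F.relu(self.conv(x)).transpose(1, 2)       # local n-gram features
        _, (h_n, _) = self.lstm(x)                     # long-range dependencies
        return self.fc(h_n[-1])                        # logits

print(CNNLSTM()(torch.randint(0, 10_000, (4, 40))).shape)  # torch.Size([4, 2])
```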

Evaluation metrics

Evaluation metrics are used to compare the performance of different models on mental illness detection tasks. Many tasks can be regarded as classification problems, for which the most widely used standard evaluation metrics are Accuracy (AC), Precision (P), Recall (R), and F1-score (F1) 149, 168, 169, 170. Similarly, the area under the ROC curve (AUC-ROC) 60, 171, 172 is also used as a classification metric, measuring the trade-off between the true positive rate and the false positive rate. Some studies not only detect mental illness but also score its severity 122, 139, 155, 173. Error metrics (e.g., mean absolute error, mean squared error, root mean squared error) 173 and other new metrics (e.g., graded precision, graded recall, average hit rate, average closeness rate, average difference between overall depression levels) 139, 174 are therefore sometimes needed to quantify the difference between the predicted severity and the actual severity in a dataset. Meanwhile, because timeliness matters for mental illness detection, where early detection is significant for early prevention, an error metric called early risk detection error was proposed 175 to measure the delay in making a decision.
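
For reference, the sketch below computes the standard classification metrics and one severity error metric with scikit-learn; the labels, predictions, and probability scores are toy values:

```python
# Minimal sketch of the standard evaluation metrics (using scikit-learn).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, mean_absolute_error)

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1]   # predicted probabilities, for AUC-ROC

print("AC :", accuracy_score(y_true, y_pred))
print("P  :", precision_score(y_true, y_pred))
print("R  :", recall_score(y_true, y_pred))
print("F1 :", f1_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_score))

# For severity scoring, error metrics compare predicted and true levels.
print("MAE:", mean_absolute_error([2, 0, 3], [1, 0, 3]))
```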

Although promising results have been obtained with both machine learning and deep learning methods, several challenges that require further research remain for the mental illness detection task. Herein, we introduce some key challenges and future research directions:

Data volume and quality: Most of the methods covered in this review used supervised learning models, whose success depends on the availability of sufficiently large training datasets. These datasets usually require human annotation, which is a time-consuming and expensive process, and for the mental illness detection task there are not enough annotated public datasets. Dataset quality is also a concern for training reliable models. Some datasets suffer from annotation bias because the annotators cannot confirm that a definitive action has taken place in relation to a disorder (e.g., whether an actual suicide has occurred) and can only label posts within the constraints of their predefined annotation rules 9. In addition, some imbalanced datasets contain many negative instances (individuals without mental disorders), which is not conducive to training comprehensive and robust models. It is therefore important to explore how to train a detection model using a small quantity of labeled training data, or none at all. Semi-supervised learning 176 incorporates a small amount of labeled data and large amounts of unlabeled data into the training process, which can be used to facilitate annotation 177 or to improve classification performance when labeled data are scarce. Unsupervised methods can also be applied to mental disorder detection: for instance, unsupervised topic modeling 178 increases the explainability of results and aids the extraction of latent features for developing further supervised models 179, 180.
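
As a small illustration of the unsupervised direction, the following sketch (using scikit-learn; the corpus is a toy placeholder) extracts latent topic features from unlabeled posts with LDA, which could then feed a downstream supervised model:

```python
# Minimal sketch of unsupervised topic modeling with LDA (using scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = ["feeling anxious before exams",
         "no energy, sleeping all day",
         "exam stress and panic attacks",
         "tired all the time and can't focus"]

X = CountVectorizer(stop_words="english").fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))   # per-post topic distributions as latent features
```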

Performance and instability: Model instability has several causes, including class imbalance, noisy labels, and extremely long or extremely short text samples. Performance is also not robust when training on datasets from different sources, owing to diverse writing styles and semantic heterogeneity, so some detection models generalize poorly. With the advances of deep learning, various learning techniques have emerged and accelerated NLP research, such as adversarial training 181, contrastive learning 182, joint learning 183, reinforcement learning 184 and transfer learning 185, and these can also be utilized for the mental illness detection task. For example, pre-trained transformer-based models can be transferred to anorexia detection in Spanish 186, and reinforcement networks can be used to find the sentence that best reflects the mental state. Other emerging techniques, such as attention mechanisms 187, knowledge graphs 188, and commonsense reasoning 189, can also be introduced for textual feature extraction. In addition, feature enrichment and data augmentation 190 are useful for achieving comparable results. For example, many studies use multi-modal data resources, such as images 191, 192, 193 and audio 194, 195, 196, and these models perform better than single-modal text-based models.
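
As one simple illustration of text data augmentation, the sketch below implements a random word swap; this is a generic technique chosen for illustration, not one prescribed by the cited studies:

```python
# Minimal sketch of a simple data augmentation strategy: random word swap.
import random

def random_swap(text: str, n_swaps: int = 1, seed: int = 0) -> str:
    """Return a copy of `text` with n_swaps random pairs of words swapped."""
    rng = random.Random(seed)
    words = text.split()
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)   # pick two positions
        words[i], words[j] = words[j], words[i]   # swap them
    return " ".join(words)

print(random_swap("i have been feeling very low for weeks"))
```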

Interpretability: The goal of representation learning for mental health is to understand the causes or explanatory factors of mental illness, in order to boost detection performance and empower decision-making. The evaluation of a successful model relies not only on its performance but also on its interpretability 197, which is significant for guiding clinicians to understand not only what has been extracted from text but also the reasoning underlying a prediction 198, 199, 200. Deep learning-based methods achieve good performance by utilizing automatic feature extraction and complex neural network structures, yet they are still treated as black boxes 201 and fail to explain their predictions. Therefore, the explainability of deep learning models will become an important direction for future work.
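
As an illustration of post-hoc interpretability, the sketch below uses LIME, one of the explanation techniques referenced above, to explain a single prediction of a toy text classifier; the training texts, labels, and class names are placeholders:

```python
# Minimal sketch of explaining a text prediction with LIME
# (assuming the `lime` package and scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy classifier standing in for any fitted detection model.
texts = ["I feel hopeless and empty", "great day with friends",
         "can't stop crying", "enjoying my new hobby"]
labels = [1, 0, 1, 0]
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["control", "depression"])
exp = explainer.explain_instance("I feel empty and can't get out of bed",
                                 pipeline.predict_proba, num_features=5)
print(exp.as_list())   # words that drove this single prediction
```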

Ethical considerations: It is especially important to discuss ethical concerns when using mental health-related textual data, since personal data privacy and security are significant and health data are particularly sensitive. Researchers should follow strict protocols, similar to the guidelines 202 introduced by Benton et al., to ensure that data are properly used in healthcare research while protecting privacy and avoiding further psychological distress. Furthermore, even when using publicly available data, researchers need to acquire ethical approval from institutional review boards and human research ethics committees 203, 204.

There has been growing research interest in the detection of mental illness from text. Early detection of mental disorders is an important and effective way to improve mental health diagnosis. In our review, we report the latest research trends, cover different data sources and illness types, and summarize existing machine learning and deep learning methods used for this task.

We find that there are applications for many different data sources, mental illnesses, and even languages, which shows the importance and value of the task. Our findings also indicate that deep learning methods now receive more attention and perform better than traditional machine learning methods.

We discuss some remaining challenges and propose future directions. The development of new methods, including different learning strategies, novel deep learning paradigms, interpretable models, and multi-modal approaches, will further support mental illness detection; interpretability in particular will be crucial for the uptake of detection applications by clinicians.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Rehm, J. & Shield, K. D. Global burden of disease and the impact of mental and addictive disorders. Curr. Psychiatry Rep. 21 , 1–7 (2019).

Santomauro, D. F. et al. Global prevalence and burden of depressive and anxiety disorders in 204 countries and territories in 2020 due to the covid-19 pandemic. The Lancet 398 , 1700–1712 (2021).

Nadkarni, P. M., Ohno-Machado, L. & Chapman, W. W. Natural language processing: an introduction. J. Am. Med. Inform. Assoc. 18 , 544–551 (2011).

Ive, J. Generation and evaluation of artificial mental health records for natural language processing. NPJ Digital Med. 3 , 1–9 (2020).

Mukherjee, S. S. et al. Natural language processing-based quantification of the mental state of psychiatric patients. Comput. Psychiatry 4 , 76–106 (2020).

Jackson, R. G. Natural language processing to extract symptoms of severe mental illness from clinical text: the clinical record interactive search comprehensive data extraction (cris-code) project. BMJ Open 7 , 012012 (2017).

Castillo-Sánchez, G. Suicide risk assessment using machine learning and social networks: a scoping review. J. Med. Syst. 44 , 1–15 (2020).

Franco-Martín, M. A. A systematic literature review of technologies for suicidal behavior prevention. J. Med. Syst. 42 , 1–7 (2018).

Ji, S. Suicidal ideation detection: a review of machine learning methods and applications. IEEE Trans. Comput. Soc. Syst. 8 , 214–226 (2021).

Giuntini, F. T. A review on recognizing depression in social networks: challenges and opportunities. J. Ambient Intell. Human. Comput. 11 , 4713–4729 (2020).

Mahdy, N., Magdi, D. A., Dahroug, A. & Rizka, M. A. Comparative study: different techniques to detect depression using social media. in Internet of Things-Applications and Future, pp. 441–452 (2020).

Khan, A., Husain, M. S. & Khan, A. Analysis of mental state of users using social media to predict depression! a survey. Int. J. Adv. Res. Comput. Sci. 9 , 100–106 (2018).

Skaik, R. & Inkpen, D. Using social media for mental health surveillance: a review. ACM Comput. Surv. 53 , 1–31 (2020).

Chancellor, S. & De Choudhury, M. Methods in predictive techniques for mental health status on social media: a critical review. NPJ Digital Med. 3 , 1–11 (2020).

Ríssola, E. A., Losada, D. E. & Crestani, F. A survey of computational methods for online mental state assessment on social media. ACM Trans. Comput. Healthc. 2 , 1–31 (2021).

Calvo, R. A., Milne, D. N., Hussain, M. S. & Christensen, H. Natural language processing in mental health applications using non-clinical texts. Nat. Lang. Eng. 23 , 649–685 (2017).

Przybyła, P. Prioritising references for systematic reviews with robotanalyst: a user study. Res. Synth. Methods 9 , 470–488 (2018).

O’Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. & Ananiadou, S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst. Rev. 4 , 1–22 (2015).

Miwa, M., Thomas, J., O’Mara-Eves, A. & Ananiadou, S. Reducing systematic review workload through certainty-based screening. J. Biomed. Inform. 51 , 242–253 (2014).

Kemp, S. Digital 2020: 3.8 billion people use social media. We Are Social 30 , (2020). https://wearesocial.com/uk/blog/2020/01/digital-2020-3-8-billion-people-use-social-media/ .

Sinha, P. P. et al. Suicidal-a multipronged approach to identify and explore suicidal ideation in twitter. In Proc. 28th ACM International Conference on Information and Knowledge Management , pp. 941–950 (2019).

Hu, P. et al. Bluememo: depression analysis through twitter posts. In IJCAI , pp. 5252–5254 (2020).

Golder, S., Ahmed, S., Norman, G. & Booth, A. Attitudes toward the ethics of research using social media: a systematic review. J. Med. Internet Res. 19 , 7082 (2017).

Yates, A., Cohan, A. & Goharian, N. Depression and self-harm risk assessment in online forums. In Proc. 2017 Conference on Empirical Methods in Natural Language Processing (2017).

Naderi, N., Gobeill, J., Teodoro, D., Pasche, E. & Ruch, P. A baseline approach for early detection of signs of anorexia and self-harm in reddit posts. In CLEF (Working Notes) (2019).

Saleem, S. et al. Automatic detection of psychological distress indicators in online forum posts. In Proc. 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference , pp. 1–4 (2012).

Franz, P. J., Nook, E. C., Mair, P. & Nock, M. K. Using topic modeling to detect and describe self-injurious and related content on a large-scale digital platform. Suicide Life Threat. Behav. 50 , 5–18 (2020).

Menachemi, N. & Collum, T. H. Benefits and drawbacks of electronic health record systems. Risk Manag. Healthc. Policy 4 , 47 (2011).

Kho, A. N. Practical challenges in integrating genomic data into the electronic health record. Genet. Med. 15 , 772–778 (2013).

Downs, J. et al. Detection of suicidality in adolescents with autism spectrum disorders: developing a natural language processing approach for use in electronic health records. In AMIA Annual Symposium Proceedings , vol. 2017, p. 641 (2017).

Kshatriya, B. S. A. et al. Neural language models with distant supervision to identify major depressive disorder from clinical notes. Preprint at arXiv https://arxiv.org/abs/2104.09644 (2021).

Tran, T. & Kavuluru, R. Predicting mental conditions based on “history of present illness" in psychiatric notes with deep neural networks. J. Biomed. Inform. 75 , 138–148 (2017).

Morales, M. R. & Levitan, R. Speech vs. text: a comparative analysis of features for depression detection systems. In 2016 IEEE Spoken Language Technology Workshop (SLT) , pp. 136–143 (2016).

Arseniev-Koehler, A., Mozgai, S. & Scherer, S. What type of happiness are you looking for?-a closer look at detecting mental health from language. In Proc. Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic , pp. 1–12 (2018).

Ringeval, F. et al. Avec 2017: real-life depression, and affect recognition workshop and challenge. In Proc. 7th Annual Workshop on Audio/Visual Emotion Challenge , pp. 3–9 (2017).

Valstar, M. et al. Avec 2014: 3d dimensional affect and depression recognition challenge. In Proc. 4th International Workshop on Audio/visual Emotion Challenge , pp. 3–10 (2014).

Voleti, R. et al. Objective assessment of social skills using automated language analysis for identification of schizophrenia and bipolar disorder. In Proc. Interspeech , pp. 1433–1437 (2019).

Tlachac, M., Toto, E. & Rundensteiner, E. You’re making me depressed: Leveraging texts from contact subsets to predict depression. In 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI) , pp. 1–4 (2019).

Stankevich, M., Smirnov, I., Kiselnikova, N. & Ushakova, A. Depression detection from social media profiles. In International Conference on Data Analytics and Management in Data Intensive Domains , pp. 181–194 (2019).

Wongkoblap, A., Vadillo, M. A. & Curcin, V. A multilevel predictive model for detecting social network users with depression. In 2018 IEEE International Conference on Healthcare Informatics (ICHI) , pp. 130–135 (2018).

Delgado-Gomez, D., Blasco-Fontecilla, H., Sukno, F., Ramos-Plasencia, M. S. & Baca-Garcia, E. Suicide attempters classification: toward predictive models of suicidal behavior. Neurocomputing 92 , 3–8 (2012).

von Glischinski, M., Teismann, T., Prinz, S., Gebauer, J. E. & Hirschfeld, G. Depressive symptom inventory suicidality subscale: optimal cut points for clinical and non-clinical samples. Clin. Psychol. Psychother. 23 , 543–549 (2016).

Hilvert, E., Davidson, D. & Gámez, P. B. Assessment of personal narrative writing in children with and without autism spectrum disorder. Res. Autism Spectr. Disord. 69 , 101453 (2020).

Zervopoulos, A. D. et al. Language processing for predicting suicidal tendencies: a case study in greek poetry. In IFIP International Conference on Artificial Intelligence Applications and Innovations , pp. 173–183 (2019).

Birjali, M., Beni-Hssane, A. & Erritali, M. Machine learning and semantic sentiment analysis based algorithms for suicide sentiment prediction in social networks. Proc. Computer. Sci. 113 , 65–72 (2017).

Trifan, A., Antunes, R., Matos, S. & Oliveira, J. L. Understanding depression from psycholinguistic patterns in social media texts. Adv. Inf. Retr. 12036 , 402 (2020).

Briand, A., Almeida, H. & Meurs, M. -J. Analysis of social media posts for early detection of mental health conditions. In Canadian Conference on Artificial Intelligence , pp. 133–143 (2018).

Trifan, A. & Oliveira, J. L. Bioinfo@ uavr at erisk 2019: delving into social media texts for the early detection of mental and food disorders. In CLEF (Working Notes) (2019).

Lin, W., Ji, D. & Lu, Y. Disorder recognition in clinical texts using multi-label structured svm. BMC Bioinform. 18 , 1–11 (2017).

Chomutare, T. Text classification to automatically identify online patients vulnerable to depression. In International Symposium on Pervasive Computing Paradigms for Mental Health , pp. 125–130 (2014).

Islam, M. R. Depression detection from social network data using machine learning techniques. Health Inf. Sci. Syst. 6 , 1–12 (2018).

Su, Y., Zheng, H., Liu, X. & Zhu, T. Depressive emotion recognition based on behavioral data. In International Conference on Human Centered Computing , pp. 257–268 (2018).

Simms, T. et al. Detecting cognitive distortions through machine learning text analytics. In 2017 IEEE International Conference on Healthcare Informatics (ICHI) , pp. 508–512 (2017).

He, Q., Veldkamp, B. P., Glas, C. A. & de Vries, T. Automated assessment of patients’ self-narratives for posttraumatic stress disorder screening using natural language processing and text mining. Assessment 24 , 157–172 (2017).

Shickel, B., Siegel, S., Heesacker, M., Benton, S. & Rashidi, P. Automatic detection and classification of cognitive distortions in mental health text. In 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE) , pp. 275–280 (2020).

Guntuku, S. C., Giorgi, S. & Ungar, L. Current and future psychological health prediction using language and socio-demographics of children for the clpysch 2018 shared task. In Proc. Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic , pp. 98–106 (2018).

Stankevich, M., Isakov, V., Devyatkin, D. & Smirnov, I. V. Feature engineering for depression detection in social media. In ICPRAM , pp. 426–431 (2018).

Boag, W. Hard for humans, hard for machines: predicting readmission after psychiatric hospitalization using narrative notes. Transl. Psychiatry 11 , 1–6 (2021).

Adamou, M. et al. Mining free-text medical notes for suicide risk assessment. In Proc. 10th Hellenic Conference on Artificial Intelligence , pp. 1–8 (2018).

Saleem, S. et al. Automatic detection of psychological distress indicators and severity assessment from online forum posts. In Proc. COLING 2012 , pp. 2375–2388 (2012).

Trifan, A. & Oliveira, J. L. Cross-evaluation of social mining for classification of depressed online personas. J. Integr. Bioinform . (2021)

Balani, S. & De Choudhury, M. Detecting and characterizing mental health related self-disclosure in social media. In Proc. 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems , pp. 1373–1378 (2015).

Delahunty, F., Wood, I. D. & Arcan, M. First insights on a passive major depressive disorder prediction system with incorporated conversational chatbot. In Irish Conference on Artificial Intelligence and Cognitive Science (2018).

Deshpande, M. & Rao, V. Depression detection using emotion artificial intelligence. In 2017 International Conference on Intelligent Sustainable Systems (iciss) , pp. 858–862 (2017).

Hwang, Y., Kim, H. J., Choi, H. J. & Lee, J. Exploring abnormal behavior patterns of online users with emotional eating behavior: topic modeling study. J. Med. Internet Res. 22 , 15700 (2020).

Alam, M. A. U. & Kapadia, D. Laxary: a trustworthy explainable twitter analysis model for post-traumatic stress disorder assessment. In 2020 IEEE International Conference on Smart Computing (SMARTCOMP) , pp. 308–313 (2020).

Plaza-del Arco, F. M., López-Úbeda, P., Diaz-Galiano, M. C., Urena-López, L. A. & Martin-Valdivia, M.-T. Integrating UMLS for Early Detection of Signs of Anorexia. (Universidad de Jaen, Campus Las Lagunillas: Jaen, Spain, 2019).

Dao, B., Nguyen, T., Phung, D. & Venkatesh, S. Effect of mood, social connectivity and age in online depression community via topic and linguistic analysis. In International Conference on Web Information Systems Engineering , pp. 398–407 (2014).

Katchapakirin, K., Wongpatikaseree, K., Yomaboot, P. & Kaewpitakkun, Y. Facebook social media for depression detection in the thai community. In 2018 15th International Joint Conference on Computer Science and Software Engineering (JCSSE) , pp. 1–6 (2018).

Chang, M. -Y. & Tseng, C. -Y. Detecting social anxiety with online social network data. In 2020 21st IEEE International Conference on Mobile Data Management (MDM) , pp. 333–336 (2020).

Tong, L. et al. Cost-sensitive boosting pruning trees for depression detection on Twitter. IEEE Trans. Affect. Comput. https://doi.org/10.1109/TAFFC.2022.3145634 (2022).

Guntuku, S. C., Buffone, A., Jaidka, K., Eichstaedt, J. C. & Ungar, L. H. Understanding and measuring psychological stress using social media. In Proc. International AAAI Conference on Web and Social Media , vol. 13, pp. 214–225 (2019).

Zhao, L., Jia, J. & Feng, L. Teenagers’ stress detection based on time-sensitive micro-blog comment/response actions. In IFIP International Conference on Artificial Intelligence in Theory and Practice , pp. 26–36 (2015).

Ziwei, B. Y. & Chua, H. N. An application for classifying depression in tweets. In Proc. 2nd International Conference on Computing and Big Data , pp. 37–41 (2019).

Prakash, A., Agarwal, K., Shekhar, S., Mutreja, T. & Chakraborty, P. S. An ensemble learning approach for the detection of depression and mental illness over twitter data. In 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom) , pp. 565–570 (2021).

Coello-Guilarte, L., Ortega-Mendoza, R. M., Villaseñor-Pineda, L. & Montes-y-Gómez, M. Crosslingual depression detection in twitter using bilingual word alignments. In International Conference of the Cross-Language Evaluation Forum for European Languages , pp. 49–61 (2019).

Qiu, J. & Gao, J. Depression tendency recognition model based on college student’s microblog text. In International Conference on Intelligence Science , pp. 351–359 (2017).

Almouzini, S. et al. Detecting arabic depressed users from twitter data. Proc. Comput. Sci. 163 , 257–265 (2019).

Mbarek, A., Jamoussi, S., Charfi, A. & Hamadou, A. B. Suicidal profiles detection in twitter. In WEBIST , pp. 289–296 (2019).

Xu, S. et al. Automatic verbal analysis of interviews with schizophrenic patients. In 2018 IEEE 23rd International Conference on Digital Signal Processing (DSP) , pp. 1–5 (2018).

Verma, P., Sharma, K. & Walia, G. S. Depression detection among social media users using machine learning. In International Conference on Innovative Computing and Communications , pp. 865–874 (2021).

Shrestha, A. & Spezzano, F. Detecting depressed users in online forums. In Proc. 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining , pp. 945–951 (2019).

Desmet, B. & Hoste, V. Recognising suicidal messages in dutch social media. In 9th International Conference on Language Resources and Evaluation (LREC) , pp. 830–835 (2014).

He, L. & Luo, J. “What makes a pro eating disorder hashtag”: using hashtags to identify pro eating disorder tumblr posts and twitter users. In 2016 IEEE International Conference on Big Data (Big Data) , pp. 3977–3979 (2016).

Marerngsit, S. & Thammaboosadee, S. A two-stage text-to-emotion depressive disorder screening assistance based on contents from online community. In 2020 8th International Electrical Engineering Congress (iEECON) , pp. 1–4 (2020).

Nadeem, M. Identifying depression on twitter. Preprint at arXiv https://arxiv.org/abs/1607.07384 (2016).

Fodeh, S. et al. Using machine learning algorithms to detect suicide risk factors on twitter. In 2019 International Conference on Data Mining Workshops (ICDMW) , pp. 941–948 (2019).

Tariq, S. A novel co-training-based approach for the classification of mental illnesses using social media posts. IEEE Access 7 , 166165–166172 (2019).

Mittal, A., Goyal, A. & Mittal, M. Data preprocessing based connecting suicidal and help-seeking behaviours. In 2021 5th International Conference on Computing Methodologies and Communication (ICCMC) , pp. 1824–1830 (2021).

Kamite, S. R. & Kamble, V. Detection of depression in social media via twitter using machine learning approach. In 2020 International Conference on Smart Innovations in Design, Environment, Management, Planning and Computing (ICSIDEMPC) , pp. 122–125 (2020).

Schoene, A. M. & Dethlefs, N. Automatic identification of suicide notes from linguistic and sentiment features. In Proc. 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities , pp. 128–133 (2016).

Almeida, H., Briand, A. & Meurs, M.- J. Detecting early risk of depression from social media user-generated content. In CLEF (Working Notes) (2017).

Govindasamy, K. A. & Palanichamy, N. Depression detection using machine learning techniques on twitter data. In 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS) , pp. 960–966 (2021).

Baheti, R. & Kinariwala, S. Detection and analysis of stress using machine learning techniques. Int. J. Eng. Adv. Technol. 9 , 335–342 (2019).

Németh, R., Sik, D. & Máté, F. Machine learning of concepts hard even for humans: the case of online depression forums. Int. J. Qualitative Methods 19 , 1609406920949338 (2020).

Benton, A., Mitchell, M. & Hovy, D. Multitask learning for mental health conditions with limited social media data. In Proc. 15th Conference of the European Chapter of the Association for Computational Linguistics: Vol. 1 , pp. 152–162 (2017).

Hiraga, M. Predicting depression for japanese blog text. In Proc. ACL 2017, Student Research Workshop , pp. 107–113 (2017).

Nasir, A., Aslam, K., Tariq, S. & Ullah, M. F. Predicting mental illness using social media posts and comments. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 11 (2020).

Skaik, R. & Inkpen, D. Using twitter social media for depression detection in the canadian population. In 2020 3rd Artificial Intelligence and Cloud Computing Conference , pp. 109–114 (2020).

Chadha, A. & Kaushik, B. Machine learning based dataset for finding suicidal ideation on twitter. In 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV) , pp. 823–828 (2021).

Sekulić, I., Gjurković, M. & Šnajder, J. Not just depressed: bipolar disorder prediction on Reddit. In Proc. 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 72–78 (2018).

Kumar, A., Sharma, A. & Arora, A. Anxious depression prediction in real-time social data. In International Conference on Advances in Engineering Science Management & Technology (ICAESMT)-2019. (Uttaranchal University, Dehradun, India, 2019).

Nghiem, M. -Q., Baylis, P. & Ananiadou, S. Paladin: an annotation tool based on active and proactive learning. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations , pp. 238–243 (2021).

Park, A., Conway, M. & Chen, A. T. Examining thematic similarity, difference, and membership in three online mental health communities from reddit: a text mining and visualization approach. Comput. Hum. Behav. 78 , 98–112 (2018).

Shrestha, A., Serra, E. & Spezzano, F. Multi-modal social and psycho-linguistic embedding via recurrent neural networks to identify depressed users in online forums. Netw. Modeling Anal. Health Inform. Bioinforma. 9 , 1–11 (2020).

Friedenberg, M., Amiri, H., Daumé III, H. & Resnik, P. The umd clpsych 2016 shared task system: text representation for predicting triage of forum posts about mental health. In Proc. Third Workshop on Computational Linguistics and Clinical Psychology , pp. 158–161 (2016).

Nguyen, T. Using linguistic and topic analysis to classify sub-groups of online depression communities. Multimed. Tools Appl. 76 , 10653–10676 (2017).

Yazdavar, A. H. et al. Semi-supervised approach to monitoring clinical depressive symptoms in social media. In Proc. 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017 , pp. 1191–1198 (2017).

Sarsam, S. M., Al-Samarraie, H., Alzahrani, A. I., Alnumay, W. & Smith, A. P. A lexicon-based approach to detecting suicide-related messages on twitter. Biomed. Signal Process. Control 65 , 102355 (2021).

Driessens, K., Reutemann, P., Pfahringer, B. & Leschi, C. Using weighted nearest neighbor to benefit from unlabeled data. In Pacific-Asia Conference on Knowledge Discovery and Data Mining , pp. 60–69 (2006).

Zhou, D., Bousquet, O., Lal, T. N., Weston, J. & Schölkopf, B. Learning with local and global consistency. In Advances in Neural Information Processing Systems , pp. 321–328 (2004).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444 (2015).

Voulodimos, A., Doulamis, N., Doulamis, A. & Protopapadakis, E. Deep learning for computer vision: a brief review. Comput. Intell. Neurosci. 2018, 1–13 (2018).

Young, T., Hazarika, D., Poria, S. & Cambria, E. Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 13 , 55–75 (2018).

Deng, L. & Yu, D. Deep learning: methods and applications. Found. Trends Signal Process. 7 , 197–387 (2014).

Su, C., Xu, Z., Pathak, J. & Wang, F. Deep learning in mental health outcome research: a scoping review. Transl. Psychiatry 10 , 1–26 (2020).

Ghannay, S., Favre, B., Esteve, Y. & Camelin, N. Word embedding evaluation and combination. In Proc. Tenth International Conference on Language Resources and Evaluation (LREC’16) , pp. 300–305 (2016).

Pennington, J., Socher, R. & Manning, C. D. Glove: Global vectors for word representation. In Proc. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 1532–1543 (2014).

Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. In Proc. 1st International Conference on Learning Representations (ICLR) Workshops Track . (2013).

Devlin, J., Chang, M. W., Lee, K. & Toutanova, K. Bert: pre-training of deep bidirectional transformers for language understanding. In Proc. NAACL-HLT, pp. 4171–4186 (2019).

Lan, Z. et al. Albert: a lite bert for self-supervised learning of language representations. In Proc. 8th International Conference on Learning Representations (ICLR) (2020).

Gaur, M. et al. Knowledge-aware assessment of severity of suicide risk for early intervention. In The World Wide Web Conference , pp. 514–525 (2019).

Boukil, S., El Adnani, F., Cherrat, L., El Moutaouakkil, A. E. & Ezziyyani, M. Deep learning algorithm for suicide sentiment prediction. In International Conference on Advanced Intelligent Systems for Sustainable Development , pp. 261–272 (2018).

Phan, H. T., Tran, V. C., Nguyen, N. T. & Hwang, D. A framework for detecting user’s psychological tendencies on twitter based on tweets sentiment analysis. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems , pp. 357–372 (2020).

Wang, Y. -T., Huang, H. -H., Chen, H. -H. & Chen, H. A neural network approach to early risk detection of depression and anorexia on social media text. In CLEF (Working Notes) (2018).

Trotzek, M., Koitka, S. & Friedrich, C. M. Utilizing neural networks and linguistic metadata for early detection of depression indications in text sequences. IEEE Trans. Knowl. Data Eng. 32 , 588–601 (2018).

Obeid, J. S. Automated detection of altered mental status in emergency department clinical notes: a deep learning approach. BMC Med. Inform. Decis. Mak. 19 , 1–9 (2019).

Rao, G., Zhang, Y., Zhang, L., Cong, Q. & Feng, Z. Mgl-cnn: a hierarchical posts representations model for identifying depressed individuals in online forums. IEEE Access 8 , 32395–32403 (2020).

Lin, H. Detecting stress based on social interactions in social networks. IEEE Trans. Knowl. Data Eng. 29 , 1820–1833 (2017).

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9 , 1735–1780 (1997).

Cho, K. et al. Learning phrase representations using rnn encoder-decoder for statistical machine translation. In Proc. the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 1724–1734 (2014).

Pascanu, R., Mikolov, T. & Bengio, Y. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning , pp. 1310–1318 (2013).

Ghosh, S. & Anwar, T. Depression intensity estimation via social media: a deep learning approach. IEEE Trans. Comput. Soc. Syst. 8 , 1465–1474 (2021).

Uddin, A. H., Bapery, D. & Arif, A. S. M. Depression analysis of bangla social media data using gated recurrent neural network. In 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT) , pp. 1–6 (2019).

Yao, H. Detection of suicidality among opioid users on reddit: machine learning–based approach. J. Med. Internet Res. 22 , 15293 (2020).

Ahmed, U., Mukhiya, S. K., Srivastava, G., Lamo, Y. & Lin, J. C. -W. Attention-based deep entropy active learning using lexical algorithm for mental health treatment. Front. Psychol. 12 , 471 (2021).

Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proc. 3rd International Conference on Learning Representations (ICLR) (2015).

Sekulić, I. & Strube, M. Adapting deep learning methods for mental health prediction on social media. In Proc. the 5th Workshop on Noisy User-generated Text (W-NUT) , pp. 322–327 (2019).

Sawhney, R., Joshi, H., Gandhi, S. & Shah, R. R. Towards ordinal suicide ideation detection on social media. In Proc. 14th ACM International Conference on Web Search and Data Mining, pp. 22–30 (2021).

Rutowski, T. et al. Cross-demographic portability of deep nlp-based depression models. In 2021 IEEE Spoken Language Technology Workshop (SLT) , pp. 1052–1057 (2021).

Rutowski, T. et al. Depression and anxiety prediction using deep language models and transfer learning. In 2020 7th International Conference on Behavioural and Social Computing (BESC) , pp. 1–6 (2020).

Ghosh, S., Ekbal, A. & Bhattacharyya, P. A multitask framework to detect depression, sentiment and multi-label emotion from suicide notes. Cognit. Comput. 14 , 110–129 (2022).

Gui, T. et al. Cooperative multimodal approach to depression detection in twitter. In Proc. AAAI Conference on Artificial Intelligence , vol. 33, pp. 110–117 (2019).

Gui, T. et al. Depression detection on social media with reinforcement learning. In China National Conference on Chinese Computational Linguistics , pp. 613–624 (2019).

Wongkoblap, A., Vadillo, M. A. & Curcin, V. Predicting social network users with depression from simulated temporal data. In IEEE EUROCON 2019-18th International Conference on Smart Technologies , pp. 1–6 (2019).

Wongkoblap, A., Vadillo, M. A. & Curcin, V. Modeling depression symptoms from social network data through multiple instance learning. AMIA Summits Transl. Sci. Proc. 2019 , 44 (2019).

Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems , pp. 5998–6008 (2017).

Wang, N. et al. Learning models for suicide prediction from social media posts. In Proc. the Seventh Workshop on Computational Linguistics and Clinical Psychology , pp. 87–92 (2021).

Zhang, T., Schoene, A. M. & Ananiadou, S. Automatic identification of suicide notes with a transformer-based deep learning model. Internet Interv. 25 , 100422 (2021).

Haque, F., Nur, R. U., Al Jahan, S., Mahmud, Z. & Shah, F. M. A transformer based approach to detect suicidal ideation using pre-trained language models. In 2020 23rd International Conference on Computer and Information Technology (ICCIT) , pp. 1–5 (2020).

Chaurasia, A. et al. Predicting mental health of scholars using contextual word embedding. In 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom) , pp. 923–930 (2021).

Malviya, K., Roy, B. & Saritha, S. A transformers approach to detect depression in social media. In 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS) , pp. 718–723 (2021).

Murarka, A., Radhakrishnan, B. & Ravichandran, S. Detection and classification of mental illnesses on social media using roberta. Preprint arXiv https://arxiv.org/abs/2011.11226 (2020).

Wang, X. Depression risk prediction for chinese microblogs via deep-learning methods: content analysis. JMIR Med. Inform. 8 , 17958 (2020).

Abed-Esfahani, P. et al. Transfer learning for depression: early detection and severity prediction from social media postings. In CLEF (Working Notes) (2019).

Gaur, M. Characterization of time-variant and time-invariant assessment of suicidality on reddit using c-ssrs. PloS ONE 16 , 0250448 (2021).

Tadesse, M. M., Lin, H., Xu, B. & Yang, L. Detection of suicide ideation in social media forums using deep learning. Algorithms 13 , 7 (2020).

Zhou, S., Zhao, Y., Bian, J., Haynos, A. F. & Zhang, R. Exploring eating disorder topics on twitter: machine learning approach. JMIR Med. Inform. 8 , 18273 (2020).

Deshpande, S. & Warren, J. Self-harm detection for mental health chatbots. In Public Health and Informatics , pp. 48–52. (IOS Press, 2021).

Solieman, H. & Pustozerov, E. A. The detection of depression using multimodal models based on text and voice quality features. In 2021 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus) , pp. 1843–1848 (2021).

Sawhney, R., Joshi, H., Gandhi, S. & Shah, R. A time-aware transformer based model for suicide ideation detection on social media. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 7685–7697 (2020).

Plutchik, R. A general psychoevolutionary theory of emotion. In Theories of Emotion , pp. 3–33. (Elsevier, 1980).

Aragón, M. E., López-Monroy, A. P., González-Gurrola, L. C. & Montes, M. Detecting depression in social media using fine-grained emotions. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 1481–1486 (2019).

Aragón, M. E., López-Monroy, A. P., González, L. C. & Montes-y-Gómez, M. Attention to emotions: detecting mental disorders in social media. In International Conference on Text, Speech, and Dialogue , pp. 231–239 (2020).

Lara, J. S., Aragon, M. E., Gonzalez, F. A. & Montes-y-Gomez, M. Deep bag-of-sub-emotions for depression detection in social media. In Proc. International Conference on Text, Speech, and Dialogue . pp. 60–72 (2021).

Sawhney, R., Joshi, H., Flek, L. & Shah, R. Phase: Learning emotional phase-aware representations for suicide ideation detection on social media. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume , pp. 2415–2428 (2021).

Sawhney, R., Joshi, H., Shah, R. & Flek, L. Suicide ideation detection via social and temporal user representations using hyperbolic learning. In Proc. 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , pp. 2176–2190 (2021).

Thorstad, R. & Wolff, P. Predicting future mental illness from social media: a big-data approach. Behav. Res. Methods 51 , 1586–1600 (2019).

Aladağ, A. E., Muderrisoglu, S., Akbas, N. B., Zahmacioglu, O. & Bingol, H. O. Detecting suicidal ideation on forums: proof-of-concept study. J. Med. Internet Res. 20 , 9840 (2018).

Desmet, B. & Hoste, V. Online suicide prevention through optimised text classification. Inf. Sci. 439 , 61–78 (2018).

Cheng, Q., Li, T. M., Kwok, C.-L., Zhu, T. & Yip, P. S. Assessing suicide risk and emotional distress in chinese social media: a text mining and machine learning study. J. Med. internet Res. 19 , 243 (2017).

Roy, A. A machine learning approach predicts future risk to suicidal ideation from social media data. NPJ Digital Med. 3 , 1–12 (2020).

Rios, A. & Kavuluru, R. Ordinal convolutional neural networks for predicting rdoc positive valence psychiatric symptom severity scores. J. Biomed. Inform. 75 , 85–93 (2017).

Losada, D. E., Crestani, F. & Parapar, J. Overview of erisk 2019 early risk prediction on the internet. In International Conference of the Cross-Language Evaluation Forum for European Languages , pp. 340–357 (2019).

Losada, D. E. & Crestani, F. A test collection for research on depression and language use. In International Conference of the Cross-Language Evaluation Forum for European Languages , pp. 28–39 (2016).

Van Engelen, J. E. & Hoos, H. H. A survey on semi-supervised learning. Mach. Learn. 109 , 373–440 (2020).

Settles, B. Closing the loop: fast, interactive semi-supervised annotation with queries on features and instances. In Proc. 2011 Conference on Empirical Methods in Natural Language Processing , pp. 1467–1478 (2011).

Maupomé, D. & Meurs, M. -J. Using topic extraction on social media content for the early detection of depression. In CLEF (Working Notes) vol. 2125 (2018)

Gaur, M. et al. “Let me tell you about your mental health!” contextualized classification of reddit posts to dsm-5 for web-based intervention. In Proc. 27th ACM International Conference on Information and Knowledge Management , pp. 753–762 (2018).

Galiatsatos, D. et al. Classification of the most significant psychological symptoms in mental patients with depression using bayesian network. In Proc. 16th International Conference on Engineering Applications of Neural Networks (INNS) , pp. 1–8 (2015).

Wang, W. Y., Singh, S. & Li, J. Deep adversarial learning for nlp. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials , pp. 1–5 (2019).

Le-Khac, P. H., Healy, G. & Smeaton, A. F. Contrastive representation learning: a framework and review. IEEE Access 8 , 193907–193934 (2020).

Li, Y., Tian, X., Liu, T. & Tao, D. Multi-task model and feature joint learning. In Twenty-Fourth International Joint Conference on Artificial Intelligence (2015).

Sharma, A. R. & Kaushik, P. Literature survey of statistical, deep and reinforcement learning in natural language processing. In 2017 International Conference on Computing, Communication and Automation (ICCCA) , pp. 350–354 (2017).

Ruder, S., Peters, M. E., Swayamdipta, S. & Wolf, T. Transfer learning in natural language processing. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials , pp. 15–18 (2019).

López-Úbeda, P., Plaza-del-Arco, F. M., Díaz-Galiano, M. C. & Martín-Valdivia, M.-T. How successful is transfer learning for detecting anorexia on social media? Appl. Sci. 11 , 1838 (2021).

Hu, D. An introductory survey on attention mechanisms in nlp problems. In Proc. SAI Intelligent Systems Conference , pp. 432–448 (2019).

Wang, Z., Zhang, J., Feng, J. & Chen, Z. Knowledge graph and text jointly embedding. In Proc. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 1591–1601 (2014).

Sap, M., Shwartz, V., Bosselut, A., Choi, Y. & Roth, D. Commonsense reasoning for natural language processing. In Proc. 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts , pp. 27–33 (2020).

Feng, S. Y. et al. A survey of data augmentation approaches for nlp. In Proc. Findings of the Association for Computational Linguistics: ACL-IJCNLP , pp. 968–988 (2021).

Lin, C. et al. Sensemood: depression detection on social media. In Proc. 2020 International Conference on Multimedia Retrieval , pp. 407–411 (2020).

Mann, P., Paes, A. & Matsushima, E. H. See and read: detecting depression symptoms in higher education students using multimodal social media data. In Proc. International AAAI Conference on Web and Social Media , vol. 14, pp. 440–451 (2020).

Xu, Z., Pérez-Rosas, V. & Mihalcea, R. Inferring social media users’ mental health status from multimodal information. In Proc. 12th Language Resources and Evaluation Conference , pp. 6292–6299 (2020).

Wang, B. et al. Learning to detect bipolar disorder and borderline personality disorder with language and speech in non-clinical interviews. In Proc. Interspeech 2020 , pp. 437–441 (2020).

Rodrigues Makiuchi, M., Warnita, T., Uto, K. & Shinoda, K. Multimodal fusion of bert-cnn and gated cnn representations for depression detection. In Proc. 9th International on Audio/Visual Emotion Challenge and Workshop , pp. 55–63 (2019).

Mittal, A. et al. Multi-modal detection of alzheimer’s disease from speech and text. In Proc. BIOKDD'21 (2021).

Ribeiro, M. T., Singh, S. & Guestrin, C. “Why should i trust you?” explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pp. 1135–1144 (2016).

Du, M., Liu, N. & Hu, X. Techniques for interpretable machine learning. Commun. ACM 63 , 68–77 (2019).

Song, H., You, J., Chung, J. -W. & Park, J. C. Feature attention network: Interpretable depression detection from social media. In Proc. 32nd Pacific Asia Conference on Language, Information and Computation (2018).

Zogan, H., Razzak, I., Wang, X., Jameel, S. & Xu, G. Explainable depression detection with multi-modalities using a hybrid deep learning model on social media. Preprint at arXiv https://arxiv.org/abs/2007.02847 (2020).

Castelvecchi, D. Can we open the black box of AI? Nat. N. 538 , 20 (2016).

Benton, A., Coppersmith, G. & Dredze, M. Ethical research protocols for social media health research. In Proc. First ACL Workshop on Ethics in Natural Language Processing , pp. 94–102 (2017).

Nicholas, J., Onie, S. & Larsen, M. E. Ethics and privacy in social media research for mental health. Curr. Psychiatry Rep. 22 , 1–7 (2020).

McKee, R. Ethical issues in using social media for health and health care research. Health Policy 110 , 298–301 (2013).

Tadisetty, S. & Ghazinour, K. Anonymous prediction of mental illness in social media. In 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC) , pp. 0954–0960 (2021).

Doan, S. Extracting health-related causality from twitter messages using natural language processing. BMC Med. Inform. Decis. Mak. 19 , 71–77 (2019).

Hutto, C. & Gilbert, E. Vader: a parsimonious rule-based model for sentiment analysis of social media text. In Proc. International AAAI Conference on Web and Social Media , vol. 8 (2014).

Cambria, E., Poria, S., Hazarika, D. & Kwok, K. Senticnet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In Proc. AAAI Conference on Artificial Intelligence , vol. 32 (2018).

Nielsen, F. Å. A new anew: evaluation of a word list for sentiment analysis in microblogs. In Proc. CEUR Workshop Proceedings, vol. 718, pp. 93–98 (2011).

Wang, X. et al. A depression detection model based on sentiment analysis in micro-blog social network. In Pacific-Asia Conference on Knowledge Discovery and Data Mining , pp. 201–213 (2013).

Leiva, V. & Freire, A. Towards suicide prevention: early detection of depression on social media. In International Conference on Internet Science , pp. 428–436 (2017).

Stephen, J. J. & Prabu, P. Detecting the magnitude of depression in twitter users using sentiment analysis. Int. J. Electr. Comput. Eng. 9 , 3247 (2019).

Mohammad, S. M. & Turney, P. D. Nrc emotion lexicon. National Research Council, Canada 2 (2013).

Zhou, T. H., Hu, G. L. & Wang, L. Psychological disorder identifying method based on emotion perception over social networks. Int. J. Environ. Res. Public Health 16 , 953 (2019).

Saloun, P., Ondrejka, A., Malčík, M. & Zelinka, I. Personality disorders identification in written texts. In AETA 2015: Recent Advances in Electrical Engineering and Related Sciences , pp. 143–154 (Springer, 2016).

Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent dirichlet allocation. J. Mach. Learn. Res. 3 , 993–1022 (2003).

Dumais, S. T. Latent semantic analysis. Annu. Rev. Inf. Sci. Technol. 38 , 188–230 (2004).

Xu, W., Liu, X. & Gong, Y. Document clustering based on non-negative matrix factorization. In Proc. 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval , pp. 267–273 (2003).

Desmet, B., Jacobs, G. & Hoste, V. Mental distress detection and triage in forum posts: the lt3 clpsych 2016 shared task system. In Proc. Third Workshop on Computational Linguistics and Clinical Psychology , pp. 148–152 (2016).

Tausczik, Y. R. & Pennebaker, J. W. The psychological meaning of words: Liwc and computerized text analysis methods. J. Lang. Soc. Psychol. 29 , 24–54 (2010).

Rodrigues, R. G., das Dores, R. M., Camilo-Junior, C. G. & Rosa, T. C. Sentihealth-cancer: a sentiment analysis tool to help detecting mood of patients in online social networks. Int. J. Med. Inform. 85 , 80–95 (2016).

Yoo, M., Lee, S. & Ha, T. Semantic network analysis for understanding user experiences of bipolar and depressive disorders on reddit. Inf. Process. Manag. 56 , 1565–1575 (2019).

Ricard, B. J., Marsch, L. A., Crosier, B. & Hassanpour, S. Exploring the utility of community-generated social media content for detecting depression: an analytical study on instagram. J. Med. Internet Res. 20 , 11817 (2018).

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems , pp. 3111–3119 (2013).

Hemmatirad, K., Bagherzadeh, H., Fazl-Ersi, E. & Vahedian, A. Detection of mental illness risk on social media through multi-level svms. In 2020 8th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS) , pp. 116–120 (2020).

Bandyopadhyay, A., Achilles, L., Mandl, T., Mitra, M. & Saha, S. K. Identification of depression strength for users of online platforms: a comparison of text retrieval approaches. In Proc. CEUR Workshop Proceedings, vol. 2454, pp. 331–342 (2019).

Zhong, Q. -Y. Screening pregnant women for suicidal behavior in electronic medical records: diagnostic codes vs. clinical notes processed by natural language processing. BMC Med. Inform. Decis. Mak. 18 , 1–11 (2018).

Huang, Y., Liu, X. & Zhu, T. Suicidal ideation detection via social media analytics. In International Conference on Human Centered Computing , pp. 166–174 (2019).

Lv, M., Li, A., Liu, T. & Zhu, T. Creating a chinese suicide dictionary for identifying suicide risk on social media. PeerJ 3 , 1455 (2015).

Nguyen, T., Phung, D., Adams, B. & Venkatesh, S. Prediction of age, sentiment, and connectivity from social media text. In International Conference on Web Information Systems Engineering , pp. 227–240 (2011).

Peng, Z., Hu, Q. & Dang, J. Multi-kernel svm based depression recognition using social media data. Int. J. Mach. Learn. Cybern. 10 , 43–57 (2019).

Wu, M. Y., Shen, C.-Y., Wang, E. T. & Chen, A. L. A deep architecture for depression detection using posting, behavior, and living environment data. J. Intell. Inf. Syst. 54 , 225–244 (2020).

Zogan, H., Wang, X., Jameel, S. & Xu, G. Depression detection with multi-modalities using a hybrid deep learning model on social media. Preprint at arXiv https://arxiv.org/abs/2007.02847 (2020).

Yao, X., Yu, G., Tang, J. & Zhang, J. Extracting depressive symptoms and their associations from an online depression community. Comput. Hum. Behav. 120 , 106734 (2021).

Dinkel, H., Wu, M. & Yu, K. Text-based depression detection on sparse data. Preprint at arXiv https://arxiv.org/abs/1904.05154 (2019).

Zhou, Y., Glenn, C. & Luo, J. Understanding and predicting multiple risky behaviors from social media. In Workshops at the Thirty-First AAAI Conference on Artificial Intelligence (2017).

Wang, Y., Wang, Z., Li, C., Zhang, Y. & Wang, H. A multitask deep learning approach for user depression detection on sina weibo. Preprint at arXiv https://arxiv.org/abs/2008.11708 (2020).

Aragon, M. E., Lopez-Monroy, A. P., Gonzalez-Gurrola, L. -C. G. & Montes, M. Detecting mental disorders in social media through emotional patterns-the case of anorexia and depression. IEEE Trans. Affect. Comput . (2021).

Li, N., Zhang, H. & Feng, L. Incorporating forthcoming events and personality traits in social media based stress prediction. IEEE Trans. Affect. Comput. (2021).

Acknowledgements

This research was partially funded by the Alan Turing Institute and the H2020 EPHOR project, grant agreement No. 874703.

Author information

Authors and Affiliations

Department of Computer Science, The University of Manchester, National Centre for Text Mining, Manchester, UK

Tianlin Zhang, Annika M. Schoene & Sophia Ananiadou

Department of Computer Science, Aalto University, Helsinki, Finland

Shaoxiong Ji

The Alan Turing Institute, London, UK

Sophia Ananiadou

Contributions

T.Z. conducted the review, prepared figures, and wrote the initial draft. A.M.S., S.J., and S.A. revised the paper. S.A. supervised the project. All authors reviewed the paper.

Corresponding author

Correspondence to Sophia Ananiadou .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Zhang, T., Schoene, A.M., Ji, S. et al. Natural language processing applied to mental illness detection: a narrative review. npj Digit. Med. 5 , 46 (2022). https://doi.org/10.1038/s41746-022-00589-7

Received: 26 October 2021

Accepted: 23 February 2022

Published: 08 April 2022

DOI: https://doi.org/10.1038/s41746-022-00589-7

Practical Natural Language Processing with Python

With Case Studies from Industries Using Text Data at Scale

Mathangi Sri, Bangalore, India

  • Emphasizes a data- and business problem-first approach
  • A case study-based approach that presents real-world problems and solutions
  • Explains the accuracy and limitations of certain libraries from a professional's view

Table of contents (5 chapters)

  • Types of Data
  • NLP in Customer Service
  • NLP in Online Reviews
  • NLP in Banking, Financial Services, and Insurance (BFSI)
  • NLP in Virtual Assistants

About this book

Work with natural language tools and techniques to solve real-world problems. This book focuses on how natural language processing (NLP) is used in various industries. Each chapter describes the problem and solution strategy, then provides an intuitive explanation of how different algorithms work and a deeper dive on code and output in Python. 

Practical Natural Language Processing with Python follows a case study-based approach. Each chapter is devoted to an industry or a use case, where you address the real business problems in that industry and the various ways to solve them. You start with various types of text data before focusing on the customer service industry, the type of data available in that domain, and the common NLP problems encountered. Here you cover the bag-of-words model supervised learning technique as you try to solve the case studies; a minimal sketch of that kind of classifier appears below. Similar depth is given to other use cases such as online reviews, bots, finance, and so on. As you cover the problems in these industries, you will also cover sentiment analysis, named entity recognition, word2vec, word similarities, topic modeling, deep learning, and sequence-to-sequence modeling.
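
To illustrate the bag-of-words approach mentioned above (this sketch is ours, not code from the book; the tiny corpus, labels, and library choice are invented for illustration), a minimal scikit-learn version of such a supervised classifier might look like this:

```python
# A minimal bag-of-words customer-satisfaction classifier, sketched with
# scikit-learn. The example texts and labels are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "the agent resolved my issue quickly",
    "still waiting after three calls, terrible service",
    "very helpful and polite support",
    "refund never arrived, extremely disappointed",
]
labels = [1, 0, 1, 0]  # 1 = satisfied customer, 0 = dissatisfied

# CountVectorizer turns each text into a vector of word counts (the
# bag-of-words); LogisticRegression is the supervised learner on top.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["support was helpful and quick"]))  # typically [1]
```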

By the end of the book, you will be able to handle all types of NLP problems independently. You will also be able to think in different ways to solve language problems. Code and techniques for all the problems are provided in the book.

What You Will Learn

  • Build an understanding of NLP problems in industry
  • Gain the know-how to solve a typical NLP problem using language-based models and machine learning
  • Discover the best methods to solve a business problem using NLP - the tried and tested ones
  • Understand the business problems that are tough to solve 

Who This Book Is For

Analytics and data science professionals who want to kick-start NLP, and NLP professionals who want new ideas to solve the problems at hand.

  • Natural Language Processing
  • Deep Learning
  • Recommender System
  • Recurrent Neural Networks

Mathangi is a renowned data science leader in India. She has 11 patent grants and 20+ patents published in the areas of intuitive customer experience, indoor positioning, and user profiles. She has a 16+ year track record of building world-class data science solutions and products, and is adept in machine learning, text mining, NLP technologies, and NLP tools. She has built data science teams across large organizations including Citibank, HSBC, and GE, and tech startups such as 247.ai, PhonePe, and Gojek. She advises start-ups, enterprises, and venture capitalists on data science strategy and roadmaps, and is an active contributor on machine learning to many premier institutes in India. She was recognized as one of “The Phenomenal SHE” by the Indian National Bar Association in 2019.

Book Title: Practical Natural Language Processing with Python

Book Subtitle: With Case Studies from Industries Using Text Data at Scale

Author: Mathangi Sri

DOI: https://doi.org/10.1007/978-1-4842-6246-7

Publisher: Apress, Berkeley, CA

Copyright Information: Mathangi Sri, 2021

Softcover ISBN: 978-1-4842-6245-0 (published 01 December 2020)

eBook ISBN: 978-1-4842-6246-7 (published 30 November 2020)

Edition Number: 1

Number of Pages: XV, 253

Number of Illustrations: 103 b/w illustrations

Topics: Machine Learning, Python, Open Source


Case Study: Natural Language Processing (NLP) with Open Data for Drug Repositioning in Glioblastoma Therapy


ANLP International CIC: The Association for Neuro Linguistic Programming

Case studies

Music Industry Manager Manages Flying Fears

I was introduced to NLP through a mutual friend after I mentioned I had experienced panic attacks, which started after a particularly turbulent flight.

A First Class Honours Degree

She became motivated, then de-motivated, again and again, even as we worked together.

Unlocking Emotional Resilience: Conquering Workplace Challenges with mBraining and NLP Mastery

Sarah, facing unwanted attention from a male director, used mBraining and NLP submodality changes to regain control, balance her emotions, and excel professionally.

Chocolate Obsession

A colleague spoke about having a chocolate obsession. She would obsessively eat three to four bars of chocolate per day but didn’t understand why.

Weight loss and management

Weight issues were causing a lack of confidence. We used hypnotherapy to investigate the root cause, together with anchoring.

Stuck in a Rut to Empowered and Progressing

This client went from unmotivated and unproductive to fully motivated and confidently building her business.

Effective Communication with Customers

Both the NHS and the drugs and healthcare industries are going through a series of re-organisations and major change.

Change how you see food

Frustrated with always giving in to snacking when bored or tired, the client wanted a new way to change this habit.

NLP on the World Stage

My experience of using the power of NLP coaching with Team GB performing arts: physically and skilfully at their best, but in need of mindset preparation.

Flying Fears Averted

I hadn’t flown for over 20 years, since I was 17. My fear of flying was now affecting me on a daily basis.

SWISH away sugar craving

The client wanted control over their refined sugar intake and an end to the unwanted habit of mindlessly grazing on biscuits, cakes, chocolates and sweets.

Anxiety Cessation

The root causes of an anxiety issue were identified and the associated memories resolved through the unconscious mind, leading to a sense of open freedom and choice.

Don’t Panic!

Jackie suffered from panic attacks and anxiety, which prevented her from going out and living life to the full.

Turning Down Auditory Submodality

How an older lady was helped to stop hearing wartime songs in her head that were driving her crazy.

Employee Communications Workshops

In 2010, having dominated the mobile world for over a decade, Nokia faced a serious challenge from competitors such as the iPhone.

The Anatomy of a Plane Crash

“My name’s Neil, and I’ll be your plane crash victim today.” I wish I’d said that, but the line truly belonged to Nina.

Can I have some of that too please?!

From consuming grief, feeling lost and difficult relationship issues to discovering a new sense of self, confidence, happiness and increased awareness.

I Died a Thousand Times Before I Learned How to Live

In 2006 I suffered a severe brain injury in an assault connected to my job in hospitality, leaving me with multiple facial and cranial fractures.

Phobia Removal – Trypanophobia (Needles)

The client had a phobia of needles that caused them varying levels of stress and discomfort.

Receiving The Gift That Conflict Is Trying To Deliver

How a couple turned their many arguments into a powerful, loving union, transforming conflict into connection.

Five sources of bias in natural language processing

Dirk Hovy

1 Marketing Department, Bocconi University, Milan, Italy

Shrimai Prabhumoye

2 School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much of the recent work has focused on describing bias and providing an overview of bias in a larger context. Here, we provide a simple, actionable summary of this recent work. We outline five sources where bias can occur in NLP systems: (1) the data, (2) the annotation process, (3) the input representations, (4) the models, and finally (5) the research design (or how we conceptualize our research). We explore each of the bias sources in detail in this article, including examples and links to related work, as well as potential counter‐measures.

1. INTRODUCTION

A visitor who arrived at night in the London of 1878 would not have seen very much: cities were dark, only illuminated by candles and gas lighting. The advent of electricity changed that. This technology lit up cities everywhere and conferred a whole host of other benefits. From household appliances to the internet, electricity brought in a new era. However, as with every new technology, electricity had some unintended consequences. To provide the energy necessary to light up cities, run the appliances and fuel the internet, we required more power plants. Those plants contributed to pollution and ultimately to the phenomenon of global warming. Presumably, those consequences were far from the minds of the people who had just wanted to illuminate their cities.

This dual nature of intended use and unintended consequences is common to all new technologies. And natural language processing (NLP) proves no exception. NLP has progressed relatively rapidly from a niche academic field to a topic of widespread industrial and political interest. Its economic impact is substantial: NLP‐related companies are predicted to be valued at $26.4 billion by 2024. 1 We make daily use of machine translation (Wu et al.,  2016 ), personal assistants like Siri or Alexa (Palakovich et al.,  2017 ; Ram et al.,  2018 ) and text‐based search engines (Brin & Page,  1998 ). NLP is used in industrial decision‐making processes for hiring, abusive language and threat detection on social media (Roberts et al.,  2019 ), and mental health assessment and treatment (Benton et al.,  2017 ; Coppersmith et al.,  2015 ). An increasing volume of social science research uses NLP to generate insights into society and the human mind (Bhatia,  2017 ; Kozlowski et al.,  2019 ).

However, the interest in and use of NLP have grown much faster than an understanding of the unintended consequences. Some researchers have pointed out how NLP technologies can be used for harmful purposes, such as suppressing dissenters (Bamman et al.,  2012 ; Zhang et al.,  2014 ), compromising privacy/anonymity (Coavoux et al.,  2018 ; Grouin et al.,  2015 ), or profiling (Jurgens et al.,  2017 ; Wang et al.,  2018 ). Those applications might be unintended outcomes of systems developed for other purposes but could be deliberately developed by malicious actors. A much more widespread unintended negative consequence is the unfairness caused by demographic biases, such as unequal performance for different user groups (Tatman,  2017 ), misidentification of speakers and their needs (Criado Perez,  2019 ) or the proliferation of harmful stereotypes (e.g., Agarwal et al.,  2019 ; Koolen & van Cranenburgh,  2017 ; Kiritchenko & Mohammad,  2018 ). In this work, we follow the definition of Shah et al. ( 2020 ) for ‘bias’, which focuses on the mismatch of ideal and actual distributions of labels and user attributes in training and application of a system.

These biases are partially due to the rapid growth of the field and an inability to adapt to the new circumstances. Originally, machine learning and NLP were about solving toy problems on small data sets, promising to do it on more extensive data later. Any scepticism and worry about the power of Artificial Intelligence (AI) was primarily theoretical. In essence, there was not enough data or computational power for these systems to impact people's lives. With the recent availability of large amounts of data and the universal application of NLP, this point has now arrived. However, even though we now have the possibility, many models are still trained without regard for demographic aspects. Moreover, many applications are focussed solely on information content, without awareness or concern for those texts' authors and the social meaning of the message (Hovy & Yang, 2021). But today, NLP's reach and ubiquity do have a real impact on people's lives (Hovy & Spruit, 2016). Our tools, for better or for worse, are used in everyday life. The age of academic innocence is over: we need to be aware that our models affect people's lives, yet not always in the way we imagine (Ehni, 2008). The most glaring reason for this disconnect is bias at various steps of the research pipeline.

The focus on applications has moved us away from models as a tool for understanding and towards predictive models. It has become clear that these tools produce excellent predictions but are much harder to analyse. They solve their intended task but also pick up on secondary aspects of language and potentially exploit them to fulfil the objective function. And language carries a lot of secondary information about the speaker, their self‐identification, and membership in socio‐demographic groups (Flek,  2020 ; Hovy & Spruit,  2016 ; Hovy & Yang,  2021 ). Whether I say ‘I am totally pumped’ or ‘I am very excited’ conveys information about me far beyond the actual meaning of the sentence. In a conversation, we actively use this information to pick up on a speaker's likely age, gender, regional origin or social class (Eckert,  2019 ; Labov,  1972 ). We know that the same sentence (‘That was a sick performance!’) can express either approval or disgust, based on whether a teenager or an octogenarian says it.

In contrast, current NLP tools fail to incorporate demographic variation and instead expect all language to follow the ‘standard’ encoded in the training data. But the question is: whose standard (Eisenstein,  2013 )? This approach is equivalent to expecting everyone to speak like the octogenarian from above: it leads to problems when encountering the teenager. As a consequence, NLP tools trained on one demographic sample perform worse on another sample (Garimella et al.,  2019 ; Hovy & Søgaard,  2015 ; Jørgensen et al.,  2015 ). This mismatch was known to affect text domains, but it also applies to socio‐demographic domains: people of, say, different age groups are linguistically as diverse as a text from a web blog and a newspaper (Johannsen et al.,  2015 ). Incidentally, demographics like age and text‐domain can often be correlated (Hovy,  2015 ). Plank ( 2016 ) therefore suggests treating these aspects of language as parts of our understanding of ‘domain’.

The consequences of these shortfalls range from an inconvenience to something much more insidious. In the most straightforward cases, systems fail and produce no output. This outcome is annoying and harms the user who cannot benefit from the service, but at least it is obvious enough for the user to see and respond to. In many cases, though, the effect is much less easy to notice: the performance degrades, producing sub‐par output for some users. This difference will become only evident in comparison but is not apparent to the individual user. This degradation is much harder to see but often systematic for a particular demographic group and creates a demographic bias in NLP applications.

The problem of bias introduced by socio‐demographic differences in the target groups is not restricted to NLP, though, but occurs in all data sciences (O'Neil,  2016 ). For example, in speech recognition, there is a strong bias towards native speakers of any language (Lawson et al.,  2003 ). But even for native speakers, there are barriers: dialect speakers or children can struggle to make themselves understood by a smart assistant (Harwell,  2018 ). Moreover, women and children—who speak in a higher register than the speakers in the predominantly male training sample—might not be processed correctly (or at all) in voice‐to‐text systems (Criado Perez,  2019 ). There have been several examples of computer vision bias, from an image captioning system labelling pictures of black people as ‘gorillas’ to cameras designed to detect whether a subject blinked, which malfunctioned if Asian people were in the picture (Howard & Borenstein,  2018 ). In a more abstract form, the correlation of socio‐demographics with variables of interest can cause problems, such as when ZIP code and income level can act as proxies for race (O'Neil,  2016 ). ProPublica reported that a machine learning system designed to predict bail decisions overfit on the defendants' skin colour: in this case, the social prejudices of the prior decisions became encoded in the data (Angwin et al.,  2016 ).

With great (predictive) power comes great responsibility, and several ethical questions arise when working with language. There are no hard and fast rules for everything, and the topic is still evolving, but several issues have emerged so far. While the overview here is necessarily incomplete, it is a starting point on the issue. It is based on recent work by Hovy and Spruit ( 2016 ), Shah et al. ( 2020 ) as well as the two workshops by the Association for Computational Linguistics on Ethics in NLP (Alfano et al.,  2018 ; Hovy et al.,  2017 ). See those sources for further in‐depth discussion.

2. OVERVIEW

This article is not the first attempt to comprehensively address demographic factors in NLP. General bias frameworks in Artificial Intelligence (AI) exist that lay the necessary groundwork for our approach. For example, Friedler et al. (2021) framed bias in terms of fairness in algorithms, capturing all the latent features (i.e., demographics) in the data. Suresh and Guttag (2019) suggested a qualitative framework for bias in machine learning, defining bias as a ‘potential harmful property of the data’, though they leave out demographic and modelling aspects. Hovy and Spruit (2016) noted three qualitative sources of bias: data, modelling and research design, related to demographic bias, overgeneralization and topic exposure. In Shah et al. (2020), these and other frameworks are combined under a joint mathematical approach. Blodgett et al. (2020) provide an extensive survey of the way bias is studied in NLP. The survey points out weaknesses in research design and recommends grounding work that analyses ‘bias’ in NLP systems in the relevant literature outside of NLP, understanding why system behaviours can be harmful and to whom, and engaging in conversation with the communities affected by those systems.

One thing to stress is that ‘bias’ per se is neither good nor bad: in a Bayesian framework, the prior P ( X ) serves as a bias: the expectation or base‐rate we should have for something before we see any further evidence. In real life, many of our reactions to everyday situations are biases that make our lives easier. Biases as a preset are not necessarily an issue: they only become problematic when they are kept even in the face of contradictory evidence or when applied to areas they were not meant for.
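
To make the Bayesian reading concrete, recall how the prior enters Bayes' rule: the posterior belief in $X$ after seeing evidence $E$ is

$$P(X \mid E) = \frac{P(E \mid X)\, P(X)}{P(E)}.$$

The prior $P(X)$ encodes the base rate we assume before any evidence arrives; in the terms above, it becomes problematic only when it is never revised by the evidence term $P(E \mid X)$, or when it is carried over to settings it was not estimated for.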

Many of the biases we will discuss here can also represent a form of information: as a diagnostic tool about the state of society (Garg et al.,  2018 ; Kozlowski et al.,  2019 ), or as a way to regularize our models (Plank et al.,  2014a ; Uma et al.,  2020 ). However, as input to predictive systems, these biases can have severe consequences and exacerbate existing inequalities between users.

Figure 1 shows the five sources of bias we discuss in this article. The first entry point for bias in the NLP pipeline is the choice of data for experimentation. The labels chosen for training and the procedure used to annotate them introduce annotation bias. Selection bias is introduced by the samples chosen for training or testing an NLP model. The third type of bias is introduced by the choice of representation used for the data. The choice of models or machine learning algorithms also introduces the issue of bias amplification. Finally, the entire research design process can introduce bias if researchers are not careful with their choices in the NLP pipeline. In what follows, we discuss each of these biases in detail and provide insights into how they occur and how to mitigate them.

Figure 1: Schematic of the five bias sources in the general natural language processing pipeline

2.1. Bias from data

NLP systems reflect biases in the language data used for training them. Many data sets are created from long‐established news sources (e.g., Wall Street Journal, Frankfurter Rundschau from the 1980s through the 1990s), a very codified domain predominantly produced by a small, homogeneous sample: typically white, middle‐aged, educated, upper‐middle‐class men (Garimella et al.,  2019 ; Hovy & Søgaard,  2015 ). However, many syntactic analysis tools (taggers and parsers) are still trained on the newswire data from the 1980s and 1990s. Modern syntactic tools, therefore, expect everyone to speak like journalists from the 1980s. It should come as no surprise that most people today do not: language has evolved since then, and expressions that were ungrammatical then are acceptable today, ‘because internet’ (McCulloch,  2020 ). NLP is, therefore, unprepared to cope with this demographic variation.

Models trained on these data sets treat language as if it resembles this restricted training data, creating demographic bias. For example, Hovy ( 2015 ) and Jørgensen et al. ( 2015 ) have shown that this bias leads to significantly decreased performance for people under 35 and ethnic minorities, even in simple NLP tasks like finding verbs and nouns (i.e., part‐of‐speech tagging). The results are ageist, racist or sexist models that are biased against the respective user groups. This is the issue of selection bias , which is rooted in data.

When choosing a text data set to work with, we are also making decisions about the demographic groups represented in the data. As a result of the demographic signal present in language, any data set carries a demographic bias, that is, latent information about the demographic groups present in it. As humans, we would not be surprised if someone who grew up hearing only their dialect had trouble understanding other people. If our data set is dominated by the ‘dialect’ of a specific demographic group, we should not be surprised that our models have problems understanding others. Most data sets have some built-in bias, and in many cases, it is benign. It becomes problematic when this bias negatively affects certain groups or disproportionately advantages others. On biased data sets, statistical models will overfit to the presence of specific linguistic signals that are particular to the dominant group. As a result, the model will work less well for other groups, that is, it excludes demographic groups. Hovy (2015) and Jørgensen et al. (2015) have shown the consequences of exclusion for various groups, for example, people under 35 and speakers of African-American Vernacular English. Part-of-speech (POS) tagging models have significantly lower accuracy for young people and ethnic minorities, vis-à-vis the dominant demographics in the training data. Apart from exclusion, these models will pose a problem for future research. Given that a large part of the world's population is currently under 30, such models will degrade even more over time and ultimately not meet their users' needs. This issue also has severe ramifications for the general applicability of any findings using these tools. In psychology, most studies are based on college students, a very specific demographic: western, educated, industrialized, rich and democratic research participants (so-called WEIRD; Henrich et al., 2010). The assumption that findings from this group would generalize to all other demographics has proven wrong and led to a heavily biased corpus of psychological data and research.

2.1.1. Counter‐measures

Potential counter-measures to demographic selection bias can be simple. The most salient is undoubtedly to pay more attention to how data is collected and clarify what went into the construction of the data set. Bender and Friedman (2018) proposed a framework to document these decisions in a Data Statement. This statement includes various aspects of the data collection process and the underlying demographics. It provides future researchers with a way to assess the effect of any bias they might notice when using the data. As a beneficial side effect, it also forces us to consider how our data is made up. For already existing data sets, post-stratification is the down-sampling of over-represented groups in the training data to even out the distribution until it reflects the actual distribution. Mohammady and Culotta (2014) have shown how existing demographic statistics can be used as supervision. In general, we can use measures to address overfitting or imbalanced data to correct for demographic bias in data. However, as various papers have pointed out (Bender et al., 2021; Hutchinson et al., 2021), addressing data bias is not a ‘one-and-done’ exercise but requires continual monitoring throughout a data set's lifecycle.
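
As a minimal sketch of what post-stratification can look like in code (assuming each training example carries a known group attribute and that we know the target population proportions; both are strong assumptions in practice):

```python
import random
from collections import defaultdict

def post_stratify(examples, target_props, seed=0):
    """Down-sample over-represented groups so the group distribution of the
    returned sample approximates target_props (a dict: group -> proportion).
    Each example is a (text, label, group) triple."""
    random.seed(seed)
    by_group = defaultdict(list)
    for ex in examples:
        by_group[ex[2]].append(ex)
    # The scarcest group (relative to its desired share) caps the total
    # sample size we can reach without over-sampling anyone.
    scale = min(len(by_group[g]) / p for g, p in target_props.items())
    sample = []
    for g, p in target_props.items():
        sample.extend(random.sample(by_group[g], int(p * scale)))
    random.shuffle(sample)
    return sample

# 800 "young" vs. 200 "old" examples; a 50/50 target keeps 200 of each.
data = [("text a", 1, "young")] * 800 + [("text b", 0, "old")] * 200
balanced = post_stratify(data, {"young": 0.5, "old": 0.5})
```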

Alternatively, we can also collect additional data to balance existing data sets to account for exclusions or misrepresentations. Webster et al. (2018) released a gender-balanced data set for the co-reference resolution task. Zhao et al. (2017) also explore balancing a data set with gender confounds for multi-label object classification and visual semantic role labelling tasks. Data augmentation by controlling the gender attribute is an effective technique for mitigating gender bias in NLP processes (Dinan et al., 2020; Sun et al., 2019). Wei and Zou (2019) explore data augmentation techniques that improve performance on various text classification tasks.
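
In its simplest form, such counterfactual data augmentation can be sketched as a word-pair swap. Real systems (e.g., Dinan et al., 2020) use much more careful word lists and handle names and coreference, so the pairs below are purely illustrative:

```python
# Naive counterfactual gender-swap augmentation. Limitations: lowercasing
# discards case, and ambiguous forms ("her" as possessive vs. object
# pronoun) are not disambiguated; real word lists are far more extensive.
SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "man": "woman", "woman": "man"}

def gender_swap(sentence):
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.lower().split())

def augment(corpus):
    # Keep each original sentence and add its swapped counterfactual.
    return corpus + [gender_swap(s) for s in corpus]

print(gender_swap("she is a doctor and he is a nurse"))
# -> "he is a doctor and she is a nurse"
```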

2.2. Bias from annotations

Annotation can introduce bias in various forms through a mismatch of the annotator population with the data. This is the issue of label bias . Label and selection bias can—and most often do—interact, so it can be challenging to distinguish them. It does, however, underscore how important it is to address them jointly. There are several ways in which annotations introduce bias.

In its simplest form, bias arises because annotators are distracted, uninterested, or lazy about the annotation task. As a result, they choose the ‘wrong’ labels. More problematic is label bias from informed and well‐meaning annotators that systematically disagree. Plank et al. ( 2014b ) have shown that this type of bias arises when there is more than one possible correct label. For example, the term ‘social media’ can be validly analysed as either a noun phrase composed of an adjective and a noun, or a noun compound, composed of two nouns. Which label an annotator chooses depends on their interpretation of how lexicalized the term ‘social media’ is. If they perceive it as fully lexicalized, they will choose a noun compound. If they believe the process is still ongoing, that is, the phrase is analytical, they will choose an ‘adjective plus noun’ construct. Two annotators with these opposing views will systematically label ‘social’ as an adjective or a noun, respectively. While we can spot the disagreement, we cannot discount either of them as wrong or malicious.

Finally, label bias can result from a mismatch between authors' and annotators' linguistic and social norms. Sap et al. (2019) showed that annotator judgements reflect social and demographic differences: for example, annotators rate the utterances of different ethnic groups differently, and they mistake innocuous banter for hate speech because they are unfamiliar with the communication norms of the original speakers.

There has been a movement towards increasingly using annotations from crowdsourcing rather than trained expert annotators. While it is cheaper and (in theory) equivalent to the quality of trained annotators (Snow et al., 2008), it does introduce a range of biases. For example, various works have shown that crowdsourced annotators' demographic makeup is not as representative as one might hope (Pavlick et al., 2014). On the one hand, crowdsourcing is easier to scale, potentially covering more diverse backgrounds than we would find in expert annotator groups. On the other hand, it is much harder to train and communicate with crowdsourced annotators, and their incentives might not align with the projects we care about. For example, suppose we ask crowd workers to annotate concepts like dogmatism, hate speech, or microaggressions. Their answers will inherently include their societal perspective of these concepts. This bias can be good or bad, depending on the sample of annotators: we may get multiple perspectives that approximate the population as a whole, or the annotations may be skewed by the selection. However, we might also not want various perspectives if there is a theoretically motivated and well-defined way in which we plan to annotate. Crowdsourcing and its costs raise several other ethical questions about worker payment and fairness (Fort et al., 2011).

2.2.1. Counter‐measures

Malicious annotators are luckily relatively easy to spot and can be remedied by using multiple annotations per item and aggregating with an annotation model (Hovy et al.,  2013 ; Passonneau & Carpenter,  2014 ; Paun et al.,  2018 ). These models help us find biased annotators and let us account for the human disagreement between labels. A free online version of such a tool is available at https://mace.unibocconi.it/ . They presuppose, however, that there is a single correct gold label for each data point and that annotations are simply corruptions of it.
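
A toy version of the idea behind such aggregation models (this is not MACE itself, which uses EM to model annotator spamming behaviour, but a simplified reliability-weighted vote) might look as follows:

```python
from collections import Counter, defaultdict

def aggregate(annotations, rounds=3):
    """annotations: list of (item_id, annotator_id, label) triples.
    Start from a majority vote, then iteratively re-weight annotators by
    their agreement with the current consensus, so unreliable annotators
    count less. A toy stand-in for models like MACE or Dawid-Skene."""
    weight = defaultdict(lambda: 1.0)
    consensus = {}
    for _ in range(rounds):
        votes = defaultdict(Counter)
        for item, annotator, label in annotations:
            votes[item][label] += weight[annotator]
        consensus = {item: c.most_common(1)[0][0] for item, c in votes.items()}
        hits, totals = defaultdict(float), defaultdict(float)
        for item, annotator, label in annotations:
            totals[annotator] += 1
            hits[annotator] += (consensus[item] == label)
        for annotator in totals:  # reliability = agreement with consensus
            weight[annotator] = hits[annotator] / totals[annotator]
    return consensus
```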

If there is more than one possible correct answer, we can use disagreement information in the update process of our models (Fornaciari et al.,  2021 ; Plank et al.,  2014a ; Uma et al.,  2020 ). That is, we can encourage the models to make more minor updates if human annotators easily confuse the categories with each other (say, adjectives and nouns in noun compounds like ‘social media’). We make regular updates if they are mutually exclusive categories (such as verbs and nouns).
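
One hedged way to implement such disagreement-aware updates is to train against the full distribution of annotator labels (a 'soft' target) instead of a single gold label; a PyTorch-style sketch:

```python
import torch
import torch.nn.functional as F

def soft_label_loss(logits, soft_targets):
    """Cross-entropy against a distribution over labels, e.g. [0.6, 0.4]
    if 3 of 5 annotators chose class 0. The gradient with respect to the
    logits is softmax(logits) - soft_targets, so the model is pulled
    toward the annotator distribution rather than to full confidence."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# With these logits the model already roughly matches the split annotators
# (softmax ~ [0.57, 0.43]), so the split target yields a tiny gradient,
# while the unanimous target would push a much larger update.
logits = torch.tensor([[0.5, 0.2]])
unanimous = torch.tensor([[1.0, 0.0]])  # all annotators agree
split = torch.tensor([[0.6, 0.4]])      # annotators disagree
print(soft_label_loss(logits, unanimous), soft_label_loss(logits, split))
```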

The only way to address mismatched linguistic norms is to pay attention to selecting annotators (i.e., matching them to the author population in terms of linguistic norms) or provide them with dedicated training. The latter should be generally considered. While annotator training is time‐intensive and potentially costly, it can be worth the effort in terms of better and less biased labels.

2.3. Bias from input representations

Even balanced, well-labelled data sets contain bias: the most common input representations in NLP systems, word embeddings (Mikolov et al., 2013), have been shown to pick up on racial and gender biases in the training data (Bolukbasi et al., 2016; Manzini et al., 2019). For example, ‘woman’ is associated with ‘homemaker’ in the same way ‘man’ is associated with ‘programmer’. There has been some justified scepticism over whether these analogy tasks are the best way to evaluate embedding models (Nissim et al., 2020), but there is plenty of evidence that (1) embeddings do capture societal attitudes (Bhatia, 2017; Garg et al., 2018; Kozlowski et al., 2019), and that (2) these societal biases are resistant to many correction methods (Gonen & Goldberg, 2019). This is the issue of semantic bias.
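
A sketch of how such associations are typically quantified, in the spirit of WEAT-style tests: compare average cosine similarities between a target word and two attribute sets. The `emb` lookup below is a placeholder for real pretrained vectors (e.g., word2vec or GloVe):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(vec, attrs_a, attrs_b):
    """Positive if vec is closer on average to attribute set A than to B."""
    return (np.mean([cosine(vec, a) for a in attrs_a]) -
            np.mean([cosine(vec, b) for b in attrs_b]))

# With real embeddings (emb = dict word -> np.ndarray), the stereotype
# reported above shows up as a gap between the two association scores:
# association(emb["man"],   [emb["programmer"]], [emb["homemaker"]])
# association(emb["woman"], [emb["programmer"]], [emb["homemaker"]])
```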

These biases hold not just for word embeddings but also for the contextual representations of big pre‐trained language models that are now widely used in different NLP systems. As they are pre‐trained on almost the entire available internet, they are even more prone to societal biases. Several papers have shown that these models reproduce and thereby perpetuate these biases and stereotypes (Kurita et al.,  2019 ; Tan and Celis,  2019 ).

There exist a plethora of efforts for debiasing embeddings (Bolukbasi et al., 2016; Sun et al., 2019; Zhao et al., 2017, 2019). The impact and applicability of debiased embeddings across a wide range of downstream tasks remain unclear. As stated above, biases are usually masked, not entirely removed, by these methods. Even if it were possible to remove biases from the embeddings entirely, it is not always clear that doing so would be useful (bias might carry information).

A central issue is the language models' training objective: to predict the most likely next term, given the previous context (n-grams). While this objective captures distributional semantic properties, it may itself not contribute to building unbiased embeddings, as it represents the world as we find it, rather than as we would like to have it (descriptive vs. normative view).
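
A minimal bigram model makes the descriptive nature of that objective tangible: whatever co-occurrence statistics dominate the training corpus, biased or not, become the model's most likely continuations. (The three-sentence corpus below is invented.)

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count next-word frequencies: the maximum-likelihood bigram model."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

corpus = ["he is a doctor", "he is a doctor", "she is a nurse"]
counts = train_bigram(corpus)
print(counts["a"].most_common())  # [('doctor', 2), ('nurse', 1)]
# P(doctor | a) = 2/3: the model mirrors the world as found in its data,
# not as we might wish it to be.
```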

2.3.1. Counter‐measures

In general, when using embeddings for downstream applications, it is good practice to be aware of their biases. This awareness helps to identify the applicability of such embeddings to your specific domains and tasks. For example, these models are not directly applicable to data sets that contain scientific articles or medical terminologies.

Recent work has focussed on debiasing embeddings for specific downstream applications and groups of the population: for example, reducing gender bias in text classification (Prost et al., 2019), dialogue generation (Dinan et al., 2020; Liu et al., 2020), and machine translation (Font & Costa-jussà, 2019). Such efforts are more conscious of the effects of debiasing on the target application. Additional metrics, approaches and data sets have been proposed to measure the bias inherent in large language models and their sentence completions (Nangia et al., 2020; Nozza et al., 2021; Sheng et al., 2019).

2.4. Bias from models

Simply using ‘better’ training data is not a feasible long‐term solution: languages evolve continuously, so even a representative sample can only capture a snapshot—at best a short‐lived solution (see Fromreide et al.,  2014 ). These biases compound to create severe performance differences for different user groups. Zhao et al. ( 2017 ) demonstrated that systems trained on biased data exacerbate that bias even further when applied to new data, and Kiritchenko and Mohammad ( 2018 ) have shown that sentiment analysis tools pick up on societal prejudices, leading to different outcomes for different demographic groups. For example, by merely changing the gender of a pronoun, the systems classified the sentence differently. Hovy et al. ( 2020 ) found that machine translation systems changed the perceived user demographics to make samples sound older and more male in translation. This issue is bias overamplification , which is rooted in the models themselves.
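
Findings like the pronoun-swap effect suggest a simple diagnostic anyone can run on their own system: perturb a single protected attribute, hold everything else fixed, and measure how the output moves. A hedged sketch, where `model.predict_proba` stands in for whatever scoring interface the system exposes and the templates are illustrative:

```python
# Counterfactual perturbation check: does changing only the pronoun shift
# the predicted sentiment? Inspired by Kiritchenko & Mohammad (2018).
TEMPLATES = [
    "{} is feeling great today",
    "{} made me feel annoyed yesterday",
]

def perturbation_gaps(model, pair=("he", "she")):
    gaps = []
    for template in TEMPLATES:
        score_a = model.predict_proba([template.format(pair[0])])[0][1]
        score_b = model.predict_proba([template.format(pair[1])])[0][1]
        gaps.append(score_a - score_b)
    # Systematically nonzero gaps across many templates signal that the
    # model has picked up a demographic cue it should be ignoring.
    return gaps
```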

One of the sources of bias overamplification is the choice of loss objective used in training the models. These objectives usually correspond to improving the precision of the predictions. Models might exploit spurious correlations (e.g., all positive examples in the training data happened to come from female authors so that gender can be used as a discriminative feature) or statistical irregularities in the data set to achieve higher precision (Gururangan et al.,  2018 ; Poliak et al.,  2018 ). In other words, they might give the correct answers for the wrong reasons. This behaviour is hard to track until we find a consistent case of bias.

Another issue with the design of machine learning models is that they always make a prediction, even when they are unsure or when they cannot know the answer. The latter could be due to the test data point lying outside the training data distribution or the model's representation space. Prabhumoye et al. ( 2021 ) discuss this briefly in a case study for machine translation systems. If a machine translation tool translates the gender‐neutral Turkish ‘O bir doktor, o bir hemşire’ into ‘He is a doctor, she is a nurse’, it might provide us with an insight into societal expectations (Garg et al.,  2018 ). Still, it also induces an incorrect result the user did not intend. Ideally, models should report to the user that they could not translate rather than produce a wrong translation.

2.4.1. Counter‐measures

The susceptibility of models to all aspects of the training data makes it so important to test our systems on various held‐out data sets rather than a single, designated test set. Recent work has explored objectives other than recall, F1 and so on, for example, the performance stratified by subgroup present in the data. These metrics can lead to fairer predictions across subgroups (Chouldechova,  2017 ; Corbett‐Davies & Goel,  2018 ; Dixon et al.,  2018 ), for example, if the metrics show that the performance for a specific group is much lower than for the rest. Moving away from pure performance metrics and looking at the robustness and behaviour of the model in suites of specially designed cases can add further insights (Ribeiro et al.,  2020 ).
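
Computing such subgroup-stratified metrics requires nothing exotic; a sketch (assuming group labels are available for the held-out data, which is itself a nontrivial requirement):

```python
from collections import defaultdict

def stratified_accuracy(y_true, y_pred, groups):
    """Accuracy per demographic subgroup instead of one overall number;
    large gaps between groups are the warning sign."""
    hits, totals = defaultdict(int), defaultdict(int)
    for gold, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += (gold == pred)
    return {g: hits[g] / totals[g] for g in totals}

# A result like {'under_35': 0.81, 'over_35': 0.93} would quantify the
# age gap in tagging performance described in Section 2.1.
```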

Card and Smith ( 2020 ) explore constraints to be specified on outcomes of models. Specifically, these constraints ensure that the proportion of predicted labels should be the same or approximately the same for each user group.
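
Such an outcome constraint can at least be monitored with a few lines of code; a sketch of the quantity being constrained:

```python
from collections import Counter, defaultdict

def label_proportions(y_pred, groups):
    """Distribution of predicted labels within each group. The constraint
    discussed above asks these distributions to be (approximately) equal
    across groups."""
    by_group = defaultdict(Counter)
    for pred, group in zip(y_pred, groups):
        by_group[group][pred] += 1
    return {g: {label: n / sum(c.values()) for label, n in c.items()}
            for g, c in by_group.items()}
```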

More generally, methods designed to probe and analyse the model can help us understand how it reached decisions. Neural features like attention (Bahdanau et al.,  2015 ) can provide visualizations. Kennedy et al. ( 2020 ) propose a sampling‐based algorithm to explore the impact of individual words on classification. As policy changes put an increased focus on explainable AI (EU High‐Level Expert Group on AI,  2019 ), such methods will likely become useful for both bias spotting and legal recourse.

Systems that explicitly model user demographics will help produce both more personalized and less biased translations (Font & Costa‐jussà,  2019 ; Mirkin et al.,  2015 ; Mirkin & Meunier,  2015 ; Saunders & Byrne,  2020 ; Stanovsky et al.,  2019 ).

2.5. Bias from research design

Despite a growing interest in multi‐ and cross‐lingual work, most NLP research is still in and on English. It generally focuses on Indo‐European data/text sources, rather than other language groups or smaller languages, for example, in Asia or Africa (Joshi et al.,  2020 ). Even if there is a potential wealth of data available from other languages, most NLP tools skew towards English (Munro,  2013 ; Schnoebelen,  2013 ).

This underexposure is a self‐fulfilling prophecy: researchers are less likely to work on those languages for which there are not many resources. Instead, they work on languages and tasks for which data is readily available, potentially generating more data in the process. Consequently, there is a severe shortage for some languages but an overabundance for others. In a random sample of Tweets from 2013, there were 31 different languages (Plank,  2016 ), but no treebanks for about two‐thirds of them and even fewer semantically annotated resources like WordNets. Note that the number of language speakers does not necessarily correlate with the number of available resources . These were not obscure languages with few speakers, but often languages with millions of speakers. The shortage of syntactic resources has since been addressed by the Universal Dependency Project (Nivre et al.,  2020 ). However, a recent paper (Joshi et al.,  2020 ) found that most conferences still focus on the well‐resourced languages and are less inclusive of less‐resourced ones.

This dynamic makes new research on smaller languages more complicated, and it naturally directs new researchers towards the existing languages, first among them English. The existence of off‐the‐shelf tools for English makes it easy to try new ideas in English. The focus on English may therefore be self‐reinforcing and has created an overexposure of this variety. The overexposure to English (as well as to particular research areas or methods) creates a bias described by the availability heuristic (Tversky & Kahneman,  1973 ). If we are exposed to something more often, we can recall it more efficiently, and if we can recall things quickly, we infer that they must be more important, bigger, better, more dangerous and so on. For instance, people estimate the size of cities they recognize to be larger than that of unknown cities (Goldstein & Gigerenzer,  2002 ). It requires a much higher start‐up cost to explore other languages in terms of data annotation, basic analysis models and other resources. The same holds for languages, methods and topics we research.

Overexposure can also create or feed into existing biases, for example, that English is the ‘default’ language, even though both morphology and syntax of English are global outliers. It is questionable whether NLP would have focused on n ‐gram models to the same extent if it had instead been developed on a morphologically complex language (e.g., Finnish, German). However, because of the unique structure of English, n ‐gram approaches worked well, spread to become the default approach and only encountered problems when faced with different languages. Lately, there has been a renewed interest beyond English, as there are economic incentives for NLP groups to work on and in other languages. Concurrently, new neural methods have made more multi‐lingual and cross‐lingual approaches possible. These methods include, for example, multi‐lingual representations (Devlin et al.,  2019 ; Nozza et al.,  2020 ) and the zero‐shot learning they enable (e.g., Bianchi et al.,  2021 ; Jebbara & Cimiano,  2019 ; Liu et al.,  2019 , inter alia). However, English is still one of the most widely spoken languages and by far the biggest market for NLP tools. So there are still more commercial incentives to work on English than other languages, perpetuating the overexposure.

One of the reasons for the linguistic and cultural skew in research is the makeup of research groups themselves. In many cases, these groups do not necessarily reflect the demographic composition of the user base. Hence, marginalized communities or speaker groups do not have their voice represented proportionally. Initiatives like Widening NLP 2 are beginning to address this problem, but the issue still leaves a lot of room for improvement.

Finally, not analysing the behaviour of models sufficiently, or not fully disclosing it, can be harmful (Bianchi & Hovy, 2021). These omissions are not necessarily due to ill will, but are often the result of a relentless pressure to publish. An example of the resulting bias is not fully understanding the intended use of the trained models and how they can be misused (i.e., their dual use). The introduction of ethical consideration sections and ethics reviews in NLP venues is a step to give these aspects more attention and encourage reflection.

An interesting framework for thinking about these issues is the suggestion by Van de Poel (2016) to treat new technology (such as NLP) as a large-scale social experiment, one we are all engaged in at massive scale. As an experiment, however, it needs to respect specific guidelines and ground rules. There are detailed requirements for social and medical science experiments to get the approval of an ethics committee or IRB (institutional review board). These revolve around the safety of the subjects and involve beneficence (no harm to subjects, maximize benefits, minimize risk), respect for subjects' autonomy (informed consent), and justice (weighing of benefits vs. harms, protection of vulnerable subjects). Not all of these categories translate easily to NLP as a large-scale experiment. However, the framework can help us situate our decisions within specific philosophical schools of thought, as outlined by Prabhumoye et al. (2021).

2.5.1. Counter‐measures

There are no easy solutions to design bias, which might only become apparent in hindsight. However, any activity or measure that increases the chance of reflection on the project can help to counter inherent biases. For example, Emily Bender has suggested making overexposure bias more apparent by stating explicitly which language we work on ‘even if it is English’ (Bender,  2019 ). There is, of course, no issue with research on English, but it should be made explicit that the results might not automatically hold for all languages.

It can help to ask ourselves counterfactuals: ‘Would I research this if the data weren't as easily available? Would my findings still hold in another language?’ We can also try to assess whether the research direction of a project feeds into existing biases or whether it overexposes certain groups.

A way forward is to use various evaluation settings and metrics (Ribeiro et al.,  2020 ). Some conferences have also started suggesting guidelines to assess the potential for ethical issues with a system (e.g., the NAACL 2021 Ethics FAQ and guidelines). 3 Human intervention and thought are required at every stage of the NLP application design lifecycle to prioritize equity and stakeholders from marginalized groups (Costanza‐Chock,  2020 ). Recent work by Bird ( 2020 ) suggests new ways of collaborating with Indigenous communities in the form of open discussions and proposes a postcolonial approach to computational methods for supporting language vitality. Finally, Havens et al. ( 2020 ) discuss the need for a bias‐aware methodology in NLP and present a case study in executing it. Researchers have to be mindful of the entire research design: data sets they choose, the annotation schemes or labelling procedures they follow, how they decide to represent the data, the algorithms they choose for the task and how they evaluate the automated systems. Researchers need to be aware of the real‐world applications of their work and consciously decide to choose to help marginalized communities via technology (Asad et al.,  2019 ).

3. CONCLUSION

This article outlined five of the most common sources of bias in NLP models: data selection, annotation, representations, models and our own research design. However, we are not merely at the mercy of these biases: there exists a growing arsenal of algorithmic and methodological approaches to mitigate biases from all sources. The most difficult might be bias from research design, which requires introspection and systematic analysis of our own preconceived notions and blind spots.

ACKNOWLEDGEMENT

Open Access Funding provided by Universita Bocconi within the CRUI‐CARE Agreement.

Biographies

Dirk Hovy is an Associate Professor of Computer Science in the Department of Marketing, and the scientific director of the Data and Marketing Insights research unit at Bocconi University in Milan, Italy. His research focuses on what language can tell us about society, and what computers can tell us about language. He is interested in the interplay of social dimensions of language and NLP models, and the consequences for bias and fairness. His work explores how to integrate sociolinguistic knowledge into NLP models to counteract demographic bias, and was recently awarded a Starting Grant by the European Research Council (ERC) on this topic. Dirk co‐founded and organized two editions of the Ethics in NLP workshops, and is a frequent invited speaker on panels on ethics. Website: http://www.dirkhovy.com/

Shrimai Prabhumoye is a PhD student at the Language Technologies Institute in the School of Computer Science, Carnegie Mellon University. Her work focuses on controllable text generation, with particular attention to style, content and structure, and she is also exploring the ethical considerations of controllable text generation. She co-designed the Computational Ethics for NLP course at CMU, which was offered for the first time in Spring 2018. Website: https://www.cs.cmu.edu/~sprabhum/

Hovy, D., & Prabhumoye, S. (2021). Five sources of bias in natural language processing. Language and Linguistics Compass, e12432. https://doi.org/10.1111/lnc3.12432

Dirk Hovy and Shrimai Prabhumoye contributed equally.

1 http://www.marketsandmarkets.com/Market‐Reports/natural‐language‐processing‐nlp‐825.html .

2 http://www.winlp.org/ .

3 https://2021.naacl.org/ethics/faq/ .

  • Agarwal, O., Durupınar, F., Badler, N. I., & Nenkova, A. (2019). Word embeddings (also) encode human personality stereotypes. Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), 205–211. Minneapolis, Minnesota: Association for Computational Linguistics. https://www.aclweb.org/anthology/S19-1023
  • Alfano, M., Hovy, D., Mitchell, M., & Strube, M. (Eds.). (2018). Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing. New Orleans, Louisiana, USA: Association for Computational Linguistics. https://www.aclweb.org/anthology/W18-0800
  • Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias. ProPublica, May, 23. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  • Asad, M., Dombrowski, L., Costanza-Chock, S., Erete, S., & Harrington, C. (2019). Academic accomplices: Practical strategies for research justice. Companion Publication of the 2019 Designing Interactive Systems Conference (pp. 353–356).
  • Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Y. Bengio & Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings. http://arxiv.org/abs/1409.0473
  • Bamman, D., O'Connor, B., & Smith, N. (2012). Censorship and deletion practices in Chinese social media. First Monday, 17.
  • Bender, E. (2019). The BenderRule: On naming the languages we study and why it matters. The Gradient. https://thegradient.pub/the-benderrule-on-naming-the-languages-we-study-and-why-it-matters
  • Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587–604.
  • Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623).
  • Benton, A., Mitchell, M., & Hovy, D. (2017). Multitask learning for mental health conditions with limited social media data. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, 152–162. Valencia, Spain: Association for Computational Linguistics. https://www.aclweb.org/anthology/E17-1015
  • Bhatia, S. (2017). Associative judgment and vector space semantics. Psychological Review, 124, 1–20.
  • Bianchi, F., & Hovy, D. (2021). On the gap between adoption and understanding in NLP. Findings of the Association for Computational Linguistics: ACL 2021. Association for Computational Linguistics.
  • Bianchi, F., Terragni, S., Hovy, D., Nozza, D., & Fersini, E. (2021). Cross-lingual contextualized topic models with zero-shot learning. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 1676–1683. Association for Computational Linguistics. https://www.aclweb.org/anthology/2021.eacl-main.143
  • Bird, S. (2020). Decolonising speech and language technology. Proceedings of the 28th International Conference on Computational Linguistics, 3504–3519. Barcelona, Spain (Online): International Committee on Computational Linguistics. https://www.aclweb.org/anthology/2020.coling-main.313
  • Blodgett, S. L., Barocas, S., Daumé, H., III, & Wallach, H. (2020). Language (technology) is power: A critical survey of “bias” in NLP. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5454–5476. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.acl-main.485
  • Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 4349–4357.
  • Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.
  • Card, D., & Smith, N. A. (2020). On consequentialism and fairness. Frontiers in Artificial Intelligence, 3, 34. https://www.frontiersin.org/article/10.3389/frai.2020.00034
  • Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5, 153–163.
  • Coavoux, M., Narayan, S., & Cohen, S. B. (2018). Privacy-preserving neural representations of text. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 1–10. Brussels, Belgium: Association for Computational Linguistics. https://www.aclweb.org/anthology/D18-1001
  • Coppersmith, G., Dredze, M., Harman, C., Hollingshead, K., & Mitchell, M. (2015). CLPsych 2015 shared task: Depression and PTSD on Twitter. CLPsych@HLT-NAACL, 31–39.
  • Corbett-Davies, S., & Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023.
  • Costanza-Chock, S. (2020). Design justice: Community-led practices to build the worlds we need. The MIT Press.
  • Criado Perez, C. (2019). Invisible women: Exposing data bias in a world designed for men. Random House.
  • Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186. Minneapolis, Minnesota: Association for Computational Linguistics. https://www.aclweb.org/anthology/N19-1423
  • Dinan, E., Fan, A., Williams, A., Urbanek, J., Kiela, D., & Weston, J. (2020). Queens are powerful too: Mitigating gender bias in dialogue generation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 8173–8188. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.emnlp-main.656
  • Dixon, L., Li, J., Sorensen, J., Thain, N., & Vasserman, L. (2018). Measuring and mitigating unintended bias in text classification. Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, AIES '18, 67–73. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3278721.3278729
  • Eckert, P. (2019). The limits of meaning: Social indexicality, variation, and the cline of interiority. Language, 95, 751–776.
  • Ehni, H.-J. (2008). Dual use and the ethical responsibility of scientists. Archivum Immunologiae et Therapiae Experimentalis, 56, 147–152.
  • Eisenstein, J. (2013). What to do about bad language on the internet. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 359–369. Atlanta, Georgia: Association for Computational Linguistics. https://www.aclweb.org/anthology/N13-1037
  • EU High‐Level Expert Group on AI . (2019). Ethics guidelines for trustworthy AI . https://ec.europa.eu/newsroom/dae/document.cfm?doc_id=60419
  • Flek, L. (2020). Returning the N to NLP: Towards contextually personalized classification models. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , 7828–7838. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.acl‐main.700
  • Font, J. E. , & Costa‐jussà, M. R. (2019). Equalizing gender bias in neural machine translation with word embeddings techniques. Proceedings of the First Workshop on Gender Bias in Natural Language Processing , 147–154. Florence, Italy: Association for Computational Linguistics. https://www.aclweb.org/anthology/W19‐3821
  • Fornaciari, T. , Uma, A. , Paun, S. , Plank, B. , Hovy, D. , & Poesio, M. (2021). Beyond black & white: Leveraging annotator disagreement via soft‐label multi‐task learning. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , 2591–2597. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2021.naacl‐main.204
  • Fort, K. , Adda, G. , & Cohen, K. B. (2011). Last words: Amazon Mechanical Turk: Gold mine or coal mine? Computational Linguistics , 37 , 413–420. https://www.aclweb.org/anthology/J11‐2010 [ Google Scholar ]
  • Friedler, S. A. , Scheidegger, C. , & Venkatasubramanian, S. (2021). The (im) possibility of fairness: Different value systems require different mechanisms for fair decision making . Communications of the ACM , 64 , 136–143. [ Google Scholar ]
  • Fromreide, H. , Hovy, D. , & Søgaard, A. (2014). Crowdsourcing and annotating NER for Twitter #drift. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14) , 2544–2547. Reykjavik, Iceland: European Language Resources Association (ELRA). http://www.lrec‐conf.org/proceedings/lrec2014/pdf/421_Paper.pdf
  • Garg, N. , Schiebinger, L. , Jurafsky, D. , & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes . Proceedings of the National Academy of Sciences , 115 , E3635–E3644. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Garimella, A. , Banea, C. , Hovy, D. , & Mihalcea, R. (2019). Women's syntactic resilience and men's grammatical luck: Gender‐bias in part‐of‐speech tagging and dependency parsing. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , 3493–3498. Florence, Italy: Association for Computational Linguistics. https://www.aclweb.org/anthology/P19‐1339
  • Goldstein, D. G. , & Gigerenzer, G. (2002). Models of ecological rationality: The recognition heuristic . Psychological Review , 109 , 75–90. [ PubMed ] [ Google Scholar ]
  • Gonen, H. , & Goldberg, Y. (2019). Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , 609–614. https://www.aclweb.org/anthology/N19‐1061
  • Grouin, C. , Griffon, N. , & Névéol, A. (2015). Is it possible to recover personal health information from an automatically de‐identified corpus of French EHRs? Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis , 31–39. Lisbon, Portugal: Association for Computational Linguistics. https://www.aclweb.org/anthology/W15‐2604
  • Gururangan, S. , Swayamdipta, S. , Levy, O. , Schwartz, R. , Bowman, S. R. , & Smith, N. A. (2018). Annotation artifacts in natural language inference data. In NAACL‐HLT (2).
  • Harwell, D. (2018). The accent gap. Why some accents don’t work on Alexa or Google Home . The Washington Post. https://www.washingtonpost.com/graphics/2018/business/alexa‐does‐not‐understand‐your‐accent/
  • Havens, L. , Terras, M. , Bach, B. , & Alex, B. (2020). Situated data, situated systems: A methodology to engage with power relations in natural language processing research. Proceedings of the Second Workshop on Gender Bias in Natural Language Processing , 107–124. Barcelona, Spain (Online): Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.gebnlp‐1.10
  • Henrich, J. , Heine, S. J. , & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences , 33 , 61–83. [ PubMed ] [ Google Scholar ]
  • Hovy, D. (2015). Demographic factors improve classification performance. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) , 752–762. Beijing, China: Association for Computational Linguistics. https://www.aclweb.org/anthology/P15‐1073
  • Hovy, D. , Berg‐Kirkpatrick, T. , Vaswani, A. , & Hovy, E. (2013). Learning whom to trust with mace. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , 1120–1130.
  • Hovy, D. , Bianchi, F. , & Fornaciari, T. (2020). “you sound just like your father” commercial machine translation systems include stylistic biases. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , 1686–1690. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.acl‐main.154
  • Hovy, D. , & Søgaard, A. (2015). Tagging performance correlates with author age. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) , 483–488.
  • Hovy, D. , & Spruit, S. L. (2016). The social impact of natural language processing. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , 591–598. Berlin, Germany: Association for Computational Linguistics. https://www.aclweb.org/anthology/P16‐2096
  • Hovy, D. , & Yang, D. (2021). The importance of modeling social factors of language: Theory and practice. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , 588–602. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2021.naacl‐main.49
  • Hovy, D. , Spruit, S. , Mitchell, M. , Bender, E. M. , Strube, M. , & Wallach, H. (Eds.). (2017). Proceedings of the First ACL Workshop on Ethics in Natural Language Processing . Valencia, Spain: Association for Computational Linguistics. https://www.aclweb.org/anthology/W17‐1600 [ Google Scholar ]
  • Howard, A. , & Borenstein, J. (2018). The ugly truth about ourselves and our robot creations: The problem of bias and social inequity . Science and Engineering Ethics , 24 , 1521–1536. [ PubMed ] [ Google Scholar ]
  • Hutchinson, B. , Smart, A. , Hanna, A. , Denton, E. , Greer, C. , Kjartansson, O. , Barnes, P. , & Mitchell, M. (2021). Towards accountability for machine learning datasets: Practices from software engineering and infrastructure. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency , 560–575.
  • Jebbara, S. , & Cimiano, P. (2019). Zero‐shot cross‐lingual opinion target extraction. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , 2486–2495. Minneapolis, Minnesota: Association for Computational Linguistics. https://www.aclweb.org/anthology/N19‐1257
  • Johannsen, A. , Hovy, D. , & Søgaard, A. (2015). Cross‐lingual syntactic variation over age and gender. Proceedings of the Nineteenth Conference on Computational Natural Language Learning , 103–112. Beijing, China: Association for Computational Linguistics. https://www.aclweb.org/anthology/K15‐1011
  • Jørgensen, A. , Hovy, D. , & Søgaard, A. (2015). Challenges of studying and processing dialects in social media. Proceedings of the Workshop on Noisy User‐generated Text , 9–18.
  • Joshi, P. , Santy, S. , Budhiraja, A. , Bali, K. , & Choudhury, M. (2020). The state and fate of linguistic diversity and inclusion in the NLP world. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , 6282–6293. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.acl‐main.560
  • Jurgens, D. , Tsvetkov, Y. , & Jurafsky, D. (2017). Writer profiling without the writer's text. In Ciampaglia G. L., Mashhadi A., & Yasseri T. (Eds.), Social informatics (pp. 537–558). Springer International Publishing. [ Google Scholar ]
  • Kennedy, B. , Jin, X. , Mostafazadeh Davani, A. , Dehghani, M. , & Ren, X. (2020). Contextualizing hate speech classifiers with post‐hoc explanation. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , 5435–5442. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.acl‐main.483
  • Kiritchenko, S. , & Mohammad, S. (2018). Examining gender and race bias in two hundred sentiment analysis systems. Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics , 43–53.
  • Koolen, C. , & van Cranenburgh, A. (2017). These are not the stereotypes you are looking for: Bias and fairness in authorial gender attribution. Proceedings of the First ACL Workshop on Ethics in Natural Language Processing , 12–22. Valencia, Spain: Association for Computational Linguistics. https://www.aclweb.org/anthology/W17‐1602
  • Kozlowski, A. C. , Taddy, M. , & Evans, J. A. (2019). The geometry of culture: Analyzing the meanings of class through word embeddings . American Sociological Review , 84 , 905–949. 10.1177/0003122419877135 [ CrossRef ] [ Google Scholar ]
  • Kurita, K. , Vyas, N. , Pareek, A. , Black, A. W. , & Tsvetkov, Y. (2019). Measuring bias in contextualized word representations. Proceedings of the First Workshop on Gender Bias in Natural Language Processing , 166–172. Florence, Italy: Association for Computational Linguistics. https://www.aclweb.org/anthology/W19‐3823
  • Labov, W. (1972). Sociolinguistic patterns . University of Pennsylvania Press. [ Google Scholar ]
  • Lawson, A. D. , Harris, D. M. , & Grieco, J. J. (2003). Effect of foreign accent on speech recognition in the nato n‐4 corpus. Eighth European Conference on Speech Communication and Technology .
  • Liu, H. , Wang, W. , Wang, Y. , Liu, H. , Liu, Z. , & Tang, J. (2020). Mitigating gender bias for neural dialogue generation with adversarial learning. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , 893–903. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.emnlp‐main.64
  • Liu, Z. , Shin, J. , Xu, Y. , Winata, G. I. , Xu, P. , Madotto, A. , & Fung, P. (2019). Zero‐shot cross‐lingual dialogue systems with transferable latent variables. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP‐IJCNLP) , 1297–1303. Hong Kong, China: Association for Computational Linguistics. https://www.aclweb.org/anthology/D19‐1129
  • Manzini, T. , Yao Chong, L. , Black, A. W. , & Tsvetkov, Y. (2019). Black is to criminal as caucasian is to police: Detecting and removing multiclass bias in word embeddings. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , 615–621. Minneapolis, Minnesota: Association for Computational Linguistics. https://www.aclweb.org/anthology/N19‐1062
  • McCulloch, G. (2020). Because internet: Understanding the new rules of language . Riverhead Books. [ Google Scholar ]
  • Mikolov, T. , Sutskever, I. , Chen, K. , Corrado, G. S. , & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems , 3111–3119.
  • Mirkin, S. , & Meunier, J.‐L. (2015). Personalized machine translation: Predicting translational preferences. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing , 2019–2025.
  • Mirkin, S. , Nowson, S. , Brun, C. , & Perez, J. (2015). Motivating personality‐aware machine translation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing , 1102–1108.
  • Mohammady, E. , & Culotta, A. (2014). Using county demographics to infer attributes of twitter users. Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media , 7–16.
  • Munro, R. (2013, May 22). NLP for all languages . Idibon Blog. http://idibon.com/nlp‐for‐all
  • Nangia, N. , Vania, C. , Bhalerao, R. , & Bowman, S. R. (2020). CrowS‐pairs: A challenge dataset for measuring social biases in masked language models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , 1953–1967. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.emnlp‐main.154
  • Nissim, M. , van Noord, R. , & van der Goot, R. (2020). Fair is better than sensational: Man is to doctor as woman is to doctor . Computational Linguistics , 46 , 487–497. https://www.aclweb.org/anthology/2020.cl‐2.7 [ Google Scholar ]
  • Nivre, J. , de Marneffe, M.‐C. , Ginter, F. , Hajič, J. , Manning, C. D. , Pyysalo, S. , Schuster, S. , Tyers, F. , & Zeman, D. (2020). Universal Dependencies v2: An evergrowing multilingual treebank collection. Proceedings of the 12th Language Resources and Evaluation Conference , 4034–4043. Marseille, France: European Language Resources Association. https://www.aclweb.org/anthology/2020.lrec‐1.497
  • Nozza, D. , Bianchi, F. and Hovy, D. (2021). Measuring hurtful sentence completion in language models. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , 2398–2406. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2021.naacl‐main.191
  • Nozza, D. , Bianchi, F. and Hovy, D. (2020) What the [MASK]? Making sense of language‐specific BERT models. arXiv preprint arXiv:2003.02912 .
  • O'Neil, C. (2016, February 4). The ethical data scientist . Slate. http://www.slate.com/articles/technology/future_tense/2016/02/how_to_bring_better_ethics_to_data_science.html
  • Palakovich, J. , Eigeman, J. , McDaniel, C. E. , Maringas, M. , Chodavarapu, S. , et al. (2017). Virtual agent proxy in a real‐time chat service . US Patent , 9 ( 559 ), 993. [ Google Scholar ]
  • Passonneau, R. J. , & Carpenter, B. (2014). The benefits of a model of annotation . Transactions of the Association for Computational Linguistics , 2 , 311–326. [ Google Scholar ]
  • Paun, S. , Carpenter, B. , Chamberlain, J. , Hovy, D. , Kruschwitz, U. , & Poesio, M. (2018). Comparing Bayesian models of annotation . Transactions of the Association for Computational Linguistics , 6 , 571–585. [ Google Scholar ]
  • Pavlick, E. , Post, M. , Irvine, A. , Kachaev, D. , & Callison‐Burch, C. (2014). The language demographics of Amazon Mechanical Turk . Transactions of the Association for Computational Linguistics , 2 , 79–92. https://www.aclweb.org/anthology/Q14‐1007 [ Google Scholar ]
  • Plank, B. (2016). What to do about non‐standard (or non‐canonical) language in NLP. Proceedings of the Conference on Natural Language Processing (KONVENS) , 13–20. Bochumer Linguistische Arbeitsberichte.
  • Plank, B. , Hovy, D. , & Søgaard, A. (2014a). Learning part‐of‐speech taggers with inter‐annotator agreement loss. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics , 742–751.
  • Plank, B. , Hovy, D. , & Søgaard, A. (2014b). Linguistically debatable or just plain wrong? Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 507–511. Baltimore, Maryland: Association for Computational Linguistics. https://www.aclweb.org/anthology/P14‐2083
  • Poliak, A. , Naradowsky, J. , Haldar, A. , Rudinger, R. , & Van Durme, B. (2018). Hypothesis only baselines in natural language inference. Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics , 180–191. New Orleans, Louisiana: Association for Computational Linguistics. https://www.aclweb.org/anthology/S18‐2023
  • Prabhumoye, S. , Boldt, B. , Salakhutdinov, R. , & Black, A. W. (2021). Case study: Deontological ethics in NLP. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , 3784–3798. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2021.naacl‐main.297
  • Prost, F. , Thain, N. , & Bolukbasi, T. (2019). Debiasing embeddings for reduced gender bias in text classification. Proceedings of the First Workshop on Gender Bias in Natural Language Processing , 69–75. Florence, Italy: Association for Computational Linguistics. https://www.aclweb.org/anthology/W19‐3810
  • Ram, A. , Prasad, R. , Khatri, C. , Venkatesh, A. , Gabriel, R. , Liu, Q. , Nunn, J. , Hedayatnia, B. , Cheng, M. , Nagar, A. , et al. (2018). Conversational AI: The science behind the alexa prize. arXiv preprint arXiv:1801.03604 .
  • Ribeiro, M. T. , Wu, T. , Guestrin, C. , & Singh, S. (2020). Beyond accuracy: Behavioral testing of NLP models with CheckList. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , 4902–4912. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.acl‐main.442
  • Roberts, S. T. , Tetreault, J. , Prabhakaran, V. , & Waseem, Z. (Eds.). (2019). Proceedings of the Third Workshop on Abusive Language Online . Florence, Italy: Association for Computational Linguistics. https://www.aclweb.org/anthology/W19‐3500 [ Google Scholar ]
  • Sap, M. , Card, D. , Gabriel, S. , Choi, Y. , & Smith, N. A. (2019). The risk of racial bias in hate speech detection. Proceedings of the 57th Conference of the Association for Computational Linguistics , 1668–1678. Florence, Italy: Association for Computational Linguistics. https://www.aclweb.org/anthology/P19‐1163
  • Saunders, D. , & Byrne, B. (2020). Reducing gender bias in neural machine translation as a domain adaptation problem. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , 7724–7736. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.acl‐main.690
  • Schnoebelen, T. (2013, June 21). The weirdest languages . Idibon Blog. http://idibon.com/the‐weirdest‐languages
  • Shah, D. S. , Schwartz, H. A. , & Hovy, D. (2020). Predictive biases in natural language processing models: A conceptual framework and overview. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , 5248–5264. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.acl‐main.468
  • Sheng, E. , Chang, K.‐W. , Natarajan, P. , & Peng, N. (2019). The woman worked as a babysitter: On biases in language generation. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP‐IJCNLP) , 3407–3412. Hong Kong, China: Association for Computational Linguistics. https://www.aclweb.org/anthology/D19‐1339
  • Snow, R. , O'Connor, B. , Jurafsky, D. , & Ng, A. (2008). Cheap and fast – but is it good? evaluating non‐expert annotations for natural language tasks. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing , 254–263. Honolulu, Hawaii: Association for Computational Linguistics. https://www.aclweb.org/anthology/D08‐1027
  • Stanovsky, G. , Smith, N. A. , & Zettlemoyer, L. (2019). Evaluating gender bias in machine translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , 1679–1684. Florence, Italy: Association for Computational Linguistics.
  • Sun, T. , Gaut, A. , Tang, S. , Huang, Y. , ElSherief, M. , Zhao, J. , Mirza, D. , Belding, E. , Chang, K.‐W. , & Wang, W. Y. (2019). Mitigating gender bias in natural language processing: Literature review. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , 1630–1640. Florence, Italy: Association for Computational Linguistics. https://www.aclweb.org/anthology/P19‐1159
  • Suresh, H. , & Guttag, J. V. (2019). A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002 .
  • Tan, Y. C. , & Celis, L. E. (2019). Assessing social and intersectional biases in contextualized word representations. Advances in Neural Information Processing Systems , 13230–13241.
  • Tatman, R. (2017). Gender and dialect bias in YouTube's automatic captions. Proceedings of the First ACL Workshop on Ethics in Natural Language Processing , 53–59. Valencia, Spain: Association for Computational Linguistics. https://www.aclweb.org/anthology/W17‐1606
  • Tversky, A. , & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability . Cognitive Psychology , 5 , 207–232. [ Google Scholar ]
  • Uma, A. , Fornaciari, T. , Hovy, D. , Paun, S. , Plank, B. , & Poesio, M. (2020). A case for soft loss functions . Proceedings of the AAAI Conference on Human Computation and Crowdsourcing , 8 , 173–177. [ Google Scholar ]
  • Van de Poel, I. (2016). An ethical framework for evaluating experimental technology . Science and Engineering Ethics , 22 , 667–686. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Wang, J. , Li, S. , Jiang, M. , Wu, H. , & Zhou, G. (2018). Cross‐media user profiling with joint textual and social user embedding. Proceedings of the 27th International Conference on Computational Linguistics , 1410–1420. Santa Fe, New Mexico, USA: Association for Computational Linguistics. https://www.aclweb.org/anthology/C18‐1119
  • Webster, K. , Recasens, M. , Axelrod, V. , & Baldridge, J. (2018). Mind the GAP: A balanced corpus of gendered ambiguous pronouns . Transactions of the Association for Computational Linguistics , 6 , 605–617. https://www.aclweb.org/anthology/Q18‐1042 [ Google Scholar ]
  • Wei, J. , & Zou, K. (2019). EDA: Easy data augmentation techniques for boosting performance on text classification tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP‐IJCNLP) , 6382–6388. Hong Kong, China: Association for Computational Linguistics. https://www.aclweb.org/anthology/D19‐1670
  • Wu, Y. , Schuster, M. , Chen, Z. , Le, Q. V. , Norouzi, M. , Macherey, W. , Krikun, M. , Cao, Y. , Gao, Q. , Macherey, K. , Klingner, J. , Shah, A. , Johnson, M. , Liu, X. , Kaiser, L. , Gouws, S. , Kato, Y. , Kudo, T. , Kazawa, H. , … Dean, J. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 .
  • Zhang, B. , Huang, H. , Pan, X. , Ji, H. , Knight, K. , Wen, Z. , Sun, Y. , Han, J. , & Yener, B. (2014). Be appropriate and funny: Automatic entity morph encoding. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , 706–711. Baltimore, Maryland: Association for Computational Linguistics. https://www.aclweb.org/anthology/P14‐2115
  • Zhao, J. , Wang, T. , Yatskar, M. , Cotterell, R. , Ordonez, V. , & Chang, K.‐W. (2019). Gender bias in contextualized word embeddings. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , 629–634. Minneapolis, Minnesota: Association for Computational Linguistics. https://www.aclweb.org/anthology/N19‐1064
  • Zhao, J. , Wang, T. , Yatskar, M. , Ordonez, V. , & Chang, K.‐W. (2017). Men also like shopping: Reducing gender bias amplification using corpus‐level constraints. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , 2979–2989. Copenhagen, Denmark: Association for Computational Linguistics. https://www.aclweb.org/anthology/D17‐1323

Top 14 Use Cases of Natural Language Processing in Healthcare

Pinakin Ariwala

Introduction

Natural language processing (NLP) applications in healthcare present some unique and compelling opportunities. NLP offers a way to glide through the vast volume of new data and leverage it to boost outcomes, optimize costs, and provide optimal quality of care.


Better access to data-driven technology can help healthcare organizations enhance care delivery and grow their business. However, it is not simple for enterprise systems to make use of the many gigabytes of health and web data they hold. Fortunately, the drivers of NLP in healthcare are a feasible part of the remedy.

What is NLP in Healthcare? 

NLP describes the ways in which artificial intelligence systems gather and assess unstructured data from human language to extract patterns, derive meaning, and compose responses. This helps the healthcare industry make the best use of unstructured data. The technology enables providers to automate administrative work, invest more time in caring for patients, and enrich the patient experience using real-time data.

However, NLP applications in healthcare go beyond understanding human language.


This article covers the most effective uses and roles of NLP in healthcare organizations, including benchmarking the patient experience, review management and sentiment analysis, dictation and its implications for the EMR, and, lastly, predictive analytics.

14 Best Use Cases of NLP in Healthcare

Let us have a look at the 14 use cases associated with Natural Language Processing in Healthcare:

1. Clinical Documentation

NLP-powered healthcare systems help free clinicians from the laborious manual workflows of EHRs and let them invest more time in the patient; this is how NLP can help doctors. Both speech-to-text dictation and structured data entry have been a blessing. Vendors such as Nuance and M*Modal combine text processing and speech recognition technologies to capture structured data at the point of care, with standardized vocabularies for future use.

NLP technologies extract relevant data from speech recognition output, which can considerably enrich the analytics used to run value-based care (VBC) and population health management (PHM) efforts and deliver better outcomes for clinicians. In the future, NLP tools will also be applied to public data sets and social media to identify Social Determinants of Health (SDOH) and gauge the effectiveness of wellness-based policies.

2. Speech Recognition

NLP has matured its use in speech recognition over the years, allowing clinicians to transcribe notes for useful EHR data entry. Front-end speech recognition lets physicians dictate notes at the point of care instead of typing them, while back-end technology detects and corrects errors in the transcription before passing it on for human proofing.

The market is almost saturated with speech recognition technologies, but a few startups are disrupting the space with deep learning algorithms for mining speech data, uncovering more extensive possibilities.
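To make the back-end step concrete, here is a minimal sketch that runs a pretrained speech-to-text model over a dictated audio file with the Hugging Face transformers pipeline. The model choice and the file path are illustrative assumptions, not a reference to any vendor's product.

```python
# Minimal sketch: back-end speech-to-text for clinical dictation.
# Assumes `transformers` (and ffmpeg for audio decoding) are installed;
# the model name and audio path are illustrative assumptions.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Transcribe a dictated note (hypothetical file) for later human proofing.
result = asr("dictation_sample.wav")
print(result["text"])
```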

3. Computer-Assisted Coding (CAC)

Computer-assisted coding (CAC) is one of the most famous examples of NLP applications in healthcare. CAC captures data on procedures and treatments to identify every applicable billing code and maximize claims. It is one of the most popular uses of NLP, but unfortunately its adoption rate is just 30%. It has increased the speed of coding but has fallen short on accuracy.

4. Data Mining Research

The integration of data mining, healthcare technology, and big data analytics in healthcare systems allows organizations to reduce the level of subjectivity in decision-making and provide useful medical know-how. Once started, data mining can become a cyclic technology for knowledge discovery, which can help any HCO create a sound business strategy to deliver better care to patients.

5. Automated Registry Reporting

Another NLP use case is extracting discrete values from clinical notes wherever reporting requires them. Many health IT systems are burdened by regulatory reporting when measures such as ejection fraction are not stored as discrete values. For automated reporting, health systems must identify when an ejection fraction is documented as part of a note and save each value in a form that the organization's analytics platform can use for automated registry reporting. A toy sketch of this extraction step follows.
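The snippet below is a toy illustration of that extraction step: a regular expression that pulls ejection fraction percentages out of free-text notes. Real registry pipelines handle far more phrasing variants, negation, and context, so treat this as a sketch under simplified assumptions.

```python
import re

# Toy sketch: extract ejection fraction (EF) percentages from free text.
# Real pipelines handle many more phrasings, negation, and context checks.
EF_PATTERN = re.compile(
    r"(?:ejection fraction|LVEF|EF)\s*(?:of|is|:)?\s*(\d{1,2})\s*%",
    re.IGNORECASE,
)

note = "Echo today. LVEF of 35% noted, down from prior EF: 45%."
values = [int(m) for m in EF_PATTERN.findall(note)]
print(values)  # [35, 45] -> discrete values for the analytics platform
```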

Automated registry reporting can be cumbersome to implement. To achieve the best possible results from the get-go, we recommend seeking the expertise of Natural Language Processing services.

6. Clinical Decision Support

Advancements in NLP applications in healthcare are poised to elevate clinical decision support. Solutions are now being formulated to support clinical decisions more precisely, and some processes, such as catching medical errors, still require better strategies of supervision.

According to a report, recent research has indicated the beneficial use of NLP for computerized infection detection. Leading vendors of NLP-powered CDS include M*Modal and IBM Watson Health. In addition, with the help of Isabel Healthcare, NLP is aiding clinicians in diagnosis and symptom checking.

7. Clinical Trial Matching

NLP applications in healthcare are making significant strides, especially in Clinical Trial Matching. 

Using NLP in healthcare to identify patients for a clinical trial is a significant use case. Several companies are tackling the challenges in this area with NLP engines built for trial matching. With recent advances, NLP can automate trial matching and make it a seamless procedure.

IBM Watson Health and Inspirata, for example, have devoted enormous resources to applying NLP in support of oncology trials.


8. Prior Authorization

Analysis has shown that payer prior authorization requirements on medical personnel keep increasing. These demands raise practice overhead and hold up care delivery. Thanks to NLP, the wait to learn whether payers will approve and pay compensation may not be around much longer: IBM Watson and Anthem have already rolled out an NLP module used across the payer's network to adjudicate prior authorizations promptly.

9. AI Chatbots and Virtual Scribe

Although no such solution exists at present, the odds are high that speech recognition apps will help clinicians modify clinical documentation by voice. The natural device for this would be something like Amazon's Alexa or Google Assistant. Microsoft and Google have already partnered in pursuit of this objective, so it seems safe to assume that Amazon and IBM will follow suit.

Chatbots and virtual assistants now exist in a wide range across the digital world, and the healthcare industry is no exception. Presently, these assistants can capture symptoms and triage patients to the most suitable provider. Startups building chatbots include BRIGHT.MD, which has created Smart Exam, “a virtual physician assistant” that uses conversational NLP to gather personal health data, compare the information to evidence-based guidelines, and offer diagnostic suggestions to the provider.


Another chatbot, the “virtual therapist” Woebot, connects with patients through Facebook Messenger. According to one trial, it succeeded in lowering anxiety and depression in 82% of the college students who participated.

10. Risk Adjustment and Hierarchical Condition Categories

Hierarchical Condition Category (HCC) coding, a risk adjustment model, was initially designed to predict future care costs for patients. In value-based payment models, HCC coding will become increasingly prevalent. HCC relies on ICD-10 coding to assign a risk score to each patient, and natural language processing can help assign each patient a risk factor and use that score to predict healthcare costs.
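To make the mechanics concrete, here is a deliberately simplified sketch of HCC-style scoring that maps ICD-10 codes found in a record to risk weights and sums them. The ICD-10 codes are real identifiers, but the weights and the mapping table are invented for illustration and are not actual CMS-HCC coefficients.

```python
# Deliberately simplified HCC-style risk scoring sketch.
# The ICD-10 codes are real identifiers, but the weights are invented
# for illustration and are NOT actual CMS-HCC coefficients.
HYPOTHETICAL_HCC_WEIGHTS = {
    "E11.9": 0.105,  # Type 2 diabetes without complications
    "I50.9": 0.331,  # Heart failure, unspecified
    "N18.3": 0.237,  # Chronic kidney disease, stage 3
}

def risk_score(icd10_codes: list[str]) -> float:
    """Sum the risk weights of every recognized ICD-10 code."""
    return sum(HYPOTHETICAL_HCC_WEIGHTS.get(code, 0.0) for code in icd10_codes)

# Codes extracted from a patient's record, e.g. by an NLP pipeline.
print(risk_score(["E11.9", "I50.9"]))  # 0.436
```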


11. Computational Phenotyping

Just as NLP is altering clinical trial matching, it also has the potential to help clinicians with the complexity of phenotyping patients for research. For example, NLP can allow phenotypes to be defined directly from patients' current conditions rather than relying solely on the knowledge of professionals.

NLP can also be applied to speech patterns, which may prove to have diagnostic potential for neurocognitive impairments such as Alzheimer's disease and dementia, as well as cardiovascular or psychological disorders. Several new companies are emerging around this use case, including BeyondVerbal, which has partnered with the Mayo Clinic to identify vocal biomarkers for coronary artery disease. In addition, Winterlight Labs is uncovering unique linguistic patterns in the language of Alzheimer's patients.

12. Review Management & Sentiment Analysis

NLP can also help healthcare organizations manage online reviews. It can gather and evaluate thousands of healthcare reviews posted each day on third-party listings. In addition, NLP can flag Protected Health Information (PHI), profanity, and other content relevant to HIPAA compliance, and it can rapidly analyze human sentiment along with the context of its usage.

Some systems can even monitor the voice of the customer in reviews; this helps physicians understand how patients speak about their care and communicate better using a shared vocabulary. Similarly, NLP can track customers' attitudes by identifying positive and negative terms within each review. A small sentiment-scoring sketch follows.
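As a small sketch of this kind of review scoring, the snippet below applies NLTK's VADER sentiment analyzer to a pair of invented patient reviews; the texts and the score thresholds are assumptions for illustration.

```python
# Small sketch: scoring patient reviews with NLTK's VADER analyzer.
# The review texts below are invented examples.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

reviews = [
    "The nurses were attentive and the discharge process was smooth.",
    "Waited three hours and no one explained my test results.",
]
for review in reviews:
    # `compound` ranges from -1 (most negative) to +1 (most positive).
    score = analyzer.polarity_scores(review)["compound"]
    label = "positive" if score >= 0.05 else "negative" if score <= -0.05 else "neutral"
    print(f"{label:8} {score:+.2f}  {review}")
```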

13. Dictation and EMR Implications

On average, an EMR holds between 50 and 150 MB per million records, whereas the average clinical note record is almost 150 times as large. For this reason, many physicians are shifting from handwritten notes to voice notes that NLP systems can quickly analyze and add to EMR systems, letting physicians commit more time to the quality of care.

Much of this clinical documentation is unstructured, but NLP can analyze it automatically. In addition, it can extract details from diagnostic reports and physicians' letters, ensuring that every piece of critical information is uploaded to the patient's health profile.

14. Root Cause Analysis

Another exciting benefit of NLP is how predictive analytics can address prevalent health problems. Applied to vast caches of digital medical records, NLP can help identify subsets of geographic regions, racial groups, and other population sectors that face different types of health disparities. Conventional administrative databases cannot analyze socio-cultural impacts on health at such a large scale, but NLP has opened the way to this exploration.

In the same way, NLP systems can assess unstructured feedback to uncover the root cause of patients' difficulties or poor outcomes.

What Immediate Benefits Can Healthcare Organizations Get By Leveraging NLP?

Healthcare organizations can use NLP to transform how they deliver care and manage solutions. Organizations can use machine learning in healthcare to improve provider workflows and patient outcomes.


Here is a wrap-up of the use of Natural Language Processing in healthcare:

1. Improve Patient Interactions With the Provider and the EHR

Natural language processing solutions can help bridge the gap between complex medical terms and patients' understanding of their health. NLP can also be an excellent way to combat EHR distress: many clinicians use NLP as an alternative to typing and handwriting notes.

2. Increasing Patient Health Awareness

Even when patients can access their health data through an EHR system, most need help comprehending the information. Because of this, only a fraction of patients can use their medical information to make health decisions. This can change with the application of machine learning in healthcare.

3. Improve Care Quality

NLP tools can offer better provisions for evaluating and improving care quality. Value-based reimbursement requires healthcare organizations to measure physician performance and identify gaps in delivered care. NLP algorithms can help HCOs do both, and can also assist in identifying potential errors in care delivery.

4. Identify Patients With Critical Care Needs

NLP algorithms can extract vital information from large datasets and provide physicians with the right tools to treat complex patient issues.

How Can Natural Language Processing Help Doctors?

A   study highlighted that physicians spend as much as 49% of their time on EHRs and desk work. The same survey also revealed that they could devote only 27% of their day towards clinical patient care.

This excessive paperwork burden is touted to be a significant contributor to physician burnout. This not only takes a toll on the well-being of healthcare professionals but also profoundly impacts patient care.

NLP application in healthcare is gradually emerging as a potential solution to this.

Paperwork Reduction and Increased Efficiency:  NLP healthcare systems can interpret and record medical information in real-time, eliminating the need for doctors to sit down and make entries manually. This can significantly reduce the paperwork burden, increasing efficiency and allowing healthcare professionals to focus more on patient care.

Real-Time Clinical Data Analysis:  Advanced NLP systems can scan vast amounts of clinical text within seconds and extract valuable insights from piles of data. For example, an NLP medical record summarization model can analyze a patient's medical history within seconds and generate a comprehensive summary highlighting all the essential clinical findings and previous treatments; a minimal sketch of such a summarizer follows this list.

Computer-Assisted Coding (CAC):  Another advantage of NLP is the ability of computer-assisted coding to synthesize lengthy chart notes into essential pointers. In the past, the manual review and processing of extensive stacks of chart notes from health records stretched for weeks, months, or even years. NLP-enabled systems can significantly expedite this process, accelerate the identification of crucial information, and streamline the overall workflow.
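Below is a minimal sketch of the summarization idea mentioned in the list above, using a generic pretrained model through the Hugging Face transformers pipeline. The model choice and the sample note are illustrative assumptions; a production medical summarizer would be trained and validated on clinical text.

```python
# Minimal sketch: summarizing a clinical note with a generic pretrained
# model. A real medical summarizer would be trained and validated on
# clinical text; this only illustrates the plumbing.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

note = (
    "Patient is a 67-year-old male with a history of type 2 diabetes and "
    "hypertension, admitted with shortness of breath. Echocardiogram showed "
    "reduced ejection fraction. Started on diuretics with good response. "
    "Discharged home with cardiology follow-up in two weeks."
)
summary = summarizer(note, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```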

Implementing Predictive Analytics in Healthcare

Identifying high-risk patients and improving the diagnostic process can both be achieved by deploying predictive analytics together with Natural Language Processing in Healthcare.

It is vital for emergency departments to have complete data quickly at hand. For example, delayed diagnosis of Kawasaki disease leads to critical complications if the condition is missed or mistreated. In one published study, an NLP-based algorithm identified patients at risk of Kawasaki disease with a sensitivity of 93.6% and a specificity of 77.5% compared with the manual review of clinicians' notes.

A team of researchers from France developed another NLP-based algorithm to monitor, detect, and prevent hospital-acquired infections (HAI) among patients. NLP helped structure the unstructured data, which was then used to identify early warning signs and alert clinicians accordingly. A toy sketch of this kind of surveillance follows.
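The snippet below is a toy illustration of such surveillance: it flags notes that mention infection-related terms unless the mention is negated. The term list, the negation window, and the sample notes are invented simplifications; real HAI algorithms are far richer.

```python
import re

# Toy surveillance sketch: flag notes mentioning infection-related terms
# unless the mention is negated. Term list, negation window, and notes
# are invented simplifications; real HAI algorithms are far richer.
TERMS = ["infection", "purulent", "sepsis", "abscess"]
NEGATIONS = ["no", "denies", "without", "negative for"]

def flag_note(note: str) -> bool:
    lowered = note.lower()
    for term in TERMS:
        for match in re.finditer(re.escape(term), lowered):
            window = lowered[max(0, match.start() - 25):match.start()]
            if not any(neg in window for neg in NEGATIONS):
                return True  # un-negated mention -> flag for clinician review
    return False

notes = [
    "Post-op day 3: purulent drainage at surgical site.",
    "Afebrile, no signs of infection at incision.",
]
print([flag_note(n) for n in notes])  # [True, False]
```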


Similarly, another experiment automated the identification of, and risk prediction for, heart failure patients who were already hospitalized. Natural Language Processing was used to analyze free-text reports from the previous 24 hours and predict each patient's risk of hospital readmission and mortality over a period of 30 days. The algorithm performed better than expected, with an overall positive predictive value of 97.45%.

The benefits of deploying NLP can certainly be applied to other areas of interest, and a myriad of algorithms can be deployed to pick out and predict specified conditions among patients.

Even though the healthcare industry at large still needs to refine its data capabilities before deploying NLP tools, the technology has massive potential to significantly improve care delivery and streamline workflows. Down the line, Natural Language Processing and other ML tools will be the key to superior clinical decision support and patient health outcomes.


Natural Language Processing in healthcare is not a single solution to every problem; systems in this industry need to comprehend the sublanguage used by medical experts and patients. NLP experts at Maruti Techlabs have vast experience working with the healthcare industry and can help your company get the most out of real-time and historical feedback data.

Maruti Techlabs supports leading hospitals and healthcare units with  AI-driven NLP services . Our trademark products interpret human behaviour and languages and provide customised search results, chatbots, and virtual assistants to help you benefit from the role of NLP in Healthcare. 


Pinakin is the VP of Data Science and Technology at Maruti Techlabs. With about two decades of experience leading diverse teams and projects, his technological competence is unmatched.





Top Use Cases of Natural Language Processing in Healthcare

The global natural language processing market is slated to increase from $1.8 billion in 2021 to $4.3 billion in 2026, growing at a CAGR of 19.0% during the period.

As the digitization of healthcare grows significantly, advanced technologies like NLP are helping the industry extract useful insights from the massive amounts of unstructured clinical data to uncover patterns and develop appropriate responses.

With more access to the latest technologies, the healthcare industry can develop customized treatment plans, provide accurate diagnostic solutions and optimize patient care experience.

Let’s look at the role of NLP in healthcare and its top use cases.

Role of NLP in Healthcare

The healthcare industry produces tons of unstructured clinical and patient data, and it is challenging to manually collate and correlate all of this information into a structured format. Utilizing this data matters because it can help improve healthcare delivery, automate administrative systems, reduce patient waiting times, and improve care with real-time data.

Natural language processing and artificial intelligence help collect unstructured medical data from human speech, reports, documents, and databases to extract meaningful patterns. With these patterns, you can extend better diagnosis, treatment, and support to patients.

There are two primary ways in which NLP enhances healthcare delivery. One is extracting information from a physician’s speech by comprehending its meaning.

The other is mapping out the critical information from databases and documents to help doctors and practitioners make informed decisions.
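As a toy sketch of the second approach, the snippet below uses a simple rule-based pattern to pull medication mentions and dosages out of a free-text note. The drug list and the note are invented, and production systems would use trained clinical NER models rather than hand-written rules.

```python
import re

# Toy sketch: rule-based extraction of medications and dosages from a
# free-text note. The drug list and note are invented; real systems use
# trained clinical NER models rather than hand-written rules.
KNOWN_DRUGS = ["metformin", "lisinopril", "atorvastatin"]
DOSE_PATTERN = re.compile(
    r"\b({drugs})\s+(\d+(?:\.\d+)?)\s*(mg|mcg|g)\b".format(drugs="|".join(KNOWN_DRUGS)),
    re.IGNORECASE,
)

note = "Continue metformin 500 mg twice daily; start lisinopril 10 mg."
for drug, dose, unit in DOSE_PATTERN.findall(note):
    print(f"{drug.lower()}: {dose} {unit}")
# metformin: 500 mg
# lisinopril: 10 mg
```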

Different Use Cases of Natural Language Processing in Healthcare

There are many use cases of healthcare NLP; the top four are summarized in the figure below.

[Figure: top use cases of NLP in healthcare]

How Can Healthcare Organizations Leverage NLP?

[Figure: benefits of NLP in healthcare]



NLP for Maternal Healthcare: Perspectives and Guiding Principles in the Age of LLMs

Abstract: Ethical frameworks for the use of natural language processing (NLP) are urgently needed to shape how large language models (LLMs) and similar tools are used for healthcare applications. Healthcare faces existing challenges including the balance of power in clinician-patient relationships, systemic health disparities, historical injustices, and economic constraints. Drawing directly from the voices of those most affected, and focusing on a case study of a specific healthcare setting, we propose a set of guiding principles for the use of NLP in maternal healthcare. We led an interactive session centered on an LLM-based chatbot demonstration during a full-day workshop with 39 participants, and additionally surveyed 30 healthcare workers and 30 birthing people about their values, needs, and perceptions of NLP tools in the context of maternal health. We conducted quantitative and qualitative analyses of the survey results and interactive discussions to consolidate our findings into a set of guiding principles. We propose nine principles for ethical use of NLP for maternal healthcare, grouped into three themes: (i) recognizing contextual significance (ii) holistic measurements, and (iii) who/what is valued. For each principle, we describe its underlying rationale and provide practical advice. This set of principles can provide a methodological pattern for other researchers and serve as a resource to practitioners working on maternal health and other healthcare fields to emphasize the importance of technical nuance, historical context, and inclusive design when developing NLP technologies for clinical use.



How to analyse customer reviews with NLP: a case study

Vítor Bernardes

This report analyzes the customer reviews of Britannia International Hotel Canary Wharf. The analysis was performed using Natural Language Processing techniques, and the results were used to identify which aspects of the hotel's service needed to be improved. Apart from the hospitality industry, this analysis can benefit any other sector with access to customer feedback, like e-commerce, food services, or the entertainment industry.

Table of Contents

  • Problem
  • Solution
    • Motivation and Objectives
    • Overview
    • Analysis Outcomes
  • Applications
    • E-commerce
    • Hospitality industry
    • Food services industry
    • Entertainment industry
  • Endnotes

One of the most critical aspects of understanding a business is understanding its strengths and weaknesses. Analyzing why it is or is not thriving is key to that business's longevity. Hotels are no strangers to this scenario.

As a business owner, it is essential to understand why some customers might not return to the hotel, what drove their dissatisfaction, or what positively stood out to them.

To perform this research, we gathered a dataset of hotel reviews and focused our attention on a specific hotel: Britannia International Hotel Canary Wharf.

Britannia International Hotel Canary Wharf.

The dataset was gathered from the Kaggle platform and contains over 515,000 customer reviews and scores for 1,493 luxury hotels across Europe.

Motivation and Objectives

To gain insight into the hotel's reviews and understand customers' feelings and feedback more accurately, we first needed to understand the customer opinions and segments represented in the available data.

Additionally, such a large corpus of customer feedback is time-consuming to review manually for customers' preferences and pain points. Therefore, we also analyzed the review texts with Natural Language Processing techniques to understand the feelings and emotions behind the reviews and recognize which aspects of the hotel required improvement.

While we applied this process to the hospitality industry, this type of analysis can be readily implemented in any other industry that captures customer feedback, or even enabled by collecting customer comments from social media posts.

We started by evaluating the available data, with particular attention to the format and soundness of each field. As is typical when dealing with datasets, especially ones that involve user-generated data, some data needed cleaning. This is an important step in every data analysis process to ensure that the data we work with and use as a foundation for insights is sound and therefore leads to reasonable and representative conclusions.

In the specific case of this dataset, the review text itself needed only minor cleaning to remove redundant whitespace. However, we also noticed a significant issue: all punctuation was missing from the reviews. It was therefore necessary to perform a pre-processing step to recover some of the structure that punctuation provides, so that we could apply Natural Language Processing techniques and obtain relevant results. A simple yet effective method was to approximate that structure by adding a period before each word beginning with a capital letter.

The effectiveness of that method also stemmed from additional processing in which we filtered out known acronyms and named entities, so we would not add unnecessary periods. To achieve that, we employed automatic named entity recognition, a process that attempts to identify named entities in a given piece of text automatically. In the NLP context, named entities are real-world objects that can be identified by a proper name, including cities, individuals, organizations, etc.
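The article does not publish the exact implementation, but the step described above can be sketched roughly as follows with spaCy; the capitalization and acronym heuristics here are our assumptions:

```python
import re
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def restore_punctuation(text: str) -> str:
    """Approximate lost sentence boundaries by adding a period before
    capitalized words, skipping acronyms and named entities."""
    doc = nlp(text)
    # Indices of tokens inside named entities (people, places, orgs, ...)
    entity_tokens = {token.i for ent in doc.ents for token in ent}
    pieces = []
    for token in doc:
        if (token.i > 0
                and token.text[0].isupper()
                and not token.text.isupper()   # skip acronyms ("TV") and "I"
                and token.i not in entity_tokens):
            pieces.append(".")
        pieces.append(token.text)
    return re.sub(r"\s+([.,])", r"\1", " ".join(pieces))

print(restore_punctuation("Room was tiny The staff were helpful though"))
# -> "Room was tiny. The staff were helpful though"
```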

Data profiling

The next step was creating our working dataset by filtering the reviews down to our specific hotel.

The dataset contains the review date and the score given to each stay, along with the reviewer's nationality and tags describing the characteristics of the visit, such as whether it was a double or a single room and how long the stay was. It also contains the negative and positive reviews for that stay.

To approximate the available data to a real-world scenario, we randomly merged the negative and positive reviews into a single column for later analysis.
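A minimal sketch of this preparation step with pandas, assuming the column names of the Kaggle dataset (Hotel_Name, Negative_Review, Positive_Review):

```python
import pandas as pd

df = pd.read_csv("Hotel_Reviews.csv")  # the Kaggle hotel reviews file

# Keep only the hotel under study
hotel = df[df["Hotel_Name"] == "Britannia International Hotel Canary Wharf"]

# Stack negative and positive reviews into a single 'review' column and
# shuffle, so the column no longer reveals each review's original polarity
reviews = pd.concat([
    hotel["Negative_Review"].rename("review"),
    hotel["Positive_Review"].rename("review"),
]).sample(frac=1, random_state=42).reset_index(drop=True)

print(reviews.head())
```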

Distribution Analysis

The first task was to examine review ratings by date, which could make it possible to identify periods where ratings dipped. Such dips could derive from a seasonal aspect, such as not having air conditioning in the summer, or from the impact of a specific employee.

This approach was not fruitful, but the same logic can be applied to the tags or nationalities. Through the tags we could identify, for instance, whether customers who stayed in an Executive Double Room tended to leave bad reviews. That visualization can be done with boxplots. We analyzed all the different tags and found that most of them showed similar distributions, which prevented us from obtaining relevant insights there.
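For illustration, a score-per-tag boxplot can be produced along these lines (toy data standing in for the real tags and scores):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Toy data standing in for the real (tag, score) pairs after exploding
# each stay's tag list into one row per tag
df = pd.DataFrame({
    "tag": ["Double Room", "Double Room", "Single Room",
            "Executive Double Room", "Single Room", "Executive Double Room"],
    "reviewer_score": [7.5, 8.3, 6.0, 9.1, 5.4, 8.8],
})

sns.boxplot(data=df, x="tag", y="reviewer_score")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()
```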

Boxplots of reviewer score for different hotel accommodations.

Regarding nationalities, it was essential to analyze the distribution of our customers, as this could provide insights into the marketing team's effectiveness in different markets. Excluding UK customers, who represent 80% of all customers, we get the following world map overview, where darker shades indicate a higher number of reviewers of that nationality:

World map overview indicating reviewers' nationalities.

Sentiment Analysis

To further understand the feeling behind the reviews, we used a language model hosted on the Hugging Face platform to determine whether each review was positive or negative. The multilingual XLM-RoBERTa-base model was trained on ~198M tweets and fine-tuned for sentiment analysis in eight languages.
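The article does not name the exact checkpoint, but the description matches cardiffnlp/twitter-xlm-roberta-base-sentiment on the Hugging Face Hub; a minimal sketch of scoring reviews with it:

```python
from transformers import pipeline

# Multilingual XLM-RoBERTa fine-tuned for tweet sentiment; assumed to be
# the checkpoint the article describes (~198M tweets, 8 languages)
sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
)

reviews = [
    "The room was spacious and the staff were friendly.",
    "The window would not open and the bed was uncomfortable.",
]
for review, result in zip(reviews, sentiment(reviews)):
    print(f"{result['label']:10} ({result['score']:.2f})  {review}")
```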

With the ability to split the reviews into positive and negative with a reasonable confidence level (0.76 accuracy on our dataset), we then looked for patterns within those reviews. A straightforward way to visualize the words is through word clouds. Below are the word clouds for negative and positive reviews.

Negative reviews
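Word clouds like these can be generated with the wordcloud package; a minimal sketch with a toy list of negative reviews:

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

negative_reviews = [  # toy stand-ins for the classified negative reviews
    "The window would not open without staff assistance.",
    "The bed was stiff and the room felt dated.",
]
cloud = WordCloud(width=800, height=400, background_color="white").generate(
    " ".join(negative_reviews)
)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```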

There is much information to be gained from analyzing the dynamics between positive and negative customer reviews. Customers surely want to have their say, as demonstrated by our data set, where negative reviews are, on average, over twice as long as positive reviews. Additionally, by looking at the evolution of the average number of reviews over time, we can see a potential slight increasing trend in the number of negative reviews, which the business should be attentive to.

3-month moving average of reviews

Emotion Analysis

Besides identifying the sentiment behind a text, another NLP technique is identifying the emotion behind it. To achieve this, we used the NRCLex library, which recognizes emotions in text, such as fear, anger, or surprise. This analysis allows us to understand more accurately how customers feel about a specific service or product.
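A minimal sketch of per-text emotion scoring with NRCLex:

```python
from nrclex import NRCLex

text = "The staff were wonderful and we felt safe and welcome the whole stay."
emotions = NRCLex(text)

print(emotions.top_emotions)        # e.g. [('trust', ...), ('joy', ...)]
print(emotions.affect_frequencies)  # frequency of each emotion category
```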

Similarly to the sentiment visualization, by identifying the emotions associated with each review we can build a word cloud for each emotion within the positive or negative reviews. For example, the word cloud generated from the trust emotion within the positive reviews is as follows:

Word cloud generated from trust emotion within positive reviews

This process gives us some idea of what triggers each customer emotion.

Keyword Analysis

To analyze the reviews further, we wanted to identify the main objects of customer comments. To achieve that, we extracted relevant keywords from the sets of positive and negative reviews using YAKE, an unsupervised automatic keyword extraction method. YAKE computes statistical features for each term, including word case, position, frequency, and context, weights each term according to these features, and finally assigns a score indicating each term's significance as a potential keyword. It is a powerful yet lightweight method that, thanks to its fully unsupervised nature, can be employed in different domains and even with other languages.
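A minimal sketch of keyword extraction with YAKE (the review list here is a toy stand-in for the real texts):

```python
import yake

reviews = [  # toy stand-ins for the real reviews
    "Great location, right next to Canary Wharf station.",
    "Breakfast was cold and the Wi-Fi is not free.",
]
extractor = yake.KeywordExtractor(lan="en", n=1, top=5)
for keyword, score in extractor.extract_keywords(" ".join(reviews)):
    print(f"{score:.4f}  {keyword}")  # lower score = more significant keyword
```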

Additionally, we employed a pure frequency-based approach to uncover the most common objects mentioned in reviews. The results were similar to our keyword analysis, reaffirming its validity and reliability.

These were the keywords identified for positive and negative reviews:

  • Positive : hotel, location, staff, view, room, breakfast
  • Negative : hotel, staff, room, breakfast, window, bed, Wi-Fi

As expected, the identified keywords are points commonly addressed in hospitality industry reviews, and they already constitute a good indicator of adequate service or potential areas of improvement for the hotel. However, we wanted to go deeper and uncover exactly what it was about these objects that was, or was not, working as customers expected. For example, why were windows such a prominent aspect of negative reviews?

To that end, we used another Natural Language Processing technique: syntactic dependency parsing. We employed spaCy, a fast, comprehensive, production-ready NLP library for Python, to create a syntactic dependency tree, which connects all terms in the input text according to their syntactic relations. We then queried this tree to pinpoint precisely what it was about a given keyword (for example, "room" or "location") that customers did or did not especially like.

Syntactic dependency parsing process.
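The article does not publish the exact tree queries, but extracting modifiers for a keyword with spaCy can be sketched as follows; the amod/acomp rules below are our assumptions:

```python
from collections import Counter
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def modifiers_for(keyword: str, texts: list[str]) -> Counter:
    """Count words that syntactically modify the given keyword."""
    counts = Counter()
    for doc in nlp.pipe(texts):
        for token in doc:
            # Attributive use: "spacious room" (amod attached to the noun)
            if token.dep_ == "amod" and token.head.lemma_ == keyword:
                counts[token.lemma_] += 1
            # Predicative use: "the room was spacious" (acomp on the verb
            # whose subject is the keyword)
            elif token.dep_ == "acomp" and any(
                child.dep_ == "nsubj" and child.lemma_ == keyword
                for child in token.head.children
            ):
                counts[token.lemma_] += 1
    return counts

print(modifiers_for("room", ["The room was spacious.",
                             "A clean, comfortable room with a view."]))
# -> Counter({'spacious': 1, 'clean': 1, 'comfortable': 1})
```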

The result was a list of modifiers for each keyword. For example, we could learn that customers might consider a "room" to be "spacious" or the "location" to be "convenient." This resulting list of modifiers enabled us to create word clouds to visualize the frequency of each modifier for the given keyword, such as the word cloud below, for the keyword "room":

Word cloud for the keyword room

Analyzing these frequent modifiers for each keyword, along with their relevance and weight, separately for positive and negative reviews gave us deeper insight into what customers liked best, and least. We present the results below.


Upon analyzing the data set as described above, we were able to identify some positive aspects of the business, as well as essential areas for improvement.

One noticeable comment, which frequently appears in both positive and negative reviews, is that some customers consider the hotel dated. The three main modifiers used to describe the hotel in negative reviews pertain to that quality, which suggests the business may want to look into renovation to address those pain points.

Modifiers for hotel keyword in negative reviews

The keyword analysis reveals customers' most common talking points when posting their reviews. As one would expect, the room features prominently in both negative and positive reviews. While it is mentioned regularly in negative reviews throughout the period we analyzed, in approximately the last six months there was a surge in room mentions in positive reviews, a potentially favorable trend the business should be aware of. In positive reviews, the most common comments refer to rooms as clean and spacious; there are also references to the rooms being comfortable and cheap overall.

The beds were also frequently mentioned, with some users considering them stiff and uncomfortable. The prevalence of this comment also suggests an immediate area for improvement. On that note, some customers also pointed out that they found the hotel noisy.

Top modifiers for the keyword "bed" in negative reviews.

In addition, another major issue reported by customers is the hotel's heating, ventilation, and air conditioning system: "hot" and "cold" were the main concerns from customers regarding their rooms. One particular pain point was the room window, mentioned so frequently that it was identified as one of our keywords, largely because some rooms' windows required staff assistance to open.

Word cloud with main concerns from customers.

The staff were also frequently brought up in positive and negative reviews, with some customers considering them rude. More often than not, however, they were considered friendly and helpful, although one particular point of interest is that many customers thought the hotel was understaffed. Finally, mentions of the staff in reviews remain relatively constant over time.

The hotel location was another prominent factor in positive reviews. It was predominantly perceived as a positive aspect, with many general compliments, being considered convenient and centrally located. However, one crucial trend the business should be aware of is that, over time, the location has been mentioned less frequently in positive reviews and increasingly in negative ones. While this may relate to the surrounding area, and therefore to factors outside the hotel's immediate control, it is a potential trend worth keeping an eye on.

Finally, it is worth mentioning that a significant number of negative reviews commented upon the hotel's Wi-Fi, mainly due to it being paid and not free.

Keyword mentions in reviews

Applications

Business intelligence and sentiment analysis projects such as this can bring value to many use cases.

E-commerce

Nowadays, a significant portion of shopping is done online. E-commerce represents a growing trend of nearly unlimited access to resources, markets, and products in real time from anywhere on the planet. Understanding the reach of a business's marketing in terms of customer segmentation is very important for adjusting efforts to reach the desired target audience.

Almost every e-commerce platform contains a reviews section where customers can comment on the products they bought. This comment section represents a valuable data source that can bring value to the business.

Through NLP techniques, it is possible to acquire insights into what customers like or dislike about the products. These insights can help identify flaws or further improvements to the product and/or the platform. We can also identify key aspects that bring insecurity or other emotions to the customer, so we can act on them.

It also becomes possible to see the evolution of the user sentiment on the product over time and measure how changes affected the customers' overall opinion.

Hospitality Industry

The hospitality industry is a very competitive sector where small details can provide an essential edge over competitors.

Establishments are often listed on platforms such as Booking, Trivago, and Google, and customers frequently use these platforms to leave reviews. By analyzing the review scores and comments, it is possible to gather insights into customers' opinions on key aspects of the business.

This data allows us to interpret which aspects of the business need changing or attention, what parts customers value, and possibly foresee some adjustments we should consider.

Food services industry

Restaurants, coffee shops, and bars increasingly rely on their online presence to attract customers. This involves being listed on several platforms like Yelp, Google, Zomato, and Tripadvisor, which allow users to leave ratings and written reviews. Often, clients choose which new places to try based solely on these reviews, making them a key to understanding how the business is performing.

It is in these establishments' best interest to use all this feedback to find ways to get an edge over their competitors. Analyzing possible customer pain points helps invest in worthwhile improvements, and tracking consumer sentiment over time ensures that the investments are paying off.

Any establishment that grows beyond a certain size must rely on data science techniques to analyze the many reviews it may receive across different platforms. The process can be automated, providing quick feedback and a broad view of what is attracting or disenchanting customers, helping managers take their food service to the next level.

Entertainment Industry

The entertainment industry is broad, including everything from movies, TV shows, and YouTube channels to amusement parks and circus acts. Common to all of these businesses, especially in the digital age, is that they are subject to reviews and comments from both critics and spectators.

As the business grows, the number of reviews might become unmanageable, making it difficult to understand the overall sentiment of the population. This is where NLP techniques should come into play, allowing many comments to be parsed and analyzed to extract valuable and actionable insights.

In summary, we analyzed customer feedback about their stay in a hotel using Natural Language Processing techniques and uncovered actionable insights that can directly impact business decision-making. This analysis and the underlying processes can be used for many other applications, bringing value to businesses across many sectors.

This project was completed in 3 days with a team of 2 Imaginary Cloud Data Scientists. Imaginary Cloud provides Data Science and AI development services, focusing on bringing the highest value to its clients through tailored solutions and an agile process.

Contact us if you need a custom Data Science or AI solution:


At Imaginary Cloud, we simplify complex systems, delivering interfaces that users love. If you've enjoyed this article, you will certainly enjoy our newsletter, which you can subscribe to below. Take this chance to also check our latest work and, if there is any project you think we can help with, feel free to reach out. We look forward to hearing from you!


NLP Case Studies: Proven Customer Success

  • Relation Extraction from Pathology, Radiology, and Genomic Sequencing Reports
  • Lessons Learned De-Identifying 700 Million Patient Notes with Spark NLP
  • Using Healthcare-Specific LLMs for Data Discovery from Patient Notes & Stories
  • Transforming Care in Psychiatry: Leveraging NLP to Optimize Inpatient Violence and Delirium Screening in Acute Care Settings
  • NLP for Finance – Automated Invoice Classification for Submission Compliance
  • Identifying Mental Health Concerns, Subtypes, Temporal Patterns, and Differential Risks Among Children with Cerebral Palsy Using NLP on EHR Data
  • Empowering Healthcare through NLP: Harnessing Clinical Document Insights at Intermountain Health
  • Extracting What, When, Why, and How from Radiology Reports in Real-World Data Acquisition Projects
  • Leveraging Healthcare NLP Models in Regulatory-Grade Oncology Data Curation
  • Large Language Models to Facilitate Building of Cancer Data Registries
  • How Care-Connect and Spryfox Use NLP in Making Patient-Level Decisions
  • RAG on FHIR: Using FHIR with Generative AI to Make Healthcare Less Opaque
  • Automated Extraction of Medical Risk Factors for Life Insurance Underwriting
  • Automated Classification and Entity Extraction from Essential Documents Pertaining to Clinical Trials
  • Therapy-Specific Outcomes – Rheumatology Insights Using NLP
  • Building an Integrated Data Approach to Pharma Medical Affairs
  • Artificial Intelligence for Pharmacovigilance Processing
  • John Snow Labs' Spark NLP for Healthcare Library Speeds Up Automated Language Processing with Intel® AI Technologies
  • Understanding the Patient Experience Journey to Improve the Pharma Value Chain
  • Building Reproducible Evaluation Processes for Spark NLP Models
  • Building Patient Cohorts with NLP and Knowledge Graphs
  • Vakilsearch Understands Scanned Legal & Tax Forms Using John Snow Labs
  • Detecting Undiagnosed Conditions and Automating Medicare Risk Adjustment
  • A Unified CV, OCR, and NLP Approach for Scalable Document Understanding at DocuSign
  • Adverse Drug Event Detection Using Spark NLP
  • Harnessing Causality, Encoded Clinical Knowledge, and Transparency: How Ronin Enables Personalized Decisions for Cancer Patients
  • Databricks Introduces Partner Connect with Seamless John Snow Labs Integration
  • Automated Patient Risk Adjustment and Medicare HCC Coding from Clinical Notes
  • Using Spark NLP in R: A Drug Standardization Case Study
  • SelectData Interprets Millions of Patient Stories with Deep-Learned OCR and NLP
  • Spark NLP in Action: Intelligent, High-Accuracy Fact Extraction from Long Financial Documents
  • Using Spark NLP to De-Identify Doctor Notes in the German Language
  • Identifying Housing Insecurity and Other Social Determinants of Health from Free-Text Notes
  • Accelerating Clinical Risk Adjustment through Natural Language Processing
  • Deep6 Accelerates Clinical Trial Recruitment with Spark NLP
  • Text Classification into a Hierarchical Market Taxonomy Using Spark NLP at Bitvore
  • ESG Document Classification
  • Applying Advanced Analytics to Help Improve Mental Health Among HIV-Positive Adolescents in South Africa
  • SelectData Uses AI to Better Understand Home Health Patients
  • Burden Reduction: Using Spark NLP to Provide Assistive Intelligence in Documenting Evidence-Based Guidelines
  • Best Practices in Improving NLP Accuracy for Clinical Entity Recognition & Resolution
  • Using Spark NLP to Enable Real-World Evidence (RWE) and Clinical Decision Support in Oncology
  • Using Spark NLP to Build a Drug Discovery Knowledge Graph for COVID-19
  • Automating a Streaming Pipeline with OCR on Databricks Lakehouse
  • Beyond Context: Answering Deeper Questions by Combining Spark NLP and Graph Database Analytics
  • Abstracting Real-World Data from Oncology Notes
  • Identifying Opioid-Related Adverse Events from Unstructured Text in Electronic Health Records
  • Benefits and Challenges in De-Identifying and Linking Unstructured Medical Records
  • Predictive Maintenance by Intelligent Mining of Manufacturing Incidents Using Spark NLP
  • Accelerating Biomedical Innovation by Combining NLP and Knowledge Graphs
  • Sentiment Analysis of Insurance Claims Using Spark NLP
  • Automated Question Answering About Clinical Guidelines
  • Applying State-of-the-Art Natural Language Processing for Personalized Healthcare
  • Spark NLP in Action: Improving Patient Flow Forecasting at Kaiser Permanente
  • Text-Prompted Cohort Retrieval: Leveraging Generative Healthcare Models for Precision Population Health Management
  • Using NLP to Improve Radiologist Productivity & Reduce Burnout
  • Building a Better Patient Chart: Combining Structured, Unstructured, and Missing Data
  • Automated Detection of Environmental, Social, and Governance Issues in Financial Documents
  • Automated and Explainable Deep Learning for Clinical Language Understanding at Roche
  • Automating PHI Removal from Healthcare Data with Natural Language Processing
  • Improving Drug Safety with Adverse Event Detection Using NLP
  • Building, Analyzing, and Querying Biomedical Knowledge Graphs Using graphster
  • Unexpected Issues in Pharmacoepidemiology Studies Applying Natural Language Processing to Clinical Notes
  • Roche Automates Knowledge Extraction from Pathology Reports
  • Building a Smart Safety Data Sheet Parser Using NLP Lab
  • Identifying Patterns of Racial Discrimination through Natural Language Processing
  • Developing Guidelines for Responsible Generative AI in Healthcare
  • Adapting LLM & NLP Models to Domain-Specific Data 10x Faster with Better Data
  • CONAN: A Semantic Search System for Contract Analytics
  • Using NLP to Identify the Emotional State of Patients at Different Stages of a Disease Journey
  • Using Real-World Data to Better Understand Inflammatory Bowel Disease (IBD)

I am very appreciative of all the hard work put into this project. The staff went above and beyond my expectations and were accommodating. The staff also was very attentive and thought “outside the box”. Overall, I am very satisfied and will gladly work with the staff in the near future again.

Kaiser Permanente uses Spark NLP to integrate domain-specific NLP as part of a scalable, performant, measurable, and reproducible ML pipeline and improve the accuracy of forecasting the demand for hospital beds.

John Snow Labs delivered a whole new revenue stream for Usermind within three months.
