  • Cochrane Database Syst Rev

Speech and language therapy interventions for children with primary speech and/or language disorders

Newcastle University, School of Education, Communication and Language Sciences, Queen Victoria Road, Newcastle upon Tyne, UK, NE1 7RU

Jane A Dennis

University of Bristol, Musculoskeletal Research Unit, School of Clinical Sciences, Learning and Research Building [Level 1], Southmead Hospital, Bristol, UK, BS10 5NB

Jenna JV Charlton

This is a protocol for a Cochrane Review (Intervention). The objectives are as follows:

To determine the effectiveness of speech and language therapy interventions for children with a primary diagnosis of speech and/or language disorders. The review will focus on comparisons between active interventions and controls.

Description of the condition

Speech and/or language disorders are amongst the most common developmental difficulties in childhood. Such difficulties are termed 'primary' if they have no known aetiology, and 'secondary' if they are caused by another condition such as autism, hearing impairment, general developmental difficulties, behavioural or emotional difficulties or neurological impairment ( Stark 1981 ; Plante 1998 ). Although some children have a primary speech disorder but not a language disorder, or vice versa, these disorders commonly overlap. In addition, interventions in both cases share commonalities; for example, focusing on various elements of the language system and common underlying processes such as attention and listening. Therefore, in both research and intervention, it is difficult to tease speech and language disorders apart.

It is thought that approximately 5% to 8% of children may have difficulties with speech and/or language ( Boyle 1996 ; Tomblin 1997 ), of which a significant proportion will have 'primary' speech and/or language disorders. The presentation of primary speech and/or language disorders can vary considerably between individuals in terms of severity, pattern of impairment and degree of comorbidity ( Bishop 1997 ). Questions have been raised in recent years as to how 'specific' to speech and language these problems are, but this distinction between primary and secondary difficulties remains clinically useful and is one commonly reported in the literature ( Bishop 1997 ; Leonard 2014 ; Reilly 2014 and associated papers).

Given the heterogeneity of presentation, there are inconsistencies in terminology for speech and/or language disorders with no agreed diagnostic label. The term 'language disorder', as used in the latest edition of the Diagnostic and Statistical Manual of Mental Disorders ( DSM‐5 2013 ), has been found to be problematic, as it identifies too broad a range of conditions ( Bishop 2014 ). The term 'specific language impairment' is the most commonly‐used diagnostic label, 'specific' referring to the idiopathic nature of the condition. However, this term is problematic in that it suggests difficulties are specific to language only. Disagreements about terminology impede research and clinical processes as well as access to services ( Reilly 2014 ), and differences in diagnostic categories/labels have implications for the current review, meaning that a wide range of different terms are expected across the literature. For the purpose of the current review, however, impairments in speech and language will be referred to as 'speech and/or language disorders', reflecting the possibility that children may have impairment in both or either of these areas.

Primary speech and/or language disorders can affect one or several of the following areas: phonology (the pattern of sounds used by the child), vocabulary (the words that a child can say and understand), grammar (the way that language is constructed), morphology (meaningful changes to words to signal tense, number, etc.), narrative skills (the ability to relate a sequence of ideas), and pragmatic language (the ability to understand the intended meaning of others and to communicate effectively in conversation ( Adams 2012 )). As regards the current review, the majority of these affected areas may be categorised as a 'language' outcome, with 'phonology' categorised as a separate outcome. It is unclear whether primary speech and/or language disorders represent varying levels of a single condition, or a number of different conditions with diverse aetiologies but similar presenting patterns ( Law 1998 ; Tomblin 2004 ).

There is little consensus on the aetiology of primary speech and/or language disorders but there is evidence of a number of associated risk factors, including medical difficulties (for example, being born small for gestational age), and motor skill deficits ( Hill 2001 ). There is increasing evidence of genetic underpinnings of speech and/or language disorders ( SLI Consortium 2004 ; Bishop 2006 ); the links appear to be stronger for expressive language difficulties than receptive language difficulties ( Kovas 2005 ). There remain questions as to the nature of the role of environmental factors, whether distal (for example, socioeconomic status and maternal education) or proximal (for example, parent‐child and peer‐peer interaction and relationships) as causes of primary disorder, or whether these are factors affecting outcomes (mediators). Twin studies have so far suggested that heredity plays an increasingly strong role, especially as the child moves through primary school and especially for less socially‐disadvantaged children, but that environmental factors can have a relatively important role to play in the early years, and that marked language difficulties between higher and lower social groups are identifiable from very early on in children's development and tend to persist ( Bradbury 2015 ). It is likely that these risk factors act in a cumulative fashion to increase the severity of the presenting disorder ( Aram 1980 ) and are relevant when it comes to affecting access to educational and therapeutic resources.

Primary speech and/or language disorders can have far‐reaching implications for the child and his/her parent or carer in both the short and the longer term. Studies indicate that they may have adverse effects upon school achievement ( Aram 1984 ; Baker 1987 ; Bishop 1990 ; Catts 1993 ; Tallal 1997 ). It has recently been reported that "approximately two children in every class of 30 pupils will experience language disorder severe enough to hinder academic progress" ( Norbury 2016 ). They may also be associated with comorbid social, emotional and behavioural problems ( Huntley 1988 ; Rice 1991 ; Rutter 1992 ; Stothard 1998 ; Cohen 2000 ; Conti‐Ramsden 2004 ), and with peer interaction difficulties ( Murphy 2014 ). Children with primary speech and/or language disorders can also have long‐term difficulties that persist to adolescence and beyond ( Rescorla 1990 ; Haynes 1991 ; Johnson 1999 ), with some 30% to 60% experiencing continuing problems in reading and spelling, and with early difficulties predicting adult outcomes in literacy, mental health and employability ( Law 2009a ).

Description of the intervention

Interventions for children identified as having primary speech and/or language disorders include a variety of practices (methods, approaches, programmes) that are specifically designed to promote speech and/or language development or to remove barriers to participation in society that arise from a child’s difficulties, or both. Assessment of eligibility for intervention includes a combination of standardised assessment (where available), observations of linguistic and communicative performance, and professional judgement. Interventions are usually time limited and can be delivered by any professional group, but usually involve input from language specialists, most notably speech and language therapists/pathologists. The criteria for inclusion in such interventions commonly include some reference to the specific or the primary nature of the language difficulty experienced by the children concerned (that is, it is not associated with low non‐verbal performance), and this allows for a focus on speech and language characteristics rather than a broader range of skills.

Interventions for children with speech and/or language disorders may be carried out directly or indirectly, and in a range of settings, such as the home, healthcare service provision, early years setting (nursery/school), school or private practices, by the specialist professionals themselves or through proxies such as parents, teachers or teaching assistants. There are also examples where interventions are delivered through peers in school.

Direct interventions focus on the treatment of the child individually, or within a group, depending on the age and needs of the children requiring therapy and the facilities available. In group treatments, it is thought that children benefit from the opportunities to interact and learn from one another.

Indirect interventions are often perceived to be more naturalistic in approach, allowing adults that are already within the child's environment to facilitate communication. Traditionally, these approaches create an optimum communicative environment for the child by promoting positive parent‐child interaction. Indirect approaches are increasingly being employed within a range of settings where speech and language therapists train professionals and carers who work with the children, and provide programmes or advice on how to maximise the child's communicative environment and enhance communicative attempts.

Parents are often actively engaged in delivering interventions to younger children but tend to be less actively involved in the administration of the intervention as the child gets older. Many intervention models target behaviours using play to enhance generalisation. Interventions for children with primary speech and/or language disorders would, in many cases, meet the criteria for being a complex intervention ( Craig 2008 ), being made up of a number of elements that vary according to both the theoretical assumptions behind the intervention and the perceived needs of the child.

The majority of interventions involve the training of specific behaviours (speech sounds, vocabulary, sentence structures) accompanied by reinforcement. Most commonly this involves rewards of some form (stickers, tokens and, most often, praise). The assumption behind overt behavioural techniques is that language or speech can explicitly be taught and that gaps in the child's skills can be filled by instruction. In the past twenty years, most therapy has shifted from explicit training paradigms to those based on social learning theory, which assumes that children learn most effectively if they are trained within a social context ( Miller 2011 ).

As the child gets older the emphasis of interventions shifts towards a more functional approach, whereby children are taught skills that are most useful for them at that moment. This functional shift often involves a move from explicit instruction to a more 'meta‐cognitive' approach whereby the therapist will encourage the child to reflect on what they hear and then adopt it into their own repertoire. Often the therapist will present the child with alternatives and encourage them to make judgements based on their intrinsic grammatical or phonological knowledge. It is assumed that the process of making a judgement increases the child's chances of modifying their language and/or speech performance. 'Constructivist' or usage‐based explanations represent a new direction from a linguistic perspective ( Childers 2002 ; Riches 2013 ).

Speech and/or language therapy interventions vary in duration and intensity depending on the resources available, the perceived needs of the child, and policies of different speech and/or language therapy and educational services. The intensity and the duration of typical therapy interventions have yet to be evaluated systematically ( Warren 2007 ), although both of these issues have been raised as potentially important determinants of outcomes ( Law 2000 ; Hoffman 2009 ). In practice, some interventions are of short duration and relatively low intensity, for instance, six hours over a year. It is common for these short durations of intervention to be offered in 'blocks' of treatment, commonly once a week for a six‐week period. This may then be repeated depending on a child's progress — although there is no specific evidence underpinning this approach. In other instances, especially in schools, interventions may be delivered on a daily basis over a longer period. On balance, however, most speech and/or language interventions tend to be relatively short (less than 20 hours in total).

Treatment goals vary considerably depending on the perceived difficulty that the child is experiencing. While the focus is often on aspects of expressive language, many studies also focus on receptive language ability or verbal comprehension, and in the last decade there has been an increasing emphasis on pragmatic language difficulties (the way children use language with others). Treatment goals may focus on specific aspects of language or address a number of aspects of language in combination. For many speech and language therapists, the child's social skills and their ability to integrate with peers and negotiate the curriculum are key outcomes.

There have been a number of recent developments in intervention for children with primary speech and/or language disorders, listed as follows.

  • An increased use of computerised intervention packages, and most recently 'apps' (short for computerised 'application'), in education.
  • A move towards meta‐cognitive or meta‐linguistic interventions, especially for older children and often with a view to enhancing comprehension. These emphasise the child making judgements based on their underlying linguistic knowledge, and often use other, readily recognisable supports (that is, colour and shape).
  • Increased emphasis on universal or public health interventions whereby speech, and especially language, interventions are provided for whole populations using key messaging to parents and training public health professionals (for example, Health Visitors in the UK) ( Law 2013 ).
  • Increased focus on comorbidity, for example, the relationship between language skills and socio‐emotional skills, and whether interventions addressing the former may have outcomes relevant to the latter ( Law 2009b ).

How the intervention might work

There are some explicit elements in the mechanism of change that can be identified and that are likely to help identify the 'active ingredients' of any intervention both in terms of immediate and longer‐term benefits.

The delivery agent

Interventions, especially those for younger children, often involve the child's parents or caregivers. This creates an optimum communicative environment for the child by promoting positive parent‐child interaction. It can increase parental knowledge about speech and language development, including how they might target their child's language development at home. It also helps them provide 'carry over' or generalisation at home and then 'maintenance' over time. Similarly, training teachers and teaching assistants to carry out the intervention tasks has the potential to widen the child's opportunities to practise new skills. Targeted interventions are likely to be delivered by specialist practitioners such as a speech and language therapist/pathologist. Evidence suggests that the category of person delivering the intervention may matter less than the commitment of parents and the experience and training of the practitioner. This may be especially true for aspects of grammar and phonological development, where the specialist skills of the speech and language therapist/pathologist are likely to be of paramount importance.

The context of delivery

Intervention for children with speech and/or language disorder is carried out in a number of different contexts: the home, the clinic, the nursery/early years setting/kindergarten, the school, etc. Many of the interventions reported in earlier studies were 'clinical' in focus, in the sense that they were carried out in a clinic separate from school, perhaps with the parents in attendance or actively engaged. In practice, while this may still be true for many children when they first encounter specialist services, this type of 'pull out' model is much less common, and children are seen within settings where they spend most of their time. The rationale is that the context in which children learn language is critical for their outcomes and that maximising the most appropriate sort of intervention in the right environment is more likely to be effective in the long run than very specific intervention led solely by an adult 'expert'. That said, there may well be a case for this more specific, one‐to‐one intervention, especially with children who have more pronounced problems.

In recent years there has been an increased use of computer‐delivered intervention, effectively a mediated version of the adult 'expert' model. Computerised interventions work by providing very explicit links between the stimulus and the reward within the context of the game format in which they are presented. Due to their similarity to non‐educational computer games with which children are often familiar, these interventions are considered to have a positive effect on a child's motivation and engagement. Such approaches have been used widely where there has been limited access to specialist provision.

The intervention technique

Speech and language therapists commonly use a range of behavioural techniques, including imitation, modelling, repetition and extension. These draw the child's attention to the structure and the content of the speech or language input (or both), and the input is often presented at a developmental level a little ahead of that of the child. Stimuli are commonly repeated many times to draw the child's attention to the correct form. It is assumed that practice is one of the cornerstones of reinforcement and that repetition makes it easy for the child to learn what they have not otherwise acquired. Key to all intervention is building the child's motivation to speak.

Children with speech and/or language disorder are often described as having poor auditory skills. There has been an ongoing discussion as to whether the child's auditory skills are the key underlying problem or whether the breakdown is primarily linguistic in nature ( Bishop 2005 ), and there is individual variability in auditory processing skills, which must be recognised prior to intervention delivery in order to personalise intervention to individual strengths and weaknesses. Nevertheless, activities designed to heighten the child's awareness of their auditory environment are common components of most interventions and may be a key ingredient in effective interventions.

Children with speech and/or language disorders are often thought to have strengths in their visual, relative to their auditory, processing and for this reason their visual skills are used to compensate for their other difficulties. Within the child's most common contexts for learning, the classroom and the home environment, information is often presented visually ( NCLD 1999 ). In speech and language interventions, widespread use is made of pictorial support materials and visual timetables to help children make better use of auditory material. In some cases, interventions are supported by manual signing systems (for example, Makaton or Paget Gorman).

Frequency, intensity and duration of interventions vary considerably. It may be that the amount of intervention is key to an intervention's success; however, variability between interventions and outcomes means it is difficult to make recommendations about optimal dosage ( Zeng 2012 ). It may be that for some outcomes that are measured continuously, such as vocabulary, there may be a simple dosage or response effect — the more intervention received, the greater the vocabulary learned — but for others, such as specific grammatical structures where outcomes are more focussed, intensity may be more functionally important than duration. Care has to be taken in adopting specific programmes to retain the recommended dosage, and to not assume that reducing the amount of intervention for pragmatic, cost‐related reasons is likely to lead to the same effects.

The outcome

On the one hand, the intervention is most likely to 'work' if the outcome directly reflects the intervention that the child receives. On the other, it is often considered more desirable and indeed more robust if effects can be demonstrated on standardised omnibus language tests. Consequently, an intervention may be said to work more effectively on very specific outcomes and may work less effectively on population, standard, norm‐referenced measures, which have commonly not been designed to capture change.

Adverse effects

There are no known adverse effects of the interventions concerned. It is important to acknowledge that there are potential implications in terms of raised anxiety in parents who are made aware that there is concern about their child's speech and/or language development. There could also be risks associated with children being taken out of their routine schooling (with resultant reduction in exposure to the curriculum) to attend specialist sessions if the sessions are found to be of uncertain benefit.

Why it is important to do this review

This protocol updates a previously published systematic review ( Law 2003a ), but is substantively different in that it excludes studies comparing interventions with alternative interventions (so‐called 'head‐to‐head' studies), so that this review will only report on treatments compared to no treatment or to a placebo. This has been done to aid interpretation of the results. An array of different alternative interventions, where there is rarely more than one version of any given alternative, makes it difficult to report outcomes in a coherent fashion. Studies with alternative intervention comparison groups are often very different in terms of the treatment received. This increases heterogeneity and makes the combination of effect sizes problematic. Each alternative intervention comparison would need to be reported separately. It may be that in future iterations of this review, or in other reviews, specific head‐to‐head comparisons do become feasible.

There is a strong case for retaining the focus on interventions that include a broad range of language functions across childhood, to act as a benchmark in the field, although care needs to be taken to test for compatibility.

Previous reviews have largely been narrative in nature and thus prone to bias ( Goldstein 1991 ; Enderby 1996 ; Law 1997 ; McLean 1997 ; Gallagher 1998 ; Guralnick 1998 ; Olswang 1998 ; Yoder 2002 ; McCauley 2006 ; Leonard 2014 ). Two systematic reviews ( Nye 1987 ; Law 1998 ) were published prior to the publication of the first Cochrane review in the field ( Law 2003a ). A number have followed it, covering specific subpopulations or practice contexts; for example, interventions for preschool children only ( Schooling 2010 ), educational contexts ( Cirrin 2008 ), receptive language impairments ( Boyle 2010 ), parent‐child interaction ( Roberts 2011 ), grammatical development ( Ebbels 2013 ), computerised interventions ( Strong 2011 ), late talkers ( Cable 2010 ), language or literacy ( Reese 2010 ), and vocabulary learning in typically developing children ( Marulis 2010 ).

The original Cochrane review triggered a number of discussions about whether the approach employed in the review was the most effective, given the constraints associated with the subject domain and effectively captured in the Medical Research Council (MRC) guidelines ( Craig 2008 ) (see Pring 2004 ; Johnston 2005 ; Law 2005a ; Garrett 2006 ; Marshall 2011 ). While clinical guidelines to direct practice in speech and language therapy do exist ( RCSLT 2005 ; Johnson 2006 ), there remains little in the way of specific guidance on what type of intervention to offer children with primary speech and language impairment. This review has the potential to help inform such guidance where evidence is both sufficiently robust and sufficiently strong to warrant such recommendations.

Criteria for considering studies for this review

Types of studies

We will include randomised controlled trials (RCTs).

Types of participants

Children and adolescents up to the age of 18 years who have been given a diagnosis of primary speech and/or language disorder by a speech and language therapist/pathologist, child development team or equivalent.

Exclusion criteria

We will exclude studies if there is clear evidence that children have learning disabilities, hearing loss, neuromuscular impairment or other primary conditions of which speech and/or language disorders are commonly a part. Children whose difficulties arise from stuttering or whose difficulties are described as learned misarticulations (for example, lateral /s/ (lisp) or labialised /r/ (rhotic r)) will also be excluded from this review. In addition, we will exclude studies that focus on bilingual or multilingual children as a feature of the study, and studies in which training of literacy skills is the primary focus of the study. We will also exclude from the review studies that include infants or babies.

Types of interventions

Any type of therapy intervention, of any duration and delivery method, compared with delayed ('wait‐list') or no‐treatment controls or general stimulation conditions. General stimulation conditions include, for example, studies where control children are assigned to a control condition designed to mimic the interaction found in therapy without providing the target linguistic input. These conditions may be cognitive therapy or general play sessions that do not focus on the area of interest in the study.

We will include therapy interventions designed to improve an area of speech and/or language functioning concerning expressive or receptive phonology (production and understanding of speech sounds, including recognising and discriminating between speech sounds and awareness of speech sounds, for example, rhyming and alliteration), expressive or receptive vocabulary (production or understanding of words), expressive or receptive syntax (production or understanding of sentences and grammar), or pragmatic language.
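The study, participant and comparator criteria above can be read as a single screening rule. The following is a minimal sketch of that rule, not part of the protocol: the record fields, the string labels for comparators and exclusion flags, and the reading of "up to the age of 18 years" as an upper bound of 18 are all illustrative assumptions, and the exclusion flags are not exhaustive (for example, "other primary conditions" is not enumerated).

```python
from dataclasses import dataclass

# Comparators the protocol accepts; the string labels are hypothetical.
ELIGIBLE_COMPARATORS = {"wait_list", "no_treatment", "general_stimulation"}

# Flags paraphrasing the exclusion criteria; label names are hypothetical
# and the set is illustrative rather than exhaustive.
EXCLUSION_FLAGS = {
    "learning_disability", "hearing_loss", "neuromuscular_impairment",
    "stuttering", "learned_misarticulation", "bilingual_focus",
    "literacy_training_focus", "infants",
}


@dataclass
class StudyRecord:
    design: str                 # e.g. "RCT"
    max_age_years: float        # oldest participant age reported
    primary_diagnosis: bool     # primary speech and/or language disorder
    comparator: str             # one of the labels above, or something else
    flags: frozenset = frozenset()


def is_eligible(study: StudyRecord) -> bool:
    """Return True if the record passes every criterion sketched above."""
    return (
        study.design == "RCT"
        and study.max_age_years <= 18      # assumed reading of "up to 18"
        and study.primary_diagnosis
        and study.comparator in ELIGIBLE_COMPARATORS
        and not (study.flags & EXCLUSION_FLAGS)
    )
```

In practice these judgements are made by review authors reading full reports; the sketch only makes explicit that a head‐to‐head comparator or any single exclusion flag is enough to rule a study out.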

Types of outcome measures

We will use formal standardised tests, criterion‐referenced tests, parent reports and language samples. Within each of these categories there are many different measures, and different measures assess different areas of speech and language. Some examples include the Clinical Evaluation of Language Fundamentals (CELF, Semel 1995 ), within which both language and phonology are measured, the New Reynell Developmental Language Scales (NRDLS, Edwards 2011 ) and the Children's Communication Checklist (CCC, Bishop 2003 ), which both measure language but not phonology, and the Diagnostic Evaluation of Articulation and Phonology (DEAP, Dodd 2006 ), which measures speech and phonology.

Intervention studies in this area commonly report more than one outcome (reflected in a range of different measures and measures that assess different areas of speech and language) and it may not always be explicit whether such outcomes are primary or secondary. In such cases we will make a judgement as to which of the outcomes are most closely linked to the goal of the intervention specified in the background to the study in question.

Outcomes used in the review must be matched to the participants' areas of difficulty (for example, we will not include receptive language outcomes in the review if one of the inclusion criteria for the study was that participants had to have receptive language within normal limits).

Primary outcomes

  • Adverse effects. We will monitor studies for adverse effects. These are likely to be in the form of increased response of control relative to treatment groups, raised parental anxiety, and high dropout rates reflecting poor acceptability or parental dissatisfaction.

Secondary outcomes

  • Composite language measures.
  • Expressive vocabulary.
  • Expressive syntax.
  • Receptive vocabulary.
  • Receptive syntax.
  • Expressive phonology.
  • Phonological awareness (including phonological recognition and discrimination).

We will use these primary and secondary outcomes to populate the 'Summary of findings' table.

Search methods for identification of studies

Electronic searches

We will search the sources listed below for all available years. We will not limit our search by language, date of publication or publication status, and will seek translations where necessary.

  • Cochrane Central Register of Controlled Trials (CENTRAL; current issue) in the Cochrane Library, which includes the Cochrane Developmental, Psychosocial and Learning Problems Specialised Register.
  • MEDLINE Ovid (1948 onwards).
  • MEDLINE E‐pub ahead of print Ovid (current issue).
  • MEDLINE In‐Process and Other Non‐Indexed Citations Ovid (current issue).
  • Embase Ovid (1980 onwards).
  • CINAHL EBSCOhost (Cumulative Index to Nursing and Allied Health Literature; 1937 onwards).
  • ERIC EBSCOhost (Education Resources Information Center; 1966 onwards).
  • PsycINFO Ovid (1872 onwards).
  • LILACS (Latin American and Caribbean Health Sciences Literature; lilacs.bvsalud.org/en).
  • SpeechBITE (speechbite.com).
  • ProQuest Dissertations & Theses UK & Ireland (1950 onwards).
  • Conference Proceedings Citation Index ‐ Science Web of Science (CPCI‐S; 1990 onwards).
  • Conference Proceedings Citation Index ‐ Social Science & Humanities Web of Science (CPCI‐SS&H; 1990 onwards).
  • Cochrane Database of Systematic Reviews (CDSR; current issue) in the Cochrane Library.
  • Epistemonikos (epistemonikos.org).
  • ClinicalTrials.gov (clinicaltrials.gov).
  • World Health Organization International Clinical Trials Registry Platform (WHO ICTRP; who.int/trialsearch).

The search strategy for MEDLINE is in Appendix 1. We will modify this search strategy, as appropriate, for all other databases and report these additional search strategies in an Appendix in the full review.

Searching other resources

We will check the reference lists of included studies and relevant reviews identified by the electronic searches for further studies. We will also contact key authors in the field for information about ongoing or unpublished studies that we may have missed. In addition, we will search The Communication Trust's What Works database of interventions (thecommunicationtrust.org.uk/whatworks).

Data collection and analysis

Selection of studies

Review authors, working in pairs (JL, JAD and JJVC), will independently select potentially‐relevant studies for inclusion from the titles and citations or abstracts list generated by the search. Review authors will not be blinded to the name(s) of the trial author(s), institution(s) or publication source at any level of review.

Full‐text copies of all reports will be obtained and, if necessary, translated in order to assess eligibility. Two review authors (JL, JAD and JJVC, working in pairs) will independently assess reports against the inclusion criteria established under Criteria for considering studies for this review. When information is missing, we will contact trial investigators, where possible. Studies that have been identified by mutual consent will be included in the review.

Studies for which multiple reports appear will be categorised as 'included' or 'excluded' only once, and associated publications listed as secondary references. We will document all work in accordance with PRISMA guidance ( Moher 2009 ), and produce a flowchart of the process.

Data extraction and management

Two review authors (JL, JAD and JJVC) will independently extract data from reports of all eligible studies using a piloted form covering the following.

  • Design and methods (including information necessary to complete 'Risk of bias' tables as per the Cochrane Handbook for Systematic Reviews of Interventions ( Higgins 2011a )).
  • Participants (including demographics/baseline characteristics such as age, gender, socioeconomic status and severity of speech and language difficulty).
  • Interventions (setting, focus, method of delivery and duration).
  • Outcome measures and associated outcome data, paying particular attention to modifications to scales, identity of assessor and timing of measurement.

We will resolve uncertainty and disagreement through discussion until consensus is reached. In addition, we may request further information from trial investigators, to ensure a given study meets inclusion criteria.

We will use endpoint scores (or 'post‐intervention', 'Time 2' or 'T2' scores) as our preferred treatment effect measure. When necessary, we will code multiple reports of a single study onto a single data extraction form. We will use a single Excel sheet to manage all numerical data from all forms.

Assessment of risk of bias in included studies

At least two review authors (JL, JAD and JJVC) will independently assess the risk of bias within each included study according to the Cochrane Handbook for Systematic Reviews of Interventions ( Higgins 2011a ). Review authors will independently assess the risk of bias within published reports of each included study across the seven domains described below and assign ratings of 'low', 'high' or 'unclear' risk of bias.

1. Sequence generation

We will determine whether studies used computer‐generated random numbers or a table of random numbers, drew lots or envelopes, or relied on coin tossing, shuffling cards, or throwing dice.

  • Low risk of bias: the study authors explicitly stated that they used one of the above methods.
  • High risk of bias: the authors did not use any of the above methods.
  • Unclear risk of bias: there is no information on the randomisation method or it is not clearly presented.

2. Allocation concealment

We will evaluate whether investigators and participants could foresee assignments before screening was complete and consent was given.

  • Low risk of bias: researchers and participants were unaware of future allocation to treatment conditions.
  • High risk of bias: allocation was either not used or was not concealed from researchers before eligibility was determined, or was not concealed from participants before consent was given.
  • Unclear risk of bias: information regarding allocation concealment is not known or not clearly presented.

3. Blinding of participants and personnel

Neither participants nor treatment providers (therapists) can be kept blind to the intervention condition in studies of this nature, and the resultant risk of bias will be recorded as ‘high: assessors were not blind to treatment condition’ for these component groups for this domain.

4. Blinding of outcome assessment

We will address the issue of whether or not outcomes were assessed by self‐report or whether objective assessors and coders of measures were employed and, if so, what steps were taken to blind them to treatment conditions.

  • Low risk of bias: assessors were blind to treatment condition.
  • High risk of bias: assessors were not blind to treatment condition.
  • Unclear risk of bias: information on the blinding of assessors is unclear or unavailable from study authors.

5. Incomplete outcome data

We will identify the presence of incomplete outcome data as follows.

  • Low risk of bias: there are no dropouts/exclusions; there are some missing data but the reasons for missing data are unlikely to be related to the true outcome; or missing data are balanced in proportion across intervention groups, with similar reasons for missing data across groups.
  • High risk of bias: there is differential attrition across groups, reasons for dropout are different across groups, or there was inappropriate application of simple imputation (for example, assuming certain outcomes, last observation carried forward (LOCF), etc.).
  • Unclear risk of bias: the attrition rate is unclear or authors state that intention‐to‐treat analysis was used but provide no details.

6. Selective outcome reporting

To assess reporting bias, we will attempt to collect all study reports and protocols and trial registration information, if possible, and will track the collection and reporting of outcome measures across all available reports for each included study.

  • Low risk of bias: all outcome measures and follow‐ups are reported.
  • High risk of bias: data from some outcome measures are not reported.
  • Unclear risk of bias: it is not clear whether all data collected by study authors were reported.

7. Other sources of bias

Performance bias.

We will assess whether there were treatment differences between groups other than the main intervention.

  • Low risk of bias: there were no treatment differences between groups other than the main intervention.
  • High risk of bias: there were treatment differences between groups other than the main intervention.
  • Unclear risk of bias: it is unclear whether there were differences between groups or this information was not available from study authors.

We will attempt to use the judgement of 'unclear risk of bias' as infrequently as possible.

Publication bias

We will make a concerted effort to identify unpublished RCTs in the field of interventions for speech and/or language disorders in order to establish whether there is publication bias.

Measures of treatment effect

We will use endpoint scores (or immediate 'post‐intervention', 'Time 2' or 'T2' scores) as our preferred treatment effect measure. These data may be binary or continuous.

Binary data

Although most of our prespecified outcomes are typically assessed with continuous measures, we anticipate some investigators may choose to dichotomise scale data into 'improved' or 'not improved'. In such cases, we plan to calculate odds ratios (ORs) with 95% confidence intervals (CIs).
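As an illustration only, the odds ratio calculation for a dichotomised outcome could be sketched as follows. The function name, the 0.5 continuity correction for empty cells and the example counts are assumptions made for this sketch; they are not taken from the protocol or from any included study, and the review itself will rely on RevMan for these calculations.

```python
import math

def odds_ratio_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Odds ratio with a Wald confidence interval computed on the log scale."""
    a, b = events_t, n_t - events_t   # intervention: improved / not improved
    c, d = events_c, n_c - events_c   # control: improved / not improved
    if 0 in (a, b, c, d):
        # Assumption of this sketch: add 0.5 to every cell when any cell is empty
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical trial: 12 of 20 children improved with therapy, 6 of 20 without
or_, lower, upper = odds_ratio_ci(12, 20, 6, 20)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

In this hypothetical example the interval spans 1, so the trial on its own would not show a clear difference between groups.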

Continuous data

When studies have used the same continuous outcome measure we will calculate mean differences (MDs) with 95% CIs. When studies have used different outcome measures to assess the same construct (for example, by using different scales to assess syntactic structure), we will calculate standardised mean differences (SMDs) and 95% CIs.
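The difference between the two measures can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and example values are hypothetical, the SMD is computed as Hedges' adjusted g, and its standard error uses a common large-sample approximation that may differ slightly from the formula implemented in RevMan.

```python
import math

def mean_difference(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Mean difference (same scale in both studies) with a 95% CI."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, md - z * se, md + z * se

def standardised_mean_difference(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Hedges' adjusted g with an approximate 95% CI (different scales, same construct)."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    g = d * (1 - 3 / (4 * df - 1))          # small-sample correction
    # Approximate standard error (an assumption of this sketch)
    se = math.sqrt((n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2)))
    return g, g - z * se, g + z * se

# Hypothetical endpoint scores on a language scale
print(mean_difference(10, 2, 50, 8, 2, 50))
print(standardised_mean_difference(10, 2, 50, 8, 2, 50))
```

The MD stays in the units of the original scale, whereas the SMD expresses the same difference in pooled standard deviation units, which is what allows different scales to be combined.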

We will analyse and present conceptually‐distinct outcomes separately and will describe the properties of all scales used in a table, so that decisions concerning appropriate categorisation will be transparent to readers.

In the event that change scores are reported and endpoint data are not available, we will pool the data in Review Manager 5 ( RevMan 2014 ), using the MD (provided all instruments used for that outcome are the same), as recommended in the Cochrane Handbook for Systematic Reviews of Interventions (section 9.4.5.2, Deeks 2011 ). If outcome scales differ, we will present change score data separately, as combining data using SMD is unfeasible.

Unit of analysis issues

Cluster‐randomised studies.

Although it is likely that most of the interventions delivered for children with speech and language impairments will have randomised children at the individual level, there is a possibility that children will be allocated at a service level (clinic/school/class); so‐called cluster‐randomised studies. Cluster randomisation reduces the risk of contamination across those delivering the intervention. If we identify cluster‐RCTs, we will adhere to the guidance on statistical methods for managing data from cluster‐RCTs provided in the Cochrane Handbook for Systematic Reviews of Interventions (section 16.3, Higgins 2011b ). We will check that adequate adjustments for clustering were made for estimates of treatment effects. If not, we will seek to extract or calculate effect estimates and their standard errors as for a parallel‐group trial, and adjust the standard errors to account for the clustering ( Donner 1980 ). This requires information on an appropriate intraclass correlation coefficient (ICC); an estimate of the relative variability in outcome within and between clusters ( Donner 1980 ). If this information is not available in the relevant report, we will request it from the study authors. If this is not available or we receive no response, we will use external estimates obtained from studies that provide the best match on outcome measures and types of clusters from existing databases of ICCs ( Ukoumunne 1999 ), or other studies within the review. If we are unable to identify an appropriate ICC, we will perform sensitivity analyses using a high ICC of 0.10, a moderate ICC of 0.01 and a small ICC of 0.00 (see Sensitivity analysis ). These values are rather arbitrary but, as it is unlikely that the ICC is actually 0, it is preferable to use them to adjust the effect estimates and their standard errors. We will combine the estimates and corrected standard errors from cluster‐RCTs with those from parallel designs using the inverse variance method in RevMan 2014 .
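The standard error adjustment described above can be illustrated with the usual design effect, 1 + (m - 1) × ICC, where m is the average cluster size. This is a sketch under stated assumptions rather than the review's own procedure: the function names, the unadjusted standard error and the cluster size are hypothetical, and only the three ICC values come from the text.

```python
import math

def design_effect(mean_cluster_size, icc):
    """Variance inflation factor for a cluster-randomised trial."""
    return 1 + (mean_cluster_size - 1) * icc

def adjust_se_for_clustering(se, mean_cluster_size, icc):
    """Inflate a standard error that was calculated as if children were randomised individually."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))

# Hypothetical trial: unadjusted SE of 0.20, about 21 children per class
for icc in (0.10, 0.01, 0.00):   # the high, moderate and small ICCs named in the text
    adjusted = adjust_se_for_clustering(0.20, 21, icc)
    print(f"ICC = {icc:.2f}: adjusted SE = {adjusted:.3f}")
```

With an ICC of 0.00 the standard error is unchanged, which is why that value serves as the lower bound in the sensitivity analysis.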

Multiple treatment arms

We are aware that investigators frequently attempt to test many interventions or variations on similar interventions within the context of a single trial, even with a small sample. In such circumstances, where this is deemed appropriate by the review team, we may combine multiple eligible interventions tested within the same trial. This will be carried out using a standard formula for this purpose, as indicated in section 7.7.3.8 in the Cochrane Handbook for Systematic Reviews of Interventions ( Higgins 2011c ). This formula for combining multiple arms is located in Table 7.7a within the Handbook, and can be used to combine numbers into a single sample size, mean and standard deviation for each intervention group. Where this has been carried out we will make it explicit in the review's narrative.
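For readers unfamiliar with the formula for combining two arms, a minimal sketch is given below. The function name and example values are hypothetical; the arithmetic follows the standard approach of pooling sample sizes, weighting the means by sample size and adding the between-arm spread to the within-arm variance.

```python
import math

def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Combine two arms into a single sample size, mean and standard deviation."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (m1 - m2) ** 2   # spread between the two arm means
    ) / (n - 1)
    return n, mean, math.sqrt(variance)

# Hypothetical arms: two variants of the same parent-delivered intervention
n, mean, sd = combine_groups(10, 5.0, 1.0, 10, 7.0, 1.0)
print(n, mean, round(sd, 3))
```

To combine more than two arms, the function can be applied repeatedly, merging the result with the next arm each time.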

Cross‐over trials

With any educational or behavioural intervention such as speech and language therapy, true cross‐over trials are extremely unlikely. Should they arise, we are likely to treat them as parallel‐group studies and extract data at the point of first cross‐over. What is more common in this field are pseudo cross‐over studies of multicomponent interventions, in which one part of an intervention is delivered before the other in one intervention arm, and the second part delivered first in a second treatment arm (this resembles a cross‐over trial but is, in effect, a study of 'order of treatment' effect). As the review excludes head‐to‐head trials, we will only include pseudo cross‐over studies with a third 'no treatment', 'waiting control' or 'treatment‐as‐usual' arm. Therefore, we will extract endpoint data for both groups (after all parts of the multicomponent treatment are delivered).

Dealing with missing data

We will make every effort to contact the original investigators of included studies to gather information missing in the written reports.

For studies in which dropout is high or differently distributed between groups within the study, or both, we plan to conduct a Sensitivity analysis in which we will exclude such studies. We will not conduct any imputation of our own.

Assessment of heterogeneity

We anticipate clinical and methodological heterogeneity in included studies for a number of reasons. Different criteria are applied to children entering studies, sometimes making comparability across studies difficult. Similarly, different measures of speech and language are used to identify children for inclusion in studies and to measure outcomes. Finally, as indicated above, it is not uncommon for children identified with speech and/or language disorders to experience other 'comorbid' conditions such as other developmental difficulties or socioemotional problems. In some cases these are recorded; in others, it is unclear whether children experience such difficulties or not. These differences can make it challenging to compare across studies. To account for these differences, we will record assessment thresholds and potential comorbidity in our data extraction form and carry out subgroup analyses comparing groups of studies using the same or different assessments, more or less inclusive criteria, and with and without comorbidities.

We will explore heterogeneity by conducting subgroup analyses in RevMan 2014 . Characteristics of heterogeneity to be explored include the presence of more than one type of language impairment based on included outcomes in the current review (for example, expressive language impairment and phonological impairment), and the presence of an additional behaviour impairment (for example, attention deficit hyperactivity disorder, or behavioural, emotional and social difficulties).

We will assess statistical heterogeneity using the Chi² test for heterogeneity and a P value of 0.10 to account for low power due to small sample size. In addition, we will assess heterogeneity through visual inspection of forest plots (considering the magnitude and direction of effect) and the I² statistic ( Higgins 2003 ). We will consider values between 50% and 90% to represent substantial heterogeneity. As we will be using the random‐effects model we will also report tau² as a measure of between‐study variance. We will assess clinical and methodological heterogeneity by meta‐regression, using subgroups to explore how categorical study characteristics are associated with the intervention effects in the meta‐analysis.
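The heterogeneity statistics named above can be sketched as follows. This is an illustration under stated assumptions: the function name and study data are hypothetical, tau² is estimated with the DerSimonian and Laird moment estimator, and the P value for the Chi² statistic is left to a statistics package (it is the upper tail of a Chi² distribution with k - 1 degrees of freedom).

```python
def heterogeneity(effects, ses):
    """Return Cochran's Q (the Chi² statistic), its degrees of freedom, I² (%) and tau²."""
    weights = [1 / se ** 2 for se in ses]                 # inverse-variance weights
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, df, i2, tau2

# Hypothetical SMDs and standard errors from three studies
q, df, i2, tau2 = heterogeneity([0.2, 0.5, 0.9], [0.15, 0.20, 0.25])
substantial = 50 <= i2 <= 90   # threshold range stated in the text
print(f"Q = {q:.2f} on {df} df, I² = {i2:.0f}%, tau² = {tau2:.3f}, substantial: {substantial}")
```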

See Subgroup analysis and investigation of heterogeneity .

Assessment of reporting biases

We plan to investigate the possibility of reporting biases, including publication bias, by assessing funnel plots for asymmetry where 10 or more studies report on the same outcome ( Egger 1997 ; Sterne 2001; Deeks 2005). Asymmetry could be due to publication bias or to a genuine relationship between trial size and effect size ( Sterne 2000 ). We will examine clinical variation of the studies to explore asymmetry.
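Funnel plot asymmetry is usually judged by eye, but the regression approach of Egger 1997 offers a numerical companion. The sketch below is illustrative only: the protocol does not state that this regression will be run, the function name and data are hypothetical, and a significance test on the intercept (which needs a t distribution) is omitted.

```python
def egger_intercept(effects, ses, min_studies=10):
    """Regress effect/SE on 1/SE; an intercept far from zero suggests asymmetry."""
    k = len(effects)
    if k < min_studies:
        raise ValueError("too few studies for a funnel plot asymmetry check")
    x = [1 / se for se in ses]                     # precision
    y = [e / se for e, se in zip(effects, ses)]    # standardised effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: ten studies with an identical effect and varying precision
ses = [0.05 * i for i in range(1, 11)]
effects = [0.4] * 10
intercept, slope = egger_intercept(effects, ses)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

When the effect does not depend on study size, as in this hypothetical data set, the intercept is zero and the slope recovers the common effect.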

We will diligently search for trial protocols for all included studies within the review; however, we are conscious that the trend to register protocols for trials has been less robust than in more traditionally 'medical' fields over time.

Data synthesis

We will only combine data where the intervention and the measurement are conceptually the same; primarily this will focus on the participant and intervention characteristics and study outcome. For example, all parent‐child interventions targeting and measuring expressive language may be combined. After this first pass, we will then make a judgement as to whether the interventions and measurements included in other studies are sufficiently similar to compare. We will base our decision to perform a quantitative synthesis of the data on whether the method of delivery (for example, parent, clinician) and outcome (for example, language, expressive vocabulary) of the intervention are the same constructs across studies. We will not combine data where interventions fall into different delivery or measurement categories.

Where appropriate, we will carry out data synthesis in RevMan 2014 , using inverse‐variance weighting. We will use a random‐effects model, treating differences in apparent intervention effects between studies as random, because the reasons for such differences are not well understood. If we are unable to conduct a meta‐analysis, we will carry out a narrative review of data.
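A minimal sketch of inverse-variance pooling under a random-effects model is shown below. It assumes the DerSimonian and Laird estimator for the between-study variance; the function name and the study data are hypothetical, and the review itself will use RevMan rather than custom code.

```python
import math

def random_effects_pool(effects, ses, z=1.96):
    """Pool study effects with inverse-variance weights that include tau²."""
    w = [1 / se ** 2 for se in ses]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in ses]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2

# Hypothetical SMDs from three parent-delivered expressive language trials
pooled, lower, upper, tau2 = random_effects_pool([0.2, 0.5, 0.9], [0.15, 0.20, 0.25])
print(f"pooled SMD = {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f}), tau² = {tau2:.3f}")
```

Adding tau² to each study's variance evens out the weights, so small studies count for relatively more than they would in a fixed-effect analysis.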

Subgroup analysis and investigation of heterogeneity

We plan to conduct subgroup analyses to explore the impact of the study characteristics listed below on the results.

  • intervention versus no intervention;
  • parent intervention versus no intervention;
  • computer intervention versus no intervention; and
  • peer intervention versus no intervention.
  • intervention versus general stimulation;
  • parent intervention versus general stimulation;
  • computer intervention versus general stimulation; and
  • peer intervention versus general stimulation.
  • preschool children (birth to 4 years of age);
  • primary school children (5 years to 11 years of age); and
  • older children (12 years of age and above).
  • comorbid disorders (e.g. behaviour disorders, autism spectrum disorders); and
  • level of assessment (impairment cut‐off points).
  • Variance in degree of heterogeneity. We plan to account for variance in degree of heterogeneity of language disorders by comparing studies in which more than one language impairment is present.

Sensitivity analysis

We plan to conduct sensitivity analyses to explore the effects on the results of including and excluding the types of studies below.

  • Studies that do and do not have an explicit process for their randomisation.
  • Studies where blinding of outcome assessors was inadequate or not attempted.
  • Studies in which dropout is high (30%), or differently distributed between groups within the study, or both.

In addition, in cluster‐randomised studies where we are unable to identify an appropriate ICC, we will perform sensitivity analyses using a high ICC of 0.10, a moderate ICC of 0.01 and a small ICC of 0.00.

'Summary of findings' table

We will present a 'Summary of findings' table(s) within the completed review. We will use GRADEpro 2014 to prepare the 'Summary of findings' table(s), as needed. We plan to assess the overall quality of the evidence for each outcome as 'high', 'moderate', 'low' or 'very low' according to the GRADE approach (Schünemann 2011). We will consider the criteria below.

  • Impact of risk of bias of individual trials.
  • Precision of pooled estimate.
  • Inconsistency or heterogeneity (clinical, methodological and statistical).
  • Indirectness of evidence.
  • Impact of selective reporting and publication bias on effect estimate.

We will use our primary and secondary outcomes ( Types of outcome measures ) to populate the 'Summary of findings' table(s).

Acknowledgements

The review authors wish to acknowledge Jo Abbott, Esther Coren, Julian Higgins, Stuart Logan, Georgia Salanti, and Geraldine Macdonald for support in previous versions of the review, and to thank investigators Maggie Vance, Ron Gillam, Laura Justice, Anne Hesketh, Yvonne Wren, Aoiffe Gallagher, Jim Boyle, Tim Pring, Susan Ebbels, Gwen Lancaster, Lucy Meyers, Gina Conti‐Ramsden, Rosana Clemente Estevan, Marc Fey, Sue Roulstone, Shari Robertson, Joe Reynolds, Jan Broomfield, Anne O'Hare, Charmian Evans, Marc Schmidt, Ralph Shelton, Louise Sutton, Janet Baxendale and Karla Washington for providing extra data and information. We are also grateful to Toby Lasserson of the Cochrane Editorial Unit (CEU) for his translation of articles in German.

Latterly, we wish to thank Joanne Wilson, Margaret Anderson and Gemma McLoughlin of Cochrane Developmental, Psychosocial and Learning Problems (CDPLP).

Appendix 1. MEDLINE search strategy

1 exp Communication Disorders/
2 (speech adj5 disorder$).tw,kf.
3 (speech adj5 delay$).tw,kf.
4 (speech adj5 impair$).tw,kf.
5 (language adj5 disorder$).tw,kf.
6 (language adj5 delay$).tw,kf.
7 (language adj5 impair$).tw,kf.
8 dysglossia.tw,kf.
9 anomia.tw,kf.
10 Aphasia.tw,kf.
11 articulation.tw,kf.
12 echolia.tw,kf.
13 rhinolalia.tw,kf.
14 (mute or mutism).tw,kf.
15 "central auditory processing disorder".tw,kf.
16 "semantic‐pragmatic disorder".tw,kf.
17 or/1‐16
18 speech therapy/
19 language therapy/
20 myofunctional therapy/
21 (speech adj5 (patholog$ or screen$ or therap$)).tw,kf.
22 speech train$.tw,kf.
23 (language adj5 (patholog$ or screen$ or therap$)).tw,kf.
24 language training.tw,kf.
25 ((grammar or grammatical) adj5 (facilitation or intervention$ or program$ or teach$ or therap$ or train$)).tw,kf.
26 ("Active Listening for Active Learning" or "Broad Target Recast" or "Core Vocabulary" or "Cycles Approach" or "Cycles for Phonology" or Earobics or "Electropalatography" or "Fast ForWord" or "Focussed Auditory Stimulation" or "Gillon Phonological Awareness Programme" or "Hanen" or "Let’s Learn Language" or "Lexicon Pirate" or "Lidcombe Programme" or "Linking Language" or "LINK‐S" or "Little Talkers" or "Makaton" or "Maximal Oppositions" or "Meaningful minimal contrast therapy" or "MMCT" or "Milieu Teaching" or "Milieu Therapy" or "Morpho‐syntactic" or "Multiple Opposition Therapy").tw,kf.
27 ("Naturalistic Speech Intelligibility Training" or "Non‐Linear Phonology Intervention" or "Non‐speech Oro‐motor Exercise" or "Nuffield Dyspraxia Programme" or "Nuffield Early Language Intervention" or "Oral Language Programme" or "Phoneme Factory" or "Phonology with Reading Programme" or "Picture Exchange System" or "Pre‐school Autism Communication Therapy" or PACT or "Psycholinguistic Framework" or "Rapid Syllable Transition Treatment" or "Shape Coding" or "Social Communication Intervention Programme" or "Social Stories" or "Strathclyde Language Intervention" or "Talk Boost" or "Talking Time" or "Thinking Together" or "Visualising and Verbalising").tw,kf.
28 or/18‐27
29 17 and 28
30 exp Speech Disorders/th,rh
31 exp Language Disorders/th,rh
32 Speech Therapy/mt
33 Language Therapy/mt
34 or/30‐33
35 29 or 34
36 exp Child/
37 Infant/
38 adolescent/
39 (child$ or infant$ or baby or babies or toddler$ or boy$ or girl$ or pre‐school$ or preschool$ or kindergarten$ or kinder‐garten or teen$ or adolescen$ or schoolchild$ or schoolboy$ or schoolgirl or young people or youth$).tw.
40 or/36‐39
41 35 and 40
42 randomised controlled trial.pt.
43 controlled clinical trial.pt.
44 randomi#ed.ab.
45 placebo$.ab.
46 drug therapy.fs.
47 randomly.ab.
48 trial.ab.
49 groups.ab.
50 or/42‐49
51 exp animals/ not humans.sh.
52 50 not 51
53 41 and 52

Appendix 2. Data extraction form

Author and date of paper/publication/thesis:

Journal (or other source):

Which comparison?

  • Speech and language therapy (SLT) versus nothing or wait‐list control (WLC); or
  • SLT versus general stimulation.

Country (try to include state/province or city, or both, as well):

Setting (for example, school, clinic):

Number of participants at randomisation and at completion:

Unit of allocation:

Age at entry:

Study mix, for example, socioeconomic status (SES):

Gender mix:

Inclusion criteria (severity cutoff):

Intervention:

Target area of intervention:

Who delivers intervention?

How often? How long?

Comparator group (as above):

Length of follow‐up (note assessment points):

All outcomes measured (include scale information):

Outcomes used within this review / chosen for comparison:

  • At the level of overall development, for example, phonological maturity or expressive language?
  • At the level of disability, for example, improvement in intelligibility or consonant improvement in speech?

Results (use table below, state follow‐up point from which data are taken)

Expand / copy as necessary – do one per outcome, per time point

Name of outcome and measure:

Time point (for example, post‐treatment, six months, one year):


(* Be sure to think about intention‐to‐treat (ITT) when writing number (N) in: have trialists already adjusted results?) (** Check whether endpoint or change data have been used)

____________________________________________________

'Risk of bias' judgements: provide quotation and page number, then judgement

  • Adequate sequence generation? Yes / Unclear / No
  • Allocation concealment? Yes / Unclear / No
  • Blinding of outcome assessment? Yes / Unclear / No
  • Incomplete outcome data addressed? Yes / Unclear / No
  • Free of selective reporting? Yes / Unclear / No
  • Other sources of bias? Yes / Unclear / No

Power calculation?

Items to correspond with trial investigators about?

Date contacted investigators:

Protocol first published: Issue 1, 2017

7 November 2016 | Amended | Subgroup analysis extended in‐line with comments in text.
25 April 2016 | Amended | Background cut down, response to comments completed throughout, new section on how the intervention might work
2 July 2008 | Amended | Minor update: 31/07/07
2 July 2008 | Amended | Converted to new review format.
2 July 2008 | New search has been performed | This is an update of the 2003 review.
26 July 2007 | New citation required and conclusions have changed | Substantive amendment

Contributions of authors

Professor James Law has overall responsibility for this review. All authors have contributed to the writing of this protocol.

Sources of support

Internal sources.

Office base and support for the review to be carried out during office hours

External sources

  • No sources of support supplied

Declarations of interest

James Law (JL) is an author on one included study ( Law 1999 ) and one excluded study in the previous version of this review ( Kot 1995 ), and has published a non‐Cochrane review in this area ( Law 1997 ). For those studies in which JL is involved, the two other authors (JAD and JJVC) will assess the eligibility of studies for inclusion, complete 'Risk of bias' assessments and extract data. JL received £10,000 funding from the Nuffield Foundation for the previous version of this review ( Law 2003a ), the protocol of which was also published ( Law 2003b ). JL is an Editor for CDPLP.

Jane A Dennis (JAD) is the Feedback Editor for CDPLP.

Jenna JV Charlton (JJVC): none known.

This review is coregistered within the Campbell Collaboration ( Law 2005b ), as is the published protocol ( Law 2003c ).

This review supersedes the review by Law J, Garrett Z, Nye C. Speech and language therapy interventions for children with primary speech and language delay or disorder. Cochrane Database of Systematic Reviews 2003, Issue 3. Art. No.: CD004110. DOI: 10.1002/14651858.CD004110 ( Law 2003a ).

Additional references

  • Adams C, Lockton E, Freed J, Gaile J, Earl G, McBean K, et al. The Social Communication Intervention Project: a randomized controlled trial of the effectiveness of speech and language therapy for school‐age children who have pragmatic and social communication problems with or without autism spectrum disorder. International Journal of Language and Communication Disorders 2012;47(3):233‐44. [DOI: 10.1111/j.1460-6984.2011.00146.x; PUBMED: 22512510]
  • Aram DVM, Nation JE. Preschool language disorders and subsequent language and academic difficulties. Journal of Communication Disorders 1980;13(2):159‐70. [PUBMED: 7358877]
  • Aram D, Ekelman B, Nation J. Preschoolers with language disorders: 10 years later. Journal of Speech and Hearing Research 1984;27(2):232‐44. [DOI: 10.1044/jshr.2702.244]
  • Baker L, Cantwell DP. A prospective psychiatric follow‐up of children with speech/language disorders. Journal of the American Academy of Child and Adolescent Psychiatry 1987;26(4):546‐53. [DOI: 10.1097/00004583-198707000-00015; PUBMED: 3654509]
  • Berman FS. The Acquisition of Certain Prepositions in 3 to 5 Year Old Children [PhD dissertation]. Denver (CO): University of Denver, 1970.
  • Bishop D, Adams C. A prospective study of the relationship between specific language impairment, phonology and reading retardation. Journal of Child Psychology and Psychiatry 1990;31(7):1027‐50. [PUBMED: 2289942]
  • Bishop DVM. Uncommon Understanding: Development and Disorders of Language. Chichester (UK): Psychology Press, 1997.
  • Bishop DVM. The Children's Communication Checklist. 2nd Edition. London (UK): Pearson's Assessment, 2003.
  • Bishop DVM, McArthur GM. Individual differences in auditory processing in specific language impairment: a follow‐up study using event‐related potentials and behavioural thresholds. Cortex 2005;41(3):327‐41. [EMSID: UKMS5282; PMCID: PMC1266051]
  • Bishop DVM, Adams CV, Norbury CF. Distinct genetic influences on grammar and phonological short‐term memory deficits: evidence from 6‐year‐old twins. Genes, Brain and Behaviour 2006;5(2):158‐69. [DOI: 10.1111/j.1601-183X.2005.00148.x; PUBMED: 16507007]
  • Bishop DVM. Ten questions about terminology for children with unexplained language problems. International Journal of Language & Communication Disorders 2014;49(4):381‐415. [DOI: 10.1111/1460-6984.12101; PMCID: PMC4314704; PUBMED: 25142090]
  • Boyle J, Gillham B, Smith N. Screening for early language delay in the 18‐36 month age‐range: the predictive validity of tests of production and implications for practice. Child Language Teaching & Therapy 1996;12(2):113‐27. [DOI: 10.1177/026565909601200202]
  • Boyle J, McCartney E, O’Hare A, Law J. Intervention for mixed receptive‐expressive language impairment: a review. Developmental Medicine & Child Neurology 2010;52(11):994‐9. [DOI: 10.1111/j.1469-8749.2010.03750.x]
  • Bradbury B, Corak M, Waldfogel J, Washbrook E. Too Many Children Left Behind: The U.S. Achievement Gap in Comparative Perspective. New York (NY): Russell Sage Foundation, 2015.
  • Bzoch KR, League R. The Receptive‐Expressive Emergent Language Scale for the Measurement of Language Skills in Infancy. Gainesville (FL): Tree of Life Press, 1970.
  • Cable AL, Domsch C. Systematic review of the literature on the treatment of children with late language emergence. International Journal of Language & Communication Disorders 2011;46(2):138‐54. [DOI: 10.3109/13682822.2010.487883; PUBMED: 21401813]
  • Catts HW. The relationship between speech‐language impairments and reading disabilities. Journal of Speech and Hearing Research 1993;36(5):948‐58. [PUBMED: 8246483]
  • Childers JB, Tomasello M. Two year‐olds learn novel nouns, verbs and conventional actions from massed or distributed exposure. Developmental Psychology 2002;38(6):967‐78. [PUBMED: 12428708]
  • Cirrin FM, Gillam RB. Language intervention practices for school‐aged children with spoken language disorders: a systematic review. Language, Speech and Hearing Services in Schools 2008;39:S110‐37. [DOI: 10.1044/0161-1461(2008/012)]
  • Cohen NJ, Vallance DD, Barwick M, Im N, Menna R, Horodezky NB, et al. The interface between ADHD and language impairment: an examination of language, achievement and cognitive processing. Journal of Child Psychology and Psychiatry 2000;41(3):353‐62. [PUBMED: 10784082]
  • Conti‐Ramsden G, Botting N. Social difficulties and victimization in children with SLI at 11 years of age. Journal of Speech, Language and Hearing Research 2004;47:145‐61. [DOI: 10.1044/1092-4388(2004/013)]
  • Craig P, Dieppe P, Macintyre S, Michie S, Nazareth I, Petticrew M. Developing and evaluating complex interventions: the new Medical Research Council guidance. BMJ 2008;337:a1655. [DOI: 10.1136/bmj.a16]
  • Deeks JJ, Higgins JPT, Altman DG. Chapter 9: Analysing data and undertaking meta‐analyses. In: Higgins JP, Green S, editor(s). Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). The Cochrane Collaboration, 2011. Available from www.handbook.cochrane.org.
  • Dodd B, Hua Z, Crosbie S, Holm A, Ozanne A. Diagnostic Evaluation of Articulation and Phonology (DEAP). San Antonio (TX): Pearson Assessment, 2006.
  • Donner A, Koval JJ. The estimation of intraclass correlation in the analysis of family data. Biometrics 1980;36(1):19‐25. [DOI: 10.2307/2530491]
  • American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 5th Edition. Washington (DC): American Psychiatric Publishing, 2013.
  • Ebbels S. Effectiveness of intervention for grammar in school‐aged children with primary language impairments: a review of the evidence. Child Language Teaching & Therapy 2013;30(1):7‐40. [DOI: 10.1177/0265659013512321]
  • Edwards S, Letts C, Sinka I. The New Reynell Developmental Language Scales. Chiswick (UK): GL Assessment, 2011.
  • Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta‐analysis detected by a simple, graphical test. BMJ 1997;315(7109):629‐34. [DOI: 10.1136/bmj.315.7109.629]
  • Enderby P, Emerson J. Speech and language therapy: does it work? BMJ 1996;312(7047):1655‐8. [PMCID: PMC2351353]
  • Fey ME, Cleave PL, Ravida AI, Long SH, Dejmal AE, Easton DL. Effects of grammar facilitation on phonological performance of children with speech and language impairments. Journal of Speech, Language and Hearing Research 1994;37(3):594‐607. [DOI: 10.1044/jshr.3703.594]
  • Fey ME, Cleave PL, Long SH. Two models of grammar facilitation in children with language impairments: phase 2. Journal of Speech, Language and Hearing Research 1997;40(1):5‐19. [PUBMED: 9113855]
  • Gallagher TM. Treatment research in speech, language and swallowing: lessons from child language disorders. Folia Phoniatrica et Logopaedica 1998;50(3):165‐82. [PUBMED: 9691530]
  • Garrett Z, Thomas J. Systematic reviews and their application to research in speech and language therapy: a response to T. R. Pring’s ‘Ask a silly question: two decades of troublesome trials’ (2004) . International Journal of Language & Communication Disorders 2006; 41 ( 1 ):95‐105. [DOI: 10.1080/13682820500071542; PUBMED: 16272005] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Goldman R, Fristoe M. Goldman‐Fristoe Test of Articulation . Circle Pines (MN): American Guidance Service, 1969. [ Google Scholar ]
  • Goldstein H, Hockenburger EH. Significant progress in child language intervention: an 11‐year retrospective . Research in Developmental Disabilities 1991; 12 ( 4 ):401‐24. [PUBMED: 1792364] [ PubMed ] [ Google Scholar ]
  • Grigsby OJ. An experimental study of the development of concepts of relationship in preschool children as evidenced by their expressive ability . Journal of Experimental Education 1932; 1 ( 2 ):144‐62. [ Google Scholar ]
  • Guralnick MJ. Efficacy in early childhood intervention programs . In: Odom SJ, Karnes MB editor(s). Early Intervention for Infants and Children with Handicaps . Baltimore (MD): Paul H Brookes Publishing Company, 1988:63‐73. [ Google Scholar ]
  • Haynes C, Naidoo S. Children with Specific Speech and Language Impairment . Oxford (UK): Blackwell, 1991. [ Google Scholar ]
  • Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta‐analyses . BMJ 2003; 327 ( 7414 ):557‐60. [DOI: 10.1136/bmj.327.7414.557] [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Higgins JPT, Altman DG, Sterne JAC. Chapter 8: Assessing risk of bias in included studies. In: Higgins JP, Green S, editor(s). Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). The Cochrane Collaboration, 2011 . Available from www.handbook.cochrane.org.
  • Higgins JPT, Deeks JJ, Altman DG. Chapter 16: Special topics in statistics. In: Higgins JPT, Green S, editor(s). Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). The Cochrane Collaboration, 2011 . Available from www.handbook.cochrane.org.
  • Higgins JPT, Deeks JJ. Chapter 7: Selecting studies and collecting data. In: Higgins JP, Green S, editor(s). Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). The Cochrane Collaboration, 2011 . Available from www.handbook.cochrane.org.
  • Hill EL. Non‐specific nature of specific language impairment: a review of the literature with regard to concomitant motor impairments . International Journal of Language & Communication Disorders 2001; 36 ( 2 ):149‐71. [PUBMED: 11344592] [ PubMed ] [ Google Scholar ]
  • Hoffman LM. Narrative language intervention intensity and dosage: telling the whole story . Topics in Language Disorders 2009; 29 ( 4 ):329‐43. [DOI: 10.1097/TLD.0b013e3181c29d5f] [ CrossRef ] [ Google Scholar ]
  • Huntley RMC, Holt KS, Butterfill A, Latham C. A follow‐up study of a language intervention programme . International Journal of Language & Communication Disorders 1988; 23 ( 2 ):127‐40. [DOI: 10.3109/13682828809019882] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Johnson CL, Beitchman JH, Young A, Escobar M, Atkinson L, Wilson B, et al. Fourteen‐year follow‐up of children with and without speech/language impairments: speech/language stability and outcomes . Journal of Speech, Language and Hearing Research 1999; 42 ( 3 ):744‐61. [PUBMED: 10391637] [ PubMed ] [ Google Scholar ]
  • Johnson CJ. Getting started in evidence‐based practice for childhood speech‐language disorders . American Journal of Speech‐Language Pathology 2006; 15 ( 1 ):20‐35. [DOI: 10.1044/1058-0360(2006/004)] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Johnston J. Re: Law, Garrett, and Nye (2004a). "The Efficacy of Treatment for Children With Developmental Speech and Language Delay/Disorder: A Meta‐Analysis" . Journal of Speech, Language and Hearing Research 2005; 48 ( 5 ):1114–7. [DOI: 1092-4388/05/4805-1114] [ PubMed ] [ Google Scholar ]
  • Kot A, Law J. Intervention with preschool children with specific language impairments: a comparison of two different approaches to treatment . Child Language Teaching & Therapy 1995; 11 ( 2 ):144‐62. [DOI: 10.1177/026565909501100202] [ CrossRef ] [ Google Scholar ]
  • Kovas Y, Hayiou‐Thomas ME, Oliver B, Dale PS, Bishop DVM, Plomin R. Genetic influences in different aspects of language development: the etiology of language skills in 4.5‐year‐old twins . Child Development 2005; 76 ( 3 ):632‐51. [DOI: 10.1111/j.1467-8624.2005.00868.x; PUBMED: 15892783] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Law J, Reilly S, Snow PC. Child speech, language and communication need re‐examined in a public health context: a new direction for the speech and language therapy profession . International Journal of Language & Communication Disorders 2013; 48 ( 5 ):486‐96. [DOI: 10.1111/1460-6984.12027; PUBMED: 24033648] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Law J. Evaluating intervention for language impaired children: a review of the literature . European Journal of Disorders of Communication 1997; 32 ( 2 ):1‐14. [PUBMED: 9279424] [ PubMed ] [ Google Scholar ]
  • Law J, Boyle J, Harris F, Harkness A, Nye C. Screening for speech and language delay: a systematic review of the literature . Health Technology Assessment 1998; 2 ( 9 ):1‐184. [PUBMED: 9728296] [ PubMed ] [ Google Scholar ]
  • Law J, Kot A, Barnett G. A comparison of two methods for providing intervention to three year old children with expressive/receptive language impairment . London (UK): City University of London; 1999. Unpublished report to NHS. [DOI: http://eresearch.qmu.ac.uk/422/]
  • Law J, Conti‐Ramsden G. Treating children with speech and language impairments. Six hours of therapy is not enough . BMJ 2000; 321 ( 7266 ):908‐9. [DOI: 10.1136/bmj.321.7266.908] [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Law J, Garrett Z, Nye C. The specificity of a systematic review is the key to its value: a response to Johnston . Journal of Speech, Language and Hearing Research 2005; 48 :1118‐20. [DOI: 10.1044/1092-4388(2005/078)] [ CrossRef ] [ Google Scholar ]
  • Law J, Rush R, Schoon I, Parsons S. Modelling developmental language difficulties from school entry into adulthood: literacy, mental health and employment outcomes . Journal of Speech, Language and Hearing Research 2009; 52 ( 6 ):1401‐16. [DOI: 10.1044/1092-4388(2009/08-0142); PUBMED: 19951922] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Law J, Plunkett C. The interaction between behaviour and speech and language difficulties: does intervention for one affect outcomes in the other? Technical Report . London (UK): EPPI‐Centre; 2009 November. Report No.: 1705.
  • Leonard LB. Chapter 13. The nature and efficacy of treatment . In: Leonard LB editor(s). Children with Specific Language Impairment . 2nd Edition. Cambridge (MA): MIT Press, 2014:349‐72. [ Google Scholar ]
  • The Makaton Charity. Let's Talk Makaton . www.makaton.org (accessed 4 July 2016).
  • Manolson A. It Takes Two to Talk. A Parent's Guide to Helping Children Communicate . 3rd Edition. Toronto (ON): The Hanen Centre, 1992. [ Google Scholar ]
  • Marshall J, Goldbart J, Pickstone C, Roulstone S. Application of systematic reviews in speech‐and‐language therapy . International Journal of Language & Communication Disorders 2011; 46 ( 3 ):261‐72. [DOI: 10.3109/13682822.2010.497530; PUBMED: 21575068] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Marulis LM, Neuman B. The effects of vocabulary intervention on young children’s word learning: a meta‐analysis . Review of Educational Research 2010; 80 ( 3 ):300‐35. [DOI: 10.3102/0034654310377087] [ CrossRef ] [ Google Scholar ]
  • McCauley RJ, Fey ME. Treatment of Language Disorders in Children . Baltimore (MD): Paul H Brookes Publishing Company, 2006. [ Google Scholar ]
  • McLean LK, Woods Cripe JW. The effectiveness of early intervention for children with communication disorders . In: Guralnick MJ editor(s). The Effectiveness of Early Intervention . Baltimore (MD): Paul H Brookes Publishing Company, 1997:349‐428. [ Google Scholar ]
  • Miller PH. Theories of Developmental Psychology . 5th Edition. New York (NY): Worth Publishers, 2011. [ Google Scholar ]
  • Moher D, Liberati A, Tetzlaff J, Altman DG, for the PRISMA Group. Preferred reporting items for systematic reviews and meta‐analyses: the PRISMA statement . BMJ 2009; 339 :b2535. [DOI: 10.1136/bmj.b2535] [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Moher D, Hopewell S, Schultz KF, Montori V, Gøtzsche PC, Devereaux PJ, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trial . BMJ 2010; 340 :c869. [http://dx.doi.org/10.1136/bmj.c869 (Published 24 March 2010)] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Murphy SM, Faulkner DM, Farley LR. The behaviour of young children with social communication disorders during dyadic interaction with peers . Journal of Abnormal Child Psychology 2014; 42 ( 2 ):277‐89. [DOI: 10.1007/s10802-013-9772-6; PUBMED: 23794095] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • National Center for Learning Disabilities. Visual and auditory processing disorders . www.ldonline.org/article/6390?theme=print (accessed 4 July 2016).
  • Nelson HD, Nygren P, Walker M, Panoscha R. Evidence synthesis, number 41. Screening for speech and language delay in preschool children . www.ahrq.gov/downloads/pub/prevent/pdfser/speechsyn.pdf (accessed 20 October 2011).
  • Norbury CF, Gooch D, Wray C, Baird G, Charman T, Simonoff E, et al. The impact of nonverbal ability on prevalence and clinical presentation of language disorder: evidence from a population study . Journal of Child Psychology and Psychiatry 2016 May 16 [Epub ahead of print]. [DOI: 10.1111/jcpp.12573] [ PMC free article ] [ PubMed ] [ CrossRef ]
  • Nye C, Foster SH, Seaman D. Effectiveness of language intervention with language/learning disabled children . Journal of Speech and Hearing Disorders 1987; 52 ( 4 ):348‐57. [PUBMED: 3669632] [ PubMed ] [ Google Scholar ]
  • Olswang LB. Treatment efficacy research . In: Fratelli C editor(s). Measuring Outcomes in Speech and Language Pathology . New York (NY): Thieme, 1998. [ Google Scholar ]
  • Paget Gorman Society . www.pagetgorman.org (accessed 4 July 2016).
  • Plante E. Criteria for SLI: the Stark and Tallal legacy and beyond . Journal of Speech, Language and Hearing Research 1998; 41 ( 4 ):951‐7. [DOI: 10.1044/jslhr.4104.951] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Pring T. Ask a silly question: two decades of troublesome trials . International Journal of Language & Communication Disorders 2004; 39 ( 3 ):285‐302. [DOI: 10.1080/13682820410001681216; PUBMED: 15204442] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Royal College of Speech and Language Therapists. Clinical Guidelines . Oxford (UK): Speechmark Publishing Ltd, 2005. [ Google Scholar ]
  • Reese E, Sparks A, Leyva D. A review of parent interventions for preschool children’s language and emergent literacy . Journal of Early Childhood Literacy 2010; 10 ( 1 ):97‐117. [DOI: 10.1177/1468798409356987] [ CrossRef ] [ Google Scholar ]
  • Reilly S, Bishop DVM, Tomblin B. Terminological debate over language impairment in children: forward movement and sticking points . International Journal of Language & Communication Disorders 2014; 49 ( 4 ):452‐62. [DOI: 10.1111/1460-6984.12111; PMC4312775; PUBMED: 25142092] [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rescorla L, Schwartz E. Outcomes of toddlers with specific expressive language delay . Applied Psycholinguistics 1990; 11 ( 4 ):393‐407. [DOI: 10.1017/S0142716400009644] [ CrossRef ] [ Google Scholar ]
  • Nordic Cochrane Centre, The Cochrane Collaboration. Review Manager 5 (RevMan 5) . Version 5.3. Copenhagen: Nordic Cochrane Centre, The Cochrane Collaboration, 2014.
  • Rice ML, Sell MA, Hadley PA. Social interactions of speech‐ and language‐impaired children . Journal of Speech and Hearing Research 1991; 34 ( 6 ):1299‐307. [PUBMED: 1787712] [ PubMed ] [ Google Scholar ]
  • Riches NG. Training the passive in children with specific language impairment: a usage‐based approach . Child Language Teaching & Therapy 2013; 29 ( 2 ):155‐69. [DOI: 10.1177/0265659012466667] [ CrossRef ] [ Google Scholar ]
  • Roberts MY, Kaiser AP. The effectiveness of parent‐implemented language interventions: a meta‐analysis . American Journal of Speech‐Language Pathology 2011; 20 :180–99. [DOI: 10.1044/1058-0360(2011/10-0055)] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rutter M, Mahwood L, Howlin P. Language delay and social development . In: Fletcher P, Hall D editor(s). Specific Speech and Language Disorders in Children . London (UK): Whurr, 1992:63‐78. [ Google Scholar ]
  • Schooling T, Venediktov R, Leech H. Evidence‐based systematic review: effects of service delivery on the speech and language skills of children from birth to 5 years of age . www.asha.org/uploadedFiles/EBSR‐Service‐Delivery.pdf (accessed 8 December 2015).
  • Schulz KF, Altman DG, Moher D, for the CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials . Trials 2010; 11 :32. [DOI: 10.1186/1745-6215-11-32] [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Semel EM, Wiig EH, Secord W. Clinical Evaluation of Language Fundamentals . 3rd Edition. San Antonio (TX): The Psychological Corporation, 1995. [ Google Scholar ]
  • Shriberg LD, Kwiatkowski J. Phonological disorders III: a procedure for assessing severity of involvement . Journal of Speech and Hearing Disorders 1982; 47 :256‐70. [DOI: 10.1044/jshd.4703.256] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Specific Language Impairment Consortium (SLIC). Highly significant linkage to the SLI1 locus in an expanded sample of individuals affected by specific language impairment (SLI) . American Journal of Human Genetics 2004; 74 ( 6 ):1225‐38. [DOI: 10.1086/421529; PMCID: PMC1182086 ; PUBMED: 15133743] [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stark RE, Tallal RP. Selection of children with specific language deficits . Journal of Speech and Hearing Disorders 1981; 46 ( 2 ):114‐22. [PUBMED: 7253588] [ PubMed ] [ Google Scholar ]
  • Sterne JAC, Gavaghan D, Egger M. Publication and related bias in meta‐analysis: power of statistical tests and prevalence in the literature . Journal of Clinical Epidemiology 2000; 53 ( 11 ):1119‐29. [PUBMED: 11106885] [ PubMed ] [ Google Scholar ]
  • Stothard SE, Snowling MJ, Bishop DVM, Chipchase BB, Kaplan CA. Language‐impaired preschoolers: a follow‐up into adolescence . Journal of Speech, Language and Hearing Research 1998; 41 ( 2 ):407‐18. [PUBMED: 9570592] [ PubMed ] [ Google Scholar ]
  • Strong GK, Torgerson CJ, Torgerson D, Hulme C. A systematic meta‐analytic review of evidence for the effectiveness of the ‘Fast ForWord’ language intervention program . Journal of Child Psychology and Psychiatry 2011; 52 ( 3 ):224–35. [DOI: 10.1111/j.1469-7610.2010.02329.x; PMCID: PMC3061204; PUBMED: 20950285] [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tallal P, Allard L, Miller S, Curtiss S. Chapter 10. Academic outcomes of language impaired children . In: Hulme C, Snowling M editor(s). Dyslexia: Biology, Cognition and Intervention . London (UK): Whurr Publishers, 1997:167‐81. [ Google Scholar ]
  • Tomblin JB, Smith E, Zhang X. Epidemiology of specific language impairment: prenatal and perinatal risk factors . Journal of Communication Disorders 1997; 30 ( 4 ):325‐43. [PUBMED: 9208366] [ PubMed ] [ Google Scholar ]
  • Tomblin JB, Zhang X, Weiss A, Catts H, Weismer SE. Chapter 4. Dimensions of individual differences in communication skills among primary grade children . In: Rice ML, Warren SF editor(s). Developmental Language Disorders: From Phenotypes to Etiologies . Mahwah (NJ): Lawrence Erlbaum Associates, 2004:53‐76. [ Google Scholar ]
  • Ukoumunne OC, Gulliford MC, Chinn S, Sterne JA, Burney PG. Methods for evaluating area‐wide and organisation‐based interventions in health and health care: a systematic review . Health Technology Assessment 1999; 3 ( 5 ):3‐92. [PUBMED: 10982317] [ PubMed ] [ Google Scholar ]
  • Ward S, Birkett D. Ward Infant Language Screening Test Assessment. Acceleration Remediation — Manual and Assessment . Manchester (UK): Central Manchester Health Care Trust, 1994. [ Google Scholar ]
  • Warren SF, Fey ME, Yoder PJ. Differential treatment intensity research: a missing link to creating optimally effective communication interventions . Mental Retardation and Developmental Disabilities Research Reviews 2007; 13 ( 1 ):70‐7. [DOI: 10.1002/mrdd.20139; PUBMED: 17326112] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Yoder PJ, McDuffie A. Treatment of primary language disorders in early childhood: evidence of efficacy . In: Accardo P, Rogers B, Capute A editor(s). Disorders of Language Development . Baltimore (MD): York Press, 2002:151‐77. [ Google Scholar ]
  • Yoder PJ, Kaiser AP, Alpert CL. An exploratory study of the interaction between language teaching methods and child characteristics . Journal of Speech and Hearing Research 1991; 34 ( 1 ):155‐67. [PUBMED: 2008069] [ PubMed ] [ Google Scholar ]
  • Zeng B, Law J, Lindsay G. Characterizing optimal intervention intensity: the relationship between dosage and effect size in interventions for children with developmental speech and language difficulties . International Journal of Speech‐Language Pathology 2012; 14 ( 5 ):471‐7. [DOI: 10.3109/17549507.2012.720281; PUBMED: 22974106] [ PubMed ] [ CrossRef ] [ Google Scholar ]

References to other published versions of this review

  • Law J, Garrett Z, Nye C. Speech and language therapy interventions for children with primary speech and language delay or disorder . Cochrane Database of Systematic Reviews 2003, Issue 3 . [DOI: 10.1002/14651858.CD004110] [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Law J, Garrett Z, Nye C. Speech and language therapy interventions for children with primary speech and language delay or disorder . Cochrane Database of Systematic Reviews 2003, Issue 1 . [DOI: 10.1002/14651858.CD004110] [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Law J, Garrett Z, Nye C. Speech and language therapy interventions for children with primary speech and language delay or disorder [Protocol] . Campbell Systematic Reviews 2003, issue 1. [ PMC free article ] [ PubMed ]
  • Law J, Garrett Z, Nye C. Speech and language therapy interventions for children with primary speech and language delay or disorder . Campbell Systematic Reviews 2005; Vol. 1, issue 5. [DOI: 10.4073/csr.2005.5; www.campbellcollaboration.org/library/speech‐and‐language‐therapy‐interventions‐for‐children‐with‐primary‐speech‐and‐language‐delay‐or‐disorder‐a‐systematic‐review.html] [ PMC free article ] [ PubMed ] [ CrossRef ]


Communication 1200 Public Speaking: Researching Your Speech


Why should you research your speech?

Gathering evidence through research builds confidence that what you tell your audience is credible.

Research-based speeches compel the audience to believe what you are saying is true.

1. Have a plan!

Decide on a purpose for your speech

To inform your audience...

Example: To inform my audience about the importance of research and citation.

Craft a thesis

Create a complete sentence using the purpose of your speech.

Example: A properly researched and carefully cited speech will build confidence in the speaker and credibility for the audience.

2. Inventory your research needs

  • Think about the kind of resources that you might need to support your thesis.
  • The type of sources you choose should be determined by your purpose and thesis (books, journal articles, reference works, newspapers, magazines, interviews, etc.).
  • Discover@MU searches books, articles, magazines, newspapers, conference papers, DVDs, dissertations, theses, etc.
  • If your topic is subject or discipline related, choose an appropriate subject database to search.
  • Keep track of each source by creating a reference list, using APA citation style standards.

3. Decide where to begin your research

Should you use the Internet?

Positives: Academic peer-reviewed articles, e-books and electronic reference sources are available online, from the library website.

Negatives: Starting with a search engine makes it more difficult to filter quality from quantity and evaluate the credibility of the source. Use the CRAAP test or the 5Ws & 1H to evaluate the content.

4. Create a list of concept words to search

Create a set of concept words found in your thesis; add synonyms to the list.

Example: A properly researched and carefully cited speech will build confidence in the speaker and credibility for the audience.

Use quotations around phrases; truncate; use Boolean logic to broaden or narrow the search.

"public speaking"

credible {truncate to find variations (credib* = credibility, credible, etc.) (confid* = confident, confidence, etc.)}

Combine concept words into a search string using AND to narrow and OR to broaden.

(speech OR "public speaking") AND (confid* OR credib*) AND research

5. Start searching

Use the concept words to create a search string to find relevant articles and books.

Use the Discover@MU tool to find a variety of resources.

Use a subject database if your speech topic is subject or discipline-specific.

Example: Use Communication & Mass Media Complete to search for articles in the area of communication.

Subject databases have subject-specific thesauri to help you locate subject-specific terms to use in your search.

6. Keep track of your citations as you do your research

When you browse your search results and identify resources, you might want to:

Use the database feature of creating folders.

Add books or articles you wish to read to a folder.

Save the folder or send the contents to yourself in an email.

Choose APA style from the drop-down style menu before you send it to yourself.

Keep track of the books and articles you find in your research by creating a reference list, making sure that they match APA style standards.

For quick reference, refer to the Purdue OWL writing center's APA Style Guide.

The 7th edition of the Publication Manual of the American Psychological Association: the Official Guide to APA Style is available at the Journalism Library, Columbia Missourian Library and Ellis Library.
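
The search string from step 4 can also be assembled programmatically, which may help when you keep several synonym lists. The following is only a minimal sketch in Python: the function names (`quote_phrase`, `or_group`, `build_query`) are hypothetical and not part of any database's interface, and the sketch assumes the database accepts quoted phrases, `*` truncation, and the AND/OR operators exactly as in the example above.

```python
def quote_phrase(term):
    """Wrap multi-word phrases in quotation marks; leave single words as-is."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join synonyms with OR (broadens); parenthesize only when there are several."""
    quoted = [quote_phrase(t) for t in terms]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concept_groups):
    """Join the concept groups with AND (narrows the search)."""
    return " AND ".join(or_group(group) for group in concept_groups)


# Concept words from the example thesis; truncated stems are written by hand.
concepts = [
    ["speech", "public speaking"],
    ["confid*", "credib*"],
    ["research"],
]
query = build_query(concepts)
print(query)
```

Each inner list holds synonyms for one concept, so adding a synonym broadens that concept, while adding a new inner list narrows the overall search.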

  • Last Updated: Apr 22, 2024 10:03 AM
  • URL: https://libraryguides.missouri.edu/Comm1200


  • Open access
  • Published: 10 July 2023

Manifestation of depression in speech overlaps with characteristics used to represent and recognize speaker identity

  • Sri Harsha Dumpala 1 , 2 ,
  • Katerina Dikaios 3 , 4 ,
  • Sebastian Rodriguez 1 , 2 ,
  • Ross Langley 3 ,
  • Sheri Rempel 4 ,
  • Rudolf Uher 3 , 4 &
  • Sageev Oore 1 , 2  

Scientific Reports volume 13, Article number: 11155 (2023)


  • Computer science

The sound of a person’s voice is commonly used to identify the speaker. The sound of speech is also starting to be used to detect medical conditions, such as depression. It is not known whether the manifestations of depression in speech overlap with those used to identify the speaker. In this paper, we test the hypothesis that the representations of personal identity in speech, known as speaker embeddings, improve the detection of depression and estimation of depressive symptoms severity. We further examine whether changes in depression severity interfere with the recognition of speaker’s identity. We extract speaker embeddings from models pre-trained on a large sample of speakers from the general population without information on depression diagnosis. We test these speaker embeddings for severity estimation in independent datasets consisting of clinical interviews (DAIC-WOZ), spontaneous speech (VocalMind), and longitudinal data (VocalMind). We also use the severity estimates to predict presence of depression. Speaker embeddings, combined with established acoustic features (OpenSMILE), predicted severity with root mean square error (RMSE) values of 6.01 and 6.28 in DAIC-WOZ and VocalMind datasets, respectively, lower than acoustic features alone or speaker embeddings alone. When used to detect depression, speaker embeddings showed higher balanced accuracy (BAc) and surpassed previous state-of-the-art performance in depression detection from speech, with BAc values of 66% and 64% in DAIC-WOZ and VocalMind datasets, respectively. Results from a subset of participants with repeated speech samples show that the speaker identification is affected by changes in depression severity. These results suggest that depression overlaps with personal identity in the acoustic space. While speaker embeddings improve depression detection and severity estimation, deterioration or improvement in mood may interfere with speaker verification.
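
The two evaluation metrics named above can be illustrated with a short Python sketch. It is not the authors' code: the function names, the toy scores, and the cut-off of 10 used to turn a severity estimate into a depressed/not-depressed label are all assumptions made for illustration. RMSE is the square root of the mean squared difference between true and estimated severity, and balanced accuracy is the mean of sensitivity and specificity.

```python
import math


def rmse(true_scores, predicted_scores):
    """Root mean square error between true and estimated severity scores."""
    if not true_scores or len(true_scores) != len(predicted_scores):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_scores, predicted_scores)]
    return math.sqrt(sum(squared) / len(squared))


def detect(severity_scores, threshold):
    """Label a sample 1 (depressed) when its score reaches the threshold.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    return [1 if s >= threshold else 0 for s in severity_scores]


def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity; assumes both classes are present."""
    pairs = list(zip(labels, predictions))
    positives = sum(1 for y, _ in pairs if y == 1)
    negatives = len(pairs) - positives
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    true_neg = sum(1 for y, p in pairs if y == 0 and p == 0)
    return (true_pos / positives + true_neg / negatives) / 2


# Toy severity scores, invented for illustration only.
true_severity = [2, 14, 9, 20]
estimated_severity = [5, 11, 12, 17]

error = rmse(true_severity, estimated_severity)
bac = balanced_accuracy(detect(true_severity, 10), detect(estimated_severity, 10))
```

Balanced accuracy is less sensitive to class imbalance than plain accuracy, which matters when depressed participants are a minority of the sample.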


Introduction

Major depressive disorder, also known as depression, is a common mental disorder and a leading cause of disability worldwide 1. According to the World Health Organization 2, more than 300 million people (around 5% of the global population) are living with depression. Early and objective diagnosis of depressive symptoms is crucial in reducing the burden of depression, but inadequate access to clinical services and associated stigma limit detection. In addition to identifying depression, it is important to measure its severity, as repeated measurements are needed to guide effective treatment and improve outcomes 3. Measurement-based care is known to be effective, yet it is underused in practice because of the perceived burden of existing measurement tools 4. Automated assessment systems could facilitate the detection and treatment of depression if they could reliably detect and measure depression from easy-to-obtain material.

Audio recordings of speech are easy to obtain and may contain sufficient information for the detection and measurement of depression 5, 6, 7. The potential vocal biomarkers for depression explored in previous work include a range of acoustic features, such as prosodic characteristics (e.g., pitch and speech rate), spectral characteristics (e.g., Mel-frequency cepstral coefficients and formant frequencies), and glottal (vocal fold) excitation patterns 8, 9, 10, 11. However, the accuracy and generalizability of depression detection and severity estimation based on these features are limited by the size of samples with available diagnostic information. Obtaining large samples of speech with diagnostic information is expensive and raises ethical challenges, because such datasets combine identifiable (voice) and sensitive (diagnosis) information. One way of making better use of valuable datasets of limited size is to use models pre-trained on different but related tasks in much larger datasets.

Speech audio is routinely used for recognizing the identity of the speaker. Voice-based speaker identification is highly accurate thanks to models trained on large corpora; for instance, the VoxCeleb2 12 dataset includes 3000 hours of speech by 7160 speakers. The experience of depression is intimately connected with the core of a person’s identity 13 . Depression is associated with self-focused attention and altered perception of the self 14 . The change between depressed and well states is so striking that recovery is commonly described as being a ‘different person’. Based on the intimate link between depression and personal identity, we hypothesized that a model pre-trained for speaker identification will improve the detection of depression and estimation of depression severity from natural speech. In this work, we test this hypothesis by exploiting the representations of personal identity, known as speaker embeddings, in the detection and measurement of depression in speech.

To qualify the above hypothesis, we define speaker embeddings as text-independent speaker-specific information that includes acoustic characteristics that are independent of what the speaker is saying. Speaker embeddings represent not only identifiable information such as gender and age, but have also been shown to provide important cues about the traits of the speaker such as personality, physical state, likability, and pathology 15 . Speaker embeddings extracted from speech have previously been used for tasks such as automatic speaker verification 16 , improving speech recognition performance 17 , multi-speaker speech synthesis 18 , and emotion classification 19 . In this work, we apply speaker embeddings to the tasks of depression detection and severity estimation from speech. We empirically show that the speaker characteristics of an individual, as represented by speaker embeddings, are affected by changes in the depression severity of the individual. We consider three established variants of speaker embeddings: x-vectors, ECAPA-TDNN (Emphasized Channel Attention, Propagation, and Aggregation Time-delay neural network) x-vectors 20 , and d-vectors 21 . By using speaker embeddings, we demonstrate that large, public, unlabeled datasets, in conjunction with much smaller labeled datasets, can be leveraged to improve on the state-of-the-art (SOTA) performance in clinically meaningful tasks with implications for public health.

Figure 1. Schematic depiction of the outline of the paper. There are three different phases in this work: ( a ) Pre-training for speaker embeddings using a large non-medical speech dataset collected from N different speakers, ( b ) Depression analysis using speaker embeddings extracted from pre-trained models on longitudinal data, and ( c ) Depression detection and severity estimation using speaker embeddings extracted from pre-trained models.

Related work

The application of deep learning techniques significantly boosted the performance of depression detection using speech 22 , 23 , 24 , 25 , 26 , 27 . Initial work on speech-based depression detection used deep neural networks (DNNs) with fully-connected layers 22 . Then, convolutional neural networks (CNNs) and recurrent neural networks with long short-term memory (LSTM) units achieved better performance for depression detection and severity estimation 23 , 24 . Later, CNN-LSTM, dilated CNN and dilated CNN-LSTM models improved the SOTA performance in depression detection and severity estimation 25 , 26 , 27 , 28 . Further, sentiment and emotion embeddings were used for depression severity estimation 29 . To the best of our knowledge, none of the previous studies have explored the application of speaker embeddings for depression detection and severity estimation. i-vector-based models have been trained from scratch for detecting depression 30 , 31 , 32 , but these studies did not use i-vector models to extract speaker embeddings for depression detection. In this work, we use speaker embeddings to train multi-kernel CNN (MK-CNN) 33 and LSTM models for depression detection and severity estimation.

Our method consists of three phases: (1) Pre-training, (2) Depression analysis on longitudinal data, and (3) Depression detection and severity estimation. In the pre-training phase, given speech data collected from a large pool of speakers, we train speaker classification models to classify the speech samples based on the speaker labels. In the second phase, we use longitudinal data to analyze the effect of changes in depression severity on the speaker embeddings of an individual. In the third phase, we analyze the significance of speaker embeddings for the task of depression detection and severity estimation using speech. We use the speaker embeddings extracted using the pre-trained speaker classification models (trained in the first phase) in the second and third phases. Figure 1 shows an overview of our method.

In this work, we used two depression datasets for analysis: DAIC-WOZ 34 (Distress Analysis Interview Corpus - Wizard of Oz, a corpus of clinical interviews) and Vocal Mind (a spontaneous speech corpus obtained in a clinical setting). The DAIC-WOZ dataset contains a set of 219 clinical interviews collected from 219 participants (154 healthy and 65 depressed). Each audio sample was labeled with a PHQ-8 (Patient Health Questionnaire) score, in the range of 0–24, to denote the severity of depression. The Vocal Mind dataset contains speech samples collected from 514 participants (403 healthy and 111 depressed). Depression severity of each speech sample was scored on the Montgomery and Asberg Depression Rating Scale (MADRS), which is in the range of 0–60. Further, longitudinal speech data, also collected as a part of the Vocal Mind project, was used. Longitudinal speech data was collected from 65 individuals at different dates, where variations in their depression severity scores were observed during this period. Manual transcripts with timestamps of the DAIC-WOZ and Vocal Mind datasets were used to discard the interviewer speech segments and retain only the participant speech segments for analysis. The retained participant speech segments were combined and then divided into non-overlapping segments of 5–6 seconds in duration. This resulted in 15710 and 25144 segments for the DAIC-WOZ and Vocal Mind datasets, respectively. The depression label assigned to each segment is the same as the label of the entire speech sample. For the DAIC-WOZ dataset, speech samples with PHQ-8 scores greater than or equal to 10 (PHQ-8 \(\ge \) 10) were considered as depressed and those samples with PHQ-8 scores less than 10 (PHQ-8 < 10) were considered as healthy. This corresponds to the recommended threshold for depression identification 35 , 36 .
For the Vocal Mind dataset, speech samples with MADRS greater than or equal to 10 (MADRS \(\ge \) 10) were considered as depressed and those samples with MADRS less than 10 (MADRS < 10) were considered as healthy. This corresponds to the established threshold for remission on MADRS 37 . Table 1 provides various statistics of the DAIC-WOZ and the Vocal Mind datasets.
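
The labelling rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used in the study: the names `THRESHOLDS`, `SCALE_RANGE`, `binary_label` and `label_segments` are hypothetical, and only the cut-off of 10 and the score ranges come from the text.

```python
# Hypothetical helpers illustrating the cut-offs described in the text.
THRESHOLDS = {"PHQ-8": 10, "MADRS": 10}            # score >= threshold -> depressed
SCALE_RANGE = {"PHQ-8": (0, 24), "MADRS": (0, 60)}  # valid score ranges


def binary_label(score, scale):
    """Map a severity score to 'depressed' or 'healthy' for the given scale."""
    low, high = SCALE_RANGE[scale]
    if not low <= score <= high:
        raise ValueError(f"{scale} score {score} is outside {low}-{high}")
    return "depressed" if score >= THRESHOLDS[scale] else "healthy"


def label_segments(sample_score, scale, n_segments):
    """Every segment inherits the label of the whole speech sample."""
    return [binary_label(sample_score, scale)] * n_segments


if __name__ == "__main__":
    print(binary_label(10, "PHQ-8"))        # a score on the boundary
    print(label_segments(4, "MADRS", 3))    # three segments of one sample
```

Because segments inherit the sample-level label, a single interview contributes many identically labelled training examples.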

Pre-training

We use the pre-trained models available in SpeechBrain 38 for extracting the x-vectors and ECAPA-TDNN x-vectors from the speech samples. To extract d-vectors, we pre-trained the GE2E network on the task of speaker verification by consolidating two large, publicly available, non-clinical datasets (LibriSpeech 39 and VoxCeleb2 12 ). The LibriSpeech dataset consists of speech samples collected from 1166 speakers, and the VoxCeleb2 dataset consists of speech samples collected from 7160 speakers. We did not fine-tune the pre-trained speaker classification models on the depression datasets (i.e., the DAIC-WOZ and Vocal Mind datasets).

We then used these pre-trained models to extract speaker embeddings (x-vector, ECAPA-TDNN x-vectors, and d-vectors) at segment-level for the depression datasets. The dimensions of the speaker embeddings are 512, 256, and 192 for x-vector, ECAPA-TDNN x-vector, and d-vector, respectively. Finally, we use these speaker embeddings to train and test the LSTM and MK-CNN models for depression detection and severity estimation. We train separate models for x-vector, ECAPA-TDNN x-vector, and d-vector speaker embeddings.

Speaker embeddings for depression

We train MK-CNN (shown in Fig. 2 ) and LSTM networks with different speaker embeddings for depression detection and severity estimation.

MK-CNN model

We trained an MK-CNN model, as shown in Fig. 2 , for depression detection and severity estimation using the extracted speaker embeddings. The first convolutional layer consists of 3 different kernels with sizes (3,  L ), (4,  L ), and (5,  L ), respectively. Here, L refers to the length of the input feature vector: L = 512, 256 and 192 for x-vector, ECAPA-TDNN x-vector and d-vector, respectively. Each kernel consists of 50 channels. In the second convolutional layer, the size of all kernels is 4, with 50 channels in each kernel. Outputs from each kernel of the second convolutional layer are flattened and then concatenated before passing through a fully-connected (FC) layer with 100 units and an output layer.

We also trained an LSTM network for depression detection and severity estimation using the extracted speaker embeddings. The LSTM network is the same as the MK-CNN network shown in Fig. 2 , with the MK-CNN block replaced by an LSTM block consisting of 2 LSTM layers with 128 units each. The output of the LSTM block, for the last timestep, is passed through an FC layer with 100 units and an output layer.

Baseline DNN

We considered a fully-connected deep neural network (DNN) as a baseline for comparison. This DNN has three hidden layers with 128, 64, and 128 ReLU units, respectively, followed by an output layer.

Further, we extracted COVAREP 24 and OpenSMILE 40 features for performance comparison with speaker embeddings. COVAREP and OpenSMILE features, obtained at the segment level, were used to train and test the MK-CNN, LSTM, and DNN networks. We extracted the 384-dimensional OpenSMILE features using the IS09 configuration. We obtained the 444-dimensional COVAREP features by computing six statistics (mean, maximum, minimum, standard deviation, skew, and kurtosis) over the frame-level COVAREP features of each segment.
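
The step from frame-level features to one segment-level vector can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the count of 74 frame-level descriptors is an inference from 444 = 74 × 6 rather than something stated in the text, and the use of population (biased) moments and non-excess kurtosis is an assumption.

```python
import math


def functionals(values):
    """Return [mean, max, min, std, skew, kurtosis] of one feature track.

    Population (biased) moments and non-excess kurtosis are assumed here.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    if std == 0.0:
        skew, kurt = 0.0, 0.0  # convention chosen here for constant tracks
    else:
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return [mean, max(values), min(values), std, skew, kurt]


def segment_vector(frames):
    """Stack the six statistics of every feature into one flat vector.

    frames: list of frame-level feature vectors, all of the same length.
    """
    n_features = len(frames[0])
    vector = []
    for j in range(n_features):
        vector.extend(functionals([frame[j] for frame in frames]))
    return vector


if __name__ == "__main__":
    # Toy segment: 5 frames with 74 features each (74 is an assumed count).
    toy = [[float(i + j) for j in range(74)] for i in range(5)]
    print(len(segment_vector(toy)))
```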

Combining embeddings (CE)

We also try combining speaker embeddings (one of the x-vector, ECAPA-TDNN x-vector or d-vector) with the OpenSMILE or COVAREP features (as shown in Fig. 3 ), for depression detection and severity estimation. The proposed network consists of two branches, one for speaker embeddings and the other for OpenSMILE or COVAREP features. The input features to each branch are passed through an LSTM (CE \(_{l}\) ) or MK-CNN (CE \(_{c}\) ) block and then through a fully-connected (FC) layer (100 units). The outputs of the FC layer of each branch are combined using dot product and then passed through an output layer to get the final decision.

For all the above networks, the final output layer is a softmax with two units when trained for the task of depression detection and a single linear unit when trained for depression severity estimation. The context in Figs. 2 and 3 refers to the number of contiguous segments in an audio recording considered to train and test the models. We experiment with temporal contexts of different lengths to analyze the optimal number of contiguous speech segments required to train the models (see subsection “Temporal Context in Depression Detection” in the supplementary material). Even though the networks are trained and tested at segment-level with different contexts, the final performance metrics are obtained based on the prediction for the entire audio file. For depression detection, we use majority voting on the segment-level decisions for the final decision. For depression severity score prediction, we take the mean of the segment-level scores as the overall depression severity score.
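
The file-level aggregation described above can be sketched as follows. The function names are hypothetical, and the handling of ties in the vote (first-seen class wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def file_level_class(segment_classes):
    """Majority vote over segment-level decisions for one audio file.

    Ties are resolved in favour of the class seen first; this is an
    arbitrary choice made for this sketch.
    """
    counts = Counter(segment_classes)
    return counts.most_common(1)[0][0]


def file_level_severity(segment_scores):
    """Mean of the segment-level severity predictions for one audio file."""
    return sum(segment_scores) / len(segment_scores)


if __name__ == "__main__":
    print(file_level_class(["healthy", "depressed", "depressed"]))
    print(file_level_severity([4.0, 6.0, 8.0]))
```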

Figure 2. Network for depression detection using speaker embeddings as input. S, C and K refer to the stride, number of channels and kernel size of the convolutional layer, respectively. FC refers to a fully-connected layer. The same network is used for OpenSMILE and COVAREP features.

Figure 3. Network for combining speaker embeddings with OpenSMILE or COVAREP features for depression detection.

Analysis of longitudinal data

Here, we performed experiments on longitudinal speech data to analyze whether the speaker embeddings of an individual change as the depression severity score of that individual varies. For this analysis, we used the longitudinal data collected from the 65 speakers described above. For the given longitudinal speech samples, we extracted and analyzed different speaker embeddings, i.e., x-vector, ECAPA-TDNN x-vector, and d-vector. We then computed the cosine similarity scores between the speaker embeddings of the longitudinal speech samples. We also noted the difference in MADRS scores between the longitudinal samples. Finally, we analyzed the cosine similarity scores ( \(\cos \theta = \frac{A \cdot B}{\Vert A\Vert \, \Vert B\Vert }\) for embeddings A and B) in relation to the variations in the MADRS score.
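
A minimal sketch of this analysis for one speaker is given below. It is an illustration only: the function names and the representation of a session as an (embedding, MADRS) pair are assumptions, and comparing every pair of sessions is one plausible reading of the procedure rather than a documented detail.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_vs_score_change(sessions):
    """Pair up the sessions of one speaker.

    sessions: list of (embedding, madrs_score) tuples for the same speaker.
    Returns a list of (absolute MADRS difference, cosine similarity).
    """
    pairs = []
    for (emb_a, score_a), (emb_b, score_b) in combinations(sessions, 2):
        pairs.append((abs(score_a - score_b), cosine_similarity(emb_a, emb_b)))
    return pairs


if __name__ == "__main__":
    toy_sessions = [([1.0, 0.0], 5), ([0.8, 0.6], 12), ([0.0, 1.0], 20)]
    for diff, sim in similarity_vs_score_change(toy_sessions):
        print(diff, round(sim, 3))
```

Averaging the similarities that share the same MADRS difference would give one point of a curve like those in Fig. 4 a–c.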

Training details

We used the Adam optimizer ( \(\beta _1=0.9\) , \(\beta _2=0.99\) ), with an initial learning rate of 0.0005, to train all the networks. Dropout rates of 0.3, 0.4, and 0.3 were used for the MK-CNN block, LSTM block, and FC layers, respectively. ReLU activation was used for all the CNN, LSTM, and FC layers. All networks were trained for 50 epochs using a batch size of 128. For training the depression detection model, we used the negative log-likelihood loss function, whereas for training the depression severity estimation model, we used the mean-squared error loss function. Class weights were set based on the distribution of samples in the train set to alleviate the class imbalance issue during training. We maintained a constant value for temporal context (number of contiguous segments in a sample) across the train, validation, and test phases.
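
The text does not give the exact formula for the class weights. One common choice, shown here purely as an assumption, is to weight each class inversely to its frequency in the train set; the function name `inverse_frequency_weights` is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(train_labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This 'balanced' heuristic is an assumed choice; the text only states
    that weights follow the class distribution of the train set.
    """
    counts = Counter(train_labels)
    n_samples = len(train_labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


if __name__ == "__main__":
    toy_labels = ["healthy"] * 3 + ["depressed"]
    print(inverse_frequency_weights(toy_labels))
```

With such weights, errors on the rarer depressed class contribute more to the loss than errors on the healthy class.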

Measurements

Depression detection performance is measured using the \(F_1\) score ( \(F_1(D)\) and \(F_1(H)\) ) and balanced accuracy (BAc.). \(F_1(D)\) and \(F_1(H)\) are the \(F_1\) scores of depressed and healthy classes, respectively. Depression severity estimation performance is measured using root mean squared error (RMSE). The higher the \(F_1\) and BAc. values, the better the performance. Similarly, the lower the RMSE values, the better the performance. We report results using 5-fold cross-validation. There is no speaker overlap between folds, and we maintain the same proportion of depressed and healthy participants across all the folds.
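
For reference, the three measures can be computed as in the sketch below. The function names are hypothetical; the definitions are the standard ones (per-class \(F_1\), balanced accuracy as the mean of per-class recalls, and RMSE), which we assume match those used in the study.

```python
import math


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for cls in sorted(set(y_true)):
        support = sum(t == cls for t in y_true)
        hits = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted severity scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    truth = ["D", "D", "H", "H", "H", "H"]
    preds = ["D", "H", "H", "H", "H", "D"]
    print(f1_for_class(truth, preds, "D"), f1_for_class(truth, preds, "H"))
    print(balanced_accuracy(truth, preds))
    print(rmse([0, 10], [3, 6]))
```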

Experiments and discussion

Depression detection and severity estimation.

Tables 2 – 4 provide the experimental results obtained using ECAPA-TDNN x-vector (ECAPA) based speaker embeddings. Table 2 shows the depression detection and severity estimation performance when ECAPA speaker embeddings are combined with the OpenSMILE (ECAPA, OpenSMILE) or COVAREP (ECAPA, COVAREP) features. Models trained on speaker embeddings outperformed the models trained on COVAREP or OpenSMILE features for the DAIC-WOZ and Vocal Mind datasets. The depression detection and severity estimation performance further improved when the speaker embeddings were used in conjunction with the OpenSMILE or COVAREP features. This shows that the speaker embeddings and the OpenSMILE or COVAREP features carry complementary information. The performance of the LSTM models was better than or comparable to that of the MK-CNN models. To obtain the results in Tables 2 – 4 , we used a context of 16 segments for the DAIC-WOZ dataset and a context of 20 segments for the Vocal Mind dataset to train the LSTM and MK-CNN models (see Supplementary Tables S1 and S2 for the depression assessment results using x-vector and d-vector based speaker embeddings).

We compared the performance of our proposed approach with previous SOTA approaches for depression detection and severity estimation (see Table 3 ). In Sequence 24 , LSTM models trained with COVAREP features were used for depression detection and severity estimation. In eGeMAPS 41 , CNN models were trained using OpenSMILE features for depression detection. In FVTC-MFCC 27 , channel-delayed correlations of MFCCs were used to train dilated CNN models. In FVTC-FMT 27 , channel-delayed correlations of formant frequencies were used to train dilated CNN models. None of these approaches explicitly considered speaker-specific features for depression detection. Table 3 shows that the models trained on speaker embeddings performed better than (or at least comparably to) the SOTA approaches for speech-based depression detection and severity estimation tasks. The models that combine speaker embeddings with the OpenSMILE features (ECAPA, OS) outperformed the previous SOTA approaches on both tasks.

Figure 4. Analysis of speaker embeddings with respect to changes in depression severity scores using longitudinal data. ( a – c ) show the variation in cosine similarity scores (between speaker embeddings extracted from longitudinal data) as the difference in MADRS score changes. ( d – f ) show the variation in equal error rates (EER) (for the task of speaker classification) with respect to the difference in MADRS score between longitudinal samples. The different speaker embeddings are x-vector, d-vector and ECAPA-TDNN x-vector.

Estimating depression from demographic variables

To understand the extent to which speaker embeddings make use of information beyond demographics such as biological sex and age for depression assessment, we trained machine learning models (decision trees, support vector machines and DNNs) for depression detection and severity estimation when only biological sex and age are provided as input. We found that the best performance obtained on the Vocal Mind dataset by combining biological sex and age ( \(F_1(D)\) = 0.16, \(F_1(H)\) = 0.65 and GM = 0.32, RMSE = 8.35) was significantly worse than the performance obtained by the speaker embedding ( \(F_1(D)\) = 0.34, \(F_1(H)\) = 0.81 and GM = 0.55, RMSE = 6.62). This shows that the speaker embeddings capture more information that is relevant for depression detection and severity estimation than just biological sex and age. Further details are provided in Supplementary Table S3 .

Previous works reported that some machine learning models simply learned gender-specific information from the voice for depression detection 42 , 43 , 44 . To analyze the contribution of the gender-agnostic information contained in speaker embeddings for depression detection, we performed gender-specific depression detection as done in previous works 43 , 44 . We observed from the experimental results that the speaker embeddings do not rely completely on the gender-specific information for depression detection. For the DAIC-WOZ dataset (see Supplementary Table S4 a), the Female and Male models achieved similar performance, with the Female model performing slightly better than the Male model. For the Vocal Mind dataset (see Supplementary Table S4 b), by contrast, there is a large difference between the performance of the Female and the Male models, with the Female model performing significantly better than the Male model. This might be attributed to the difference in the ratio of non-depressed to depressed samples in each gender: for females, the ratio is 294:95 \(\approx \) 3:1, whereas for males it is 109:16 \(\approx \) 7:1. Experimental results are provided in Supplementary Table S4 .

Comparison with other pre-trained embeddings

We compared the performance of the proposed speaker embeddings (d-vector and ECAPA-TDNN x-vectors) with embeddings extracted using other pre-training techniques such as Mockingjay 45 , vq-wav2vec 46 , wav2vec 2.0 47 , and TRILL 48 . We trained the MK-CNN and LSTM networks with the speech-based embeddings extracted from the different pre-trained models. In Table 4 , we reported results obtained using the LSTM networks (LSTM models performed better than the MK-CNN models across different embeddings). Speaker embeddings (both d-vector and ECAPA-TDNN x-vectors) performed better than the speech-based embeddings extracted using other pre-trained models. This signifies that the speaker embeddings alone could provide effective cues for detecting depression and estimating the severity of depression.

Effect of depression on speaker embeddings in longitudinal data

Figure 4 a–c shows the mean cosine similarity scores plotted with respect to the difference in MADRS scores between longitudinal speech samples. As the difference in the MADRS score increases, the cosine similarity value decreases. For longitudinal speech samples of a speaker, the higher the variation in MADRS score, the higher the variation in speaker embeddings for that speaker.

Figure 4 d–f shows the mean equal error rates (EER in %) plotted with respect to the difference in MADRS scores between longitudinal speech samples. As the difference in the MADRS score increases, the EER increases. This further confirms that for longitudinal speech samples of a speaker, the higher the variation in MADRS score, the higher the variation in speaker embeddings of that speaker.

It can also be observed that the variance of the similarity scores and EERs increases as the difference in depression severity scores increases. One reason for this behavior could be the skewed distribution of the samples across different values. There are more longitudinal samples with low differences in depression severity than samples with higher differences in depression severity. This might have led to higher variance at the end of the curve. A larger number of longitudinal samples might give us a better understanding of this behavior.

Analysis of the speaker embeddings

We also analyzed the effectiveness of the extracted speaker embeddings (d-vector and ECAPA-TDNN x-vectors) for the task of speaker classification. The DAIC-WOZ dataset consists of recordings from 189 speakers, giving a 189-class speaker classification task. Similarly, the Vocal Mind dataset consists of recordings from 514 speakers, giving a 514-class speaker classification task. We randomly selected 25 and 15 non-overlapping segments from each speaker to form the train and test sets for that speaker. We extracted ECAPA-TDNN x-vectors and d-vectors for all the samples. We trained logistic regression classifiers (with no hidden layers) separately on the d-vectors and ECAPA-TDNN x-vectors for the task of speaker classification. Speaker classification results are reported in terms of equal error rate (EER); the lower the EER, the better the performance. Using d-vectors, we achieved EERs of 1.29 and 1.69 on the test sets of the DAIC-WOZ and Vocal Mind datasets, respectively. Using ECAPA-TDNN x-vectors, we achieved EER values of 1.10 and 1.46 on the test sets of the DAIC-WOZ and Vocal Mind datasets, respectively. These low EER values show that the extracted speaker embeddings carry crucial information about the speaker-specific characteristics.
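
The text does not describe how the EER was derived from the classifier outputs. As a generic illustration only, the sketch below computes an EER from two lists of scores: scores for matching (genuine) speaker trials and scores for non-matching (impostor) trials. The function name and this trial-based formulation are assumptions.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a decision threshold.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false acceptance rate (FAR) and the
    false rejection rate (FRR) are closest, as the mean of the two.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    thresholds.append(thresholds[-1] + 1.0)  # a threshold that rejects everything
    best_gap, eer = None, None
    for thr in thresholds:
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < thr for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.3, 0.7]
    impostor = [0.1, 0.2, 0.75, 0.4]
    print(100 * equal_error_rate(genuine, impostor), "%")
```

The value returned is a fraction; multiplying by 100 gives the percentage form in which EERs are usually reported.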

Comparison with a no-information system

To provide context for interpreting the lower RMSE values achieved by our proposed depression assessment system (i.e. an LSTM model trained by combining ECAPA-TDNN speaker embeddings with OpenSMILE features), we present a detailed confusion matrix (see Fig. 5 ). We used known levels of depressive severity to evaluate the seriousness of misclassification. We found that our ECAPA-TDNN-OpenSMILE model made the less severe mistakes of misclassifying between healthy controls and mild cases of depression, as shown in Fig. 5 a. This compares favourably to the no-information system, which is equally likely to make the bigger mistake of misclassifying severe cases of depression as controls (see Fig. 5 b).

Specifically, the depression severity score values (PHQ-8) are clinically divided into 4 different groups: No depression or healthy (PHQ-8 \(\le \) 8), Mild depression (PHQ-8 range 9–12), Moderate depression (PHQ-8 range 13–16) and Severe depression (PHQ-8 range 17–24). In matrix (a) on the left, we show a confusion matrix based on our system’s predicted regression scores and in matrix (b) we show a confusion matrix obtained for a majority classifier (or a no-information system). These matrices demonstrate interesting characteristics: (1) Many of the errors made by our model are between the healthy (None) and mild classes, which would likely be more tolerable, since a goal would be to track longitudinal changes; if a patient is already known to be depressed, then it may be less critical for a system to automatically detect where they lie relative to this particular border. (2) Our system misclassified only 5 patients who are clinically depressed as healthy (None), and 4 of these are mild depression cases. This is a less significant error than it would be to misclassify a severely depressed patient as being healthy (i.e. failing to flag them). The no-information system (majority predictor) classified all 16 clinically depressed patients as healthy. Indeed, it would always have all of its errors in the first column: misclassifying all depressed patients as being healthy, regardless of the severity of their depression. (3) In our system, none of the severely depressed patients are misclassified as healthy, whereas in the no-information system, 100% of severely depressed patients will be misclassified as healthy (red bin in Fig. 5 b). (4) For our proposed system, most of the misclassification errors are “one bin apart” (light green diagonals in Fig. 5 a), i.e. confusion between adjacent classes such as mild-none or mild-moderate, as opposed to confusion between more separated classes such as none-moderate. The no-information system misclassified all 3 moderately depressed people and all 4 severely depressed people as healthy.
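
The construction of such a four-level confusion matrix from true and predicted PHQ-8 scores can be sketched as follows. The names `LEVELS`, `severity_level` and `confusion_matrix` are hypothetical, and rounding fractional predictions to the nearest integer before binning is an assumption; only the four score ranges come from the text.

```python
LEVELS = ["None", "Mild", "Moderate", "Severe"]


def severity_level(phq8_score):
    """Bin a PHQ-8 score into the four clinical groups named in the text.

    Predicted regression scores may be fractional, so the score is rounded
    first (Python rounds exact halves to the nearest even integer).
    """
    score = round(phq8_score)
    if score <= 8:
        return "None"
    if score <= 12:
        return "Mild"
    if score <= 16:
        return "Moderate"
    return "Severe"


def confusion_matrix(true_scores, predicted_scores):
    """Rows are true levels, columns are predicted levels, in LEVELS order."""
    index = {level: i for i, level in enumerate(LEVELS)}
    matrix = [[0] * len(LEVELS) for _ in LEVELS]
    for true, pred in zip(true_scores, predicted_scores):
        matrix[index[severity_level(true)]][index[severity_level(pred)]] += 1
    return matrix


if __name__ == "__main__":
    true_scores = [3, 10, 14, 20]
    predicted = [5.2, 7.4, 15.1, 18.0]
    for level, row in zip(LEVELS, confusion_matrix(true_scores, predicted)):
        print(level, row)
```

In this layout, a predictor that always outputs a low score places every count in the first column, which is the pattern described for the no-information system.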

Figure 5. Confusion matrix obtained by considering predicted depression severity scores (PHQ-8) by ( a ) our proposed system (an LSTM model trained by combining ECAPA-TDNN embeddings with OpenSMILE features), and ( b ) a no-information system which predicts the mean value for every input. Fine-grained clinical levels of the predicted depression severity scores were obtained by dividing the depression severity scores into 4 different groups: None (PHQ-8 \(\le \) 8), Mild (PHQ-8 range 9–12), Moderate (PHQ-8 range 13–16) and Severe (PHQ-8 range 17–24).

Limitations

In this work, we showed that speaker embeddings can be used to build machine learning models for depression assessment. Using speaker embeddings in combination with acoustic features, we achieved incremental progress in performance over the previous state-of-the-art machine learning techniques for the tasks of depression severity estimation and depression detection. However, there is a need to further improve performance before deploying AI-based depression assessment systems. In this work, we considered acoustic features, but not text-based features (i.e. linguistic content). It is possible that the latter, in combination with acoustic features, might in future further improve the performance of these machine learning models. The main objective of this work is not to build machine learning models to replace human clinicians, but to develop models which can be used for measurement-based treatment and to assist (i.e. work in co-ordination with) human clinicians in making better assessment of depression. Moreover, the specificity of the current models in diagnosing depression from other mental disorders remains to be established.

Conclusions

In this work, we train a speaker embedding network on standard large datasets and then use two small clinical datasets to show that the resulting embeddings can be used to estimate the severity of depression and to detect depression from speech. In particular, when we combine these embeddings with OpenSMILE speech features, we achieve SOTA performance on the depression severity estimation and the depression detection tasks. Further, by analyzing repeated speech samples collected from a subset of speakers, we show that changes in depression severity affect speaker identification.

Data availability

Publicly available VoxCeleb2 ( https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html ) and LibriSpeech ( https://www.openslr.org/12 ) datasets were used to train the speaker embedding models, i.e., the x-vector, d-vector and ECAPA-TDNN x-vector models. The DAIC-WOZ dataset is publicly available ( https://dcapswoz.ict.usc.edu/ ). The Vocal Mind dataset generated and analyzed during the current study is not publicly available due to the potentially identifiable character of speech data, the sensitive character of the associated information on mental disorders, and the limits of consent provided by participants. The study procedures for the Vocal Mind dataset, and all the experiments in this research, have been carried out in accordance with the Canadian Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans - TCPS 2 (2018). The Research Ethics Board of Nova Scotia Health Authority approved all study procedures. All the participants provided written informed consent. The consent covers the publication of de-identified data and results. The consent does not permit publication of identifiable information. A proportion of participants have additionally consented for their de-identified audio recordings to be shared with other researchers in other Canadian research institutions and/or research institutions outside of Canada. De-identified versions of these samples are available from the corresponding author on reasonable request.

Rehm, J. & Shield, K. D. Global burden of disease and the impact of mental and addictive disorders. Curr. Psychiatry Rep. 21 , 10 (2019).

W.H.O et al. The european mental health action plan 2013–2020. Copenhagen: World Health Organization 17 (2015).

Zhu, M. et al. The efficacy of measurement-based care for depressive disorders: Systematic review and meta-analysis of randomized controlled trials. J. Clin. Psychiatry 82 , 37090 (2021).

Lewis, C. C. et al. Implementing measurement-based care in behavioral health: A review. JAMA Psychiat. 76 , 324–335 (2019).

Quatieri, T. F. & Malyska, N. Vocal-source biomarkers for depression: A link to psychomotor activity. In Interspeech (2012).

Cummins, N. et al. A review of depression and suicide risk assessment using speech analysis. Speech Commun. 71 , 10–49 (2015).

Slavich, G. M., Taylor, S. & Picard, R. W. Stress measurement using speech: Recent advancements, validation issues, and ethical and privacy considerations. Stress 22 , 408–413 (2019).

Low, L. A., Maddage, N. C., Lech, M., Sheeber, L. & Allen, N. Influence of acoustic low-level descriptors in the detection of clinical depression in adolescents. In ICASSP (IEEE, 2010).

Cummins, N., Epps, J., Breakspear, M. & Goecke, R. An investigation of depressed speech detection: Features and normalization. In Interspeech (2011).

Simantiraki, O., Charonyktakis, P., Pampouchidou, A., Tsiknakis, M. & Cooke, M. Glottal source features for automatic speech-based depression assessment. In INTERSPEECH , 2700–2704 (2017).

Ringeval, F. et al. Avec 2019 workshop and challenge: state-of-mind, detecting depression with ai, and cross-cultural affect recognition. In Proc. Audio/Visual Emotion Challenge and Workshop , 3–12 (2019).

Chung, J. S., Nagrani, A. & Zisserman, A. Voxceleb2: Deep speaker recognition. In Interspeech , 1086–1090 (2018).

Davey, C. G. & Harrison, B. J. The self on its axis: A framework for understanding depression. Transl. Psychiatry 12 , 1–9 (2022).

Montesano, A., Feixas, G., Caspar, F. & Winter, D. Depression and identity: Are self-constructions negative or conflictual?. Front. Psychol. 8 , 877 (2017).

Article   PubMed   PubMed Central   Google Scholar  

Schuller, B. et al. A survey on perceived speaker traits: Personality, likability, pathology, and the first challenge. Comput. Speech Lang. 29 , 100–131 (2015).

Dehak, N., Kenny, P. J., Dehak, R., Dumouchel, P. & Ouellet, P. Front-end factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process. 19 , 788–798 (2010).

Saon, G., Soltau, H., Nahamoo, D. & Picheny, M. Speaker adaptation of neural network acoustic models using i-vectors. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding , 55–59 (IEEE, 2013).

Jia, Y. et al. Transfer learning from speaker verification to multispeaker text-to-speech synthesis. Adv. Neural Inf. Process. Syst. 31 (2018).

Pappagari, R., Wang, T., Villalba, J., Chen, N. & Dehak, N. x-vectors meet emotions: A study on dependencies between emotion and speaker recognition. In ICASSP (IEEE, 2020).

Desplanques, B., Thienpondt, J. & Demuynck, K. Ecapa-tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification. Preprint arXiv:2005.07143 (2020).

Wan, L., Wang, Q., Papir, A. & Moreno, I. L. Generalized end-to-end loss for speaker verification. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , 4879–4883 (IEEE, 2018).

Tasnim, M. & Stroulia, E. Detecting depression from voice. In Canadian Conference on Artificial Intelligence , 472–478 (Springer, 2019).

Chlasta, K., Wołk, K. & Krejtz, I. Automated speech-based screening of depression using deep convolutional neural networks. Procedia Comput. Sci. 164 , 618–628 (2019).

Al Hanai, T., Ghassemi, M. M. & Glass, J. R. Detecting depression with audio/text sequence modeling of interviews. In Interspeech , 1716–1720 (2018).

Ma, X., Yang, H., Chen, Q., Huang, D. & Wang, Y. Depaudionet: An efficient deep model for audio based depression classification. In workshop on Audio/visual emotion challenge (2016).

Rodrigues Makiuchi, M., Warnita, T., Uto, K. & Shinoda, K. Multimodal fusion of bert-cnn and gated cnn representations for depression detection. In Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop , 55–63 (2019).

Huang, Z., Epps, J. & Joachim, D. Exploiting vocal tract coordination using dilated cnns for depression detection in naturalistic environments. In ICASSP , 6549–6553 (IEEE, 2020).

Seneviratne, N. & Espy-Wilson, C. Speech based depression severity level classification using a multi-stage dilated cnn-lstm model. Preprint arXiv:2104.04195 (2021).

Dumpala, S. H. et al. Estimating severity of depression from acoustic features and embeddings of natural speech. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , 7278–7282 (IEEE, 2021).

Afshan, A. et al. Effectiveness of voice quality features in detecting depression. Interspeech 2018 (2018).

Cummins, N., Epps, J., Sethu, V. & Krajewski, J. Variability compensation in small data: Oversampled extraction of i-vectors for the classification of depressed speech. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , 970–974 (IEEE, 2014).

Di, Y., Wang, J., Li, W. & Zhu, T. Using i-vectors from voice features to identify major depressive disorder. J. Affect. Disord. 288 , 161–166 (2021).

Sheikh, I., Dumpala, S. H., Chakraborty, R. & Kopparapu, S. K. Sentiment analysis using imperfect views from spoken language and acoustic modalities. In Proc. Grand Challenge and Workshop on Human Multimodal Language , 35–39 (2018).

Gratch, J. et al. The distress analysis interview corpus of human and computer interviews. In LREC , 3123–3128 (2014).

Kroenke, K., Spitzer, R. L. & Williams, J. B. The phq-9: Validity of a brief depression severity measure. J. Gen. Intern. Med. 16 , 606–613 (2001).

Manea, L., Gilbody, S. & McMillan, D. Optimal cut-off score for diagnosing depression with the patient health questionnaire (phq-9): A meta-analysis. CMAJ 184 , E191–E196 (2012).

Hawley, C., Gale, T. & Sivakumaran, T. Defining remission by cut off score on the madrs selecting the optimal value. J. Affect. Disord. 72 , 177–184 (2002).

Article   CAS   PubMed   Google Scholar  

Ravanelli, M. et al. Speechbrain. https://github.com/speechbrain/speechbrain (2021).

Panayotov, V., Chen, G., Povey, D. & Khudanpur, S. Librispeech: an asr corpus based on public domain audio books. In ICASSP , 5206–5210 (IEEE, 2015).

Eyben, F., Wöllmer, M. & Schuller, B. Opensmile: the munich versatile and fast open-source audio feature extractor. In Proc. ACM conference on Multimedia , 1459–1462 (2010).

Huang, Z., Epps, J. & Joachim, D. Investigation of speech landmark patterns for depression detection. IEEE Trans. Aff. Comput. (2019).

Bailey, A. & Plumbley, M. D. Gender bias in depression detection using audio features. In 2021 29th European Signal Processing Conference (EUSIPCO) , 596–600 (IEEE, 2021).

Cummins, N., Vlasenko, B., Sagha, H. & Schuller, B. Enhancing speech-based depression detection through gender dependent vowel-level formant features. In Conference on artificial intelligence in medicine in Europe , 209–214 (Springer, 2017).

Vlasenko, B., Sagha, H., Cummins, N. & Schuller, B. Implementing gender-dependent vowel-level analysis for boosting speech-based depression recognition. In Interspeech (2017).

Liu, A. T., Yang, S.-w., Chi, P.-H., Hsu, P.-c. & Lee, H.-y. Mockingjay: Unsupervised speech representation learning with deep bidirectional transformer encoders. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , 6419–6423 (IEEE, 2020).

Baevski, A., Schneider, S. & Auli, M. vq-wav2vec: Self-supervised learning of discrete speech representations. Preprint arXiv:1910.05453 (2019).

Baevski, A., Zhou, H., Mohamed, A. & Auli, M. wav2vec 2.0: A framework for self-supervised learning of speech representations. Preprint arXiv:2006.11477 (2020).

Shor, J. et al. Towards learning a universal non-semantic representation of speech. Preprint arXiv:2002.12764 (2020).

Download references

Acknowledgements

This work has been supported by the Canada Research Chairs Program (File Number 950 - 233141) and the Canadian Institutes of Health Research (Funding Reference Number 165835). We thank the Canadian Institute for Advanced Research (CIFAR) for their support. Resources used in preparing this research were provided, in part, by NSERC, the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute www.vectorinstitute.ai/#partners .

Author information

Authors and affiliations.

Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada

Sri Harsha Dumpala, Sebastian Rodriguez & Sageev Oore

Vector Institute, Toronto, ON, Canada

Dalhousie University, Psychiatry, Halifax, NS, Canada

Katerina Dikaios, Ross Langley & Rudolf Uher

Nova Scotia Health, Halifax, NS, Canada

Katerina Dikaios, Sheri Rempel & Rudolf Uher

Contributions

S.H.D. designed and conducted the experiments, and wrote the first draft of the paper. S.R. helped in conducting experiments and plotting the figures. K.D., R.L. and S.R. designed the data collection process, and collected and annotated the data. R.U. and S.O. were involved in the discussions of the approach, and provided critical feedback to the paper. All authors have discussed the results and reviewed the manuscript.

Corresponding author

Correspondence to Sageev Oore .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Dumpala, S.H., Dikaios, K., Rodriguez, S. et al. Manifestation of depression in speech overlaps with characteristics used to represent and recognize speaker identity. Sci Rep 13 , 11155 (2023). https://doi.org/10.1038/s41598-023-35184-7

Received : 17 August 2022

Accepted : 14 May 2023

Published : 10 July 2023

DOI : https://doi.org/10.1038/s41598-023-35184-7

How Do We Imagine a Speech? A Triple Network Model for Situationally Simulated Inner Speech

64 Pages Posted: 25 Aug 2024

Xiaowei Gao

South China Normal University

Junjie Yang

Xiaolin Guo, Yaling Wang, Jiaxuan Liu, Daniel Kaiser

affiliation not provided to SSRN

Shenzhen University

Inner speech, a silent verbal experience, is central to human consciousness and cognition, yet its neural mechanisms remain largely unknown. In this study, we adopted an ecological paradigm called situationally simulated inner speech, which involves the dynamic integration of contextual background, episodic and semantic memories, and external events into a coherent structure. We conducted dynamic activation and network analyses on fMRI data as participants engaged in inner speech prompted by cue words across ten contexts. Our seed-based co-activation pattern analyses revealed dynamic involvement of the language network, sensorimotor network, and default mode network in situationally simulated inner speech. Additionally, frame-wise dynamic conditional correlation analysis uncovered four temporally recurring states with distinct functional connectivity patterns among these networks. We propose a triple network model for deliberate inner speech, comprising the language network for truncated overt speech, the sensorimotor network for perceptual simulation and monitoring, and the default mode network for integration and 'sense-making' processing.

Note: Funding declaration: This work was supported by the National Social Science Foundation of China (No. 20&ZD296), the Key-Area Research and Development Program of Guangdong Province (No. 2019B030335001), and the National Natural Science Foundation of China (No. 32100889). The funding agencies took no part in the design or implementation of the research. D.K. is supported by the Deutsche Forschungsgemeinschaft (SFB/TRR135, project number 222641018; KA4683/5-1, project number 518483074; KA4683/6-1, project number 536053998), “The Adaptive Mind” funded by the Excellence Program of the Hessian Ministry of Higher Education, Science, Research and Art, and a European Research Council starting grant (ERC-2022-STG 101076057). Views and opinions expressed are those of the authors only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

Conflict of Interests: The authors declare no competing financial interests.

Keywords: coactivation pattern, functional magnetic resonance imaging, situationally simulated inner speech, triple network model

South China Normal University

483 Wushan Str. Tianhe District Guangzhou, 510631, 510642 China

affiliation not provided to SSRN

No Address Available

Shenzhen University

3688 Nanhai Road, Nanshan District Shenzhen, 518060 China

Binke Yuan (Contact Author)

A Scoping Review in Speech Pathology and Applications to Future Health Disparity Research Questions

Record ID: 163

Program Affiliation: Capstone

Presentation Type: Poster

Abstract: How do you know when a scoping review is a good fit for your literature review? As an undergraduate student, I participated in a project where our goal was to analyze commonly employed methodologies used to assess gender perception in speech, demographic characteristics of listeners that have been recorded, and the types of speech samples being utilized to investigate gender perception for speech pathology. However, there are many literature review styles to choose from before moving forward with a project or idea. We found that a scoping review would be the most appropriate tool to meet our goals because it highlights literature in emerging areas of science that have not been reviewed. A scoping review assesses the potential scope of research done about a certain topic in hopes of retrieving evidence on the team's research topics. To conduct our scoping review, we used the software, Covidence, which allows reviewers to complete article screening and data extraction quickly and flexibly. Articles went through multiple stages, abstract and title screening, full-text screening, and data extraction, to be filtered and eventually included in our findings. Through this experience in reviewing literature, I have gained knowledge on the benefits and disadvantages of scoping reviews, how to navigate through research article sections, and how to create a thought pattern that seeks out information for future research questions related to health disparities. 

Article Details

Lydia Erwin

Major: Speech, Language, and Hearing Services

Thirty years of research into hate speech: topics of interest and their evolution

  • Open access
  • Published: 30 October 2020
  • Volume 126 , pages 157–179, ( 2021 )

  • Alice Tontodimamma 1 ,
  • Eugenia Nissi 2 ,
  • Annalina Sarra 3 &
  • Lara Fontanella 3  

The exponential growth of social media has brought with it an increasing propagation of hate speech and hate-based propaganda. Hate speech is commonly defined as any communication that disparages a person or a group on the basis of characteristics such as race, colour, ethnicity, gender, sexual orientation, nationality or religion. Online hate diffusion has now developed into a serious problem, and this has led to a number of international initiatives being proposed, aimed at qualifying the problem and developing effective counter-measures. The aim of this paper is to analyse the knowledge structure of the hate speech literature and the evolution of related topics. We apply co-word analysis methods to identify the different topics treated in the field. The analysed database was downloaded from Scopus and covers publications from the last thirty years. Topic and network analyses of the literature showed that the main research topics can be divided into three areas: “general debate hate speech versus freedom of expression”, “hate-speech automatic detection and classification by machine-learning strategies”, and “gendered hate speech and cyberbullying”. Understanding how the research fronts interact led us to stress the relevance of machine learning approaches for correctly assessing hateful forms of online speech.

Introduction

In recent years, the ways in which people receive news, and communicate with one another, have been revolutionised by the Internet, and especially by social networks. It is a natural activity, in societies where freedom of speech is recognised, for people to express their opinions. From an era in which individuals communicated their ideas, usually orally and only to small numbers of other people, we have moved on to an era in which individuals can make free use of a variety of diffusion channels in order to communicate, instantaneously, with people who are a long distance away; in addition, more and more people make use of online platforms not only to interact with each other, but also to share news. The detachment created by being enabled to write, without any obligation to reveal oneself directly, means that this new medium of virtual communication allows people to feel greater freedom in the way they express themselves. Unfortunately, though, there is also a dark side to this system. Social media have become a fertile ground for heated discussions which frequently result in the use of insulting and offensive language. The creation and dissemination of hateful speech are now pervading the online platforms. As a result, countries are recognising hate speech as a serious problem, and this has led to a number of International and European initiatives being proposed, aimed at qualifying the problem and developing effective counter-measures.

A first issue in identifying content as hateful is that there is no universally accepted definition of hate speech, mainly because of the vague and subjective determinations as to whether speech is “offensive” or conveys “hate” (Strossen 2016). A comprehensive overview of different definitions can be found in Sellars (2016), who derives several related concepts that appear throughout academic and legal attempts to define hate speech, as well as in the attempts of online platforms. The common traits identified refer to: the targeting of a group, or of an individual as a member of a group; the presence of content that expresses hatred, causes harm, incites bad actions beyond the speech itself, and has no redeeming purpose; the intention of harm or bad activity; the public nature of the speech; and, finally, a context that makes a violent response possible. Sellars (2016) stresses, however, that the identified traits do not form a single definition, but could be used to help improve the confidence that the speech in question is worthy of identification as hate speech.

In addition to the ambiguity in the definition, hate speech creates a conflict between some people’s speech rights and other people’s right to be free from verbal abuse (Greene and Simpson 2017). The complex balancing between freedom of expression and the defence of human dignity has received significant attention from legal scholars and philosophers and, according to Sellars (2016), the different approaches to defining hate speech can be linked to academics’ particular motivations: “Some do not overtly call for legal sanction for such speech and seek merely to understand the phenomenon; some do seek to make the speech illegal, and are trying to guide legislators and courts to effective statutory language; some are in between.” Advocates of free speech rights invoke the principle of viewpoint neutrality or content neutrality, which prohibits bans on the expression of viewpoints based on their substantive message (Brettschneider 2013). This protection extends even to speech that expresses ideas that most people would find distasteful, offensive, disagreeable, or discomforting, and thus extends even to hate speech (Beausoleil 2019). According to Strossen (2016, 2018), hate speech laws violate not only the cardinal principle of viewpoint neutrality but also the emergency principle, by permitting government to suppress speech solely because its message is disfavoured, disturbing, or feared to be dangerous by government officials or community members, and not because it directly causes imminent serious harm.
On the other hand, Cohen-Almagor ( 2016 , 2019 ) insists that it is necessary to “take the evils of hate speech seriously” and that “certain kinds of speech are beyond tolerance.” The author criticizes the viewpoint neutrality concept arguing that a balance needs to be struck between competing social interests because freedom of expression is important as is the protection of vulnerable minorities: “people must enjoy absolute freedom to advocate and debate ideas, but this is so long as they refrain from abusing this freedom to attack the rights of others or their status in society as human beings and equal members of the community.” An alternative remedy to censoring hate speech could be to add more speech, as suggested by the UNESCO study titled “Countering On-line Hate Speech” (Gagliardone et al. 2015) which argues that counter-speech is usually preferable to the suppression of hate speech.

The rising visibility of hate speech on online social platforms has resulted in a continuously growing rate of published research into different areas of hate speech. The increasing number of studies on this subject is beneficial to scholars and practitioners, but it also brings challenges in terms of understanding the key research streams in the area. Previous surveys highlighted the state of the art and the evolution of research on hate speech (Schmidt and Wiegand 2017; Fortuna and Nunes 2018; MacAvaney et al. 2019; Waqas et al. 2019). The survey of Schmidt and Wiegand (2017) describes the key areas that have been explored to automatically recognize hateful utterances using natural language processing. It highlights eight categories of features used in hate speech detection: simple surface features, word generalization, sentiment analysis, lexical resources, linguistic characteristics, knowledge-based features, meta-information, and multimodal information. In addition, Schmidt and Wiegand (2017) stress that comparing different features and methods requires a benchmark data set. Fortuna and Nunes (2018) carried out an in-depth survey aimed at providing a systematic overview of studies in the field. In this survey, the authors first consider the motivations for studying hate speech and then distinguish theoretical from practical aspects. Specifically, they list some of the main rules for hate speech identification and investigate the methods and algorithms adopted in the literature for automatic hate speech detection. Practical resources, such as datasets and other projects, are also reviewed. MacAvaney et al. (2019) discussed the challenges faced by online automatic approaches to hate speech detection in text, including competing definitions and dataset availability and construction.
A thorough bibliographic and visualization analysis of the scientific literature related to online hate speech was conducted by Waqas et al. (2019). Drawing on the Web of Science (WOS) core database, their study concentrated on the mapping of general research indices, prevalent themes of research, research hotspots and influential stakeholders, such as organizations and contributing regions. Along with the most popular bibliometric measures, such as the total number of papers, to measure productivity, and total citations, to assess the relevance of a country, institution, or author, the above-mentioned research uses knowledge mapping tools to draw the structure and networks of authors, journals, universities and countries. Not surprisingly, the results of this bibliometric analysis show a remarkable increase in publications and citations after 2005, when social media platforms grew in influence and user adoption and the Internet became a central arena for public and private discourse. Furthermore, most of the publications originate from the disciplines of psychology and psychiatry, with recurring themes of cyberbullying, psychiatric morbidity, and psychological profiling of aggressors and victims. As noted by the authors, the high representation of psychology-related contributions is mainly due to the choice of the WOS core database, whose coverage is geared towards health and social science disciplines rather than engineering or computer science and which therefore excludes relevant research fields from the analysis.

Based on these previous studies, and especially on that of Waqas et al. (2019), our research intends to enlarge the mapping of the global literature output on online hate speech over the last thirty years, by relying on bibliographic data extracted from the Scopus database and using different methodological approaches. To identify how the scientific literature on online hate is evolving, and to understand what the main research areas and fronts are and how they interact over time, we used bibliometric measures, knowledge mapping tools and topic modelling. All of these methods are traditionally employed in bibliometric analysis and share the idea of using a large amount of bibliographic data to let the underlying knowledge base emerge in an unsupervised way. In particular, topic analysis based on the Latent Dirichlet Allocation method (LDA; Blei et al. 2003) is gaining popularity among scholars in diverse fields (Alghamdi and Alfalqi 2015). A topic model leads to two key outputs: a list of topics (i.e. groups of words that frequently occur together) and a list of documents that are strongly associated with each topic (McPhee et al. 2017). Accordingly, this approach is useful for finding interpretable topics with semantic meaning and for assigning these topics to the documents in the literature, thereby offering a probabilistic quantification of relevance both for the identification of topics and for the classification of documents.

Our study exploits the main strengths of each method in drawing a synthetic representation of the research trends on online hate and adds value to the previously cited works by taking advantage of topic modelling to retrieve latent themes. As highlighted in Suominen and Toivanen (2016), the key novelty of topic modelling, in classifying scientific knowledge, is that it virtually eliminates the need to fit new-to-the-world knowledge into known-to-the-world definitions.

The remainder of this work is structured as follows. Section “Materials and methods” describes the data source and the methods used. Section “Results” presents the bibliometric results, focusing on the yearly quantitative distribution of publications and on the latent topics retrieved through LDA. This section provides useful insights into the temporal evolution of the topics, their interactions and the research activity in the identified latent themes. A conclusion and future perspectives are given in “Conclusion” section. Finally, we report additional information on the bibliographic data set and the topic analysis results, in the online Supplementary Material.

Materials and methods

Bibliographic dataset.

For the analysis, we use a bibliometric dataset, covering the period 1992–2019, retrieved from the Scopus database. This bibliographic database was selected because it is one of the most suitable sources of references for scientific peer-reviewed publications.

In the same vein as Waqas et al. (2019), we focus on online hate and, for our search, we built a query that, in addition to the exact phrase “hate speech”, combines terms related to offensive or denigratory language (“hatred”, “abusive language”, “abusive discourse”, “abusive speech”, “offensive language”, “offensive discourse”, “offensive speech”, “denigratory language”, “denigratory discourse”, “denigratory speech”) with words linked to the online nature of the phenomenon (“online”, “social media”, “web”, “virtual”, “cyber”, “Orkut”, “Twitter”, “Facebook”, “Reddit”, “Instagram”, “Snapchat”, “Youtube”, “Whatsapp”, “Wechat”, “QQ”, “Tumblr”, “Linkedin”, “Pinterest”).

We have not considered specific terms linked to cyberbullying because, although this phenomenon partially overlaps with hate speech, it encompasses a broader field. The exact query can be found in the Supplementary Material.

The bibliographic data was extracted by applying the query to the contents of title, abstract and keywords. The data for each resulting publication was manually exported on December 15, 2019.
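The way such a search string is assembled can be illustrated with a short Python sketch. This is not the authors' query, which is given only in the Supplementary Material. The term lists are copied from the text above; the boolean grouping (the exact phrase OR an offensive term combined with an online term) is one plausible reading and is an assumption, as are the helper names `any_of` and `build_query`. `TITLE-ABS-KEY` is the Scopus field code for searching title, abstract and keywords.

```python
# Illustrative sketch only: the exact query used in the study is in the
# paper's Supplementary Material. The grouping below is an assumption.
OFFENSIVE_TERMS = [
    "hatred", "abusive language", "abusive discourse", "abusive speech",
    "offensive language", "offensive discourse", "offensive speech",
    "denigratory language", "denigratory discourse", "denigratory speech",
]
ONLINE_TERMS = [
    "online", "social media", "web", "virtual", "cyber", "Orkut", "Twitter",
    "Facebook", "Reddit", "Instagram", "Snapchat", "Youtube", "Whatsapp",
    "Wechat", "QQ", "Tumblr", "Linkedin", "Pinterest",
]


def any_of(terms):
    """Join quoted terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query():
    """Build a title/abstract/keyword search string (hypothetical grouping)."""
    combined = f"({any_of(OFFENSIVE_TERMS)} AND {any_of(ONLINE_TERMS)})"
    return f'TITLE-ABS-KEY("hate speech" OR {combined})'


query = build_query()
```

Keeping the term lists as data makes it easy to add a platform name or a synonym and regenerate the string rather than editing a long query by hand.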

All types of publications were included in the search, and 1614 documents related to hate speech, published in 995 different sources, were identified. This high number indicates a wide variety of research themes and the multidisciplinary character of the subject, which involves a plurality of disciplines. In particular, the top publication fields include Social Sciences, Computer Science, Arts and Humanities, and Psychology. As for document type, the majority are articles, conference papers and book chapters.

Information about document distribution by research field is given in the Supplementary Material, along with the document distribution by source and the ranking of the most productive countries and authors.

Conceptual structure map

To investigate the structure of research on hate speech, we first carry out an exploratory analysis of the keywords selected by the authors. The analysis was performed with the R package Bibliometrix (Aria and Cuccurullo 2017), which supports multiple correspondence analysis (MCA) (Greenacre and Blasius 2006) and hierarchical clustering to draw a conceptual structure map of the field. Specifically, MCA yields a low-dimensional Euclidean representation of the original data matrix by performing a homogeneity analysis of the “documents by keywords” indicator matrix, built by considering a dummy variable for each keyword. The words are plotted onto a two-dimensional map in which words that lie closer together have more similar distributions across the documents. In addition, applying a hierarchical clustering procedure to this reduced space identifies clusters of documents that are characterised by common keywords.
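To make the “documents by keywords” indicator matrix concrete, the following Python sketch builds one from toy data and computes the chi-square distance between keyword profiles, which is the distance that correspondence analysis approximates in its low-dimensional map. It is a simplified illustration under stated assumptions, not the Bibliometrix implementation: it uses a single presence column per keyword, omits the dimension reduction and clustering steps, and the function names and toy documents are hypothetical.

```python
from math import sqrt


def indicator_matrix(doc_keywords):
    """Return (vocabulary, rows); rows[i][j] is 1 if document i has keyword j."""
    vocab = sorted({k for kws in doc_keywords for k in kws})
    rows = [[1 if k in kws else 0 for k in vocab] for kws in doc_keywords]
    return vocab, rows


def chi_square_distance(rows, j, k):
    """Chi-square distance between the column profiles of keywords j and k."""
    total = sum(sum(r) for r in rows)
    col_j = sum(r[j] for r in rows)
    col_k = sum(r[k] for r in rows)
    dist2 = 0.0
    for r in rows:
        row_mass = sum(r) / total          # weight of this document
        if row_mass == 0:
            continue                       # documents without keywords carry no mass
        diff = r[j] / col_j - r[k] / col_k  # difference of column profiles
        dist2 += diff * diff / row_mass
    return sqrt(dist2)


# Toy data (hypothetical): author keywords of four documents.
docs = [
    {"hate speech", "twitter"},
    {"hate speech", "twitter"},
    {"freedom of expression", "hate speech"},
    {"freedom of expression", "law"},
]
vocab, rows = indicator_matrix(docs)
```

In this toy example “twitter” lies closer to “hate speech” than to “law”, because the first two keywords co-occur in the same documents while the last two never do.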

Topic analysis

To gain a deeper understanding of the topics discussed in the published research on hate speech, we applied Latent Dirichlet Allocation, an automatic topic mining technique that uncovers hidden thematic subjects in document collections by revealing recurring clusters of co-occurring words. The two foundational probabilistic topic models are Probabilistic Latent Semantic Analysis (pLSA; Hofmann 1999) and Latent Dirichlet Allocation (Blei et al. 2003). pLSA is a probabilistic variant of the Latent Semantic Analysis introduced by Deerwester et al. (1990) to capture the semantic information embedded in large textual corpora without human supervision. In the pLSA approach, each word in a document is modelled as a sample from a mixture model, where the mixture components are multinomial random variables that can be viewed as representations of topics. The pLSA model allows multiple topics in each document, and the possible topic proportions are learned from the document collection. Blei et al. (2003) introduced LDA, which offers greater modelling flexibility than pLSA by assuming a complete probabilistic generative model in which each document is represented as a random mixture over latent topics and each topic is characterized by a distribution over words. LDA mitigates some shortcomings of the earlier topic models; specifically, it improves the way mixture models capture the exchangeability of both words and documents. The set of candidate topics is the same for all documents, and each document may contain words from several different topics. The generative two-stage process for each document in the corpus can be described as follows (Blei 2012).
In the first step, a distribution over topics is randomly chosen; in the second step, for each word in the document, a topic is randomly chosen from the distribution over topics and a word is randomly chosen from the corresponding distribution over the vocabulary. Following Blei ( 2012 ), it is possible to describe LDA more formally. Let us assume that we have a corpus defined as a collection of D documents, where each document is a sequence of N words, \(w_d=(w_{d,1},w_{d,2},\dots ,w_{d,N})\) , and each word is an item from a vocabulary indexed by \(\{1,\dots ,V\}\) . Furthermore, we assume that there are K latent topics, \(\beta _{1:K}\) , each defined as a distribution over the vocabulary. The generative process for LDA corresponds to the following joint distribution of the hidden and observed variables:

\[p\left( \beta _{1:K},\theta _{1:D},z_{1:D},w_{1:D}\right) =\prod _{i=1}^{K}p(\beta _i)\prod _{d=1}^{D}p(\theta _d)\left( \prod _{n=1}^{N}p(z_{d,n}|\theta _d)\,p(w_{d,n}|\beta _{1:K},z_{d,n})\right) \]

The topic proportions for the d th document are \(\theta _d\) , where \(\theta _{d,k}\) is the topic proportion for topic k in document d . The topic assignments for the d th document are \(z_d\) , where \(z_{d,n}\) is the topic assignment for the n th word in document d . Both the topic proportions and the topic distributions over the vocabulary follow a Dirichlet distribution. Since the posterior distribution, \(p \left( \beta _{1:K},\theta _{1:D},z_{1:D}|w_{1:D}\right) \) , is intractable for exact inference, a wide variety of approximate inference algorithms, such as sampling-based (Steyvers and Griffiths 2006 ) and variational (Blei et al. 2003 ) algorithms can be considered.
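The two-stage generative process can be simulated with a short Python sketch. It illustrates the model only, not the inference step used to fit it; the hyperparameter values `alpha` and `eta`, the corpus sizes and the function names are assumptions chosen for the example, and indices are 0-based rather than 1-based as in the text.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def generate_corpus(D, N, K, V, alpha=0.5, eta=0.5, seed=0):
    """Simulate the two-stage LDA generative process (illustrative values)."""
    rng = random.Random(seed)
    # beta[k]: distribution over the V vocabulary items for topic k
    beta = [sample_dirichlet([eta] * V, rng) for _ in range(K)]
    theta, z, w = [], [], []
    for _ in range(D):
        # Step 1: draw the topic proportions theta_d for document d
        theta_d = sample_dirichlet([alpha] * K, rng)
        # Step 2: for each word, draw a topic z_{d,n}, then a word w_{d,n}
        z_d = rng.choices(range(K), weights=theta_d, k=N)
        w_d = [rng.choices(range(V), weights=beta[k])[0] for k in z_d]
        theta.append(theta_d)
        z.append(z_d)
        w.append(w_d)
    return beta, theta, z, w

beta, theta, z, w = generate_corpus(D=3, N=5, K=2, V=6)
print(w)
```

Fitting LDA amounts to reversing this process: given only `w`, approximate the posterior over `beta`, `theta` and `z`.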

In our analysis, we implement LDA to model a corpus where each document consists of the publication title, its abstract and the keywords. To extract the relevant content and remove unwanted nuisance terms, we cleaned the text documents (tokenization; lowercase conversion; removal of special characters and stop-words) using the functions provided in the Text Analytics Toolbox of Matlab (MATLAB 2018 ). For the analyses, tokens with fewer than 10 occurrences in the corpus were pruned. The LDA analysis was performed with the fitlda Matlab routine available in the same Toolbox.
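The cleaning steps listed above can be sketched in Python as follows. This is not the Matlab pipeline used in the study: the stop-word list is a tiny illustrative placeholder, and `clean` and `prune_rare` are hypothetical helper names.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the toolbox's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to"}

def clean(text):
    """Lowercase, replace non-letters with spaces, tokenize, drop stop-words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def prune_rare(token_docs, min_count=10):
    """Remove tokens occurring fewer than min_count times in the whole corpus."""
    counts = Counter(tok for doc in token_docs for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in token_docs]

tokens = clean("The Hate-Speech of 2019!")
print(tokens)
```

With `min_count=10`, `prune_rare` reproduces the pruning threshold stated in the text.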

The study involved several analyses. First, we examined the yearly distribution of the literature; then we examined the conceptual structure of hate speech research. Next, we combined the results of topic and network analysis to highlight the emerging topics, their interactions over time, the most influential countries and the academic cooperation in the retrieved themes.

Research activity

The evolution over time of the number of published documents shows a remarkable growth, highlighting the increased global focus on online hate. See Fig.  1 , in which the number of publications per year is displayed.

Since 1992, two phases can be distinguished. During the first phase, from 1992 to 2010, publications increased slowly. The second phase, from 2010 to 2019, is instead characterised by a higher growth rate, reflecting growing interest in the subject. This is consistent with Price’s theory of scientific productivity on a given subject (Price 1963 ), according to which the development of science goes through three phases. In the preliminary phase, known as the precursor, some scholars start publishing research in a new field and only small increments in the scientific literature are recorded. In the second phase, the number of publications grows exponentially, since the expansion of the field attracts an increasing number of scientists while many aspects of the subject remain to be explored. Finally, in the third phase the body of knowledge consolidates and productivity stabilises; the growth curve therefore changes from exponential to logistic.

To verify the rapid increase in the trend of research literature related to online hate speech, we fit an exponential growth curve to the data (Price 1963 ). According to this model the annual rate of change is equal to \(20.5\%\) . Therefore, it can be said that hate speech research is in the second phase of development: an increasing amount of research is being published, but there is still room for improvement in many aspects.
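One common way to fit such a curve is ordinary least squares on the logarithm of the yearly counts; the annual rate of change is then \(e^{b}-1\), where b is the fitted slope. The sketch below assumes this log-linear estimator (the text does not state which estimator was used) and runs on synthetic counts, not on the study's data.

```python
import math

def fit_exponential(years, counts):
    """Least-squares fit of ln(count) = a + b * (year - first year)."""
    t = [y - years[0] for y in years]
    logs = [math.log(c) for c in counts]  # requires strictly positive counts
    n = len(t)
    t_mean = sum(t) / n
    l_mean = sum(logs) / n
    sxy = sum((ti - t_mean) * (li - l_mean) for ti, li in zip(t, logs))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = l_mean - b * t_mean
    return a, b

# Synthetic, exactly exponential series: 5 documents growing by 20% a year.
years = list(range(2000, 2010))
counts = [5 * 1.2 ** (y - 2000) for y in years]

a, b = fit_exponential(years, counts)
annual_rate = math.exp(b) - 1
print(round(annual_rate, 4))
```

Years with zero publications would need separate handling, since their logarithm is undefined.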

Fig. 1 Number of publications on hate speech per year: observed and expected distribution according to exponential growth

Conceptual structure of hate speech research

The conceptual structure of the research on hate speech is represented in Fig. 2 , where authors’ keywords with more than ten occurrences are plotted on the two-dimensional plane obtained through Multiple Correspondence Analysis (MCA).

Fig. 2 Conceptual map of hate speech research

The two dimensions of the map that emerged from the MCA can be interpreted as follows. The first, horizontal dimension separates keywords emphasizing social networks and communities and hate speech linked to religion (on the right) from those related to the political aspects of the hate speech phenomenon (on the left). This dimension explains \(39.61\%\) of the variability. The second, vertical dimension relates to machine learning techniques and accounts for \(13.55\%\) of the overall inertia. Fig. 2 also displays the results of a hierarchical cluster analysis carried out with the average linkage method on the factorial coordinates obtained with the MCA. The conceptual map shows three clusters, which represent the three major areas of research on the diffusion of hate. The blue cluster contains words such as “abusive language”, “cyberbullying”, “deep learning”, “text classification”, “sentiment analysis”, “social network”, which point to the problem of automatic detection. The green cluster contains words such as “human rights”, “democracy”, “incitement”, “blasphemy”, which point to the legal sphere. The red cluster, the largest, contains words such as “social network analysis”, “privacy”, “youtube”, “facebook”, “online hate”, “cyberhate”, which point to the social sphere and social media.

Research topics in hate speech literature

Topic modelling, performed via LDA, provides additional insight by structuring online hate research into different topics. The LDA algorithm requires the number of topics to be fixed in advance, which implies that researchers should have some idea of the plausible range of latent features in the text. In fact, no single value is appropriate in all situations and for all datasets (Barua et al. 2014 ). Increasing the number of topics produces finer-grained aggregations, while smaller values produce coarser-grained, more general topics. On the other hand, a higher number of topics may cause the progressive intrusion of non-relevant terms among the most probable words, affecting the semantic coherence of the retrieved themes.

In our study, we ran the LDA analysis with the number of topics set, in turn, to 10, 12 and 14; in the end, we adopted the twelve-topic solution, which offers a fair compromise between topic interpretability and level of detail.

Topic interpretation

In LDA, the topics are assumed to be latent variables, which need to be meaningfully interpreted. This is usually achieved by examining the top keywords in each topic (Steyvers and Griffiths 2006 ). Figures 3 and 4 show the most relevant words for each topic, where relevance is measured by normalizing the posterior word probabilities per topic by the geometric mean of the posterior probabilities for the word across all topics. Topics are sorted according to their estimated probability of being observed in the entire data set. The most relevant terms, along with their relevance measures, are provided in Section 2.1 of the Supplementary Material.
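A minimal Python sketch of this relevance idea divides each topic-word probability by the geometric mean of that word's probabilities across topics. The plain ratio used here is an assumption; the exact scaling applied by the toolbox may differ. The matrix `phi` is a made-up example with two topics and three words.

```python
import math

def relevance(topic_word_probs):
    """Divide each p(word | topic) by the word's geometric mean over topics.

    topic_word_probs[k][v] must be strictly positive for every topic k and word v.
    """
    K = len(topic_word_probs)
    V = len(topic_word_probs[0])
    geo_mean = [
        math.exp(sum(math.log(topic_word_probs[k][v]) for k in range(K)) / K)
        for v in range(V)
    ]
    return [
        [topic_word_probs[k][v] / geo_mean[v] for v in range(V)]
        for k in range(K)
    ]

# Made-up posterior word probabilities: 2 topics, 3 words.
phi = [
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]
scores = relevance(phi)
top_word_topic0 = max(range(3), key=lambda v: scores[0][v])
print(scores[0], top_word_topic0)
```

A word that is equally probable in every topic, like the middle word here, gets a score of 1 and does not stand out in any topic, which is the point of the normalisation.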

The twelve identified topics reveal important areas of online hate research in the past thirty years. They can be briefly described as dealing with the following themes.

Fig. 3 Word clouds for Topics 1–6

Fig. 4 Word clouds for Topics 7–12

Topic 1 includes words such as “speech”, “hate”, “free”, “harm”, “freedom”, suggesting a broad discussion on the debate “hate speech” versus “free speech”. The constitutional right of freedom of expression is also considered in Topic 3, mainly characterised by words like “freedom”, “law/laws”, “rights”, “expression”, “constitutional”. Topic 2 is strictly linked with the political aspects of the hate speech phenomenon and contains terms such as “political/politics/politician”, “discourse”, “democracy”, “elections”. Topic 7 covers hate speech related to religion and extremism and is described by words such as “terrorism/terrorist”, “religion/religious”, “muslim/muslims”, “violence”, “global”, “war”, “extremism/extremist”.

The online aspect of hate is clearly highlighted in Topics 4, 6, 8 and 10. In particular, Topic 4 is related to research on social networks and communities, especially Facebook and YouTube, which are large social media providers whose internal mechanisms allow users to report hate speech. Studies in Topic 8 refer to Twitter and rely mainly on content and sentiment analysis. Topic 6 covers the aspect of information diffusion on the Internet, including terms like “internet”, “information”, “media”. Finally, Topic 10 considers the problem of online deviant behaviour and cyberbullying; its relevant words are “online”, “exposure”, “crime/crimes”, “behavior”, “cyberbullying”, “cyberhate”.

Interestingly, distinct hate speech targets are revealed by Topics 5 and 11. Topic 5 deals with racism, as indicated by the words “racism”, “racist”, “race”, “racial”, “white/whiteness”, “black”; among its top-scoring words we also find some terms associated with feminism (i.e. “feminist”, “women”, “misogyny”). Topic 11 refers to hate speech linked to gender and sexual identity, since its most relevant words are “sexual/sexuality”, “gender”, “gay”, “transgender”, “lesbian”, “lgbt/lgbtq”.

Finally, Topics 9 and 12 deal with methodological aspects of hate speech analysis. In particular, Topic 9 refers to the analysis of discourse and language, as suggested by its most relevant words (“comments”, “discourse”, “language”, “emotions”, “linguistic”, “corpus”). Topic 12, on the other hand, concerns machine learning techniques: its top-scoring terms are “learning”, “detection”, “classification”, “machine”, “text”.

Topic temporal evolution

To further analyse each of the topics, we focus on their dynamic changes over the years. As previously pointed out, the LDA algorithm estimates each topic as a mixture of words, but also models each document as a mixture of topics. Therefore, each document can exhibit multiple topics on the basis of the words used. The estimated probabilities of observing each topic in each document can be used to assign one or more topics to the documents of the analysed bibliographic dataset. Specifically, in this study, we assigned to each document the topics with the three highest document-topic probabilities, provided those probabilities are greater than 0.2.
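The assignment rule can be written in a few lines of Python. The function name `assign_topics` and the probability vectors are illustrative; topics are numbered from 1 to match the text.

```python
def assign_topics(doc_topic_probs, top_n=3, threshold=0.2):
    """Return up to top_n topic numbers (1-based) whose probability exceeds threshold."""
    ranked = sorted(
        enumerate(doc_topic_probs, start=1),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [topic for topic, prob in ranked[:top_n] if prob > threshold]

# Made-up document-topic probabilities for two documents.
print(assign_topics([0.05, 0.45, 0.25, 0.21, 0.04]))
print(assign_topics([0.60, 0.15, 0.15, 0.10]))
```

Under this rule a document receives between zero and three topics, depending on how concentrated its topic mixture is.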

The temporal evolution of the scientific productivity for each topic can be captured through Fig.  5 , where the exponential growth model has been fitted considering the number of documents published since 2000.

Fig. 5 Number of publications for each topic: observed and expected distributions according to exponential growth

The temporal trend of most topics agrees with exponential growth. However, for Topic 1 and Topic 3, the number of publications in the last period falls below the number expected under the exponential law that Price ( 1963 ) associates with the second phase in the development of scientific research on a given subject. Since Topics 1 and 3 are associated with generic themes of online hate speech, the smaller number of related publications in the last period reflects the research community’s interest in identifying new research fronts. Conversely, the number of published documents for Topic 8 shows a sudden rise starting from 2018. The same holds, to a lesser extent, for Topic 9, where the observed productivity rises above the expected one.

The most notable case in Fig. 5 is Topic 12, which deals with the application of machine learning algorithms, a new and dominant theme, to online hate speech. In the last two years, the publication volume for this topic has grown explosively. A more contained rise in the number of publications is recorded for Topics 10 and 11, whose contents are associated with the specific themes of cyberhate and gendered hate.

Overall, these temporal patterns seem to suggest a shift in hate speech literature from more generic themes, about the debate on freedom of speech versus hate speech, towards research more focused on the technical aspects of hate speech detection and methodologies and techniques included in the fields of linguistics, statistics and machine learning. The appearance and development of new fields of interest and innovative ideas in the research activity on hate speech is confirmed by the heatmaps provided in the Supplementary Material, which show the number of documents, by years, assigned to the identified topics.

Topic interactions

After exploring the features of the identified topics in online hate speech research, we quantitatively model their interactions and build a topic relation network. In particular, given that each document has been assigned to multiple topics, we can exploit the topic co-occurrence matrix to understand the connections among the different themes developed in this field of research.

In Fig. 6 , we display the topic network. In the graph, the nodes are coloured according to their degree and the edges are weighted according to the co-occurrences: the wider the line, the stronger the connection. Moreover, the edges whose weight is lower than the average co-occurrence number have been removed. Details on the connections are provided in Section 2.3 of the Supplementary Material.
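The construction of the weighted network can be sketched in Python as follows. The topic assignments are made up, and the average is taken over the topic pairs that co-occur at least once, which is an assumption: the text does not say whether pairs that never co-occur enter the average.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(assignments):
    """Count, for each unordered topic pair, the documents assigned to both topics."""
    counts = Counter()
    for topics in assignments:
        for a, b in combinations(sorted(set(topics)), 2):
            counts[(a, b)] += 1
    return counts

def prune_below_average(counts):
    """Keep only the edges whose weight is at least the mean edge weight."""
    mean = sum(counts.values()) / len(counts)
    return {edge: weight for edge, weight in counts.items() if weight >= mean}

# Made-up topic assignments for five documents.
assignments = [[1, 3], [1, 3], [1, 3], [1, 4], [4, 8, 12]]
counts = cooccurrence(assignments)
network = prune_below_average(counts)
print(network)
```

In this toy example the mean edge weight is 1.4, so only the edge between Topics 1 and 3 survives the pruning.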

Fig. 6 Topic co-occurrence network for the publications on hate speech from 1992 to 2019

The analysis of the links discloses interesting relations between research fronts, which underline the multi-disciplinary nature of online hate research and the cross-fertilisation between different disciplines and research subjects. The strongest connection is between Topics 1 and 3, dealing respectively with the broad debate of hate speech versus free speech and the constitutional right of freedom of expression. This relation reflects the fact that both topics concern the boundaries of freedom of expression; accordingly, it is unsurprising that these two themes overlap among documents. The network visualization shows that Topic 1, being a general theme, is connected with the majority of the nodes. The other most connected nodes are the topic dealing with questions of free speech (Topic 3) and the topic on the activities of hateful users on online social media (Topic 4). An interesting clique shows that Topics 4, 8 and 12 are also closely connected. The interactions within this subgroup of nodes reveal the relation between computer science and social science disciplines.

The importance of the retrieved topics in the network of connections can be inferred considering the degree centrality measures shown in Fig.  7 .

Fig. 7 Node centrality measures

In addition, the closeness and betweenness centrality scores, also displayed in Fig. 7 , help to characterize quantitatively the structure of the topic co-occurrence network. Specifically, closeness centrality is based on the mean distance from a vertex to the other vertices (Zhang and Luo 2017 ), whereas the betweenness centrality of a node measures the extent to which the node lies on paths that connect arbitrary pairs of nodes in the network (Brandes 2001 ); in other words, betweenness quantifies the degree to which a node serves as a bridge. Thematic topics such as “social networks and communities” (Topic 4), “religion and extremism” (Topic 7) and “cyberhate” (Topic 10) rank first. These findings suggest that those research areas are more effective and accessible in the network and form the densest bridges with other nodes.
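Both measures can be computed for a small unweighted, undirected graph with the Python sketch below. Closeness is taken here as the reciprocal of the mean shortest-path distance, a common convention that is assumed rather than taken from the text, and betweenness follows the accumulation scheme of Brandes (2001), ignoring edge weights. The star-shaped graph and its node names are hypothetical.

```python
from collections import deque

def closeness(graph):
    """Reciprocal of the mean shortest-path distance to the other reachable nodes."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(graph):
    """Brandes-style betweenness for an unweighted, undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in graph}  # predecessors on shortest paths from s
        sigma = dict.fromkeys(graph, 0) # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical star graph: one hub linked to three leaves.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
close = closeness(star)
between = betweenness(star)
print(close, between)
```

In the star graph the hub lies on every shortest path between leaves, so it has the highest closeness and the only non-zero betweenness, which is the sense in which a node acts as a bridge.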

We also built the topic co-occurrence networks for three different stages in the historical development of online hate speech research, as displayed in Fig. 8 . The initial development stage covers 1992–2009 and accounts for 227 publications; it was followed by the rapid development stage (2010–2015), when research results emerged rapidly, with more than 450 scientific contributions published. Finally, in the last period (2016–2019), more than 300 papers were published every year. As before, the connections in the network maps represent the interactions between the different research fields and, in each network, the edges whose weight is lower than the average co-occurrence number for the corresponding temporal interval have been suppressed.

Fig. 8 Topic co-occurrence networks

It can be seen that as new topics emerge, the network structure becomes richer in terms of connections, showing the most important footprints of the related research activities. Through a qualitative analysis of Fig.  8 , we observe that with advances in computer technology, especially developments in data or text mining and information retrieval, research on online hate speech based on computer sciences continues to receive more and more attention. In fact, from the analysis of links in the co-occurrence topic network, it was possible to identify, in the last period, interesting relations especially between Topics 8 and 12.

Overall, in the last thirty years, topics related to online hate research tend to arrange into three main clusters (Fig. 9 ). The fast greedy algorithm implemented in the R package igraph (Csardi and Nepusz 2006 ) was used to group the topics. The first cluster includes six topics: it brings together the basic themes of hate speech, covered by Topics 1, 2 and 3, as well as online speech designed to promote hate on the basis of race (Topic 5) and religion and extremism (Topic 7). Topic 9, associated with the analysis of discourse and language, also belongs to this group. In the smallest group, cyberhate and gendered online hate (Topics 10 and 11) are clustered together. Finally, the last group comprises Topics 4, 6, 8 and 12; publications in this cluster deal with machine learning techniques and hateful content on online social media.
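Fast greedy community detection works by repeatedly merging the pair of communities that most increases modularity. The Python sketch below is a naive version of that idea which recomputes modularity from scratch at every step; it is not igraph's implementation, which uses efficient data structures. The graph, two triangles joined by a single edge, is a hypothetical example.

```python
from itertools import combinations

def modularity(edges, communities):
    """Modularity Q of a partition of a weighted, undirected graph."""
    m = sum(edges.values())
    label = {node: i for i, com in enumerate(communities) for node in com}
    inside = [0.0] * len(communities)   # edge weight inside each community
    degree = [0.0] * len(communities)   # summed node degrees per community
    for (u, v), w in edges.items():
        degree[label[u]] += w
        degree[label[v]] += w
        if label[u] == label[v]:
            inside[label[u]] += w
    return sum(i / m - (d / (2 * m)) ** 2 for i, d in zip(inside, degree))

def greedy_communities(edges):
    """Merge the best pair of communities until modularity stops improving."""
    nodes = sorted({n for edge in edges for n in edge})
    communities = [{n} for n in nodes]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        candidates = []
        for i, j in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (i, j)]
            merged.append(communities[i] | communities[j])
            candidates.append((modularity(edges, merged), merged))
        q, merged = max(candidates, key=lambda cand: cand[0])
        if q <= best_q:
            break
        best_q, communities = q, merged
    return communities, best_q

# Hypothetical graph: two triangles joined by the single edge (3, 4).
edges = {(1, 2): 1, (1, 3): 1, (2, 3): 1, (4, 5): 1, (4, 6): 1, (5, 6): 1, (3, 4): 1}
communities, q = greedy_communities(edges)
print(sorted(sorted(c) for c in communities), round(q, 4))
```

On a topic network, the edge weights would be the co-occurrence counts and the resulting communities would play the role of the topic clusters.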

Fig. 9 Topic clusters

Research activity in the identified topics

Influential countries in the identified topics.

Table 1 summarises the top-ten countries’ share of publications in the study of online hate speech for each of the identified clusters. For the themes of the first group (Topics 1, 2, 3, 5, 7 and 9), the first 11 publishing countries are displayed, owing to tied scores. Not surprisingly, the Anglo-Saxon countries are heavily involved in research dealing with the general debate of “hate speech” versus “freedom of expression”. In fact, in these countries, especially in the United States, the constitutional protection of freedom of speech is vigorously defended. Conversely, other countries, mainly European ones, prohibit certain forms of speech and even the expression of certain opinions, such as those that incite hatred or publicly deny crimes of genocide (e.g., the Holocaust) or war crimes.

The United States and the United Kingdom hold the largest share of publications in the other two domains, suggesting that both countries have had a pioneering role and the strongest impact in the new strands of research focused on machine learning algorithms and text classification as a viable means of identifying hate speech, as well as on investigating cyberbullying and gendered hate behaviours. Interestingly, research on the automatic identification and classification of hateful language on social media using machine learning methods also emerges as an important component of Italian, Indian and Spanish research activity on hate speech. Finally, for the third cluster (Topics 10 and 11), a non-negligible number of publications on themes linked with cyberbullying and gendered hate originated from Finland, which occupies the third position in the corresponding ranking, followed by Italy and South Africa.

Country cooperation in the identified topics

The preliminary analysis in the previous subsection depicts the overall landscape of countries’ contributions to the studies on online hate speech. Moving forward, by taking into account authors’ affiliations, it is possible to analyse the level of cooperation between countries. International research collaboration is valuable because it allows scholars to share information and exploit their respective academic strengths (Ebadi and Schiffauerova 2015 ), and it is deemed the hallmark of contemporary scientific production. To highlight country research collaboration in the online hate speech field, we constructed the country cooperation network, displayed in the Supplementary Material. In what follows, we consider cooperation with respect to each of the clusters identified in the “Topic interactions” section. The characteristics of international cooperation in each domain of online hate research can be inferred from the network maps shown in Figs. 10 , 11 and 12 . The United States is the major partner in international cooperation in the field of online hate speech in all identified topic clusters. Academic cooperative connections among countries generating research on Topics 1, 2, 3, 5, 7 and 9 primarily originate from the United States, United Kingdom, Germany, Brazil, Sweden and Spain. The top-ranked countries by centrality for the cluster that embraces Topics 4, 6, 8 and 12 are the United States, United Kingdom, China, Italy, Spain, Germany and Brazil. Finally, for the research related to the remaining Topics 10 and 11, we find a wider scientific collaboration, mainly among the United States, Spain, South Korea, the Czech Republic and Germany.

Fig. 10 Country cooperation network for Topics 1, 2, 3, 5, 7 and 9

Fig. 11 Country cooperation network for Topics 4, 6, 8 and 12

Fig. 12 Country cooperation network for Topics 10 and 11

In recent years, the dynamics and usefulness of social media communications have been seriously affected by hate speech (Arango et al. 2019 ), which has become a major concern, attracting worldwide interest. The attention paid to online hate speech by the scientific research community and by policy makers is a reaction to the spread of hate speech, in all its various forms, on the many social media and other online platforms, as well as to the pressing need to guarantee non-discriminatory access to digital spaces.

Motivated by these concerns, this paper has presented a bibliometric study of the world’s research activity on online hate speech, performed with the aim of providing an overview of the extent of published research in this field, assessing the research output and suggesting potential, fruitful, future directions.

Beyond the identification and mapping of traditional bibliometric indicators, we focused on the contemporary structure of the field, which is composed of a variety of themes that researchers have engaged with over the years. Through topic modelling analysis, implemented via the LDA algorithm, the main research topics of online hate have been identified and grouped into categories. In contrast to previous research, designed as qualitative literature reviews, this study provides a broader, quantitative analysis of publications on online hate speech. In this respect, it should be noted that although topic models do not offer new insights into the main areas of the research, they make it possible, to our knowledge for the first time, to discover latent and potentially useful content and to outline the structure and relationships underlying the data with quantitative methods.

As pointed out by different authors (see, among others, Yau et al. 2014 ), the combination of topic modelling algorithms and bibliometrics allows the researcher to characterise the retrieved topics with a number of topic-based analytic indicators, as well as to investigate their significance and dynamic evolution and to model their quantitative relations.

Our analysis has systematically sorted the relevant international studies, producing a visual analysis of 1614 documents indexed in the Scopus database, and generated a large amount of empirical data and information.

The following conclusions can be drawn. The volume of academic papers published in a representative sample, from 1992 to 2019, displays a significant increase after 2010; thus, in the evolution of online hate speech research, it has been possible to identify an initial development stage (1992–2010) followed by a rapid development stage (2011–2019). Many countries are regularly involved in publishing in this research field, even if the majority of studies have been conducted in the context of high-income western countries; in this respect, the research strength of the United States and the United Kingdom is notable. Also, the empirical findings provide evidence of the capability of countries to build significant research cooperation. The topic analysis retrieves twelve recurring topics, which can be grouped into three clusters. Specifically, the contemporary structure of online hate literature can be viewed as composed of a group dealing with basic themes of hate speech, a collection of documents that focuses on automatic hate speech detection and classification by machine learning strategies and, finally, a third core that focuses on the specific themes of gendered hate speech and cyberbullying. Once the groups have been created and identified, the next step is to understand the evolutionary process of each of them over the years. Looking chronologically at the development of online hate research, we find evidence of an overall shift from generic and knowledge-based themes towards approaches that address the challenges of automatic detection of hate speech in text and of hate speech addressed to specific targets. The combination of topic modelling algorithms with tools of network analysis made it possible to clarify the relations among topics and made the interdisciplinary nature of the field clearly visible.
The confluence of online hate studies into automatic hate speech detection and classification approaches stresses how the problem of hate diffusion should be studied not only from the social point of view but also from the point of view of computer science. In our opinion, the main reason driving the shift from conceptually oriented studies to more practically oriented ones is the growing demand for statistical methodologies that automatically detect hate speech and make it possible to build effective counter-measures. It is worth noting, however, that the observed shift does not remove the subjective nature of hate speech denotation, given that automatic detection and classification methods ultimately need to rely on a specific definition of what communication should be interpreted as offensive, dangerous and conveying hate. Moreover, supervised techniques require an annotated set of social media content to train the algorithms to better detect and score online comments, but the interpretation of hatefulness varies significantly among individual raters (Salminen et al. 2019 ). There is also evidence that people from different countries perceive the hatefulness of the same online comments differently (Salminen et al. 2018 ). The authors of these studies suggest that online hate should be defined as a subjective experience rather than as an average score that is uniform across all users, and that research should concentrate on how to incorporate user-level features when scoring and automating the processing of online hate.

Another interesting field worthy of investigation concerns the producers of online hate speech. While the online behaviour of organized hate groups has been extensively analysed, only recently has attention focused on the behaviour of individuals who produce hate speech on mainstream platforms (see Siegel 2020 , and references therein). Finally, future studies should continue to investigate tools for effectively combating online hate speech. Since content deletion or user suspension may be criticised as censorship and overblocking, one alternative strategy is to oppose hate content with counter-narratives (Gagliardone et al. 2015 ). Therefore, a promising line of research is the exploration of effective counterspeech techniques, which can vary according to hate speech targets, online platforms and haters’ characteristics.

We think that this work, based on solid data and computational analyses, might offer a clearer vision to researchers involved in this field, providing evidence of the current research frontiers and of the challenges expected in the future, and highlighting the connections and implications of the research in several domains.

Alghamdi, R., & Alfalqi, K. (2015). A survey of topic modeling in text mining. International Journal of Advanced Computer Science and Applications , 6 (1), 147–153. https://doi.org/10.14569/IJACSA.2015.060121 .

Arango, A., Pérez, J., & Poblete, B. (2019). Hate speech detection is not as easy as you may think: A closer look at model validation. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, pp. 45–54, https://doi.org/10.1145/3331184.3331262 .

Aria, M., & Cuccurullo, C. (2017). Bibliometrix: an R-tool for comprehensive science mapping analysis. Journal of Informetrics , 11 (4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007 .

Barua, A., Thomas, S., & Hassan, A. (2014). What are developers talking about? An analysis of topics and trends in stack overflow. Empirical Software Engineering , 19 (3), 619–654. https://doi.org/10.1007/s10664-012-9231-y .

Beausoleil, L. E. (2019). Free, hateful, and posted: rethinking first amendment protection of hate speech in a social media world. Boston College Law Review , 60 (7), 2101–2144.

Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM , 55 (4), 77–84. https://doi.org/10.1145/2133806.2133826 .

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research , 3 (1), 993–1022. https://doi.org/10.1162/jmlr.2003.3.4-5.993 .

Brandes, U. (2001). A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology, 25(2), 163–177. https://doi.org/10.1080/0022250X.2001.9990249

Brettschneider, C. (2013). Value democracy as the basis for viewpoint neutrality: A theory of free speech and its implications for the state speech and limited public forum doctrines. Northwestern University Law Review, 107, 603–646.

Cohen-Almagor, R. (2016). Hate and racist speech in the United States: A critique. Philosophy and Public Issues, 6(1), 77–123.

Cohen-Almagor, R. (2019). Racism and hate speech: A critique of Scanlon’s contractual theory. First Amendment Studies, 53(1–2), 41–66. https://doi.org/10.1080/21689725.2019.1601579

Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695. http://igraph.org

Deerwester, S., Dumais, S., Furnas, G., Landauer, T., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391–407.

Ebadi, A., & Schiffauerova, A. (2015). How to receive more funding for your research? Get connected to the right people. PLoS ONE. https://doi.org/10.1371/journal.pone.0133061

Fortuna, P., & Nunes, S. (2018). A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR). https://doi.org/10.1145/3232676

Gagliardone, I., Gal, D., Alves, T., & Martinez, G. (2015). Countering Online Hate Speech. Paris: UNESCO Publishing.

Greenacre, M., & Blasius, J. (2006). Multiple Correspondence Analysis and Related Methods. New York: Chapman and Hall/CRC. https://doi.org/10.1201/9781420011319

Greene, A. R., & Simpson, R. M. (2017). Tolerating hate in the name of democracy. The Modern Law Review, 80(4), 746–765. https://doi.org/10.1111/1468-2230.12283

Hofmann, T. (1999). Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57, https://doi.org/10.1145/312624.312649

MacAvaney, S., Yao, H. R., Yang, E., Russell, K., Goharian, N., & Frieder, O. (2019). Hate speech detection: Challenges and solutions. PLoS ONE. https://doi.org/10.1371/journal.pone.0221152

MATLAB (2018). Version 9.5.0.944444 (R2018b). The MathWorks Inc., Natick, Massachusetts.

McPhee, C., Santonen, T., Shah, A., & Nazari, A. (2017). Reflecting on 10 years of the TIM review. Technology Innovation Management Review, 7(7), 5–20. https://doi.org/10.22215/timreview/1087

Price, D. J. (1963). Little Science, Big Science. New York: Columbia University Press.

Salminen, J., Veronesi, F., Almerekhi, H., Jung, S., & Jansen, B. J. (2018). Online Hate Interpretation Varies by Country, But More by Individual: A Statistical Analysis Using Crowdsourced Ratings. In: 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), pp. 88–94, https://doi.org/10.1109/SNAMS.2018.8554954

Salminen, J., Almerekhi, H., Kamel, A. M., Jung, S., & Jansen, B. J. (2019). Online Hate Ratings Vary by Extremes: A Statistical Analysis. In: CHIIR ’19: Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, Association for Computing Machinery, New York, NY, USA, pp. 213–217, https://doi.org/10.1145/3295750.3298954

Schmidt, A., & Wiegand, M. (2017). A survey on hate speech detection using natural language processing. In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, Association for Computational Linguistics, pp. 1–10, https://doi.org/10.18653/v1/W17-1101

Sellars, A. F. (2016). Defining Hate Speech. Berkman Klein Center Research Publication No. 2016-20; Boston University School of Law, Public Law Research Paper No. 16-48. Available at SSRN: https://doi.org/10.2139/ssrn.2882244

Siegel, A. A. (2020). Online hate speech. In J. Tucker & N. Persily (Eds.), Social Media and Democracy: The State of the Field. Cambridge: Cambridge University Press.

Steyvers, M., & Griffiths, T. (2006). Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), Latent Semantic Analysis: A Road to Meaning. New Jersey: Lawrence Erlbaum.

Strossen, N. (2016). Freedom of speech and equality: Do we have to choose? Journal of Law and Policy, 25(1), 185–225.

Strossen, N. (2018). HATE: Why We Should Resist it With Free Speech, Not Censorship (Inalienable Rights). New York: Oxford University Press.

Suominen, A., & Toivanen, H. (2016). Map of science with topic modeling: Comparison of unsupervised learning and human-assigned subject classification. Journal of the Association for Information Science and Technology. https://doi.org/10.1002/asi.23596

Waqas, A., Salminen, J., Jung, S.-G., Almerekhi, H., & Jansen, B. (2019). Mapping online hate: A scientometric analysis on research trends and hotspots in research on online hate. PLoS ONE. https://doi.org/10.1371/journal.pone.0222194

Yau, C., Porter, A., Newman, N., & Suominen, A. (2014). Clustering scientific documents with topic modeling. Scientometrics. https://doi.org/10.1007/s11192-014-1321-8

Zhang, J., & Luo, Y. (2017). Degree Centrality, Betweenness Centrality, and Closeness Centrality in Social Network. In: Proceedings of the 2017 2nd International Conference on Modelling, Simulation and Applied Mathematics (MSAM2017), Advances in Intelligent Systems Research, pp. 300–303, https://doi.org/10.2991/msam-17.2017.68

Acknowledgements

Open access funding provided by Università degli Studi G. D'Annunzio Chieti Pescara within the CRUI-CARE Agreement. We are grateful to the reviewers for their useful comments and suggestions which have significantly improved the quality of the paper.

Author information

Authors and affiliations.

Department of Neuroscience, Imaging and Clinical Sciences, University G. d’Annunzio of Chieti–Pescara, Chieti, Italy

Alice Tontodimamma

Department of Economics, University G. d’Annunzio of Chieti–Pescara, Pescara, Italy

Eugenia Nissi

Department of Legal and Social Sciences, University G. d’Annunzio of Chieti–Pescara, Pescara, Italy

Annalina Sarra & Lara Fontanella

Corresponding author

Correspondence to Annalina Sarra .

Ethics declarations

Conflict of interest.

The authors declare that they have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 508 KB)

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Tontodimamma, A., Nissi, E., Sarra, A. et al. Thirty years of research into hate speech: topics of interest and their evolution. Scientometrics 126 , 157–179 (2021). https://doi.org/10.1007/s11192-020-03737-6

Received : 28 January 2020

Published : 30 October 2020

Issue Date : January 2021

DOI : https://doi.org/10.1007/s11192-020-03737-6

  • Online hate speech
  • Bibliometrics analysis
  • Topic models
  • Latent Dirichlet allocation

How to Do Research for a Speech

Last Updated: August 28, 2022

This article was co-authored by Emily Listmann, MA . Emily Listmann is a Private Tutor and Life Coach in Santa Cruz, California. In 2018, she founded Mindful & Well, a natural healing and wellness coaching service. She has worked as a Social Studies Teacher, Curriculum Coordinator, and an SAT Prep Teacher. She received her MA in Education from the Stanford Graduate School of Education in 2014. Emily also received her Wellness Coach Certificate from Cornell University and completed the Mindfulness Training by Mindful Schools. This article has been viewed 36,297 times.

Whether you've been tapped to give a speech at the annual sales convention or asked to prepare a speech for a research paper, you'll need to research your speech to make sure it's as solid as possible. You’ll want to be sure not only that the points you make are backed up by solid evidence, but also that your research is presented to listeners in a way that’s easily digestible. We'll help you research your speech topic and draw from a variety of sources so you can write a speech that will leave your audience wanting to hear more.

Preparing to Do Research

Step 1 Clarify your topic....

  • For instance, if you’ve been assigned a broad topic, like “contemporary politics,” be sure you narrow it down to address a particular trend that most interests you or that you think is most relevant, such as “Immigration in 21st-Century U.S. Politics.”
  • If you know enough about your subject, do a rough outline in advance, so that you can identify each specific point or subtopic that you’d like to address and investigate. Your research may reshape your outline a bit, but it will give you a strong idea about where to start looking. [2]
  • Select a point of view and form a thesis before you start your research. This way you have a steady base to start looking for arguments that align with your point of view.

Step 2 Identify your purpose.

  • If you’re doing an informative presentation about the hospitality industry, you’ll want to focus on gathering related statistical data. If you’re trying to make an emotional appeal for the audience to support your cancer research charity, you may want to look for personal stories of survivors whose lives were saved by new treatments.
  • If you’re making a particular argument, you should look specifically for information that supports your angle rather than general data about a topic.

Step 3 Keep the timing in mind.

  • Besides knowing your audience's level of knowledge on your topic, it also helps to know whether they have something in common, such as all being high school students or all being biologists. This can help determine the type of research that will be most effective for your speech.
  • If your audience members all share a particular interest or work in the same field, consider looking for examples and evidence that would speak especially to them. If they’re high school kids, tie in a pop-culture reference they would know; if they’re biologists, see if you can use biological examples to prove your point.

Step 5 Anticipate audience questions.

  • For instance, if you’re giving a speech on plastic recycling, you might think that an audience would ask questions like, “How much plastic is recycled each year?” or “How much energy does it take to recycle a plastic bottle versus creating a new one?” If you don’t know the answer to these questions, be sure to look them up during your research.

Step 6 Understand the context of your topic.

  • For example, if you’re giving a speech on online dating apps, you could look into the history of personal ads and other dating services or frame your topic in terms of the entire app industry.

Conducting Good Research

Step 1 Keep your research organized.

  • Keep all your research in one place, like a notepad or a word document.
  • Create a labeling system for your research. If you made an outline in advance, you can simply enter information under the relevant bullet points. Otherwise, make sure that you devise headings to categorize your notes. You can do this at the end by going back and highlighting everything that speaks to a specific point in a particular color. [7]
  • Be sure to record the bibliographic information for all your sources. You want to make sure you know where you got each bit of evidence, who wrote it, and when it was published so that you can confirm and acknowledge it in your speech.

Step 2 Go to a library to find credible sources on your topic.

  • In each case, you’ll want to evaluate the credibility of the source by considering the credentials of the author(s) and their potential biases. If an author has no professional expertise or first-hand knowledge of the topic, they aren’t the most authoritative source for your research. If the author has a known political or personal agenda, it’s important to acknowledge that the information they present may reflect that bias.
  • Keep in mind that, in order to be persuasive, you will need to cite your sources within your speech, either by including them in a handout or digital presentation slides or by saying them aloud. Either way, you want to be sure that your sources are adding to your credibility, not detracting from it. [9]
  • To evaluate the credibility of an online source, consider its accountability. That is, determine who’s responsible for the content and whose interests the content is designed to serve. If the content is reliable, there will generally be plenty of information available about the organization and the author(s) behind it so that you can assess their credentials and potential biases.
  • In general, government sites (identified by .gov and .mil) and educational sites (.edu) are more credible than others because their content is regulated and formally restricted.
  • Use only the official websites of those organizations directly related to your topic whenever possible. For example, if you’re giving a talk on emissions regulations, do research at the Environmental Protection Agency’s official website.
  • Using a specialized search engine like Google Scholar can help eliminate obviously sketchy sources from your search results.

Step 4 Use a mix of primary and secondary sources.

  • Interview or speak with an expert in the subject if you can. That way you can quote them in your speech and make it more impactful.

Finding Strong Sources to Support Your Arguments

Step 1 Identify quotations from authoritative authors.

  • Most of your speech should be your words and ideas, so use quotes sparingly.
  • Be sure to mention the source and set up the context from which a quote is drawn as you present it. If it’s from a top scientist, you’ll want to note that your quote is from the most recent journal article published by a Nobel-winning biochemist.
  • For instance, if you’re presenting on gender and children’s mental health, you could give the statistics that show how surprisingly prevalent anxiety and depression are (affecting one in eight children) and then how even more surprisingly common they are for girls (affecting one in three).
  • It also helps if you find a way to connect your stats to concrete ideas. For example, if you’re talking about the distance between the Earth and the Moon, make the 252,088 miles more tangible by explaining that the distance is equal to 32 times the diameter of the Earth, or help people visualize the distance by using a basketball and a tennis ball that are 23 feet and 7 inches apart from one another.
  • Don’t overload your audience with numbers; you want them to walk away with a few memorable stats rather than a jumble of numbers.
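The scale-model comparison in the Earth and Moon example above comes down to two divisions, and it can help to check the arithmetic before quoting it aloud. The following is a minimal sketch, assuming an Earth diameter of about 7,918 miles and a basketball diameter of about 9.4 inches; both constants and all function names are illustrative assumptions, not figures from the article.

```python
# Assumed reference figures (not from the article): Earth's mean diameter
# and the diameter of a full-size basketball.
EARTH_DIAMETER_MILES = 7_918
BASKETBALL_DIAMETER_INCHES = 9.4


def distance_in_earth_diameters(distance_miles: float) -> float:
    """Express a distance as a multiple of Earth's diameter."""
    return distance_miles / EARTH_DIAMETER_MILES


def scaled_gap_inches(distance_miles: float, model_diameter_inches: float) -> float:
    """Gap between the two models when Earth is shrunk to the model's diameter."""
    return distance_in_earth_diameters(distance_miles) * model_diameter_inches


def feet_and_inches(total_inches: float) -> tuple[int, float]:
    """Split a length in inches into whole feet and leftover inches."""
    feet, inches = divmod(total_inches, 12)
    return int(feet), round(inches, 1)


ratio = distance_in_earth_diameters(252_088)
gap = scaled_gap_inches(252_088, BASKETBALL_DIAMETER_INCHES)
print(f"{ratio:.1f} Earth diameters")
print(feet_and_inches(gap))
```

The exact gap depends on which Earth-Moon distance and which ball size you plug in, so rerun the numbers with the figures your own source gives before presenting them.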

Step 3 Track down striking and relevant visual aids.

  • If you’re giving a speech about a battle during WWII, give your audience photos that illustrate your descriptions. If you’re presenting on global warming, offer a graph that shows world climate changes since they’ve been recorded.

Step 4 Decide which evidence is the most compelling.

  • Also be sure that you’re including a variety of types of evidence.

Expert Q&A

  • If the source for any information or data that comes up in your research cannot be verified, it’s best to leave it out of your speech.
  • Make sure your sources are current. Giving a speech about accident rates using 10-year-old data will make you look foolish.
  • Always acknowledge your sources for research. If you do not, it constitutes plagiarism. [15]

  • http://writingcenter.unc.edu/handouts/speeches/
  • http://www.write-out-loud.com/howtoresearch.html
  • http://sixminutes.dlugan.com/toastmasters-speech-7-research-your-topic/
  • https://www.boundless.com/communications/textbooks/boundless-communications-textbook/preparing-the-speech-a-process-outline-3/steps-of-preparing-a-speech-26/topic-research-gathering-materials-and-evidence-120-10680/
  • http://2012books.lardbucket.org/books/a-primer-on-communication-studies/s09-02-researching-and-supporting-you.html
  • http://2012books.lardbucket.org/books/a-primer-on-communication-studies/s09-preparing-a-speech.html#jones_1.0-ch09_s02_s02_s05_t01

Speech-to-Speech Translation

35 papers with code • 3 benchmarks • 5 datasets

Speech-to-speech translation (S2ST) consists of translating speech in one language into speech in another language. This can be done with a text-centric cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems. Recently, work on S2ST that does not rely on an intermediate text representation has been emerging.
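The cascaded approach described above can be sketched as three functions chained through intermediate text. This is a minimal illustration, not any particular system's API: the type aliases, `cascade_s2st`, and the toy components are all hypothetical stand-ins for real ASR, MT, and TTS models.

```python
from typing import Callable

# Hypothetical interfaces for the three sub-systems of a cascade.
Recognizer = Callable[[bytes], str]    # ASR: source audio -> source text
Translator = Callable[[str], str]      # MT: source text -> target text
Synthesizer = Callable[[str], bytes]   # TTS: target text -> target audio


def cascade_s2st(audio: bytes, asr: Recognizer, mt: Translator, tts: Synthesizer) -> bytes:
    """Translate speech to speech by passing through intermediate text."""
    source_text = asr(audio)
    target_text = mt(source_text)
    return tts(target_text)


# Toy components so the sketch runs end to end; real models would replace these.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_mt(text: str) -> str:
    return {"hola": "hello"}.get(text, text)


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


translated_audio = cascade_s2st(b"hola", toy_asr, toy_mt, toy_tts)
```

Because every stage communicates only through text, errors made by the recognizer propagate into translation and synthesis, which is one motivation for the direct approaches listed on this page.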

Benchmarks

  • Hokkien→En (Two-pass decoding)
  • GenTranslateV2
  • SeamlessM4T Large

Most implemented papers

Robust Speech Recognition via Large-Scale Weak Supervision

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

AudioLM: a Language Modeling Approach to Audio Generation

We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs).

Direct speech-to-speech translation with a sequence-to-sequence model

We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.

Towards Automatic Face-to-Face Translation

Rudrabha/LipGAN • ACM Multimedia 2019

As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization.

ESPnet-ST: All-in-One Speech Translation Toolkit

We present ESPnet-ST, which is designed for the quick development of speech-to-speech translation systems in a single framework.

Direct speech-to-speech translation with discrete units

When target text transcripts are available, we design a joint speech and text training framework that enables the model to generate dual modality output (speech and text) simultaneously in the same inference pass.

Multimodal and Multilingual Embeddings for Large-Scale Speech Mining

Using a similarity metric in that multimodal embedding space, we perform mining of audio in German, French, Spanish and English from Librivox against billions of sentences from Common Crawl.

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation

google-research-datasets/cvss • LREC 2022

In addition, CVSS provides normalized translation text which matches the pronunciation in the translation speech.

My Speech Class

Public Speaking Tips & Speech Topics

717 Good Research Paper Topics

Jim Peterson has over 20 years of experience in speech writing. He has written over 300 free speech topic ideas and how-to guides for all kinds of public speaking and speech writing assignments at My Speech Class.

Some examples of common research paper styles include:

  • Argumentative Research Papers
  • Persuasive Research Papers
  • Education Research Papers
  • Analytical Research Papers
  • Informative Research Papers

Your research essay topic may also need to be related to the specific class you are taking. For example, an economics class may require a business research paper, while a class on human behavior may call for a psychology research paper.

The requirements for your paper will vary depending on whether you are a high school, college, or postgraduate student. In high school, you may be able to choose an easy topic and cite five or six sources you found on Google or Yahoo!, but college term papers require more in-depth research from reliable sources, such as scholarly books and peer-reviewed journals.

Do you need some help with brainstorming for topics? Some common research paper topics include abortion, birth control, child abuse, gun control, history, climate change, social media, AI, global warming, health, science, and technology. 

But we have many more!

On this page, we have hundreds of good research paper topics across a wide range of subject fields. Each of these topics could be used “as is” to write your paper, or as a starting point to develop your own topic ideas.

How to Choose Your Research Paper Topic

The first step to developing an interesting research paper is choosing a good topic. Finding a topic can be difficult, especially if you don’t know where to start.

Finding the Right Research Paper Topic

If you are in a class that allows you to choose your own term paper topic, there are some important areas to consider before you begin your project:

Your Level of Interest: Research papers are time-consuming; you will be spending countless hours researching the topic and related topics, developing several primary and secondary sources, and putting everything together into a paper that is coherent and accomplishes your objectives. If you do not choose a topic you are passionate about, the process will be far more tedious, and the finished product may suffer as a result.

Your Level of Experience: Being interested in a topic is great, but it is even more helpful if you already know something about it. If you can find a topic that you already have some personal and/or professional experience with, it will vastly reduce the amount of research needed and make the whole process much easier.

Available Information on the Topic: Be sure to choose a topic that is not only interesting but also one that has numerous sources available from which to compile your research. A researchable topic with several potential sources gives you access to the level of information you need to become an authority on the subject.

Your Audience: An interesting topic to you may not necessarily be interesting to your professor or whoever is grading your research paper. Before you begin, consider the level of interest of the person(s) who will be reading it. If you are writing a persuasive or argumentative essay, also consider their point of view on the subject matter.

As you begin researching your topic, you may want to revise your thesis statement based on new information you have learned. This is perfectly fine; just have fun and pursue the truth, wherever it leads. If you find that you are not having fun during the research phase, you may want to reconsider the topic you have chosen.

Writing the research paper is going to be very time-consuming, so it’s important to select a topic that will sustain your interest for the duration of the project. It is good to select a topic that is relevant to your life, since you are going to spend a long time researching and writing about it. Perhaps you are considering starting your own business or pursuing a career in politics. Look through the suggested research paper topics and find one in a category that you can relate to easily. Finding a topic that you have some personal interest in will make the arduous task a lot easier, and the project will have better results because of your vested interest.

Our List of Research Topics and Issues

Affirmative action, health, pharmacy, medical treatments, interpersonal communication, marketing and advertising, Barack Obama, discrimination, Bill Clinton, Hillary Clinton, computer crimes and security, cosmetic surgery, controversial, criminal justice, Donald Trump, easy/simple, environment, family violence, foreign policy, gambling and lotteries, the LGBTQ community, generational conflict, gun control, hate crimes, immigration, Middle East, maternity/paternity leave, natural disasters, police work, population explosion, pornography, prisons and prisoners, prostitution, Ronald Reagan, student loan debt, teen issues, women, mothers, what, why, and how, relationships.

We compiled an exhaustive list of topics that would make excellent research papers. The topics are specifically organized to help you find one that will work for your project. Broad topics are headed, and then below them are narrowed topics, all to help you find an area to focus on. The way we have organized the topics for research papers can save you lots of time getting prepared to write your research paper.

We have topics in categories that cover such areas as education, environmental sciences, communication and languages, current events, politics, business, criminal justice, art, psychology, and economics, to name just a few. Simply get started by choosing the category that interests you and perusing the topics listed in that category, and you’ll be well on your way to constructing an excellent research paper.

Be sure to check other topic ideas: persuasive speech topics, argumentative speech topics, policy speech topics. We also have some sample outlines and essay templates.

  • What limits are responsible?
  • What limits are realistic?
  • Protection of abortion doctors, pregnant women, and abortion clinics vs. the right to protest
  • Partial birth abortion
  • Scientific evidence vs. definition of viability
  • Stem cell research
  • Unborn victims of violence
  • Relative equality has been achieved vs. serious inequities continue
  • Can racial balance in business, education, and the military be achieved without policies that promote affirmative action?
  • Reverse discrimination
  • NOW, National Organization for Women
  • No government support vs. fairness to parents who pay twice for education
  • Separation of church and state vs. religion’s contribution to the public good
  • Placement by age vs. placement by academic ability
  • Mainstreaming students with disabilities vs. special classrooms for their special needs
  • Required standardized tests for advancement vs. course requirements only
  • National standardized tests vs. local control of education
  • Discrimination in education
  • Multicultural/bilingual education vs. traditional basics
  • Teacher competency tests vs. degree requirements only
  • Teacher’s needs/demands vs. teaching as a service profession
  • Policing schools
  • School’s responsibility vs. parental responsibility for school violence
  • Drug and alcohol abuse, pregnancy, suicide
  • Zero tolerance toward violence vs. toughness with flexibility
  • Permit corporal punishment
  • Exams often do little more than measure a person’s ability to take exams. Should exams be outlawed in favor of another form of assessment?
  • Should teens in the U.S. adopt the British custom of taking a “gap year” between high school and college?
  • In some European schools, fewer than 10% of students get “As”. Is there grade inflation in the U.S.? Why so many “As” for Americans?
  • Education and funding
  • Grade inflation
  • No Child Left Behind Act: Is it working?
  • Home schooling
  • Standardized tests
  • Are children smarter (or more socialized) because of the Internet?
  • Should the federal government be allowed to regulate information on the internet?
  • How has the music industry been affected by the internet and digital downloading?
  • How does a search engine work?
  • What are the effects of prolonged steroid use on the human body?
  • What are the benefits and hazards of medical marijuana?
  • How does tobacco use affect the human body?
  • Do the benefits of vaccination outweigh the risks?
  • What are some common sleep disorders and how are they treated?
  • What are the risks of artificial tanning or prolonged exposure to the sun?
  • Should thin people have to pay Medicare and other health costs for the health problems of obese people? Should obese people have higher premiums?
  • Low carbohydrate vs. low fat diets
  • Benefits of weight training vs. aerobics
  • How much weekly exercise is needed to achieve lasting health benefits?
  • Health websites give too much information
  • Psychological disorders, such as cutting and self-harm, eating disorders, Autism, Tourette Syndrome, ADHD, ADD, Asperger Syndrome
  • Are we taking it too far by blaming fast food restaurants for obesity? When is it individual responsibility and when is it appropriate to place blame?
  • Should companies allow employees to exercise on work time?
  • Steroids, antibiotics, sprays: Are food manufacturers killing us?
  • Alternative medicine
  • Alzheimer’s disease
  • Causes of eating disorders, society’s portrayal of women
  • Eating disorders statistics
  • Down’s syndrome
  • Birth control
  • Dietary supplements
  • Exercise and fitness
  • Heart disease
  • In vitro fertilization
  • Attention deficit disorder
  • Investigate the history and authenticity of ADHD and ADD.
  • Organic foods
  • Prescription drugs
  • Vegetarianism
  • Learning disabilities
  • Schizophrenia
  • Coma recovery: techniques, successes, new strategies.
  • What are the primary types of cancer, and in what ways are they related?
  • Investigate the success ratio of holistic and non-medical cancer treatments.
  • Is Alzheimer’s inevitable? Examine theories regarding its prevention.
  • What forms of physical degeneracy are seen as linked to aging?
  • Investigate the connections between emotional stability and physical well-being, and provide evidence as to how the two may be related.
  • Investigate differences in rates of injury recovery and overcoming illness based on cultural parameters.
  • Examine the modern history of viral epidemics, researching what is known about the emergence of deadly viruses.
  • Examine how congenital heart disease may be treated, and how it differs from other forms of heart disease.
  • Is occasional depression a natural state to an extent, and is society too eager to treat this as a disorder?
  • Investigate sociopathy: determine its biological and psychological roots, typical patterns, and potential for treatment.
  • How are compulsive behaviors determined as such? Explore examples of anal retention and expulsion, OCD, etc., as offering accepted criteria.
  • Research and analyze the nature of codependency as both a normal state of relations and as an unhealthy extreme.
  • Investigate the history and practice of electroshock, analyzing how and why this extreme treatment came to be widely used.
  • Hoarding: symptoms and treatments, causes, types of hoarding
  • Limits on extraordinary, costly treatments vs. doing everything possible
  • Nutritional/alternative therapy vs. mainstream medical treatment: insurance coverage for alternative treatment?
  • Government grants for alternative treatment research?
  • Health superiority of alternative treatments?
  • Assisted suicide vs. preservation of life
  • Governmental insurance requirements
  • Should there be a national database to track controlled substances (e.g., oxycodone), or should it be a state issue?
  • Should parents avoid vaccinating their children?
  • Decline of communication due to technology
  • Online social networks and their influence
  • Impact of texting and cell phones
  • How do men and women communicate differently using body language, and why does it matter (in dating, the workplace, and social circles)?
  • Limitations of the media
  • Marketing to children
  • Sexual innuendos in marketing
  • Global marketing trends
  • Should certain kinds of ads (alcohol, cigarettes, prescription meds, etc.) be banned in the interest of health, morality, or annoyance?
  • Children’s programming and advertising
  • Most controversial political ads
  • Media response and public outcry to political ads
  • Campaign funds and their relation to political advertising
  • Domestic policy
  • Separation of church and state
  • Judicial nominations and the makeup of the Supreme Court
  • Congressional opposition to presidential nominees/filibusters
  • Affirmative action
  • Erosion of civil liberties vs. protection against terrorism
  • Patriot Act One and Two
  • Most developed nations have universal health coverage. Why doesn’t the U.S., the wealthiest nation, have it?
  • Tax cut as economic stimulation
  • Needs of the states vs. needs of the individuals
  • Budget deficits and deficit spending
  • Rich vs. poor
  • Protection of victims vs. freedom of speech/rights of the accused
  • How to improve race relations
  • Women still earn only 75 cents for every $1 a man earns. Explain why.
  • Discrimination in the workplace: analyzing issues for today’s corporations.
  • Gender discrimination
  • Interracial marriage
  • Should government impose restrictions on what kinds of foods can be served in school cafeterias?
  • Pros and cons of school uniforms.
  • Do children learn better in boys-only and girls-only schools?
  • Charter schools
  • Prayer in schools
  • Rights of the individual vs. community safety (or campus safety)
  • Funding for research
  • U.S. obligation to third world countries
  • Manufacturing of generic drugs vs. U.S. pharmaceutical companies
  • How contagious diseases “jump” from animal hosts to human
  • What treatments are available to people infected with HIV and are they effective?
  • Right to privacy of a child with AIDS vs. safety of other children
  • Limits for campus safety vs. personal freedom
  • Implications on violence and crime
  • Issues with binge drinking
  • Should the U.S. lower the drinking age to 18?
  • Leniency because of condition vs. community safety
  • Revoking a driver’s license vs. being able to attend classes and work
  • Age discrimination of violators
  • Animal rights vs. medical research
  • Should it be illegal to use animals for sports and entertainment?
  • Humane treatment of animals vs. factory farms
  • Animal welfare in slaughterhouses
  • Animal protection vs. business, employment interests
  • School prestige vs. academic standards
  • Should shoe companies be able to give away free shoes and equipment to high school athletes?
  • Should college athletes be paid?
  • Doping in sports
  • What are the effects on children whose parents push them in sports?
  • Steroids: Should they be legalized?
  • Title IX: Has it helped women’s sports? Has it harmed men’s sports?
  • Social effects of team sports
  • Needed in public school library/curriculum?
  • Needed in entertainment industry?
  • Needed on the Internet?
  • Should parents censor textbooks and other literature for children in schools?
  • Parental filters on the Internet. Does censorship actually increase curiosity and use of pornography?
  • How is internet censorship used in China and around the world?
  • How has United States censorship changed over the decades?
  • Democratic kingmaker, influence on political succession
  • Impact of global initiative
  • Influence on fundraising
  • Influence as Secretary of State
  • Foreign policies
  • Influence on women
  • ACT or SAT score requirements
  • Promotional techniques, such as 1st time scholarships
  • 4-year vs. 2-year colleges
  • College admission policies
  • College tuition planning
  • Distance education
  • Diploma mills
  • Online porn vs. freedom of speech
  • Stalking, invasion of privacy vs. reasonable access
  • Hacking crimes–workable solutions?
  • What are the latest ways to steal identity and money?
  • From where does spam email come and can we stop it?
  • How do computer viruses spread and in what ways do they affect computers?
  • Cyber security
  • Securing Internet commerce: is it possible in today’s arms race between hackers and evolving technology?
  • Is downloading of media (music, videos, software) infringing on the rights of media producers and causing economic hardships on media creators?
  • Should media producers prosecute students and individuals that they suspect of downloading copyrighted materials?
  • Programs such as Spotify and Pandora
  • Copyright Law
  • Age limitations on surgery
  • Addiction to surgery
  • Demand for beauty by society
  • The dangers of breast implants for teenagers
  • The cost of cosmetic surgery
  • Plastic surgery
  • Weight loss surgery
  • Are surgeons “scissor happy,” and are surgeries widely unnecessary?
  • Negative texting, instant messaging, email
  • Is cyber-bullying as bad as face-to-face?
  • Kinds of punishment for cyber-bullying
  • Media response
  • Should the state or federal government put laws into place to prevent bullying?
  • Is homosexuality a choice, or are people born gay?
  • Evolution vs. Creationism.
  • Should “under God” remain in the Pledge of Allegiance?
  • Is healthcare a right or a privilege?
  • Fossil fuels vs. alternative energy.
  • Transgender bathroom policies.
  • Capitalism vs. socialism.
  • Should parents be allowed to spank their children?
  • Should sanctuary cities lose their federal funding?
  • The pros and cons of gun control.
  • Should the U.S. continue drone strikes in foreign countries?
  • Was the U.S. justified in going to war with Iraq?
  • How to solve the Israeli-Palestinian conflict.
  • The pros and cons of animal testing.
  • Do pro athletes have the right to sit during the national anthem?
  • Incarceration rates in the U.S.
  • Technology and the criminal justice system.
  • Police brutality and minorities.
  • Should the police wear body cameras?
  • In what circumstances should the death penalty be allowed?
  • Should we have stiffer penalties for drunk driving?
  • Should those who text while driving be put in jail?
  • White-collar crime and punishment.
  • Criminalizing protests and activism.
  • The rise of wrongful convictions.
  • Mutual consent vs. exploitation
  • Campuses with “no touch” policy
  • Drugs associated with date rape
  • Violence and Rape
  • Government support vs. parental financing
  • Benefits vs. harmful effects
  • Trump’s unconventional presidential campaign.
  • The psychology of Donald Trump.
  • Who is behind Trump’s political rise?
  • Donald Trump and evangelical voters.
  • Donald Trump the businessman.
  • Trump’s war on the press (aka “fake news”).
  • The Trump Organization and conflicts of interest.
  • The border wall and illegal immigration policy.
  • Global warming and climate change policy.
  • Trump-Russia collusion.
  • The rapid rise of “The Resistance.”
  • Trump’s legislative agenda; e.g., health care, tax policy, deregulation, etc.
  • Trump’s “America First” trade and foreign policy.
  • The case for (or against) the Trump presidency.
  • Punishment vs. treatment
  • Family reactions
  • Social acceptance
  • Community safety vs. legalization
  • United States military involvement in Colombian drug trade?
  • Drug legalization
  • Abstinence programs: Do they work?
  • Should the federal government legalize the use of marijuana?
  • What is the true key to happiness?
  • What is the cause of America’s obesity crisis?
  • Why sleep is necessary.
  • Are plastic bottles really bad for you?
  • How to encourage people to recycle more.
  • How 3D printers benefit everyone.
  • How do GPS systems on smartphones work?
  • How have oil spills impacted the environment?
  • Verbal vs. nonverbal communication.
  • The accuracy of lie detector tests.
  • How Bill Gates and Steve Jobs changed the world.
  • The pros and cons of hitchhiking.
  • The PC vs. the Mac.
  • What causes tornadoes?
  • Pollution, air, and water
  • Endangered species
  • What are the risks of climate change and global warming?
  • Rain forests
  • Alternative energy
  • Alternative fuel/hybrid vehicles
  • Conservation
  • Deforestation
  • Greenhouse effect
  • Marine pollution
  • How have oil spills affected the planet and what steps are being taken to prevent them?
  • Sustainability of buildings
  • Recycling programs
  • Cost of “green” programs
  • Wind turbines
  • Landfill issues
  • Renewable fuels
  • Radioactive waste disposal
  • Soil pollution
  • Wildlife conservation: what efforts are being taken to protect endangered wildlife?
  • Excessive burden on industries?
  • Drilling for oil in Alaska’s ANWR (Arctic National Wildlife Refuge)
  • Gasoline consumption vs. SUV’s popularity
  • Wildlife protection vs. rights of developers
  • Clean air and water standards–weakened vs. strengthened
  • What are the dangers of scuba diving and underwater exploration?
  • Should the use of coal be subjected to stricter environmental regulations than other fuels?
  • Is global warming a hoax? Is it being exaggerated?
  • How much is too much noise? What, if anything, should we do to curb it?
  • Protecting victims vs. rights of the accused
  • Women who kill abusive husbands vs. punishment for murder
  • Marital rape?
  • How to protect children vs. respect for parental rights
  • Children who kill abusive parents
  • Child abuse–workable solutions?
  • Child abuse
  • Domestic abuse
  • Organic farming vs. mainline use of chemical sprays
  • How to best protect the environment; conservation
  • Family vs. corporate farms
  • Food production costs
  • Interventionism?
  • Third world debt and World Bank/International Monetary Fund
  • Military support vs. economic development of third world countries
  • Human rights violations
  • European Union in competition with the U.S.
  • Unilateralism
  • Relevance of the United Nations
  • Neocon role in foreign policy
  • Christian right influence on foreign policy
  • Pentagon vs. State Department
  • Nation building as a policy
  • Arms control
  • Obama’s National Strategy for Counterterrorism
  • Control of al Qaeda
  • Drawdown of U.S. Armed Forces in the Middle East
  • Cats vs. dogs: which makes the better pet?
  • My pet can live forever: why I love animal clones.
  • According to my social media profile, my life is perfect.
  • Football vs. baseball: which sport is America’s favorite pastime?
  • Starbucks vs. Caribou: whose coffee is better?
  • What does your dog really think of you?
  • Why millennials deserve lower pay.
  • What makes people end up with so many mismatched socks?
  • How to become a research paper master.
  • How reading Tuesdays with Morrie can make you wiser.
  • Easy way to earn revenues vs. social damage
  • Individual freedom vs. social damage
  • Do lotteries actually benefit education or is it a scam?
  • Can gamblers ever acquire a statistical advantage over the house in casino games?
  • Should there be a constitutional amendment that allows gays and lesbians to legally marry?
  • Adoption rights?
  • Need special rights for protection?
  • College campus response
  • Gay, lesbian, bisexual, or transgender
  • Gay parenting
  • Elderly to share in the tax burden vs. government support of elderly
  • Future of social security
  • Job discrimination
  • Child rearing
  • Employment issues
  • Generational differences
  • Community and police safety vs. unrestricted right to bear arms
  • NRA (National Rifle Association)
  • 2nd Amendment
  • Do states that allow citizens to carry guns have higher or lower crime rates?
  • Community safety vs. freedom of Speech
  • Punishment inequities
  • Persecution of alternative lifestyles
  • Church Arson: Hate crime?
  • Prevention of hazing
  • Greek organizations and rituals of hazing
  • Statistics of death or injury due to Hazing
  • High Schools and Hazing
  • What happened during the Salem witch trials?
  • How did trains and railroads change life in America?
  • What may have occurred during the Roswell UFO incident of 1947?
  • What Olympic events were practiced in ancient Greece?
  • How did Cleopatra come to power in Egypt? What did she accomplish during her reign?
  • What are the origins of the conflict in Darfur?
  • What was the women’s suffrage movement and how did it change America?
  • How was the assassination of Abraham Lincoln plotted and executed?
  • How did Cold War tension affect the US and the world?
  • What happened to the lost settlers at Roanoke?
  • How did Julius Caesar affect Rome?
  • How did the Freedom Riders change society?
  • What was the code of Bushido and how did it affect samurai warriors?
  • How did Joan of Arc change history?
  • What dangers and hardships did Lewis and Clark face when exploring the Midwest?
  • How are the Great Depression and the Great Recession similar and different?
  • What was the Manhattan Project and what impact did it have on the world?
  • Why did Martin Luther protest against the Catholic Church?
  • How did the Roman Empire fall?
  • How did the black plague affect Europe?
  • How did Genghis Khan conquer Persia?
  • How did journalists influence US war efforts in Vietnam?
  • Who was Vlad the Impaler and what is his connection to Count Dracula?
  • Who was a greater inventor, Leonardo da Vinci or Thomas Edison?
  • What was the role of African Americans during the Revolutionary War?
  • What was Britain’s view of India during British rule?
  • What were the factors in the China-Tibet conflict?
  • Research and analyze the emergence of the Catholic Church as a political force following the collapse of the Roman Empire.
  • Investigate Dr. Eileen Power’s claim that the Roman Empire was lost primarily due to an inability to perceive itself as subject to the change inevitable to all governments, or her “force of nature” theory.
  • Explore and discuss the actual cooperation occurring through the centuries of Barbarian conquest of Rome.
  • Examine the differences and similarities between Western and Eastern concepts and practices of kingship.
  • Investigate and explain the trajectory of Alexander the Great’s empire, with minimal emphasis on personal leadership.
  • To what extent did commerce first link Eastern and Western cultures, and how did this influence early international relations?
  • Research and analyze how Japan moved from a feudalistic to a modern state, and how geographic isolation played a role in the process.
  • Analyze the process and effects of Romanization on the Celtic people of ancient England: benefits, conflicts, influences.
  • Overview of British dominance of Ireland, Wales, and Scotland: How was this justified in each case, and what motivated the attempts over centuries of rebellion and failure?
  • Investigate the known consequences of Gutenberg’s printing press within the first 30 years of its invention, and only in regard to the interaction between European nations.
  • Identify and analyze the point at which the Reformation became fused with European politics and nationalist agendas.
  • To what extent did Henry VIII promote the Reformation, despite his vigorous persecution of heretics in England?
  • Trace and discuss the uses of papal power as a military and political device in the 14th and 15th centuries.
  • Research the city/state of Florence from the 13th to the 16th centuries, discussing how and why it evolved as so fiercely republican.
  • Compare and contrast the Russian Czarism of Peter, Elizabeth, and Catherine with the monarchies of England and France in the 18th and 19th centuries.
  • Investigate the enormous significance of Eastern Orthodoxy as the dominant faith in Russia, and its meaning and influence in an empire populated by a minimal aristocracy and predominant serfdom.
  • To what extent did Philip II’s religious convictions shape European policy and conflict in the 16th century?
  • Trace the path leading to the convocation of the Estates in France in the late 18th century, leading to the Revolution. Assess political and social errors responsible.
  • What eventually ended serfdom in Russia, and why were numerous attempts to end it by the Czars in power consistently unsuccessful?
  • Research and report on how England was transformed in the 19th century by the industrial revolution and the advent of the railroad.
  • Compare and contrast the consequences of the industrial revolutions in England and America in terms of urbanization.
  • What were the circumstances leading to World War I, and how might the war have been averted?
  • Assess the Cold War of the 20th century in an historical context: can any parallels be made between this conflict and other ongoing tensions between major powers in earlier centuries?
  • Analyze Roosevelt’s decisions in implementing the New Deal, beginning with the closing of the banks. Suggest alternative strategies, or reinforce the rationale of the actions.
  • What architectural marvels were found in Tenochtitlan, capital of the Aztec Empire?
  • What was the cultural significance of the first moon landing?
  • Food programs
  • Welfare reform
  • Governmental supplementation
  • Homeless: urban restrictions vs. needs of the destitute
  • Workable solutions?
  • Realistic limits vs. openness toward people in need
  • English as official language vs. respect for diversity
  • Should illegal immigrants be made legal citizens?
  • Access to public school and public programs for Illegal Aliens
  • Policing borders–workable solutions?
  • Employment and/or taxation for Illegal Aliens
  • International trade
  • Democratization
  • “Shock and awe”
  • U.S. occupation vs. liberation
  • Iraqi run vs. U.S. puppet state
  • Oil and gas prices: control of resources
  • Effective self-government
  • War on Terrorism
  • Is America winning or losing the War? What is the measurement of success? Have the benefits outweighed the costs?
  • Parental leave for both parents
  • FMLA (Family Medical Leave Act)
  • Bonding time
  • Preemptive strike policy
  • Precision weapons
  • Intelligence reliability
  • Afghanistan – a success or stalemate
  • Should the U.S. have mandatory military conscription? For whom?
  • Governmental support
  • Preparedness
  • School emergency plans
  • Community warning systems
  • Damage costs
  • U.S. presidential elections should be decided by the popular vote, rather than the Electoral College.
  • The minimum wage should be increased to provide a “livable” wage for working families.
  • There should be stiffer penalties for those who commit animal cruelty.
  • School vouchers increase competition and create better quality schools.
  • The corporate tax rate should be lowered to create more jobs.
  • Social Security should be privatized.
  • Human torture should be banned in all circumstances.
  • Affirmative action is still needed to ensure racial and gender equality.
  • The U.S. dollar should go back on the gold standard.
  • Euthanasia and assisted suicide should be outlawed.
  • Police brutality vs. dangers that police face
  • Racially motivated brutality?
  • Politician’s right to privacy vs. the public’s right to know
  • Amount of money going into presidential campaigns
  • Views on abortion, gay marriage, and other controversial topics
  • Political debates throughout history
  • Third-party candidates at presidential debates
  • Rights of religious citizens vs. freedom from imposition (e.g. prayer in schools)
  • Religious motivation for political involvement vs. cultural pluralism
  • Christian Right’s influence on foreign policy
  • How serious? Causes? Workable solutions?
  • Funding abortion as a form of birth control in third world countries?
  • What would happen globally if the demand for natural resources is greater than the supply?
  • Limitation of social deterioration vs. freedom of speech
  • Definition of Pornography
  • Child Pornography
  • Building prisons vs. alternative sentencing
  • Adjusted sentencing for lesser crimes
  • Community service
  • Diversion Programs for inmates
  • How does the prison population in America compare to other nations?
  • Prostitution laws in the US and abroad
  • Benefits and drawbacks to legalizing prostitution
  • Psychological effect on prostitutes and former prostitutes
  • Sex slavery, buying and selling
  • Should the government be allowed to wiretap without permission?
  • What limitations, if any, should be applied to the paparazzi?
  • What medical information should be confidential? Who, if anybody, should have access to medical records?
  • Does the public have a right to know about a public figure’s private life?
  • Privacy rights
  • Do harsher punishments mean fewer convictions?
  • Date rape: consent vs. exploitation
  • Drugs: Rohypnol, GHB, ketamine
  • Legalization of Date Rape Drugs
  • Recently, a 17-year-old boy was sentenced to 10 years in prison for having consensual oral sex with a 15-year-old girl. Are statutory rape laws patronizing to girls and discriminatory to boys?
  • Acquaintance rape
  • Is there one true religion?
  • Freedom of religion
  • Offer distinct reasons why the Bible should be studied as literature, removed from religious significance.
  • From Hollywood to the White House: the political rise of Ronald Reagan.
  • The Great Communicator: how Reagan captured the hearts of Americans.
  • 1981 assassination attempt: bullet wound leaves Reagan inches away from death.
  • Reagan appoints the first female Supreme Court justice.
  • The PATCO breakup and decline of the labor unions.
  • Tax cuts and “Reaganomics.”
  • The “Iran-Contra” scandal.
  • Reagan, Gorbachev, and the end of the Cold War.
  • The final act: Reagan’s Alzheimer’s disease diagnosis and long goodbye.
  • How has airport security intensified since September 11th, 2001?
  • Identity theft
  • Homeland Security: Are we safer since the creation of this department?
  • Should the government use invasive pat-downs and body scans to ensure passenger safety or are there better methods?
  • Is arming pilots a good idea?
  • What responsibilities do secret service agents have?
  • Student loan scams
  • How to avoid student loan debt
  • Managing student loan debt
  • Driverless cars and the future of transportation.
  • Breaking the glass ceiling: the impact of the women’s rights movement.
  • How seniors contribute to societal well-being.
  • How disabled individuals are viewed by society.
  • The modern-day civil rights movement.
  • Has technology made us more detached from society?
  • The role of religion in society.
  • In today’s society, are we better off or worse off than previous generations?
  • Popular music and its impact on the culture.
  • Class and geographical segregation.
  • The differences between life in the city, suburbs, and/or rural areas.
  • Should parents be able to create designer babies?
  • Should microchips be implanted inside humans for better tracking and security?
  • Will smart watches eventually replace cell phones?
  • The pros and cons of being a global citizen.
  • Progressive vs. flat tax
  • Excessive taxes vs. worthwhile programs
  • Is text messaging contributing to teen illiteracy?
  • How eating disorders impact teens.
  • Tablets vs. textbooks.
  • Do standardized tests improve teen education?
  • Are violent video games contributing to juvenile delinquency?
  • Is English literature relevant for today’s teens?
  • Should the HPV vaccine be required for teen girls?
  • Do teachers inflate grades so students can pass?
  • Should advertisers be allowed to target teens?
  • How to encourage teens to stop smoking.
  • The causes and effects of teen alcohol and drug abuse.
  • How to prevent teen pregnancy.
  • Osama Bin Laden
  • World Trade Center and Pentagon bombings
  • September 11, 2001
  • War on terrorism
  • Afghanistan
  • Bioterrorism
  • Al Qaeda: Has U.S. policy actually spread terrorism rather than contained it? Will it get better or worse? Why and how?
  • Can terrorism ever be justified?
  • What kind of person becomes a suicide bomber?
  • What were the circumstances surrounding the death of Osama Bin Laden?
  • Has the Patriot Act prevented or stopped terrorist acts in America?
  • How is text messaging affecting teen literacy?
  • Cell Phones: How have they changed us socially?
  • Does the Information Age mean we are losing important historical information?
  • Where did hip-hop music originate?
  • A day in the life of a Buddhist monk.
  • How does the brain store and retrieve memories?
  • What life is like inside an ant colony.
  • The case for and against the existence of UFOs.
  • Can virtual reality adequately substitute for actual reality?
  • Are dreams hidden messages or just hot air?
  • Why do people collect the most ridiculous things?
  • When is it time to get out of an abusive relationship?
  • The art of pretending to care.
  • Public attitudes toward veterans
  • Health issues caused by service time
  • Organizations for veterans
  • Governmental support for veterans
  • What programs are available to help war veterans get back into society?
  • Iraq War Vets: Are they being cheated on medical benefits?
  • Is there a glass ceiling?
  • Obstacles to women running for political office?
  • Should women be priests, pastors, ministers, and rabbis?
  • What differences, if any, are there in children who are raised by stay-at-home moms and working moms? Does society today still discriminate against working mothers who wish to have flexible work schedules?
  • Should stay-at-home moms get a salary from the government?
  • Why do we sleep?
  • How do GPS systems work?
  • Who was the first person to reach the North Pole?
  • Did anybody ever escape Alcatraz?
  • What was life like for a gladiator?
  • Are there any effective means of repelling insects?
  • How is bulletproof clothing made?
  • How was the skateboard invented and how has it changed over the years?
  • What is life like inside of a beehive?
  • Where did hip hop originate and who were its founders?
  • What makes the platypus a unique and interesting mammal?
  • What is daily life like for a Buddhist monk?
  • How did gunpowder change warfare?
  • How were cats and dogs domesticated and for what purposes?
  • What do historians know about ninjas?
  • Are humans still evolving?
  • What is the curse of the pharaohs?
  • Why was Socrates executed?
  • How did ancient sailors navigate the globe?
  • How are black holes formed?
  • How do submarines work?
  • Do lie detector tests accurately determine truthful statements?
  • How does a hybrid car save energy?
  • What ingredients can be found in a hotdog?
  • How does a shark hunt?
  • How does the human brain store and retrieve memories?
  • How does stealth technology shield aircraft from radar?
  • What causes tornados?
  • How does night vision work?
  • What causes desert mirages, and how do they affect wanderers?
  • What are sinkholes, and how are they formed?
  • What are the major theories explaining the disappearance of the dinosaurs?
  • Should we reform laws to make it harder to get a divorce?
  • Divorce rates
  • Family relationships
  • Family values
  • Race relations
  • Marriage and Divorce
  • A view of home life and its effect on child development
  • How 4 generations in the workplace can work together.
  • Building positive employee relationships
  • Modern work environments
  • Business leadership
  • Workforce regulations
  • Small business and taxation
  • Corporate law
  • Issues in modern Human Resources: Are today’s corporations patronizing employees or being more responsible for them?
  • Cultural conflict in globalization: Strategies for successfully establishing a presence in a foreign culture
  • Corporate abuse: How can executives so successfully manipulate corporations criminally?
  • Identifying stakeholders in non-public companies: is the corporate responsibility the same as for public offerings?
  • Devise a new model of leadership for business today, incorporating elements of existing leadership models and theories.
  • Examine the actual impact of social media as a business promotion instrument.
  • Devise a scenario in which traditionally unethical business practices may be justified.
  • Should newspaper reporters be required to reveal their sources?
  • Do the media (both print and broadcast) report fairly? Do they ever cross the line between reporting the news and creating the news?
  • Does news coverage favor whites?
  • What steps are involved in creating a movie or television show?
  • How have the film and music industries dealt with piracy?
  • Media conglomerates/ownership
  • Minorities in mass media
  • Portrayal of women
  • Reality television
  • Television violence
  • Media portrayals
  • Sensationalized media
  • Examine the issues of responsibility in pharmaceutical companies’ promotion of drugs in the media.
  • Forensic science technology
  • What are the current capabilities and future goals of genetic engineers?
  • What obstacles faced scientists in breaking the sound barrier?
  • What is alchemy and how has it been attempted?
  • What technologies are available to home owners to help them conserve energy?
  • Nuclear energy
  • Clean energy resources
  • Wind energy: Is wind energy really that inexpensive? Is it effective? Is it practical?
  • What are the dangers and hazards of using nuclear power?
  • Investigate Freud’s contributions to psychology as they exist today: what value remains?
  • Are there gender foundations to psychology and behavior that are removed from cultural considerations? To what extent does gender actually dictate thought process?
  • To what extent is sexual orientation dictated by culture, and is there an orientation not subject to social and cultural influences?
  • Investigate the psychological process in group dynamics with regard to the emergence of leaders and the compliance of others.
  • Compare and contrast Jung, Freud, and Adler: explore distinctions and commonalities.
  • What is “normal,” and to what extent is psychology reliant on culture to define this?
  • Research and assess the effectiveness of radical psychotherapies and unconventional treatments.
  • Research the concept of human will as both a component of individual psychology and a process or element removed from it.
  • To what extent is self-image influenced by culture in regard to eating disorders? Are external factors entirely to blame?
  • How do centuries-old beliefs of madness and dementia relate to modern conceptions of mental illness?
  • Is psychology itself inevitably a non-science in that virtually any theory may be substantiated, or is there a foundation of science to the subject to which all theorists must conform?
  • Examine Euripides and gender psychology: what do the Trojan Women and Medea reveal?
  • Using three characters, explore Chaucer’s insight into human behavior in The Canterbury Tales.
  • Identify the true relationship between Dante and Virgil in The Divine Comedy, emphasizing Dante’s reliance on the poet.
  • Research and discuss the English fascination for euphemism and ornate narratives in the 16th century, beginning with John Lyly.
  • Examine any existing controversies regarding Shakespearean authorship, citing arguments on both sides.
  • Analyze similarities and differences between Marlowe and Shakespeare in regard to Tamburlaine and Titus Andronicus.
  • Defend or refute Bloom’s assertion of Shakespeare as the “inventor of the human being.”
  • To what degree are Shakespeare’s plays influenced by, or reflective of, the Elizabethan era? Identify specific cultural and national events linked to at least 3 plays.
  • Analyze the unusual construction of The Winter’s Tale in regard to transition from comedy to drama. Is this valid? Does the transition benefit or harm the play?
  • Support the belief that Shakespeare is representing himself as Prospero through evidence, or similarly refute the belief.
  • Why was extreme violence so popular in English Reformation drama? Cite Marlowe, Kyd, Webster, and Shakespeare.
  • Analyze the metaphysical in Donne’s poetry: is it spiritual, existential, or both?
  • What is Shelley seeking to say in Frankenstein? Support your answer with passages from the novel.
  • Compare and contrast Tolstoy’s Anna Karenina with Flaubert’s Madame Bovary, noting the characters of the heroines.
  • It is argued that Dickens failed when he turned to serious, romantic narrative in his novels. Using David Copperfield, Great Expectations, and Dombey and Son, defend or refute this claim.
  • Assess Dickens’ stance as a moralist in Bleak House and Hard Times: to what extent does he seek reform, and to what extent does he comment on the human condition?
  • Was the Harry Potter phenomenon warranted by quality of storytelling or more a matter of public receptivity at the time combined with media exposure?

20 thoughts on “717 Good Research Paper Topics”

How has music evolved? How has music affected history? Music of the past vs. music of the present. How has the music industry affected music’s quality?

Do you think abortion is legal? Why they do abortion?

Why are people instinctively afraid of animals that are not mammals?

Should abortion be legalized? Should domestic abuse and child abuse victims be granted clemency for killing their abuser?

Jewish holocaust and its contribution to European History, specifically Germany

What is the most popular college in the United States?

The Black Knight: Space Waste or Alien Satellite? The Moon Landing: Real or Hollywood Hoax? Have We Become Too Politically Correct? Paranormal Research: Real? Fake? Should it be offered in college? Who really was Jack the Ripper? Can a zombie apocalypse truly occur? Who is the best or worst president of the USA? The Men in Black: real or hoax?

Why Marching Band is a sport.

Marching band is not a sport

How did AIDS start?

Topic: Alternative medicine. Research question: Is alternative medicine safe and standardized? Hypothesis: Analyse the quality control of alternative medicine formulations.

Does our nostalgic music/childhood songs affect our present lifestyle, and in what ways?

Reverse discrimination is still discrimination, so there’s no such thing as that. Likewise, reverse racism isn’t a thing because that is still racism.

Men on birth control and not women.

You forget the topic Islamophobia 😉

You should add a music section. Is Muzio Clementi overshadowed by Mozart? The Toccata and Fugue in D minor really wasn’t written by Bach. The use of the “Dies Irae” in cinema. Why is modern music so repetitive and simple compared to classical music?

I want to do a research project on Education

I want to do research but have not found a perfect topic. Please help me by suggesting a good topic about current affairs.

Topic: History. Are the Crusades oversimplified? Were they justified? If so, how? Topic: Current affairs. Is the term “conspiracy theory” used to discredit any non-mainstream, controversial opinions? Topic: Gun control. Does limiting magazine capacity for firearms have any effect on gun crime? Are high-capacity magazines ever necessary for self-defense? Topic: Economics. Are minimum wage laws necessary to guarantee “decent”, or do the laws of supply and demand automatically ensure that?

Are women funny?

© 2024 My Speech Class

Democratic National Convention (DNC) in Chicago

Samantha Putterman, PolitiFact

  • Copy URL https://www.pbs.org/newshour/politics/fact-checking-warnings-from-democrats-about-project-2025-and-donald-trump

Fact-checking warnings from Democrats about Project 2025 and Donald Trump

This fact check originally appeared on PolitiFact.

Project 2025 has a starring role in this week’s Democratic National Convention.

And it was front and center on Night 1.

WATCH: Hauling large copy of Project 2025, Michigan state Sen. McMorrow speaks at 2024 DNC

“This is Project 2025,” Michigan state Sen. Mallory McMorrow, D-Royal Oak, said as she laid a hardbound copy of the 900-page document on the lectern. “Over the next four nights, you are going to hear a lot about what is in this 900-page document. Why? Because this is the Republican blueprint for a second Trump term.”

Vice President Kamala Harris, the Democratic presidential nominee, has warned Americans about “Trump’s Project 2025” agenda — even though former President Donald Trump doesn’t claim the conservative presidential transition document.

“Donald Trump wants to take our country backward,” Harris said July 23 in Milwaukee. “He and his extreme Project 2025 agenda will weaken the middle class. Like, we know we got to take this seriously, and can you believe they put that thing in writing?”

Minnesota Gov. Tim Walz, Harris’ running mate, has joined in on the talking point.

“Don’t believe (Trump) when he’s playing dumb about this Project 2025. He knows exactly what it’ll do,” Walz said Aug. 9 in Glendale, Arizona.

Trump’s campaign has worked to build distance from the project, which the Heritage Foundation, a conservative think tank, led with contributions from dozens of conservative groups.

Much of the plan calls for extensive executive-branch overhauls and draws on both long-standing conservative principles, such as tax cuts, and more recent culture war issues. It lays out recommendations for disbanding the Commerce and Education departments, eliminating certain climate protections and consolidating more power to the president.

Project 2025 offers a sweeping vision for a Republican-led executive branch, and some of its policies mirror Trump’s 2024 agenda. But Harris and her presidential campaign have at times gone too far in describing what the project calls for and how closely the plans overlap with Trump’s campaign.

PolitiFact researched Harris’ warnings about how the plan would affect reproductive rights, federal entitlement programs and education, just as we did for President Joe Biden’s Project 2025 rhetoric. Here’s what the project does and doesn’t call for, and how it squares with Trump’s positions.

Are Trump and Project 2025 connected?

To distance himself from Project 2025 amid the Democratic attacks, Trump wrote on Truth Social that he “knows nothing” about it and has “no idea” who is in charge of it. (CNN identified at least 140 former advisers from the Trump administration who have been involved.)

The Heritage Foundation sought contributions from more than 100 conservative organizations for its policy vision for the next Republican presidency, which was published in 2023.

Project 2025 is now winding down some of its policy operations, and director Paul Dans, a former Trump administration official, is stepping down, The Washington Post reported July 30. Trump campaign managers Susie Wiles and Chris LaCivita denounced the document.

WATCH: A look at the Project 2025 plan to reshape government and Trump’s links to its authors

However, Project 2025 contributors include a number of high-ranking officials from Trump’s first administration, including former White House adviser Peter Navarro and former Housing and Urban Development Secretary Ben Carson.

A recently released recording of Russell Vought, a Project 2025 author and the former director of Trump’s Office of Management and Budget, showed Vought saying Trump’s “very supportive of what we do.” He said Trump was only distancing himself because Democrats were making a bogeyman out of the document.

Project 2025 wouldn’t ban abortion outright, but would curtail access

The Harris campaign shared a graphic on X that claimed “Trump’s Project 2025 plan for workers” would “go after birth control and ban abortion nationwide.”

The plan doesn’t call to ban abortion nationwide, though its recommendations could curtail some contraceptives and limit abortion access.

What’s known about Trump’s abortion agenda lines up with neither Harris’ description nor Project 2025’s wish list.

Project 2025 says the Department of Health and Human Services should “return to being known as the Department of Life by explicitly rejecting the notion that abortion is health care.”

It recommends that the Food and Drug Administration reverse its 2000 approval of mifepristone, the first pill taken in a two-drug regimen for a medication abortion. Medication is the most common form of abortion in the U.S. — accounting for around 63 percent in 2023.

If mifepristone were to remain approved, Project 2025 recommends new rules, such as cutting its use from 10 weeks into pregnancy to seven. It would have to be provided to patients in person — part of the group’s efforts to limit access to the drug by mail. In June, the U.S. Supreme Court rejected a legal challenge to mifepristone’s FDA approval over procedural grounds.

WATCH: Trump’s plans for health care and reproductive rights if he returns to White House

The manual also calls for the Justice Department to enforce against mifepristone the 1873 Comstock Act, which bans the mailing of “obscene” materials. Abortion access supporters fear that a strict interpretation of the law could go further to ban mailing the materials used in procedural abortions, such as surgical instruments and equipment.

The plan proposes withholding federal money from states that don’t report to the Centers for Disease Control and Prevention how many abortions take place within their borders. The plan also would prohibit abortion providers, such as Planned Parenthood, from receiving Medicaid funds. It also calls for the Department of Health and Human Services to ensure that the training of medical professionals, including doctors and nurses, omits abortion training.

The document says some forms of emergency contraception — particularly Ella, a pill that can be taken within five days of unprotected sex to prevent pregnancy — should be excluded from no-cost coverage. The Affordable Care Act requires most private health insurers to cover recommended preventive services, which involves a range of birth control methods, including emergency contraception.

Trump has recently said states should decide abortion regulations and that he wouldn’t block access to contraceptives. Trump said during his June 27 debate with Biden that he wouldn’t ban mifepristone after the Supreme Court “approved” it. But the court rejected the lawsuit based on standing, not the case’s merits. He has not weighed in on the Comstock Act or said whether he supports it being used to block abortion medication, or other kinds of abortions.

Project 2025 doesn’t call for cutting Social Security, but proposes some changes to Medicare

“When you read (Project 2025),” Harris told a crowd July 23 in Wisconsin, “you will see, Donald Trump intends to cut Social Security and Medicare.”

The Project 2025 document does not call for Social Security cuts. None of its 10 references to Social Security addresses plans for cutting the program.

Harris also misleads about Trump’s Social Security views.

In his earlier campaigns and before he was a politician, Trump said about a half-dozen times that he’s open to major overhauls of Social Security, including cuts and privatization. More recently, in a March 2024 CNBC interview, Trump said of entitlement programs such as Social Security, “There’s a lot you can do in terms of entitlements, in terms of cutting.” However, he quickly walked that statement back, and his CNBC comment stands at odds with essentially everything else Trump has said during the 2024 presidential campaign.

Trump’s campaign website says that not “a single penny” should be cut from Social Security. We rated Harris’ claim that Trump intends to cut Social Security Mostly False.

Project 2025 does propose changes to Medicare, including making Medicare Advantage, the private insurance offering in Medicare, the “default” enrollment option. Unlike Original Medicare, Medicare Advantage plans have provider networks and can also require prior authorization, meaning that the plan can approve or deny certain services. Original Medicare plans don’t have prior authorization requirements.

The manual also calls for repealing health policies enacted under Biden, such as the Inflation Reduction Act. The law enabled Medicare to negotiate with drugmakers for the first time in history, and recently resulted in an agreement with drug companies to lower the prices of 10 expensive prescriptions for Medicare enrollees.

Trump, however, has said repeatedly during the 2024 presidential campaign that he will not cut Medicare.

Project 2025 would eliminate the Education Department, which Trump supports

The Harris campaign said Project 2025 would “eliminate the U.S. Department of Education” — and that’s accurate. Project 2025 says federal education policy “should be limited and, ultimately, the federal Department of Education should be eliminated.” The plan scales back the federal government’s role in education policy and devolves the functions that remain to other agencies.

Aside from eliminating the department, the project also proposes scrapping the Biden administration’s Title IX revision, which prohibits discrimination based on sexual orientation and gender identity. It also would let states opt out of federal education programs and calls for passing a federal parents’ bill of rights similar to ones passed in some Republican-led state legislatures.

Republicans, including Trump, have pledged to close the department, which gained its status in 1979 within Democratic President Jimmy Carter’s presidential Cabinet.

In one of his Agenda 47 policy videos, Trump promised to close the department and “to send all education work and needs back to the states.” Eliminating the department would have to go through Congress.

What Project 2025, Trump would do on overtime pay

In the graphic, the Harris campaign says Project 2025 allows “employers to stop paying workers for overtime work.”

The plan doesn’t call for banning overtime wages. It recommends changes to some Occupational Safety and Health Administration, or OSHA, regulations and to overtime rules. Some changes, if enacted, could result in some people losing overtime protections, experts told us.

The document proposes that the Labor Department maintain an overtime threshold “that does not punish businesses in lower-cost regions (e.g., the southeast United States).” This threshold is the amount of money executive, administrative or professional employees need to make for an employer to exempt them from overtime pay under the Fair Labor Standards Act.

In 2019, the Trump administration finalized a rule that expanded overtime pay eligibility to most salaried workers earning less than about $35,568, which it said made about 1.3 million more workers eligible for overtime pay. The Trump-era threshold is high enough to cover most line workers in lower-cost regions, Project 2025 said.

The Biden administration raised that threshold to $43,888 beginning July 1, and that will rise to $58,656 on Jan. 1, 2025. That would grant overtime eligibility to about 4 million workers, the Labor Department said.

It’s unclear how many workers Project 2025’s proposal to return to the Trump-era overtime threshold in some parts of the country would affect, but experts said some would presumably lose the right to overtime wages.

Other overtime proposals in Project 2025’s plan include allowing some workers to choose to accumulate paid time off instead of overtime pay, or to work more hours in one week and fewer in the next, rather than receive overtime.

Trump’s past with overtime pay is complicated. In 2016, the Obama administration said it would raise the overtime threshold to cover salaried workers earning less than $47,476 a year, about double the exemption level of $23,660 a year set in 2004.

But when a judge blocked the Obama rule, the Trump administration didn’t challenge the court ruling. Instead it set its own overtime threshold, which raised the amount, but by less than Obama.
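The salary thresholds cited in this section can be compared side by side. The following is a minimal Python sketch, not anything taken from Project 2025 or the Labor Department: the dictionary labels and the function names `below_threshold` and `rules_covering` are hypothetical, and the check looks only at salary level, ignoring the job-duties criteria that also determine whether an employee is exempt.

```python
# Annual salary thresholds (USD) quoted in the article.
# The labels are hypothetical names chosen for this illustration.
THRESHOLDS = {
    "2004 rule": 23_660,
    "2016 Obama rule (blocked in court)": 47_476,
    "2019 Trump rule": 35_568,
    "Biden rule, July 1, 2024": 43_888,
    "Biden rule, Jan. 1, 2025": 58_656,
}


def below_threshold(salary: float, threshold: float) -> bool:
    """Return True when a salary falls under a threshold (salary test only)."""
    return salary < threshold


def rules_covering(salary: float) -> list[str]:
    """List the rules whose threshold a given salary falls under."""
    return [name for name, limit in THRESHOLDS.items()
            if below_threshold(salary, limit)]


if __name__ == "__main__":
    # A hypothetical salaried worker earning $40,000 a year.
    for name in rules_covering(40_000):
        print(name)
```

Under these figures, a hypothetical $40,000 salary sits above the 2019 threshold but below both Biden-era thresholds, which is the kind of worker the experts quoted above say could lose overtime eligibility if the lower threshold were restored.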

Listen to Research Papers & Retain More

Table of contents

  • Listen to research papers aloud: we show you how
  • Types of research papers
  • How text to speech works
  • Technical language
  • Length and density
  • Time constraints
  • Accessibility issues
  • Proofreading
  • Benefits of listening while reading research papers
  • Text highlighting
  • Speed controls
  • Lifelike voices
  • OCR scanning
  • How you can listen to research papers aloud with the Speechify website
  • How you can listen to research papers with the Speechify Chrome extension
  • How you can listen to research papers aloud with the Speechify app
  • Scan and listen to printed research papers with the Speechify app
  • Try Speechify and read any text aloud
  • Frequently asked questions

Listen to research papers aloud and boost productivity and comprehension with our TTS.

In the realm of academia, research papers are a cornerstone for disseminating knowledge and contributing to the growth of various fields. However, the dense and technical nature of these papers can pose a challenge for many readers. Fortunately, text to speech (TTS) technology has emerged as a powerful tool to aid in the consumption of all academic papers. This article will explore different types of research papers, delve into the challenges of reading them, and highlight the benefits of using TTS, with a special focus on Speechify as a premier TTS app for academic purposes.

Types of research papers

Research papers are a cornerstone of academic exploration, acting as vehicles for the dissemination of knowledge and the advancement of various fields. Within the realm of scholarly writing, a diverse array of research papers exists, each tailored to specific objectives and methodologies, including:

  • Analytical research papers: These delve into breaking down and examining a subject, often presenting an in-depth analysis of complex ideas or concepts.
  • Argumentative or persuasive research papers: These papers aim to convince the reader of a particular viewpoint, often involving the presentation of evidence and logical reasoning.
  • Cause and effect research papers: Focused on exploring the relationships between events, these papers aim to identify the causes and consequences of a particular phenomenon.
  • Compare and contrast research papers: These papers highlight similarities and differences between two or more subjects, encouraging critical thinking and analysis.
  • Definition research papers: These aim to provide a comprehensive understanding of a specific concept or term, often clarifying its various facets.
  • Experimental research papers: Centered around scientific experiments, these papers detail the methodology, results, and conclusions of research studies.
  • Interpretative research papers: These involve the interpretation of data, literature, or artistic works, requiring a nuanced understanding of the subject matter.
  • Survey research papers: Based on survey data, these papers analyze and present findings from questionnaires or interviews.

How text to speech works

Text to speech (TTS) is a technology that converts written text into spoken language. This innovative system enables computers, devices, or applications to audibly articulate the content of written material, ranging from articles and documents to emails and web pages.

TTS works by processing the input text through algorithms that analyze linguistic elements, such as syntax and semantics, to generate a corresponding audio output. The synthesized speech can be delivered in a variety of voices and accents, often aiming for a natural and human-like sound.

TTS serves a crucial role in enhancing accessibility, aiding individuals with visual impairments or learning disabilities, and providing a versatile solution for consuming written content in situations where reading may be impractical or inconvenient.

Challenges of reading research papers and how text to speech can help

Studying often involves grappling with the challenges presented by research papers. As we navigate through these dense repositories of knowledge crucial for intellectual growth, one powerful ally emerges to mitigate these challenges: text to speech (TTS) technology. Let’s unravel the challenges posed by academic texts and delve into how TTS emerges as a transformative tool, enhancing accessibility, efficiency, and overall engagement:

Technical language

One of the primary challenges of reading research papers is the abundance of technical language and specialized terminology. For individuals not well-versed in the specific field, deciphering these terms can be a daunting task. Text to speech (TTS) technology addresses this challenge by providing an auditory component to the reading process. Hearing the content aloud can aid in pronunciation, contextual understanding, and overall comprehension of intricate terms. By engaging multiple senses, TTS assists readers in navigating the intricate linguistic landscape of academic papers.

Length and density

Research papers are often lengthy and densely packed with information, requiring dedicated time and mental focus to absorb the content fully. TTS can alleviate this challenge by allowing users to listen to papers while performing other tasks or listen at a faster rate than physical reading allows. By breaking down the information into manageable auditory segments, TTS enables users to absorb complex concepts without the need for prolonged, uninterrupted reading sessions.

Time constraints

Busy schedules, whether due to academic, professional, or personal commitments, can limit the time available for in-depth reading and analysis of research papers. TTS provides a solution by offering a more time-efficient means of consuming academic content. Users can listen to research papers during activities such as commuting, exercising, or doing household chores, maximizing the utility of their time and seamlessly integrating learning into their daily routines.

Accessibility issues

Traditional reading methods can pose accessibility challenges for individuals with conditions such as dyslexia, vision issues, or attention disorders. TTS technology serves as an inclusive solution, offering an alternative mode of content consumption. By listening to research papers, individuals with learning differences can overcome barriers related to text-based challenges, making academic content more accessible and fostering a more equitable learning environment. TTS also addresses eye strain issues associated with prolonged reading, promoting a more comfortable reading experience.

Proofreading

Writing research articles can be difficult and re-reading them for typos can seem even more daunting. Text to speech platforms offer a distinct advantage in catching typos and grammatical errors that might be easily missed during traditional visual proofreading. By listening to your research paper, you engage a different cognitive process, allowing you to detect discrepancies in syntax, grammar, and word choice more effectively. This dual approach to proofreading, both visual and auditory, enhances the overall accuracy of your written work, ensuring that typos are promptly identified and rectified, contributing to the production of polished and error-free research papers.

Benefits of listening while reading research papers

Listening while reading research papers can significantly enhance the learning experience. Combining auditory input with the visual engagement of reading creates a multimodal learning approach that caters to different learning styles. The act of listening to text to speech read research papers aloud can help improve concentration and maintain focus during the often rigorous and dense process of digesting such content. This dual-input method not only reinforces comprehension but also aids in retaining information by tapping into multiple cognitive channels. Additionally, it can make the learning process more dynamic and enjoyable, potentially reducing the perceived difficulty of understanding complex topics.

Why Speechify is the best text to speech for reading research papers

In the ever-expanding landscape of text to speech (TTS) applications, Speechify emerges as a standout contender, particularly for the discerning academic reader. Navigating the intricate realm of research papers demands a tool that not only provides seamless functionality but also caters to the diverse needs of scholars and learners. Speechify, with its comprehensive set of features and user-friendly design, stands out as the premier TTS app for reading research papers. Here are just a few unique features that position Speechify as the go-to TTS app for the academic community, elevating the reading experience for research papers to unprecedented heights:

Text highlighting

Speechify offers text highlighting synchronized with the audio, facilitating better retention and comprehension. This feature is especially beneficial for individuals with dyslexia, ADHD, and other learning differences, who benefit substantially from following along with the text as it is read aloud.

Speed controls

Users can adjust the reading speed to suit their preferences, enabling a customized and comfortable listening experience. Students can easily slow down the reading as they take notes or speed up the reading to meet deadlines or boost productivity.
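As a rough illustration of what a speed control changes, the sketch below estimates listening time for a paper at different playback multipliers. It is not Speechify's implementation: the function name `listening_minutes`, the 150 words-per-minute baseline, and the 6,000-word paper length are all assumptions chosen for the example.

```python
def listening_minutes(word_count: int, speed: float = 1.0,
                      base_wpm: float = 150.0) -> float:
    """Estimate minutes needed to listen to a text.

    base_wpm is an assumed narration rate at 1x speed, and speed is the
    playback multiplier (for example 1.5 for one and a half times faster).
    """
    if speed <= 0 or base_wpm <= 0:
        raise ValueError("speed and base_wpm must be positive")
    return word_count / (base_wpm * speed)


if __name__ == "__main__":
    # A hypothetical 6,000-word research paper at several speeds.
    for speed in (0.75, 1.0, 1.5, 2.0):
        print(f"{speed}x: {listening_minutes(6000, speed):.1f} min")
```

Under these assumed numbers, doubling the speed halves the listening time, while slowing to 0.75x adds about a third, which is the trade-off between note-taking and meeting deadlines described above.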

Lifelike voices

Speechify boasts a diverse range of 200+ natural-sounding voices indistinguishable from human speech across 30+ languages and accents, accommodating a global audience and providing an immersive reading experience.

OCR scanning

The OCR scanning functionality allows users to convert printed or handwritten text into digital format, enabling students to listen to any digital or physical text aloud.

How to read research papers aloud with Speechify

Speechify, the leading text to speech app, provides an unparalleled solution for listening to research papers aloud, offering a seamless and enriching experience for academic readers. In fact, let’s explore how you can use the Speechify website, Chrome extension, or app to listen to research papers, including how to listen to scanned research papers.

How you can listen to research papers aloud with the Speechify website

You can listen to research papers straight from the Speechify website. Simply follow the steps below:

  • Open your web browser and navigate to Speechify.com
  • Sign in or create an account if you haven't already.
  • Tap “New” in the left-hand toolbar.
  • Click “Text Document.”
  • Copy and paste the research paper copy into the text box.
  • Press submit.
  • Customize the voice, reading speed, and other preferences.
  • Click the "Play" button to listen to your research paper with Speechify.
  • Enjoy a seamless and accessible reading experience right in your web browser.

How you can listen to research papers with the Speechify Chrome extension

If your favorite browser is Google Chrome, you can also listen to research papers by using the Speechify Chrome extension. Here’s a breakdown of how to get started:

  • Install the Speechify Chrome extension from the Chrome Web Store.
  • Click on the Speechify icon in your browser toolbar.
  • Sign in or create an account.
  • Select the text you want to read and choose your desired settings.
  • Click the "Play" button on the Speechify pop-up to start the text to speech conversion.
  • Listen to the content being read out loud while you browse the web, and even adjust settings on the fly.

How you can listen to research papers aloud with the Speechify app

If you’d like to read research papers on the go, follow this easy tutorial showing how to use the Speechify app:

  • Download the Speechify iOS or Android app from the App Store or Google Play store.
  • Open the app and sign in or create a new account.
  • Tap “Add” on the bottom toolbar.
  • Choose “From your computer.”
  • Choose files and import your research paper or copy and paste text into the app.
  • Customize voice preferences, reading speed, and other settings.
  • Tap the “Play” button to begin listening to the converted content.
  • Use the app’s additional features, such as highlighting text or changing the voice for a more interactive reading experience.

Scan and listen to printed research papers with the Speechify app

You can even read printed research papers with Speechify. Follow this guide to use the Speechify app to scan pictures of your physical documents:

  • Download the Speechify iOS or Android app on your mobile device from the App Store or Google Play store.
  • Choose “Scan Pages.”
  • Grant Speechify access to your camera.
  • Use the OCR scanner to take photos of the research paper you wish to convert to audio files.
  • Press “Next” in the bottom right hand corner.
  • Click “Listen” in the top right hand corner.
  • Press “Save.”
  • Tap the "Play" button to begin listening to the new audio version of your research paper.
  • Customize the settings to suit your preferences, such as reading speed and voice selection.
  • Enjoy hands-free learning while you focus on comprehension or follow along as the text is highlighted.

Try Speechify and read any text aloud

Navigate through dense research papers, craft concise summaries or Google Doc annotations, review social science notes, explore journal articles, read ChatGPT responses, or immerse yourself in academic journals, check emails, and listen to research papers with the help of Speechify. Whether you're a student, researcher, or lifelong learner, Speechify makes it easy to transform any text into speech. Try Speechify for free today and transform your reading experience all while taking advantage of its user-friendly design and innovative features.

Frequently asked questions

Yes, text to speech software, such as NaturalReader or Speechify, can read HTML tags and citations aloud, making it easier to follow the structure of the paper and understand the sources cited.

Speechify allows you to easily listen to any physical or digital text aloud. Sign up for free and check it out today.

Text to speech can benefit language learners by improving their pronunciation and listening skills, increasing vocabulary and comprehension, and providing access to a variety of materials in the target language.

For academic research, some of the best podcasts include "The Research Report Show" and "Research in Action," which provide insights into the latest research across various fields.

Some of the best audiobooks about researching include How to Read a Book by Mortimer Adler and The Craft of Research by Wayne Booth, Gregory Colomb, and Joseph Williams. These audiobooks are highly recommended for academic researchers.

You can listen to any text aloud, including research papers on an iPhone using the Speechify app.

Cliff Weitzman

Cliff Weitzman is a dyslexia advocate and the CEO and founder of Speechify, the #1 text-to-speech app in the world, totaling over 100,000 5-star reviews and ranking first place in the App Store for the News & Magazines category. In 2017, Weitzman was named to the Forbes 30 under 30 list for his work making the internet more accessible to people with learning disabilities. Cliff Weitzman has been featured in EdSurge, Inc., PC Mag, Entrepreneur, and Mashable, among other leading outlets.

Recurrent Neural Networks: A Comprehensive Review of Architectures, Variants, and Applications

1. Introduction

2. Related Works

3. Fundamentals of RNNs

3.1. Basic Architecture and Working Principle of Standard RNNs

3.2. Activation Functions

3.3. The Vanishing and Exploding Gradient Problems

3.4. Bidirectional RNNs

3.5. Deep RNNs

4. Advanced Variants of RNNs

4.1. Long Short-Term Memory Networks

4.2. Bidirectional LSTM

Stacked LSTM

4.3. Gated Recurrent Units

Comparison with LSTM

4.4. Other Notable Variants

4.4.1. Peephole LSTM

4.4.2. Echo State Networks

  • Deep Echo-State Networks: Recent research has extended the ESN architecture to deeper variants, known as deep echo-state networks (DeepESNs). In DeepESNs, multiple reservoir layers are stacked, allowing the network to capture hierarchical temporal features across different timescales [ 87 ]. Each layer in a DeepESN processes the output from the previous layer’s reservoir, enabling the model to learn more abstract and complex representations of the input data. The state update for a DeepESN can be generalized as follows: $h_t^{l} = \tanh\left( W_{in}^{l} h_t^{l-1} + W_{res}^{l} h_{t-1}^{l} \right)$ (31), where $l$ denotes the layer number, $h_t^{l}$ is the hidden state at layer $l$, $W_{in}^{l}$ is the input weight matrix for layer $l$, and $h_t^{l-1}$ is the hidden state from the previous layer. DeepESNs have demonstrated improved performance in tasks requiring the modeling of complex temporal patterns, such as speech recognition and financial time series forecasting [ 88 ].
  • Ensemble Deep ESNs: In ensemble deep ESNs, multiple DeepESNs are trained independently, and their outputs are combined to form the final prediction [ 89 ]. This ensemble approach leverages the diversity of the reservoirs and the deep architecture to improve robustness and accuracy, particularly in time series forecasting applications. For instance, Gao et al. [ 90 ] demonstrated the effectiveness of Deep ESN ensembles in predicting significant wave heights, where the ensemble approach helped mitigate the impact of reservoir initialization variability and improved the model’s generalization ability.
  • Input Processing with Signal Decomposition: Another critical aspect of effectively utilizing RNNs and ESNs is the preprocessing of input signals. Given the complex and often noisy nature of real-world time series data, signal decomposition techniques such as the empirical wavelet transform (EWT) have been employed to enhance the input to ESNs [ 91 ]. The EWT decomposes the input signal into different frequency components, allowing the ESN to process each component separately and improve the model’s ability to capture underlying patterns. The combination of the EWT with ESNs has shown promising results in various applications, including time series forecasting, where it helps reduce noise and enhance the predictive performance of the model.
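
The layer-wise state update in Equation (31) can be illustrated with a minimal sketch in plain Python. Several details here are assumptions made for illustration and are not taken from the review: the external input u_t is used as the "layer 0" state that feeds the first reservoir, the weights are drawn uniformly at random with arbitrary scales (no spectral-radius tuning), and the helper names `build_deep_esn` and `deep_esn_step` are hypothetical. No readout is trained; the sketch only propagates reservoir states.

```python
import math
import random


def random_matrix(rows, cols, scale, rng):
    """Fixed random weights drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def deep_esn_step(u_t, prev_states, w_in, w_res):
    """Apply the Eq. (31) update once to every layer, bottom to top.

    u_t         -- external input at time t, used as h_t^0 (assumption)
    prev_states -- list of h_{t-1}^l, one vector per layer
    w_in, w_res -- per-layer input and reservoir weight matrices
    """
    new_states = []
    below = u_t  # h_t^{l-1}: state of the layer below at the same time step
    for layer, h_prev in enumerate(prev_states):
        drive = matvec(w_in[layer], below)       # W_in^l  h_t^{l-1}
        memory = matvec(w_res[layer], h_prev)    # W_res^l h_{t-1}^l
        h_t = [math.tanh(d + m) for d, m in zip(drive, memory)]
        new_states.append(h_t)
        below = h_t
    return new_states


def build_deep_esn(input_size, reservoir_size, num_layers, seed=0):
    """Create untrained, fixed weights for a small DeepESN (illustrative sizes)."""
    rng = random.Random(seed)
    w_in, w_res = [], []
    for layer in range(num_layers):
        fan_in = input_size if layer == 0 else reservoir_size
        w_in.append(random_matrix(reservoir_size, fan_in, 0.5, rng))
        w_res.append(random_matrix(reservoir_size, reservoir_size, 0.1, rng))
    return w_in, w_res


if __name__ == "__main__":
    w_in, w_res = build_deep_esn(input_size=1, reservoir_size=4, num_layers=3)
    states = [[0.0] * 4 for _ in range(3)]
    for t in range(5):
        states = deep_esn_step([math.sin(t)], states, w_in, w_res)
    print(states[-1])  # top-layer reservoir state after five steps
```

In an ensemble of DeepESNs, one simple way to combine independently initialized models would be to call `build_deep_esn` with different seeds and average the predictions of their separately trained readouts. The review does not specify the combination rule, so this averaging is a suggestion rather than the published method.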

4.4.3. Independently Recurrent Neural Network

5. Innovations in RNN Architectures and Training Methodologies

5.1. Hybrid Architectures

5.2. Neural Architecture Search

5.3. Advanced Optimization Techniques

5.4. RNNs with Attention Mechanisms

5.5. RNNs Integrated with Transformer Models

6. Public Datasets for RNN Research

7. Applications of RNNs in Peer-Reviewed Literature

7.1. Natural Language Processing

7.1.1. Text Generation

7.1.2. Sentiment Analysis

7.1.3. Machine Translation

7.2. Speech Recognition

7.3. Time Series Forecasting

7.4. Signal Processing

7.5. Bioinformatics

7.6. Autonomous Vehicles

7.7. Anomaly Detection

8. Challenges and Future Research Directions

8.1. Scalability and Efficiency

8.2. Interpretability and Explainability

8.3. Bias and Fairness

8.4. Data Dependency and Quality

8.5. Overfitting and Generalization

9. Conclusions

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

AI: Artificial intelligence
ANN: Artificial neural network
BiLSTM: Bidirectional long short-term memory
CNN: Convolutional neural network
DL: Deep learning
GRU: Gated recurrent unit
LSTM: Long short-term memory
ML: Machine learning
NAS: Neural architecture search
NLP: Natural language processing
RNN: Recurrent neural network
RL: Reinforcement learning
SHAPs: Shapley Additive Explanations
TPU: Tensor processing unit
VAE: Variational autoencoder

References

  • O’Halloran, T.; Obaido, G.; Otegbade, B.; Mienye, I.D. A deep learning approach for Maize Lethal Necrosis and Maize Streak Virus disease detection. Mach. Learn. Appl. 2024 , 16 , 100556. [ Google Scholar ] [ CrossRef ]
  • Peng, Y.; He, L.; Hu, D.; Liu, Y.; Yang, L.; Shang, S. Decoupling Deep Learning for Enhanced Image Recognition Interpretability. ACM Trans. Multimed. Comput. Commun. Appl. 2024 . [ Google Scholar ] [ CrossRef ]
  • Khan, W.; Daud, A.; Khan, K.; Muhammad, S.; Haq, R. Exploring the frontiers of deep learning and natural language processing: A comprehensive overview of key challenges and emerging trends. Nat. Lang. Process. J. 2023 , 4 , 100026. [ Google Scholar ] [ CrossRef ]
  • Obaido, G.; Achilonu, O.; Ogbuokiri, B.; Amadi, C.S.; Habeebullahi, L.; Ohalloran, T.; Chukwu, C.W.; Mienye, E.; Aliyu, M.; Fasawe, O.; et al. An Improved Framework for Detecting Thyroid Disease Using Filter-Based Feature Selection and Stacking Ensemble. IEEE Access 2024 , 12 , 89098–89112. [ Google Scholar ] [ CrossRef ]
  • Mienye, I.D.; Obaido, G.; Aruleba, K.; Dada, O.A. Enhanced Prediction of Chronic Kidney Disease using Feature Selection and Boosted Classifiers. In Proceedings of the International Conference on Intelligent Systems Design and Applications, Virtual, 13–15 December 2021; pp. 527–537. [ Google Scholar ]
  • Al-Jumaili, A.H.A.; Muniyandi, R.C.; Hasan, M.K.; Paw, J.K.S.; Singh, M.J. Big data analytics using cloud computing based frameworks for power management systems: Status, constraints, and future recommendations. Sensors 2023 , 23 , 2952. [ Google Scholar ] [ CrossRef ]
  • Gill, S.S.; Wu, H.; Patros, P.; Ottaviani, C.; Arora, P.; Pujol, V.C.; Haunschild, D.; Parlikad, A.K.; Cetinkaya, O.; Lutfiyya, H.; et al. Modern computing: Vision and challenges. Telemat. Inform. Rep. 2024 , 13 , 100116. [ Google Scholar ] [ CrossRef ]
  • Mienye, I.D.; Jere, N. A Survey of Decision Trees: Concepts, Algorithms, and Applications. IEEE Access 2024 , 12 , 86716–86727. [ Google Scholar ] [ CrossRef ]
  • Aruleba, R.T.; Adekiya, T.A.; Ayawei, N.; Obaido, G.; Aruleba, K.; Mienye, I.D.; Aruleba, I.; Ogbuokiri, B. COVID-19 diagnosis: A review of rapid antigen, RT-PCR and artificial intelligence methods. Bioengineering 2022 , 9 , 153. [ Google Scholar ] [ CrossRef ]
  • Alhajeri, M.S.; Ren, Y.M.; Ou, F.; Abdullah, F.; Christofides, P.D. Model predictive control of nonlinear processes using transfer learning-based recurrent neural networks. Chem. Eng. Res. Des. 2024 , 205 , 1–12. [ Google Scholar ] [ CrossRef ]
  • Shahinzadeh, H.; Mahmoudi, A.; Asilian, A.; Sadrarhami, H.; Hemmati, M.; Saberi, Y. Deep Learning: A Overview of Theory and Architectures. In Proceedings of the 2024 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), Babol, Iran, 21–22 February 2024; pp. 1–11. [ Google Scholar ]
  • Baruah, R.D.; Organero, M.M. Explicit Context Integrated Recurrent Neural Network for applications in smart environments. Expert Syst. Appl. 2024 , 255 , 124752. [ Google Scholar ] [ CrossRef ]
  • Werbos, P. Backpropagation through time: What it does and how to do it. Proc. IEEE 1990 , 78 , 1550–1560. [ Google Scholar ] [ CrossRef ]
  • Lalapura, V.S.; Amudha, J.; Satheesh, H.S. Recurrent neural networks for edge intelligence: A survey. ACM Comput. Surv. (CSUR) 2021 , 54 , 1–38. [ Google Scholar ] [ CrossRef ]
  • Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997 , 9 , 1735–1780. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014 , arXiv:1406.1078. [ Google Scholar ]
  • Liu, F.; Li, J.; Wang, L. PI-LSTM: Physics-informed long short-term memory network for structural response modeling. Eng. Struct. 2023 , 292 , 116500. [ Google Scholar ] [ CrossRef ]
  • Ni, Q.; Ji, J.; Feng, K.; Zhang, Y.; Lin, D.; Zheng, J. Data-driven bearing health management using a novel multi-scale fused feature and gated recurrent unit. Reliab. Eng. Syst. Saf. 2024 , 242 , 109753. [ Google Scholar ] [ CrossRef ]
  • Niu, Z.; Zhong, G.; Yue, G.; Wang, L.N.; Yu, H.; Ling, X.; Dong, J. Recurrent attention unit: A new gated recurrent unit for long-term memory of important parts in sequential data. Neurocomputing 2023 , 517 , 1–9. [ Google Scholar ] [ CrossRef ]
  • Lipton, Z.C.; Berkowitz, J.; Elkan, C. A critical review of recurrent neural networks for sequence learning. arXiv 2015 , arXiv:1506.00019. [ Google Scholar ]
  • Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019 , 31 , 1235–1270. [ Google Scholar ] [ CrossRef ]
  • Tarwani, K.M.; Edem, S. Survey on recurrent neural network in natural language processing. Int. J. Eng. Trends Technol. 2017 , 48 , 301–304. [ Google Scholar ] [ CrossRef ]
  • Tsoi, A.C.; Back, A.D. Locally recurrent globally feedforward networks: A critical review of architectures. IEEE Trans. Neural Netw. 1994 , 5 , 229–239. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Mastorocostas, P.A.; Theocharis, J.B. A stable learning algorithm for block-diagonal recurrent neural networks: Application to the analysis of lung sounds. IEEE Trans. Syst. Man. Cybern. Part B (Cybern.) 2006 , 36 , 242–254. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Dutta, K.K.; Poornima, S.; Sharma, R.; Nair, D.; Ploeger, P.G. Applications of Recurrent Neural Network: Overview and Case Studies. In Recurrent Neural Networks ; CRC Press: Boca Raton, FL, USA, 2022; pp. 23–41. [ Google Scholar ]
  • Quradaa, F.H.; Shahzad, S.; Almoqbily, R.S. A systematic literature review on the applications of recurrent neural networks in code clone research. PLoS ONE 2024 , 19 , e0296858. [ Google Scholar ] [ CrossRef ]
  • Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning ; MIT Press: Cambridge, MA, USA, 2016. [ Google Scholar ]
  • Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016 , 28 , 2222–2232. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Al-Selwi, S.M.; Hassan, M.F.; Abdulkadir, S.J.; Muneer, A.; Sumiea, E.H.; Alqushaibi, A.; Ragab, M.G. RNN-LSTM: From applications to modeling techniques and beyond—Systematic review. J. King Saud-Univ.-Comput. Inf. Sci. 2024 , 36 , 102068. [ Google Scholar ] [ CrossRef ]
  • Zaremba, W.; Sutskever, I.; Vinyals, O. Recurrent neural network regularization. arXiv 2014 , arXiv:1409.2329. [ Google Scholar ]
  • Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018 , arXiv:1803.01271. [ Google Scholar ]
  • Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 2018 , 8 , 6085. [ Google Scholar ] [ CrossRef ]
  • Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014 , arXiv:1412.3555. [ Google Scholar ]
  • Badawy, M.; Ramadan, N.; Hefny, H.A. Healthcare predictive analytics using machine learning and deep learning techniques: A survey. J. Electr. Syst. Inf. Technol. 2023 , 10 , 40. [ Google Scholar ] [ CrossRef ]
  • Ismaeel, A.G.; Janardhanan, K.; Sankar, M.; Natarajan, Y.; Mahmood, S.N.; Alani, S.; Shather, A.H. Traffic pattern classification in smart cities using deep recurrent neural network. Sustainability 2023 , 15 , 14522. [ Google Scholar ] [ CrossRef ]
  • Mers, M.; Yang, Z.; Hsieh, Y.A.; Tsai, Y. Recurrent neural networks for pavement performance forecasting: Review and model performance comparison. Transp. Res. Rec. 2023 , 2677 , 610–624. [ Google Scholar ] [ CrossRef ]
  • Chen, Y.; Cheng, Q.; Cheng, Y.; Yang, H.; Yu, H. Applications of recurrent neural networks in environmental factor forecasting: A review. Neural Comput. 2018 , 30 , 2855–2881. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Linardos, V.; Drakaki, M.; Tzionas, P.; Karnavas, Y.L. Machine learning in disaster management: Recent developments in methods and applications. Mach. Learn. Knowl. Extr. 2022 , 4 , 446–473. [ Google Scholar ] [ CrossRef ]
  • Zhang, J.; Liu, H.; Chang, Q.; Wang, L.; Gao, R.X. Recurrent neural network for motion trajectory prediction in human-robot collaborative assembly. CIRP Ann. 2020 , 69 , 9–12. [ Google Scholar ] [ CrossRef ]
  • Tsantekidis, A.; Passalis, N.; Tefas, A. Recurrent Neural Networks. In Deep Learning for Robot Perception and Cognition ; Elsevier: Amsterdam, The Netherlands, 2022; pp. 101–115. [ Google Scholar ]
  • Mienye, I.D.; Jere, N. Deep Learning for Credit Card Fraud Detection: A Review of Algorithms, Challenges, and Solutions. IEEE Access 2024 , 12 , 96893–96910. [ Google Scholar ] [ CrossRef ]
  • Mienye, I.D.; Sun, Y. A machine learning method with hybrid feature selection for improved credit card fraud detection. Appl. Sci. 2023 , 13 , 7254. [ Google Scholar ] [ CrossRef ]
  • Rezk, N.M.; Purnaprajna, M.; Nordström, T.; Ul-Abdin, Z. Recurrent neural networks: An embedded computing perspective. IEEE Access 2020 , 8 , 57967–57996. [ Google Scholar ] [ CrossRef ]
  • Yu, Y.; Adu, K.; Tashi, N.; Anokye, P.; Wang, X.; Ayidzoe, M.A. Rmaf: Relu-memristor-like activation function for deep learning. IEEE Access 2020 , 8 , 72727–72741. [ Google Scholar ] [ CrossRef ]
  • Mienye, I.D.; Ainah, P.K.; Emmanuel, I.D.; Esenogho, E. Sparse Noise Minimization in Image Classification using Genetic Algorithm and DenseNet. In Proceedings of the 2021 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 10–11 March 2021; pp. 103–108. [ Google Scholar ]
  • Ciaburro, G.; Venkateswaran, B. Neural Networks with R: SMART Models Using CNN, RNN, Deep Learning, and Artificial Intelligence Principles ; Packt Publishing Ltd.: Birmingham, UK, 2017. [ Google Scholar ]
  • Nwankpa, C.; Ijomah, W.; Gachagan, A.; Marshall, S. Activation functions: Comparison of trends in practice and research for deep learning. arXiv 2018 , arXiv:1811.03378. [ Google Scholar ]
  • Szandała, T. Review and comparison of commonly used activation functions for deep neural networks. Bio-Inspired Neurocomp. 2021 , 203–224. [ Google Scholar ]
  • Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (elus). arXiv 2015 , arXiv:1511.07289. [ Google Scholar ]
  • Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022 , 503 , 92–108. [ Google Scholar ] [ CrossRef ]
  • Obaido, G.; Mienye, I.D.; Egbelowo, O.F.; Emmanuel, I.D.; Ogunleye, A.; Ogbuokiri, B.; Mienye, P.; Aruleba, K. Supervised machine learning in drug discovery and development: Algorithms, applications, challenges, and prospects. Mach. Learn. Appl. 2024 , 17 , 100576. [ Google Scholar ] [ CrossRef ]
  • Mienye, I.D.; Sun, Y. Effective Feature Selection for Improved Prediction of Heart Disease. In Proceedings of the Pan-African Artificial Intelligence and Smart Systems Conference, Durban, South Africa, 4–6 December 2021; pp. 94–107. [ Google Scholar ]
  • Martins, A.; Astudillo, R. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 1614–1623. [ Google Scholar ]
  • Bianchi, F.M.; Maiorino, E.; Kampffmeyer, M.C.; Rizzi, A.; Jenssen, R. Properties and Training in Recurrent Neural Networks. In Recurrent Neural Networks for Short-Term Load Forecasting: An Overview and Comparative Analysis ; Springer: Berlin/Heidelberg, Germany, 2017; pp. 9–21. [ Google Scholar ]
  • Mohajerin, N.; Waslander, S.L. State Initialization for Recurrent Neural Network Modeling of Time-Series Data. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2330–2337. [ Google Scholar ]
  • Forgione, M.; Muni, A.; Piga, D.; Gallieri, M. On the adaptation of recurrent neural networks for system identification. Automatica 2023 , 155 , 111092. [ Google Scholar ] [ CrossRef ]
  • Zhang, J.; He, T.; Sra, S.; Jadbabaie, A. Why gradient clipping accelerates training: A theoretical justification for adaptivity. arXiv 2019 , arXiv:1905.11881. [ Google Scholar ]
  • Qian, J.; Wu, Y.; Zhuang, B.; Wang, S.; Xiao, J. Understanding Gradient Clipping in Incremental Gradient Methods. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 13–15 April 2021; pp. 1504–1512. [ Google Scholar ]
  • Fei, H.; Tan, F. Bidirectional grid long short-term memory (bigridlstm): A method to address context-sensitivity and vanishing gradient. Algorithms 2018 , 11 , 172. [ Google Scholar ] [ CrossRef ]
  • Dong, X.; Chowdhury, S.; Qian, L.; Li, X.; Guan, Y.; Yang, J.; Yu, Q. Deep learning for named entity recognition on Chinese electronic medical records: Combining deep transfer learning with multitask bi-directional LSTM RNN. PLoS ONE 2019 , 14 , e0216046. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Chorowski, J.K.; Bahdanau, D.; Serdyuk, D.; Cho, K.; Bengio, Y. Attention-based models for speech recognition. Adv. Neural Inf. Process. Syst. 2015 , 28 . [ Google Scholar ]
  • Zhou, M.; Duan, N.; Liu, S.; Shum, H.Y. Progress in neural NLP: Modeling, learning, and reasoning. Engineering 2020 , 6 , 275–290. [ Google Scholar ] [ CrossRef ]
  • Naseem, U.; Razzak, I.; Khan, S.K.; Prasad, M. A comprehensive survey on word representation models: From classical to state-of-the-art word representation language models. Trans. Asian Low-Resour. Lang. Inf. Process. 2021 , 20 , 1–35. [ Google Scholar ] [ CrossRef ]
  • Adil, M.; Wu, J.Z.; Chakrabortty, R.K.; Alahmadi, A.; Ansari, M.F.; Ryan, M.J. Attention-based STL-BiLSTM network to forecast tourist arrival. Processes 2021 , 9 , 1759. [ Google Scholar ] [ CrossRef ]
  • Min, S.; Park, S.; Kim, S.; Choi, H.S.; Lee, B.; Yoon, S. Pre-training of deep bidirectional protein sequence representations with structural information. IEEE Access 2021 , 9 , 123912–123926. [ Google Scholar ] [ CrossRef ]
  • Jain, A.; Zamir, A.R.; Savarese, S.; Saxena, A. Structural-rnn: Deep Learning on Spatio-Temporal Graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5308–5317. [ Google Scholar ]
  • Pascanu, R.; Gulcehre, C.; Cho, K.; Bengio, Y. How to construct deep recurrent neural networks. arXiv 2013 , arXiv:1312.6026. [ Google Scholar ]
  • Shi, H.; Xu, M.; Li, R. Deep learning for household load forecasting—A novel pooling deep RNN. IEEE Trans. Smart Grid 2017 , 9 , 5271–5280. [ Google Scholar ] [ CrossRef ]
  • Gal, Y.; Ghahramani, Z. A theoretically grounded application of dropout in recurrent neural networks. Adv. Neural Inf. Process. Syst. 2016 , 29 . [ Google Scholar ]
  • Moradi, R.; Berangi, R.; Minaei, B. A survey of regularization strategies for deep models. Artif. Intell. Rev. 2020 , 53 , 3947–3986. [ Google Scholar ] [ CrossRef ]
  • Salehin, I.; Kang, D.K. A review on dropout regularization approaches for deep neural networks within the scholarly domain. Electronics 2023 , 12 , 3106. [ Google Scholar ] [ CrossRef ]
  • Cai, S.; Shu, Y.; Chen, G.; Ooi, B.C.; Wang, W.; Zhang, M. Effective and efficient dropout for deep convolutional neural networks. arXiv 2019 , arXiv:1904.03392. [ Google Scholar ]
  • Garbin, C.; Zhu, X.; Marques, O. Dropout vs. batch normalization: An empirical study of their impact to deep learning. Multimed. Tools Appl. 2020 , 79 , 12777–12815. [ Google Scholar ] [ CrossRef ]
  • Borawar, L.; Kaur, R. ResNet: Solving Vanishing Gradient in Deep Networks. In Proceedings of the International Conference on Recent Trends in Computing: ICRTC 2022, Delhi, India, 3–4 June 2022; Springer: Berlin/Heidelberg, Germany, 2023; pp. 235–247. [ Google Scholar ]
  • Mienye, I.D.; Sun, Y. A deep learning ensemble with data resampling for credit card fraud detection. IEEE Access 2023 , 11 , 30628–30638. [ Google Scholar ] [ CrossRef ]
  • Kiperwasser, E.; Goldberg, Y. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Trans. Assoc. Comput. Linguist. 2016 , 4 , 313–327. [ Google Scholar ] [ CrossRef ]
  • Zhang, W.; Li, H.; Tang, L.; Gu, X.; Wang, L.; Wang, L. Displacement prediction of Jiuxianping landslide using gated recurrent unit (GRU) networks. Acta Geotech. 2022 , 17 , 1367–1382. [ Google Scholar ] [ CrossRef ]
  • Cahuantzi, R.; Chen, X.; Güttel, S. A Comparison of LSTM and GRU Networks for Learning Symbolic Sequences. In Proceedings of the Science and Information Conference, Nanchang, China, 2–4 June 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 771–785. [ Google Scholar ]
  • Shewalkar, A.; Nyavanandi, D.; Ludwig, S.A. Performance evaluation of deep neural networks applied to speech recognition: RNN, LSTM and GRU. J. Artif. Intell. Soft Comput. Res. 2019 , 9 , 235–245. [ Google Scholar ] [ CrossRef ]
  • Vatanchi, S.M.; Etemadfard, H.; Maghrebi, M.F.; Shad, R. A comparative study on forecasting of long-term daily streamflow using ANN, ANFIS, BiLSTM and CNN-GRU-LSTM. Water Resour. Manag. 2023 , 37 , 4769–4785. [ Google Scholar ] [ CrossRef ]
  • Mateus, B.C.; Mendes, M.; Farinha, J.T.; Assis, R.; Cardoso, A.M. Comparing LSTM and GRU models to predict the condition of a pulp paper press. Energies 2021 , 14 , 6958. [ Google Scholar ] [ CrossRef ]
  • Gers, F.A.; Schmidhuber, J. Recurrent Nets That Time and Count. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, IJCNN 2000, Neural Computing: New Challenges and Perspectives for the New Millennium, Como, Italy, 24–27 July 2000; Volume 3, pp. 189–194. [ Google Scholar ]
  • Gers, F.A.; Schraudolph, N.N.; Schmidhuber, J. Learning precise timing with LSTM recurrent networks. J. Mach. Learn. Res. 2002 , 3 , 115–143. [ Google Scholar ]
  • Jaeger, H. Adaptive nonlinear system identification with echo state networks. Adv. Neural Inf. Process. Syst. 2002 , 15 , 593–600. [ Google Scholar ]
  • Ishaq, M.; Kwon, S. A CNN-Assisted deep echo state network using multiple Time-Scale dynamic learning reservoirs for generating Short-Term solar energy forecasting. Sustain. Energy Technol. Assessments 2022 , 52 , 102275. [ Google Scholar ]
  • Sun, C.; Song, M.; Cai, D.; Zhang, B.; Hong, S.; Li, H. A systematic review of echo state networks from design to application. IEEE Trans. Artif. Intell. 2022 , 5 , 23–37. [ Google Scholar ] [ CrossRef ]
  • Gallicchio, C.; Micheli, A. Deep echo state network (deepesn): A brief survey. arXiv 2017 , arXiv:1712.04323. [ Google Scholar ]
  • Gallicchio, C.; Micheli, A. Richness of Deep Echo State Network Dynamics. In Proceedings of the Advances in Computational Intelligence: 15th International Work-Conference on Artificial Neural Networks, IWANN 2019, Gran Canaria, Spain, 12–14 June 2019, Proceedings, Part I 15 ; Springer: Berlin/Heidelberg, Germany, 2019; pp. 480–491. [ Google Scholar ]
  • Hu, R.; Tang, Z.R.; Song, X.; Luo, J.; Wu, E.Q.; Chang, S. Ensemble echo network with deep architecture for time-series modeling. Neural Comput. Appl. 2021 , 33 , 4997–5010. [ Google Scholar ] [ CrossRef ]
  • Gao, R.; Li, R.; Hu, M.; Suganthan, P.N.; Yuen, K.F. Dynamic ensemble deep echo state network for significant wave height forecasting. Appl. Energy 2023 , 329 , 120261. [ Google Scholar ] [ CrossRef ]
  • Gao, R.; Du, L.; Duru, O.; Yuen, K.F. Time series forecasting based on echo state network and empirical wavelet transformation. Appl. Soft Comput. 2021 , 102 , 107111. [ Google Scholar ] [ CrossRef ]
  • Li, S.; Li, W.; Cook, C.; Zhu, C.; Gao, Y. Independently Recurrent Neural Network (indrnn): Building a Longer and Deeper rnn. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5457–5466. [ Google Scholar ]
  • Yang, J.; Qu, J.; Mi, Q.; Li, Q. A CNN-LSTM model for tailings dam risk prediction. IEEE Access 2020 , 8 , 206491–206502. [ Google Scholar ] [ CrossRef ]
  • Ren, P.; Xiao, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Chen, X.; Wang, X. A comprehensive survey of neural architecture search: Challenges and solutions. ACM Comput. Surv. (CSUR) 2021 , 54 , 1–34. [ Google Scholar ] [ CrossRef ]
  • Mellor, J.; Turner, J.; Storkey, A.; Crowley, E.J. Neural Architecture Search without Training. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 7588–7598. [ Google Scholar ]
  • Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. arXiv 2016 , arXiv:1611.01578. [ Google Scholar ]
  • Chen, X.; Wu, S.Z.; Hong, M. Understanding gradient clipping in private sgd: A geometric perspective. Adv. Neural Inf. Process. Syst. 2020 , 33 , 13773–13782. [ Google Scholar ]
  • Zhang, Z. Improved Adam Optimizer for Deep Neural Networks. In Proceedings of the 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada, 4–6 June 2018; pp. 1–2. [ Google Scholar ]
  • De Santana Correia, A.; Colombini, E.L. Attention, please! A survey of neural attention models in deep learning. Artif. Intell. Rev. 2022 , 55 , 6037–6124. [ Google Scholar ] [ CrossRef ]
  • Lin, J.; Ma, J.; Zhu, J.; Cui, Y. Short-term load forecasting based on LSTM networks considering attention mechanism. Int. J. Electr. Power Energy Syst. 2022 , 137 , 107818. [ Google Scholar ] [ CrossRef ]
  • Chaudhari, S.; Mithal, V.; Polatkan, G.; Ramanath, R. An attentive survey of attention models. ACM Trans. Intell. Syst. Technol. (TIST) 2021 , 12 , 1–32. [ Google Scholar ] [ CrossRef ]
  • Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014 , arXiv:1409.0473. [ Google Scholar ]
  • Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv 2015 , arXiv:1508.04025. [ Google Scholar ]
  • Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017 , 30 . [ Google Scholar ]
  • Marcus, M.P.; Marcinkiewicz, M.A.; Santorini, B. Building a large annotated corpus of English: The Penn Treebank. Comput. Linguist. 1993 , 19 , 313–330. [ Google Scholar ]
  • Maas, A.L.; Daly, R.E.; Pham, P.T.; Huang, D.; Ng, A.Y.; Potts, C. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 142–150.
  • LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
  • Garofolo, J.S.; Lamel, L.F.; Fisher, W.M.; Fiscus, J.G.; Pallett, D.S. TIMIT acoustic-phonetic continuous speech corpus. Linguist. Data Consort. 1993, 93, 27403.
  • Lewis, D. Reuters-21578 Text Categorization Test Collection; Distribution 1.0; AT&T Labs-Research: Atlanta, GA, USA, 1997.
  • Dua, D.; Graff, C. UCI Machine Learning Repository; School of Information and Computer Science, University of California: Irvine, CA, USA, 2017.
  • Lomonaco, V.; Maltoni, D. Core50: A New Dataset and Benchmark for Continuous Object Recognition. In Proceedings of the Conference on Robot Learning. PMLR, Mountain View, CA, USA, 13–15 November 2017; pp. 17–26.
  • Souri, A.; El Maazouzi, Z.; Al Achhab, M.; El Mohajir, B.E. Arabic Text Generation using Recurrent Neural Networks. In Proceedings of the Big Data, Cloud and Applications: Third International Conference, BDCA 2018, Kenitra, Morocco, 4–5 April 2018; Revised Selected Papers 3; Springer: Berlin/Heidelberg, Germany, 2018; pp. 523–533.
  • Islam, M.S.; Mousumi, S.S.S.; Abujar, S.; Hossain, S.A. Sequence-to-sequence Bangla sentence generation with LSTM recurrent neural networks. Procedia Comput. Sci. 2019, 152, 51–58.
  • Gajendran, S.; Manjula, D.; Sugumaran, V. Character level and word level embedding with bidirectional LSTM–Dynamic recurrent neural network for biomedical named entity recognition from literature. J. Biomed. Inform. 2020, 112, 103609.
  • Hu, H.; Liao, M.; Mao, W.; Liu, W.; Zhang, C.; Jing, Y. Variational Auto-Encoder for Text Generation. In Proceedings of the 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 12–14 June 2020; pp. 595–598.
  • Holtzman, A.; Buys, J.; Du, L.; Forbes, M.; Choi, Y. The curious case of neural text degeneration. arXiv 2019, arXiv:1904.09751.
  • Yin, W.; Schütze, H. Attentive convolution: Equipping cnns with rnn-style attention mechanisms. Trans. Assoc. Comput. Linguist. 2018, 6, 687–702.
  • Hussein, M.A.H.; Savaş, S. LSTM-Based Text Generation: A Study on Historical Datasets. arXiv 2024, arXiv:2403.07087.
  • Baskaran, S.; Alagarsamy, S.; S, S.; Shivam, S. Text Generation using Long Short-Term Memory. In Proceedings of the 2024 Third International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS), Krishnankoil, India, 14–16 March 2024; pp. 1–6.
  • Keskar, N.S.; McCann, B.; Varshney, L.R.; Xiong, C.; Socher, R. Ctrl: A conditional transformer language model for controllable generation. arXiv 2019, arXiv:1909.05858.
  • Guo, H. Generating text with deep reinforcement learning. arXiv 2015, arXiv:1510.09202.
  • Yadav, V.; Verma, P.; Katiyar, V. Long short term memory (LSTM) model for sentiment analysis in social data for e-commerce products reviews in Hindi languages. Int. J. Inf. Technol. 2023, 15, 759–772.
  • Abimbola, B.; de La Cal Marin, E.; Tan, Q. Enhancing Legal Sentiment Analysis: A Convolutional Neural Network–Long Short-Term Memory Document-Level Model. Mach. Learn. Knowl. Extr. 2024, 6, 877–897.
  • Zulqarnain, M.; Ghazali, R.; Aamir, M.; Hassim, Y.M.M. An efficient two-state GRU based on feature attention mechanism for sentiment analysis. Multimed. Tools Appl. 2024, 83, 3085–3110.
  • Pujari, P.; Padalia, A.; Shah, T.; Devadkar, K. Hybrid CNN and RNN for Twitter Sentiment Analysis. In Proceedings of the International Conference on Smart Computing and Communication; Springer: Berlin/Heidelberg, Germany, 2024; pp. 297–310.
  • Wankhade, M.; Annavarapu, C.S.R.; Abraham, A. CBMAFM: CNN-BiLSTM multi-attention fusion mechanism for sentiment classification. Multimed. Tools Appl. 2024, 83, 51755–51786.
  • Sangeetha, J.; Kumaran, U. A hybrid optimization algorithm using BiLSTM structure for sentiment analysis. Meas. Sensors 2023, 25, 100619.
  • He, R.; McAuley, J. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; pp. 507–517.
  • Samir, A.; Elkaffas, S.M.; Madbouly, M.M. Twitter Sentiment Analysis using BERT. In Proceedings of the 2021 31st International Conference on Computer Theory and Applications (ICCTA), Kochi, Kerala, India, 17–19 August 2021; pp. 182–186.
  • Prottasha, N.J.; Sami, A.A.; Kowsher, M.; Murad, S.A.; Bairagi, A.K.; Masud, M.; Baz, M. Transfer learning for sentiment analysis using BERT based supervised fine-tuning. Sensors 2022, 22, 4157.
  • Mujahid, M.; Rustam, F.; Shafique, R.; Chunduri, V.; Villar, M.G.; Ballester, J.B.; Diez, I.d.l.T.; Ashraf, I. Analyzing sentiments regarding ChatGPT using novel BERT: A machine learning approach. Information 2023, 14, 474.
  • Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2016, arXiv:1609.08144.
  • Sennrich, R.; Haddow, B.; Birch, A. Neural machine translation of rare words with subword units. arXiv 2015, arXiv:1508.07909.
  • Kang, L.; He, S.; Wang, M.; Long, F.; Su, J. Bilingual attention based neural machine translation. Appl. Intell. 2023, 53, 4302–4315.
  • Yang, Z.; Dai, Z.; Salakhutdinov, R.; Cohen, W.W. Breaking the softmax bottleneck: A high-rank RNN language model. arXiv 2017, arXiv:1711.03953.
  • Song, K.; Tan, X.; Qin, T.; Lu, J.; Liu, T.Y. Mass: Masked sequence to sequence pre-training for language generation. arXiv 2019, arXiv:1905.02450.
  • Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.r.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 2012, 29, 82–97.
  • Hannun, A.; Case, C.; Casper, J.; Catanzaro, B.; Diamos, G.; Elsen, E.; Prenger, R.; Satheesh, S.; Sengupta, S.; Coates, A.; et al. Deep speech: Scaling up end-to-end speech recognition. arXiv 2014, arXiv:1412.5567.
  • Amodei, D.; Ananthanarayanan, S.; Anubhai, R.; Bai, J.; Battenberg, E.; Case, C.; Casper, J.; Catanzaro, B.; Cheng, Q.; Chen, G.; et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 173–182.
  • Chiu, C.C.; Sainath, T.N.; Wu, Y.; Prabhavalkar, R.; Nguyen, P.; Chen, Z.; Kannan, A.; Weiss, R.J.; Rao, K.; Gonina, E.; et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 15–20 April 2018; pp. 4774–4778.
  • Zhang, Y.; Chan, W.; Jaitly, N. Very Deep Convolutional Networks for End-to-End Speech Recognition. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 4845–4849.
  • Dong, L.; Xu, S.; Xu, B. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 5884–5888.
  • Bhaskar, S.; Thasleema, T. LSTM model for visual speech recognition through facial expressions. Multimed. Tools Appl. 2023, 82, 5455–5472.
  • Daouad, M.; Allah, F.A.; Dadi, E.W. An automatic speech recognition system for isolated Amazigh word using 1D & 2D CNN-LSTM architecture. Int. J. Speech Technol. 2023, 26, 775–787.
  • Dhanjal, A.S.; Singh, W. A comprehensive survey on automatic speech recognition using neural networks. Multimed. Tools Appl. 2024, 83, 23367–23412.
  • Nasr, S.; Duwairi, R.; Quwaider, M. End-to-end speech recognition for arabic dialects. Arab. J. Sci. Eng. 2023, 48, 10617–10633.
  • Kumar, D.; Aziz, S. Performance Evaluation of Recurrent Neural Networks-LSTM and GRU for Automatic Speech Recognition. In Proceedings of the 2023 International Conference on Computer, Electronics & Electrical Engineering & Their Applications (IC2E3), Srinagar Garhwal, India, 8–9 June 2023; pp. 1–6.
  • Fischer, T.; Krauss, C. Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 2018, 270, 654–669.
  • Nelson, D.M.; Pereira, A.C.; De Oliveira, R.A. Stock Market’s Price Movement Prediction with LSTM Neural Networks. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1419–1426.
  • Luo, A.; Zhong, L.; Wang, J.; Wang, Y.; Li, S.; Tai, W. Short-term stock correlation forecasting based on CNN-BiLSTM enhanced by attention mechanism. IEEE Access 2024, 12, 29617–29632.
  • Bao, W.; Yue, J.; Rao, Y. A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS ONE 2017, 12, e0180944.
  • Feng, F.; Chen, H.; He, X.; Ding, J.; Sun, M.; Chua, T.S. Enhancing Stock Movement Prediction with Adversarial Training. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; Volume 19, pp. 5843–5849.
  • Rundo, F. Deep LSTM with reinforcement learning layer for financial trend prediction in FX high frequency trading systems. Appl. Sci. 2019, 9, 4460.
  • Devi, T.; Deepa, N.; Gayathri, N.; Rakesh Kumar, S. AI-Based Weather Forecasting System for Smart Agriculture System Using a Recurrent Neural Networks (RNN) Algorithm. Sustain. Manag. Electron. Waste 2024, 97–112.
  • Anshuka, A.; Chandra, R.; Buzacott, A.J.; Sanderson, D.; van Ogtrop, F.F. Spatio temporal hydrological extreme forecasting framework using LSTM deep learning model. Stoch. Environ. Res. Risk Assess. 2022, 36, 3467–3485.
  • Marulanda, G.; Cifuentes, J.; Bello, A.; Reneses, J. A hybrid model based on LSTM neural networks with attention mechanism for short-term wind power forecasting. Wind. Eng. 2023, 0309524X231191163.
  • Chen, W.; An, N.; Jiang, M.; Jia, L. An improved deep temporal convolutional network for new energy stock index prediction. Inf. Sci. 2024, 682, 121244.
  • Hasanat, S.M.; Younis, R.; Alahmari, S.; Ejaz, M.T.; Haris, M.; Yousaf, H.; Watara, S.; Ullah, K.; Ullah, Z. Enhancing Load Forecasting Accuracy in Smart Grids: A Novel Parallel Multichannel Network Approach Using 1D CNN and Bi-LSTM Models. Int. J. Energy Res. 2024, 2024, 2403847.
  • Asiri, M.M.; Aldehim, G.; Alotaibi, F.; Alnfiai, M.M.; Assiri, M.; Mahmud, A. Short-term load forecasting in smart grids using hybrid deep learning. IEEE Access 2024, 12, 23504–23513.
  • Yıldız Doğan, G.; Aksoy, A.; Öztürk, N. A Hybrid Deep Learning Model to Estimate the Future Electricity Demand of Sustainable Cities. Sustainability 2024, 16, 6503.
  • Bhambu, A.; Gao, R.; Suganthan, P.N. Recurrent ensemble random vector functional link neural network for financial time series forecasting. Appl. Soft Comput. 2024, 161, 111759.
  • Mienye, E.; Jere, N.; Obaido, G.; Mienye, I.D.; Aruleba, K. Deep Learning in Finance: A Survey of Applications and Techniques. Preprints 2024.
  • Mastoi, Q.U.A.; Wah, T.Y.; Gopal Raj, R. Reservoir computing based echo state networks for ventricular heart beat classification. Appl. Sci. 2019, 9, 702.
  • Valin, J.M.; Tenneti, S.; Helwani, K.; Isik, U.; Krishnaswamy, A. Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based on Percepnet. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 7133–7137.
  • Li, Y.; Huang, C.; Ding, L.; Li, Z.; Pan, Y.; Gao, X. Deep learning in bioinformatics: Introduction, application, and perspective in the big data era. Methods 2019, 166, 4–21.
  • Zhang, Y.; Qiao, S.; Ji, S.; Li, Y. DeepSite: Bidirectional LSTM and CNN models for predicting DNA–protein binding. Int. J. Mach. Learn. Cybern. 2020, 11, 841–851.
  • Xu, J.; Mcpartlon, M.; Li, J. Improved protein structure prediction by deep learning irrespective of co-evolution information. Nat. Mach. Intell. 2021, 3, 601–609.
  • Yadav, S.; Ekbal, A.; Saha, S.; Kumar, A.; Bhattacharyya, P. Feature assisted stacked attentive shortest dependency path based Bi-LSTM model for protein–protein interaction. Knowl.-Based Syst. 2019, 166, 18–29.
  • Aybey, E.; Gümüş, Ö. SENSDeep: An ensemble deep learning method for protein–protein interaction sites prediction. Interdiscip. Sci. Comput. Life Sci. 2023, 15, 55–87.
  • Li, Z.; Du, X.; Cao, Y. DAT-RNN: Trajectory Prediction with Diverse Attention. In Proceedings of the 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 14–17 December 2020; pp. 1512–1518.
  • Lee, M.j.; Ha, Y.g. Autonomous Driving Control Using End-to-End Deep Learning. In Proceedings of the 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Republic of Korea, 19–22 February 2020; pp. 470–473.
  • Codevilla, F.; Müller, M.; López, A.; Koltun, V.; Dosovitskiy, A. End-to-End Driving via Conditional Imitation Learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 4693–4700.
  • Altché, F.; de La Fortelle, A. An LSTM Network for Highway Trajectory Prediction. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Abu Dhabi, United Arab Emirates, 25–28 October 2017; pp. 353–359.
  • Li, P.; Zhang, Y.; Yuan, L.; Xiao, H.; Lin, B.; Xu, X. Efficient long-short temporal attention network for unsupervised video object segmentation. Pattern Recognit. 2024, 146, 110078.
  • Li, R.; Shu, X.; Li, C. Driving Behavior Prediction Based on Combined Neural Network Model. IEEE Trans. Comput. Soc. Syst. 2024, 11, 4488–4496.
  • Liu, Y.; Diao, S. An automatic driving trajectory planning approach in complex traffic scenarios based on integrated driver style inference and deep reinforcement learning. PLoS ONE 2024, 19, e0297192.
  • Altindal, M.C.; Nivlet, P.; Tabib, M.; Rasheed, A.; Kristiansen, T.G.; Khosravanian, R. Anomaly detection in multivariate time series of drilling data. Geoenergy Sci. Eng. 2024, 237, 212778.
  • Matar, M.; Xia, T.; Huguenard, K.; Huston, D.; Wshah, S. Multi-Head Attention Based bi-lstm for Anomaly Detection in Multivariate Time-Series of wsn. In Proceedings of the 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), Hangzhou, China, 11–13 June 2023; pp. 1–5.
  • Kumaresan, S.J.; Senthilkumar, C.; Kongkham, D.; Beenarani, B.; Nirmala, P. Investigating the Effectiveness of Recurrent Neural Networks for Network Anomaly Detection. In Proceedings of the 2024 International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE), Bangalore, India, 24–25 January 2024; pp. 1–5.
  • Li, E.; Bedi, S.; Melek, W. Anomaly detection in three-axis CNC machines using LSTM networks and transfer learning. Int. J. Adv. Manuf. Technol. 2023, 127, 5185–5198.
  • Minic, A.; Jovanovic, L.; Bacanin, N.; Stoean, C.; Zivkovic, M.; Spalevic, P.; Petrovic, A.; Dobrojevic, M.; Stoean, R. Applying recurrent neural networks for anomaly detection in electrocardiogram sensor data. Sensors 2023, 23, 9878.
  • Zhou, C.; Paffenroth, R.C. Anomaly Detection with Robust Deep Autoencoders. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 665–674.
  • Ren, H.; Xu, B.; Wang, Y.; Yi, C.; Huang, C.; Kou, X.; Xing, T.; Yang, M.; Tong, J.; Zhang, Q. Time-Series Anomaly Detection Service at Microsoft. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 3009–3017.
  • Munir, M.; Siddiqui, S.A.; Dengel, A.; Ahmed, S. DeepAnT: A deep learning approach for unsupervised anomaly detection in time series. IEEE Access 2018, 7, 1991–2005.
  • Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent neural networks for time series forecasting: Current status and future directions. Int. J. Forecast. 2021, 37, 388–427.
  • Ahmed, S.F.; Alam, M.S.B.; Hassan, M.; Rozbu, M.R.; Ishtiak, T.; Rafa, N.; Mofijur, M.; Shawkat Ali, A.; Gandomi, A.H. Deep learning modelling techniques: Current progress, applications, advantages, and challenges. Artif. Intell. Rev. 2023, 56, 13521–13617.
  • Li, X.; Qin, T.; Yang, J.; Liu, T.Y. LightRNN: Memory and computation-efficient recurrent neural networks. Adv. Neural Inf. Process. Syst. 2016, 29.
  • Katharopoulos, A.; Vyas, A.; Pappas, N.; Fleuret, F. Transformers Are rnns: Fast Autoregressive Transformers with Linear Attention. In Proceedings of the International Conference on Machine Learning, Virtual, 12–18 July 2020; pp. 5156–5165.
  • Shao, W.; Li, B.; Yu, W.; Xu, J.; Wang, H. When Is It Likely to Fail? Performance Monitor for Black-Box Trajectory Prediction Model. IEEE Trans. Autom. Sci. Eng. 2024, 4, 765–772.
  • Jacobs, W.R.; Kadirkamanathan, V.; Anderson, S.R. Interpretable deep learning for nonlinear system identification using frequency response functions with ensemble uncertainty quantification. IEEE Access 2024, 12, 11052–11065.
  • Mamalakis, M.; Mamalakis, A.; Agartz, I.; Mørch-Johnsen, L.E.; Murray, G.; Suckling, J.; Lio, P. Solving the enigma: Deriving optimal explanations of deep networks. arXiv 2024, arXiv:2405.10008.
  • Shah, M.; Sureja, N. A Comprehensive Review of Bias in Deep Learning Models: Methods, Impacts, and Future Directions. Arch. Comput. Methods Eng. 2024, 1–13.
  • Goethals, S.; Calders, T.; Martens, D. Beyond Accuracy-Fairness: Stop evaluating bias mitigation methods solely on between-group metrics. arXiv 2024, arXiv:2401.13391.
  • Weerts, H.; Pfisterer, F.; Feurer, M.; Eggensperger, K.; Bergman, E.; Awad, N.; Vanschoren, J.; Pechenizkiy, M.; Bischl, B.; Hutter, F. Can fairness be automated? Guidelines and opportunities for fairness-aware AutoML. J. Artif. Intell. Res. 2024, 79, 639–677.
  • Bai, Y.; Geng, X.; Mangalam, K.; Bar, A.; Yuille, A.L.; Darrell, T.; Malik, J.; Efros, A.A. Sequential Modeling Enables Scalable Learning for Large Vision Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 22861–22872.
  • Taye, M.M. Understanding of machine learning with deep learning: Architectures, workflow, applications and future directions. Computers 2023, 12, 91.

| Reference | Year | Description |
| --- | --- | --- |
| Zaremba et al. | 2014 | Insights into RNNs in language modeling |
| Chung et al. | 2014 | Survey of advancements in RNN training, optimization, and architectures |
| Goodfellow et al. | 2016 | Review on deep learning, including RNNs |
| Greff et al. | 2016 | Extensive comparison of LSTM variants |
| Tarwani et al. | 2017 | In-depth analysis of RNNs in NLP |
| Chen et al. | 2018 | Effectiveness of RNNs in environmental monitoring and climate modeling |
| Bai et al. | 2018 | Comparison of RNNs with other sequence modeling techniques like CNNs and attention mechanisms |
| Che et al. | 2018 | Potential of RNNs in medical applications |
| Zhang et al. | 2020 | RNN applications in robotics, including path planning, motion control, and human–robot interaction |
| Dutta et al. | 2022 | Overview of RNNs, challenges in training, and advancements in LSTM and GRU for sequence learning |
| Linardos et al. | 2022 | RNNs for early warning systems, disaster response, and recovery planning in natural disaster prediction |
| Badawy et al. | 2023 | Integration of RNNs with other ML techniques for predictive analytics and patient monitoring in healthcare |
| Ismaeel et al. | 2023 | Application of RNNs in smart city technologies, including traffic prediction, energy management, and urban planning |
| Mers et al. | 2023 | Performance comparison of various RNN models in pavement performance forecasting |
| Quradaa et al. | 2024 | State-of-the-art review of RNNs, covering core architectures with a focus on applications in code clones |
| Al-Selwi et al. | 2024 | Review of LSTM applications from 2018 to 2023 |

| RNN Type | Key Features | Gradient Stability | Typical Applications |
| --- | --- | --- | --- |
| Basic RNN | Simple structure with short-term memory | High risk of vanishing gradients | Simple sequence tasks like text generation |
| LSTM | Long-term memory with input, forget, and output gates | Stable, handles vanishing gradients well | Language translation, speech recognition |
| GRU | Simplified LSTM with fewer gates | Stable, handles vanishing gradients effectively | Tasks requiring faster training than LSTM |
| Bidirectional RNN | Processes data in both forward and backward directions for better context understanding | Medium stability, depends on depth | Speech recognition and sentiment analysis |
| Deep RNN | Multiple RNN layers are stacked to learn hierarchical features | Variable, and the risk of vanishing gradients increases with depth | Complex sequence modeling like video processing |
| ESN | Fixed hidden layer weights, trained only at the output | Not applicable as training bypasses typical gradient issues | Time series prediction and system control |
| Peephole LSTM | Adds peephole connections to LSTM gates | Stable and similar to LSTM | Recognition of complex temporal patterns like musical notation |
| IndRNN | Allows training of deeper networks by maintaining independence between time steps | Reduces risk of vanishing and exploding gradients | Very long sequences, such as in video processing or long text generation |

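
The "high risk of vanishing gradients" entry for the basic RNN can be illustrated with a small numerical sketch. This is not taken from the article: it assumes a hypothetical one-unit RNN, h_t = tanh(w·h_{t-1} + u·x_t), with illustrative values for `w`, `u`, and the inputs, and multiplies the per-step derivatives to show how the gradient with respect to the initial state shrinks as the sequence grows.

```python
import math


def gradient_through_time(w, u, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w*h_{t-1} + u*x_t).

    The function name and the scalar setup are illustrative assumptions,
    not part of any library or of the reviewed article.
    """
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        # Chain rule for one step: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)


inputs = [0.5] * 50  # a constant toy input sequence
short_grad = gradient_through_time(w=0.9, u=1.0, inputs=inputs[:5])
long_grad = gradient_through_time(w=0.9, u=1.0, inputs=inputs)
print(f"5 steps:  {short_grad:.3e}")
print(f"50 steps: {long_grad:.3e}")
```

Because every per-step factor here has magnitude below 1, the product decays roughly geometrically with sequence length. Gated architectures such as LSTM and GRU add paths along which this product need not shrink as quickly, which is the contrast the gradient-stability column draws.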
| Dataset Name | Application | Description |
| --- | --- | --- |
| Penn Treebank | Natural language processing | A corpus of English sentences annotated for part-of-speech tagging, parsing, and named entity recognition; widely used for language modeling with RNNs |
| IMDB Reviews | Sentiment analysis | A dataset of movie reviews used for binary sentiment classification; suitable for studying the effectiveness of RNNs in text sentiment classification tasks |
| MNIST Sequential | Image recognition | A version of the MNIST dataset formatted as sequences for studying sequence-to-sequence learning with RNNs |
| TIMIT Speech Corpus | Speech recognition | An annotated speech database used for automatic speech recognition systems |
| Reuters-21578 Text Categorization Collection | Text categorization | A collection of newswire articles that is a common benchmark for text categorization and NLP tasks with RNNs |
| UCI ML Repository: Time Series Data | Time series analysis | Contains various time series datasets, including stock prices and weather data, ideal for forecasting with RNNs |
| CORe50 Dataset | Object recognition | Used for continuous object recognition, ideal for RNN models dealing with video input sequences where object persistence and temporal context are important |

| Application Domain | Reference | Year | Methods and Application |
| --- | --- | --- | --- |
| Text generation | Souri et al. | 2018 | RNNs for generating coherent and contextually relevant Arabic text |
| | Holtzman et al. | 2019 | Controlled text generation using RNNs for style and content control |
| | Hu et al. | 2020 | VAEs combined with RNNs to enhance creativity in text generation |
| | Gajendran et al. | 2020 | Character-level text generation using BiLSTM for various tasks |
| | Hussein and Savas | 2024 | LSTM for text generation |
| | Baskaran et al. | 2024 | LSTM for text generation, achieving excellent performance |
| | Islam | 2019 | Sequence-to-sequence framework using LSTM for improved text generation quality |
| | Yin et al. | 2018 | Attention mechanisms with RNNs for improved text generation quality |
| | Guo | 2015 | Integration of reinforcement learning with RNNs for text generation |
| | Keskar et al. | 2019 | Conditional Transformer Language (CTRL) for generating text in various styles |
| Sentiment analysis | He and McAuley | 2016 | Adversarial training framework for robustness in sentiment analysis |
| | Pujari et al. | 2024 | Hybrid CNN-RNN model for sentiment classification |
| | Wankhade et al. | 2024 | Fusion of CNN and BiLSTM with attention mechanism for sentiment classification |
| | Sangeetha and Kumaran | 2023 | BiLSTM for sentiment analysis by processing text in both directions |
| | Yadav et al. | 2023 | LSTM-based models for sentiment analysis in customer reviews and social media posts |
| | Zulqarnain et al. | 2024 | Attention mechanisms and GRU for enhanced sentiment analysis |
| | Samir et al. | 2021 | Use of pre-trained models like BERT for sentiment analysis |
| | Prottasha et al. | 2022 | Transfer learning with BERT and GPT for sentiment analysis |
| | Abimbola et al. | 2024 | Hybrid LSTM-CNN model for document-level sentiment classification |
| | Mujahid et al. | 2023 | Analyzing sentiment with pre-trained models fine-tuned for specific tasks |
| Machine translation | Sennrich et al. | 2015 | Byte-Pair Encoding for handling rare words in translation models |
| | Wu et al. | 2016 | Google Neural Machine Translation with deep RNNs for improved accuracy |
| | Vaswani et al. | 2017 | Fully attention-based transformer models for superior translation performance |
| | Yang et al. | 2017 | Hybrid model integrating RNNs into the transformer architecture |
| | Song et al. | 2019 | Incorporating BERT into translation models for enhanced understanding and fluency |
| | Kang et al. | 2023 | Bilingual attention-based machine translation model combining RNN with attention |
| | Zulqarnain et al. | 2024 | Multi-stage feature attention mechanism model using GRU |

| Application Domain | Reference | Year | Methods and Application |
| --- | --- | --- | --- |
| Speech recognition | Hinton et al. | 2012 | Deep neural networks, including RNNs, for speech-to-text systems |
| | Hannun et al. | 2014 | DeepSpeech: LSTM-based speech recognition system |
| | Amodei et al. | 2016 | DeepSpeech2: Enhanced LSTM-based speech recognition with bidirectional RNNs |
| | Zhang et al. | 2017 | Convolutional RNN for robust speech recognition |
| | Chiu et al. | 2018 | RNN-transducer models for end-to-end speech recognition |
| | Dong et al. | 2018 | Speech-Transformer: Leveraging self-attention for better processing of audio sequences |
| | Bhaskar and Thasleema | 2023 | LSTM for visual speech recognition using facial expressions |
| | Daouad et al. | 2023 | Various RNN variants for automatic speech recognition |
| | Nasr et al. | 2023 | End-to-end speech recognition using RNNs |
| | Kumar et al. | 2023 | Performance evaluation of RNNs in speech recognition tasks |
| | Dhanjal et al. | 2024 | Comprehensive study of different RNN models for speech recognition |
| Time series forecasting | Nelson et al. | 2017 | Hybrid CNN-RNN model for stock price prediction |
| | Bao et al. | 2017 | Combining LSTM with stacked autoencoders for financial time series forecasting |
| | Fischer and Krauss | 2018 | Deep RNNs for predicting stock returns, outperforming traditional ML models |
| | Feng et al. | 2019 | Transfer learning with RNNs for stock prediction |
| | Rundo | 2019 | Combining reinforcement learning with LSTM for trading strategy development |
| | Devi et al. | 2024 | RNN-based model for weather prediction and capturing sequential dependencies in meteorological data |
| | Anshuka et al. | 2022 | LSTM networks for predicting extreme weather events by learning complex temporal patterns |
| | Lin et al. | 2022 | Integrating attention mechanisms with LSTM for enhanced weather forecasting accuracy |
| | Marulanda et al. | 2023 | LSTM model for short-term wind power forecasting and improving prediction accuracy |
| | Chen et al. | 2024 | Bidirectional GRU with TCNs for energy time series forecasting |
| | Hasanat et al. | 2024 | RNNs for forecasting energy demand in smart grids and optimizing renewable energy integration |
| | Asiri et al. | 2024 | Short-term renewable energy predictions using RNN-based models |
| | Yildiz et al. | 2024 | Hybrid model of LSTM with CNN for accurate electricity demand prediction |
| | Luo et al. | 2024 | Attention-based CNN-BiLSTM model for improved financial forecasting |
| | Gao et al. | 2023 | Dynamic ensemble deep ESN for wave height forecasting |
| | Bhambu et al. | 2024 | Recurrent ensemble deep random vector functional link neural network for financial time series forecasting |

| Application Domain | Reference | Year | Methods and Application |
| --- | --- | --- | --- |
| Signal processing | Mastoi et al. | 2019 | ESNs for real-time heart rate variability monitoring |
| | Valin et al. | 2021 | ESNs for speech signal enhancement in noisy environments |
| | Gao et al. | 2021 | EWT integrated with ESNs for enhanced time series forecasting |
| Bioinformatics | Li et al. | 2019 | RNNs for gene prediction and protein-structure prediction |
| | Zhang et al. | 2020 | Bidirectional LSTM for predicting DNA-binding protein sequences |
| | Xu et al. | 2021 | RNN-based model for predicting protein secondary structures |
| | Yadav et al. | 2019 | Combining BiLSTM with CNNs for protein sequence analysis |
| | Aybey et al. | 2023 | Ensemble model for predicting protein–protein interactions |
| Autonomous vehicles | Altché and de La Fortelle | 2017 | LSTM for predicting the future trajectories of vehicles |
| | Codevilla et al. | 2018 | RNNs with imitation learning for autonomous driving |
| | Li et al. | 2020 | RNNs for path planning and object detection |
| | Lee et al. | 2020 | Integrating LSTM with CNN for end-to-end autonomous driving |
| | Li et al. | 2024 | Attention-based LSTM for video object tracking |
| | Liu and Diao | 2024 | GRU with deep reinforcement learning for decision-making |
| Anomaly detection | Zhou and Paffenroth | 2017 | RNNs in unsupervised anomaly detection with deep autoencoders |
| | Munir et al. | 2018 | Hybrid CNN-RNN model for anomaly detection in time series |
| | Ren et al. | 2019 | Attention-based RNN model for anomaly detection |
| | Li et al. | 2023 | RNNs with transfer learning for anomaly detection in manufacturing |
| | Minic et al. | 2023 | RNNs for detecting anomalies in ECG signals |
| | Matar et al. | 2023 | BiLSTM for anomaly detection in multivariate time series |
| | Kumaresan et al. | 2024 | RNNs for detecting network traffic anomalies |
| | Altindal et al. | 2024 | LSTM for anomaly detection in time series data |

Mienye, I.D.; Swart, T.G.; Obaido, G. Recurrent Neural Networks: A Comprehensive Review of Architectures, Variants, and Applications. Information 2024 , 15 , 517. https://doi.org/10.3390/info15090517


Speech Research Papers Samples For Students

391 samples of this type

If you're looking for a workable method to simplify writing a Research Paper about Speech, WowEssays.com paper writing service just might be able to help you out.

For starters, browse our extensive database of free samples, which cover the most diverse Speech Research Paper topics and showcase the best academic writing practices. Once you have grasped the basic principles of content structuring and taken away actionable ideas from these expertly written Research Paper samples, putting together your own academic work should be much easier.

However, you might still find yourself in a situation where even top-notch Speech Research Papers don't let you get the job done on time. In that case, you can contact our writers and ask them to craft a unique Speech paper according to your individual specifications. Buy college research paper or essay now!

Free Patrick Henry's Speech in March 1775 Research Paper Example

  • Research paper on Martin Luther King Jr.
  • Good Franklin Delano Roosevelt: master communicator research paper example


Historical Analysis Research Papers Examples

  • Good why communication issues arise research paper example
  • Communication issues for children with autism
  • Example of research paper on Aristotle's principles of rhetoric within Queen Elizabeth I's speech at Tilbury
  • Constitutional rights due research papers examples
  • Free Bong Hits 4 Jesus: school speech and constitutional conflict research paper example
  • Sample research paper on single subject research
  • Language and speech areas in the brain research paper example
  • The effect sounds have on a person studying research papers examples
  • Introduction
  • Good example of problem-solution paper: anonymous speech online research paper
  • Good example of First Amendment issues research paper

Introduction

According to Professor Marvin Amori, most lawyers grapple with First Amendment issues daily, because they have to deal with laws that govern speech and consequently write rules on them. Laws on speech are a crucial component of society because they contribute significantly not only to national matters but also to global corporate boards and to various public and private institutions. These lawyers therefore have to operate globally and are concerned not only with the First Amendment of the United States Constitution but also with the customs, laws, and practices of foreign nations (Mark 2238).

Tinker’s Decision

  • Good Bong Hits 4 Jesus: school speech and constitutional conflict research paper example
  • Leadership and communication theory research paper sample
  • Research question one
  • Sample research paper on terrorism and the media
  • Hearing loss and the cochlear implant research paper samples

“Cochlear implants are electronic devices that contain a current source and an electrode array that is implanted into the cochlea” (American Speech-Language-Hearing Association, 2004); the electrical current stimulates the surviving auditory nerve fibers. The cochlear implant has a long history of development, and this paper discusses that history, focusing particularly on the evolution of the implant over the last thirty years. In addition, this paper discusses the advantages and disadvantages of the implant, as well as the public’s response to its development and use.

Early Beginnings of the Cochlear Implant

  • Sign language research paper sample
  • Historical context revealing Luke collaborating with Paul to author Hebrews research papers examples
  • Thesis statement

Hebrews is one of the New Testament books that lack an author's self-attestation, and its canonicity is therefore disputed. The absence of a named author has prompted several scholars and theologians to carry out extensive research into who might have written the book.

INTRODUCTION

Good research paper about effects of television and movies on children.

Winston Churchill, Finding A Way Through Research Paper Examples

  • Foundation course –
  • Effective presentation research paper samples
  • Good example of research paper on Adolf Hitler's power of words
  • Example of research paper on history of autism
  • Free cultural diversity in speech pathologists research paper sample
  • Example of research paper on argumentative interpretation of Martin Luther King's "I Have a Dream" speech
  • Free research paper on free speech and content control
  • Example of research paper on symptoms of schizophrenia
  • Example of right and left brain hemispheres research paper
  • Constitutional rights research paper examples
  • Perspectives on free-speech zones on college campuses research paper examples
  • Give Me Liberty or Give Me Death by Patrick Henry: question and answer research paper examples
  • Research paper on electropalatography
  • Flag burning as a symbolic speech research paper
  • Flag burning as a symbolic speech
  • Free research paper on the rhetorical triangle
  • Research paper on interoperability communication plan
  • Tools used for communications
  • Example of research paper on democracy and the internet
  • Free research paper on language and gender
  • Example of research paper on different discourse ways between Korean men and women

There are several studies that have been conducted to analyse women’s and men’s speech across various cultural identities. The main goal of the studies has been to find out whether there is a difference in the way men and women speak. Speech behaviour of both genders can be analysed in respect to language usage, phonology, verbal choice and the general interactions between men and women in discourse.

The Left Versus The Right Sides Of The Brain Their Impact On Learning Research Paper Sample

  • The left versus the right sides of the brain: their impact on learning
  • Research paper on Frederick Douglass' struggle
  • Research paper on Westboro Baptist Church
  • Inaugural address research paper examples

1. Why did you pick your speech?

Good Causes For Alalia Research Paper Example

  • Speech disorders
  • Research paper on free speech on college campus
  • Need of different styles of speech research papers examples
  • Speech styles
  • Good research paper about review of criticism
  • What is the Fourth of July to slaves
  • Free speech on college campuses research paper example
  • Research paper on Martin Luther King's "I Have a Dream" speech

The orator of the speech is Martin Luther King, a man trusted, respected, and considered by the audience the most renowned leader of the American civil rights movement. King built a strong ethos through the speech (Martin Luther King). For instance, he started the speech by reading from his prepared text and, halfway through, set the text aside to introduce the theme “I have a dream”. He grew more enthusiastic and confident as he gained trust and reassuring applause from his audience (Sundquist).

Gender And Language At Workplace: Free Sample Research Paper To Follow

  • Free utilization of metaphoric expressions by children and adults: a comparative qualitative evaluation research paper example
  • An assignment submitted by
  • Comparative analysis of freedom & human rights in the United States & Saudi Arabia research paper template for faster writing
  • Adapting the learning environment for children with disabilities research paper sample
  • Free research paper about learning disability of American children
  • Good research paper on ethnography of speaking
  • Free Booker T. Washington and Frederick Douglass’ conflicting negro thoughts research paper example
  • Good Martin Luther King in civil right movement research paper example
  • Deng Xiaoping: research paper you might want to emulate
  • Deng Xiaoping on the Anti-Rightist Campaign of 1957
  • Aphasia research paper

Aphasia is a language disorder caused by damage to the parts of the brain that process language. The person is unable to use spoken language, written language, or auditory comprehension adequately, and the interpretation of objects, sounds, and feelings is impaired. It is most often associated with stroke, and it can also be caused by tumors or cancers that affect the brain, as well as by vascular problems affecting the blood supply to brain tissue. The disorder does not affect the person's intelligence, but rather their ability to produce and interpret language.

Are all types of speech difficulties aphasia?

  • Free understanding of developmental theories research paper example
  • How inner speech and outer speech are connected



Title: MaskCycleGAN-based Whisper to Normal Speech Conversion

Abstract: Whisper to normal speech conversion is an active area of research. Various architectures based on generative adversarial networks have been proposed in the recent past. In particular, a recent study shows that MaskCycleGAN, a mask-guided generative adversarial network that preserves cycle consistency, performs well for voice conversion from spectrogram representations. In the current work we present a MaskCycleGAN approach for the conversion of whispered speech to normal speech. We find that tuning the mask parameters and pre-processing the signal with a voice activity detector provide superior performance compared to the existing approach. The wTIMIT dataset is used for evaluation. Objective metrics such as PESQ and G-Loss are used to evaluate the converted speech, along with subjective evaluation using mean opinion score. The results show that the proposed approach offers considerable benefits.
Comments: submitted to TENCON 2024
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)

Multidisciplinary Management of Oral Cancer: Diagnosis, Treatment, and Rehabilitation


About this Research Topic

The management of oral cancer is a complex task, often requiring a multidisciplinary effort. To date, advancements in early diagnosis, tailored therapies, and post-treatment care have been achieved through collaboration between surgeons, radiotherapists, oncologists, pathologists, and other specialists. Dentists also play a pivotal role in early diagnosis and post-operative rehabilitation, while speech pathologists are often vital for improving quality of life after treatment; nutrition specialists are fundamental before, during, and after treatment. New frontiers of research in this field have developed new diagnostic and prognostic tools, less invasive ablative and reconstructive surgery, and better tailored systemic treatments. Nonetheless, significant gaps remain in uniform diagnostic standards, optimal treatment modalities tailored to individual patient profiles, and effective rehabilitation measures that optimize long-term quality of life.

The scope of this Research Topic is to publish high-quality papers, including clinical research, systematic reviews, and meta-analyses, covering, but not limited to, the following topics:

  1. Diagnostic methods to detect or to better identify oral potentially malignant disorders at high risk of transformation into cancer, including molecular, genetic, and clinical analysis.
  2. Optimal ways to diagnose and follow up oral cavity cancer through radiology and the potential applications of radiomics.
  3. Surgical aspects of oral cavity cancer management, including novel ablative and reconstructive techniques.
  4. The importance of novel prognostic tools such as lymph node yield and lymph node ratio, tumor infiltrating lymphocytes, and tumor stromal ratio.
  5. The role of systemic treatments in improving survival in specific subsets of oral cancer patients; for example, the emerging role of immunotherapy in the adjuvant or neoadjuvant setting.
  6. The importance of dental care, nutritional support, and speech rehabilitation in the perioperative setting for patients affected by oral cancer.
  7. The management of treatment complications such as mucositis, xerostomia, or osteoradionecrosis.

Any alternative submission proposals are more than welcome; authors are encouraged to submit a manuscript summary proposal via the homepage to check the scope of their potential contribution.

Keywords : oral cancer treatment multidisciplinary approach, comprehensive oral cancer management, oral cancer diagnosis techniques, reconstructive surgery for oral cancer, systemic therapies oral cancer, multidisciplinary oral cancer care team, post-operative rehabilitation, speech therapy, systemic treatments, mucositis, xerostomia, osteoradionecrosis, personalized medicine

Important Note : All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.



COMMENTS

  1. Speech and language therapy interventions for children with primary

    Therefore, in both research and intervention, it is difficult to tease speech and language disorders apart. It is thought that approximately 5% to 8% of children may have difficulties with speech and/or language ( Boyle 1996 ; Tomblin 1997 ), of which a significant proportion will have 'primary' speech and/or language disorders.

  2. [2106.15561] A Survey on Neural Speech Synthesis

    A Survey on Neural Speech Synthesis. Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu. Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and ...

  3. Robust Speech Recognition via Large-Scale Weak Supervision

    By Alec Radford and 5 other authors. We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the ...

  4. A Comprehensive Review of Speech Emotion Recognition Systems

    During the last decade, Speech Emotion Recognition (SER) has emerged as an integral component within Human-computer Interaction (HCI) and other high-end speech processing systems. Generally, an SER system targets the speaker's existence of varied emotions by extracting and classifying the prominent features from a preprocessed speech signal. However, the way humans and machines recognize and ...

  5. EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech

    State-of-the-art speech synthesis models try to get as close as possible to the human voice. Hence, modelling emotions is an essential part of Text-To-Speech (TTS) research. In our work, we selected FastSpeech2 as the starting point and proposed a series of modifications for synthesizing emotional speech. According to automatic and human evaluation, our model, EmoSpeech, surpasses existing ...

  6. Communication 1200 Public Speaking: Researching Your Speech

    Create a set of concept words found in your thesis; add synonyms to the list. Example: A properly researched and carefully cited speech will build confidence in the speaker and credibility for the audience. Use quotations around phrases; truncate; use Boolean logic to broaden or narrow the search.

  7. Manifestation of depression in speech overlaps with ...

    Schematic depiction of the outline of the paper. There are three different phases in this work (a) Pre-training for speaker embeddings using a large non-medical speech data collected from N ...

  8. How Do We Imagine a Speech? A Triple Network Model for ...

    Abstract. Inner speech, a silent verbal experience, is central to human consciousness and cognition, yet its neural mechanisms remain largely unknown. In this study, we adopted an ecological paradigm called situationally simulated inner speech, which involves the dynamic integration of contextual background, episodic and semantic memories, and external events into a coherent structure.

  9. Text-To-Speech Synthesis

    FastSpeech 2: Fast and High-Quality End-to-End Text to Speech. coqui-ai/TTS • ICLR 2021. In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with ground-truth target instead of the simplified output from teacher, and 2) introducing more variation information of speech (e ...

  10. A Scoping Review in Speech Pathology and Applications to Future Health

    A scoping review assesses the potential scope of research done about a certain topic in hopes of retrieving evidence on the team's research topics. To conduct our scoping review, we used the software, Covidence, which allows reviewers to complete article screening and data extraction quickly and flexibly.

  11. (PDF) SPEECH RECOGNITION SYSTEMS

    In this paper, we provide an overview of the invited and contributed papers presented at the special session at ICASSP-2013, entitled "New Types of Deep Neural Network Learning for Speech ...

  12. Thirty years of research into hate speech: topics of ...

    The volume of academic papers published in a representative sample, from 1992 to 2019, displays a significant increase after 2010; thus, in the main evolution of online hate speech research, it has been possible to identify an initial development stage (1992-2010) followed by a rapid development (2011-2019).

  13. (PDF) Speech to text conversion and summarization for effective

    The research work presented in this paper describes an easy and effective method for speech recognition. The speech is converted to the corresponding text and produces summarized text. This has ...

  14. (PDF) On Speech Acts

    This paper is intended to give insight to the readers about the development of speech act theories which include categories, characteristics and validities, and strategies.

  15. How to Do Research for a Speech: 14 Steps (with Pictures)

    Conducting Good Research. 1. Keep your research organized. Use a system that works for you. Making sure that you have information organized and sources accounted for will make it much easier for you when it comes time to write your speech. [6] Keep all your research in one place, like a notepad or a word document.

  16. Speech Recognition

    Speech Recognition. 1194 papers with code • 236 benchmarks • 89 datasets. Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into a written format. The goal is to accurately transcribe the speech in real-time or from recorded audio ...

  17. Deep Speech: Scaling up end-to-end speech recognition

    By Awni Hannun and 9 other authors. We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing ...

  18. Clinical Topics and Disorders in Speech-Language Pathology

    American Speech-Language-Hearing Association 2200 Research Blvd., Rockville, MD 20850 Members: 800-498-2071 Non-Member: 800-638-8255. MORE WAYS TO CONNECT

  19. Researcher finds sound progress in babies' speech development

    Researcher finds sound progress in babies' speech development Date: August 23, 2024 Source: University of Texas at Dallas ... The research was funded by grants from the NIDCD (R01DC015108) and the ...

  20. Speech-to-Speech Translation

    35 papers with code • 3 benchmarks • 5 datasets. Speech-to-speech translation (S2ST) consists of translating speech from one language to speech in another language. This can be done with a cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems, which is text-centric.

  21. 717 Good Research Paper Topics [Updated August 2024]

    Some common research paper topics include abortion, birth control, child abuse, gun control, history, climate change, social media, AI, global warming, health, science, and technology. But we have many more! On this page, we have hundreds of good research paper topics across a wide range of subject fields. Each of these topics could be used ...

  22. Fact-checking warnings from Democrats about Project 2025 and ...

    Vice President Kamala Harris, the Democratic presidential nominee, has warned Americans about "Trump's Project 2025" agenda — even though former President Donald Trump doesn't claim the ...

  23. The Impact of Neural Networks on Image and Speech Recognition

    This paper provides a comprehensive overview of ChatGPT, exploring its development, underlying technology, applications, ethical considerations, and future implications.

  24. Listen to Research Papers & Retain More

    Interpretative research papers: These involve the interpretation of data, literature, or artistic works, requiring a nuanced understanding of the subject matter. Survey research papers: Based on survey data, these papers analyze and present findings from questionnaires or interviews. How text to speech works

  25. Information

    Recurrent neural networks (RNNs) have significantly advanced the field of machine learning (ML) by enabling the effective processing of sequential data. This paper provides a comprehensive review of RNNs and their applications, highlighting advancements in architectures, such as long short-term memory (LSTM) networks, gated recurrent units (GRUs), bidirectional LSTM (BiLSTM), echo state ...

  26. Speech Research Paper Examples That Really Inspire

    An example of this is the famous Iron Curtain speech made on March 5th 1946, where he changed democratic Western perceptions of the Communist East regions of the Soviet Union. Churchill's speech, entitled "The Sinews of Peace", was a unifying call for the British to be strategic in their post-war actions.

  27. MaskCycleGAN-based Whisper to Normal Speech Conversion

    Abstract: Whisper to normal speech conversion is an active area of research. Various architectures based on generative adversarial networks have been proposed in the recent past. Especially, recent study shows that MaskCycleGAN, which is a mask guided, and cyclic consistency keeping, generative adversarial network, performs really well for voice conversion from ...

  28. Multidisciplinary Management of Oral Cancer: Diagnosis ...

    The scope of this Research Topic is to publish high quality papers, either clinical research, systematic reviews and meta-analyses covering, but will not be limited to, the following topics: 1. Diagnostic methods to detect or to better identify oral potentially malignant disorders at high risk of transformation into cancer, including molecular ...

  29. (PDF) HATE SPEECH DETECTION USING MACHINE LEARNING: A SURVEY

    Abstract and Figures. This survey paper aims to provide a comprehensive overview of the existing research on hate speech detection using machine learning. We review various methodologies and ...

  30. The Stakes Are High for Powell and the Fed at Jackson Hole

    The Kansas City Fed's conference is a golden opportunity for the world's most important central bank to regain control of the policy narrative.