Open access | Published: 16 December 2021
Global predictors of language endangerment and the future of linguistic diversity
- Lindell Bromham ORCID: orcid.org/0000-0003-2202-2609 1,
- Russell Dinnage 1,
- Hedvig Skirgård 2,3,
- Andrew Ritchie 1,
- Marcel Cardillo 1,
- Felicity Meakins 4,
- Simon Greenhill ORCID: orcid.org/0000-0001-7832-6156 2,3 &
- Xia Hua ORCID: orcid.org/0000-0003-3485-789X 1,5
Nature Ecology & Evolution volume 6, pages 163–173 (2022)
Subjects: Language and linguistics, Macroecology
A Publisher Correction to this article was published on 03 February 2022
This article has been updated
Language diversity is under threat. While each language is subject to specific social, demographic and political pressures, there may also be common threatening processes. We use an analysis of 6,511 spoken languages with 51 predictor variables spanning aspects of population, documentation, legal recognition, education policy, socioeconomic indicators and environmental features to show that, counter to common perception, contact with other languages per se is not a driver of language loss. However, greater road density, which may encourage population movement, is associated with increased endangerment. Higher average years of schooling is also associated with greater endangerment, evidence that formal education can contribute to loss of language diversity. Without intervention, language loss could triple within 40 years, with at least one language lost per month. To avoid the loss of over 1,500 languages by the end of the century, urgent investment is needed in language documentation, bilingual education programmes and other community-based programmes.
As with global biodiversity, the world’s language diversity is under threat. Of the approximately 7,000 documented languages, nearly half are considered endangered 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 . In comparison, around 40% of amphibian species, 25% of mammals and 14% of birds are currently threatened with extinction 9 . The processes of endangerment are ongoing 10 , with rates of loss estimated as equivalent to a language lost every one to three months 7 , 11 , 12 , and the most pessimistic predictions suggesting that 90% of the world’s languages will be lost within a century 13 . However, unlike biodiversity loss 14 , predictions of language loss have not been based on statistically rigorous analysis. Here we provide a global analysis to model patterns of current and future language endangerment, and compare the predictive power of variables representing some of the potential drivers of language loss. Our analysis has three key features. First, we examined a broader set of influences than previous studies, encompassing demographic factors, linguistic resources, socioeconomic setting, language ecology, connectivity, land use, environment, climate and biodiversity (Table 1 ). Second, we addressed major statistical challenges of large-scale comparative analyses, by simultaneously accounting for phylogenetic non-independence, spatial autocorrelation and covariation among variables. Third, our models incorporated demographic and environmental variables that can be projected into the future, allowing us to make predictions of future patterns of language endangerment in time and space.
While language change and shift are natural processes of human cultural evolution, the loss of global language diversity has been massively accelerated by colonization and globalization. Many factors contribute to language endangerment, some of which are specific to particular regions, language groups or languages. The historical context of each language, such as patterns of colonial expansion, and particular political climates, such as support for bilingual education, are expected to have substantial impacts on language endangerment patterns 10 . In addition to specific historical and local influences, there may also be widespread general factors that contribute to language endangerment, which can be used to identify languages that may come under increasing threat in the future. For a dataset containing 6,511 languages (over 90% of the world’s spoken languages), we analysed 51 predictor variables that target different aspects of language maintenance 15 , including language transmission (for example, whether a language is actively learned by children or used in education), language shift (for example, connectivity, urbanization, world languages) and language policy (for example, provision for minority language education, official language status). We also included variables that have been associated with language diversity, including features of climate and landscape. Clearly, any list of threatening processes will be incomplete, and the requirement for globally consistent data will fail to capture important influences on language vitality that operate at regional or local levels. Our aim is not to provide a comprehensive picture of language endangerment but a useful exploration of the influence of a selection of potential impacts. Broad-scale quantitative studies are therefore a complement to more focused qualitative studies on language endangerment and loss.
Understanding global threats to language diversity requires that we develop a macroecology of language endangerment and loss 16 . A macroecological approach has many advantages: it allows evaluation of a large range of factors that influence language vitality; formal testing of general patterns above the signal of individual language trajectories; statistical comparison of the explanatory power of different models, accounting for covariation of cultural, socioeconomic and environmental factors; and a way of avoiding the confounding effects of spatial distribution and relationships between languages 17 . Although threats to linguistic diversity, shaped by social, cultural, political and economic influences, often differ from processes threatening biodiversity 18 , the analytical challenges associated with studying global patterns of endangerment are common to biologists and linguists 17 , 19 , 20 , 21 . Here we use global analysis to illuminate some of the complex interactions of extrinsic factors threatening language diversity, and use this understanding to predict the fate of the world’s languages over the next century.
Results and discussion
Current patterns of endangerment
We use an endangerment scale based on EGIDS, which incorporates a range of factors including domains of use and intergenerational transmission 22 , 23 . We describe languages that are losing first-language (L1) speakers as ‘Threatened’, those with only adult speakers and no child learners as ‘Endangered’, those with only elderly speakers as ‘Critically Endangered’, and languages with no L1 speakers as ‘Sleeping’ (the term preferred by many speakers of endangered languages 1 , 24 , 25 : Supplementary Table 1 ). Of the 6,511 languages in our database, 37% are considered Threatened or above (which we will refer to generally as ‘endangered languages’); 13% of these are no longer spoken (Sleeping). The areas with the greatest absolute numbers of endangered languages are New Guinea, Central America, the Himalayas and Yunnan, and the regions between Central and West Africa (Extended Data Figs. 1 and 2 ), but this pattern may largely reflect diversity 17 : where there are more languages, speaker populations and geographic ranges tend to be smaller, potentially resulting in more endangered languages. Areas with the highest proportion of their languages endangered include Australia, North China, Siberia, North Africa and Arabia, North America, and parts of South America (Fig. 1 ). Areas with the greatest language loss to date are in Australia, South America and the USA (Extended Data Fig. 2 ).
Fig. 1: Each hexagon represents approximately 415,000 km². The coloured bars show the predictors of level of endangerment identified in the best model for a global language database of 6,511 languages, and, for each of 12 regions, any additional influences on patterns of language endangerment (see Supplementary Data 3). Dark grey areas on the map do not have data for all the independent variables in the best model for language endangerment level. Language distribution data are from WLMS 16 ( worldgeodatasets.com ).
Predictors of language endangerment
Our analysis seeks the best set of variables, from 51 candidates, to explain variation in endangerment level (the dependent variable), over and above covariation due to relationships between languages, spatial autocorrelation and contact between language distributions, and allowing for interactions between predictor variables and region. We reduced the number of variables by grouping them according to their pairwise correlations, identified independent variables with significant predictive power on a subset of the data (the training dataset), then evaluated the fit of the model on the remaining data (the test dataset). We then estimated model parameters on the full dataset (see Methods for details).
Our best-fit model explains 34% of the variation in language endangerment (comparable to similar analyses on species endangerment 26 , 27 , 28 ). These variables cannot provide a full picture of the processes threatening language diversity, as there will be many other important factors that cannot be included due to lack of appropriate and consistent data with global coverage, or because of the idiosyncratic nature of processes of language endangerment and the influence of historical factors that cannot be captured in a broad-scale model. For example, patterns of human migration and past episodes of population expansion and contraction will not be captured fully in contemporary language distribution data. Furthermore, language endangerment and loss is an ongoing process, and there may be historical factors that caused dramatic reduction in L1 speakers that will not be captured in current values of socioeconomic variables, such as massacres of Indigenous populations or ethnic groups, punishing people for speaking their language and separating children from parents. Patterns of language endangerment may at least partially reflect past influences, such that current predictors might not fully capture important processes that resulted in the current endangerment status (a phenomenon known in conservation biology as extinction filter effect 29 ). Because of these unavoidable limitations, no study of this kind can aim to comprehensively describe factors affecting vitality of all of the world’s languages. But by identifying contemporary factors that are significant predictors of current patterns of endangerment at a global scale, we contribute to the understanding of the complex interaction of factors contributing to language endangerment.
Five predictors of language endangerment are consistently identified at global and regional scales: L1 speakers, bordering language richness, road density, years of schooling and the number of endangered languages in the immediate neighbourhood. Each of these predictors highlights a different process in language endangerment; taken together, they paint a picture of the way interactions between languages shape language vitality.
The number of first-language (L1) speakers is the strongest predictor of endangerment. It is important to emphasize that not all small languages are endangered, and that language loss does not necessarily result from a reduction in the number of people in a particular culture or population, but often occurs when people shift from using their heritage language to a different language 1 , 30 . Therefore, the multilingual setting in which each language is embedded (referred to as the language ecology) plays a key role in endangerment, by influencing whether speakers shift to another language or adopt additional languages in their multilingual repertoires 31 . Our results suggest that direct contact with neighbouring languages, as reflected in the number of languages with overlapping or touching distributions, is not in itself a threatening process. In fact, languages whose distributions are directly in contact with a greater number of other autochthonous languages have lower average endangerment levels (Fig. 1 ). This may reflect a common observation that communities in regular contact with speakers of other Indigenous languages may be multilingual without necessarily giving up their L1 language 31 . If ongoing language contact were a threat to language vitality, then we might expect more isolated languages, such as those on islands, to be less endangered, but this is not the case (Supplementary Fig. 7 ). Similarly, we find no evidence that barriers to human movement that might be expected to reduce contact between nearby speaker populations, such as steep or rough terrain, are associated with reduced endangerment. We conclude that being in regular contact with speakers of another language does not in itself usually endanger Indigenous language vitality. Instead, there are other, more complex social, economic and political dynamics influencing language endangerment that may co-occur with language contact but are not synonymous with it.
A language is more likely to be endangered if a higher proportion of languages in the region are also endangered, suggesting that, in addition to language-specific threats, there are also widespread factors that influence language vitality across a region. One such factor is the density of roads in the neighbourhood surrounding each language (Fig. 1 ). One interpretation of the association between road density and language endangerment is that roads increase human movement and thus bring people into contact with speakers of other languages, and this may result in language shift. However, our results suggest that the association between language endangerment and roads is unlikely to simply reflect language contact. If language contact always generated language shift and loss, then we would expect languages with a high degree of contact with other languages to be more endangered. In fact, we find the opposite: languages whose distribution overlaps or meets many other languages are less endangered (Fig. 1 and Supplementary Data 3 ). Furthermore, if contact between speakers of different languages was a driver of language loss, then we would expect landscapes that inhibit movement to reduce language contact and show lower levels of endangerment, but none of the other connectivity variables, such as altitudinal range, landscape roughness or density of waterways, show consistent association with language endangerment. The association with roads is neither simply a result of socioeconomic shift, as other indicators of development (for example, GDP, life expectancy) are not associated with language endangerment, nor is it a reflection of increasing urbanization, land use change or increase in built environment (Supplementary Fig. 7 ). Instead, road density may reflect connectivity between previously remote communities and larger towns, with increase in the influence of commerce and centralized government. Lack of roads has been cited as a protective factor in maintaining Indigenous language vitality, as it may limit the spread of ‘lingua francas’, such as Tok Pisin in Papua New Guinea 32 . The association between road density and language endangerment may reflect movement of people in two directions, as people move from their traditional homelands into larger population centres, and outsiders move into previously isolated communities, both of which have been implicated as threats to Indigenous language vitality 33 . For example, access to new employment opportunities (such as a shift from rural work to factory or construction work) may result in shift away from heritage languages to dominant languages of commerce 34 , 35 , 36 . Roads can aid the spread of ‘lingua francas’ or languages of central governance 37 .
There is consistent global support for higher average levels of schooling being associated with greater language endangerment (Fig. 1 ). The association between schooling and language endangerment cannot be interpreted as a side effect of growing socioeconomic development, because years of schooling is a much stronger predictor of endangerment patterns than other socioeconomic indicators. Instead, it is consistent with a growing number of studies showing a negative impact of formal schooling on minority language vitality, particularly where bilingual education is not supported or, in some cases, is actively discouraged 38 , 39 , 40 . Yet having a minority education policy is not globally associated with reduced threats to language diversity, possibly due to variation in the extent and manner of provision of bilingual education for speakers of minority languages. For example, the Bilingual Education Act of the United States (1968) was primarily concerned with improving access to mainstream education for students from non-English speaking backgrounds by using heritage language as a bridge to English acquisition, rather than being designed to allow students to maintain their first language 41 .
The spatial scale of the variables reflecting education policy and outcomes cannot capture variation within countries. Reliable statistics on average years of schooling are, for most parts of the world, only available as national averages, even though years of schooling may vary within a country, particularly between socioeconomic groups, or when comparing rural and urban populations. However, we note that the same effects have been reported in local-scale studies: for example, in a remote northern Australian Indigenous community, an increased number of years of schooling is associated with reduced use of Indigenous language elements across all generations, from elders to children 42 . Collection of regional data on variation in the number of years of schooling would allow the generality of this relationship to be tested at a range of spatial scales.
Similarly, our data on education policy is necessarily coarse grained, which may mask some patterns at local scales: national legal provision may not reflect use of minority languages in schools at a local or regional level. For example, in China, the Regional Ethnic Autonomy Law (1984) promotes learning both regional languages and Mandarin Chinese, but the policy is not translated into educational practice evenly across all regions due to lack of resources in some languages, or local emphasis in some places on students from minorities learning the centralized language of governance and commerce 39 . The same bilingual education policy may invigorate minority languages in some areas, but result in greater emphasis on education in the dominant national language in other places 43 . More fine-grained analysis at regional level is needed to examine the influence of minority languages in classrooms on language diversity and vitality.
Our results not only identify global threats to language vitality, but also reveal differences in threatening processes in different regions. For example, in Africa, language endangerment is associated with greater areas of pasture or croplands, potentially reflecting language shift associated with subsistence change (for example, as hunter–gatherer societies adopt the languages of neighbouring pastoral or agricultural groups 44 ). Climate has the strongest association with language endangerment in Europe, with endangerment levels increasing with temperature seasonality, reflecting patterns of language erosion in Arctic regions. These regional patterns are ideal foci for future studies of language endangerment: while the current study is constrained to predictors that are globally relevant and consistently measured for all regions of the world, a targeted study could focus on variables considered important at regional scales, such as land use and subsistence in Africa, population density change in Oceania, or climate in Europe and Central and Eastern Asia (Supplementary Fig. 7 ).
Predicting future language loss
If a language is no longer being learned by children, we can use demographic information to predict when, in the absence of interventions to increase language transmission, there will be no more living L1 speakers. We can combine the current L1 speaker population size with endangerment score (which tells us the relative generational distribution of L1 speakers and whether the number of L1 speakers is declining; Supplementary Table 1 ), and use demographic information on age structure of the population (Supplementary Table 8 ) to predict how many L1 speakers will be alive in the future (see Supplementary Methods 5 for details). Our analysis is conservative in that it only considers change in L1 speakers in languages identified as having reduced transmission to younger generations (see Supplementary Table 1 ): we did not model change in speaker number for languages currently considered to be stably transmitted, even though they may become endangered in the future.
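To make the logic of this projection concrete, the toy R sketch below shows one way an endangerment level (no child learners) plus an assumed national age structure could be turned into a projected L1 speaker count. The age bands, survival rate and speaker numbers are illustrative placeholders, not values from Supplementary Table 8 or from our model.

```r
# Illustrative sketch only: project L1 speakers for a language with no child
# learners (EGIDS 7, 'Endangered'), using a hypothetical national age structure.

age_bands <- c("0-19" = 0.25, "20-39" = 0.30, "40-59" = 0.25, "60+" = 0.20)

current_l1  <- 5000                          # current L1 speakers (placeholder)
adult_bands <- c("20-39", "40-59", "60+")    # no child (0-19) L1 speakers

# Spread current speakers across the adult bands in proportion to population.
speakers_by_band <- current_l1 * age_bands[adult_bands] / sum(age_bands[adult_bands])

# In 40 years the 20-39 cohort is 60+, older cohorts have largely died, and no
# new L1 speakers have been added; apply an illustrative survival probability.
survival_to_60plus <- 0.8
l1_in_40_years <- unname(speakers_by_band["20-39"] * survival_to_60plus)
l1_in_40_years   # projected L1 speakers in 40 years under these toy assumptions
```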
Our model predicts that language loss will at least triple in the next 40 years (Fig. 2 ). Without intervention to increase language transmission to younger generations, we predict that by the end of the century there will be a nearly five-fold increase in Sleeping languages, with at least 1,500 languages ceasing to be spoken (Fig. 3 ). Some parts of the world stand out as ‘hotspots’ of future language loss, with the greatest absolute loss of languages predicted to occur in the west coast of North America, Central America, the Amazon rainforest, West Africa, north coast of New Guinea and northern Australia (Extended Data Fig. 4 ). After 80 years, the model predicts additional areas of language loss in Borneo, southwest China and areas around the Caspian Sea. The greatest proportional loss of languages is predicted to occur in the Arctic, interior plains of Northern America, temperate areas of southern Chile and the Sahara (Extended Data Fig. 5 ).
Fig. 2: a,b, The red shading represents the differences between the predicted values at present and the predicted values in 40 years, for the absolute number (a) and proportion of languages (b) per hex grid, based on generational shift and demographic transition in L1 speakers. c, Proportion of languages predicted to become Sleeping (EGIDS ≥ 9) in the next 40 years. See Supplementary Table 1 for information on endangerment scales. Language distribution data from WLMS 16 ( worldgeodatasets.com ).
Fig. 3: a, Current and predicted proportion of languages that are endangered (EGIDS 6b–8b) or Sleeping (no living L1 speakers, EGIDS 9–10). b,c, Current and predicted number of endangered (6b–8b) (b) and Sleeping (9–10) (c) languages according to the current level of language documentation. Each violin gives the probability distribution of the number or proportion of languages that are predicted to be endangered or Sleeping, with the dot showing the mean and the whisker showing the standard deviation. Each dashed line shows the number or proportion of languages that are currently endangered or Sleeping. This figure projects current levels of documentation for each language, and hence does not reflect future documentation efforts for threatened languages.
In addition to demographic shift, our model also identifies predictors of language endangerment that are likely to change over time. For some of the variables associated with language endangerment, such as average years of schooling, we lack an adequate predictive model that is global in extent but would allow for regional variation. However, there are some variables identified as significant predictors of language endangerment at regional levels, such as land use and climate, for which we can predict future values on the basis of current trends (see Supplementary Information 5.2 ). For example, we can use climate change models to predict future values of climate variables at all points of the globe, and we can use information on rates of change in land use in each grid cell to project possible future values for land use variables in that grid cell. Clearly, such predictions should be regarded as possible values only, and all such future projections are subject to caveats: for example, we chose a mid-range climate model so the future values could be higher or lower depending on the effectiveness of global climate change mitigation strategies, and the land-use projections are based on the average rate of change in the last few decades, although local factors may cause those rates of change to either increase or decrease in the future. But it is a useful exercise to add climate and land use to the predictive model to illustrate the potential for forward prediction of variables impacting endangerment status. The results of the predictions based on generational shift and demographic transition are shown in Figs. 2 and 3 . Predictions that are additionally adjusted for change in climate and land-use variables show qualitatively the same results (Extended Data Figs. 2 – 5 ).
Safeguarding language diversity
The crisis of language endangerment has prompted worldwide efforts to recognize, document and support language diversity 45 , reflected in the UNESCO International Decade of Indigenous Languages, beginning in 2022. Every language represents a unique expression of human culture, and each is subject to idiosyncratic influences of their specific history and local sociopolitical environment. By identifying general factors that impact language vitality, or areas at greatest risk of language loss, we may be better placed to direct resources for maintenance of language diversity.
In biology, ‘extinction debt’ describes the inevitable loss of species that are currently persisting with inviable populations or insufficient habitat 46 , 47 . For languages, ‘extinction debt’ arises from reduced intergenerational transmission. Languages currently spoken by adults but not learned as a first language by children will, without active intervention and revitalization, have no more L1 speakers once the current L1 speakers die. Using information on intergenerational transmission for each language combined with demographic information, our model predicts that the greatest increase in endangered languages will coincide with areas of greatest language diversity, in particular New Guinea and Central America (Fig. 2a ). However, some regions are predicted to lose a greater proportion of their current language diversity, such as the Great Lakes region of North America, the northern Sahara and eastern Siberia (Fig. 2 ).
We emphasize that these predictions are not death knells, but possible outcomes in the absence of investment in language vitality. For example, while our model predicts Alutiiq (Pacific Gulf Yupik {ems}) in Alaska to increase in endangerment level, the community has instituted a language revitalization programme that may counter the predicted trend. Identifying external factors associated with language endangerment can focus attention on areas where language vitality might become threatened. For example, some areas with the greatest predicted increase in road density, such as Nigeria, Papua New Guinea and Brazil 48 , are predicted by our model to have the highest potential loss of languages (Extended Data Fig. 4 ). Since increasing road density also has negative impacts on biodiversity, focusing mitigation efforts on areas of increasing road density may be beneficial for both language vitality and biodiversity 49 , 50 .
In addition to identifying correlates of language endangerment that are likely to change in the future, such as land use, we also identify factors that are open to intervention to reduce loss of language diversity. Currently, more years of formal schooling are associated with greater rates of language endangerment (Fig. 1 ). Research suggests that bilingual education, where students learn part or all of the curriculum in their first language, typically results in greater overall academic achievement without sacrificing proficiency in the dominant national language 51 , but emphasis on high-stakes testing for competency in the national language can contribute to erosion of heritage language proficiency 42 . Having provision for bilingual education enshrined in legislation, or official recognition of minority languages in government or in education, is not sufficient to reduce language endangerment (Supplementary Fig. 7 ). Implementation requires genuine commitment to bilingual education, and support from community members who can bring heritage language to the classroom. The benefits of providing support to enhance Indigenous language vitality, in terms of wellbeing 52 , 53 and socioeconomic outcomes 54 , are likely to far outweigh the costs. Implementation of support for Indigenous language vitality at all levels of governance and within speaker communities is urgent, given the predicted loss of L1 speakers who can aid language vitality and transmission (Fig. 3 ).
We emphasize that our analysis is focused on L1 speakers who learned the language as children, reflecting continuity of language transmission over generations. A language classified as ‘Sleeping’ (no L1 speakers) may be spoken as an acquired (L2) language in a multilingual context, as a reflection of ethnic identity or through revitalization (which may ultimately generate new L1 speakers). Language revitalization benefits from documentation, such as texts, dictionaries and grammars. Our future predictions give cause for concern that within 80 years there could be 1,500 or more languages that will no longer be spoken, yet a third of these currently have little or no documentation (Fig. 3 ). The majority of these languages currently have living L1 speakers, so there is still time to increase documentation based on the expert knowledge of fluent first-language speakers 55 , and to support communities to re-invigorate intergenerational language transmission 56 .
The loss of language diversity results from a complex network of factors, particularly those associated with colonization, globalization, and social and economic change. While identifying correlates of endangerment does not provide a full picture of the loss or erosion of any particular language, it does contribute to a general ‘theory of language loss’ 38 , 57 . A key difference between species and language endangerment patterns is that while many correlates of species extinction risk are intrinsic features of species biology (such as low reproductive rate or specialist diet 58 ), we have considered only ‘external’ factors, which frame the context in which languages persist. But external factors, unlike species traits, are amenable to manipulation. Some identified predictors of language endangerment may act as ‘red flags’, highlighting areas that would benefit from interventions to support language vitality (such as regions where road networks are expanding rapidly) or prompt finer-grained analysis of potential impacts (such as educational policy). Our study highlights the critical level of under-documentation of language diversity (Fig. 3 ), showing that without intervention, we might lose a substantial proportion of language diversity without having ever adequately documented how those languages represent unique expressions of human cultural diversity 59 . Investing in speaker communities to provide them with the support they need to encourage language diversity and vitality will bring measurable benefits in terms of social justice, community wellbeing and cultural engagement 53 , 54 , 55 , 60 .
Methods

Language data
We used data on L1 speakers, geographic distribution, endangerment level and relationships for 6,511 languages classified as ‘spoken L1 languages’ 17 , 61 , 62 (see Supplementary Methods for details of data and variables). We give the standard nomenclature according to the ISO 639-3 three-letter language identifiers in Supplementary Data 1 , and throughout this document we give the ISO code in curly brackets at the first mention of a language. Nine ‘world languages’ were included only as factors potentially influencing language vitality (see Supplementary Table 2 ) but were otherwise excluded from all language-level analyses. There are several schemes for evaluating and categorizing the risk of language loss 63 , 64 , most of which target indicators of language vitality, such as intergenerational transmission, official recognition, domains of use, and level of documentation and resources 23 , 65 (Supplementary Table 1 ). We based our analysis on EGIDS because it provides the most comprehensive coverage for our data (Supplementary Methods 2.1.2 and Fig. 1 ). Signed languages were not included in this analysis due to insufficient information on number of L1 signers, distributions and endangerment status for the majority of the world’s signed languages (Supplementary Information section 2.1.6 ).
Many previous analyses of global patterns of language endangerment relied on speaker population size and geographic distribution as proxies of endangerment status 4 , 20 , 66 . While low speaker number is the ultimate outcome of endangerment, current population size may not always provide a reliable indicator of language vitality or risk of loss 67 , 68 . Small localized languages may be stable and vigorous, for example some Papuan languages are confined to one or a few villages with only hundreds of speakers, yet are not considered endangered (for example, Neko {ISO 639-3: nej}, Mato {met}), and large widespread languages are not secure if they are not being reliably transmitted to younger generations (for example, Domari {rmt}, an endangered Indo-European language with over a quarter of a million speakers). Using population and range size to represent endangerment also conflates endangerment and diversity: range and population size correlate with number of languages per unit area 17 , so an area with more languages may, all things being equal, also contain a larger number of endangered languages 4 , 20 . Our analysis emphasizes global trends and general patterns over specific language trajectories or local histories. Use of global databases provides an overview of language diversity and vitality, but it is not possible to verify current speaker numbers, endangerment and distributions without expert knowledge of each individual language. Some regions or language families may be less well represented in global databases (for example, Australian languages have patchy representation and would benefit from expert revision on speaker numbers and endangerment levels). Furthermore, there is often no clear line between a dialect and a language, and this can result in variation in assigning L1 speakers to languages (Supplementary Methods 2.1.2 ). Our results should therefore be interpreted as providing general patterns and broad-brush predictions rather than specific detail on particular languages.
Predictor variables
We included ten broad categories of variables to describe key extrinsic factors that influence language vitality (Table 1 ). Variables were recorded either per language, as a weighted average across the language area, as national values, or for a 10,000 km² ‘neighbourhood’ around the language (see Supplementary Methods for details). For each language, we recorded the reported number of L1 speakers, endangerment level (Supplementary Table 1 ), distribution 62 , level of documentation 61 , whether the language has official recognition in any country, and whether it is officially recognized as a language of education. We characterized the ‘language ecology’ by the diversity of languages in the surrounding area, the number and proportion of endangered languages in the area, the relative representation of speakers compared to nearby languages, and whether the language occurs in a country (or countries) that has one of nine ‘world languages’ as an official language (Supplementary Table 3 ). We recorded levels of educational attainment and education spending at the national level, as well as the presence of a general provision for the use of minority languages for instruction in all or part of formal schooling, and whether each language is recognized for use in education (Supplementary Tables 5 and 6 ). Socioeconomic context is represented by Gross Domestic Product per capita (GDPpc), the Gini index of income inequality and life expectancy at 60 years of age (Supplementary Tables 5 and 7 ), noting that these national averages do not capture variation between groups or areas within each country (see Supplementary Information 2.4 ).
To represent the environmental context of each language, we included variables representing population density, climate, land use, biodiversity loss, connectivity and ‘shift’ (that is, the rate of change in land use, population, built environment) (Table 1 ). Because language loss is often a result of language expansion replacing autochthonous languages, we included measures of connectivity: density of roads and navigable waterways (which encourage human movement) and landscape roughness and altitudinal extent (which discourage human movement). To indicate human impact on the natural environment, we included ‘human footprint’ (which summarizes anthropogenic impacts on the environment 69 ) and measures of biodiversity loss. We included factors previously shown to be correlates of language diversity: mean growing season, average temperature, temperature seasonality and precipitation seasonality (we did not include species richness because biodiversity patterns are not significantly associated with language diversity above and beyond these climatic covariables 17 ). To model the impact of changing landscape and environment, we included rates of change in urbanization, population density, land use and human footprint 69 .
The variables we included vary in their degree of spatial resolution. For variables concerning legislation and policy (for example, provision for minority language education), data is typically available only at country level. For some socioeconomic variables, such as life expectancy, there is regional data for some countries, but most areas only have country level data, so for consistency we used national averages provided by global organizations such as the World Bank and World Health Organization (Table 1 ). For environmental variables, such as temperature seasonality, we averaged values over all grid cells in the language distribution area, but for landscape factors influencing human movement, such as mountains and roads, values within the language area are not fully informative because we wish to capture movement between language areas. For these variables, we averaged over all grid cells in a ‘neighbourhood’ centred on the language distribution. For full details of the spatial resolution of each variable, see Supplementary Methods .
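As a rough illustration of this ‘neighbourhood’ averaging (not our GIS workflow), the R sketch below averages a gridded variable over grid-cell centroids falling within a circular buffer of equivalent area around a language centroid; the grid, coordinates and variable values are synthetic placeholders.

```r
# Illustrative sketch: average a gridded variable (e.g. road density) over a
# circular 'neighbourhood' of ~10,000 km^2 around a language centroid.
set.seed(1)
grid <- data.frame(lon = runif(2000, 140, 150),
                   lat = runif(2000, -10, 0),
                   road_density = runif(2000))
language_centroid <- c(lon = 145, lat = -5)
buffer_km <- sqrt(10000 / pi)        # radius of a circle with area 10,000 km^2

haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

d <- haversine_km(grid$lon, grid$lat,
                  language_centroid["lon"], language_centroid["lat"])
mean(grid$road_density[d <= buffer_km])   # neighbourhood average of the variable
```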
The variables included in this study necessarily represent current environments, socioeconomic status and contemporary policy settings. Aside from shift variables (Table 1 ), which represent change over time, we cannot directly capture historical processes, such as past educational programmes, historical disease epidemics, warfare or genocide. These are important factors in language endangerment but cannot be easily represented in globally consistent, universally available variables, so investigating the impact of these factors is beyond the scope of this analysis.
Previous analyses of global language endangerment included relatively few potential predictors and did not control for the confounding effects of both spatial proximity and relationships between languages 2 , 4 , 20 , 66 . Languages that cluster in space will share many environmental, social and economic features. Related languages may share not only many linguistic features but also many environmental, social and economic factors, as well as shared historical influences 17 . Standard statistical tests rest on the assumption that datapoints are independent of each other, so if the residuals of a model show phylogenetic signal, this phylogenetic non-independence (datapoints related by descent) violates that assumption and can lead to spurious relationships 70 , 71 . Our method estimates the contribution of relatedness to observed patterns of endangerment, so that if relatedness has little or no influence on patterns of endangerment, the phylogeny will have no effect on the outcomes 17 . A large contribution of phylogeny tells us that languages tend to be more similar to related languages in their endangerment status than they are to randomly selected languages. This does not imply that languages inherit either their endangerment status or threatening processes from their ancestors, only that relatives show similar patterns of endangerment 72 . If this is the case, we need to account for phylogenetic non-independence in our analysis, so that we can identify factors that are significantly associated with endangerment above the association expected purely from shared relationships (closely related languages having more similar patterns of endangerment).
Failure to account for spatial autocorrelation can lead to false inference of patterns of language endangerment 19 . For example, socioeconomic indicators such as GDP have a strong latitudinal gradient, as do language diversity and range size, so if range size is associated with endangerment, we would expect a significant correlation between GDP and language endangerment even if there is no direct influence of one on the other 71 . Just as repeatedly sampling two neighbouring areas but counting each observation as a unique datapoint inflates perceived environmental correlations by pseudoreplication 73 , repeatedly sampling related languages with similar cultural traits, linguistic features, historical influences and language ecologies also potentially inflates perceived associations between endangerment and environmental or social factors 19 , 70 . Both of these sources of covariation in the data must be accounted for to find meaningful correlates of language endangerment.
In our analysis, the dependent variable is the level of endangerment, based on EGIDS rankings (Supplementary Table 1 ). We are seeking global correlates of language endangerment, but we are aware that some threatening processes may have greater or lesser impact in different regions (Supplementary Methods 2.1.3 ). Therefore, in addition to the predictors we described above, we included an interaction term between each region and each independent predictor, to account for any region-specific effect of the predictor on endangerment. Each interaction term was constructed by taking the product of the predictor and a binary variable recording whether a language belongs to the region. Any interaction term with no variation in the corresponding region was removed. We also included an intercept for each region to account for differences in the average level of language endangerment among regions. In total, we have 51 predictors, 51 × 12 interaction terms and 12 intercepts among the independent variables (Supplementary Data 3 ).
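The short R sketch below shows one way such region-by-predictor interaction terms and region intercepts can be built; the data frame, predictor names and region labels are illustrative, and the variance check is a crude stand-in for the within-region variation test described above.

```r
# Sketch: build region dummies, region-by-predictor interactions, and drop
# interaction columns with no variation (toy data, hypothetical names).
langs <- data.frame(
  region       = factor(c("Africa", "Africa", "Oceania", "Europe", "Oceania")),
  log_l1       = c(3.2, 5.1, 2.0, 6.3, 1.5),    # e.g. log number of L1 speakers
  road_density = c(0.1, 0.4, 0.05, 0.9, 0.02)
)

# Binary indicator (dummy) for each region, used as region-specific intercepts.
region_dummies <- model.matrix(~ region - 1, data = langs)

# Interaction terms: product of each predictor with each region dummy.
predictors   <- c("log_l1", "road_density")
interactions <- do.call(cbind, lapply(predictors, function(p) {
  m <- region_dummies * langs[[p]]
  colnames(m) <- paste(p, colnames(region_dummies), sep = ":")
  m
}))

# Drop interaction columns with no variation (crude stand-in for the
# "no variation in the corresponding region" rule described in the text).
interactions <- interactions[, apply(interactions, 2, var) > 0, drop = FALSE]
head(cbind(region_dummies, interactions))
```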
The basic steps of our statistical analysis are:
1. applying transformations to the 51 predictors (Supplementary Methods 4.1 , Table 1 ), then calculating their interaction terms;
2. grouping the 51 predictors according to their pairwise correlation (Supplementary Methods 4.2 ) and grouping interaction terms with their corresponding predictors (Supplementary Data 3 );
3. dividing the dataset into two, with two-thirds of the languages assigned to a training dataset and one-third to a test dataset. The training dataset was used to select the independent variables (candidate models) to predict current endangerment level (Supplementary Methods 4.3 ) and the test dataset was used to evaluate the fit of these candidate models to predict endangerment level (Supplementary Methods 4.4 );
4. using the best model, re-estimating the model parameters using all 6,511 languages;
5. using the predicted change in L1 speaker population, environment and climate to generate future values of variables, then using the best model to predict future endangerment given these predicted future values (Supplementary Methods 5 ).
Because the dependent variable in our analysis (endangerment level) is an ordinal variable, we used ordinal probit regression 74 to model language endangerment status. To satisfy the parallel regression assumption (that an independent variable has the same effect on threat status across all endangerment levels) for the majority of variables, we grouped recorded EGIDS scores into seven levels by combining levels 1–6a into a ‘stable’ level (Supplementary Methods 4.2 and Table 1 ). To account for spatial and phylogenetic autocorrelation, we constructed three matrices. The phylogenetic matrix represents relationships between languages as inferred from a taxonomy, with branch lengths scaled to relative divergence depths 17 (Supplementary Methods 3 ). The distance matrix captures similarity among nearby languages due to shared environment, using an exponentially decreasing function of the great-circle distance between the centroids of the polygons of two languages. Since distance between centroids may not reflect on-the-ground language contact, we also used a contact matrix, which contains 1 if two language polygons overlap (allowing a buffer of 100 km around each polygon), and 0 otherwise. We do not expect this contact matrix to fully capture the degree of ongoing contact between languages, which may be determined by local factors including modes of transport, form of subsistence or connectivity, but we included it to allow close association between language distributions to influence patterns of endangerment, above and beyond the great-circle distances between the centres of language distributions. The distance, contact and phylogenetic matrices had zero diagonals and each row was normalized to unity. Because each matrix had its own coefficient, if patterns of autocorrelation due to distance, contact or relatedness were not important in shaping the values of variables, then the model would estimate the coefficient as zero and the matrix would not influence the result.
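The R sketch below illustrates the construction just described on toy data: a distance-decay matrix from great-circle distances, a crude contact matrix, a placeholder phylogenetic similarity matrix, and row normalization with zero diagonals. The coordinates, decay scale and similarity values are illustrative, not taken from our analysis.

```r
# Sketch: toy versions of the distance, contact and phylogenetic matrices,
# each with zero diagonal and rows normalized to sum to one.
n <- 4
centroids <- cbind(lon = c(145, 146, 150, 10), lat = c(-5, -5.5, -9, 50))

to_rad <- pi / 180
gc_dist <- function(p1, p2, r = 6371) {        # great-circle (haversine) distance, km
  a <- sin((p2[2] - p1[2]) * to_rad / 2)^2 +
       cos(p1[2] * to_rad) * cos(p2[2] * to_rad) *
       sin((p2[1] - p1[1]) * to_rad / 2)^2
  2 * r * asin(sqrt(a))
}
D <- outer(seq_len(n), seq_len(n),
           Vectorize(function(i, j) gc_dist(centroids[i, ], centroids[j, ])))

decay_scale <- 500                     # illustrative decay scale in km
W_dist    <- exp(-D / decay_scale)     # exponentially decreasing with distance
W_contact <- (D < 100) * 1             # crude stand-in for polygon overlap + 100 km buffer

W_phylo <- matrix(c(1, .8, .2, 0,      # placeholder taxonomy-based similarity
                    .8, 1, .2, 0,
                    .2, .2, 1, 0,
                     0,  0, 0, 1), n, n)

row_normalize <- function(W) {
  diag(W) <- 0                         # zero diagonals
  rs <- rowSums(W)
  W / ifelse(rs == 0, 1, rs)           # each row sums to one (or stays all zero)
}
W_dist    <- row_normalize(W_dist)
W_contact <- row_normalize(W_contact)
W_phylo   <- row_normalize(W_phylo)
```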
We then fitted an autoregressive ordinal probit model to the data. We modelled the threat status of a language as a linear function of not only the independent variables but also the threat status of other languages, whose associations with the language depend on the distance, contact and phylogenetic matrices. The model was fitted to the data using a two-stage least squares approach 74 implemented in custom R code based on the ‘ordinalNet’ package 75 . We used a weighted sum of all three matrices to describe autocorrelation among languages 17 . The weight was estimated by maximum likelihood using the ‘L-BFGS-B’ method 76 in the ‘optim’ function in R.
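Our two-stage least squares estimator is implemented in the custom ordinalNet-based code (see Code availability); the simplified sketch below only illustrates the general idea of weighting the three matrices and scoring a candidate weighting by the likelihood of an ordinal probit fit (here via MASS::polr, with the neighbour term plugged in directly rather than instrumented). It is not our implementation, and all object names are assumptions.

```r
# Simplified illustration, not the published implementation: score a weighting
# of the three association matrices by the likelihood of an ordinal probit model
# that includes the resulting neighbour (autoregressive) term as a covariate.
library(MASS)   # polr()

neg_loglik_for_weights <- function(w, y, X, W_dist, W_contact, W_phylo) {
  w3 <- 1 - sum(w)                                   # weights constrained to sum to one
  W  <- w[1] * W_dist + w[2] * W_contact + w3 * W_phylo
  lag_y <- as.vector(W %*% as.numeric(y))            # neighbours' weighted threat status
  dat <- data.frame(y = y, X, lag_y = lag_y)
  fit <- polr(y ~ ., data = dat, method = "probit")  # y must be an ordered factor
  -as.numeric(logLik(fit))                           # optim() minimizes
}

# With y (ordered factor of endangerment levels), X (data frame of predictors)
# and the three row-normalized matrices in hand, the weights could be optimized:
# opt <- optim(c(1/3, 1/3), neg_loglik_for_weights, y = y, X = X,
#              W_dist = W_dist, W_contact = W_contact, W_phylo = W_phylo,
#              method = "L-BFGS-B", lower = 0, upper = 1)
```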
To select the best model for predicting endangerment level in our data, we first randomly divided the data into a training dataset (containing two-thirds of the languages) and a test dataset (the remaining one-third). Then, we grouped highly correlated independent variables together and applied a stepwise selection procedure to the training dataset (see step 3) to select candidate models (details in Supplementary Methods 4.4 ). The procedure started with a model containing the single independent variable with the highest likelihood on the training dataset, then went through each group (see step 2) in a random order, adding the variable from the group that significantly and maximally increased model fit, and removing the variable of the group whose impact on model fit was smallest and non-significant. These steps were repeated until no more variables could be added that increased model fit, or removed without reducing model fit. This model selection procedure left us with a set of candidate models. Lastly, we measured the predictive power of each model by predicting the threat status of the languages in the test dataset, and constructed the best model on the basis of predictive power.
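A toy sketch of one add/drop cycle of this kind of stepwise procedure is given below, using a likelihood-ratio test on an ordinal probit fit (MASS::polr). The data are random placeholders, the variable names are hypothetical, and the full procedure described above loops over variable groups in random order until no further change.

```r
# Sketch of one forward step and one backward step of a likelihood-ratio-based
# stepwise selection on a training subset (toy data, illustrative only).
library(MASS)

set.seed(42)
dat <- data.frame(
  endanger     = factor(sample(1:7, 300, replace = TRUE), ordered = TRUE),
  log_l1       = rnorm(300),
  road_density = rnorm(300),
  schooling    = rnorm(300)
)
train_idx <- sample(nrow(dat), round(2 / 3 * nrow(dat)))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

lr_pvalue <- function(small, big) {                # likelihood-ratio test p value
  stat <- 2 * (as.numeric(logLik(big)) - as.numeric(logLik(small)))
  df   <- attr(logLik(big), "df") - attr(logLik(small), "df")
  pchisq(stat, df, lower.tail = FALSE)
}

current <- polr(endanger ~ log_l1, data = train, method = "probit")

# forward step: add 'road_density' if it significantly improves the fit
candidate <- polr(endanger ~ log_l1 + road_density, data = train, method = "probit")
if (lr_pvalue(current, candidate) < 0.05) current <- candidate

# backward step: drop 'log_l1' if removing it does not significantly worsen the fit
reduced <- update(current, . ~ . - log_l1)
if (lr_pvalue(reduced, current) >= 0.05) current <- reduced
formula(current)
```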
The best model was constructed by including predictor variables that were selected in over one-third of the candidate models which did not significantly differ in their predictive power from the model with the highest predictive power. We then estimated the coefficients of predictor variables on the complete dataset. We used this best model to predict, for each language, the probability that the language falls in each of the seven endangerment levels (combining 1–6a into one ‘Stable’ level; Supplementary Table 1 ). Using these probabilities, we randomly sampled the endangerment level of each language and counted the number of languages with sampled endangerment level of 2 or above (that is, EGIDS 6b–10) as the number of languages predicted to be endangered, or those in the top two levels (that is EGIDS 9–10) as the number of languages predicted to be Sleeping. This procedure was repeated 1,000 times to generate the probability distribution of the number of languages predicted to be endangered or Sleeping. We found that the expected endangerment level tends to be lower than the reported endangerment level for individual languages (Supplementary Fig. 6 ), but, over all the languages, the model accurately predicted the proportions of languages that are endangered and sleeping (Fig. 3 ).
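A minimal sketch of the resampling step just described: given each language's predicted probabilities over the seven grouped levels (random placeholders here, not model output), sample a level per language and tally endangered (EGIDS 6b–10) and Sleeping (EGIDS 9–10) counts over repeated draws.

```r
# Sketch: Monte Carlo tallies of endangered and Sleeping languages from
# per-language level probabilities (placeholder probabilities).
set.seed(1)
n_lang <- 200
probs  <- matrix(rexp(n_lang * 7), ncol = 7)
probs  <- probs / rowSums(probs)               # each language's row sums to one

draw_counts <- function(p_matrix) {
  # levels 1..7 = Stable (EGIDS 1-6a), 6b, 7, 8a, 8b, 9, 10
  lvl <- apply(p_matrix, 1, function(p) sample(1:7, 1, prob = p))
  c(endangered = sum(lvl >= 2),   # EGIDS 6b-10
    sleeping   = sum(lvl >= 6))   # EGIDS 9-10
}

sims <- t(replicate(1000, draw_counts(probs)))
colMeans(sims)        # expected numbers of endangered and Sleeping languages
apply(sims, 2, sd)    # spread of the predicted distribution
```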
In some cases, the mismatch between predicted and observed endangerment levels may reflect ‘latent risk’ in endangerment status 27 : languages that have characteristics typical of an endangered language, such as low L1 speaker population size, yet are rated as stable (Extended Data Fig. 1 ). These languages may be expected to come under increasing threat in the future. For example, Yindjibarndi {yij}, a language of the Pilbara region of Australia, has an EGIDS rating of 6a (Stable) but has a small L1 speaker population (310) and is in an area where many languages are endangered or no longer spoken. Our model predicts the expected endangerment level of this language as ‘Critically Endangered’ (EGIDS 8) at present, and without intervention to ensure language vitality, it could potentially be no longer spoken within 80 years. The reported endangerment level and the predicted probability of each language falling in each endangerment level at present, in 40 years and in 80 years are listed in Supplementary Data 4 .
Future prediction
We used the best model of current language endangerment status to predict possible future changes in endangerment status for our global database of languages. Current EGIDS levels give us information on intergenerational transmission, so we can use that information to model declining L1 speaker population: if a language is currently only spoken by adults and not transmitted to children, then, without revitalization, there will be no more L1 speakers once the current speakers die. EGIDS also indicates which languages are declining in L1 speaker population so we can model the probable decline in numbers in 40 years (2060) and 80 years (2100; Supplementary Methods 5.2.1 ). These models predict possible patterns of language loss in the absence of revitalization programmes that might increase the number of L1 speakers, by assuming that without intervention to improve language transmission and vitality, endangered languages will undergo demographic shift that changes endangerment level, as described in Supplementary Methods 5.1 and Table 7 . These predictions are conservative in the sense that they assume that languages that are not currently endangered will remain stable into the future. We emphasize that this procedure is specifically modelling the shift in number of first language (L1) speakers of a language, not the population they belong to. A population may thrive and its ethnic identity remain strong even if speakers shift to a different language. To model the L1 speaker population size, we need to consider generational transmission of the language (that is, are children learning it as their first language?), rather than the number of people in the population that they belong to.
For example, if a language with an EGIDS level of 6b (Threatened) is predicted to be Endangered (EGIDS level 7) in the future on the basis of having no child L1 speakers, we adjust the probability distribution of the endangerment level predicted by the model for the language at that timepoint by shifting the probability distribution one level up, setting the probability that the language has an endangerment level lower than Endangered to zero, and renormalizing the probability distribution. We then randomly sample the endangerment level of each language, and count the number of languages overlapping each hex grid that are Endangered or Sleeping. This procedure is repeated 1,000 times to get the probability distribution of the number of languages predicted to be endangered or sleeping in each hex grid. We plot the combined predictions on a map, showing both the expected value of the number of languages per grid that are endangered or sleeping in 40 and 80 years, and also the proportion of languages per grid that are Threatened, Endangered or Sleeping. In the Supplementary Information , we demonstrate how this predictive model can be extended to incorporate future values of predictor variables, such as changing climate or land use.
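The sketch below illustrates this adjustment on a single language's predicted distribution over the seven grouped levels (probabilities are illustrative): shift mass up one level, zero out levels below the demographically implied floor and renormalize.

```r
# Illustrative adjustment of one language's predicted level probabilities.
# Levels 1..7 = Stable (EGIDS 1-6a), 6b, 7, 8a, 8b, 9, 10.
shift_and_floor <- function(p, floor_level) {
  k <- length(p)
  shifted <- c(0, p[-k])                 # move all mass up by one level...
  shifted[k] <- shifted[k] + p[k]        # ...mass already at the top stays there
  shifted[seq_len(floor_level - 1)] <- 0 # no probability below the implied level
  shifted / sum(shifted)                 # renormalize
}

p_now <- c(0.05, 0.40, 0.30, 0.15, 0.05, 0.03, 0.02)   # placeholder probabilities
shift_and_floor(p_now, floor_level = 3)  # e.g. floor at level 3 ('Endangered', EGIDS 7)
```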
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
All variables analysed are provided in Supplementary Data . These variables are derived from a range of sources, as cited in the text and in Table 1 (most of these data are freely available but some are under license).
Code availability
Code for data preparation is available at https://github.com/rdinnager/language_endangerment. Code for running the analysis is available at https://github.com/huaxia1985/LanguageEndangerment. The custom R code includes functions that modify functions in the ‘ordinalNet’ R package to correct for autocorrelation in ordinal probit regression.
Change history
03 February 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41559-022-01684-4
Rehg, K. L. & Campbell, L. The Oxford Handbook of Endangered Languages (Oxford Univ. Press, 2018).
Romaine, S. in Language and Poverty (eds Harbert, W. et al.) Ch. 8 (Multilingual Matters, 2009).
Sallabank, J. & Austin, P. The Cambridge Handbook of Endangered Languages (Cambridge Univ. Press, 2011).
Sutherland, W. J. Parallel extinction risk and global distribution of languages and species. Nature 423 , 276–279 (2003).
Article CAS Google Scholar
Eberhard, D. M., Simons, G. F. & Fennig, C. D. Ethnologue: Languages of the World 22nd edn (SIL International, 2019); https://www.ethnologue.com/
Moseley, C. Atlas of the World’s Languages in Danger (UNESCO Publishing, 2010); http://www.unesco.org/culture/en/endangeredlanguages/atlas
Catalogue of Endangered Languages (University of Hawaii at Manoa, 2020); http://www.endangeredlanguages.com
Campbell, L. & Okura, E. in Cataloguing the World’s Endangered Languages 1st edn (eds Campbell, L. & Belew, A.) 79–84 (Routledge, 2018).
The IUCN Red List of Threatened Species Version 2019-2 (IUCN, 2019); http://www.iucnredlist.org
Romaine, S. in The Routledge Handbook of Ecolinguistics (eds Fill, A. F. & Penz, H.) Ch. 3 (Routledge, 2017).
Crystal, D. Language Death (Cambridge Univ. Press, 2000).
Simons, G. F. Two centuries of spreading language loss. Proc. Linguist. Soc. Am. 4 , 27–38 (2019).
Article Google Scholar
Krauss, M. The world’s languages in crisis. Language 68 , 4–10 (1992).
Brondizio, E. S., Settele, J., Díaz, S. & Ngo, H. T. (eds) Global Assessment Report on Biodiversity and Ecosystem Services of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES, 2019).
Bowern, C. Language vitality: theorizing language loss, shift, and reclamation (Response to Mufwene). Language 93 , e243–e253 (2017).
Mufwene, S. S. Language vitality: The weak theoretical underpinnings of what can be an exciting research area. Language 93 , e202–e223 (2017).
Hua, X., Greenhill, S. J., Cardillo, M., Schneemann, H. & Bromham, L. The ecological drivers of variation in global language diversity. Nat. Commun. 10 , 2047 (2019).
Grenoble, L. A. & Whaley, L. J. in Endangered Languages (eds Grenoble, L. A. & Whaley, L. J.) 22–54 (Cambridge Univ. Press, 1998).
Cardillo, M., Bromham, L. & Greenhill, S. J. Links between language diversity and species richness can be confounded by spatial autocorrelation. Proc. R. Soc. B 282 , 20142986 (2015).
Amano, T. et al. Global distribution and drivers of language extinction risk. Proc. R. Soc. B 281 , 20141574 (2014).
Loh, J. & Harmon, D. Biocultural Diversity: Threatened Species, Endangered Languages (WWF, 2014).
Fishman, J. A. Reversing Language Shift: Theoretical and Empirical Foundations of Assistance to Threatened Languages Vol. 76 (Multilingual Matters, 1991).
Lewis, M. P. & Simons, G. F. Assessing endangerment: expanding Fishman’s GIDS. Rev. Roum. Linguist. 55 , 103–120 (2010).
Hinton, L. in The Green Book of Language Revitalization in Practice (eds Hinton, L. & Hale, K.) 413–417 (Brill, 2001).
Hobson, J. R. Re-awakening Languages: Theory and Practice in the Revitalisation of Australia’s Indigenous Languages (Sydney Univ. Press, 2010).
Di Marco, M. et al. A novel approach for global mammal extinction risk reduction. Conserv. Lett. 5 , 134–141 (2012).
Cardillo, M., Mace, G. M., Gittleman, J. L. & Purvis, A. Latent extinction risk and the future battlegrounds of mammal conservation. Proc. Natl Acad. Sci. USA 103 , 4157–4161 (2006).
Bolam, F. C. et al. How many bird and mammal extinctions has recent conservation action prevented? Conserv. Lett. 14 , e12762 (2020).
Balmford, A. Extinction filters and current resilience: the significance of past selection pressures for conservation biology. Trends Ecol. Evol. 11 , 193–196 (1996).
Brenzinger, M. Language Death: Factual and Theoretical Explorations with Special Reference to East Africa (Mouton de Gruyter, 1992).
Aikhenvald, A. Y. in Language Endangerment and Language Maintenance: An Active Approach (eds Bradley, D. & Bradley, M.) 24–33 (Taylor & Francis, 2002).
Aikhenvald, A. Y. in Lectures on Endangered Languages: 5. Endangered Languages of the Pacific Rim (eds Sakiyama, O. & Endo, F.) 97–142 (ELPR, 2004).
van Driem, G. in Language Diversity Endangered (ed. Brenzinger, M.) Ch. 14 (Mouton de Gruyter, 2007).
Muysken, P. in Historicity and Variation in Creole Studies (eds Highfield, A. & Valdman, A.) 52–78 (Karoma, 1981).
Gal, S. Language Shift: Social Determinants of Linguistic Change in Bilingual Austria (Academic Press, 1979).
Holmquist, J. Social correlates of a linguistic variable: a study in a Spanish village. Lang. Soc. 14 , 191–203 (1985).
Dobrin, L. M. in Endangered Languages: Beliefs and Ideologies in Language Documentation and Revitalization (eds Austin, P. K. & Sallabank, J.) Ch. 7 (British Academy, 2014).
Sasse, H.-J. in Language Death: Factual and Theoretical Explorations with Special Reference to East Africa (ed Brenzinger M.) 7–30 (Mouton de Gruyter, 1992).
Wang, Y. & Phillion, J. Minority language policy and practice in China: the need for multicultural education. Int. J. Multicult. Educ. 11 , 1–14 (2009).
McCarty, T. L. in Language Policies in Education: Critical Issues (ed. Tollefson, J. W.) 285–307 (2002).
Wiese, A.-M. & Garcia, E. E. The Bilingual Education Act: language minority students and equal educational opportunity. Biling. Res. J. 22 , 1–18 (1998).
Bromham, L., Hua, X., Algy, C. & Meakins, F. Language endangerment: a multidimensional analysis of risk factors. J. Lang. Evol. 5 , 75–91 (2020).
Gao, X. & Ren, W. Controversies of bilingual education in China. Int. J. Biling. Educ. Biling. 22 , 267–273 (2019).
Dimmendaal, G. J. in Investigating Obsolescence: Studies in Language Contraction and Death (ed. Dorian N. C.) 13-32 (Cambridge Univ. Press, 1989).
Brenzinger, M. in Language Diversity Endangered (ed. Brenzinger, M.) IX–XVII (Mouton de Gruyter, 2007).
Kuussaari, M. et al. Extinction debt: a challenge for biodiversity conservation. Trends Ecol. Evol. 24 , 564–571 (2009).
Tilman, D., May, R. M., Lehman, C. L. & Nowak, M. A. Habitat destruction and the extinction debt. Nature 371 , 65–66 (1994).
Meijer, J. R., Huijbregts, M. A., Schotten, K. C. & Schipper, A. M. Global patterns of current and future road infrastructure. Environ. Res. Lett. 13 , 064006 (2018).
Laurance, W. F. & Balmford, A. A global map for road building. Nature 495 , 308–309 (2013).
Newbold, T. et al. Global effects of land use on local terrestrial biodiversity. Nature 520 , 45–50 (2015).
Crawford, J. Language politics in the U.S.A.: the paradox of bilingual education. Soc. Justice 25 , 50–69 (1998).
Hallett, D., Chandler, M. J. & Lalonde, C. E. Aboriginal language knowledge and youth suicide. Cogn. Dev. 22 , 392–399 (2007).
Taff, A. et al. in The Oxford Handbook of Endangered Languages (eds Rehg, K. & Campbell, L.) 862–883 (Oxford Univ. Press, 2018).
Dinku, Y. et al. Language Use is Connected to Indicators of Wellbeing: Evidence from the National Aboriginal and Torres Strait Islander Social Survey 2014/15 . CAEPR Working Paper no. 132/2019 (CAEPR, 2020); https://doi.org/10.25911/5ddb9fd6394e8
Essegbey, J., Henderson, B. & McLaughlin, F. Language Documentation and Endangerment in Africa (John Benjamins, 2015).
Davis, J. L. Language affiliation and ethnolinguistic identity in Chickasaw language revitalization. Lang. Commun. 47 , 100–111 (2016).
Clyne, M. in Maintenance and Loss of Minority Languages (eds Fase, W. et al.) 17–36 (John Benjamins, 1992).
Cardillo, M. et al. The predictability of extinction: biological and external correlates of decline in mammals. Proc. R. Soc. B 275 , 1441–1448 (2008).
Evans, N. Dying Words: Endangered Languages and What They Have to Tell Us Vol. 22 (John Wiley & Sons, 2011).
Ndhlovu, F. in Language Planning and Policy: Ideologies, Ethnicities, and Semiotic Spaces of Power (eds Abdelhay, A. et al.) 133–151 (Cambridge Scholars, 2020).
Hammarström, H., Forkel, R. & Haspelmath, M. Glottolog 4.1. http://glottolog.org (Max Planck Institute for the Science of Human History, 2019).
Lewis, M. P., Simons, G. F. & Fennig, C. D. Ethnologue: Languages of the World 17th edn http://www.ethnologue.com (SIL International, 2013).
King, K. A., Schilling-Estes, N., Lou, J. J., Fogle, F. & Soukup, B. Sustaining Linguistic Diversity: Endangered and Minority Languages and Language Varieties (Georgetown Univ. Press, 2008).
Lee, N. H. & van Way, J. Assessing levels of endangerment in the Catalogue of Endangered Languages (ELCat) using the Language Endangerment Index (LEI). Lang. Soc. 45 , 271–292 (2016).
Language Vitality and Endangerment: International Expert Meeting on UNESCO Programme Safeguarding of Endangered Languages (UNESCO, 2003).
Tershy, B. R., Shen, K.-W., Newton, K. M., Holmes, N. D. & Croll, D. A. The importance of islands for the protection of biological and linguistic diversity. BioScience 65 , 592–597 (2015).
Igboanusi, H. Is Igbo an endangered language? Multilingua 25 , 443–452 (2006).
Ravindranath, M. & Cohn, A. C. Can a language with millions of speakers be endangered? J. Southeast Asian Linguist. Soc. 7 , 64–75 (2014).
Venter, O. et al. Sixteen years of change in the global terrestrial human footprint and implications for biodiversity conservation. Nat. Commun. 7 , 12558 (2016).
Bromham, L., Hua, X., Cardillo, M., Schneemann, H. & Greenhill, S. J. Parasites and politics: why cross-cultural studies must control for relatedness, proximity and covariation. R. Soc. Open Sci. 5 , 181100 (2018).
Bromham, L., Skeels, A., Schneemann, H., Dinnage, R. & Hua, X. There is little evidence that spicy food in hot countries is an adaptation to reducing infection risk. Nat. Hum. Behav. https://doi.org/10.1038/s41562-020-01039-8 (2021).
Purvis, A., Cardillo, M., Grenyer, R. & Collen, B. in Phylogeny and Conservatio n (eds Purvis, A. et al.) 295–316 (Cambridge Univ. Press, 2005).
Hurlbert, S. H. Pseudoreplication and the design of ecological field experiments. Ecol. Monogr. 54 , 187–211 (1984).
Dow, M. M. Network autocorrelation regression with binary and ordinal dependent variables: Galton’s problem. Cross Cult. Res. 42 , 394–419 (2008).
Wurm, M. J., Rathouz, P. J. & Hanlon, B. M. Regularized ordinal regression and the ordinalNet R package. Preprint at https://arxiv.org/abs/1706.05003 (2017).
Byrd, R. H., Lu, P., Nocedal, J. & Zhu, C. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16 , 1190–1208 (1995).
Barro, R. L. & Lee, J.-W. A new data set of educational attainment in the world, 1950–2010. J. Dev. Econ. 104 , 184–198 (2013).
Leclerc, J. L’aménagement linguistique dans le monde http://www.axl.cefan.ulaval.ca/monde/index_alphabetique.htm (2019).
Solt, F. The Standardized World Income Inequality Database, Version 8 https://doi.org/10.7910/DVN/LM4OWF (2019).
Global Agro-ecological Zones (GAEZ v3.0) (FAO, IIASA, 2010).
Author information
These authors contributed equally: Lindell Bromham, Xia Hua.
Authors and Affiliations
Macroevolution and Macroecology, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
Lindell Bromham, Russell Dinnage, Andrew Ritchie, Marcel Cardillo & Xia Hua
ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, Australian Capital Territory, Australia
Hedvig Skirgård & Simon Greenhill
Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
ARC Centre of Excellence for the Dynamics of Language, School of Languages and Cultures, University of Queensland, Brisbane, Queensland, Australia
Felicity Meakins
Mathematical Sciences Institute, Australian National University, Canberra, Australian Capital Territory, Australia
Contributions
L.B., R.D., H.S., A.R., M.C., F.M., S.G. and X.H. conceived and designed the experiments; X.H. analysed the data; L.B., R.D., H.S., A.R. and M.C. contributed materials/analysis tools; L.B. wrote the paper.
Corresponding author
Correspondence to Lindell Bromham .
Ethics declarations
Competing interests.
The authors declare no competing interests.
Additional information
Peer review information Nature Ecology and Evolution thanks Ruth Oliver, Salikoko Mufwene, Claire Bowern and Hannah Wauchope for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Residual in the best model for language endangerment level.
Residuals of the model predicting the number of endangered languages (a) and Sleeping languages (b), calculated for each hex grid as the predicted number of languages with distribution in the hex grid that are (a) predicted to have an endangerment level above 'Stable' (corresponding to EGIDS 6b–10) or (b) predicted to be no longer spoken (i.e., EGIDS 9–10), minus the number of languages with distribution in the hex grid and a reported EGIDS of 6b–10 (a) or 9–10 (b). The predicted number of languages in each category is calculated by using the best model to estimate the probability distribution of endangerment level for each language with distribution in the hex grid, sampling an endangerment level for each language from that distribution, repeating the sampling 1,000 times, and averaging the number of languages sampled as endangered or Sleeping across the 1,000 repetitions. A negative value (blue) indicates that the model estimates fewer endangered or Sleeping languages than reported by the EGIDS scores from Ethnologue (e17/e16). A positive value (red) indicates that the model estimates a greater number of endangered or Sleeping languages than observed. In some cases, this could indicate higher 'latent risk' for languages that have many of the predictors of high endangerment but are currently rated as stable or at a lower level of endangerment. Dark grey areas do not have data for all the independent variables in the best model for language endangerment level. Language distribution data from WLMS 16, worldgeodatasets.com.
Extended Data Fig. 2 Current and future predicted distribution of endangered languages.
The current patterns of language endangerment are plotted as the absolute number of languages with a reported EGIDS score of 6b–10 with distribution in each hex grid. (a) The number of languages with observed EGIDS from 6b to 10 at present. (b) The predicted number of languages with EGIDS from 6b to 10 in 40 years minus the predicted number of languages with EGIDS from 6b to 10 at present. (c) The predicted number of languages with EGIDS from 6b to 10 in 80 years minus the predicted number of languages with EGIDS from 6b to 10 in 40 years. The predicted number of languages is calculated in the same way as in Supplementary Fig. 7. Dark grey areas have no data for independent variables in the best model for language endangerment level. Language distribution data from WLMS 16, worldgeodatasets.com.
Extended Data Fig. 3 Current and future predicted proportion of endangered languages.
(a) The proportion of languages with observed EGIDS from 6b to 10 at present. (b) The predicted proportion of languages with EGIDS from 6b to 10 in 40 years minus the predicted proportion of languages with EGIDS from 6b to 10 at present. (c) The predicted proportion of languages with EGIDS from 6b to 10 in 80 years minus the predicted proportion of languages with EGIDS from 6b to 10 in 40 years. The predicted proportion of languages is calculated as the predicted number of languages divided by the total number of languages with distribution in each hex grid, where the predicted number of languages is calculated in the same way as in Fig. 7. Dark grey areas have no data for independent variables in the best model for language endangerment level. Language distribution data from WLMS 16, worldgeodatasets.com.
Extended Data Fig. 4 Current and future predicted number of languages no longer spoken.
(a) The number of languages with observed EGIDS from 9 to 10 at present. (b) The predicted number of languages with EGIDS from 9 to 10 in 40 years minus the predicted number of languages with EGIDS from 9 to 10 at present. (c) The predicted number of languages with EGIDS from 9 to 10 in 80 years minus the predicted number of languages with EGIDS from 9 to 10 in 40 years. The predicted number of languages is calculated in the same way as in Fig. 7. Dark grey areas have no data for independent variables in the best model for language endangerment level. Language distribution data from WLMS 16, worldgeodatasets.com.
Extended Data Fig. 5 Current and future predicted proportion of languages no longer spoken.
The proportion of Sleeping languages with distribution in each hex grid. (a) The proportion of languages with observed EGIDS from 9 to 10 at present. (b) The predicted proportion of languages with EGIDS from 9 to 10 in 40 years minus the predicted proportion of languages with EGIDS from 9 to 10 at present. (c) The predicted proportion of languages with EGIDS from 9 to 10 in 80 years minus the predicted proportion of languages with EGIDS from 9 to 10 in 40 years. The predicted proportion of languages is calculated as the predicted number of languages divided by the total number of languages with distribution in each hex grid, where the predicted number of languages is calculated in the same way as in Fig. 7. Dark grey areas have no data for independent variables in the best model for language endangerment level. Language distribution data from WLMS 16, worldgeodatasets.com.
Supplementary information
Supplementary Information.
Peer Review Information
Supplementary data.
Supplementary Data 1–4.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .
About this article
Cite this article.
Bromham, L., Dinnage, R., Skirgård, H. et al. Global predictors of language endangerment and the future of linguistic diversity. Nat Ecol Evol 6 , 163–173 (2022). https://doi.org/10.1038/s41559-021-01604-y
Received : 06 May 2021
Accepted : 27 October 2021
Published : 16 December 2021
Issue Date : February 2022
DOI : https://doi.org/10.1038/s41559-021-01604-y
ORIGINAL RESEARCH article
Spoken language development and the challenge of skill integration.
- 1 Laboratory for Oral Language Acquisition, Linguistics Department, University of Potsdam, Potsdam, Germany
- 2 Haskins Laboratories, New Haven, CT, United States
- 3 Department of Linguistics, University of Oslo, Oslo, Norway
- 4 Department of Education, Jyväskylä University, Jyväskylä, Finland
The development of phonological awareness, the knowledge of the structural combinatoriality of a language, has been widely investigated in relation to reading (dis)ability across languages. However, the extent to which knowledge of phonemic units may interact with spoken language organization in (transparent) alphabetic languages has hardly been investigated. The present study examined whether phonemic awareness correlates with coarticulation degree, commonly used as a metric for estimating the size of children's production units. A speech production task was designed to test for developmental differences in intra-syllabic coarticulation degree in 41 German children from 4 to 7 years of age. The technique of ultrasound imaging allowed for comparing the articulatory foundations of children's coarticulatory patterns. Four behavioral tasks assessing various levels of phonological awareness from large to small units and expressive vocabulary were also administered. Generalized additive modeling revealed strong interactions of children's vocabulary and phonological awareness with coarticulatory patterns. Greater knowledge of sub-lexical units was associated with lower intra-syllabic coarticulation degree and greater differentiation of articulatory gestures for individual segments. This interaction was mostly nonlinear: an increase in children's phonological proficiency was not systematically associated with an equivalent change in coarticulation degree. Similar findings were obtained for vocabulary and coarticulatory patterns. Overall, results suggest that the process of developing spoken language fluency involves dynamical interactions between cognitive and speech motor domains. Arguments for an integrated-interactive approach to skill development are discussed.
Introduction
In the first decade of life, most children learn to speak their native language effortlessly, without explicit instruction but with daily exposure and experiencing of their native language as a speech motor activity. With the gradual expansion of children’s expressive repertoire comes the fine tuning of phonological knowledge (e.g., Ferguson and Farwell, 1975 ; Menn and Butterworth, 1983 ; Beckman and Edwards, 2000 ; Munson et al., 2012 ). While relationships between lexical and phonological developments have been well documented over the last decades ( Storkel and Morrisette, 2002 ; Edwards et al., 2004 , 2011 ; Stoel-Gammon, 2011 ; Vihman, 2017 ), research addressing their interaction with spoken language production has often been restricted to production accuracy or duration measures as metrics for assessing spoken language proficiency (e.g., Edwards et al., 2004 ; Munson et al., 2005 ). Likewise, speech motor control studies have provided in-depth analyses of developmental changes in articulatory variability, or movement velocity during word or sentence production ( Smith and Goffman, 1998 ; Smith and Zelaznik, 2004 ; Green et al., 2010 ) without equivalently thorough assessments of children’s phonological or lexical knowledge allowing developmental interactions to be evaluated. Despite a certain imbalance in the focus and analytical approaches of interaction studies, the findings suggest that spoken language proficiency entails dynamical interactions among a set of language-related domains including speech motor skill.
In the present research, we adopted an integrated approach to the study of spoken language development considering parallel developments of the lexical, phonological, and speech motor systems. The study more specifically investigated interactions between domains that have not yet been empirically connected: in particular phonological awareness , the awareness of the particulate nature of the language (e.g., Fowler, 1991 ; Studdert-Kennedy, 1998 , 2005 ) that develops with literacy (reviews in Anthony and Francis, 2005 ; Brady et al., 2011 ; Goswami and Bryant, 2016 ; in German: Fricke et al., 2016 ) and anticipatory coarticulation , a mechanism that is deeply rooted in kinematics (e.g., Parush et al., 1983 ) and motor planning (e.g., Whalen, 1990 ; Levelt and Wheeldon, 1994 ; Grimme et al., 2011 ; Perrier, 2012 ; Davis and Redford, 2019 ) and is fundamental to speech fluency.
While phonological awareness and coarticulatory mechanisms may in principle belong to different realms, we argue that they are developmentally strongly interconnected: phonological awareness relates to the ability to consciously extract functional units of phonological organization from the continuous speech flow (e.g., syllables, segments) and combine those discrete units into new sequences of variable size and meaning (e.g., Metsala, 2011 ). Coarticulation embodies speakers’ structural knowledge of the language, combining and (re)modeling its elementary particles into continuous articulatory movements and acoustic streams, hence contextualizing abstract representations into a decipherable “speech code” ( Liberman et al., 1974 ; Fowler et al., 2016 ). In this perspective, investigating developmental changes in children’s coarticulatory processes may give us an opportunity to track how a combinatorial principle is situated within the representational and production levels and to capture more broadly how motor and cognitive functions come together to develop the skill of spoken language.
While children's speech organization very early reflects their ability to combine phonetic units, the explicit awareness of the combinatorial nature of their native language, in which larger compounds are formed from smaller-sized units, follows a more protracted development and seems to climax around the time children acquire literacy (e.g., Gillon, 2007). During that period, a gain in phonological awareness allows children to convert the already acquired phonetic units (i.e., sounds they hear and produce by means of distinct speech gestures) into phonological units. However, whether the acquisition of phonological knowledge only relates to attaining literacy or also modifies children's spoken language organization in fundamental ways remains an empirical question. The alternative direction, in which a gain in spoken language practice would stimulate the development of phonological awareness and literacy, has also not yet been demonstrated. The present study provides a first step toward addressing this issue by testing whether lexical and phonological skills interact with speech motor control in development. More specifically, we examined whether children with greater knowledge of the segmental makeup of words in their native language exhibited a more segmentally specified organization of their speech gestures, as reflected in their coarticulatory patterns. We focused on the period encompassing kindergarten to the end of the first primary school year, which is relevant for phonological development as well as for attaining literacy. Our motivations, drawn from empirical research, are further outlined below.
What Are Children's Units of Spoken Language Organization?
In the last decades, a growing number of developmental studies in the area of spoken language ability have focused on coarticulation degree, which characterizes the extent to which the articulatory gestures for neighboring phonemes overlap temporally (e.g., Browman and Goldstein, 1992). Looking specifically at lingual coarticulation, which concerns the gestural organization of the tongue, some research has found a developmental decrease in vocalic anticipatory coarticulation over previous segments, within the syllables (e.g., Nittrouer et al., 1996; Zharkova et al., 2011; Noiray et al., 2018) and beyond the syllabic span (e.g., Nijland et al., 2002; Rubertus and Noiray, 2018). On the basis of these results, Noiray et al. (2019) reasoned that spoken language fluency may entail a gradual narrowing of speech units toward smaller-sized units. In young children, vowels may represent building blocks, which children organize their speech around because of their perceptual salience, long duration, and earlier acquisition compared to consonants (e.g., Polka and Werker, 1994; review Nazzi and Cutler, 2019). Hence, children's vocalic and consonantal gestures may be activated more simultaneously than in adults, resulting in an overall larger vocalic influence on previous consonants and a greater degree of vocalic coarticulation than for adults. Instead, adults have been found to organize their speech with more temporally individuated gestures (Abakarova et al., 2018; Rubertus and Noiray, 2018). This finding of rather large units of speech organization echoes multiple findings of whole-word learning (Vihman and Velleman, 1989; Keren-Portnoy et al., 2009; Menn and Vihman, 2011), transitional probability across syllables (e.g., Jusczyk et al., 1993; Saffran et al., 1996), and lexically grounded phonological development and production accuracy (Edwards et al., 2004; Velleman and Vihman, 2007; Vihman and Keren-Portnoy, 2013). The opposite finding of a lesser degree of coarticulation between consonant and vowel gestures in children compared to adults has also been reported (e.g., Katz et al., 1991), favoring a more segmental perspective on early spoken units.
Based on our own in-depth examinations of coarticulatory mechanisms in both adults (Abakarova et al., 2018) and children (Noiray et al., 2018; Rubertus and Noiray, 2018), we have argued that (young) speakers exhibit gradients of coarticulation degree within a continuum from a more syllabic to a more segmental organization. The degree to which segments overlap depends on the gestural demands associated with the combined segments. In adults, contextual differences in coarticulation degree are well attested (e.g., Recasens, 1985; Fowler, 1994). For instance, syllables recruiting a single organ for the consecutive production of both consonantal and vowel targets (e.g., the tongue in /du/) require from speakers a functional differentiation between the subparts of the tongue (tongue tip, tongue dorsum). This type of syllable further requires greater spatiotemporal coordination in comparison to syllables recruiting two separate primary organs (e.g., the lips and tongue dorsum in /bi/). This phenomenon, described within the theory of coarticulatory resistance, has been reported in adults across languages over the past decades (review in Recasens, 2018). In children, extensive kinematic investigations of coarticulatory processes have been more challenging and hence somewhat restricted in scope compared to adults (e.g., limited variety of stimuli that can be tested at a young age, age range, sample size, scarcity of methodological replications across studies). Yet, once these studies are examined together, they support the view of coarticulatory gradients as observed in adults. While children show overall greater coarticulation degree than adults, they also exhibit contextual effects on coarticulation degree, which result from the particular combination of gestural goals between individual consonants and vowels. Based on those observations, we recently suggested a gestural approach as a "unifying organizational scheme to relate adults' to children's patterns. How coarticulatory organization matures over time is then no longer solely a question of direction (toward a greater or lesser coarticulatory degree) or categorical change in phonological organization (e.g., into segments or syllables) but a question of how a primitive gestural scheme shares similar tools (the articulators of speech), constraints, and principles (dynamic interarticulator coordination over time) with adults to instantiate complex phonetic combinations in line with the native language's phonological grammar" (Noiray et al., 2019, p. 3037). In this context, the question of (early) units of speech production may be viewed as a part-whole interaction.
The Development of the Lexical, Phonological, and Motor Domains
While the maturation of the speech motor system is central to spoken language fluency, lexical and phonological developments are equally crucial (e.g., Smith et al., 2010; Edwards et al., 2011), and research has suggested that they interact dynamically over time (e.g., Beckman et al., 2007; Sosa and Stoel-Gammon, 2012; Vihman, 2017). A main hypothesis motivating the present study is that adults' coarticulatory patterns do not differ from those of children solely because adults have more precise control over the speech production system. Adults also have (1) built an expressive lexicon from which to harness their phonological representations, (2) gained an explicit understanding of the structure of their language, and (3) acquired an ability to manipulate this information into a quasi-infinite set of intelligible spoken forms. Hence, considering speech motor development as a goal-directed process – for example, speaking a language fluently – what distinguishes children from adults is that children have not yet built explicit correspondences between phonetic segments and their motor realizations. The rapid growth of the expressive lexicon observed during the kindergarten-to-school years may help children solve this correspondence problem and more generally develop stable relations between representational and executional levels. Vocabulary is indeed often considered the backbone of language acquisition, supporting the development of phonological representations (e.g., Ferguson and Farwell, 1975; Metsala, 1999) and production accuracy (e.g., Edwards et al., 2004; Nicholson et al., 2015). Previous research also suggests that children first develop articulatory "routines" for the syllables present in their expressive repertoire (e.g., Menn and Butterworth, 1983; Munson et al., 2005; Ziegler and Goswami, 2005; Vihman, 2017). This lexically based process may lay the ground for increased phonetic distinctions along the dimensions of height, fronting and rounding for vowels, place and manner of articulation for consonants, and the maturation of coarticulatory flexibility for a wider range of phonetic environments.
This knowledge is at first experience-based; before entering primary school, children have limited explicit knowledge about the structural organization of their native language, that is, they have limited conscious awareness that the words they hear can be segmented into smaller-sized units (and recombined into new forms; e.g., Liberman et al., 1974; Gillon, 2007). Note that while the development of phonological awareness differs as a function of orthographic transparency (e.g., Fricke et al., 2016) or the age at which children learn how to read (e.g., reviews in Wimmer et al., 2000; Mann and Wimmer, 2002; Schaeffer et al., 2014; Goswami and Bryant, 2016), on average children in kindergarten show proficiency in syllable-level awareness that is roughly equivalent to that of school-aged children (in English: e.g., Liberman et al., 1974; in German: Ziegler and Goswami, 2005; Schaeffer et al., 2014), but no advanced phonemic awareness before explicitly learning how to read. Taken together, these findings suggest that young listener-speakers progressively access smaller units, allowing them to decipher a wider range of speech forms and manipulate those flexible units to craft increasingly complex speech flows. Figure 1 provides an illustrative conceptualization of these seemingly parallel developmental trajectories, from more holistic access and production of large units (e.g., lexemes) to more segmentally specified representations and coarticulatory organizations. Developmental overlaps (e.g., from lexeme access to rhyme access) and short-term regressions between learning phases may at times occur (e.g., Anthony et al., 2003), as noted in other domains (e.g., "phonological templates" during early word production: Vihman and Vihman, 2011; lip-jaw movement variability: Green et al., 2002; walking: Thelen and Smith, 1994). The developmental pace may also well change over time, as in other domains (e.g., speech motor control: Green et al., 2010). Figure 1 highlights the nonlinearity of those developmental processes over time (blue descending and ascending curves). With an advanced knowledge of their native language and a mature control of their speech motor system, adults naturally exhibit more flexible, context-specific organizations with greater or lesser coarticulation degree depending on the gestural properties of the individual segments assembled with one another.
Figure 1 . Theoretical conceptualization of the parallel development of phonological awareness and coarticulatory organization from holistic to more segmental organizations. The horizontal arrow ( x -axis) illustrates developmental time (age in years). The curves indicate the nonlinear change in phonological and coarticulatory organizations over time.
Overall, results from these separate literatures suggest that the development of lexical, phonological, and speech motor abilities is fundamental to the maturation of children's spoken language. However, to our knowledge, empirical studies examining their interactions with precision have been rare, and this gap has prevented a unifying account of spoken language development. The central hypothesis driving our current research is that the transition from the rather self-paced development of large-unit phonological awareness to the more explicit knowledge of the phonemic constituents of the language initiated in primary school should correlate with a significant change in spoken language production, from an experience-based holistic organization to a structurally informed, segmentally specified one. Because quantitative longitudinal investigations over a 2- to 3-year span are extremely difficult to conduct, we first opted for a cross-sectional examination of a sample of 41 children in the last 2 years of kindergarten (at 4.5 and 5.5 years of age) and the end of the first grade (at age 7). The latter cohort was chosen to ensure children had been exposed to explicit literacy instruction for a year. With this approach, we first tested for significant interactions between children's motor, lexical, and phonological skills. Potential implications for causal relations are laid out in the discussion.
Based on our previous research, we expected differences in intra-syllabic coarticulation degree between children and adults but not necessarily between all child cohorts (Noiray et al., 2019). We also anticipated consonantal effects on children's lingual coarticulatory patterns within each age cohort, as found in a preceding study investigating children's intra-syllabic coarticulation from the age of 3 (Noiray et al., 2018). More specifically, we expected a lower degree of lingual coproduction for consonant-vowel syllables requiring two constriction goals by spatially distinct articulatory organs than for those requiring two constriction goals by a single organ, as found in adults (e.g., Iskarous et al., 2013; Abakarova et al., 2018), albeit to a lesser extent than adults. Importantly, expanding on previous research, we predicted greater phonological awareness and vocabulary would coincide with lower coarticulation degree, i.e., greater segmental differentiation of consonants and vowels in syllables. We further suspected interactions between motor and cognitive domains to be nonlinear and to reflect the complex dynamics in place during the development of spoken language fluency. If this were found, it would suggest that the skill of spoken language fluency is not solely tied to production-related considerations but may instead result from and be an integral part of multiple interactions, which are fundamental to the development of each individual skill. If no correlation were found, it would on the contrary indicate that representational and production levels may not be tightly coupled, in the sense that greater awareness of phonological discreteness does not interact with coarticulatory degree.
Materials and Methods
Participants.
Forty-one monolingual German children all living in the Potsdam region (Brandenburg) were tested: ten 4-year-olds (6 females, mean age: 4;06, called K1 in subsequent analyses), thirteen 5-year-old children (7 females, mean: 5;06, called K2 hereafter) in kindergarten, and eighteen 7-year-old children at the very end of the first or very beginning of the second grade in primary school (12 females, mean: 7;02, called P1 hereafter). The discrepancy in sample size was due to greater difficulty in recruiting children in kindergarten. All children were raised in monolingual German families without any known history of hearing, language, or cognitive impairment. They were recruited via the child registry of the BabyLab of the University of Potsdam. Ethics approval was obtained from the Ethics Committee of the University of Potsdam prior to the study. All parents were fully informed of the study and gave written consent for their child to participate.
Production Task
The speech production task consisted of the repetition of trochaic pseudowords conforming to German phonotactics, of the form consonant1-vowel-consonant2-schwa (C1VC2ə). Target phrases used as stimuli were pre-recorded by a native German female adult speaker. Three consonants varying in place of articulation (/b/, /d/, /g/) and six tense, long vowels (/i/, /y/, /u/, /a/, /e/, /o/) were used. Pseudowords were chosen instead of real words to combine consonants and vowels varying in lingual gestures and coarticulatory resistance. Target pseudowords were embedded in a carrier phrase with the article /aɪnə/, resulting in utterances such as /aɪnə ba:də/. Utterances were repeated six times in semi-randomized blocks. To measure lingual coarticulation, we employed ultrasound imaging (Sonosite Edge, frame rate: 48 Hz), which permits recording the movement of participants' tongue over time while they produce various speech materials (Noiray et al., 2013). In this study, tongue imaging was integrated in a space journey narrative to stimulate children's motivation to complete the task. Children were seated in a pilot seat with seatbelts, facing the operating console of a space rocket replica. The ultrasound probe on which children positioned their chin was integrated into a customized probe-holder as part of the rocket console (for a full description of the method, see Noiray et al., 2018). The acoustic speech signal was recorded synchronously with the ultrasound tongue video via a microphone (Shure, sampling rate: 48 kHz).
Assessment of Phonological Awareness and Vocabulary
Assessments of various levels of phonological awareness (rhyme, onset segment, and individual phonemes) were conducted with the Test für Phonologische Bewusstheitsfähigkeiten (TPB; Fricke and Schäfer, 2008). Prior to testing, children were familiarized with all images used as test items. The procedure for each of the TPB tests is briefly summarized below; a complete description can be found in Fricke and Schäfer (2008). The tests were scored according to the test instructions, and raw scores were considered for subsequent analyses.
Rhyme Production
Children are shown a picture and are instructed to produce (non)words that rhyme with the word corresponding to the target picture (e.g., Puppe: Muppe, Kuppe, Wuppe). Children are instructed to provide as many rhymes as they can. However, to make the task comparable for every child, we scored children's proficiency differently from the test instructions: for each of the 12 target words, children scored 1 point if they succeeded in giving at least one correct rhyme; if not, they scored zero. This way, we could assess the stability and generalization of the rhyming skill rather than relying on the raw number of rhymes produced (e.g., a child who produced six rhymes for each of two target words but then failed on all other target words).
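As an illustration of this scoring rule (not code from the study), a child's rhyme score can be computed from a hypothetical vector giving the number of correct rhymes produced for each of the 12 target words:

```r
# 1 point per target word with at least one correct rhyme, 0 otherwise
score_rhyme <- function(n_correct_per_word) sum(n_correct_per_word > 0)

# e.g., many rhymes for two targets but none for the others still scores 2, not 12
score_rhyme(c(6, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
```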
Onset Segment Deletion
Children are shown a picture and are instructed to delete the onset segment from the word represented by the picture and utter the resulting nonword (e.g., Mond: ond; Zahn: ahn). Note that children were told precisely what to delete (e.g., "delete /m/ from Mond"). A total of 12 words is tested in each age cohort.
Phoneme Synthesis
Children are instructed to produce a word after hearing a pre-recorded female voice uttering its phonemes one by one (e.g., Fee: [f-e:], Dose: [d-o:-z-ə], Salat: [z-a-l-a:-t]). As for the onset segment deletion task, the TPB assessment uses a total of 12 words for each age cohort.
Expressive Vocabulary
Expressive vocabulary was tested with the Patholinguistische Diagnostik bei Sprachentwicklungsstörungen (PDSS; Siegmüller and Kauschke, 2010), which is widely used to assess German children's lexical repertoire. The test consists of a 20-word picture naming task assessing nouns for the target ages (see Table 1 for an overview). In subsequent analyses, we used a composite score for phonemic awareness (hereafter PA), which combines the two tasks tapping phoneme-size awareness: onset deletion and phoneme synthesis.
Table 1 . Summary of the results from the assessments tapping phonological awareness (Rhyme, Composite PA) and expressive vocabulary (VOC) conducted in 4-year-old (K1), 5-year-old (K2), and 7-year-old children at the end of first grade (P1).
We focused on output phonological tasks as well as expressive vocabulary because we were interested in their direct relationship with children's speech production. Given young children's limited attention span, this restricted battery also allowed us to assess children's actual proficiency with greater confidence than a long series of cognitively demanding assessments would. All assessments were conducted in our laboratories by experimenters trained by a speech language pathologist.
Statistical Analyses
Consistent with previous research, intra-syllabic coarticulation degree was estimated in terms of whether the lingual gesture for a target vowel was anticipated in the previous consonant (see review on vowels’ degrees of aggressiveness in the context of different consonants: Iskarous et al., 2010 ). We focused on the antero-posterior tongue dorsum position that is highly relevant in terms of articulatory and acoustical contrasts between vowels (e.g., Delattre, 1951 ). We calculated differences in tongue dorsum position between the production of consonants and following vowels. A tongue dorsum position for a consonant (e.g., /g/) that varies in the context of various vowels (e.g., /a/, /i/) indicates vocalic anticipation onto the previous consonant and hence a high coarticulation degree. On the contrary, low coarticulation degree is reflected by an absence of change in tongue dorsum position during the consonant in the context of various vowels (review in Iskarous et al., 2010 ).
Differences in coarticulation degree were estimated for each consonantal context from the midpoint of the consonant (C1) compared to the vowel midpoint (V). A few preliminary processing steps were necessary. First, the corresponding midsagittal tongue contours for both C1 and V were extracted from the ultrasound video based on the acoustic speech signal labeling. The tongue contours were then analyzed using SOLLAR (Noiray et al., submitted), a platform created in our laboratory for the analysis of kinematic data (Matlab environment). For each target tongue contour, a 100-point spline was automatically generated, and the x- and y-coordinates for each point were extracted. In subsequent analyses, we used the horizontal x-coordinate for the highest y-coordinate point of the tongue dorsum to reflect its variation in the anterior-posterior dimension (anterior position for /i/, posterior position for /u/; e.g., Abakarova et al., 2018). Data were normalized for each participant by setting the most anterior tongue dorsum position during the target vowel midpoints to 0 and the most posterior tongue dorsum position to 1. Tongue dorsum positions for consonant midpoints were then scaled within this range.
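A minimal sketch of this per-participant normalisation is given below. It assumes, for illustration only, that smaller x values correspond to more anterior tongue dorsum positions; vx and cx are hypothetical vectors of horizontal tongue dorsum positions at vowel and consonant midpoints for one participant.

```r
# Rescale so the most anterior vowel-midpoint position maps to 0 and the most
# posterior to 1; consonant-midpoint positions are scaled within the same range
# (and may therefore fall slightly outside 0-1).
normalise_dorsum <- function(vx, cx) {
  rng <- range(vx)
  list(vowel     = (vx - rng[1]) / diff(rng),
       consonant = (cx - rng[1]) / diff(rng))
}
```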
To test for developmental differences in coarticulation degree, we employed linear mixed effects models (LMER), using the "lme4" package in R (version 1.1–19; Bates et al., 2015). Coarticulation degree was calculated by regressing the horizontal position of the tongue dorsum at consonant midpoint (PEAKC1_X) on the horizontal position of the tongue dorsum at vowel midpoint (PEAKV_X) for each age group (K1, K2, and P1). Two interaction terms were used: Coarticulation and Consonant (C1) and Coarticulation and Age. By-subject C1 and by-word random slopes for PEAKV_X were included as random effects.
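One plausible lme4 specification of this model is sketched below; the data frame and column names (coartic, PEAKC1_X, PEAKV_X, C1, Age, Subject, Word) are assumptions based on the description above, and the random-effects structure is one reading of "by-subject C1 and by-word random slopes for PEAKV_X", not the authors' exact code.

```r
library(lme4)

# Coarticulation degree is the slope of PEAKC1_X on PEAKV_X; its interaction
# with consonant (C1) and age group (Age) tests whether that slope differs
# across consonantal contexts and cohorts.
m <- lmer(
  PEAKC1_X ~ PEAKV_X * C1 + PEAKV_X * Age +
    (0 + PEAKV_X | Subject:C1) +   # by-subject, per-consonant slopes for PEAKV_X
    (0 + PEAKV_X | Word),          # by-word slopes for PEAKV_X
  data = coartic                   # hypothetical trial-level data frame
)
summary(m)
```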
To test for an effect of phonological awareness and vocabulary on children's coarticulation degree, we then employed Generalized Additive Modeling (GAM), a statistical approach allowing us to test for linear and nonlinear relationships (Winter and Wieling, 2016; Wood, 2017; for a comprehensive tutorial, see Wieling, 2018). To date, this approach has only been used in psycholinguistic research with adults (e.g., Strycharczuk and Scobbie, 2017; Wieling et al., 2017) and only recently in the developmental domain (Noiray et al., 2019). In this study, we were interested in the effect of three variables on the degree of coarticulation: RHYME, COMPOSITE_PA (a composite computed from the sum of the scores obtained for both phonemic awareness tasks: onset segment deletion and phoneme synthesis, see section "Descriptive Statistics for Phonological Awareness and Vocabulary"), and VOC. We used the function bam of the mgcv R package (version 1.8–26) and itsadug (version 2.3). Our dependent variable was again PEAKC1_X with respect to PEAKV_X. We predicted this value on the basis of a nonlinear interaction, which is modeled by a tensor product smooth (te). A tensor product smooth can model both linear and nonlinear effects across a set of predictors and their interaction (see Wieling, 2018), here between RHYME, COMPOSITE_PA or VOC, and PEAKV_X. The resulting estimated degrees of freedom (edf) indicate whether the relation is linear (value close to 1) or nonlinear (values above 1).
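A minimal mgcv sketch of one such model (here with the vocabulary score as the predictor) is shown below; object and column names are again assumptions, and the smooth specification is illustrative rather than a reproduction of the authors' exact model.

```r
library(mgcv)

# The tensor product smooth te() models a potentially nonlinear interaction
# between the vowel-midpoint tongue dorsum position (PEAKV_X) and the
# vocabulary score (VOC), with a separate smooth surface per consonantal
# context (by = C1). C1 and Subject are assumed to be factors.
g <- bam(
  PEAKC1_X ~ C1 + te(PEAKV_X, VOC, by = C1) +
    s(Subject, bs = "re"),   # random intercept per participant
  data = coartic
)
summary(g)  # edf near 1 suggests a roughly linear surface; larger edf, nonlinearity
```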
Testing for Developmental Differences in Coarticulation Organization
Table 2 shows the results from the LMER testing for age-related differences in coarticulation degree across all consonants and vowels. No significant difference was noted across the three target age groups. However, differences in coarticulation degree were found across consonantal contexts, with a lower coarticulation degree in alveolar /d/ context as compared to labial /b/ context (estimate: −0.11793, p < 0.05). Coarticulation degree did not differ across other consonantal contexts.
Table 2 . Results from the linear mixed effects model testing for age comparisons in coarticulation degree between the 4-year-old group (K1), 5-year-old group (K2), and 7-year-old group (P1).
Descriptive Statistics for Phonological Awareness and Vocabulary
Pearson product-moment correlations were computed to assess relationships between all developmental assessments. The rhyming task was completed by 40 of the 41 children because one P1 child did not want to take part in it. A strong positive correlation of 0.94 (p < 0.001) was found between scores for onset deletion and phoneme synthesis. In subsequent analyses testing the effect of phonological awareness on coarticulatory organization, we therefore computed a composite score as the sum of the scores obtained in the two tasks. This score was taken to reflect children's phonemic awareness (COMPOSITE_PA), that is, their awareness of phonemic units as opposed to larger phonological units (rhymes).
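For illustration, the correlation and composite score described above could be computed as follows, assuming a hypothetical data frame pa with one row per child and columns onset_deletion and phoneme_synthesis:

```r
# Pearson correlation between the two phoneme-level tasks (reported r = 0.94)
cor.test(pa$onset_deletion, pa$phoneme_synthesis, method = "pearson")

# Composite phonemic-awareness score: sum of the two phoneme-level task scores
pa$COMPOSITE_PA <- pa$onset_deletion + pa$phoneme_synthesis
```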
Figure 2 provides an overview of the score distribution for each of the four developmental assessments conducted across child cohorts. Dot plots were used to highlight variations in the number of children obtaining a target score. Table 1 provides a summary of the descriptive statistics reflecting children's phonological awareness and expressive vocabulary. Mean score and range reflect the number of correct items (raw scores). While mean scores increased with age for all language-related skills, results revealed (1) stark individual differences within the same age group and (2) overlapping scores across age groups for rhyme and expressive vocabulary. For the phonological tasks targeting the awareness of phonemic units (onset segments and individual phonemes), children in kindergarten had overall great difficulty completing the tasks (despite being familiarized with pre-test items), while children in the first grade could complete the tasks with various levels of proficiency.
Figure 2. Score distribution for each of the four developmental assessments conducted across age groups (K1, K2, and P1). From left to right: rhyme production, onset deletion, phoneme synthesis, and vocabulary. The filled colored circles of different sizes represent different numbers of participants sharing the same score.
Welch's t tests were conducted to test for developmental differences in phonological awareness and vocabulary. Performance on rhyme production, with the scoring procedure we employed, did not yield any significant differences among age groups (K1–K2: t = −0.58, df = 17.47, p < 0.6; K1–P1: t = −0.58238, df = 17.47, p < 0.6; K2–P1: t = −1.9085, df = 12.524, p < 0.08). With regard to the composite score computed to target the awareness of phonemic units, 5-year-old children (K2) did not differ in performance from 4-year-olds (K1) (t = −1, df = 12, p < 0.4). Only 7-year-old children (P1) showed greater proficiency than K2 (t = −15.572, df = 21.128, p < 0.0001) and K1 (t = −30.006, df = 14, p < 0.0001). Hence, a developmental increase in awareness of segmental units was found between children in kindergarten altogether and those in the first year of primary school, which yielded an overall high correlation of 0.9 between age and the PA composite (p < 0.0001). Regarding vocabulary, similar trends were found. K1 children did not exhibit significantly lower proficiency than K2 children (t = −0.95914, df = 19.728, p < 0.4); they did, however, score lower than P1 children (t = −7.0665, df = 16.375, p < 0.0001). K2 children also had lower vocabulary scores than P1 children (t = −4.0338, df = 16.257, p < 0.001). However, unlike for phonemic awareness, the correlation between age and vocabulary was not significant (0.12, p < 0.3).
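One of these comparisons could be run as follows, again assuming the hypothetical data frame pa with a cohort column group; t.test() applies the Welch correction for unequal variances by default.

```r
# Welch two-sample t test comparing composite PA between the K2 and P1 cohorts
t.test(COMPOSITE_PA ~ group,
       data = droplevels(subset(pa, group %in% c("K2", "P1"))))
```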
Interaction Between Phonological Awareness and Coarticulation Degree
Given the results from the developmental assessments, we adopted the following statistical approach: we first tested the interaction between rhyme proficiency as an index of intermediate unit-size awareness and coarticulation degree for all children. We then further tested for a separate interaction between phonemic awareness (COMPOSITE_PA, named PA for short hereafter) or vocabulary (VOC) and coarticulation degree. We conducted GAM analyses to illuminate potentially nonlinear interactions.
First and foremost, an interaction between rhyme awareness and coarticulation degree was found across all three consonantal contexts (p < 0.0001). More specifically, greater rhyming skills were associated with lower coarticulation degree. Furthermore, the estimated degrees of freedom (edf) were all above 1, which indicates that the relation between rhyme proficiency and children's coarticulation degree was nonlinear. Nonlinear interactions between rhyme and coarticulation degree were found in each consonantal context (Table 3). The nonlinearity was highest in the alveolar context (edf: 10.778), followed by the velar and labial contexts. This means that the pattern of interaction between rhyme and coarticulation degree was specific to the gestural organization of the consonant-vowel combinations.
Table 3 . Tensor smooth terms of the generalized additive model testing for an interaction between rhyme and coarticulation degree for all children per consonantal context /b/, /d/, /g/. edf: estimated degrees of freedom.
Table 4 presents an overview of the GAM model testing for an interaction between phonemic awareness (PA) and coarticulation degree. A negative correlation was found, that is, greater phonemic proficiency coincided with lower coarticulation degree. This interaction differed significantly across consonant contexts ( p < 0.0001). The nonlinearity of the interaction was again the most prominent in the alveolar context and lowest in the labial context. Figure 3 presents three-dimensional visualizations of the nonlinear interaction patterns obtained for each consonantal context, called terrain maps. These visualizations (also called contour plots) provide further insights into the direction of the observed interaction between PA and coarticulation degree. More specifically, they depict differences in the tongue dorsum position during the production of each stop consonant (/b, d, g/ from left to right plot) with respect to the tongue dorsum position during the production of the subsequent target vowel ( y -axis) as a function of children’s PA score ( x -axis). In the plot, changes are expressed by means of a color scaling. The color scheme in the small upper right rectangle provides a referential color coding for various tongue dorsum positions scaled from 0 to 1. While blue shades characterize more anterior tongue dorsum positions (as expected for anterior vowels such as /i/), orange shades correspond to more posterior tongue positions (e.g., for /u/). The full-size plots themselves display the tongue position during the consonant as a function of its subsequent vowel position ( y -axis) and PA scores obtained (value on the x -axis). If the tongue dorsum position of the consonant is highly influenced by the upcoming vowel (i.e., if coarticulation degree is high), the color distribution within the plots is expected to resemble the referential color scaling provided for the vowel tongue dorsum positions (i.e., yellow color for more posterior and blue color for more anterior tongue dorsum positions). The red contour lines are used similarly to isolines in topographic maps (e.g. for hiking) to indicate locations sharing the same (predicted, based on all trials) value. Here, the values are not altitude landmarks, but tongue dorsum positions. Hence, red contour lines characterize locations of identical consonant tongue dorsum positions across a set of PA scores (from 0 to 24) as a function of their vocalic environment. The direction and shape of the contour line provide information whether changes in tongue dorsum position are linear (straight line) or not (curved line).
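Contour surfaces of this kind can be produced directly from a fitted GAM. The sketch below uses mgcv's vis.gam as an illustration only (it is not the plotting code behind Figure 3), assuming a hypothetical model g_pa fitted with te(PEAKV_X, COMPOSITE_PA, by = C1) and fixing the consonantal context via cond.

```r
library(mgcv)

# Predicted consonant-midpoint tongue dorsum position as a function of the
# phonemic awareness score and the vowel-midpoint position, for the /d/ context
vis.gam(g_pa,
        view = c("COMPOSITE_PA", "PEAKV_X"),
        cond = list(C1 = "d"),       # hold the consonantal context fixed
        plot.type = "contour",
        color = "terrain",
        main = "Predicted tongue dorsum position, /d/ context")
```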
Table 4. Tensor smooth terms of the generalized additive model testing for an interaction between phonemic awareness (composite_PA) and coarticulation degree for all children, per consonantal context (/b/, /d/, /g/).
Figure 3. Terrain maps illustrating changes in the tongue dorsum gesture across three consonantal contexts (/b/: left column, /d/: middle column, /g/: right column) as a function of tongue dorsum position for target vowels (y-axis) and composite phonological awareness scores from 0 (the minimal score obtained) to the maximal score of 25 (x-axis).
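To make the link between the fitted interaction surface and a terrain map of this kind explicit, the sketch below shows one way such a plot can be drawn: model predictions of the consonant tongue dorsum position are evaluated over a grid of PA score (x-axis) by vowel tongue dorsum position (y-axis) values and rendered as a continuous color scale with red isolines. It reuses the hypothetical gam object from the previous sketch and illustrates the plotting logic only; it is not the authors' plotting code.

```python
# Sketch of a "terrain map" in the style of Figure 3, assuming the fitted
# `gam` object from the previous sketch. Colors encode the predicted consonant
# tongue-dorsum position over a grid of PA score (x) by vowel tongue-dorsum
# position (y); red isolines connect grid points with equal predictions.
import numpy as np
import matplotlib.pyplot as plt

scores = np.linspace(0, 24, 100)        # hypothetical PA score range
vowel_td = np.linspace(0.0, 1.0, 100)   # normalized vowel tongue-dorsum position
SS, VV = np.meshgrid(scores, vowel_td)
grid = np.column_stack([SS.ravel(), VV.ravel()])

pred = gam.predict(grid).reshape(SS.shape)   # predicted consonant tongue-dorsum position

fig, ax = plt.subplots(figsize=(4, 4))
filled = ax.contourf(SS, VV, pred, levels=20, cmap="viridis")   # continuous color scale
iso = ax.contour(SS, VV, pred, levels=10, colors="red")         # red isolines
ax.clabel(iso, fmt="%.1f")                                      # label isoline values
ax.set_xlabel("PA score")
ax.set_ylabel("Vowel tongue-dorsum position")
fig.colorbar(filled, label="Predicted consonant tongue-dorsum position")
plt.show()
```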
Let us now take a concrete example. In the labial context /b/, we can see that for a target vocalic tongue dorsum position of 0.3 (value on the y-axis), the corresponding position at the consonant midpoint is about 0.4 (value on the red contour line) for children who have obtained a PA score close to 0. From a score of 10 upward, the tongue dorsum position during the consonant becomes slightly more posterior (i.e., above the 0.4 red contour line, hence further away from the target 0.3 value for its subsequent vowel).
Moving on to the alveolar context, it can be noted that the position of the tongue dorsum during the alveolar /d/ stop remains overall in a central (green shade) to anterior position (blue shade) regardless of the upcoming vowel. This shows that the tongue dorsum position during the alveolar stop resists vocalic influences due to more immediate gestural constraints requiring a more anterior to central tongue dorsum position. However, scores from 10 (about half the maximal score) onward are associated with a change toward a more central tongue dorsum position as compared to children with poorer PA scores. In the labial and velar contexts, the color scaling characterizes more faithfully the range of vocalic targets in the antero-posterior dimension: from blue for anterior vowels to orange for more posterior vowels. This is very clear for children with a poor PA score: the tongue dorsum position for all vowels is well anticipated in the consonant. The color patterning differs in children with higher PA scores, reflecting a more central tongue dorsum position (larger green portion) and hence a lower coarticulation degree. Furthermore, in the velar context, the contour lines are flatter with central vowels (e.g., on the y-axis: 0.5–0.6 values) and more non-linear in the context of posterior vowels (0.8 and above). In the labial context, the interaction between phonemic awareness and coarticulation degree is slightly nonlinear (edf value: 3). In Figure 3, the red contour lines look overall flat, except with anterior vowels (e.g., values of 0.3 and below). Overall, Figure 3 shows that the interaction of PA and coarticulation degree: (1) approximates linearity in the labial and velar contexts, in contrast to the alveolar context, and (2) varies as a function of the specific combination of individual consonants and vowels. The implications of these nonlinear relationships between phonological and motor domains are discussed in section "Nonlinear Interactions Between Vocabulary, Phonological Awareness, and Coarticulatory Organization."
These visual outputs differ markedly from standard numerical reports. They are quite valuable for speech production research in general, and even more so for the developmental field (e.g., Figure 3), because the continuous color scaling used in these plots can reveal gradients in target effects or interactions between parameters and hence help identify nonlinear patterning. In the case of spoken language acquisition, these plots allow researchers to depart from categorizing children's articulations in terms of abstract phonological targets (which children are still in the process of acquiring) and instead to obtain more faithful descriptions of the variety of articulatory expressions for a given target. This type of description is particularly relevant in the developmental field because, like adults – and even to a greater extent than adults – children do not produce words or segments uniformly across repetitions. Acoustic and articulatory variability are indeed ubiquitous in child speech (e.g., Heisler et al., 2010). The color scaling in the GAM contour plots hence provides a fair depiction of the variations in tongue dorsum positions within regions associated with a specific target (e.g., individual vowels) or in interaction with a phonetic environment (e.g., a specific vowel in the context of a specific consonant).
Interaction Between Expressive Vocabulary and Coarticulation Degree
Last, we tested for an interaction between children's expressive vocabulary and their pattern of coarticulation degree. A significant effect was found in all three consonantal contexts (Table 5, p < 0.0001). Overall, nonlinear patterns of interaction between domains were noted; however, those were not uniform across consonant and vowel combinations (Figure 4). In the labial context, an increase in vocabulary score coincides with lower coarticulation degree. For example, for anterior vowels with a tongue dorsum position value of 0.2 (y-axis), the corresponding tongue dorsum position during the labial stop production has a value of 0.3 in children with low vocabulary, whereas it is close to 0.4 in children with advanced vocabulary. Similar trends are observed in syllables including an alveolar onset, but the interaction between vocabulary and coarticulation degree is this time more nonlinear (more pronounced curved lines) and more complex than in the labial context. For children with more proficient vocabulary (e.g., scores of 16 and upward), the tongue dorsum position is slightly more central in the case of anterior vowels (e.g., 0.2). Consonantal tongue positions in the context of central vowels (e.g., 0.6) are characterized by a slightly oscillatory behavior, from more to less to more central. Tongue position for the alveolar stop flanked by posterior vowels (e.g., 0.8) also shows a nonlinear pattern with an overall central tongue dorsum position. Finally, in the velar context, the relation between vocabulary and coarticulation degree also translates into slightly more central tongue dorsum positions in children with higher vocabulary scores. To summarize, greater expressive vocabulary is associated with a more central tongue dorsum during the consonant and hence lesser influence from individual vowels.
Table 5. Tensor smooth terms of the generalized additive model testing for an interaction between expressive vocabulary and coarticulation degree for all children, per consonantal context (/b/, /d/, /g/).
Figure 4. Terrain maps illustrating changes in the tongue dorsum gesture across three consonantal contexts (/b/: left column, /d/: middle column, /g/: right column) as a function of tongue dorsum position for target vowels (y-axis) and vocabulary scores from 13 (the minimal score obtained) to the maximal score of 25 (x-axis).
In this study, we asked whether children's phonological awareness and expressive vocabulary have an impact on anticipatory coarticulation. Our general motivation for this research stemmed from independent findings in speech motor control and developmental phonology suggesting an increasing access to and use of phonemic units during the kindergarten-to-primary-school period. Results drawn from a cross-sectional investigation of 41 children provide the first empirical evidence that vocabulary and phonological awareness interact dynamically with coarticulation degree during the period from kindergarten to primary school. In general, greater phonemic awareness and vocabulary were associated with greater segmental differentiation of tongue gestures in children's coarticulatory organization. We expand below on the implications of these findings for the development of spoken language fluency.
Age-Related Versus Skill-Based Descriptions of Spoken Language Development
In the past decade, a fair amount of empirical research has reported greater vocabulary and phonological awareness in school-aged children than in kindergarten children (in German: Kauschke, 2000; Wimmer and Mayringer, 2002; Schäfer et al., 2014; in English: Carroll et al., 2003; Ziegler and Goswami, 2005). However, results from the present study suggest that age-driven categorizations are not always the most suitable way to characterize skill development, or at least that they may underestimate its complexity. Several findings support this argument.
First of all, the language-related assessments conducted in this study provide only mixed validation of prior findings regarding a developmental increase in expressive vocabulary and phonological awareness. Indeed, our sample of kindergarten children was seemingly as proficient as first-grade children in expressive vocabulary, as attested by the absence of significant age differences. Likewise, they were as proficient as first-grade children in their rhyming skills, which suggests that by the age of 4.5, they had gained awareness of intermediate-size phonological components. This may be because rhyming practices are initiated early in life, via singing, counting rhymes, or rhyming games at home or in kindergarten. With respect to tasks probing phonemic units, the two youngest cohorts did not differ from each other but showed significantly lower awareness than school-aged children at age 7. Interestingly, in our study, the only 5-year-old who could actually perform the phonemic task was able to read a few words and had knowledge of some letters. Hence, success in these tasks may emerge only once children have been explicitly trained in phonemic decoding/encoding, either in primary school in the context of reading acquisition (e.g., Ziegler and Goswami, 2005; Schäfer et al., 2014) or with parents at home. We discuss this point further in section "An Integrated-Interactive Approach to Skill Development."
Second, children within the same age group did not all behave in the same way but instead exhibited substantial individual variability (Figure 2), a phenomenon also noted previously (e.g., review in Sosa and Stoel-Gammon, 2012; see also Wimmer and Mayringer, 2002; Schäfer et al., 2014). In the present study, this was the case in all three age groups and for all assessments, except for the tasks probing phonemic awareness in kindergarteners (onset segments, phoneme synthesis), for which we noted a floor effect. Regarding first-grade children, it seems that while they have gained substantial awareness of sub-lexical units in comparison to children in kindergarten, it takes longer to become fully proficient in manipulating phonemic units (cf. the score distributions, Figure 2). Regarding vocabulary, wide disparities across children of the same age are well established (e.g., CDI reports within and across languages). Similar conclusions have been drawn regarding children's coarticulatory patterns (e.g., at 4 years of age in Nittrouer and Burton, 2005; Barbier et al., 2015; at 5 years of age in Zharkova, 2017; overlap between 3–4-year-olds and 5-year-olds in Noiray et al., 2019), and here again with no systematic age-related difference in coarticulation degree across consonantal contexts.
It is not uncommon for developmental researchers to point to between-age overlaps and/or substantial within-age-group differences in various abilities. The question is then why those differences are observed. A simple answer may be that children are at different individual stages in their developmental trajectory. For instance, well-attested vocabulary spurts seem to depend on pre-existing achievements (e.g., reaching the 50-word milestone) rather than being the result of biological age progression (see review of lexical development in Nazzi and Bertoncini, 2003). Other studies have underlined stronger developmental dependencies based on proficiency rather than age (e.g., between phonological development and motor ability: Smith, 2006; Goffman, 2010; between vocabulary and production accuracy: Edwards et al., 2004; Vihman and Croft, 2007). When that is the case, age-related interpretations are problematic because they may attribute evidence (e.g., a decrease in coarticulation degree) to the wrong source or hide complex relationships between factors that are individual-specific rather than age-dependent. This is not to argue that age does not matter: the development of speech motor skill along with lexical and phonological knowledge can indeed be described within a maturational perspective, because all skills develop in the time domain. It is hence not surprising that correlations between age and phonological awareness were found in our study – albeit not with all PA tasks and not with vocabulary. However, while age-based descriptions of language acquisition may be interpreted from the perspective of biologically driven development, it may instead be the effect of experience upon the learning mechanism (i.e., the exposure to and practice speaking the language) that gives maturation its transformational power (e.g., in perception: Kuhl et al., 1992; Hay, 2018). Uncovering how experience shapes (spoken) language acquisition independent of age has been not only a thrilling but also an enduring challenge for psycholinguists, because experience unfolds over an extended time scale and results from multiple interactions in a continuously variable environment that remains difficult to replicate in laboratory settings.
To summarize, the results reported in this study provide good incentives for future research to draw skill-based comparisons of children's linguistic ability. With this approach, we will not only account for the complex developmental relationships across domains taking place in the first decade of life, but also better capture the complexity of (spoken) language acquisition arising from both experience-based and biologically driven processes than if analyses were restricted to age comparisons. This leads us to the discussion of the role of skill interactions for (spoken) language development.
Nonlinear Interactions Between Vocabulary, Phonological Awareness, and Coarticulatory Organization
As reported in previous sections, no uniformly strong differences in coarticulation degree emerged between 4-, 5-, and 7-year-old children (Table 2). However, children showing poor phonological awareness exhibited overall greater coarticulation degree than children with higher scores. This suggests that for children with poorer phonemic representations, lingual gestures for consecutive consonants and vowels may be activated together with substantial vocalic anticipation. Further, we noted no uniform relation between coarticulation and phonemic awareness across children's scores, by which each unit change in one domain would result in an equivalent (linear) unit change in the other domain of interest. In our sample of children, the relationship between domains was non-linear and therefore more complex: an increase in children's phonemic awareness score was at times not associated with any equivalent change in coarticulatory pattern until a certain stage was reached. Last, those non-linear interactions varied across phonetic contexts (cf. edf values). The shape of the skill interactions indeed differed as a function of the identity of the coarticulated consonants and vowels and the compatibility of their gestural goals (cf. colored terrain maps). For instance, in the case of a syllable involving two gestures from two anatomically distinct organs (the lips for the labial /b/ and the tongue for any vowel), vocalic influences remained high regardless of children's phonemic proficiency (rather flat isolines and all colors well represented; Figure 3). However, in the context of the alveolar /d/ stop, which involves two consecutive lingual gestures within a short temporal span (tongue dorsum for both /d/ and subsequent vowels), non-linear interactions were more noticeable. Children with advanced awareness of the smallest phonemic units (i.e., higher scores) exhibited slightly more central tongue dorsum positions than children with poorer ability (larger blue portion characterizing an anterior tongue position). This suggests a gradual functional decoupling between the anterior (tip-blade) and posterior (dorsum-back) subparts of the tongue. While the tongue remains in a rather anterior position during the alveolar stop production, the tongue dorsum seems a little more central, as if to anticipate the production of the upcoming vocalic gesture. Non-linear interactions were also visible in syllables including a velar onset. Variation in phonemic awareness coincided with variation in the palatal-to-velar constriction location as a function of the vowel (see Recasens, 2014). While lower phonemic awareness was associated with greater vocalic influences (full color scale represented, Figure 3), greater awareness correlated with more central tongue positions during the consonant articulation. This finding corroborates previous research reporting a lack of speech motor independence at an early age (e.g., Nittrouer et al., 1996) and provides additional evidence for an important interaction with phonemic awareness, which seems particularly relevant for the coarticulation of complex gestural goals involving a single organ.
Nonlinearities were also observed in the interaction between vocabulary and coarticulatory patterns. First, results indicated that children with greater expressive vocabulary showed lower intra-syllabic coarticulation degree independently of age (cf. the 0.12 correlation) and hence greater sensitivity to the gestural demands underlying various consonant-vowel combinations, while children with poorer vocabulary showed larger coarticulatory units with greater vocalic influence over preceding consonants. Given numerous findings supporting a lexically grounded development of phonological representations and its impact on production accuracy (e.g., Ferguson and Farwell, 1975; Metsala, 1999; Beckman and Edwards, 2000; Edwards et al., 2004, 2011; Munson et al., 2005; Vihman and Keren-Portnoy, 2013), our results supplement existing evidence that a rich lexical repertoire leads to greater phonological differentiation by showing that it may also support greater motor differentiation and flexibility in coarticulatory patterns, depending on the gestural demands associated with consecutive segments. In the present study, the interaction between vocabulary and coarticulation degree in the alveolar context provides a compelling example: children with more proficient vocabulary show greater differentiation between the tongue dorsum and tongue tip when coarticulating consecutive consonantal and vocalic gestures recruiting the same organ. Second, the nonlinear nature of the interaction between vocabulary and coarticulation also suggests that the coupling between domains does not develop incrementally; rather, it may be when individual children reach a certain size of expressive vocabulary that the interaction with production comes to bear on children's coarticulatory organization.
Taken together, the results support a stage-based view of skill development. Milestones and developmental stages have long been identified in various developmental domains (e.g., walking: Thelen and Smith, 1994; perception: e.g., Best, 1994; Maye et al., 2002; Werker, 2018; spoken language: e.g., Kuhl, 2011; language processing: e.g., Vilain et al., 2019) and provide researchers with referential landmarks for a better understanding of typical trajectories, as well as useful tools for the diagnosis and prediction of potential deviations from typical pathways. In the domain of spoken language development, canonical babbling stands as an undisputed milestone allowing children to move toward more complex speech production skills (e.g., production of the first meaningful words). This study points to a similar mechanism for skill interaction. In the same way that children continuously develop individual skills (e.g., spoken language, expressive vocabulary), there may be milestones and developmental stages characterizing periods during which an interaction is (more significantly) activated. The outcome of this interaction would lead children to progress toward a new developmental stage. Taking again the relation between phonemic awareness and coarticulation, an average score above 10 may characterize a developmental stage at which phonemic differentiation is maturing at both the representational and speech motor levels.
An Integrated-Interactive Approach to Skill Development
In a preceding study, we argued that the question "whether children organize their speech in segments versus syllables versus phonological words or lexical items is twofold: It requires finding the phonological units guiding children's speech production and the motor units embedding those higher-level units" (Noiray et al., 2018, p. 8). The research conducted since then motivates us to endorse an integrated-interactive approach to (spoken) language acquisition. By integrated, we mean that the gradually acquired knowledge about different unit types and sizes does not constrain children to move from one organizational scheme to another (e.g., from holistic to segmental representation of speech or vice versa). Instead, this knowledge would integrate into an increasingly complex and flexible language system, allowing children to gradually manipulate a greater variety of phonetic compounds and structural organizations (Noiray et al., 2019). At the production level, this integrative process is exemplified by preschool-age children using gradients of coarticulation degree to accommodate the varying gestural demands of consecutive consonants and vowels (Noiray et al., 2019). At the representational level, the way phonological awareness has traditionally been assessed directly reflects an integrative approach to phonological development: children's structural knowledge of their native language is usually tested incrementally with tasks tapping different levels of unit complexity (e.g., words, syllables, rhymes, and segments). Phonological awareness may therefore be envisioned as an integrative learning process: it is only once children have fully integrated all organizational levels and can manipulate them in various ways that they have reached adult-like phonological representations.
The process of combinatoriality is not unique to language. In their discussion of language discreteness, Studdert-Kennedy and Goldstein (2003) remarked on striking structural similarities between the way languages pattern and the way other processes in nature pattern (e.g., in biology, physics, chemistry). They argue for a "particulate principle" (Abler, 1989) under which "units that combine into a larger unit do not disappear or lose their integrity: they can re-emerge or be recovered through mechanisms of physical, chemical, or genetic interaction, or, for language, through the mechanisms of human speech perception and language understanding" (Studdert-Kennedy and Goldstein, 2003, pp. 52–53). Congruent with this theoretical position, we consider a view of (spoken) language in which various structural types of combinations – gestures, segments, syllables, and words – are not mutually exclusive but reflect complementary levels of linguistic organization that all contribute to the richness and complexity of language systems (e.g., Goffman et al., 2008; Noiray et al., 2019). From very early in development, the process of coarticulation itself binds gestures, sounds, and phonetic units together to create compounds that ultimately lend meaning to speech streams. This imparts to coarticulation a special role in (spoken) language development beyond its usual circumscription to low-level motor processes. By tracking the maturation of coarticulatory organization, we can indeed capture the gradual binding of representational and executional levels. Expanding on that view, the present findings provide evidence for subtle differences in the implementation of this relationship due to the very nature of the phonemes represented in children's minds and their motor expressions. From our preceding studies (Noiray et al., 2013, 2018, 2019; Rubertus and Noiray, 2018) and research conducted in the domains of lexical and phonological development, it seems that holistic and segmental organizations (both in representation and production) develop together, albeit probably at different paces at different times. For instance, lexically based organizations may prevail early on because they support object-word correspondences and referencing, which are particularly relevant for children at an early stage of life, while segmental representations may develop more slowly because they are more abstract and not bound to real-world objects. While variability in individual trajectories is evidently to be expected (e.g., Smith et al., 2010), overall there is converging evidence in typically developing children that these types of organization integrate with one another in the course of developing spoken language fluency (e.g., Vihman, 2015).
Furthermore, we argue for an interactive approach to (spoken) language development in which various skills develop together and are equally important to the uniqueness of human communication. While the literature abounds with studies highlighting developmental interactions between phonological awareness and various cognitive domains (e.g., literacy: Ziegler and Goswami, 2005; or vocabulary: Charles-Luce and Luce, 1995; Muter et al., 2004; Hilden, 2016), the present study sheds light on the interaction between cognitive and speech motor skills. Results suggest that motor, lexical, and phonological developments collaborate dynamically over time through contact with the language (i.e., via increasingly richer exposure and practice speaking the language). This is a significant finding with several implications.
First, it may challenge models of adult speech production that have suggested a modular approach, with lexical, phonological, and motor processes considered as separate components orchestrated sequentially (e.g., Levelt and Wheeldon, 1994, Figure 1; Levelt, 1999, Figure 1). It may also promote a revision of speech production models that have considered interactions across domains but from a top-down approach, whereby motor execution depends on the output of preceding cognitive or neural processes (e.g., in Levelt and Wheeldon's model, motor execution is comprised within phonological encoding but implemented as the final component, p. 245; in Guenther and Vladusich, 2012's DIVA model, between the motor, auditory, and somatosensory domains, Figure 1; review in Tourville and Guenther, 2011). If interactions between the lexical, phonological, and motor domains exist in the developing speech system of children, those interactions should persist in adults' speech organization, or at least residual traces of such relationships may remain. Assuming a developmental continuity from children's to adults' speech production, models of speech production would benefit from taking these ontogenetic findings into account and perhaps adopting a more integrated-interactive perspective. By doing so, it may be possible to move forward in the longstanding quest to determine the nature of the units of speech production (see, for example, discussion in Pierrehumbert, 2003; Hickok, 2014).
Second, the finding of interactions across domains is relevant for the clinical field. Indeed, while predictive studies have usually tested how skill X at time T1 predicts the stage of another skill Y at time T2 (e.g., Walley et al., 2003; Edwards et al., 2004), no study has, to our knowledge, ventured to examine how interactions between specific skills change over developmental time or predict the stage of another interaction at a later time. Although the present study was not designed to demonstrate a specific causal direction in the relationships observed, it is highly likely that speech motor, lexical, and phonological skills mutually influence each other over time. There is ample evidence in infant and child research supporting both directions (e.g., motor, lexical, and phonological developments: Menn and Butterworth, 1983; DePaolis et al., 2013; articulatory filter hypothesis: Vihman, 1996; DePaolis et al., 2011; Majorano et al., 2014; phonological templates: Vihman and Croft, 2007; Vihman and Wauquier, 2018; role of articulatory skills for later phonemic awareness). Given that coarticulated speech is initiated years before children gain adult-like knowledge about the structural combinatoriality of their native language, an effect of coarticulatory practice on the development of phonological awareness is not an implausible scenario. In the first 4 to 5 years of life, children acquire a basic awareness of the structural combinatoriality of sounds (phonetic awareness), as attested by their ability to form new words (real words or imaginary creations) and converse comfortably with others. This raises the question whether phonological awareness is indispensable to adult-like fluent speech or only to fluent reading. To elucidate whether it is only a by-product of literacy acquisition that happens to create collateral changes in children's speech organization, it will be crucial to examine whether the maturational trajectories of coarticulatory patterns in illiterate adults or in children are similar to those of literate children. If they are, it may suggest that developing adult-like coarticulatory patterns does not entail any advanced awareness of the structural combinatoriality of the native language. Instead, maturation of coarticulatory patterns may relate more to children tuning their speech motor system to the phonetic regularities of their native language and may therefore interact more significantly with perceptual rather than phonological development. Expanding on this hypothesis, the process of language acquisition may encompass two types of interactions: one serving oral communication and primarily involving perceptual, motor, and lexical skills; another developing in a more protracted fashion for the purpose of literacy acquisition and involving primary interactions between motor, lexical, and phonological skills. Comparisons with preschool-aged children with advanced phonemic awareness would also provide a compelling experimental framework for assessing the role of phonological awareness, relative to speech motor control skill, in developing adult-like patterns of coarticulation. In a recently funded project, we have initiated a first step in this direction, testing for interactions between various levels of phonological awareness, reading proficiency, and production fluency in typically developing school-aged children (Popescu and Noiray, 2019) in comparison to children at risk of or diagnosed with reading disorders.
Limitations and Perspectives for Future Research
Overall, results from the present study provide strong evidence that the process of developing spoken language fluency encompasses dynamic interactions between vocabulary, phonological awareness, and speech motor control in German children. While this represents a promising first step, further empirical work is clearly needed to understand these multidimensional interactions in greater detail. Generalized additive modeling (GAM) represents an innovative and powerful method because it can unveil nonlinear relationships between cognitive and motor domains and estimate their interrelated change over time. In the present study, GAMs made it possible to illuminate nonlinear patterns of interactions that would have remained hidden if we had used linear mixed models. Note, however, that our dataset presents some weaknesses. For instance, because the examination of vocabulary was limited to nouns in this study, our assessment of children's expressive lexicon was incomplete, and hence the corresponding correlations should be interpreted with caution. As mentioned earlier, it was not possible to reliably test for the combined effect of vocabulary and phonological awareness on coarticulation degree, because of the data requirements of such an analysis (e.g., recording many more children and obtaining many more scores per participant). For generalized additive modeling to provide reliable results, large-sample investigations are also necessary, which remain challenging in the developmental field due to various methodological constraints and time-consuming data processing. However, given the growing statistical expertise among developmental psycholinguists, combined with greater effort to conduct synergistic data collection across laboratories, there is no doubt that future quantitative studies will succeed in teasing apart the (in)dependent effects of vocabulary and phonological awareness on the development of spoken language fluency.
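As a concrete illustration of the contrast with strictly linear modeling mentioned above, one can compare a purely linear specification with a smooth (tensor-product) one, for example via AIC. The sketch below does so with the same hypothetical file and column names as in the earlier sketches; it illustrates the general idea rather than reproducing any model comparison performed in this study.

```python
# Sketch of a linear-versus-nonlinear comparison on hypothetical data:
# if the tensor-smooth model fits substantially better (lower AIC), the
# relationship is unlikely to be captured by straight-line effects alone.
import pandas as pd
from pygam import LinearGAM, l, te

df = pd.read_csv("alveolar_trials.csv")              # hypothetical file, as above
X = df[["composite_pa", "vowel_td"]].to_numpy()      # hypothetical predictors
y = df["consonant_td"].to_numpy()                    # hypothetical response

linear_gam = LinearGAM(l(0) + l(1)).fit(X, y)        # strictly linear effects, no interaction
smooth_gam = LinearGAM(te(0, 1)).gridsearch(X, y)    # nonlinear interaction surface

print("AIC, linear terms only:", linear_gam.statistics_["AIC"])
print("AIC, tensor smooth    :", smooth_gam.statistics_["AIC"])
```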
The present study is part of a longer-term project aiming to elucidate whether the expansion of vocabulary and phonological awareness contributes to increasingly more segmentally specified coarticulatory organizations from kindergarten to primary school. This question is important not only for theories of language acquisition but also for clinical practice. Assessments of deviant coarticulatory patterns have primarily tested their motor origins (e.g., apraxia of speech: Nijland et al., 2002; speech sound disorder: Maas and Mailend, 2012; phonological disorders: Gibbon, 1999; stuttering: Lenoci and Ricci, 2018). Evidence of an intricate relationship with other linguistic components of the language system would certainly affect the way diagnosis and treatment are envisioned. The opposite question, whether increased practice coarticulating a wide range of phonetic combinations supports greater phonemic differentiation and the stabilization of motor correspondences, would be equally exciting in terms of its implications for language-related cognitive development. In this study, we have first demonstrated that important interactions between cognitive and motor domains occur in the course of developing spoken language fluency. We believe our findings now warrant longitudinal investigations to further test whether the interactions observed are bi-directional, and hence fundamental to the growth of each individual skill, or unilateral.
Last, if phonological awareness is knowledge of the discrete and coarticulation represents its continuous articulatory-acoustic make-up, it will be important in future studies to design analytical approaches that can adequately account for the development of this intricate relationship over time. Dynamical systems theory seems a promising avenue in that respect. In a recent discussion of speech dynamics, Iskarous emphasizes that dynamical systems "do not assume separate sets of principles to describe discrete and continuous aspects of a system. Rather, the discrete description is shown to predict the continuous one, using the concept of a differential equation" (Iskarous, 2017, p. 8). The present study provides an ontogenetic perspective illustrating how access to various levels of phonological discreteness (words, syllables, segments) interacts with the organization of the continuous: from the production of syllabic entities to the fine integration of segmentally specified gestures. In future research on this topic, we aim to combine dynamical systems theory with longitudinal data to address how this dynamical relationship precisely unfolds in the developing language system of children.
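As one familiar illustration of this discrete-to-continuous mapping (offered here as background, not as part of the present analyses), the task-dynamic formulation associated with articulatory phonology (cf. Browman and Goldstein, 1992) specifies a gesture by a small set of discrete parameters, while the continuous articulator trajectory follows from a damped mass-spring equation:

```latex
% A gesture modeled as a (critically) damped second-order system: the discrete
% specification (target x_0, stiffness k, damping b) predicts the continuous
% trajectory x(t) of the relevant tract variable.
m\,\ddot{x}(t) + b\,\dot{x}(t) + k\bigl(x(t) - x_{0}\bigr) = 0,
\qquad b = 2\sqrt{mk}\ \text{(critical damping)} .
```

On such a view, the categorical content of a gesture resides in its parameter values, while coarticulation emerges from the temporal overlap of the corresponding dynamical regimes – precisely the kind of relation between the discrete and the continuous that Iskarous describes.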
The present study tested whether the developmental differences in coarticulation degree widely reported in the literature over the past decades were strictly related to maturational differences in speech motor abilities or also interacted with children's language-related abilities. An examination of children's coarticulatory patterns in relation to their lexical and phonological proficiency allowed us to uncover developmental differences that would remain unexplained if each skill were considered separately. Other domains, which were not examined in the present study, are likely to play a role and should be thoroughly considered in future studies (e.g., assessment of literacy, phonological memory). The questions of which skill interactions allow children to become fluent language users, and how those interactions evolve dynamically over time, have become pressing issues for developmental researchers. However, for these questions to be answered, interdisciplinary collaborations between developmental biology, psychology, and linguistics will be necessary. While all domains have separately argued that multiple developments are intricately connected over time, only actual collaborations across disciplines will generate a unified account of language development.
Data Availability Statement
The datasets generated for this study are available on request to the corresponding author.
Ethics Statement
The study reported in the manuscript was approved by the Ethics Committee of the University of Potsdam in Germany. The goals of the research, the population of children recorded, the method, and the recruitment procedure were described to and reviewed by the Committee prior to its positive review.
Author Contributions
AN provided the theoretical framework of the study, obtained the funding, and designed the empirical questions resulting in the manuscript. AN and AP conceptualized and designed the statistical analyses. AN, AP, and LH organized the dataset for subsequent statistical analyses. AP performed all statistical analyses. AN, AP, HK, ER, SK, and LH contributed to ultrasound data collection and processing and/or administration and scoring of the behavioural assessments. HK trained the team in administration and scoring the developmental assessments. AN wrote the manuscript. AN and AP provided all visualizations and edited the first draft. HK, ER, and SK provided feedback on the pre-final draft. All authors read the manuscript and agreed on its submission.
Funding
This research was generously supported by the Deutsche Forschungsgemeinschaft (DFG) grant N° 255676067 and 1098 and PredictAble (Marie Skłodowska-Curie Actions, H2020-MSCA-ITN-2014, N° 641858).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
We are indebted to the many colleagues who contributed to the success of this study: Martijn Wieling for his careful guidance in the statistical analyses of the present dataset and Bodo Winter for useful related advice, Jan Ries and Mark Tiede for co-developing the SOLLAR platform used in this research, the BabyLab at the University of Potsdam for recruitment assistance (in particular Barbara Höhle and Tom Fritzsche), the team at the Laboratory for Oral Language Acquisition (LOLA) involved in data recording and processing, and all participants enrolled in the study. We thank two reviewers for their thorough and insightful input. We are also grateful to Carol Fowler for stimulating discussions and for reviewing an earlier draft of this manuscript. Last, we thank the various scholars cited in this manuscript whose work has been a great source of inspiration; in that respect, a special thought goes to Michael Studdert-Kennedy, who first sparked enthusiasm for this research. The publishing of this manuscript was supported by the Deutsche Forschungsgemeinschaft (DFG) and the publishing fund of the University of Potsdam.
Abakarova, D., Iskarous, K., and Noiray, A. (2018). Quantifying lingual coarticulation in German using mutual information: an ultrasound study. J. Acoust. Soc. Am. 144, 897–907. doi: 10.1121/1.5047669
Abler, W. L. (1989). On the particulate principle of self-diversifying systems. J. Soc. Biol. Struct. 12, 1–13.
Anthony, J. L., and Francis, D. J. (2005). Development of phonological awareness. Curr. Dir. Psychol. Sci. 14, 255–259. doi: 10.1111/j.0963-7214.2005.00376.x
Anthony, J. L., Lonigan, C. J., Driscoll, K., Phillips, B. M., and Burgess, S. R. (2003). Phonological sensitivity: a quasi-parallel progression of word structure units and cognitive operations. Read. Res. Q. 38, 470–487. doi: 10.1598/RRQ.38.4.3
Barbier, G., Perrier, P., Ménard, L., Payan, Y., Tiede, M., and Perkell, J. (2015). “Speech planning in 4-year-old children versus adults: acoustic and articulatory analyses” in Proceedings 16th Annual Conference of the International Speech Communication Association (Germany: Dresden).
Bates, D., Maechler, M., Bolker, B., Walker, S., Christensen, R. H. B., Singmann, H., et al. (2015). Package ‘lme4’. Convergence 12.
Beckman, M. E., and Edwards, J. (2000). The ontogeny of phonological categories and the primacy of lexical learning in linguistic development. Child Dev. 71, 240–249. doi: 10.1111/1467-8624.00139
Beckman, M. E., Munson, B., and Edwards, J. (2007). Vocabulary growth and the developmental expansion of types of phonological knowledge. Lab. Phonol. 9, 241–264.
Best, C. T. (1994). “The emergence of language-specific phonemic influences in infant speech perception” in The development of speech perception: The transition from speech sounds to spoken word . 167–224.
Brady, S. A., Braze, D., and Fowler, C. A. (2011). Explaining individual differences in reading: Theory and evidence . Psychology Press.
Browman, C. P., and Goldstein, L. (1992). Articulatory phonology: an overview. Phonetica 49, 155–180. doi: 10.1159/000261913
Carroll, J. M., Snowling, M. J., Stevenson, J., and Hulme, C. (2003). The development of phonological awareness in preschool children. Dev. Psychol. 39:913. doi: 10.1037/0012-1649.39.5.913
Charles-Luce, J., and Luce, P. A. (1995). An examination of similarity neighbourhoods in young children’s receptive vocabularies. J. Child Lang. 22, 727–735. doi: 10.1017/S0305000900010023
Davis, M., and Redford, M. A. (2019). The emergence of perceptual-motor units in a production model that assumes holistic speech plans. Front. Psychol. 10:2121. doi: 10.3389/fpsyg.2019.02121
Delattre, P. (1951). The physiological interpretation of sound spectrograms . Publications of the Modern Language Association of America. 864–875.
DePaolis, R. A., Vihman, M. M., and Keren-Portnoy, T. (2011). Do production patterns influence the processing of speech in prelinguistic infants? Infant Behav. Dev. 34, 590–601. doi: 10.1016/j.infbeh.2011.06.005
DePaolis, R. A., Vihman, M. M., and Nakai, S. (2013). The influence of babbling patterns on the processing of speech. Infant Behav. Dev. 36, 642–649. doi: 10.1016/j.infbeh.2013.06.007
Edwards, J., Beckman, M. E., and Munson, B. (2004). The interaction between vocabulary size and phonotactic probability effects on children’s production accuracy and fluency in nonword repetition. J. Speech Lang. Hear. Res. 47, 421–436. doi: 10.1044/1092-4388(2004/034)
Edwards, J., Munson, B., and Beckman, M. E. (2011). Lexicon–phonology relationships and dynamics of early language development–a commentary on Stoel-Gammon’s ‘relationships between lexical and phonological development in young children’. J. Child Lang. 38, 35–40. doi: 10.1017/S0305000910000450
Ferguson, C. A., and Farwell, C. B. (1975). Words and sounds in early language acquisition: English initial consonants in the first fifty words. Lg 51, 419–439.
Fowler, A. E. (1991). “How early phonological development might set the stage for phoneme awareness” in Phonological processes in literacy: A tribute to Isabelle Y. Liberman . Vol. 106 . eds. Brady, S. A., and Shankweiler, D. P., 97–117.
Fowler, C. A. (1994). Invariants, specifiers, cues: an investigation of locus equations as information for place of articulation. Percept. Psychophys. 55, 597–610. doi: 10.3758/BF03211675
Fowler, C. A., Shankweiler, D., and Studdert-Kennedy, M. (2016). “Perception of the speech code” revisited: speech is alphabetic after all. Psychol. Rev. 123, 125–150. doi: 10.1037/rev0000013
Fricke, S., and Schäfer, B. (2008). Test für phonologische Bewusstheitsfähigkeiten (TPB) . Idstein: Schulz-Kirchner.
Fricke, S., Szczerbinski, M., Fox-Boyer, A., and Stackhouse, J. (2016). Preschool predictors of early literacy acquisition in German-speaking children. Read. Res. Q. 51, 29–53. doi: 10.1002/rrq.116
Gibbon, F. E. (1999). Undifferentiated lingual gestures in children with articulation/phonological disorders. J. Speech Lang. Hear. Res. 42, 382–397. doi: 10.1044/jslhr.4202.382
Gillon, G. T. (2007). Effective practice in phonological awareness intervention for children with speech sound disorder. Persp. Lang. Learn. Educ. 14, 18–23. doi: 10.1044/lle14.3.18
Goffman, L. (2010). “Dynamic interaction of motor and language factors in normal and disordered development” in Speech motor control . eds. Maassen, B., and van Lieshout, P. (Oxford University Press), 137–152.
Goffman, L., Smith, A., Heisler, L., and Ho, M. (2008). Speech production units in children and adults: evidence from coarticulation. J. Speech Lang. Hear. Res. 51, 1423–1437. doi: 10.1044/1092-4388(2008/07-0020)
Goswami, U., and Bryant, P. (2016). Phonological skills and learning to read : Routledge.
Green, J. R., Moore, C. A., and Reilly, K. J. (2002). The sequential development of jaw and lip control for speech. J. Speech Lang. Hear. Res. 45, 66–79. doi: 10.1044/1092-4388(2002/005)
Green, J. R., Nip, I. S., and Maassen, B. (2010). “Some organization principles in early speech development” in Speech motor control . eds. Maassen, B., and van Lieshout, P. (Oxford University Press), 171–188.
Grimme, B., Fuchs, S., Perrier, P., and Schöner, G. (2011). Limb versus speech motor control: a conceptual review. Mot. Control. 15, 5–33. doi: 10.1123/mcj.15.1.5
Guenther, F. H., and Vladusich, T. (2012). A neural theory of speech acquisition and production. J. Neurolinguistics 25, 408–422. doi: 10.1016/j.jneuroling.2009.08.006
Hay, J. (2018). Sociophonetics: the role of words, the role of context, and the role of words in context. Top. Cognit. Sci. 10, 696–706. doi: 10.1111/tops.12326
Heisler, L., Goffman, L., and Younger, B. (2010). Lexical and articulatory interactions in children’s language production. Dev. Sci. 13, 722–730. doi: 10.1111/j.1467-7687.2009.00930.x
Hickok, G. (2014). The architecture of speech production and the role of the phoneme in speech processing. Lang. Cogn. Neurosci. 29, 2–20. doi: 10.1080/01690965.2013.834370
Hilden, R. (2016). Empirische Studie zum Zusammenhang von Lexikon und Phonologischen Bewusstheitsfähigkeiten bei monolingual deutschen Kindern im Alter von 5;0 bis 6;6 Jahren. Bachelor’s thesis.
Iskarous, K. (2017). The relation between the continuous and the discrete: a note on the first principles of speech dynamics. J. Phon. 64, 8–20. doi: 10.1016/j.wocn.2017.05.003
Iskarous, K., Fowler, C. A., and Whalen, D. H. (2010). Locus equations are an acoustic expression of articulator synergy. J. Acoust. Soc. Am. 128, 2021–2032. doi: 10.1121/1.3479538
Iskarous, K., Mooshammer, C., Hoole, P., Recasens, D., Shadle, C. H., Saltzman, E., et al. (2013). The coarticulation/invariance scale: mutual information as a measure of coarticulation resistance, motor synergy, and articulatory invariance. J. Acoust. Soc. Am. 134, 1271–1282. doi: 10.1121/1.4812855
Jusczyk, P. W., Cutler, A., and Redanz, N. J. (1993). Infants’ preference for the predominant stress patterns of English words. Child Dev. 64, 675–687. doi: 10.1111/j.1467-8624.1993.tb02935.x
Katz, W. F., Kripke, C., and Tallal, P. (1991). Anticipatory coarticulation in the speech of adults and young children: acoustic, perceptual, and video data. J. Speech Hear. Res. 34, 1222–1232. doi: 10.1044/jshr.3406.1222
Kauschke, C. (2000). Der Erwerb des frühkindlichen Lexikons: eine empirische Studie zur Entwicklung des Wortschatzes im Deutschen. Vol. 27 . Gunter Narr Verlag.
Keren-Portnoy, T., Majorano, M., and Vihman, M. M. (2009). From phonetics to phonology: the emergence of first words in Italian. J. Child Lang. 36, 235–267. doi: 10.1017/s0305000908008933
Kuhl, P. K. (2011). Early language learning and literacy: neuroscience implications for education. Mind Brain Educ. 5, 128–142. doi: 10.1111/j.1751-228X.2011.01121.x
Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., and Lindblom, B. (1992). Linguistic experience alters phonetic perception in infants by 6 months of age. Science 255, 606–608. doi: 10.1126/science.1736364
Lenoci, G., and Ricci, I. (2018). An ultrasound investigation of the speech motor skills of stuttering Italian children. Clin. Linguist. Phon. 32, 1126–1144. doi: 10.1080/02699206.2018.1510983
Levelt, W. (1999). “Producing spoken language” in The neurocognition of language . (Cambridge University Press), 83–122.
Levelt, W. J., and Wheeldon, L. (1994). Do speakers have access to a mental syllabary? Cognition 50, 239–269. doi: 10.1016/0010-0277(94)90030-2
Liberman, I. Y., Shankweiler, D., Fischer, F. W., and Carter, B. (1974). Explicit syllable and phoneme segmentation in the young child. J. Exp. Child Psychol. 18, 201–212. doi: 10.1016/0022-0965(74)90101-5
Maas, E., and Mailend, M. L. (2012). Speech planning happens before speech execution: online reaction time methods in the study of apraxia of speech. J. Speech Lang. Hear. Res. 55:S1523-34. doi: 10.1044/1092-4388(2012/11-0311)
Majorano, M., Vihman, M. M., and DePaolis, R. A. (2014). The relationship between infants’ production experience and their processing of speech. Lang. Learn. Develop. 10, 179–204. doi: 10.1080/15475441.2013.829740
Mann, V., and Wimmer, H. (2002). Phoneme awareness and pathways into literacy: a comparison of German and American children. Reading Writing 15, 653–682. doi: 10.1023/a:1020984704781
Maye, J., Werker, J. F., and Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition 82, B101–B111. doi: 10.1016/S0010-0277(01)00157-3
Menn, L., and Butterworth, B. (1983). Development of articulatory, phonetic, and phonological capabilities. Lang. Prod. 2, 3–50.
Menn, L., and Vihman, M. (2011). “Features in child phonology” in Where do phonological features come from . 261–301.
Metsala, J. L. (1999). Young children’s phonological awareness and nonword repetition as a function of vocabulary development. J. Educ. Psychol. 91:3. doi: 10.1037/0022-0663.91.1.3
Metsala, J. L. (2011). “Lexical reorganization and the emergence of phonological awareness” in Handbook of early literacy research . Vol. 3 . eds. Neuman, S. B., and Dickinson, D. K. 66–84.
Munson, B., Beckman, M., and Edwards, J. (2012). “Abstraction and specificity in early lexical representations: climbing the ladder of abstraction” in The Oxford handbook of laboratory phonology . American Speech-Language-Hearing Association (Oxford: Oxford University Press), 288–309.
Munson, B., Kurtz, B. A., and Windsor, J. (2005). The influence of vocabulary size, phonotactic probability, and wordlikeness on nonword repetitions of children with and without specific language impairment. J. Speech Lang. Hear. Res. 48, 1033–1047. doi: 10.1044/1092-4388(2005/072)
Munson, B., Swenson, C. L., and Manthei, S. C. (2005). Lexical and phonological organization in children. J. Speech Lang. Hear. Res. 48, 108–124. doi: 10.1044/1092-4388(2005/009)
Muter, V., Hulme, C., Snowling, M. J., and Stevenson, J. (2004). Phonemes, rimes, vocabulary, and grammatical skills as foundations of early reading development: evidence from a longitudinal study. Dev. Psychol. 40, 665–681. doi: 10.1037/0012-1649.40.5.665
Nazzi, T., and Bertoncini, J. (2003). Before and after the vocabulary spurt: two modes of word acquisition? Dev. Sci. 6, 136–142. doi: 10.1111/1467-7687.00263
Nazzi, T., and Cutler, A. (2019). How consonants and vowels shape spoken-language recognition. Annu. Rev. Linguist. 5, 25–47. doi: 10.1146/annurev-linguistics-011718-011919
Nicholson, H., Munson, B., Reidy, P., and Edwards, J. (2015). “Effects of age and vocabulary size on production accuracy and acoustic differentiation of young children’s sibilant fricatives” in ICPhS .
Nijland, L., Maassen, B., Meulen, S. V. D., Gabreëls, F., Kraaimaat, F. W., and Schreuder, R. (2002). Coarticulation patterns in children with developmental apraxia of speech. Clin. Linguist. Phon. 16, 461–483. doi: 10.1080/02699200210159103
Nittrouer, S., and Burton, L. T. (2005). The role of early language experience in the development of speech perception and phonological processing abilities: evidence from 5-year-olds with histories of otitis media with effusion and low socioeconomic status. J. Commun. Disord. 38, 29–63. doi: 10.1016/j.jcomdis.2004.03.006
Nittrouer, S., Studdert-Kennedy, M., and Neely, S. T. (1996). How children learn to organize their speech gestures: further evidence from fricative-vowel syllables. J. Speech Lang. Hear. Res. 39, 379–389. doi: 10.1044/jshr.3902.379
Noiray, A., Abakarova, D., Rubertus, E., Krüger, S., and Tiede, M. (2018). How do children organize their speech in the first years of life? Insight from ultrasound imaging. J. Speech Lang. Hear. Res. 61, 1355–1368. doi: 10.1044/2018_JSLHR-S-17-0148
Machine translation from signed to spoken languages: state of the art and challenges
- Review Paper
- Open access
- Published: 01 April 2023
- Volume 23 , pages 1305–1331, ( 2024 )
Cite this article
- Mathieu De Coster ORCID: orcid.org/0000-0002-1103-2441 1 ,
- Dimitar Shterionov 2 ,
- Mieke Van Herreweghe 3 &
- Joni Dambre 1
6644 Accesses
3 Altmetric
A Correction to this article was published on 25 January 2024
This article has been updated
Automatic translation from signed to spoken languages is an interdisciplinary research domain on the intersection of computer vision, machine translation (MT), and linguistics. While the domain is growing in terms of popularity—the majority of scientific papers on sign language (SL) translation have been published in the past five years—research in this domain is performed mostly by computer scientists in isolation. This article presents an extensive and cross-domain overview of the work on SL translation. We first give a high level introduction to SL linguistics and MT to illustrate the requirements of automatic SL translation. Then, we present a systematic literature review of the state of the art in the domain. Finally, we outline important challenges for future research. We find that significant advances have been made on the shoulders of spoken language MT research. However, current approaches often lack linguistic motivation or are not adapted to the different characteristics of SLs. We explore challenges related to the representation of SL data, the collection of datasets and the evaluation of SL translation models. We advocate for interdisciplinary research and for grounding future research in linguistic analysis of SLs. Furthermore, the inclusion of deaf and hearing end users of SL translation applications in use case identification, data collection, and evaluation, is of utmost importance in the creation of useful SL translation models.
Similar content being viewed by others
Machine translation from text to sign language: a systematic review
A Survey on Indian Sign Language Translation Using Artificial Intelligence
LSA-T: The First Continuous Argentinian Sign Language Dataset for Sign Language Translation
1 Introduction
Rapid progress in deep learning has enabled a range of new applications related to sign language recognition, translation, and synthesis, which can be grouped under the umbrella term "sign language processing." Sign Language Recognition (SLR) can be likened to "information extraction from sign language data," for example fingerspelling recognition [ 1 , 2 ] and sign classification [ 3 , 4 ]. Sign Language Translation (SLT) maps this extracted information to meaning and translates it to another (signed or spoken) language [ 5 , 6 ]; the opposite direction, from text to sign language, is also possible [ 7 , 8 ]. Sign Language Synthesis (SLS) aims to generate sign language from some representation of meaning, for example through virtual avatars [ 9 , 10 ]. In this article, we concentrate on translation from signed languages to spoken languages.
In particular, we focus on translating videos containing sign language utterances to text, i.e., the written form of spoken language. We will only discuss SLT models that support video data as input, as opposed to models that require wearable bracelets or gloves, or 3D cameras. Systems that use smart gloves, wristbands or other wearables are considered intrusive and not accepted by sign language communities (SLCs) [ 11 ]. In addition, they are unable to capture all information present in signing, such as non-manual actions. Video-based approaches also have benefits compared to wearable-based approaches: they can be trained with existing data, and they could for example be integrated into conference calling software, or used for automatic captioning in videos of signing vloggers.
Fig. 1: Sign language translation lies at the intersection of computer vision, machine translation, and linguistics.
Several previously published scientific papers liken SLR to gesture recognition (among others, [ 12 , 13 ]), or even present a fingerspelling recognition system as an SLT solution (among others, [ 14 , 15 ]). Such classifications are overly simplified and incorrect. They may lead to a misunderstanding of the technical challenges that must be solved. As Fig. 1 illustrates, SLT lies at the intersection of computer vision, machine translation, and linguistics. Experts from each domain must come together to truly address SLT.
This article aims to provide a comprehensive overview of the state of the art (SOTA) of sign to spoken language translation. To do this, we perform a systematic literature review and discuss the state of the domain. We aim to find answers to the following research questions:
Which datasets are used, for what languages, and what are the properties of these datasets?
How should we represent sign language data for Machine Translation (MT) purposes?
Which algorithms are currently the SOTA for SLT?
How are current SLT models evaluated?
Furthermore, we list several challenges in SLT. These challenges are of a technical and linguistic nature. We propose research directions to tackle them.
In parallel with this article, another survey on SLT was written and published [ 16 ]. It provides a narrative historical overview of the domains of SLR and SLT and positions them in the wider scope of sign language processing; it also discusses the "to-sign" direction of SLT, which we do not cover. We provide a systematic and extensive analysis of the most recent work on SLT, supported by a discussion of sign language linguistics. That survey is a broader overview of the domain, but less in depth, and remains mostly limited to computer science.
This article is also related to the work of Bragg et al. [ 17 ], which gives a limited but informative overview of the domains of SLR, SLS and SLT based on the results of panel discussions. They list several challenges in the field that align with our own findings, e.g., data scarcity and the involvement of SLCs.
We first provide a high level overview of some required background information on sign languages in Sect. 2 . This background can help in the understanding of the remainder of this article, in particular the reasoning behind our inclusion criteria. Section 3 provides the necessary background in machine translation. We discuss the inclusion criteria and search strategy for our systematic literature search in Sect. 4 and objectively compare the considered papers on SLT in Sect. 5 ; this includes Sect. 5.7 focusing on a specific benchmark dataset. The research questions introduced above are answered in our discussion of the literature overview, in Sect. 6 . We present several open challenges in SLT in Sect. 7 . The conclusion and takeaway messages are given in Sect. 8 .
2 Sign language background
2.1 Introduction
It is a common misconception that there exists a single, universal, sign language. Just like spoken languages, sign languages evolve naturally through time and space. Several countries have national sign languages, but often there are also regional differences and local dialects. Furthermore, signs in a sign language do not have a one-to-one mapping to words in any spoken language: translation is not as simple as recognizing individual signs and replacing them with the corresponding words in a spoken language. Sign languages have distinct vocabularies and grammars and they are not tied to any spoken language. Even in two regions with a shared spoken language, the regional sign languages used can differ greatly. In the Netherlands and in Flanders (Belgium), for example, the majority spoken language is Dutch. However, Flemish Sign Language (VGT) and the Sign Language of the Netherlands (NGT) are quite different. Meanwhile, VGT is linguistically and historically much closer to French Belgian Sign Language (LSFB) [ 18 ], the sign language used primarily in the French-speaking part of Belgium, because both originate from a common Belgian Sign Language, diverging in the 1990s [ 19 ]. In a similar vein, American Sign Language (ASL) and British Sign Language (BSL) are completely different even though the two countries share English as the official spoken language.
2.2 Sign language characteristics
2.2.1 Sign components
Sign languages are visual; they make use of a large space around the signer. Signs are not composed solely of manual gestures. In fact, there are many more components to a sign. Stokoe stated in 1960 that signs are composed of hand shape, movement and place of articulation parameters [ 20 ]. Battison later added orientation, both of the palm and of the fingers [ 21 ]. There are also non-manual components such as mouth patterns. These can be divided into mouthings—where the pattern refers to (part of) a spoken language word—and mouth gestures, e.g., pouting one's lips. Non-manual components play an important role in sign language lexicons and grammars [ 22 ]. They can, for example, separate minimal pairs: signs that share all articulation parameters but one. When hand shape, orientation, movement and place of articulation are identical, mouth patterns can for example be used to differentiate two signs. Non-manual actions are not only important at the lexical level as just illustrated, but also at the grammatical level. A clear example of this can be found in eyebrow movements: furrowing or raising the eyebrows can signal that a question is being asked and indicate the type of question (open or closed).
2.2.2 Simultaneity
Sign languages exhibit simultaneity on several levels. There is simultaneity on the component level: as explained above, manual actions can be combined with non-manual actions simultaneously. We also observe simultaneity at the utterance level. It is, for example, possible to turn a positive utterance into a negative utterance by shaking one’s head while performing the manual actions. Another example is the use of eyebrow movements to transform a statement into a question.
2.2.3 Signing space
The space around the signer can also be used to indicate, for instance, the location or moment in time of the conversational topic. A signer can point behind their back to specify that an event occurred in the past and likewise, point in front of them to indicate a future event. An imaginary timeline can also be constructed in front of the signer, with time passing from left to right. Space is also used to position referents [ 18 , 23 ]. For example, a person can be discussing a conversation with their mother and father. Both referents get assigned a location (locus) in the signing space and further references to these persons are made by pointing to, looking at, or signing toward these loci. For example, “mom gives something to dad” can be signed by moving the sign for “to give” from the locus associated with the mother to the one associated with the father. Modeling space, detecting positions in space, and remembering these positions is important for SLT models.
2.2.4 Classifiers
Another important aspect of sign languages is the use of classifiers. Zwitserlood [ 24 ] describes them as “morphemes with a non-specific meaning, which are expressed by particular configurations of the manual articulator (or: hands) and which represent entities by denoting salient characteristics.” There are many more intricacies of classifiers than can be listed here, so we give a limited set of examples instead. Several types of classifiers exist. They can, for example, represent nouns or adjectives according to their shape or size. Whole entity classifiers can be used to represent objects, e.g., a flat hand can represent a car; handling classifiers can be used to indicate that an object is being handled, e.g., a pencil is picked up from a table. In a whole entity classifier, the articulator represents the object, whereas in a handling classifier it operates on the object.
2.2.5 The established and the productive lexicon
The vocabularies of sign languages are not fixed. Oftentimes new signs are constructed by sign language users. On the one hand, sign languages can borrow signs from other sign languages, similar to loanwords in spoken languages. In this case, these signs become part of the established lexicon. On the other hand, there is the productive lexicon—one can create an ad hoc sign. Vermeerbergen [ 25 ] gives the example of “a man walking on long legs” in VGT: rather than expressing this clause by signing “man,” “walk,” “long” and “legs”, the hands are used (as classifiers) to imitate the man walking. Both the established and productive lexicons are integral parts of sign languages.
Signers can also enact other subjects with their whole body, or part of it. They can, for example, enact animals by imitating their movements or behaviors.
2.2.6 Fingerspelling
Fingerspelling can be used to convey concepts for which a sign does not (yet) exist, or to introduce a person who has not yet been assigned a name sign. It is based on the alphabet of a spoken language, where every letter in that alphabet has a corresponding (static or dynamic) sign. Fingerspelling is also not shared between sign languages. For example, in ASL, fingerspelling is one-handed, but in BSL two hands are used.
2.3 Notation systems for sign languages
Unlike many spoken languages, sign languages do not have a standardized written form. Several notation systems do exist, but none of them are generally accepted as a standard [ 26 ]. The earliest notation system was proposed in the 1960s, namely the Stokoe notation [ 20 ]. It was designed for ASL and comprises a set of symbols to notate the different components of signs. The position, movement and orientation of the hands are encoded in iconic symbols, and for hand shapes, letters from the Latin alphabet corresponding to the most similar fingerspelling hand shape are used [ 20 ]. Later, in the 1970s, Sutton introduced SignWriting: a notation system for sign languages based on a dance choreography notation system [ 27 ]. The SignWriting notation for a sign is composed of iconic symbols for the hands, face and body. The signing location and movements are also encoded in symbols, in order to capture the dynamic nature of signing. SignWriting is designed as a system for writing signed utterances for everyday communication. In 1989, the Hamburg Notation System (HamNoSys) was introduced [ 28 ]. Unlike SignWriting, it is designed mainly for linguistic analysis of sign languages. It encodes hand shapes, hand orientation, movements and non-manual components in the form of symbols.
Stokoe notation, SignWriting and HamNoSys represent the visual nature of signs in a compact format. They are notation systems that operate on the phonological level. These systems, however, do not capture the meaning of signs. In linguistic analysis of sign languages, glosses are typically used to represent meaning. A sign language gloss is a written representation of a sign in one or more words of a spoken language, commonly the majority language of the region. Glosses can be composed of single words in the spoken language, but also of combinations of words. Examples of glosses are: "CAR," "BRIDGE," but also "car-crosses-bridge." Glosses do not accurately represent the meaning of signs in all cases and glossing has several limitations and problems [ 26 ]. They are inherently sequential, whereas signs often exhibit simultaneity [ 29 ]. Furthermore, as glosses are based on spoken languages, there may be an implicit influence of the spoken language projected onto the sign language [ 25 , 26 ]. Finally, there is no universal standard on how glosses should be constructed: this leads to differences between corpora of different sign languages, or even between several sign language annotators working on the same corpus [ 30 ].
Sign_A is a recently developed framework that aims to define an architecture robust enough to model sign languages at the phonological level while also capturing meaning (when combined with a role and reference grammar (RRG)) [ 31 ]. Sign_A with RRG encodes not only the meaning of sign language utterances, but also parameters pertaining to manual and non-manual actions. De Sisto et al. [ 32 ] propose investigating the application of Sign_A for data-driven SLT systems.
The above notation systems for sign languages range from graphical to written and computational representations of signs and signed utterances. None of these notation systems were originally designed for the purpose of automatic translation from signed to spoken languages, but they can be used to train MT models. For example, glosses are often used for SLT because of their similarity to written language text, e.g., [ 5 , 6 ]. These notation systems can also be used as labels to pre-train feature extractors for SLT models. For instance, Koller et al. presented SLR systems that exploit SignWriting [ 33 , 34 ], and these systems are leveraged in some later works on SLT, e.g., [ 35 , 36 ]. Many SLT models also use feature extractors that were pre-trained with gloss labels, e.g., [ 37 , 38 ].
3 Machine translation
3.1 Spoken language MT
Machine translation is a sequence-to-sequence task. That is, given an input sequence of tokens that constitute a sentence in a source language, an MT system generates a new sequence of tokens that represents a sentence in a target language. A token refers to a sentence construction unit: a word, a number, a symbol, a character or a subword unit.
Current SOTA models for spoken language MT are based on a neural encoder-decoder architecture: an encoder network encodes an input sequence in the source language into a multi-dimensional representation; it is then fed into a decoder network which generates a hypothesis translation conditioned on this representation. The original encoder-decoder was based on Recurrent Neural Networks (RNNs) [ 39 ]. To deal with long sequences, Long Short-Term Memory Networks (LSTMs) [ 40 ] and Gated Recurrent Units (GRUs) [ 41 ] were used. To further improve the performance of RNN-based MT, an attention mechanism was introduced by Bahdanau et al. [ 42 ]. In recent years, the transformer architecture [ 43 ], based primarily on the idea of attention (in combination with positional encoding), has pushed the SOTA even further.
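To make the encoder-decoder idea concrete, the following minimal sketch builds a toy transformer-based NMT model with PyTorch's nn.Transformer. The vocabulary sizes, dimensions and learned positional encoding are arbitrary choices for illustration, not a reproduction of any system reviewed here.

```python
import torch
import torch.nn as nn

class TinyNMT(nn.Module):
    """Toy encoder-decoder translation model: source token ids in, target token logits out."""

    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4, layers=2, max_len=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positional encoding
        self.transformer = nn.Transformer(d_model=d_model, nhead=nhead,
                                          num_encoder_layers=layers, num_decoder_layers=layers,
                                          batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        src = self.src_emb(src_ids) + self.pos[:, :src_ids.size(1)]
        tgt = self.tgt_emb(tgt_ids) + self.pos[:, :tgt_ids.size(1)]
        # causal mask: the decoder may not attend to future target tokens
        causal = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=causal)
        return self.out(hidden)                       # (batch, tgt_len, tgt_vocab)

model = TinyNMT(src_vocab=8000, tgt_vocab=8000)
src = torch.randint(0, 8000, (2, 15))                 # two source sentences of 15 tokens
tgt = torch.randint(0, 8000, (2, 12))                 # corresponding (shifted) target tokens
print(model(src, tgt).shape)                          # torch.Size([2, 12, 8000])
```

In practice such a model is trained with teacher forcing and a cross-entropy loss over the target token logits; decoding at inference time is done with greedy or beam search.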
As noted above, a sentence is broken down into tokens and each token is fed into the Neural Machine Translation (NMT) model. NMT converts each token into a multidimensional representation before that token representation is used in the encoder or decoder to construct a sentence level representation. These token representations, typically referred to as word embeddings, encode the meaning of a token based on its context. Learning word embeddings is a monolingual task, since they are associated with tokens in a particular language. Given that for a large number of languages and use cases monolingual data is abundant, it is relatively easy to build word embedding models of high quality and coverage. Building such word embedding models is typically performed using unsupervised algorithms such as GloVe [ 44 ], BERT [ 45 ] and BART [ 46 ]. These algorithms encode words into vectors in such a way that the vectors of related words are similar.
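As a rough illustration of what such embeddings capture, the snippet below pulls contextual vectors for individual words out of an off-the-shelf BERT checkpoint and compares them with cosine similarity. The checkpoint name and example sentences are arbitrary, and the helper assumes that each probed word survives tokenisation as a single sub-token (true for these examples, but not in general).

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.eval()

def word_vector(sentence, word):
    """Contextual embedding of `word`, assuming it is tokenised as a single sub-token."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]             # (seq_len, 768)
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
    return hidden[tokens.index(word)]

cos = torch.nn.functional.cosine_similarity
v_car = word_vector("the car crosses the bridge", "car")
v_truck = word_vector("the truck crosses the bridge", "truck")
v_idea = word_vector("the idea crosses her mind", "idea")
print(cos(v_car, v_truck, dim=0))   # related words: typically a higher similarity ...
print(cos(v_car, v_idea, dim=0))    # ... than between unrelated words
```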
The domain of spoken language MT is extensive and the current SOTA of NMT builds upon years of research. To provide a complete overview of spoken language MT is out of scope for this article. For a more in depth overview of the domain, we refer readers to the work of Stahlberg [ 49 ].
Fig. 2: Neural machine translation models for spoken (a) and sign (b) language translation are similar; the main difference is the input modality: text for (a) and video for (b).
3.2 Sign language MT
Conceptually, sign language MT and spoken language MT are similar. The main difference is the input modality. Spoken language MT operates on two streams of discrete tokens (text to text). As sign languages have no standardized notation system, a generic SLT model must translate from a continuous stream to a discrete stream (video to text). To reduce the complexity of this problem, sign language videos are discretized to a sequence of still frames that make up the video. SLT can now be framed as a sequence-to-sequence, frame-to-token task. As they are, these individual frames do not convey meaning in the way that the word embeddings in a spoken language translation model do. Even though it is possible to train SLT models using frame-based representations as inputs, the extraction of salient sign language representations is required to facilitate the modeling of meaning in sign language encoders.
Figure 2 shows a spoken language NMT and sign language NMT model side by side. The main difference between the two is the input modality. For a spoken language NMT model, both the inputs and outputs are text. For a sign language NMT model, the inputs are some representation of sign language (in this case, video embeddings). Other than this input modality, the models function similarly and are trained and evaluated in the same manner.
3.2.1 Sign language representations
For the encoder of the translation model to capture the meaning of the sign language utterance, a salient representation for sign language videos is required. We can differentiate between representations that are linked to the source modality, namely videos, and linguistically motivated representations.
As will be discussed in Sect. 5.4 , the former type of representation is often frame-based, i.e., every frame in the video is assigned a vector, or clip-based, i.e., clips of arbitrary length are assigned a vector. These types of representations are rather simple to derive, e.g., by extracting information directly from a Convolutional Neural Network (CNN). However, they suffer from two main drawbacks. First, such representations are fairly long. For example, the RWTH-PHOENIX-Weather 2014T dataset [ 6 ] contains samples of on average 114 frames (in German Sign Language (DGS)), whereas the average sentence length (in German) is 13.7 words in that dataset. As a result, frame-based representations for sign languages negatively impact the computational performance of SLT models. Second, such representations do not originate from domain knowledge. That is, they do not capture the semantics of sign language. If semantic information is not encoded in the sign language representation, the translation model is forced to model the semantics and perform translation at the same time.
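A minimal sketch of the frame-based option: every frame is passed through a 2D CNN backbone whose classification head has been removed, leaving one feature vector per frame. The backbone is randomly initialised here for brevity (the reviewed systems would pre-train it, e.g., on ImageNet or on a CSLR task), and the example target sentence is invented purely to illustrate the length mismatch.

```python
import torch
import torchvision

# frame-level feature extractor: ResNet-18 with the classification head removed
backbone = torchvision.models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()
backbone.eval()

video = torch.randn(114, 3, 224, 224)    # one utterance of 114 RGB frames (224x224)
with torch.no_grad():
    frame_features = backbone(video)      # (114, 512): one 512-d vector per frame

# the spoken language target is far shorter than the visual input sequence
target_tokens = "am nachmittag regnet es im sueden".split()   # invented example sentence
print(frame_features.shape[0], "frame vectors vs", len(target_tokens), "target tokens")
```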
The second category includes a range of linguistically motivated representations, from semantic representations to individual sign representations. In Sect. 2.3 , we presented an overview of some notation systems for sign languages: Stokoe notation, SignWriting, HamNoSys, glosses, and Sign_A. These notation systems can be used as representations in an SLT model, or to pre-train the feature extractor of SLT models. In current research, only glosses have been used as inputs or labels for the SLT models themselves, because large annotated datasets for the other systems do not exist.
3.2.2 Tasks
Fig. 3: There are five distinct translation tasks in the considered scientific literature on SLT. The Sign2Gloss2Text and (Sign2Gloss, Gloss2Text) models both use the same architecture, but a different training algorithm.
The reviewed papers cover five distinct translation tasks that can be classified based on whether, and how, glosses are used. To denote these tasks, we borrow the naming conventions from Camgöz et al. [ 6 , 37 ]. These tasks are illustrated in Fig. 3 .
Gloss2Text Gloss2Text models are used to translate from sign language glosses to spoken language text. They provide a reference for the performance that can be achieved using a salient representation. Therefore they can serve as a compass for the design of sign language representations and the corresponding SLR systems. Note that the performance of a Gloss2Text model is not an upper bound for the performance of an SLT model: glosses do not capture all linguistic properties of signs (see Sect. 2.3 ).
Sign2Gloss2Text A Sign2Gloss2Text translation system includes an SLR system as the first step, to predict glosses from video. Consequently, errors made by the recognition system are propagated to the translation system. Camgöz et al. [ 6 ] for example report a drop in translation accuracy when comparing a Sign2Gloss2Text system to a Gloss2Text system.
(Sign2Gloss, Gloss2Text) In this training setup, first a Gloss2Text model is trained using ground truth gloss and text data. Then, this model is fixed and used to evaluate the performance of the entire translation model, including the SLR model (Sign2Gloss). This is different from Sign2Gloss2Text models, where the Gloss2Text model is trained with the gloss annotations generated by the Sign2Gloss model. Camgöz et al. [ 6 ] show that these models perform worse than Sign2Gloss2Text models, because those can learn to correct the noisy outputs of the Sign2Gloss model in the translation model.
Sign2(Gloss+Text) Glosses can provide a supervised signal to a translation system without being an information bottleneck, if the model is trained to jointly predict both glosses and text [ 37 ]. Such a model must be able to predict glosses and text from a single sign language representation. The gloss labels provide additional information to the encoder, facilitating the training process. In a Sign2Gloss2Text model (as previously discussed), the translation model receives glosses as inputs: any information that is not present in glosses cannot be used to translate into spoken language text. In Sign2(Gloss+Text) models, however, the translation model input is the sign language representation (embeddings), which may be richer.
Sign2Text Sign2Text models forgo the explicit use of a separate SLR model, and instead perform translation directly with features that are extracted from videos. Sign2Text models do not need glosses to train the translation model. Note that in some cases, these features are still extracted using a model that was pre-trained for SLR, e.g., [ 37 , 38 ]. This means that some Sign2Text models do indirectly require gloss level annotations for training.
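The pipelines behind these five set-ups can be summarised as function compositions. The stubs below are placeholders rather than implementations; they only make explicit where glosses enter each pipeline (as an intermediate representation, as an auxiliary target, or not at all).

```python
# Placeholder components; each would be a trained model in practice.
def sign2gloss(video): ...         # CSLR: video -> gloss sequence
def gloss2text(glosses): ...       # text-to-text MT: glosses -> spoken language text
def encode_video(video): ...       # feature extractor / encoder: video -> embeddings
def decode_text(embeddings): ...   # NMT decoder: embeddings -> spoken language text
def decode_gloss(embeddings): ...  # auxiliary CSLR head on the shared encoder

# Gloss2Text: gloss2text(ground_truth_glosses) -- no video involved.

def sign2gloss2text(video):
    # Also the architecture of (Sign2Gloss, Gloss2Text); the difference is whether
    # gloss2text is trained on predicted glosses (here) or on ground-truth glosses.
    return gloss2text(sign2gloss(video))

def sign2text(video):
    # End-to-end: no gloss bottleneck between the video and the text.
    return decode_text(encode_video(video))

def sign2gloss_plus_text(video):
    # Jointly supervised: glosses are an auxiliary output, not the translation input.
    emb = encode_video(video)
    return decode_gloss(emb), decode_text(emb)
```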
3.3 Requirements for sign language MT
With the given information on sign language linguistics and MT techniques, we are now able to sketch the requirements for sign language MT.
3.3.1 Data requirements
The training of data-driven MT models requires large datasets. The collection of such datasets is expensive and should therefore be tailored to specific use cases. To determine these use cases, members of SLCs must be involved. We answer RQ1 by providing an overview of existing datasets for SLT.
3.3.2 Video processing and sign language representation
We need to be able to process sign language videos and convert them into an internal representation (SLR). This representation must be rich enough to cover several aspects of sign languages (including manual and non-manual actions, simultaneity, signing space, classifiers, the productive lexicon, and fingerspelling). We look in our literature overview for an answer to RQ2 on how we should represent sign language data.
3.3.3 Translating between sign and spoken representations
We need to be able to translate from such a representation into a spoken language representation, which can be reused from existing spoken language MT systems. We need to adapt NMT systems to be able to work with the sign language representation, which will possibly contain simultaneous elements. By comparing different methods for SLT, we evaluate which MT algorithms perform best in the current SOTA (RQ3).
3.3.4 Evaluation
The evaluation of the resulting models can be automated by computing metrics on corpora. These metrics provide an estimate of the quality of translations. Human evaluation (by hearing and deaf people, signing and non-signing) and qualitative evaluations can provide insights into the models and data. We illustrate how current SLT models are evaluated (RQ4).
4 Literature review methodology
4.1 Inclusion criteria and search strategy
To provide an overview of sound SLT research, we adhere to the following principles in our literature search. We consider only peer-reviewed publications. We include journal articles as well as conference papers: the latter are especially important in computer science research. Any paper that is included must be on the topic of sign language machine translation and must not misrepresent the natural language status of sign languages. Therefore, we omit any papers that present classification or transcription of signs or fingerspelling recognition as SLT models (we will show in this section that there are many papers that do this). As we focus on non-intrusive translation from sign languages to text, we exclude papers that use gloves or other wearable devices.
Three scientific databases were queried: Google Scholar, Web of Science and IEEE Xplore. Four queries were used to obtain initial results: "sign language translation," "sign language machine translation," "gloss translation" and "gloss machine translation." These key phrases were chosen for the following reasons. We aimed to obtain scientific research papers on the topic of MT from sign to spoken languages; therefore, we search for "sign language machine translation." Several works perform translation between sign language glosses and spoken language text, hence "gloss machine translation". As many papers omit the word "machine" in "machine translation," we also include the key phrases "sign language translation" and "gloss translation."
4.2 Search results and selection of papers
Our initial search yielded 855 results, corresponding to 565 unique documents. We applied our inclusion criteria step by step (see Table 1 ), and obtained a final set of 57 papers [ 5 , 6 , 35 , 36 , 37 , 38 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 , 72 , 73 , 74 , 75 , 76 , 77 , 78 , 79 , 80 , 81 , 82 , 83 , 84 , 85 , 86 , 87 , 88 , 89 , 90 , 91 , 92 , 93 , 94 , 95 , 96 , 97 , 98 , 99 , 100 ]. The complete list of search results can be found in supplementary material (resource 1).
We further explain the reasons for excluding papers with examples. We found 30 papers not related to sign language. These papers discuss the classification and translation of traffic signs and other public signs. 60 papers consider a topic related to sign language, but not to sign language processing. These include papers from the domains of linguistics and psychology. Out of the remaining 345 papers, 130 papers claim to present a translation model in their title or main text, but in fact present a fingerspelling recognition (52 papers [ 14 , 15 , 101 , 102 , 103 , 104 , 105 , 106 , 107 , 108 , 109 , 110 , 111 , 112 , 113 , 114 , 115 , 116 , 117 , 118 , 119 , 120 , 121 , 122 , 123 , 124 , 125 , 126 , 127 , 128 , 129 , 130 , 131 , 132 , 133 , 134 , 135 , 136 , 137 , 138 , 139 , 140 , 141 , 142 , 143 , 144 , 145 , 146 , 147 , 148 , 149 , 150 ]), sign classification (58 papers [ 151 , 152 , 153 , 154 , 155 , 156 , 157 , 158 , 159 , 160 , 161 , 162 , 163 , 164 , 165 , 166 , 167 , 168 , 169 , 170 , 171 , 172 , 173 , 174 , 175 , 176 , 177 , 178 , 179 , 180 , 181 , 182 , 183 , 184 , 185 , 186 , 187 , 188 , 189 , 190 , 191 , 192 , 193 , 194 , 195 , 196 , 197 , 198 , 199 , 200 , 201 , 202 , 203 , 204 , 205 , 206 , 207 , 208 ]), or SLR (20 papers [ 209 , 210 , 211 , 212 , 213 , 214 , 215 , 216 , 217 , 218 , 219 , 220 , 221 , 222 , 223 , 224 , 225 , 226 , 227 , 228 ]) system. There are 36 papers ([ 16 , 30 , 229 , 230 , 231 , 232 , 233 , 234 , 235 , 236 , 237 , 238 , 239 , 240 , 241 , 242 , 243 , 244 , 245 , 246 , 247 , 248 , 249 , 250 , 251 , 252 , 253 , 254 , 255 , 256 , 257 , 258 , 259 , 260 , 261 , 262 ]) on various topics within the domain of sign language processing that do not implement a new MT model.
We find roughly twice as many papers on MT from spoken languages to sign languages as in the opposite direction: 117 compared to 59. These papers are closely related to the subject of this article, but often use different techniques, including virtual avatars (e.g., [ 9 , 10 ]), due to the different source and target modality. Hence, translation from a spoken language to a sign language is outside the scope of this article. For an overview of this related field, we refer readers to a recent review article by Kahlon and Singh [ 263 ].
Note that our final inclusion criterion, "present a non-intrusive system based only on RGB camera inputs," is almost entirely covered by the previous criteria. We find several papers that present glove-based systems, but they do not present translation systems. Instead, they are focused on fingerspelling recognition or sign classification (e.g., [ 119 , 164 , 165 , 166 ]). The following five papers present an intrusive SLT system. Fang et al. [ 264 ] present an SLT system where signing deaf and hard of hearing people wear a device with integrated depth camera and augmented reality glasses to communicate with hearing people. Guo et al. [ 265 ] use a Kinect (RGB-D) camera to record the sign language data. The data used by Xu et al. [ 266 ] was also recorded using a Kinect. Gu et al. propose wearable sensors [ 267 ] and so do Zhang et al. [ 268 ]. After discarding these five papers, we obtain the final set of 57.
5 Literature overview
5.1 Sign language MT
Following our methodology for paper selection, laid out in Sect. 4 , we obtain 57 papers published from 2004 through 2022. In the analysis, papers are classified based on tasks, datasets, methods and evaluation techniques.
Fig. 4: The earlier papers on SLT all propose Statistical Machine Translation (SMT) models, but since 2018, NMT has become the dominant variant.
The early work on MT from signed to spoken languages is based entirely on statistical methods [ 5 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 ]. These works focus on gloss based translation. Several of them add visual inputs to augment the (limited) information provided by the glosses. Bungeroth et al. present the first statistical model that translates from signed to spoken languages [ 5 ]. They remark that glosses have limitations and need to be adapted for use in MT systems. Stein et al. incorporate visual information in the form of small images and hand tracking information to augment their model and enhance its performance [ 50 ], as do Dreuw et al. [ 51 ]. Dreuw et al. later ground this approach by listing requirements for SLT models, such as modeling simultaneity, signing space, and handling coarticulation [ 53 ]. Schmidt et al. further add non-manual visual information by incorporating lip reading [ 57 ]. The other papers in this set use similar techniques but on different datasets, or compare SMT algorithms [ 52 , 54 , 55 , 56 , 58 ].
In 2018, the domain moved away from SMT and toward NMT. This trend is clearly visible in Fig. 4 . This drastic shift was not only motivated by the successful applications of NMT techniques in spoken language MT, but also by the publication of the RWTH-PHOENIX-Weather 2014T dataset and the promising results obtained on that dataset using NMT methods [ 6 ]. Two exceptions are found. Luqman et al. [ 63 ] use Rule-based Machine Translation (RBMT) in 2020 to translate from Arabic sign language into Arabic, and Moe et al. [ 60 ] compare NMT and SMT approaches for Myanmar sign language translation in 2018.
Between 2004 and 2018, research into translation from signed to spoken languages was sporadic (10 papers in our subset were published over 14 years). Since 2018, with the move toward NMT, the domain has become more popular, with 47 papers in our subset published over the span of 5 years.
5.2 Datasets
Fig. 5: The RWTH-PHOENIX-Weather 2014T dataset is used the most (31 times) throughout the literature, whereas other datasets are referenced at most six times in the 57 discussed papers.
Several datasets are used in SLT research. Some are used often, whereas others are only used once. The distribution is shown in Fig. 5 . It is clear that the most used dataset is RWTH-PHOENIX-Weather 2014T [ 6 ]. This is because it was the first dataset large enough for neural SLT and because it is readily available for research purposes. This dataset is an extension of earlier versions, RWTH-PHOENIX-Weather [ 269 ] and RWTH-PHOENIX-Weather 2014 [ 58 ]. It contains videos in DGS, gloss annotations, and text in German. Precisely because of the popularity of this dataset, we can compare several approaches to SLT: see Sect. 5.7 .
Other datasets are also used several times. The KETI dataset [ 62 ] contains Korean Sign Language (KSL) videos, gloss annotations, and Korean text. RWTH-Boston-104 [ 50 ] is a dataset for ASL to English translation containing ASL videos, gloss annotations, and English text. The ASLG-PC12 dataset [ 270 ] contains ASL glosses and English text. The glosses are generated from English with a rule based approach. FocusNews and SRF were both introduced as part of the WMTSLT22 task [ 85 ], and they contain news broadcasts in Swiss German Sign Language (DSGS) with German translations. CSL-Daily is a dataset containing translations from Chinese Sign Language (CSL) to Chinese on everyday topics [ 77 ].
In 2022, the first SLT dataset containing parallel data in multiple sign languages was introduced [ 98 ]. The SP-10 dataset contains sign language data and parallel translations from ten sign languages. It was created from data collected in the SpreadTheSign research project [ 271 ]. Yin et al. [ 98 ] show that multilingual training of SLT models can improve performance and allow for zero-shot translation.
Several papers use the CSL dataset to evaluate SLT models [ 72 , 79 , 94 ]. However, this is problematic because this dataset was originally proposed for SLR [ 272 ]. Because the sign (and therefore gloss) order is the same as the word order in the target spoken language, this dataset is not suited for the evaluation of translation models (as explained in Sect. 2.1 ).
Table 2 presents an overview of dataset and vocabulary sizes. The number of sign instances refers to the amount of individual signs that are produced. Each of these belongs to a vocabulary, of which the size is also given. Finally, there can be singleton signs: these are signs that occur only in the training set but not in the validation or test sets. ASLG-PC12 contains 827 thousand training sentences. It is the largest dataset in terms of number of parallel sentences. The most popular dataset with video data (RWTH-PHOENIX-Weather 2014T) contains only 7,096 training sentences. For MT between spoken languages, datasets typically contain several millions of sentences, for example the Paracrawl corpus [ 273 ]. It is clear that compared to spoken language datasets, sign language datasets lack labeled data. In other words, SLT is a low-resource MT task.
5.3 Tasks
A total of 20 papers report on a Gloss2Text model [ 5 , 6 , 37 , 51 , 52 , 54 , 55 , 56 , 57 , 58 , 60 , 61 , 63 , 65 , 68 , 70 , 71 , 74 , 91 , 100 ]. Sign2Gloss2Text models are proposed in five papers [ 6 , 37 , 65 , 77 , 94 ] and (Sign2Gloss, Gloss2Text) models also in five [ 6 , 37 , 50 , 51 , 65 ]. Sign2(Gloss+Text) models are found eight times within the reviewed papers [ 37 , 38 , 78 , 80 , 88 , 89 , 92 , 99 ] and Sign2Text models 30 times [ 6 , 35 , 36 , 37 , 62 , 64 , 66 , 67 , 69 , 72 , 73 , 75 , 76 , 79 , 81 , 82 , 83 , 84 , 85 , 86 , 87 , 88 , 90 , 92 , 93 , 95 , 96 , 97 , 98 , 99 ].
Fig. 6: Gloss-based models are used throughout the entire considered time period (2004–2022), but since 2018 models which translate from video to text are gaining traction.
Before 2018, when SMT was dominant, Gloss2Text models were most popular, accounting for nine of the eleven proposed models, the other two being (Sign2Gloss, Gloss2Text) models. Since 2018, with the availability of larger datasets, deep neural feature extractors and neural SLR models, Sign2Gloss2Text, Sign2(Gloss+Text) and Sign2Text are becoming dominant. This gradual evolution from Gloss2Text models toward end-to-end models is visible in Fig. 6.
5.4 Sign language representations
The sign language representations used in the current scientific literature are glosses and representations extracted from videos. Early on, researchers using SMT models for SLT recognized the limitations of glosses and began to add additional visual information to their models [ 50 , 51 , 53 , 57 ]. The advent of CNNs has made processing and incorporating visual inputs easier and more robust. All but one of the models since 2018 that include feature extraction use neural networks to do so.
We examine the representations on two dimensions. First, there is the method of extracting visual information (e.g., by using human pose estimation or CNNs). Second, there is the matter of which visual information is extracted (e.g., full frames, or specific parts such as the hands or face).
5.4.1 Extraction methods
The most popular feature extraction method in modern SLT is the 2D CNN. 19 papers use a 2D CNN as feature extractor [ 6 , 35 , 36 , 37 , 38 , 64 , 65 , 72 , 75 , 77 , 78 , 79 , 80 , 81 , 87 , 92 , 93 , 95 , 98 ]. These are often pre-trained for image classification using the ImageNet dataset [ 274 ]; some are further pre-trained on the task of Continuous Sign Language Recognition (CSLR), e.g., [ 37 , 38 , 92 , 93 ]. Three papers use a subsequent 1D CNN to temporally process the resulting spatial features [ 64 , 77 , 80 ].
Human pose estimation systems are used to extract features in fifteen papers [ 35 , 36 , 62 , 69 , 73 , 79 , 80 , 84 , 85 , 88 , 89 , 90 , 94 , 95 , 97 ]. The estimated poses can be the only inputs to the translation model [ 35 , 62 , 69 , 73 , 84 , 85 , 90 , 94 , 97 ], or they can augment other spatial or spatio-temporal features [ 36 , 79 , 80 , 88 , 89 , 95 ]. Often, the keypoints are used as a sign language representation directly. In other cases they are processed using a graph neural network to map them onto an embedding space before translation [ 89 , 94 ].
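A sketch of how estimated keypoints can be turned into a compact frame-level representation, independent of any particular pose estimator (tools such as OpenPose or MediaPipe would supply the (T, K, 2) array). The reference joint, scaling scheme and array sizes are illustrative assumptions, not a description of any specific reviewed system.

```python
import numpy as np

def keypoints_to_features(keypoints, ref_joint=0):
    """Make keypoints relative to a reference joint, normalise the scale per frame,
    and flatten each frame into a single feature vector."""
    rel = keypoints - keypoints[:, ref_joint:ref_joint + 1, :]   # remove global position
    scale = np.abs(rel).max(axis=(1, 2), keepdims=True) + 1e-6   # per-frame size normalisation
    norm = rel / scale
    return norm.reshape(len(keypoints), -1)                      # (T, 2*K) feature sequence

pose = np.random.rand(114, 75, 2)          # toy data: 114 frames, 75 2-D keypoints
features = keypoints_to_features(pose)
print(features.shape)                       # (114, 150)
```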
Ten papers use 3D CNNs for feature extraction [ 35 , 66 , 67 , 76 , 82 , 83 , 86 , 88 , 96 , 99 ]. These networks are able to extract spatio-temporal features, leveraging the temporal relations between neighboring frames in video data. The output of a 3D CNN is typically a sequence that is shorter than the input, summarizing multiple frames in a single feature vector. Similarly to 2D CNNs, these networks can be pre-trained on general tasks such as action recognition (on Kinetics [ 275 ]) or on more specific tasks such as isolated SLR (e.g., on WL-ASL [ 276 ]). Chen et al. [ 99 ] and Shi et al. [ 82 ] have shown independently that pre-training on sign language specific tasks yields better downstream SLT scores.
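The clip-based alternative is sketched below with torchvision's r3d_18 video model: each 16-frame clip is summarised by a single 512-dimensional vector, so the sequence handed to the translation model is much shorter than the frame count. Weights are left random here for brevity; as noted above, the reviewed systems pre-train such backbones on Kinetics or on isolated sign recognition.

```python
import torch
import torchvision

r3d = torchvision.models.video.r3d_18(weights=None)   # spatio-temporal feature extractor
r3d.fc = torch.nn.Identity()                            # drop the classification head
r3d.eval()

# split a 112-frame video into 7 non-overlapping clips of 16 frames each
video = torch.randn(3, 112, 112, 112)                   # (C, T, H, W)
clips = video.split(16, dim=1)                          # 7 clips of shape (3, 16, 112, 112)
with torch.no_grad():
    clip_features = torch.stack([r3d(c.unsqueeze(0)).squeeze(0) for c in clips])
print(clip_features.shape)   # (7, 512): far shorter than 112 frame-level vectors
```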
CNNs were the state of the art in image feature extraction for several years. More recently, the vision transformer architecture was introduced, which can outperform CNNs in certain scenarios [ 277 ]. Li et al. are the first to leverage vision transformers for feature extraction [ 89 ].
Kumar et al. opt to use traditional computer vision techniques instead of deep neural networks. They represent the hands and the face of the signer in the video as a set of contours [ 59 ]. First, they perform binarization to segment the hands and the face based on skin tone. Then they use the active contours method [ 278 ] to detect the edges of the hands and face. These are normalized with respect to the signer’s position in the video frame by representing every coordinate as an angle (binned to 360 different angles).
5.4.2 Multi-cue approaches
A simple approach to feature extraction is to consider full video frames as inputs. Performing further pre-processing of the visual information to target hands, face and pose information separately (referred to as a multi-cue approach) improves the performance of SLT models [ 36 , 59 , 65 , 75 , 80 , 86 , 96 ]. Zheng et al. [ 75 ] show through qualitative analysis that adding facial feature extraction improves translation accuracy in utterances where facial expressions are used. Dey et al. [ 96 ] observe improvements in BLEU scores when adding lip reading as an input channel. By adding face crops as an additional channel, Miranda et al. [ 86 ] improve the performance of the TSPNet architecture [ 66 ].
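In its simplest form, a multi-cue representation just concatenates the per-cue feature streams frame by frame before they reach the translation encoder; more elaborate multi-cue models additionally model temporal structure within and across cues. The feature dimensions below are arbitrary illustrative values.

```python
import torch

T = 114                              # number of frames in one utterance
full_frame = torch.randn(T, 512)     # 2D CNN features of the full frame
hands      = torch.randn(T, 256)     # features of cropped hand regions
face       = torch.randn(T, 128)     # features of a face/mouth crop
pose       = torch.randn(T, 150)     # flattened body keypoints

# simplest fusion: concatenate the cues per time step
multi_cue = torch.cat([full_frame, hands, face, pose], dim=-1)
print(multi_cue.shape)               # torch.Size([114, 1046])
```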
5.5 Sign language translation models
The current SOTA in SLT is entirely based on encoder-decoder NMT models. RNNs are evaluated in 16 papers [ 6 , 35 , 59 , 60 , 61 , 62 , 64 , 65 , 67 , 70 , 75 , 76 , 79 , 80 , 91 , 94 ] and transformers in 34 papers [ 35 , 36 , 37 , 38 , 60 , 62 , 65 , 66 , 68 , 69 , 71 , 72 , 73 , 74 , 77 , 78 , 81 , 82 , 83 , 84 , 85 , 86 , 87 , 88 , 89 , 90 , 91 , 92 , 93 , 95 , 96 , 98 , 99 , 100 ]. Within the RNN-based models, several attention schemes are used: no attention, Luong attention [ 279 ] and Bahdanau attention [ 42 ].
To the best of our knowledge, there has been no systematic comparison of RNNs and transformers across multiple tasks and datasets for SLT. Some authors perform a comparison between both architectures on specific datasets with specific sign language representations. A conclusive meta-study across papers is problematic due to inter-paper differences.
Ko et al. [ 62 ] report that RNNs with Luong attention obtain the highest ROUGE score, but transformers perform better in terms of METEOR, BLEU, and CIDEr (on the KETI dataset). In their experiments, Luong attention outperforms Bahdanau attention and RNNs without attention.
Moe et al. [ 60 ] compare RNNs and transformers for Gloss2Text with different tokenization schemes, and in every one of the experiments (on their own dataset), the transformer outperforms the RNN.
Four papers compare RNNs and transformers on RWTH-PHOENIX-Weather 2014T. Orbay et al. [ 35 ] report that an RNN with Bahdanau attention outperforms both an RNN with Luong attention and a transformer in terms of ROUGE and BLEU scores. Yin et al. [ 65 ] find that a transformer outperforms RNNs and that an RNN with Luong attention outperforms one with Bahdanau attention. Angelova et al. [ 91 ] achieve higher scores with RNNs than with transformers (on the DGS corpus [ 280 ] as well). Finally, Camgöz et al. [ 37 ] report a large increase in BLEU scores when using transformers, compared to their previous paper using RNNs [ 6 ]. However, the comparison is between models with different feature extractors and the impact of the architecture versus that of the feature extractors is not evaluated. It is likely that replacing a 2D CNN pre-trained on ImageNet [ 274 ] image classification with one pre-trained on CSLR will result in a significant increase in performance, especially when the CSLR model was trained on data from the same source (i.e., RWTH-PHOENIX-Weather 2014), as is the case here.
Pre-trained language models are readily available for transformers (for example via the HuggingFace Transformers library [ 281 ]). De Coster et al. have shown that integrating pre-trained spoken language models can improve SLT performance [ 38 , 92 ]. Chen et al. pre-train their decoder network in two steps: first on a multilingual corpus, and then on Gloss2Text translation [ 88 , 99 ]. This pre-training approach can drastically improve performance. Chen et al. outperform other models on the RWTH-PHOENIX-Weather 2014T dataset [ 99 ]: 28.39 and 28.59 BLEU-4 (the next highest score is 25.59).
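One way such a pre-trained spoken language model can be reused is sketched below: visual features are projected to the model dimension and fed to the encoder as input embeddings, while the loss is computed against the spoken language reference during fine-tuning. This is a rough sketch of the general idea, not the exact recipe of the cited works; the checkpoint, feature dimension and example sentence are assumptions, and details such as language codes and freezing schedules are omitted.

```python
import torch
from transformers import MBartForConditionalGeneration, MBartTokenizer

mbart = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
tok = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")

# hypothetical sign language representation: 114 visual feature vectors of size 512,
# projected to the mBART model dimension by a small trainable layer
visual_features = torch.randn(1, 114, 512)
project = torch.nn.Linear(512, mbart.config.d_model)
inputs_embeds = project(visual_features)

labels = tok("am nachmittag regnet es", return_tensors="pt").input_ids   # invented reference
out = mbart(inputs_embeds=inputs_embeds, labels=labels)
print(out.loss)   # cross-entropy against the reference text, minimised during fine-tuning
```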
5.6 Evaluation
Most evaluations of the quality of SLT models are based on quantitative metrics. Nine different metrics are used across the 57 papers: BLEU, ROUGE, WER, TER, PER, CIDEr, METEOR, COMET and NIST.
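Several of these metrics can be computed with standard toolkits; the snippet below scores a toy set of hypotheses against references with sacreBLEU, which implements BLEU and TER among others. The sentences are invented for illustration only.

```python
import sacrebleu

hypotheses = ["am nachmittag regnet es im sueden",
              "morgen wird es kalt"]
references = [["am nachmittag regnet es im sueden",
               "morgen wird es sehr kalt"]]        # one list per set of references

bleu = sacrebleu.corpus_bleu(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)
print(bleu.score, ter.score)   # corpus-level BLEU (BLEU-4 by default) and TER
```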
A total of 22 papers [ 5 , 6 , 37 , 51 , 52 , 61 , 63 , 64 , 65 , 66 , 70 , 72 , 75 , 77 , 79 , 81 , 82 , 83 , 86 , 92 , 93 , 97 ] also provide a small set of example translations, along with ground truth reference translations, allowing for qualitative analysis. Dreuw et al.’s model outputs mostly correct translations, but with different word order than the ground truth [ 51 ]. Camgöz et al. mention that the most common errors are related to numbers, dates, and places: these can be difficult to derive from context in weather broadcasts [ 6 , 37 ]. The same kind of errors is made by the models of Partaourides et al. [ 70 ] and Voskou et al. [ 81 ]. Zheng et al. illustrate how their model improves accuracy for longer sentences [ 64 ]. Including facial expressions in the input space improves the detection of emphasis laid on adjectives [ 75 ].
The datasets used in the WMTSLT22 task, FocusNews and SRF, have a broader domain (news broadcasts) than, e.g., the RWTH-PHOENIX-Weather 2014T dataset (weather broadcasts). This makes the task significantly more challenging, as can be observed in the range of BLEU scores that are achieved (typically less than 1, compared to scores in the twenties for RWTH-PHOENIX-Weather 2014T). Example translation outputs also provide insight here. The models of Tarres et al. [ 97 ] and Hamidullah et al. [ 83 ] simply predict the most common German words in many cases, indicating that the SLT model has failed to learn the structure of the data. Shi’s model [ 82 ] only translates phrases correctly when they occur in the training set, suggesting overfitting. Angelova et al. use the DGS corpus [ 280 ] (which contains discourse on general topics) as a dataset; they also obtain much lower translation scores than on RWTH-PHOENIX-Weather 2014T [ 91 ].
To the best of our knowledge, none of the papers discussed in this overview contain evaluations by members of SLCs. Two papers include human evaluation, but only by hearing evaluators. Luqman et al. [63] ask native Arabic speakers to rate the model's output translations on a three-point scale. For the WMTSLT22 challenge [85], translation outputs were scored by human evaluators (native German speakers trained as DSGS interpreters). The resulting scores indicate a considerable gap between the performance of human translators (87%) and MT (2%).
5.7 The RWTH-PHOENIX-Weather 2014T benchmark
The popularity of the RWTH-PHOENIX-Weather 2014T dataset facilitates the comparison of different SLT models. We compare models based on their BLEU-4 score, as this is the only metric consistently reported in all of the papers using RWTH-PHOENIX-Weather 2014T (except [86]).
An overview of Gloss2Text models is shown in Table 3 . For Sign2Gloss2Text, we refer to Table 4 , and for (Sign2Gloss, Gloss2Text) to Table 5 . For Sign2(Gloss+Text) and Sign2Text, we list the results in Tables 6 and 7 , respectively.
5.7.1 Sign language representations
Six papers use features extracted by a 2D CNN that was first trained as part of a CSLR model on RWTH-PHOENIX-Weather 2014 (see footnote 6) [6, 36, 38, 81, 92, 93]. These papers use the full frame as input to the feature extractor.
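For readers unfamiliar with this pipeline, the sketch below extracts one feature vector per video frame with a 2D CNN. An ImageNet-pretrained ResNet-18 from torchvision serves as a stand-in; the cited papers instead use backbones that were further trained within a CSLR model on RWTH-PHOENIX-Weather 2014.

```python
# Minimal sketch of frame-level feature extraction with a 2D CNN.
# Assumption: an ImageNet-pretrained ResNet-18 stands in for a CSLR-trained backbone.
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()              # drop the classifier head: 512-d outputs
backbone.eval()

frames = torch.randn(16, 3, 224, 224)    # 16 preprocessed video frames
with torch.no_grad():
    features = backbone(frames)          # shape (16, 512): one vector per frame
```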
Others combine multiple input channels. Yin et al. [65] use Spatio-Temporal Multi-Cue (STMC) features, extracting images of the face, hands and full frames as well as estimated body poses. These features are processed by a network that performs temporal processing on both the intra- and inter-cue level. Their model is the SOTA for Sign2Gloss2Text translation (25.4 BLEU-4). The model by Zhou et al. is similar and obtains a BLEU-4 score of 23.65 on Sign2(Gloss+Text) translation [80]. Camgöz et al. [36] use mouth pattern cues, pose information and hand shape information; this multi-cue representation allows them to remove glosses from their translation model (although their feature extractors are still trained using glosses). Zheng et al. [75] use an additional channel of facial information for Sign2Text and obtain an increase of 1.6 BLEU-4 compared to their baseline. Miranda et al. [86] augment TSPNet [66] with face crops, improving the performance of the network.
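The sketch below shows one simple way of fusing multiple input cues: each cue stream (full frame, face, hands, pose) is projected to a common dimension and the projections are concatenated per time step. This is a generic illustration, not the STMC architecture of Yin et al. [65] or the multi-channel transformer of Camgöz et al. [36], and the cue dimensionalities are assumptions.

```python
# Minimal sketch of multi-cue feature fusion (generic illustration only).
# Assumption: per-frame features per cue with the listed dimensionalities.
import torch
import torch.nn as nn

class MultiCueFusion(nn.Module):
    def __init__(self, cue_dims=None, proj_dim=256):
        super().__init__()
        cue_dims = cue_dims or {"frame": 1024, "face": 512, "hands": 512, "pose": 100}
        # one linear projection per cue, mapping to a common dimension
        self.proj = nn.ModuleDict({cue: nn.Linear(d, proj_dim) for cue, d in cue_dims.items()})

    def forward(self, cues):              # cues: dict of (batch, T, dim) tensors
        fused = [self.proj[name](x) for name, x in cues.items()]
        return torch.cat(fused, dim=-1)   # (batch, T, proj_dim * num_cues)

cues = {"frame": torch.randn(2, 64, 1024), "face": torch.randn(2, 64, 512),
        "hands": torch.randn(2, 64, 512), "pose": torch.randn(2, 64, 100)}
print(MultiCueFusion()(cues).shape)       # torch.Size([2, 64, 1024])
```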
Frame-based feature representations result in long input sequences to the translation model. The length of these sequences can be reduced by considering short clips instead of individual frames, either by using a pre-trained 3D CNN or by reducing the sequence length with temporal convolutions or RNNs trained jointly with the translation model. Zhou et al. [77] use 2D CNN features extracted from full frames, which are then further processed using temporal convolutions, reducing the temporal feature size by a factor of 4. They call this approach Temporal Inception Networks (TIN). They achieve near-SOTA performance on Sign2Gloss2Text translation (23.51 BLEU-4) and Sign2Text translation (24.32 BLEU-4). Zheng et al. [64] use an unsupervised algorithm called Frame Stream Density Compression (FSDC) to remove temporally redundant frames by comparing frames at the pixel level. The resulting features are processed using a combination of temporal convolutions and RNNs. They compare the individual settings and their combination and find that these techniques reduce the input size of the sign language features while increasing the BLEU-4 score. Chen et al. [99] achieve SOTA results of 28.39 BLEU-4 using 3D CNNs pre-trained first on Kinetics-400 [275] and then on WLASL [276].
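The sketch below illustrates the general idea of shortening a frame feature sequence with strided temporal convolutions, as used for example by Zhou et al. [77]; the layer sizes are assumptions and do not reproduce their exact architecture.

```python
# Minimal sketch: strided temporal convolutions shorten a frame feature sequence
# by a factor of 4 before it is fed to the translation model.
# Assumption: 1024-dimensional per-frame features; layer sizes are illustrative.
import torch
import torch.nn as nn

class TemporalReducer(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )

    def forward(self, x):              # x: (batch, T, dim) frame features
        x = x.transpose(1, 2)          # Conv1d expects (batch, dim, T)
        x = self.conv(x)
        return x.transpose(1, 2)       # (batch, T/4, dim)

features = torch.randn(2, 128, 1024)         # 2 clips of 128 frames each
print(TemporalReducer()(features).shape)     # torch.Size([2, 32, 1024])
```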
5.7.2 Neural architectures
We investigate whether RNNs or transformers perform best on this dataset. As this may depend on the sign language representation used, we analyze Gloss2Text, Sign2Gloss2Text, (Sign2Gloss, Gloss2Text), Sign2(Gloss+Text) and Sign2Text separately.
Because all Gloss2Text models use the same sign language representation (glosses), we can directly compare the performance of different encoder-decoder architectures. Transformers (23.02 ± 2.05) outperform RNNs (18.29 ± 1.931).
The Sign2Gloss2Text transformer models by Yin et al. [65] achieve better performance (23.84 ± 1.225) than their recurrent models (20.47 ± 2.032).
There is only a single (Sign2Gloss, Gloss2Text) model using an RNN, and it achieves 17.79 BLEU-4 [ 6 ]. The transformer models of Camgöz et al. [ 37 ] and Yin et al. [ 65 ] achieve 21.59 and 23.77, respectively. These models all use different feature extractors, so direct comparison is not possible.
No direct comparison is available for Sign2(Gloss+Text) translation. Zhou et al. [80] present an LSTM encoder-decoder using spatio-temporal multi-cue features and obtain 24.32 BLEU-4. The best Sign2(Gloss+Text) model leverages the pre-trained large language model mBART [282] (a transformer) and obtains 28.39 BLEU-4 [99].
The Sign2Text translation models exhibit higher variance in their scores than models for the other tasks. This is likely due to the lack of an additional supervision signal in the form of glosses: the choice of sign language representation has a larger impact on the translation score. The difference in BLEU-4 score between transformers (19.86 ± 5.62) and RNNs (10.72 ± 3.63) is larger than for the other tasks. However, we do not draw definitive conclusions from these results, as the sign language representations differ in architecture and pre-training task between models.
We provide a graphical overview of the performance of RNNs and transformers across tasks in Fig. 7 and observe that transformers often outperform RNNs on RWTH-PHOENIX-Weather 2014T. However, we cannot conclusively state whether this is due to the network architecture, or due to the sign language representations that these models are trained with.
Fig. 7: Transformers tend to outperform RNNs on different SLT tasks in terms of BLEU-4 score on the RWTH-PHOENIX-Weather 2014T dataset.
5.7.3 Evolution of scores
Fig. 8: Evolution of model scores on the RWTH-PHOENIX-Weather 2014T dataset per task.
Figure 8 shows an overview of the BLEU-4 scores on the RWTH-PHOENIX-Weather 2014T dataset from 2018 until 2023. It illustrates that the current best performing model (28.59 BLEU-4) is a Sign2Text transformer proposed by Chen et al. [ 88 ].
6 Discussion of the current state of the art
The analysis of the scientific literature on SLT in Sect. 5 allows us to formulate answers to the four research questions.
6.1 RQ1: Datasets
RQ1 asks, “Which datasets are used and what are their properties?” The most frequently used dataset is RWTH-PHOENIX-Weather 2014T [ 6 ] for translation from DGS to German. It contains 8257 parallel utterances from several different interpreters. The domain is weather broadcasts.
Current datasets have several limitations. They are typically restricted to controlled domains of discourse (e.g., weather broadcasts) and have little variability in terms of visual conditions (e.g., TV studios). Camgöz et al. recently introduced three new benchmark datasets from the TV news and weather broadcasts domain [ 73 ]. Two similar datasets were introduced in 2022 by Müller et al. [ 85 ]. Because news broadcasts are included, the domain of discourse (and thus the vocabulary) is broader. It is more challenging to achieve acceptable translation performance with broader domains [ 56 , 73 , 83 , 91 , 97 ]. Yet, these datasets are more representative of real-world signing.
Another limitation relates not to the content but to the style of signing. Many SLT datasets contain recordings of non-native signers. In several cases, the signing is interpreted (often under time pressure) from spoken language. This means that the signing used may not be representative of the sign language and may in fact be influenced by the grammar of a spoken language. Training a translation model on such data has implications for the quality and accuracy of the resulting translations.
6.2 RQ2: Sign language representations
RQ2 asks, “Which kinds of sign language representations are most informative?” The limitations and drawbacks of glosses have led to the use of visual sign language representations. The choice of representation can have a large impact on the performance of the SLT model. Spatio-temporal and multi-cue sign language representations outperform purely spatial (frame-based) representations, and pre-training on SLR tasks yields better features for SLT.
6.3 RQ3: Translation model architectures
RQ3 asks, “Which algorithms are currently the SOTA for SLT?” Despite the generally small size of the datasets used for SLT, we see that neural MT models achieve the highest translation scores. Transformers outperform RNNs in many cases, but our literature overview suggests that the choice of sign language representation has a larger impact than the choice of translation architecture.
6.4 RQ4: Evaluation
RQ4 asks, “How are current SLT models evaluated?” Many papers report several translation-related metrics, such as BLEU, ROUGE, WER and METEOR, which are standard metrics in MT. Several papers also provide example translations to allow the reader to gauge the translation quality for themselves. While automatic metrics often correlate quite well with human evaluation, this is not always the case [284]. They also do not always agree with each other: which model is best can depend on the metric considered. Only two of the 57 reviewed papers incorporate human evaluators in the loop [63, 85]. None of the reviewed papers evaluate their models in collaboration with native signers.
7 Challenges and proposals
Our literature overview (Sect. 5 ) and discussion thereof (Sect. 6 ) illustrate that the current challenges in the domain are threefold: (i) the collection of datasets, (ii) the design of sign language representations, and (iii) evaluation of the proposed models. We discuss these below, and finally give suggestions for the development of SLT models with SOTA methods.
7.1 Dataset collection
7.1.1 Challenges
Currently, SLT is a low-resource MT task: the largest public sign language video datasets contain just thousands of training examples (see Table 2). Current research uses datasets in which the videos have fixed viewpoints, similar backgrounds, and sometimes signers even wear similar clothing for maximum contrast with the background. Yet, in real-world applications, dynamic viewpoints and varying lighting conditions will be common. Furthermore, far from all sign languages have corresponding translation datasets. Additional datasets need to be collected and existing ones need to be extended.
Current datasets are insufficiently large to support SLT on general topics. When moving from weather broadcasts to news broadcasts, we observe a significant drop in translation scores. There is a clear trade-off between dataset size and vocabulary size.
De Meulder [285] raises concerns with current dataset collection efforts. Existing datasets, and those currently being collected, suffer from several biases. If interpreted data are used, influence from spoken languages will be present in the dataset. If only native signer data are used, then the majority of signers will share the same ethnicity. Both statistical and neural MT exacerbate bias [286, 287]. Therefore, when training datasets are biased and small, we cannot expect data-driven MT systems to reach high quality or to generalize well.
7.1.2 Proposals
We propose gathering two kinds of datasets: focused datasets for training SLT models, and large multilingual datasets for designing sign language representations. The former already exist, but the latter, to the best of our knowledge, do not yet exist.
By collecting larger, multilingual datasets, we can learn sign language representations with (self-)supervised deep learning techniques. Such datasets do not need to consist entirely of native signing. They should include many topics and visual characteristics to be as general as possible.
In contrast, SLT requires high quality labeled data, the collection of which is challenging. Bragg et al.’s first and second calls to action, “Involve Deaf team members throughout” and “Focus on real-world applications” [ 17 ], guide the dataset collection process. By involving SLC members, the dataset collection effort can be guided toward use cases that would benefit SLCs. Additionally, by collecting datasets with a limited domain of discourse targeted at specific use cases, the SLT problem is effectively simplified. As a result, any applications would be limited in scope, but more useful in practice.
7.2 Sign language representations
7.2.1 Challenges
Current sign language representations do not take into account the productive lexicon. In fact, it is doubtful whether a pure end-to-end NMT approach is capable of tackling productive signs. To recognize and understand productive signs, we need models that have the ability to link abstract visual information to the properties of objects. Incorporating the productive lexicon in translation systems is a significant challenge, one for which, to the best of our knowledge, labeled data is currently not available.
Current end-to-end representations moreover do not explicitly account for fingerspelling, signing space, or classifiers. Learning these aspects in the translation model with an end-to-end approach is challenging, especially due to the scarcity of annotated data.
Our literature overview shows that the choice of representation has a significant impact on the translation performance. Hence, improving the feature extraction and incorporating the aforementioned sign language characteristics is paramount.
7.2.2 Proposals
Linguistic analysis of sign languages can inform the design of sign language representations. The definition of so-called meaningful units has been discussed by De Sisto et al. [32]; defining them requires collaboration between computer scientists and (computational) linguists. Researchers should analyze the representations that are automatically learned by SOTA SLT models. For example, SLR models appear to implicitly learn to recognize hand shapes [288]. Based on such analyses, linguists can suggest which components to focus on next.
In parallel, we can exploit unlabeled sign language data to learn sign language representations in a self-supervised manner. Recently, increasingly large neural networks have been trained on unlabeled datasets to discover latent patterns and to learn neural representations of textual, auditory and visual data. In natural language processing, we already observe tremendous advances thanks to self-supervised language models such as BERT [45]. In computer vision, self-supervised techniques are applied to pre-train powerful feature extractors which can then be applied to downstream tasks such as image classification or object detection. Algorithms such as SimCLR [283], BYOL [289] and DINO [290] are used to train 2D CNNs and vision transformers without labels, reaching performance that is almost on par with models trained with supervised techniques. In the audio domain, Wav2Vec 2.0 learns discrete speech units in a self-supervised manner [291]. In sign language processing, self-supervised learning could be applied to train spatio-temporal representations (like Wav2Vec 2.0 or SimCLR) and to contextualize those representations (like BERT).
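As a concrete illustration of the contrastive branch of this idea, the sketch below implements a SimCLR-style NT-Xent loss over pairs of clip embeddings. It assumes that two augmented views of the same signing clip have already been encoded; it is not the pre-training pipeline of any specific cited work.

```python
# Minimal sketch of a SimCLR-style contrastive (NT-Xent) loss.
# Assumption: z1 and z2 are embeddings of two augmented views of the same clips.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d) unit vectors
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    n = z1.size(0)
    # the positive for view-1 embedding i is view-2 embedding i, and vice versa
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)        # 8 clip pairs, 128-d embeddings
print(nt_xent(z1, z2))
```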
Sign languages share some common elements, for example the fact that they all use the human body to convey information. Movements used in signing are composed of motion primitives and the configuration of the hand (shape and orientation) is important in all sign languages. The recognition of these low level components does not require language specific datasets and could be performed on multilingual datasets, containing videos recorded around the world with people of various ages, genders, and ethnicities. The representations extracted from multilingual SLR models can then be fine-tuned in monolingual or multilingual SLT models.
Self-supervised and multilingual learning should be evaluated for the purpose of learning such common elements of sign languages. This will not only facilitate automatic SLT, but could also lead to the development of new tools supporting linguistic analysis of sign languages and their commonalities and differences.
7.3 Evaluation
7.3.1 Challenges
Current research uses mostly quantitative metrics to evaluate SLT models, on datasets with limited scope. In-depth error analysis is missing from many SLT papers. SLT models should also be evaluated on real-world data from real-world settings. Furthermore, human evaluation from signers and non-signers is required to truly assess the translation quality. This is especially true because many of the SLT models are currently designed, implemented and evaluated by hearing researchers.
7.3.2 Proposals
Human-in-the-loop development can alleviate some of the concerns within SLCs about the application of MT techniques to sign languages, such as the appropriation of sign languages. Human (signing and non-signing) evaluators should be included in every step of SLT research, and their feedback should guide the development of new models. For example, if current models fail to properly translate classifiers, then SLT researchers could choose to focus on classifiers. This would hasten progress in a field that currently focuses mostly on improving metrics that say little about the usability of SLT models.
Inspiration for human evaluation can be found in the yearly conference on machine translation (WMT), where researchers perform both direct assessment of translations and relative ranking [292]. Müller et al. performed human evaluation on a benchmark dataset after an SLT challenge [85]. They hired native German speakers trained as DSGS interpreters to evaluate four different models and compared their outputs to human translations. Their work can serve as a guideline for human evaluation in future research.
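A simplified sketch of how such direct assessment scores can be aggregated is given below: scores are standardized per rater (to compensate for individual scoring behaviour) and then averaged per system. The rater names, systems and numbers are placeholders, and this is a generic WMT-style aggregation rather than the exact protocol of Müller et al. [85].

```python
# Minimal sketch of WMT-style direct assessment aggregation.
# Assumption: each rater scores system outputs on a 0-100 scale; all values are placeholders.
from collections import defaultdict
from statistics import mean, stdev

ratings = [  # (rater, system, score) triples
    ("r1", "system_A", 72), ("r1", "system_B", 35), ("r1", "human", 90),
    ("r2", "system_A", 55), ("r2", "system_B", 20), ("r2", "human", 85),
]

# Per-rater mean and standard deviation for z-score normalization.
by_rater = defaultdict(list)
for rater, _, score in ratings:
    by_rater[rater].append(score)
stats = {r: (mean(s), stdev(s)) for r, s in by_rater.items()}

z_by_system = defaultdict(list)
for rater, system, score in ratings:
    mu, sigma = stats[rater]
    z_by_system[system].append((score - mu) / sigma)

for system, zs in sorted(z_by_system.items(), key=lambda kv: -mean(kv[1])):
    print(f"{system}: average z-score {mean(zs):+.2f}")
```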
7.4 Applying SOTA techniques
There is still a large gap between MT and human-level performance for SLT [85]. However, with the current SOTA and sufficient constraints, it may be possible to develop limited SLT applications. The development of these applications can be guided by the following three principles.
First, a dataset should be collected that has a specific topic related to the application: it is not yet possible to train robust SLT models with large vocabularies [ 56 , 73 , 83 , 91 , 97 ]. Second, the feature extractor should be pre-trained on SLR tasks as this yields the most informative representations [ 82 , 96 , 99 ]. Third, qualitative evaluation and evaluation by humans can provide insights into the failure cases of SLT models.
8 Conclusion
In this article, we discuss the SOTA of SLT and explore challenges and opportunities for future research through a systematic overview of the papers in this domain. We review 57 papers on machine translation from sign to spoken languages, selected based on predefined criteria and indicative of sound SLT research. The selected papers are written in English and peer-reviewed, and they propose, implement and evaluate a machine translation system from a sign language to a spoken language, supporting RGB video inputs.
In recent years, neural machine translation has become dominant in the growing domain of SLT. The most powerful sign language representations are those that combine information from multiple channels (manual actions, body movements and mouth patterns) and those that are reduced in length by temporal processing modules. The translation models are typically RNNs or transformers. Transformers outperform RNNs in many cases, and large language models allow for transfer learning. SLT datasets are small: we are dealing with a low-resource machine translation problem. Many datasets consider limited domains of discourse and generally contain recordings of non-native signers. This has implications for the quality and accuracy of translations generated by models trained on these datasets, which must be taken into account when evaluating SLT models. Datasets that consider a broader domain of discourse are too small to train NMT models on. Evaluation is mostly performed using quantitative metrics that can be computed automatically, given a corpus. There are currently no works that evaluate neural SLT models in collaboration with sign language users.
Progressing beyond the current SOTA of SLT requires efforts in data collection, the design of sign language representations, machine translation, and evaluation. Future research may improve sign language representations by incorporating domain knowledge into their design and by leveraging abundant, but as yet unexploited, unlabeled data. Research should be conducted in an interdisciplinary manner, with computer scientists, sign language linguists, and experts on sign language cultures working together. Finally, SLT models should be evaluated in collaboration with end users: native signers as well as hearing people who do not know any sign language.
Change history
25 January 2024
A Correction to this paper has been published: https://doi.org/10.1007/s10209-023-01085-9
https://signwriting.org/ .
For this reason, annotators of sign language corpora sometimes provide two parallel gloss tiers: one per hand [ 30 ].
According to distributional semantics, words that have the same or similar meaning appear in similar contexts, and as such the meaning of a word can be defined by the contexts in which it appears [47, 48].
Google Scholar: https://scholar.google.com , Web of Science: https://www.webofscience.com , IEEE Xplore: https://ieeexplore.ieee.org/ .
As one paper may discuss several tasks, the total count is higher than the number of papers.
As discussed in Sect. 5.2 , this dataset is an earlier version of RWTH-PHOENIX-Weather 2014T and they contain the same videos.
Pugeault, N., Bowden, R.: Spelling it out: Real-time asl fingerspelling recognition. In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1114–1119 (2011). IEEE
Fowley, F., Ventresque, A.: Sign language fingerspelling recognition using synthetic data. In: AICS, pp. 84–95 (2021). CEUR-WS
Pigou, L., Van Herreweghe, M., Dambre, J.: Gesture and sign language recognition with temporal residual networks. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 3086–3093 (2017)
Jiang, S., Sun, B., Wang, L., Bai, Y., Li, K., Fu, Y.: Skeleton aware multi-modal sign language recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3413–3423 (2021)
Bungeroth, J., Ney, H.: Statistical sign language translation. In: Workshop on Representation and Processing of Sign Languages, LREC, vol. 4, pp. 105–108 (2004). Citeseer
Camgoz, N.C., Hadfield, S., Koller, O., Ney, H., Bowden, R.: Neural sign language translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7784–7793 (2018). https://doi.org/10.1109/CVPR.2018.00812
Stein, D., Bungeroth, J., Ney, H.: Morpho-syntax based statistical methods for automatic sign language translation. In: Proceedings of the 11th Annual Conference of the European Association for Machine Translation (2006)
Morrissey, S., Way, A.: Joining hands: Developing a sign language machine translation system with and for the deaf community. In: CVHI (2007)
San-Segundo, R., López, V., Martın, R., Sánchez, D., Garcıa, A.: Language resources for spanish–spanish sign language (lse) translation. In: Proceedings of the 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies at LREC, pp. 208–211 (2010)
David, B., Bouillon, P.: Prototype of automatic translation to the sign language of french-speaking Belgium. Evaluation by the deaf community. Modelling, Measurement and Control C 79(4), 162–167 (2018)
Erard, M.: Why sign language gloves don’t help deaf people (2017). https://www.theatlantic.com/technology/archive/2017/11/why-sign-language-gloves-dont-help-deaf-people/545441/
Adnan, N.H., Wan, K., AB, S., BAKAR, J.A.A.: Learning and manipulating human’s fingertip bending data for sign language translation using pca-bmu classifier. CREAM: Curr. Res. Malaysia (Penyelidikan Terkini di Malaysia) 3 , 361–372 (2013)
Caliwag, A., Angsanto, S.R., Lim, W.: Korean sign language translation using machine learning. In: 2018 Tenth International Conference on Ubiquitous and Future Networks (ICUFN), pp. 826–828 (2018). IEEE
Mistry, J., Inden, B.: An approach to sign language translation using the intel realsense camera. In: 2018 10th Computer Science and Electronic Engineering (CEEC), pp. 219–224 (2018). IEEE
Krishnan, P.T., Balasubramanian, P.: Detection of alphabets for machine translation of sign language using deep neural net. In: 2019 International Conference on Data Science and Communication (IconDSC), pp. 1–3 (2019). IEEE
Núñez-Marcos, A., Perez-de-Viñaspre, O., Labaka, G.: A survey on sign language machine translation. Expert Systems with Applications, 118993 (2022)
Bragg, D., Koller, O., Bellard, M., Berke, L., Boudreault, P., Braffort, A., Caselli, N., Huenerfauth, M., Kacorri, H., Verhoef, T., et al. : Sign language recognition, generation, and translation: An interdisciplinary perspective. In: The 21st International ACM SIGACCESS Conference on Computers and Accessibility, pp. 16–31 (2019)
Vermeerbergen, M., Twilhaar, J.N., Van Herreweghe, M.: Variation between and within sign language of the netherlands and flemish sign language. In: Language and Space Volume 30 (3): Dutch, pp. 680–699. De Gruyter Mouton, Berlin (2013)
Van Herreweghe, M., Vermeerbergen, M.: Flemish sign language standardisation. Current issues in language planning 10 (3), 308–326 (2009)
Stokoe, W.: Sign language structure: An outline of the visual communication systems of the american deaf. Studies in Linguistics, Occasional Papers 8 (1960)
Battison, R.: Lexical Borrowing in American Sign Language. Linstok Press, Silver Spring (1978)
Bank, R., Crasborn, O.A., Van Hout, R.: Variation in mouth actions with manual signs in Sign Language of the Netherlands (NGT). Sign Language & Linguistics 14 (2), 248–270 (2011)
Perniss, P.: 19. use of sign space. In: Sign Language, pp. 412–431. De Gruyter Mouton, Berlin (2012)
Zwitserlood, I.: In: Pfau, R., Steinbach, M., Woll, B. (eds.) Classifiers, pp. 158–186. De Gruyter Mouton, Berlin (2012). https://doi.org/10.1515/9783110261325.158
Vermeerbergen, M.: Past and current trends in sign language research. Lang. Commun. 26 (2), 168–192 (2006). https://doi.org/10.1016/j.langcom.2005.10.004
Frishberg, N., Hoiting, N., Slobin, D.I.: In: Pfau, R., Steinbach, M., Woll, B. (eds.) Transcription, pp. 1045–1075. De Gruyter Mouton, Berlin (2012). https://doi.org/10.1515/9783110261325.1045
Sutton, V.: Sign Writing for Everyday Use. Sutton Movement Writing Press, New York (1981)
Prillwitz, S.: HamNoSys Version 2.0. Hamburg Notation System for Sign Languages: An Introductory Guide. Intern. Arb. z. Gebärdensprache u. Kommunik. Signum Press, Berlin (1989)
Vermeerbergen, M., Leeson, L., Crasborn, O.A.: Simultaneity in Signed Languages: Form and Function vol. 281. John Benjamins Publishing, Amsterdam (2007). https://doi.org/10.1075/cilt.281
De Sisto, M., Vandeghinste, V., Gómez, S.E., De Coster, M., Shterionov, D., Seggion, H.: Challenges with sign language datasets for sign language recognition and translation. In: LREC2022, the 13th International Conference on Language Resources and Evaluation, pp. 2478–2487 (2022)
Murtagh, I.E.: A linguistically motivated computational framework for irish sign language. PhD thesis, Trinity College Dublin.School of Linguistic Speech and Comm Sci (2019)
De Sisto, M., Shterionov, D., Murtagh, I., Vermeerbergen, M., Leeson, L.: Defining meaningful units. challenges in sign segmentation and segment-meaning mapping. In: Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pp. 98–103. Association for Machine Translation in the Americas, Virtual (2021). https://aclanthology.org/2021.mtsummit-at4ssl.11
Koller, O., Camgoz, N.C., Ney, H., Bowden, R.: Weakly supervised learning with multi-stream cnn-lstm-hmms to discover sequential parallelism in sign language videos. IEEE Trans. Pattern Anal. Mach. Intell. 42 (9), 2306–2320 (2019). https://doi.org/10.1109/TPAMI.2019.2911077
Koller, O., Ney, H., Bowden, R.: Deep hand: How to train a cnn on 1 million hand images when your data is continuous and weakly labelled. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3793–3802 (2016)
Orbay, A., Akarun, L.: Neural sign language translation by learning tokenization. In: 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pp. 222–228 (2020). IEEE
Camgoz, N.C., Koller, O., Hadfield, S., Bowden, R.: Multi-channel transformers for multi-articulatory sign language translation. In: European Conference on Computer Vision, pp. 301–319 (2020). Springer
Camgoz, N.C., Koller, O., Hadfield, S., Bowden, R.: Sign language transformers: Joint end-to-end sign language recognition and translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10023–10033 (2020)
De Coster, M., D’Oosterlinck, K., Pizurica, M., Rabaey, P., Verlinden, S., Van Herreweghe, M., Dambre, J.: Frozen pretrained transformers for neural sign language translation. In: Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pp. 88–97. Association for Machine Translation in the Americas, Virtual (2021). https://aclanthology.org/2021.mtsummit-at4ssl.10
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9 (8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
Cho, K., van Merriënboer, B., Gülçehre, Ç., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, pp. 1724–1734 (2014)
Bahdanau, D., Cho, K.H., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations, ICLR 2015 (2015)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014). https://doi.org/10.3115/v1/D14-1162
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019). https://doi.org/10.18653/v1/N19-1423
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., Zettlemoyer, L.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL, pp. 7871–7880 (2020). https://doi.org/10.18653/v1/2020.acl-main.703 . ACL
Harris, Z.: Distributional structure. Word 10 (2–3), 146–162 (1954). https://doi.org/10.1007/978-94-009-8467-7_1
Firth, J.: A synopsis of linguistic theory 1930–1955. In: Studies in Linguistic Analysis. Philological Society, Oxford (1957). reprinted in Palmer, F. (ed.): R. Firth, Longman, Harlow (1968)
Stahlberg, F.: Neural machine translation: A review. Journal of Artificial Intelligence Research 69 , 343–418 (2020)
Stein, D., Dreuw, P., Ney, H., Morrissey, S., Way, A.: Hand in hand: automatic sign language to English translation. In: Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers, Skövde, Sweden (2007). https://aclanthology.org/2007.tmi-papers.26
Dreuw, P., Stein, D., Ney, H.: Enhancing a sign language translation system with vision-based features. In: International Gesture Workshop, pp. 108–113 (2007). Springer
Morrissey, S., Way, A., Stein, D., Bungeroth, J., Ney, H.: Combining data-driven mt systems for improved sign language translation. In: European Association for Machine Translation (2007)
Dreuw, P., Stein, D., Deselaers, T., Rybach, D., Zahedi, M., Bungeroth, J., Ney, H.: Spoken language processing techniques for sign language recognition and translation. Technol. Disabil. 20 (2), 121–133 (2008)
López, V., San-Segundo, R., Martín, R., Lucas, J.M., Echeverry, J.D.: Spanish generation from spanish sign language using a phrase-based translation system. technology 9, 10 (2010)
Stein, D., Schmidt, C., Ney, H.: Sign language machine translation overkill. In: International Workshop on Spoken Language Translation (IWSLT) 2010 (2010)
Stein, D., Schmidt, C., Ney, H.: Analysis, preparation, and optimization of statistical sign language machine translation. Mach. Transl. 26 (4), 325–357 (2012)
Schmidt, C., Koller, O., Ney, H., Hoyoux, T., Piater, J.: Using viseme recognition to improve a sign language translation system. In: International Workshop on Spoken Language Translation, pp. 197–203 (2013). Citeseer
Forster, J., Schmidt, C., Koller, O., Bellgardt, M., Ney, H.: Extensions of the sign language recognition and translation corpus rwth-phoenix-weather. In: LREC, pp. 1911–1916 (2014)
Kumar, S.S., Wangyal, T., Saboo, V., Srinath, R.: Time series neural networks for real time sign language translation. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 243–248 (2018). https://doi.org/10.1109/ICMLA.2018.00043 . IEEE
Moe, S.Z., Thu, Y.K., Thant, H.A., Min, N.W.: Neural machine translation between myanmar sign language and myanmar written text. In: the Second Regional Conference on Optical Character Recognition and Natural Language Processing Technologies for ASEAN Languages, pp. 13–14 (2018)
Arvanitis, N., Constantinopoulos, C., Kosmopoulos, D.: Translation of sign language glosses to text using sequence-to-sequence attention models. In: 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 296–302 (2019). https://doi.org/10.1109/SITIS.2019.00056 . IEEE
Ko, S.-K., Kim, C.J., Jung, H., Cho, C.: Neural sign language translation based on human keypoint estimation. Appl. Sci. 9 (13), 2683 (2019)
Luqman, H., Mahmoud, S.A.: A machine translation system from arabic sign language to arabic. Univ. Access Inf. Soc. 19 (4), 891–904 (2020). https://doi.org/10.1007/s10209-019-00695-6
Zheng, J., Zhao, Z., Chen, M., Chen, J., Wu, C., Chen, Y., Shi, X., Tong, Y.: An improved sign language translation model with explainable adaptations for processing long sign sentences. Computational Intelligence and Neuroscience 2020 (2020)
Yin, K., Read, J.: Better sign language translation with stmc-transformer. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 5975–5989 (2020). https://doi.org/10.18653/v1/2020.coling-main.525
Li, D., Xu, C., Yu, X., Zhang, K., Swift, B., Suominen, H., Li, H.: Tspnet: Hierarchical feature learning via temporal semantic pyramid for sign language translation. Adv. Neural. Inf. Process. Syst. 33 , 12034–12045 (2020)
Rodriguez, J., Chacon, J., Rangel, E., Guayacan, L., Hernandez, C., Hernandez, L., Martinez, F.: Understanding motion in sign language: A new structured translation dataset. In: Proceedings of the Asian Conference on Computer Vision (2020)
Moe, S.Z., Thu, Y.K., Thant, H.A., Min, N.W., Supnithi, T.: Unsupervised neural machine translation between myanmar sign language and myanmar language. tic 14(15), 16 (2020)
Kim, S., Kim, C.J., Park, H.-M., Jeong, Y., Jang, J.Y., Jung, H.: Robust keypoint normalization method for korean sign language translation using transformer. In: 2020 International Conference on Information and Communication Technology Convergence (ICTC), pp. 1303–1305 (2020). https://doi.org/10.1109/ICTC49870.2020.9289551 . IEEE
Partaourides, H., Voskou, A., Kosmopoulos, D., Chatzis, S., Metaxas, D.N.: Variational bayesian sequence-to-sequence networks for memory-efficient sign language translation. In: International Symposium on Visual Computing, pp. 251–262 (2020). Springer
Zhang, X., Duh, K.: Approaching sign language gloss translation as a low-resource machine translation task. In: Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pp. 60–70. Association for Machine Translation in the Americas, Virtual (2021). https://aclanthology.org/2021.mtsummit-at4ssl.7
Zhao, J., Qi, W., Zhou, W., Nan, D., Zhou, M., Li, H.: Conditional sentence generation and cross-modal reranking for sign language translation. IEEE Trans. Multimedia (2021). https://doi.org/10.1109/TMM.2021.3087006
Camgöz, N.C., Saunders, B., Rochette, G., Giovanelli, M., Inches, G., Nachtrab-Ribback, R., Bowden, R.: Content4all open research sign language translation datasets. In: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), pp. 1–5 (2021). https://doi.org/10.1109/FG52635.2021.9667087
Moryossef, A., Yin, K., Neubig, G., Goldberg, Y.: Data augmentation for sign language gloss translation. In: Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pp. 1–11. Association for Machine Translation in the Americas, Virtual (2021). https://aclanthology.org/2021.mtsummit-at4ssl.1
Zheng, J., Chen, Y., Wu, C., Shi, X., Kamal, S.M.: Enhancing neural sign language translation by highlighting the facial expression information. Neurocomputing 464 , 462–472 (2021)
Rodriguez, J., Martinez, F.: How important is motion in sign language translation? IET Comput. Vision 15 (3), 224–234 (2021)
Zhou, H., Zhou, W., Qi, W., Pu, J., Li, H.: Improving sign language translation with monolingual data by sign back-translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1316–1325 (2021)
Yin, A., Zhao, Z., Liu, J., Jin, W., Zhang, M., Zeng, X., He, X.: Simulslt: End-to-end simultaneous sign language translation. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4118–4127 (2021)
Gan, S., Yin, Y., Jiang, Z., Xie, L., Lu, S.: Skeleton-aware neural sign language translation. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4353–4361 (2021)
Zhou, H., Zhou, W., Zhou, Y., Li, H.: Spatial-temporal multi-cue network for sign language recognition and translation. IEEE Trans. Multimedia (2021). https://doi.org/10.1109/TMM.2021.3059098
Voskou, A., Panousis, K.P., Kosmopoulos, D., Metaxas, D.N., Chatzis, S.: Stochastic transformer networks with linear competing units: Application to end-to-end sl translation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11946–11955 (2021)
Shi, B., Brentari, D., Shakhnarovich, G., Livescu, K.: Ttic’s wmt-slt 22 sign language translation system. In: Proceedings of the Seventh Conference on Machine Translation, pp. 989–993. Association for Computational Linguistics, Abu Dhabi (2022). https://aclanthology.org/2022.wmt-1.96
Hamidullah, Y., van Genabith, J., España-Bonet, C.: Spatio-temporal sign language representation and translation. In: Proceedings of the Seventh Conference on Machine Translation, pp. 977–982. Association for Computational Linguistics, Abu Dhabi (2022). https://aclanthology.org/2022.wmt-1.94
Hufe, L., Avramidis, E.: Experimental machine translation of the swiss german sign language via 3d augmentation of body keypoints. In: Proceedings of the Seventh Conference on Machine Translation, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics (2022)
Müller, M., Ebling, S., Avramidis, E., Battisti, A., Berger, M., Bowden, R., Braffort, A., Cihan Camgöz, N., España-Bonet, C., Grundkiewicz, R., Jiang, Z., Koller, O., Moryossef, A., Perrollaz, R., Reinhard, S., Rios, A., Shterionov, D., Sidler-Miserez, S., Tissi, K., Van Landuyt, D.: Findings of the first wmt shared task on sign language translation (wmt-slt22). In: Proceedings of the Seventh Conference on Machine Translation, pp. 744–772. Association for Computational Linguistics, Abu Dhabi (2022). https://aclanthology.org/2022.wmt-1.71
Miranda, P.B., Casadei, V., Silva, E., Silva, J., Alves, M., Severo, M., Freitas, J.P.: Tspnet-hf: A hand/face tspnet method for sign language translation. In: Ibero-American Conference on Artificial Intelligence, pp. 305–316 (2022). Springer
Jin, T., Zhao, Z., Zhang, M., Zeng, X.: Prior knowledge and memory enriched transformer for sign language translation. In: Findings of the Association for Computational Linguistics: ACL 2022, pp. 3766–3775 (2022)
Chen, Y., Zuo, R., Wei, F., Wu, Y., Liu, S., Mak, B.: Two-stream network for sign language recognition and translation. arXiv preprint arXiv:2211.01367 (2022)
Li, R., Meng, L.: Sign language recognition and translation network based on multi-view data. Appl. Intell. 52 (13), 14624–14638 (2022)
Dal Bianco, P., Ríos, G., Ronchetti, F., Quiroga, F., Stanchi, O., Hasperué, W., Rosete, A.: Lsa-t: The first continuous argentinian sign language dataset for sign language translation. In: Ibero-American Conference on Artificial Intelligence, pp. 293–304 (2022). Springer
Angelova, G., Avramidis, E., Möller, S.: Using neural machine translation methods for sign language translation. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 273–284 (2022)
De Coster, M., Dambre, J.: Leveraging frozen pretrained written language models for neural sign language translation. Information 13 (5), 220 (2022)
Jin, T., Zhao, Z., Zhang, M., Zeng, X.: Mc-slt: Towards low-resource signer-adaptive sign language translation. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 4939–4947 (2022)
Kan, J., Hu, K., Hagenbuchner, M., Tsoi, A.C., Bennamoun, M., Wang, Z.: Sign language translation with hierarchical spatio-temporal graph neural network. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3367–3376 (2022)
Chaudhary, L., Ananthanarayana, T., Hoq, E., Nwogu, I.: Signnet ii: A transformer-based two-way sign language translation model. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)
Dey, S., Pal, A., Chaabani, C., Koller, O.: Clean text and full-body transformer: Microsoft’s submission to the wmt22 shared task on sign language translation. In: Proceedings of the Seventh Conference on Machine Translation, pp. 969–976. Association for Computational Linguistics, Abu Dhabi (2022). https://aclanthology.org/2022.wmt-1.93
Tarres, L., Gállego, G.I., Giro-i-Nieto, X., Torres, J.: Tackling low-resourced sign language translation: Upc at wmt-slt 22. In: Proceedings of the Seventh Conference on Machine Translation, pp. 994–1000. Association for Computational Linguistics, Abu Dhabi (2022). https://aclanthology.org/2022.wmt-1.97
Yin, A., Zhao, Z., Jin, W., Zhang, M., Zeng, X., He, X.: Mlslt: Towards multilingual sign language translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5109–5119 (2022)
Chen, Y., Wei, F., Sun, X., Wu, Z., Lin, S.: A simple multi-modality transfer learning baseline for sign language translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5120–5130 (2022)
Mohamed, A., Hefny, H., et al.: A deep learning approach for gloss sign language translation using transformer. Journal of Computing and Communication 1 (2), 1–8 (2022)
Fraiwan, M., Khasawneh, N., Ershedat, H., Al-Alali, I., Al-Kofahi, H.: A kinect-based system for arabic sign language to speech translation. Int. J. Comput. Appl. Technol. 52 (2–3), 117–126 (2015)
Jin, C.M., Omar, Z., Jaward, M.H.: A mobile application of american sign language translation via image processing algorithms. In: 2016 IEEE Region 10 Symposium (TENSYMP), pp. 104–109 (2016). IEEE
Patil, Y., Krishnadas, S., Kastwar, A., Kulkarni, S.: American and indian sign language translation using computer vision. In: International Conference on Business Management, Innovation & Sustainability (ICBMIS) (2020)
Makarov, I., Veldyaykin, N., Chertkov, M., Pokoev, A.: American and russian sign language dactyl recognition and text2sign translation. In: International Conference on Analysis of Images, Social Networks and Texts, pp. 309–320 (2019). Springer
Bukhari, J., Rehman, M., Malik, S.I., Kamboh, A.M., Salman, A.: American sign language translation through sensory glove; signspeak. International Journal of u-and e-Service, Science and Technology 8 (1), 131–142 (2015)
Joshi, A., Sierra, H., Arzuaga, E.: American sign language translation using edge detection and cross correlation. In: 2017 IEEE Colombian Conference on Communications and Computing (COLCOM), pp. 1–6 (2017). IEEE
Rizwan, S.B., Khan, M.S.Z., Imran, M.: American sign language translation via smart wearable glove technology. In: 2019 International Symposium on Recent Advances in Electrical Engineering (RAEE), vol. 4, pp. 1–6 (2019). IEEE
Halawani, S.M., Zaitun, A.: An avatar based translation system from arabic speech to arabic sign language for deaf people. International Journal of Information Science and Education 2 (1), 13–20 (2012)
Anand, M.S., Kumaresan, A., Kumar, N.M.: An integrated two way isl (indian sign language) translation system–a new approach. International Journal of Advanced Research in Computer Science 4 (1) (2013)
Kanwal, K., Abdullah, S., Ahmed, Y.B., Saher, Y., Jafri, A.R.: Assistive glove for pakistani sign language translation. In: 17th IEEE International Multi Topic Conference 2014, pp. 173–176 (2014). IEEE
Angona, T.M., Shaon, A.S., Niloy, K.T.R., Karim, T., Tasnim, Z., Reza, S.S., Mahbub, T.N.: Automated bangla sign language translation system for alphabets by means of mobilenet. Telkomnika 18 (3), 1292–1301 (2020)
Hoque, M.T., Rifat-Ut-Tauwab, M., Kabir, M.F., Sarker, F., Huda, M.N., Abdullah-Al-Mamun, K.: Automated bangla sign language translation system: Prospects, limitations and applications. In: 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV), pp. 856–862 (2016). IEEE
Oliveira, T., Escudeiro, P., Escudeiro, N., Rocha, E., Barbosa, F.M.: Automatic sign language translation to improve communication. In: 2019 IEEE Global Engineering Education Conference (EDUCON), pp. 937–942 (2019). IEEE
Ayadi, K., ElHadj, Y.O., Ferchichi, A.: Automatic translation from arabic to arabic sign language: A review. In: 2018 JCCO Joint International Conference on ICT in Education and Training, International Conference on Computing in Arabic, and International Conference on Geocomputing (JCCO: TICET-ICCA-GECO), pp. 1–5 (2018). IEEE
Mohandes, M.: Automatic translation of arabic text to arabic sign language. AIML Journal 6 (4), 15–19 (2006)
Fernandes, L., Dalvi, P., Junnarkar, A., Bansode, M.: Convolutional neural network based bidirectional sign language translation system. In: 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 769–775 (2020). IEEE
Dabwan, B.A.: Convolutional neural network-based sign language translation system. International Journal of Engineering, Science and Mathematics 9 (6), 47–57 (2020)
Martin, V.: Design and implementation of a system for automatic sign language translation. In: Future Access Enablers of Ubiquitous and Intelligent Infrastructures, pp. 307–313 (2015). Springer
Yudhana, A., Rahmawan, J., Negara, C.: Flex sensors and mpu6050 sensors responses on smart glove for sign language translation. In: IOP Conference Series: Materials Science and Engineering, vol. 403, p. 012032 (2018). IOP Publishing
Mohanty, S., Prasad, S., Sinha, T., Krupa, B.N.: German sign language translation using 3d hand pose estimation and deep learning. In: 2020 IEEE REGION 10 CONFERENCE (TENCON), pp. 773–778 (2020). IEEE
Pranatadesta, R.A., Suwardi, I.S.: Indonesian sign language (bisindo) translation system with orb for bilingual language. In: 2019 International Conference of Artificial Intelligence and Information Technology (ICAIIT), pp. 502–505 (2019). IEEE
Prasad, P.K., Shibu, A.P., et al.: Intelligent human sign language translation using support vector machines classifier. IJRAR-International Journal of Research and Analytical Reviews (IJRAR) 5 (4), 461–466 (2018)
Bajpai, D., Mishra, V.: Low cost full duplex wireless glove for static and trajectory based american sign language translation to multimedia output. In: 2016 8th International Conference on Computational Intelligence and Communication Networks (CICN), pp. 646–652 (2016). IEEE
Yang, S., Cui, X., Guo, R., Zhang, Z., Sang, S., Zhang, H.: Piezoelectric sensor based on graphene-doped pvdf nanofibers for sign language translation. Beilstein J. Nanotechnol. 11 (1), 1655–1662 (2020)
Gamarra, J.E.M., Cubas, M.A.S., Silupú, J.D.S., Chirinos, C.E.C.: Prototype for peruvian sign language translation based on an artificial neural network approach. In: 2020 IEEE XXVII International Conference on Electronics, Electrical Engineering and Computing (INTERCON), pp. 1–4 (2020). IEEE
El-Alfi, A., El-Gamal, A., El-Adly, R.: Real time arabic sign language to arabic text & sound translation system. Int. J. Eng 3 (5) (2014)
Salem, N., Alharbi, S., Khezendar, R., Alshami, H.: Real-time glove and android application for visual and audible arabic sign language translation. Procedia Computer Science 163 , 450–459 (2019)
Abraham, E., Nayak, A., Iqbal, A.: Real-time translation of indian sign language using lstm. In: 2019 Global Conference for Advancement in Technology (GCAT), pp. 1–5 (2019). IEEE
Escudeiro, N., Escudeiro, P., Soares, F., Litos, O., Norberto, M., Lopes, J.: Recognition of hand configuration: A critical factor in automatic sign language translation. In: 2017 12th Iberian Conference on Information Systems and Technologies (CISTI), pp. 1–5 (2017). IEEE
Quach, L.-D., Duong-Trung, N., Vu, A.-V., Nguyen, C.-N.: Recommending the workflow of vietnamese sign language translation via a comparison of several classification algorithms. In: International Conference of the Pacific Association for Computational Linguistics, pp. 134–141 (2019). Springer
Xiaomei, Z., Shiquan, D., Hui, W.: Research on chinese-american sign language translation. In: 2011 14th IEEE International Conference on Computational Science and Engineering, pp. 555–558 (2011). IEEE
Liqing, G., Wenwen, L., Yong, S., Yanyan, W., Guoming, L.: Research on portable sign language translation system based on embedded system. In: 2018 3rd International Conference on Smart City and Systems Engineering (ICSCSE), pp. 636–639 (2018). IEEE
Dajie, X., Shuning, K., Songlin, L.: Research on the translation of gloves based on embedded sign language. Digital Technology and Application (2017)
Singh, D.K., Kumar, A., Ansari, M.A.: Robust modelling of static hand gestures using deep convolutional network for sign language translation. In: 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), pp. 487–492 (2021). IEEE
Narashiman, D., Vidhya, S., Mala, D.T.: Tamil noun to sign language-a machine translation approach. In: Proceeding of 11th Tamil Internet Conference, pp. 175–179 (2012)
Alam, M., Tanvir, M., Saha, D.K., Das, S.K., et al.: Two dimensional convolutional neural network approach for real-time bangla sign language characters recognition and translation. SN Computer Science 2 (5), 1–13 (2021)
Zou, X., Chai, Y., Ma, H., Jiang, Q., Zhang, W., Ma, X., Wang, X., Lian, H., Huang, X., Ji, J., et al.: Ultrahigh sensitive wearable pressure sensors based on reduced graphene oxide/polypyrrole foam for sign language translation. Advanced Materials Technologies 6 (7), 2001188 (2021)
Sonare, B., Padgal, A., Gaikwad, Y., Patil, A.: Video-based sign language translation system using machine learning. In: 2021 2nd International Conference for Emerging Technology (INCET), pp. 1–4 (2021). IEEE
Madhuri, Y., Anitha, G., Anburajan, M.: Vision-based sign language translation device. In: 2013 International Conference on Information Communication and Embedded Systems (ICICES), pp. 565–568 (2013). IEEE
Lee, S., Jo, D., Kim, K.-B., Jang, J., Park, W.: Wearable sign language translation system using strain sensors. Sens. Actuators, A 331 , 113010 (2021)
Kim, T., Kim, S.: Sign language translation system using latent feature values of sign language images. In: 2016 13th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), pp. 228–233 (2016). IEEE
Domingo, A., Akmeliawati, R., Chow, K.Y.: Pattern matching for automatic sign language translation system using labview. In: 2007 International Conference on Intelligent and Advanced Systems, pp. 660–665 (2007). IEEE
Yetkin, O., Calderon, K., Krishna Moorthy, P., Nguyen, T.T., Tran, J., Terry, T., Vigil, A., Alsup, A., Tekleab, A., Sancillo, D., et al. : A lightweight wearable american sign language translation device. In: Frontiers in Biomedical Devices, vol. 84815, pp. 001–04007 (2022). American Society of Mechanical Engineers
Kuriakose, Y.V., Jangid, M.: Translation of american sign language to text: Using yolov3 with background subtraction and edge detection. In: Smart Systems: Innovations in Computing, pp. 21–30. Springer (2022)
Shokoori, A.F., Shinwari, M., Popal, J.A., Meena, J.: Sign language recognition and translation into pashto language alphabets. In: 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), pp. 1401–1405 (2022). IEEE
Bismoy, M.I., Shahrear, F., Mitra, A., Bikash, D., Afrin, F., Roy, S., Arif, H.: Image translation of bangla and english sign language to written language using convolutional neural network. In: 2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), pp. 1–6 (2022). IEEE
Rajarajeswari, S., Renji, N.M., Kumari, P., Keshavamurthy, M., Kruthika, K.: Real-time translation of indian sign language to assist the hearing and speech impaired. In: Innovations in Computational Intelligence and Computer Vision, pp. 303–322. Springer (2022)
Dabhade, T., Ghawate, S., Diwane, A., Andrade, C., Chavan, P.: Sign language translation using cnn survey
Abougarair, A., Arebi, W.: Smart glove for sign language translation. Int Rob Auto J 8 (3), 109–117 (2022)
Wu, R., Seo, S., Ma, L., Bae, J., Kim, T.: Full-fiber auxetic-interlaced yarn sensor for sign-language translation glove assisted by artificial neural network. Nano-Micro Letters 14 (1), 1–14 (2022)
Klomsae, A., Auephanwiriyakul, S., Theera-Umpon, N.: A novel string grammar unsupervised possibilistic c-medians algorithm for sign language translation systems. Symmetry 9 (12), 321 (2017)
Zhou, Z., Neo, Y., Lui, K.-S., Tam, V.W., Lam, E.Y., Wong, N.: A portable hong kong sign language translation platform with deep learning and jetson nano. In: The 22nd International ACM SIGACCESS Conference on Computers and Accessibility, pp. 1–4 (2020)
Kau, L.-J., Su, W.-L., Yu, P.-J., Wei, S.-J.: A real-time portable sign language translation system. In: 2015 IEEE 58th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1–4 (2015). IEEE
Park, H., Lee, J.-S., Ko, J.: Achieving real-time sign language translation using a smartphone’s true depth images. In: 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS), pp. 622–625 (2020). IEEE
Eqab, A., Shanableh, T.: Android mobile app for real-time bilateral arabic sign language translation using leap motion controller. In: 2017 International Conference on Electrical and Computing Technologies and Applications (ICECTA), pp. 1–5 (2017). IEEE
Tumsri, J., Kimpan, W.: Applied finite automata and quadtree technique for thai sign language translation. In: International MultiConference of Engineers and Computer Scientists, pp. 351–365 (2017). Springer
Kanvinde, A., Revadekar, A., Tamse, M., Kalbande, D.R., Bakereywala, N.: Bidirectional sign language translation. In: 2021 International Conference on Communication Information and Computing Technology (ICCICT), pp. 1–5 (2021). IEEE
Kaur, P., Ganguly, P., Verma, S., Bansal, N.: Bridging the communication gap: with real time sign language translation. In: 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), pp. 485–490 (2018). IEEE
Park, C.-I., Sohn, C.-B.: Data augmentation for human keypoint estimation deep learning based sign language translation. Electronics 9 (8), 1257 (2020)
Salim, B.W., Zeebaree, S.R.: Design & analyses of a novel real time kurdish sign language for kurdish text and sound translation system. In: 2020 IEEE International Conference on Problems of Infocommunications. Science and Technology (PIC S &T), pp. 348–352 (2020). IEEE
Lee, J., Heo, S., Baek, D., Park, E., Lim, H., Ahn, H.: Design and implementation of sign language translation program using motion recognition. International Journal of Hybrid Information Technology 12 (2), 47–54 (2019)
Hazari, S.S., Alam, L., Al Goni, N., et al. : Designing a sign language translation system using kinect motion sensor device. In: 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 344–349 (2017). IEEE
Putra, Z.P., Anasanti, M.D., Priambodo, B.: Designing translation tool: Between sign language to spoken text on kinect time series data using dynamic time warping. Sinergi (2018)
Pezzuoli, F., Tafaro, D., Pane, M., Corona, D., Corradini, M.L.: Development of a new sign language translation system for people with autism spectrum disorder. Advances in Neurodevelopmental Disorders 4 (4), 439–446 (2020)
Pezzuoli, F., Corona, D., Corradini, M.L., Cristofaro, A.: Development of a wearable device for sign language translation, 115–126 (2019)
Ab Majid, N.K., Norddin, N., Jaffar, K., Jaafar, R., Abd Halim, A.A., Ahmad, E.Z., dan Elektronik, F.T.K.E.: Development of a wearable glove for a sign language translation. Proceedings of Mechanical Engineering Research Day 2020, 263–265 (2020)
Neo, K.C., Ibrahim, H.: Development of sign signal translation system based on altera’s fpga de2 board. International Journal of Human Computer Interaction (IJHCI) 2 (3), 101 (2011)
Park, H., Lee, Y., Ko, J.: Enabling real-time sign language translation on mobile platforms with on-board depth cameras. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5 (2), 1–30 (2021)
Reis, L.S., de Araújo, T.M.U., Aguiar, Y.P.C., Lima, M.A.C.B.: Evaluating machine translation systems for brazilian sign language in the treatment of critical grammatical aspects. In: Proceedings of the 19th Brazilian Symposium on Human Factors in Computing Systems, pp. 1–6 (2020)
Madushanka, A., Senevirathne, R., Wijesekara, L., Arunatilake, S., Sandaruwan, K.: Framework for sinhala sign language recognition and translation using a wearable armband. In: 2016 Sixteenth International Conference on Advances in ICT for Emerging Regions (ICTer), pp. 49–57 (2016). IEEE
Estrada Jiménez, L.A., Benalcázar, M.E., Sotomayor, N.: Gesture recognition and machine learning applied to sign language translation. In: VII Latin American Congress on Biomedical Engineering CLAIB 2016, Bucaramanga, Santander, Colombia, October 26th-28th, 2016, pp. 233–236 (2017). Springer
Verma, H.V., Aggarwal, E., Chandra, S.: Gesture recognition using kinect for sign language translation. In: 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013), pp. 96–100 (2013). IEEE
Fu, Q., Shen, J., Zhang, X., Wu, Z., Zhou, M.: Gesture recognition with kinect for automated sign language translation. J. Beijing Normal Univ.(Nat. Sci.) 49(6), 586–587 (2013)
Nagpal, A., Singha, K., Gouri, R., Noor, A., Bagwari, A.: Hand sign translation to audio message and text message: A device. In: 2020 12th International Conference on Computational Intelligence and Communication Networks (CICN), pp. 243–245 (2020). IEEE
Osman, M.N., Sedek, K.A., Zain, N.Z.M., Karim, M.A.N.A., Maghribi, M.: Hearing assistive technology: Sign language translation application for hearing-impaired communication, 1–11 (2020)
Jose, M.J., Priyadharshni, V., Anand, M.S., Kumaresan, A., Mo-hanKumar, N.: Indian sign language (isl) translation system for sign language learning. International Journal of Innovative Research and Development 2 (5), 358–367 (2013)
Wilson, B.J., Anspach, G.: Neural networks for sign language translation. In: Applications of Artificial Neural Networks IV, vol. 1965, pp. 589–599 (1993). SPIE
Raziq, N., Latif, S.: Pakistan sign language recognition and translation system using leap motion device. In: International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, pp. 895–902 (2016). Springer
Akmeliawati, R., Ooi, M.P.-L., Kuang, Y.C.: Real-time malaysian sign language translation using colour segmentation and neural network. In: 2007 IEEE Instrumentation & Measurement Technology Conference IMTC 2007, pp. 1–6 (2007). IEEE
Pansare, J., Rampurkar, K.S., Mahamane, P.L., Baravkar, R.J., Lanjewar, S.V.: Real-time static devnagri sign language translation using histogram. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) 2 (4), 1455–1459 (2011)
Praveena, S., Jayasri, C.: Recognition and translation of indian sign language for deaf and dumb people. International Journal Of Information And Computing Science 6 (2019)
He, S.: Research of a sign language translation system based on deep learning. In: 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), pp. 392–396 (2019). IEEE
Elsayed, E.K., Fathy, D.R.: Sign language semantic translation system using ontology and deep learning. International Journal of Advanced Computer Science and Applications 11 (2020)
Sharma, A., Panda, S., Verma, S.: Sign language to speech translation. In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–8 (2020). IEEE
Harini, R., Janani, R., Keerthana, S., Madhubala, S., Venkatasubramanian, S.: Sign language translation. In: 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 883–886 (2020). IEEE
Fernando, P., Wimalaratne, P.: Sign language translation approach to sinhalese language. GSTF Journal on Computing (JoC) 5 (1), 1–9 (2016)
Abe, M., Sakou, H., Sagawa, H.: Sign language translation based on syntactic and semantic analysis. Systems and computers in Japan 25 (6), 91–103 (1994)
Khan, M., Siddiqui, N., et al. : Sign language translation in urdu/hindi through microsoft kinect. In: IOP Conference Series: Materials Science and Engineering, vol. 899, p. 012016 (2020). IOP Publishing
Antony, R., Paul, S., Alex, S., et al.: Sign language translation system. International Journal of Scientific Research & Engineering Trends 6 (2020)
Shi, G., Li, Z., Tu, K., Jia, S., Cui, Q., Jin, Y.: Sign language translation system based on micro-inertial measurement units and zigbee network. Trans. Inst. Meas. Control. 35 (7), 901–909 (2013)
Ohki, M., Sagawa, H., Hataoka, N., Fujisawa, H.: Sign language translation system using pattern recognition and synthesis. Hitachi review 44 (4), 251–254 (1995)
Wu, C.-H., Chiu, Y.-H., Cheng, K.-W.: Sign language translation using an error tolerant retrieval algorithm. In: Seventh International Conference on Spoken Language Processing (2002)
Abiyev, R.H., Arslan, M., Idoko, J.B.: Sign language translation using deep convolutional neural networks. KSII Transactions on Internet and Information Systems (TIIS) 14 (2), 631–653 (2020)
Khan, S.A., Ansari, Z.A., Singh, R., Rawat, M.S., Khan, F.Z., Yadav, S.K.: Sign translation via natural language processing. population 4 , 5
Vachirapipop, M., Soymat, S., Tiraronnakul, W., Hnoohom, N.: Sign translation with myo armbands. In: 2017 21st International Computer Science and Engineering Conference (ICSEC), pp. 1–5 (2017). IEEE
Sapkota, B., Gurung, M.K., Mali, P., Gupta, R.: Smart glove for sign language translation using arduino. In: 1st KEC Conference Proceedings, vol. 1, pp. 5–11 (2018)
Chanda, P., Auephanwiriyakul, S., Theera-Umpon, N.: Thai sign language translation system using upright speed-up robust feature and c-means clustering. In: 2012 IEEE International Conference on Fuzzy Systems, pp. 1–6 (2012). IEEE
Chanda, P., Auephanwiriyakul, S., Theera-Umpon, N.: Thai sign language translation system using upright speed-up robust feature and dynamic time warping. In: 2012 IEEE International Conference on Computer Science and Automation Engineering (CSAE), vol. 2, pp. 70–74 (2012). IEEE
Phitakwinai, S., Auephanwiriyakul, S., Theera-Umpon, N.: Thai sign language translation using fuzzy c-means and scale invariant feature transform. In: International Conference on Computational Science and Its Applications, pp. 1107–1119 (2008). Springer
Auephanwiriyakul, S., Phitakwinai, S., Suttapak, W., Chanda, P., Theera-Umpon, N.: Thai sign language translation using scale invariant feature transform and hidden markov models. Pattern Recogn. Lett. 34 (11), 1291–1298 (2013)
Tumsri, J., Kimpan, W.: Thai sign language translation using leap motion controller. In: Proceedings of the International Multiconference of Engineers and Computer Scientists, pp. 46–51 (2017)
Maharjan, P., Bhatta, T., Park, J.Y.: Thermal imprinted self-powered triboelectric flexible sensor for sign language translation. In: 2019 20th International Conference on Solid-State Sensors, Actuators and Microsystems & Eurosensors XXXIII (TRANSDUCERS & EUROSENSORS XXXIII), pp. 385–388 (2019). IEEE
Izzah, A., Suciati, N.: Translation of sign language using generic fourier descriptor and nearest neighbour. International Journal on Cybernetics and Informatics 3 (1), 31–41 (2014)
Mean Foong, O., Low, T.J., La, W.W.: V2s: Voice to sign language translation system for malaysian deaf people. In: International Visual Informatics Conference, pp. 868–876 (2009). Springer
Jenkins, J., Rashad, S.: Leapasl: A platform for design and implementation of real time algorithms for translation of american sign language using personal supervised machine learning models. Software Impacts 12 , 100302 (2022)
Natarajan, B., Rajalakshmi, E., Elakkiya, R., Kotecha, K., Abraham, A., Gabralla, L.A., Subramaniyaswamy, V.: Development of an end-to-end deep learning framework for sign language recognition, translation, and video generation. IEEE Access 10 , 104358–104374 (2022)
Axyonov, A.A., Kagirov, I.A., Ryumin, D.A.: A method of multimodal machine sign language translation for natural human-computer interaction. Journal Scientific and Technical Of Information Technologies, Mechanics and Optics 139 (3), 585 (2022)
Chattopadhyay, M., Parulekar, M., Bhat, V., Raisinghani, B., Arya, S.: Sign language translation using a chrome extension for google meet. In: 2022 IEEE Region 10 Symposium (TENSYMP), pp. 1–5 (2022). IEEE
Wilson, E.J., Anspach, G.: Applying neural network developments to sign language translation. In: Neural Networks for Signal Processing III-Proceedings of the 1993 IEEE-SP Workshop, pp. 301–310 (1993). IEEE
Wang, S., Guo, D., Zhou, W.-g., Zha, Z.-J., Wang, M.: Connectionist temporal fusion for sign language translation. In: Proceedings of the 26th ACM International Conference on Multimedia, pp. 1483–1491 (2018)
Guo, D., Zhou, W., Li, H., Wang, M.: Hierarchical lstm for sign language translation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)
Guo, D., Tang, S., Wang, M.: Connectionist temporal modeling of video and language: a joint model for translation and sign labeling. In: IJCAI, pp. 751–757 (2019)
Guo, D., Wang, S., Tian, Q., Wang, M.: Dense temporal convolution network for sign language translation. In: IJCAI, pp. 744–750 (2019)
Elons, A.S., Ahmed, M., Shedid, H.: Facial expressions recognition for arabic sign language translation. In: 2014 9th International Conference on Computer Engineering & Systems (ICCES), pp. 330–335 (2014). IEEE
Wurm, S.: Finding the bones for the skeleton: A case of developing sign language translation practices. In: The Third Community Interpreting Research Seminar in Ireland (2011)
Fei, B., Jiwei, H., Xuemei, J., Ping, L.: Gesture recognition for sign language video stream translation. In: 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE), pp. 1315–1319 (2020). IEEE
Boulares, M., Jemni, M.: Learning sign language machine translation based on elastic net regularization and latent semantic analysis. Artif. Intell. Rev. 46 (2), 145–166 (2016)
Werapan, W., Chotikakamthorn, N.: Improved dynamic gesture segmentation for thai sign language translation. In: Proceedings 7th International Conference on Signal Processing, 2004. Proceedings. ICSP’04. 2004., vol. 2, pp. 1463–1466 (2004). IEEE
Wu, C., Pan, C., Jin, Y., Sun, S., Shi, G.: Improvement of chinese sign language translation system based on collaboration of arm and finger sensing nodes. In: 2016 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), pp. 474–478 (2016). IEEE
Tu, K., Pan, C., Zhang, J., Jin, Y., Wang, J., Shi, G.: Improvement of chinese sign language translation system based on multi-node micro inertial measurement unit. In: 2015 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), pp. 1781–1786 (2015). IEEE
Pezzuoli, F., Corona, D., Corradini, M.L.: Improvements in a wearable device for sign language translation. In: International Conference on Applied Human Factors and Ergonomics, pp. 70–81 (2019). Springer
Sagawa, H., Sakiyama, T., Oohira, E., Sakou, H., Abe, M.: Prototype sign language translation system. In: Proceedings of IISF/ACM Japan International Symposium, pp. 152–153 (1994)
Song, P., Guo, D., Xin, H., Wang, M.: Parallel temporal encoder for sign language translation. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 1915–1919 (2019). IEEE
Feng, S., Yuan, T.: Sign language translation based on new continuous sign language dataset. In: 2022 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), pp. 491–494 (2022). IEEE
Yin, Q., Tao, W., Liu, X., Hong, Y.: Neural sign language translation with sf-transformer. In: 2022 the 6th International Conference on Innovation in Artificial Intelligence (ICIAI), pp. 64–68 (2022)
Samonte, M.J.C., Guingab, C.J.M., Relayo, R.A., Sheng, M.J.C., Tamayo, J.R.D.: Using deep learning in sign language translation to text
Zhou, Z., Tam, V.W., Lam, E.Y.: A portable sign language collection and translation platform with smart watches using a blstm-based multi-feature framework. Micromachines 13 (2), 333 (2022)
Tang, S., Guo, D., Hong, R., Wang, M.: Graph-based multimodal sequential embedding for sign language translation. IEEE Transactions on Multimedia (2021)
Nunnari, F., España-Bonet, C., Avramidis, E.: A data augmentation approach for sign-language-to-text translation in-the-wild. In: 3rd Conference on Language, Data and Knowledge (LDK 2021) (2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik
Borgotallo, R., Marino, C., Piccolo, E., Prinetto, P., Tiotto, G., Rossini, M.: A multilanguage database for supporting sign language translation and synthesis. In: sign-langLREC2010, pp. 23–26 (2010). European Language Resources Association (ELRA)
Morrissey, S.: An asessment of appropriate sign language representation for machine translation in the heathcare domain. In: Sign Language Corpora: Linguistics Issues Workshop (2009). Citeseer
Halawani, S.M.: Arabic sign language translation system on mobile devices. IJCSNS International Journal of Computer Science and Network Security 8 (1), 251–256 (2008)
Kaczmarek, M., Filhol, M.: Assisting sign language translation: what interface given the lack of written form and the spatial grammar? In: Translating and the Computer (2019)
Baumgärtner, L., Jauss, S., Maucher, J., Zimmermann, G.: Automated sign language translation: The role of artificial intelligence now and in the future. In: CHIRA, pp. 170–177 (2020)
Morrissey, S., Somers, H., Smith, R., Gilchrist, S., Dandapat, S.: Building a sign language corpus for use in machine translation. Corpora and Sign Language Technologies, Representation and Processing of Sign Languages (2010)
Grif, M.G., Korolkova, O.O., Demyanenko, Y.A., Tsoy, E.B.: Computer sign language translation system for hearing impaired users. In: 2012 7th International Forum on Strategic Technology (IFOST), pp. 1–4 (2012). IEEE
Kaczmarek, M., Filhol, M.: Computer-assisted sign language translation: a study of translators’ practice to specify cat software. Mach. Transl. 35 (3), 305–322 (2021)
Duarte, A.C.: Cross-modal neural sign language translation. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 1650–1654 (2019)
Kim, J., Hasimoto, A., Aoki, Y., Burger, A.: Design of a sign-language translation system between the japanese-korean by java-lifo language. In: Proceedings of IEEE. IEEE Region 10 Conference. TENCON 99.’Multimedia Technology for Asia-Pacific Information Infrastructure’(Cat. No. 99CH37030), vol. 1, pp. 423–426 (1999). IEEE
Ali, S.F., Mishra, G.S., Sahoo, A.K.: Domain bounded english to indian sign language translation model. International Journal of Computer Science and Informatics 3 (1), 41–45 (2013)
Ward, A., Escudeiro, N., Escudeiro, P.: Insights into the complexities of communication and automated sign language translation from the i-ace project. In: 2019 29th Annual Conference of the European Association for Education in Electrical and Information Engineering (EAEEIE), pp. 1–5 (2019). IEEE
Hodorogea, V., et al.: Intersemiotics in contemporary advertising. from sign translation to meaning coherence. Professional Communication and Translation Studies (8), 45–55 (2015)
Kawano, S., Izumi, C., Kurokawa, T., Morimoto, K.: Japanese jsl translation and searching display conditions for expressing easy-to-understand sign animation. In: International Conference on Computers for Handicapped Persons, pp. 667–674 (2006). Springer
Barberis, D., Garazzino, N., Prinetto, P., Tiotto, G., Savino, A., Shoaib, U., Ahmad, N.: Language resources for computer assisted translation from italian to italian sign language of deaf people. In: Proceedings of Accessibility Reaching Everywhere AEGIS Workshop and International Conference, pp. 96–104 (2011)
Kau, L.-J., Zhuo, B.-X.: Live demo: A real-time portable sign language translation system. In: 2016 IEEE Biomedical Circuits and Systems Conference (BioCAS), pp. 134–134 (2016). IEEE
Boulares, M., Jemni, M.: Mobile sign language translation system for deaf community. In: Proceedings of the International Cross-disciplinary Conference on Web Accessibility, pp. 1–4 (2012)
Wolfe, R., Efthimiou, E., Glauert, J., Hanke, T., McDonald, J., Schnepp, J.: recent advances in sign language translation and avatar technology. Univ. Access Inf. Soc. 15 (4), 485–486 (2016)
Liu, Z., Zhang, X., Kato, J.: Research on chinese-japanese sign language translation system. In: 2010 Fifth International Conference on Frontier of Computer Science and Technology, pp. 640–645 (2010). IEEE
Parton, B.S.: Sign language recognition and translation: A multidisciplined approach from the field of artificial intelligence. J. Deaf Stud. Deaf Educ. 11 (1), 94–101 (2006)
Wolfe, R.: Sign language translation and avatar technology. Mach. Transl. 35 (3), 301–304 (2021)
Grover, Y., Aggarwal, R., Sharma, D., Gupta, P.K.: Sign language translation systems for hearing/speech impaired people: a review. In: 2021 International Conference on Innovative Practices in Technology and Management (ICIPTM), pp. 10–14 (2021). IEEE
Camgöz, N.C., Varol, G., Albanie, S., Fox, N., Bowden, R., Zisserman, A., Cormier, K.: Slrtp 2020: The sign language recognition, translation & production workshop. In: European Conference on Computer Vision, pp. 179–185 (2020). Springer
Nguyen, T.B.D., Phung, T.-N.: Some issues on syntax transformation in vietnamese sign language translation. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY 17 (5), 292–297 (2017)
Van Zijl, L., Olivrin, G.: South african sign language assistive translation. In: Proceedings of IASTED International Conference on Assistive Technologies, Page [no Page Numbers], Baltimore, MD (2008). Citeseer
Van Zijl, L., Barker, D.: South african sign language machine translation system. In: Proceedings of the 2nd International Conference on Computer Graphics, Virtual Reality, Visualisation and Interaction in Africa, pp. 49–52 (2003)
Cox, S., Lincoln, M., Tryggvason, J., Nakisa, M., Wells, M., Tutt, M., Abbott, S.: The development and evaluation of a speech-to-sign translation system to assist transactions. International Journal of Human-Computer Interaction 16 (2), 141–161 (2003)
Murtagh, I., Nogales, V.U., Blat, J.: Sign language machine translation and the sign language lexicon: A linguistically informed approach. In: Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), pp. 240–251 (2022)
Jang, J.Y., Park, H.-M., Shin, S., Shin, S., Yoon, B., Gweon, G.: Automatic gloss-level data augmentation for sign language translation. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 6808–6813 (2022)
Shterionov, D., De Sisto, M., Vandeghinste, V., Brady, A., De Coster, M., Leeson, L., Blat, J., Picron, F., Scipioni, M., Parikh, A., et al. : Sign language translation: Ongoing development, challenges and innovations in the signon project. In: 23rd Annual Conference of the European Association for Machine Translation, pp. 323–324 (2022)
Huerta-Enochian, M., Lee, D.H., Myung, H.J., Byun, K.S., Lee, J.W.: Kosign sign language translation project: Introducing the niasl2021 dataset. In: Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives, pp. 59–66 (2022)
Efthimiou, E., Fotinea, S.-E., Hanke, T., McDonald, J.C., Shterionov, D., Wolfe, R.: Proceedings of the 7th international workshop on sign language translation and avatar technology: The junction of the visual and the textual: Challenges and perspectives. In: Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives (2022)
Bertin-Lemée, É., Braffort, A., Challant, C., Danet, C., Dauriac, B., Filhol, M., Martinod, E., Segouat, J.: Rosetta-lsf: an aligned corpus of french sign language and french for text-to-sign translation. In: 13th Conference on Language Resources and Evaluation (LREC 2022) (2022)
Kahlon, N.K., Singh, W.: Machine translation from text to sign language: a systematic review. Universal Access in the Information Society, 1–35 (2021)
Fang, B., Co, J., Zhang, M.: Deepasl: Enabling ubiquitous and non-intrusive word and sentence-level sign language translation. In: Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems, pp. 1–13 (2017)
Guo, D., Zhou, W., Li, A., Li, H., Wang, M.: Hierarchical recurrent deep fusion using adaptive clip summarization for sign language translation. IEEE Trans. Image Process. 29 , 1575–1590 (2019). https://doi.org/10.1109/TIP.2019.2941267
Article MathSciNet Google Scholar
Xu, W., Ying, J., Yang, H., Liu, J., Hu, X.: Residual spatial graph convolution and temporal sequence attention network for sign language translation. Multimedia Tools and Applications, 1–25 (2022)
Gu, Y., Zheng, C., Todoh, M., Zha, F.: American sign language translation using wearable inertial and electromyography sensors for tracking hand movements and facial expressions. Frontiers in Neuroscience 16 (2022)
Zhang, Q., Jing, J., Wang, D., Zhao, R.: Wearsign: Pushing the limit of sign language translation using inertial and emg wearables. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6 (1), 1–27 (2022)
Forster, J., Schmidt, C., Hoyoux, T., Koller, O., Zelle, U., Piater, J.H., Ney, H.: Rwth-phoenix-weather: A large vocabulary sign language recognition and translation corpus. In: LREC, vol. 9, pp. 3785–3789 (2012)
Othman, A., Jemni, M.: English-asl gloss parallel corpus 2012: Aslg-pc12. In: 5th Workshop on the Representation and Processing of Sign Languages: Interactions Between Corpus and Lexicon LREC (2012)
Hilzensauer, M., Krammer, K.: A multilingual dictionary for sign languages:“spreadthesign”. ICERI2015 Proceedings, 7826–7834 (2015)
Huang, J., Zhou, W., Zhang, Q., Li, H., Li, W.: Video-based sign language recognition without temporal segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)
Esplà-Gomis, M., Forcada, M., Ramírez-Sánchez, G., Hoang, H.T.: Paracrawl: Web-scale parallel corpora for the languages of the eu. In: MTSummit (2019)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009). Ieee
Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., Natsev, P., et al.: The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)
Li, D., Rodriguez, C., Yu, X., Li, H.: Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1459–1469 (2020)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. Int. J. Comput. Vision 1 (4), 321–331 (1988)
Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421. Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/D15-1166
Hanke, T., Schulder, M., Konrad, R., Jahn, E.: Extending the Public DGS Corpus in size and depth. In: Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives, pp. 75–82. European Language Resources Association (ELRA), Marseille, France (2020). https://aclanthology.org/2020.signlang-1.12
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Le Scao, T., Gugger, S., Drame, M., Lhoest, Q., Rush, A.M.: Transformers: State-of-the-Art Natural Language Processing (2020). https://www.aclweb.org/anthology/2020.emnlp-demos.6
Liu, Y., Gu, J., Goyal, N., Li, X., Edunov, S., Ghazvininejad, M., Lewis, M., Zettlemoyer, L.: Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics 8 , 726–742 (2020)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR
Callison-Burch, C., Osborne, M., Koehn, P.: Re-evaluating the role of BLEU in machine translation research. In: 11th Conference of the European Chapter of the Association for Computational Linguistics (2006)
De Meulder, M.: Is “good enough” good enough? ethical and responsible development of sign language technologies. In: Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), pp. 12–22. Association for Machine Translation in the Americas, Virtual (2021). https://aclanthology.org/2021.mtsummit-at4ssl.2
Vanmassenhove, E., Shterionov, D., Way, A.: Lost in translation: Loss and decay of linguistic richness in machine translation. In: Proceedings of Machine Translation Summit XVII Volume 1: Research Track, pp. 222–232. European Association for Machine Translation, Dublin, Ireland (2019). https://www.aclweb.org/anthology/W19-6622
Vanmassenhove, E., Shterionov, D., Gwilliam, M.: Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, EACL, pp. 2203–2213 (2021). Association for Computational Linguistics. https://aclanthology.org/2021.eacl-main.188/
De Coster, M., Van Herreweghe, M., Dambre, J.: Isolated sign recognition from rgb video using pose flow and self-attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3441–3450 (2021). https://doi.org/10.1109/CVPRW53098.2021.00383
Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Pires, B., Guo, Z., Azar, M., et al. : Bootstrap your own latent: A new approach to self-supervised learning. In: Neural Information Processing Systems (2020)
Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the International Conference on Computer Vision (ICCV) (2021)
Baevski, A., Zhou, Y., Mohamed, A., Auli, M.: wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems 33 (2020)
Barrault, L., Bojar, O., Costa-Jussa, M.R., Federmann, C., Fishel, M., Graham, Y.: Findings of the 2019 conference on machine translation (wmt19). (2019). Association for Computational Linguistics (ACL)
Download references
Acknowledgments
The authors wish to thank the anonymous reviewers for their detailed evaluation of our manuscript.
Mathieu De Coster’s research is funded by the Research Foundation Flanders (FWO Vlaanderen): file number 77410. This work has been conducted within the SignON project. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101017255.
Author information
Authors and Affiliations
IDLab-AIRO, Ghent University - imec, Technologiepark-Zwijnaarde 126, Ghent, 9052, Belgium
Mathieu De Coster & Joni Dambre
Tilburg University, Warandelaan 2, Tilburg, 5037 AB, Netherlands
Dimitar Shterionov
Ghent University, Blandijnberg 2, Ghent, 9000, Belgium
Mieke Van Herreweghe
Corresponding author
Correspondence to Mathieu De Coster.
Ethics declarations
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article was revised due to an update to Tables 6 and 7.
Supplementary Information
Below is the link to the electronic supplementary material.
10209_2023_992_MOESM1_ESM.xlsx
Supplementary file 1 provides an overview of all search results and the articles considered after applying the inclusion criteria. (64 KB)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
About this article
De Coster, M., Shterionov, D., Van Herreweghe, M. et al. Machine translation from signed to spoken languages: state of the art and challenges. Univ Access Inf Soc 23 , 1305–1331 (2024). https://doi.org/10.1007/s10209-023-00992-1
Accepted: 21 March 2023
Published: 01 April 2023
Issue Date: August 2024
DOI: https://doi.org/10.1007/s10209-023-00992-1
- Sign language
- Computer vision
- Machine translation
- Deep learning
- Literature review
Computer Science > Computation and Language
Title: A Survey on Spoken Language Understanding: Recent Advances and New Frontiers
Abstract: Spoken Language Understanding (SLU) aims to extract the semantic frame of user queries and is a core component of task-oriented dialog systems. With the rise of deep neural networks and the evolution of pre-trained language models, SLU research has achieved significant breakthroughs. However, there remains no comprehensive survey summarizing existing approaches and recent trends, which motivated the work presented in this article. In this paper, we survey recent advances and new frontiers in SLU. Specifically, we give a thorough review of the field, covering: (1) a new taxonomy, providing a new perspective on SLU, including single models vs. joint models, implicit vs. explicit joint modeling within joint models, and non-pre-trained vs. pre-trained paradigms; (2) new frontiers, covering emerging areas in complex SLU and their corresponding challenges; and (3) abundant open-source resources: to help the community, we have collected and organized the related papers, baseline projects and a leaderboard on a public website where SLU researchers can directly access recent progress. We hope that this survey can shed light on future research in SLU.
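To make the taxonomy in the abstract more concrete, the sketch below (not taken from the survey) illustrates what "explicit joint modeling" typically means: one shared utterance encoder feeds both an utterance-level intent classifier and a token-level slot tagger, trained with a combined loss. It is a minimal PyTorch example; all class counts, dimensions and names are invented for illustration.

```python
# Minimal, illustrative sketch of explicit joint intent-slot modeling for SLU.
# Assumptions: vocabulary size, label counts and the LSTM encoder are placeholders,
# standing in for whatever encoder a given system actually uses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointSLU(nn.Module):
    def __init__(self, vocab_size=10000, hidden=256, n_intents=20, n_slots=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.intent_head = nn.Linear(2 * hidden, n_intents)  # utterance-level intent
        self.slot_head = nn.Linear(2 * hidden, n_slots)      # per-token slot tags

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))        # (batch, seq, 2 * hidden)
        intent_logits = self.intent_head(states.mean(dim=1))   # pooled utterance representation
        slot_logits = self.slot_head(states)                   # one prediction per token
        return intent_logits, slot_logits

# Toy usage: a batch of 2 utterances, 8 tokens each, with random labels.
model = JointSLU()
tokens = torch.randint(0, 10000, (2, 8))
intent_logits, slot_logits = model(tokens)
loss = F.cross_entropy(intent_logits, torch.tensor([3, 7])) + \
       F.cross_entropy(slot_logits.reshape(-1, 50), torch.randint(0, 50, (16,)))
loss.backward()  # the shared encoder receives gradients from both tasks
```

By contrast, a "single model" approach would train the intent and slot predictors separately, and a pre-trained paradigm would replace the LSTM encoder with a pre-trained language model.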
Language and Power
- Sik Hung Ng, Department of Psychology, Renmin University of China
- Fei Deng, School of Foreign Studies, South China Agricultural University
- https://doi.org/10.1093/acrefore/9780190228613.013.436
- Published online: 22 August 2017
Five dynamic language–power relationships in communication have emerged from critical language studies, sociolinguistics, conversation analysis, and the social psychology of language and communication. Two of them stem from preexisting powers behind language that it reveals and reflects, thereby transferring the extralinguistic powers to the communication context. Such powers exist at both the micro and macro levels. At the micro level, the power behind language is a speaker’s possession of a weapon, money, high social status, or other attractive personal qualities—by revealing them in convincing language, the speaker influences the hearer. At the macro level, the power behind language is the collective power (ethnolinguistic vitality) of the communities that speak the language. The dominance of English as a global language and international lingua franca, for example, has less to do with its linguistic quality and more to do with the ethnolinguistic vitality of English-speakers worldwide that it reflects. The other three language–power relationships refer to the powers of language that are based on a language’s communicative versatility and its broad range of cognitive, communicative, social, and identity functions in meaning-making, social interaction, and language policies. Such language powers include, first, the power of language to maintain existing dominance in legal, sexist, racist, and ageist discourses that favor particular groups of language users over others. Another language power is its immense impact on national unity and discord. The third language power is its ability to create influence through single words (e.g., metaphors), oratories, conversations and narratives in political campaigns, emergence of leaders, terrorist narratives, and so forth.
- power behind language
- power of language
- intergroup communication
- World Englishes
- oratorical power
- conversational power
- leader emergence
- al-Qaeda narrative
- social identity approach
Introduction
Language Is for Communication and Power
Language is a natural human system of conventionalized symbols that have understood meanings. Through it humans express and communicate their private thoughts and feelings as well as enact various social functions. The social functions include co-constructing social reality between and among individuals, performing and coordinating social actions such as conversing, arguing, cheating, and telling people what they should or should not do. Language is also a public marker of ethnolinguistic, national, or religious identity, so strong that people are willing to go to war for its defense, just as they would defend other markers of social identity, such as their national flag. These cognitive, communicative, social, and identity functions make language a fundamental medium of human communication. Language is also a versatile communication medium, often and widely used in tandem with music, pictures, and actions to amplify its power. Silence, too, adds to the force of speech when it is used strategically to speak louder than words. The wide range of language functions and its versatility combine to make language powerful. Even so, this is only one part of what is in fact a dynamic relationship between language and power. The other part is that there is preexisting power behind language which it reveals and reflects, thereby transferring extralinguistic power to the communication context. It is thus important to delineate the language–power relationships and their implications for human communication.
This chapter provides a systematic account of the dynamic interrelationships between language and power, not comprehensively for lack of space, but sufficiently focused so as to align with the intergroup communication theme of the present volume. The term “intergroup communication” will be used herein to refer to an intergroup perspective on communication, which stresses intergroup processes underlying communication and is not restricted to any particular form of intergroup communication such as interethnic or intergender communication, important though they are. It echoes the pioneering attempts to develop an intergroup perspective on the social psychology of language and communication behavior made by pioneers drawn from communication, social psychology, and cognate fields (see Harwood et al., 2005 ). This intergroup perspective has fostered the development of intergroup communication as a discipline distinct from and complementing the discipline of interpersonal communication. One of its insights is that apparently interpersonal communication is in fact dynamically intergroup (Dragojevic & Giles, 2014 ). For this and other reasons, an intergroup perspective on language and communication behavior has proved surprisingly useful in revealing intergroup processes in health communication (Jones & Watson, 2012 ), media communication (Harwood & Roy, 2005 ), and communication in a variety of organizational contexts (Giles, 2012 ).
The major theoretical foundation that has underpinned the intergroup perspective is social identity theory (Tajfel, 1982 ), which continues to service the field as a metatheory (Abrams & Hogg, 2004 ) alongside relatively more specialized theories such as ethnolinguistic identity theory (Harwood et al., 1994 ), communication accommodation theory (Palomares et al., 2016 ), and self-categorization theory applied to intergroup communication (Reid et al., 2005 ). Against this backdrop, this chapter will be less concerned with any particular social category of intergroup communication or variant of social identity theory, and more with developing a conceptual framework of looking at the language–power relationships and their implications for understanding intergroup communication. Readers interested in an intra- or interpersonal perspective may refer to the volume edited by Holtgraves ( 2014a ).
Conceptual Approaches to Power
Bertrand Russell, logician cum philosopher and social activist, published a relatively little-known book on power when World War II was looming large in Europe (Russell, 2004). In it he asserted the fundamental importance of the concept of power in the social sciences and likened its importance to the concept of energy in the physical sciences. But unlike physical energy, which can be defined in a formula (e.g., E = mc²), social power has defied any such definition. This state of affairs is not unexpected because the very nature of (social) power is elusive. Foucault (1979, p. 92) has put it this way: “Power is everywhere, not because it embraces everything, but because it comes from everywhere.” This view is not beyond criticism, but it does highlight the elusiveness of power. Power is also a value-laden concept meaning different things to different people. To functional theorists and power-wielders, power is “power to,” a responsibility to unite people and do good for all. To conflict theorists and those who are dominated, power is “power over,” which corrupts and is a source of social conflict rather than integration (Lenski, 1966; Sassenberg et al., 2014). These entrenched views surface in management–labor negotiations and political debates between government and opposition. Management and government would try to frame the negotiation in terms of “power to,” whereas labor and opposition would try to frame the same in “power over,” in a clash of power discourses. The two discourses also interchange when the same speakers reverse their power relations: while in opposition, politicians adhere to “power over” rhetoric; once in government, they talk “power to.” And vice versa.
The elusive and value-laden nature of power has led to a plurality of theoretical and conceptual approaches. Five approaches that are particularly pertinent to the language–power relationships will be discussed, and briefly so because of space limitation. One approach views power in terms of structural dominance in society by groups who own and/or control the economy, the government, and other social institutions. Another approach views power as the production of intended effects by overcoming resistance that arises from objective conflict of interests or from psychological reactance to being coerced, manipulated, or unfairly treated. A complementary approach, represented by Kurt Lewin’s field theory, takes the view that power is not the actual production of effects but the potential for doing this. It looks behind power to find out the sources or bases of this potential, which may stem from the power-wielders’ access to the means of punishment, reward, and information, as well as from their perceived expertise and legitimacy (Raven, 2008 ). A fourth approach views power in terms of the balance of control/dependence in the ongoing social exchange between two actors that takes place either in the absence or presence of third parties. It provides a structural account of power-balancing mechanisms in social networking (Emerson, 1962 ), and forms the basis for combining with symbolic interaction theory, which brings in subjective factors such as shared social cognition and affects for the analysis of power in interpersonal and intergroup negotiation (Stolte, 1987 ). The fifth, social identity approach digs behind the social exchange account, which has started from control/dependence as a given but has left it unexplained, to propose a three-process model of power emergence (Turner, 2005 ). According to this model, it is psychological group formation and associated group-based social identity that produce influence; influence then cumulates to form the basis of power, which in turn leads to the control of resources.
Common to the five approaches above is the recognition that power is dynamic in its usage and can transform from one form of power to another. Lukes ( 2005 ) has attempted to articulate three different forms or faces of power called “dimensions.” The first, behavioral dimension of power refers to decision-making power that is manifest in the open contest for dominance in situations of objective conflict of interests. Non-decision-making power, the second dimension, is power behind the scene. It involves the mobilization of organizational bias (e.g., agenda fixing) to keep conflict of interests from surfacing to become public issues and to deprive oppositions of a communication platform to raise their voices, thereby limiting the scope of decision-making to only “safe” issues that would not challenge the interests of the power-wielder. The third dimension is ideological and works by socializing people’s needs and values so that they want the wants and do the things wanted by the power-wielders, willingly as their own. Conflict of interests, opposition, and resistance would be absent from this form of power, not because they have been maneuvered out of the contest as in the case of non-decision-making power, but because the people who are subject to power are no longer aware of any conflict of interest in the power relationship, which may otherwise ferment opposition and resistance. Power in this form can be exercised without the application of coercion or reward, and without arousing perceived manipulation or conflict of interests.
Language–Power Relationships
As indicated in the chapter title, discussion will focus on the language–power relationships, and not on language alone or power alone, in intergroup communication. It draws from all the five approaches to power and can be grouped for discussion under the power behind language and the power of language. In the former, language is viewed as having no power of its own and yet can produce influence and control by revealing the power behind the speaker. Language also reflects the collective/historical power of the language community that uses it. In the case of modern English, its preeminent status as a global language and international lingua franca has shaped the communication between native and nonnative English speakers because of the power of the English-speaking world that it reflects, rather than because of its linguistic superiority. In both cases, language provides a widely used conventional means to transfer extralinguistic power to the communication context. Research on the power of language takes the view that language has power of its own. This power allows a language to maintain the power behind it, unite or divide a nation, and create influence.
In Figure 1 we have grouped the five language–power relationships into five boxes. Note that the boundary between any two boxes is not meant to be rigid but permeable. For example, by revealing the power behind a message (box 1), a message can create influence (box 5). As another example, language does not passively reflect the power of the language community that uses it (box 2), but also, through its spread to other language communities, generates power to maintain its preeminence among languages (box 3). This expansive process of language power can be seen in the rise of English to global language status. A similar expansive process also applies to a particular language style that first reflects the power of the language subcommunity who uses the style, and then, through its common acceptance and usage by other subcommunities in the country, maintains the power of the subcommunity concerned. A prime example of this type of expansive process is linguistic sexism, which reflects preexisting male dominance in society and then, through its common usage by both sexes, contributes to the maintenance of male dominance. Other examples are linguistic racism and the language style of the legal profession, each of which, like linguistic sexism and the preeminence of the English language worldwide, has considerable impact on individuals and society at large.
Space precludes a full discussion of all five language–power relationships. Instead, some of them will warrant only a brief mention, whereas others will be presented in greater detail. The complexity of the language–power relations and their cross-disciplinary ramifications will be evident in the multiple sets of interrelated literatures that we cite from. These include the social psychology of language and communication, critical language studies (Fairclough, 1989 ), sociolinguistics (Kachru, 1992 ), and conversation analysis (Sacks et al., 1974 ).
Figure 1. Power behind language and power of language.
Power Behind Language
Language Reveals Power
When negotiating with police, a gang may issue the threatening message, “Meet our demands, or we will shoot the hostages!” The threatening message may succeed in coercing the police to submit; its power, however, is more apparent than real because it is based on the guns the gangsters possess. The message merely reveals the power of a weapon in their possession. Apart from revealing power, the gangsters may also cheat. As long as the message comes across as credible and convincing enough to arouse overwhelming fear, it would allow them to get away with their demands without actually possessing any weapon. In this case, language is used to produce an intended effect despite resistance, by deceptively revealing a nonexistent power base and planting it in the mind of the message recipient. The literature on linguistic deception illustrates the widespread deceptive use of language-reveals-power to produce intended effects despite resistance (Robinson, 1996).
Language Reflects Power
Ethnolinguistic Vitality
The language that a person uses reflects the language community’s power. A useful way to think about a language community’s linguistic power is through the ethnolinguistic vitality model (Bourhis et al., 1981; Harwood et al., 1994). Language communities in a country vary in absolute size overall and, just as important, in relative numeric concentration in particular regions. Francophone Canadians, though fewer than Anglophone Canadians overall, are concentrated in Quebec to give them the power of numbers there. Similarly, ethnic minorities in mainland China have considerable power of numbers in those autonomous regions where they are concentrated, such as Inner Mongolia, Tibet, and Xinjiang. Collectively, these factors form the demographic base of the language community’s ethnolinguistic vitality, an index of the community’s relative linguistic dominance. Another base of ethnolinguistic vitality is institutional representation of the language community in government, legislatures, education, religion, the media, and so forth, which affords its members institutional leadership, influence, and control. Such institutional representation is often reinforced by a language policy that installs the language as the nation’s sole official language. The third base of ethnolinguistic vitality comprises the sociohistorical and cultural status of the language community inside the nation and internationally. In short, the dominant language of a nation is one that comes from and reflects the high ethnolinguistic vitality of its language community.
An important finding of ethnolinguistic vitality research is that it is perceived vitality, and not so much its objective demographic-institutional-cultural strengths, that influences language behavior in interpersonal and intergroup contexts. Interestingly, the visibility and salience of languages shown on public and commercial signs, referred to as the “linguistic landscape,” serve important informational and symbolic functions as a marker of their relative vitality, which in turn affects the use of in-group language in institutional settings (Cenoz & Gorter, 2006 ; Landry & Bourhis, 1997 ).
World Englishes and Lingua Franca English
Another field of research on the power behind and reflected in language is “World Englishes.” At the height of the British Empire, English spread on the back of the Industrial Revolution and through large-scale migrations of Britons to the “New World,” which has since become the core of an “inner circle” of traditional native English-speaking nations now led by the United States (Kachru, 1992). The emergent wealth and power of these nations have maintained English despite the decline of the British Empire after World War II. In the post-war era, English has become internationalized with the support of “outer circle” nations and, later, through its spread to “expanding circle” nations. Outer circle nations are made up mostly of former British colonies such as India, Pakistan, and Nigeria. In compliance with colonial language policies that institutionalized English as the new colonial national language, a sizeable proportion of the colonial populations has learned and continued using English over generations, thereby vastly increasing the number of English speakers over and above those in the inner circle nations. The expanding circle encompasses nations where English has played no historical government role, but which are keen to appropriate English as the preeminent foreign language for local purposes such as national development, internationalization of higher education, and participation in globalization (e.g., China, Indonesia, South Korea, Japan, Egypt, Israel, and continental Europe).
English is becoming a global language with official or special status in at least 75 countries (British Council, n.d. ). It is also the language choice in international organizations and companies, as well as academia, and is commonly used in trade, international mass media, and entertainment, and over the Internet as the main source of information. English native speakers can now follow the worldwide English language track to find jobs overseas without having to learn the local language and may instead enjoy a competitive language advantage where the job requires English proficiency. This situation is a far cry from the colonial era when similar advantages had to come under political patronage. Alongside English native speakers who work overseas benefitting from the preeminence of English over other languages, a new phenomenon of outsourcing international call centers away from the United Kingdom and the United States has emerged (Friginal, 2007 ). Callers can find the information or help they need from people stationed in remote places such as India or the Philippines where English has penetrated.
As English spreads worldwide, it has also become the major international lingua franca, serving some 800 million multilinguals in Asia alone, and numerous others elsewhere (Bolton, 2008). The practical importance of this phenomenon and its impact on English vocabulary, grammar, and accent have led to the emergence of a new field of research called “English as a lingua franca” (Brosch, 2015). The twin developments of World Englishes and lingua franca English raise interesting and important research questions. A vast area of research awaits.
Several lines of research suggest themselves from an intergroup communication perspective. How communicatively effective are English native speakers who are international civil servants in organizations such as the UN and WTO, where they habitually speak as if they were addressing their fellow natives without accommodating to the international audience? Another line of research is lingua franca English communication between two English nonnative speakers. Their common use of English signals a joint willingness of linguistic accommodation, motivated more by communication efficiency of getting messages across and less by concerns of their respective ethnolinguistic identities. An intergroup communication perspective, however, would sensitize researchers to social identity processes and nonaccommodation behaviors underneath lingua franca communication. For example, two nationals from two different countries, X and Y, communicating with each other in English are accommodating on the language level; at the same time they may, according to communication accommodation theory, use their respective X English and Y English for asserting their ethnolinguistic distinctiveness whilst maintaining a surface appearance of accommodation. There are other possibilities. According to a survey of attitudes toward English accents, attachment to “standard” native speaker models remains strong among nonnative English speakers in many countries (Jenkins, 2009 ). This suggests that our hypothetical X and Y may, in addition to asserting their respective Englishes, try to outperform one another in speaking with overcorrect standard English accents, not so much because they want to assert their respective ethnolinguistic identities, but because they want to project a common in-group identity for positive social comparison—“We are all English-speakers but I am a better one than you!”
Many expanding circle nations are keen to appropriate English for local purposes, encouraging their students, and especially their educational elites, to learn English as a foreign language. A prime example is the Learn-English Movement in China. It has affected generations of students and teachers over the past 30 years and consumed a vast amount of resources. The results are mixed. More disturbingly, discontent and backlash have emerged from anti-English Chinese motivated to protect the vitality and cultural values of the Chinese language (Sun et al., 2016). The power behind and reflected in modern English has widespread and far-reaching consequences that are in need of more systematic research.
Power of Language
Language Maintains Existing Dominance
Language maintains and reproduces existing dominance in three different ways represented respectively by the ascent of English, linguistic sexism, and legal language style. For reasons already noted, English has become a global language, an international lingua franca, and an indispensable medium for nonnative English speaking countries to participate in the globalized world. Phillipson ( 2009 ) referred to this phenomenon as “linguistic imperialism.” It is ironic that as the spread of English has increased the extent of multilingualism of non-English-speaking nations, English native speakers in the inner circle of nations have largely remained English-only. This puts pressure on the rest of the world to accommodate them in English, the widespread use of which maintains its preeminence among languages.
A language evolves and changes to adapt to socially accepted word meanings, grammatical rules, accents, and other manners of speaking. What is acceptable or unacceptable reflects common usage, and hence the numerical influence of users, but also the particular language preferences and communication styles of elites. Research on linguistic sexism has shown, for example, that a man-made language such as English (there are many others) is imbued with sexist words and grammatical rules that reflect historical male dominance in society. Its routine and uncritical usage by both sexes in daily life has in turn naturalized male dominance and the associated sexist inequalities (Spender, 1998). Other similar examples are racist (Reisigl & Wodak, 2005) and ageist (Ryan et al., 1995) language styles.
Professional languages are made by and for particular professions such as the legal profession (Danet, 1980; Mertz et al., 2016; O’Barr, 1982). Legal language is used not only among members of the profession, but also with the general public, who may know each and every word in a legal document yet still be unable to decipher its meaning. Through its language, the legal profession maintains its professional dominance with the complicity of the general public, who submit to the use of the language and accede to the profession’s authority in interpreting its meanings in matters relating to their legal rights and obligations. Not only is communication between lawyers and their “clients” problematic, but the public’s continual dependence on legal language also contributes to the maintenance of the profession’s dominance.
Language Unites and Divides a Nation
A nation of many peoples who, despite their diverse cultural and ethnic backgrounds, all speak the same tongue and write in the same script would reap the benefit of the unifying power of a common language. The power of the language to unite peoples is stronger still when it has become part of their common national identity and contributed to its vitality and psychological distinctiveness. Such power has often been seized upon by national leaders and intellectuals to unify their countries and serve other nationalistic purposes (Patten, 2006). In China, for example, Emperor Qin Shi Huang standardized the Chinese script (hanzi) as an important part of the reforms to unify the country after he had defeated the other states and brought the Warring States Period (475–221 BC) to an end. A similar reform of language standardization was set in motion soon after the overthrow of the Qing Dynasty (AD 1644–1911), by simplifying some of the hanzi and promoting Putonghua as the national standard oral language. In the postcolonial world, language is often used in the service of nationalism by restoring an indigenous language to official status as the national language while retaining the colonial language or, in more radical cases of decolonization, relegating the latter to nonofficial status. Yet language is a two-edged sword: it can also divide a nation. The tension can be seen in competing claims to official-language status made by minority language communities, protests over the maintenance of minority languages, language rights at schools and in courts of law, bilingual education, and outright language wars (Calvet, 1998; DeVotta, 2004).
Language Creates Influence
In this section we discuss the power of language to create influence through single words and more complex linguistic structures ranging from oratories and conversations to narratives/stories.
Power of Single Words
Learning a language empowers humans to master an elaborate system of conventions that associate words and their sounds, on the one hand, with the categories of objects and relations to which they refer, on the other. After mastering the referential meanings of words, a person can mentally access the objects and relations simply by hearing or reading the words. Apart from their referential meanings, words also have connotative meanings with their own social-cognitive consequences. Together, these social-cognitive functions underpin the power of single words, which has been most extensively studied in metaphors, a huge research area that crosses disciplinary boundaries and probes into the inner workings of the brain (Benedek et al., 2014; Landau et al., 2014; Marshal et al., 2007). The power of single words extends beyond metaphors. It can be seen in misleading words in leading questions (Loftus, 1975), concessive connectives that reverse expectations from real-world knowledge (Xiang & Kuperberg, 2014), verbs that attribute implicit causality to either the verb’s subject or object (Hartshorne & Snedeker, 2013), “uncertainty terms” that hedge potentially face-threatening messages (Holtgraves, 2014b), and abstract words that signal power (Wakslak et al., 2014).
The literature on the power of single words has rarely been applied to intergroup communication, with the exception of research arising from the linguistic category model (e.g., Semin & Fiedler, 1991 ). The model distinguishes among descriptive action verbs (e.g., “hits”), interpretative action verbs (e.g., “hurts”) and state verbs (e.g., “hates”), which increase in abstraction in that order. Sentences made up of abstract verbs convey more information about the protagonist, imply greater temporal and cross-situational stability, and are more difficult to disconfirm. The use of abstract language to represent a particular behavior will attribute the behavior to the protagonist rather than the situation and the resulting image of the protagonist will persist despite disconfirming information, whereas the use of concrete language will attribute the same behavior more to the situation and the resulting image of the protagonist will be easier to change. According to the linguistic intergroup bias model (Maass, 1999 ), abstract language will be used to represent positive in-group and negative out-group behaviors, whereas concrete language will be used to represent negative in-group and positive out-group behaviors. The combined effects of the differential use of abstract and concrete language would, first, lead to biased attribution (explanation) of behavior privileging the in-group over the out-group, and second, perpetuate the prejudiced intergroup stereotypes. More recent research has shown that linguistic intergroup bias varies with the power differential between groups—it is stronger in high and low power groups than in equal power groups (Rubini et al., 2007 ).
Oratorical Power
A charismatic speaker may, by the sheer force of oratory, buoy up people’s hopes, convert their hearts from hatred to forgiveness, or embolden them to take up arms for a cause. One may recall moving speeches (in English) such as Susan B. Anthony’s “On Women’s Right to Vote,” Winston Churchill’s “We Shall Fight on the Beaches,” Mahatma Gandhi’s “Quit India,” or Martin Luther King, Jr.’s “I Have a Dream.” The speech may be delivered face-to-face to an audience, or broadcast over the media. The discussion below focuses on face-to-face oratories in political meetings.
Oratorical power may be measured in terms of money donated or pledged to the speaker’s cause, or, in a religious sermon, the number of converts made. Not much research has been reported on these topics. Another measurement approach is to count the frequency of real-time audience responses that a speech generates, usually but not exclusively in the form of applause. Audience applause can be measured fairly objectively in terms of frequency, length, or loudness, and collected nonobtrusively from a public recording of the meeting. Audience applause affords researchers the opportunity to explore the communicative and social psychological processes that underpin some aspects of the power of rhetorical formats. Note, however, that not all incidences of audience applause are valid measures of the power of rhetoric. A valid incidence is one that is invited by the speaker and synchronized with the flow of the speech, occurring at the appropriate time and place as indicated by the rhetorical format. Thus, an uninvited incidence of applause does not count, nor does one that is invited but occurs “out of place” (too soon or too late). Furthermore, not all valid incidences are theoretically informative to the same degree. An isolated round of applause from just a handful of audience members, though valid and in the right place, has relatively little theoretical import for understanding the power of rhetoric compared to one made by many acting in unison as a group. When the latter occurs, it is a clear indication of the power of rhetorically formulated speech. Such positive audience response constitutes the most direct and immediate means by which an audience can display its collective support for the speaker, something it would not otherwise show to a speech of less power. To influence and orchestrate hundreds or thousands of people in the audience to precisely coordinate their applause (and cheers) as a group at the right time and place is no mean feat. Such a feat also influences the wider society through broadcast on television and other news and social media. The combined effect can be enormous there and then, and its downstream influence far-reaching, crossing country borders and inspiring generations to come.
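As an illustration of how such validity criteria might be operationalised, the sketch below scores recorded applause incidences against a list of rhetorical completion points. It is a hypothetical illustration only: the event structure, the two-second synchrony window, and the threshold for counting applause as “collective” are assumptions, not parameters taken from the studies cited here.

```python
# Illustrative sketch only: scoring applause incidences against invitation points.
# The data structure, 2-second window, and group-size threshold are hypothetical
# assumptions, not drawn from Heritage & Greatbatch (1986) or any other cited study.

from dataclasses import dataclass

@dataclass
class Applause:
    onset: float      # seconds from the start of the speech
    duration: float   # seconds
    n_clappers: int   # rough estimate of how many audience members join in

def score_applause(applause_events, completion_points, window=2.0, group_size=10):
    """Count 'valid' incidences (applause beginning within `window` seconds of a
    rhetorical completion point) and flag those made by many acting in unison."""
    valid, collective = 0, 0
    for ev in applause_events:
        if any(abs(ev.onset - cp) <= window for cp in completion_points):
            valid += 1
            if ev.n_clappers >= group_size:
                collective += 1
    return valid, collective

events = [Applause(onset=35.2, duration=9.2, n_clappers=200),   # invited, collective
          Applause(onset=110.0, duration=1.0, n_clappers=3)]    # isolated, uninvited
completion_points = [35.0, 80.0]                                # e.g. ends of contrasts/lists

print(score_applause(events, completion_points))                # -> (1, 1)
```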
To accomplish the feat, an orator has to excite the audience to applaud, build up the excitement to a crescendo, and simultaneously cue the audience to synchronize their outburst of stored-up applause with the ongoing speech. Rhetorical formats that aid the orator to accomplish the dual functions include contrast, list, puzzle solution, headline-punchline, position-taking, and pursuit (Heritage & Greatbatch, 1986 ). To illustrate, we cite the contrast and list formats.
A contrast, or antithesis, is made up of binary schemata such as “too much” and “too little.” Heritage and Greatbatch ( 1986 , p. 123) reported the following example:
Governments will argue that resources are not available to help disabled people. The fact is that too much is spent on the munitions of war, and too little is spent on the munitions of peace [italics added].

As the audience is familiar with the binary schema of “too much” and “too little,” they can habitually match the second half of the contrast against the first half. This decoding process reinforces message comprehension and helps them to correctly anticipate and applaud at the completion point of the contrast. In the example quoted above, the speaker micropaused for 0.2 seconds after the second word “spent,” at which point the audience began to applaud in anticipation of the completion point of the contrast, and applauded more excitedly upon hearing “. . . on the munitions of peace.” The applause continued for a full 9.2 seconds.
A list is usually made up of a series of three parallel words, phrases or clauses. “Government of the people, by the people, for the people” is a fine example, as is Obama’s “It’s been a long time coming, but tonight, because of what we did on this day , in this election , at this defining moment , change has come to America!” (italics added) The three parts in the list echo one another, step up the argument and its corresponding excitement in the audience as they move from one part to the next. The third part projects a completion point to cue the audience to get themselves ready to display their support via applause, cheers, and so forth. In a real conversation this juncture is called a “transition-relevance place,” at which point a conversational partner (hearer) may take up a turn to speak. A skilful orator will micropause at that juncture to create a conversational space for the audience to take up their turn in applauding and cheering as a group.
As illustrated by the two examples above, speaker and audience collaborate to transform an otherwise monological speech into a quasi-conversation, turning a passive audience into an active and supportive “conversational” partner whose synchronized responses reduce the psychological separation from the speaker and bolster the speaker’s self-confidence. Through such collectively enjoyable and emotional participation, an audience made up of formerly unconnected individuals with no strong common group identity may begin to feel “we are all one.” According to social identity theory and related theories (van Zomeren et al., 2008), the emergent group identity, politicized in the process, will in turn provide a social psychological base for collective social action. This process of identity making in the audience is further strengthened by the speaker’s frequent use of “we” as a first-person plural pronoun.
Conversational Power
A conversation is a speech exchange system in which the length and order of speaking turns have not been preassigned but require coordination on an utterance-by-utterance basis between two or more individuals. It differs from other speech exchange systems in which speaking turns have been preassigned and/or monitored by a third party, for example, job interviews and debate contests. Turn-taking, because of its centrality to conversations and the important theoretical issues that it raises for social coordination and implicit conversational conventions, has been the subject of extensive research and theorizing (Goodwin & Heritage, 1990 ; Grice, 1975 ; Sacks et al., 1974 ). Success at turn-taking is a key part of the conversational process leading to influence. A person who cannot do this is in no position to influence others in and through conversations, which are probably the most common and ubiquitous form of human social interaction. Below we discuss studies of conversational power based on conversational turns and applied to leader emergence in group and intergroup settings. These studies, as they unfold, link conversation analysis with social identity theory and expectation states theory (Berger et al., 1974 ).
A conversational turn in hand allows the speaker to influence others in two important ways. First, through current-speaker-selects-next, the speaker can influence who will speak next and, indirectly, increase the probability that he or she will regain the turn after the next. A common method for selecting the next speaker is through tag questions. The current speaker (A) may direct a tag question such as “Ya know?” or “Don’t you agree?” to a particular hearer (B), which carries the illocutionary force of selecting the addressee to be the next speaker and, simultaneously, restraining others from self-selecting. The A1B1 sequence of exchange has been found to have a high probability of extending into A1B1A2 in the next round of exchange, followed by its continuation in the form of A1B1A2B2. For example, in a six-member group, the A1B1→A1B1A2 sequence of exchange has a more than 50% chance of extending to the A1B1A2B2 sequence, which is well above chance level, considering that there are four other hearers who could intrude at either the A2 or B2 slot of turn (Stasser & Taylor, 1991). Thus speakership not only offers the current speaker the power to select the next speaker twice, but also the power to indirectly regain a turn.
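A quick calculation makes the chance baseline concrete: with four other hearers besides B able to take the B2 slot, a purely random choice of next speaker would give B only about a one-in-five chance. The minimal sketch below simulates that baseline under the purely illustrative assumption that every non-current-speaker is equally likely to self-select; the group size and trial count are arbitrary.

```python
# Illustrative sketch: chance baseline for B taking the B2 slot in a six-member
# group, assuming every non-current-speaker is equally likely to self-select
# (a simplifying assumption for illustration only).

import random

def chance_b2(n_members=6, n_trials=100_000, seed=1):
    rng = random.Random(seed)
    others = list(range(1, n_members))   # everyone except A (speaker 0); B is speaker 1
    hits = sum(rng.choice(others) == 1 for _ in range(n_trials))
    return hits / n_trials

print(f"P(B takes the B2 slot by chance) = {chance_b2():.2f}")
# Prints roughly 0.20, far below the >50% reported by Stasser & Taylor (1991).
```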
Second, a turn in hand provides the speaker with an opportunity to exercise topic control. He or she can exercise non-decision-making power by changing an unfavorable or embarrassing topic to a safer one, thereby silencing or preventing it from reaching the “floor.” Conversely, he or she can exercise decision-making power by continuing or raising a topic that is favorable to self. Or the speaker can move on to talk about an innocuous topic to ease tension in the group.
Bales (1950) studied leader emergence in groups made up of unacquainted individuals in situations where they had to bid or compete for speaking turns. Results show that the individuals who talk the most have a much better chance of becoming leaders. Depending on the social orientation of their talk, they will be recognized as a task or relational leader. Subsequent research on leader emergence has shown that an even better behavioral predictor than the volume of talk is the number of speaking turns. An obvious reason for this is that the volume of talk depends on the number of turns: it usually accumulates across turns, rather than resulting from a single extraordinarily long turn of talk. Another reason is that more turns afford the speaker more opportunities to realize the powers of turns explicated above. Group members who become leaders are the ones who can penetrate the complex, real-time conversational system to obtain a disproportionately large number of speaking turns, either by perfect timing at “transition-relevance places” to self-select as the next speaker or, paradoxical as it may seem, by constructive interruptions (Ng et al., 1995).
More recent research has extended the experimental study of group leadership to intergroup contexts, where members belonging to two groups who hold opposing stances on a social or political issue interact within and also between groups. The results showed, first, that speaking turns remain important in leader emergence, but the intergroup context now generates social identity and self-categorization processes that selectively privilege particular forms of speech. What potential leaders say, and not only how many speaking turns they have gained, becomes crucial in conveying to group members that they are prototypical members of their group. Prototypical communication is enacted by adopting an accent, choosing code words, and speaking in a tone that characterize the in-group; above all, it is enacted through the content of utterances to represent or exemplify the in-group position. Such prototypical utterances that are directed successfully at the out-group correlate strongly with leader emergence (Reid & Ng, 2000 ). These out-group-directed prototypical utterances project an in-group identity that is psychologically distinctive from the out-group for in-group members to feel proud of and to rally together when debating with the out-group.
Building on these experimental results Reid and Ng ( 2003 ) developed a social identity theory of leadership to account for the emergence and maintenance of intergroup leadership, grounding it in case studies of the intergroup communication strategies that brought Ariel Sharon and John Howard to power in Israel and Australia, respectively. In a later development, the social identity account was fused with expectation states theory to explain how group processes collectively shape the behavior of in-group members to augment the prototypical communication behavior of the emergent leader (Reid & Ng, 2006 ). Specifically, when conversational influence gained through prototypical utterances culminates to form an incipient power hierarchy, group members develop expectations of who is and will be leading the group. Acting on these tacit expectations they collectively coordinate the behavior of each other to conform with the expectations by granting incipient leaders more speaking turns and supporting them with positive audience responses. In this way, group members collectively amplify the influence of incipient leaders and jointly propel them to leadership roles (see also Correll & Ridgeway, 2006 ). In short, the emergence of intergroup leaders is a joint process of what they do individually and what group members do collectively, enabled by speaking turns and mediated by social identity and expectation states processes. In a similar vein, Hogg ( 2014 ) has developed a social identity account of leadership in intergroup settings.
Narrative Power
Narratives and stories are closely related and are sometimes used interchangeably. However, it is useful to distinguish a narrative from a story and from other related terms such as discourse and frames. A story is a sequence of related events in the past recounted for rhetorical or ideological purposes, whereas a narrative is a coherent system of interrelated and sequentially organized stories formed by incorporating new stories and relating them to others so as to provide an ongoing basis for interpreting events, envisioning an ideal future, and motivating and justifying collective actions (Halverson et al., 2011 ). The temporal dimension and sense of movement in a narrative also distinguish it from discourse and frames. According to Miskimmon, O’Loughlin, and Roselle ( 2013 ), discourses are the raw material of communication that actors plot into a narrative, and frames are the acts of selecting and highlighting some events or issues to promote a particular interpretation, evaluation, and solution. Both discourse and frame lack the temporal and causal transformation of a narrative.
Pitching narratives at the suprastory level and stressing their temporal and transformational movements allow researchers to take a structurally more systemic and temporally more expansive view than traditional research on propaganda wars between nations, religions, or political systems (Halverson et al., 2011; Miskimmon et al., 2013). Schmid (2014) has provided an analysis of al-Qaeda’s “compelling narrative that authorizes its strategy, justifies its violent tactics, propagates its ideology and wins new recruits.” According to this analysis, the chief message of the narrative is “the West is at war with Islam,” a strategic communication that is fundamentally intergroup in both structure and content. The intergroup structure of the al-Qaeda narrative includes the rhetorical constructions of a group grievance inflicted on Muslims by a Zionist–Christian alliance, a vision of the good society (under the Caliphate and sharia), and a path from grievance to the realization of the vision, led by al-Qaeda in a violent jihad to eradicate Western influence in the Muslim world. The al-Qaeda narrative draws support not only from traditional Arab and Muslim cultural narratives interpreted to justify its unorthodox means (such as attacks against women and children), but also from pre-existing anti-Semitism and anti-Americanism propagated by some Arab governments, Soviet Cold War propaganda, anti-Western sermons by Muslim clerics, and the Israeli government’s treatment of Palestinians. It is deeply embedded in culture and history, and has reached out to numerous Muslims who have emigrated to the West.
The intergroup content of the al-Qaeda narrative was shown in a computer-aided content analysis of 18 representative transcripts of propaganda speeches released between 2006 and 2011 by al-Qaeda leaders, totaling over 66,000 words (Cohen et al., 2016). As part of the study, an “Ideology Extraction using Linguistic Extremization” (IELEX) categorization scheme was developed for mapping the content of the corpus, which revealed 19 IELEX rhetorical categories referring to either the out-group/enemy or the in-group/enemy victims. The out-group/enemy was represented by four categories such as “The enemy is extremely negative (bloodthirsty, vengeful, brainwashed, etc.)”; whereas the in-group/enemy victims were represented by more categories such as “we are entirely innocent/good/virtuous.” The content of polarized intergroup stereotypes, demonizing “them” and glorifying “us,” echoes other similar findings (Smith et al., 2008), as well as the general finding of intergroup stereotyping in social psychology (Yzerbyt, 2016).
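To make the general logic of such computer-aided content analysis concrete, the sketch below counts simple keyword matches for two polarized categories. The categories and keyword lists are hypothetical illustrations loosely inspired by the examples quoted above; they are not the actual IELEX scheme or its dictionaries.

```python
# A minimal sketch of keyword-based content analysis of the kind described above.
# The two categories and their keyword lists are hypothetical illustrations; they
# are not the actual IELEX categories used by Cohen et al. (2016).

from collections import Counter
import re

CATEGORIES = {
    "out-group as negative": {"enemy", "bloodthirsty", "vengeful", "brainwashed"},
    "in-group as virtuous":  {"innocent", "good", "virtuous", "faithful"},
}

def categorise(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for category, keywords in CATEGORIES.items():
        counts[category] = sum(tok in keywords for tok in tokens)
    return counts

sample = "The enemy is bloodthirsty and vengeful, while we remain innocent and good."
print(categorise(sample))
# Counter({'out-group as negative': 3, 'in-group as virtuous': 2})
```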
The success of the al-Qaeda narrative has prompted various international agencies, individual governments, think tanks, and religious groups to spend huge sums of money on developing counternarratives that are, according to Schmid (2014), largely feeble. The so-called “global war on terror” has failed in its effort to construct effective counternarratives, even though al-Qaeda’s finances, personnel, and infrastructure have been much weakened. Ironically, it has developed into a narrative of its own, used not so much for countering external extremism as for promoting and justifying internal nationalistic extremist policies and influencing national elections. This reactive coradicalization phenomenon is spreading (Mink, 2015; Pratt, 2015; Reicher & Haslam, 2016).
Discussion and Future Directions
This chapter provides a systematic framework for understanding five language–power relationships, namely, language reveals power, reflects power, maintains existing dominance, unites and divides a nation, and creates influence. The first two relationships are derived from the power behind language and the last three from the power of language. Collectively they provide a relatively comprehensible framework for understanding the relationships between language and power, and not simply for understanding language alone or power alone separated from one another. The language–power relationships are dynamically interrelated, one influencing the other, and each can draw from an array of the cognitive, communicative, social, and identity functions of language. The framework is applicable to both interpersonal and intergroup contexts of communication, although for present purposes the latter has been highlighted. Among the substantive issues discussed in this chapter, English as a global language, oratorical and narrative power, and intergroup leadership stand out as particularly important for political and theoretical reasons.
In closing, we note some of the gaps that need to be filled and directions for further research. When discussing the powers of language to maintain and reflect existing dominance, we have omitted the countervailing power of language to resist or subvert existing dominance and, importantly, to create social change for the collective good. Furthermore, in this age of globalization and its discontents, English as a global language will increasingly be resented for its excessive, unaccommodating power despite the tangible benefits of lingua franca English, and will be challenged by the expanding ethnolinguistic vitality of peoples who speak Arabic, Chinese, or Spanish. Internet communication is no longer predominantly in English, but is rapidly diversifying to become a modern Tower of Babel. And yet we have barely scratched the surface of these issues. Other glaring gaps include the omission of media discourse and of recent developments in Corpus-based Critical Discourse Analysis (Loring, 2016), as well as the lack of reference to languages other than English that might cast one or more of the language–power relationships in a different light.
One of the main themes of this chapter, that the diverse language–power relationships are dynamically interrelated, points to the need for greater theoretical cross-fertilization across cognate disciplines. Our discussion of the three powers of language (boxes 3–5 in Figure 1) points clearly in this direction, most notably in the case of the powers of language to create influence through single words, oratories, conversations, and narratives, but much more needs to be done. The social identity approach will continue to serve as a metatheory of intergroup communication. To the extent that intergroup communication takes place within an existing power relation, and that the changes it seeks are not simply a more positive or psychologically distinctive social identity but greater group power and a more powerful social identity, the social identity approach has to incorporate power in its application to intergroup communication.
Further Reading
- Austin, J. L. (1975). How to do things with words . Oxford: Oxford University Press.
- Billig, M. (1991). Ideology and opinions: Studies in rhetorical psychology . Newbury Park, CA: SAGE.
- Crystal, D. (2012). English as a global language , 2d ed. Cambridge, U.K.: Cambridge University Press.
- Culpeper, J. (2011). Impoliteness . New York: John Wiley.
- Holtgraves, T. M. (2010). Social psychology and language: Words, utterances, and conversations. In S. Fiske , D. Gilbert , & G. Lindzey (Eds.), Handbook of social psychology (5th ed., pp. 1386–1422). New York: John Wiley.
- Mumby, D. K. (Ed.). (1993). Narrative and social control: Critical perspectives (Vol. 21). Newbury Park, CA: SAGE.
- Ng, S. H. , & Bradac, J. J. (1993). Power in language: Verbal communication and social influence . Newbury Park, CA: SAGE. Retrieved from http://dx.doi.org/10.4135/9781412994088.n202 .
- Abrams, D. , & Hogg, M. A. (2004). Metatheory: Lessons from social identity research. Personality and Social Psychology Review , 8 , 98–106.
- Bales, R. F. (1950). Interaction process analysis: A method for the study of small groups . Oxford: Addison-Wesley.
- Benedek, M. , Beaty, R. , Jauk, E. , Koschutnig, K. , Fink, A. , Silvia, P. J. , . . . & Neubauer, A. C. (2014). Creating metaphors: The neural basis of figurative language production. NeuroImage , 90 , 99–106.
- Berger, J. , Conner, T. L. , & Fisek, M. H. (Eds.). (1974). Expectation states theory: A theoretical research program . Cambridge, MA: Winthrop.
- Bolton, K. (2008). World Englishes today. In B. B. Kachru , Y. Kachru , & C. L. Nelson (Eds.), The handbook of world Englishes (pp. 240–269). Oxford: Blackwell.
- Bourhis, R. Y. , Giles, H. , & Rosenthal, D. (1981). Notes on the construction of a “Subjective vitality questionnaire” for ethnolinguistic groups. Journal of Multilingual and Multicultural Development , 2 , 145–155.
- British Council . (n.d.). Retrieved from http://www.britishcouncil.org/learning-faq-the-english-language.htm .
- Brosch, C. (2015). On the conceptual history of the term Lingua Franca. Apples: Journal of Applied Language Studies , 9 (1), 71–85.
- Calvet, J. (1998). Language wars and linguistic politics . Oxford: Oxford University Press.
- Cenoz, J. , & Gorter, D. (2006). Linguistic landscape and minority languages. International Journal of Multilingualism , 3 , 67–80.
- Cohen, S. J. , Kruglanski, A. , Gelfand, M. J. , Webber, D. , & Gunaratna, R. (2016). Al-Qaeda’s propaganda decoded: A psycholinguistic system for detecting variations in terrorism ideology . Terrorism and Political Violence , 1–30.
- Correll, S. J. , & Ridgeway, C. L. (2006). Expectation states theory . In L. DeLamater (Ed.), Handbook of social psychology (pp. 29–51). Hoboken, NJ: Springer.
- Danet, B. (1980). Language in the legal process. Law and Society Review , 14 , 445–564.
- DeVotta, N. (2004). Blowback: Linguistic nationalism, institutional decay, and ethnic conflict in Sri Lanka . Stanford, CA: Stanford University Press.
- Dragojevic, M. , & Giles, H. (2014). Language and interpersonal communication: Their intergroup dynamics. In C. R. Berger (Ed.), Handbook of interpersonal communication (pp. 29–51). Berlin: De Gruyter.
- Emerson, R. M. (1962). Power–Dependence Relations. American Sociological Review , 27 , 31–41.
- Fairclough, N. L. (1989). Language and power . London: Longman.
- Foucault, M. (1979). The history of sexuality volume 1: An introduction . London: Allen Lane.
- Friginal, E. (2007). Outsourced call centers and English in the Philippines. World Englishes , 26 , 331–345.
- Giles, H. (Ed.) (2012). The handbook of intergroup communication . New York: Routledge.
- Goodwin, C. , & Heritage, J. (1990). Conversation analysis. Annual review of anthropology , 19 , 283–307.
- Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics (pp. 41–58). New York: Academic Press.
- Halverson, J. R. , Goodall H. L., Jr. , & Corman, S. R. (2011). Master narratives of Islamist extremism . New York: Palgrave Macmillan.
- Hartshorne, J. K. , & Snedeker, J. (2013). Verb argument structure predicts implicit causality: The advantages of finer-grained semantics. Language and Cognitive Processes , 28 , 1474–1508.
- Harwood, J. , Giles, H. , & Bourhis, R. Y. (1994). The genesis of vitality theory: Historical patterns and discoursal dimensions. International Journal of the Sociology of Language , 108 , 167–206.
- Harwood, J. , Giles, H. , & Palomares, N. A. (2005). Intergroup theory and communication processes. In J. Harwood & H. Giles (Eds.), Intergroup communication: Multiple perspectives (pp. 1–20). New York: Peter Lang.
- Harwood, J. , & Roy, A. (2005). Social identity theory and mass communication research. In J. Harwood & H. Giles (Eds.), Intergroup communication: Multiple perspectives (pp. 189–212). New York: Peter Lang.
- Heritage, J. , & Greatbatch, D. (1986). Generating applause: A study of rhetoric and response at party political conferences. American Journal of Sociology , 92 , 110–157.
- Hogg, M. A. (2014). From uncertainty to extremism: Social categorization and identity processes. Current Directions in Psychological Science , 23 , 338–342.
- Holtgraves, T. M. (Ed.). (2014a). The Oxford handbook of language and social psychology . Oxford: Oxford University Press.
- Holtgraves, T. M. (2014b). Interpreting uncertainty terms. Journal of Personality and Social Psychology , 107 , 219–228.
- Jenkins, J. (2009). English as a lingua franca: interpretations and attitudes. World Englishes , 28 , 200–207.
- Jones, L. , & Watson, B. M. (2012). Developments in health communication in the 21st century. Journal of Language and Social Psychology , 31 , 415–436.
- Kachru, B. B. (1992). The other tongue: English across cultures . Urbana: University of Illinois Press.
- Landau, M. J. , Robinson, M. D. , & Meier, B. P. (Eds.). (2014). The power of metaphor: Examining its influence on social life . Washington, DC: American Psychological Association.
- Landry, R. , & Bourhis, R. Y. (1997). Linguistic landscape and ethnolinguistic vitality: An empirical study. Journal of Language and Social Psychology , 16 , 23–49.
- Lenski, G. (1966). Power and privilege: A theory of social stratification . New York: McGraw-Hill.
- Loftus, E. F. (1975). Leading questions and the eyewitness report. Cognitive Psychology , 7 , 560–572.
- Loring, A. (2016). Ideologies and collocations of “Citizenship” in media discourse: A corpus-based critical discourse analysis. In A. Loring & V. Ramanathan (Eds.), Language, immigration and naturalization: Legal and linguistic issues (chapter 9). Tonawanda, NY: Multilingual Matters.
- Lukes, S. (2005). Power: A radical view , 2d ed. New York: Palgrave.
- Maass, A. (1999). Linguistic intergroup bias: Stereotype perpetuation through language. Advances in experimental social psychology , 31 , 79–121.
- Marshal, N. , Faust, M. , Hendler, T. , & Jung-Beeman, M. (2007). An fMRI investigation of the neural correlates underlying the processing of novel metaphoric expressions. Brain and language , 100 , 115–126.
- Mertz, E. , Ford, W. K. , & Matoesian, G. (Eds.). (2016). Translating the social world for law: Linguistic tools for a new legal realism . New York: Oxford University Press.
- Mink, C. (2015). It’s about the group, not god: Social causes and cures for terrorism. Journal for Deradicalization , 5 , 63–91.
- Miskimmon, A. , O’Loughlin, B. , & Roselle, L. (2013). Strategic narratives: Communicating power and the New World Order . New York: Routledge.
- Ng, S. H. , Brooke, M. & Dunne, M. (1995). Interruptions and influence in discussion groups. Journal of Language & Social Psychology , 14 , 369–381.
- O’Barr, W. M. (1982). Linguistic evidence: Language, power, and strategy in the courtroom . London: Academic Press.
- Palomares, N. A. , Giles, H. , Soliz, J. , & Gallois, C. (2016). Intergroup accommodation, social categories, and identities. In H. Giles (Ed.), Communication accommodation theory: Negotiating personal relationships and social identities across contexts (pp. 123–151). Cambridge, U.K.: Cambridge University Press.
- Patten, A. (2006). The humanist roots of linguistic nationalism. History of Political Thought , 27 , 221–262.
- Phillipson, R. (2009). Linguistic imperialism continued . New York: Routledge.
- Pratt, D. (2015). Reactive co-radicalization: Religious extremism as mutual discontent. Journal for the Academic Study of Religion , 28 , 3–23.
- Raven, B. H. (2008). The bases of power and the power/interaction model of interpersonal influence. Analyses of Social Issues and Public Policy , 8 , 1–22.
- Reicher, S. D. , & Haslam, S. A. (2016). Fueling extremes. Scientific American Mind , 27 , 34–39.
- Reid, S. A. , Giles, H. , & Harwood, J. (2005). A self-categorization perspective on communication and intergroup relations. In J. Harwood & H. Giles (Eds.), Intergroup communication: Multiple perspectives (pp. 241–264). New York: Peter Lang.
- Reid, S. A. , & Ng, S. H. (2000). Conversation as a resource for influence: Evidence for prototypical arguments and social identification processes. European Journal of Social Psychology , 30 , 83–100.
- Reid, S. A. , & Ng, S. H. (2003). Identity, power, and strategic social categorisations: Theorising the language of leadership. In P. van Knippenberg & M. A. Hogg (Eds.), Leadership and power: Identity processes in groups and organizations (pp. 210–223). London: SAGE.
- Reid, S. A. , & Ng, S. H. (2006). The dynamics of intragroup differentiation in an intergroup social context. Human Communication Research , 32 , 504–525.
- Reisigl, M. , & Wodak, R. (2005). Discourse and discrimination: Rhetorics of racism and antisemitism . London: Routledge.
- Robinson, W. P. (1996). Deceit, delusion, and detection . Newbury Park, CA: SAGE.
- Rubini, M. , Moscatelli, S. , Albarello, F. , & Palmonari, A. (2007). Group power as a determinant of interdependence and intergroup discrimination. European Journal of Social Psychology , 37 (6), 1203–1221.
- Russell, B. (2004). Power: A new social analysis . Originally published in 1938. London: Routledge.
- Ryan, E. B. , Hummert, M. L. , & Boich, L. H. (1995). Communication predicaments of aging patronizing behavior toward older adults. Journal of Language and Social Psychology , 14 (1–2), 144–166.
- Sacks, H. , Schegloff, E. A. , & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language , 50 , 696–735.
- Sassenberg, K. , Ellemers, N. , Scheepers, D. , & Scholl, A. (2014). “Power corrupts” revisited: The role of construal of power as opportunity or responsibility. In J. -W. van Prooijen & P. A. M. van Lange (Eds.), Power, politics, and paranoia: Why people are suspicious of their leaders (pp. 73–87). Cambridge, U.K.: Cambridge University Press.
- Schmid, A. P. (2014). Al-Qaeda’s “single narrative” and attempts to develop counter-narratives: The state of knowledge . The Hague, The Netherlands: International Centre for Counter-Terrorism, 26. Available at https://www.icct.nl/download/file/A-Schmid-Al-Qaedas-Single-Narrative-January-2014.pdf .
- Semin, G. R. , & Fiedler, K. (1991). The linguistic category model, its bases, applications and range. In W. Stroebe & M. Hewstone (Eds.), European review of social psychology (Vol. 2, pp. 1–50). Chichester, U.K.: John Wiley.
- Smith, A. G. , Suedfeld, P. , Conway, L. G., III , & Winter, D. G. (2008). The language of violence: Distinguishing terrorist from non-terrorist groups by thematic content analysis. Dynamics of Asymmetric Conflict , 1 (2), 142–163.
- Spender, D. (1998). Man made language , 4th ed. London: Pandora.
- Stasser, G. , & Taylor, L. (1991). Speaking turns in face-to-face discussions. Journal of Personality & Social Psychology , 60 , 675–684.
- Stolte, J. (1987). The formation of justice norms. American Sociological Review , 52 (6), 774–784.
- Sun, J. J. M. , Hu, P. , & Ng, S. H. (2016). Impact of English on education reforms in China: With reference to the learn-English movement, the internationalisation of universities and the English language requirement in college entrance examinations . Journal of Multilingual and Multicultural Development , 1–14 (Published online January 22, 2016).
- Tajfel, H. (1982). Social psychology of intergroup relations. Annual Review of Psychology , 33 , 1–39.
- Turner, J. C. (2005). Explaining the nature of power: A three-process theory. European Journal of Social Psychology , 35 , 1–22.
- Van Zomeren, M. , Postmes, T. , & Spears, R. (2008). Toward an integrative social identity model of collective action: A quantitative research synthesis of three socio-psychological perspectives. Psychological Bulletin , 134 (4), 504–535.
- Wakslak, C. J. , Smith, P. K. , & Han, A. (2014). Using abstract language signals power. Journal of Personality and Social Psychology , 107 (1), 41–55.
- Xiang, M. , & Kuperberg, A. (2014). Reversing expectations during discourse comprehension . Language, Cognition and Neuroscience , 30 , 648–672.
- Yzerbyt, V. (2016). Intergroup stereotyping. Current Opinion in Psychology , 11 , 90–95.
Related Articles
- Language Attitudes
- Vitality Theory
- The Politics of Translation and Interpretation in International Communication
Spoken Language Identification
12 papers with code • 12 benchmarks • 4 datasets
Identify the language being spoken from an audio input only.
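For readers unfamiliar with the task, the following minimal sketch shows the typical shape of such a system: a waveform is converted to a log-mel spectrogram, pooled into a fixed-size embedding, and mapped to per-language scores. The architecture, layer sizes, and three-language label set are arbitrary illustrative choices and do not correspond to any of the benchmarked systems listed below.

```python
# Minimal illustrative sketch of a spoken language identification pipeline:
# waveform -> log-mel spectrogram -> pooled embedding -> language logits.
# The architecture, layer sizes, and 3-language label set are arbitrary choices
# for illustration; this is not one of the benchmarked systems listed below.

import torch
import torch.nn as nn
import torchaudio

LANGS = ["en", "zh", "ar"]

class TinyLID(nn.Module):
    def __init__(self, n_mels=64, n_langs=len(LANGS)):
        super().__init__()
        self.frontend = torchaudio.transforms.MelSpectrogram(
            sample_rate=16_000, n_mels=n_mels)
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                 # pool over time -> fixed-size embedding
        )
        self.classifier = nn.Linear(128, n_langs)

    def forward(self, waveform):                     # waveform: (batch, samples)
        feats = torch.log1p(self.frontend(waveform)) # (batch, n_mels, frames)
        emb = self.encoder(feats).squeeze(-1)        # (batch, 128)
        return self.classifier(emb)                  # (batch, n_langs) logits

model = TinyLID()
audio = torch.randn(1, 16_000 * 3)                  # 3 seconds of fake 16 kHz audio
probs = model(audio).softmax(dim=-1)
print(dict(zip(LANGS, probs[0].tolist())))          # untrained, so roughly uniform
```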
Most implemented papers
VoxLingua107: A Dataset for Spoken Language Recognition
Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.
Automatic Dialect Detection in Arabic Broadcast Speech
Qatar-Computing-Research-Institute/dialectID • 23 Sep 2015
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.
Language Identification Using Deep Convolutional Recurrent Neural Networks
Language Identification (LID) systems are used to classify the spoken language from a given audio sample and are typically the first step for many spoken language processing tasks, such as Automatic Speech Recognition (ASR) systems.
Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages
State-of-the-art spoken language identification (LID) systems, which are based on end-to-end deep neural networks, have shown remarkable success not only in discriminating between distant languages but also between closely-related languages or even different spoken varieties of the same language.
Triplet Entropy Loss: Improving The Generalisation of Short Speech Language Identification Systems
Even though the models trained using Triplet Entropy Loss showed a better understanding of the languages and higher accuracies, it appears as though the models still memorise word patterns present in the spectrograms rather than learning the finer nuances of a language.
BERT-LID: Leveraging BERT to Improve Spoken Language Identification
It has a profound impact on the multilingual interoperability of an intelligent speech system.
EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use
In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio.
Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices
This work introduces BRILLsson, a novel binary neural network-based representation learning model for a broad range of non-semantic speech tasks.
Improving Spoken Language Identification with Map-Mix
skit-ai/map-mix • 16 Feb 2023
The pre-trained multi-lingual XLSR model generalizes well for language identification after fine-tuning on unseen languages.
Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech
This work focuses on improving the Spoken Language Identification (LangId) system for a challenge that focuses on developing robust language identification systems that are reliable for non-standard, accented (Singaporean accent), spontaneous code-switched, and child-directed speech collected via Zoom.
I understand you feel that way, but I feel this way: the benefits of I-language and communicating perspective during conflict
Shane L. Rogers, Jill Howieson, Casey Neame
Received 2018 Feb 23; Accepted 2018 May 3; Collection date 2018.
This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
Using hypothetical scenarios, we provided participants with potential opening statements to a conflict discussion that varied on I/you language and communicated perspective. Participants rated the likelihood that the recipient of the statement would react in a defensive manner. Using I-language and communicating perspective were both found to reduce perceptions of hostility. Statements that communicated both self- and other-perspective using I-language (e.g. ‘I understand why you might feel that way, but I feel this way, so I think the situation is unfair’ ) were rated as the best strategy to open a conflict discussion. Simple acts of initial language use can reduce the chances that conflict discussion will descend into a downward spiral of hostility.
Keywords: Communicating perspective, I-statements, You-statements, Defensiveness, Hostility, Interpersonal conflict, Hypothetical scenarios, Factorial ANOVA, Perspective taking, Rating statements
Introduction
During interpersonal conflict, the initial communication style can set the scene for the remainder of the discussion ( Drake & Donohue, 1996 ). A psychological principle termed the norm of reciprocity describes a basic human tendency to match the behaviour and communication style of one’s partner during social interaction ( Park & Antonioni, 2007 ). During conflict, a hostile approach typically produces hostility in return from the other person, potentially creating a negative downward spiral ( Bowen, Winczewski & Collins, 2016 ; Park & Antonioni, 2007 ; Pike & Sillars, 1985 ; Wiebe & Zhang, 2017 ). Therefore, it is not surprising that recommendations abound in the academic and popular literature about specific communication tactics to minimise perceptions of hostility ( Bloomquist, 2012 ; Hargie, 2011 ; Heydenberk & Heydenberk, 2007 ; Howieson & Priddis, 2015 ; Kidder, 2017 ; Moore, 2014 ; Whitcomb & Whitcomb, 2013 ).
The present research assesses two specific aspects of language style that theorists have recommended as beneficial tactics for minimising hostility during conflict: the use of I-language instead of you-language, and communicating perspective ( Hargie, 2011 ). We broadly define communicating perspective as language that is clearly communicating one’s own point of view, and/or communicating an understanding of the perspective of the other person. This paper sets out to examine the relative merits of I-language and communicating perspective, both singularly, and used together, to open communication to prevent conflict escalation.
Communicating perspective—giving and taking
Consideration of the perspective of the other party is widely held to be beneficial during conflict ( Ames, 2008 ; Galinksy et al., 2008 ; Hargie, 2011 ; Howieson & Priddis, 2015 ; Kidder, 2017 ). An understanding of perspectives facilitates a more integrative approach where parties are willing to compromise to arrive at a mutually beneficial solution ( Galinksy et al., 2008 ; Kemp & Smith, 1994 ; Todd & Galinsky, 2014 ). Therefore, in dispute resolution a mediator will endeavour to encourage perspective taking by both parties ( Howieson & Priddis, 2012 , 2015 ; Ingerson, DeTienne & Liljenquist, 2015 ; Kidder, 2017 ; Lee et al., 2015 ). However, fostering perspective taking is more involved than simply telling someone to try and consider the other person’s point of view. In the negotiation/mediation literature it is recognised that increasing levels of perspective taking is best achieved via the communication process that occurs during a negotiation ( Howieson & Priddis, 2012 , 2015 ; Kidder, 2017 ).
It is important to consider that perspective taking is an internal cognitive process, so it is not readily apparent to the other party unless it is made observable via communication ( Kellas, Willer & Trees, 2013 ). Kellas, Willer & Trees (2013) refer to this as communicating perspective taking , and report that married couples typically perceive agreement, relevant contributions, coordination, and positive tone as communicated evidence of perspective taking. A more direct strategy to communicate perspective taking is to paraphrase the perspective of the other party ( Howieson & Priddis, 2015 ; Kidder, 2017 ), for example, by explicitly stating what you perceive to be the other’s point of view with something like ‘What I’m hearing is that perhaps you aren’t seeing enough evidence of appreciation and so feel like you’re being taken for granted’. Seehausen et al. (2012) demonstrated the utility of paraphrasing by interviewing participants (interviewees) about a recent social conflict while varying interviewer responses across participants between simple note taking and paraphrasing. Interviewees receiving paraphrase responses reported feeling less negative emotion associated with the conflict compared to interviewees receiving note taking responses. Similarly, other research has reported that perceptions of empathic effort during conflict are associated with relationship satisfaction ( Cohen et al., 2012 ).
While communicating perspective taking is useful during conflict, so is perspective giving (i.e. attempting to communicate one’s own position/perspective) ( Bruneau & Saxe, 2012 ; Graumann, 1989 ). For example, ‘I am not feeling great about this because I don’t feel like I am receiving a fair deal’. Perspective giving can be beneficial for the person offering the perspective to help them ‘feel heard’ ( Bruneau & Saxe, 2012 ), and also assist the other party to engage in perspective taking, which fosters a greater sense of mutual understanding ( Ames, 2008 ; Ames & Wazlawek, 2014 ). This is why negotiation experts recommend that during conflict both parties should communicate what their perspective is, and also communicate explicitly to the other person that they are attempting to consider their perspective ( Howieson & Priddis, 2015 ). However, while communication of perspective taking and giving have received research attention individually, no prior research has attempted to systematically compare them. In the present study we examine the perception of statements that vary in the extent of communicated perspective (i.e. none, self-only, other-only, self & other). By contrasting two types of communicating perspective (i.e. giving and taking) we can contribute to the research literature while also answering questions that have practical relevance for everyday communication.
I-Language and You-Language
Another aspect of language that is beneficial during conflict is the use of I-language (e.g. ‘I think things need to change’ ) versus you-language (e.g. ‘You need to change’ ) ( Hargie, 2011 ; Kubany et al., 1992a ; Simmons, Gordon & Chambless, 2005 ). For example, Simmons, Gordon & Chambless (2005) reported that a higher proportion of I-language and a lower proportion of you-language was associated with better problem solving and higher marital satisfaction. Similarly, Bieson, Schooler & Smith (2016) found that more frequent you-language during face-to-face conflict discussion was negatively associated with interaction quality of couples.
Kubany and colleagues took an experimental approach to directly assess the impact of I/you-language by examining participant ratings for statements that varied inclusion of I-language or you-language. Their research focused on the communication of emotion using I-statements (e.g. ‘I am feeling upset’ ) versus you-statements (e.g. ‘You have made me upset’ ). In a series of studies it was found that I-language was less likely to evoke negative emotions and more likely to evoke compassion and cooperative behavioural inclinations in the recipient ( Kubany et al., 1992b , 1995a , 1995b ). The benefit of I-language over you-language is that I-language communicates to the recipient that the sender acknowledges they are communicating from their own point of view and therefore they are open to negotiation ( Burr, 1990 ). Further, recipients often perceive you-language as accusatory and hostile ( Burr, 1990 ; Hargie, 2011 ; Kubany et al., 1992a ).
Furthermore, research in the field of embodied cognition suggests that text narratives written in you-language foster more self-referential processing in the reader than narratives written in I-language (Beveridge & Pickering, 2013; Brunye et al., 2009, 2011; Ditman et al., 2010). Compared with I-language, you-language has been associated with faster subsequent response times for pictures presented from a self-perspective (versus other-perspective) (Brunye et al., 2009), better memory performance (Ditman et al., 2010), and higher emotional reactivity (Brunye et al., 2011). While these studies do not directly investigate conflict, the finding that you-language fosters greater self-referential processing is relevant for conflict. As previously mentioned, during conflict the ideal scenario is for both parties to engage in mutual perspective taking to facilitate the search for a mutually beneficial solution. You-language therefore has the potential to foster an inward focus that reduces perspective taking during communication.
The present study
The present study extends earlier research investigating the impact of I/you-language and communicated perspective by employing a statement-rating design similar to the experimental research of Kubany and colleagues (Kubany et al., 1992a, 1992b, 1995a, 1995b). Based on prior research, we hypothesise that participants will rate: (1) I-language as less likely to provoke a defensive reaction than you-language (Kubany et al., 1992a, 1992b; Simmons, Gordon & Chambless, 2005); and (2) statements that communicate perspective as less likely to provoke a defensive reaction than statements that do not communicate any clear perspective (Cohen et al., 2012; Gordon & Chen, 2016; Howieson & Priddis, 2015; Kellas, Willer & Trees, 2013).
Importantly, we also varied the type of communicated perspective. As no prior research has contrasted self-oriented and other-oriented communicated perspective using a statement-rating paradigm, only tentative expectations were possible. We anticipated that a blend of one's own and the other's perspective would be received best, because this strategy conveys both understanding (by acknowledging the other) and positive assertion (by acknowledging the self) (Howieson & Priddis, 2015).
Participants
Participants were 253 university students (mean age = 28 years; SD = 8.75; 77% female). Prior to commencement of the research, ethical approval was obtained from the Edith Cowan University ethics committee (Ref: 15257). All participants supplied informed consent to take part in this research. Prior research by Kubany and colleagues using a similar statement-rating paradigm reported significant effects with relatively small samples ranging from 16 to 40 (Kubany et al., 1992a, 1992b, 1995a), with one study using a larger sample of 160 (Kubany et al., 1995b). We therefore initially aimed to collect around 100 responses, but ended up with a larger sample, which gives us greater confidence in the reliability of our results.
Six scenarios were constructed, each describing a hypothetical conflict situation. One example scenario was: Mike and his partner Lucy are living together and both working full time. Whenever Mike does some cleaning of the house and asks Lucy to help, Lucy typically replies that she is too tired after a full day at work to do cleaning. Mike feels it is unfair that he should be responsible for all the cleaning duties.
Each scenario was designed with two protagonists: an offended party (e.g. Mike) and the party causing offence (e.g. Lucy). An overall problem was communicated within the scenario description (e.g. Lucy is not helping to clean the house as often as Mike would like). In addition, both the perspective of the offended party (e.g. Mike feels it is unfair that he is responsible for all cleaning) and that of the offending party (e.g. Lucy is usually very tired after a full day at her work) were included. After reading a scenario, the participant was presented with eight statements that the offended party might use to begin a conflict discussion with the offending party. These statements varied in their use of I/you language and communicated perspective. Table 1 shows all the statements presented to participants for the example scenario.
Table 1. Example statements provided to participants.
Provided above each statement in italics is the type of perspective(s) communicated in the statement, and whether the statement is written predominantly using either I- or you-language. Note that the statement type information provided in italics in the table was not presented to participants.
The six scenarios were evenly spread across issues between romantic partners (two), friends (two), and work colleagues (two). All scenarios are provided in Supplemental Information 1 associated with this article. This research aims to investigate the general impact of subtle differences in communication rather than focusing on specific relationships or situations. The authors acknowledge that there are likely to be interesting differences across different relationships and situations, but those questions are beyond the scope of the present research.
Scenarios were presented to participants in random order, as were the statements within each scenario, via the online survey program Qualtrics. For each statement, the participant rated the likelihood that the offending party would react in a defensive manner on a six-point scale (extremely unlikely to extremely likely).
Results

Participant ratings were averaged across scenarios. This provided a composite likelihood of defensive reaction score for each of the eight statement types (see Fig. 1). Next, a 4 (communicated perspective: self and other, other-only, self-only, none) × 2 (I/you language: I-language, you-language) repeated measures factorial ANOVA was conducted on the mean composite scores. A main effect of I/you language was found, F(1,252) = 357.88, p < 0.001, ηp² = 0.59, demonstrated in Fig. 1 by the consistently lower defensiveness ratings for I-language statements compared to you-language statements. A main effect of communicated perspective was also found, F(3,756) = 364.04, p < 0.001, ηp² = 0.59. In Fig. 1 the difference in overall defensiveness ratings follows the pattern: none > self-only > other-only > self and other, all ps < 0.001.
Figure 1. Mean composite defensive rating scores for statements that varied in the type of communicated perspective, and I/you language; error bars represent 95% confidence limits.
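To make the analysis pipeline concrete, the sketch below shows how the composite scores and the 4 × 2 repeated measures ANOVA described above could be reproduced from a long-format data file. This is a minimal illustration under assumed column names (participant, scenario, perspective, language, rating) and an assumed CSV export; it is not the analysis script used for this study (the raw data are supplied in SPSS and Excel formats in the Supplemental Files).

```python
# Minimal sketch (not the authors' script): composite scores and a
# 4 (perspective) x 2 (I/you language) repeated measures ANOVA.
# Assumed long-format columns: participant, scenario, perspective, language, rating.
import pandas as pd
import pingouin as pg

ratings = pd.read_csv("statement_ratings.csv")  # hypothetical export of the raw ratings

# Average over the six scenarios to obtain one composite defensiveness score
# per participant for each of the eight statement types.
composite = (
    ratings
    .groupby(["participant", "perspective", "language"], as_index=False)["rating"]
    .mean()
)

# Two-way repeated measures ANOVA with both factors within participants;
# pingouin reports partial eta squared (np2) for each effect.
aov = pg.rm_anova(
    data=composite,
    dv="rating",
    within=["perspective", "language"],
    subject="participant",
)
print(aov[["Source", "ddof1", "ddof2", "F", "p-unc", "np2"]])
```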
A significant interaction effect was also found between I/you language and communicated perspective, F(3,756) = 77.12, p < 0.001, ηp² = 0.23. This interaction occurred because, while there was a significant difference between I- and you-statements for each perspective type (all ps < 0.001), the effect size was similar across the self-and-other (r = 0.72), other-only (r = 0.74), and self-only (r = 0.68) perspective types but substantially lower when no perspective was communicated (r = 0.47). This suggests that when I-language is used in conjunction with communicated perspective there is a larger benefit of incorporating I-language than when no perspective is communicated.
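The simple-effects comparisons underlying this interaction can be sketched in a similar way: paired t-tests of I- versus you-language within each perspective level, with each t statistic converted to an effect size r via the common relation r = sqrt(t² / (t² + df)). The data layout and factor labels below are again assumptions for illustration, not the study's actual data structure or analysis code.

```python
# Illustrative simple-effects follow-up (assumed data layout, not the authors' script):
# paired t-tests of I- vs you-language within each perspective level, with the
# effect size computed as r = sqrt(t^2 / (t^2 + df)).
import numpy as np
import pandas as pd
from scipy import stats

# Assumed columns: participant, perspective, language ("I" or "you"), rating.
composite = pd.read_csv("composite_scores.csv")

for level, grp in composite.groupby("perspective"):
    # One composite score per participant per language condition within this level.
    wide = grp.pivot(index="participant", columns="language", values="rating")
    t, p = stats.ttest_rel(wide["I"], wide["you"])
    df = len(wide) - 1
    r = np.sqrt(t ** 2 / (t ** 2 + df))
    print(f"{level}: t({df}) = {t:.2f}, p = {p:.3g}, r = {r:.2f}")
```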
Discussion

The present study investigated whether subtle changes in language can influence the perceived impact of statements used to open a conflict discussion. As expected, participants rated statements that contained I-language as less likely to evoke a defensive reaction than statements that contained you-language. This result is consistent with earlier findings reporting a superiority of I-language over you-language for conflict communication (Biesen, Schooler & Smith, 2016; Kubany et al., 1992a, 1992b, 1995a, 1995b; Simmons, Gordon & Chambless, 2005).
In the present study, the benefit of I-language over you-language was larger for statements that communicated one or more perspectives. For example, relative to simple statements such as 'Lucy, I think you should help with the cleaning' versus 'Lucy, you should help with the cleaning', there was a more pronounced benefit for I-language when the statements acknowledged the perspective of the offending party, as in 'Lucy, I understand that you are very tired after work, but I think you should help with the cleaning' versus 'Lucy, you are very tired after work, but you should help with the cleaning'.
Another expectation was that participants would rate communicating perspective more favourably compared to no communication of perspective. Results supported this expectation, which is consistent with prior recommendations that there is a benefit to communicating perspective during conflict and tense situations ( Ames, 2008 ; Cohen et al., 2012 ; Gordon & Chen, 2016 ; Howieson & Priddis, 2015 ; Kellas, Willer & Trees, 2013 ; Kidder, 2017 ). Furthermore, the results revealed that when communicating perspective, the participants rated communicating both the self- and other-perspective as the most favourable, followed by communicating the other-perspective only, and finally the self-perspective only.
The present study is the first to directly compare ratings across statements that vary in the type of perspective communicated. These results suggest that as an individual act, communicating the perspective of the other is more important than communicating the perspective of the self. However, results also suggest that a combination of communicating both the perspective of the self and other is the best strategy for opening a conflict discussion.
In sum, the results suggest that we are more likely to receive a defensive and hostile reaction when we do not communicate any type of perspective, regardless of whether we use I-language or you-language. On the other hand, if we communicate using statements that include both the perspective of the self and the other person, and include I-language, then we are less likely to receive a defensive response. For example, the statement, ‘Lucy, I understand that you are very tired after work, but I feel it is unfair that I have to do all the cleaning by myself, and I think you should help with the cleaning’ , which includes both self and other perspectives and I-language, was rated as the least likely to produce a defensive response.
Limitations
Scholars have recognised that the non-interactive rating statements paradigm used in the present study limits the generalisability of findings beyond non-interactive communication contexts ( Bippus & Young, 2005 ; Kubany et al., 1992a ). Arguably, our findings reported here are more appropriately generalised to less interactive forms of communication such as text messaging and email. However, it must also be noted that the results of the present study do parallel findings investigating more interactive contexts ( Cohen et al., 2012 ; Gordon & Chen, 2016 ; Simmons, Gordon & Chambless, 2005 ).
In our study we utilised a within-participants design, where each statement was rated by all participants. A within-participants approach to a statement-rating task has been criticised because differences in the perception of statements might be exaggerated by participants engaging in a comparative process when making judgements (Bippus & Young, 2005), although there is no empirical evidence to confirm or disconfirm this assertion. In the present study, our interest was to gain insight into the relative utility of the statements used. So even if participants engaged in a comparative process, we do not believe this seriously compromises the findings, as we were interested in determining which statements participants thought were best relative to the other statements. We concede, however, that the magnitude of differences observed among statements in the present research might be smaller if a between-participants approach had been used.
Our study is limited to perceptions regarding the likelihood of a defensive reaction. We assume that underlying the broad perception provided by our participants are more specific perceptions regarding politeness, appropriateness, aggression, assertiveness, effectiveness, rationality, fairness, consideration, clarity, and so on (Bippus & Young, 2005; Hess et al., 1980; Kasper, 1990; Kubany et al., 1992b; Lewis & Gallois, 1984; Schroeder, Rakos & Moe, 1983). The purpose of our study was to broadly assess the perception of the statements, so we settled on broad terminology (i.e. rating the likelihood of a defensive response); asking participants to rate all statements on many adjectives (e.g. aggression, clarity, and so on) would have made the survey too long. Investigating these more specific perceptions, perhaps using fewer scenarios and/or statements for practical reasons, is an avenue for future research.
Future research
The present study suggests that I-language is less likely than you-language to produce a defensive reaction in a message recipient, particularly when it is combined with communication of both self and other perspectives. However, a related type of language framing outside the scope of the present study is we-language. Various researchers have suggested that we-language (e.g. 'We should talk about what we need to do to solve our problem') is of greater benefit than I-language during times of conflict or tension (Seider et al., 2009; Williams-Baucom et al., 2010). Seider et al. (2009) argue that we-language emphasises togetherness and therefore fosters more collaboration than I-language, which might instead foster a sense of separation. A replication and extension of the present study incorporating statements written predominantly in we-language would therefore be a useful test of the arguments for the benefits of we-language.
Our research has focused on opening statements to a conflict discussion. Assessing language for making closing statements would also be a useful avenue of further inquiry. Additionally, an investigation of the impact of specific statements mid-discussion would be interesting. For example, if the discussion has started badly, will the use of a targeted statement that communicates perspective using I-language ‘fall on deaf ears’? Or, as discussed by Howieson & Priddis (2015) , might communicating perspective mid-discussion halt a downwards spiral, or even direct the flow of conversation into an upwards spiral? Jameson, Sohan & Hodge (2014) describe these kinds of moments in a negotiation as ‘turning points’.
Another avenue for future research is to investigate whether communicating perspective, or I-language and you-language, have different effects across discursive actions such as requesting, describing, questioning, criticising, blaming, offering alternatives, rejecting, refusing, inviting, and so on. As an example, consider the action of making an invitation (Bella & Moser, 2018; Margutti & Galatolo, 2018). What might be the best way to solicit a commitment? A simple request using you-language (e.g. 'You should come to dinner tomorrow night'), one incorporating I-language (e.g. 'I think you should come to dinner tomorrow night'), or one communicating perspective in addition to the request (e.g. 'I think you should come to dinner tomorrow night, because I think you'll benefit from getting out of the house' or 'You should come to dinner tomorrow night, because you'll benefit from getting out of the house')? The current research could therefore be extended to further our understanding not only of how best to communicate during conflict, but also of how to be more persuasive in general.
A practical implication of this research is to provide empirical evidence to inform guidelines about how to frame opening statements during conflict. Generally, communicating some perspective (i.e. self and/or other) is better than neglecting to do so. When communicating perspective, the results of this study suggest that it is most beneficial to communicate both points of view (i.e. self and other) rather than a single perspective, using I-language.
This study has highlighted how different forms of language can interact with one another during conflict. Specifically, the study demonstrates how I-language becomes more beneficial for minimising hostility when one also communicates perspective (e.g. 'Lucy, I understand that you are very tired after work, but I feel it is unfair that I have to do all the cleaning by myself, and I think you should help with the cleaning'), compared to simple requests where no perspective is communicated (e.g. 'Lucy, I think you should help with the cleaning'). The primary mechanisms offered in the literature to explain the benefits of communicating perspective are that it fosters a greater sense of 'feeling heard' and mutual understanding (Ames, 2008; Ames & Wazlawek, 2014; Bruneau & Saxe, 2012; Howieson & Priddis, 2011; Lee et al., 2015), and that it fosters a sense of openness, transparency, and honesty that maximises perceived politeness and minimises perceived hostility (Howieson & Priddis, 2015; Ingerson, DeTienne & Liljenquist, 2015; Kellas, Willer & Trees, 2013; Kidder, 2017; Seehausen et al., 2012). The primary mechanisms offered to explain the benefit of I-language over you-language are that I-language signals a recognition that the speaker is providing a particular point of view that is open for discussion (Burr, 1990), and that you-language can at times be perceived as accusatory (Kubany et al., 1992a, 1992b). Additionally, you-language might foster an increased tendency for inward focus in one's partner (Brunye et al., 2009, 2011; Ditman et al., 2010).
How other specific forms of language (e.g. we-language, should-statements, and speaking in absolutes) interact with one another to influence perceptions during conflict is an important avenue for further study. For now, however, this research provides significant insight into how two of the most commonly cited language strategies, I-language and communicating perspective, interact with each other. This research supports the recommendations made by scholars within the communication (Hargie, 2011), developmental (Bloomquist, 2012; Heydenberk & Heydenberk, 2007), business (Whitcomb & Whitcomb, 2013), and legal (Howieson & Priddis, 2015; Kidder, 2017; Moore, 2014) fields to make use of communicative strategies such as I-language and communicating perspective during conflict to minimise hostility.
Supplemental Information
This supplementary file contains all the scenarios and statements that were used as part of this research. Participants rated each of the statements regarding the extent they believed the intended recipient would react in a defensive manner.
Raw data in SPSS format contains all participant ratings of all statements, in addition to the composite variables created from those ratings.
Raw data in Excel format; variable descriptions are provided in the second tab of the file.
Funding Statement
The authors received no funding for this work.
Additional Information and Declarations
Competing interests.
The authors declare that they have no competing interests.
Author Contributions
Shane L. Rogers conceived and designed the experiments, performed the experiments, analysed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.
Jill Howieson conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.
Casey Neame conceived and designed the experiments, performed the experiments, authored or reviewed drafts of the paper, approved the final draft.
Human Ethics
The following information was supplied relating to ethical approvals (i.e. approving body and any reference numbers):
Edith Cowan University granted ethical approval to carry out the study (Ethical Application Ref: 15257).
Data Availability
The following information was supplied regarding data availability:
The raw data are provided in the Supplemental Files .
References

- Ames (2008). Ames DR. In search of the right touch: interpersonal assertiveness in organizational life. Current Directions in Psychological Science. 2008;17(6):381–385. doi: 10.1111/j.1467-8721.2008.00610.x.
- Ames & Wazlawek (2014). Ames DR, Wazlawek AS. Pushing in the dark: causes and consequences of limited self-awareness for interpersonal assertiveness. Personality and Social Psychology Bulletin. 2014;40(6):775–790. doi: 10.1177/0146167214525474.
- Bella & Moser (2018). Bella S, Moser A. What's in a first? The link between impromptu invitations and their responses. Journal of Pragmatics. 2018;125:96–110. doi: 10.1016/j.pragma.2017.08.009.
- Beveridge & Pickering (2013). Beveridge MEL, Pickering MJ. Perspective taking in language: integrating the spatial and action domains. Frontiers in Human Neuroscience. 2013;7:1–11. doi: 10.3389/fnhum.2013.00577.
- Biesen, Schooler & Smith (2016). Biesen JN, Schooler DE, Smith DA. What a difference a pronoun makes: i/we versus you/me and worried couples' perceptions of their interaction quality. Journal of Language and Social Psychology. 2016;35(2):180–205. doi: 10.1177/0261927x15583114.
- Bippus & Young (2005). Bippus AM, Young SL. Owning your own emotions: reactions to expressions of self- versus other-attributed positive and negative emotions. Journal of Applied Communication Research. 2005;33(1):26–45. doi: 10.1080/0090988042000318503.
- Bloomquist (2012). Bloomquist ML. The Practitioner Guide to Skills Training for Struggling Kids. New York: Guilford Press; 2012.
- Bowen, Winczewski & Collins (2016). Bowen JD, Winczewski LA, Collins NL. Language style matching in romantic partners' conflict and support interactions. Journal of Language and Social Psychology. 2016;36(3):263–284. doi: 10.1177/0261927x16666308.
- Bruneau & Saxe (2012). Bruneau EG, Saxe R. The power of being heard: the benefits of 'perspective-giving' in the context of intergroup conflict. Journal of Experimental Social Psychology. 2012;48(4):855–866. doi: 10.1016/j.jesp.2012.02.017.
- Brunye et al. (2009). Brunye TT, Ditman T, Mahoney CR, Augustyn JS, Taylor HA. When you and I share perspectives: pronouns modulate perspective taking during narrative comprehension. Psychological Science. 2009;20(1):27–32. doi: 10.1111/j.1467-9280.2008.02249.x.
- Brunye et al. (2011). Brunye TT, Ditman T, Mahoney CR, Taylor HA. Better you than I: perspectives and emotion simulation during narrative comprehension. Journal of Cognitive Psychology. 2011;23(5):659–666. doi: 10.1080/20445911.2011.559160.
- Burr (1990). Burr WR. Beyond I-statements in family communication. Family Relations. 1990;39(3):266–273. doi: 10.2307/584870.
- Cohen et al. (2012). Cohen S, Schulz MS, Weiss E, Waldinger RJ. Eye of the beholder: the individual and dyadic contributions of empathic accuracy and perceived empathic effort to relationship satisfaction. Journal of Family Psychology. 2012;26(2):236–245. doi: 10.1037/a0027488.
- Ditman et al. (2010). Ditman T, Brunye TT, Mahoney CR, Taylor HA. Simulating an enactment effect: pronouns guide action simulation during narrative comprehension. Cognition. 2010;115(1):172–178. doi: 10.1016/j.cognition.2009.10.014.
- Drake & Donohue (1996). Drake LE, Donohue WA. Communicative framing theory in conflict resolution. Communication Research. 1996;23(3):297–322. doi: 10.1177/009365096023003003.
- Galinsky et al. (2008). Galinsky AD, Maddux WW, Gilin D, White JB. Why it pays to get inside the head of your opponent: the differential effects of perspective taking and empathy in negotiations. Psychological Science. 2008;19(4):378–384. doi: 10.1111/j.1467-9280.2008.02096.x.
- Gordon & Chen (2016). Gordon AM, Chen S. Do you get where I'm coming from? Perceived understanding buffers against the negative impact of conflict on relationship satisfaction. Journal of Personality and Social Psychology. 2016;110(2):239–260. doi: 10.1037/pspi0000039.
- Graumann (1989). Graumann CF. Perspective setting and taking in verbal interaction. North-Holland Linguistic Series: Linguistic Variations. 1989;54:95–122. doi: 10.1016/b978-0-444-87144-2.50007-0.
- Hargie (2011). Hargie O. Skilled Interpersonal Communication. New York: Routledge; 2011.
- Hess et al. (1980). Hess EP, Bridgwater CA, Bornstein PH, Sweeney TM. Situational determinants in the perception of assertiveness: gender-related influences. Behavior Therapy. 1980;11(1):49–58. doi: 10.1016/s0005-7894(80)80035-9.
- Heydenberk & Heydenberk (2007). Heydenberk W, Heydenberk R. More than manners: conflict resolution in primary level classrooms. Early Childhood Education Journal. 2007;35(2):119–126. doi: 10.1007/s10643-007-0185-4.
- Howieson & Priddis (2011). Howieson J, Priddis L. Building resilience for separating parents through mentalizing and constructive lawyering techniques. Psychiatry, Psychology and Law. 2011;18(2):202–211. doi: 10.1080/13218711003739532.
- Howieson & Priddis (2012). Howieson J, Priddis L. Mentalising in mediation: towards an understanding of the mediation shift. Australasian Dispute Resolution Journal. 2012;23:52–60.
- Howieson & Priddis (2015). Howieson J, Priddis L. A mentalizing-based approach to family mediation: harnessing our fundamental capacity to resolve conflict and building an evidence-based practice for the field. Family Court Review. 2015;53(1):79–95. doi: 10.1111/fcre.12132.
- Ingerson, DeTienne & Liljenquist (2015). Ingerson M-C, DeTienne KB, Liljenquist KA. Beyond instrumentalism: a relational approach to negotiation. Negotiation Journal. 2015;31(1):31–46. doi: 10.1111/nejo.12078.
- Jameson, Sohan & Hodge (2014). Jameson JK, Sohan D, Hodge J. Turning points and conflict transformation in mediation. Negotiation Journal. 2014;30(2):209–229. doi: 10.1111/nejo.12056.
- Kasper (1990). Kasper G. Linguistic politeness: current research issues. Journal of Pragmatics. 1990;14(2):193–218.
- Kellas, Willer & Trees (2013). Kellas JK, Willer EK, Trees AR. Communicated perspective-taking during stories of marital stress: spouses' perceptions of one another's perspective-taking behaviors. Southern Communication Journal. 2013;78(4):326–351. doi: 10.1080/1041794X.2013.815264.
- Kemp & Smith (1994). Kemp KE, Smith WP. Information exchange, toughness, and integrative bargaining: the roles of explicit cues and perspective-taking. International Journal of Conflict Management. 1994;5(1):5–21. doi: 10.1108/eb022734.
- Kidder (2017). Kidder DL. BABO negotiating: enhancing students' perspective-taking skills. Negotiation Journal. 2017;33(3):255–267. doi: 10.1111/nejo.12185.
- Kubany et al. (1995a). Kubany ES, Bauer GB, Muraoka MY, Richard DC, Read P. Impact of labeled anger and blame in intimate relationships. Journal of Social and Clinical Psychology. 1995a;14(1):53–60. doi: 10.1521/jscp.1995.14.1.53.
- Kubany et al. (1995b). Kubany ES, Bauer GB, Pangilinan ME, Muraoka MY, Enriquez VG. Impact of labeled anger and blame in intimate relationships: cross-cultural extension of findings. Journal of Cross-Cultural Psychology. 1995b;26(1):65–83. doi: 10.1177/0022022195261005.
- Kubany et al. (1992a). Kubany ES, Richard DC, Bauer GB, Muraoka MY. Impact of assertive and accusatory communication of distress and anger: a verbal component analysis. Aggressive Behavior. 1992a;18(5):337–347. doi: 10.1002/1098-2337(1992)18:5<337::aid-ab2480180503>3.0.co;2-k.
- Kubany et al. (1992b). Kubany ES, Richard DC, Bauer GB, Muraoka MY. Verbalized anger and accusatory "you" messages as cues for anger and antagonism among adolescents. Adolescence. 1992b;27(107):505–516.
- Lee et al. (2015). Lee DS, Moeller SJ, Kopelman S, Ybarra O. Biased social perceptions of knowledge: implications for negotiators' rapport and egocentrism. Negotiation and Conflict Management Research. 2015;8(2):85–99. doi: 10.1111/ncmr.12047.
- Lewis & Gallois (1984). Lewis PN, Gallois C. Disagreements, refusals, or negative feelings: perception of negatively assertive messages from friends and strangers. Behavior Therapy. 1984;15(4):353–368. doi: 10.1016/s0005-7894(84)80003-9.
- Margutti & Galatolo (2018). Margutti P, Galatolo R. Reason-for-calling invitations in Italian telephone calls: action design and recipient commitment. Journal of Pragmatics. 2018;125:76–95. doi: 10.1016/j.pragma.2017.06.017.
- Moore (2014). Moore CW. The Mediation Process: Practical Strategies for Resolving Conflict. San Francisco: Jossey-Bass; 2014.
- Park & Antonioni (2007). Park H, Antonioni D. Personality, reciprocity, and strength of conflict resolution strategy. Journal of Research in Personality. 2007;41(1):110–125. doi: 10.1016/j.jrp.2006.03.003.
- Pike & Sillars (1985). Pike GR, Sillars A. Reciprocity of marital communication. Journal of Social and Personal Relationships. 1985;2(3):303–324. doi: 10.1177/0265407585023005.
- Schroeder, Rakos & Moe (1983). Schroeder HE, Rakos RF, Moe J. The social perception of assertive behavior as a function of response class and gender. Behavior Therapy. 1983;14(4):534–544. doi: 10.1016/s0005-7894(83)80076-8.
- Seehausen et al. (2012). Seehausen M, Kazzer P, Bajbouj M, Prehn K. Effects of empathic paraphrasing—extrinsic emotion regulation in social conflict. Frontiers in Psychology. 2012;3:482. doi: 10.3389/fpsyg.2012.00482.
- Seider et al. (2009). Seider BH, Hirschberger G, Nelson KL, Levenson RW. We can work it out: age differences in relational pronouns, physiology, and behavior in marital conflict. Psychology and Aging. 2009;24(3):604–613. doi: 10.1037/a0016950.
- Simmons, Gordon & Chambless (2005). Simmons RA, Gordon PC, Chambless DL. Pronouns in marital interaction: what do "you" and "I" say about marital health? Psychological Science. 2005;16(2):932–936. doi: 10.1111/j.1467-9280.2005.01639.x.
- Todd & Galinsky (2014). Todd AR, Galinsky AD. Perspective-taking as a strategy for improving intergroup relations: evidence, mechanisms, and qualifications. Social and Personality Psychology Compass. 2014;8(7):374–387. doi: 10.1111/spc3.12116.
- Whitcomb & Whitcomb (2013). Whitcomb CA, Whitcomb LE. I, you, and the team. In: Effective Interpersonal and Team Communication Skills for Engineers. Hoboken: John Wiley & Sons, Inc; 2013. pp. 53–61.
- Wiebe & Zhang (2017). Wiebe WT, Zhang YB. Conflict initiating factors and management styles in family and nonfamily intergenerational relationships: young adults' retrospective written accounts. Journal of Language and Social Psychology. 2017;36(3):368–379. doi: 10.1177/0261927x16660829.
- Williams-Baucom et al. (2010). Williams-Baucom KJ, Atkins DC, Sevier M, Eldridge KA, Christensen A. "You" and "I" need to talk about "us": linguistic patterns in marital interactions. Personal Relationships. 2010;17(1):41–56. doi: 10.1111/j.1475-6811.2010.01251.x.