Are We There Yet? - A Systematic Literature Review on Chatbots in Education


  • 1 Information Center for Education, DIPF | Leibniz Institute for Research and Information in Education, Frankfurt am Main, Germany
  • 2 Educational Science Faculty, Open University of the Netherlands, Heerlen, Netherlands
  • 3 Computer Science Faculty, Goethe University, Frankfurt am Main, Germany

Chatbots are a promising technology with the potential to enhance workplaces and everyday life. In terms of scalability and accessibility, they also offer unique possibilities as communication and information tools for digital learning. In this paper, we present a systematic literature review investigating the areas of education where chatbots have already been applied, the pedagogical roles they take on, their use for mentoring purposes, and their potential to personalize education. To perform this literature review, we conducted a preliminary analysis of 2,678 publications, which allowed us to identify 74 relevant publications on chatbots’ application in education. Through this, we address five research questions that, together, allow us to explore the current state-of-the-art of this educational technology. We conclude our systematic review by pointing to three main research challenges: 1) Aligning chatbot evaluations with implementation objectives, 2) Exploring the potential of chatbots for mentoring students, and 3) Exploring and leveraging adaptation capabilities of chatbots. For all three challenges, we discuss opportunities for future research.

Introduction

Educational Technologies enable distance learning models and provide students with the opportunity to learn at their own pace. They have found their way into schools and higher education institutions through Learning Management Systems and Massive Open Online Courses, enabling teachers to scale up good teaching practices ( Ferguson and Sharples, 2014 ) and allowing students to access learning material ubiquitously ( Virtanen et al., 2018 ).

Despite the innovative power of educational technologies, most commonly used technologies do not substantially change teachers’ role. Typical teaching activities like providing students with feedback, motivating them, or adapting course content to specific student groups are still entrusted exclusively to teachers, even in digital learning environments. This can lead to the teacher-bandwidth problem ( Wiley and Edwards, 2002 ), the result of a shortage of teaching staff to provide highly informative and competence-oriented feedback at large scale. Nowadays, however, computers and other digital devices open up far-reaching possibilities that have not yet been fully exploited. For example, incorporating process data can provide students with insights into their learning progress and bring new possibilities for formative feedback, self-reflection, and competence development ( Quincey et al., 2019 ). According to ( Hattie, 2009 ), feedback in terms of learning success has a mean effect size of d = 0.75, while ( Wisniewski et al., 2019 ) even report a mean effect of d = 0.99 for highly informative feedback. Such feedback provides suitable conditions for self-directed learning ( Winne and Hadwin, 2008 ) and effective metacognitive control of the learning process ( Nelson and Narens, 1994 ).

One of the educational technologies designed to provide actionable feedback in this regard is Learning Analytics. Learning Analytics is defined as the research area that focuses on collecting traces that learners leave behind and using those traces to improve learning ( Duval and Verbert, 2012 ; Greller and Drachsler, 2012 ). Learning Analytics can be used both by students to reflect on their own learning progress and by teachers to continuously assess the students’ efforts and provide actionable feedback. Another relevant educational technology is Intelligent Tutoring Systems. Intelligent Tutoring Systems are defined as computerized learning environments that incorporate computational models ( Graesser et al., 2001 ) and provide feedback based on learning progress. Educational technologies specifically focused on feedback for help-seekers, comparable to raising hands in the classroom, are Dialogue Systems and Pedagogical Conversational Agents ( Lester et al., 1997 ). These technologies can simulate conversational partners and provide feedback through natural language ( McLoughlin and Oliver, 1998 ).

Research in this area has recently focused on chatbot technology, a subtype of dialogue systems, as several technological platforms have matured and led to applications in various domains. Chatbots incorporate generic language models extracted from large parts of the Internet and provide feedback exclusively through text or voice interfaces. For this reason, they have also been proposed and researched for a variety of applications in education ( Winkler and Soellner, 2018 ). Recent literature reviews on chatbots in education ( Winkler and Soellner, 2018 ; Hobert, 2019a ; Hobert and Meyer von Wolff, 2019 ; Jung et al., 2020 ; Pérez et al., 2020 ; Smutny and Schreiberova, 2020 ; Pérez-Marín, 2021 ) have reported on such applications as well as design guidelines, evaluation possibilities, and effects of chatbots in education.

In this paper, we contribute to the state-of-the-art of chatbots in education by presenting a systematic literature review in which we examine so-far unexplored areas such as implementation objectives, pedagogical roles, mentoring scenarios, the adaptation of chatbots to learners, and application domains. This paper is structured as follows: First, we review related work (section 2) and derive research questions from it, then explain the applied method for searching related studies (section 3), followed by the results (section 4); finally, we discuss the findings and point to future research directions in the field (section 5).

Related Work

In order to accurately cover the field of research and deal with the plethora of terms for chatbots in the literature (e.g., chatbot, dialogue system, or pedagogical conversational agent), we propose the following definition:

Chatbots are digital systems that can be interacted with entirely through natural language via text or voice interfaces. They are intended to automate conversations by simulating a human conversation partner and can be integrated into software such as online platforms or digital assistants, or be interfaced through messaging services.

Outside of education, typical applications of chatbots are customer service ( Xu et al., 2017 ), counseling of hospital patients ( Vaidyam et al., 2019 ), or information services in smart speakers ( Ram et al., 2018 ). One central element of chatbots is intent classification, also named the Natural Language Understanding (NLU) component, which is responsible for making sense of human input data. Looking at the current advances in chatbot software development, it seems that this technology’s goal is to one day pass the Turing Test ( Saygin et al., 2000 ), which could make chatbots effective educational tools. Therefore, we ask ourselves: “ Are we there yet? - Will we soon have an autonomous chatbot for every learner?”

To understand and underline the current need for research on the use of chatbots in education, we first examined the existing literature, focusing on comprehensive literature reviews. By looking at the research questions in these literature reviews, we identified 21 different research topics and extracted findings accordingly. To structure the research topics and findings in a comprehensible way, a three-stage clustering process was applied. While the first stage consisted of coding research topics by keywords, the second stage was applied to form overarching research categories ( Table 1 ). In the final stage, the findings within each research category were clustered to identify and structure commonalities within the literature reviews. The result is a concept map consisting of four major categories: CAT1. Applications of Chatbots, CAT2. Chatbot Designs, CAT3. Evaluation of Chatbots, and CAT4. Educational Effects of Chatbots. To standardize the terminology and concepts applied, we present the findings of each category in a separate sub-section ( see Figure 1 , Figure 2 , Figure 3 , and Figure 4 ) and extend them with the outcomes of our own literature study, which is reported in the remaining parts of this article. Due to the size of the concept map, a full version can be found in Appendix A .


TABLE 1 . Assignment of coded research topics identified in related literature reviews to research categories.


FIGURE 1 . Applications of chatbots in related literature reviews (CAT1).


FIGURE 2 . Chatbot designs in related literature reviews (CAT2).


FIGURE 3 . Evaluation of chatbots in related literature reviews (CAT3).


FIGURE 4 . Educational Effects of chatbots in related literature reviews (CAT4).

Regarding the applications of chatbots (CAT1), application clusters (AC) and application statistics (AS) have been described in the literature, which we visualized in Figure 1 . The study of ( Pérez et al., 2020 ) identifies two application clusters, defined through chatbot activities: “service-oriented chatbots” and “teaching-oriented chatbots.” ( Winkler and Soellner, 2018 ) identify application clusters by naming the domains “health and well-being interventions,” “language learning,” “feedback and metacognitive thinking,” as well as “motivation and self-efficacy.” Concerning application statistics (AS), ( Smutny and Schreiberova, 2020 ), who examined chatbots integrated into the social media platform Facebook, found that nearly 47% of the analyzed chatbots incorporate informing actions and 18% support language learning. Besides, the chatbots studied had a strong tendency to use English, at 89%. This high number aligns with results from ( Pérez-Marín, 2021 ), where 75% of the observed agents, as a related technology, were designed to interact in the English language. ( Pérez-Marín, 2021 ) also shows that 42% of the analyzed chatbots had mixed interaction modalities. Finally, ( Hobert and Meyer von Wolff, 2019 ) observed that only 25% of the examined chatbots were incorporated into formal learning settings, that the majority of published material focuses on student-chatbot interaction only and does not enable student-student communication, and that nearly two-thirds of the analyzed chatbots center on a single domain only. Overall, we can summarize that so far there are six application clusters for chatbots in education, categorized by chatbot activities or domains. The provided statistics allow for a clearer understanding of the prevalence of chatbot applications in education ( see Figure 1 ).

Regarding chatbot designs (CAT2), most of the research questions concerned with chatbots in education can be assigned to this category. We found three aspects in this category, visualized in Figure 2 : Personality (PS), Process Pipeline (PP), and Design Classifications (DC). Within these, most research questions can be assigned to Design Classifications (DC), which are separated into Classification Aspects (DC2) and Classification Frameworks (DC1). One classification framework is defined through “flow chatbots,” “artificially intelligent chatbots,” “chatbots with integrated speech recognition,” as well as “chatbots with integrated context-data” by ( Winkler and Soellner, 2018 ). A second classification framework by ( Pérez-Marín, 2021 ) covers pedagogy, social, and HCI features of chatbots and agents, which themselves can be further subdivided into more detailed aspects. Other Classification Aspects (DC2), derived from several publications, provide another classification schema, which distinguishes between “retrieval vs. generative” based technology, the “ability to incorporate context data,” and “speech or text interface” ( Winkler and Soellner, 2018 ; Smutny and Schreiberova, 2020 ). Text interfaces can be further subdivided into “Button-Based” and “Keyword Recognition-Based” interfaces ( Smutny and Schreiberova, 2020 ). Furthermore, a comparison of speech and text interfaces ( Jung et al., 2020 ) shows that text interfaces have advantages for conveying information, whereas speech interfaces have advantages for affective support. The second aspect of CAT2 concerns the chatbot processing pipeline (PP), highlighting the importance of the user interface and the back-end ( Pérez et al., 2020 ). Finally, ( Jung et al., 2020 ) focus on the third aspect, the personality of chatbots (PS). Here, the study derives four guidelines helpful in education: positive or neutral emotional expressions, a limited amount of animated or visual graphics, a well-considered gender of the chatbot, and human-like interactions. In summary, we found three main design aspects for the development of chatbots in CAT2. CAT2 is much more diverse than CAT1, with various sub-categories for the design of chatbots. This indicates considerable flexibility to design chatbots in various ways to support education.

Regarding the evaluation of chatbots (CAT3), we found three aspects assigned to this category, visualized in Figure 3 : Evaluation Criteria (EC), Evaluation Methods (EM), and Evaluation Instruments (EI). Concerning Evaluation Criteria, seven criteria can be identified in the literature. The first and, according to ( Smutny and Schreiberova, 2020 ), most important in the educational field is the evaluation of learning success ( Hobert, 2019a ), which can have subcategories such as how chatbots are embedded in learning scenarios ( Winkler and Soellner, 2018 ; Smutny and Schreiberova, 2020 ) and teaching efficiency ( Pérez et al., 2020 ). The second is acceptance, which ( Hobert, 2019a ) names “acceptance and adoption” and ( Pérez et al., 2020 ) “students’ perception.” Further evaluation criteria are motivation, usability, technical correctness, psychological, and further beneficial factors ( Hobert, 2019a ). These Evaluation Criteria show broad possibilities for the evaluation of chatbots in education. However, ( Hobert, 2019a ) found that most evaluations are limited to single evaluation criteria or narrower aspects of them. Moreover, ( Hobert, 2019a ) introduces a classification matrix for chatbot evaluations, which consists of the following Evaluation Methods (EM): Wizard-of-Oz approach, laboratory studies, field studies, and technical validations. In addition to this, ( Winkler and Soellner, 2018 ) recommend evaluating chatbots by their embeddedness into a learning scenario, by comparing human-human with human-chatbot interactions, and by comparing spoken with written communication. Instruments to measure these evaluation criteria were identified by ( Hobert, 2019a ): quantitative surveys, qualitative interviews, transcripts of dialogues, and technical log files. Regarding CAT3, we found three main aspects for the evaluation of chatbots. We can conclude that this is a more balanced and structured distribution in comparison to CAT2, providing researchers with guidance for evaluating chatbots in education.

Regarding the educational effects of chatbots (CAT4), we found two aspects, visualized in Figure 4 : Effect Size (ES) and Beneficial Chatbot Features for Learning Success (BF). Concerning the effect size, ( Pérez et al., 2020 ) identified a strong dependency between learning and the related curriculum, while ( Winkler and Soellner, 2018 ) elaborate on general student characteristics that influence how students interact with chatbots. They state that students’ attitudes towards technology, learning characteristics, educational background, self-efficacy, and self-regulation skills affect these interactions. Moreover, the study emphasizes chatbot features that can be regarded as beneficial in terms of learning outcomes (BF): “Context-Awareness,” “Proactive guidance by students,” “Integration in existing learning and instant messaging tools,” “Accessibility,” and “Response Time.” Overall, for CAT4, we found two main distinguishing aspects for chatbots; however, the reported studies vary widely in their research design, making high-level results hardly comparable.

Looking at the related work, many research questions for the application of chatbots in education remain open. Therefore, we selected five goals to be further investigated in our literature review. Firstly, we were interested in the objectives for implementing chatbots in education (Goal 1), as the relevance of chatbots for applications within education seems not to be clearly delineated. Secondly, we aim to explore the pedagogical roles of chatbots in the existing literature (Goal 2) to understand how chatbots can take over tasks from teachers. ( Winkler and Soellner, 2018 ) and ( Pérez-Marín, 2021 ) identified research gaps in supporting meta-cognitive skills, such as self-regulation, with chatbots. This requires a chatbot application that takes on a mentoring role, as the development of these meta-cognitive skills cannot be achieved solely by information delivery. Within our review, we incorporate this by reviewing the mentoring role of chatbots (Goal 3). Another key element for a mentoring chatbot is adaptation to the learners’ needs. Therefore, Goal 4 of our review lies in the investigation of the adaptation approaches used by chatbots in education. For Goal 5, we want to extend the work of ( Winkler and Soellner, 2018 ) and ( Pérez et al., 2020 ) regarding Application Clusters (AC) and map applications by further investigating the specific learning domains in which chatbots have been studied.

To delineate and map the field of chatbots in education, initial findings were collected by a preliminary literature search. One of the takeaways is that the emerging field around educational chatbots has seen much activity in the last two years. Based on the experience of this preliminary search, search terms, queries, and filters were constructed for the actual structured literature review. This structured literature review follows the PRISMA framework ( Liberati et al., 2009 ), a guideline for reporting systematic reviews and meta-analyses. The framework consists of an elaborated structure for systematic literature reviews and sets requirements for reporting information about the review process ( see sections 3.2 to 3.4).

Research Questions

Contributing to the state-of-the-art, we investigate five aspects of chatbot applications published in the literature. We therefore guided our research with the following research questions:

RQ1: Which objectives for implementing chatbots in education can be identified in the existing literature?

RQ2: Which pedagogical roles of chatbots can be identified in the existing literature?

RQ3: Which application scenarios have been used to mentor students?

RQ4: To what extent are chatbots adaptable to personal students’ needs?

RQ5: What are the domains in which chatbots have been applied so far?

Sources of Information

As data sources, Scopus, Web of Science, Google Scholar, Microsoft Academic, and the educational research database “Fachportal Pädagogik” (including ERIC) were selected, all of which cover all major publishers and journals. ( Martín-Martín et al., 2018 ) showed that only 29.8% of relevant literature in the social sciences, and 46.8% in engineering and computer science, is included in all of the first three databases. For the topic of chatbots in education, a value between these two numbers can be assumed, which is why an approach of integrating several publisher-independent databases was employed here.

Search Criteria

Based on the findings from the initial related work search, we derived the following search query:

( Education OR Educational OR Learning OR Learner OR Student OR Teaching OR School OR University OR Pedagogical ) AND Chatbot.

It combines education-related keywords with the “chatbot” keyword. Since chatbots are related to other technologies, the initial literature search also considered keywords such as “pedagogical agents,” “dialogue systems,” or “bots” when composing the search query. However, these increased the number of irrelevant results significantly and were therefore excluded from the query in later searches.
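For illustration, the boolean query above can be composed programmatically when it has to be adapted to the syntax of each database. The following is a minimal sketch in Python; the TITLE() field restriction is an assumption about one possible way to express the title search (in Scopus-style syntax) and is not taken from the paper.

```python
# Sketch: composing the review's boolean search query. EDU_TERMS is the list
# reported above; the TITLE() wrapper is a hypothetical Scopus-style variant.
EDU_TERMS = ["Education", "Educational", "Learning", "Learner", "Student",
             "Teaching", "School", "University", "Pedagogical"]

def boolean_query() -> str:
    """Plain boolean query as reported in the paper."""
    return "( " + " OR ".join(EDU_TERMS) + " ) AND Chatbot"

def title_restricted_query() -> str:
    """Hypothetical title-only variant (Scopus-style field tag)."""
    return f"TITLE({boolean_query()})"

if __name__ == "__main__":
    print(boolean_query())
    print(title_restricted_query())
```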

Inclusion and Exclusion Criteria

The queries were executed on December 23, 2020, and applied twice to each database, first as a title search query and second as a keyword-based search. This resulted in a total of 3,619 hits, which were checked for duplicates, resulting in 2,678 candidate publications. The overall search and filtering process is shown in Figure 5 .


FIGURE 5 . PRISMA flow chart.

In the case of Google Scholar, the number of results per query, sorted by relevance, was limited to 300, as this database also delivers many less relevant works. The value was determined by inspecting the search results of several queries in detail so as to exclude as few relevant works as possible. This approach showed promising results and, at the same time, did not burden the literature list with irrelevant items.

The further screening consisted of a four-stage filtering process: first, eliminating duplicates within the results of the title and keyword queries of each database independently, and second, excluding publications, based on title and abstract, that:

• were not available in English

• did not describe a chatbot application

  • were not mainly focused on learner-centered chatbot applications in schools or higher education institutions, which, according to the preliminary literature search, is the main application area within education.

Third, we applied another duplicate filter, this time for the merged set of publications. Finally, a filter based on the full text, excluding publications that were:

  • were limited to improving chatbots technically (e.g., publications that compare or develop new algorithms), as the research questions presented in these publications did not seek additional insights on applications in education

• exclusively theoretical in nature (e.g., publications that discuss new research projects, implementation concepts, or potential use cases of chatbots in education), as they either do not contain research questions or hypotheses or do not provide conclusions from studies with learners.

After the first, second, and third filters, we identified 505 candidate publications. We continued our filtering process by reading the candidate publications’ full texts, resulting in 74 publications that were used for our review. Compared to the 3,619 initial database results, the proportion of relevant publications is therefore about 2.0%.
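To make the four-stage filtering explicit, the sketch below mirrors the process in plain Python. It is not the authors’ tooling; record fields such as language or purely_technical are hypothetical stand-ins for the manual screening decisions described above.

```python
# Sketch of the four-stage PRISMA-style filtering described above.
# Field names and screening stubs are illustrative only.
from typing import Dict, Iterable, List

def deduplicate(records: Iterable[Dict]) -> List[Dict]:
    """Drop records whose normalized title has already been seen."""
    seen, unique = set(), []
    for rec in records:
        key = rec["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def passes_title_abstract_screen(rec: Dict) -> bool:
    """Stage 2: keep English, learner-centered chatbot applications (stub)."""
    return rec.get("language") == "en" and rec.get("learner_centered_chatbot", False)

def passes_full_text_screen(rec: Dict) -> bool:
    """Stage 4: drop purely technical or purely theoretical publications (stub)."""
    return not rec.get("purely_technical", False) and not rec.get("purely_theoretical", False)

def filter_publications(per_database_hits: List[List[Dict]]) -> List[Dict]:
    # Stage 1: deduplicate the title/keyword results of each database independently.
    deduplicated = [rec for hits in per_database_hits for rec in deduplicate(hits)]
    # Stage 2: screen by title and abstract.
    screened = [rec for rec in deduplicated if passes_title_abstract_screen(rec)]
    # Stage 3: deduplicate the merged set of remaining publications.
    candidates = deduplicate(screened)
    # Stage 4: screen by full text.
    return [rec for rec in candidates if passes_full_text_screen(rec)]
```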

The final publication list can be accessed under https://bit.ly/2RRArFT .

To analyze the identified publications and derive results according to the research questions, the full texts were coded, considering for each publication the objectives for implementing chatbots (RQ1), pedagogical roles of chatbots (RQ2), their mentoring roles (RQ3), the adaptation of chatbots (RQ4), as well as their implementation domains in education (RQ5) as separate sets of codes. To this end, initial codes were identified by open coding and iteratively improved through comparison, group discussion among the authors, and subsequent code expansion. Further, codes were supplemented with detailed descriptions until a saturation point was reached where all included studies could be successfully mapped to codes, suggesting no need for further refinement. As an example, codes for RQ2 (Pedagogical Roles) were adapted and refined in terms of their level of abstraction from an initial set of only two codes: 1 ) a code for chatbots in the learning role and 2 ) a code for chatbots in a service-oriented role. After coding a larger set of publications, it became clear that the code for service-oriented chatbots needed to be further distinguished. This was because it subsumed, for example, automation activities together with activities related to self-regulated learning and thus could not be distinguished sharply enough from the learning role. After refining the code set in the next iteration into a learning role, an assistance role, and a mentoring role, it was then possible to ensure the separation of the individual codes. In order to avoid defining new codes for singular or very small numbers of publications, studies were coded as “other” (RQ1) or “not defined” (RQ2) if a code would have occurred in fewer than eight publications, representing less than 10% of the publications in the final paper list.

By grouping the resulting relevant publications according to their date of publication, it is apparent that chatbots in education are currently in a phase of increased attention. The release distribution shows slightly lower publication numbers in the current than in the previous year ( Figure 6 ), which could be attributed to a time lag between the actual publication of manuscripts and their dissemination in databases.


FIGURE 6 . Identified chatbot publications in education per year.

Applying the curve presented in Figure 6 to Gartner’s Hype Cycle ( Linden and Fenn, 2003 ) suggests that technology around chatbots in education may currently be in the “Innovation Trigger” phase. This phase is where many expectations are placed on the technology, but the practical in-depth experience is still largely lacking.

Objectives for Implementing Chatbots in Education

Regarding RQ1, we extracted implementation objectives for chatbots in education. By analyzing the selected publications, we identified that most of the objectives for chatbots in education can be described by one of the following categories: Skill Improvement, Efficiency of Education, Students’ Motivation, and Availability of Education ( see Figure 7 ). The first objective is the improvement of a student’s skill ( Skill Improvement ) that the chatbot is supposed to support or achieve. Here, chatbots are mostly seen as a learning aid that supports students; it is the most commonly cited objective for chatbots. The second objective is to increase the Efficiency of Education in general. It can be pursued, for example, through the automation of recurring tasks or time-saving services for students and is the second most cited objective for chatbots. The third objective is to increase Students’ Motivation . Finally, the last objective is to increase the Availability of Education . This objective is intended to provide learning or counseling with temporal flexibility or without the limitation of physical presence. In addition, there are other, more diverse objectives for chatbots in education that are less easy to categorize. In cases where a publication indicated more than one objective, the publication was distributed evenly across the respective categories.


FIGURE 7 . Objectives for implementing chatbots identified in chatbot publications.

Given these results, we can summarize four major implementation objectives for chatbots. Of these, Skill Improvement is the most popular objective, constituting around one-third of publications (32%). Making up a quarter of all publications, Efficiency of Education is the second most popular objective (25%), while addressing Students’ Motivation and Availability of Education are third (13%) and fourth (11%), respectively. Other objectives also make up a substantial amount of these publications (19%), although they were too diverse to categorize in a uniform way. Examples of these are inclusivity ( Heo and Lee, 2019 ) or the promotion of student-teacher interactions ( Mendoza et al., 2020 ).
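The even distribution of multi-objective publications mentioned above corresponds to fractional counting. The snippet below is a minimal sketch with made-up example data; only the counting rule reflects the procedure described in this section.

```python
# Sketch: fractional counting of implementation objectives. A publication that
# names k objectives contributes 1/k to each of them; shares are then reported
# as percentages of all publications. Example data is hypothetical.
from collections import defaultdict
from typing import Dict, List

def objective_shares(publications: List[List[str]]) -> Dict[str, float]:
    shares: Dict[str, float] = defaultdict(float)
    for objectives in publications:
        weight = 1.0 / len(objectives)       # distribute the publication evenly
        for objective in objectives:
            shares[objective] += weight
    total = sum(shares.values())
    return {name: 100.0 * value / total for name, value in shares.items()}

print(objective_shares([
    ["Skill Improvement"],
    ["Skill Improvement", "Students' Motivation"],   # counted as 0.5 each
    ["Efficiency of Education"],
]))
```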

Pedagogical Roles

Regarding RQ2, it is crucial to consider the use of chatbots in terms of their intended pedagogical role. After analyzing the selected articles, we were able to identify three different pedagogical roles: a supporting learning role, an assisting role, and a mentoring role.

In the supporting learning role ( Learning ), chatbots are used as an educational tool to teach content or skills. This can be achieved through a fixed integration into the curriculum, such as conversation tasks (L. K. Fryer et al., 2020 ). Alternatively, learning can be supported through additional offerings alongside classroom teaching, for example, voice assistants for leisure activities at home ( Bao, 2019 ) or chatbots simulating a virtual pen pal abroad ( Na-Young, 2019 ). Conversations with this kind of chatbot aim to motivate the students to look up vocabulary, check their grammar, and gain confidence in the foreign language.

In the assisting role ( Assisting ), chatbot actions can be summarized as simplifying the student's everyday life, i.e., taking tasks off the student’s hands in whole or in part. This can be achieved by making information more easily available ( Sugondo and Bahana, 2019 ) or by simplifying processes through the chatbot’s automation ( Suwannatee and Suwanyangyuen, 2019 ). An example of this is the chatbot in ( Sandoval, 2018 ) that answers general questions about a course, such as an exam date or office hours.

In the mentoring role ( Mentoring ), chatbot actions deal with the student’s personal development. In this type of support, the students themselves are the focus of the conversation and should be encouraged to plan, reflect on, or assess their progress on a meta-cognitive level. One example is the chatbot in ( Cabales, 2019 ), which helps students develop lifelong learning skills by prompting in-action reflections.

The distribution of each pedagogical role is shown in Figure 8 . From this, it can be seen that Learning is the most frequently occurring role in the examined publications (49%), followed by Assisting (20%) and Mentoring (15%). It should be noted that pedagogical roles could not be identified for all the publications examined. The absence of a clearly defined pedagogical role (16%) can be attributed to the more general nature of these publications, e.g. those focused on students’ small talk behaviors ( Hobert, 2019b ) or teachers’ attitudes towards chatbot applications in classroom teaching (P. K. Bii et al., 2018 ).


FIGURE 8 . Pedagogical roles identified in chatbot publications.

Looking at pedagogical roles in the context of objectives for implementing chatbots, relations among publications can be inspected in a relations graph ( Figure 9 ). According to our results, the strongest relation in the examined publications is between the Skill Improvement objective and the Learning role. This strong relation is partly because both the Skill Improvement objective and the Learning role are the largest in their respective categories. In addition, two other strong relations can be observed: between the Students’ Motivation objective and the Learning role, and between the Efficiency of Education objective and the Assisting role.


FIGURE 9 . Relations graph of pedagogical roles and objectives for implementing chatbots.

By looking at other relations in more detail, there is surprisingly no relation between Skill Improvement , the most common implementation objective, and Assisting , the second most common pedagogical role. Furthermore, it can be observed that the Mentoring role has nearly equal relations to all of the objectives for implementing chatbots.

The relations graph ( Figure 9 ) can be explored interactively at bit.ly/32FSKQM.
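As a rough illustration of how such a relations graph can be derived from the coded publications, the sketch below counts co-occurrences of objectives and pedagogical roles. The example records are hypothetical; the actual coding underlying Figure 9 is described in the method section.

```python
# Sketch: counting (objective, pedagogical role) co-occurrences per publication,
# which yields the edge weights of a relations graph like Figure 9.
from collections import Counter
from itertools import product
from typing import Dict, List, Set

def relation_counts(coded_publications: List[Dict[str, Set[str]]]) -> Counter:
    edges: Counter = Counter()
    for pub in coded_publications:
        for objective, role in product(pub["objectives"], pub["roles"]):
            edges[(objective, role)] += 1
    return edges

print(relation_counts([
    {"objectives": {"Skill Improvement"}, "roles": {"Learning"}},
    {"objectives": {"Efficiency of Education"}, "roles": {"Assisting"}},
    {"objectives": {"Skill Improvement", "Students' Motivation"}, "roles": {"Learning"}},
]))
```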

Mentoring Role

Regarding RQ3, we identified eleven publications that deal with chatbots in a mentoring role. The Mentoring role in these publications can be categorized along two dimensions. Starting with the first dimension, the mentoring method, three methods can be observed:

• Scaffolding ( n = 7)

• Recommending ( n = 3)

• Informing ( n = 1)

An example of Scaffolding can be seen in ( Gabrielli et al., 2020 ), where the chatbot coaches students in life skills, while an example of Recommending can be seen in ( Xiao et al., 2019 ), where the chatbot recommends new teammates. Finally, Informing can be seen in ( Kerly et al., 2008 ), where the chatbot informs students about their personal Open Learner Model.

The second dimension is the addressed mentoring topic, where the following topics can be observed:

• Self-Regulated Learning ( n = 5)

• Life Skills ( n = 4)

• Learning Skills ( n = 2)

While Mentoring chatbots supporting Self-Regulated Learning are intended to encourage students to reflect on and plan their learning progress, Mentoring chatbots supporting Life Skills address general student abilities such as self-confidence or managing emotions. Finally, Mentoring chatbots supporting Learning Skills , in contrast to Self-Regulated Learning , address only particular aspects of the learning process, such as new learning strategies or helpful learning partners. An example of a Mentoring chatbot supporting Life Skills is the Logo counseling chatbot, which promotes healthy self-esteem ( Engel et al., 2020 ). CALMsystem is an example of a Self-Regulated Learning chatbot, which informs students about their data in an open learner model ( Kerly et al., 2008 ). Finally, for the Learning Skills topic, the MCQ Bot is an example that is designed to introduce students to transformative learning (W. Huang et al., 2019 ).

Adaptation Approaches

Regarding RQ4, we identified six publications in the final publication list that address the topic of adaptation. Within these publications, five adaptation approaches are described:

The first approach (A1) is proposed by ( Kerly and Bull, 2006 ) and ( Kerly et al., 2008 ) and deals with discussions with students based on their success and confidence during a quiz. The improvement of self-assessment is the primary focus of this approach. The second approach (A2) is presented in ( Jia, 2008 ), where the personality of the chatbot is adapted to motivate students to talk to the chatbot and, in this case, learn a foreign language. The third approach (A3), as shown in the work of ( Vijayakumar et al., 2019 ), is characterized by a chatbot that provides personalized formative feedback to learners based on their self-assessment, again in a quiz situation. Here, the focus is on Hattie and Timperley’s three guiding questions: “Where am I going?,” “How am I going?” and “Where to next?” ( Hattie and Timperley, 2007 ). In the fourth approach (A4), exemplified in ( Ruan et al., 2019 ), the chatbot selects questions within a quiz. Here, the chatbot estimates the student’s ability and knowledge level based on the quiz progress and sets the next question accordingly. Finally, a similar approach (A5) is shown in ( Davies et al., 2020 ). In contrast to ( Ruan et al., 2019 ), this chatbot adapts the amount of question variation and takes into account psychological features that were measured beforehand by psychological tests.

We examined these five approaches by organizing them according to their information sources and extracted learner information. The results can be seen in Table 2 .


TABLE 2 . Adaptation approaches of chatbots in education.

Four out of five adaptation approaches (A1, A3, A4, and A5) are observed in the context of quizzes. These adaptations within quizzes can be divided into two main streams: one is concerned with feedback to students (A1 and A3), while the other is concerned with the selection of learning material (A4 and A5). The only different adaptation approach is A2, which focuses on adapting the chatbot personality within a language learning application.
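To make the quiz-based adaptations more concrete, the sketch below illustrates an A4-style selection rule in the spirit of ( Ruan et al., 2019 ): a running ability estimate is updated after each answer and the next question is chosen to match it. The update rule and the difficulty scale are invented for illustration and are not taken from the cited work.

```python
# Sketch of an A4-style adaptive quiz: update an ability estimate (0..1) after
# each answer and pick the open question whose difficulty matches it best.
# The update rule is illustrative, not the one used in the cited study.
from typing import Dict, List

def update_ability(ability: float, difficulty: float, correct: bool,
                   step: float = 0.1) -> float:
    """Raise the estimate after a correct answer, lower it otherwise;
    harder questions move the estimate more."""
    delta = step * difficulty if correct else -step * (1.0 - difficulty)
    return min(1.0, max(0.0, ability + delta))

def next_question(questions: List[Dict], ability: float) -> Dict:
    """Choose the unanswered question whose difficulty is closest to the estimate."""
    open_questions = [q for q in questions if not q["answered"]]
    return min(open_questions, key=lambda q: abs(q["difficulty"] - ability))
```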

Domains for Chatbots in Education

Regarding RQ5, we identified 20 domains of chatbots in education. These can broadly be divided by their pedagogical role into three domain categories (DC): Learning Chatbots , Assisting Chatbots , and Mentoring Chatbots . The remaining publications are grouped in the Other Research domain category. The complete list of identified domains can be seen in Table 3 .


TABLE 3 . Domains of chatbots in education.

The domain category Learning Chatbots , which comprises chatbots incorporating the pedagogical role Learning , can be subdivided into seven domains: 1 ) Language Learning , 2 ) Learn to Program , 3 ) Learn Communication Skills , 4 ) Learn about Educational Technologies , 5 ) Learn about Cultural Heritage , 6 ) Learn about Laws , and 7 ) Mathematics Learning . Accounting for more than half of the publications in this category (53%), chatbots for Language Learning play a prominent role. They are often used as chat partners to train conversations or to test vocabulary. An example of this can be seen in the work of ( Bao, 2019 ), which tries to mitigate foreign language anxiety through chatbot interactions in foreign languages.

The domain category Assisting Chatbots , which comprises chatbots incorporating the pedagogical role Assisting , can be subdivided into four domains: 1 ) Administrative Assistance , 2 ) Campus Assistance , 3 ) Course Assistance , and 4 ) Library Assistance . Accounting for one-third of the publications (33%), chatbots in the Administrative Assistance domain, which help to overcome bureaucratic hurdles at the institution while providing round-the-clock services, form the largest group in this domain category. An example of this can be seen in ( Galko et al., 2018 ), where the student enrollment process is completely shifted to a conversation with a chatbot.

The domain category Mentoring Chatbots , which comprises chatbots incorporating the pedagogical role Mentoring , can be subdivided into three domains: 1 ) Scaffolding Chatbots , 2 ) Recommending Chatbots , and 3 ) Informing Chatbots . An example of a Scaffolding Chatbot is the CRI(S) chatbot ( Gabrielli et al., 2020 ), which supports life skills such as self-awareness or conflict resolution in discussion with the student by promoting helpful ideas and tricks.

The domain category Other Research , which comprises chatbots not incorporating any of these pedagogical roles, can be subdivided into three domains: 1 ) General Chatbot Research in Education , 2 ) Indian Educational System , and 3 ) Chatbot Interfaces . The most prominent domain, General Chatbot Research , covers work that cannot be classified into one of the other categories but aims to explore cross-cutting issues. An example of this can be seen in the publication of ( Hobert, 2020 ), which investigates the importance of small talk abilities of chatbots in educational settings.

Discussion

In this paper, we investigated the state-of-the-art of chatbots in education according to five research questions. By combining our results with previously identified findings from related literature reviews, we proposed a concept map of chatbots in education. The map, reported in Appendix A , displays the current state of research regarding chatbots in education with the aim of supporting future research in the field.

Answer to Research Questions

Concerning RQ1 (implementation objectives), we identified four major objectives: 1 ) Skill Improvement , 2 ) Efficiency of Education , 3 ) Students’ Motivation, and 4 ) Availability of Education . These four objectives cover over 80% of the analyzed publications ( see Figure 7 ). Based on the findings on CAT3 in section 2, we see a mismatch between the objectives for implementing chatbots and their evaluation. Most researchers focus only on narrow aspects for the evaluation of their chatbots, such as learning success, usability, and technology acceptance. This mismatch between implementation objectives and suitable evaluation approaches is also well known from other educational technologies, such as Learning Analytics dashboards ( Jivet et al., 2017 ). A more structured approach of aligning implementation objectives and evaluation procedures is crucial to be able to properly assess the effectiveness of chatbots. ( Hobert, 2019a ) suggested a structured four-stage evaluation procedure beginning with a Wizard-of-Oz experiment, followed by technical validation, a laboratory study, and a field study. This evaluation procedure systematically links hypotheses with chatbot outcomes, helping to assess chatbots against their implementation objectives. “Aligning chatbot evaluations with implementation objectives” is, therefore, an important challenge to be addressed in the future research agenda.

Concerning RQ2 (pedagogical roles), our results show that chatbots’ pedagogical roles can be summarized as Learning , Assisting , and Mentoring . The Learning role denotes support in learning or teaching activities such as gaining knowledge. The Assisting role denotes support in terms of simplifying learners’ everyday life, e.g. by providing the opening times of the library. The Mentoring role denotes support in terms of students’ personal development, e.g. by supporting Self-Regulated Learning. From a pedagogical standpoint, all three roles are essential for learners and should therefore be incorporated in chatbots. These pedagogical roles are well aligned with the four implementation objectives reported for RQ1. While Skill Improvement and Students’ Motivation are strongly related to Learning , Efficiency of Education is strongly related to Assisting . The Mentoring role, instead, is evenly related to all of the identified objectives for implementing chatbots. In the reviewed publications, chatbots are therefore primarily intended to 1 ) improve skills and motivate students by supporting learning and teaching activities, 2 ) make education more efficient by providing relevant administrative and logistical information to learners, and 3 ) support multiple effects by mentoring students.

Concerning RQ3 (mentoring role), we identified three main mentoring method categories for chatbots: 1 ) Scaffolding , 2 ) Recommending , and 3 ) Informing . However, comparing the current mentoring by chatbots reported in the literature with the daily mentoring role of teachers, we can summarize that chatbots are not yet at the same level. In order to take over mentoring roles of teachers ( Wildman et al., 1992 ), a chatbot would need to fulfill some of the following activities in its mentoring role. With respect to 1 ) Scaffolding , chatbots should provide direct assistance while students learn new skills and especially direct beginners in their activities. Regarding 2 ) Recommending , chatbots should provide supportive information, tools, or other materials for specific learning tasks as well as life situations. With respect to 3 ) Informing, chatbots should encourage students according to their goals and achievements, and support them in developing meta-cognitive skills like self-regulation. Due to this mismatch between teacher and chatbot mentoring, we see here another research challenge, which we call “Exploring the potential of chatbots for mentoring students.”

Regarding RQ4 (adaptation), only six publications were identified that discuss an adaptation of chatbots, while four out of five adaptation approaches (A1, A3, A4, and A5) show similarities by being applied within quizzes. In the context of educational technologies, providing reasonable adaptations for learners requires a high level of experience. Based on our results, the research on chatbots does not seem to be at this point yet. Looking at adaptation literature like ( Brusilovsky, 2001 ) or ( Benyon and Murray, 1993 ), it becomes clear that a chatbot needs to consider the learners’ personal information to fulfill the requirement of the adaptation definition. Personal information must be retrieved and stored at least temporarily, in some sort of learner model. For learner information like knowledge and interest, adaptations seem to be barely explored in the reviewed publications, while the model of ( Brusilovsky and Millán, 2007 ) points out further learner information, which can be used to make chatbots more adaptive: personal goals, personal tasks, personal background, individual traits, and the learner’s context. We identify research in this area as a third future challenge and call it the “Exploring and leveraging adaptation capabilities of chatbots” challenge.
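As a minimal sketch of what such a learner model could look like, the data structure below uses the learner information named by ( Brusilovsky and Millán, 2007 ) together with knowledge and interest. Field types and the toy adaptation rule are assumptions for illustration only.

```python
# Sketch: a learner model holding the information discussed above
# (knowledge, interests, goals, tasks, background, individual traits, context).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LearnerModel:
    knowledge: Dict[str, float] = field(default_factory=dict)   # topic -> mastery estimate
    interests: List[str] = field(default_factory=list)
    goals: List[str] = field(default_factory=list)
    tasks: List[str] = field(default_factory=list)
    background: str = ""                                         # e.g., prior education
    individual_traits: Dict[str, str] = field(default_factory=dict)
    context: Dict[str, str] = field(default_factory=dict)        # e.g., device, time of day

def adapt_prompt(model: LearnerModel, topic: str) -> str:
    """Toy adaptation rule: scaffold more when estimated mastery is low."""
    mastery = model.knowledge.get(topic, 0.0)
    if mastery < 0.5:
        return "Let's go through this step by step."
    return "Here is a harder follow-up question."
```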

In terms of RQ5 (domains), we identified a detailed map of the domains applying chatbots in education and their distribution ( see Table 3 ). By systematically analyzing 74 publications, we identified 20 domains and structured them, according to the identified pedagogical roles, into four domain categories: Learning Chatbots , Assisting Chatbots , Mentoring Chatbots , and Other Research . These results extend the taxonomy of Application Clusters (AC) for chatbots in education, which previously comprised the work of ( Pérez et al., 2020 ), who took the chatbot activity as the characteristic, and ( Winkler and Soellner, 2018 ), who characterized the chatbots by domains. Our structure draws relationships between these two types of Application Clusters (AC) and organizes them accordingly. It incorporates Mentoring Chatbots and Other Research in addition to the “service-oriented chatbots” (cf. Assisting Chatbots ) and “teaching-oriented chatbots” (cf. Learning Chatbots ) identified by ( Pérez et al., 2020 ). Furthermore, the strong tendency towards informing students already mentioned by ( Smutny and Schreiberova, 2020 ) can also be recognized in our results, especially for Assisting Chatbots . Compared to ( Winkler and Soellner, 2018 ), we can confirm the prominent domains of “language learning” within Learning Chatbots and “metacognitive thinking” within Mentoring Chatbots . Moreover, Table 3 reflects a more detailed picture of chatbot applications in education, which could help researchers to find similar works or unexplored application areas.

Limitations

One important limitation to be mentioned here is the exclusion of alternative keywords from our search queries, as we exclusively used “chatbot” as a keyword in order to avoid search results that do not fit our research questions. Though we acknowledge that chatbots share properties with pedagogical agents, dialogue systems, and bots, we carefully considered this trade-off between missing potentially relevant work and inflating our search procedure by including related but not necessarily pertinent work. A second limitation may lie in the formation of categories and the coding processes applied, which, due to the novelty of the findings, could not be built on theoretical frameworks or already existing code books. Although we have focused on ensuring that the codes used contribute to a strong understanding, the chosen level of abstraction might have affected the level of detail of the resulting data representation.

Conclusion and Future Research

In this systematic literature review, we explored the current landscape of chatbots in education. We analyzed 74 publications, identified 20 domains of chatbots, and grouped them based on their pedagogical roles into four domain categories. These pedagogical roles are the supporting learning role ( Learning ), the assisting role ( Assisting ), and the mentoring role ( Mentoring ). By focusing on objectives for implementing chatbots, we identified four main objectives: 1 ) Skill Improvement , 2 ) Efficiency of Education , 3 ) Students’ Motivation, and 4 ) Availability of Education . As discussed in section 5, these objectives do not fully align with the chosen evaluation procedures. We focused on the relations between pedagogical roles and objectives for implementing chatbots and identified three main relations: 1 ) chatbots to improve skills and motivate students by supporting learning and teaching activities, 2 ) chatbots to make education more efficient by providing relevant administrative and logistical information to learners, and 3 ) chatbots to support multiple effects by mentoring students. We focused on chatbots incorporating the Mentoring role and found that these chatbots are mostly concerned with three mentoring topics, 1 ) Self-Regulated Learning , 2 ) Life Skills , and 3 ) Learning Skills , and three mentoring methods, 1 ) Scaffolding , 2 ) Recommending , and 3 ) Informing . Regarding chatbot adaptations, only six publications with adaptations were identified. Furthermore, the adaptation approaches found were mostly limited to applications within quizzes and thus represent a research gap.

Based on these outcomes we consider three challenges for chatbots in education that offer future research opportunities:

Challenge 1: Aligning chatbot evaluations with implementation objectives . Most chatbot evaluations focus on narrow aspects such as the tool’s usability, acceptance, or technical correctness. If chatbots are to be considered as learning aids, student mentors, or facilitators, the effects on the cognitive and emotional levels should also be taken into account in the evaluation of chatbots. This finding strengthens our conclusion that chatbot development in education is still driven by technology, rather than having a clear pedagogical focus on improving and supporting learning.

Challenge 2: Exploring the potential of chatbots for mentoring students . In order to better understand the potential of chatbots to mentor students, more empirical studies on the information needs of learners are required. It is obvious that these needs differ between schools and higher education. However, so far there are hardly any studies investigating learners’ information needs with respect to chatbots, nor whether chatbots address these needs sufficiently.

Challenge 3: Exploring and leveraging adaptation capabilities of chatbots . There is a large literature on adaptation capabilities of educational technologies. However, we have seen very few studies on the effect of adaptation of chatbots for education purposes. As chatbots are foreseen as systems that should personally support learners, the area of adaptable interactions of chatbots is an important research aspect that should receive more attention in the near future.

By addressing these challenges, we believe that chatbots can become effective educational tools capable of supporting learners with informative feedback. Therefore, looking at our results and the challenges presented, we conclude, “No, we are not there yet!” - There is still much to be done in terms of research on chatbots in education. Still, development in this area seems to have just begun to gain momentum and we expect to see new insights in the coming years.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.

Author Contributions

SW, JS†, DM†, JW†, MR, and HD.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Abbasi, S., Kazi, H., and Hussaini, N. N. (2019). Effect of Chatbot Systems on Student’s Learning Outcomes. Sylwan 163 (10).


Abbasi, S., and Kazi, H. (2014). Measuring Effectiveness of Learning Chatbot Systems on Student's Learning Outcome and Memory Retention. Asian J. Appl. Sci. Eng. 3, 57. doi:10.15590/AJASE/2014/V3I7/53576


Almahri, F. A. J., Bell, D., and Merhi, M. (2020). “Understanding Student Acceptance and Use of Chatbots in the United Kingdom Universities: A Structural Equation Modelling Approach,” in 2020 6th IEEE International Conference on Information Management, ICIM 2020 , London, United Kingdom , March 27–29, 2020 , (IEEE), 284–288. doi:10.1109/ICIM49319.2020.244712

Bao, M. (2019). Can Home Use of Speech-Enabled Artificial Intelligence Mitigate Foreign Language Anxiety - Investigation of a Concept. Awej 5, 28–40. doi:10.24093/awej/call5.3

Benyon, D., and Murray, D. (1993). Applying User Modeling to Human-Computer Interaction Design. Artif. Intell. Rev. 7 (3-4), 199–225. doi:10.1007/BF00849555

Bii, P. K., Too, J. K., and Mukwa, C. W. (2018). Teacher Attitude towards Use of Chatbots in Routine Teaching. Univers. J. Educ. Res. 6 (7), 1586–1597. doi:10.13189/ujer.2018.060719

Bii, P., Too, J., and Langat, R. (2013). An Investigation of Student’s Attitude Towards the Use of Chatbot Technology in Instruction: The Case of Knowie in a Selected High School. Education Research 4, 710–716. doi:10.14303/er.2013.231


Bos, A. S., Pizzato, M. C., Vettori, M., Donato, L. G., Soares, P. P., Fagundes, J. G., et al. (2020). Empirical Evidence During the Implementation of an Educational Chatbot with the Electroencephalogram Metric. Creative Education 11, 2337–2345. doi:10.4236/CE.2020.1111171

Brusilovsky, P. (2001). Adaptive Hypermedia. User Model. User-Adapted Interaction 11 (1), 87–110. doi:10.1023/a:1011143116306

Brusilovsky, P., and Millán, E. (2007). “User Models for Adaptive Hypermedia and Adaptive Educational Systems,” in The Adaptive Web: Methods and Strategies of Web Personalization . Editors P. Brusilovsky, A. Kobsa, and W. Nejdl. Berlin: Springer , 3–53. doi:10.1007/978-3-540-72079-9_1

Cabales, V. (2019). “Muse: Scaffolding metacognitive reflection in design-based research,” in CHI EA’19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems , Glasgow, Scotland, United Kingdom , May 4–9, 2019 , (ACM), 1–6. doi:10.1145/3290607.3308450

Carayannopoulos, S. (2018). Using Chatbots to Aid Transition. Int. J. Info. Learn. Tech. 35, 118–129. doi:10.1108/IJILT-10-2017-0097

Chan, C. H., Lee, H. L., Lo, W. K., and Lui, A. K.-F. (2018). Developing a Chatbot for College Student Programme Advisement. in 2018 International Symposium on Educational Technology, ISET 2018 , Osaka, Japan , July 31–August 2, 2018 . Editors F. L. Wang, C. Iwasaki, T. Konno, O. Au, and C. Li, (IEEE), 52–56. doi:10.1109/ISET.2018.00021

Chang, M.-Y., and Hwang, J.-P. (2019). “Developing Chatbot with Deep Learning Techniques for Negotiation Course,” in 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019 , Toyama, Japan , July 7–11, 2019 , (IEEE), 1047–1048. doi:10.1109/IIAI-AAI.2019.00220

Chen, C.-A., Yang, Y.-T., Wu, S.-M., Chen, H.-C., Chiu, K.-C., Wu, J.-W., et al. (2018). “A Study of Implementing AI Chatbot in Campus Consulting Service”, in TANET 2018-Taiwan Internet Seminar , 1714–1719. doi:10.6861/TANET.201810.0317

Chen, H.-L., Widarso, G. V., and Sutrisno, H. (2020). A ChatBot for Learning Chinese: Learning Achievement and Technology Acceptance. J. Educ. Comput. Res. 58 (6), 1161–1189. doi:10.1177/0735633120929622

Daud, S. H. M., Teo, N. H. I., and Zain, N. H. M. (2020). E-java Chatbot for Learning Programming Language: A Post-pandemic Alternative Virtual Tutor. Int. J. Emerging Trends Eng. Res. 8 (7), 3290–3298. doi:10.30534/ijeter/2020/67872020

Davies, J. N., Verovko, M., Verovko, O., and Solomakha, I. (2020). “Personalization of E-Learning Process Using Ai-Powered Chatbot Integration,” in Selected Papers of 15th International Scientific-practical Conference, MODS, 2020: Advances in Intelligent Systems and Computing , Chernihiv, Ukraine , June 29–July 01, 2020 . Editors S. Shkarlet, A. Morozov, and A. Palagin, ( Springer ) Vol. 1265, 209–216. doi:10.1007/978-3-030-58124-4_20

Diachenko, A. V., Morgunov, B. P., Melnyk, T. P., Kravchenko, O. I., and Zubchenko, L. V. (2019). The Use of Innovative Pedagogical Technologies for Automation of the Specialists' Professional Training. Int. J. High. Educ. 8 (6), 288–295. doi:10.5430/ijhe.v8n6p288

Dibitonto, M., Leszczynska, K., Tazzi, F., and Medaglia, C. M. (2018). “Chatbot in a Campus Environment: Design of Lisa, a Virtual Assistant to Help Students in Their university Life,” in 20th International Conference, HCI International 2018 , Las Vegas, NV, USA , July 15–20, 2018 , Lecture Notes in Computer Science. Editors M. Kurosu, (Springer), 103–116. doi:10.1007/978-3-319-91250-9

Durall, E., and Kapros, E. (2020). “Co-design for a Competency Self-Assessment Chatbot and Survey in Science Education,” in 7th International Conference, LCT 2020, Held as Part of the 22nd HCI International Conference, HCII 2020 , Copenhagen, Denmark , July 19–24, 2020 , Lecture Notes in Computer Science. Editors P. Zaphiris, and A. Ioannou, Berlin: Springer Vol. 12206, 13–23. doi:10.1007/978-3-030-50506-6_2

Duval, E., and Verbert, K. (2012). Learning Analytics. Eleed 8 (1).

Engel, J. D., Engel, V. J. L., and Mailoa, E. (2020). Interaction Monitoring Model of Logo Counseling Website for College Students' Healthy Self-Esteem, I. J. Eval. Res. Educ. 9, 607–613. doi:10.11591/ijere.v9i3.20525

Febriani, G. A., and Agustia, R. D. (2019). Development of Line Chatbot as a Learning Media for Mathematics National Exam Preparation. Elibrary.Unikom.Ac.Id . https://elibrary.unikom.ac.id/1130/14/UNIKOM_GISTY%20AMELIA%20FEBRIANI_JURNAL%20DALAM%20BAHASA%20INGGRIS.pdf .

Ferguson, R., and Sharples, M. (2014). “Innovative Pedagogy at Massive Scale: Teaching and Learning in MOOCs,” in 9th European Conference on Technology Enhanced Learning, EC-TEL 2014 , Graz, Austria , September 16–19, 2014 , Lecture Notes in Computer Science. Editors C. Rensing, S. de Freitas, T. Ley, and P. J. Muñoz-Merino, ( Berlin : Springer) Vol. 8719, 98–111. doi:10.1007/978-3-319-11200-8_8

Fryer, L. K., Ainley, M., Thompson, A., Gibson, A., and Sherlock, Z. (2017). Stimulating and Sustaining Interest in a Language Course: An Experimental Comparison of Chatbot and Human Task Partners. Comput. Hum. Behav. 75, 461–468. doi:10.1016/j.chb.2017.05.045

Fryer, L. K., Nakao, K., and Thompson, A. (2019). Chatbot Learning Partners: Connecting Learning Experiences, Interest and Competence. Comput. Hum. Behav. 93, 279–289. doi:10.1016/j.chb.2018.12.023

Fryer, L. K., Thompson, A., Nakao, K., Howarth, M., and Gallacher, A. (2020). Supporting Self-Efficacy Beliefs and Interest as Educational Inputs and Outcomes: Framing AI and Human Partnered Task Experiences. Learn. Individual Differences , 80. doi:10.1016/j.lindif.2020.101850

Gabrielli, S., Rizzi, S., Carbone, S., and Donisi, V. (2020). A Chatbot-Based Coaching Intervention for Adolescents to Promote Life Skills: Pilot Study. JMIR Hum. Factors 7 (1). doi:10.2196/16762


Galko, L., Porubän, J., and Senko, J. (2018). “Improving the User Experience of Electronic University Enrollment,” in 16th IEEE International Conference on Emerging eLearning Technologies and Applications, ICETA 2018 , Stary Smokovec, Slovakia , Nov 15–16, 2018 . Editors F. Jakab, (Piscataway, NJ: IEEE ), 179–184. doi:10.1109/ICETA.2018.8572054

Goda, Y., Yamada, M., Matsukawa, H., Hata, K., and Yasunami, S. (2014). Conversation with a Chatbot before an Online EFL Group Discussion and the Effects on Critical Thinking. J. Inf. Syst. Edu. 13, 1–7. doi:10.12937/EJSISE.13.1

Graesser, A. C., VanLehn, K., Rose, C. P., Jordan, P. W., and Harter, D. (2001). Intelligent Tutoring Systems with Conversational Dialogue. AI Mag. 22 (4), 39–51. doi:10.1609/aimag.v22i4.1591

Greller, W., and Drachsler, H. (2012). Translating Learning into Numbers: A Generic Framework for Learning Analytics. J. Educ. Tech. Soc. 15 (3), 42–57. doi:10.2307/jeductechsoci.15.3.42

Haristiani, N., and Rifa’i, M. M. (2020). Combining Chatbot and Social Media: Enhancing Personal Learning Environment (PLE) in Language Learning. Indonesian J. Sci. Tech. 5 (3), 487–506. doi:10.17509/ijost.v5i3.28687

Hattie, J., and Timperley, H. (2007). The Power of Feedback. Rev. Educ. Res. 77 (1), 81–112. doi:10.3102/003465430298487

Hattie, J. (2009). Visible Learning: A Synthesis of over 800 Meta-Analyses Relating to Achievement . Abingdon, UK: Routledge .

Heller, B., Proctor, M., Mah, D., Jewell, L., and Cheung, B. (2005). “Freudbot: An Investigation of Chatbot Technology in Distance Education,” in Proceedings of ED-MEDIA 2005–World Conference on Educational Multimedia, Hypermedia and Telecommunications , Montréal, Canada , June 27–July 2, 2005 . Editors P. Kommers, and G. Richards, ( AACE ), 3913–3918.

Heo, J., and Lee, J. (2019). “CiSA: An Inclusive Chatbot Service for International Students and Academics,” in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science , Orlando, FL, USA , July 26–31, 2019 . Editors C. Stephanidis, ( Springer ) 11786, 153–167. doi:10.1007/978-3-030-30033-3

Hobert, S. (2019a). “How Are You, Chatbot? Evaluating Chatbots in Educational Settings - Results of a Literature Review,” in 17. Fachtagung Bildungstechnologien, DELFI 2019 - 17th Conference on Education Technologies, DELFI 2019 , Berlin, Germany , Sept 16–19, 2019 . Editors N. Pinkwart, and J. Konert, 259–270. doi:10.18420/delfi2019_289

Hobert, S., and Meyer von Wolff, R. (2019). “Say Hello to Your New Automated Tutor - A Structured Literature Review on Pedagogical Conversational Agents,” in 14th International Conference on Wirtschaftsinformatik , Siegen, Germany , Feb 23–27, 2019 . Editors V. Pipek, and T. Ludwig, ( AIS ).

Hobert, S. (2019b). Say Hello to ‘Coding Tutor’! Design and Evaluation of a Chatbot-Based Learning System Supporting Students to Learn to Program in International Conference on Information Systems (ICIS) 2019 Conference , Munich, Germany , Dec 15–18, 2019 , AIS 2661, 1–17.

Hobert, S. (2020). Small Talk Conversations and the Long-Term Use of Chatbots in Educational Settings ‐ Experiences from a Field Study in 3rd International Workshop on Chatbot Research and Design, CONVERSATIONS 2019 , Amsterdam, Netherlands , November 19–20 : Lecture Notes in Computer Science. Editors A. Folstad, T. Araujo, S. Papadopoulos, E. Law, O. Granmo, E. Luger, and P. Brandtzaeg, ( Springer ) 11970, 260–272. doi:10.1007/978-3-030-39540-7_18

Hsieh, S.-W. (2011). Effects of Cognitive Styles on an MSN Virtual Learning Companion System as an Adjunct to Classroom Instructions. Edu. Tech. Society 2, 161–174.

Huang, J.-X., Kwon, O.-W., Lee, K.-S., and Kim, Y.-K. (2018). Improve the Chatbot Performance for the DB-CALL System Using a Hybrid Method and a Domain Corpus in Future-proof CALL: language learning as exploration and encounters–short papers from EUROCALL 2018 , Jyväskylä, Finland , Aug 22–25, 2018 . Editors P. Taalas, J. Jalkanen, L. Bradley, and S. Thouësny, ( Research-publishing.net ). doi:10.14705/rpnet.2018.26.820

Huang, W., Hew, K. F., and Gonda, D. E. (2019). Designing and Evaluating Three Chatbot-Enhanced Activities for a Flipped Graduate Course. Int. J. Mech. Engineer. Robotics. Research. 813–818. doi:10.18178/ijmerr.8.5.813-818

Ismail, M., and Ade-Ibijola, A. (2019). “Lecturer's Apprentice: A Chatbot for Assisting Novice Programmers,”in Proceedings - 2019 International Multidisciplinary Information Technology and Engineering Conference (IMITEC) , Vanderbijlpark, South Africa , (IEEE), 1–8. doi:10.1109/IMITEC45504.2019.9015857

Jia, J. (2008). “Motivate the Learners to Practice English through Playing with Chatbot CSIEC,” in 3rd International Conference on Technologies for E-Learning and Digital Entertainment, Edutainment 2008 , Nanjing, China , June 25–27, 2008 , Lecture Notes in Computer Science, (Springer) 5093, 180–191. doi:10.1007/978-3-540-69736-7_20

Jia, J. (2004). “The Study of the Application of a Keywords-Based Chatbot System on the Teaching of Foreign Languages,” in Proceedings of SITE 2004--Society for Information Technology and Teacher Education International Conference , Atlanta, Georgia, USA . Editors R. Ferdig, C. Crawford, R. Carlsen, N. Davis, J. Price, R. Weber, and D. Willis, (AACE), 1201–1207.

Jivet, I., Scheffel, M., Drachsler, H., and Specht, M. (2017). “Awareness is not enough: Pitfalls of learning analytics dashboards in the educational practice,” in 12th European Conference on Technology Enhanced Learning, EC-TEL 2017 , Tallinn, Estonia , September 12–15, 2017 , Lecture Notes in ComputerScience. Editors E. Lavoué, H. Drachsler, K. Verbert, J. Broisin, and M. Pérez-Sanagustín, (Springer), 82–96. doi:10.1007/978-3-319-66610-5_7

Jung, H., Lee, J., and Park, C. (2020). Deriving Design Principles for Educational Chatbots from Empirical Studies on Human-Chatbot Interaction. J. Digit. Contents Society , 21, 487–493. doi:10.9728/dcs.2020.21.3.487

Kerly, A., and Bull, S. (2006). “The Potential for Chatbots in Negotiated Learner Modelling: A Wizard-Of-Oz Study,” in 8th International Conference on Intelligent Tutoring Systems, ITS 2006 , Jhongli, Taiwan , June 26–30, 2006 , Lecture Notes in Computer Science. Editors M. Ikeda, K. D. Ashley, and T. W. Chan, ( Springer ) 4053, 443–452. doi:10.1007/11774303

Kerly, A., Ellis, R., and Bull, S. (2008). CALMsystem: A Conversational Agent for Learner Modelling. Knowledge-Based Syst. 21, 238–246. doi:10.1016/j.knosys.2007.11.015

Kerly, A., Hall, P., and Bull, S. (2007). Bringing Chatbots into Education: Towards Natural Language Negotiation of Open Learner Models. Knowledge-Based Syst. , 20, 177–185. doi:10.1016/j.knosys.2006.11.014

Kumar, M. N., Chandar, P. C. L., Prasad, A. V., and Sumangali, K. (2016). “Android Based Educational Chatbot for Visually Impaired People,” in 2016 IEEE International Conference on Computational Intelligence and Computing Research , Chennai, India , December 15–17, 2016 , 1–4. doi:10.1109/ICCIC.2016.7919664

Lee, K., Jo, J., Kim, J., and Kang, Y. (2019). Can Chatbots Help Reduce the Workload of Administrative Officers? - Implementing and Deploying FAQ Chatbot Service in a University in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science , Orlando, FL, USA , July 26–31, 2019 . Editors C. Stephanidis, ( Springer ) 1032, 348–354. doi:10.1007/978-3-030-23522-2

Lester, J. C., Converse, S. A., Kahler, S. E., Barlow, S. T., Stone, B. A., and Bhogal, R. S. (1997). “The Persona Effect: Affective Impact of Animated Pedagogical Agents,” in Proceedings of the ACM SIGCHI Conference on Human factors in computing systems , Atlanta, Georgia, USA , March 22–27, 1997 , (ACM), 359–366.

Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., et al. (2009). The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies that Evaluate Health Care Interventions: Explanation and Elaboration. J. Clin. Epidemiol. 62 (10), e1–e34. doi:10.1016/j.jclinepi.2009.06.006

Lin, M. P.-C., and Chang, D. (2020). Enhancing Post-secondary Writers’ Writing Skills with a Chatbot. J. Educ. Tech. Soc. 23, 78–92. doi:10.2307/26915408

Lin, Y.-H., and Tsai, T. (2019). “A Conversational Assistant on Mobile Devices for Primitive Learners of Computer Programming,” in TALE 2019 - 2019 IEEE International Conference on Engineering, Technology and Education , Yogyakarta, Indonesia , December 10–13, 2019 , (IEEE), 1–4. doi:10.1109/TALE48000.2019.9226015

Linden, A., and Fenn, J. (2003). Understanding Gartner’s Hype Cycles. Strategic Analysis Report No. R-20-1971 8. Stamford, CT: Gartner, Inc .

Liu, Q., Huang, J., Wu, L., Zhu, K., and Ba, S. (2020). CBET: Design and Evaluation of a Domain-specific Chatbot for mobile Learning. Univ. Access Inf. Soc. , 19, 655–673. doi:10.1007/s10209-019-00666-x

Mamani, J. R. C., Álamo, Y. J. R., Aguirre, J. A. A., and Toledo, E. E. G. (2019). “Cognitive Services to Improve User Experience in Searching for Academic Information Based on Chatbot,” in Proceedings of the 2019 IEEE 26th International Conference on Electronics, Electrical Engineering and Computing (INTERCON) , Lima, Peru , August 12–14, 2019 , (IEEE), 1–4. doi:10.1109/INTERCON.2019.8853572

Martín-Martín, A., Orduna-Malea, E., Thelwall, M., and Delgado López-Cózar, E. (2018). Google Scholar, Web of Science, and Scopus: A Systematic Comparison of Citations in 252 Subject Categories. J. Informetrics 12 (4), 1160–1177. doi:10.1016/j.joi.2018.09.002

Matsuura, S., and Ishimura, R. (2017). Chatbot and Dialogue Demonstration with a Humanoid Robot in the Lecture Class, in 11th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2017, held as part of the 19th International Conference on Human-Computer Interaction, HCI 2017 , Vancouver, Canada , July 9–14, 2017 , Lecture Notes in Computer Science. Editors M. Antona, and C. Stephanidis, (Springer) Vol. 10279, 233–246. doi:10.1007/978-3-319-58700-4

Matsuura, S., and Omokawa, R. (2020). Being Aware of One’s Self in the Auto-Generated Chat with a Communication Robot in UAHCI 2020 , 477–488. doi:10.1007/978-3-030-49282-3

McLoughlin, C., and Oliver, R. (1998). Maximising the Language and Learning Link in Computer Learning Environments. Br. J. Educ. Tech. 29 (2), 125–136. doi:10.1111/1467-8535.00054

Mendoza, S., Hernández-León, M., Sánchez-Adame, L. M., Rodríguez, J., Decouchant, D., and Meneses-Viveros, A. (2020). “Supporting Student-Teacher Interaction through a Chatbot,” in 7th International Conference, LCT 2020, Held as Part of the 22nd HCI International Conference, HCII 2020 , Copenhagen, Denmark , July 19–24, 2020 , Lecture Notes in Computer Science. Editors P. Zaphiris, and A. Ioannou, ( Springer ) 12206, 93–107. doi:10.1007/978-3-030-50506-6

Meyer, V., Wolff, R., Nörtemann, J., Hobert, S., and Schumann, M. (2020). “Chatbots for the Information Acquisition at Universities ‐ A Student’s View on the Application Area,“in 3rd International Workshop on Chatbot Research and Design, CONVERSATIONS 2019 , Amsterdam, Netherlands , November 19–20 , Lecture Notes in Computer Science. Editors A. Folstad, T. Araujo, S. Papadopoulos, E. Law, O. Granmo, E. Luger, and P. Brandtzaeg, (Springer) 11970, 231–244. doi:10.1007/978-3-030-39540-7

Na-Young, K. (2018c). A Study on Chatbots for Developing Korean College Students’ English Listening and Reading Skills. J. Digital Convergence 16. 19–26. doi:10.14400/JDC.2018.16.8.019

Na-Young, K. (2019). A Study on the Use of Artificial Intelligence Chatbots for Improving English Grammar Skills. J. Digital Convergence 17, 37–46. doi:10.14400/JDC.2019.17.8.037

Na-Young, K. (2018a). Chatbots and Korean EFL Students’ English Vocabulary Learning. J. Digital Convergence 16. 1–7. doi:10.14400/JDC.2018.16.2.001

Na-Young, K. (2018b). Different Chat Modes of a Chatbot and EFL Students’ Writing Skills Development . 1225–4975. doi:10.16933/sfle.2017.32.1.263

Na-Young, K. (2017). Effects of Different Types of Chatbots on EFL Learners’ Speaking Competence and Learner Perception. Cross-Cultural Studies 48, 223–252. doi:10.21049/ccs.2017.48.223

Nagata, R., Hashiguchi, T., and Sadoun, D. (2020). Is the Simplest Chatbot Effective in English Writing Learning Assistance?, in 16th International Conference of the Pacific Association for Computational Linguistics , PACLING, Hanoi, Vietnam , October 11–13, 2019 , Communications in Computer and Information Science. Editors L.-M. Nguyen, S. Tojo, X.-H. Phan, and K. Hasida, ( Springer ) Vol. 1215, 245–246. doi:10.1007/978-981-15-6168-9

Nelson, T. O., and Narens, L. (1994). Why Investigate Metacognition. in Metakognition: Knowing About Knowing . Editors J. Metcalfe, and P. Shimamura, (MIT Press) 13, 1–25.

Nghi, T. T., Phuc, T. H., and Thang, N. T. (2019). Applying Ai Chatbot for Teaching a Foreign Language: An Empirical Research. Int. J. Sci. Res. 8.

Ondas, S., Pleva, M., and Hládek, D. (2019). How Chatbots Can Be Involved in the Education Process. in ICETA 2019 - 17th IEEE International Conference on Emerging eLearning Technologies and Applications, Proceedings, Stary Smokovec , Slovakia , November 21–22, 2019 . Editors F. Jakab, (IEEE), 575–580. doi:10.1109/ICETA48886.2019.9040095

Pereira, J., Fernández-Raga, M., Osuna-Acedo, S., Roura-Redondo, M., Almazán-López, O., and Buldón-Olalla, A. (2019). Promoting Learners' Voice Productions Using Chatbots as a Tool for Improving the Learning Process in a MOOC. Tech. Know Learn. 24, 545–565. doi:10.1007/s10758-019-09414-9

Pérez, J. Q., Daradoumis, T., and Puig, J. M. M. (2020). Rediscovering the Use of Chatbots in Education: A Systematic Literature Review. Comput. Appl. Eng. Educ. 28, 1549–1565. doi:10.1002/cae.22326

Pérez-Marín, D. (2021). A Review of the Practical Applications of Pedagogic Conversational Agents to Be Used in School and University Classrooms. Digital 1 (1), 18–33. doi:10.3390/digital1010002

Pham, X. L., Pham, T., Nguyen, Q. M., Nguyen, T. H., and Cao, T. T. H. (2018). “Chatbot as an Intelligent Personal Assistant for mobile Language Learning,” in ACM International Conference Proceeding Series doi:10.1145/3291078.3291115

Quincey, E. de., Briggs, C., Kyriacou, T., and Waller, R. (2019). “Student Centred Design of a Learning Analytics System,” in Proceedings of the 9th International Conference on Learning Analytics & Knowledge , Tempe Arizona, USA , March 4–8, 2019 , (ACM), 353–362. doi:10.1145/3303772.3303793

Ram, A., Prasad, R., Khatri, C., Venkatesh, A., Gabriel, R., Liu, Q, et al. (2018). Conversational Ai: The Science behind the Alexa Prize, in 1st Proceedings of Alexa Prize (Alexa Prize 2017) . ArXiv [Preprint]. Available at: https://arxiv.org/abs/1801.03604 .

Rebaque-Rivas, P., and Gil-Rodríguez, E. (2019). Adopting an Omnichannel Approach to Improve User Experience in Online Enrolment at an E-Learning University, in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science , Orlando, FL, USA , July 26–31, 2019 . Editors C. Stephanidis, ( Springer ), 115–122. doi:10.1007/978-3-030-23525-3

Robinson, C. (2019). Impressions of Viability: How Current Enrollment Management Personnel And Former Students Perceive The Implementation of A Chatbot Focused On Student Financial Communication. Higher Education Doctoral Projects.2 . https://aquila.usm.edu/highereddoctoralprojects/2 .

Ruan, S., Jiang, L., Xu, J., Tham, B. J.-K., Qiu, Z., Zhu, Y., Murnane, E. L., Brunskill, E., and Landay, J. A. (2019). “QuizBot: A Dialogue-based Adaptive Learning System for Factual Knowledge,” in 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019 , Glasgow, Scotland, United Kingdom , May 4–9, 2019 , (ACM), 1–13. doi:10.1145/3290605.3300587

Sandoval, Z. V. (2018). Design and Implementation of a Chatbot in Online Higher Education Settings. Issues Inf. Syst. 19, 44–52. doi:10.48009/4.iis.2018.44-52

Sandu, N., and Gide, E. (2019). “Adoption of AI-Chatbots to Enhance Student Learning Experience in Higher Education in india,” in 18th International Conference on Information Technology Based Higher Education and Training , Magdeburg, Germany , September 26–27, 2019 , (IEEE), 1–5. doi:10.1109/ITHET46829.2019.8937382

Saygin, A. P., Cicekli, I., and Akman, V. (2000). Turing Test: 50 Years Later. Minds and Machines 10 (4), 463–518. doi:10.1023/A:1011288000451

Sinclair, A., McCurdy, K., Lucas, C. G., Lopez, A., and Gaševic, D. (2019). “Tutorbot Corpus: Evidence of Human-Agent Verbal Alignment in Second Language Learner Dialogues,” in EDM 2019 - Proceedings of the 12th International Conference on Educational Data Mining .

Smutny, P., and Schreiberova, P. (2020). Chatbots for Learning: A Review of Educational Chatbots for the Facebook Messenger. Comput. Edu. 151, 103862. doi:10.1016/j.compedu.2020.103862

Song, D., Rice, M., and Oh, E. Y. (2019). Participation in Online Courses and Interaction with a Virtual Agent. Int. Rev. Res. Open. Dis. 20, 44–62. doi:10.19173/irrodl.v20i1.3998

Stapić, Z., Horvat, A., and Vukovac, D. P. (2020). Designing a Faculty Chatbot through User-Centered Design Approach, in 22nd International Conference on Human-Computer Interaction,HCII 2020 , Copenhagen, Denmark , July 19–24, 2020 , Lecture Notes in Computer Science. Editors C. Stephanidis, D. Harris, W. C. Li, D. D. Schmorrow, C. M. Fidopiastis, and P. Zaphiris, ( Springer ), 472–484. doi:10.1007/978-3-030-60128-7

Subramaniam, N. K. (2019). Teaching and Learning via Chatbots with Immersive and Machine Learning Capabilities. In International Conference on Education (ICE 2019) Proceedings , Kuala Lumpur, Malaysia , April 10–11, 2019 . Editors S. A. H. Ali, T. T. Subramaniam, and S. M. Yusof, 145–156.

Sugondo, A. F., and Bahana, R. (2019). “Chatbot as an Alternative Means to Access Online Information Systems,” in 3rd International Conference on Eco Engineering Development, ICEED 2019 , Surakarta, Indonesia , November 13–14, 2019 , IOP Conference Series: Earth and Environmental Science, (IOP Publishing) 426. doi:10.1088/1755-1315/426/1/012168

Suwannatee, S., and Suwanyangyuen, A. (2019). “Reading Chatbot” Mahidol University Library and Knowledge Center Smart Assistant,” in Proceedings for the 2019 International Conference on Library and Information Science (ICLIS) , Taipei, Taiwan , July 11–13, 2019 .

Vaidyam, A. N., Wisniewski, H., Halamka, J. D., Kashavan, M. S., and Torous, J. B. (2019). Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape. Can. J. Psychiatry 64 (7), 456–464. doi:10.1177/0706743719828977

Vijayakumar, B., Höhn, S., and Schommer, C. (2019). “Quizbot: Exploring Formative Feedback with Conversational Interfaces,” in 21st International Conference on Technology Enhanced Assessment, TEA 2018 , Amsterdam, Netherlands , Dec 10-11, 2018 . Editors S. Draaijer, B. D. Joosten-ten, and E. Ras, ( Springer ), 102–120. doi:10.1007/978-3-030-25264-9

Virtanen, M. A., Haavisto, E., Liikanen, E., and Kääriäinen, M. (2018). Ubiquitous Learning Environments in Higher Education: A Scoping Literature Review. Educ. Inf. Technol. 23 (2), 985–998. doi:10.1007/s10639-017-9646-6

Wildman, T. M., Magliaro, S. G., Niles, R. A., and Niles, J. A. (1992). Teacher Mentoring: An Analysis of Roles, Activities, and Conditions. J. Teach. Edu. 43 (3), 205–213. doi:10.1177/0022487192043003007

Wiley, D., and Edwards, E. K. (2002). Online Self-Organizing Social Systems: The Decentralized Future of Online Learning. Q. Rev. Distance Edu. 3 (1), 33–46.

Winkler, R., and Soellner, M. (2018). Unleashing the Potential of Chatbots in Education: A State-Of-The-Art Analysis, in Academy of Management Annual Meeting Proceedings 2018 (1), 15903. doi:10.5465/AMBPP.2018.15903abstract

Winne, P. H., and Hadwin, A. F. (2008). “The Weave of Motivation and Self-Regulated Learning,” in Motivation and Self-Regulated Learning: Theory, Research, and Applications . Editors D. H. Schunk, and B. J. Zimmerman, (Mahwah, NJ: Lawrence Erlbaum Associates Publishers ), 297–314.

Wisniewski, B., Zierer, K., and Hattie, J. (2019). The Power of Feedback Revisited: A Meta-Analysis of Educational Feedback Research. Front. Psychol. 10, 3087. doi:10.3389/fpsyg.2019.03087

Wolfbauer, I., Pammer-Schindler, V., and Rose, C. P. (2020). “Rebo Junior: Analysis of Dialogue Structure Quality for a Reflection Guidance Chatbot,” in Proceedings of the Impact Papers at EC-TEL 2020, co-located with the 15th European Conference on Technology-Enhanced Learning “Addressing global challenges and quality education” (EC-TEL 2020) , Virtual , Sept 14–18, 2020 . Editors T. Broos, and T. Farrell, 1–14.

Xiao, Z., Zhou, M. X., and Fu, W.-T. (2019). “Who should be my teammates: Using a conversational agent to understand individuals and help teaming,” in IUI’19: Proceedings of the 24th International Conference on Intelligent User Interfaces, Marina del Ray , California, USA , March 17–20, 2019 , (ACM), 437–447. doi:10.1145/3301275.3302264

Xu, A., Liu, Z., Guo, Y., Sinha, V., and Akkiraju, R. (2017). “A New Chatbot for Customer Service on Social media,” in Proceedings of the 2017 CHI conference on human factors in computing systems , Denver, Colorado, USA , May 6–11, 2017 , ACM, 3506–3510. doi:10.1145/3025453.3025496

Yin, J., Goh, T.-T., Yang, B., and Xiaobin, Y. (2020). Conversation Technology with Micro-learning: The Impact of Chatbot-Based Learning on Students' Learning Motivation and Performance. J. Educ. Comput. Res. 59, 154–177. doi:10.1177/0735633120952067

Appendix A: A Concept Map of Chatbots in Education

Keywords: chatbots, education, literature review, pedagogical roles, domains

Citation: Wollny S, Schneider J, Di Mitri D, Weidlich J, Rittberger M and Drachsler H (2021) Are We There Yet? - A Systematic Literature Review on Chatbots in Education. Front. Artif. Intell. 4:654924. doi: 10.3389/frai.2021.654924

Received: 17 January 2021; Accepted: 10 June 2021; Published: 15 July 2021.


Copyright © 2021 Wollny, Schneider, Di Mitri, Weidlich, Rittberger and Drachsler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Sebastian Wollny, [email protected] ; Jan Schneider, [email protected]

This article is part of the Research Topic

Intelligent Conversational Agents

Future directions for chatbot research: an interdisciplinary research agenda

  • Regular Paper
  • Open access
  • Published: 19 October 2021
  • Volume 103, pages 2915–2942 (2021)


  • Asbjørn Følstad   ORCID: orcid.org/0000-0003-2763-0996 1 ,
  • Theo Araujo   ORCID: orcid.org/0000-0002-4633-9339 2 ,
  • Effie Lai-Chong Law   ORCID: orcid.org/0000-0002-0873-0150 3 ,
  • Petter Bae Brandtzaeg   ORCID: orcid.org/0000-0002-9010-0800 4 , 1 ,
  • Symeon Papadopoulos   ORCID: orcid.org/0000-0002-5441-7341 5 ,
  • Lea Reis   ORCID: orcid.org/0000-0002-6607-0517 6 ,
  • Marcos Baez   ORCID: orcid.org/0000-0003-1666-2474 7 ,
  • Guy Laban   ORCID: orcid.org/0000-0002-3796-1804 8 ,
  • Patrick McAllister   ORCID: orcid.org/0000-0002-0243-1555 9 ,
  • Carolin Ischen   ORCID: orcid.org/0000-0002-4135-1777 2 ,
  • Rebecca Wald   ORCID: orcid.org/0000-0003-2086-903X 2 ,
  • Fabio Catania   ORCID: orcid.org/0000-0002-5403-9002 10 ,
  • Raphael Meyer von Wolff   ORCID: orcid.org/0000-0002-8233-6506 11 ,
  • Sebastian Hobert   ORCID: orcid.org/0000-0003-3621-0272 11 &
  • Ewa Luger   ORCID: orcid.org/0000-0001-7882-9415 12  


Chatbots are increasingly becoming important gateways to digital services and information—taken up within domains such as customer service, health, education, and work support. However, there is only limited knowledge concerning the impact of chatbots at the individual, group, and societal level. Furthermore, a number of challenges remain to be resolved before the potential of chatbots can be fully realized. In response, chatbots have emerged as a substantial research area in recent years. To help advance knowledge in this emerging research area, we propose a research agenda in the form of future directions and challenges to be addressed by chatbot research. This proposal consolidates years of discussions at the CONVERSATIONS workshop series on chatbot research. Following a deliberative research analysis process among the workshop participants, we explore future directions within six topics of interest: (a) users and implications, (b) user experience and design, (c) frameworks and platforms, (d) chatbots for collaboration, (e) democratizing chatbots, and (f) ethics and privacy. For each of these topics, we provide a brief overview of the state of the art, discuss key research challenges, and suggest promising directions for future research. The six topics are detailed with a 5-year perspective in mind and are to be considered items of an interdisciplinary research agenda produced collaboratively by avid researchers in the field.


1 Introduction

Chatbots are conversational agents providing access to information and services through interaction in everyday language. While research on conversational agents has been pursued for decades within fields such as social robotics, embodied conversational agents, and dialogue systems, it is only recently that conversational agents have become a practical reality [ 77 ]. Key drivers of this development include advances in artificial intelligence (AI) fields, such as natural language processing (NLP) and natural language understanding (NLU), as well as the increased consumer uptake of platforms conducive to conversational interaction [ 38 ].

Chatbots are currently taken up in application areas as diverse as customer service [ 1 ], health [ 105 ], education [ 53 ], and office work [ 78 ]. There has lately been a marked increase of interest in chatbot research within academia and industry, specifically from 2016 and onwards [ 86 ]. Recent research addresses, for example, chatbot use (e.g. [ 74 ]), interaction design (e.g. [ 57 ]) and assessment (e.g. [ 63 ]), as well as specific applications (e.g. [ 96 ]) and technological advances (e.g. [ 2 ]).

The rapidly growing body of chatbot research has a marked interdisciplinary character—spanning fields such as informatics, management and marketing, media and communication science, linguistics and philosophy, psychology and sociology, engineering, design, and human-computer interaction. This broad emerging knowledge base is valuable, but also implies that research of relevance to chatbots is currently fragmented across disciplines and application domains. With a broad and rich range of chatbot applications, it is imperative to understand why certain chatbot usages are working (or not) by referencing in-depth theoretical frameworks. As the current interdisciplinary wave of chatbot research is progressing, there is a need to define overarching research directions for guidance, allowing new studies and initiatives to systematically build on and benefit from existing work.

In this paper, we propose a research agenda which has been distilled through a series of dedicated workshops on chatbot research—CONVERSATIONS—with intensive discussions among researchers and practitioners actively working on chatbots. The research agenda has the overall aim to motivate and guide research to establish requisite knowledge for fully realizing the potential of chatbots as a powerful means of accessing information and services and for understanding the impact of chatbots at the individual, group, and societal level. As the research on chatbots is rapidly evolving, we hold that deriving a research agenda from collaborations and discussions among avid researchers and practitioners, who keep abreast of the ongoing developments of the area, is a more effective approach as compared, for example, with a mapping study or systematic literature review. Furthermore, this collaborative approach enables us to gain insights from different perspectives to address opportunities, challenges, and perceived research needs within the field. The research agenda serves as a concise research roadmap, offering links to pertinent studies for those readers who are interested in delving further into specific fields.

In the following, we first present relevant background on chatbot research before we detail the need for a consolidation of future directions. We then present our approach and proposed set of directions. Finally, we discuss our proposal and the way forward.

2 Background

2.1 Historical roots of chatbot research

The emerging chatbot research area has its historical roots in several research fields addressing different aspects of conversational computer systems—the most prominent of these with decades of research and efforts at industrial applications. Within the field of dialogue systems [ 77 ], researchers have since the sixties and seventies worked on text-based [ 12 ] and later spoken [ 59 ] conversational user interfaces to support users with specific tasks. Other streams of research preceding and relevant to current chatbot research have addressed conversational interaction with physical social robots [ 15 ] and embodied virtual agents [ 18 ]. There has also been a long-term research initiative addressing computer systems for open-domain small talk [ 98 ], including the development of the artificial intelligence markup language [ 111 ] used to power chatbots for social chit-chat. Conversational computer systems have also had a long and, at times, winding path through various commercial applications—particularly automated solutions for customer service, sales, and support [ 72 ], including interactive voice response (IVR) systems for phone-based self-service [ 23 ].

The recent substantial increase in chatbot research can be seen as a direct response to the uptake of so-called virtual assistants by big tech companies, specifically the inclusion of Siri as part of Apple's operating system in 2011, Amazon's promotion of Alexa since 2014, and the conversational turn of Facebook, Microsoft, and Google in 2016 [ 25 ]. Piccolo et al. [ 86 ] concluded that chatbot research has followed in the trail of the industrial uptake of conversational computer systems rather than being in the driver's seat. As a result, the contribution of this burgeoning research area is as much to understand the emerging applications, uses, and implications of conversational computing systems as to improve on their technological underpinnings and methods for design and development. Consequently, the chatbot research area has a broader scope and disciplinary coverage than the fields at its historical roots.

2.2 Clarification of terminology

As noted by McTear [ 77 ], research streams such as those of dialogue systems, embodied conversational agents, and social robotics, are now converging in a common aim for developing and improving on conversational user interfaces to computer systems. However, there is still a wide variety of terms in use in reference to the object of this converging research interest. Since the recent industrial uptake of conversational computing systems, these have increasingly been referred to as chatbots within industry and media [ 25 ] and also in research. To demarcate the research area driven by the industrial uptake of conversational computer systems, and to signify the attention of this area towards emerging patterns of use, as well as broader business and societal implications, we refer to this area as chatbot research.

In line with this scoping of the research area, we understand chatbots as conversational agents providing access to information and services through interaction in everyday language —an understanding which is in line with the definitions by Følstad and Brandtzaeg [ 39 ] and Hobert and Meyer von Wolff [ 54 ]. This use of the term chatbot encompasses conversational agents for goal-oriented task completion, informational purposes, entertainment, and social chatter. It also encompasses agents supporting interactions through text, voice, or both. The use of the term is in reference to the object of our research interest—current and future design, development, and implications of information and services provided through conversational computer systems—rather than in reference to a specific set of technologies or approaches.

In consequence, our use of the term chatbot is broader than what may be found in other research streams. For example, some distinguish between voice-based and text-based conversational agents, using the term chatbot to refer to the latter (e.g. Ashktorab et al. [ 6 ]). Others distinguish between conversational agents for goal completion versus social chatter, referring only to the latter as chatbots (e.g. Jurafsky and Martin [ 61 ]). However, given the rapid evolution of technology, services, and patterns of use, we find such attempts at a principled scoping of the chatbot term challenging. For example, there is often no clear distinction between social chatter and goal-orientation in conversational agents—as seen in the importance of social responses for customer service chatbots [ 114 ]. Likewise, the distinction between text and voice is less than clear-cut, as the same conversational agents may make use of different modalities [ 97 ].

2.3 Enablers of current chatbots

Current chatbots are enabled by a large range of technologies and services [ 97 ] at varied levels of sophistication. Dialogue management may be enabled through simple rule-based approaches, statistical data-driven systems, or neural generative end-to-end approaches [ 77 ], and many systems employ hybrid models [ 50 ]. Whereas early chatbots for social chit-chat may exemplify rule-based approaches (e.g., Weizenbaum [ 112 ]), current statistical data-driven systems—such as chatbots for customer service—have user intents and corresponding chatbot responses identified by machine learning models trained on example user data [ 66 ]. Generative chatbots based on end-to-end approaches are currently a research topic of substantial interest. A much-cited example is presented by Vinyals and Le [ 109 ]. More recently, Facebook's Blender [ 90 ] and Google's Meena [ 2 ] have received substantial interest due to their near-human open-domain conversational capabilities.
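To make the contrast between these dialogue-management approaches more concrete, the following minimal sketch illustrates the rule-based end of the spectrum: a handful of hand-written patterns mapped to canned responses, in the spirit of early chit-chat bots such as ELIZA. The patterns, responses, and fallback behaviour are invented for illustration and are not taken from any particular system; production rule-based bots maintain far larger rule sets and additional dialogue state.

```python
# Illustrative rule-based dialogue management: each rule pairs a regular
# expression with one or more response templates; unmatched input falls
# back to a generic prompt. All rules here are invented examples.
import random
import re

RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE),
     ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (re.compile(r"\bI am (.+)", re.IGNORECASE),
     ["How long have you been {0}?", "Why do you think you are {0}?"]),
    (re.compile(r"\b(hello|hi|hey)\b", re.IGNORECASE),
     ["Hello! What would you like to talk about?"]),
]
FALLBACK = ["Please tell me more.", "I see. Can you elaborate?"]

def reply(utterance: str) -> str:
    """Return the response of the first matching rule, or a generic fallback."""
    for pattern, responses in RULES:
        match = pattern.search(utterance)
        if match:
            return random.choice(responses).format(*match.groups())
    return random.choice(FALLBACK)

if __name__ == "__main__":
    print(reply("I need a break from my coursework"))
    print(reply("hello there"))
```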

A large number of general-purpose platforms and frameworks are available for chatbot delivery, such as Google's DialogFlow, Microsoft Bot Framework, Pandorabots, and the open-source frameworks Rasa and Mycroft. The platforms range from so-called low-code alternatives [ 26 ], where implementation and maintenance may be conducted with limited or no software engineering skills, to frameworks serving as a basis for larger software development projects. Platforms and frameworks for chatbot delivery typically provide integrations with a range of communication channels, including social media and chat, as well as websites and collaborative work support systems. Hence, the same chatbot may reach users across their preferred channels.

2.4 Research communities

Chatbot research is currently evolving within and across a range of disciplines and has a strong interdisciplinary character. Ground-breaking research has been presented in fields as diverse as communication (e.g. Go and Sundar [ 42 ]), health (e.g. Fitzpatrick et al. [ 35 ]), informatics (e.g. Adiwardana et al. [ 2 ]), and business (e.g. Adam et al. [ 1 ]). While dedicated workshops and conferences of relevance to chatbot research are emerging—such as CUI, CONVERSATIONS, and CAIR—in addition to established venues—such as SIGDIAL, IVA, IWSDS, and INTERSPEECH—research findings are typically presented in a broad range of journals and conferences. Research related to chatbots is also conducted in multiple communities with varying degrees of exchange among them. These communities may not label their area of interest as chatbot research but rather, for example, as research addressing conversational agents [ 79 ], dialogue systems [ 59 ], or social robotics [ 93 ]. The research objectives within these communities may only be partially overlapping. However, we believe these communities will likely benefit from strengthening their collaboration and from mutually informing and supporting each other's research.

3 Objective: to propose future research directions

While there is a rapidly expanding body of knowledge relevant to chatbot research, rooted in long-standing research fields, current research and knowledge are fragmented across disciplines, application areas, and communities. Such fragmentation is to be expected in a rapidly expanding field. However, we are now at a point in time where it is beneficial to stake out common directions for future research.

The identification of common research directions is not something that can be achieved by individual researchers or single communities. Rather, it should be seen as a collaborative and continuously evolving process across individuals and communities, where adjustments are made on the basis of new insights and knowledge as it is gathered.

Our objective in presenting this work is therefore to provide a needed interdisciplinary and collaborative basis to initiate and guide a broader discussion on the key future research directions for chatbot research. As such, the work will provide a broader perspective on research directions than what is provided, for example, in current reviews on chatbots within specific domains (e.g. [ 105 , 78 ]), specific aspects of chatbot technology and design (e.g. [ 20 , 86 ]), or user behaviour and experience (e.g. [ 63 , 119 ]).

Furthermore, we address perspectives and topics for chatbot research which may be more broadly scoped than what may be found within, for example, the fields and disciplines in which chatbot research has its roots. As such, we aim for the work to provide a basis for chatbot research that is of value to research and practice alike, and which may also serve to bridge relevant research currently embedded in distinct disciplines.

The proposed future research directions are based on the collaborative work conducted as part of the CONVERSATIONS workshops. CONVERSATIONS is an international workshop series for chatbot research, where researchers, students, and practitioners with an interest in chatbots gather to present their work, discuss, and collaborate. The first workshop in this series was organised in 2017 and it has since been a yearly event, advancing from being arranged in conjunction with a research conference in the first two years to now being a 2-day stand-alone event. The most recent workshop in 2020 [ 41 ], arranged as a virtual event due to the COVID-19 pandemic, involved about 150 registered participants from more than 30 countries and 80 different organizations, including more than 20 paper presentations. The participants represent disciplines such as computer science, information systems, human–computer interaction, communication studies, linguistics, psychology, marketing, and design.

Throughout the CONVERSATIONS workshops, we have discussed chatbot research challenges and how to address these. In the first CONVERSATIONS workshop (2017), approximately half of the overall 30 participants engaged in identifying and clustering key research challenges of the field into overarching research topics. The research challenges within these topics formed the basis for the call for papers to the later CONVERSATIONS workshops (2018, 2019, 2020). At the third CONVERSATIONS workshop (2019), the topics—updated throughout the workshop series—were revisited through in-depth group discussions involving approximately half of the overall 50 workshop participants. The output from these group discussions forms the basis for the presented research directions.

The deliberative process at the workshop series was key to identify and propose research directions in a true interdisciplinary fashion. In the 2019 edition, workshop participants were assigned to groups—each with the mandate to address one of six topics: (a) user and communication studies, (b) user experience and design, (c) frameworks and platforms, (d) chatbots for collaboration, (e) democratizing chatbots, and (f) ethics and privacy. The group work was conducted in two sessions across the 2 days of the workshop. In the first session, each group carefully discussed the research topic in a 5-year time frame, identifying (a) relevant state of the art, (b) key research challenges, and (c) future directions. In the second session, the output of each group was presented to the workshop plenary and discussed.

The collaborative process extended across the following year, taking into account the contributions and discussions of the CONVERSATIONS 2020 workshop as well. As a result, the proposed research directions reflect the interdisciplinary position of a group of collaborating researchers within this emerging field.

5 Proposed future research directions

Through the CONVERSATIONS workshop series, six overarching topics for future chatbot research have been identified. In the following, we detail each of these based on the CONVERSATIONS output, with particular concern for the state of the art, research challenges, and future research directions. An overview of the six topics and associated future research directions is provided in Table  1 .

5.1 Users and implications

Given the current evolving use and emerging use cases for chatbots, important questions to ask concern chatbot users and their contexts of use. This includes investigating antecedents for chatbot use—namely individual characteristics, motivations and boundary conditions for choosing, accepting or even preferring to interact with conversational agents. Furthermore, it is necessary to explore and discuss implications of chatbot use on individuals, groups, organizations and society at large.

5.1.1 State of the art

Chatbot use is becoming commonplace. For example, in 2019, over 50% of US and German consumers were estimated to have used chatbots at least once—with even higher numbers in the UK or France [ 88 ]. In consequence, chatbot researchers currently have an unprecedented opportunity for real-world study of users [ 106 ], user motivations [ 14 ], and implications at scale. As a result, knowledge on chatbot use has been gathered for a range of contexts—in the private sphere [ 87 ], at work [ 74 ], and in public spaces [ 17 ].

A substantial body of research of relevance for chatbot use has been developed within broad domains such as health [ 105 ], education [ 84 ], and business [ 8 ], as well as more specific application domains such as polling [ 62 ], information search [ 73 ], libraries [ 92 ], and museums [ 64 ]. Knowledge of relevance for understanding the impact of chatbots on individual users may be found in studies of therapy chatbots (e.g. [ 35 ]), relational agents (e.g. [ 10 ]), and chatbots for social relationships [ 103 ]. Specifically, it is of interest to note how such studies address implications of individual long-term use.

Because of this, we have substantial knowledge on potential and actual chatbot users and implications for individuals across a wide variety of contexts, building upon a rich stream of research dating back to the work of Weizenbaum [ 112 ]. Chatbot impact on society has, however, not been comprehensively researched and has only tentatively been suggested in studies of chatbots for specific domains—as mentioned above. This may in part be because the substantial impact at the level of organizations and society is expected to materialize in the future rather than the present.

5.1.2 Research challenges

While we have substantial knowledge on current chatbot users, important topics lack sufficient coverage. Two warrant particular mention: (a) broader chatbot uses and user groups and (b) implications of chatbot use, both detailed below.

For the broader chatbot uses and user groups, the rich literature needs to be continuously updated, especially when it comes to user motivations and behaviour of emerging user groups. This includes knowledge on specific demographics, for example, vulnerable users, such as children, elderly and users with special needs, as well as user groups within particular application areas. Moreover, research still needs to assess whether there are systematic differences in the adoption and usage of chatbots driven by socio-demographic characteristics.

Implications of chatbot use entail a range of exciting research challenges, as knowledge is needed on how the uptake of chatbots may impact groups, organizations, businesses, and society at large. For example, as chatbots are taken up by different sectors and industries, chatbots may transform service provision and work processes.

Another example is our need for knowledge on how the interaction patterns that emerge between human users and chatbots may spill over into our interactions with other people: Will the demanding communication style we learn to use for virtual assistants, such as Alexa and Siri, impact our communication style with our partners or collaborators? How will the companionship offered by social chatbots influence users' social lives and desires, and how might chatbots enter the social fabric of groups or organizations?

5.1.3 Future research directions

Based on the current state of the art and identified research challenges, two future research directions emerge as particularly promising in the area of chatbot user and communication studies.

Emerging chatbot user groups and behaviours. While there exists knowledge on current chatbot user groups, this needs to be updated as technology, services, and patterns of use evolve. Furthermore, there is a need to move from studies of chatbot users in general to studies of chatbot users and behaviours for particular demographics, domains, or contexts. We are beginning to see this for domains such as health, education, and business, but given the uptake of chatbots in new contexts and domains, this is an area of research which will be in continuous need of update.

Social implications of chatbots. The study of social implications of chatbots is an area where we expect to see substantial research interest in the near future. Knowledge of the social implications of chatbot use will also be important for guiding future development and design of chatbot services. Possibly, a string of research on the broader social implications could be motivated by the broader discourse on implications of AI for labour and business (e.g. [ 37 , 76 ]). It will be beneficial to accommodate research on unintended social consequences of chatbots and on how chatbots are shaped in response to their uptake in society.

5.2 Chatbot user experience and design

Chatbot user experience and design concerns how users perceive and respond to chatbots, and how chatbot layout, interaction mechanisms, and conversational content may be designed so as to manage these perceptions and responses. To gather insight into users' perceptions and responses, and how these are impacted by chatbot design, user-centred evaluations of chatbots are necessary; that is, assessments of users' perceptions and responses to chatbots conducted through established methods.

5.2.1 State of the art

Chatbot user experience has been a key theme in recent research efforts, for voice-based virtual agents [ 75 ] and text-based applications [ 4 ]. This has helped identify factors contributing to positive or negative user experience [ 118 ] and has addressed specific aspects such as trust [ 119 ], perceived social support [ 71 ], human likeness [ 4 ], and how these aspects are impacted by chatbot design [ 42 ]. There is also a growing base of research to inform design of chatbot interactions, whether this concerns conversational design [ 6 ], personalization of chatbots [ 69 ], the use of interactive elements in chatbots [ 57 ], or the use of social cues to indicate social status and capabilities [ 32 ]. Recently, a number of textbooks (e.g. [ 48 , 79 , 97 ]) and industry guidelines (e.g. by Google and Amazon) have also been published on chatbot interaction design and conversational design. Textual and acoustic properties of users' dialogue input are gradually being applied as outcomes in empirical research for studying engagement and experience with conversational agents [ 52 , 67 ]. Furthermore, there exists an extensive body of research on emotion detection through speech (e.g. [ 95 ]) and non-verbal behaviour [ 27 ] of high relevance to chatbot user experience and design.

There is also a growing body of knowledge on methods and measures for evaluating chatbot user experience. User-centred evaluation has been key to research within several of the disciplines at the roots of current chatbot research, such as studies of social presence in social robotics [ 82 ] and the use of user satisfaction measures in evaluations of dialogue systems [ 28 ]. Evaluation in chatbot research is conducted by instruments for users' self-reports of user experience [ 63 ], through user observation and interviews [ 75 ] and analyses of chatbot interaction [ 66 ], and also by physiological measurements [ 22 ]. A range of evaluation approaches are employed, including experiments by self-administered online studies [ 5 ] or in the lab [ 22 ], observational studies in the wild [ 64 ], and investigations of long-term interactions with established services [ 73 ].
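As a toy illustration of the log-analysis strand of evaluation mentioned above, the sketch below computes two simple indicators (average dialogue length and fallback rate) from logged conversation turns. The log format and the choice of indicators are assumptions made for illustration; they do not correspond to any standardized instrument or benchmark.

```python
# Hypothetical interaction-log analysis: derive simple usage indicators
# (average turns per session, share of user turns hitting the fallback)
# from a toy log of (session_id, speaker, intent) tuples.
from collections import defaultdict

log = [
    ("s1", "user", "opening_hours"), ("s1", "bot", None),
    ("s1", "user", "fallback"),      ("s1", "bot", None),
    ("s2", "user", "cancel_order"),  ("s2", "bot", None),
]

turns_per_session = defaultdict(int)
user_turns = 0
fallbacks = 0
for session_id, speaker, intent in log:
    turns_per_session[session_id] += 1
    if speaker == "user":
        user_turns += 1
        if intent == "fallback":
            fallbacks += 1

avg_turns = sum(turns_per_session.values()) / len(turns_per_session)
fallback_rate = fallbacks / user_turns
print(f"average turns per session: {avg_turns:.1f}, fallback rate: {fallback_rate:.0%}")
```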

5.2.2 Research challenges

While there is a growing body of research available on chatbot user experience, there is still a lack of knowledge on how to leverage the findings from this research in chatbot designs that consistently delight and engage users. Users still experience issues in chatbot interaction, both in terms of pragmatic experiences—where chatbots fail to understand or to help users achieve their intended goals [ 75 ]—and in terms of hedonic experiences—where chatbots fail to engage users over time [ 117 ]. These issues may in part be seen as due to the more general challenge of designing human-AI interaction [ 116 ]. There are indeed indications that these challenges are being mitigated, for example in the case of improvements in customer service chatbots [ 80 ] and in the uptake of social chatbots such as Replika [ 103 ]. However, the strengthening of chatbot user experiences remains a key research challenge.

Related to the challenge of strengthening chatbot user experience is the challenge of measuring and assessing chatbots in terms of user experience and from a more holistic perspective to determine whether chatbots are actually beneficial. Relevant aspects for this are, for instance, usefulness, efficiency, and process support. While there is a large number of studies on chatbot user experience available, there is a lack of common definitions, metrics, and validated scales for key aspects of chatbot evaluations [ 63 ]. Furthermore, while a broad range of approaches are employed, there is a lack of commonly applied approaches to evaluation.

5.2.3 Future research directions

Future research should be directed at addressing the identified key research challenges. Specifically, the following two directions are proposed.

Design for improving chatbot user experience. Future research on chatbot user experience needs to evolve from exploring and assessing aspects of user experience and effects of chatbot design elements, towards studying how this knowledge may impact and improve chatbot user experience in industrial applications. Specifically, findings of theoretical interest need to be translated into conclusions with practical impact on design. This is not to say that research to build theory on chatbot user experience is not needed, but this research may also need to take up more design-oriented objectives—so as to condense current research and knowledge into guidelines that may directly inform conversational design or interaction design.

Modelling and evaluating chatbot user experience. To advance future research on chatbot user experience, there is also a need for convergence of chatbot user experience models, measurements, and approaches to evaluation. While diversity in definitions and operationalizations is to be expected in an emerging field of research interest, now may be the time to seek agreement and consistency in the use of terminology and definitions of user experience constructs, and to identify and apply standardized measurements (benchmarks) for these constructs. While such convergence should not be done in a way that hampers theoretical advancement and method innovation, there is clearly a benefit in including common measurements across studies so as to enable cross-study comparison and aggregation, and to be able to track progress over time. For this purpose, established evaluation approaches from fields such as human-computer interaction or the tradition of dialogue systems may be beneficial.

5.3 Chatbot frameworks and platforms

This area of chatbot research concerns the current and future frameworks and platforms for chatbot development and delivery. That is, the technological underpinnings of chatbot implementations such as solutions for natural language processing, data extraction, storage, and access, as well as mechanisms to identify and adapt chatbot interactions to context and user profile.

5.3.1 State of the art

The advances in chatbot frameworks and platforms are key enablers of the current interest in chatbot applications. As noted in the Background (Sect. 2.3), myriad platforms and frameworks are available to support design and development of chatbots. Key advances include the application of supervised machine learning for classification and information retrieval—enabling, for example, intent prediction and identification of user sentiment [ 20 ], which are critical to support task-oriented conversations. Furthermore, the use of generative approaches has seen substantial progress, where end-to-end dialogue systems are applied to predict suitable responses to user input based on models built from large conversational datasets [ 2 , 109 ]. Finally, the introduction of the Transformer [ 107 ] as a dominant and highly effective architecture for natural language processing, along with high-quality open-source libraries [ 113 ], has lowered the barrier to entry and made it possible to build conversational models that exhibit high generalization and coherence [ 90 ].
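A minimal sketch of the supervised intent-prediction approach described above is given below, assuming scikit-learn is available: example utterances are vectorized with TF-IDF and classified with logistic regression. The intent labels and training phrases are invented for illustration; real deployments rely on much larger, curated training sets and richer NLU pipelines.

```python
# Toy intent classifier: TF-IDF features plus logistic regression.
# Training utterances and intent labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_utterances = [
    "what are your opening hours", "when do you open",
    "I want to cancel my order", "please cancel the delivery",
    "how do I reset my password", "I forgot my password",
]
intents = [
    "opening_hours", "opening_hours",
    "cancel_order", "cancel_order",
    "reset_password", "reset_password",
]

# One pipeline handles both feature extraction and classification.
intent_model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
intent_model.fit(training_utterances, intents)

query = "can you help me, my password does not work"
predicted = intent_model.predict([query])[0]
confidence = intent_model.predict_proba([query]).max()
print(predicted, round(confidence, 2))
```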

In this regard, large-scale generative models are becoming increasingly impactful, enabling a wide range of tasks that can benefit chatbot development [ 36 ]. Models such as GPT-3 [ 16 ] by OpenAI and BERT (Bidirectional Encoder Representations from Transformers) [ 29 ] by Google leverage amounts of data and computational power that would not be available to smaller players. Indeed, GPT-3 currently uses 175 billion parameters and is estimated to have cost 12 million US dollars to train [ 36 ]. Opening up these powerful models to the public thus has the potential to accelerate chatbot development even further. It is important to note, however, that criticism of large models has been growing lately [ 9 ], especially ethical concerns regarding undesirable and often inscrutable societal biases percolating through the models [ 9 , 120 ], their carbon footprint [ 9 , 99 ], misuse and misinterpretation [ 9 ], the privatization of AI research [ 99 ], and even research opportunity costs [ 49 ].
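As a concrete illustration of how publicly released pretrained models and open-source libraries [ 113 ] can be reused for chatbot development, the sketch below loads a conversational Transformer model through the transformers library; the model choice (DialoGPT, rather than the models named above) and the generation settings are illustrative assumptions, not a statement about any system discussed in this paper.

# Sketch of reusing a publicly available pretrained conversational model
# (here DialoGPT) via the open-source transformers library. Model name and
# generation settings are illustrative, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
chat_history_ids = None
for user_input in ["Hello, who are you?", "Can you recommend a good book?"]:
    # Encode the user turn and append the end-of-sequence token.
    new_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
    # Keep the running dialogue context so replies stay coherent across turns.
    bot_input_ids = new_ids if chat_history_ids is None else torch.cat(
        [chat_history_ids, new_ids], dim=-1)
    chat_history_ids = model.generate(bot_input_ids, max_length=200,
                                      pad_token_id=tokenizer.eos_token_id)
    reply = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0],
                             skip_special_tokens=True)
    print("Bot:", reply)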

5.3.2 Research challenges

While substantial advances have been made in chatbot frameworks and platforms, a number of challenges remain. Specifically, we lack the technological underpinnings needed to support some key aspects of chatbot applications. We see four such challenges as particularly important. First, understanding user input remains difficult. While machine learning approaches have strengthened both natural language understanding and intent prediction, chatbot interaction is prone to conversational breakdowns due to interpretation issues, in particular in everyday situations or in the wild [ 87 ]. Second, the challenge of modelling and adapting to the user and conversational context is as important as ever. For example, as chatbots are increasingly deployed in the health domain, in possibly sensitive scenarios, it becomes of paramount importance for chatbots to adapt the conversation to the social, emotional, and even health literacy characteristics of users [ 60 ]. These were already identified as key challenges by Weizenbaum [ 112 ] and have remained so ever since. Third, challenges remain in solutions for supporting chatbot development and standardised testing, for example in terms of studies simulating production environments and approaches to improve chatbots more easily in production. Last, as chatbots become part of an ecosystem of software systems, supporting chatbot integration in this context is an emerging challenge, for example by facilitating conversational presentation of information and content also intended for other uses [ 7 ].
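One common way to limit the damage of interpretation failures is to act only on confident interpretations and otherwise fall back to confirmation or a request to rephrase. The sketch below illustrates such a policy; the thresholds, intent names, and reply texts are illustrative assumptions, not a prescription drawn from the cited literature.

# Sketch of a simple breakdown-mitigation policy: act only on confident
# interpretations, otherwise confirm or ask the user to rephrase.
# Thresholds, intent names, and reply texts are illustrative assumptions.
CONFIDENCE_ACT = 0.75      # act directly above this confidence
CONFIDENCE_CONFIRM = 0.40  # between the thresholds, confirm before acting
def handle(intent: str) -> str:
    replies = {
        "opening_hours": "We are open 9-17 on weekdays.",
        "cancel_order": "Sure, I can help you cancel an order.",
    }
    return replies.get(intent, "Let me check that with a colleague.")
def respond(predicted_intent: str, confidence: float) -> str:
    if confidence >= CONFIDENCE_ACT:
        return handle(predicted_intent)
    if confidence >= CONFIDENCE_CONFIRM:
        # Repair strategy: explicitly confirm the uncertain interpretation.
        return f"Did you want to ask about {predicted_intent.replace('_', ' ')}?"
    # Repair strategy: admit non-understanding and invite a rephrase.
    return "Sorry, I did not quite get that. Could you rephrase your question?"
print(respond("opening_hours", 0.91))
print(respond("cancel_order", 0.55))
print(respond("cancel_order", 0.20))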

5.3.3 Future research directions

Interpretation capabilities and context understanding. As in recent years, further progress in the field of chatbots will depend on advances in natural language understanding, which will remain a key area of research interest. To enable progress in natural language understanding, more quality training data in open repositories is needed. Also, new techniques supporting the involvement of domain experts in content development, natural language processing, and dialogue management—through low-code or end-user development approaches—may be relevant. Finally, the challenges of context and user understanding, for sustained dialogue and adaptation of conversations, will remain critical aspects of future research.

Emerging techniques for chatbot design, development, and deployment. Future research is needed to provide increased support for design, development, and deployment. The deployment of conversational interfaces on top of software-enabled services is a promising direction for chatbot research and implementation (e.g. [ 115 ]), enabling digital assistants to access information and services currently out of their reach and rendering existing systems more accessible; the sketch below illustrates the idea. In terms of design, it is promising to see that general guidelines for human-AI interaction are emerging [ 3 ], and more of these are needed. There is also a need for guidelines drawn from systematic comparative studies, and for embedding research-derived guidelines into chatbot frameworks.
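As a minimal sketch of layering a conversational interface on top of an existing service, the function below wraps a hypothetical order-status endpoint and renders its structured response as a reply; the URL, parameters, and response format are placeholders we introduce for illustration, not an actual API.

# Sketch of exposing an existing web service through a conversational layer.
# The endpoint, parameters, and response format are hypothetical placeholders;
# a real integration would follow the API of the system being wrapped.
import requests
def order_status_skill(order_id: str) -> str:
    # Call the (hypothetical) back-end service the chatbot is layered on top of.
    response = requests.get(f"https://example.com/api/orders/{order_id}", timeout=5)
    if response.status_code != 200:
        return "I could not find that order. Could you check the order number?"
    status = response.json().get("status", "unknown")
    # Render the structured result as a conversational reply.
    return f"Order {order_id} is currently '{status}'."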

5.4 Chatbots for collaboration

The area of chatbots for collaboration concerns how we may understand and design chatbots in the context of networks that comprise humans and intelligent agents, for example for social networking, teamwork, or service provision. While current research on chatbots typically addresses dyadic interactions between one chatbot and one user, we foresee that chatbots in collaborative relations involving more people and bots will become more prominent as chatbots mature further. In addition, we consider that collaborative relations can extend to a chatbot's relations with external online services, in the form of application programming interfaces (APIs), and with other artificial agents.

5.4.1 State of the art

The topic of chatbots for collaboration concerns chatbots involved in interactions with humans, and possibly with other chatbots, in networks larger than dyads. While not as prominent as chatbots for simpler dyadic interaction, chatbots for collaboration have been developed and implemented in a range of contexts and for various purposes, for example to support group processes in education [ 43 ], at work [ 11 ], and in organizational settings [ 104 ], as well as in gaming communities [ 96 ].

Types of collaboration with chatbots may include (a) one human collaborating with one chatbot as an extension of human abilities, for example for analysis, gaming, as part of a service-related inquiry, or as a learning partner (e.g. [ 53 ]); (b) chatbots supporting human collaboration, for example by taking notes, documenting, or managing tasks (e.g. [ 104 ]); and (c) chatbots collaborating with other services, for example in multi-agent models, networks of chatbots, or external web services (e.g. [ 108 ]).

Chatbots may be integrated into collaborative processes forming what Grudin and Jacques [ 45 ] refer to as humbots, that is, human–chatbot teams which handle challenging service queries better than chatbots alone and more efficiently than humans alone. The concept of humbots assumes a tiered approach to service provision where the chatbot constitutes an initial service contact point and customers are escalated to human helpers only if the chatbot is unable to help. Such human–chatbot teams draw on the concept of human-in-the-loop [ 24 ] from the human factors literature, sensitizing system managers to the need for a collaborative setup that gives the human part of the team sufficient situation awareness to provide a quality takeover if need be. In a health-care context, human-in-the-loop concepts for conversational agents supporting hospital nurse teams have proved beneficial [ 13 ]. Likewise, the notion of escalation in customer service chatbots is a practical application of the human-in-the-loop concept for robust application of chatbots in consumer service provision [ 83 ]. The escalation logic of such a tiered setup is sketched below.
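The following is a minimal sketch of tiered, human-in-the-loop escalation; the escalation criteria and the handover payload are illustrative assumptions and do not describe any particular deployment from the cited studies.

# Sketch of a tiered "humbot" setup: the chatbot is the first contact point
# and escalates to a human agent when it cannot help or the user asks for one.
# Escalation criteria and the handover payload are illustrative assumptions.
from dataclasses import dataclass, field
@dataclass
class Conversation:
    user_id: str
    turns: list = field(default_factory=list)  # transcript kept for situation awareness
    failed_attempts: int = 0
def escalate(conversation: Conversation) -> str:
    # Hand over the transcript so the human agent has the context needed
    # for a quality takeover.
    return f"human agent takes over with {len(conversation.turns)} prior turns"
def route(conversation: Conversation, predicted_intent: str, confidence: float) -> str:
    conversation.turns.append((predicted_intent, confidence))
    if predicted_intent == "request_human" or conversation.failed_attempts >= 2:
        return escalate(conversation)
    if confidence < 0.4:
        conversation.failed_attempts += 1
        return "bot asks the user to rephrase"
    return "bot answers directly"
chat = Conversation(user_id="customer-42")
print(route(chat, "cancel_order", 0.82))
print(route(chat, "request_human", 0.95))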

5.4.2 Research challenges

There is an essential challenge in studying and designing chatbots for collaboration due to the multifaceted character of such interaction and the range of potential theoretical perspectives to apply. For example, collaboration may be framed in line with game theory, where an agent can be either a collaborator or an opponent [ 56 ], or follow joint-intention theory, where an agent always aims to work together with the user [ 55 , 68 ] or to establish a partnership [ 31 ]. When setting the concept of collaboration within social settings, the agent may be considered a mediator between human actors rather than an established actor within the described social structure (e.g. [ 104 ]). Alternatively, collaboration may be addressed as a merely technical feature when the agent collaborates with other artificial agents and external web services (e.g. [ 108 ]).

While a range of chatbots for collaboration have been developed, there is relatively scarce research on the characteristics of collaboration with chatbots. That is, we lack models or theories to conceptualize collaboration involving intelligent conversational agents. While the problem of human–machine collaboration is addressed in more generic terms, for example in actor-network theory [ 70 ], there is a lack of models to characterize conversational collaboration involving agents. Related to the challenge of conceptualizing chatbot collaboration, there is a need for research on the different roles chatbots and humans should take in human–chatbot collaboration, and on what the implications of these roles may be. Should, for example, the relation be based on assistance or mutual collaboration? Should chatbot participation be reactive or proactive? Should the chatbot be submissive or take charge? And what would the implications of these choices be?

5.4.3 Future research directions

Drawing on the above state of the art and research challenges, the following research directions are found to be particularly promising.

Modelling human–chatbot collaboration.  Research is needed to conceptualize and model different forms of human–chatbot collaboration, the roles the collaborative partners may take, and the potential implications these forms and roles may have in the short and long term. Addressing this complex concept within interactions with a novel technology like chatbots may benefit from inductive approaches. Future research may build theory inspired by knowledge on collaboration with humans and with artificial agents other than chatbots, such as social robots and embodied virtual agents. Accordingly, the concept of collaboration could be conceptualized in line with chatbots' unique embodiment features, paying particular attention to the possible roles of chatbots in collaboration and identifying properties which express these.

Empirical investigations of human–chatbot collaboration. When robust concepts for human–chatbot collaboration are established, a range of exciting empirical research is foreseen—for example involving experimental studies and case studies. As part of such research, it will be valuable to investigate incentive structures in collaboration, instruments for measuring human–chatbot collaboration, task-specific differences in outcomes, and levels of participant engagement and activity across and within tasks. These may also be included as mediators, moderators and covariates in complex behavioural models studying other concepts as outcomes (e.g., customer satisfaction, user experience, or technology adaptation). Thus, collaboration with chatbots could be situated not solely as an outcome or a predictor, but also as an adaptive behaviour that has a substantial role in a variety of settings and applications.

5.5 Democratizing chatbots–chatbots for all

The topic of democratizing chatbots concerns how chatbots may be developed, designed, and deployed to improve the availability and accessibility of information and services, and how chatbots may help bridge digital divides across various user populations. By nature, democratizing chatbots is a topic of interest to the human–computer interaction community, but it is not limited to it. Any discussion of democratizing chatbots has at least some overlap with larger debates concerning the ethics of artificial intelligence, in particular for issues pertaining to fairness, non-discrimination, and justice [ 47 ].

5.5.1 State of the art

By allowing simple natural language dialogues, chatbots are potentially a low-threshold means to access information and services and may as such serve to bridge digital divides and strengthen inclusion [ 14 ]. Chatbots have been suggested as accessible interactive systems for visually impaired users in need of an easily navigable user interface [ 7 ], as conversational support for users with special needs [ 19 ], and as a way to support youth in engaging with societal issues [ 110 ]. Chatbots may improve access to health care services (e.g. [ 105 ]), support health-promoting behavior change (e.g. [ 85 ]), and supplement educational programs [ 53 ].

Also relevant for the democratization of chatbots is the relative lowering of thresholds that chatbots may introduce to interactive systems development and design. A number of current chatbot platforms are marketed under the promise of supporting chatbot design without the need for coding skills [ 26 ]. Likewise, to involve domain experts in dialogue design, platforms may include dashboards for low-code updates of chatbot content and interaction design [ 66 ] or take up low-code approaches [ 89 ]. However, to our knowledge, there is a lack of research on the usability, accessibility, and effectiveness of such platforms.

However, some studies highlight critical aspects of using chatbots, since chatbots may sustain and even strengthen existing biases in society. For example, a gender bias has been identified in chatbot design [ 33 ], and voice-based conversational agents have been shown to interpret particular English dialects more easily than others, potentially reducing their utility for users from specific regions [ 51 ], and to be difficult to use for user groups with speech impairments [ 19 ]. Although many major companies, research institutions, and public sector organizations have issued ethical artificial intelligence guidelines, recent work [ 58 ] has found substantial divergence in how these are written and interpreted, highlighting the complexity of designing guidelines for systems with complex social impact. In this way, the responsibility for cultivating awareness of these issues, and of how design approaches impact the end user, is placed on individual designers and developers rather than being addressed through shared ethical approaches and attention to agent decision-making.

5.5.2 Research challenges

Recent studies suggest that while chatbots may indeed serve as a low-threshold interface to information, services, and societal participation, they also face challenges regarding bias and inclusion. Furthermore, there is a lack of systematic or structured investigations of universal and inclusive design of chatbots. Inclusive and responsible design of chatbots requires an understanding of various linguistic elements of conversation and an awareness of broader social and contextual factors. For example, studies are needed on barriers to onboarding and barriers to the use of chatbots. The aim of using chatbots to strengthen democratization, reduce bias, and facilitate universal design has been included in the vision of chatbots for social good [ 40 ], which may be a useful scope for addressing this set of challenges.

Furthermore, while available platforms and frameworks are promoted as low-threshold means of chatbot design and development, there is a lack of knowledge regarding how these are actually employed to democratize chatbot development and design. Knowledge is also needed on what challenges users with limited technology skills face when trying to use these platforms and frameworks, and on how such challenges may be overcome, for example through changes in design and in the training of machine learning models.

5.5.3 Future research directions

In light of the background and research challenges mentioned above, the following broad directions of future research are identified.

Chatbots for social good.  To realize the potential of chatbots as vehicles for bridging digital divides and strengthening the accessibility, availability, and affordability of services and information, chatbots for social good may be leveraged as an alternative perspective on chatbot research and design. In this perspective, systematic studies are needed to gain insight into current barriers to chatbot use and into how chatbots could be employed for social good. It will then be possible to overcome existing barriers with standardized solutions and to follow user-centered design processes focusing on user needs. Finally, research is needed on the normative and ethical implications of the adoption of chatbots in particular contexts, as also outlined in the next section.

Inclusive design with and for diverse user groups. Parallel to the research direction of chatbots for social good, we foresee research and development continuing the work towards making the underlying platforms and frameworks for chatbot design and development more easily applicable also for users without strong technical skills. Here, we foresee studies of the opportunities and challenges currently faced by chatbot creators, followed by development and design efforts aiming to address or mitigate these. Removing the need for complex configuration and simplifying or eliminating coding is probably the easiest way to serve the needs of small businesses and research groups, but also the needs of large enterprises that may have domain experts creating chatbots. Furthermore, developing platforms that facilitate the implementation of chatbots and recommend best practices during the design process is likely to raise the quality of the final products.

5.6 Ethics and privacy in chatbots

The final research topic concerns ethical and privacy implications of chatbots. Specifically, how to reflect ethical and privacy concerns in the design of chatbots, recognising the implications that different chatbot use cases and design choices may have for users’ trust in chatbots, and how we may identify and address unethical chatbot use.

5.6.1 State of the art

AI has recently been the object of substantial interest from policy-making and regulatory bodies, as well as in discussions and reflections on ethics, privacy management, and trust [ 21 ]. This concern for ethics in AI is motivated by its disruptive character, its potential to change the job market, the risk of misuse by malevolent actors, and issues pertaining to accountability and bias [ 58 ]. Ethical concerns arising from the design and deployment of AI technology have motivated a number of initiatives [ 47 ], such as the Ethics Guidelines for Trustworthy AI by the European Commission expert group on AI and Microsoft's FATE (Fairness, Accountability, Transparency, and Ethics in AI), addressing issues including mitigating bias and discrimination in AI systems and fairness in the use of AI systems [ 81 ]. Chatbots are a prominent AI-based technology and as such are in principle addressed by the broader concern for ethics and privacy in technology research in general and in AI-based technology in particular. Nevertheless, as noted in a review of the chatbot literature, there has been an initial lack of ethical discussion in chatbot research [ 102 ], though noteworthy exceptions exist, such as the exploration of ethical and social considerations for conversational AI by Ruane et al. [ 91 ]. The ethical discussion in chatbot research may, however, be gaining traction, motivated, for example, by Bender et al.'s [ 9 ] critical overview of ethical risks pertaining to large language models.

The interest and discussion concerning ethics and privacy in AI have been particularly impactful in Europe, where the General Data Protection Regulation (GDPR) now governs privacy in technology-based systems and services. Furthermore, based on the advice of a high-level expert group on AI, a European set of ethics guidelines for trustworthy AI has been presented [ 30 ]. According to these guidelines, it is of paramount importance for trustworthy AI to be aligned with (a) legal regulations and (b) ethical principles and values, and also (c) to be robust from a technical perspective given its particular social context. From these principles, the European Commission expert group has identified seven key requirements for ethical AI applications, including human agency and oversight; privacy and data governance; and diversity, non-discrimination, and fairness. Finally, a proposed European set of regulations for AI, the AI Act, will help strengthen aspects of ethical concern in AI systems, including legal requirements for human oversight, accuracy, robustness, and security. Of particular relevance for chatbots is the proposed transparency requirement, which will oblige service providers to ensure that users are aware when they are interacting with machine agents and not human operators [ 94 ].
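As a small illustration of how such a transparency requirement might surface in practice, the sketch below composes an onboarding message that discloses the agent's non-human nature; the wording and the opt-out handling are illustrative assumptions, not a compliance recipe.

# Sketch of a transparency-oriented onboarding message, reflecting the proposed
# requirement that users are made aware they are interacting with a machine agent.
# The wording and the opt-out handling are illustrative assumptions.
def onboarding_message(bot_name: str, stores_conversations: bool) -> str:
    lines = [
        f"Hi, I am {bot_name}, an automated assistant, not a human.",
        "You can ask to be transferred to a human agent at any time.",
    ]
    if stores_conversations:
        lines.append("Conversations are stored to improve the service; reply 'opt out' to decline.")
    return " ".join(lines)
print(onboarding_message("HelpBot", stores_conversations=True))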

5.6.2 Research challenges

Ethical and privacy challenges permeate the field of chatbot research, especially where the context is sensitive or high-stakes or the users are marginalised or vulnerable, for example in designing chatbots for health and education, or in designing chatbots to support asylum seekers or children. There is a large and growing body of ethical and privacy knowledge to draw on, and an emerging set of guidelines and regulations on ethics and privacy for digital systems in general and AI-based systems in particular. Nevertheless, we lack research and theorising around ethics and privacy specifically for conversational user interfaces. This is problematic, as the conversational character of chatbots may conceivably introduce a range of specific ethical problems, for example the ethical implications of human-like and socially present chatbot interaction, issues of consent, the privacy implications of third-party interactions, and the emotional effects on children and vulnerable users. Research is needed to better understand and address these and other emergent problems of ethics and privacy.

5.6.3 Future research directions

Drawing on the above, we accentuate the following two directions for future research, though other directions are possible and may be equally relevant.

Understanding chatbot ethics and privacy. Future research should facilitate reflection on the ethical implications of chatbots, for example through the identification of ethical and privacy issues in chatbot design and implementation, including design intentions, practical mitigation of known issues, and exploration of unforeseen implications. These could be domain-specific issues, such as ethical implications for research and education, media, or marketing and commerce, but they could also be general issues, such as how interaction with chatbots may motivate oversharing in users, help spread misinformation and hate speech, or induce negative consequences as a result of over-humanizing chatbots.

Ethics by design.  In parallel with work on chatbot ethics, there will be a need for research on the pragmatic and material issues of how to honour ethical guidelines and principles in the design of chatbot technology and applications. With reference to the principle of privacy by design, we refer to this as ethics by design, where privacy is subsumed as one of several aspects to consider as part of an ethics discussion and subsequent design challenge. Important challenges may include research on how to avoid biases in chatbots, how to avoid chatbot discrimination and redlining, how to mitigate the ethical issues introduced by the black-box character of the machine learning underpinning aspects of chatbot functioning, and how to avoid misuse and weaponization of chatbot technology. A useful starting point for an exploration of ethics by design could be to refine the generic European expert group requirements for ethical AI [ 30 ] for the context of conversational AI.

6 Discussion

Drawing on the involvement of chatbot researchers and practitioners in the CONVERSATIONS workshops, we propose a set of future directions for chatbot research. The directions are motivated by the current state of the art and identified research challenges and structured within six overarching topics. In the following, we discuss the implementation of the future directions, our perspectives on chatbot application areas, and how to continue the discussion and reflection started in this paper.

6.1 Implementing the future directions

Two of the identified research directions concern studies of users and implications, as well as how to design for desirable chatbot use. As chatbots become more pervasive in the coming years, and communication with non-human agents increasingly becomes part of our daily routines, it becomes ever more pressing to expand our knowledge of the antecedents, contents, and consequences of human–machine communication. In doing so, this stream of research needs to explore the cognitive, affective, and behavioural dimensions of engagement with these agents, the extent to which there are systematic differences between individuals, groups, or contexts of use, and the individual, group, and societal implications of this phenomenon. Moreover, as the field progresses, there is a growing need to consolidate existing knowledge, updating and extending overarching theoretical frameworks and models. Work within a wide variety of disciplines can serve as inspiration in that regard, such as the studies of Sundar [ 100 ] on the psychology of human–agent interaction and Guzman and Lewis [ 46 ] on human–machine communication.

This evolution in our understanding of conversational user experiences should be accompanied by proper support from platforms and frameworks. Such support can be seen as increasingly creating abstractions that facilitate the design, testing, integration, and development of chatbots, as has historically happened with other software artefacts. Current efforts are already moving in that direction, providing development resources that promise to let anyone with enough motivation, regardless of background, deliver human-like interactive experiences. While this has the potential to bring substantial value to societies, empowering communities to develop their own solutions, it can also bring unintended consequences, as we cannot expect users of these platforms to have knowledge of the complexities of modelling proper human-AI experiences [ 116 ]. On the other hand, abstractions can also hide underlying information about machine learning models and AI decision-making, as well as latent bias in the training data (e.g., [ 101 ]) that can translate into social biases (e.g., [ 120 ]).

Human–chatbot collaboration is foreseen as an increasingly important aspect of chatbot research and applications. We hold that such collaboration will benefit from being implemented with reflection on human collaboration and with relevant empirical evidence from chatbot research, in line with the reflections by Grudin and Jacques [ 45 ]. Considering the value of collaboration for decision making and productivity in professional and organizational settings, tasks assigned to chatbots in these collaborative interactions can vary in complexity and involvement. Such tasks can be as simple as providing individual notifications, or as complicated as communicating processed and analysed data to different stakeholders. Using chatbots to automate these tasks should enrich group productivity and quality of work, promoting mutual understanding and diversity of opinions. Research supporting such automation could benefit from treating this as a service design challenge, where the chatbot is seen as one of multiple agents and user interfaces [ 14 ]. On a societal level, collaborative networks of humans and chatbots may require new safe online spaces, with chatbots demonstrating higher levels of involvement. These chatbots could moderate social interactions, facilitating engagement, inclusivity, and understanding among the parties involved. This is in stark contrast to the current challenge of software agents or bots in social networks, as seen, for example, in Twitter bots utilizing COVID-19 content to spread political conspiracies [ 34 ], and in the general trend of deploying bots at large scale for political interference and influence [ 44 ].

Chatbots will both raise critical ethical challenges and hold implications for the democratization of technology, and implementing research addressing these directions is important. Chatbots permit users to interact through natural language and are consequently a potential low-threshold means to access information and services and to promote inclusion. However, due to technological limitations and design choices, they can also perpetuate and even reinforce existing biases in society, exclude or discriminate against some user groups (e.g. [ 33 , 51 ]), and over-represent or enshrine specific values. Future research will need to investigate and demonstrate the democratization of chatbots in practice, where conversational technology is made easily and widely accessible to various businesses and user groups across the globe so that more people can benefit from conversational interaction. Moreover, as part of chatbot democratization, it will be important to make the development process more accessible as well, without requiring chatbot developers to have in-depth software engineering knowledge, as exemplified by applying visual programming approaches such as Blockly to chatbot development [ 89 ]. In this way, chatbots can be created by experts in the domain where they will be used. This aspect is fundamental since chatbots are not conventional technologies but are developing into agents operating in social contexts.

Taking a broader ethical perspective, key questions when implementing future research on chatbots may include: What are the ethical implications of chatbots imitating human beings? Whose (and which) values should guide design practice within a global marketplace? What are the ethical implications of replacing humans with chatbots as a means of support for purposes such as commerce, therapy, or social interaction? How to facilitate chatbot support in decision making without risking or compromising agreed ethical principles? Ethical reflections and discussion on chatbots and chatbot applications are already emerging (e.g. [ 65 , 91 ]). We anticipate that advances in the democratization of chatbots will increasingly inspire ethical discourse that ties in with higher-level discussions about chatbot applications.

6.2 Perspectives on chatbot application areas

The identified research topics and corresponding future research directions may guide research so as to contribute to a fundamental understanding of chatbot technology and the corresponding user interaction and engagement. However, to generate added value in specific application areas, such as customer service, health, education, office work, and home applications, further reflection on the respective use cases is needed. In particular, researchers need to analyse how chatbots may be leveraged and taken up in different application areas, how knowledge and research may be transferable across application areas, and whether distinct research agendas should be established.

Many aspects outlined in our future research agenda are valid for any application area. For example, results concerning chatbot communication, user experience, design, and technology form the basis for applying chatbots in specific application areas. However, further analysis is needed to understand the characteristics of each application area in more detail. For instance, requirements in the health sector concerning privacy, ethics, and trust may be significantly more demanding than similar requirements in other sectors, as health applications might have severe impacts on users and concern highly sensitive personal information. In business contexts, such as corporate customer support scenarios, the potential impacts may be less severe, but specific corporate regulations and norms need to be considered. In contrast, the use of chatbots in personal settings, e.g. for social relationships, is often mainly driven by motivations for engagement and meaning-making. Unlike in health and business contexts, personal benefits are often not measured in monetary terms; the main focus of personal usage is the improvement of daily life or wellbeing.

Regardless of differences among the diverse application areas in application-oriented research, many studies exist in specific domains that could be transferred to others. For instance, studies focusing on information provision in business contexts can most likely be applied in the health sector as well; e.g., providing product information is likely to be similar to explaining healthy nutrition. However, to enable a transfer of research results across application areas, the commonalities and differences of the involved areas need to be identified and assessed. If the main characteristics of both areas are similar, transfer of the research results seems viable, and based on such an analysis and comparison, a generalization of the research across application areas seems possible. This approach to future research on chatbot application areas could substantially increase the body of knowledge, as many results from existing pilot studies and prototypes for specific application areas may be reused as the basis for transfer and generalization (e.g., into general design guidelines) to further application areas.

6.3 Continuing the discussion and collaboration

The presented challenges may serve as a step in the direction of contributing to the body of knowledge about chatbot usage and challenges, the frameworks and platforms underpinning chatbot applications, as well as needed future work on the broader implications of chatbots to work and society.

The proposed future research directions are intended as a response to the current lack of coherence in the emerging field of chatbot research, which may in part be observed by the broad range of journals and conferences in which findings from chatbot research are presented, and also the lack of commonly agreed key constructs, models, and measurement instruments. While this may be expected in an emerging research area, future research will benefit from a greater degree of coherence and cohesiveness in the field.

Nevertheless, there may be topics that have been omitted in the process leading up to our proposition, and relevant state-of-the-art and current research challenges may have been left out. Furthermore, as the field evolves, it is necessary to update the set of topics and research directions regularly. In consequence, continued interdisciplinary discussion and collaboration are needed to validate and refine the proposed set of future research directions.

One limitation deserving particular mention concerns the context of this discussion. The findings are based on discussions at the CONVERSATIONS workshop and mainly involve researchers from European organizations. While we assume the proposed directions hold broad international relevance and interest, it may be fruitful to test this assumption through discussion in the field—a discussion which we hope this paper will spur.

In further discussion and collaboration on chatbot research directions, care should be taken to involve the broadest possible set of interests and perspectives. For example, it will be beneficial to involve both researchers and practitioners, as well as the emerging and established research communities with an interest in conversational computer systems, to make sure that the different enabling technologies and knowledge resources needed in future development and design of chatbots are represented. While research on conversational systems and user interfaces has been conducted for decades, chatbot research and design are still in their relative infancy. A consolidation of the field is needed, and we hope the proposed research agenda, with its directions for future research, may serve as a step towards such consolidation.

DialogFlow, https://cloud.google.com/dialogflow .

Microsoft Bot Framework, https://dev.botframework.com/ .

Pandorabots, https://home.pandorabots.com/ .

Rasa, https://rasa.com/ .

Mycroft, https://mycroft.ai/ .

CUI 2021—Conversational User Interfaces, https://www.conversationaluserinterfaces.org/2021/ .

CONVERSATIONS 2021—international workshop on chatbot research, https://conversations2021.wordpress.com/ .

CAIR 2020—Conversational Approaches to Information Retrieval, https://sites.google.com/view/cair-ws/cair-2020 .

SIGDIAL—Special interest Group on Discourse and Dialogue, https://www.sigdial.org/ .

IVA 2021—21st ACM International Conference on Intelligent Virtual Agents, https://sites.google.com/view/iva2021/ .

IWSDS 2021—12th International Workshop on Spoken Dialogue Systems Technology, https://www.iwsds.tech/ .

INTERSPEECH 2021—The 21st Annual Conference of the International Speech Communication Association, https://interspeech2021.org .

Google Conversation Design, https://designguidelines.withgoogle.com/conversation/ .

Alexa Design Guide, https://developer.amazon.com/en-US/docs/alexa/alexa-design/get-started.html .

Adam M, Wessel M, Benlian A (2020) AI-based chatbots in customer service and their effects on user compliance. Electron Markets. https://doi.org/10.1007/s12525-020-00414-7

Adiwardana D, Luong MT, So DR, Hall J, Fiedel N, Thoppilan R, Le QV (2020) Towards a human-like open-domain chatbot. arXiv preprint. arXiv:2001.09977

Amershi S, Weld D, Vorvoreanu M, Fourney A, Nushi B, Collisson P, Teevan J (2019) Guidelines for human–AI interaction. In: Proceedings of the CHI 2019 (paper no. 3). ACM, New York

Araujo T (2018) Living up to the chatbot hype: the influence of anthropomorphic design cues and communicative agency framing on conversational agent and company perceptions. Comput Hum Behav 85:183–189

Araujo T (2020) Conversational agent research toolkit: an alternative for creating and managing chatbots for experimental research. Comput Commun Res 2(1):35–51

Ashktorab Z, Jain M, Liao QV, Weisz JD (2019) Resilient chatbots: repair strategy preferences for conversational breakdowns. In: Proceedings of CHI 2019 (paper no. 254). ACM, New York

Baez M, Daniel F, Casati F (2019) Conversational web interaction: proposal of a dialog-based natural language interaction paradigm for the Web. In: Chatbot research and design. Third international workshop, CONVERSATIONS. Springer, Cham, pp 94–11

Bavaresco R, Silveira D, Reis E, Barbosa J, Righi R, Costa C, Moreira C (2020) Conversational agents in business: a systematic literature review and future research directions. Comput Sci Rev 36:100239. https://doi.org/10.1016/j.cosrev.2020.100239

Bender EM, Gebru T, McMillan-Major A, Mitchell M (2021) On the dangers of stochastic parrots: can language models be too big? In: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. ACM, New York, pp 610–623

Bickmore T, Picard RW (2005) Establishing and maintaining long-term human–computer relationships. ACM Trans Comput Hum Interact 12(2):293–327. https://doi.org/10.1145/1067860.1067867

Bittner E, Shoury O (2019) Designing automated facilitation for design thinking: a chatbot for supporting teams in the empathy map method. In: Proceedings of the 52nd Hawaii international conference on system sciences. Scholar Space, Honolulu, pp 227–236

Bobrow DG, Kaplan RM, Kay M, Norman DA, Thompson H, Winograd T (1977) GUS, a frame-driven dialog system. Artif Intell 8(2):155–173

Bott N, Wexler S, Drury L, Pollak C, Wang V, Scher K, Narducci S (2019) A protocol-driven, bedside digital conversational agent to support nurse teams and mitigate risks of hospitalization in older adults: case control pre-post study. J Med Internet Res 21(10):e13440

Brandtzaeg PB, Følstad A (2017) Why people use chatbots. In: Proceeedings of the international conference on internet science—INSCI 2017. Springer, Cham, pp 377–392

Breazeal C (2003) Toward sociable robots. Robot Auton Syst 42(3–4):167–175

Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Amodei D (2020) Language models are few-shot learners. arXiv preprint. arXiv:2005.14165

Candello H, Pinhanez C, Pichiliani M, Cavalin P, Figueiredo F, Vasconcelos M, Carmo DH (2019) The effect of audiences on the user experience with conversational interfaces in physical spaces. In: Proceedings of CHI 2019 (paper no. 90). ACM, New York

Cassell J, Bickmore T, Billinghurst M, Campbell L, Chang K, Vilhjálmsson H, Yan H (1999) Embodiment in conversational interfaces: Rea. In: Proceedings of CHI ‘99. ACM, New York, pp 520–527

Catania F, Di Nardo N, Garzotto F, Occhiuto D (2019) Emoty: an emotionally sensitive conversational agent for people with neurodevelopmental disorders. In Proceedings of the 52nd Hawaii international conference on system sciences. Scholar Space, Honolulu, pp 2014–2023

Chen H, Liu X, Yin D, Tang J (2017) A survey on dialogue systems: recent advances and new frontiers. ACM SIGKDD Explor Newsl 19(2):25–35

Chung H, Iorga M, Voas J, Lee S (2017) Alexa, can I trust you? Computer 50(9):100–104

Ciechanowski L, Przegalinska A, Magnuski M, Gloor P (2019) In the shades of the uncanny valley: an experimental study of human–chatbot interaction. Future Gener Comput Syst 92:539–548

Corkrey R, Parkinson L (2002) Interactive voice response: review of studies 1989–2000. Behav Res Methods Instruments Comput 34(3):342–353

Cranor LF (2008) A framework for reasoning about the human in the loop. In :UPSec’08: Proceedings of the 1st conference on usability, psychology, and security. USENIX Association, Berkeley

Dale R (2016) The return of the chatbots. Nat Lang Eng 22(5):811–817

Daniel G, Cabot J, Deruelle L, Derras M (2020) Xatkit: a multimodal low-code chatbot development framework. IEEE Access 8:15332–15346

DeVault D, Artstein R, Benn G, Dey T, Fast E, Gainer A, Morency LP (2014) SimSensei Kiosk: a virtual human interviewer for healthcare decision support. In Proceedings of the 2014 international conference on autonomous agents and multi-agent systems, pp 1061–1068

Deriu J, Rodrigo A, Otegi A, Echegoyen G, Rosset S, Agirre E, Cieliebak M (2021) Survey on evaluation methods for dialogue systems. Artif Intell Rev 54:755–810

Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint, arXiv:1810.04805

European Commission (2019) Ethics guidelines for trustworthy AI. Retrieved from: https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai

Farooq U, Grudin J (2016) Human–computer integration. Interactions 23(6):26–32

Feine J, Gnewuch U, Morana S, Maedche A (2019a) A taxonomy of social cues for conversational agents. Int J Hum Comput Stud 132:138–161

Feine J, Gnewuch U, Morana S, Maedche A (2019b) Gender bias in chatbot design. In: Chatbot research and design. Third international workshop CONVERSATIONS 2019. Springer, Cham, pp 79–93

Ferrara E (2020) What types of COVID-19 conspiracies are populated by Twitter bots? First Monday. https://doi.org/10.5210/fm.v25i6.10633

Fitzpatrick KK, Darcy A, Vierhile M (2017) Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): a randomized controlled trial. JMIR Mental Health 4(2):e19

Floridi L, Chiriatti M (2020) GPT-3: Its nature, scope, limits, and consequences. Mind Mach 30(4):681–694

Frey CB, Osborne MA (2017) The future of employment: How susceptible are jobs to computerisation? Technol Forecast Soc Chang 114:254–280

Følstad A, Brandtzæg PB (2017) Chatbots and the new world of HCI. Interactions 24(4):38–42

Følstad A, Brandtzaeg PB (2020) Users’ experiences with chatbots: findings from a questionnaire study. Quality User Exp. https://doi.org/10.1007/s41233-020-00033-2

Følstad A, Brandtzaeg PB, Feltwell T, Law ELC, Tscheligi M, Luger E (2018) Chatbots for social good. In: Extended abstracts of the 2018 CHI conference on human factors in computing systems (paper no. SIG06). ACM, New York

Følstad A, Araujo T, Papadopoulos S, Law EL-C, Luger E, Goodwin M, Brandtzaeg PB (eds) (2020) Chatbot research and design 4th international workshop, CONVERSATIONS 2020. Springer, Cham

Go E, Sundar SS (2019) Humanizing chatbots: the effects of visual, identity and conversational cues on humanness perceptions. Comput Hum Behav 97:304–316

Goel AK, Polepeddi L (2016) Jill Watson: A virtual teaching assistant for online education. Georgia Institute of Technology. Retrieved from https://smartech.gatech.edu/handle/1853/59104

Gorwa R, Guilbeault D (2020) Unpacking the social media bot: a typology to guide research and policy. Policy Internet 12(2):225–248

Grudin J, Jacques R (2019) Chatbots, humbots, and the quest for artificial general intelligence. In: Proceedings of CHI 2019 (paper no. 209). ACM, New York

Guzman AL, Lewis SC (2019) Artificial intelligence and communication: a human–machine communication research agenda. New Media Soc 22(1):70–86

Hagendorff T (2020) The ethics of AI ethics: An evaluation of guidelines. Mind Mach 30(1):99–120

Hall E (2018) Conversational Design. A Book Apart, New York

Hao K (2020) We read the paper that forced Timnit Gebru out of Google. Here’s what it says. MIT Technology Review. Retrieved from https://www.technologyreview.com/2020/12/04/1013294/google-ai-ethics-research-paper-forced-out-timnit-gebru/

Harms JG, Kucherbaev P, Bozzon A, Houben GJ (2018) Approaches for dialog management in conversational agents. IEEE Internet Comput 23(2):13–22

Harwell D (2018) The accent gap. The Washington Post. Retrieved from https://www.washingtonpost.com/graphics/2018/business/alexa-does-not-understand-your-accent/

Ho A, Hancock J, Miner AS (2018) Psychological, relational, and emotional effects of self-disclosure after conversations with a chatbot. J Commun 68(4):712–733

Hobert S, Berens F (2019) Small talk conversations and the long-term use of chatbots in educational settings–experiences from a field study. In: Chatbot research and design. Third international workshop CONVERSATIONS 2019. Springer, Cham, pp 260–272

Hobert S, Meyer von Wolff R (2019) Say hello to your new automated tutor—a structured literature review on pedagogical conversational agents. In: Proceedings of the 14th international conference on Wirtschaftsinformatik. Association for Information Systems, pp 301–313

Hoffman G, Breazeal C (2004) Collaboration in human–robot teams. In: AIAA 1st intelligent systems technical conference. https://doi.org/10.2514/6.2004-6434

Hsieh T, Chaudhury B, Cross ES (2020) Human–robot cooperation in prisoner dilemma games: people behave more reciprocally than prosocially toward robots. In: Companion of the 2020 ACM/IEEE international conference on human–robot interaction. ACM, New York, pp 257–259

Jain M, Kumar P, Kota R, Patel SN (2018) Evaluating and informing the design of chatbots. In: Proceedings of the 2018 designing interactive systems conference. ACM, New York, pp 895–906

McTear MF (2002) Spoken dialogue technology: enabling the conversational user interface. ACM Comput Surv (CSUR) 34(1):90–169

Jokinen K, McTear M (2009) Spoken dialogue systems. Synthesis lectures on human language technologies. Morgan & Claypool Publishers, Williston

Jovanovic M, Baez M, Casati F (2020) Chatbots as conversational healthcare services. IEEE Internet Comput. https://doi.org/10.1109/MIC.2020.3037151

Jurafsky D, Martin JH (2020) Dialogue systems and chatbots. In: Speech and language processing. Draft of December 30, 2020. https://web.stanford.edu/~jurafsky/slp3/24.pdf

Kim S, Lee J, Gweon G (2019) Comparing data from chatbot and web surveys: effects of platform and conversational style on survey response quality. In: Proceedings of CHI 2019 (paper no. 86). ACM, New York

Kocaballi AB, Laranjo L, Coiera E (2019) Understanding and measuring user experience in conversational interfaces. Interact Comput 31(2):192–207

Kopp S, Gesellensetter L, Krämer NC, Wachsmuth I (2005) A conversational agent as museum guide–design and evaluation of a real-world application. In: International workshop on intelligent virtual agents. Springer, Heidelberg, pp 329–343

Kretzschmar K, Tyroll H, Pavarini G, Manzini A, Singh I (2019) Can your phone be your therapist? Young people’s ethical perspectives on the use of fully automated conversational agents (chatbots) in mental health support. Biomed Inform Insights. https://doi.org/10.1177/1178222619829083

Kvale K, Sell OA, Hodnebrog S, Følstad A (2019) Improving conversations: lessons learnt from manual analysis of chatbot dialogues. In: Chatbot research and design. Third international workshop CONVERSATIONS 2019. Springer, Cham, pp 187–200

Laban G, George J, Morrison V, Cross ES (2021) Tell me more! Assessing interactions with social robots from speech. Paladyn J Behav Robot 12(1):136–159

Laban G, Araujo T (2019) Working together with conversational agents: the relationship of perceived cooperation with service performance evaluations. In: Chatbot research and design. Third international workshop CONVERSATIONS 2019. Springer, Cham, pp 215–228

Laban G, Araujo T (2020) The effect of personalization techniques in users’ perceptions of conversational recommender system. In: IVA ’20: Proceedings of the 20th ACM international conference on intelligent virtual agents (paper no. 34). ACM, New York

Latour B (2005) Reassembling the Social - An introduction to actor-network-theory. Oxford University Press, Oxford, UK

Lee S, Choi J (2017) Enhancing user experience with conversational agent for movie recommendation: effects of self-disclosure and reciprocity. Int J Hum Comput Stud 103:95–105

Lester J, Branting K, Mott B (2004) Conversational agents. In: Singh MP (ed) The Practical Handbook of Internet Computing. Chapman & Hall/CRC, Boca Raton

Liao QV, Geyer W, Muller M, Khazaen Y (2020) Conversational interfaces for information search. Understanding and improving information search. Springer, Cham, pp 267–287

Liao QV, Mas-ud Hussain M, Chandar P, Davis M, Khazaeni Y, Crasso MP, Geyer W (2018) All work and no play? In: Proceedings of CHI 2018 (paper no. 3). ACM, New York

Luger E, Sellen A (2016) “Like having a really bad PA”—the gulf between user expectation and experience of conversational agents. In: Proceedings of CHI 2016. ACM, New York, pp 5286–5297

McAfee A, Brynjolfsson E (2017) Machine, platform, crowd: harnessing our digital future. W.W. Norton & Company, New York

McTear M (2021) Conversational AI: Dialogue Systems, Conversational Agents, and Chatbots. Morgan & Claypool, Williston

Meyer von Wolff R, Hobert S, Masuch K, Schumann M (2020) Chatbots at digital workplaces - A grounded-theory approach for surveying application areas and objectives. Pac Asia J Assoc Inf Syst 12(2):64–102

Moore RJ, Arar R (2019) Conversational UX design: a practitioner’s guide to the natural conversation framework. Morgan & Claypool, Williston

Nordheim CB, Følstad A, Bjørkli CA (2019) An initial model of trust in chatbots for customer service—findings from a questionnaire study. Interact Comput 31(3):317–335

Ntoutsi E, Fafalios P, Gadiraju U, Iosifidis V, Nejdl W, Vidal ME, Ruggieri S, Turini F, Papadopoulos S, Krasanakis E, Kompatsiaris I (2020) Bias in data-driven artificial intelligence systems—an introductory survey. Wiley Interdiscipl Rev Data Min Knowl Discov 10(3):e1356

Oh CS, Bailenson JN, Welch GF (2018) A systematic review of social presence: definition, antecedents, and implications. Front Robot AI 5:114

Paikens P, Znotiņš A, Bārzdiņš G (2020) Human-in-the-loop conversation agent for customer service. In: Proceedings of the 25th international conference on applications of natural language to information systems, NLDB 2020. Springer, Cham, pp 277–284

Pérez JQ, Daradoumis T, Puig JMM (2020) Rediscovering the use of chatbots in education: a systematic literature review. Comput Appl Eng Educ 28(6):1549–1565

Perski O, Crane D, Beard E, Brown J (2019) Does the addition of a supportive chatbot promote user engagement with a smoking cessation app? An experimental study. Digital Health 5:2055207619880676

Piccolo LS, Mensio M, Alani H (2018) Chasing the chatbots. In Internet Science. INSCI 2018 international workshops. Springer, Cham, pp 157–169

Porcheron M, Fischer JE, Reeves S, Sharples S (2018) Voice interfaces in everyday life. In: Proceedings of CHI 2018 (paper no. 640). ACM, New York

Press G (2019) AI Stats News: 62 % of US Consumers like using chatbots to interact with businesses. Forbes. Retrieved from https://www.forbes.com/sites/gilpress/2019/10/25/ai-stats-news-us-consumers-interest-in-using-chatbots-to-interact-with-businesses-rise-to-62/

Rodríguez-Gil L, García-Zubia J, Orduña P, Villar-Martinez A, López-De-Ipiña D (2019) New approach for conversational agent definition by non-programmers: a visual domain-specific language. IEEE Access 7:5262–5276

Roller S, Dinan E, Goyal N, Ju D, Williamson M, Liu Y, Boureau YL (2020) Recipes for building an open-domain chatbot. arXiv preprint. arXiv:2004.13637

Ruane E, Birhane A, Ventresque A (2019) Conversational AI: social and ethical considerations. In: Proceedings for the 27th AIAI Irish conference on artificial intelligence and cognitive science—AICS 2019, pp 104–115. CEUR Workshop Proceedings, Vol-2563. http://ceur-ws.org/Vol-2563/

Rubin VL, Chen Y, Thorimbert LM (2010) Artificially intelligent conversational agents in libraries. Library Hi Tech 28(4):496–522

Salem M, Kopp S, Wachsmuth I, Rohlfing K, Joublin F (2012) Generation and evaluation of communicative robot gesture. Int J Social Robot 4(2):201–217

Schaake M (2021). The European commission’s artificial intelligence act. Policy Brief, Stanford HAI. Retrieved from https://hai.stanford.edu/sites/default/files/2021-06/HAI_Issue-Brief_The-European-Commissions-Artificial-Intelligence-Act.pdf

Schuller BW (2018) Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends. Commun ACM 61(5):90–99

Seering J, Luria M, Ye C, Kaufman G, Hammer J (2020) It takes a village: integrating an adaptive chatbot into an online gaming community. In: Proceedings of CHI 2020 (paper no. 579). ACM, New York

Shevat A (2017) Designing bots: creating conversational experiences. O’Reilly Media, Boston

Shum HY, He XD, Li D (2018) From Eliza to XiaoIce: challenges and opportunities with social chatbots. Front Inf Technol Electron Eng 19(1):10–26

Strubell E, Ganesh A, McCallum A (2019) Energy and policy considerations for deep learning in NLP. arXiv preprint, arXiv:1906.02243

Sundar SS (2020) Rise of machine agency: a framework for studying the psychology of Human–AI Interaction (HAII). J Comput Mediat Commun 25(1):74–88

Suresh H, Guttag JV (2019) A framework for understanding unintended consequences of machine learning. arXiv preprint, arXiv:1901.10002

Syvänen S, Valentini C (2020) Conversational agents in online organization–stakeholder interactions: a state-of-the-art analysis and implications for further research. J Commun Manag 339–362

Ta V, Griffith C, Boatfield C, Wang X, Civitello M, Bader H, Loggarakis A (2020) User experiences of social support from companion chatbots in everyday contexts: thematic analysis. J Med Internet Res 22(3):e16235

Toxtli C, Monroy-Hernández A, Cranshaw J (2018) Understanding chatbot-mediated task management. In: Proceedings of CHI 2018 (paper no. 58). ACM, New York

Vaidyam AN, Wisniewski H, Halamka JD, Kashavan MS, Torous JB (2019) Chatbots and conversational agents in mental health: a review of the psychiatric landscape. Can J Psychiatry 64(7):456–464. https://doi.org/10.1177/0706743719828977

Van der Goot MJ, Pilgrim T (2019) Exploring age differences in motivations for and acceptance of chatbot communication in a customer service context. In: Chatbot research and design. Third international workshop CONVERSATIONS 2019. Springer, Cham, pp 173–186

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Conference on neural information processing systems—NIPS 2017, pp 5998–6008

Vaziri M, Mandel L, Shinnar A, Siméon J, Hirzel M (2017) Generating chat bots from web API specifications. In: Proceedings of the 2017 ACM SIGPLAN international symposium on new ideas, new Pparadigm, and reflections on programming and software. ACM, New York, pp 44–57

Vinyals O, Le Q (2015) A neural conversational model. arXiv preprint, arXiv:1506.05869

Väänänen K, Hiltunen A, Varsaluoma J, Pietilä I (2019) CivicBots–Chatbots for supporting youth in societal participation. In: Chatbot research and design. Third international workshop CONVERSATIONS 2019. Springer, Cham, pp 143–157

Wallace R (2003) The elements of AIML style. Alice AI Foundation, 139

Weizenbaum J (1967) Contextual understanding by computers. Commun ACM 10(8):474–480

Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Brew J (2019) HuggingFace’s transformers: state-of-the-art natural language processing. arXiv preprint, arxiv:1910.03771

Xu A, Liu Z, Guo Y, Sinha V, Akkiraju R (2017) A new chatbot for customer service on social media. In: Proceedings of CHI 2017. ACM, New York, pp 3506–3510

Yaghoub-Zadeh-Fard MA, Zamanirad S, Benatallah B, Casati F (2020) REST2Bot: bridging the gap between bot platforms and REST APIs. In: Companion proceedings of the web conference 2020. ACM, New York, pp 245–248

Yang Q, Steinfeld A, Rosé C, Zimmerman J (2020) Re-examining whether, why, and how Human-AI Interaction is uniquely difficult to design. In: Proceedings of CHI 2020 (paper no. 174). ACM,New York

Zamora J (2017) I’m sorry, Dave, I’m afraid I can’t do that: Chatbot perception and expectations. In: Proceedings of the 5th international conference on human agent Interaction. ACM,New York,pp 253–260

Zarouali B, Van den Broeck E, Walrave M, Poels K (2018) Predicting consumer responses to a chatbot on Facebook. Cyberpsychol Behav Soc Netw 21(8):491–497

Zierau N, Engel C, Söllner M, Leimeister JM (2020) Trust in smart personal assistants: A systematic literature review and development of a research agenda. In: Proceedings of the 15th international conference on Wirtschaftsinformatik—WI2020. Association for Information Systems,pp 99–11

Zou J, Schiebinger L (2018) AI can be sexist and racist—it’s time to make it fair. Nature 559:324–326

Download references

Open access funding provided by SINTEF AS. Funding supporting the work conducted by the first author was provided by Norges Forskningsråd (Grant No. 270940).

Author information

Authors and Affiliations

SINTEF, Oslo, Norway

Asbjørn Følstad & Petter Bae Brandtzaeg

University of Amsterdam, Amsterdam, The Netherlands

Theo Araujo, Carolin Ischen & Rebecca Wald

Durham University, Durham, UK

Effie Lai-Chong Law

University of Oslo, Oslo, Norway

Petter Bae Brandtzaeg

CERTH, Thessaloníki, Greece

Symeon Papadopoulos

University of Bamberg, Bamberg, Germany

Claude Bernard University Lyon 1, Villeurbanne, France

Marcos Baez

University of Glasgow, Glasgow, UK

Ulster University, Jordanstown campus, Newtownabbey, UK

Patrick McAllister

Politecnico di Milano, Milan, Italy

Fabio Catania

University of Goettingen, Göttingen, Germany

Raphael Meyer von Wolff & Sebastian Hobert

University of Edinburgh, Edinburgh, UK


Corresponding author

Correspondence to Asbjørn Følstad .

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Følstad, A., Araujo, T., Law, E.LC. et al. Future directions for chatbot research: an interdisciplinary research agenda. Computing 103 , 2915–2942 (2021). https://doi.org/10.1007/s00607-021-01016-7


Received : 03 September 2020

Accepted : 15 September 2021

Published : 19 October 2021

Issue Date : December 2021

DOI : https://doi.org/10.1007/s00607-021-01016-7

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords

  • Conversational agents
  • Dialogue systems
  • Future research directions

Mathematics Subject Classification

  • 68-02 Research exposition (monographs, survey articles) pertaining to computer science
  • Research article
  • Open access
  • Published: 15 December 2021

Educational chatbots for project-based learning: investigating learning outcomes for a team-based design course

  • Jeya Amantha Kumar   ORCID: orcid.org/0000-0002-6920-0348 1  

International Journal of Educational Technology in Higher Education volume 18, Article number: 65 (2021)


Abstract

Educational chatbots (ECs) are chatbots designed for pedagogical purposes and are viewed as an Internet of Things (IoT) interface that could revolutionize teaching and learning. These chatbots are designed to provide personalized learning through the concept of a virtual assistant that replicates humanized conversation. Nevertheless, in the education paradigm, ECs are still novel, with challenges in facilitating, deploying, designing, and integrating them as effective pedagogical tools across multiple fields, one of which is project-based learning. Therefore, the present study investigates how integrating ECs to facilitate team-based projects in a design course could influence learning outcomes. Based on a mixed-method quasi-experimental approach, ECs were found to improve learning performance and teamwork with a practical impact. Moreover, ECs facilitated collaboration among team members that indirectly influenced their ability to perform as a team. Nevertheless, affective-motivational learning outcomes such as perception of learning, need for cognition, motivation, and creative self-efficacy were not influenced by ECs. Accordingly, this study adds to the current body of knowledge on the design and development of ECs by introducing a new collective design strategy and discussing its pedagogical and practical implications.

Introduction

Chatbots are defined as computer programs that replicate human-like conversations using natural language structures (Garcia Brustenga et al., 2018; Pham et al., 2018) in the form of text messages (websites or mobile applications), voice (Alexa or Siri), or a combination of both (Pereira et al., 2019; Sandoval, 2018). These automated conversational agents (Riel, 2020) have been used extensively to replicate customer service interaction (Holotescu, 2016) in various domains (Khan et al., 2019; Wang et al., 2021), to the extent that they have become a common trend (Wang et al., 2021). The use of chatbots has further expanded owing to their affordances, cost (Chocarro et al., 2021), development options (Sreelakshmi et al., 2019; Wang et al., 2021), and the adoption facilitated by social networks and mobile instant messaging (MIM) applications (apps) (Brandtzaeg & Følstad, 2018; Cunningham-Nelson et al., 2019) such as WhatsApp, Line, Facebook, and Telegram.

Accordingly, chatbots popularized by social media and MIM applications have been widely accepted (Rahman et al., 2018; Smutny & Schreiberova, 2020) and are referred to as mobile-based chatbots. These bots have been found to facilitate collaborative learning (Schmulian & Coetzee, 2019), multimodal communication (Haristiani et al., 2019), scaffolding, real-time feedback (Gonda et al., 2019), personalized learning (Oke & Fernandes, 2020; Verleger & Pembridge, 2019), scalability, and interactivity (Dekker et al., 2020), and to foster knowledge creation and dissemination effectively (Verleger & Pembridge, 2019). Nevertheless, given the possibilities of MIM in conceptualizing an ideal learning environment, we often overlook whether instructors are capable of engaging in high-demand learning activities, especially around the clock (Kumar & Silva, 2020). Chatbots can potentially be a solution to such a barrier (Schmulian & Coetzee, 2019), especially by automatically supporting learning communication and interactions (Eeuwen, 2017; Garcia Brustenga et al., 2018), even for a large number of students.

Nevertheless, Wang et al. (2021) claim that while the application of chatbots in education is novel, it is also marked by scarcity. Smutny and Schreiberova (2020), Wang et al. (2021), and Winkler and Söllner (2018) added that current research on educational chatbots (ECs) has focused on language learning (Vázquez-Cano et al., 2021), economics, medical education, and programming courses. Hence, the role of ECs has not been widely explored outside these contexts (Schmulian & Coetzee, 2019; Smutny & Schreiberova, 2020), as the field is still in its introductory stages (Chen et al., 2020) and is constrained by limited pedagogical examples in the educational context (Stathakarou et al., 2020). Nevertheless, while this absence is inevitable, it also offers potential for exploring innovations in educational technology across disciplines (Wang et al., 2021). Furthermore, according to Tegos et al. (2020), investigation of the integration and application of chatbots is still warranted in real-world educational settings. Therefore, the objective of this study is first to address research gaps based on literature, application, and design and development strategies for ECs. Next, by situating the study within these selected research gaps, the effectiveness of ECs is explored for team-based projects in a design course using a quasi-experimental approach.

Literature review

The term “chatbot” was derived to represent two main attributes: “chat” for its conversational nature and “bot” as short for robot (Chocarro et al., 2021). Chatbots are automated programs designed to execute instructions based on specific inputs (Colace et al., 2018) and provide feedback that replicates a natural conversational style (Ischen et al., 2020). According to Adamopoulou and Moussiades (2020), several main chatbot parameters determine design and development considerations:

knowledge domain—open and closed domains

services—interpersonal, intrapersonal, and inter-agent chatbots

goals—informative, chat-based, or task-based

input processing and response generation—rule-based model, retrieval-based model, and generative model

build—open-source or closed platforms.

These parameters convey that a chatbot can fulfill numerous communication and interaction functionalities depending on needs, platforms, and technologies. Typically, chatbots are an exemplary use of artificial intelligence (AI), which in turn has given rise to various state-of-the-art platforms for developing them, such as Google’s DialogFlow, IBM Watson Conversation, Amazon Lex, Flow XO, and Chatterbot (Adamopoulou & Moussiades, 2020). However, while using AI is impressive, chatbot applications are limited because they primarily rely on the concept of artificial narrow intelligence (ANI) (Holotescu, 2016). Therefore, a chatbot can only perform a single task based on a programmed response, such as examining inputs, providing information, and predicting subsequent moves. While limited, ANI is the only form of AI that humanity has achieved to date (Schmulian & Coetzee, 2019). Conversely, such a limitation also enables a non-technical person to design and develop chatbots without much knowledge of AI, machine learning, or neuro-linguistic programming (Gonda et al., 2019). While this creates an “openness with IT” (Schlagwein et al., 2017) across various disciplines, big-tech giants such as Google, Facebook, and Microsoft also view chatbots as the next popular technology for the IoT era (Følstad & Brandtzaeg, 2017). Hence, if chatbots gain uptake, they will change how people obtain information, communicate (Følstad et al., 2019), and learn and gather information (Wang et al., 2021); hence the introduction of chatbots for education.

Chatbots in education

Chatbots deployed through MIM applications are simplistic bots known as messenger bots (Schmulian & Coetzee, 2019). These platforms, such as Facebook, WhatsApp, and Telegram, have largely introduced chatbots to facilitate automatic around-the-clock interaction and communication, primarily focusing on the service industries. Even though MIM applications were not intended for pedagogical use, their affordances and undemanding role in facilitating communication have established them as a learning platform (Kumar et al., 2020; Pereira et al., 2019). Hence, as teaching is an act of imparting knowledge through effective communication, the ubiquitous format of a mobile-based chatbot could also potentially enhance the learning experience (Vázquez-Cano et al., 2021); thus, chatbots strategized for educational purposes are described as educational chatbots.

Bii (2013) defined educational chatbots as chatbots conceived for explicit learning objectives, whereas Riel (2020) defined them as programs that aid in achieving educational and pedagogical goals within the parameters of a traditional chatbot. Empirical studies have positioned ECs as a personalized teaching assistant or learning partner (Chen et al., 2020; Garcia Brustenga et al., 2018) that provides scaffolding (tutor support) through practice activities (Garcia Brustenga et al., 2018). They also support personalized learning, multimodal content (Schmulian & Coetzee, 2019), and instant interaction without time limits (Chocarro et al., 2021). All the same, numerous benefits have been reported, reflecting positive experiences (Ismail & Ade-Ibijola, 2019; Schmulian & Coetzee, 2019) that improved learning confidence (Chen et al., 2020), motivation, self-efficacy, learner control (Winkler & Söllner, 2018), engagement (Sreelakshmi et al., 2019), knowledge retention (Cunningham-Nelson et al., 2019), and access to information (Stathakarou et al., 2020). Furthermore, ECs were found to provide value and learning choices (Yin et al., 2021), which in turn is beneficial for customizing learning preferences (Tamayo et al., 2020).

Besides, as ECs promote anytime-anywhere learning strategies (Chen et al., 2020; Ondas et al., 2019), they are individually scalable (Chocarro et al., 2021; Stathakarou et al., 2020) and can support learning management (Colace et al., 2018) and the delivery of context-sensitive information (Yin et al., 2021). They thereby encourage participation (Tamayo et al., 2020; Verleger & Pembridge, 2019) and the disclosure (Brandtzaeg & Følstad, 2018; Ischen et al., 2020; Wang et al., 2021) of personal aspects that were not possible in a traditional classroom or face-to-face interaction. Conversely, this may provide an opportunity to promote mental health (Dekker et al., 2020), as an EC can be perceived as a ‘safe’ environment in which to make mistakes and learn (Winkler & Söllner, 2018). Furthermore, ECs can be operated to answer FAQs automatically, manage online assessments (Colace et al., 2018; Sandoval, 2018), and support peer-to-peer assessment (Pereira et al., 2019).

Moreover, according to Cunningham-Nelson et al. (2019), one of the key benefits of ECs is that they can support a large number of users simultaneously, which is undeniably an added advantage as it reduces instructors' workload. Colace et al. (2018) describe ECs as instrumental when dealing with multiple students, especially for testing behavior, keeping track of progress, and assigning tasks. Furthermore, ECs were also found to increase autonomous learning skills and tend to reduce the need for face-to-face interaction between instructors and students (Kumar & Silva, 2020; Yin et al., 2021), which became an added advantage for online learning during the onset of the pandemic. Likewise, ECs can also be used purely for administrative purposes, such as delivering notices, reminders, notifications, and data management support (Chocarro et al., 2021). Moreover, they can be a platform for providing standard information such as rubrics, learning resources, and contents (Cunningham-Nelson et al., 2019). According to Meyer von Wolff et al. (2020), chatbots are a suitable instructional tool for higher education, and students are accepting of their application.

Garcia Brustenga et al. (2018), in turn, categorized ECs based on eight tasks in the educational context, as described in Table 1. Correspondingly, these tasks reflect that ECs may be potentially beneficial in fulfilling the three learning domains by providing a platform for information retrieval, emotional and motivational support, and skills development.

Albeit, from the instructor’s perspective, ECs could be intricate and demanding, especially when instructors do not know how to code (Schmulian & Coetzee, 2019); automating some of these interactions could help educators focus on other pedagogical needs (Gonda et al., 2019). Nevertheless, enhancing such skills is often time-consuming, and teachers are usually not mentally prepared to take up a designer's (Kim, 2021) or programmer's role. The solution may lie in developing code-free chatbots (Luo & Gonda, 2019), especially via MIM (Smutny & Schreiberova, 2020).

Accordingly, for EC development, it is imperative to ensure there are design principles or models that can be adapted for pedagogical needs. Numerous models have been applied in the educational context, such as CommonKADS (Cameron et al., 2018), Goal-Oriented Requirements Engineering (GORE) (Arruda et al., 2019), and retrieval-based and QANet models (Wu et al., 2020). Nevertheless, these models reflect a coding approach that does not emphasize strategies or principles focused on achieving learning goals. While Garcia Brustenga et al. (2018), Gonda et al. (2019), Kerly et al. (2007), Satow (2017), Smutny and Schreiberova (2020), and Stathakarou et al. (2020) have highlighted some design guidelines for ECs, a more concise model was needed. Therefore, based on the suggestions of these empirical studies, the researcher identified three main design attributes: reliability, pedagogy, and experience (Table 2).

Nevertheless, it was observed that the communicative aspect was absent. Undeniably, chatbots are communication tools that stimulate interpersonal communication (Ischen et al., 2020; Wang et al., 2021); therefore, integrating interpersonal communication was deemed essential. Interpersonal communication is defined as communication between two individuals who have established a relationship (Devito, 2018), and such a relationship is also significant through MIM in representing the communication between peers and instructors (Chan et al., 2020). Furthermore, according to Han and Xu (2020), interpersonal communication moderates the relationships and perceptions that influence the use of an online learning environment. According to Hobert and Berens (2020), while chatbot interaction could facilitate small talk that could influence learning, such capabilities should not be overemphasized. Therefore, it was concluded that four fundamental attributes or strategies were critical for EC design: Reliability, interpersonal communication, Pedagogy, and Experience (RiPE), which are explained in Table 3.

Nevertheless, ECs are not without flaws (Fryer et al., 2019). According to Kumar and Silva (2020), acceptance, facilities, and skills are still a significant challenge for students and instructors. Similarly, designing and adapting chatbots into existing learning systems is often taxing (Luo & Gonda, 2019), as instructors sometimes have limited competencies and strategic options for fulfilling EC pedagogical needs (Sandoval, 2018). Moreover, the complexity of capturing all scenarios of how a user might engage with a chatbot also creates frustrations in interaction, as expectations may not always be met for both parties (Brandtzaeg & Følstad, 2018). Hence, while ECs as conversational agents may have been projected to substitute learning platforms in the future (Følstad & Brandtzaeg, 2017), much is still to be explored from the stakeholders' viewpoint in facilitating such interventions.

Research gaps in EC research

Three categories of research gaps were identified from empirical findings: (i) learning outcomes, (ii) design issues, and (iii) assessment and testing issues. Firstly, research gaps concerning learning outcomes include measuring effectiveness (Schmulian & Coetzee, 2019), perception, social influence (Chaves & Gerosa, 2021), personality traits, affective outcomes (Ciechanowski et al., 2019; Winkler & Söllner, 2018), acceptance (Chen et al., 2020; Chocarro et al., 2021), satisfaction (Stathakarou et al., 2020), interest (Fryer et al., 2019), motivation, learning performance (Yin et al., 2021), mental health (Brandtzaeg & Følstad, 2018), engagement (Riel, 2020), and cognitive effort (Nguyen & Sidorova, 2018). EC studies have primarily focused on language learning, programming, and health courses, implying that EC application and the investigation of learning outcomes have not been examined across various educational domains and levels of education.

Next, as for design and implementation issues, a need has been implied to consider strategies that align EC applications with teaching and learning (Haristiani et al., 2019; Sjöström et al., 2018), mainly to supplement activities that can be used to replace face-to-face interactions (Schmulian & Coetzee, 2019). According to Schmulian and Coetzee (2019), mobile-based chatbot applications in the educational domain are still scarce, and while ECs in MIM have been gaining momentum, this has not instigated studies addressing their implementation. Furthermore, there are also limited studies on strategies that can be used to improve the EC's role as an engaging pedagogical communication agent (Chaves & Gerosa, 2021). Besides, it was stipulated that students' expectations and the current reality of simplistic bots may not be aligned, as Miller (2016) claims that ANI’s limitations have constrained chatbots to simplistic menu-prompt interaction.

Lastly, in regard to assessment issues, measurement strategies for both intrinsic and extrinsic learning outcomes (Sjöström et al., 2018) that apply experimental approaches to evaluate user experience (Fryer et al., 2019; Ren et al., 2019) and psychophysiological reactions (Ciechanowski et al., 2019) have been lacking. Nevertheless, Hobert (2019) claims that the main issue with EC assessment is the narrow view used to evaluate outcomes based on specific fields rather than a multidisciplinary approach. Moreover, evaluating the effectiveness of ECs is a complex process (Winkler & Söllner, 2018), as it is unclear which characteristics are important in designing a specific chatbot (Chaves & Gerosa, 2021) and how stakeholders will adapt to its application to support teaching and learning (Garcia Brustenga et al., 2018). Furthermore, there is a need to understand how users experience chatbots (Brandtzaeg & Følstad, 2018), especially when they are not familiar with such interventions (Smutny & Schreiberova, 2020). Conversely, due to the novelty of ECs, the author has not found any studies pertaining to ECs in design education, project-based learning, or studies focusing on teamwork outcomes.

Purpose of the study

This study aims to investigate the effects of ECs in an instructional design course that applies team-based projects, focusing on learning outcomes, namely learning performance, perception of learning, need for cognition, motivation, creative self-efficacy, and teamwork. Learning performance is defined as the students' combined scores accumulated from the project-based learning activities in this study. Next, perception of the learning process is described as the perceived benefits obtained from the course (Wei & Chou, 2020), and the need for cognition as an individual’s tendency to participate in and take pleasure in cognitive activities (de Holanda Coelho et al., 2020). The need for cognition also indicates positive acceptance towards problem-solving (Cacioppo et al., 1996) and enjoyment (Park et al., 2008), and it is critical for teamwork, as it fosters team performance and information-processing motivation (Kearney et al., 2009). Hence, we speculated that the EC might influence the need for cognition as it aids in simplifying learning tasks (Ciechanowski et al., 2019), especially for teamwork.

Subsequently, motivational beliefs are reflected by the perceived self-efficacy and intrinsic value students have towards their cognitive engagement and academic performance (Pintrich & de Groot, 1990). According to Pintrich et al. (1993), self-efficacy and intrinsic value strongly correlate with task value (Eccles & Wigfield, 2002), such as interest, enjoyment, and usefulness. Furthermore, Walker and Greene (2009) explain that motivational factors that facilitate learning are not always solely reliant on self-efficacy, and Pintrich and de Groot (1990) claim that a combination of self-efficacy and intrinsic value better explains the extent to which students are willing to take on a learning task. The researcher therefore also considered creative self-efficacy, defined as the students' belief in producing creative outcomes (Brockhus et al., 2014). Prior research has not mentioned creativity as a learning outcome in EC studies. However, according to Pan et al. (2020), there is a positive relationship between creativity and the need for cognition, as it also reflects individual innovation behavior. Likewise, creative self-efficacy was deemed relevant given the nature of the project, which involves design. Lastly, teamwork perception was defined as students' perception of how well they performed as a team to achieve their learning goals. According to Hadjielias et al. (2021), the cognitive state of teams involved in digital innovations is usually affected by the tasks involved within the innovation stages. Hence, the consideration of these variables is warranted.

Therefore, it was hypothesized that using ECs could improve learning outcomes, and a quasi-experimental design comparing the EC and traditional (CT) groups was employed, as suggested by Wang et al. (2021), to answer the following research questions.

Does the EC group perform better than students who learn in a traditional classroom setting?

Do students who learn with EC have a better perception of learning, need for cognition, motivational belief, and creative self-efficacy than students in a traditional classroom setting?

Does EC improve teamwork perception in comparison to students in a traditional classroom setting?

Educational chatbot design, development, and deployment

According to Adamopoulou and Moussiades (2020), it is impossible to categorize chatbots exhaustively due to their diversity; nevertheless, specific attributes can be predetermined to guide design and development goals. For example, in this study, the rule-based approach using the if-else technique (Khan et al., 2019) was applied to design the EC. A rule-based chatbot only responds to the rules and keywords programmed into it (Sandoval, 2018); therefore, designing an EC requires anticipating what the students may inquire about (Chete & Daudu, 2020). Furthermore, a designer should also consider a chatbot's capabilities for natural language conversation and how it can aid instructors, especially in repetitive and low cognitive level tasks such as answering FAQs (Garcia Brustenga et al., 2018). As mentioned previously, the goal can be purely administrative (Chocarro et al., 2021) or pedagogical (Sandoval, 2018).
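To make the if-else technique concrete, the following minimal Python sketch illustrates how keyword rules could map student inquiries to pre-programmed replies; the rules and messages are hypothetical placeholders, not the actual QMT212 content.

# Minimal sketch of a rule-based (if-else) chatbot: each rule maps
# keywords in the student's message to a pre-programmed reply.
# The rules below are illustrative only, not the actual course content.
RULES = [
    ({"deadline", "due"}, "The project report is due in week 10."),
    ({"rubric", "marks", "grading"}, "The grading rubric is posted on the LMS."),
    ({"group", "register"}, "Please send your group name and five member names."),
]

FALLBACK = "Sorry, I did not understand. Please rephrase or contact your instructor."

def reply(message: str) -> str:
    words = set(message.lower().split())
    for keywords, answer in RULES:
        if words & keywords:          # any keyword match triggers the rule
            return answer
    return FALLBACK                   # unanticipated inquiries fall through

if __name__ == "__main__":
    print(reply("When is the report deadline?"))

Anything outside the anticipated keywords falls through to the fallback reply, which is why the design stage must anticipate likely student inquiries.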

Next, for the design and development of the EC, Textit (https://textit.com/), an interactive chatbot development platform, was utilized. Textit is third-party software developed by Nyaruka and UNICEF that offers chatbot-building capabilities without coding, using the concept of flows, and supports deployment through various platforms such as Facebook Messenger, Twitter, Telegram, and SMS. For the design of this EC, Telegram was used due to its data encryption security (de Oliveira et al., 2016), cloud storage, and the privacy the students and instructor would have without using their personal social media platforms. Telegram has previously been used in this context for retrieving learning contents (Rahayu et al., 2018; Thirumalai et al., 2019), information and progress (Heryandi, 2020; Setiaji & Paputungan, 2018), learning assessment (Pereira, 2016), project-based learning, teamwork (Conde et al., 2021), and peer-to-peer (P2P) assessment (Pereira et al., 2019).
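Although the authors assembled their flows in Textit and deployed them through Telegram without coding, the underlying delivery can be pictured as a simple long-polling loop against the Telegram Bot API. The sketch below is a generic illustration under that assumption (the token is a placeholder), not the study's actual deployment pipeline.

# Generic sketch of delivering chatbot replies over the Telegram Bot API
# using long polling; Textit hides this plumbing behind its flow editor.
import time
import requests

TOKEN = "123456:ABC-DEF"                      # placeholder bot token
API = f"https://api.telegram.org/bot{TOKEN}"

def poll_forever(handle):
    offset = None
    while True:
        updates = requests.get(f"{API}/getUpdates",
                               params={"offset": offset, "timeout": 30}).json()
        for update in updates.get("result", []):
            offset = update["update_id"] + 1
            message = update.get("message", {})
            chat_id = message.get("chat", {}).get("id")
            text = message.get("text", "")
            if chat_id is not None:
                requests.post(f"{API}/sendMessage",
                              json={"chat_id": chat_id, "text": handle(text)})
        time.sleep(1)

# poll_forever(reply)  # could reuse the rule-based reply() sketched earlier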

Subsequently, the chatbot, named after the course code (QMT212), was designed as a teaching assistant for an instructional design course. It was targeted to be used as a task-oriented (Yin et al., 2021), content-curating, and long-term (10 weeks) EC (Følstad et al., 2019). Students worked in groups of five during the ten weeks, and the ECs' interactions were diversified to aid teamwork activities used to register group members, share information, monitor progress, and provide peer-to-peer feedback. According to Garcia Brustenga et al. (2018), an EC can be designed without educational intentionality, where it is used purely for administrative purposes to guide and support learning. Hence, 10 ECs (Table 4) were deployed throughout the semester, where EC1-EC4 were used for administrative purposes as suggested by Chocarro et al. (2021), EC5-EC6 for assignments (Sjöström et al., 2018), EC7 for user feedback (Kerly et al., 2007) and acceptance (Yin et al., 2021), EC8 for monitoring teamwork progress (Colace et al., 2018), EC9 as a project guide FAQ (Sandoval, 2018), and lastly EC10 for peer-to-peer assessment (Colace et al., 2018; Pereira et al., 2019). The ECs were also developed based on micro-learning strategies to ensure that the students did not spend long hours with the EC, which may cause cognitive fatigue (Yin et al., 2021). Furthermore, the goal of each EC was to facilitate group work collaboration around a project-based activity in which the students were required to design and develop an e-learning tool, write a report, and present their outcomes. Next, based on the new design principles synthesized by the researcher, RiPE was contextualized as described in Table 5.

Example flow diagrams from Textit for the design and development of the chatbot are represented in Fig. 1. The number of choices and possible outputs determines the complexity of the chatbot: some chatbots require only a simple interaction, such as registering groups (Fig. 2), while others require a much more complex interaction, such as peer-to-peer assessment (Fig. 3). Example screenshots from Telegram are depicted in Fig. 4, and a sketch of how such a flow can be represented programmatically follows the figure captions below.

Figure 1. Textit flow diagrams

Figure 2. Textit flow diagram for group registration

Figure 3. Textit flow diagram for peer-to-peer evaluation

Figure 4. Telegram screenshots of the EC
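Conceptually, a flow such as the group-registration one in Fig. 2 behaves like a small state machine: each node holds a prompt and a pointer to the next node, and the bot walks the student through it. The following is a hypothetical sketch of that idea in Python, not an export of the actual Textit flow.

# Hypothetical sketch of a group-registration flow as a state machine,
# mirroring the kind of branching logic drawn in the Textit flow editor.
FLOW = {
    "start":   {"prompt": "Hi! What is your group name?",         "next": "members"},
    "members": {"prompt": "Please list your five group members.", "next": "confirm"},
    "confirm": {"prompt": "Thanks, your group is registered.",    "next": None},
}

def run_flow(answers):
    """Walk through the flow, collecting one answer per non-terminal state."""
    state, collected = "start", {}
    while state is not None:
        node = FLOW[state]
        print(node["prompt"])
        if node["next"] is not None:      # the terminal node needs no answer
            collected[state] = answers.pop(0)
        state = node["next"]
    return collected

# Example: run_flow(["Team Alpha", "Ana, Ben, Chen, Dina, Eli"])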

Methodology

Participants

The participants of this study were second-year Bachelor of Education (Teaching English to Speakers of Other Languages, TESOL) students minoring in multimedia and enrolled at a higher learning institution in Malaysia. The 60 students were grouped into two classes (30 students per class), either the traditional learning class (control group, CT) or the chatbot learning class (treatment group, EC). Of the 60 participants, only 11 were male and 49 were female; such a distribution is typical for this learning program. Both groups were exposed to the same learning contents, class duration, and instructor; the only differences were the class schedules and the fact that only the treatment group was exposed to the EC as an aid for teaching and learning the course. Both groups provided written consent to participate in the study and were given an honorarium for participation. However, additional consent was obtained from the EC group in regard to data protection, as the intervention included the use of a social media application; this consent was obtained through EC1: Welcome Bot.

The instructional design course aims to provide fundamental skills in designing effective multimedia instructional materials and covers topics such as need analysis, instructional analysis, learner analysis, context analysis, defining goals and objectives, developing instructional strategy and materials, developing assessment methods, and assessing them by conducting formative and summative assessments. The teaching and learning in both classes are identical, wherein the students are required to design and develop a multimedia-based instructional tool that is deemed their course project. Students independently choose their group mates and work as a group to fulfill their project tasks. Moreover, both classes were also managed through the institution's learning management system to distribute notes, attendance, and submission of assignments.

This study applies an intervention using a quasi-experimental design approach. Creswell (2012) explained that education-based research in most cases requires intact groups, and thus creating artificial groups may disrupt classroom learning. Therefore, a one-group pretest–posttest design was applied for both groups in measuring learning outcomes, except for learning performance and perception of learning, which used only a post-test design. The total intervention time was ten weeks, as represented in Fig. 5. Each EC was usually deployed for the treatment class one day before class, except for EC6 and EC10, which were deployed during class. Such a strategy was used to ensure that the instructor could guide the students the next day if there were any issues.

Figure 5. Study procedure

This study integrates five instruments, which measure perception of learning (Silva et al., 2017), perceived motivational belief using the Motivated Strategies for Learning Questionnaire (MSLQ) (Pintrich & de Groot, 1990) and a modified MSLQ (Silva et al., 2017), need for cognition using the Need for Cognition Scale–6 (NCS-6) (de Holanda Coelho et al., 2020), creative self-efficacy using the creative self-efficacy questionnaire (QCSE) (Brockhus et al., 2014), and teamwork using a modified version of the Team Assessment Survey Questions (Linse, 2007). The teamwork survey had open-ended questions, which are:

Give one specific example of something you learned from the team that you probably would not have learned on your own.

Give one specific example of something other team members learned from you that they probably would not have learned without you.

What problems have you had interacting as a team so far?

Suggest one specific, practical change the team could make that would help improve everyone’s learning.

The instruments were rated on a Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree) and administered using Google Forms for both groups. Meanwhile, learning performance was assessed based on the assessment of the project, which includes the report, product, presentation, and peer-to-peer assessment.

A series of one-way analyses of covariance (ANCOVA) was employed to evaluate the difference between the EC and CT groups for the need for cognition, motivational belief for learning, creative self-efficacy, and team assessment. As for learning performance and perception of learning, a t-test was used to identify the difference between the groups. The effect size was evaluated according to Hattie (2015), where an average effect size (Cohen's d) of 0.42 for an intervention using technologies with college students is considered to improve achievement (Hattie, 2017). Furthermore, as the teamwork survey has open-ended questions, the difference between the groups was evaluated qualitatively using text analysis performed with the Voyant tool at https://voyant-tools.org/ (Sinclair & Rockwell, 2021). Voyant Tools is an open-source online tool for text analysis and visualization (Hetenyi et al., 2019), and in this study, the collocates graphs were used to represent keywords and terms that occur in close proximity, forming a directed network graph.
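As a concrete illustration of this analysis pipeline, the sketch below computes an independent-samples t-test and Cohen's d for two groups of scores. The numbers are placeholders rather than the study data, and the ANCOVA for the pre/post measures would additionally include the pre-test score as a covariate.

# Illustrative analysis sketch: independent-samples t-test plus Cohen's d,
# following the study's criterion of d > 0.42 for a meaningful effect.
# The scores below are placeholders, not the actual study data.
import numpy as np
from scipy import stats

ec = np.array([44, 41, 43, 45, 40, 42])   # hypothetical EC-group scores
ct = np.array([40, 38, 41, 39, 42, 37])   # hypothetical CT-group scores

t, p = stats.ttest_ind(ec, ct)

# Cohen's d using the pooled standard deviation
n1, n2 = len(ec), len(ct)
pooled_sd = np.sqrt(((n1 - 1) * ec.var(ddof=1) + (n2 - 1) * ct.var(ddof=1)) / (n1 + n2 - 2))
d = (ec.mean() - ct.mean()) / pooled_sd

print(f"t = {t:.3f}, p = {p:.3f}, d = {d:.3f}")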

Learning performance for the course

The EC group (µ = 42.500, SD = 2.675) compared with the CT group (µ = 39.933, SD = 2.572) demonstrated a significant difference at t(58) = 3.788, p < 0.001, d = 0.978, indicating a difference in learning achievement where the EC group outperformed the control group. The Cohen's d value, as described by Hattie (2017), indicated that learning performance was improved by the intervention.

Need for cognition

The initial Levene's test and normality checks indicated that the homogeneity of variance assumption was met at F(1,58) = 0.077, p = 0.782. The adjusted means of µ = 3.416 for the EC group and µ = 3.422 for the CT group indicated that the post-test scores were not significantly different at F(1, 57) = 0.002, p = 0.969, ηp² = 0.000, d = 0.012, indicating that students' perception of enjoyment and tendency to engage in the course was similar for both groups.

Motivational beliefs

The initial Levene's test and normality checks indicated that the homogeneity of variance assumption was met at F(1,58) = 0.062, p = 0.804. The adjusted means of µ = 4.228 for the EC group and µ = 4.200 for the CT group indicated that the post-test scores were not significantly different at F(1, 57) = 0.046, p = 0.832, ηp² = 0.001, d = 0.056, indicating that students' motivation to engage in the course was similar for both groups.

Creative self-efficacy

The initial Levene's test and normality checks indicated that the homogeneity of variance assumption was met at F(1,58) = 0.808, p = 0.372. The adjusted means of µ = 3.566 for the EC group and µ = 3.627 for the CT group indicated that the post-test scores were not significantly different at F(1, 57) = 0.256, p = 0.615, ηp² = 0.004, d = 0.133, indicating that students' perception of creative self-efficacy was similar for both groups.

Perception of learning

The EC group (µ = 4.370, SD = 0.540) compared with the CT group (µ = 4.244, SD = 0.479) demonstrated no significant difference at t(58) = 0.956, p = 0.343, d = 0.247, indicating no quantitative difference in how students perceived their learning process. Nevertheless, we also asked what impacted their learning (project design and development) the most during the course, and the findings, as shown in Table 6, indicated that both groups (EC = 50.00% and CT = 86.67%) found the group learning activity to have the most impact. The control group was more partial towards the group activities than the EC group, which also credited online feedback and guidance (30.00%) and interaction with the lecturer as influences. It was also indicated in both groups that constructive feedback was mostly obtained from fellow course mates (EC = 56.67%, CT = 50.00%) and the instructor (EC = 36.67%, CT = 43.33%) (Table 7), while minimal effort was made to obtain feedback from outside the learning environment.

Team assessment

The initial Levene's test and normality checks indicated that the homogeneity of variance assumption was met at F(1,58) = 3.088, p = 0.051. The adjusted means of µ = 4.518 for the experimental group and µ = 4.049 for the CT group indicated that the post-test scores were significantly different at F(1, 57) = 5.950, p = 0.018, ηp² = 0.095, d = 0.641, indicating a significant difference between groups in how they performed in teams. The Cohen's d value, as described by Hattie (2017), indicated that the intervention improved teamwork.

Next, we questioned their perception of teamwork based on what they learned from their teammates, what they felt others learned from them, the problems faced as a team, and recommendations to improve their experience in the course. Based on the feedback, themes such as teamwork, technology, learning management, emotional management, creativity, and none were identified to categorize the feedback. The descriptive data are represented in Table 8 for both groups, and the trends reflecting the changes in feedback are described as follows:

Respondent learned from teammates

This question asked respondents to provide feedback on one aspect they had learned from their team that they probably would not have learned independently. Based on Fig. 6, the illustration describes changes in each group (EC and CT) pre- and post-intervention. First, teamwork showed an increasing trend for EC, whereas CT showed only slight changes pre- and post-intervention.

Next, using text analysis collocates graphs (Fig. 7) for the EC group post-intervention, a change was observed indicating that teamwork perception evolved from merely learning new ideas, communicating, and accepting opinions towards a need to cooperate as a team to ensure the goal of developing the project was achieved. It was observed that communicating was no longer the main priority, as cooperation towards problem-solving became of utmost importance. Example feedback includes “I learned teamwork and how to solve complicated problems” and “The project was completed in a shorter period of time, compared to if I had done it by myself.” Next, in both groups, creativity seems to have declined from being an essential aspect in the project's initial phase towards the end of the semester, whereas more importance was given to emotional management when handling matters of the project. Example feedback includes “I learn to push myself more and commit to the project's success.” Nevertheless, in both groups, the trends are almost similar.
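For readers who wish to approximate this kind of collocation analysis outside Voyant, the short sketch below counts word pairs that co-occur within a small window; the feedback strings are invented placeholders, not the study's responses.

# Simple collocation sketch: count word pairs that co-occur within a window,
# similar in spirit to Voyant's collocates graphs. Placeholder feedback only.
from collections import Counter

feedback = [
    "I learned teamwork and how to solve complicated problems",
    "teamwork helped us finish the project faster",
    "we can barely meet as a group",
]

pairs = Counter()
for text in feedback:
    words = text.lower().split()
    for i, word in enumerate(words):
        for other in words[i + 1:i + 4]:   # co-occurrence window of three words
            pairs[(word, other)] += 1

print(pairs.most_common(5))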

Figure 6. Change in perception pre- and post-intervention based on aspects learned from teammates

Figure 7. Change in perception for the EC group based on aspects learned from teammates

Teammates learned from the respondent

This question reflects on an aspect the respondents believe their team members have learned from them. Initially, both groups reported being unaware of their contribution by stating “nothing” or “I don’t know,” which was classified as “other” (Fig. 8). Nevertheless, intriguingly, both groups showed a decline in such negative perceptions post-intervention, which can be attributed to self-realization of their contribution to task completion. Furthermore, different trends were observed between the groups for teamwork, where the EC group showed more references to increased teamwork contribution, whereas the CT group remained unaffected post-intervention. In terms of technology application, respondents in both groups described how they were a valuable resource for teaching their peers about technology, where one respondent stated that “My friends learn how to make an application from me.”

Figure 8. Change in perception pre- and post-intervention based on aspects teammates learned from respondents

Problem respondent faced as a team

Based on the analysis, it was found that the main issues faced in both groups were related to teamwork (Fig. 9). The CT group reported more teamwork issues than the EC group, and in both groups, these issues escalated during the learning process.

Figure 9. Graphical representation of issues faced as a team

Based on analyzing the text, the EC group initially found issues related to identifying an appropriate time to have group discussions, as some teammates were either absent or unavailable (Fig. 10); one respondent stated that “We can barely meet as a group.” Post-intervention, the group reported similar issues, highlighting a lack of communication and availability due to insufficient time and being busy with their learning schedule. An example response is “We do not have enough time to meet up, and most of us have other work to do.” As for the CT group pre-intervention, similar issues were observed as for the EC group, but communication issues were more prevalent, as respondents mentioned differences in opinions or a void in feedback that affected how they solved problems collectively (Fig. 11). Example feedback is “One of the members rarely responds in the group discussion.” Post-intervention, the CT group claimed that the main issues, besides communication, were non-contributing members and bias in task distribution. Examples are “Some of my teammates were not really contributing” and “The task was not distributed fairly.”

Figure 10. Change in perception for the EC group based on issues faced as a team

Figure 11. Change in perception for the CT group based on issues faced as a team

Recommendations to improve teamwork

Two interesting trends were observed from Fig. 12: (a) the EC group reflected more need for teamwork, whereas the CT group showed otherwise, and (b) the CT group emphasized learning management for teamwork, whereas the EC group showed otherwise. When assessing the changes in the EC group (Fig. 13), transformations were observed between pre- and post-intervention, where students opined the need for more active collaboration in providing ideas and acceptance. One respondent from the treatment group reflected that acceptance is vital for successful collaboration, stating that “Teamwork and acceptance in a group are important.” Next, for the CT group (Fig. 14), the complexity of defining teamwork pre-intervention, such as communicating, confidence, and contribution of ideas, was transformed to reflect more need for commitment, with one respondent stating, “Make sure everyone is committed and available to contribute accordingly.”

Figure 12. Graphical representation of recommendations pre- and post-intervention for both groups

Figure 13. Changes in perception for the EC group based on recommendations for learning improvement as a team

Figure 14. Changes in perception for the CT group based on recommendations for learning improvement as a team

Discussion

According to Winkler and Söllner (2018), ECs have the potential to improve learning outcomes due to their ability to personalize the learning experience. This study aimed to evaluate the difference in learning outcomes based on the impact of an EC on a project-based learning activity. The outcomes were compared quantitatively and qualitatively to explore how the introduction of the EC would influence learning performance, need for cognition, motivational belief, creative self-efficacy, perception of learning, and teamwork. Based on the findings, the EC influenced learning performance (d = 0.978) and teamwork (d = 0.641), and as the Cohen's d values were above 0.42, a practically significant impact on these outcomes was deduced. However, other outcomes such as the need for cognition, motivational belief, creative self-efficacy, and perception of learning did not reflect significant differences between the groups.

Firstly, Kearney et al. (2009) explained that in homogeneous teams (as investigated in this study), the need for cognition might have a limited amount of influence, as both groups are required to be innovative simultaneously in providing project solutions. Lapina (2020) added that problem-based learning and solving complex problems could improve the need for cognition. Hence, when both classes had the same team-based project task, the homogeneous nature of the sampling may have contributed to the similarities in the outcome that overshadowed the effect of the ECs. The same applies to motivational belief, which is the central aspect needed to encourage strategic learning behavior (Yen, 2018); its positive relation with cognitive engagement, performance, and the use of metacognitive strategies (Pintrich & de Groot, 1990) is accredited to the need to regulate and monitor learning (Yilmaz & Baydas, 2017), especially for project-based learning activities (Sart, 2014). Therefore, due to the same learning task, these attributes were apparent in both groups, as both were able to complete their task (cognitive engagement), and to do so, they were required to plan their task, schedule teamwork activities (metacognition), and design and develop their product systematically.

Moreover, individual personality traits such as motivation have also been found to influence creativity (van Knippenberg & Hirst, 2020), which indirectly influences the need for cognition (Pan et al., 2020). Nevertheless, these nonsignificant findings may offer an interesting contribution, as they imply that project-based learning tends to improve these personality-based learning outcomes. At the same time, the introduction of ECs did not create cognitive barriers that would have affected the cognitive, motivational, and creative processes involved in project-based learning. Furthermore, as there is a triangulated relationship between these outcomes, the author speculates that these outcomes were justified, especially given the small sample size used, as Rosenstein (2019) explained.

However, when the EC is reflected as a human-like conversational agent (Ischen et al., 2020) used as a digital assistant in managing and monitoring students (Brindha et al., 2019), the question arises of how we measure such implications and confirm its capabilities in transforming learning. As a digital assistant, the EC was designed to aid in managing the team-based project, communicating with students to inquire about challenges and providing support and guidance in completing their tasks. According to Cunningham-Nelson et al. (2019), such a role improves academic performance as students prioritize such needs. Conversely, for teamwork, technology-mediated communication, such as in ECs, has been found to encourage interaction in team projects (Colace et al., 2018), as students perceive the ECs as helping them to learn more, even when they have communication issues (Fryer et al., 2019). This supports the outcome of this study, which observed that the EC group's learning performance and teamwork outcomes had larger effect sizes than the CT group's.

As for the qualitative findings, firstly, even though the perception of learning did not show much variation statistically, the EC group gave additional weight to group activities, online feedback, and interaction with the lecturer as impactful. Interestingly, the percentage of students who cited “interaction with lecturer” and “online feedback and guidance” was higher in the EC group than in the control group, and this may reflect a tendency to perceive the chatbot as an embodiment of the lecturer. Furthermore, as for constructive feedback, the outcomes for both groups were very similar, as the critiques came mainly from the teammates and the instructor, and the ECs were not designed to critique the project task.

Next, it was interesting to observe the differences and the similarities between both groups for teamwork. In the EC group, there were changes in how students identified learning from other individual team members towards a collective perspective of learning from the team. Similarly, there was also more emphasis on how they contributed as a team, especially in providing technical support. As for the CT group, not much difference was observed pre- and post-intervention for teamwork; however, post-intervention, both groups reflected a reduced need for creativity and emphasized the importance of managing their learning task cognitively and emotionally as a team. Concurrently, it was evident that the self-realization of their value as a contributing team member increased in both groups from pre-intervention to post-intervention, and this increase was higher for the CT group.

Furthermore, with regard to the problems faced, it was observed that in the EC group the perception transformed from collaboration issues towards communicative issues, whereas it was the opposite for the CT group. According to Kumar et al. (2021), collaborative learning has a symbiotic relationship with communication skills in project-based learning. This study identified a need for more active collaboration in the EC group and more commitment in the CT group. Overall, it can be observed that the group tasks performed through ECs contributed towards team building and collaboration, whereas for the CT group, the concept of individuality was more apparent. Interestingly, no feedback from the EC group mentioned difficulties in using the EC or complexity in interacting with it. It was presumed that students welcomed such interaction as it provided learning support and they understood its significance.

Furthermore, the feedback also helped explain why other variables such as the need for cognition, perception of learning, creativity, self-efficacy, and motivational belief did not show significant differences. For instance, both groups portrayed high self-realization of their value as a team member at the end of the course, and it was deduced that their motivational belief was influenced by higher self-efficacy and intrinsic value. Next, in both groups, creativity was overshadowed by the post-intervention significance of teamwork. Therefore, we conclude that ECs significantly impact learning performance and teamwork, but affective-motivational improvements may have been overshadowed by the homogeneous learning process in both groups. Furthermore, the main contribution of the ECs can be perceived as creating a “team spirit,” especially in completing administrative tasks, interactions, and providing feedback on team progress, and such interaction was fundamental in influencing their learning performance.

Theoretical and practical implication

This study reports theoretical and practical contributions in the area of educational chatbots. Firstly, given the novelty of chatbots in educational research, this study enriches the current body of knowledge and literature on EC design characteristics and their impact on learning outcomes. Even though the findings did not show positive outcomes regarding the affective-motivational learning outcomes, ECs as tutor support did facilitate teamwork and cognitive outcomes that support project-based learning in design education. In view of that, it is worth noting that the embodiment of ECs as a learning assistant does create openness in interaction and interpersonal relationships among peers, especially if the tasks are designed to facilitate these interactions.

Limitation and future studies

This study focuses on using chatbots as a learning assistant from an educational perspective by comparing the educational implications with a traditional classroom. Therefore, the outcomes of this study reflect only the pedagogical outcomes intended for design education and project-based learning, not interaction behaviors. Even though empirical studies have stipulated the role of chatbots in facilitating learning as a communicative agent, instructional designers should nevertheless consider the underdeveloped role of an intelligent tutoring chatbot (Fryer et al., 2019) and question its limits in an authentic learning environment. As users, the students may have different or higher expectations of the EC, potentially a spillover from their behavior with chatbots in other service industries. Moreover, questions to ponder include the ethical implications of using ECs, especially outside the scheduled learning time, and whether such practices are welcomed, warranted, and accepted by today's learners as a much-needed learning strategy. According to Garcia Brustenga et al. (2018), while ECs can perform some administrative tasks and appear more appealing with multimodal strategies, the author questions how successful such strategies will be as a personalized learning environment without the teacher as the EC’s instructional designer. Therefore, future studies should look into educators' challenges, needs, and competencies and align them to fulfill EC-facilitated learning goals. Furthermore, there is much to be explored in understanding the complex dynamics of human–computer interaction in realizing such a goal, especially educational goals that are currently being influenced by the onset of the Covid-19 pandemic. Conversely, future studies should also look into different learning outcomes, social media use, personality, age, culture, context, and use behavior to understand the use of chatbots for education.

Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Abbreviations

EC: Educational chatbots

CT: Control group

RiPE: Reliability, interpersonal communication, pedagogy, and experience

GORE: Goal-oriented requirements engineering

Adamopoulou, E., & Moussiades, L. (2020). An overview of chatbot technology. In: Maglogiannis, I., Iliadis, L. & Pimenidis, E. (eds) Artificial intelligence applications and innovations. AIAI 2020. IFIP advances in information and communication technology , vol 584 (pp. 373–383). Springer. https://doi.org/10.1007/978-3-030-49186-4_31 .

Arruda, D., Marinho, M., Souza, E. & Wanderley, F. (2019) A Chatbot for goal-oriented requirements modeling. In: Misra S. et al. (eds) Computational science and its applications—ICCSA 2019. ICCSA 2019. Lecture Notes in Computer Science , vol 11622 (pp. 506–519). Springer. https://doi.org/10.1007/978-3-030-24305-0_38 .

Bii, P. (2013). Chatbot technology: A possible means of unlocking student potential to learn how to learn. Educational Research, 4 (2), 218–221.

Brandtzaeg, P. B., & Følstad, A. (2018). Chatbots: User changing needs and motivations. Interactions, 25 (5), 38–43. https://doi.org/10.1145/3236669 .

Brindha, S., Dharan, K. R. D., Samraj, S. J. J., & Nirmal, L. M. (2019). AI based chatbot for education management. International Journal of Research in Engineering, Science and Management, 2 (3), 2–4.

Brockhus, S., van der Kolk, T. E. C., Koeman, B., & Badke-Schaub, P. G. (2014). The influence of ambient green on creative performance. Proceeding of International Design Conference (DESIGN 2014) , Croatia, 437–444.

Cacioppo, J. T., Petty, R. E., Feinstein, J. A., & Jarvis, W. B. G. (1996). Dispositional differences in cognitive motivation: The life and times of individuals varying in need for cognition. Psychological Bulletin, 119 , 197–253.

Cameron, G., Cameron, D. M., Megaw, G., Bond, R. B., Mulvenna, M., O’Neill, S. B., Armour, C., & McTear, M. (2018). Back to the future: Lessons from knowledge engineering methodologies for chatbot design and development. Proceedings of British HCI 2018 , 1–5. https://doi.org/10.14236/ewic/HCI2018.153 .

Chan, T. J., Yong, W. K. Y., & Harmizi, A. (2020). Usage of WhatsApp and interpersonal communication skills among private university students. Journal of Arts & Social Sciences, 3 (January), 15–25.

Chaves, A. P., & Gerosa, M. A. (2021). How should my chatbot interact? A survey on social characteristics in human–chatbot interaction design. International Journal of Human-Computer Interaction, 37 (8), 729–758. https://doi.org/10.1080/10447318.2020.1841438 .

Chen, H. L., Widarso, G. V., & Sutrisno, H. (2020). A chatbot for learning Chinese: Learning achievement and technology acceptance. Journal of Educational Computing Research, 58 (6), 1161–1189. https://doi.org/10.1177/0735633120929622 .

Chete, F. O., & Daudu, G. O. (2020). An approach towards the development of a hybrid chatbot for handling students’ complaints. Journal of Electrical Engineering, Electronics, Control and Computer Science, 6 (22), 29–38.

Chocarro, R., Cortiñas, M., & Marcos-Matás, G. (2021). Teachers’ attitudes towards chatbots in education: A technology acceptance model approach considering the effect of social language, bot proactiveness, and users’ characteristics. Educational Studies, 00 (00), 1–19. https://doi.org/10.1080/03055698.2020.1850426 .

Ciechanowski, L., Przegalinska, A., Magnuski, M., & Gloor, P. (2019). In the shades of the uncanny valley: An experimental study of human–chatbot interaction. Future Generation Computer Systems, 92 , 539–548. https://doi.org/10.1016/j.future.2018.01.055 .

Colace, F., De Santo, M., Lombardi, M., Pascale, F., Pietrosanto, A., & Lemma, S. (2018). Chatbot for e-learning: A case of study. International Journal of Mechanical Engineering and Robotics Research, 7 (5), 528–533. https://doi.org/10.18178/ijmerr.7.5.528-533 .

Conde, M. Á., Rodríguez-Sedano, F. J., Hernández-García, Á., Gutiérrez-Fernández, A., & Guerrero-Higueras, Á. M. (2021). Your teammate just sent you a new message! The effects of using Telegram on individual acquisition of teamwork competence. International Journal of Interactive Multimedia and Artificial Intelligence, 6 (6), 225. https://doi.org/10.9781/ijimai.2021.05.007 .

Cunningham-Nelson, S., Boles, W., Trouton, L., & Margerison, E. (2019). A review of chatbots in education: practical steps forward. In 30th Annual conference for the australasian association for engineering education (AAEE 2019): Educators becoming agents of change: innovate, integrate, motivate. Engineers Australia , 299–306.

Creswell, J. W. (2012). Educational Research : Planning, Conducting and Evaluating Quantitative and Qualitative Research (4th ed.). Pearson Education.

de Holanda Coelho, G. L., Hanel, H. P., & Wolf, J. L. (2020). The very efficient assessment of need for cognition: Developing a six-item version. Assessment, 27 (8), 1870–1885. https://doi.org/10.1177/1073191118793208 .

de Oliveira, J. C., Santos, D. H., & Neto, M. P. (2016). Chatting with Arduino platform through Telegram Bot. 2016 IEEE International Symposium on Consumer Electronics (ISCE) , 131–132. https://doi.org/10.1109/ISCE.2016.7797406 .

Dekker, I., De Jong, E. M., Schippers, M. C., De Bruijn-Smolders, M., Alexiou, A., & Giesbers, B. (2020). Optimizing students’ mental health and academic performance: AI-enhanced life crafting. Frontiers in Psychology, 11 (June), 1–15. https://doi.org/10.3389/fpsyg.2020.01063 .

Devito, J. (2018). The interpersonal communication book (15th ed.). Pearson Education Limited.

Eccles, J. S., & Wigfield, A. (2002). Motivational Beliefs, Values, and Goals. Annual Review of Psychology , 53 (1), 109–132. https://doi.org/10.1146/annurev.psych.53.100901.135153 .

Eeuwen, M. V. (2017). Mobile conversational commerce: messenger chatbots as the next interface between businesses and consumers . Unpublished Master's thesis. University of Twente.

Følstad, A., Skjuve, M., & Brandtzaeg, P. B. (2019). Different chatbots for different purposes: towards a typology of chatbots to understand interaction design. In: Bodrunova S. et al. (eds) Internet Science. INSCI 2018. Lecture Notes in Computer Science , vol 11551 (pp. 145–156). Springer. https://doi.org/10.1007/978-3-030-17705-8_13 .

Følstad, A., & Brandtzaeg, P. B. (2017). Chatbots and the new world of HCI. Interactions, 24 (4), 38–42. https://doi.org/10.1145/3085558 .

Fryer, L. K., Nakao, K., & Thompson, A. (2019). Chatbot learning partners: Connecting learning experiences, interest and competence. Computers in Human Behavior, 93 , 279–289. https://doi.org/10.1016/j.chb.2018.12.023 .

Garcia Brustenga, G., Fuertes-Alpiste, M., & Molas-Castells, N. (2018). Briefing paper: Chatbots in education . eLearn Center, Universitat Oberta de Catalunya. https://doi.org/10.7238/elc.chatbots.2018 .

Gonda, D. E., Luo, J., Wong, Y. L., & Lei, C. U. (2019). Evaluation of developing educational chatbots based on the seven principles for good teaching. Proceedings of the 2018 IEEE international conference on teaching, assessment, and learning for engineering, TALE 2018 , Australia, 446–453. IEEE. https://doi.org/10.1109/TALE.2018.8615175 .

Hadjielias, E., Dada, O., Discua Cruz, A., Zekas, S., Christofi, M., & Sakka, G. (2021). How do digital innovation teams function? Understanding the team cognition-process nexus within the context of digital transformation. Journal of Business Research, 122 , 373–386. https://doi.org/10.1016/j.jbusres.2020.08.045 .

Han, R., & Xu, J. (2020). A comparative study of the role of interpersonal communication, traditional media and social media in pro-environmental behavior: A China-based study. International Journal of Environmental Research and Public Health . https://doi.org/10.3390/ijerph17061883 .

Haristiani, N., Danuwijaya, A. A., Rifai, M. M., & Sarila, H. (2019). Gengobot: A chatbot-based grammar application on mobile instant messaging as language learning medium. Journal of Engineering Science and Technology, 14 (6), 3158–3173.

Hattie, J. (2017). Visible Learningplus 250+ influences on student achievement. In Visible learning plus . www.visiblelearningplus.com/content/250-influences-student-achievement .

Hattie, J. (2015). The applicability of Visible Learning to higher education. Scholarship of Teaching and Learning in Psychology, 1 (1), 79–91. https://doi.org/10.1037/stl0000021 .

Heryandi, A. (2020). Developing chatbot for academic record monitoring in higher education institution. IOP Conference Series: Materials Science and Engineering . https://doi.org/10.1088/1757-899X/879/1/012049 .

Hetenyi, G., Lengyel, A., & Szilasi, M. (2019). Quantitative analysis of qualitative data: Using voyant tools to investigate the sales-marketing interface. Journal of Industrial Engineering and Management, 12 (3), 393–404. https://doi.org/10.3926/jiem.2929 .

Hobert, S. (2019). How are you, chatbot? Evaluating chatbots in educational settings—Results of a literature review. In N. Pinkwart & J. Konert (Eds.), DELFI 2019 (pp. 259–270). Gesellschaft für Informatik, Bonn. https://doi.org/10.18420/delfi2019_289 .

Hobert S. & Berens F. (2020). Small talk conversations and the long-term use of chatbots in educational settings—experiences from a field study. In: Følstad A. et al. (eds) Chatbot research and design. CONVERSATIONS 2019. Lecture Notes in Computer Science, vol 11970 (pp. 260–272). Springer. https://doi.org/10.1007/978-3-030-39540-7_18 .

Holotescu, C. (2016). MOOCBuddy: A chatbot for personalized learning with MOOCs. In: A. Iftene & J. Vanderdonckt (Eds.), Proceedings of the 13th international conference on human-computer interaction RoCHI’2016 , Romania, 91–94.

Ischen C., Araujo T., Voorveld H., van Noort G., Smit E. (2020) Privacy concerns in chatbot interactions. In: Følstad A. et al. (eds) Chatbot research and design. CONVERSATIONS 2019. Lecture Notes in Computer Science , vol 11970 (pp. 34–48). Springer. https://doi.org/10.1007/978-3-030-39540-7_3 .

Ismail, M., & Ade-Ibijola, A. (2019). Lecturer’s Apprentice: A chatbot for assisting novice programmers. Proceedings—2019 International multidisciplinary information technology and engineering conference, IMITEC 2019 . South Africa, 1–8. IEEE. https://doi.org/10.1109/IMITEC45504.2019.9015857 .

Kearney, E., Gebert, D., & Voelpel, S. (2009). When and how diversity benefits teams: The importance of team members’ need for cognition. Academy of Management Journal, 52 (3), 581–598. https://doi.org/10.5465/AMJ.2009.41331431 .

Kerly, A., Hall, P., & Bull, S. (2007). Bringing chatbots into education: Towards natural language negotiation of open learner models. Knowledge-Based Systems, 20 (2), 177–185. https://doi.org/10.1016/j.knosys.2006.11.014 .

Khan, A., Ranka, S., Khakare, C., & Karve, S. (2019). NEEV: An education informational chatbot. International Research Journal of Engineering and Technology, 6 (4), 492–495.

Kim, M. S. (2021). A systematic review of the design work of STEM teachers. Research in Science & Technological Education, 39 (2), 131–155. https://doi.org/10.1080/02635143.2019.1682988 .

Kumar, J. A., & Silva, P. A. (2020). Work-in-progress: A preliminary study on students’ acceptance of chatbots for studio-based learning. Proceedings of the 2020 IEEE Global Engineering Education Conference (EDUCON) , Portugal, 1627–1631. IEEE https://doi.org/10.1109/EDUCON45650.2020.9125183 .

Kumar, J. A., Bervell, B., Annamalai, N., & Osman, S. (2020). Behavioral intention to use mobile learning: Evaluating the role of self-efficacy, subjective norm, and WhatsApp use habit. IEEE Access, 8 , 208058–208074. https://doi.org/10.1109/ACCESS.2020.3037925 .

Kumar, J. A., Silva, P. A., & Prelath, R. (2021). Implementing studio-based learning for design education: A study on the perception and challenges of Malaysian undergraduates. International Journal of Technology and Design Education, 31 (3), 611–631. https://doi.org/10.1007/s10798-020-09566-1 .

Lapina, A. (2020). Does exposure to problem-based learning strategies increase postformal thought and need for cognition in higher education students? A quasi-experimental study (Publication No. 28243240) Doctoral dissertation, Texas State University-San Marcos. ProQuest Dissertations & Theses Global.

Linse, A. R. (2007). Team peer evaluation. In Schreyer Institute for Teaching Excellence . http://www.schreyerinstitute.psu.edu/ .

Luo, C. J., & Gonda, D. E. (2019). Code Free Bot: An easy way to jumpstart your chatbot! Proceeding of the 2019 IEEE International Conference on Engineering, Technology and Education (TALE 2019) , Australia, 1–3, IEEE. https://doi.org/10.1109/TALE48000.2019.9226016 .

Meyer von Wolff, R., Nörtemann, J., Hobert, S., Schumann, M. (2020) Chatbots for the information acquisition at universities—A student’s view on the application area. In: Følstad A. et al. (eds) Chatbot research and design. CONVERSATIONS 2019. Lecture Notes in Computer Science , vol 11970 (pp. 231–244). Springer. https://doi.org/10.1007/978-3-030-39540-7_16 .

Miller, E. (2016). How chatbots will help education. Venturebeat. http://venturebeat.com/2016/09/29/how-chatbots-will-help-education/ .

Nguyen, Q. N., & Sidorova, A. (2018). Understanding user interactions with a chatbot: A self-determination theory approach. Proceedings of the Twenty-Fourth Americas Conference on Information Systems , United States of America, 1–5. Association for Information Systems (AIS).

Oke, A., & Fernandes, F. A. P. (2020). Innovations in teaching and learning: Exploring the perceptions of the education sector on the 4th industrial revolution (4IR). Journal of Open Innovation: Technology, Market, and Complexity., 6 (2), 31. https://doi.org/10.3390/JOITMC6020031 .

Ondas, S., Pleva, M., & Hladek, D. (2019). How chatbots can be involved in the education process. Proceedings of the 17th IEEE international conference on emerging eLearning technologies and applications ICETA 2019, Slovakia , 575–580. https://doi.org/10.1109/ICETA48886.2019.9040095 .

Pan, Y., Shang, Y., & Malika, R. (2020). Enhancing creativity in organizations: The role of the need for cognition. Management Decision . https://doi.org/10.1108/MD-04-2019-0516 .

Park, H. S., Baker, C., & Lee, D. W. (2008). Need for cognition, task complexity, and job satisfaction. Journal of Management in Engineering, 24 (2), 111–117. https://doi.org/10.1061/(asce)0742-597x(2008)24:2(111) .

Pereira, J. (2016). Leveraging chatbots to improve self-guided learning through conversational quizzes. Proceedings of the Fourth International Conference on Technological Ecosystems for Enhancing Multiculturality—TEEM ’16 , Spain, 911–918, ACM. https://doi.org/10.1145/3012430.3012625 .

Pereira, J., Fernández-Raga, M., Osuna-Acedo, S., Roura-Redondo, M., Almazán-López, O., & Buldón-Olalla, A. (2019). Promoting learners’ voice productions using chatbots as a tool for improving the learning process in a MOOC. Technology, Knowledge and Learning, 24 (4), 545–565. https://doi.org/10.1007/s10758-019-09414-9 .

Pham, X. L., Pham, T., Nguyen, Q. M., Nguyen, T. H., & Cao, T. T. H. (2018). Chatbot as an intelligent personal assistant for mobile language learning. Proceedings of the 2018 2nd international conference on education and e-Learning—ICEEL 2018 , Indonesia, 16–21. ACM. https://doi.org/10.1145/3291078.3291115 .

Pintrich, P. R., & de Groot, E. V. (1990). Motivational and self-regulated learning components of classroom academic performance. Journal of Educational Psychology, 82 (1), 33–40. https://doi.org/10.1037/0022-0663.82.1.33 .

Pintrich, P. R., Smith, D. A. F., Garcia, T., & Mckeachie, W. J. (1993). Reliability and predictive validity of the motivated strategies for learning questionnaire (MSLQ). Educational and Psychological Measurement, 53 (3), 801–813. https://doi.org/10.1177/0013164493053003024 .

Rahayu, Y. S., Wibawa, S. C., Yuliani, Y., Ratnasari, E., & Kusumadewi, S. (2018). The development of BOT API social media Telegram about plant hormones using Black Box Testing. IOP Conference Series: Materials Science and Engineering . https://doi.org/10.1088/1757-899X/434/1/012132 .

Rahman, A. M., Al Mamun, A., & Islam, A. (2018). Programming challenges of chatbot: Current and future prospective. 5th IEEE Region 10 Humanitarian Technology Conference 2017 (R10-HTC 2017) , India, 75–78, IEEE. https://doi.org/10.1109/R10-HTC.2017.8288910 .

Ren, R., Castro, J. W., Acuña, S. T., & De Lara, J. (2019). Evaluation techniques for chatbot usability: A systematic mapping study. International Journal of Software Engineering and Knowledge Engineering, 29 (11–12), 1673–1702. https://doi.org/10.1142/S0218194019400163 .

Riel, J. (2020). Essential features and critical issues with educational chatbots: toward personalized learning via digital agents. In: M. Khosrow-Pour (Ed.), Handbook of research on modern educational technologies, applications, and management (pp. 246–262). IGI Global. https://doi.org/10.1142/S0218194019400163 .

Rosenstein, L. D. (2019). Research design and analysis: A primer for the non-statistician . Wiley.

Sandoval, Z. V. (2018). Design and implementation of a chatbot in online higher education settings. Issues in Information Systems, 19 (4), 44–52. https://doi.org/10.48009/4_iis_2018_44-52 .

Sart, G. (2014). The effects of the development of metacognition on project-based learning. Procedia—Social and Behavioral Sciences, 152 , 131–136. https://doi.org/10.1016/j.sbspro.2014.09.169 .

Satow, L. (2017). Chatbots as teaching assistants: Introducing a model for learning facilitation by AI Bots. SAP Community . https://blogs.sap.com/2017/07/12/chatbots-as-teaching-assistants-introducing-a-model-for-learning-facilitation-by-ai-bots/ .

Schlagwein, D., Conboy, K., Feller, J., Leimeister, J. M., & Morgan, L. (2017). “Openness” with and without information technology: A framework and a brief history. Journal of Information Technology, 32 (4), 297–305. https://doi.org/10.1057/s41265-017-0049-3 .

Schmulian, A., & Coetzee, S. A. (2019). The development of Messenger bots for teaching and learning and accounting students’ experience of the use thereof. British Journal of Educational Technology, 50 (5), 2751–2777. https://doi.org/10.1111/bjet.12723 .

Setiaji, H., & Paputungan, I. V. (2018). Design of Telegram Bots for campus information sharing. IOP Conference Series: Materials Science and Engineering, 325 , 1–6. https://doi.org/10.1088/1757-899X/325/1/012005 .

Silva, P.A., Polo, B.J., Crosby, M.E. (2017). Adapting the studio based learning methodology to computer science education. In: Fee S., Holland-Minkley A., Lombardi T. (eds) New directions for computing education (pp. 119–142). Springer. https://doi.org/10.1007/978-3-319-54226-3_8 .

Sinclair, S., & Rockwell, G. (2021). Voyant tools (2.4). https://voyant-tools.org/ .

Sjöström, J., Aghaee, N., Dahlin, M., & Ågerfalk, P. J. (2018). Designing chatbots for higher education practice. Proceedings of the 2018 AIS SIGED International Conference on Information Systems Education and Research .

Smutny, P., & Schreiberova, P. (2020). Chatbots for learning: A review of educational chatbots for the Facebook Messenger. Computers and Education, 151 (February), 103862. https://doi.org/10.1016/j.compedu.2020.103862 .

Sreelakshmi, A. S., Abhinaya, S. B., Nair, A., & Jaya Nirmala, S. (2019). A question answering and quiz generation chatbot for education. Grace Hopper Celebration India (GHCI), 2019 , 1–6. https://doi.org/10.1109/GHCI47972.2019.9071832 .

Stathakarou, N., Nifakos, S., Karlgren, K., Konstantinidis, S. T., Bamidis, P. D., Pattichis, C. S., & Davoody, N. (2020). Students’ perceptions on chatbots’ potential and design characteristics in healthcare education. In J. Mantas, A. Hasman, & M. S. Househ (Eds.), The importance of health informatics in public health during a pandemic (Vol. 272, pp. 209–212). IOS Press. https://doi.org/10.3233/SHTI200531 .

Tamayo, P. A., Herrero, A., Martín, J., Navarro, C., & Tránchez, J. M. (2020). Design of a chatbot as a distance learning assistant. Open Praxis, 12 (1), 145. https://doi.org/10.5944/openpraxis.12.1.1063 .

Tegos, S., Demetriadis, S., Psathas, G. & Tsiatsos T. (2020) A configurable agent to advance peers’ productive dialogue in MOOCs. In: Følstad A. et al. (eds) Chatbot research and design. CONVERSATIONS 2019. Lecture Notes in Computer Science , vol 11970 (pp. 245–259). Springer. https://doi.org/10.1007/978-3-030-39540-7_17 .

Thirumalai, B., Ramanathan, A., Charania, A., & Stump, G. (2019). Designing for technology-enabled reflective practice: teachers’ voices on participating in a connected learning practice. In R. Setty, R. Iyenger, M. A. Witenstein, E. J. Byker, & H. Kidwai (Eds.), Teaching and teacher education: south asian perspectives (pp. 243–273). Palgrave Macmillan. https://doi.org/10.1016/S0742-051X(01)00046-4 .

van Knippenberg, D., & Hirst, G. (2020). A motivational lens model of person × situation interactions in employee creativity. Journal of Applied Psychology, 105 (10), 1129–1144. https://doi.org/10.1037/apl0000486 .

Vázquez-Cano, E., Mengual-Andrés, S., & López-Meneses, E. (2021). Chatbot to improve learning punctuation in Spanish and to enhance open and flexible learning environments. International Journal of Educational Technology in Higher Education, 18 (1), 33. https://doi.org/10.1186/s41239-021-00269-8 .

Verleger, M., & Pembridge, J. (2019). A pilot study integrating an AI-driven chatbot in an introductory programming course. Proceeding of the 2018 IEEE Frontiers in Education Conference (FIE) , USA. IEEE. https://doi.org/10.1109/FIE.2018.8659282 .

Walker, C. O., & Greene, B. A. (2009). The relations between student motivational beliefs and cognitive engagement in high school. Journal of Educational Research, 102 (6), 463–472. https://doi.org/10.3200/JOER.102.6.463-472 .

Wang, J., Hwang, G., & Chang, C. (2021). Directions of the 100 most cited chatbot-related human behavior research: A review of academic publications. Computers and Education: Artificial Intelligence, 2 , 1–12. https://doi.org/10.1016/j.caeai.2021.100023 .

Wei, H. C., & Chou, C. (2020). Online learning performance and satisfaction: Do perceptions and readiness matter? Distance Education, 41 (1), 48–69. https://doi.org/10.1080/01587919.2020.1724768 .

Winkler, R., & Söllner, M. (2018). Unleashing the potential of chatbots in education: A state-of-the-art analysis. In Academy of Management Annual Meeting (AOM) . https://www.alexandria.unisg.ch/254848/1/JML_699.pdf .

Wu, E. H. K., Lin, C. H., Ou, Y. Y., Liu, C. Z., Wang, W. K., & Chao, C. Y. (2020). Advantages and constraints of a hybrid model K-12 E-Learning assistant chatbot. IEEE Access, 8 , 77788–77801. https://doi.org/10.1109/ACCESS.2020.2988252 .

Yen, A. M. N. L. (2018). The influence of self-regulation processes on metacognition in a virtual learning environment. Educational Studies, 46 (1), 1–17. https://doi.org/10.1080/03055698.2018.1516628 .

Yilmaz, R. M., & Baydas, O. (2017). An examination of undergraduates’ metacognitive strategies in pre-class asynchronous activity in a flipped classroom. Educational Technology Research and Development, 65 (6), 1547–1567. https://doi.org/10.1007/s11423-017-9534-1 .

Yin, J., Goh, T. T., Yang, B., & Xiaobin, Y. (2021). Conversation technology with micro-learning: The impact of chatbot-based learning on students’ learning motivation and performance. Journal of Educational Computing Research, 59 (1), 154–177. https://doi.org/10.1177/0735633120952067 .

Acknowledgements

Not applicable.

Funding

This study was funded under the Universiti Sains Malaysia Short Term Research Grant 304/PMEDIA/6315219.

Author information

Authors and affiliations.

Centre for Instructional Technology and Multimedia, Universiti Sains Malaysia, Minden, Pulau Pinang, Malaysia

Jeya Amantha Kumar

Contributions

The author read and approved the final manuscript.

Corresponding author

Correspondence to Jeya Amantha Kumar .

Ethics declarations

Competing interests.

The author declares that there is no conflict of interest.

Ethical approval and consent to participate

Informed consent was obtained from all participants for being included in the study based on the approval of The Human Research Ethics Committee of Universiti Sains Malaysia (JEPeM) Ref No: USM/JEPeM/18050247.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Kumar, J.A. Educational chatbots for project-based learning: investigating learning outcomes for a team-based design course. Int J Educ Technol High Educ 18 , 65 (2021). https://doi.org/10.1186/s41239-021-00302-w

Received : 02 July 2021

Accepted : 23 September 2021

Published : 15 December 2021

DOI : https://doi.org/10.1186/s41239-021-00302-w

Keywords

  • Design education
  • Project-based learning
  • Collaborative learning
  • Mobile learning

Are We There Yet? - A Systematic Literature Review on Chatbots in Education

Sebastian Wollny, Jan Schneider, Daniele Di Mitri, Joshua Weidlich, Marc Rittberger, and Hendrik Drachsler

Associated Data

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.

Introduction

Research in this area has recently focused on chatbot technology, a subtype of dialog systems, as several technological platforms have matured and led to applications in various domains. Chatbots incorporate generic language models extracted from large parts of the Internet and enable feedback by limiting themselves to text or voice interfaces. For this reason, they have also been proposed and researched for a variety of applications in education ( Winkler and Soellner, 2018 ). Recent literature reviews on chatbots in education ( Winkler and Soellner, 2018 ; Hobert, 2019a ; Hobert and Meyer von Wolff, 2019 ; Jung et al., 2020 ; Pérez et al., 2020 ; Smutny and Schreiberova, 2020 ; Pérez-Marín, 2021 ) have reported on such applications as well as design guidelines, evaluation possibilities, and effects of chatbots in education.

In this paper, we contribute to the state-of-the-art of chatbots in education by presenting a systematic literature review, where we examine so-far unexplored areas such as implementation objectives, pedagogical roles, mentoring scenarios, the adaptations of chatbots to learners, and application domains. This paper is structured as follows: First, we review related work ( section 2 ), derive research questions from it, then explain the applied method for searching related studies ( section 3 ), followed by the results ( section 4 ), and finally, we discuss the findings and point to future research directions in the field ( section 5 ).

Related Work

In order to accurately cover the field of research and deal with the plethora of terms for chatbots in the literature (e.g. chatbot, dialogue system or pedagogical conversational agent) we propose the following definition:

Chatbots are digital systems that can be interacted with entirely through natural language via text or voice interfaces. They are intended to automate conversations by simulating a human conversation partner and can be integrated into software, such as online platforms, digital assistants, or be interfaced through messaging services.

Outside of education, typical applications of chatbots are in customer service ( Xu et al., 2017 ), counseling of hospital patients ( Vaidyam et al., 2019 ), or information services in smart speakers ( Ram et al., 2018 ). One central element of chatbots is the intent classification, also named the Natural Language Understanding (NLU) component, which is responsible for the sense-making of human input data. Looking at the current advances in chatbot software development, it seems that this technology’s goal is to pass the Turing Test ( Saygin et al., 2000 ) one day, which could make chatbots effective educational tools. Therefore, we ask ourselves “ Are we there yet? - Will we soon have an autonomous chatbot for every learner?”
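To make the role of the NLU component more concrete, the following minimal sketch classifies a user utterance into one of a few predefined intents by keyword overlap. The intents and example phrases are invented for illustration; production chatbot platforms typically rely on statistical or neural intent classifiers rather than this simple heuristic.

```python
import re

# A minimal, keyword-overlap intent classifier; the intents and example phrases are
# invented for illustration and do not come from any of the reviewed chatbots.
EXAMPLE_INTENTS = {
    "exam_date": ["when is the exam", "what is the exam date", "date of the final exam"],
    "office_hours": ["when are office hours", "when can I meet the lecturer"],
    "motivation": ["I feel stuck", "I am losing motivation", "this course is hard"],
}

def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z']+", text.lower()))

def classify_intent(utterance: str) -> str:
    """Return the intent whose example phrases share the most words with the utterance."""
    tokens = tokenize(utterance)
    scores = {intent: max(len(tokens & tokenize(phrase)) for phrase in phrases)
              for intent, phrases in EXAMPLE_INTENTS.items()}
    best_intent, best_score = max(scores.items(), key=lambda item: item[1])
    return best_intent if best_score > 0 else "fallback"

print(classify_intent("When is the final exam?"))  # -> exam_date
```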

To understand and underline the current need for research in the use of chatbots in education, we first examined the existing literature, focusing on comprehensive literature reviews. By looking at research questions in these literature reviews, we identified 21 different research topics and extracted findings accordingly. To structure research topics and findings in a comprehensible way, a three-stage clustering process was applied. While the first stage consisted of coding research topics by keywords, the second stage was applied to form overarching research categories ( Table 1 ). In the final stage, the findings within each research category were clustered to identify and structure commonalities within the literature reviews. The result is a concept map, which consists of four major categories. Those categories are CAT1. Applications of Chatbots, CAT2. Chatbot Designs, CAT3. Evaluation of Chatbots, and CAT4. Educational Effects of Chatbots. To standardize the terminology and concepts applied, we present the findings of each category in a separate sub-section ( see Figure 1 , Figure 2 , Figure 3 , and Figure 4 ) and extend them with the outcomes of our own literature study, which are reported in the remaining parts of this article. Due to the size of the concept map, a full version can be found in Appendix A .

Table 1. Assignment of coded research topics identified in related literature reviews to research categories.

Figure 1. Applications of chatbots in related literature reviews (CAT1).

Figure 2. Chatbot designs in related literature reviews (CAT2).

Figure 3. Evaluation of chatbots in related literature reviews (CAT3).

Figure 4. Educational Effects of chatbots in related literature reviews (CAT4).

Regarding the applications of chatbots (CAT1), application clusters (AC) and application statistics (AS) have been described in the literature, which we visualized in Figure 1 . The study of ( Pérez et al., 2020 ) identifies two application clusters, defined through chatbot activities: “service-oriented chatbots” and “teaching-oriented chatbots.” ( Winkler and Soellner, 2018 ) identify application clusters by naming the domains “health and well-being interventions,” “language learning,” “feedback and metacognitive thinking,” as well as “motivation and self-efficacy.” Concerning application statistics (AS), ( Smutny and Schreiberova, 2020 ), who analyzed chatbots integrated into the social media platform Facebook, found that nearly 47% of the analyzed chatbots incorporate informing actions and 18% support language learning. In addition, the chatbots studied had a strong tendency to use English, at 89%. This high number aligns with results from ( Pérez-Marín, 2021 ), where 75% of observed agents, as a related technology, were designed to interact in the English language. ( Pérez-Marín, 2021 ) also shows that 42% of the analyzed chatbots had mixed interaction modalities. Finally, ( Hobert and Meyer von Wolff, 2019 ) observed that only 25% of the examined chatbots were incorporated into formal learning settings, that the majority of published material focuses on student-chatbot interaction only and does not enable student-student communication, and that nearly two-thirds of the analyzed chatbots center on a single domain. Overall, we can summarize that, so far, there are six application clusters for chatbots in education, categorized by chatbot activities or domains. The provided statistics allow for a clearer understanding of the prevalence of chatbot applications in education ( see Figure 1 ).

Regarding chatbot designs (CAT2), most of the research questions concerned with chatbots in education can be assigned to this category. We found three aspects in this category, visualized in Figure 2 : Personality (PS), Process Pipeline (PP), and Design Classifications (DC). Within these, most research questions can be assigned to Design Classifications (DC), which are separated into Classification Aspects (DC2) and Classification Frameworks (DC1). One classification framework is defined through “flow chatbots,” “artificially intelligent chatbots,” “chatbots with integrated speech recognition,” as well as “chatbots with integrated context-data” by ( Winkler and Soellner, 2018 ). A second classification framework by ( Pérez-Marín, 2021 ) covers pedagogy, social, and HCI features of chatbots and agents, which themselves can be further subdivided into more detailed aspects. Other Classification Aspects (DC2), derived from several publications, provide another classification schema, which distinguishes between “retrieval vs. generative” based technology, the “ability to incorporate context data,” and “speech or text interface” ( Winkler and Soellner, 2018 ; Smutny and Schreiberova, 2020 ). Text interfaces can be further subdivided into “Button-Based” and “Keyword Recognition-Based” interfaces ( Smutny and Schreiberova, 2020 ). Furthermore, a comparison of speech and text interfaces ( Jung et al., 2020 ) shows that text interfaces have advantages for conveying information, and speech interfaces have advantages for affective support. The second aspect of CAT2 concerns the chatbot processing pipeline (PP), highlighting user interface and back-end importance ( Pérez et al., 2020 ). Finally, ( Jung et al., 2020 ) focuses on the third aspect, the personality of chatbots (PS). Here, the study derives four guidelines helpful in education: positive or neutral emotional expressions, a limited amount of animated or visual graphics, a well-considered gender of the chatbot, and human-like interactions. In summary, we have found in CAT2 three main design aspects for the development of chatbots. CAT2 is much more diverse than CAT1, with various sub-categories for the design of chatbots. This indicates the considerable flexibility to design chatbots in various ways to support education.
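As a compact way to think about the Classification Aspects (DC2), the sketch below encodes the three distinctions as fields of a small data structure. The field names and value sets paraphrase the aspects discussed above and are not a schema proposed by the cited reviews.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical encoding of the classification aspects (DC2) discussed above;
# the field names paraphrase the aspects, they are not taken from the cited reviews.

@dataclass
class ChatbotDesign:
    response_generation: Literal["retrieval", "generative"]
    uses_context_data: bool
    interface: Literal["speech", "text-button-based", "text-keyword-based"]

example = ChatbotDesign(response_generation="retrieval",
                        uses_context_data=False,
                        interface="text-button-based")
print(example)
```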

Regarding the evaluation of chatbots (CAT3), we found three aspects assigned to this category, visualized in Figure 3 : Evaluation Criteria (EC), Evaluation Methods (EM), and Evaluation Instruments (EI). Concerning Evaluation Criteria, seven criteria can be identified in the literature. The first and most important in the educational field, according to ( Smutny and Schreiberova, 2020 ), is the evaluation of learning success ( Hobert, 2019a ), which can have subcategories such as how chatbots are embedded in learning scenarios ( Winkler and Soellner, 2018 ; Smutny and Schreiberova, 2020 ) and teaching efficiency ( Pérez et al., 2020 ). The second is acceptance, which ( Hobert, 2019a ) names as “acceptance and adoption” and ( Pérez et al., 2020 ) as “students’ perception.” Further evaluation criteria are motivation, usability, technical correctness, psychological, and further beneficial factors ( Hobert, 2019a ). These Evaluation Criteria show broad possibilities for the evaluation of chatbots in education. However, ( Hobert, 2019a ) found that most evaluations are limited to single evaluation criteria or narrower aspects of them. Moreover, ( Hobert, 2019a ) introduces a classification matrix for chatbot evaluations, which consists of the following Evaluation Methods (EM): Wizard-of-Oz approach, laboratory studies, field studies, and technical validations. In addition to this, ( Winkler and Soellner, 2018 ) recommends evaluating chatbots by their embeddedness into a learning scenario, a comparison of human-human and human-chatbot interactions, and comparing spoken and written communication. Instruments to measure these evaluation criteria were identified by ( Hobert, 2019a ): quantitative surveys, qualitative interviews, transcripts of dialogues, and technical log files. Regarding CAT3, we found three main aspects for the evaluation of chatbots. We can conclude that this is a more balanced and structured distribution in comparison to CAT2, providing researchers with guidance for evaluating chatbots in education.

Regarding educational effects of chatbots (CAT4), we found two aspects, visualized in Figure 4 : Effect Size (ES) and Beneficial Chatbot Features for Learning Success (BF). Concerning the effect size, ( Pérez et al., 2020 ) identified a strong dependency between learning and the related curriculum, while ( Winkler and Soellner, 2018 ) elaborate on general student characteristics that influence how students interact with chatbots. They state that students’ attitudes towards technology, learning characteristics, educational background, self-efficacy, and self-regulation skills affect these interactions. Moreover, the study emphasizes chatbot features, which can be regarded as beneficial in terms of learning outcomes (BF): “Context-Awareness,” “Proactive guidance by students,” “Integration in existing learning and instant messaging tools,” “Accessibility,” and “Response Time.” Overall, for CAT4, we found two main distinguishing aspects for chatbots; however, the reported studies vary widely in their research design, making high-level results hardly comparable.

Looking at the related work, many research questions for the application of chatbots in education remain. Therefore, we selected five goals to be further investigated in our literature review. Firstly, we were interested in the objectives for implementing chatbots in education (Goal 1), as the relevance of chatbots for applications within education seems not to be clearly delineated. Secondly, we aim to explore the pedagogical roles of chatbots in the existing literature (Goal 2) to understand how chatbots can take over tasks from teachers. ( Winkler and Soellner, 2018 ) and ( Pérez-Marín, 2021 ) identified research gaps for supporting meta-cognitive skills, such as self-regulation, with chatbots. This requires a chatbot application that takes a mentoring role, as the development of these meta-cognitive skills cannot be achieved solely by information delivery. Within our review, we incorporate this by examining the mentoring role of chatbots (Goal 3). Another key element for a mentoring chatbot is adaptation to the learner's needs. Therefore, Goal 4 of our review lies in the investigation of the adaptation approaches used by chatbots in education. For Goal 5, we want to extend the work of ( Winkler and Soellner, 2018 ) and ( Pérez et al., 2020 ) regarding Application Clusters (AC) and map applications by further investigating specific learning domains in which chatbots have been studied.

To delineate and map the field of chatbots in education, initial findings were collected by a preliminary literature search. One of the takeaways is that the emerging field around educational chatbots has seen much activity in the last two years. Based on the experience of this preliminary search, search terms, queries, and filters were constructed for the actual structured literature review. This structured literature review follows the PRISMA framework ( Liberati et al., 2009 ), a guideline for reporting systematic reviews and meta-analyses. The framework consists of an elaborated structure for systematic literature reviews and sets requirements for reporting information about the review process ( see section 3.2 to 3.4).

Research Questions

Contributing to the state-of-the-art, we investigate five aspects of chatbot applications published in the literature. We therefore guided our research with the following research questions:

RQ1: Which objectives for implementing chatbots in education can be identified in the existing literature?

RQ2: Which pedagogical roles of chatbots can be identified in the existing literature?

RQ3: Which application scenarios have been used to mentor students?

RQ4: To what extent are chatbots adaptable to personal students’ needs?

RQ5: What are the domains in which chatbots have been applied so far?

Sources of Information

As data sources, Scopus, Web of Science, Google Scholar, Microsoft Academic, and the educational research database “Fachportal Pädagogik” (including ERIC) were selected, all of which incorporate all major publishers and journals. ( Martín-Martín et al., 2018 ) showed that only 29.8% of relevant literature in the social sciences and 46.8% in engineering and computer science is included in all of the first three databases. For the topic of chatbots in education, a value between these two numbers can be assumed, which is why an approach of integrating several publisher-independent databases was employed here.

Search Criteria

Based on the findings from the initial related work search, we derived the following search query:

( Education OR Educational OR Learning OR Learner OR Student OR Teaching OR School OR University OR Pedagogical ) AND Chatbot.

It combines education-related keywords with the “chatbot” keyword. Since chatbots are related to other technologies, the initial literature search also considered keywords such as “pedagogical agents,” “dialogue systems,” or “bots” when composing the search query. However, these increased the number of irrelevant results significantly and were therefore excluded from the query in later searches.
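The Boolean query can also be expressed programmatically. The sketch below assembles the query string used for the database searches and applies the same keyword logic as a local filter on candidate titles; the matching function and the example title are simplifications for illustration, not the databases' own query engines.

```python
import re

# Education-related keywords combined with the "chatbot" keyword, as in the review query.
EDUCATION_TERMS = ["Education", "Educational", "Learning", "Learner", "Student",
                   "Teaching", "School", "University", "Pedagogical"]

def build_query() -> str:
    """Assemble the Boolean search string used for the title and keyword searches."""
    return "( " + " OR ".join(EDUCATION_TERMS) + " ) AND Chatbot"

def matches_query(title: str) -> bool:
    """Local approximation of the query: an education term AND 'chatbot' must both appear."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    has_education_term = any(term.lower() in words for term in EDUCATION_TERMS)
    return has_education_term and "chatbot" in words

print(build_query())
print(matches_query("A chatbot for self-regulated learning in higher education"))  # True
```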

Inclusion and Exclusion Criteria

The queries were executed on 23.12.2020 and applied twice to each database, first as a title search query and second as a keyword-based search. This resulted in a total of 3,619 hits, which were checked for duplicates, resulting in 2,678 candidate publications. The overall search and filtering process is shown in Figure 5 .

Figure 5. PRISMA flow chart.

In the case of Google Scholar, the number of results sorted by relevance per query was limited to 300, as this database also delivers many less relevant works. The value was determined by looking at the search results in detail using several queries to exclude as few relevant works as possible. This approach showed promising results and, at the same time, did not burden the literature list with irrelevant items.

The further screening consisted of a four-stage filtering process. First, duplicates were eliminated within the results of the title and keyword queries of each database independently. Second, publications were excluded based on the title and abstract if they:

  • were not available in English
  • did not describe a chatbot application
  • were not mainly focused on learner-centered chatbot applications in schools or higher education institutions, which, according to the preliminary literature search, is the main application area within education.

Third, we applied another duplicate filter, this time for the merged set of publications. Finally, a filter based on the full text excluded publications that were:

  • limited to improving chatbots technically (e.g., publications that compare or develop new algorithms), as the research questions presented in these publications were not seeking additional insights into applications in education
  • exclusively theoretical in nature (e.g., publications that discuss new research projects, implementation concepts, or potential use cases of chatbots in education), as they either do not contain research questions or hypotheses or do not provide conclusions from studies with learners.

After the first, second, and third filters, we identified 505 candidate publications. We continued our filtering process by reading the candidate publications’ full texts, resulting in 74 publications that were used for our review. Compared to the 3,619 initial database results, the proportion of relevant publications is therefore about 2.0%.
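To illustrate the screening pipeline, the sketch below deduplicates raw hits and applies the title/abstract criteria as simple predicates. The record fields, the language check, and the keyword heuristics are stand-ins chosen for demonstration; they are not the actual screening procedure or tooling used in this review.

```python
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    abstract: str
    language: str

def deduplicate(records: list[Record]) -> list[Record]:
    """Remove duplicate hits by normalized title (first stage of the screening)."""
    seen, unique = set(), []
    for record in records:
        key = record.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def passes_title_abstract_screen(record: Record) -> bool:
    """Apply simplified stand-ins for the title/abstract exclusion criteria."""
    text = (record.title + " " + record.abstract).lower()
    return (record.language == "en"                        # available in English
            and "chatbot" in text                          # describes a chatbot application
            and ("student" in text or "learner" in text))  # learner-centered focus

hits = [Record("Chatbot tutors for students", "A learner-centered chatbot ...", "en"),
        Record("Chatbot tutors for students", "A learner-centered chatbot ...", "en"),
        Record("Ein Chatbot im Kundendienst", "Kundendienst ...", "de")]
candidates = [r for r in deduplicate(hits) if passes_title_abstract_screen(r)]
print(len(candidates))  # 1
```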

The final publication list can be accessed under https://bit.ly/2RRArFT .

To analyze the identified publications and derive results according to the research questions, full texts were coded, considering for each publication the objectives for implementing chatbots (RQ1), pedagogical roles of chatbots (RQ2), their mentoring roles (RQ3), adaptation of chatbots (RQ4), as well as their implementation domains in education (RQ5) as separate sets of codes. To this end, initial codes were identified by open coding and iteratively improved through comparison, group discussion among the authors, and subsequent code expansion. Further, codes were supplemented with detailed descriptions until a saturation point was reached, where all included studies could be successfully mapped to codes, suggesting no need for further refinement. As an example, codes for RQ2 (Pedagogical Roles) were adapted and refined in terms of their level of abstraction from an initial set of only two codes: 1) a code for chatbots in the learning role and 2) a code for chatbots in a service-oriented role. After coding a larger set of publications, it became clear that the code for service-oriented chatbots needed to be further distinguished. This was because it grouped, for example, automation activities together with activities related to self-regulated learning and thus could not be distinguished sharply enough from the learning role. After refining the code set in the next iteration into a learning role, an assistance role, and a mentoring role, it was then possible to ensure the separation of the individual codes. In order to avoid defining new codes for a singular or very small number of publications, studies were coded as “other” (RQ1) or “not defined” (RQ2) if a code occurred in fewer than eight publications, representing less than 10% of the publications in the final paper list.
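The rule for merging rare codes into “other” or “not defined” can be expressed as a small post-processing step. The code frequencies below are invented for illustration and do not reflect the actual coding results.

```python
from collections import Counter

MIN_PUBLICATIONS = 8  # codes occurring in fewer publications are merged into "other"

# Hypothetical code frequencies after open coding for RQ1 (not the actual review data).
raw_codes = (["Skill Improvement"] * 24 + ["Efficiency of Education"] * 19 +
             ["Students' Motivation"] * 10 + ["Availability of Education"] * 8 +
             ["Inclusivity"] * 3 + ["Teacher-Student Interaction"] * 2)

counts = Counter(raw_codes)
merged = Counter()
for code, count in counts.items():
    merged["other" if count < MIN_PUBLICATIONS else code] += count

print(merged)  # rare codes end up aggregated under "other"
```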

By grouping the resulting relevant publications according to their date of publication, it is apparent that chatbots in education are currently in a phase of increased attention. The release distribution shows slightly lower publication numbers in the current than in the previous year ( Figure 6 ), which could be attributed to a time lag between the actual publication of manuscripts and their dissemination in databases.

Figure 6. Identified chatbot publications in education per year.

Applying the curve presented in Figure 6 to Gartner’s Hype Cycle ( Linden and Fenn, 2003 ) suggests that technology around chatbots in education may currently be in the “Innovation Trigger” phase. This phase is where many expectations are placed on the technology, but the practical in-depth experience is still largely lacking.
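For orientation, the year distribution underlying Figure 6 can be reproduced with a simple count; the years listed below are illustrative placeholders, not the actual data of the 74 included publications.

```python
from collections import Counter

# Illustrative publication years for the included studies (not the actual review data).
publication_years = [2016, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020, 2020]

counts_per_year = Counter(publication_years)
for year in sorted(counts_per_year):
    print(year, "#" * counts_per_year[year])  # simple text histogram, cf. Figure 6
```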

Objectives for Implementing Chatbots in Education

Regarding RQ1, we extracted implementation objectives for chatbots in education. By analyzing the selected publications, we identified that most of the objectives for chatbots in education can be described by one of the following categories: Skill Improvement, Efficiency of Education, Students’ Motivation, and Availability of Education ( see Figure 7 ). The first is the improvement of a student’s skill ( Skill Improvement ) that the chatbot is supposed to support or achieve. Here, chatbots are mostly seen as a learning aid that supports students. It is the most commonly cited objective for chatbots. The second objective is to increase the Efficiency of Education in general. It can occur, for example, through the automation of recurring tasks or time-saving services for students and is the second most cited objective for chatbots. The third objective is to increase Students’ Motivation . The fourth objective is to increase the Availability of Education . This objective is intended to provide learning or counseling with temporal flexibility or without the limitation of physical presence. In addition, there are other, more diverse objectives for chatbots in education that are less easy to categorize. In cases of a publication indicating more than one objective, the publication was distributed evenly across the respective categories.
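The even distribution of multi-objective publications across categories corresponds to fractional counting. The following minimal sketch demonstrates this counting rule with invented publication-to-objective assignments; the publication identifiers are placeholders.

```python
from collections import defaultdict

# Hypothetical mapping of publications to their stated implementation objectives.
objectives_per_publication = {
    "pub_a": ["Skill Improvement"],
    "pub_b": ["Efficiency of Education", "Availability of Education"],
    "pub_c": ["Students' Motivation", "Skill Improvement"],
}

weights = defaultdict(float)
for objectives in objectives_per_publication.values():
    share = 1 / len(objectives)          # distribute the publication evenly
    for objective in objectives:
        weights[objective] += share

total = sum(weights.values())            # equals the number of publications
for objective, weight in sorted(weights.items()):
    print(f"{objective}: {weight:.2f} ({weight / total:.0%})")
```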

Figure 7. Objectives for implementing chatbots identified in chatbot publications.

Given these results, we can summarize four major implementing objectives for chatbots. Of these, Skill Improvement is the most popular objective, constituting around one-third of publications (32%). Making up a quarter of all publications, Efficiency of Education is the second most popular objective (25%), while addressing Students’ Motivation and Availability of Education are third (13%) and fourth (11%), respectively. Other objectives also make up a substantial amount of these publications (19%), although they were too diverse to categorize in a uniform way. Examples of these are inclusivity ( Heo and Lee, 2019 ) or the promotion of student teacher interactions ( Mendoza et al., 2020 ).

Pedagogical Roles

Regarding RQ2, it is crucial to consider the use of chatbots in terms of their intended pedagogical role. After analyzing the selected articles, we were able to identify three different pedagogical roles: a supporting learning role, an assisting role, and a mentoring role.

In the supporting learning role ( Learning ), chatbots are used as an educational tool to teach content or skills. This can be achieved through a fixed integration into the curriculum, such as conversation tasks (L. K. Fryer et al., 2020 ). Alternatively, learning can be supported through additional offerings alongside classroom teaching, for example, voice assistants for leisure activities at home ( Bao, 2019 ). Examples of these are chatbots simulating a virtual pen pal abroad ( Na-Young, 2019 ). Conversations with this kind of chatbot aim to motivate the students to look up vocabulary, check their grammar, and gain confidence in the foreign language.

In the assisting role ( Assisting ), chatbot actions can be summarized as simplifying the student's everyday life, i.e., taking tasks off the student’s hands in whole or in part. This can be achieved by making information more easily available ( Sugondo and Bahana, 2019 ) or by simplifying processes through the chatbot’s automation ( Suwannatee and Suwanyangyuen, 2019 ). An example of this is the chatbot in ( Sandoval, 2018 ) that answers general questions about a course, such as an exam date or office hours.

In the mentoring role ( Mentoring ), chatbot actions deal with the student’s personal development. In this type of support, the student himself is the focus of the conversation and should be encouraged to plan, reflect or assess his progress on a meta-cognitive level. One example is the chatbot in ( Cabales, 2019 ), which helps students develop lifelong learning skills by prompting in-action reflections.

The distribution of each pedagogical role is shown in Figure 8 . From this, it can be seen that Learning is the most frequently used role of the examined publications (49%), followed by Assisting (20%) and Mentoring (15%). It should be noted that pedagogical roles were not identified for all the publications examined. The absence of a clearly defined pedagogical role (16%) can be attributed to the more general nature of these publications, e.g. focused on students’ small talk behaviors ( Hobert, 2019b ) or teachers’ attitudes towards chatbot applications in classroom teaching (P. K. Bii et al., 2018 ).

Figure 8. Pedagogical roles identified in chatbot publications.

Looking at pedagogical roles in the context of objectives for implementing chatbots, relations among publications can be inspected in a relations graph ( Figure 9 ). According to our results, the strongest relation in the examined publications is between the Skill Improvement objective and the Learning role. This strong relation exists partly because both the Skill Improvement objective and the Learning role are the largest in their respective categories. In addition, two other strong relations can be observed: between the Students’ Motivation objective and the Learning role, and between the Efficiency of Education objective and the Assisting role.

Figure 9. Relations graph of pedagogical roles and objectives for implementing chatbots.

Looking at the other relations in more detail, there is surprisingly no relation between Skill Improvement , the most common implementation objective, and Assisting , the second most common pedagogical role. Furthermore, the Mentoring role has nearly equal relations to all of the objectives for implementing chatbots.

The relations graph ( Figure 9 ) can be explored interactively at bit.ly/32FSKQM.
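Conceptually, the relations graph is built from co-occurrences of the coded objective and the coded pedagogical role of each publication. The following minimal sketch (with hypothetical codings, not the review’s data) shows how such a weighted edge list can be derived:

```python
from collections import Counter
from itertools import product

# Hypothetical codings: (objectives, pedagogical roles) per publication.
codings = [
    (["Skill Improvement"], ["Learning"]),
    (["Students' Motivation"], ["Learning"]),
    (["Efficiency of Education"], ["Assisting"]),
    (["Skill Improvement", "Availability of Education"], ["Learning", "Mentoring"]),
]

edges = Counter()
for objectives, roles in codings:
    for objective, role in product(objectives, roles):
        edges[(objective, role)] += 1

for (objective, role), weight in edges.most_common():
    print(f"{objective} -- {role}: {weight}")
```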

Mentoring Role

Regarding RQ3, we identified eleven publications that deal with chatbots in a mentoring role. The Mentoring role in these publications can be categorized along two dimensions. Starting with the first dimension, the mentoring method, three methods can be observed:

  • Scaffolding ( n = 7)
  • Recommending ( n = 3)
  • Informing ( n = 1)

An example of Scaffolding can be seen in ( Gabrielli et al., 2020 ), where the chatbot coaches students in life skills, while an example of Recommending can be seen in ( Xiao et al., 2019 ), where the chatbot recommends new teammates. Finally, Informing can be seen in ( Kerly et al., 2008 ), where the chatbot informs students about their personal Open Learner Model.
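To contrast the three mentoring methods, the sketch below pairs each method with a hypothetical chatbot prompt; the wording is our own illustration and is not drawn from the cited systems.

```python
# Hypothetical prompts contrasting the three mentoring methods.
def mentoring_message(method: str, student: dict) -> str:
    if method == "scaffolding":
        return ("You planned to revise two chapters this week. "
                "What is the first concrete step you will take today?")
    if method == "recommending":
        return (f"Based on your interest in {student['topic']}, "
                "you might team up with a peer from the statistics group.")
    if method == "informing":
        return (f"Your self-assessment says 'confident', but your last quiz "
                f"score was {student['last_quiz']}%. Do you want to compare the two?")
    raise ValueError(f"Unknown mentoring method: {method}")

student = {"topic": "research methods", "last_quiz": 55}
for method in ("scaffolding", "recommending", "informing"):
    print(f"{method}: {mentoring_message(method, student)}")
```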

The second dimension is the addressed mentoring topic, where the following topics can be observed:

  • Self-Regulated Learning ( n = 5)
  • Life Skills ( n = 4)
  • Learning Skills ( n = 2)

While Mentoring chatbots supporting Self-Regulated Learning are intended to encourage students to reflect on and plan their learning progress, Mentoring chatbots supporting Life Skills address general student abilities such as self-confidence or managing emotions. Finally, Mentoring chatbots supporting Learning Skills , in contrast to Self-Regulated Learning , address only particular aspects of the learning process, such as new learning strategies or helpful learning partners. An example of a Mentoring chatbot supporting Life Skills is the Logo counseling chatbot, which promotes healthy self-esteem ( Engel et al., 2020 ). CALMsystem is an example of a Self-Regulated Learning chatbot, which informs students about their data in an open learner model ( Kerly et al., 2008 ). Finally, for the Learning Skills topic, the MCQ Bot is an example designed to introduce students to transformative learning (W. Huang et al., 2019 ).

Regarding RQ4, we identified six publications in the final publication list that address the topic of adaptation. Within these publications, five adaptation approaches are described:

The first approach (A1) is proposed by ( Kerly and Bull, 2006 ) and ( Kerly et al., 2008 ) and deals with discussions with students based on their success and confidence during a quiz; the improvement of self-assessment is the primary focus of this approach. The second approach (A2) is presented in ( Jia, 2008 ), where the personality of the chatbot is adapted to motivate students to talk to the chatbot and, in this case, learn a foreign language. The third approach (A3), shown in the work of ( Vijayakumar et al., 2019 ), is characterized by a chatbot that provides personalized formative feedback to learners based on their self-assessment, again in a quiz situation; here, the focus is on Hattie and Timperley’s three guiding questions: “Where am I going?,” “How am I going?” and “Where to next?” ( Hattie and Timperley, 2007 ). In the fourth approach (A4), exemplified in ( Ruan et al., 2019 ), the chatbot selects questions within a quiz: it estimates the student’s ability and knowledge level based on the quiz progress and sets the next question accordingly. Finally, a similar approach (A5) is shown in ( Davies et al., 2020 ). In contrast to ( Ruan et al., 2019 ), this chatbot adapts the amount of question variation and takes into account psychological features that were measured beforehand by psychological tests.

We examined these five approaches by organizing them according to their information sources and extracted learner information. The results can be seen in Table 2 .

Table 2. Adaptation approaches of chatbots in education.

Four out of five adaptation approaches (A1, A3, A4, and A5) are observed in the context of quizzes. These adaptations within quizzes can be divided into two mainstreams: one is concerned with feedback to students (A1 and A3), while the other is concerned with the selection of learning material (A4 and A5). The only different adaptation approach is A2, which focuses on adapting the chatbot’s personality within a language learning application.
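As a rough illustration of the material-selection mainstream (in the spirit of A4, but not the actual implementation of ( Ruan et al., 2019 )), a chatbot could keep a running ability estimate and always pick the unanswered question whose difficulty is closest to it. The questions, difficulty values, and update rule below are invented for the example.

```python
# Simplified adaptive question selection: keep an ability estimate in [0, 1]
# and always ask the remaining question whose difficulty is closest to it.
questions = [
    {"id": "q1", "text": "2 + 2 = ?", "difficulty": 0.1},
    {"id": "q2", "text": "Solve x^2 - 4 = 0.", "difficulty": 0.5},
    {"id": "q3", "text": "Differentiate x * sin(x).", "difficulty": 0.8},
]

def next_question(ability, remaining):
    return min(remaining, key=lambda q: abs(q["difficulty"] - ability))

def update_ability(ability, question, correct, step=0.15):
    # Nudge the estimate up after a correct answer and down after an incorrect
    # one, scaled by how demanding the question was.
    direction = 1 if correct else -1
    return max(0.0, min(1.0, ability + direction * step * (0.5 + question["difficulty"])))

ability = 0.5
remaining = list(questions)
simulated_answers = {"q1": True, "q2": True, "q3": False}   # toy student responses

while remaining:
    q = next_question(ability, remaining)
    remaining.remove(q)
    correct = simulated_answers[q["id"]]
    ability = update_ability(ability, q, correct)
    print(f"Asked {q['id']} (difficulty {q['difficulty']}), correct={correct}, "
          f"new ability estimate={ability:.2f}")
```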

Domains for Chatbots in Education

Regarding RQ5, we identified 20 domains of chatbots in education. These can broadly be divided by their pedagogical role into three domain categories (DC): Learning Chatbots , Assisting Chatbots , and Mentoring Chatbots . The remaining publications are grouped in the Other Research domain category. The complete list of identified domains can be seen in Table 3 .

Table 3. Domains of chatbots in education.

The domain category Learning Chatbots , which deals with chatbots incorporating the pedagogical role Learning , can be subdivided into seven domains: 1 ) Language Learning , 2 ) Learn to Program , 3 ) Learn Communication Skills , 4 ) Learn about Educational Technologies , 5 ) Learn about Cultural Heritage , 6 ) Learn about Laws , and 7 ) Mathematics Learning . With more than half of publications (53%), chatbots for Language Learning play a prominent role in this domain category. They are often used as chat partners to train conversations or to test vocabulary. An example of this can be seen in the work of ( Bao, 2019 ), which tries to mitigate foreign language anxiety by chatbot interactions in foreign languages.

The domain category Assisting Chatbots , which deals with chatbots incorporating the pedagogical role Assisting , can be subdivided into four domains: 1) Administrative Assistance , 2) Campus Assistance , 3) Course Assistance , and 4) Library Assistance . Accounting for one-third of publications (33%), chatbots in the Administrative Assistance domain, which help students overcome bureaucratic hurdles at their institution while providing round-the-clock services, are the largest group in this domain category. An example can be seen in ( Galko et al., 2018 ), where the student enrollment process is completely shifted to a conversation with a chatbot.

The domain category Mentoring Chatbots , which deals with chatbots incorporating the pedagogical role Mentoring , can be subdivided into three domains: 1) Scaffolding Chatbots , 2) Recommending Chatbots , and 3) Informing Chatbots . An example of a Scaffolding Chatbot is the CRI(S) chatbot ( Gabrielli et al., 2020 ), which supports life skills such as self-awareness or conflict resolution in discussion with the student by suggesting helpful ideas and tricks.

The domain category Other Research , which deals with chatbots not incorporating any of these pedagogical roles, can be subdivided into three domains: 1) General Chatbot Research in Education , 2) Indian Educational System , and 3) Chatbot Interfaces . The most prominent domain, General Chatbot Research , cannot be classified into one of the other categories but aims to explore cross-cutting issues. An example of this can be seen in the publication of ( Hobert, 2020 ), which researches the importance of small talk abilities of chatbots in educational settings.

Discussion

In this paper, we investigated the state-of-the-art of chatbots in education according to five research questions. By combining our results with previously identified findings from related literature reviews, we proposed a concept map of chatbots in education. The map, reported in Appendix A , displays the current state of research regarding chatbots in education with the aim of supporting future research in the field.

Answers to the Research Questions

Concerning RQ1 (implementation objectives), we identified four major objectives: 1) Skill Improvement , 2) Efficiency of Education , 3) Students’ Motivation , and 4) Availability of Education . These four objectives cover over 80% of the analyzed publications ( see Figure 7 ). Based on the findings on CAT3 in section 2 , we see a mismatch between the objectives for implementing chatbots and their evaluation. Most researchers focus only on narrow aspects when evaluating their chatbots, such as learning success, usability, and technology acceptance. This mismatch between implementation objectives and suitable evaluation approaches is also well known from other educational technologies, such as Learning Analytics dashboards ( Jivet et al., 2017 ). A more structured approach to aligning implementation objectives and evaluation procedures is crucial in order to properly assess the effectiveness of chatbots. ( Hobert, 2019a ) suggested a structured four-stage evaluation procedure beginning with a Wizard-of-Oz experiment, followed by a technical validation, a laboratory study, and a field study. This evaluation procedure systematically links hypotheses with chatbot outcomes, helping to assess chatbots against their implementation objectives. “Aligning chatbot evaluations with implementation objectives” is, therefore, an important challenge to be addressed in the future research agenda.
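One lightweight way to operationalize such an alignment is to maintain an explicit mapping from implementation objectives to candidate evaluation measures and check a planned evaluation against it. The mapping below is purely illustrative; the listed measures are our own examples, not prescriptions from the reviewed literature.

```python
# Hypothetical alignment table between implementation objectives and
# evaluation measures; entries are illustrative, not prescribed by the review.
EVALUATION_PLAN = {
    "Skill Improvement": ["pre/post knowledge test", "learning gain effect size"],
    "Efficiency of Education": ["staff time saved", "response time", "resolution rate"],
    "Students' Motivation": ["validated motivation questionnaire", "usage frequency"],
    "Availability of Education": ["coverage of off-hours requests", "drop-out rate"],
}

def missing_measures(objective, planned):
    """Return suggested measures for the objective that are not yet planned."""
    return [m for m in EVALUATION_PLAN.get(objective, []) if m not in planned]

# A plan that only measures usability would miss the objective-specific measures:
print(missing_measures("Skill Improvement", ["usability questionnaire"]))
```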

Concerning RQ2 (pedagogical roles), our results show that chatbots’ pedagogical roles can be summarized as Learning , Assisting , and Mentoring . The Learning role denotes support in learning or teaching activities, such as gaining knowledge. The Assisting role denotes support in simplifying learners’ everyday life, e.g., by providing the opening times of the library. The Mentoring role denotes support in students’ personal development, e.g., by supporting Self-Regulated Learning. From a pedagogical standpoint, all three roles are essential for learners and should therefore be incorporated in chatbots. These pedagogical roles align well with the four implementation objectives reported for RQ1: while Skill Improvement and Students’ Motivation are strongly related to Learning , Efficiency of Education is strongly related to Assisting . The Mentoring role, in contrast, is evenly related to all of the identified objectives for implementing chatbots. In the reviewed publications, chatbots are therefore primarily intended to 1) improve skills and motivate students by supporting learning and teaching activities, 2) make education more efficient by providing relevant administrative and logistical information to learners, and 3) support multiple effects by mentoring students.

Concerning RQ3 (mentoring role), we identified three main mentoring method categories for chatbots: 1) Scaffolding , 2) Recommending , and 3) Informing . However, comparing the mentoring by chatbots reported in the literature with the daily mentoring role of teachers, we can conclude that chatbots are not yet at the same level. In order to take over mentoring roles of teachers ( Wildman et al., 1992 ), a chatbot would need to fulfill the following kinds of activities in its mentoring role. With respect to 1) Scaffolding , chatbots should provide direct assistance while students learn new skills and especially guide beginners in their activities. Regarding 2) Recommending , chatbots should provide supportive information, tools, or other materials for specific learning tasks or life situations. With respect to 3) Informing , chatbots should encourage students according to their goals and achievements, and support them in developing meta-cognitive skills like self-regulation. Due to this mismatch between teacher and chatbot mentoring, we see another research challenge here, which we call “Exploring the potential of chatbots for mentoring students.”

Regarding RQ4 (adaptation), only six publications were identified that discuss an adaptation of chatbots, and four out of the five adaptation approaches (A1, A3, A4, and A5) show similarities in being applied within quizzes. In the context of educational technologies, providing reasonable adaptations for learners requires a high level of experience; based on our results, research on chatbots does not seem to be at this point yet. Looking at the adaptation literature, such as ( Brusilovsky, 2001 ) or ( Benyon and Murray, 1993 ), it becomes clear that a chatbot needs to consider learners’ personal information to meet the requirements of the adaptation definition. Personal information must be retrieved and stored, at least temporarily, in some sort of learner model. For learner information like knowledge and interest, adaptations seem to be barely explored in the reviewed publications, while the model of ( Brusilovsky and Millán, 2007 ) points out further learner information that can be used to make chatbots more adaptive: personal goals, personal tasks, personal background, individual traits, and the learner’s context. We identify research in this area as a third future challenge and call it the “Exploring and leveraging adaptation capabilities of chatbots” challenge.
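To make the idea of a learner model tangible, the dataclass below sketches the learner information categories named by ( Brusilovsky and Millán, 2007 ); the field names, example values, and the toy adaptation rule are our own shorthand rather than a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class LearnerModel:
    """Learner information a chatbot could adapt to (after Brusilovsky and Millán, 2007)."""
    knowledge: dict = field(default_factory=dict)   # e.g. {"fractions": 0.7}
    interests: list = field(default_factory=list)   # e.g. ["astronomy"]
    goals: list = field(default_factory=list)       # e.g. ["pass the statistics exam"]
    tasks: list = field(default_factory=list)       # current learning tasks
    background: str = ""                            # e.g. "first-year biology student"
    traits: dict = field(default_factory=dict)      # e.g. {"prefers_examples": True}
    context: dict = field(default_factory=dict)     # e.g. {"device": "mobile"}

def choose_tone(learner: LearnerModel) -> str:
    # Trivial adaptation rule, for illustration only.
    return "example-first" if learner.traits.get("prefers_examples") else "definition-first"

learner = LearnerModel(knowledge={"fractions": 0.7}, traits={"prefers_examples": True})
print(choose_tone(learner))
```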

In terms of RQ5 (domains), we identified a detailed map of the domains applying chatbots in education and their distribution ( see Table 3 ). By systematically analyzing 74 publications, we identified 20 domains and structured them according to the identified pedagogical roles into four domain categories: Learning Chatbots , Assisting Chatbots , Mentoring Chatbots , and Other Research . These results extend the taxonomy of Application Clusters (AC) for chatbots in education, which previously comprised the work of ( Pérez et al., 2020 ), who took the chatbot activity as the defining characteristic, and ( Winkler and Soellner, 2018 ), who characterized chatbots by domain. Our structure draws relationships between these two types of Application Clusters and organizes them accordingly. It incorporates Mentoring Chatbots and Other Research in addition to the “service-oriented chatbots” (cf. Assisting Chatbots ) and “teaching-oriented chatbots” (cf. Learning Chatbots ) identified by ( Pérez et al., 2020 ). Furthermore, the strong tendency toward informing students already mentioned by ( Smutny and Schreiberova, 2020 ) can also be recognized in our results, especially for Assisting Chatbots . Compared to ( Winkler and Soellner, 2018 ), we can confirm the prominent domains of “language learning” within Learning Chatbots and “metacognitive thinking” within Mentoring Chatbots . Moreover, Table 3 reflects a more detailed picture of chatbot applications in education, which could help researchers find similar works or unexplored application areas.

Limitations

One important limitation to be mentioned here is the exclusion of alternative keywords from our search queries, as we exclusively used “chatbot” as keyword in order to avoid search results that do not fit our research questions. Although we acknowledge that chatbots share properties with pedagogical agents, dialogue systems, and bots, we carefully considered this trade-off between missing potentially relevant work and inflating our search procedure with related but not necessarily pertinent work. A second limitation may lie in the formation of categories and the coding processes applied, which, due to the novelty of the findings, could not be built upon theoretical frameworks or existing code books. Although we focused on ensuring that the codes used contribute to a strong understanding, the chosen level of abstraction might have affected the level of detail of the resulting data representation.

Conclusion

In this systematic literature review, we explored the current landscape of chatbots in education. We analyzed 74 publications, identified 20 domains of chatbots, and grouped them based on their pedagogical roles into four domain categories. The pedagogical roles are the supporting learning role ( Learning ), the assisting role ( Assisting ), and the mentoring role ( Mentoring ). By focusing on the objectives for implementing chatbots, we identified four main objectives: 1) Skill Improvement , 2) Efficiency of Education , 3) Students’ Motivation , and 4) Availability of Education . As discussed in section 5 , these objectives do not fully align with the chosen evaluation procedures. Focusing on the relations between pedagogical roles and implementation objectives, we identified three main relations: 1) chatbots to improve skills and motivate students by supporting learning and teaching activities, 2) chatbots to make education more efficient by providing relevant administrative and logistical information to learners, and 3) chatbots to support multiple effects by mentoring students. Regarding chatbots incorporating the Mentoring role, we found that they mostly address three mentoring topics, 1) Self-Regulated Learning , 2) Life Skills , and 3) Learning Skills , using three mentoring methods, 1) Scaffolding , 2) Recommending , and 3) Informing . Regarding chatbot adaptation, only six publications with adaptations were identified; furthermore, the adaptation approaches found were mostly limited to applications within quizzes and thus represent a research gap.

Based on these outcomes we consider three challenges for chatbots in education that offer future research opportunities:

Challenge 1: Aligning chatbot evaluations with implementation objectives . Most chatbot evaluations focus on narrow aspects, measuring the tool’s usability, acceptance, or technical correctness. If chatbots are to serve as learning aids, student mentors, or facilitators, their effects on the cognitive and emotional levels should also be taken into account in their evaluation. This finding strengthens our conclusion that chatbot development in education is still driven by technology rather than by a clear pedagogical focus on improving and supporting learning.

Challenge 2: Exploring the potential of chatbots for mentoring students . In order to better understand the potential of chatbots to mentor students, more empirical studies on the information needs of learners are required. These needs obviously differ between schools and higher education. However, so far there are hardly any studies investigating learners’ information needs with respect to chatbots, nor whether chatbots address these needs sufficiently.

Challenge 3: Exploring and leveraging adaptation capabilities of chatbots . There is a large body of literature on the adaptation capabilities of educational technologies. However, we have seen very few studies on the effect of adaptation in chatbots for educational purposes. As chatbots are envisioned as systems that should personally support learners, adaptable chatbot interaction is an important research aspect that should receive more attention in the near future.

By addressing these challenges, we believe that chatbots can become effective educational tools capable of supporting learners with informative feedback. Therefore, looking at our results and the challenges presented, we conclude: “No, we are not there yet!” There is still much to be done in terms of research on chatbots in education. Still, development in this area seems to have just begun to gain momentum, and we expect to see new insights in the coming years.

APPENDIX A: Concept map of chatbots in education


Data Availability Statement

Author Contributions

SW, JS†, DM†, JW†, MR, and HD.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

  • Abbasi S., Kazi H., Hussaini N. N. (2019). Effect of Chatbot Systems on Student’s Learning Outcomes . Sylwan 163 ( 10 ). [ Google Scholar ]
  • Abbasi S., Kazi H. (2014). Measuring Effectiveness of Learning Chatbot Systems on Student's Learning Outcome and Memory Retention . Asian J. Appl. Sci. Eng. 3 , 57. 10.15590/AJASE/2014/V3I7/53576 [ CrossRef ] [ Google Scholar ]
  • Almahri F. A. J., Bell D., Merhi M. (2020). “ Understanding Student Acceptance and Use of Chatbots in the United Kingdom Universities: A Structural Equation Modelling Approach ,” in 2020 6th IEEE International Conference on Information Management, ICIM 2020, London, United Kingdom, March 27–29, 2020, (IEEE), 284–288. 10.1109/ICIM49319.2020.244712 [ CrossRef ] [ Google Scholar ]
  • Bao M. (2019). Can Home Use of Speech-Enabled Artificial Intelligence Mitigate Foreign Language Anxiety - Investigation of a Concept . Awej 5 , 28–40. 10.24093/awej/call5.3 [ CrossRef ] [ Google Scholar ]
  • Benyon D., Murray D. (1993). Applying User Modeling to Human-Computer Interaction Design . Artif. Intell. Rev. 7 ( 3-4 ), 199–225. 10.1007/BF00849555 [ CrossRef ] [ Google Scholar ]
  • Bii P. K., Too J. K., Mukwa C. W. (2018). Teacher Attitude towards Use of Chatbots in Routine Teaching . Univers. J. Educ. Res. . 6 ( 7 ), 1586–1597. 10.13189/ujer.2018.060719 [ CrossRef ] [ Google Scholar ]
  • Bii P., Too J., Langat R. (2013). An Investigation of Student’s Attitude Towards the Use of Chatbot Technology in Instruction: The Case of Knowie in a Selected High School . Education Research 4 , 710–716. 10.14303/er.2013.231 [ CrossRef ] [ Google Scholar ]
  • Bos A. S., Pizzato M. C., Vettori M., Donato L. G., Soares P. P., Fagundes J. G., et al. (2020). Empirical Evidence During the Implementation of an Educational Chatbot with the Electroencephalogram Metric . Creative Education 11 , 2337–2345. 10.4236/CE.2020.1111171 [ CrossRef ] [ Google Scholar ]
  • Brusilovsky P. (2001). Adaptive Hypermedia . User Model. User-Adapted Interaction 11 ( 1 ), 87–110. 10.1023/a:1011143116306 [ CrossRef ] [ Google Scholar ]
  • Brusilovsky P., Millán E. (2007). “ User Models for Adaptive Hypermedia and Adaptive Educational Systems ,” in The Adaptive Web: Methods and Strategies of Web Personalization . Editors Brusilovsky P., Kobsa A., Nejdl W.. Berlin: Springer, 3–53. 10.1007/978-3-540-72079-9_1 [ CrossRef ] [ Google Scholar ]
  • Cabales V. (2019). “ Muse: Scaffolding metacognitive reflection in design-based research ,” in CHI EA’19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland, United Kingdom, May 4–9, 2019, (ACM), 1–6. 10.1145/3290607.3308450 [ CrossRef ] [ Google Scholar ]
  • Carayannopoulos S. (2018). Using Chatbots to Aid Transition. Int. J. Info. Learn. Tech. 35 , 118–129. 10.1108/IJILT-10-2017-0097 [ CrossRef ] [ Google Scholar ]
  • Chan C. H., Lee H. L., Lo W. K., Lui A. K.-F. (2018). Developing a Chatbot for College Student Programme Advisement . in 2018 International Symposium on Educational Technology, ISET 2018, Osaka, Japan, July 31–August 2, 2018. Editors Wang F. L., Iwasaki C., Konno T., Au O., Li C., (IEEE), 52–56. 10.1109/ISET.2018.00021 [ CrossRef ] [ Google Scholar ]
  • Chang M.-Y., Hwang J.-P. (2019). “ Developing Chatbot with Deep Learning Techniques for Negotiation Course ,” in 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019, Toyama, Japan, July 7–11, 2019, (IEEE), 1047–1048. 10.1109/IIAI-AAI.2019.00220 [ CrossRef ] [ Google Scholar ]
  • Chen C.-A., Yang Y.-T., Wu S.-M., Chen H.-C., Chiu K.-C., Wu J.-W., et al. (2018). “ A Study of Implementing AI Chatbot in Campus Consulting Service ”, in TANET 2018-Taiwan Internet Seminar , 1714–1719. 10.6861/TANET.201810.0317 [ CrossRef ] [ Google Scholar ]
  • Chen H.-L., Widarso G. V., Sutrisno H. (2020). A ChatBot for Learning Chinese: Learning Achievement and Technology Acceptance . J. Educ. Comput. Res. 58 ( 6 ), 1161–1189. 10.1177/0735633120929622 [ CrossRef ] [ Google Scholar ]
  • Daud S. H. M., Teo N. H. I., Zain N. H. M. (2020). E-java Chatbot for Learning Programming Language: A post-pandemic Alternative Virtual Tutor . Int. J. Emerging Trends Eng. Res. 8 (7) . 3290–3298. 10.30534/ijeter/2020/67872020 [ CrossRef ] [ Google Scholar ]
  • Davies J. N., Verovko M., Verovko O., Solomakha I. (2020). “ Personalization of E-Learning Process Using Ai-Powered Chatbot Integration ,” in Selected Papers of 15th International Scientific-practical Conference, MODS, 2020: Advances in Intelligent Systems and Computing, Chernihiv, Ukraine, June 29–July 01, 2020. Editors Shkarlet S., Morozov A., Palagin A., (Springer; ) Vol. 1265 , 209–216. 10.1007/978-3-030-58124-4_20 [ CrossRef ] [ Google Scholar ]
  • Diachenko A. V., Morgunov B. P., Melnyk T. P., Kravchenko O. I., Zubchenko L. V. (2019). The Use of Innovative Pedagogical Technologies for Automation of the Specialists' Professional Training . Int. J. Hydrogen. Energy. 8 , 288–295. 10.5430/ijhe.v8n6p288 [ CrossRef ] [ Google Scholar ]
  • Dibitonto M., Leszczynska K., Tazzi F., Medaglia C. M. (2018). “ Chatbot in a Campus Environment: Design of Lisa, a Virtual Assistant to Help Students in Their university Life ,” in 20th International Conference, HCI International 2018, Las Vegas, NV, USA, July 15–20, 2018, Lecture Notes in Computer Science. Editors Kurosu M., (Springer), 103–116. 10.1007/978-3-319-91250-9 [ CrossRef ] [ Google Scholar ]
  • Durall E., Kapros E. (2020). “ Co-design for a Competency Self-Assessment Chatbot and Survey in Science Education ,” in 7th International Conference, LCT 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Copenhagen, Denmark, July 19–24, 2020, Lecture Notes in Computer Science. Editors Zaphiris P., Ioannou A., Berlin: Springer; Vol. 12206 , 13–23. 10.1007/978-3-030-50506-6_2 [ CrossRef ] [ Google Scholar ]
  • Duval E., Verbert K. (2012). Learning Analytics . Eleed 8 ( 1 ). [ Google Scholar ]
  • Engel J. D., Engel V. J. L., Mailoa E. (2020). Interaction Monitoring Model of Logo Counseling Website for College Students' Healthy Self-Esteem , I. J. Eval. Res. Educ. 9 , 607–613. 10.11591/ijere.v9i3.20525 [ CrossRef ] [ Google Scholar ]
  • Febriani G. A., Agustia R. D. (2019). Development of Line Chatbot as a Learning Media for Mathematics National Exam Preparation . Elibrary.Unikom.Ac.Id . https://elibrary.unikom.ac.id/1130/14/UNIKOM_GISTY%20AMELIA%20FEBRIANI_JURNAL%20DALAM%20BAHASA%20INGGRIS.pdf .
  • Ferguson R., Sharples M. (2014). “ Innovative Pedagogy at Massive Scale: Teaching and Learning in MOOCs ,” in 9th European Conference on Technology Enhanced Learning, EC-TEL 2014, Graz, Austria, September 16–19, 2014, Lecture Notes in Computer Science. Editors Rensing C., de Freitas S., Ley T., Muñoz-Merino P. J., (Berlin: Springer) Vol. 8719 , 98–111. 10.1007/978-3-319-11200-8_8 [ CrossRef ] [ Google Scholar ]
  • Fryer L. K., Ainley M., Thompson A., Gibson A., Sherlock Z. (2017). Stimulating and Sustaining Interest in a Language Course: An Experimental Comparison of Chatbot and Human Task Partners . Comput. Hum. Behav. 75 , 461–468. 10.1016/j.chb.2017.05.045 [ CrossRef ] [ Google Scholar ]
  • Fryer L. K., Nakao K., Thompson A. (2019). Chatbot Learning Partners: Connecting Learning Experiences, Interest and Competence . Comput. Hum. Behav. 93 , 279–289. 10.1016/j.chb.2018.12.023 [ CrossRef ] [ Google Scholar ]
  • Fryer L. K., Thompson A., Nakao K., Howarth M., Gallacher A. (2020). Supporting Self-Efficacy Beliefs and Interest as Educational Inputs and Outcomes: Framing AI and Human Partnered Task Experiences . Learn. Individual Differences , 80. 10.1016/j.lindif.2020.101850 [ CrossRef ] [ Google Scholar ]
  • Gabrielli S., Rizzi S., Carbone S., Donisi V. (2020). A Chatbot-Based Coaching Intervention for Adolescents to Promote Life Skills: Pilot Study . JMIR Hum. Factors 7 ( 1 ). 10.2196/16762 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Galko L., Porubän J., Senko J. (2018). “ Improving the User Experience of Electronic University Enrollment ,” in 16th IEEE International Conference on Emerging eLearning Technologies and Applications, ICETA 2018, Stary Smokovec, Slovakia, Nov 15–16, 2018. Editors Jakab F., (Piscataway, NJ: IEEE; ), 179–184. 10.1109/ICETA.2018.8572054 [ CrossRef ] [ Google Scholar ]
  • Goda Y., Yamada M., Matsukawa H., Hata K., Yasunami S. (2014). Conversation with a Chatbot before an Online EFL Group Discussion and the Effects on Critical Thinking . J. Inf. Syst. Edu. 13 , 1–7. 10.12937/EJSISE.13.1 [ CrossRef ] [ Google Scholar ]
  • Graesser A. C., VanLehn K., Rose C. P., Jordan P. W., Harter D. (2001). Intelligent Tutoring Systems with Conversational Dialogue . AI Mag. 22 ( 4 ), 39–51. 10.1609/aimag.v22i4.1591 [ CrossRef ] [ Google Scholar ]
  • Greller W., Drachsler H. (2012). Translating Learning into Numbers: A Generic Framework for Learning Analytics . J. Educ. Tech. Soc. 15 ( 3 ), 42–57. 10.2307/jeductechsoci.15.3.42 [ CrossRef ] [ Google Scholar ]
  • Haristiani N., Rifa’i M. M. Combining Chatbot and Social Media: Enhancing Personal Learning Environment (PLE) in Language Learning . Indonesian J Sci Tech. 5 ( 3 ), 487–506. 10.17509/ijost.v5i3.28687 [ CrossRef ] [ Google Scholar ]
  • Hattie J., Timperley H. (2007). The Power of Feedback . Rev. Educ. Res. 77 ( 1 ), 81–112. 10.3102/003465430298487 [ CrossRef ] [ Google Scholar ]
  • Hattie J. (2009). Visible Learning: A Synthesis of over 800 Meta-Analyses Relating to Achievement . Abingdon, UK: Routledge. [ Google Scholar ]
  • Heller B., Proctor M., Mah D., Jewell L., Cheung B. (2005). “ Freudbot: An Investigation of Chatbot Technology in Distance Education ,” in Proceedings of ED-MEDIA 2005–World Conference on Educational Multimedia, Hypermedia and Telecommunications, Montréal, Canada, June 27–July 2, 2005. Editors Kommers P., Richards G., (AACE; ), 3913–3918. [ Google Scholar ]
  • Heo J., Lee J. (2019). “ CiSA: An Inclusive Chatbot Service for International Students and Academics ,” in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science, Orlando, FL, USA, July 26–31, 2019. Editors Stephanidis C., (Springer; ) 11786 , 153–167. 10.1007/978-3-030-30033-3 [ CrossRef ] [ Google Scholar ]
  • Hobert S. (2019a). “ How Are You, Chatbot? Evaluating Chatbots in Educational Settings - Results of a Literature Review ,” in 17. Fachtagung Bildungstechnologien, DELFI 2019 - 17th Conference on Education Technologies, DELFI 2019, Berlin, Germany, Sept 16–19, 2019. Editors Pinkwart N., Konert J., 259–270. 10.18420/delfi2019_289 [ CrossRef ] [ Google Scholar ]
  • Hobert S., Meyer von Wolff R. (2019). “ Say Hello to Your New Automated Tutor - A Structured Literature Review on Pedagogical Conversational Agents ,” in 14th International Conference on Wirtschaftsinformatik, Siegen, Germany, Feb 23–27, 2019. Editors Pipek V., Ludwig T., (AIS; ). [ Google Scholar ]
  • Hobert S. (2019b). Say Hello to ‘Coding Tutor’! Design and Evaluation of a Chatbot-Based Learning System Supporting Students to Learn to Program in International Conference on Information Systems (ICIS) 2019 Conference, Munich, Germany, Dec 15–18, 2019, AIS; 2661 , 1–17. [ Google Scholar ]
  • Hobert S. (2020). Small Talk Conversations and the Long-Term Use of Chatbots in Educational Settings ‐ Experiences from a Field Study in 3rd International Workshop on Chatbot Research and Design, CONVERSATIONS 2019, Amsterdam, Netherlands, November 19–20: Lecture Notes in Computer Science. Editors Folstad A., Araujo T., Papadopoulos S., Law E., Granmo O., Luger E., Brandtzaeg P., (Springer; ) 11970 , 260–272. 10.1007/978-3-030-39540-7_18 [ CrossRef ] [ Google Scholar ]
  • Hsieh S.-W. (2011). Effects of Cognitive Styles on an MSN Virtual Learning Companion System as an Adjunct to Classroom Instructions . Edu. Tech. Society 2 , 161–174. [ Google Scholar ]
  • Huang J.-X., Kwon O.-W., Lee K.-S., Kim Y.-K. (2018). Improve the Chatbot Performance for the DB-CALL System Using a Hybrid Method and a Domain Corpus in Future-proof CALL: language learning as exploration and encounters–short papers from EUROCALL 2018, Jyväskylä, Finland, Aug 22–25, 2018. Editors Taalas P., Jalkanen J., Bradley L., Thouësny S., (Research-publishing.net; ). 10.14705/rpnet.2018.26.820 [ CrossRef ] [ Google Scholar ]
  • Huang W., Hew K. F., Gonda D. E. (2019). Designing and Evaluating Three Chatbot-Enhanced Activities for a Flipped Graduate Course . Int. J. Mech. Engineer. Robotics. Research. 813–818. 10.18178/ijmerr.8.5.813-818 [ CrossRef ] [ Google Scholar ]
  • Ismail M., Ade-Ibijola A. (2019). “ Lecturer's Apprentice: A Chatbot for Assisting Novice Programmers ,”in Proceedings - 2019 International Multidisciplinary Information Technology and Engineering Conference (IMITEC), Vanderbijlpark, South Africa, (IEEE), 1–8. 10.1109/IMITEC45504.2019.9015857 [ CrossRef ] [ Google Scholar ]
  • Jia J. (2008). “ Motivate the Learners to Practice English through Playing with Chatbot CSIEC ,” in 3rd International Conference on Technologies for E-Learning and Digital Entertainment, Edutainment 2008, Nanjing, China, June 25–27, 2008, Lecture Notes in Computer Science, (Springer) 5093 , 180–191. 10.1007/978-3-540-69736-7_20 [ CrossRef ] [ Google Scholar ]
  • Jia J. (2004). “ The Study of the Application of a Keywords-Based Chatbot System on the Teaching of Foreign Languages ,” in Proceedings of SITE 2004--Society for Information Technology and Teacher Education International Conference, Atlanta, Georgia, USA. Editors Ferdig R., Crawford C., Carlsen R., Davis N., Price J., Weber R., Willis D., (AACE), 1201–1207. [ Google Scholar ]
  • Jivet I., Scheffel M., Drachsler H., Specht M. (2017). “ Awareness is not enough: Pitfalls of learning analytics dashboards in the educational practice ,” in 12th European Conference on Technology Enhanced Learning, EC-TEL 2017, Tallinn, Estonia, September 12–15, 2017, Lecture Notes in ComputerScience. Editors Lavoué E., Drachsler H., Verbert K., Broisin J., Pérez-Sanagustín M., (Springer), 82–96. 10.1007/978-3-319-66610-5_7 [ CrossRef ] [ Google Scholar ]
  • Jung H., Lee J., Park C. (2020). Deriving Design Principles for Educational Chatbots from Empirical Studies on Human-Chatbot Interaction . J. Digit. Contents Society , 21 , 487–493. 10.9728/dcs.2020.21.3.487 [ CrossRef ] [ Google Scholar ]
  • Kerly A., Bull S. (2006). “ The Potential for Chatbots in Negotiated Learner Modelling: A Wizard-Of-Oz Study ,” in 8th International Conference on Intelligent Tutoring Systems, ITS 2006, Jhongli, Taiwan, June 26–30, 2006, Lecture Notes in Computer Science. Editors Ikeda M., Ashley K. D., Chan T. W., (Springer; ) 4053 , 443–452. 10.1007/11774303 [ CrossRef ] [ Google Scholar ]
  • Kerly A., Ellis R., Bull S. (2008). CALMsystem: A Conversational Agent for Learner Modelling . Knowledge-Based Syst. 21 , 238–246. 10.1016/j.knosys.2007.11.015 [ CrossRef ] [ Google Scholar ]
  • Kerly A., Hall P., Bull S. (2007). Bringing Chatbots into Education: Towards Natural Language Negotiation of Open Learner Models . Knowledge-Based Syst. , 20 , 177–185. 10.1016/j.knosys.2006.11.014 [ CrossRef ] [ Google Scholar ]
  • Kumar M. N., Chandar P. C. L., Prasad A. V., Sumangali K. (2016). “ Android Based Educational Chatbot for Visually Impaired People ,” in 2016 IEEE International Conference on Computational Intelligence and Computing Research, Chennai, India, December 15–17, 2016, 1–4. 10.1109/ICCIC.2016.7919664 [ CrossRef ] [ Google Scholar ]
  • Lee K., Jo J., Kim J., Kang Y. (2019). Can Chatbots Help Reduce the Workload of Administrative Officers? - Implementing and Deploying FAQ Chatbot Service in a University in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science, Orlando, FL, USA, July 26–31, 2019. Editors Stephanidis C., (Springer; ) 1032 , 348–354. 10.1007/978-3-030-23522-2 [ CrossRef ] [ Google Scholar ]
  • Lester J. C., Converse S. A., Kahler S. E., Barlow S. T., Stone B. A., Bhogal R. S. (1997). “ The Persona Effect: Affective Impact of Animated Pedagogical Agents ,” in Proceedings of the ACM SIGCHI Conference on Human factors in computing systems, Atlanta, Georgia, USA, March 22–27, 1997, (ACM), 359–366. [ Google Scholar ]
  • Liberati A., Altman D. G., Tetzlaff J., Mulrow C., Gøtzsche P. C., Ioannidis J. P. A., et al. (2009). The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies that Evaluate Health Care Interventions: Explanation and Elaboration . J. Clin. Epidemiol. 62 ( 10 ), e1–e34. 10.1016/j.jclinepi.2009.06.006 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lin M. P.-C., Chang D. (2020). Enhancing Post-secondary Writers’ Writing Skills with a Chatbot . J. Educ. Tech. Soc. 23 , 78–92. 10.2307/26915408 [ CrossRef ] [ Google Scholar ]
  • Lin Y.-H., Tsai T. (2019). “ A Conversational Assistant on Mobile Devices for Primitive Learners of Computer Programming ,” in TALE 2019 - 2019 IEEE International Conference on Engineering, Technology and Education, Yogyakarta, Indonesia, December 10–13, 2019, (IEEE), 1–4. 10.1109/TALE48000.2019.9226015 [ CrossRef ] [ Google Scholar ]
  • Linden A., Fenn J. (2003). Understanding Gartner’s Hype Cycles . Strategic Analysis Report No. R-20-1971 8 . Stamford, CT: Gartner, Inc. [ Google Scholar ]
  • Liu Q., Huang J., Wu L., Zhu K., Ba S. (2020). CBET: Design and Evaluation of a Domain-specific Chatbot for mobile Learning . Univ. Access Inf. Soc. , 19 , 655–673. 10.1007/s10209-019-00666-x [ CrossRef ] [ Google Scholar ]
  • Mamani J. R. C., Álamo Y. J. R., Aguirre J. A. A., Toledo E. E. G. (2019). “ Cognitive Services to Improve User Experience in Searching for Academic Information Based on Chatbot ,” in Proceedings of the 2019 IEEE 26th International Conference on Electronics, Electrical Engineering and Computing (INTERCON), Lima, Peru, August 12–14, 2019, (IEEE), 1–4. 10.1109/INTERCON.2019.8853572 [ CrossRef ] [ Google Scholar ]
  • Martín-Martín A., Orduna-Malea E., Thelwall M., Delgado López-Cózar E. (2018). Google Scholar, Web of Science, and Scopus: A Systematic Comparison of Citations in 252 Subject Categories . J. Informetrics 12 ( 4 ), 1160–1177. 10.1016/j.joi.2018.09.002 [ CrossRef ] [ Google Scholar ]
  • Matsuura S., Ishimura R. (2017). Chatbot and Dialogue Demonstration with a Humanoid Robot in the Lecture Class , in 11th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2017, held as part of the 19th International Conference on Human-Computer Interaction, HCI 2017, Vancouver, Canada, July 9–14, 2017, Lecture Notes in Computer Science. Editors Antona M., Stephanidis C., (Springer) Vol. 10279 , 233–246. 10.1007/978-3-319-58700-4 [ CrossRef ] [ Google Scholar ]
  • Matsuura S., Omokawa R. (2020). Being Aware of One’s Self in the Auto-Generated Chat with a Communication Robot in UAHCI 2020 , 477–488. 10.1007/978-3-030-49282-3 [ CrossRef ] [ Google Scholar ]
  • McLoughlin C., Oliver R. (1998). Maximising the Language and Learning Link in Computer Learning Environments . Br. J. Educ. Tech. 29 ( 2 ), 125–136. 10.1111/1467-8535.00054 [ CrossRef ] [ Google Scholar ]
  • Mendoza S., Hernández-León M., Sánchez-Adame L. M., Rodríguez J., Decouchant D., Meneses-Viveros A. (2020). “ Supporting Student-Teacher Interaction through a Chatbot ,” in 7th International Conference, LCT 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Copenhagen, Denmark, July 19–24, 2020, Lecture Notes in Computer Science. Editors Zaphiris P., Ioannou A., (Springer; ) 12206 , 93–107. 10.1007/978-3-030-50506-6 [ CrossRef ] [ Google Scholar ]
  • Meyer V., Wolff R., Nörtemann J., Hobert S., Schumann M. (2020). “ Chatbots for the Information Acquisition at Universities ‐ A Student’s View on the Application Area ,“in 3rd International Workshop on Chatbot Research and Design, CONVERSATIONS 2019, Amsterdam, Netherlands, November 19–20, Lecture Notes in Computer Science. Editors Folstad A., Araujo T., Papadopoulos S., Law E., Granmo O., Luger E., Brandtzaeg P., (Springer) 11970 , 231–244. 10.1007/978-3-030-39540-7 [ CrossRef ] [ Google Scholar ]
  • Na-Young K. (2018c). A Study on Chatbots for Developing Korean College Students’ English Listening and Reading Skills . J. Digital Convergence 16 . 19–26. 10.14400/JDC.2018.16.8.019 [ CrossRef ] [ Google Scholar ]
  • Na-Young K. (2019). A Study on the Use of Artificial Intelligence Chatbots for Improving English Grammar Skills . J. Digital Convergence 17 , 37–46. 10.14400/JDC.2019.17.8.037 [ CrossRef ] [ Google Scholar ]
  • Na-Young K. (2018a). Chatbots and Korean EFL Students’ English Vocabulary Learning . J. Digital Convergence 16 . 1–7. 10.14400/JDC.2018.16.2.001 [ CrossRef ] [ Google Scholar ]
  • Na-Young K. (2018b). Different Chat Modes of a Chatbot and EFL Students’ Writing Skills Development . 1225–4975. 10.16933/sfle.2017.32.1.263 [ CrossRef ] [ Google Scholar ]
  • Na-Young K. (2017). Effects of Different Types of Chatbots on EFL Learners’ Speaking Competence and Learner Perception . Cross-Cultural Studies 48 , 223–252. 10.21049/ccs.2017.48.223 [ CrossRef ] [ Google Scholar ]
  • Nagata R., Hashiguchi T., Sadoun D. (2020). Is the Simplest Chatbot Effective in English Writing Learning Assistance? , in 16th International Conference of the Pacific Association for Computational Linguistics, PACLING, Hanoi, Vietnam, October 11–13, 2019, Communications in Computer and Information Science. Editors Nguyen L.-M., Tojo S., Phan X.-H., Hasida K., (Springer; ) Vol. 1215 , 245–246. 10.1007/978-981-15-6168-9 [ CrossRef ] [ Google Scholar ]
  • Nelson T. O., Narens L. (1994). Why Investigate Metacognition . in Metakognition: Knowing About Knowing . Editors Metcalfe J., Shimamura P., (MIT Press) 13 , 1–25. [ Google Scholar ]
  • Nghi T. T., Phuc T. H., Thang N. T. (2019). Applying Ai Chatbot for Teaching a Foreign Language: An Empirical Research . Int. J. Sci. Res. 8 . [ Google Scholar ]
  • Ondas S., Pleva M., Hládek D. (2019). How Chatbots Can Be Involved in the Education Process . in ICETA 2019 - 17th IEEE International Conference on Emerging eLearning Technologies and Applications, Proceedings, Stary Smokovec, Slovakia, November 21–22, 2019. Editors Jakab F., (IEEE), 575–580. 10.1109/ICETA48886.2019.9040095 [ CrossRef ] [ Google Scholar ]
  • Pereira J., Fernández-Raga M., Osuna-Acedo S., Roura-Redondo M., Almazán-López O., Buldón-Olalla A. (2019). Promoting Learners' Voice Productions Using Chatbots as a Tool for Improving the Learning Process in a MOOC . Tech. Know Learn. 24 , 545–565. 10.1007/s10758-019-09414-9 [ CrossRef ] [ Google Scholar ]
  • Pérez J. Q., Daradoumis T., Puig J. M. M. (2020). Rediscovering the Use of Chatbots in Education: A Systematic Literature Review . Comput. Appl. Eng. Educ. 28 , 1549–1565. 10.1002/cae.22326 [ CrossRef ] [ Google Scholar ]
  • Pérez-Marín D. (2021). A Review of the Practical Applications of Pedagogic Conversational Agents to Be Used in School and University Classrooms . Digital 1 ( 1 ), 18–33. 10.3390/digital1010002 [ CrossRef ] [ Google Scholar ]
  • Pham X. L., Pham T., Nguyen Q. M., Nguyen T. H., Cao T. T. H. (2018). “ Chatbot as an Intelligent Personal Assistant for mobile Language Learning ,” in ACM International Conference Proceeding Series 10.1145/3291078.3291115 [ CrossRef ] [ Google Scholar ]
  • Quincey E. de., Briggs C., Kyriacou T., Waller R. (2019). “ Student Centred Design of a Learning Analytics System ,” in Proceedings of the 9th International Conference on Learning Analytics & Knowledge, Tempe Arizona, USA, March 4–8, 2019, (ACM), 353–362. 10.1145/3303772.3303793 [ CrossRef ] [ Google Scholar ]
  • Ram A., Prasad R., Khatri C., Venkatesh A., Gabriel R., Liu Q, et al. (2018). Conversational Ai: The Science behind the Alexa Prize , in 1st Proceedings of Alexa Prize (Alexa Prize 2017) . ArXiv [Preprint]. Available at: https://arxiv.org/abs/1801.03604 .
  • Rebaque-Rivas P., Gil-Rodríguez E. (2019). Adopting an Omnichannel Approach to Improve User Experience in Online Enrolment at an E-Learning University , in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science, Orlando, FL, USA, July 26–31, 2019. Editors Stephanidis C., (Springer; ), 115–122. 10.1007/978-3-030-23525-3 [ CrossRef ] [ Google Scholar ]
  • Robinson C. (2019). Impressions of Viability: How Current Enrollment Management Personnel And Former Students Perceive The Implementation of A Chatbot Focused On Student Financial Communication. Higher Education Doctoral Projects.2 . https://aquila.usm.edu/highereddoctoralprojects/2 . [ Google Scholar ]
  • Ruan S., Jiang L., Xu J., Tham B. J.-K., Qiu Z., Zhu Y., Murnane E. L., Brunskill E., Landay J. A. (2019). “ QuizBot: A Dialogue-based Adaptive Learning System for Factual Knowledge ,” in 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019, Glasgow, Scotland, United Kingdom, May 4–9, 2019, (ACM), 1–13. 10.1145/3290605.3300587 [ CrossRef ] [ Google Scholar ]
  • Sandoval Z. V. (2018). Design and Implementation of a Chatbot in Online Higher Education Settings . Issues Inf. Syst. 19 , 44–52. 10.48009/4.iis.2018.44-52 [ CrossRef ] [ Google Scholar ]
  • Sandu N., Gide E. (2019). “ Adoption of AI-Chatbots to Enhance Student Learning Experience in Higher Education in india ,” in 18th International Conference on Information Technology Based Higher Education and Training, Magdeburg, Germany, September 26–27, 2019, (IEEE), 1–5. 10.1109/ITHET46829.2019.8937382 [ CrossRef ] [ Google Scholar ]
  • Saygin A. P., Cicekli I., Akman V. (2000). Turing Test: 50 Years Later . Minds and Machines 10 ( 4 ), 463–518. 10.1023/A:1011288000451 [ CrossRef ] [ Google Scholar ]
  • Sinclair A., McCurdy K., Lucas C. G., Lopez A., Gaševic D. (2019). “ Tutorbot Corpus: Evidence of Human-Agent Verbal Alignment in Second Language Learner Dialogues ,” in EDM 2019 - Proceedings of the 12th International Conference on Educational Data Mining. [ Google Scholar ]
  • Smutny P., Schreiberova P. (2020). Chatbots for Learning: A Review of Educational Chatbots for the Facebook Messenger . Comput. Edu. 151 , 103862. 10.1016/j.compedu.2020.103862 [ CrossRef ] [ Google Scholar ]
  • Song D., Rice M., Oh E. Y. (2019). Participation in Online Courses and Interaction with a Virtual Agent . Int. Rev. Res. Open. Dis. 20 , 44–62. 10.19173/irrodl.v20i1.3998 [ CrossRef ] [ Google Scholar ]
  • Stapić Z., Horvat A., Vukovac D. P. (2020). Designing a Faculty Chatbot through User-Centered Design Approach , in 22nd International Conference on Human-Computer Interaction,HCII 2020, Copenhagen, Denmark, July 19–24, 2020, Lecture Notes in Computer Science. Editors Stephanidis C., Harris D., Li W. C., Schmorrow D. D., Fidopiastis C. M., Zaphiris P., (Springer; ), 472–484. 10.1007/978-3-030-60128-7 [ CrossRef ] [ Google Scholar ]
  • Subramaniam N. K. (2019). Teaching and Learning via Chatbots with Immersive and Machine Learning Capabilities . In International Conference on Education (ICE 2019) Proceedings, Kuala Lumpur, Malaysia, April 10–11, 2019. Editors Ali S. A. H., Subramaniam T. T., Yusof S. M., 145–156. [ Google Scholar ]
  • Sugondo A. F., Bahana R. (2019). “ Chatbot as an Alternative Means to Access Online Information Systems ,” in 3rd International Conference on Eco Engineering Development, ICEED 2019, Surakarta, Indonesia, November 13–14, 2019, IOP Conference Series: Earth and Environmental Science, (IOP Publishing) 426 . 10.1088/1755-1315/426/1/012168 [ CrossRef ] [ Google Scholar ]
  • Suwannatee S., Suwanyangyuen A. (2019). “ Reading Chatbot” Mahidol University Library and Knowledge Center Smart Assistant ,” in Proceedings for the 2019 International Conference on Library and Information Science (ICLIS), Taipei, Taiwan, July 11–13, 2019. [ Google Scholar ]
  • Vaidyam A. N., Wisniewski H., Halamka J. D., Kashavan M. S., Torous J. B. (2019). Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape . Can. J. Psychiatry 64 ( 7 ), 456–464. 10.1177/0706743719828977 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Vijayakumar B., Höhn S., Schommer C. (2019). “ Quizbot: Exploring Formative Feedback with Conversational Interfaces ,” in 21st International Conference on Technology Enhanced Assessment, TEA 2018, Amsterdam, Netherlands, Dec 10-11, 2018. Editors Draaijer S., Joosten-ten B. D., Ras E., (Springer; ), 102–120. 10.1007/978-3-030-25264-9 [ CrossRef ] [ Google Scholar ]
  • Virtanen M. A., Haavisto E., Liikanen E., Kääriäinen M. (2018). Ubiquitous Learning Environments in Higher Education: A Scoping Literature Review . Educ. Inf. Technol. 23 ( 2 ), 985–998. 10.1007/s10639-017-9646-6 [ CrossRef ] [ Google Scholar ]
  • Wildman T. M., Magliaro S. G., Niles R. A., Niles J. A. (1992). Teacher Mentoring: An Analysis of Roles, Activities, and Conditions . J. Teach. Edu. 43 ( 3 ), 205–213. 10.1177/0022487192043003007 [ CrossRef ] [ Google Scholar ]
  • Wiley D., Edwards E. K. (2002). Online Self-Organizing Social Systems: The Decentralized Future of Online Learning . Q. Rev. Distance Edu. 3 ( 1 ), 33–46. [ Google Scholar ]
  • Winkler R., Soellner M. (2018). Unleashing the Potential of Chatbots in Education: A State-Of-The-Art Analysis . in Academy of Management Annual Meeting Proceedings 2018 2018 ( 1 ), 15903. 10.5465/AMBPP.2018.15903abstract [ CrossRef ] [ Google Scholar ]
  • Winne P. H., Hadwin A. F. (2008). “ The Weave of Motivation and Self-Regulated Learning ,” in Motivation and Self-Regulated Learning: Theory, Research, and Applications . Editors Schunk D. H., Zimmerman B. J., (Mahwah, NJ: Lawrence Erlbaum Associates Publishers; ), 297–314. [ Google Scholar ]
  • Wisniewski B., Zierer K., Hattie J. (2019). The Power of Feedback Revisited: A Meta-Analysis of Educational Feedback Research . Front. Psychol. 10 , 1664–1078. 10.3389/fpsyg.2019.03087 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wolfbauer I., Pammer-Schindler V., Rose C. P. (2020). “ Rebo Junior: Analysis of Dialogue Structure Quality for a Reflection Guidance Chatbot ,” in Proceedings of the Impact Papers at EC-TEL 2020, co-located with the 15th European Conference on Technology-Enhanced Learning “Addressing global challenges and quality education” (EC-TEL 2020), Virtual, Sept 14–18, 2020. Editors Broos T., Farrell T., 1–14. [ Google Scholar ]
  • Xiao Z., Zhou M. X., Fu W.-T. (2019). “ Who should be my teammates: Using a conversational agent to understand individuals and help teaming ,” in IUI’19: Proceedings of the 24th International Conference on Intelligent User Interfaces, Marina del Ray, California, USA, March 17–20, 2019, (ACM), 437–447. 10.1145/3301275.3302264 [ CrossRef ] [ Google Scholar ]
  • Xu A., Liu Z., Guo Y., Sinha V., Akkiraju R. (2017). “ A New Chatbot for Customer Service on Social media ,” in Proceedings of the 2017 CHI conference on human factors in computing systems, Denver, Colorado, USA, May 6–11, 2017, ACM, 3506–3510. 10.1145/3025453.3025496 [ CrossRef ] [ Google Scholar ]
  • Yin J., Goh T.-T., Yang B., Xiaobin Y. (2020). Conversation Technology with Micro-learning: The Impact of Chatbot-Based Learning on Students' Learning Motivation and Performance . J. Educ. Comput. Res. 59 , 154–177. 10.1177/0735633120952067 [ CrossRef ] [ Google Scholar ]


A Literature Survey of Recent Advances in Chatbots

Abstract: Chatbots are intelligent conversational computer systems designed to mimic human conversation to enable automated online guidance and support. The increased benefits of chatbots led to their wide adoption by many industries in order to provide virtual assistance to customers. Chatbots utilise methods and algorithms from two Artificial Intelligence domains: Natural Language Processing and Machine Learning. However, there are many challenges and limitations in their application. In this survey we review recent advances in chatbots, where Artificial Intelligence and Natural Language Processing are used. We highlight the main challenges and limitations of current work and make recommendations for future research investigation.


Open access | Published: 11 December 2021

A systematic review of artificial intelligence chatbots for promoting physical activity, healthy diet, and weight loss

Yoo Jung Oh, Jingwen Zhang, Min-Lin Fang and Yoshimi Fukuoka

International Journal of Behavioral Nutrition and Physical Activity, volume 18, Article number: 160 (2021)


Abstract

Background

This systematic review aimed to evaluate AI chatbot characteristics, functions, and core conversational capacities, and to investigate whether AI chatbot interventions were effective in changing physical activity, healthy eating, weight management behaviors, and other related health outcomes.

Methods

In collaboration with a medical librarian, six electronic bibliographic databases (PubMed, EMBASE, ACM Digital Library, Web of Science, PsycINFO, and IEEE) were searched to identify relevant studies. Only randomized controlled trials or quasi-experimental studies were included. Studies were screened by two independent reviewers, and any discrepancy was resolved by a third reviewer. The National Institutes of Health quality assessment tools were used to assess risk of bias in individual studies. We applied the AI Chatbot Behavior Change Model to characterize components of chatbot interventions, including chatbot characteristics, persuasive and relational capacity, and evaluation of outcomes.

Results

The database search retrieved 1692 citations, and 9 studies met the inclusion criteria. Of the 9 studies, 4 were randomized controlled trials and 5 were quasi-experimental studies. Five out of the seven studies suggest chatbot interventions are promising strategies in increasing physical activity. In contrast, the number of studies focusing on changing diet and weight status was limited. Outcome assessments, however, were reported inconsistently across the studies. Eighty-nine percent of the studies specified a name for the chatbot, and thirty-three percent specified a gender (i.e., woman). Over half (56%) of the studies used a constrained chatbot (i.e., rule-based), while the remaining studies used unconstrained chatbots that resemble human-to-human communication.

Conclusions

Chatbots may improve physical activity, but we were not able to make definitive conclusions regarding the efficacy of chatbot interventions on physical activity, diet, and weight management/loss. Application of AI chatbots is an emerging field of research in lifestyle modification programs and is expected to grow exponentially. Thus, standardization of the design and reporting of chatbot interventions is warranted in the near future.

Systematic review registration

International Prospective Register of Systematic Reviews (PROSPERO): CRD42020216761 .

Background

Artificial Intelligence (AI) chatbots, also called conversational agents, employ dialogue systems to enable natural language conversations with users by means of speech, text, or both [ 1 ]. Powered by natural language processing and cloud computing infrastructures, AI chatbots can engage in a broad range of conversations, from constrained (i.e., rule-based) to unconstrained (i.e., resembling human-to-human communication) [ 1 ]. According to a Pew Research Center survey, 46% of American adults interact with voice-based chatbots (e.g., Apple’s Siri and Amazon’s Alexa) on smartphones and other devices [ 2 ]. The use of AI chatbots in business and finance is rapidly increasing; however, their use in lifestyle modification and health promotion programs remains limited.
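To illustrate the difference between the two ends of this spectrum, the toy sketch below contrasts a constrained, rule-based reply with a placeholder for an unconstrained, model-generated reply; the keywords and messages are invented for the example and are not taken from the reviewed interventions.

```python
# Toy contrast between a constrained (rule-based) and an unconstrained chatbot turn.
RULES = {
    "walk": "Great! A brisk 30-minute walk counts toward your weekly activity goal.",
    "snack": "Try swapping the snack for a piece of fruit or a handful of nuts.",
}

def constrained_reply(message: str) -> str:
    for keyword, reply in RULES.items():
        if keyword in message.lower():
            return reply
    return "Please choose one of the topics: walk, snack."

def unconstrained_reply(message: str) -> str:
    # Placeholder: a real unconstrained chatbot would call a generative
    # language model here instead of returning a canned string.
    return f"(model-generated response to: {message!r})"

print(constrained_reply("I went for a walk today"))
print(unconstrained_reply("I went for a walk today"))
```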

Physical inactivity, poor diet, and obesity are global health issues [ 3 ]. They are well-known modifiable risk factors for cardiovascular diseases, type 2 diabetes, certain types of cancers, cognitive decline, and premature death [ 3 , 4 , 5 , 6 ]. However, despite years of attempts to raise awareness about the importance of physical activity (PA) and healthy eating, individuals often do not get enough PA, nor do they maintain healthy eating habits [ 7 , 8 ], resulting in an increasing prevalence of obesity [ 9 , 10 ]. With emerging digital technologies, there has been an increasing number of programs aimed at promoting PA, healthy eating, and/or weight loss that utilize the internet, social media, and mobile devices in diverse populations [ 11 , 12 , 13 , 14 ]. Several systematic reviews and meta-analyses [ 15 , 16 , 17 , 18 , 19 ] have shown that these digital technology-based programs resulted in increased PA and reduced body weight, at least for a short duration. While digital technologies may not address environmental factors that constrain an individual’s health, technology-based programs can provide instrumental help in finding healthier alternatives or facilitating the creation of supportive social groups [ 13 , 14 ]. Moreover, these interventions do not require traditional on-site visits and thus help reduce participants’ time and financial costs [ 16 ]. Despite this potential, current research programs are still constrained in their capacity to personalize the intervention, deliver tailored content, or adjust the frequency and timing of the intervention based on individual needs in real time.

These limitations can be overcome by utilizing AI chatbots, which have great potential to increase the accessibility and efficacy of personalized lifestyle modification programs [ 20 , 21 ]. Enabling AI chatbots to communicate with individuals via web or mobile applications can make these personalized programs available 24/7 [ 21 , 22 ]. Furthermore, AI chatbots provide new communication modalities for individuals to receive, comprehend, and utilize information, suggestions, and assistance on a personal level [ 20 , 22 ], which can help overcome one’s lack of self-efficacy or social support [ 20 ]. AI chatbots have been utilized in a variety of health care domains such as medical consultations, disease diagnoses, mental health support [ 1 , 23 ], and more recently, risk communications for the COVID-19 pandemic [ 24 ]. Results from a few systematic reviews and meta-analyses suggest that chatbots have a high potential for healthcare and psychiatric use, such as promoting antipsychotic medication adherence as well as reducing stress, anxiety, and/or depression symptoms [ 1 , 25 , 26 ]. However, to the best of our knowledge, none of these studies have focused on the efficacy of AI chatbot-based lifestyle modification programs and the evaluation of chatbot designs and technologies.

Therefore, this systematic review aimed to describe AI chatbot characteristics, functions (e.g., the chatbot’s persuasive and relational strategies), and core conversational capacities, and investigate whether AI chatbot interventions were effective in changing PA, diet, weight management behaviors, and other related health outcomes. We applied the AI Chatbot Behavior Change Model [ 22 ], designed to inform the conceptualization, design, and evaluation of chatbots, to guide our review. The systematic review provides new insights about the strengths and limitations in current AI chatbot-based lifestyle modification programs and can assist researchers and clinicians in building scalable and personalized systems for diverse populations.

The protocol of this systematic review was registered at the International Prospective Register of Systematic Reviews (PROSPERO) (ID: CRD42020216761). The systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Eligibility criteria

Table  1 shows the summary of the inclusion and exclusion criteria of the study characteristics based on the PICOS framework (i.e., populations/participants, interventions and comparators, outcome(s) of interest, and study designs/type) [ 27 ]. We included peer-reviewed papers or conference proceedings that were available in full-text written in English. Review papers, protocols, editorials, opinion pieces, and dissertations were excluded.

Information sources and search strategy

In consultation with a medical librarian (MF), pre-planned systematic search strategies were used for six electronic databases (PubMed, EMBASE, ACM Digital Library, Web of Science Core Collection, PsycINFO, and IEEE). A combination of MeSH/Emtree terms and keyword searches was used to identify studies on AI chatbot use in lifestyle change; the comprehensive search strategies for each database are provided in Additional file  1 . In addition, hand-searching was performed to ensure that relevant articles were not missed during data collection. The searches were completed on November 14, 2020, and no date limits were applied.
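
To make the combination of controlled vocabulary and free-text keywords concrete, the snippet below composes a purely illustrative PubMed-style query; it is not the review's actual search strategy (those strategies are provided in Additional file 1), and the specific terms are assumptions.

```python
# Illustrative only: a PubMed-style query mixing MeSH terms and free-text keywords.
# The actual, database-specific strategies used in the review are in Additional file 1.
chatbot_terms = '("chatbot*" OR "conversational agent*" OR "dialogue system*")'
behavior_terms = ('("Exercise"[Mesh] OR "physical activity" OR '
                  '"Diet, Healthy"[Mesh] OR "weight loss")')
query = f"{chatbot_terms} AND {behavior_terms}"
print(query)
```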

Study selection

All retrieved references were imported into the Endnote reference management software [ 28 ], and duplicates were removed. The remaining references were imported into the Covidence systematic review software [ 29 ], and additional duplicates were removed. Before screening the articles, three researchers (YO, JZ, and YF) met to discuss the procedure for title and abstract screening using 20 randomly selected papers. In the first phase of screening, two researchers (YO and JZ) independently assessed all study titles and abstracts against the eligibility criteria in Table 1 . The agreement in the abstract and title screening between the two reviewers was 97.4% (Cohen’s Kappa = .725). Then, the same two researchers (YO and JZ) read the remaining studies in full. The agreement for full-text screening was 91.9% (Cohen’s Kappa = .734). Discrepancies at each stage were resolved through discussion with a third researcher (YF).
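
As a point of reference for the agreement statistics above, Cohen's kappa corrects the raw percent agreement for agreement expected by chance. The short sketch below (not the authors' analysis code; the screening decisions shown are hypothetical) illustrates the computation for two reviewers.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (e.g., include/exclude)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of p_rater_a(label) * p_rater_b(label)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum((counts_a[l] / n) * (counts_b[l] / n)
                   for l in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

# Hypothetical title/abstract decisions for six records
reviewer_1 = ["exclude", "exclude", "include", "exclude", "include", "exclude"]
reviewer_2 = ["exclude", "exclude", "include", "include", "include", "exclude"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.667 for this toy example
```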

Data collection process and data items

Data extraction forms were developed based on the AI Chatbot Behavior Change Model [ 22 ], which provides a comprehensive framework for analyzing and evaluating chatbot designs and technologies. This model consists of four major components that provide guidelines to develop and evaluate AI chatbots for health behavior change: 1) designing chatbot characteristics and understanding user background, 2) building relational capacity, 3) building persuasive capacity, and 4) evaluating mechanisms and outcomes. Based on the model, the data extraction forms were initially drafted by YF and discussed among the research team members. One researcher (YO) extracted information on study and sample characteristics, chatbot characteristics, intervention characteristics, and outcome measures and results for main outcomes (i.e., PA, diet, and weight loss) and secondary outcomes (i.e., engagement, acceptability/satisfaction, adverse events, and others). Study and sample characteristics consisted of study aim, study design, theoretical framework, sample size, age, sex/gender, race/ethnicity, education, and income. Chatbot characteristics included the system features with which the chatbots were designed (i.e., chatbot name and gender, media, user input, conversation initiation, relational capacity, persuasion capacity, safety, and ethics discussion). Intervention characteristics included information such as intervention duration and frequency, intervention components, and technological features (e.g., system infrastructure, platform). Two researchers (YF and JZ) independently validated the extracted data.
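
To illustrate how such an extraction form can mirror the four components of the AI Chatbot Behavior Change Model, the sketch below defines one record per study; all field names and the example values are illustrative assumptions rather than the authors' actual form.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChatbotStudyRecord:
    """Illustrative extraction record organized by the four model components."""
    # 1) Chatbot characteristics and user background
    chatbot_name: Optional[str] = None
    chatbot_gender: Optional[str] = None
    delivery_media: List[str] = field(default_factory=list)   # e.g., ["text", "graphs"]
    platform: Optional[str] = None                            # e.g., "Facebook Messenger"
    constrained_input: Optional[bool] = None                  # True = rule-based responses only
    # 2) Relational capacity
    relational_strategies: List[str] = field(default_factory=list)   # e.g., ["empathy"]
    # 3) Persuasive capacity
    persuasive_strategies: List[str] = field(default_factory=list)   # e.g., ["goal setting"]
    # 4) Evaluation of mechanisms and outcomes
    primary_outcomes: List[str] = field(default_factory=list)        # e.g., ["physical activity"]
    secondary_outcomes: List[str] = field(default_factory=list)      # e.g., ["engagement"]

record = ChatbotStudyRecord(chatbot_name="Ally", delivery_media=["text", "graphs"],
                            constrained_input=True,
                            primary_outcomes=["physical activity"])
```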

Quality assessment and risk of bias

Two reviewers (YO and JZ) independently evaluated the risk of bias of included studies using the two National Institutes of Health (NIH) quality assessment tools [ 30 ]. Randomized controlled trials (RCTs) were assessed for methodological quality using the NIH Quality Assessment of Controlled Intervention Studies. For quasi-experimental studies, the NIH Quality Assessment Tool for Before-After (Pre-Post) Studies with No Control Group was used. Using these tools, the quality of each study was categorized into three groups (“good,” “fair,” and “poor”). These tools were used to assess confidence in the evaluations and conclusions of this systematic review. We did not use these tools to exclude the findings of poor-quality studies. It should be noted that the studies included in this systematic review were behavioral intervention trials targeting individual-level outcomes. Therefore, criteria concerning 1) whether participants were blinded to their treatment group assignment and 2) statistical analyses of group-level data were considered inapplicable.

Synthesis of results

Due to the heterogeneity in the types of study outcomes, outcome measures, and clinical trial designs, we qualitatively evaluated and synthesized the results of the studies. We did not conduct a meta-analysis and did not assess publication bias.

Figure  1 shows the study selection process. The search yielded 2360 references in total, from which 668 duplicates were removed. A total of 1692 abstracts were then screened, among which 1630 were judged ineligible, leaving 62 papers to be read in full text. In total, 9 papers met the eligibility criteria and were included.
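
As a quick consistency check of the reported screening flow, the numbers above can be reconciled stage by stage; the sketch below simply encodes that bookkeeping.

```python
# Consistency check of the screening flow reported in Figure 1
retrieved = 2360
duplicates_removed = 668
screened = retrieved - duplicates_removed        # 1692 titles/abstracts screened
excluded_at_screening = 1630
full_text = screened - excluded_at_screening     # 62 papers read in full text
included = 9
excluded_at_full_text = full_text - included     # 53 papers excluded at full-text stage
assert screened == 1692 and full_text == 62 and excluded_at_full_text == 53
```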

Figure 1. Flow diagram of the article screening process

Summary of study designs and sample characteristics

The 9 included papers had been recently published (3 were published in 2020 [ 20 , 31 , 32 ], 4 in 2019 [ 21 , 33 , 34 , 35 ], and 2 in 2018 [ 36 , 37 ]). Table  2 provides details of the characteristics of each study. Two studies [ 21 , 37 ] were conducted in the United States and the remaining 7 were conducted in Switzerland [ 31 , 33 , 36 ], Australia [ 20 ], South Korea [ 32 ], and Italy [ 34 ] (1 not reported [ 35 ]). In total, 891 participants were represented in the 9 studies, with sample sizes ranging from 19 to 274 participants. The mean age of the samples ranged from 15.2 to 56.2 years (SD range  = 2.0 to 13.7), and females/women represented 42.1 to 87.9% of the sample. One study [ 21 ] solely targeted an adolescent population, whereas most studies targeted an adult population [ 20 , 31 , 32 , 33 , 34 , 35 , 37 ]. One study [ 36 ] did not report the target population’s age. Participants’ race/ethnicity information was not reported in 8 out of the 9 studies. The study [ 21 ] that reported participants’ race/ethnicity information included 43% Hispanic, 39% White, 9% Black, and 9% Asian participants. Participants’ education and income backgrounds were not reported in 5 out of the 9 studies. Among the 4 studies [ 31 , 34 , 35 , 37 ] that reported the information, the majority included undergraduate students or people with graduate degrees. Overall, reporting of participants’ sociodemographic information was inconsistent and insufficient across the studies.

Five studies employed quasi-experimental designs [ 20 , 21 , 35 , 36 , 37 ], and 4 were RCTs [ 31 , 32 , 33 , 34 ]. Only 5 studies [ 21 , 31 , 32 , 35 , 37 ] used at least one theoretical framework. One was guided by 3 theories [ 35 ] and another by 4 theories [ 21 ]. The theories used in the 5 studies included the Health Action Process Approach ( n  = 2), the Habit Formation Model ( n  = 1), the Technology Acceptance Model ( n  = 1), the AttrakDiff Model ( n  = 1), Cognitive Behavioral Therapy ( n  = 1), Emotionally Focused Therapy ( n  = 1), Behavioral Activation ( n  = 1), Motivational Interviewing ( n  = 1), and the Structured Reflection Model ( n  = 1). It is notable that most of these theories were used to design the intervention contents for inducing behavioral changes. Only the Technology Acceptance Model and the AttrakDiff Model were relevant for guiding the designs of the chatbot characteristics and their technological platforms, independent from intervention contents.

Summary of intervention and chatbot characteristics

Figure  2 provides a visual summary of AI chatbot characteristics and intervention outcomes, and Table  3 provides more detailed information. The 9 studies varied in intervention and program length, lasting from 1 week to 3 months. For most studies ( n  = 8), the chatbot was the only intervention component for delivering contents and engaging with the participants. One study used multi-intervention components, and participants had access to an AI chatbot along with a study website with educational materials [ 20 ]. A variety of commercially available technical platforms were used to host the chatbot and deliver the interventions, including Slack ( n  = 2), KakaoTalk ( n  = 1), Facebook messenger ( n  = 3), Telegram messenger ( n  = 1), WhatsApp ( n  = 1), and short messaging services (SMS) ( n  = 2). One study used 4 different platforms to deliver the intervention [ 21 ], and 2 studies used a chatbot app (i.e., Ally app) that was available on both Android and iOS systems [ 31 , 33 ].

Figure 2. Summary of chatbot characteristics and intervention outcomes

Following the AI Chatbot Behavior Change Model [ 22 ], we extracted features of the chatbot and intervention characteristics (Table 3 ). Regarding chatbot characteristics, identity features, such as specific names ( n  = 8) [ 20 , 21 , 31 , 32 , 33 , 35 , 36 , 37 ] and chatbot gender ( n  = 3) [ 20 , 31 , 33 ], were specified. Notably, the chatbot gender was woman in the 3 studies that reported it [ 20 , 31 , 33 ]. All 9 chatbots delivered messages in text format. In addition to text, 3 chatbots used graphs [ 31 , 33 , 37 ], 2 used images [ 32 , 35 ], 1 used voice [ 21 ], and 1 used a combination of graphs, images, and videos [ 36 ].

In 5 studies, the chatbots were constrained (i.e., users could only select pre-programmed responses in the chat) [ 31 , 33 , 34 , 35 , 36 ], and in 4, the chatbots were unconstrained (i.e., users could freely type or speak to the chatbot) [ 20 , 21 , 32 , 37 ]. Six chatbots [ 31 , 32 , 33 , 34 , 36 , 37 ] delivered daily intervention messages to the study participants. One chatbot communicated only on a weekly basis [ 20 ], and 1 communicated daily, weekly, on weekends or weekdays or at a scheduled date and time [ 35 ]. One study did not specify when and how often the messages were delivered [ 21 ]. Only 3 chatbots [ 20 , 21 , 32 ] were available on-demand so that users could initiate conversation at any preferred time. Most chatbots were equipped with relational capacity ( n  = 8; i.e., conversation strategy to establish, maintain, or enhance social relationships with users) and persuasive capacity ( n  = 9; i.e., conversation strategy to change user’s behaviors and behavioral determinants), meaning that the conversations were designed to induce behavioral changes while engaging with users socially. While only 1 study [ 21 ] documented data security, none of the studies provided information on participant safety or ethics (i.e., ethical principle or standards with which the chatbot is designed).

Summary of outcome measures and changes in outcomes

Figure 2 also illustrates the outcome measures and changes in the main and secondary outcomes reported in both RCTs and quasi-experimental studies. Among 7 studies that measured PA [ 20 , 21 , 31 , 32 , 33 , 35 , 37 ], 2 used objective measures [ 31 , 33 ], 4 used self-reported measures [ 20 , 21 , 32 , 35 ], and 1 used both [ 37 ]. Self-reported dietary intake was measured in 4 studies [ 20 , 34 , 35 , 36 ]. Only 1 study assessed objective changes in weight in a research office visit [ 20 ]. Details of intervention outcomes, including direction of effects, statistical significance, and magnitude, are presented in Table  4 .

Sample sizes of the 4 RCT studies ranged from 106 to 274 and a priori power analyses were reported in 3 [ 31 , 32 , 34 ], which showed that the sample sizes had sufficient power for analyzing the specified outcomes. Of the 4 RCT studies [ 31 , 32 , 33 , 34 ], 3 reported PA outcomes using daily step count [ 31 , 33 ] and a self-reported habit index [ 32 ]. In these RCTs, the AI chatbot intervention group resulted in a significant increase in PA, as compared to the control group, over the respective study period (6 weeks to 3 months). In terms of dietary change, 1 study [ 34 ] reported that participants in the intervention group showed higher self-reported intention to reduce red and processed meat consumption compared to the control group during a 2-week period.

In contrast, sample sizes for the 5 quasi-experimental studies were small, ranging from 19 to 36 participants, suggesting that these studies may lack statistical power to detect potential intervention effects. Among the 5 quasi-experimental studies, 2 [ 21 , 37 ] reported only PA change outcomes, 1 [ 36 ] reported only diet change outcomes, and 2 [ 20 , 35 ] reported both outcomes. With regard to PA-related outcomes, 2 studies reported statistically significant improvements [ 20 , 37 ]: one [ 20 ] observed increased moderate and vigorous PA over the study period, and the other [ 37 ] found a significant increase in the habitual action of PA. One study [ 35 ] found no difference in PA intention within the intervention period; although it did not observe a statistically significant increase in PA intention, it revealed that among participants with either high or low intervention adherence, PA intention showed an increasing trend over the study period. The remaining study [ 21 ] only reported descriptive statistics and showed that participants experienced positive progress towards PA goals 81% of the time.

Among the quasi-experimental studies, only 1 study [ 20 ] reported a statistically significant increase in diet adherence over 12 weeks. Another study [ 35 ] reported no difference in healthy diet intention over 3 weeks; in this study, participants with high intervention adherence showed a marginal increase, whereas those with low adherence showed decreased healthy diet intention. A third study [ 36 ] reported that participants’ meal consumption improved in 65% of the cases. The only study [ 20 ] reporting pre-post weight change outcomes using objective weight measures showed that participants experienced a significant weight loss (1.3 kg) from baseline to 12 weeks. To summarize, non-significant findings and a lack of statistical reporting were more prevalent in the quasi-experimental studies, but the direction of the intervention effects was similar to that reported in the RCTs.

Engagement, acceptability/satisfaction, and safety measures were reported as secondary outcomes in 7 studies [ 20 , 21 , 31 , 33 , 35 , 36 , 37 ]. Five studies reported engagement [ 20 , 21 , 31 , 33 , 37 ] using various types of measurements, such as user response rate to chatbot messages [ 31 ], frequency of users’ weekly check-ins [ 20 ], and length of conversations between the chatbot and users [ 21 ]. Three studies measured acceptability/satisfaction of the chatbot [ 21 , 35 , 36 ] using measures such as technology acceptance [ 35 ], helpfulness of the chatbot [ 21 ], and perceived efficiency of chatbot communications [ 36 ]. Regarding reporting of adverse events (e.g., experiencing side effects from interventions), only 1 study reported that no adverse events related to study participation were experienced [ 20 ]. Three studies reported additional measures, including the feasibility of subject enrollment [ 20 ], use of the AttrakDiff questionnaire to measure four aspects of the chatbot (i.e., pragmatic, hedonic, appealing, social) [ 35 ], and perceived mindfulness about one’s own behaviors [ 37 ].

Among 5 studies that reported engagement [ 20 , 21 , 31 , 33 , 37 ], only 1 [ 33 ] reported statistical significance of the effects of intrinsic (e.g., age, personality traits) and extrinsic factors (e.g., time and day of the delivery, location) on user engagement (e.g., conversation engagement, response delay). Among 3 studies [ 21 , 35 , 36 ] that reported acceptability/satisfaction, 1 study [ 35 ] found that the acceptability of the chatbot was significantly higher than the middle score corresponding to “neutral” (i.e., 4 on a 7-point scale). One study that reported the safety of the intervention did not include statistical significance [ 20 ]. Three studies reported other measures [ 20 , 35 , 37 ], and 1 found that pragmatic, hedonic, appealing, and social ratings of the chatbot were significantly higher than the middle score [ 35 ]. Another study [ 37 ] found no significant changes in the perceived mindfulness between pre- and post-study.

Summary of quality assessment and risk of bias

The results of the risk of bias assessments of the 9 studies are reported in Additional file  2 . Of the 4 RCT studies [ 31 , 32 , 33 , 34 ], 3 were rated as fair [ 31 , 32 , 34 ] and 1 was rated as poor [ 33 ] due to its lack of reporting on several critical criteria. The poorly rated study did not report overall dropout rates or the differential dropout rates between treatment groups, did not report whether the sample size was sufficiently large to detect differences between groups (i.e., no power analysis), and did not prespecify outcomes for hypothesis testing. Of the 5 quasi-experimental studies [ 20 , 21 , 35 , 36 , 37 ], 1 study was rated as fair [ 20 ] and 4 studies were rated as poor [ 21 , 35 , 36 , 37 ] due to flaws with regard to several critical criteria. These studies reported neither a power analysis to ensure that the sample size was sufficiently large, nor follow-up rates after baseline. Additionally, the statistical methods did not examine pre-to-post changes in outcome measures, and reporting of statistical significance was lacking.

This systematic review aimed to evaluate the characteristics and potential efficacy of AI chatbot interventions to promote PA, healthy diet, and/or weight management. Most studies focused on changes in PA, and the majority [ 20 , 31 , 32 , 33 , 37 ] reported significant improvements in PA-related behaviors. The number of studies aiming to change diet and weight status was small. Two studies [ 20 , 34 ] found significant improvements in diet-related behaviors. Although only 1 study [ 20 ] reported weight-related outcomes, it reported significant weight change after the intervention. In summary, chatbots may improve PA, but this review was not able to draw definitive conclusions on the potential efficacy of chatbot interventions for promoting PA, healthy eating, or weight loss.

This qualitative synthesis of effects needs to be interpreted with caution, given that the reviewed studies lack consistent usage of measurements and reporting of outcome evaluations. These studies used different measurements and statistical methods to evaluate PA and diet outcomes. For example, 1 study [ 20 ] measured participants’ self-reported change in moderate-to-vigorous physical activity during the intervention period to gauge the efficacy of the intervention, whereas another study [ 31 ] used step-goal achievement as a measure of intervention efficacy. Two quasi-experimental studies did not report the statistical significance of the pre-post changes in PA or diet outcomes [ 21 , 36 ]. Such inconsistency in evaluating the potential efficacy of interventions has been reported in previous systematic reviews [ 1 , 38 ]. To advance the application of chatbot interventions in lifestyle modification programs and to demonstrate the rigor of their efficacy, future studies should examine multiple behavior change indicators, ideally incorporating objectively measured outcomes.

Consistent with other systematic reviews of chatbot interventions in health care and mental health [ 1 , 38 ], reporting of participants’ engagement, acceptability/satisfaction, and adverse events was limited in the studies. In particular, engagement, acceptability, and satisfaction measures varied across the studies, impeding the systematic summarization and assessment of various intervention implementations. For instance, 1 study [ 33 ] used user response rates and user response delay as engagement measures, whereas another study [ 21 ] used the duration of conversations and the ratio of chatbot-initiated to patient-initiated conversations to assess the level of user engagement. Inconsistent reporting of user engagement, acceptability, and satisfaction measures is problematic because it complicates the interpretation and comparison of results across different chatbot systems [ 1 ]. Therefore, standardization of these measures should be implemented in future research. For example, as suggested in previous studies [ 39 , 40 ], conversational turns per session can be a viable, objective, and quantitative metric for user engagement. Regarding the reporting of adverse events, despite the recommendation to report adverse events in clinical trials by the Consolidated Standards of Reporting Trials Group [ 41 ], only 1 study [ 20 ] reported adverse events. It is recommended that future studies consistently assess and report any unexpected events resulting from the use of AI chatbots to prevent side effects or potential harm to participants.
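
To illustrate how conversational turns per session could be operationalized as a standardized engagement metric, the sketch below computes it from time-stamped chat logs; the log format, the 30-minute session gap, and the example data are assumptions for illustration, not a measure used by the reviewed studies.

```python
from datetime import datetime, timedelta

def turns_per_session(messages, session_gap=timedelta(minutes=30)):
    """
    messages: list of (timestamp, sender) tuples for one user, sorted by time;
    a new session starts after `session_gap` without any message.
    Returns the average number of user turns per session.
    """
    if not messages:
        return 0.0
    sessions, current, last_time = [], [], None
    for ts, sender in messages:
        if last_time is not None and ts - last_time > session_gap:
            sessions.append(current)
            current = []
        current.append(sender)
        last_time = ts
    sessions.append(current)
    user_turns = [sum(1 for s in session if s == "user") for session in sessions]
    return sum(user_turns) / len(sessions)

# Hypothetical log: two short sessions separated by several hours
log = [
    (datetime(2021, 1, 1, 9, 0), "bot"), (datetime(2021, 1, 1, 9, 1), "user"),
    (datetime(2021, 1, 1, 9, 2), "bot"), (datetime(2021, 1, 1, 9, 3), "user"),
    (datetime(2021, 1, 1, 18, 0), "bot"), (datetime(2021, 1, 1, 18, 2), "user"),
]
print(turns_per_session(log))  # 1.5 user turns per session on average
```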

Theoretical frameworks for designing and evaluating a chatbot system are essential to understand the rationale behind participants’ motivation, engagement, and behaviors. However, theoretical frameworks were not reported in many of the studies included in this systematic review. The lack of theoretical foundations of existing chatbot systems has also been noted in previous literature [ 42 ]. In this review, we found that the majority of AI chatbots were equipped with persuasion strategies (e.g., setting personalized goals) and relational strategies (e.g., showing empathy) to establish, maintain, or enhance social relationships with participants. The application of theoretical frameworks will guide the development of effective communicative strategies that can be implemented in chatbot designs. For example, designing chatbots with personalized messages can be more effective than using non-tailored, standardized messages [ 43 , 44 ]. For relational strategies, future studies can benefit from drawing on the literature on human-computer interaction and relational agents (e.g., [ 45 , 46 ]) and interpersonal communication theories (e.g., Social Penetration Theory [ 47 ]) to develop strategies that facilitate relationship formation between participants and chatbots.

Regarding designs of chatbot characteristics and dialogue systems, the rationale behind using human-like identity features (e.g., gender selection) on chatbots was rarely discussed. Only 1 study [ 31 ] referred to literature on human-computer interaction [ 48 ] and discussed the importance of using human-like identity features on chatbots to facilitate successful human-chatbot relationships. Additionally, only one chatbot [ 21 ] was able to deliver spoken outputs. This is inconsistent with a previous systematic review on chatbots used in health care, in which spoken chatbot output was identified as the most common delivery mode across the studies [ 1 ].

With regard to user input, over half of the studies [ 31 , 33 , 34 , 35 , 36 ] used a constrained AI chatbot, while the remaining [ 20 , 21 , 32 , 37 ] used unconstrained AI chatbots. Constrained AI chatbots are rule-based, well-structured, and easy to build, control, and implement, thus ensuring the quality and consistency in the structure and delivery of content [ 42 ]. However, they are not able to adapt to participants’ inquiries and address emergent questions, and are, thus, not suitable for sustaining more natural and complex interactions with participants [ 42 ]. In contrast, unconstrained AI chatbots are known to simulate naturalistic human-to-human communication and may strengthen interventions in general, particularly in the long-term, due to their flexibility and adaptability in conversations [ 1 , 38 , 42 ]. With increasing access to large health care datasets, advanced technologies [ 49 ], and new developments in machine learning that allow for complex dialogue management methods and conversational flexibility [ 1 ], employing unconstrained chatbots to yield long-term efficacy may become more feasible in future research. For instance, increasing the precision of natural language understanding and generation will allow for AI chatbots to better engage users in conversations and follow up with tailored intervention messages.
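
The contrast between constrained and unconstrained chatbots can be made concrete with a minimal sketch: a constrained bot accepts only pre-programmed response options, whereas an unconstrained bot must interpret free text (here mocked by a trivial keyword matcher standing in for a full natural language understanding component). All prompts and intents below are hypothetical.

```python
# Constrained (rule-based) interaction: the user picks from pre-programmed options.
MENU = {
    "1": "Great! Your step goal for today is 8,000 steps.",
    "2": "Here is a healthy lunch idea: a salad with grilled chicken.",
}

def constrained_reply(choice: str) -> str:
    return MENU.get(choice, "Please choose option 1 (activity goal) or 2 (meal idea).")

# Unconstrained interaction: the user types freely; a very simplified keyword matcher
# stands in for the natural language understanding module of a real dialogue system.
def unconstrained_reply(utterance: str) -> str:
    text = utterance.lower()
    if "walk" in text or "steps" in text:
        return "Nice! How many steps did you manage today?"
    if "ate" in text or "lunch" in text or "dinner" in text:
        return "Thanks for logging that. Was it a balanced meal?"
    return "Tell me more about your activity or meals today."

print(constrained_reply("1"))
print(unconstrained_reply("I went for a walk after dinner"))
```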

Safety and data security criteria are essential in designing chatbots. However, only 1 study provided descriptions of these criteria. Conversations between study participants and chatbots should be carefully monitored, since erroneous chatbot responses may result in unintended harm. In particular, as conversational flexibility increases, there may be an increase in potential errors associated with natural language understanding or response generation [ 1 ]. Thus, the use of unconstrained chatbots should be accompanied by careful monitoring of participant-chatbot interactions and by appropriate safety functions.

Strengths and limitations

This review has several strengths. First, to the best of our knowledge, this is the first review to systematically examine the characteristics and potential efficacy of AI chatbot interventions in lifestyle modification, thereby providing crucial insights for identifying gaps and future directions for research and clinical practice. Second, we developed comprehensive search strategies with a medical librarian (MLS) for six electronic databases to increase the sensitivity and comprehensiveness of our search. Despite these strengths, several limitations also need to be acknowledged. First, we did not search gray literature in this systematic review. Second, we limited our search to peer-reviewed studies published as full text in English only. Lastly, due to the heterogeneity of outcome measures and the limited number of RCT designs in this systematic review, we were not able to conduct a meta-analysis or draw firm conclusions about the potential efficacy of chatbot interventions. In addition, the small sample sizes used by the studies make it difficult to generalize the results to broader populations. More RCTs with larger sample sizes and longer study durations are needed to determine the efficacy of AI chatbot interventions for improving PA, diet, and weight loss.

Conclusions

AI chatbot technologies and their commercial applications continue to develop rapidly, as does the number of studies about these technologies. Chatbots may improve PA, but this review was not able to draw definitive conclusions about the potential efficacy of chatbot interventions on PA, diet, and weight management/loss. Despite the rapid increase in publications about chatbot designs and interventions, standard measures for evaluating chatbot interventions and theory-guided chatbots are still lacking. Thus, future studies need to use standardized criteria for evaluating chatbot implementation and efficacy. Additionally, theoretical frameworks that can capture the unique factors of human-chatbot interactions for behavior change need to be developed and used to guide future AI chatbot interventions. Lastly, as chatbot adoption is expected to increase across diverse populations, future research needs to consider equity and equality in designing and implementing chatbot interventions. For target populations with different sociodemographic backgrounds (e.g., living environment, race/ethnicity, cultural background), specifically tailored designs and sub-group evaluations need to be employed to ensure adequate delivery and optimal intervention impact.

Availability of data and materials

Not applicable.

Abbreviations

  • AI: Artificial Intelligence
  • PROSPERO: International Prospective Register of Systematic Reviews
  • RCT: Randomized controlled trial
  • PA: Physical activity
  • MLS: Medical librarian

Laranjo L, Dunn AG, Tong HL, Kocaballi AB, Chen J, Bashir R, et al. Conversational agents in healthcare: a systematic review. J Am Med Inform Assoc. 2018;25(9):1248–58.

Pew Research Center. Nearly half of Americans use digital voice assistants, mostly on their smartphones. 2017. Available from: https://www.pewresearch.org/fact-tank/2017/12/12/nearly-half-of-americans-use-digital-voice-assistants-mostly-on-their-smartphones/ .

Farhud DD. Impact of lifestyle on health. Iran J Public Health. 2015;44(11):1442.

Cecchini M, Sassi F, Lauer JA, Lee YY, Guajardo-Barron V, Chisholm D. Tackling of unhealthy diets, physical inactivity, and obesity: health effects and cost-effectiveness. Lancet. 2010;376(9754):1775–84.

Wagner K-H, Brath H. A global view on the development of non communicable diseases. Prev Med. 2012;54:S38–41.

Bennett JE, Stevens GA, Mathers CD, Bonita R, Rehm J, Kruk ME, et al. NCD countdown 2030: worldwide trends in non-communicable disease mortality and progress towards sustainable development goal target 3.4. Lancet. 2018;392(10152):1072–88.

Clarke T, Norris T, Schiller JS. Early release of selected estimates based on data from the National Health Interview Survey. Natl Center Health Stat. 2019.

Department of Health and Human Services. Physical Activity Guidelines for Americans, 2nd edition. Washington, DC: U.S. Department of Health and Human Services; 2018. Available from: https://health.gov/sites/default/files/2019-09/Physical_Activity_Guidelines_2nd_edition.pdf .

Hales C, Carroll M, Fryar C, Ogden C. Prevalence of obesity and severe obesity among adults: United States, 2017–2018. NCHS Data Brief, no 360. Hyattsville: National Center for Health Statistics; 2020.

Prentice AM. The emerging epidemic of obesity in developing countries. Int J Epidemiol. 2006;35(1):93–9.

Vandelanotte C, Müller AM, Short CE, Hingle M, Nathan N, Williams SL, et al. Past, present, and future of eHealth and mHealth research to improve physical activity and dietary behaviors. J Nutr Educ Behav. 2016;48(3):219–28. e1.

Case MA, Burwick HA, Volpp KG, Patel MS. Accuracy of smartphone applications and wearable devices for tracking physical activity data. Jama. 2015;313(6):625–6.

Zhang J, Brackbill D, Yang S, Becker J, Herbert N, Centola D. Support or competition? How online social networks increase physical activity: a randomized controlled trial. Prev Med Rep. 2016;4:453–8.

Zhang J, Brackbill D, Yang S, Centola D. Efficacy and causal mechanism of an online social media intervention to increase physical activity: results of a randomized controlled trial. Prev Med Rep. 2015;2:651–7.

Mateo GF, Granado-Font E, Ferré-Grau C, Montaña-Carreras X. Mobile phone apps to promote weight loss and increase physical activity: a systematic review and meta-analysis. J Med Internet Res. 2015;17(11):e253.

Manzoni GM, Pagnini F, Corti S, Molinari E, Castelnuovo G. Internet-based behavioral interventions for obesity: an updated systematic review. Clin Pract Epidemiol Ment Health. 2011;7:19.

Beleigoli AM, Andrade AQ, Cançado AG, Paulo MN, Maria De Fátima HD, Ribeiro AL. Web-based digital health interventions for weight loss and lifestyle habit changes in overweight and obese adults: systematic review and meta-analysis. J Med Internet Res. 2019;21(1):e298.

Laranjo L, Arguel A, Neves AL, Gallagher AM, Kaplan R, Mortimer N, et al. The influence of social networking sites on health behavior change: a systematic review and meta-analysis. J Am Med Inform Assoc. 2015;22(1):243–56.

Laranjo L, Ding D, Heleno B, Kocaballi B, Quiroz JC, Tong HL, et al. Do smartphone applications and activity trackers increase physical activity in adults? Systematic review, meta-analysis and metaregression. Br J Sports Med. 2021;55(8):422–32.

Maher CA, Davis CR, Curtis RG, Short CE, Murphy KJ. A physical activity and diet program delivered by artificially intelligent virtual health coach: proof-of-concept study. JMIR mHealth uHealth. 2020;8(7):e17558.

Stephens TN, Joerin A, Rauws M, Werk LN. Feasibility of pediatric obesity and prediabetes treatment support through Tess, the AI behavioral coaching chatbot. Transl Behav Med. 2019;9(3):440–7.

Zhang J, Oh YJ, Lange P, Yu Z, Fukuoka Y. Artificial intelligence Chatbot behavior change model for designing artificial intelligence Chatbots to promote physical activity and a healthy diet. J Med Internet Res. 2020;22(9):e22845.

Pereira J, Díaz Ó. Using health chatbots for behavior change: a mapping study. J Med Syst. 2019;43(5):135.

Miner AS, Laranjo L, Kocaballi AB. Chatbots in the fight against the COVID-19 pandemic. NPJ Digit Med. 2020;3(1):1–4.

Gentner T, Neitzel T, Schulze J, Buettner R. A Systematic literature review of medical chatbot research from a behavior change perspective. In 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE; 2020. p. 735-40.

Vaidyam AN, Wisniewski H, Halamka JD, Kashavan MS, Torous JB. Chatbots and conversational agents in mental health: a review of the psychiatric landscape. Can J Psychiatry. 2019;64(7):456–64.

Schardt C, Adams MB, Owens T, Keitz S, Fontelo P. Utilization of the PICO framework to improve searching PubMed for clinical questions. BMC Med Inform Decis Making. 2007;7(1):1–6.

Clarivate Analytics. Endnote x9 2019. Available from: https://endnote.com/ .

Covidence systematic review software. Melbourne, Australia: Veritas Health Innovation. Available from: www.covidence.org .

NIH National Heart, Lung, and Blood Institute. Study quality assessment tools. Available from: https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools .

Kramer J-N, Künzler F, Mishra V, Smith SN, Kotz D, Scholz U, et al. Which components of a smartphone walking app help users to reach personalized step goals? Results from an optimization trial. Ann Behav Med. 2020;54(7):518–28.

Piao M, Ryu H, Lee H, Kim J. Use of the healthy lifestyle coaching Chatbot app to promote stair-climbing habits among office workers: exploratory randomized controlled trial. JMIR mHealth uHealth. 2020;8(5):e15085.

Künzler F, Mishra V, Kramer J-N, Kotz D, Fleisch E, Kowatsch T. Exploring the state-of-receptivity for mhealth interventions. Proc ACM Interact Mobile Wearable Ubiquitous Technol. 2019;3(4):1–27.

Carfora V, Bertolotti M, Catellani P. Informational and emotional daily messages to reduce red and processed meat consumption. Appetite. 2019;141:104331.

Fadhil A, Wang Y, Reiterer H. Assistive conversational agent for health coaching: a validation study. Methods Inf Med. 2019;58(1):009–23.

Casas J, Mugellini E, Khaled OA, editors. Food diary coaching chatbot. Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers; 2018.

Kocielnik R, Xiao L, Avrahami D, Hsieh G. Reflection companion: a conversational system for engaging users in reflection on physical activity. Proc ACM Interact Mobile Wearable Ubiquitous Technol. 2018;2(2):1–26.

Milne-Ives M, de Cock C, Lim E, Shehadeh MH, de Pennington N, Mole G, et al. The effectiveness of artificial intelligence conversational agents in health care: systematic review. J Med Internet Res. 2020;22(10):e20346.

Abd-Alrazaq A, Safi Z, Alajlani M, Warren J, Househ M, Denecke K. Technical metrics used to evaluate health care Chatbots: scoping review. J Med Internet Res. 2020;22(6):e18301.

Shum H-Y, He X-d, Li D. From Eliza to XiaoIce: challenges and opportunities with social chatbots. Frontiers of Information Technology & Electronic Engineering. 2018;19(1):10–26.

Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux P, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol. 2010;63(8):e1–e37.

Fadhil A. Can a chatbot determine my diet?: Addressing challenges of chatbot application for meal recommendation. arXiv preprint arXiv:180209100. 2018.

Kreuter MW, Wray RJ. Tailored and targeted health communication: strategies for enhancing information relevance. Am J Health Behav. 2003;27(1):S227–S32.

Noar SM, Harrington NG, Aldrich RS. The role of message tailoring in the development of persuasive health communication messages. Ann Int Commun Assoc. 2009;33(1):73–133.

Bickmore TW, Caruso L, Clough-Gorr K, Heeren T. ‘It’s just like you talk to a friend’relational agents for older adults. Interact Comput. 2005;17(6):711–35.

Sillice MA, Morokoff PJ, Ferszt G, Bickmore T, Bock BC, Lantini R, et al. Using relational agents to promote exercise and sun protection: assessment of participants’ experiences with two interventions. J Med Internet Res. 2018;20(2):e48.

Altman I, Taylor DA. Social penetration: the development of interpersonal relationships. New York: Holt, Rinehart & Winston; 1973.

Nass C, Steuer J, Tauber ER, editors. Computers are social actors. Proceedings of the SIGCHI conference on Human factors in computing systems; 1994.

Murdoch TB, Detsky AS. The inevitable application of big data to health care. Jama. 2013;309(13):1351–2.

Acknowledgements

This project was supported by a grant (K24NR015812) from the National Institute of Nursing Research (Dr. Fukuoka) and the Team Science Award by the University of California, San Francisco Academic Senate Committee on Research. Publication made possible in part by support from the UCSF Open Access Publishing Fund. The study sponsors had no role in the study design; collection, analysis, or interpretation of data; writing the report; or the decision to submit the report for publication.

Author information

Authors and Affiliations

Department of Communication, University of California Davis, Davis, USA

Yoo Jung Oh & Jingwen Zhang

Department of Public Health Sciences, University of California Davis, Davis, USA

Jingwen Zhang

Education and Research Services, University of California, San Francisco (UCSF) Library, UCSF, San Francisco, USA

Min-Lin Fang

Department of Physiological Nursing, UCSF, San Francisco, USA

Yoshimi Fukuoka

Contributions

YF, JZ, and YO contributed to the conception and design of the review; MF and YO developed the search strategies; YF, JZ, and YO contributed to the screening of papers and synthesizing the results into tables; YF, JZ, and YO wrote sections of the systematic review. All authors contributed to manuscript revision, read, and approved the submitted version. YF is the guarantor of the review.

Corresponding author

Correspondence to Yoo Jung Oh.

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Supplementary Information

Additional file 1.

Search strategies for PubMed, EMBASE, ACM Digital Library, Web of Science, PsycINFO, and IEEE.

Additional file 2.

Summary of quality assessment and risk of bias.

About this article

Cite this article.

Oh, Y.J., Zhang, J., Fang, ML. et al. A systematic review of artificial intelligence chatbots for promoting physical activity, healthy diet, and weight loss. Int J Behav Nutr Phys Act 18 , 160 (2021). https://doi.org/10.1186/s12966-021-01224-6

Received : 31 May 2021

Accepted : 10 November 2021

Published : 11 December 2021

DOI : https://doi.org/10.1186/s12966-021-01224-6

  • Artificial intelligence
  • Conversational agent
  • Weight loss
  • Weight maintenance
  • Sedentary behavior
  • Systematic review
