AI Tools for Systematic Literature Review


Rayyan

COLLABORATE ON YOUR REVIEWS WITH ANYONE, ANYWHERE, ANYTIME

Rayyan for students

Save precious time and maximize your productivity with a Rayyan membership. Receive training, priority support, and access features to complete your systematic reviews efficiently.

Rayyan for Librarians

Rayyan Teams+ makes your job easier. It includes VIP Support, AI-powered in-app help, and powerful tools to create, share and organize systematic reviews, review teams, searches, and full-texts.

Rayyan for Researchers


Rayyan makes collaborative systematic reviews faster, easier, and more convenient. Training, VIP support, and access to new features maximize your productivity. Get started now!

Over 1 billion reference articles reviewed by research teams, and counting...

Intelligent, scalable and intuitive.

Rayyan understands language, learns from your decisions and helps you work quickly through even your largest systematic literature reviews.

WATCH A TUTORIAL NOW

Solutions for Organizations and Businesses


Rayyan Enterprise and Rayyan Teams+ make it faster, easier and more convenient for you to manage your research process across your organization.

  • Accelerate your research across your team or organization and save valuable researcher time.
  • Build and preserve institutional assets, including literature searches, systematic reviews, and full-text articles.
  • Onboard team members quickly with access to group trainings for beginners and experts.
  • Receive priority support to stay productive when questions arise.

RAYYAN SYSTEMATIC LITERATURE REVIEW OVERVIEW


LEARN ABOUT RAYYAN’S PICO HIGHLIGHTS AND FILTERS


Join now to learn why Rayyan is already trusted by more than 500,000 researchers

Individual plans and team plans.

For early career researchers just getting started with research.

Free forever

  • 3 Active Reviews
  • Invite Unlimited Reviewers
  • Import Directly from Mendeley
  • Industry Leading De-Duplication
  • 5-Star Relevance Ranking
  • Advanced Filtration Facets
  • Mobile App Access
  • 100 Decisions on Mobile App
  • Standard Support
  • Revoke Reviewer
  • Online Training
  • PICO Highlights & Filters
  • PRISMA (Beta)
  • Auto-Resolver 
  • Multiple Teams & Management Roles
  • Monitor & Manage Users, Searches, Reviews, Full Texts
  • Onboarding and Regular Training

Professional

For researchers who want more tools for research acceleration.

Per month billed annually

  • Unlimited Active Reviews
  • Unlimited Decisions on Mobile App
  • Priority Support
  • Auto-Resolver

Student

For currently enrolled students with a valid student ID.

Per month billed annually

Billed monthly

Teams

For a team that wants professional licenses for all members.

Per-user, per month, billed annually

  • Single Team
  • High Priority Support

Teams+

For teams that want support and advanced tools for members.

  • Multiple Teams
  • Management Roles

Enterprise

For organizations that want access for all of their members.

Annual Subscription

Contact Sales

  • Organizational Ownership
  • For an organization or a company
  • Access to all the premium features such as PICO Filters, Auto-Resolver, PRISMA and Mobile App
  • Store and Reuse Searches and Full Texts
  • A management console to view, organize and manage users, teams, review projects, searches and full texts
  • Highest tier of support – Support via email, chat and AI-powered in-app help
  • GDPR Compliant
  • Single Sign-On
  • API Integration
  • Training for Experts
  • Training Sessions for Students Each Semester
  • More options for secure access control

ANNUAL ONLY

Per-user, billed monthly

Rayyan Subscription

Membership starts with 2 users. You can select the number of additional members that you’d like to add to your membership.


Great usability and functionality. Rayyan has saved me countless hours. I even received timely feedback from staff when I did not understand the capabilities of the system, and was pleasantly surprised with the time they dedicated to my problem. Thanks again!

This is a great piece of software. It has made the independent viewing process so much quicker. The whole thing is very intuitive.

Rayyan makes ordering articles and extracting data very easy. A great tool for undertaking literature and systematic reviews!

Excellent interface to do title and abstract screening. It also helps to keep track of the reasons for exclusion from the review, and in a blinded manner at that.

Rayyan is a fantastic tool to save time and improve systematic reviews!!! It has changed my life as a researcher!!! thanks

Easy to use, friendly, has everything you need for cooperative work on the systematic review.

Rayyan makes life easy in every way when conducting a systematic review and it is easy to use.


Please contact us by email to book a demo session of Laser AI.

The next generation tool for systematic reviews

Built from the ground up to support Living Systematic Reviews, Laser AI provides a unique blend of innovation in methods and technology

About Laser AI

Revolutionize your research with AI-supported Living Systematic Reviews. Streamline screening and data extraction while maintaining control at every stage.

Accelerate the process from research to discovery with the help of AI

Develop laser focus thanks to our innovative user interface

Share data across the organization and never do the same work twice

Laser AI saves on average 50% of the time compared to a manual workflow.

An independent evaluation by a team at NIEHS shows that the Laser AI data extraction module reduces extraction time by 53% without any loss of quality.

AI-powered research tools save time and empower policymakers and public health professionals to make well-informed decisions by rapidly sifting through extensive data sets. For health economics and outcome research, our tool contributes to a society where evidence-based strategies support choices that benefit everyone's well-being.

Seamless Screening

Dive into title & abstract screening and full-text screening with efficiency. Laser AI's innovative user interface highlights what matters and streamlines the screening process, giving your research team more time.

AI-Assisted Data Extraction

Harmonise data extraction and reduce double work with custom templates that suit your needs. Quality assurance and data cleaning modules make your review audit-ready and transparent with a detailed project history.

Organize and reuse

Improve consistency and collaboration with shared project templates and data extraction forms. Work with your team in real time from anywhere and reuse data across projects with the help of controlled vocabularies.

MEET LASER AI

Seamless screening.

Our innovative user interface highlights what matters, allowing researchers to perform their tasks more efficiently and reducing fatigue.

Users are always in the driver’s seat - we just build a smarter car!

Semi-automated data extraction

Harmonize data extraction with quality assurance and data cleaning modules and make your review audit-ready and transparent with detailed project history.

Improve consistency with shared project templates and data extraction forms.

Organize and reuse data across projects with the help of controlled vocabularies.

Recent Publications


May 23, 2024

HTA in Evidence-Based Decision Making

Health Technology Assessments (HTAs) help guide evidence-based healthcare decisions. From traditional methods to new approaches, explore how HTAs impact policy, practice, and innovation in an evolving healthcare landscape.


May 8, 2024

ISPOR Atlanta 2024

Here are some of Evidence Prime’s thoughts ahead of the ISPOR Atlanta conference in 2024.


April 9, 2024

HTAs: Innovations and Trends in Evidence Synthesis

Health Technology Assessments (HTAs) are evolving as the need for better decision-making tools rises. Different innovations and trends are enhancing decision-making processes. Stay informed and contribute to the future of healthcare.


March 12, 2024

Ethical Implications of AI in Healthcare

Bias, accountability, privacy - explore ethical implications of AI in healthcare and how AI should be fair and secure.


February 20, 2024

AI in Evidence Synthesis

Delve into the topic of AI in evidence synthesis with Laser AI and Nested Knowledge. Together with outside industry professionals, you'll gain valuable industry insights into how AI is being used.



February 8, 2024

Types of AI in Healthcare

Explore the types of AI in evidence-based healthcare and learn how they promise better patient outcomes and a brighter future overall.


January 30, 2024

Conducting A Systematic Review with Laser AI

Whether you're a current user or exploring Laser AI, learn how the tool can improve your systematic review workflow by watching our webinar and/or reading the Q&A from the event.



November 12, 2023

ISPOR Europe 2023

Evidence Prime is excited to attend ISPOR Europe 2023! Join us at #ISPOREurope as we showcase GRADEpro & Laser AI, offering HEOR professionals efficient literature reviews and guideline development.


Let's Connect

Join our monthly webinars for insightful discussions or product presentations.

Explore your interest in AI in healthcare through our blog posts.


Dive into conversations with us about AI and the future of AI in healthcare on LinkedIn!

Would you like to try Laser AI in your organization?



BMJ Open, Volume 13, Issue 7

Artificial intelligence in systematic reviews: promising when appropriately used

  • Sanne H B van Dijk 1,2 (http://orcid.org/0000-0003-1727-0608)
  • Marjolein G J Brusse-Keizer 1,3
  • Charlotte C Bucsán 2,4
  • Job van der Palen 3,4 (http://orcid.org/0000-0003-1071-6769)
  • Carine J M Doggen 1,5
  • Anke Lenferink 1,2,5 (http://orcid.org/0000-0002-2276-5691)
  • 1 Health Technology & Services Research, Technical Medical Centre, University of Twente, Enschede, The Netherlands
  • 2 Pulmonary Medicine, Medisch Spectrum Twente, Enschede, The Netherlands
  • 3 Medical School Twente, Medisch Spectrum Twente, Enschede, The Netherlands
  • 4 Cognition, Data & Education, Faculty of Behavioural, Management & Social Sciences, University of Twente, Enschede, The Netherlands
  • 5 Clinical Research Centre, Rijnstate Hospital, Arnhem, The Netherlands
  • Correspondence to Dr Anke Lenferink; a.lenferink@utwente.nl

Background Systematic reviews provide a structured overview of the available evidence in medical-scientific research. However, due to the increasing medical-scientific research output, it is a time-consuming task to conduct systematic reviews. To accelerate this process, artificial intelligence (AI) can be used in the review process. In this communication paper, we suggest how to conduct a transparent and reliable systematic review using the AI tool ‘ASReview’ in the title and abstract screening.

Methods Use of the AI tool consisted of several steps. First, the tool required training of its algorithm with several prelabelled articles prior to screening. Next, using a researcher-in-the-loop algorithm, the AI tool proposed the article with the highest probability of being relevant. The reviewer then decided on relevancy of each article proposed. This process was continued until the stopping criterion was reached. All articles labelled relevant by the reviewer were screened on full text.

Results Considerations to ensure methodological quality when using AI in systematic reviews included: the choice of whether to use AI, the need for both deduplication and checking for inter-reviewer agreement, how to choose a stopping criterion, and the quality of reporting. Using the tool in our review resulted in considerable time savings: only 23% of the articles were assessed by the reviewer.

Conclusion The AI tool is a promising innovation for the current systematic reviewing practice, as long as it is appropriately used and methodological quality can be assured.

PROSPERO registration number CRD42022283952.

  • systematic review
  • statistics & research methods
  • information technology

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:  https://creativecommons.org/licenses/by/4.0/ .

https://doi.org/10.1136/bmjopen-2023-072254


Strengths and limitations of this study

  • Potential pitfalls regarding the use of artificial intelligence in systematic reviewing were identified.
  • Remedies for each pitfall were provided to ensure methodological quality.
  • A time-efficient approach is suggested on how to conduct a transparent and reliable systematic review using an artificial intelligence tool.
  • The artificial intelligence tool described in the paper was not evaluated for its accuracy.

Medical-scientific research output has grown exponentially since the very first medical papers were published. 1–3 The output in the field of clinical medicine has increased and keeps doing so. 4 To illustrate, a quick PubMed search for ‘cardiology’ shows a fivefold increase in annual publications from 10 420 (2007) to 52 537 (2021). Although the medical-scientific output growth rate is not higher than that of other scientific fields, 1–3 this field creates the largest output. 3 Staying updated by reading all published articles is therefore not feasible. However, systematic reviews facilitate up-to-date and accessible summaries of evidence, as they synthesise previously published results in a transparent and reproducible manner. 5 6 Hence, conclusions can be drawn that provide the highest considered level of evidence in medical research. 5 7 Therefore, systematic reviews are not only crucial in science, but they also have a large impact on clinical practice and policy-making. 6 They are, however, highly labour-intensive to conduct due to the necessity of screening a large number of articles, which results in a high consumption of research resources. Thus, efficient and innovative reviewing methods are desired. 8

An open-source artificial intelligence (AI) tool ‘ASReview’ 9 was published in 2021 to facilitate the title and abstract screening process in systematic reviews. The tool enables researchers to conduct more efficient systematic reviews: simulations have already shown its time-saving potential. 9–11 We used the tool in the study selection of our own systematic review and came across scenarios that needed consideration to prevent loss of methodological quality. In this communication paper, we provide a reliable and transparent AI-supported systematic reviewing approach.

We first describe how the AI tool was used in a systematic review conducted by our research group. For more detailed information regarding searches and eligibility criteria of the review, we refer to the protocol (PROSPERO registry: CRD42022283952). Subsequently, when deciding on the AI screening-related methodology, we applied appropriate remedies against foreseen scenarios and their pitfalls to maintain a reliable and transparent approach. These potential scenarios, pitfalls and remedies will be discussed in the Results section.

In our systematic review, the AI tool ‘ASReview’ (V.0.17.1) 9 was used for the screening of titles and abstracts by the first reviewer (SHBvD). The tool uses an active researcher-in-the-loop machine learning algorithm to rank the articles from high to low probability of eligibility for inclusion by text mining. The AI tool offers several classifier models by which the relevancy of the included articles can be determined. 9 In a simulation study using six large systematic review datasets on various topics, a naïve Bayes (NB) classifier combined with term frequency-inverse document frequency (TF-IDF) feature extraction outperformed other model settings. 10 The NB classifier estimates the probability of an article being relevant based on TF-IDF measurements. TF-IDF measures how distinctive a certain word is for an article relative to the total number of articles the word appears in. 12 This combination of NB and TF-IDF was chosen for our systematic review.
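To make the model settings concrete, the following minimal sketch shows how TF-IDF features and an NB classifier can rank unlabelled titles and abstracts by predicted relevance. It uses scikit-learn rather than ASReview's internal code, and the toy texts and variable names are invented for illustration:

```python
# Minimal sketch of NB + TF-IDF relevance ranking (illustrative, not ASReview's
# implementation). The labelled texts stand in for the reviewer's decisions so far.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

labelled_texts = [
    "copd telemonitoring reduces hospitalisations",   # labelled relevant
    "copd self-management improves quality of life",  # labelled relevant
    "crop rotation effects on soil nitrogen",         # labelled irrelevant
]
labels = [1, 1, 0]
unlabelled_texts = [
    "home monitoring of chronic respiratory patients",
    "deep learning for galaxy classification",
]

vectorizer = TfidfVectorizer()
X_labelled = vectorizer.fit_transform(labelled_texts)   # TF-IDF feature space
X_unlabelled = vectorizer.transform(unlabelled_texts)

model = MultinomialNB().fit(X_labelled, labels)

# Rank the unlabelled articles from highest to lowest probability of relevance;
# the top-ranked article would be proposed to the reviewer next.
scores = model.predict_proba(X_unlabelled)[:, 1]
for score, text in sorted(zip(scores, unlabelled_texts), reverse=True):
    print(f"{score:.3f}  {text}")
```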

Before the AI tool can be used for the screening of relevant articles, its algorithm needs training with at least one relevant and one irrelevant article (ie, prior knowledge). It is assumed that the more prior knowledge, the better the algorithm is trained at the start of the screening process, and the faster it will identify relevant articles. 9 In our review, the prior knowledge consisted of three relevant articles 13–15 selected from a systematic review on the topic 16 and three randomly picked irrelevant articles.

After training with the prior knowledge, the AI tool made a first ranking of all unlabelled articles (ie, articles not yet decided on eligibility) from highest to lowest probability of being relevant. The first reviewer read the title and abstract of the number one ranked article and made a decision (‘relevant’ or ‘irrelevant’) following the eligibility criteria. Next, the AI tool took into account this additional knowledge and made a new ranking. Again, the next top ranked article was proposed to the reviewer, who made a decision regarding eligibility. This process of AI making rankings and the reviewer making decisions, which is also called ‘researcher-in-the-loop’, was repeated until the predefined data-driven stopping criterion of – in our case – 100 subsequent irrelevant articles was reached. After the reviewer rejected what the AI tool puts forward as ‘most probably relevant’ a hundred times, it was assumed that there were no relevant articles left in the unseen part of the dataset.
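This loop can be summarised in a compact sketch of the researcher-in-the-loop procedure with the consecutive-irrelevant stopping criterion. All names here are hypothetical (this is not the ASReview API), and the toy ranking function stands in for the NB/TF-IDF model:

```python
# Illustrative researcher-in-the-loop sketch: after every decision the model is
# retrained on all labels so far and the top-ranked unseen article is proposed
# next; screening stops after `stop_after` consecutive irrelevant decisions
# (100 in the review described above).
from typing import Callable, Dict, List

def screen_with_ai(
    pool: List[str],
    prior: Dict[str, int],          # prelabelled articles: 1 relevant, 0 irrelevant
    rank: Callable[[Dict[str, int], List[str]], str],  # retrain + return top article
    decide: Callable[[str], int],   # the reviewer's decision for one article
    stop_after: int = 100,
) -> Dict[str, int]:
    labels = dict(prior)
    streak = 0                      # consecutive irrelevant decisions
    while pool and streak < stop_after:
        best = rank(labels, pool)   # model's most probably relevant unseen article
        pool.remove(best)
        labels[best] = decide(best)
        streak = 0 if labels[best] == 1 else streak + 1
    return labels                   # all decisions; unseen pool items stay unlabelled

# Toy stand-in for the classifier: score articles by word overlap with the
# articles labelled relevant so far.
def toy_rank(labels: Dict[str, int], pool: List[str]) -> str:
    relevant_words = {w for a, lab in labels.items() if lab == 1 for w in a.split()}
    return max(pool, key=lambda a: len(relevant_words & set(a.split())))
```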

The articles that were labelled relevant during the title and abstract screening were each screened on full text independently by two reviewers (SHBvD and MGJB-K, AL, JvdP, CJMD, CCB) to minimise the influence of subjectivity on inclusion. Disagreements regarding inclusion were solved by a third independent reviewer.

How to maintain reliability and transparency when using AI in title and abstract screening

A summary of the potential scenarios, and their pitfalls and remedies, when using the AI tool in a systematic review is given in table 1 . These potential scenarios should not be ignored, but acted on to maintain reliability and transparency. Figure 1 shows when and where to act on during the screening process reflected by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart, 17 from literature search results to publishing the review.


Figure 1: Flowchart showing when and where to act when using ASReview in systematic reviewing. Adapted from the PRISMA flowchart by Haddaway et al. 17


Table 1: Per-scenario overview of potential pitfalls and how to prevent these when using ASReview in a systematic review

In our systematic review, by means of broad literature searches in several scientific databases, a first set of potentially relevant articles was identified, yielding 8456 articles, enough to expect the AI tool to be efficient in the title and abstract screening (scenario ① was avoided, see table 1 ). Subsequently, this complete set of articles was uploaded in reference manager EndNote X9 18 and review manager Covidence, 19 where 3761 duplicate articles were removed. Given that EndNote has quite low sensitivity in identifying duplicates, additional deduplication in Covidence was considered beneficial. 20 Deduplication is usually applied in systematic reviewing, 20 but is increasingly important prior to the use of AI. Since multiple decisions regarding a duplicate article weigh more than one, this will disproportionately influence classification and possibly the results ( table 1 , scenario ② ). In our review, a deduplicated set of articles was uploaded in the AI tool. Prior to the actual AI-supported title and abstract screening, the reviewers (SHBvD and AL, MGJB-K) trained themselves with a small selection of 74 articles. The first reviewer became familiar with the ASReview software, and all three reviewers learnt how to apply the eligibility criteria, to minimise personal influence on the article selection ( table 1 , scenario ③ ).

Defining the stopping criterion used in the screening process is left to the reviewer. 9 An optimal stopping criterion in active learning is considered a perfectly balanced trade-off between a certain cost (in terms of time spent) of screening one more article versus the predictive performance (in terms of identifying a new relevant article) that could be increased by adding one more decision. 21 The optimal stopping criterion in systematic reviewing would be the moment that screening additional articles will not result in more relevant articles being identified. 22 Therefore, in our review, we predetermined a data-driven stopping criterion for the title and abstract screening as ‘100 consecutive irrelevant articles’ in order to prevent the screening from being stopped before or a long time after all relevant articles were identified ( table 1 , scenario ④ ).

Because the stopping criterion was reached after 1063 of the 4695 articles, only a part of the total number of articles was seen. Therefore, this approach might be sensitive to possible mistakes when articles are screened by only one reviewer, influencing the algorithm and possibly resulting in an incomplete selection of articles ( table 1 , scenario ③ ). 23 As a remedy, second reviewers (AL, MGJB-K) checked 20% of the titles and abstracts seen by the first reviewer. This 20% had a ratio of relevant versus irrelevant articles comparable to that across all articles seen. The percentual agreement and Cohen’s kappa (κ), a measure of inter-reviewer agreement above chance, were calculated to express the reliability of the decisions taken. 24 The decisions agreed in 96% of cases and κ was 0.83. A κ of at least 0.6 is generally considered high, 24 so it was assumed that the algorithm was reliably trained by the first reviewer.
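For readers who want to reproduce this kind of check, both statistics can be computed directly from the paired decisions on the double-screened subset. The decision vectors below are invented toy data, not the review's:

```python
# Percentual agreement and Cohen's kappa for two reviewers' decisions
# (1 = relevant, 0 = irrelevant) on the same articles; toy data shown.
first  = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]   # first reviewer
second = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]   # second reviewer

n = len(first)
p_observed = sum(a == b for a, b in zip(first, second)) / n

# Chance agreement from each reviewer's marginal label frequencies.
p1, p2 = sum(first) / n, sum(second) / n
p_chance = p1 * p2 + (1 - p1) * (1 - p2)

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"agreement = {p_observed:.0%}, kappa = {kappa:.2f}")
# sklearn.metrics.cohen_kappa_score(first, second) yields the same kappa.
```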

The reporting of the use of the AI tool should be transparent. If the choices made regarding the use of the AI tool are not entirely reported ( table 1 , scenario ⑤ ), the reader will not be able to properly assess the methodology of the review, and review results may even be graded as low-quality due to the lack of transparent reporting. The ASReview tool offers the possibility to extract a data file providing insight into all decisions made during the screening process, in contrast to various other ‘black box’ AI-reviewing tools. 9 This file will be published alongside our systematic review to provide full transparency of our AI-supported screening. This way, the screening with AI is reproducible (remedy to scenario ⑥ , table 1 ).

Results of AI-supported study selection in a systematic review

We experienced an efficient process of title and abstract screening in our systematic review. Whereas the screening started from a database of 4695 articles, the stopping criterion was reached after 1063 articles, so 23% were seen. Figure 2A shows the proportion of articles identified as relevant at any point during the AI-supported screening process. It can be observed that the articles are indeed prioritised by the active learning algorithm: in the beginning, relatively many relevant articles were found, but this decreased as the stopping criterion (vertical red line) was approached. Figure 2B compares the screening progress when using the AI tool versus manual screening. By the moment the stopping criterion was reached, approximately 32 records would have been found if the titles and abstracts had been screened manually, compared with 142 articles labelled relevant using the AI tool. After the inter-reviewer agreement check, 142 articles proceeded to the full-text reviewing phase, of which 65 were excluded because they were not articles with an original research format, and three because the full text could not be retrieved. After full-text reviewing of the remaining 74 articles, 18 articles from 13 individual studies were included in our review. After snowballing, one additional article from a study already included was added.

Figure 2: Relevant articles identified after a certain number of titles and abstracts were screened using the AI tool compared with manual screening.

In our systematic review, the AI tool considerably reduced the number of articles in the screening process. Since the AI tool is offered open source, many researchers may benefit from its time-saving potential in selecting articles. Choices in several scenarios regarding the use of AI, however, are still left open to the researcher, and need consideration to prevent pitfalls. These include the choice whether or not to use AI by weighing the costs versus the benefits, the importance of deduplication, double screening to check inter-reviewer agreement, a data-driven stopping criterion to optimally use the algorithm’s predictive performance and quality of reporting of the AI-related methodology chosen. This communication paper is, to our knowledge, the first elaborately explaining and discussing these choices regarding the application of this AI tool in an example systematic review.

The main advantage of using the AI tool is the amount of time saved. Indeed, in our study, only 23% of the total number of articles were screened before the predefined stopping criterion was met. Assuming that all relevant articles were found, the AI tool saved 77% of the time for title and abstract screening. However, time should be invested to become acquainted with the tool. Whether the expected screening time saved outweighs this time investment is context-dependent (eg, researcher’s digital skills, systematic reviewing skills, topic knowledge). An additional advantage is that research questions previously unanswerable due to the insurmountable number of articles to screen in a ‘classic’ (ie, manual) review, now actually are possible to answer. An example of the latter is a review screening over 60 000 articles, 25 which would probably never have been performed without AI supporting the article selection.

Since its introduction in 2021, the ASReview tool has been applied in seven published reviews. 25–31 An important note is that only one 25 clearly reported the AI-related choices in the methods and a complete and transparent flowchart reflecting the study selection process in the Results section. Two reviews reported a relatively small number (<400) of articles to screen, 26 27 of which more than 75% were screened before the stopping criterion was met, so the amount of time saved was limited. Also, three reviews reported many initial articles (>6000) 25 28 29 and one reported 892 articles, 31 of which only 5%–10% needed to be screened. So in these reviews, the AI tool saved an impressive amount of screening time. In our systematic review, 3% of the articles were labelled relevant during the title and abstract screening and eventually <1% of all initial articles were included. These percentages are low, and in line with the three above-mentioned reviews (1%–2% and 0%–1%, respectively). 25 28 29 Still, relevancy and inclusion rates are much lower when compared with ‘classic’ systematic reviews. A study evaluating the screening process in 25 ‘classic’ systematic reviews showed that approximately 18% of articles were labelled relevant and 5% were actually included in the reviews. 32 This difference is probably due to the more narrow literature searches conducted in ‘classic’ reviews for feasibility purposes, compared with AI-supported reviews, resulting in a higher proportion of included articles.

In this paper, we show how we applied the AI tool, but we did not evaluate it in terms of accuracy. This means that we have to deal with a certain degree of uncertainty. Despite the data-driven stopping criterion, there is a chance that relevant articles were missed, as 77% were automatically excluded. If this was the case, it could, first, be due to wrong decisions of the reviewer that would have undesirably influenced the training of the algorithm by which the articles were labelled as (ir)relevant and the order in which they were presented to the reviewer. Relevant articles could therefore have remained unseen if the stopping criterion was reached before they were presented to the reviewer. As a remedy, in our own systematic review, 20% of the articles screened by the first reviewer were also assessed for relevancy by another reviewer to assess inter-reviewer reliability, which was high. It should be noted, though, that ‘classic’ title and abstract screening is not necessarily better than using AI, as medical-scientific researchers tend to assess one out of nine abstracts wrongly. 32 Second, the AI tool may not have properly ranked articles from highly relevant to irrelevant. However, given that simulations proved this AI tool’s accuracy before, 9–11 this was not considered plausible. Since our study applied, but did not evaluate, the AI tool, we encourage future studies evaluating the performance of the tool across different scientific disciplines and contexts, since research suggests that the tool’s performance depends on the context, for example, the complexity of the research question. 33 This would not only enrich the knowledge about the AI tool, but also increase certainty about using it. Future studies should also investigate the effects of choices made regarding the amount of prior knowledge provided to the tool, the number of articles defining the stopping criterion, and how duplicate screening is best performed, to guide future users of the tool.

Although various researcher-in-the-loop AI tools for title and abstract screening have been developed over the years, 9 23 34 they often do not develop into usable mature software, 34 which impedes AI from being permanently implemented in research practice. For medical-scientific research practice, it would therefore be helpful if large systematic review institutions, like Cochrane and PRISMA, would consider ‘officially’ making AI part of systematic reviewing practice. When guidelines on the use of AI in systematic reviews are made available and widely recognised, AI-supported systematic reviews can be uniformly conducted and transparently reported. Only then can we really benefit from AI’s time-saving potential and reduce our research time waste.

Our experience with the AI tool during the title and abstract screening was positive as it has highly accelerated the literature selection process. However, users should consider applying appropriate remedies to scenarios that may form a threat to the methodological quality of the review. We provided an overview of these scenarios, their pitfalls and remedies. These encourage reliable use and transparent reporting of AI in systematic reviewing. To ensure the continuation of conducting systematic reviews in the future, and given their importance for medical guidelines and practice, we consider this tool as an important addition to the review process.

Ethics approval

Not applicable.

  • Bornmann L ,
  • Haunschild R ,
  • Michels C ,
  • Haghani M ,
  • Zwack CC , et al
  • McKenzie JE ,
  • Bossuyt PM , et al
  • Gurevitch J ,
  • Koricheva J ,
  • Nakagawa S , et al
  • Rohrich RJ ,
  • Bastian H ,
  • Glasziou P ,
  • van de Schoot R ,
  • de Bruin J ,
  • Schram R , et al
  • Ferdinands G ,
  • de Bruin J , et al
  • Ferdinands G
  • Havrlant L ,
  • Kreinovich V
  • Li Y , et al
  • Jalloul F ,
  • Ayed S , et al
  • Andrijevic I ,
  • Milutinov S ,
  • Lozanov Crvenkovic Z , et al
  • Hawkins NM ,
  • Virani SA , et al
  • Haddaway NR ,
  • Pritchard CC , et al
  • Clarivate Analytics
  • Veritas Health Innovation
  • McKeown S ,
  • Ishibashi H ,
  • Blaizot A ,
  • Veettil SK ,
  • Saidoung P , et al
  • Bernardes RC ,
  • Botina LL ,
  • Araújo R dos S , et al
  • Silva GFS ,
  • Fagundes TP ,
  • Teixeira BC , et al
  • Miranda L ,
  • Pütz B , et al
  • Schouw HM ,
  • Huisman LA ,
  • Janssen YF , et al
  • Schuengel C ,
  • Sterkenburg PS , et al
  • Procházková M ,
  • Lu J , et al
  • Lam L , et al
  • Tetzlaff J , et al
  • Marshall IJ ,

Contributors SHBvD proposed the methodology and conducted the study selection. MGJB-K, CJMD and AL critically reflected on the methodology. MGJB-K and AL contributed substantially to the study selection. CCB, JvdP and CJMD contributed to the study selection. The manuscript was primarily prepared by SHBvD and critically revised by all authors. All authors read and approved the final manuscript.

Funding The systematic review is conducted as part of the RE-SAMPLE project. RE-SAMPLE has received funding from the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 965315).

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.



  • Open access
  • Published: 01 February 2021

An open source machine learning framework for efficient and transparent systematic reviews

  • Rens van de Schoot (orcid.org/0000-0001-7736-2091) 1,
  • Jonathan de Bruin (orcid.org/0000-0002-4297-0502) 2,
  • Raoul Schram 2,
  • Parisa Zahedi (orcid.org/0000-0002-1610-3149) 2,
  • Jan de Boer (orcid.org/0000-0002-0531-3888) 3,
  • Felix Weijdema (orcid.org/0000-0001-5150-1102) 3,
  • Bianca Kramer (orcid.org/0000-0002-5965-6560) 3,
  • Martijn Huijts (orcid.org/0000-0002-8353-0853) 4,
  • Maarten Hoogerwerf (orcid.org/0000-0003-1498-2052) 2,
  • Gerbrich Ferdinands (orcid.org/0000-0002-4998-3293) 1,
  • Albert Harkema (orcid.org/0000-0002-7091-1147) 1,
  • Joukje Willemsen (orcid.org/0000-0002-7260-0828) 1,
  • Yongchao Ma (orcid.org/0000-0003-4100-5468) 1,
  • Qixiang Fang (orcid.org/0000-0003-2689-6653) 1,
  • Sybren Hindriks 1,
  • Lars Tummers (orcid.org/0000-0001-9940-9874) 5 &
  • Daniel L. Oberski (orcid.org/0000-0001-7467-2297) 1,6

Nature Machine Intelligence, volume 3, pages 125–133 (2021)


  • Computational biology and bioinformatics
  • Computer science
  • Medical research

A preprint version of the article is available at arXiv.

To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.


With the emergence of online publishing, the number of scientific manuscripts on many topics is skyrocketing 1 . All of these textual data present opportunities to scholars and practitioners while simultaneously confronting them with new challenges. Scholars often develop systematic reviews and meta-analyses to provide comprehensive overviews of the relevant topics 2 . The process entails several explicit and, ideally, reproducible steps, including identifying all likely relevant publications in a standardized way, extracting data from eligible studies and synthesizing the results. Systematic reviews differ from traditional literature reviews in that they are more replicable and transparent 3 , 4 . Such systematic overviews of literature on a specific topic are pivotal not only for scholars, but also for clinicians, policy-makers, journalists and, ultimately, the general public 5 , 6 , 7 .

Given that screening the entire research literature on a given topic is too labour intensive, scholars often develop quite narrow searches. Developing a search strategy for a systematic review is an iterative process aimed at balancing recall and precision 8 , 9 ; that is, including as many potentially relevant studies as possible while simultaneously limiting the total number of studies retrieved. The vast number of publications in the field of study often leads to a relatively precise search, with the risk of missing relevant studies. The process of systematic reviewing is error prone and extremely time intensive 10 . In fact, if the literature of a field is growing faster than the amount of time available for systematic reviews, adequate manual review of this field then becomes impossible 11 .

The rapidly evolving field of machine learning has aided researchers by allowing the development of software tools that assist in developing systematic reviews 11 , 12 , 13 , 14 . Machine learning offers approaches to overcome the manual and time-consuming screening of large numbers of studies by prioritizing relevant studies via active learning 15 . Active learning is a type of machine learning in which a model can choose the data points (for example, records obtained from a systematic search) it would like to learn from and thereby drastically reduce the total number of records that require manual screening 16 , 17 , 18 . In most so-called human-in-the-loop 19 machine-learning applications, the interaction between the machine-learning algorithm and the human is used to train a model with a minimum number of labelling tasks. Unique to systematic reviewing is that not only do all relevant records (that is, titles and abstracts) need to be seen by a researcher, but an extremely diverse range of concepts also needs to be learned, thereby requiring flexibility in the modelling approach as well as careful error evaluation 11 . In the case of systematic reviewing, the algorithm(s) are interactively optimized for finding the most relevant records, instead of finding the most accurate model. The term researcher-in-the-loop was introduced 20 as a special case of human-in-the-loop with three unique components: (1) the primary output of the process is a selection of the records, not a trained machine learning model; (2) all records in the relevant selection are seen by a human at the end of the process 21 ; (3) the use case requires a reproducible workflow and complete transparency 22 .

Existing tools that implement such an active learning cycle for systematic reviewing are described in Table 1 ; see the Supplementary Information for an overview of all of the software that we considered (note that this list was based on a review of software tools 12 ). However, existing tools have two main drawbacks. First, many are closed source applications with black box algorithms, which is problematic as transparency and data ownership are essential in the era of open science 22 . Second, to our knowledge, existing tools lack the necessary flexibility to deal with the large range of possible concepts to be learned by a screening machine. For example, in systematic reviews, the optimal type of classifier will depend on variable parameters, such as the proportion of relevant publications in the initial search and the complexity of the inclusion criteria used by the researcher 23 . For this reason, any successful system must allow for a wide range of classifier types. Benchmark testing is crucial to understand the real-world performance of any machine learning-aided system, but such benchmark options are currently mostly lacking.

In this paper we present an open source machine learning-aided pipeline with active learning for systematic reviews called ASReview. The goal of ASReview is to help scholars and practitioners to get an overview of the most relevant records for their work as efficiently as possible while being transparent in the process. The open, free and ready-to-use software ASReview addresses all concerns mentioned above: it is open source, uses active learning and allows multiple machine learning models. It also has a benchmark mode, which is especially useful for comparing and designing algorithms. Furthermore, it is intended to be easily extensible, allowing third parties to add modules that enhance the pipeline. Although we focus this paper on systematic reviews, ASReview can handle any text source.

In what follows, we first present the pipeline for manual versus machine learning-aided systematic reviews. We then show how ASReview has been set up and how ASReview can be used in different workflows by presenting several real-world use cases. We subsequently demonstrate the results of simulations that benchmark performance and present the results of a series of user-experience tests. Finally, we discuss future directions.

Pipeline for manual and machine learning-aided systematic reviews

The pipeline of a systematic review without active learning traditionally starts with researchers conducting a comprehensive search in multiple databases 24 , using free-text words as well as controlled vocabulary to retrieve potentially relevant references. The researcher then typically verifies that the key papers they expect to find are indeed included in the search results. The researcher downloads a file with records containing the text to be screened; in the case of systematic reviewing, it contains the titles and abstracts (and potentially other metadata such as the authors' names, journal name and DOI) of potentially relevant references, which is loaded into a reference manager. Ideally, two or more researchers then screen the records' titles and abstracts on the basis of the eligibility criteria established beforehand 4 . After all records have been screened, the full texts of the potentially relevant records are read to determine which of them will ultimately be included in the review. Most records are excluded in the title and abstract phase. Typically, only a small fraction of the records belong to the relevant class, making title and abstract screening an important bottleneck in the systematic reviewing process 25 . For instance, a recent study analysed 10,115 records and excluded 9,847 after title and abstract screening, a drop of more than 95% 26 . ASReview therefore focuses on this labour-intensive step.

The research pipeline of ASReview is depicted in Fig. 1 . The researcher starts with a search exactly as described above and subsequently uploads a file containing the records (that is, metadata containing the text of the titles and abstracts) into the software. Prior knowledge is then selected, which is used for training of the first model and presenting the first record to the researcher. As screening is a binary classification problem, the reviewer must select at least one key record to include and exclude on the basis of background knowledge. More prior knowledge may result in improved efficiency of the active learning process.

Figure 1: The ASReview pipeline. The symbols indicate whether the action is taken by a human, a computer, or whether both options are available.

A machine learning classifier is trained to predict study relevance (labels) from a representation of the record-containing text (feature space) on the basis of prior knowledge. We have purposefully chosen not to include an author name or citation network representation in the feature space to prevent authority bias in the inclusions. In the active learning cycle, the software presents one new record to be screened and labelled by the user. The user's binary label (1 for relevant versus 0 for irrelevant) is subsequently used to train a new model, after which a new record is presented to the user. This cycle continues until a certain user-specified stopping criterion has been reached. The user then has a file with (1) records labelled as either relevant or irrelevant and (2) unlabelled records ordered from most to least probable to be relevant as predicted by the current model. This set-up helps to move through a large database much more quickly than in the manual process, while the decision process simultaneously remains transparent.

Software implementation for ASReview

The source code 27 of ASReview is available open source under an Apache 2.0 license, including documentation 28 . Compiled and packaged versions of the software are available on the Python Package Index 29 or Docker Hub 30 . The free and ready-to-use software ASReview implements oracle, simulation and exploration modes. The oracle mode is used to perform a systematic review with interaction by the user, the simulation mode is used for simulation of the ASReview performance on existing datasets, and the exploration mode can be used for teaching purposes and includes several preloaded labelled datasets.

The oracle mode presents records to the researcher, who classifies them. Multiple file formats are supported: (1) RIS files, as used by digital libraries such as IEEE Xplore, Scopus and ScienceDirect; the citation managers Mendeley, RefWorks, Zotero and EndNote support the RIS format too. (2) Tabular datasets with the .csv, .xlsx and .xls file extensions. CSV files should be comma separated and UTF-8 encoded; for CSV files, the software accepts a set of predetermined labels in line with the ones used in RIS files. Each record in the dataset should hold the metadata on, for example, a scientific publication. Mandatory metadata is text and can, for example, be titles or abstracts from scientific papers. If available, both are used to train the model, but at least one is needed. An advanced option is available that splits the titles and abstracts in the feature-extraction step and weights the two feature matrices independently (for TF–IDF only). Other metadata such as author, date, DOI and keywords are optional but not used for training the models. When using ASReview in the simulation or exploration mode, an additional binary variable is required to indicate historical labelling decisions. This column, which is automatically detected, can also be used in the oracle mode as background knowledge for a previous selection of relevant papers before entering the active learning cycle. If unavailable, the user has to select at least one relevant record, which can be identified by searching the pool of records. At least one irrelevant record should also be identified; the software allows the user to search for specific records or presents random records that are most likely to be irrelevant given the extremely imbalanced data.
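As a concrete illustration of the tabular input, the sketch below writes a minimal labelled CSV dataset. The column names ('title', 'abstract', 'included') are illustrative assumptions; check the ASReview documentation for the exact headers your version accepts:

```python
# Hypothetical minimal input file for simulation mode: title and abstract drive
# the model; the binary label column records historical screening decisions.
import csv

records = [
    {"title": "Telemonitoring in COPD", "abstract": "A randomised trial of ...", "included": 1},
    {"title": "Soil nitrogen dynamics", "abstract": "We measured ...", "included": 0},
]
with open("labelled_dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "abstract", "included"])
    writer.writeheader()
    writer.writerows(records)
```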

The software has a simple yet extensible default model: a naive Bayes classifier, TF–IDF feature extraction, a dynamic resampling balance strategy 31 and certainty-based sampling 17 , 32 for the query strategy. These defaults were chosen on the basis of their consistently high performance in benchmark experiments across several datasets 31 . Moreover, the low computation time of these default settings makes them attractive in applications, given that the software should be able to run locally. Users can change the settings, shown in Table 2 , and technical details are described in our documentation 28 . Users can also add their own classifiers, feature extraction techniques, query strategies and balance strategies.

ASReview has a number of implemented features (see Table 2 ). First, there are several classifiers available: (1) naive Bayes; (2) support vector machines; (3) logistic regression; (4) neural networks; (5) random forests; (6) LSTM-base, which consists of an embedding layer, an LSTM layer with one output, a dense layer and a single sigmoid output node; and (7) LSTM-pool, which consists of an embedding layer, an LSTM layer with many outputs, a max pooling layer and a single sigmoid output node. The feature extraction techniques available are Doc2Vec 33 , embedding LSTM, embedding with IDF or TF–IDF 34 (the default is unigram, with the option to run n -grams while other parameters are set to the defaults of Scikit-learn 35 ) and sBERT 36 . The available query strategies for the active learning part are (1) random selection, ignoring model-assigned probabilities; (2) uncertainty-based sampling, which chooses the most uncertain record according to the model (that is, closest to 0.5 probability); (3) certainty-based sampling (max in ASReview), which chooses the record most likely to be included according to the model; and (4) mixed sampling, which uses a combination of random and certainty-based sampling.
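The four query strategies can be summarised in a few lines. This sketch (illustrative, not ASReview's internal code) operates on the model-assigned relevance probabilities of the unlabelled pool:

```python
# Sketches of the four query strategies described above; `probabilities` holds
# the model's predicted relevance for each unlabelled record.
import random

def query(probabilities, strategy="max", mix_ratio=0.95):
    idx = range(len(probabilities))
    if strategy == "random":        # ignore the model entirely
        return random.choice(list(idx))
    if strategy == "uncertainty":   # most uncertain: closest to 0.5
        return min(idx, key=lambda i: abs(probabilities[i] - 0.5))
    if strategy == "max":           # certainty-based: most likely relevant
        return max(idx, key=lambda i: probabilities[i])
    if strategy == "mixed":         # mostly certainty-based, occasionally random
        chosen = "max" if random.random() < mix_ratio else "random"
        return query(probabilities, chosen)
    raise ValueError(strategy)
```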

There are several balance strategies that rebalance and reorder the training data. This is necessary because the data are typically extremely imbalanced. We have implemented the following balance strategies: (1) full sampling, which uses all of the labelled records; (2) undersampling the irrelevant records so that the included and excluded records are in some particular ratio (closer to one); and (3) dynamic resampling, a novel method similar to undersampling in that it decreases the imbalance of the training data 31 . However, in dynamic resampling, the number of irrelevant records is decreased, whereas the number of relevant records is increased by duplication such that the total number of records in the training data remains the same. The ratio between relevant and irrelevant records is not fixed over iterations, but dynamically updated depending on the number of labelled records, the total number of records and the ratio between relevant and irrelevant records. Details on all of the described algorithms can be found in the code and documentation referred to above.
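A simplified sketch of the idea behind dynamic resampling follows. The real strategy updates the target ratio dynamically from the numbers of labelled and total records; this toy version replaces that with a fixed parameter:

```python
# Simplified sketch of dynamic resampling: subsample the irrelevant records and
# duplicate the relevant ones so the training set keeps its original size but a
# higher relevant share. The target_ratio is a stand-in for ASReview's
# dynamically computed ratio.
import random

def dynamic_resample(relevant, irrelevant, target_ratio=0.5, seed=0):
    """Return a training sample of the same total size with a boosted relevant share."""
    if not relevant or not irrelevant:
        return list(relevant) + list(irrelevant)
    rng = random.Random(seed)
    total = len(relevant) + len(irrelevant)
    n_rel = max(1, int(total * target_ratio))   # relevant count after duplication
    n_irr = total - n_rel                       # irrelevant count after subsampling
    rel = [relevant[i % len(relevant)] for i in range(n_rel)]
    irr = rng.sample(list(irrelevant), min(n_irr, len(irrelevant)))
    return rel + irr
```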

By default, ASReview converts the records' texts into a document-term matrix, terms are converted to lowercase and no stop words are removed (although this can be changed). As the document-term matrix is identical in each iteration of the active learning cycle, it is generated in advance of model training and stored in the (active learning) state file. Each row of the document-term matrix can easily be requested from the state file. Records are internally identified by their row number in the input dataset. In oracle mode, the record that is selected to be classified is retrieved from the state file and the record text and other metadata (such as title and abstract) are retrieved from the original dataset (from the file or the computer's memory). ASReview can run on your local computer, or on a (self-hosted) local or remote server. Data (all records and their labels) remain on the user's computer. Data ownership and confidentiality are crucial and no data are processed or used in any way by third parties. This is unique in comparison with some of the existing systems, as shown in the last column of Table 1 .

Real-world use cases and high-level function descriptions

Below we highlight a number of real-world use cases and high-level function descriptions for using the pipeline of ASReview.

ASReview can be integrated in classic systematic reviews or meta-analyses. Such reviews or meta-analyses entail several explicit and reproducible steps, as outlined in the PRISMA guidelines 4 . Scholars identify all likely relevant publications in a standardized way, screen retrieved publications to select eligible studies on the basis of defined eligibility criteria, extract data from eligible studies and synthesize the results. ASReview fits into this process, particularly in the abstract screening phase. ASReview does not replace the initial step of collecting all potentially relevant studies. As such, results from ASReview depend on the quality of the initial search process, including the selection of databases 24 and the construction of comprehensive searches using keywords and controlled vocabulary. However, ASReview can be used to broaden the scope of the search (by keyword expansion or omitting limitations in the search query), resulting in a higher number of initial papers, to limit the risk of missing relevant papers during the search part (that is, more focus on recall instead of precision).

Furthermore, many reviewers nowadays move towards meta-reviews when analysing very large literature streams, that is, systematic reviews of systematic reviews 37 . This can be problematic as the various reviews included could use different eligibility criteria and are therefore not always directly comparable. Due to the efficiency of ASReview, scholars using the tool could conduct the study by analysing the papers directly instead of using the systematic reviews. Furthermore, ASReview supports the rapid update of a systematic review. The included papers from the initial review are used to train the machine learning model before screening of the updated set of papers starts. This allows the researcher to quickly screen the updated set of papers on the basis of decisions made in the initial run.

As an example case, let us look at the current literature on COVID-19 and the coronavirus. An enormous number of papers are being published on COVID-19. It is very time consuming to manually find relevant papers (for example, to develop treatment guidelines). This is especially problematic as urgent overviews are required. Medical guidelines rely on comprehensive systematic reviews, but the medical literature is growing at breakneck pace and the quality of the research is not universally adequate for summarization into policy 38 . Such reviews must entail adequate protocols with explicit and reproducible steps, including identifying all potentially relevant papers, extracting data from eligible studies, assessing potential for bias and synthesizing the results into medical guidelines. Researchers need to screen (tens of) thousands of COVID-19-related studies by hand to find relevant papers to include in their overview. Using ASReview, this can be done far more efficiently by selecting key papers that match their (COVID-19) research question in the first step; this starts the active learning cycle and leads to the most relevant COVID-19 papers for their research question being presented next. A plug-in was therefore developed for ASReview 39 , which contains three databases that are updated automatically whenever a new version is released by the owners of the data: (1) the CORD-19 database, developed by the Allen Institute for AI, covering publications on COVID-19 and other coronavirus research (for example, SARS and MERS) from PubMed Central, the WHO COVID-19 database of publications, the preprint servers bioRxiv and medRxiv, and papers contributed by specific publishers 40 . The CORD-19 dataset is updated daily by the Allen Institute for AI and also updated daily in the plug-in. (2) In addition to the full dataset, we automatically construct a daily subset of the database with studies published after December 1st, 2019 to search for relevant papers published during the COVID-19 crisis. (3) A separate dataset of COVID-19-related preprints, containing metadata of preprints from over 15 preprint servers across disciplines, published since January 1st, 2020 41 . The preprint dataset is updated weekly by the maintainers and then automatically updated in ASReview as well. As this dataset is not readily available to researchers through regular search engines (for example, PubMed), its inclusion in ASReview provides added value to researchers interested in COVID-19 research, especially if they want a quick way to screen preprints specifically.

Simulation study

To evaluate the performance of ASReview on a labelled dataset, users can employ the simulation mode. As an example, we ran simulations based on four labelled datasets with version 0.7.2 of ASReview. All scripts to reproduce the results in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 , whereas the results are available at OSF ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 .
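Before turning to the datasets, it may help to see what such a simulation amounts to. The sketch below replays the screening of a fully labelled dataset in plain Python with scikit-learn, rather than through ASReview's own interface; the choice of TF-IDF features, a naive Bayes classifier and a certainty-based query rule mirrors common defaults but is a simplifying assumption here, as is the requirement that the labels contain at least one inclusion and one exclusion.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def simulate_screening(texts, labels, seed=42):
    """Replay screening of a fully labelled dataset and return the order
    in which a simulated reviewer would see the records."""
    rng = np.random.default_rng(seed)
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    y = np.asarray(labels)
    # Prior knowledge: one random inclusion and one random exclusion,
    # mirroring the set-up of the simulations reported below.
    screened = [int(rng.choice(np.flatnonzero(y == 1))),
                int(rng.choice(np.flatnonzero(y == 0)))]
    pool = [i for i in range(len(y)) if i not in screened]
    while pool:
        # Retrain on everything labelled so far; the stored labels act
        # as the oracle that a human reviewer would otherwise provide.
        clf = MultinomialNB().fit(X[screened], y[screened])
        scores = clf.predict_proba(X[pool])[:, 1]
        screened.append(pool.pop(int(np.argmax(scores))))  # certainty-based query
    return screened
```

Running this loop on a dataset with known inclusions yields a screening order from which the performance metrics described below can be computed directly.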

First, we analysed the performance for a study that systematically described studies performing viral metagenomic next-generation sequencing in common livestock such as cattle, small ruminants, poultry and pigs 44 . Studies were retrieved from Embase (n = 1,806), Medline (n = 1,384), Cochrane Central (n = 1), Web of Science (n = 977) and Google Scholar (n = 200, the top relevant references). After deduplication, 2,481 unique studies remained from the initial search, of which 120 were inclusions (4.84%).

A second simulation study was performed on the results of a systematic review of studies on fault prediction in software engineering 45 . Studies were obtained from the ACM Digital Library, IEEE Xplore and the ISI Web of Science. Furthermore, a snowballing strategy and a manual search were conducted, yielding a total of 8,911 publications, of which 104 were included in the systematic review (1.2%).

A third simulation study was performed on a review of longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure 46 , 47 ; 5,782 studies were obtained by searching PubMed, Embase, PsycINFO and Scopus and through a snowballing strategy in which both the references and the citations of the included papers were screened. Thirty-eight studies were included in the review (0.66%).

A fourth simulation study was performed on the results for a systematic review on the efficacy of angiotensin-converting enzyme inhibitors, from a study collecting various systematic review datasets from the medical sciences 15 . The collection is a subset of 2,544 publications from the TREC 2004 Genomics Track document corpus 48 . This is a static subset from all MEDLINE records from 1994 through 2003, which allows for replicability of results. Forty-one publications were included in the review (1.6%).

Performance metrics

We evaluated the four datasets using three performance metrics. First, we assessed the work saved over sampling (WSS), which is the percentage reduction in the number of records that need to be screened when using active learning instead of screening records at random. WSS is measured at a given level of recall of relevant records, for example 95% (WSS@95%), indicating the reduction in screening effort at the cost of failing to detect 5% of the relevant records. For some researchers it is essential that all relevant literature on the topic is retrieved, which requires a recall of 100% (that is, WSS@100%). We also propose the proportion of relevant references found after screening the first 10% of the records (RRF10%). This is a useful metric for getting a quick overview of the relevant literature.
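Under one common formalization, both metrics can be computed directly from the screening order produced by a simulation run. The sketch below is a minimal Python rendering of the definitions given above, with WSS expressed relative to random screening (which on average reaches recall r after screening a fraction r of the records); the function names are our own, not ASReview's API.

```python
import numpy as np

def wss(order, labels, recall=0.95):
    """Work saved over sampling at the given recall level, computed from
    the screening order produced by a simulation run."""
    y = np.asarray(labels)
    found = np.cumsum(y[order])              # inclusions found after each record
    target = np.ceil(recall * y.sum())       # inclusions needed for this recall
    n_screened = int(np.argmax(found >= target)) + 1
    return recall - n_screened / len(y)      # reduction vs. random screening

def rrf(order, labels, fraction=0.10):
    """Proportion of relevant records found after screening the first
    `fraction` of all records (RRF10% for fraction=0.10)."""
    y = np.asarray(labels)
    n = int(np.ceil(fraction * len(y)))
    return np.cumsum(y[order])[n - 1] / y.sum()
```

Given the order returned by a simulation run, wss(order, labels) and rrf(order, labels) then yield the two summary numbers reported below for a single run.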

For every dataset, 15 runs were performed, each starting from one random inclusion and one random exclusion (see Fig. 2). The classical review performance with randomly found inclusions is shown by the dashed line. The average work saved over sampling at 95% recall for ASReview is 83%, ranging from 67% to 92% across datasets. Hence, 95% of the eligible studies will be found after screening only 8% to 33% of the studies. Furthermore, the proportion of relevant abstracts found after reading 10% of the abstracts ranges from 70% to 100%. In short, our software would have saved many hours of work.

Figure 2

a–d, Results of the simulation study for a study systematically reviewing studies that performed viral metagenomic next-generation sequencing in common livestock (a), a systematic review of studies on fault prediction in software engineering (b), longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure (c), and a systematic review on the efficacy of angiotensin-converting enzyme inhibitors (d). Fifteen runs (shown as separate lines) were performed for every dataset, each with only one random inclusion and one random exclusion. The classical review performance with randomly found inclusions is shown by the dashed lines.

Usability testing (user experience testing)

We conducted a series of user experience tests to learn from end users how they experience the software and implement it in their workflow. The study was approved by the Ethics Committee of the Faculty of Social and Behavioral Sciences of Utrecht University (ID 20-104).

Unstructured interviews

The first user experience (UX) test, carried out in December 2019, was conducted with an academic research team in a substantive research field (public administration and organizational science) that has conducted various systematic reviews and meta-analyses. The team was composed of three university professors (ranging from assistant to full) and three PhD candidates. In one 3.5 h session, the participants used the software and provided feedback via unstructured interviews and group discussions. The goal was to gather feedback on installing the software and on its performance on the team's own data. After these sessions we prioritized the feedback in a meeting with the ASReview team, which resulted in the releases v.0.4 and v.0.6. An overview of all releases can be found on GitHub 27 .

A second UX test was conducted with four experienced researchers developing medical guidelines based on classical systematic reviews, and two experienced reviewers working at a pharmaceutical non-profit organization who work on updating reviews with new data. In four sessions, held in February to March 2020, these users tested the software following our testing protocol. After each session we implemented the feedback provided by the experts and asked them to review the software again. The main feedback was about how to upload datasets and select prior papers. Their feedback resulted in the release of v.0.7 and v.0.9.

Systematic UX test

In May 2020 we conducted a systematic UX test. Two groups of users were distinguished: an inexperienced group and experienced users who had already used ASReview. Due to the COVID-19 lockdown, the usability tests were conducted via video calling, with one person giving instructions to the participant and one person observing, a set-up known as human-moderated remote testing 49 . During the tests, one person (SH) asked the questions and helped the participant with the tasks; the other person, a user experience professional at the IT department of Utrecht University (MH), observed and took notes.

To analyse the notes, thematic analysis was used, a method that divides the information into themes, each with a distinct meaning 50 , using the NVivo 12 software 51 . When something went wrong, the text was coded as a showstopper; when something did not go smoothly, it was coded as doubtful; and when something went well, it was coded as superb. The features the participants requested for future versions of the ASReview tool were discussed with the lead engineer of the ASReview team and were submitted to GitHub as issues or feature requests.

The answers to the quantitative questions can be found at the Open Science Framework 52 . The participants (N = 11) rated the tool with a grade of 7.9 (s.d. = 0.9) on a scale from one to ten (Table 2). The inexperienced users on average rated the tool with an 8.0 (s.d. = 1.1, N = 6). The experienced users on average rated the tool with a 7.8 (s.d. = 0.9, N = 5). The participants described the usability test with words such as helpful, accessible, fun, clear and obvious.

The UX tests resulted in the releases v0.10 and v0.10.1 and the major release v0.11, a substantial revision of the graphical user interface. The documentation was upgraded to make installing and launching ASReview more straightforward. We made setting up a project, selecting a dataset and finding past knowledge more intuitive and flexible, and we added a project dashboard with information on screening progress and advanced settings.

Continuous input via the open source community

Finally, the ASReview development team receives continuous feedback from the open science community about, among other things, the user experience. In every new release we implement features listed by our users. Recurring UX tests are performed to keep up with the needs of users and improve the value of the tool.

We designed a system to accelerate the step of screening titles and abstracts to help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible. Our system uses active learning to train a machine learning model that predicts relevance from texts using a limited number of labelled examples. The classifier, feature extraction technique, balance strategy and active learning query strategy are all flexible. We provide an open source software implementation, ASReview, which is competitive with state-of-the-art systems across a wide range of real-world systematic reviewing applications. Based on our experiments, ASReview ships with default parameter settings that exhibited good performance on average across the applications we examined. However, we stress that in practical applications these defaults should be carefully examined; for this purpose, the software provides a simulation mode to users. We encourage users and developers to perform further evaluation of the proposed approach in their application, and to take advantage of the open source nature of the project by contributing further developments.
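To illustrate the modularity described above, the sketch below wires interchangeable scikit-learn components into a single ranking step. The class and method names (ScreeningPipeline, next_in_line) are illustrative assumptions, not ASReview's actual plugin interface, and a balance strategy for the typically sparse relevant class is omitted for brevity.

```python
from dataclasses import dataclass
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

@dataclass
class ScreeningPipeline:
    """Bundle of interchangeable components for one active learning step."""
    extractor: object   # any object with fit_transform(), e.g. TfidfVectorizer
    classifier: object  # any scikit-learn classifier with predict_proba()

    def next_in_line(self, texts, screened_idx, decisions):
        X = self.extractor.fit_transform(texts)
        self.classifier.fit(X[screened_idx], decisions)
        scores = self.classifier.predict_proba(X)[:, 1]
        seen = set(screened_idx)
        # Certainty-based query: unscreened records, most relevant first.
        return [int(i) for i in np.argsort(-scores) if i not in seen]

# Swapping a component changes the model without touching the loop.
default_pipeline = ScreeningPipeline(TfidfVectorizer(stop_words="english"),
                                     MultinomialNB())
alternative = ScreeningPipeline(TfidfVectorizer(ngram_range=(1, 2)),
                                LogisticRegression(max_iter=1000))
```

Exchanging the feature extractor or classifier changes the model without altering the surrounding screening loop, which is the kind of flexibility the paragraph above describes.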

Drawbacks of machine learning-based screening systems, including our own, remain. First, although the active learning step greatly reduces the number of manuscripts that must be screened, it also prevents a straightforward evaluation of the system’s error rates without further onerous labelling. Providing users with an accurate estimate of the system’s error rate in the application at hand is therefore a pressing open problem. Second, although, as argued above, the use of such systems is not limited in principle to reviewing, no empirical benchmarks of actual performance in these other situations yet exist to our knowledge. Third, machine learning-based screening systems automate the screening step only; although the screening step is time-consuming and a good target for automation, it is just one part of a much larger process, including the initial search, data extraction, coding for risk of bias, summarizing results and so on. Although some other works, similar to our own, have looked at (semi-)automating some of these steps in isolation 53 , 54 , to our knowledge the field is still far removed from an integrated system that would truly automate the review process while guaranteeing the quality of the produced evidence synthesis. Integrating the various tools that are currently under development to aid the systematic reviewing pipeline is therefore a worthwhile topic for future development.

Possible future research could also focus on the performance of screening full-text articles with different document lengths and domain-specific terminologies, or even other types of text, such as newspaper articles and court cases. When the selection of past knowledge is not possible on the basis of expert knowledge, alternative methods could be explored; for example, unsupervised learning or pseudolabelling algorithms could be used to improve training 55 , 56 . In addition, as the NLP community pushes forward the state of the art in feature extraction methods, these can easily be added to our system as well. In all cases, performance benefits should be carefully evaluated using benchmarks for the task at hand. To this end, common benchmark challenges should be constructed that allow for an even comparison of the various tools now available. To facilitate such a benchmark, we have constructed a repository of publicly available systematic reviewing datasets 57 .

The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We invite the community to contribute to open source projects such as our own, as well as to common benchmark challenges, so that we can provide measurable and reproducible improvement over current practice.

Data availability

The results described in this paper are available at the Open Science Framework ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 . The answers to the quantitative questions of the UX test can be found at the Open Science Framework (OSF.IO/7PQNM) 52 .

Code availability

All code to reproduce the results described in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 . All code for the software ASReview is available under an Apache 2.0 license ( https://doi.org/10.5281/zenodo.3345592 ) 27 , is maintained on GitHub 63 and includes documentation ( https://doi.org/10.5281/zenodo.4287120 ) 28 .

Bornmann, L. & Mutz, R. Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J. Assoc. Inf. Sci. Technol. 66 , 2215–2222 (2015).


Gough, D., Oliver, S. & Thomas, J. An Introduction to Systematic Reviews (Sage, 2017).

Cooper, H. Research Synthesis and Meta-analysis: A Step-by-Step Approach (SAGE Publications, 2015).

Liberati, A. et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J. Clin. Epidemiol. 62 , e1–e34 (2009).

Boaz, A. et al. Systematic Reviews: What have They Got to Offer Evidence Based Policy and Practice? (ESRC UK Centre for Evidence Based Policy and Practice London, 2002).

Oliver, S., Dickson, K. & Bangpan, M. Systematic Reviews: Making Them Policy Relevant. A Briefing for Policy Makers and Systematic Reviewers (UCL Institute of Education, 2015).

Petticrew, M. Systematic reviews from astronomy to zoology: myths and misconceptions. Brit. Med. J. 322 , 98–101 (2001).

Lefebvre, C., Manheimer, E. & Glanville, J. in Cochrane Handbook for Systematic Reviews of Interventions (eds. Higgins, J. P. & Green, S.) 95–150 (John Wiley & Sons, 2008); https://doi.org/10.1002/9780470712184.ch6 .

Sampson, M., Tetzlaff, J. & Urquhart, C. Precision of healthcare systematic review searches in a cross-sectional sample. Res. Synth. Methods 2 , 119–125 (2011).

Wang, Z., Nayfeh, T., Tetzlaff, J., O’Blenis, P. & Murad, M. H. Error rates of human reviewers during abstract screening in systematic reviews. PLoS ONE 15 , e0227742 (2020).

Marshall, I. J. & Wallace, B. C. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst. Rev. 8 , 163 (2019).

Harrison, H., Griffin, S. J., Kuhn, I. & Usher-Smith, J. A. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med. Res. Methodol. 20 , 7 (2020).

O’Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. & Ananiadou, S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst. Rev. 4 , 5 (2015).

Wallace, B. C., Trikalinos, T. A., Lau, J., Brodley, C. & Schmid, C. H. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinf. 11 , 55 (2010).

Cohen, A. M., Hersh, W. R., Peterson, K. & Yen, P.-Y. Reducing workload in systematic review preparation using automated citation classification. J. Am. Med. Inform. Assoc. 13 , 206–219 (2006).

Kremer, J., Steenstrup Pedersen, K. & Igel, C. Active learning with support vector machines. WIREs Data Min. Knowl. Discov. 4 , 313–326 (2014).

Miwa, M., Thomas, J., O’Mara-Eves, A. & Ananiadou, S. Reducing systematic review workload through certainty-based screening. J. Biomed. Inform. 51 , 242–253 (2014).

Settles, B. Active Learning Literature Survey (Minds@UW, 2009); https://minds.wisconsin.edu/handle/1793/60660

Holzinger, A. Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. 3 , 119–131 (2016).

Van de Schoot, R. & De Bruin, J. Researcher-in-the-loop for Systematic Reviewing of Text Databases (Zenodo, 2020); https://doi.org/10.5281/zenodo.4013207

Kim, D., Seo, D., Cho, S. & Kang, P. Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf. Sci. 477 , 15–29 (2019).

Nosek, B. A. et al. Promoting an open research culture. Science 348 , 1422–1425 (2015).

Kilicoglu, H., Demner-Fushman, D., Rindflesch, T. C., Wilczynski, N. L. & Haynes, R. B. Towards automatic recognition of scientifically rigorous clinical research evidence. J. Am. Med. Inform. Assoc. 16 , 25–31 (2009).

Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta‐analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11 , 181–217 (2020).

Borah, R., Brown, A. W., Capers, P. L. & Kaiser, K. A. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open 7 , e012545 (2017).

de Vries, H., Bekkers, V. & Tummers, L. Innovation in the Public Sector: a systematic review and future research agenda. Public Adm. 94 , 146–166 (2016).

Van de Schoot, R. et al. ASReview: Active Learning for Systematic Reviews (Zenodo, 2020); https://doi.org/10.5281/zenodo.3345592

De Bruin, J. et al. ASReview Software Documentation 0.14 (Zenodo, 2020); https://doi.org/10.5281/zenodo.4287120

ASReview PyPI Package (ASReview Core Development Team, 2020); https://pypi.org/project/asreview/

Docker container for ASReview (ASReview Core Development Team, 2020); https://hub.docker.com/r/asreview/asreview

Ferdinands, G. et al. Active Learning for Screening Prioritization in Systematic Reviews—A Simulation Study (OSF Preprints, 2020); https://doi.org/10.31219/osf.io/w6qbg

Fu, J. H. & Lee, S. L. Certainty-enhanced active learning for improving imbalanced data classification. In 2011 IEEE 11th International Conference on Data Mining Workshops 405–412 (IEEE, 2011).

Le, Q. V. & Mikolov, T. Distributed representations of sentences and documents. Preprint at https://arxiv.org/abs/1405.4053 (2014).

Ramos, J. Using TF–IDF to determine word relevance in document queries. In Proc. 1st Instructional Conference on Machine Learning Vol. 242, 133–142 (ICML, 2003).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).


Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using siamese BERT-networks. Preprint at https://arxiv.org/abs/1908.10084 (2019).

Smith, V., Devane, D., Begley, C. M. & Clarke, M. Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Med. Res. Methodol. 11 , 15 (2011).

Wynants, L. et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. Brit. Med. J . 369 , 1328 (2020).

Van de Schoot, R. et al. Extension for COVID-19 Related Datasets in ASReview (Zenodo, 2020). https://doi.org/10.5281/zenodo.3891420 .

Lu Wang, L. et al. CORD-19: The COVID-19 open research dataset. Preprint at https://arxiv.org/abs/2004.10706 (2020).

Fraser, N. & Kramer, B. Covid19_preprints (FigShare, 2020); https://doi.org/10.6084/m9.figshare.12033672.v18

Ferdinands, G., Schram, R., Van de Schoot, R. & De Bruin, J. Scripts for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (Zenodo, 2020); https://doi.org/10.5281/zenodo.4024122

Ferdinands, G., Schram, R., van de Schoot, R. & de Bruin, J. Results for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (OSF, 2020); https://doi.org/10.17605/OSF.IO/2JKD6

Kwok, K. T. T., Nieuwenhuijse, D. F., Phan, M. V. T. & Koopmans, M. P. G. Virus metagenomics in farm animals: a systematic review. Viruses 12 , 107 (2020).

Hall, T., Beecham, S., Bowes, D., Gray, D. & Counsell, S. A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38 , 1276–1304 (2012).

van de Schoot, R., Sijbrandij, M., Winter, S. D., Depaoli, S. & Vermunt, J. K. The GRoLTS-Checklist: guidelines for reporting on latent trajectory studies. Struct. Equ. Model. Multidiscip. J. 24 , 451–467 (2017).


van de Schoot, R. et al. Bayesian PTSD-trajectory analysis with informed priors based on a systematic literature search and expert elicitation. Multivar. Behav. Res. 53 , 267–291 (2018).

Cohen, A. M., Bhupatiraju, R. T. & Hersh, W. R. Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In Proc. 13th Text Retrieval Conference (TREC, 2004).

Vasalou, A., Ng, B. D., Wiemer-Hastings, P. & Oshlyansky, L. Human-moderated remote user testing: protocols and applications. In 8th ERCIM Workshop, User Interfaces for All Vol. 19 (ERCIM, 2004).

Joffe, H. in Qualitative Research Methods in Mental Health and Psychotherapy: A Guide for Students and Practitioners (eds Harper, D. & Thompson, A. R.) Ch. 15 (Wiley, 2012).

NVivo v. 12 (QSR International Pty, 2019).

Hindriks, S., Huijts, M. & van de Schoot, R. Data for UX-test ASReview - June 2020. OSF https://doi.org/10.17605/OSF.IO/7PQNM (2020).

Marshall, I. J., Kuiper, J. & Wallace, B. C. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J. Am. Med. Inform. Assoc. 23 , 193–201 (2016).

Nallapati, R., Zhou, B., dos Santos, C. N., Gulcehre, Ç. & Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proc. 20th SIGNLL Conference on Computational Natural Language Learning 280–290 (Association for Computational Linguistics, 2016).

Xie, Q., Dai, Z., Hovy, E., Luong, M.-T. & Le, Q. V. Unsupervised data augmentation for consistency training. Preprint at https://arxiv.org/abs/1904.12848 (2019).

Ratner, A. et al. Snorkel: rapid training data creation with weak supervision. VLDB J. 29 , 709–730 (2020).

Systematic Review Datasets (ASReview Core Development Team, 2020); https://github.com/asreview/systematic-review-datasets

Wallace, B. C., Small, K., Brodley, C. E., Lau, J. & Trikalinos, T. A. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In Proc. 2nd ACM SIGHIT International Health Informatics Symposium 819–824 (Association for Computing Machinery, 2012).

Cheng, S. H. et al. Using machine learning to advance synthesis and use of conservation and environmental evidence. Conserv. Biol. 32 , 762–764 (2018).

Yu, Z., Kraft, N. & Menzies, T. Finding better active learners for faster literature reviews. Empir. Softw. Eng . 23 , 3161–3186 (2018).

Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan—a web and mobile app for systematic reviews. Syst. Rev. 5 , 210 (2016).

Przybyła, P. et al. Prioritising references for systematic reviews with RobotAnalyst: a user study. Res. Synth. Methods 9 , 470–488 (2018).

ASReview: Active learning for Systematic Reviews (ASReview Core Development Team, 2020); https://github.com/asreview/asreview


Acknowledgements

We would like to thank the Utrecht University Library, focus area Applied Data Science, and departments of Information and Technology Services, Test and Quality Services, and Methodology and Statistics, for their support. We also want to thank all researchers who shared data, participated in our user experience tests or who gave us feedback on ASReview in other ways. Furthermore, we would like to thank the editors and reviewers for providing constructive feedback. This project was funded by the Innovation Fund for IT in Research Projects, Utrecht University, the Netherlands.

Author information

Authors and Affiliations

Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, the Netherlands

Rens van de Schoot, Gerbrich Ferdinands, Albert Harkema, Joukje Willemsen, Yongchao Ma, Qixiang Fang, Sybren Hindriks & Daniel L. Oberski

Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Jonathan de Bruin, Raoul Schram, Parisa Zahedi & Maarten Hoogerwerf

Utrecht University Library, Utrecht University, Utrecht, the Netherlands

Jan de Boer, Felix Weijdema & Bianca Kramer

Department of Test and Quality Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Martijn Huijts

School of Governance, Faculty of Law, Economics and Governance, Utrecht University, Utrecht, the Netherlands

Lars Tummers

Department of Biostatistics, Data management and Data Science, Julius Center, University Medical Center Utrecht, Utrecht, the Netherlands

Daniel L. Oberski


Contributions

R.v.d.S. and D.O. originally designed the project, with later input from L.T. J.d.Br. is the lead engineer and software architect and supervises the code base on GitHub. R.S. coded the algorithms and simulation studies. P.Z. coded the very first version of the software. J.d.Bo., F.W. and B.K. developed the systematic review pipeline. M.Huijts led the UX tests, supported by S.H. M.Hoogerwerf developed the architecture of the produced (meta)data. G.F. conducted the simulation study together with R.S. A.H. performed the literature search comparing the different tools together with G.F. J.W. designed all the artwork and helped with formatting the manuscript. Y.M. and Q.F. are responsible for the preprocessing of the metadata under the supervision of J.d.Br. R.v.d.S., D.O. and L.T. wrote the paper with input from all authors. Each co-author has written parts of the manuscript.

Corresponding author

Correspondence to Rens van de Schoot .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Jian Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Overview of software tools supporting systematic reviews.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

van de Schoot, R., de Bruin, J., Schram, R. et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell 3 , 125–133 (2021). https://doi.org/10.1038/s42256-020-00287-7

Download citation

Received : 04 June 2020

Accepted : 17 December 2020

Published : 01 February 2021

Issue Date : February 2021

DOI : https://doi.org/10.1038/s42256-020-00287-7


This article is cited by

A systematic review, meta-analysis, and meta-regression of the prevalence of self-reported disordered eating and associated factors among athletes worldwide.

  • Hadeel A. Ghazzawi
  • Lana S. Nimer
  • Haitham Jahrami

Journal of Eating Disorders (2024)

Systematic review using a spiral approach with machine learning

  • Amirhossein Saeidmehr
  • Piers David Gareth Steel
  • Faramarz F. Samavati

Systematic Reviews (2024)

The spatial patterning of emergency demand for police services: a scoping review

  • Samuel Langton
  • Stijn Ruiter
  • Linda Schoonmade

Crime Science (2024)

The SAFE procedure: a practical stopping heuristic for active learning-based screening in systematic reviews and meta-analyses

  • Josien Boetje
  • Rens van de Schoot

Effect of capacity building interventions on classroom teacher and early childhood educator perceived capabilities, knowledge, and attitudes relating to physical activity and fundamental movement skills: a systematic review and meta-analysis

  • Matthew Bourke
  • Ameena Haddara
  • Patricia Tucker

BMC Public Health (2024)


We generate robust evidence fast

What is Silvi.ai?

Silvi is an end-to-end screening and data extraction tool supporting Systematic Literature Review and Meta-analysis.

Silvi helps create systematic literature reviews and meta-analyses that follow Cochrane guidelines in a highly reduced time frame, giving a fast and easy overview. It supports the user through the full process, from literature search to data analyses. Silvi is directly connected with databases such as PubMed and ClinicalTrials.gov and is always updated with the latest published research. It also supports RIS files, making it possible to upload search results from your favorite search engine (e.g., Ovid). Silvi has a tagging system that can be tailored to any project.

Silvi is transparent: it documents and stores the choices the user makes, and the reasons behind them. Whether you are publishing the results in a journal, sending them to an authority, or collaborating on the project with several colleagues, this transparency is essential for creating robust evidence.

Silvi is developed with the user experience in mind. The design is intuitive and easy for new users to pick up, so there is no need to become a super-user. Should questions arise anyway, we have a series of short instructional videos to get you back on track.

To see Silvi in use, watch our short introduction video.

  Short introduction video  


Learn more about Silvi’s specifications here.

"I like that I can highlight key inclusions and exclusions which makes the screening process really quick - I went through 2000+ titles and abstracts in just a few hours"

Eishaan Kamta Bhargava 

Consultant Paediatric ENT Surgeon, Sheffield Children's Hospital

"I really like how intuitive it is working with Silvi. I instantly felt like a superuser."

Henriette Kristensen

Senior Director, Ferring Pharmaceuticals

"The idea behind Silvi is great. Normally, I really dislike doing literature reviews, as they take up huge amounts of time. Silvi has made it so much easier! Thanks."

Claus Rehfeld

Senior Consultant, Nordic Healthcare Group

"AI has emerged as an indispensable tool for compiling evidence and conducting meta-analyses. Silvi.ai has proven to be the most comprehensive option I have explored, seamlessly integrating automated processes with the indispensable attributes of clarity and reproducibility essential for rigorous research practices."

Martin Södermark

M.Sc. Specialist in clinical adult psychology


Silvi.ai was founded in 2018 by Professor in Health Economic Evidence, Tove Holm-Larsen, and expert in Machine Learning, Rasmus Hvingelby. The idea for Silvi stemmed from their own research, and the need to conduct systematic literature reviews and meta-analyses faster.

The ideas behind Silvi were originally a component of a larger project. In 2016, Tove founded the group “Evidensbaseret Medicin 2.0” in collaboration with researchers from Ghent University, the Technical University of Denmark, the University of Copenhagen, and other experts. EBM 2.0 wanted to optimize evidence-based medicine to its highest potential using Big Data and Artificial Intelligence, but needed a highly skilled AI specialist.

Around this time, Tove met Rasmus, who shared the same visions. Tove teamed up with Rasmus, and Silvi.ai was created.

Our story  


       Free Trial       

No card details needed!

Purdue University


Artificial Intelligence (AI)

AI for Systematic Review


Various AI tools are invaluable throughout the systematic review or evidence synthesis process. While there is consensus that AI tools offer significant utility across different review stages, it is imperative to understand their inherent biases and weaknesses. Moreover, ethical considerations such as copyright and intellectual property must be kept at the forefront.

  • Application ChatGPT in conducting systematic reviews and meta-analyses
  • Are ChatGPT and large language models “the answer” to bringing us closer to systematic review automation?
  • Artificial intelligence in systematic reviews: promising when appropriately used
  • Harnessing the power of ChatGPT for automating systematic review process: methodology, case study, limitations, and future directions
  • In-depth evaluation of machine learning methods for semi-automating article screening in a systematic review of mechanistic
  • Tools to support the automation of systematic reviews: a scoping review
  • The use of a large language model to create plain language summaries of evidence reviews in healthcare: A feasibility study
  • Using artificial intelligence methods for systematic review in health sciences: A systematic review

AI Tools for Systematic Review

  • DistillerSR Securely automate every stage of your literature review to produce evidence-based research faster, more accurately, and more transparently at scale.
  • Rayyan A web-tool designed to help researchers working on systematic reviews, scoping reviews and other knowledge synthesis projects, by dramatically speeding up the process of screening and selecting studies.
  • RobotReviewer A machine learning system which aims to automate evidence synthesis.

A free, AI-powered research tool for scientific literature


Semantic Scholar is a free, AI-powered research tool for scientific literature, based at the Allen Institute for AI.

Searching for Systematic Reviews & Evidence Synthesis: AI tools in evidence synthesis


Introduction

A variety of AI tools can be used during the systematic review or evidence synthesis process. These may assist with developing a search strategy, locating relevant articles or resources, or with the screening, data extraction or synthesis stages. They can also be used to draft plain language summaries.

The overall consensus is that AI tools can be very useful at different stages of a systematic or other evidence review, but it is important to fully understand any biases and weaknesses they may introduce into the process. In many cases, new AI tools that previous research has not rigorously assessed should be used in conjunction with existing validated methods. It is also essential to consider ethical, copyright and intellectual property issues, for example if the process involves uploading data or the full text of articles to an AI tool.

Below are some recently published articles on the topic:

  • Alshami, A.; Elsayed, M.; Ali, E.; Eltoukhy, A.E.E.; Zayed, T. Harnessing the Power of ChatGPT for Automating Systematic Review Process: Methodology, Case Study, Limitations, and Future Directions . Systems 2023, 11, 351. https://doi.org/10.3390/systems11070351 Explores the use of ChatGPT in (1) Preparation of Boolean research terms and article collection, (2) Abstract screening and articles categorization, (3) Full-text filtering and information extraction, and (4) Content analysis to identify trends, challenges, gaps, and proposed solutions.
  • Blaizot, A, Veettil, SK, Saidoung, P, et al.  Using artificial intelligence methods for systematic review in health sciences: A systematic review.   Res Syn Meth . 2022; 13(3): 353-362. doi: 10.1002/jrsm.1553 This review delineated automated tools and platforms that employ artificial intelligence (AI) approaches and evaluated the reported benefits and challenges in using such methods. It reports the usage of Rayyan, Robot Reviewer, EPPI-Reviewer, K-means, SWIFT-Review, SWIFT-Active Screener, Abstrackr, WordStat, Qualitative Data Analysis (QDA) Miner and NLP approaches, and assesses the quality of the reviews which used these.
  • Janka H, Metzendorf M-I. High precision but variable recall – comparing the performance of five deduplication tools . JEAHIL [Internet]. 17Mar.2024 [cited 28Mar.2024];20(1):12-7. Available from: http://ojs.eahil.eu/ojs/index.php/JEAHIL/article/view/607  
  • Kebede, MM, Le Cornet, C, Fortner, RT.  In-depth evaluation of machine learning methods for semi-automating article screening in a systematic review of mechanistic literature.   Res Syn Meth . 2023; 14(2): 156-172. doi: 10.1002/jrsm.1589 "We aimed to evaluate the performance of supervised machine learning algorithms in predicting articles relevant for full-text review in a systematic review." "Implementing machine learning approaches in title/abstract screening should be investigated further toward refining these tools and automating their implementation"  

Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: a scoping review .  J Clin Epidemiol  2022;  144:  22-42  https://www.jclinepi.com/article/S0895-4356(21)00402-9/fulltext  "The current scoping review identified that LitSuggest, Rayyan, Abstractr, BIBOT, R software, RobotAnalyst, DistillerSR, ExaCT and NetMetaXL have potential to be used for the automation of systematic reviews. However, they are not without limitations. The review also identified other studies that employed algorithms that have not yet been developed into user friendly tools. Some of these algorithms showed high validity and reliability but their use is conditional on user knowledge of computer science and algorithms."

Khraisha Q, Put S, Kappenberg J, Warraitch A, Hadfield K.  Can large language models replace humans in systematic reviews? Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages .  Res Syn Meth . 2024; 1-11. doi: 10.1002/jrsm.1715 "Although our findings indicate that, currently, substantial caution should be exercised if LLMs are being used to conduct systematic reviews, they also offer preliminary evidence that, for certain review tasks delivered under specific conditions, LLMs can rival human performance."

Mahuli, S., Rai, A., Mahuli, A. et al. Application ChatGPT in conducting systematic reviews and meta-analyses . Br Dent J 235, 90–92 (2023). https://doi.org/10.1038/s41415-023-6132-y Explores using ChatGPT for conducting Risk of Bias analysis and data extraction from a randomised controlled trial.

Ovelman, C., Kugley, S., Gartlehner, G., & Viswanathan, M. (2024). The use of a large language model to create plain language summaries of evidence reviews in healthcare: A feasibility study . Cochrane Evidence Synthesis and Methods, 2(2), e12041.  https://onlinelibrary.wiley.com/doi/abs/10.1002/cesm.12041 

Qureshi, R., Shaughnessy, D., Gill, K.A.R.  et al.   Are ChatGPT and large language models “the answer” to bringing us closer to systematic review automation? .  Syst Rev   12 , 72 (2023). https://doi.org/10.1186/s13643-023-02243-z "Our experience from exploring the responses of ChatGPT suggest that while ChatGPT and LLMs show some promise for aiding in SR-related tasks, the technology is in its infancy and needs much development for such applications. Furthermore, we advise that great caution should be taken by non-content experts in using these tools due to much of the output appearing, at a high level, to be valid, while much is erroneous and in need of active vetting."

van Dijk SHB, Brusse-Keizer MGJ, Bucsán CC, et al. Artificial intelligence in systematic reviews: promising when appropriately used . BMJ Open 2023;13:e072254. doi: 10.1136/bmjopen-2023-072254  Suggests how to conduct a transparent and reliable systematic review using the AI tool ‘ASReview’ in the title and abstract screening.

An update on machine learning AI in systematic reviews

June 2023 webinar including a panel discussion exploring the use of machine learning AI in Covidence (screening & data extraction tool).

CLEAR Framework for Prompt Engineering

  • The CLEAR path: A framework for enhancing information literacy through prompt engineering. This article introduces the CLEAR Framework for Prompt Engineering, designed to optimize interactions with AI language models like ChatGPT. The framework encompasses five core principles (Concise, Logical, Explicit, Adaptive, and Reflective) that facilitate more effective AI-generated content evaluation and creation. Lo, L. S. (2023). The CLEAR path: A framework for enhancing information literacy through prompt engineering. The Journal of Academic Librarianship, 49(4), 102720.

Selection of AI tools used in Evidence Synthesis

  • Systematic Review Toolbox The Systematic Review Toolbox is an online catalogue of tools that support various tasks within the systematic review and wider evidence synthesis process.
  • Rayyan Free web-tool designed to speed up the process of screening and selecting studies
  • Abstrackr Aids in citation screening. Please note you will need to create a free account before accessing the tool.
  • DistillerSR An online application designed to automate all stages of the systematic literature reviews. Priced packages available (please note we cannot offer support on using this system).
  • ExaCT Information extraction system. The system is trained to find key information from scientific clinical trial publications, namely the descriptions of the trial's interventions, population, outcome measures, funding sources, and other critical characteristics. Please note you will need to request a free account.
  • RobotReviewer RobotReviewer is a machine learning system which aims to support evidence synthesis. The demonstration website allows users to upload RCT articles and see automatically determined information concerning the trial conduct (the 'PICO', study design, and whether there is a risk of bias).

Selection of tools to support the automation of systematic reviews (2022)

Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: a scoping review. J Clin Epidemiol. 2022 Apr;144:22-42. doi: 10.1016/j.jclinepi.2021.12.005. Epub 2021 Dec 8. PMID: 34896236.  https://www.sciencedirect.com/science/article/pii/S0895435621004029?ref=pdf_download&fr=RR-2&rr=821cfdcf2d377762#tbl0004 [accessed 06-11-23].

Summary of validated tools available for each stage of the review

(See Table 4 of Khalil et al., 2022, for the summary of validated tools available for each stage of the review.)

King’s guidance on generative AI for teaching, assessment and feedback

  • King’s guidance on generative AI for teaching, assessment and feedback Comprehensive guidance that aims to support the adoption and integration of generative AI at different institutional levels: macro (university), meso (department, programme, module), and micro (individual lecturers, especially those with assessment roles).

Leveraging GPT-4 for Systematic Reviews

Recording of 1 hour webinar exploring Artificial Intelligence (AI) and its potential impact on the process of systematic reviews (August 15th, 2023). Note PICO Portal is a systematic review platform that leverages artificial intelligence to accelerate research and innovation.

Moderator: Dr Greg Martin. Presenters: Eitan Agai, PICO Portal Founder & AI Expert; Riaz Qureshi, U. of Colorado Anschutz Medical Campus; Kevin Kallmes, Chief Executive Officer, Cofounder; Jeff Johnson, Chief Design Officer.

PAIR (problem, AI, interaction, reflection) framework guidance

  • PAIR (problem, AI, interaction, reflection) framework guidance The framework is designed to be (i) simple, providing a straightforward structure to harness the potential of generative AI; (ii) customisable, allowing adaptation to align with specific learning objectives and student characteristics; and (iii) compatible, building on established pedagogical approaches such as problem/inquiry-based learning and active learning, making it suitable for different disciplines.

Artificial intelligence (AI) technologies in Cochrane

  • Web Clinic: Artificial intelligence (AI) technologies in Cochrane The session was delivered in May 2024; the videos from the webinar are available together with the accompanying slides to download [PDF]. Recordings from other Methods Support Unit web clinics are available here. Part 1: How Cochrane currently uses machine learning: implementing innovative technology. Part 2: What generative AI is, the opportunities it brings and the challenges regarding its safe use. Part 3: Cochrane's focus on the responsible use of AI in systematic reviews. Part 4: Questions and answers.


Start your free trial

Arrange a trial for your organisation and discover why FSTA is the leading database for reliable research on the sciences of food and health.

REQUEST A FREE TRIAL


5 software tools to support your systematic review processes

By Dr. Mina Kalantar on 19-Jan-2021 13:01:01


Systematic reviews are a structured re-assessment of scholarly literature to facilitate decision making. This methodical approach to re-evaluating evidence was initially applied in healthcare to set policies, create guidelines and answer medical questions.

Systematic reviews are large, complex projects and, depending on the purpose, they can be quite expensive to conduct. A team of researchers, data analysts and experts from various fields may collaborate to review and examine incredibly large numbers of research articles for evidence synthesis. Depending on their scope, systematic reviews often take at least 6 months, and sometimes upwards of 18 months, to complete.

The main principles of transparency and reproducibility require a pragmatic approach in the organisation of the required research activities and detailed documentation of the outcomes. As a result, many software tools have been developed to help researchers with some of the tedious tasks required as part of the systematic review process.


The first generation of these software tools was produced to accommodate and manage collaborations, but the tools gradually developed to help with screening literature and reporting outcomes. Some of these software packages were initially designed for medical and healthcare studies and have specific protocols and customised steps integrated for various types of systematic reviews. However, some are designed for general processing, and by extending the application of the systematic review approach to other fields, they are being increasingly adopted and used in software engineering, health-related nutrition, agriculture, environmental science, social sciences and education.

Software tools

There are various free and subscription-based tools to help with conducting a systematic review. Many of these tools are designed to assist with the key stages of the process, including title and abstract screening, data synthesis, and critical appraisal. Some are designed to facilitate the entire process of review, including protocol development, reporting of the outcomes and help with fast project completion.

As time goes on, more functions are being integrated into such software tools. Technological advancement has allowed for more sophisticated and user-friendly features, including visual graphics for pattern recognition and linking multiple concepts. The idea is to digitalise the cumbersome parts of the process to increase efficiency, thus allowing researchers to focus their time and efforts on assessing the rigorousness and robustness of the research articles.

This article introduces commonly used systematic review tools that are relevant to food research and related disciplines, which can be used in a similar context to the process in healthcare disciplines.

These reviews are based on IFIS' internal research; they are unbiased and not affiliated with the companies.


Covidence

This online platform is a core component of the Cochrane toolkit, supporting parts of the systematic review process, including title/abstract and full-text screening, documentation, and reporting.

The Covidence platform enables collaboration of the entire systematic reviews team and is suitable for researchers and students at all levels of experience.

From a user perspective, the interface is intuitive, and the citation screening is directed step-by-step through a well-defined workflow. Imports and exports are straightforward, with easy export options to Excel and CSV.

Access is free for Cochrane authors (a single reviewer), and Cochrane provides a free trial to other researchers in healthcare. Universities can also subscribe on an institutional basis.

Rayyan

Rayyan is a free and open access web-based platform funded by the Qatar Foundation, a non-profit organisation supporting education and community development initiatives. Rayyan is used to screen and code literature through a systematic review process.

Unlike Covidence, Rayyan does not follow a standard SR workflow and simply helps with citation screening. It is accessible through a mobile application with compatibility for offline screening. The web-based platform is known for its accessible user interface, with easy and clear export options.

Function comparison of 5 software tools to support the systematic review process

EPPI-Reviewer

EPPI-Reviewer is a web-based software programme developed by the Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI) at the UCL Institute of Education, London.

It provides comprehensive functionalities for coding and screening. Users can create different levels of coding in a code set tool for clustering, screening, and administration of documents. EPPI-Reviewer allows direct search and import from PubMed. The import of search results from other databases is feasible in different formats. It stores, references, identifies and removes duplicates automatically. EPPI-Reviewer allows full-text screening, text mining, meta-analysis and the export of data into different types of reports.

There is no limit on the number of concurrent users or on the number of articles being reviewed. Cochrane reviewers can access EPPI reviews using their Cochrane subscription details.

EPPI-Centre has other tools for facilitating the systematic review process, including coding guidelines and data management tools.

CADIMA

CADIMA is a free, online, open access review management tool, developed to facilitate research synthesis and structure documentation of the outcomes.

The Julius Kühn Institute and the Collaboration for Environmental Evidence established the software programme to support and guide users through the entire systematic review process, including protocol development, literature searching, study selection, critical appraisal, and documentation of the outcomes. The flexibility in choosing the steps also makes CADIMA suitable for conducting systematic mapping and rapid reviews.

CADIMA was initially developed for research questions in agriculture and environment but it is not limited to these, and as such, can be used for managing review processes in other disciplines. It enables users to export files and work offline.

The software allows for statistical analysis of the collated data using the R statistical software. Unlike EPPI-Reviewer, CADIMA does not have a built-in search engine to allow for searching in literature databases like PubMed.

DistillerSR

DistillerSR is an online software platform maintained by the Canadian company Evidence Partners, which specialises in literature review automation. DistillerSR provides a collaborative platform for every stage of literature review management. The framework is flexible and can accommodate literature reviews of different sizes. It is configurable to different data curation procedures, workflows and reporting standards. The platform integrates the necessary features for screening, quality assessment, data extraction and reporting. The software uses artificial intelligence (AI)-enabled technologies in priority screening to shorten the screening process by reranking the most relevant references towards the top. It can also use AI as a second reviewer in quality control checks of studies screened by human reviewers. DistillerSR is used to manage systematic reviews in various medical disciplines, surveillance, pharmacovigilance and public health reviews, including food and nutrition topics. The software does not support statistical analyses. It provides configurable forms in standard formats for data extraction.

DistillerSR allows direct search and import of references from PubMed. It provides an add-on feature called LitConnect, which can be set to automatically import newly published references from data providers to keep reviews up to date as they progress.

Systematic Review Toolbox

The Systematic Review Toolbox is a web-based catalogue of various tools, including software packages, which can assist with single or multiple tasks within the evidence synthesis process. Researchers can run a quick search or tailor a more sophisticated search by choosing their approach, budget, discipline, and preferred support features to find the right tools for their research.

If you enjoyed this blog post, you may also be interested in our recently published blog post addressing the difference between a systematic review and a systematic literature review.



Your all in one AI-powered Reading Assistant

A Reading Space to Ideate, Create Knowledge, and Collaborate on Your Research

  • Smartly organize your research
  • Receive recommendations that cannot be ignored
  • Collaborate with your team to read, discuss, and share knowledge


From Surface-Level Exploration to Critical Reading - All in one Place!

Fine-tune your literature search.

Our AI-powered reading assistant saves time spent on the exploration of relevant resources and allows you to focus more on reading.

Select phrases or specific sections and explore more research papers related to the core aspects of your selections. Pin the useful ones for future reference.

Our platform brings you the latest research related to your research and project work.

Speed up your literature review

Quickly generate a summary of key sections of any paper with our summarizer.

Make informed decisions about which papers are relevant, and where to invest your time in further reading.

Get key insights from the paper, quickly comprehend the paper’s unique approach, and recall the key points.

Bring order to your research projects

Organize your reading lists into different projects and maintain the context of your research.

Quickly sort items into collections and tag or filter them according to keywords and color codes.

Experience the power of sharing by finding all the shared literature in one place.

Decode papers effortlessly for faster comprehension

Highlight what is important so that you can retrieve it faster next time.

Select any text in the paper and ask Copilot to explain it to help you get a deeper understanding.

Ask questions and follow-ups to the AI-powered Copilot.

Collaborate to read with your team, professors, or students

Share and discuss literature and drafts with your study group, colleagues, experts, and advisors. Recommend valuable resources and help each other for better understanding.

Work in shared projects efficiently and improve visibility within your study group or lab members.

Keep track of your team's progress by being constantly connected and engaging in active knowledge transfer by requesting full access to relevant papers and drafts.

Find papers from across the world's largest repositories


Privacy and security of your research data are integral to our mission.


Everything you add or create on Enago Read is private by default. It becomes visible only if and when you share it with other users.

Copyright

You can put a Creative Commons license on original drafts to protect your IP. For shared files, Enago Read always maintains a copy in case of deletion by collaborators or revoked access.

Security

We use standard security protocols and algorithms, including SSL/HTTPS encryption in transit and MD5 hashing, to secure your data.

Generative AI for Systematic reviews

Automate systematic reviews & meta-analyses

Save time screening & extracting data with language models


Join 2 million researchers using Elicit.

Systematic reviews assisted by AI: human-level accuracy, 50%-80% faster and cheaper, greater impact.

Product features: extract data for screening or meta-analysis, including data from tables.


Choose from over 30 predefined fields, or define your own

Add descriptions & instructions, like with research assistants.


Check all language model work in-line

View supporting quotes from the paper and see quotes in the context of the paper.

Pick a plan that's right for you. For enterprise and institutions, get in touch for custom pricing.

  • Oracle Mode
  • Oracle Mode – Advanced
  • Exploration Mode
  • Simulation Mode
  • Simulation Infrastructure

Join the movement towards fast, open, and transparent systematic reviews

ASReview LAB v1.5 is out!


ASReview uses state-of-the-art active learning techniques to solve one of the most interesting challenges in systematically screening large amounts of text: there's not enough time to read everything!

The project has grown into a vibrant worldwide community of researchers, users, and developers. ASReview is coordinated at Utrecht University and is part of the official AI Labs at the university.


Free, Open and Transparent

The software is installed on your device locally. This ensures that nobody else has access to your data, except when you share it with others. Nice, isn’t it?

  • Free and open source
  • Local or server installation
  • Full control over your data
  • Follows the Reproducibility and Data Storage Checklist for AI-Aided Systematic Reviews

In 2 minutes up and running

With the smart project setup features, you can start a new project in minutes. Ready, set, start screening!

  • Create as many projects as you want
  • Choose your own or an existing dataset
  • Select prior knowledge
  • Select your favorite active learning algorithm


Three modi to choose from

ASReview LAB can be used for:

  • Screening with the Oracle Mode, including advanced options
  • Teaching using the Exploration Mode
  • Validating algorithms using the Simulation Mode

We also offer an open-source research infrastructure to run large-scale simulation studies for validating newly developed AI algorithms.
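To make the Simulation Mode concrete: a simulation replays a dataset whose include/exclude labels are already known, lets the screening strategy choose which record to look at next, and records how many records had to be "screened" before all relevant ones were found. The toy loop below shows only the shape of that process in Python; the scoring function is a deliberately crude stand-in for a trained model, and none of this is ASReview's actual API.

```python
import random

# Toy dataset: record id -> known relevance label (1 = relevant). In a real
# simulation these labels come from a previously completed review.
labels = {i: 1 if i % 7 == 0 else 0 for i in range(100)}

def model_score(record_id, screened):
    """Crude stand-in for a trained model's relevance score. A real
    simulation retrains a classifier on `screened` after every label."""
    relevant_seen = [r for r, y in screened.items() if y == 1]
    if not relevant_seen:
        return random.random()  # no training data yet: pick at random
    # Pretend records "near" known relevant ones score higher.
    return 1.0 / (1 + min(abs(record_id - r) for r in relevant_seen))

random.seed(42)
screened, found = {}, 0
total_relevant = sum(labels.values())

while found < total_relevant:
    pool = [r for r in labels if r not in screened]
    next_record = max(pool, key=lambda r: model_score(r, screened))
    screened[next_record] = labels[next_record]  # the "oracle" reveals the label
    found += labels[next_record]

print(f"All {total_relevant} relevant records found after screening "
      f"{len(screened)} of {len(labels)} records.")
```

The interesting output of a real simulation is the whole recall curve (relevant records found versus records screened), which is what lets you compare active learning algorithms against random screening.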

Follow the development

Open-source means:

  • All annotated source code is available
  • You can see the developers at work in open pull requests
  • Open pull requests show the direction in which the project is developing
  • Anyone can contribute!

Give our GitHub repo a star if you like our work.


Join the community

A community-driven project means:

  • The project is a joint endeavor
  • Your contribution matters!

Join the movement towards transparent AI-aided reviewing

Beginner -> User -> Developer -> Maintainer


Join the ASReview Development Fund

Many users donate their time to continue the development of the different software tools that are part of the ASReview universe. Also, donations and research grants make innovations possible!


Navigating the Maze of Models in ASReview

Starting a systematic review can feel like navigating through a maze, with countless articles and endless…


ASReview LAB Class 101

ASReview LAB Class 101 Welcome to ASReview LAB class 101, an introduction to the most important…


Introducing the Noisy Label Filter (NLF) procedure in systematic reviews

The ASReview team developed a procedure to overcome replication issues in creating a dataset for simulation…


Seven ways to integrate ASReview in your systematic review workflow

Seven ways to integrate ASReview in your systematic review workflow Systematic reviewing using software implementing Active…


Active Learning Explained

Active Learning Explained The rapidly evolving field of artificial intelligence (AI) has allowed the development of…

The Zen of Elas


The Zen of Elas Elas is the mascot of ASReview and your Electronic Learning Assistant who…


Five ways to get involved in ASReview

Five ways to get involved in ASReview ASReview LAB is open and free (Libre) software, maintained…


Connecting RIS import to export functionalities

What’s new in v0.19? Connecting RIS import to export functionalities. Download ASReview LAB 0.19. Update to ASReview…


Meet the new ASReview Maintainer: Yongchao Ma

Meet Front-End Developer and ASReview Maintainer Yongchao Ma As a user of ASReview, you are probably…


UPDATED: ASReview Hackathon for Follow the Money

This event has passed. The winners of the hackathon were: Data pre-processing: Teamwork by: Raymon van…

What’s new in release 0.18?

More models more options, available now! Version 0.18 slowly opens ways to the soon to be…


Simulation Mode Class 101

Simulation Mode Class 101 Have you ever done a systematic review manually, but wondered how much…


Accelerate your research with the best systematic literature review tools

The ideal literature review tool helps you make sense of the most important insights in your research field. ATLAS.ti empowers researchers to perform powerful and collaborative analysis using the leading software for literature review.


Finalize your literature review faster with comfort

ATLAS.ti makes it easy to manage, organize, and analyze articles, PDFs, excerpts, and more for your projects. Conduct a deep systematic literature review and get the insights you need with a comprehensive toolset built specifically for your research projects.


Figure out the "why" behind your participants' motivations

Understand the behaviors and emotions that are driving your focus group participants. With ATLAS.ti, you can transform your raw data and turn it into qualitative insights you can learn from. Easily determine user intent in the same spot you're deciphering your overall focus group data.


Visualize your research findings like never before

We make it simple to present your analysis results with meaningful charts, networks, and diagrams. Instead of figuring out how to communicate the insights you just unlocked, we enable you to leverage easy-to-use visualizations that support your goals.


Everything you need to elevate your literature review

Import and organize literature data

Import and analyze any type of text content – ATLAS.ti supports all standard text and transcription files such as Word and PDF.

Analyze with ease and speed

Utilize easy-to-learn workflows that save valuable time, such as auto coding, sentiment analysis, team collaboration, and more.

Leverage AI-driven tools

Make efficiency a priority and let ATLAS.ti do your work with AI-powered research tools and features for faster results.

Visualize and present findings

With just a few clicks, you can create meaningful visualizations for your literature data, including charts, word clouds, tables, and networks.

The faster way to make sense of your literature review. Try it for free, today.

A literature review analyzes the most current research within a research area. It typically draws on published studies from many sources:

  • Peer-reviewed academic publications
  • Full-length books
  • University bulletins
  • Conference proceedings
  • Dissertations and theses

Literature reviews allow researchers to:

  • Summarize the state of the research
  • Identify unexplored research inquiries
  • Recommend practical applications
  • Critique currently published research

Literature reviews are either standalone publications or part of a paper as background for an original research project. A literature review, as a section of a more extensive research article, summarizes the current state of the research to justify the primary research described in the paper.

For example, a researcher may have reviewed the literature on a new supplement's health benefits and concluded that more research needs to be conducted on those with a particular condition. This research gap warrants a study examining how this understudied population reacted to the supplement. Researchers need to establish this research gap through a literature review to persuade journal editors and reviewers of the value of their research.

Consider a literature review as similar to a typical research publication: it presents a study, its results, and the salient points scholars can infer from the study. The significant difference is that a literature review treats existing literature as the research data to collect and analyze. From that analysis, a literature review can suggest new inquiries to pursue.

Identify a focus

Similar to a typical study, a literature review should have a research question or questions that analysis can answer. This sort of inquiry typically targets a particular phenomenon, population, or even research method to examine how different studies have looked at the same thing differently. A literature review, then, should center the literature collection around that focus.

Collect and analyze the literature

With a focus in mind, a researcher can collect studies that provide relevant information for that focus. They can then analyze the collected studies by identifying patterns or themes that occur frequently. This analysis allows the researcher to point out what the field has frequently explored or, on the other hand, overlooked.
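One simple, reproducible first pass at identifying frequently occurring themes is to count which terms recur across the collected abstracts. The sketch below does this with scikit-learn's CountVectorizer on a few invented abstracts; genuine thematic analysis involves far more judgment, but frequency counts like these can suggest where to look.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical abstracts from a collected literature set.
abstracts = [
    "Sleep duration and screen time in adolescents: a cohort study.",
    "Screen time before bed predicts reduced sleep quality in teens.",
    "Indoor air quality and sleep efficiency among urban adults.",
]

# Count unigrams and bigrams, ignoring common English stop words.
vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words="english")
counts = vectorizer.fit_transform(abstracts).sum(axis=0).A1
terms = vectorizer.get_feature_names_out()

# The most frequent terms hint at recurring themes across the corpus.
for count, term in sorted(zip(counts, terms), reverse=True)[:8]:
    print(f"{count:2d}  {term}")
```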

Suggest implications

The literature review allows the researcher to argue a particular point through the evidence provided by the analysis. For example, suppose the analysis makes it apparent that the published research on people's sleep patterns has not adequately explored the connection between sleep and a particular factor (e.g., television-watching habits, indoor air quality). In that case, the researcher can argue that further study can address this research gap.

External requirements aside (e.g., many academic journals have a word limit of 6,000-8,000 words), a literature review as a standalone publication is as long as necessary to allow readers to understand the current state of the field. Even if it is just a section in a larger paper, a literature review is long enough to allow the researcher to justify the study that is the paper's focus.

Note that a literature review needs only to incorporate a representative number of studies relevant to the research inquiry. For term papers in university courses, 10 to 20 references might be appropriate for demonstrating analytical skills. Published literature reviews in peer-reviewed journals might have 40 to 50 references. One of the essential goals of a literature review is to persuade readers that you have analyzed a representative segment of the research you are reviewing.

Researchers can find published research from various online sources:

  • Journal websites
  • Research databases
  • Search engines (Google Scholar, Semantic Scholar)
  • Research repositories
  • Social networking sites (Academia, ResearchGate)

Many journals make articles freely available under the term "open access," meaning that there are no restrictions to viewing and downloading such articles. Otherwise, collecting research articles from restricted journals usually requires access from an institution such as a university or a library.

Evidence of a rigorous literature review is more important than the word count or the number of articles that undergo data analysis. Especially when writing for a peer-reviewed journal, it is essential to consider how to demonstrate research rigor in your literature review to persuade reviewers of its scholarly value.

Select field-specific journals

The most significant research relevant to your field tends to appear in a narrow set of journals with similar aims and scope. Consider who the most prominent scholars in your field are and determine which journals publish their research or have them as editors or reviewers. Journals also tend to look favorably on systematic reviews that include articles they have published.

Incorporate recent research

Recently published studies have greater value in determining the gaps in the current state of research. Older research is likely to have encountered challenges and critiques that may render their findings outdated or refuted. What counts as recent differs by field; start by looking for research published within the last three years and gradually expand to older research when you need to collect more articles for your review.

Consider the quality of the research

Literature reviews are only as strong as the quality of the studies that the researcher collects. You can judge any particular study by many factors, including:

  • the quality of the article's journal
  • the article's research rigor
  • the timeliness of the research

The critical point here is that you should consider more than just a study's findings or research outputs when including research in your literature review.

Narrow your research focus

Ideally, the articles you collect for your literature review have something in common, such as a research method or research context. For example, if you are conducting a literature review about teaching practices in high school contexts, it is best to narrow your literature search to studies focusing on high school. You should consider expanding your search to junior high school and university contexts only when there are not enough studies that match your focus.

You can create a project in ATLAS.ti for keeping track of your collected literature. ATLAS.ti allows you to view and analyze full text articles and PDF files in a single project. Within projects, you can use document groups to separate studies into different categories for easier and faster analysis.

For example, a researcher with a literature review that examines studies across different countries can create document groups labeled "United Kingdom," "Germany," and "United States," among others. A researcher can also use ATLAS.ti's global filters to narrow analysis to a particular set of studies and gain insights about a smaller set of literature.

ATLAS.ti allows you to search, code, and analyze text documents and PDF files. You can treat a set of research articles like other forms of qualitative data. The codes you apply to your literature collection allow for analysis through many powerful tools in ATLAS.ti:

  • Code Co-Occurrence Explorer
  • Code Co-Occurrence Table
  • Code-Document Table

Other tools in ATLAS.ti employ machine learning to facilitate parts of the coding process for you. Some of our software tools that are effective for analyzing literature include:

  • Named Entity Recognition
  • Opinion Mining
  • Sentiment Analysis

As long as your documents are text documents or text-enabled PDF files, ATLAS.ti's automated tools can provide essential assistance in the data analysis process.
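ATLAS.ti's automated coding tools are built into the product, but the core technique behind one of them, Named Entity Recognition, is easy to demonstrate with an open-source library. The sketch below uses spaCy and assumes its small English model has been downloaded; it illustrates the technique itself, not ATLAS.ti's implementation.

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# A sentence of the kind found in a collected research article.
text = (
    "Researchers at Utrecht University surveyed 1,200 students in Germany "
    "and the United Kingdom between 2019 and 2021."
)

# The pipeline tags spans as entities such as ORG, GPE, DATE, CARDINAL.
doc = nlp(text)
for ent in doc.ents:
    print(f"{ent.text:20}  {ent.label_}")
```

Applied across a whole document collection, entity counts like these can feed the kind of grouping and filtering described in the document-group example above.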

Memorial Sloan Kettering Library



Systematic Review Service

  • Review Types
  • How Long Do Reviews Take
  • Which Review Type is Right for Me
  • AI and Reviews
  • Policies for Partnering with an MSK Librarian
  • Request to Work with an MSK Librarian
  • Your First Meeting with an MSK Librarian
  • Covidence Review Software
  • Step 1: Form Your Team
  • Step 2: Define Your Research Question
  • Step 3: Write and Register Your Protocol
  • Step 4: Search for Evidence
  • Step 5: Screen Your Results
  • Step 6: Assess the Quality
  • Step 7: Collect the Data
  • Step 8: Write and Publish the Review
  • Additional Resources

Use of AI in the Review Process

Artificial intelligence (AI) is an evolving field with a lot of interest. There is the promise that it can cut down on the amount of time it takes to complete a review, and it can! But you have to know the strengths and limits of the tool you are using.

AI can be helpful for generating ideas that you vet and adapt. For example, ChatGPT might be beneficial to inform your research question and help you brainstorm inclusion and exclusion criteria and the terms you want in your search strategy. What it cannot do is build a systematic search strategy, and it may suggest records for articles that do not exist, a phenomenon known as hallucination.

AI can also be built into software. For example, Covidence, an online software platform often used for systematic reviews, uses machine learning to sort records by relevance and classify randomized controlled trials. They are also beta testing using team-entered inclusion and exclusion criteria to auto-remove records.

Before using something, explore its parameters, and be skeptical about anything that seems too good to be true or removes all human intervention from a task.


AI-Based Literature Review Tools

  • Dialogues: Insightful Facts
  • How to Craft Prompts
  • Plugins / Extensions for AI-powered Searches
  • Cite ChatGPT in APA / MLA
  • AI and Plagiarism
  • ChatGPT & Higher Education
  • Author Profile

Selected AI-Based Literature Review Tools

Updates: See news or releases of AI (beta) features across various academic research databases, including Web of Science, Scopus, EBSCO, ProQuest, OVID, Dimensions, JSTOR, Westlaw, and LexisNexis.

Disclaimer: TAMU libraries do not have subscription access to the AI-powered tools listed below the divider line. This guide serves solely as an informational resource. It is recommended that you assess these tools and their usage methodologies independently.

SEMANTIC SCHOLAR

  • SCIENTIFIC LITERATURE SEARCH ENGINE - finding semantically similar research papers.
  • "A free, AI-powered research tool for scientific literature." <https://www.semanticscholar.org/>. Login is required in order to use all functions.
  • Over 200 million papers from all fields of science, the data of which has also served as a wellspring for the development of other AI-driven tools.

The 4,000+ results can be sorted by Fields of Study, Date Range, Author, and Journals & Conferences.

Save the papers in your Library folder. The Research Feeds will recommend similar papers based on the items saved.

Example - SERVQUAL: A multiple-item scale for measuring consumer perceptions of service quality
Total Citations: 22,438 [Note: these numbers were gathered when this guide was created]
Highly Influential Citations: 2,001
Background Citations: 6,109
Methods Citations: 3,273
Results Citations: 385

Semantic Reader

TLDRs (Too Long; Didn't Read): Try this example. Press the pen icon to reveal the highlighted key points. TLDRs "are super-short summaries of the main objective and results of a scientific paper generated using expert background knowledge and the latest GPT-3 style NLP techniques. This new feature is available in beta for nearly 60 million papers in computer science, biology, and medicine..." <https://www.semanticscholar.org/product/tldr>

OPENREAD

  • https://www.openread.academy/
  • Institutionally accessed by Harvard, MIT, University of Oxford, Johns Hopkins, Stanford, and Beijing University.
  • AI-powered Academic Searching + Web Searching - over 300 million papers and real-time web content.
  • Every keyword search or AI quest will yield a synthesis report with citations. If you want to re-orient the search outcomes, click on the Re-generate button and all citations will be refreshed accordingly. After that, click on Follow-Up Questions to delve deeper into a particular area or subject.
  • Use Paper Q&A to interact with a text directly, e.g. "What does this paper say about literature review?"
  • Click on Translation to put a text or search results into another language.
  • Upload a PDF document and let Paper Expresso read it for you and parse the content into an academic report format for easy screening: Background and context > Research objectives and hypotheses > Methodology > Results and findings > Discussion and interpretation > Contributions to the field > Structure and flow > Achievements and significance > Limitations and future work.
ELICIT

  • AI-POWERED RESEARCH ASSISTANT - finding papers, filtering study types, automating research flow, brainstorming, summarizing and more.
  • "Elicit is a research assistant using language models like GPT-3 to automate parts of researchers' workflows. Currently, the main workflow in Elicit is Literature Review. If you ask a question, Elicit will show relevant papers and summaries of key information about those papers in an easy-to-use table." <https://elicit.org/faq#what-is-elicit.> Find answers from 175 million papers. FAQs
  • Example - How do mental health interventions vary by age group? / Fish oil and depression. Results [login required]: (1) Summary of top 4 papers > Papers #1-#4 with title, abstract, citations, DOI, and PDF; (2) Table view: Abstract / Interventions / Outcomes measured / Number of participants; (3) Relevant studies and citations; (4) Click on Search for Paper Information to find metadata about Sources (SJR etc.), Population (age etc.), Intervention (duration etc.), Results (outcome, limitations etc.), and Methodology (detailed study design etc.); (5) Export as BIB or CSV.
  • How to Search / Extract Data / List of Concepts Search - Enter a research question > Workflow: Searching > Summarizing 8 papers > A summary of 4 top papers > Final answers. Each result will show its citation counts, DOI, and a full-text link to the Semantic Scholar website for more information such as background citations, methods citations, related papers and more. List of Concepts search - e.g. adult learning motivation; the results will present a list of the related concepts. Extract data from a PDF file - upload a paper and let Elicit extract data for you.
  • Export Results - Various ways to export results.
  • How to Cite - Include the elicit.org URL in the citation, for example: Ought; Elicit: The AI Research Assistant; https://elicit.org; accessed xxxx/xx/xx

CONSENSUS.APP

ACADEMIC SEARCH ENGINE- using AI to find insights in research papers.

"We are a search engine that is designed to accept research questions, find relevant answers within research papers, and synthesize the results using the same language model technology." <https://consensus.app/home/blog/maximize-your-consensus-experience-with-these-best-practices/>

  • Example - Does the death penalty reduce the crime? / Fish oil and depression. (1) Extracted & aggregated findings from relevant papers. (2) Results may include AIMS, DESIGN, PARTICIPANTS, FINDINGS or other methodological or report components. (3) Summaries and full text.
  • How to Search - Direct questions: Does the death penalty reduce the crime? Relationship between two concepts: Fish oil and depression / Does X cause Y? Open-ended concepts: effects of immigration on local economics. Tips and search examples from Consensus' Best Practice.
  • Synthesize (beta) / Consensus Meter - When the AI recognizes certain types of research questions, this functionality may be activated. It will examine a selection of studies and provide a summary along with a Consensus Meter illustrating their collective agreement. Try this search: Is white rice linked to diabetes? The Consensus Meter reveals the following outcomes after analyzing 10 papers: 70% indicate a positive association, 20% suggest a possible connection, and 10% indicate no link.

Prompt: "write me a paragraph about the impact of climate change on GDP with citations"

SCITE

CITATIONS IN CONTEXT

Integrated with Research Solutions.

Over 1.2 billion citation statements and metadata from over 181 million papers.

How does it work? "scite uses access to full-text articles and its deep learning model to tell you, for a given publication:
  • how many times it was cited by others
  • how it was cited by others, by displaying the text where the citation happened from each citing paper
  • whether each citation offers supporting or contrasting evidence of the cited claims in the publication of interest, or simply mentions it."
<https://help.scite.ai/en-us/article/what-is-scite-1widqmr/>

EXAMPLE of seeing all citations and citation statements in one place

More information: Scite: A smart citation index that displays the context of citations and classifies their intent using deep learning  

CHATGPT

  • GPT-3.5 by OpenAI. Knowledge cutoff date is September 2021.
  • Input/output length - ChatGPT-3.5 allows a maximum of 4,096 tokens. According to ChatGPT, "On average, a token in English is roughly equivalent to 4 bytes or characters. English words are typically around 5 characters long. This means that, very roughly, you could fit around 800 to 1000 English words within 4096 tokens." (A quick way to check such estimates is sketched after this list.)
  • According to ChatGPT, the generated responses are non-deterministic by default, so if you run the searches again and get slightly or very different results, it's likely due to this factor.
  • ChatGPT may cite non-existent references.
  • According to this study <https://arxiv.org/ftp/arxiv/papers/2304/2304.06794.pdf>, "ChatGPT cites the most-cited articles and journals, relying solely on Google Scholar's citation counts" within the field of environmental science.
  • Example - "INTERVIEW WITH CHATGPT" as a Research Method & Teaching Tool: some researchers have begun to use this approach to obtain their research data. Try this Google Scholar search link "interview with ChatGPT" or see the two articles below: (1) Chatting about ChatGPT: how may AI and GPT impact academia and libraries? BD Lund, T Wang - Library Hi Tech News, 2023; (2) An interview with ChatGPT: discussing artificial intelligence in teaching, research, and practice, G Scaringi, M Loche - 2023
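The token arithmetic in the list above can be checked directly rather than estimated: OpenAI's open-source tiktoken library exposes the tokenizer family used by the GPT-3.5 models. A minimal sketch, assuming tiktoken is installed:

```python
import tiktoken

# cl100k_base is the encoding used by the gpt-3.5-turbo family of models.
encoding = tiktoken.get_encoding("cl100k_base")

text = ("Infant mortality refers to the death of infants "
        "before their first birthday.")
tokens = encoding.encode(text)

print(f"{len(text.split())} words -> {len(tokens)} tokens")
# The rough planning rule quoted above: about 4 characters per token.
print(f"Estimated ~{len(text) / 4:.0f} tokens; actual: {len(tokens)}")
```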

  • Increased risk of sudden infant death syndrome (SIDS)
  • Increased risk of premature birth
  • Increased risk of low birth weight
  • Increased risk of respiratory problems in newborns
  • Increased risk of respiratory problems in infants exposed to secondhand smoke
  • Increased risk of developing asthma and other respiratory illnesses later in life for infants exposed to secondhand smoke

[Note: ChatGPT may generate non-existent references or false knowledge. To find out why Large Language Models hallucinate, check out this Wiki article: Hallucination (artificial intelligence) and this blog post - A Gentle Introduction to Hallucinations in Large Language Models by Adrian Tam]

  • Infant death
  • Neonatal mortality (referring specifically to deaths within the first 28 days of life)
  • Perinatal mortality (referring to deaths occurring during the period from 22 completed weeks of gestation to 7 completed days after birth)
  • Early childhood mortality (referring to deaths occurring within the first five years of life)
  • Child mortality (referring to deaths occurring before the age of 18)

[Press the Regenerate button to get more.]

  • Example - RELATED WORDS: What are the related words of infant mortality? Neonatal mortality, Post-neonatal mortality, Stillbirths, Low birth weight, Malnutrition, Infectious diseases, Vaccination, Maternal health, Access to healthcare, Poverty, Social inequality, Sanitation, Hygiene, Water quality, Childbirth complications, Congenital abnormalities, Birth defects, Maternal age, Under-five mortality, Child mortality, Perinatal mortality, Preterm birth, Low birth weight, Sudden infant death syndrome (SIDS), Maternal mortality, Postnatal care, Malnutrition, Immunization, Access to healthcare, Clean water and sanitation, Congenital anomalies, Infant health disparities, Infant mortality rate (IMR), Infant survival.

(("Infant Death"[Mesh] OR "Sudden Infant Death Syndrome"[Mesh] OR "Infant Mortality"[Mesh] OR "Perinatal Mortality"[Mesh] OR "Neonatal Mortality"[Mesh] OR "Fetal Mortality"[Mesh] OR "Infant Death" OR "Sudden Infant Death Syndrome" OR "Infant Mortality" OR "Perinatal Mortality" OR "Neonatal Mortality" OR "Fetal Mortality") AND ("Risk Factors"[Mesh] OR "Etiology"[Mesh] OR "Prevention and Control"[Mesh] OR "Interventions"[Mesh] OR "Public Health"[Mesh] OR "Health Disparities"[Mesh] OR "Healthcare Disparities"[Mesh] OR "Poverty"[Mesh] OR "Malnutrition"[Mesh] OR "Infection"[Mesh] OR "Congenital Abnormalities"[Mesh] OR "Prematurity"[Mesh] OR "Low Birth Weight"[Mesh] OR "Maternal Education"[Mesh] OR "Maternal Health"[Mesh] OR "Risk Factors" OR "Etiology" OR "Prevention and Control" OR "Interventions" OR "Public Health" OR "Health Disparities" OR "Healthcare Disparities" OR "Poverty" OR "Malnutrition" OR "Infection" OR "Congenital Abnormalities" OR "Prematurity" OR "Low Birth Weight" OR "Maternal Education" OR "Maternal Health")) Comments and Suggestions: #1 -This search string may not work on all database platforms. For example, Medline (OVID) may not be able to run it, Medline & CINAHL (EBSCO) can run it without any problems while PubMed will run it but it will indicate which MeSH terms are not indexed. See PubMed's search History and Details under Advanced . #2 - This is just one possible combination. You may need to adjust it based on your task and the databases you are using . #3 - It might be easier if you break the task into 3 parts: ask for synonyms > ask for thesauri terms > and ask it to combine them.

Examples - DATABASE RECOMMENDATIONS

Health Science - Which databases to use to find articles on the EFFECTS OF MATERNAL SMOKING ON INFANT DEATH?
  • PubMed: This is a database of biomedical literature from the National Library of Medicine. It includes articles from medical journals and other sources on a wide range of health-related topics.
  • Embase: This is a database of biomedical literature that covers more international and pharmaceutical content than PubMed. It includes articles from medical journals and other sources on a wide range of health-related topics.
  • Cochrane Library: This is a collection of databases that provides systematic reviews and meta-analyses of healthcare interventions, including interventions related to maternal smoking and infant death.
  • Scopus: This is a multidisciplinary database that covers research from a wide range of fields, including health sciences. It includes articles from scientific journals, books, and conference proceedings.
  • Web of Science: This is a database that covers research from a wide range of fields, including health sciences. It includes articles from scientific journals, books, and conference proceedings.
  • CINAHL: This is a database of nursing and allied health literature, including articles on maternal smoking and infant death.
[Note: The response above was copied and pasted directly from ChatGPT. No editing was added.]

Education - I am doing a meta-analysis on K-12 STEM education. What databases should I use?
  • ERIC: This is a database of education research and information, including articles from journals, reports, and conference papers. It is a commonly used database for educational research.
  • Education Source: This is a comprehensive database of education research and information, including articles from journals, books, and conference papers.
  • Web of Science: This is a multidisciplinary database that covers research from a wide range of fields, including education. It includes articles from scientific journals, books, and conference proceedings.
  • Scopus: This is another multidisciplinary database that covers research from a wide range of fields, including education. It includes articles from scientific journals, books, and conference proceedings.
  • PsycINFO: This is a database of psychology research, including articles from journals, books, and dissertations. It can be useful for research on STEM education that includes psychological factors.
  • Cochrane Library: This is a collection of databases that provides systematic reviews and meta-analyses of healthcare interventions, including interventions related to education.
[Author's note: You could also ask for the GREY LITERATURE sources related to the topic above.]

Example: ACT-AS-AN-EXPERT PROMPT - Here's an alternative way to consolidate all the questions into a single consultation by using the popular act-as-an-expert prompt, a.k.a. system message prompt: "I am going to conduct a systematic review on games and mathematics related to K-12 education. I want you to act as a systematic review expert. I will ask you some questions related to my research topic and you will advise me on how to do it."

[You may ask one question at a time] My first question is "which databases should I use?" My second question is "what are the relevant subject terms, descriptors and key words?" My third question is "how to do data extraction?" ...

Please note that the above prompts are merely for illustrative purposes; actual questions may contain more specific instructions. For example, for the first question you could ask for guidance on how to identify relevant publications: specifically, can you provide a list of the primary databases and grey literature sources related to my topic?

Example - GREY LITERATURE: Please show me all the resources that offer grey literature for conducting research in the field of education

  • ERIC (Education Resources Information Center): https://eric.ed.gov/
  • OpenGrey: http://www.opengrey.eu/
  • British Education Index: https://www.ebscohost.com/academic/british-education-index
  • WorldCat: https://www.worldcat.org/
  • ProQuest Dissertations & Theses Global: https://www.proquest.com/products-services/pqdtglobal.html
  • EThOS (Electronic Theses Online Service): https://ethos.bl.uk/
  • Institutional repositories of universities and academic institutions (search for specific institution repositories)
  • American Educational Research Association (AERA): https://www.aera.net/
  • National Education Association (NEA): https://www.nea.org/
  • Association for Supervision and Curriculum Development (ASCD): http://www.ascd.org/
  • RAND Corporation: https://www.rand.org/
  • National Center for Education Research (NCER): https://ies.ed.gov/ncer/
  • National Center for Education Statistics (NCES): https://nces.ed.gov/
  • European Educational Research Association (EERA): https://www.eera-ecer.de/
  • UNESCO International Institute for Educational Planning (IIEP): http://www.iiep.unesco.org/
  • Organization for Economic Cooperation and Development (OECD): https://www.oecd.org/education/
  • Education Development Center (EDC): https://www.edc.org/
  • National Institute for Educational Policy Research (NIER) - Japan: https://www.nier.go.jp/
  • Australian Council for Educational Research (ACER): https://www.acer.org/
  • Canadian Education Association (CEA): https://www.cea-ace.ca/

My topic is about STEM education. Which grey literature databases should I use?

  • STEM Education Research Centre (SERC) Grey Literature Database
  • National STEM Education Research Repository (NSERR)
  • National Science Digital Library (NSDL)
  • STEM Education Resource Center (SERC) GrayLit Network
  • STEM Education Grey Literature Library

See other Comparison Examples under Bing AI.

Examples - ZERO-SHOT, ONE-SHOT & FEW-SHOT PROMPTING for literature review: see the examples on the left panel - How to Fine-Tune Prompts.

Bing Chat / Copilot

  • Bing Chat has become Copilot.
  • CONVERSATIONAL LLM 4.0 + SEARCH ENGINE. Connects to the Internet.
  • Three conversation modes: More Creative, More Balanced or More Precise.
  • Comparison - Prompt: My topic is about psychology. Which grey literature databases should I use?
    More Precise - PsycExtra, NIH Library and links to library guides, .gov and .org websites.
    More Balanced - PsycExtra and three main research guides that mention psychology grey literature sources.
    More Creative - PsycExtra, LILACS, and WorldCat, plus a definition of grey literature and suggestions on searching professional association websites, newsletters, magazines, standards and guidelines, and other sources that may contain relevant information for your topic.
  • Some examples below were tested in Bing Chat only.  
  • Example - factors of infant mortality: Infant mortality is a term that refers to the death of infants before their first birthday. The leading causes of infant mortality include birth asphyxia, pneumonia, congenital malformations, term birth complications such as abnormal presentation of the fetus, umbilical cord prolapse, or prolonged labor, neonatal infection, diarrhea, malaria, measles, and malnutrition. Many situational factors lead to infant mortality, such as the pregnant woman's level of education, environmental conditions, political infrastructure, and level of medical support. In the United States, a primary determinant of infant mortality risk is infant birth weight, with lower birth weights increasing the risk of infant mortality. The determinants of low birth weight include socioeconomic, psychological, behavioral and environmental factors.
  • Example: PEER REVIEWED SOURCES - I need to find some peer reviewed articles about infant mortality. The answer includes 5 articles. The first one is "Causes of death and infant mortality rates among full-term births in the United States between 2010 and 2012: An observational study" by Neha Bairoliya and Günther Fink. A click on "Tell me more about the first article" leads to this summary: "The first article is titled "Causes of death and infant mortality rates among full-term births in the United States between 2010 and 2012: An observational study" and was published in PLOS Medicine on March 20, 2018. The authors are Neha Bairoliya and Günther Fink. The study used linked birth and death records for the period 2010–2012 to identify the state- and cause-specific burden of infant mortality among full-term infants (born at 37–42 weeks of gestation) in the United States. The researchers found that the full-term infant mortality rate (FTIMR) was 2.2 per 1,000 live births overall and ranged between 1.29 (Connecticut) and 3.77 (Mississippi) at the state level. Sudden unexpected death in infancy (SUDI) accounted for 43% of FTIM; congenital malformations and perinatal conditions accounted for 31% and 11.3% of FTIM, respectively. The largest mortality differentials between states with good and states with poor FTIMR were found for SUDI, with particularly large risk differentials for deaths due to sudden infant death syndrome (SIDS) and suffocation."
    More searches: Please find five empirical studies focusing on the factors influencing adult learners engaged in online education. / Please locate five scholarly articles on stress management among college students site:pubmed.ncbi.nlm.nih.gov


The ERIC Thesaurus contains nearly 12,000 terms, including more than 4,500 descriptors and 7,000 synonyms. You can use the term "Academic Achievement" or its synonyms such as "Educational Achievement", "Student Achievement", "Academic Performance", "Academic Success", "Learning Achievement" and more. [Note: The follow-up responses include some additional sources. See two below: https://www.powerthesaurus.org/academic_achievement and https://eric.ed.gov/?ti=Achievement+Gap, including https://eric.ed.gov/?ti=Academic+Achievement]

My topic is about psychology. Which grey literature databases should I use?

ChatGPT:
  • PsycEXTRA: This database is produced by the American Psychological Association (APA) and includes grey literature such as conference papers, white papers, and government reports.
  • OpenGrey: This database provides access to grey literature produced in Europe, including reports, theses, and conference papers.
  • Grey Literature Report: This database is produced by The New York Academy of Medicine and includes reports, conference papers, and other grey literature on health and biomedical topics, including psychology.
  • ProQuest Dissertations and Theses Global: This database provides access to graduate-level research, including theses and dissertations related to psychology.
  • Google Scholar: While not exclusively a grey literature database, Google Scholar can be a useful tool for finding conference papers and other grey literature related to psychology.
  • National Technical Information Service (NTIS): This database provides access to technical reports produced by U.S. government agencies, including those related to psychology.

Bing AI [asked in the Precise mode]: For psychology, some recommended grey literature databases include PsycEXTRA and PsyArXiv. PsycEXTRA is a database of grey literature material relating to psychology, behavioral sciences and health. PsyArXiv is a psychology archive designed to facilitate rapid dissemination of psychological research.

What other models have been developed based on SERVQUAL?

ChatGPT:
  • SERVPERF: This is a modified version of SERVQUAL that measures service performance instead of service quality. It was developed by Cronin and Taylor (1992) and is based on the same five dimensions as SERVQUAL.
  • SERVPERF-GAP: This model, developed by Caruana and Ewing (2006), extends the SERVQUAL model to include three additional gaps that can occur when customers evaluate service quality.
  • E-SERVQUAL: This model, developed by Parasuraman et al. (2005), adapts the SERVQUAL scale to measure the quality of online services.
  • SERVCO: This is a modified version of SERVQUAL that includes a sixth dimension, "cost," in addition to the five original dimensions. It was developed by Sureshchandar et al. (2002) to address the importance of cost in service quality evaluation.
  • SERVQ-HC: This is a modified version of SERVQUAL that is specifically designed for evaluating the quality of health care services. It was developed by Cronin et al. (2000) and includes 20 items across five dimensions.

Bing AI:
  • The Nordic model of service quality
  • The hierarchical model of service quality
  • The American Customer Satisfaction Index (ACSI)

Bing AI Sidebar - CHAT, THINK, & WRITE

  • Examples of Using AI sidebar to Recap Google Scholar findings

Chat - If you highlight a study in the search results of Google Scholar, you will be prompted with a question by the AI sidebar asking if you want to send the selected text to it. If you choose to do so, you can then select either EXPLAIN or EXPAND to learn more about the chosen article. It's worth noting that EXPAND provides more detailed information. You can then click on any of the follow-up links to delve further into the selected article or topic. If the text of the chosen study is written in a foreign language, the AI sidebar can also provide an explanation of the text in English. Another option for extracting the key points from a selected article is to submit the URL of the PDF.

Insights - Use the Insights button to get more ideas about the topic.

Compose - Tell the Edge sidebar your topic and let its Compose tool generate a draft for you.

Google Scholar Button - See also the Plugins and Extensions page for how to perform a combined search using Google, Google Scholar, and ChatGPT.

Example: FINDING SIMILAR ARTICLES - Find similar articles like this one - [then paste in the abstract]. Note - Use this prompt with Bing Chat, since real-time web info is required.

Examples: SUMMARIZE + REFERENCE - Use a basic prompt like "Please summarize..." or add a REFERENCE TEXT that provides the necessary information about what you want it to focus on or how the summary or resolution should look. The reference could be a quote, an excerpt, a partial sentence, a conversation, an abstract, a code snippet and so on. Formula: specify a role > assign a task > provide a reference text.

Example #1 - Scholarly Journal Quote

STEP 1 Prompt: As a research assistant specializing in renewable energy, your task is to locate comparable research studies based on the provided abstract from an article: "This paper discusses the perspective of renewable energy (wind, solar, wave and biomass) in the making of strategies for a sustainable development. Such strategies typically involve three major technological changes: energy savings on the demand side, efficiency improvements in the energy production, and replacement of fossil fuels by various sources of renewable energy."

Bing Chat then found 3 documents: the ScienceDirect paper where the abstract came from and two other documents from un.org and frontiersin.org respectively.

STEP 2 Prompt: Please synthesize the findings of the three studies provided above.

Bing Chat: "Sure! Here is a synthesis of the findings from the three studies:

All three studies emphasize the importance of renewable energy in achieving sustainable development and mitigating climate change. They all discuss various strategies and actions that can be taken to increase the use of renewable energy sources and reduce reliance on fossil fuels.

Henrik Lund’s paper discusses the perspective of renewable energy (wind, solar, wave and biomass) in the making of strategies for a sustainable development. Such strategies typically involve three major technological changes: energy savings on the demand side, efficiency improvements in the energy production, and .....

The United Nations outlines five critical actions the world needs to prioritize now to transform our energy systems and speed up the shift to renewable energy: Make renewable energy technology a global public good, Improve global access to components and raw materials......

“Sustainable Energy Transition for Renewable and Low Carbon Grid” discusses that a sustainable global electricity transition will entail increased use of renewable energy sources particularly wind and solar, nuclear energy as a low carbon energy source, electrification of transport and.......

In summary, these studies suggest that transitioning to renewable energy sources is crucial for sustainable development and mitigating climate change. This can be achieved through various strategies such as increasing energy efficiency, improving access to renewable technology and materials, leveling the playing field for renewables, ending fossil fuel subsidies, and redirecting investments towards renewables."

Example #2 - You are an expert researcher; given the following article, please discuss its key findings. Reference: How Large Language Models are Transforming Machine-Paraphrased Plagiarism at https://arxiv.org/abs/2210.03568

This is another way to provide REFERENCE TEXT. As a side note, specifying a role or using a persona is important, as it can impact the level of the responses obtained. Just imagine if the designated role were "You are a high school student...".

Example #3 - Please read this article and summarize it for me - "Who Should I Trust: AI or Myself? Leveraging Human and AI Correctness Likelihood to Promote Appropriate Trust in AI-Assisted Decision-Making" - and then find 5 similar studies and please also summarize each for me.


CONNECTED PAPERS

  • RELATED STUDIES
  • Uses visual graphs or other ways to show relevant studies. The database is connected to the Semantic Scholar Paper Corpus which has compiled hundreds of millions of published papers across many science and social science fields.
  • See more details about how it works.
  • Example - SERVQUAL: click on SELECT A PAPER TO BUILD THE GRAPH; the first paper was selected. Results: (1) Origin paper - SERVQUAL: A multiple-item scale for measuring consumer perceptions of service quality, plus connected papers with links to Connected Papers / PDF / DOI or the publisher's site / Semantic Scholar / Google Scholar. (2) A graph showing the origin paper and connected papers, with links to the major sources (see above). (3) Links to Prior Works and Derivative Works. See the detailed citations by Semantic Scholar on the origin SERVQUAL paper at the top of this page within Semantic Scholar.
  • How to Search - Search by work title, or enter some keywords about a topic.
  • Download / Save - Download your saved items in Bib format.

PAPER DIGEST

  • SUMMARY & SYNTHESIS
  • " Knowledge graph & natural language processing platform tailored for technology domain . <"https://www.paperdigest.org/> Areas covered: technology, biology/health, all sciences areas, business, humanities/ social sciences, patents and grants ...

ai tools for systematic literature review

  • LITERATURE REVIEW - https://www.paperdigest.org/review/ | Systematic Review - https://www.paperdigest.org/literature-review/
  • SEARCH CONSOLE - https://www.paperdigest.org/search/ | Conference Digest - NIPS conference papers... Tech AI Tools: Literature Review | Literature Search | Question Answering | Text Summarization. Expert AI Tools: Org AI | Expert Search | Executive Search, Reviewer Search, Patent Lawyer Search...

Daily paper digest / Conference papers digest / Best paper digest / Topic tracking: in Account, enter the subject areas you are interested in, and the Daily Digest will surface studies based on them.

RESEARCH RABBIT

  • CITATION-BASED MAPPING: SIMILAR / EARLY / LATER WORKS
  • " 100s of millions of academic articles and covers more than 90%+ of materials that can be found in major databases used by academic institutions (such as Scopus, Web of Science, and others) ." See its FAQs page. Search algorithms were borrowed from NIH and Semantic Scholar.

The default “Untitled Collection” will collect your search histories, based on which Research Rabbit will send you recommendations for three types of related results: Similar Works / Earlier Works / Later Works, viewable in graphs such as Network, Timeline, First Authors etc.

Zotero integration: importing and exporting between these two apps.

  • Example - SERVQUAL: A multiple-item scale for measuring consumer perceptions of service quality [Login required] Try it to see its Similar Works, Earlier Works and Later Works or other documents.
  • Export Results - Findings can be exported in BibTeX, RIS, or CSV format.

CITING GENERATIVE AI

  • How to cite ChatGPT [APA] - https://apastyle.apa.org/blog/how-to-cite-chatgpt
  • How to Cite Generative AI [MLA] - https://style.mla.org/citing-generative-ai/
  • Citation Guide - Citing ChatGPT and Other Generative AI (University of Queensland, Australia)


Computer Science > Computation and Language

Title: Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges

Abstract: Text generation has become more accessible than ever, and the increasing interest in these systems, especially those using large language models, has spurred an increasing number of related publications. We provide a systematic literature review comprising 244 selected papers between 2017 and 2024. This review categorizes works in text generation into five main tasks: open-ended text generation, summarization, translation, paraphrasing, and question answering. For each task, we review their relevant characteristics, sub-tasks, and specific challenges (e.g., missing datasets for multi-document summarization, coherence in story generation, and complex reasoning for question answering). Additionally, we assess current approaches for evaluating text generation systems and ascertain problems with current metrics. Our investigation shows nine prominent challenges common to all tasks and sub-tasks in recent text generation publications: bias, reasoning, hallucinations, misuse, privacy, interpretability, transparency, datasets, and computing. We provide a detailed analysis of these challenges, their potential solutions, and which gaps still require further engagement from the community. This systematic literature review targets two main audiences: early career researchers in natural language processing looking for an overview of the field and promising research directions, as well as experienced researchers seeking a detailed view of tasks, evaluation methodologies, open challenges, and recent mitigation strategies.


Loading metrics

Open Access | Peer-reviewed | Research Article

Frameworks for procurement, integration, monitoring, and evaluation of artificial intelligence tools in clinical settings: A systematic review

Contributed equally to this work with: Sarim Dawar Khan, Zahra Hoodbhoy. ZS and MPS also contributed equally to this work.

Affiliations: CITRIC Health Data Science Centre and Departments of Medicine and of Paediatrics and Child Health, Aga Khan University, Karachi, Pakistan; Duke Institute for Health Innovation, Duke Clinical Research Institute, and Division of Cardiology, Duke University School of Medicine, Durham, North Carolina, United States; Population Health Science Institute, Newcastle University, Newcastle upon Tyne Hospitals NHS Foundation Trust, and Moorfields Eye Hospital NHS Foundation Trust, United Kingdom.

* E-mail: [email protected]

  • Sarim Dawar Khan, 
  • Zahra Hoodbhoy, 
  • Mohummad Hassan Raza Raja, 
  • Jee Young Kim, 
  • Henry David Jeffry Hogg, 
  • Afshan Anwar Ali Manji, 
  • Freya Gulamali, 
  • Alifia Hasan, 
  • Asim Shaikh, 


  • Published: May 29, 2024
  • https://doi.org/10.1371/journal.pdig.0000514


Research on the applications of artificial intelligence (AI) tools in medicine has increased exponentially over the last few years, but implementation in clinical practice has not seen a commensurate increase, with a lack of consensus on how to implement and maintain such tools. This systematic review aims to summarize frameworks focusing on procuring, implementing, monitoring, and evaluating AI tools in clinical practice. A comprehensive literature search, following PRISMA guidelines, was performed on MEDLINE, Wiley Cochrane, Scopus, and EBSCO databases to identify articles recommending practices, frameworks, or guidelines for AI procurement, integration, monitoring, and evaluation. From the included articles, data regarding study aim, use of a framework, rationale of the framework, and details regarding AI implementation involving procurement, integration, monitoring, and evaluation were extracted. The extracted details were then mapped onto the domains of the Donabedian Plan, Do, Study, Act cycle. The search yielded 17,537 unique articles, of which 47 were evaluated for inclusion based on their full texts and 25 were included in the review. Common themes extracted included transparency, feasibility of operation within existing workflows, integrating into existing workflows, validation of the tool using predefined performance indicators, and improving the algorithm and/or adjusting the tool to improve performance. Among the four domains (Plan, Do, Study, Act), the most common was Plan (84%, n = 21), followed by Study (60%, n = 15), Do (52%, n = 13), and Act (24%, n = 6). Among 172 authors, only 1 (0.6%) was from a low-income country (LIC) and 2 (1.2%) were from lower-middle-income countries (LMICs). Healthcare professionals cite the implementation of AI tools within clinical settings as challenging, owing to low levels of evidence focusing on integration in the Do and Act domains. The current healthcare AI landscape calls for increased data sharing and knowledge translation to facilitate common goals and reap maximum clinical benefit.

Author summary

The use of artificial intelligence (AI) tools has seen exponential growth in multiple industries over the past few years. Despite this, the implementation of these tools in healthcare settings is lagging, with fewer than 600 AI tools approved by the United States Food and Drug Administration and fewer AI-related job postings in healthcare as compared to other industries. In this systematic review, we organized and synthesized data and themes from published literature regarding key aspects of AI tool implementation, namely procurement, integration, monitoring, and evaluation, and mapped the extracted themes onto the Plan-Do-Study-Act (PDSA) framework. We found that the majority of current literature on AI implementation in healthcare settings focuses on the “Plan” and “Study” domains, with considerably less emphasis on the “Do” and “Act” domains. This is perhaps the reason why experts currently cite the implementation of AI tools in healthcare settings as challenging. Furthermore, the current AI healthcare landscape has poor representation from low- and lower-middle-income countries. To ensure the healthcare industry can implement AI tools into clinical workflows across a variety of settings globally, we call for diverse and inclusive collaborations, coupled with further research targeted at the under-investigated stages of AI implementation.

Citation: Khan SD, Hoodbhoy Z, Raja MHR, Kim JY, Hogg HDJ, Manji AAA, et al. (2024) Frameworks for procurement, integration, monitoring, and evaluation of artificial intelligence tools in clinical settings: A systematic review. PLOS Digit Health 3(5): e0000514. https://doi.org/10.1371/journal.pdig.0000514

Editor: Zhao Ni, Yale University, UNITED STATES

Received: September 4, 2023; Accepted: April 18, 2024; Published: May 29, 2024

Copyright: © 2024 Khan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Funding: This work was supported by the Patrick J. McGovern Foundation (Grant ID 383000239 to SDK, ZH, MHR, JYK, AAAM, FG, AH, AS, ST, NSK, MRP, SB, ZS, MPS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: MPS is a co-inventor of intellectual property licensed by Duke University to Clinetic, Inc., KelaHealth, Inc, and Cohere-Med, Inc. MPS holds equity in Clinetic, Inc. MPS has received honorarium for a conference presentation from Roche. MPS is a board member of Machine Learning for Health Care, a non-profit that convenes an annual research conference. SB is a co-inventor of intellectual property licensed by Duke University to Clinetic, Inc. and Cohere-Med, Inc. SB holds equity in Clinetic, Inc.

Introduction

The use of Artificial Intelligence (AI) tools has been growing exponentially, with several applications in the healthcare industry and tremendous potential to improve health outcomes. While there has been a rapid increase in literature on the use of AI in healthcare, the implementation of AI tools lags behind other industries in both high-income and low-income settings, with fewer than 600 Food and Drug Administration-approved AI algorithms, and even fewer presently used in clinical settings [ 1 – 4 ]. The development-implementation gap has been further assessed by Goldfarb et al., who used job advertisements as a surrogate marker of technology diffusion patterns and found that, among skilled healthcare job postings between 2015 and 2018, only 1 in 1,250 required AI skills, a lower share than in other skilled sectors (information technology, management, finance and insurance, manufacturing, etc.) [ 5 ].

Implementation of AI tools is a multi-phase process that involves procurement, integration, monitoring, and evaluation [ 6 , 7 ]. Procurement involves the scouting process before integrating an AI tool, including the decision whether to build or buy the tool. Integration involves deploying an AI tool and incorporating it into existing clinical workflows. Monitoring and evaluation occur post-integration and entail keeping track of tool performance metrics, determining the impact of integrating the tool, and modifying it as needed to ensure it keeps functioning at its originally intended level of performance. A key barrier to AI implementation highlighted by healthcare leaders across the globe is the lack of a systematic approach to AI procurement, implementation, monitoring, and evaluation, since the majority of research on AI in healthcare does not comprehensively explore the multiple, complex steps involved in ensuring optimal implementation [ 8 – 11 ].

This systematic review aims to summarize themes arising from frameworks focusing on procuring, integrating, monitoring, and evaluating AI tools in clinical practice.

This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines ( S1 Checklist ) [ 12 ]. This review is registered on PROSPERO (ID: CRD42022336899).

Information sources and search strategy

We searched electronic databases (MEDLINE, Wiley Cochrane, Scopus, EBSCO) for records published until June 2022. The search string contained terms describing the technology, setting, framework, and implementation phase, including AI tool procurement, integration, monitoring, and evaluation, using standard MeSH terms where available. Terms that were not standard MeSH terms, such as “clinical setting”, were added following iterative discussions. To capture papers that were methodical guidelines for AI implementation, as opposed to experiential papers, and recognizing the heterogeneous nature of “frameworks” (ranging from commentaries to complex, extensively researched models), multiple terms such as “framework”, “model”, and “guidelines” were used in the search strategy without explicit definitions, with the understanding that these encompassing terms would capture all relevant literature, which would later be refined as per the inclusion and exclusion criteria. The following search string was employed on MEDLINE: ("Artificial Intelligence"[Mesh] OR "Artificial Intelligence" OR "Machine Learning") AND ("clinical setting*"[tiab] OR clinic*[tiab] OR "Hospital" OR "Ambulatory Care"[Mesh] OR "Ambulatory Care Facilities"[Mesh]) AND (framework OR model OR guidelines) AND (monitoring OR evaluation OR procurement OR integration OR maintenance), without any restrictions. Search strategies used for the other databases are described in the appendix ( S1 Appendix ). All search strings were designed and transformed according to the database by the lead librarian (KM) at The Aga Khan University.
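
Purely as an illustration (the authors ran their searches through the database interfaces, not programmatically), the MEDLINE string above could be executed against PubMed with Biopython's Entrez utilities; the e-mail address is a placeholder that NCBI requires.

from Bio import Entrez  # pip install biopython

Entrez.email = "your.name@example.org"  # placeholder contact address required by NCBI

query = (
    '("Artificial Intelligence"[Mesh] OR "Artificial Intelligence" OR "Machine Learning") '
    'AND ("clinical setting*"[tiab] OR clinic*[tiab] OR "Hospital" OR "Ambulatory Care"[Mesh] '
    'OR "Ambulatory Care Facilities"[Mesh]) AND (framework OR model OR guidelines) '
    'AND (monitoring OR evaluation OR procurement OR integration OR maintenance)'
)

handle = Entrez.esearch(db="pubmed", term=query, retmax=200)
result = Entrez.read(handle)
handle.close()
print(result["Count"], "records; first PMIDs:", result["IdList"][:5])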

Eligibility criteria

Inclusion criteria.

All studies focused on implementing AI tools in a clinical setting were included. AI implementation was broadly conceptualized to consist of procurement, integration, monitoring, and evaluation. There was no restriction on the types of articles included.

Exclusion criteria.

Studies published in any language besides English were excluded. Studies describing a single step of implementation (e.g., procurement) for a single AI tool without presenting a framework for implementation were also excluded, as were studies that discussed consumers' experience of using an AI tool rather than AI implementation frameworks.

Study selection

Articles retrieved from the systematic search were imported into EndNote Reference Manager (Version X9; Clarivate Analytics, Philadelphia, Pennsylvania) and duplicate articles were removed (a toy sketch of title-and-year deduplication follows below). All articles were screened in duplicate by two independent pairs of reviewers (AM and JH, FG and SDK). Full texts of articles were then comprehensively reviewed for inclusion based on the predetermined criteria. Due to the heterogeneous nature of the articles curated (including opinion pieces), a risk of bias assessment was not conducted, as an appropriate, validated tool does not exist for this purpose.
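
The deduplication itself was performed in EndNote; the following Python sketch only conveys the underlying idea of collapsing records whose normalized title and year match (the record fields are assumptions, not the authors' procedure).

import re

def norm(title):
    # Lowercase and strip non-alphanumeric characters so near-identical
    # titles retrieved from different databases compare equal.
    return re.sub(r"[^a-z0-9]", "", title.lower())

def deduplicate(records):
    seen, unique = set(), []
    for rec in records:  # each rec is assumed to be a dict with "title" and "year"
        key = (norm(rec["title"]), rec.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique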

Data extraction

Three pairs of reviewers (SK and SG, SDK and FG, HDJH and AA) independently extracted data from the selected studies using a spreadsheet. Pairs attempted to resolve disagreements first, followed by adjudication by a third external reviewer (ZH) if needed. Data extracted comprised the following items: name of authors, year of publication, journal of publication, country of origin, World Bank income group (high-income, middle-income, low-income) for the corresponding author, study aim(s), rationale, methodology, framework novelty, and framework components. Framework component categories included procurement, integration, and post-implementation monitoring and evaluation [ 6 , 7 ]. A sketch of this extraction sheet as a typed record follows below.
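
As a sketch only, the extraction sheet described above could be typed as a record like this; the field names paraphrase the listed items and are not the authors' column headings.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ExtractionRecord:
    authors: str
    year: int
    journal: str
    country: str
    income_group: str  # World Bank income group of the corresponding author
    aims: str
    rationale: str
    methodology: str
    framework_novelty: str
    # procurement, integration, post-implementation monitoring and evaluation
    framework_components: List[str] = field(default_factory=list)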

Data analysis

The qualitative data were extracted and delineated into themes based on the concepts presented in each individual study. Due to the lack of a risk of bias assessment, a sensitivity analysis was not conducted. Once extracted, the themes, which encompassed the four stages of implementation (procurement, integration, evaluation, and monitoring), were clustered into different categories through iterative discussion and agreement within the investigator team. The study team felt that while a holistic framework for AI implementation does not yet exist, there are analogous structures that are widely used in healthcare quality improvement. One of the best-established structures for iterative quality improvement is the plan-do-study-act (PDSA) method ( S1 Fig ) [ 13 ]. PDSA is commonly used for a variety of healthcare improvement efforts [ 14 ], including patient feedback systems [ 15 ] and adherence to guideline-based practices [ 16 ]. This method has four stages: plan, do, study, and act. The ‘plan’ stage identifies a change to be improved; the ‘do’ stage tests the change; the ‘study’ stage examines the success of the change; and the ‘act’ stage identifies adaptations and next steps to inform a new cycle [ 13 ]. PDSA is well suited to serve as a foundation for implementing AI because it is well understood by healthcare leaders around the globe and offers a high level of abstraction to accommodate the great breadth of relevant use cases and implementation contexts. Hence, the PDSA framework was deductively chosen, and the extracted themes (irrespective of whether the original articles contained the PDSA framework) were mapped onto the four PDSA domains, with the ‘plan’ domain representing the steps required in procurement, the ‘do’ domain representing clinical integration, the ‘study’ domain highlighting the monitoring and evaluation processes, and the ‘act’ domain representing the actions taken after monitoring and evaluation to improve functioning of the tool. This mapping is displayed in S1 Table and sketched in code below.
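
A minimal sketch of this deductive mapping, with each implementation phase assigned to the PDSA domain described in the text:

# Mapping of AI implementation phases onto PDSA domains, as described above.
PDSA_MAPPING = {
    "Plan": "procurement",                  # steps required before acquiring the tool
    "Do": "integration",                    # deployment into clinical workflows
    "Study": "monitoring and evaluation",   # tracking performance post-integration
    "Act": "improvement after evaluation",  # adjusting the tool to improve performance
}

def domain_for(phase):
    # Return the PDSA domain whose description mentions the given phase (sketch only).
    for domain, description in PDSA_MAPPING.items():
        if phase in description:
            return domain
    return None

assert domain_for("procurement") == "Plan"
assert domain_for("monitoring") == "Study"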

Baseline characteristics of included articles

A total of 17,537 unique studies were returned by the search strategy, of which 47 were retained after title and abstract screening for full-text review. Following full-text review, 25 studies were included in the systematic review. In total, 22 studies were excluded because they focused on pre-implementation processes (n = 12), evaluated the use of a single tool (n = 4), evaluated perceptions of consumers (n = 4), or did not focus on a clinical setting (n = 2). Fig 1 shows the PRISMA diagram for this process. A range of article types, from narrative reviews and systematic reviews to opinion pieces and letters to the editor, was included in the review.

Fig 1. PRISMA flow diagram. https://doi.org/10.1371/journal.pdig.0000514.g001

The year of publication of the included articles ranged from 2017 to 2022, with the most articles published in 2020 (40%, n = 10) and the fewest in 2017 and 2018 (4%, n = 1 each). All corresponding authors of the 25 included articles (100%) originated from high-income countries, with the most common country of author affiliation being the United States of America (52%, n = 13), followed by the United Kingdom, Canada, and Australia (n = 2 each, 24% combined). Among 172 authors, only 1 (0.6%) was from a low-income country (LIC) (Uganda) and 2 (1.2%) were from lower-middle-income countries (LMICs) (India and Ghana) ( Table 1 ). When stated, funding organizations included institutions in the US, Canada, the European Union, and South Korea [ 17 – 24 ].

Table 1. https://doi.org/10.1371/journal.pdig.0000514.t001

From the 25 included articles, a total of 17 themes were extracted, which were later mapped to their respective domains. Table 2 shows a summary of the distribution of themes across the PDSA domains, including sample quotes from eligible articles. Fig 2 shows a Sankey diagram highlighting the overlap between themes across all articles. The extracted themes are discussed below.

Fig 2. Sankey diagram of theme overlap across articles. https://doi.org/10.1371/journal.pdig.0000514.g002

Table 2. https://doi.org/10.1371/journal.pdig.0000514.t002

Seven themes were clustered and mapped to the Plan domain. Most articles in the Plan domain focused on the themes of feasibility of operation within existing workflows (48%, n = 12), followed by transparency (32%, n = 8) and ethical issues and bias (32%, n = 8), the cost of purchasing and implementing the tool (20%, n = 5), regulatory approval (20%, n = 5), rationale for use of AI tools (16%, n = 4), and legal liability for harm (12%, n = 3). Example quotes related to each theme are captured in Table 2 .

1) Rationale of use of AI tools. Frameworks highlight the need to select clinically relevant problems and identify the need for acquiring an AI tool before initiating the procurement process [ 27 , 34 – 36 ].

2) Ethical issues and bias. Frameworks noted that AI tools may be developed in the context of competitive venture capitalism, whose values and ethics often differ from, and may be incompatible with, those of the healthcare industry. While ethical considerations should occur at all stages, it is especially important that, before any tool is implemented, it be critically analyzed in its social, legal, and economic domains to ensure ethical use while fulfilling its originally intended purpose [ 17 , 18 , 23 , 27 , 29 , 32 , 33 , 37 ].

3) Transparency. Transparency of AI tools is needed to increase trust in them and ensure they fulfill their originally intended purpose. Black-box AI tools introduce implementation challenges, and teams implementing AI must balance priorities related to accuracy and interpretability. Even without model interpretability, frameworks highlight the importance of transparency about the training population, model functionality, architecture, risk factors, and outcome definition. Frameworks also recommend transparency in reporting model performance metrics, as well as the test sets and methods used to derive them [ 24 , 25 , 28 , 29 , 37 – 40 ].

4) Legal liability for harm. There is emphasis on the legal liability that healthcare settings may face from implementing AI tools that potentially cause harm. There is a need to clarify the degree to which an AI tool developer or clinician user is responsible for potential adverse events. Relevant stakeholders involved in the whole implementation process need to be identified to know who is to be held accountable in case of an adverse event [ 23 , 25 , 29 ].

5) Regulatory requirements. Regulatory frameworks differ across geographies and are in flux. Regulatory decisions about AI tool adoption should be made based on proof of clinically important improvements in relevant patient outcomes [ 22 , 23 , 26 , 32 , 36 ].

6) Cost of purchasing and implementing a tool. Cost is an important factor to consider when deciding to implement an AI tool. The cost should be compared to the baseline standard of care without the tool. Organizations should avoid selecting AI tools that fail to create value for patients or clinicians [ 23 , 26 , 27 , 36 , 41 ].

7) Feasibility of AI tool implementation. A careful analysis of available computing and storage resources should be carried out to ensure sufficient resources are in place to implement a new AI tool. Some AI tools might need specialized infrastructure, particularly if they use large datasets, such as images or high-frequency streaming data. Moreover, similar efforts should be made to assess the differences between the cohort on which the AI tool was trained and the patient cohort in the implementation context. It is suggested to locally validate AI tools, develop a proper adoption plan, and provide clinician users sufficient training to increase the likelihood of success [ 20 , 25 , 26 , 28 , 29 , 33 , 35 – 38 , 40 , 41 ].

The following four themes were clustered and mapped to the Do domain. Articles that were clustered in the Do domain primarily focused on integrating into clinical workflows (44%, n = 11). User training was the second most common theme (24%, n = 6), followed by appropriate technical expertise (16%, n = 4) and user acceptability (8%, n = 2). Example quotes related to each theme are captured in Table 2 .

1) Appropriate technical expertise. Frameworks emphasized that the team responsible for implementing and evaluating a new AI tool should include people with different relevant expertise. Specific perspectives that should be included are a machine learning expert and a clinical expert (i.e., a healthcare professional with extensive knowledge, experience, and expertise in the specific clinical area in which the AI tool is being deployed). Some frameworks suggested involving individuals with expertise across clinical and technical domains who can bridge the different stakeholders. Inadequate representation on the team may lead to poor quality of the AI tool and patient harm due to incorrect information being presented to clinician users [ 27 , 30 , 40 , 41 ].

2) User training. Frameworks highlighted the need to train clinician end users to get the maximum benefit from newly implemented AI tools, from understanding and interacting with the user interface to interpreting the outputs from the tool. A rigorous and comprehensive training plan should be executed to train the end-users with the required skillset so that they can handle high-risk patient situations [ 27 , 29 , 33 , 35 , 37 , 41 ].

3) User acceptability. Frameworks highlighted that AI models can be used in inappropriate ways that can potentially be harmful to patients. Unlike drugs, AI models do not come with clear instructions to help users avoid inappropriate use that can lead to negative effects; hence, user acceptability evaluates how well end users acclimatize to using the tool [ 25 , 30 ].

4) Integrating into clinical workflows. For AI tools to have clinical impact, the healthcare delivery setting and clinician users must be equipped to effectively use the tool. Healthcare delivery settings should ensure that individual clinicians are empowered to use the tool effectively [ 17 , 20 , 25 , 27 , 28 , 30 , 31 , 33 , 35 , 37 , 41 ].

Five themes were clustered and mapped to the Study domain. Articles that were clustered in the Study domain primarily focused on validation of the tool using predefined performance indicators (40%, n = 10). Assessment of clinical outcomes was the second most common theme (24%, n = 6), followed by user experience (8%, n = 2), reporting of adverse events (4%, n = 1), and cost evaluation (4%, n = 1). Example quotes related to each theme are captured in Table 2 .

1) User experience. User experience in the study domain concerned the perception of AI system outputs from different perspectives ranging from professionals to patients. It is important to look at barriers to effective use, including trust, instructions, documentation, and user training [ 21 , 27 ].

2) Validation of the tool using predefined performance indicators. Frameworks discussed many different metrics and approaches to AI tool evaluation, including sensitivity, specificity, precision, F1 score, the area under the receiver operating characteristic (ROC) curve, and calibration plots. In addition to the metrics themselves, it is important to specify how the metrics are calculated; a minimal illustration of computing them appears after this list. Frameworks also discussed the importance of evaluating AI tools on local, independent datasets and potentially fine-tuning AI tools to local settings, if needed [ 20 – 23 , 27 , 29 , 31 , 35 , 37 , 39 ].

3) Cost evaluation. Frameworks discussed the importance of accounting for costs associated with installation, use, and maintenance of AI tools. A particularly important dimension of costs is burden placed on frontline clinicians and changes in time required to complete clinical duties [ 27 ].

4) Assessment of clinical outcomes. Frameworks highlighted the importance of determining if a given AI tool leads to an improvement in clinical patient outcomes. AI tools are unlikely to improve patient outcomes unless clinician users effectively use the tool to intervene on patients. Changes to clinical decision making should be assessed to also ensure that clinician users do not over-rely on the AI tool [ 18 , 19 , 22 , 25 , 30 , 35 ].

5) Reporting adverse events. Frameworks discussed the importance of defining processes to report adverse events / system failures to relevant regulatory agencies. Healthcare settings should agree on protocols for reporting with the AI tool developer. Software updates that address known problems should be categorized as low-risk, medium-risk or high-risk to ensure stable appropriate use at the time of updating [ 32 ].
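
As flagged under theme 2 above, here is a minimal sketch of computing the commonly cited discrimination and calibration metrics on a labelled local test set with scikit-learn; the arrays are placeholders, not data from any study.

import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, f1_score, roc_auc_score
from sklearn.calibration import calibration_curve

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])          # placeholder outcome labels
y_prob = np.array([.2, .8, .6, .3, .9, .4, .7, .5])  # placeholder model probabilities
y_pred = (y_prob >= 0.5).astype(int)                 # site-specific decision threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("precision:", precision_score(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
print("AUROC:", roc_auc_score(y_true, y_prob))

# Coordinates for a calibration plot (observed event rate vs. mean predicted risk per bin)
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=4)
print(list(zip(mean_pred, frac_pos)))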

One theme was mapped to the Act domain.

1) Improvement of the tool/algorithm to improve performance. Frameworks discussed the need for tailored guidance on the use of AI tools that continuously learn from new data, and for allowing users and sites to adjust and fine-tune model thresholds to optimize performance for local contexts. For all AI tools, continuous monitoring should be in place, and there should be channels for clinician users to provide feedback to AI tool developers for necessary changes. This theme was mentioned by six articles (24%, n = 6), with example quotes captured in Table 2 [ 27 , 29 , 33 , 35 , 37 , 41 ].

Framework coverage of PDSA domains.

Among the four domains (Plan, Do, Study, Act) the most common domain was Plan (84%, n = 21), followed by Study (60%, n = 15), Do (52%, n = 13), and Act (24%, n = 6). Among the 25 included frameworks, four (16%) discussed all 4 domains, four (16%) discussed only 3 domains, ten (40%) discussed only 2 domains, and seven (28%) discussed only 1 domain.

Principal findings

In this systematic review, we comprehensively synthesized themes emerging from AI implementation frameworks in healthcare, with a specific focus on the different phases of implementation. To help frame the AI implementation phases, we utilized the broadly recognizable PDSA approach. We found that current literature on AI implementation mainly focuses on the Plan and Study domains, whereas the Do and Act domains are discussed less often, with a disparity in the representation of LMICs/LICs. Almost all framework authors originated from high-income countries (167 out of 172 authors, 97.1%), with the United States of America being the most represented (68 out of 172 authors, 39.5%).

Assessment of the existing frameworks

Finding the most commonly evaluated domains to be Plan and Study is encouraging, as the capacity for strategic change management has been identified as a major barrier to AI implementation in healthcare [ 8 ]. Crossnohere et al. explored 14 AI frameworks in medicine and reported findings comparable to the current study, with most frameworks focusing on development and validation subthemes in each domain [ 42 ]. This focus may help mitigate potential risks from algorithm integration, such as dataset shift, accidental fitting of confounders, and differences in performance metrics owing to generalization to new populations [ 43 ]. The need for evolving, unified regulatory mechanisms, with improved understanding of the capabilities of AI, further drives the conversation towards the initial steps of implementation [ 44 ]. This could explain why researchers often focus on the Plan and Study domains much more than on other aspects of AI tool use: these steps can concentrate on ensuring minimal adverse effects on human outcomes before implementing the AI tool in a wider setting, especially in healthcare, where the margin of error is minimal, if not nonexistent.

The most common themes in the Plan domain were assessing feasibility of model operation within existing workflows, transparency, and ethical issues and bias. Researchers across contexts emphasized the importance of effectively integrating AI tools into clinical workflows to enable positive impacts on clinical outcomes. Similarly, there was consensus among existing frameworks on providing transparency around how models are developed and function, so that the internal workings of a tool and the medical decisions stemming from its use can be understood, helping to drive adoption and successful rollouts of AI tools [ 45 ]. Furthermore, there is still vast public concern surrounding the ethical issues of utilizing AI tools in clinical settings [ 46 ]. The least common themes in the Plan domain were rationale for use and legal liability for harm. Without a clear problem statement and rationale for use, adoption of AI is unlikely; unfortunately, existing frameworks do not yet emphasize the importance of deeply understanding and articulating the problem addressed by an AI tool. Similarly, the lack of emphasis placed on legal liability for harm likely stems from variable approaches to product liability and a general lack of understanding of how to attribute responsibility and accountability for product performance.

The most common theme in the Study domain was validation against predefined performance indicators. When these tools are studied, validation and assessment of clinical outcomes compared to standard-of-care strategies are perhaps easier to conduct than final implementation procedures. Although validation of the tool is vital for institutions to develop clinically trustworthy decision support systems [ 47 ], it is not the sole factor determining whether an institution commits to a tool. User experience, economic burden, and regulatory compliance are equally important, if not more so, especially in LMICs [ 48 , 49 ].

We found that the Do and Act phases were the least commonly discussed domains. The fact that these domains were the least focused on across the medical literature may contribute to the difficulties reported in implementing AI tools into existing human processes and clinical settings [ 50 ]. Within the Do domain, implementation challenges are faced not only in clinical applications but also extend to other healthcare disciplines, such as the delivery of medical education, where lack of technical knowledge is often cited as the main reason for difficulties [ 51 ]. Key challenges in implementation identified previously also include logistical complications and human barriers to adoption, such as ease of use, as well as sociocultural implications [ 43 ], which remain under-evaluated. These aspects of implementation potentially form the backbone of supporting the practical rollout of AI tools. However, only a small number of studies focused on user acceptability, user training, and technical expertise requirements, which are key facilitators of successful integration [ 52 ]. Furthermore, potentially due to the emerging nature of the field, the Act domain was by far the least prevalent in eligible articles, with only 6 articles discussing improvement of the AI tool following integration.

Gaps in the existing frameworks

We identified that, across all articles included in the current systematic review, HICs dominate the research landscape [ 53 ]. HICs have a robust and diverse funding portfolio and are home to the leading institutions that specialize in all aspects of AI [ 54 ]. The role of HICs in AI development is corroborated by existing literature; for example, three systematic reviews of randomized controlled trials (RCTs) assessing AI tools were published in 2021 and 2022 [ 55 – 57 ]. In total, these reviews included 95 studies published in English conducted across 29 countries. The most common settings were the United States, the Netherlands, Canada, Spain, and the United Kingdom (n = 3, 3%). Other than China, the Global South is barely represented, with a single study conducted in India, a single study conducted in South America, and no studies conducted in Africa. This is mirrored by qualitative research: a recent systematic review found that among 102 eligible studies, 90 (88.2%) were from countries meeting the United Nations Development Programme's definition of "very high human development" [ 58 ].

While LICs/LMICs have great potential to benefit from AI tools given their high disease burdens, their lack of representation puts them at a significant disadvantage in AI adoption. Because existing frameworks were developed for resource- and capability-rich environments, they may not be generalizable or applicable to LICs/LMICs. They consider neither the severe limitations in local equipment, trained personnel, infrastructure, data protection frameworks, and public policies that these countries encounter [ 59 ] nor problems unique to these countries, such as societal acceptance [ 60 ] and physician readiness [ 61 ]. In addition, it has been argued that AI tools should be contextually relevant and designed to fit a specific setting [ 44 ]. LICs/LMICs often have poor governance frameworks, which are vital for the success of AI implementation; governance is a key theme that is often region-specific and contextual, providing a clear structure for ethical oversight and implementation processes. If the development of AI is not inclusive of researchers in LICs/LMICs, these regions risk becoming slow adopters of the technology [ 62 ].

Certain themes that are important for AI use, and were expected to be extracted, were notably missing from the literature. The fact that the Act domain was least discussed revealed that existing frameworks fail to discuss when and how AI tools should be decommissioned and what needs to be considered when upgrading existing tools. Furthermore, while there is great potential to implement AI in healthcare, there appears to be a disconnect between developers and end users: a missing link. Crossnohere et al. found that the frameworks examined for the use of AI in medicine were least likely to offer direction on engagement with relevant stakeholders and end users to facilitate the adoption of AI [ 42 ]. Successful implementation of AI requires active collaboration between developers and end users, and "facilitators" who promote this collaboration by connecting these parties [ 42 , 63 ]. The lack of these "facilitators" of AI technology will mean that emerging AI technology may remain confined to a minority of early adopters, with very few tools gaining widespread traction.

Strengths, limitations, and future directions

This review has appreciable strengths and some limitations. It is the first study to evaluate implementation of AI tools in clinical settings across the entirety of the medical literature using a robust search strategy. A pre-established, extensively researched framework (PDSA) was also employed for domain and theme mapping. While the PDSA framework has previously been used for mapping AI implementation procedures in the literature, we believe the current paper takes a different approach by mapping distinct themes of AI implementation to a modified PDSA framework [ 64 ]. The current study focused on four key concepts of AI implementation, namely procurement, integration, monitoring, and evaluation. These were felt to be a comprehensive yet succinct list describing the steps of AI implementation within healthcare settings, but they are by no means exhaustive. As AI becomes more dominant in healthcare, the need to continuously appraise these tools will arise, with important implications for quality improvement. Limitations of the current review include the exclusion of studies published in languages other than English, which may have excluded some relevant studies, and the lack of a risk of bias assessment, due to a lack of validated tools for opinion pieces. The term "decision support" was not used in the search strategy, since we aimed to capture frameworks and guidelines on AI implementation rather than articles referring to specific decision support tools. We recognize this may have inadvertently missed some articles; however, we felt the terms in the search strategy, formulated iteratively, adequately captured the necessary articles. A significant number of included articles had an inherently high risk of bias, since they are expert opinion rather than empirical evidence. Additionally, due to the heterogeneity in language surrounding AI implementation, there was considerable difficulty conducting the literature search, and some studies may not have been captured by the search strategy. Furthermore, the study searched only four databases, namely MEDLINE, Wiley Cochrane, Scopus, and EBSCO. The current review was also not able to compare implementation processes across different countries.

In order to develop clinically applicable strategies to tackle barriers to the implementation of AI tools, we propose that future studies evaluating specific AI tools place additional importance on the specific themes within the later stages of implementation. For future research, strategies to facilitate implementation of AI tools may be developed by identifying subthemes within each PDSA domain. LIC and LMIC stakeholders can fill gaps in current frameworks and must be proactive and intentionally engaged in efforts to develop, integrate, monitor, and evaluate AI tools to ensure wider adoption and benefit globally.

The existing frameworks on AI implementation largely focus on the initial stage of implementation and are generated with little input from LICs/LMICs. Healthcare professionals repeatedly cite how challenging it is to implement AI in their clinical settings with little guidance on how to do so. For future adoption of AI in healthcare, it is necessary to develop a more comprehensive and inclusive framework through engaging collaborators across the globe from different socioeconomic backgrounds and conduct additional studies that evaluate these parameters. Implementation guided by diverse and inclusive collaborations, coupled with further research targeted on under-investigated stages of AI implementation are needed before institutions can start to swiftly adopt existing tools within their clinical settings.

Supporting information

S1 Checklist. PRISMA checklist.

https://doi.org/10.1371/journal.pdig.0000514.s001

S1 Fig. The PDSA cycle.

https://doi.org/10.1371/journal.pdig.0000514.s002

S1 Table. Domains of the Modified PDSA framework for AI implementation.

https://doi.org/10.1371/journal.pdig.0000514.s003

S1 Appendix. Search Strategy.

https://doi.org/10.1371/journal.pdig.0000514.s004

Acknowledgments

The authors gratefully acknowledge the role of Dr. Khwaja Mustafa, Head Librarian at the Aga Khan University for facilitating and synthesizing the initial literature search.

References

  • 2. Center for Devices and Radiological Health. Artificial intelligence and machine learning (AI/ML)-enabled medical devices. Food and Drug Administration; 2022 [cited 2023 Aug 20]. Available from: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices#resources
  • 5. Goldfarb A, Teodoridis F. Why is AI adoption in health care lagging? Washington, DC: Brookings Institution; 2022.

A systematic literature review of empirical research on ChatGPT in education

  • Open access
  • Published: 26 May 2024
  • Volume 3, article number 60 (2024)


  • Yazid Albadarin (ORCID: orcid.org/0009-0005-8068-8902),
  • Mohammed Saqr,
  • Nicolas Pope &
  • Markku Tukiainen


Over the last four decades, studies have investigated the incorporation of Artificial Intelligence (AI) into education. A recent prominent AI-powered technology that has impacted the education sector is ChatGPT. This article provides a systematic review of 14 empirical studies incorporating ChatGPT into various educational settings, published from 2022 up to the 10th of April 2023, the date the search process was conducted. It carefully followed the essential steps outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) guidelines, as well as Okoli's (Okoli in Commun Assoc Inf Syst, 2015) steps for conducting a rigorous and transparent systematic review. In this review, we aimed to explore how students and teachers have utilized ChatGPT in various educational settings, as well as the primary findings of those studies. By employing Creswell's (Creswell in Educational research: planning, conducting, and evaluating quantitative and qualitative research [Ebook], Pearson Education, London, 2015) coding techniques for data extraction and interpretation, we sought to gain insight into their initial attempts at ChatGPT incorporation into education. This approach also enabled us to extract insights and considerations that can facilitate its effective and responsible use in future educational contexts. The results of this review show that learners have utilized ChatGPT as a virtual intelligent assistant, where it offered instant feedback, on-demand answers, and explanations of complex topics. Additionally, learners have used it to enhance their writing and language skills by generating ideas, composing essays, summarizing, translating, paraphrasing texts, or checking grammar. Moreover, learners turned to it as an aiding tool to facilitate their directed and personalized learning by assisting in understanding concepts and homework, providing structured learning plans, and clarifying assignments and tasks. However, the results of some studies (n = 3, 21.4%) show that overuse of ChatGPT may negatively impact innovative capacities and collaborative learning competencies among learners. Educators, on the other hand, have utilized ChatGPT to create lesson plans, generate quizzes, and provide additional resources, which helped them enhance their productivity and efficiency and promote different teaching methodologies. Despite these benefits, the majority of the reviewed studies stress the importance of structured training, support, and clear guidelines for both learners and educators to mitigate the drawbacks. This includes developing critical evaluation skills to assess the accuracy and relevance of information provided by ChatGPT, as well as strategies for integrating human interaction and collaboration into learning activities that involve AI tools. Furthermore, they also recommend ongoing research and proactive dialogue with policymakers, stakeholders, and educational practitioners to refine and enhance the use of AI in learning environments. This review could serve as an insightful resource for practitioners who seek to integrate ChatGPT into education and stimulate further research in the field.


1 Introduction

Educational technology, a rapidly evolving field, plays a crucial role in reshaping the landscape of teaching and learning [ 82 ]. One of the most transformative technological innovations of our era that has influenced the field of education is Artificial Intelligence (AI) [ 50 ]. Over the last four decades, AI in education (AIEd) has gained remarkable attention for its potential to make significant advancements in learning, instructional methods, and administrative tasks within educational settings [ 11 ]. In particular, a large language model (LLM), a type of AI algorithm that applies artificial neural networks (ANNs) and uses massively large data sets to understand, summarize, generate, and predict new content that is almost difficult to differentiate from human creations [ 79 ], has opened up novel possibilities for enhancing various aspects of education, from content creation to personalized instruction [ 35 ]. Chatbots that leverage the capabilities of LLMs to understand and generate human-like responses have also presented the capacity to enhance student learning and educational outcomes by engaging students, offering timely support, and fostering interactive learning experiences [ 46 ].

The ongoing and remarkable technological advancements in chatbots have made their use more convenient, increasingly natural and effortless, and have expanded their potential for deployment across various domains [ 70 ]. One prominent example of chatbot applications is the Chat Generative Pre-Trained Transformer, known as ChatGPT, which was introduced by OpenAI, a leading AI research lab, on November 30th, 2022. ChatGPT employs a variety of deep learning techniques to generate human-like text, with a particular focus on recurrent neural networks (RNNs). Long short-term memory (LSTM) allows it to grasp the context of the text being processed and retain information from previous inputs. Also, the transformer architecture, a neural network architecture based on the self-attention mechanism, allows it to analyze specific parts of the input, thereby enabling it to produce more natural-sounding and coherent output. Additionally, the unsupervised generative pre-training and the fine-tuning methods allow ChatGPT to generate more relevant and accurate text for specific tasks [ 31 , 62 ]. Furthermore, reinforcement learning from human feedback (RLHF), a machine learning approach that combines reinforcement learning techniques with human-provided feedback, has helped improve ChatGPT’s model by accelerating the learning process and making it significantly more efficient.

This cutting-edge natural language processing (NLP) tool is widely recognized as one of today's most advanced LLM-based chatbots [ 70 ], allowing users to ask questions and receive detailed, coherent, systematic, personalized, convincing, and informative human-like responses [ 55 ], even within complex and ambiguous contexts [ 63 , 77 ]. ChatGPT is considered the fastest-growing technology in history: in just three months following its public launch, it amassed an estimated 120 million monthly active users [ 16 ] with an estimated 13 million daily queries [ 49 ], surpassing all other applications [ 64 ]. This remarkable growth can be attributed to the unique features and user-friendly interface that ChatGPT offers. Its intuitive design allows users to interact seamlessly with the technology, making it accessible to a diverse range of individuals, regardless of their technical expertise [ 78 ]. Additionally, its exceptional performance, the result of a combination of advanced algorithms, continuous enhancements, and extensive training on a diverse dataset that includes various text sources such as books, articles, websites, and online forums [ 63 ], has contributed to a more engaging and satisfying user experience [ 62 ]. These factors collectively explain its remarkable global growth and set it apart from predecessors like Bard, Bing Chat, ERNIE, and others.

In this context, several studies have explored the technological advancements of chatbots. One noteworthy recent research effort, conducted by Schöbel et al. [ 70 ], stands out for its comprehensive analysis of more than 5,000 studies on communication agents. This study offered a comprehensive overview of the historical progression and future prospects of communication agents, including ChatGPT. Moreover, other studies have focused on making comparisons, particularly between ChatGPT and alternative chatbots like Bard, Bing Chat, ERNIE, LaMDA, BlenderBot, and various others. For example, O'Leary [ 53 ] compared two chatbots, LaMDA and BlenderBot, with ChatGPT and revealed that ChatGPT outperformed both. This superiority arises from ChatGPT's capacity to handle a wider range of questions and generate slightly varied perspectives within specific contexts. Similarly, ChatGPT exhibited an impressive ability to formulate interpretable responses that were easily understood when compared with Google's featured snippets [ 34 ]. Additionally, ChatGPT was compared to other LLM-based chatbots, including Bard, BERT, and ERNIE. The findings indicated that ChatGPT exhibited strong performance in the given tasks, often outperforming the other models [ 59 ].

Furthermore, in the education context, a comprehensive study systematically compared a range of the most promising chatbots, including Bard, Bing Chat, ChatGPT, and Ernie across a multidisciplinary test that required higher-order thinking. The study revealed that ChatGPT achieved the highest score, surpassing Bing Chat and Bard [ 64 ]. Similarly, a comparative analysis was conducted to compare ChatGPT with Bard in answering a set of 30 mathematical questions and logic problems, grouped into two question sets. Set (A) is unavailable online, while Set (B) is available online. The results revealed ChatGPT's superiority in Set (A) over Bard. Nevertheless, Bard's advantage emerged in Set (B) due to its capacity to access the internet directly and retrieve answers, a capability that ChatGPT does not possess [ 57 ]. However, through these varied assessments, ChatGPT consistently highlights its exceptional prowess compared to various alternatives in the ever-evolving chatbot technology.

The widespread adoption of chatbots, especially ChatGPT, by millions of students and educators has sparked extensive discussions regarding its incorporation into the education sector [ 64 ]. Accordingly, many scholars have contributed to the discourse, expressing both optimism and pessimism regarding the incorporation of ChatGPT into education. For example, ChatGPT has been highlighted for its capabilities in enriching the learning and teaching experience through its ability to support different learning approaches, including adaptive learning, personalized learning, and self-directed learning [ 58 , 60 , 91 ], deliver summative and formative feedback to students and provide real-time responses to questions, increase the accessibility of information [ 22 , 40 , 43 ], foster students' performance, engagement, and motivation [ 14 , 44 , 58 ], and enhance teaching practices [ 17 , 18 , 64 , 74 ].

On the other hand, concerns have been also raised regarding its potential negative effects on learning and teaching. These include the dissemination of false information and references [ 12 , 23 , 61 , 85 ], biased reinforcement [ 47 , 50 ], compromised academic integrity [ 18 , 40 , 66 , 74 ], and the potential decline in students' skills [ 43 , 61 , 64 , 74 ]. As a result, ChatGPT has been banned in multiple countries, including Russia, China, Venezuela, Belarus, and Iran, as well as in various educational institutions in India, Italy, Western Australia, France, and the United States [ 52 , 90 ].

Clearly, the advent of chatbots, especially ChatGPT, has provoked significant controversy due to their potential impact on learning and teaching. This indicates the necessity for further exploration to gain a deeper understanding of this technology and carefully evaluate its potential benefits, limitations, challenges, and threats to education [ 79 ]. Therefore, conducting a systematic literature review will provide valuable insights into the potential prospects and obstacles linked to its incorporation into education. This systematic literature review will primarily focus on ChatGPT, driven by the aforementioned key factors outlined above.

However, the existing literature lacks a systematic literature review of empirical studies. Thus, this systematic literature review aims to address this gap by synthesizing the existing empirical studies conducted on chatbots, particularly ChatGPT, in the field of education, highlighting how ChatGPT has been utilized in educational settings, and identifying any existing gaps. This review may be particularly useful for researchers in the field and educators who are contemplating the integration of ChatGPT or any chatbot into education. The following research questions will guide this study:

What are students' and teachers' initial attempts at utilizing ChatGPT in education?

What are the main findings derived from empirical studies that have incorporated ChatGPT into learning and teaching?

2 Methodology

To conduct this study, the authors followed the essential steps of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) and Okoli’s [ 54 ] steps for conducting a systematic review. These included identifying the study’s purpose, drafting a protocol, applying a practical screening process, searching the literature, extracting relevant data, evaluating the quality of the included studies, synthesizing the studies, and ultimately writing the review. The subsequent section provides an extensive explanation of how these steps were carried out in this study.

2.1 Identify the purpose

Given the widespread adoption of ChatGPT by students and teachers for various educational purposes, often without a thorough understanding of responsible and effective use or a clear recognition of its potential impact on learning and teaching, the authors recognized the need for further exploration of ChatGPT's impact on education in this early stage. Therefore, they have chosen to conduct a systematic literature review of existing empirical studies that incorporate ChatGPT into educational settings. Despite the limited number of empirical studies due to the novelty of the topic, their goal is to gain a deeper understanding of this technology and proactively evaluate its potential benefits, limitations, challenges, and threats to education. This effort could help to understand initial reactions and attempts at incorporating ChatGPT into education and bring out insights and considerations that can inform the future development of education.

2.2 Draft the protocol

The next step is formulating the protocol. This protocol serves to outline the study process in a rigorous and transparent manner, mitigating researcher bias in study selection and data extraction [ 88 ]. The protocol will include the following steps: generating the research question, predefining a literature search strategy, identifying search locations, establishing selection criteria, assessing the studies, developing a data extraction strategy, and creating a timeline.

2.3 Apply practical screen

The screening step aims to accurately filter the articles resulting from the searching step and select the empirical studies that have incorporated ChatGPT into educational contexts, which will guide us in answering the research questions and achieving the objectives of this study. To ensure the rigorous execution of this step, our inclusion and exclusion criteria were determined based on the authors' experience and informed by previous successful systematic reviews [ 21 ]. Table 1 summarizes the inclusion and exclusion criteria for study selection.

2.4 Literature search

We conducted a thorough literature search to identify articles that explored, examined, and addressed the use of ChatGPT in educational contexts. We utilized two research databases: Dimensions.ai, which provides access to a large number of research publications, and lens.org, which offers access to over 300 million articles, patents, and other research outputs from diverse sources. Additionally, we included three databases, Scopus, Web of Science, and ERIC, which contain relevant research on the topic that addresses our research questions. To browse and identify relevant articles, we used the following search formula: ("ChatGPT" AND "Education"), which included the Boolean operator "AND" to get more specific results. The subject area in the Scopus and ERIC databases was narrowed to the "ChatGPT" and "Education" keywords, and in the WoS database it was limited to the "Education" category. The search was conducted between the 3rd and 10th of April 2023, which resulted in 276 articles from all selected databases (111 articles from Dimensions.ai, 65 from Scopus, 28 from Web of Science, 14 from ERIC, and 58 from Lens.org). These articles were imported into the Rayyan web-based system for analysis. Duplicates were identified automatically by the system. Subsequently, the first author manually reviewed the duplicated articles, confirmed that they had the same content, and removed them, leaving us with 135 unique articles. Afterward, the titles, abstracts, and keywords of the first 40 manuscripts were scanned and reviewed by the first author and discussed with the second and third authors to resolve any disagreements. Subsequently, the first author proceeded with the filtering process for all articles and carefully applied the inclusion and exclusion criteria as presented in Table 1. Articles that met any one of the exclusion criteria were eliminated, leaving 26 articles. Afterward, the authors met to carefully scan and discuss them. The authors agreed to eliminate any empirical studies solely focused on checking ChatGPT capabilities, as these studies do not guide us in addressing the research questions and achieving the study's objectives. This resulted in 14 articles eligible for analysis.
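
To make the de-duplication step concrete, the sketch below shows how records exported from the five databases could be merged and screened for duplicates by normalized title. This is an illustrative approximation only: the actual de-duplication in this review was performed automatically by Rayyan and verified manually, and the file and column names here are hypothetical.

```python
import csv

def normalize(title: str) -> str:
    # Lowercase and strip non-alphanumeric characters so that minor
    # formatting differences between database exports compare equal.
    return "".join(ch for ch in title.lower() if ch.isalnum())

def deduplicate(records):
    # Keep the first record seen for each normalized title.
    seen, unique = set(), []
    for rec in records:
        key = normalize(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical CSV exports, one per database used in the search.
records = []
for path in ["dimensions.csv", "scopus.csv", "wos.csv", "eric.csv", "lens.csv"]:
    with open(path, newline="", encoding="utf-8") as f:
        records.extend(csv.DictReader(f))  # assumes a "title" column

print(f"{len(records)} retrieved, {len(deduplicate(records))} unique")
# In this review: 276 records retrieved, 135 unique after de-duplication.
```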

2.5 Quality appraisal

The examination and evaluation of the quality of the extracted articles is a vital step [ 9 ]. Therefore, the extracted articles were carefully evaluated for quality using Fink’s [ 24 ] standards, which emphasize the necessity for detailed descriptions of methodology, results, conclusions, strengths, and limitations. The process began with a thorough assessment of each study's design, data collection, and analysis methods to ensure their appropriateness and comprehensive execution. The clarity, consistency, and logical progression from data to results and conclusions were also critically examined. Potential biases and recognized limitations within the studies were also scrutinized. Ultimately, two articles were excluded for failing to meet Fink’s criteria, particularly in providing sufficient detail on methodology, results, conclusions, strengths, or limitations. The review process is illustrated in Fig.  1 .

Figure 1: The study selection process

2.6 Data extraction

The next step is data extraction, the process of capturing the key information and categories from the included studies. To improve efficiency, reduce variation among authors, and minimize errors in data analysis, the coding categories were constructed using Creswell's [ 15 ] coding techniques for data extraction and interpretation. The coding process involves three sequential steps. The initial stage encompasses open coding, where the researcher examines the data, generates codes to describe and categorize it, and gains a deeper understanding without preconceived ideas. Following open coding is axial coding, where the interrelationships between codes from open coding are analyzed to establish more comprehensive categories or themes. The process concludes with selective coding, refining and integrating categories or themes to identify core concepts emerging from the data. The first coder performed the coding process, then engaged in discussions with the second and third authors to finalize the coding categories for the first five articles. The first coder then proceeded to code all studies and engaged again in discussions with the other authors to ensure the finalization of the coding process. After a comprehensive analysis and capturing of the key information from the included studies, the data extraction and interpretation process yielded several themes. These themes have been categorized and are presented in Table 2. It is important to note that open coding results were removed from Table 2 for aesthetic reasons, as they included many generic aspects, such as words, short phrases, or sentences mentioned in the studies.
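
As a schematic illustration of how the three coding steps build on one another, the snippet below takes a handful of invented open codes, groups them into axial categories, and integrates those categories under a selective theme. The codes and category names are hypothetical examples for exposition, not the authors' actual codebook.

```python
# Invented open codes, for illustration only.
open_codes = [
    "answers on-demand questions",
    "gives immediate feedback",
    "suggests learning resources",
    "explains concepts step by step",
]

# Axial coding: group related open codes into broader categories.
axial_categories = {
    "virtual intelligent assistant": [
        "answers on-demand questions",
        "gives immediate feedback",
    ],
    "support for self-directed learning": [
        "suggests learning resources",
        "explains concepts step by step",
    ],
}

# Selective coding: integrate the categories under a core theme.
selective_theme = {"ChatGPT as a learning aid": list(axial_categories)}

# Sanity check: every open code is assigned to exactly one category.
assigned = [code for codes in axial_categories.values() for code in codes]
assert sorted(assigned) == sorted(open_codes)
```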

2.7 Synthesize studies

In this stage, we gather, discuss, and analyze the key findings that emerged from the selected studies. The synthesis stage is considered a transition from an author-centric to a concept-centric focus, enabling us to map all the provided information to achieve the most effective evaluation of the data [ 87 ]. Initially, the authors extracted data that included general information about the selected studies, including the author(s)' names, study titles, years of publication, educational levels, research methodologies, sample sizes, participants, main aims or objectives, raw data sources, and analysis methods. Following that, all key information and significant results from the selected studies were compiled using Creswell's [ 15 ] coding techniques for data extraction and interpretation to identify core concepts and themes emerging from the data, focusing on those that directly contributed to our research questions and objectives, such as the initial utilization of ChatGPT in learning and teaching, learners' and educators' familiarity with ChatGPT, and the main findings of each study. Finally, the data related to each selected study were extracted into an Excel spreadsheet for data processing. The Excel spreadsheet was reviewed by the authors, including a series of discussions to ensure the finalization of this process and prepare it for further analysis. Afterward, the final results were analyzed and presented in various charts and graphs. Table 4 presents the extracted data from the selected studies, with each study labeled with a capital 'S' followed by a number.
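
For readers who wish to reproduce this kind of descriptive synthesis, the following minimal sketch shows how an extraction spreadsheet could be loaded and summarized into one of the charts reported in the next section. The file name and column names are assumptions for illustration; the structure of the authors' actual spreadsheet is not published.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical extraction sheet: one row per study, with columns such as
# "study_id", "methodology", and "education_level".
df = pd.read_excel("extracted_studies.xlsx")

# Count studies per research methodology (cf. Fig. 6).
counts = df["methodology"].value_counts()
ax = counts.plot(kind="bar", title="Research methodologies in the reviewed studies")
ax.set_xlabel("Methodology")
ax.set_ylabel("Number of studies")
plt.tight_layout()
plt.savefig("methodologies.png")
```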

3 Results

This section consists of two main parts. The first part provides a descriptive analysis of the data compiled from the reviewed studies. The second part presents the answers to the research questions and the main findings of these studies.

3.1 Part 1: descriptive analysis

This section provides a descriptive analysis of the reviewed studies, covering educational levels and fields, participant distribution, country contributions, research methodologies, sample sizes, study populations, publication years, the list of journals, familiarity with ChatGPT, sources of data, and the main aims and objectives of the studies. Table 4 presents a comprehensive overview of the extracted data from the selected studies.

3.1.1 The number of the reviewed studies and publication years

The total number of reviewed studies was 14. All were empirical studies published in different journals focusing on education and technology. One study was published in 2022 [S1], while the remaining 13 were published in 2023 [S2]-[S14]. Table 3 lists the year of publication, the journal names, and the number of reviewed studies published in each journal.

3.1.2 Educational levels and fields

The majority of the reviewed studies, 11 in total, were conducted in higher education institutions [S1]-[S10] and [S13]. Two studies did not specify the educational level of the population [S12] and [S14], while one study focused on elementary education [S11]. The reviewed studies covered various fields of education. Three studies focused on Arts and Humanities Education [S8], [S11], and [S14], specifically English Education. Two studies focused on Engineering Education, one in Computer Engineering [S2] and the other in Construction Education [S3]. Two studies focused on Mathematics Education [S5] and [S12]. One study focused on Social Science Education [S13], one on Early Education [S4], and one on Journalism Education [S9]. Finally, three studies did not specify the field of education [S1], [S6], and [S7]. Figure 2 represents the educational levels in the reviewed studies, while Fig. 3 represents their contexts.

Figure 2: Educational levels in the reviewed studies

Figure 3: Context of the reviewed studies

3.1.3 Participants distribution and countries contribution

The reviewed studies were conducted across different geographic regions, providing a diverse representation. The majority, 10 in total [S1]-[S3], [S5]-[S9], [S11], and [S14], focused on participants from single countries, namely Pakistan, the United Arab Emirates, China, Indonesia, Poland, Saudi Arabia, South Korea, Spain, Tajikistan, and the United States. In contrast, four studies involved participants from multiple countries: China and the United States [S4]; China, the United Kingdom, and the United States [S10]; the United Arab Emirates, Oman, Saudi Arabia, and Jordan [S12]; and Turkey, Sweden, Canada, and Australia [S13]. Figures 4 and 5 illustrate, respectively, the distribution of participants across single or multiple countries and each country's contribution to the reviewed studies.

Figure 4: The reviewed studies conducted in single or multiple countries

Figure 5: The contribution of each country in the studies

3.1.4 Study population and sample size

Four study populations were included: university students, university teachers, university teachers and students, and elementary school teachers. Six studies involved university students [S2], [S3], and [S5]-[S8]. Three studies focused on university teachers [S1], [S4], and [S6], while one study specifically targeted elementary school teachers [S11]. Additionally, four studies included both university teachers and students [S10] and [S12]-[S14]; among them, study [S13] specifically included postgraduate students. In terms of sample size, nine studies included fewer than 50 participants [S1], [S3], [S6], [S8], and [S10]-[S13], three studies had 50–100 participants [S2], [S9], and [S14], and only one study had more than 100 participants [S7]. It is worth mentioning that study [S4] adopted a mixed-methods approach, including 10 participants for qualitative analysis and 110 participants for quantitative analysis.

3.1.5 Participants’ familiarity with using ChatGPT

The reviewed studies recruited a diverse range of participants with varying levels of familiarity with ChatGPT. Five studies [S2], [S4], [S6], [S8], and [S12] involved participants already familiar with ChatGPT, while eight studies [S1], [S3], [S5], [S7], [S9], [S10], [S13] and [S14] included individuals with differing levels of familiarity. Notably, one study [S11] had participants who were entirely unfamiliar with ChatGPT. It is important to note that four studies [S3], [S5], [S9], and [S11] provided training or guidance to their participants before conducting their studies, while ten studies [S1], [S2], [S4], [S6]-[S8], [S10], and [S12]-[S14] did not provide training due to the participants' existing familiarity with ChatGPT.

3.1.6 Research methodology approaches and source(s) of data

The reviewed studies adopted various research methodology approaches. Seven studies adopted a qualitative research methodology [S1], [S4], [S6], [S8], [S10], [S11], and [S12], three studies adopted a quantitative research methodology [S3], [S7], and [S14], and four studies employed mixed methods, combining the strengths of qualitative and quantitative approaches [S2], [S5], [S9], and [S13].

In terms of the source(s) of data, the reviewed studies obtained their data from various sources, such as interviews, questionnaires, and pre- and post-tests. Six studies relied on interviews as their primary source of data collection [S1], [S4], [S6], [S10], [S11], and [S12], four studies relied on questionnaires [S2], [S7], [S13], and [S14], two studies combined pre- and post-tests with questionnaires [S3] and [S9], and two studies combined questionnaires and interviews [S5] and [S8]. It is important to note that six of the reviewed studies were quasi-experimental [S3], [S5], [S8], [S9], [S12], and [S14], while the remaining ones were non-experimental [S1], [S2], [S4], [S6], [S7], [S10], [S11], and [S13]. Figures 6 and 7 illustrate the research methodologies and the source(s) of data used in the reviewed studies, respectively.

Figure 6: Research methodologies in the reviewed studies

Figure 7: Source of data in the reviewed studies

3.1.7 The aim and objectives of the studies

The reviewed studies encompassed a diverse set of aims, with several of them incorporating multiple primary objectives. Six studies [S3], [S6], [S7], [S8], [S11], and [S12] examined the integration of ChatGPT in educational contexts, and four studies [S4], [S5], [S13], and [S14] investigated the various implications of its use in education, while three studies [S2], [S9], and [S10] aimed to explore both its integration and implications in education. Additionally, seven studies explicitly explored attitudes and perceptions of students [S2] and [S3], educators [S1] and [S6], or both [S10], [S12], and [S13] regarding the utilization of ChatGPT in educational settings.

3.2 Part 2: research questions and main findings of the reviewed studies

This part presents the answers to the research questions and the main findings of the reviewed studies, classified into two main categories (learning and teaching) according to the AI in education classification by [ 36 ]. Figure 8 summarizes the main findings of the reviewed studies in a visually informative diagram. Table 4 provides a detailed list of the key information extracted from the selected studies that led to generating these themes.

Figure 8: The main findings in the reviewed studies

4 Students' initial attempts at utilizing ChatGPT in learning and main findings from students' perspective

4.1 Virtual intelligent assistant

Nine studies demonstrated that ChatGPT has been utilized by students as an intelligent assistant to enhance and support their learning. Students employed it for various purposes, such as answering on-demand questions [S2]-[S5], [S8], [S10], and [S12], providing valuable information and learning resources [S2]-[S6] and [S8], as well as receiving immediate feedback [S2], [S4], [S9], [S10], and [S12]. In this regard, students were generally confident in the accuracy of ChatGPT's responses, considering them relevant, reliable, and detailed [S3], [S4], [S5], and [S8]. However, some students indicated the need for improvement, finding that its answers were not always accurate [S2], that it could provide misleading information, or that its responses did not always align with their expectations [S6] and [S10]. Students also observed that the accuracy of ChatGPT depends on several factors, including the quality and specificity of the user's input, the complexity of the question or topic, and the scope and relevance of its training data [S12]. Many students felt that ChatGPT's answers were not always accurate, and most believed that working with it effectively requires good background knowledge.

4.2 Writing and language proficiency assistant

Six of the reviewed studies highlighted that ChatGPT has been utilized by students as a valuable assistant tool to improve their academic writing skills and language proficiency. Among these studies, three mainly focused on English education, demonstrating that students showed sufficient mastery in using ChatGPT for generating ideas, summarizing, paraphrasing texts, and writing essays [S8], [S11], and [S14]. Furthermore, ChatGPT helped them in writing by turning students into active investigators rather than passive knowledge recipients and facilitated the development of their writing skills [S11] and [S14]. Similarly, ChatGPT allowed students to generate unique ideas and perspectives, leading to deeper analysis and reflection in their journalism writing [S9]. In terms of language proficiency, ChatGPT allowed participants to translate content into their home languages, making it more accessible and relevant to their context [S4]. It also enabled them to request changes in linguistic tone or flavor [S8]. Moreover, participants used it to check grammar or as a dictionary [S11].

4.3 Valuable resource for learning approaches

Five studies demonstrated that students used ChatGPT as a valuable complementary resource for self-directed learning. It provided learning resources and guidance on diverse educational topics and created a supportive home learning environment [S2] and [S4]. Moreover, it offered step-by-step guidance that helped students grasp concepts at their own pace and enhance their understanding [S5], streamlined the completion of tasks and projects carried out independently [S7], provided comprehensive and easy-to-understand explanations on various subjects [S10], and assisted students in exploring geometry operations at their own pace [S12]. Three studies showed that students used ChatGPT as a valuable resource for personalized learning. It delivered age-appropriate conversations and tailored teaching based on a child's interests [S4], acted as a personalized learning assistant that adapted to their needs and pace and assisted them in understanding mathematical concepts [S12], and enabled personalized learning experiences in the social sciences by adapting to students' needs and learning styles [S13]. On the other hand, it is important to note that, according to one study [S5], students suggested that using ChatGPT may negatively affect collaborative learning competencies among students.

4.4 Enhancing students' competencies

Six of the reviewed studies have shown that ChatGPT is a valuable tool for improving a wide range of skills among students. Two studies provided evidence that ChatGPT led to improvements in students' critical thinking, reasoning skills, and hazard recognition competencies by engaging them in interactive conversations or activities and providing responses related to their disciplines in journalism [S9] and construction education [S3]. Furthermore, two studies focused on mathematics education have shown the positive impact of ChatGPT on students' problem-solving abilities in unraveling problem-solving questions [S12] and enhancing students' understanding of the problem-solving process [S5]. Lastly, one study indicated that ChatGPT effectively contributed to the enhancement of conversational social skills [S4].

4.5 Supporting students' academic success

Seven of the reviewed studies highlighted that students found ChatGPT to be beneficial for learning, as it enhanced learning efficiency and improved the learning experience. It has been observed to improve students' efficiency in computer engineering studies by providing well-structured responses and good explanations [S2]. Additionally, students found it extremely useful for hazard reporting [S3], and it also enhanced their efficiency and capabilities in solving mathematics problems [S5] and [S12]. Furthermore, by finding information, generating ideas, translating texts, and providing alternative questions, ChatGPT aided students in deepening their understanding of various subjects [S6]. It contributed to an increase in students' overall productivity [S7] and improved their efficiency in composing written tasks [S8]. Regarding learning experiences, ChatGPT was instrumental in assisting students in identifying hazards that they might have otherwise overlooked [S3]. It also improved students' learning experiences in solving mathematics problems and developing their abilities [S5] and [S12]. Moreover, it increased students' successful completion of important tasks in their studies [S7], particularly those involving writing tasks of average difficulty [S8]. Additionally, ChatGPT increased the chances of educational success by providing students with baseline knowledge on various topics [S10].

5 Teachers' initial attempts at utilizing ChatGPT in teaching and main findings from teachers' perspective

5.1 Valuable resource for teaching

The reviewed studies showed that teachers have employed ChatGPT to recommend, modify, and generate diverse, creative, organized, and engaging educational content, teaching materials, and testing resources more rapidly [S4], [S6], [S10], and [S11]. Additionally, teachers experienced increased productivity, as ChatGPT facilitated quick and accurate responses to questions, fact-checking, and information searches [S1]. It also proved valuable in constructing new knowledge [S6] and providing timely answers to students' questions in classrooms [S11]. Moreover, ChatGPT enhanced teachers' efficiency by generating new ideas for activities and preplanning activities for their students [S4] and [S6], including interactive language game partners [S11].

5.2 Improving productivity and efficiency

The reviewed studies showed that participants' productivity and work efficiency were significantly enhanced by using ChatGPT, as it enabled them to allocate more time to other tasks and reduce their overall workloads [S6], [S10], [S11], [S13], and [S14]. However, three studies [S1], [S4], and [S11] indicated a negative perception and attitude among teachers toward using ChatGPT. This negativity stemmed from a lack of the necessary skills to use it effectively [S1], limited familiarity with it [S4], and occasional inaccuracies in the content it provided [S10].

5.3 Catalyzing new teaching methodologies

Five of the reviewed studies highlighted that educators found it necessary to redefine their teaching profession with the assistance of ChatGPT [S11], to develop new effective learning strategies [S4], and to adapt teaching strategies and methodologies to ensure the development of essential skills for future engineers [S5]. They also emphasized the importance of adopting new educational philosophies and approaches that can evolve with the introduction of ChatGPT into the classroom [S12]. Furthermore, updating curricula to focus on improving human-specific features, such as emotional intelligence, creativity, and philosophical perspectives [S13], was found to be essential.

5.4 Effective utilization of ChatGPT in teaching

According to the reviewed studies, effective utilization of ChatGPT in education requires providing teachers with well-structured training, support, and adequate background on how to use ChatGPT responsibly [S1], [S3], [S11], and [S12]. Establishing clear rules and regulations regarding its usage is essential to ensure it positively impacts the teaching and learning processes, including students' skills [S1], [S4], [S5], [S8], [S9], and [S11]-[S14]. Moreover, conducting further research and engaging in discussions with policymakers and stakeholders are crucial for the successful integration of ChatGPT in education and for maximizing the benefits for both educators and students [S1], [S6]-[S10], and [S12]-[S14].

6 Discussion

The purpose of this review was to systematically examine empirical studies that have explored the utilization of ChatGPT, one of today's most advanced LLM-based chatbots, in education. The reviewed studies revealed several ways in which ChatGPT has been utilized in different learning and teaching practices, and they provided insights and considerations that can facilitate its effective and responsible use in future educational contexts. The reviewed studies came from diverse fields of education, which helped us avoid a review biased toward a specific field. Similarly, they were conducted across different geographic regions, and this variety in geographic representation enriched the findings of this review.

In response to RQ1, "What are students' and teachers' initial attempts at utilizing ChatGPT in education?", the findings from this review provide comprehensive insights. Chatbots, including ChatGPT, play a crucial role in supporting student learning, enhancing their learning experiences, and facilitating diverse learning approaches [ 42 , 43 ]. This review found that ChatGPT has been instrumental in enhancing students' learning experiences by serving as a virtual intelligent assistant, providing immediate feedback, on-demand answers, and engaging educational conversations. Additionally, students have benefited from ChatGPT's ability to generate ideas, compose essays, and perform tasks like summarizing, translating, paraphrasing texts, or checking grammar, thereby enhancing their writing and language competencies. Furthermore, students have turned to ChatGPT for assistance in understanding concepts and homework, structured learning plans, and clarification of assignments and tasks, which fosters a supportive home learning environment and allows them to take responsibility for their own learning, cultivating the skills and approaches essential for independent study [ 26 , 27 , 28 ]. This finding aligns with the studies of Saqr et al. [ 68 , 69 ], which highlighted that, when students actively engage in their own learning process, it yields additional advantages, such as heightened motivation, enhanced achievement, and the cultivation of enthusiasm, turning them into advocates for their own learning.

Moreover, students have utilized ChatGPT for tailored teaching and step-by-step guidance on diverse educational topics, for streamlining task and project completion, and for generating and recommending educational content. This personalization enhances the learning environment, leading to increased academic success. This finding aligns with other recent studies [ 26 , 27 , 28 , 60 , 66 ], which revealed that ChatGPT has the potential to offer personalized learning experiences and support an effective learning process by providing students with customized feedback and explanations tailored to their needs and abilities, ultimately fostering students' performance, engagement, and motivation and leading to increased academic success [ 14 , 44 , 58 ]. This outcome is in line with the findings of Saqr et al. [ 68 , 69 ], which emphasized that learning strategies are important catalysts of students' learning, as students who utilize effective learning strategies are more likely to achieve better academic results.

Teachers, too, have capitalized on ChatGPT's capabilities to enhance productivity and efficiency, using it for creating lesson plans, generating quizzes, providing additional resources, generating and preplanning new ideas for activities, and aiding in answering students’ questions. This adoption of technology introduces new opportunities to support teaching and learning practices, enhancing teacher productivity. This finding aligns with those of Day [ 17 ], De Castro [ 18 ], and Su and Yang [ 74 ] as well as with those of Valtonen et al. [ 82 ], who revealed that emerging technological advancements have opened up novel opportunities and means to support teaching and learning practices, and enhance teachers’ productivity.

In response to RQ2, "What are the main findings derived from empirical studies that have incorporated ChatGPT into learning and teaching?", the findings from this review provide profound insights and raise significant concerns. Starting with the insights, chatbots, including ChatGPT, have demonstrated the potential to reshape and revolutionize education, creating novel opportunities for enhancing the learning process and outcomes [ 83 ], facilitating different learning approaches, and offering a range of pedagogical benefits [ 19 , 43 , 72 ]. In this context, this review found that ChatGPT could open avenues for educators to adopt or develop new effective learning and teaching strategies that can evolve with its introduction into the classroom. Nonetheless, there is an evident lack of research understanding regarding the potential impact of generative machine learning models within diverse educational settings [ 83 ]. This necessitates that teachers attain a high level of proficiency in incorporating chatbots, such as ChatGPT, into their classrooms to create inventive, well-structured, and captivating learning strategies. In the same vein, the review also found that teachers without the requisite skills to utilize ChatGPT realized that it did not contribute positively to their work and could potentially have adverse effects [ 37 ]. This concern could lead to inequity of access to the benefits of chatbots, including ChatGPT, as individuals who lack the necessary expertise may not be able to harness their full potential, resulting in disparities in educational outcomes and opportunities. Therefore, immediate action is needed to address these potential issues. A potential solution is offering training, support, and competency development for teachers to ensure that all of them can leverage chatbots, including ChatGPT, effectively and equitably in their educational practices [ 5 , 28 , 80 ], which could enhance accessibility and inclusivity, and potentially result in innovative outcomes [ 82 , 83 ].

Additionally, chatbots, including ChatGPT, have the potential to significantly impact students' thinking abilities, including retention, reasoning, and analysis skills [ 19 , 45 ], and to foster innovation and creativity [ 83 ]. This review found that ChatGPT could contribute to improving a wide range of skills among students. However, it also found that frequent use of ChatGPT may result in decreased innovative capacity, collaborative skills, cognitive capacity, and motivation to attend classes, and could lead to reduced higher-order thinking skills among students [ 22 , 29 ]. Therefore, immediate action is needed to carefully examine the long-term impact of chatbots such as ChatGPT on learning outcomes, and to explore their incorporation into educational settings as supportive tools without compromising students' cognitive development and critical thinking abilities. In the same vein, the review also found it challenging to draw a consistent conclusion regarding the potential of ChatGPT to support self-directed learning. This finding aligns with the recent study of Baskara [ 8 ]. Therefore, further research is needed to explore the potential of ChatGPT for self-directed learning. One potential solution involves utilizing learning analytics as a novel approach to examine various aspects of students' learning and support them in their individual endeavors [ 32 ]. This approach can bridge the gap by facilitating an in-depth analysis of how learners engage with ChatGPT, identifying trends in self-directed learning behavior, and assessing its influence on their outcomes.

Turning to the significant concerns, a fundamental challenge with LLM-based chatbots, including ChatGPT, is the accuracy and quality of the information and responses they provide, as they can present false information as truth, a phenomenon often referred to as "hallucination" [ 3 , 49 ]. In this context, this review found that the provided information was not entirely satisfactory. Consequently, the utilization of chatbots presents potential concerns, such as the generation of inaccurate or misleading information, especially for students who use them to support their learning. This finding aligns with other findings [ 6 , 30 , 35 , 40 ], which revealed that incorporating chatbots such as ChatGPT into education presents challenges related to accuracy and reliability, both because they are trained on large corpora of data that may contain inaccuracies and because of the way users formulate their prompts. Therefore, immediate action is needed to address these potential issues. One possible solution is to equip students with the necessary skills and competencies, including a background understanding of how to use ChatGPT effectively and the ability to assess and evaluate the information it generates, as the accuracy and quality of the provided information depend on the input, its complexity, the topic, and the relevance of its training data [ 28 , 49 , 86 ]. It is also essential to examine how learners can be educated about how these models operate, the data used in their training, and how to recognize their limitations, challenges, and issues [ 79 ].

Furthermore, chatbots present substantial challenges concerning academic integrity [ 20 , 56 ] and copyright violations [ 83 ], both significant concerns in education. The review found that the potential misuse of ChatGPT might foster cheating, facilitate plagiarism, and threaten academic integrity. This issue is also affirmed by the research of Basic et al. [ 7 ], who presented evidence that students who utilized ChatGPT in their writing assignments had more plagiarism cases than those who did not. These findings align with the conclusions drawn by Cotton et al. [ 13 ], Hisan and Amri [ 33 ], and Sullivan et al. [ 75 ], who revealed that the integration of chatbots such as ChatGPT into education poses a significant challenge to the preservation of academic integrity. Moreover, chatbots, including ChatGPT, have increased the difficulty of identifying plagiarism [ 47 , 67 , 76 ]. Findings from previous studies [ 1 , 84 ] indicate that AI-generated text often went undetected by plagiarism software such as Turnitin. Turnitin and similar detection tools, such as ZeroGPT, GPTZero, and Copyleaks, have since evolved, incorporating enhanced techniques to detect AI-generated text; however, several studies have found that these tools are still not fully ready to identify AI-generated text accurately and reliably and may produce false positives [ 10 , 51 ], so new detection methods may need to be created and implemented [ 4 ]. This issue leads to a further concern: the difficulty of accurately evaluating student performance when students use chatbots such as ChatGPT to assist with their assignments. Consequently, most LLM-driven chatbots present a substantial challenge to traditional assessments [ 64 ]. Findings from previous studies indicate the importance of rethinking, improving, and redesigning innovative assessment methods in the era of chatbots [ 14 , 20 , 64 , 75 ]. These methods should prioritize evaluating students' ability to apply knowledge to complex cases and demonstrate comprehension, rather than focusing solely on the final product. Therefore, immediate action is needed to address these potential issues. One possible solution would be the development of clear guidelines, regulatory policies, and pedagogical guidance. These measures would help regulate the proper and ethical utilization of chatbots, such as ChatGPT, and must be established before their introduction to students [ 35 , 38 , 39 , 41 , 89 ].

In summary, our review has delved into the utilization of ChatGPT, a prominent example of chatbots, in education, addressing the question of how ChatGPT has been utilized in education. However, there remain significant gaps, which necessitate further research to shed light on this area.

7 Conclusions

This systematic review has shed light on the varied initial attempts at incorporating ChatGPT into education by both learners and educators, while also offering insights and considerations that can facilitate its effective and responsible use in future educational contexts. The analysis of the 14 selected studies revealed the dual-edged impact of ChatGPT in educational settings. On the positive side, ChatGPT significantly aided the learning process in various ways. Learners have used it as a virtual intelligent assistant, benefiting from its ability to provide immediate feedback, on-demand answers, and easy access to educational resources. Additionally, learners have used it to enhance their writing and language skills, engaging in practices such as generating ideas, composing essays, and performing tasks like summarizing, translating, paraphrasing texts, or checking grammar. Importantly, other learners have utilized it to support and facilitate their self-directed and personalized learning on a broad range of educational topics, assisting in understanding concepts and homework, providing structured learning plans, and clarifying assignments and tasks. Educators, on the other hand, found ChatGPT beneficial for enhancing productivity and efficiency. They used it for creating lesson plans, generating quizzes, providing additional resources, and answering learners' questions, which saved time and allowed for more dynamic and engaging teaching strategies and methodologies.

However, the review also pointed out negative impacts. The results revealed that overuse of ChatGPT could decrease innovative capacities and collaborative learning among learners. Specifically, relying too much on ChatGPT for quick answers can inhibit learners' critical thinking and problem-solving skills. Learners might not engage deeply with the material or consider multiple solutions to a problem. This tendency was particularly evident in group projects, where learners preferred consulting ChatGPT individually for solutions over brainstorming and collaborating with peers, which negatively affected their teamwork abilities. On a broader level, integrating ChatGPT into education has also raised several concerns, including the potential for providing inaccurate or misleading information, issues of inequity in access, challenges related to academic integrity, and the possibility of misusing the technology.

Accordingly, this review emphasizes the urgency of developing clear rules, policies, and regulations to ensure ChatGPT's effective and responsible use in educational settings, alongside other chatbots, by both learners and educators. This requires providing well-structured training to educate them on responsible usage and understanding its limitations, along with offering sufficient background information. Moreover, it highlights the importance of rethinking, improving, and redesigning innovative teaching and assessment methods in the era of ChatGPT. Furthermore, conducting further research and engaging in discussions with policymakers and stakeholders are essential steps to maximize the benefits for both educators and learners and ensure academic integrity.

It is important to acknowledge that this review has certain limitations. Firstly, the limited number of included studies can be attributed to several reasons, including the novelty of the technology, as new technologies often face initial skepticism and cautious adoption; the lack of clear guidelines or best practices for leveraging this technology for educational purposes; and institutional or governmental policies affecting its utilization in educational contexts. These factors, in turn, have affected the number of studies available for review. Secondly, the reviewed studies used the original version of ChatGPT, based on GPT-3 or GPT-3.5, which implies that new studies utilizing the updated version, GPT-4, may lead to different findings. Therefore, conducting follow-up systematic reviews is essential once more empirical studies on ChatGPT are published. Additionally, long-term studies are necessary to thoroughly examine and assess the impact of ChatGPT on various educational practices.

Despite these limitations, this systematic review has highlighted the transformative potential of ChatGPT in education, revealing its diverse utilization by learners and educators alike, summarizing the benefits of incorporating it into education, and foregrounding the critical concerns and challenges that must be addressed to facilitate its effective and responsible use in future educational contexts. This review could serve as an insightful resource for practitioners who seek to integrate ChatGPT into education and stimulate further research in the field.

Data availability

The data supporting our findings are available upon request.

Abbreviations

AI: Artificial intelligence

AIEd: AI in education

LLM: Large language model

ANN: Artificial neural networks

ChatGPT: Chat Generative Pre-Trained Transformer

RNN: Recurrent neural networks

LSTM: Long short-term memory

RLHF: Reinforcement learning from human feedback

NLP: Natural language processing

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

References

AlAfnan MA, Dishari S, Jovic M, Lomidze K. ChatGPT as an educational tool: opportunities, challenges, and recommendations for communication, business writing, and composition courses. J Artif Intell Technol. 2023. https://doi.org/10.37965/jait.2023.0184.


Ali JKM, Shamsan MAA, Hezam TA, Mohammed AAQ. Impact of ChatGPT on learning motivation. J Engl Stud Arabia Felix. 2023;2(1):41–9. https://doi.org/10.56540/jesaf.v2i1.51 .

Alkaissi H, McFarlane SI. Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. 2023. https://doi.org/10.7759/cureus.35179 .

Anderson N, Belavý DL, Perle SM, Hendricks S, Hespanhol L, Verhagen E, Memon AR. AI did not write this manuscript, or did it? Can we trick the AI text detector into generated texts? The potential future of ChatGPT and AI in sports & exercise medicine manuscript generation. BMJ Open Sport Exerc Med. 2023;9(1): e001568. https://doi.org/10.1136/bmjsem-2023-001568 .

Ausat AMA, Massang B, Efendi M, Nofirman N, Riady Y. Can chat GPT replace the role of the teacher in the classroom: a fundamental analysis. J Educ. 2023;5(4):16100–6.


Baidoo-Anu D, Ansah L. Education in the Era of generative artificial intelligence (AI): understanding the potential benefits of ChatGPT in promoting teaching and learning. Soc Sci Res Netw. 2023. https://doi.org/10.2139/ssrn.4337484 .

Basic Z, Banovac A, Kruzic I, Jerkovic I. Better by you, better than me, ChatGPT3 as writing assistance in students essays. arXiv preprint arXiv:2302.04536. 2023.

Baskara FR. The promises and pitfalls of using chat GPT for self-determined learning in higher education: an argumentative review. Prosiding Seminar Nasional Fakultas Tarbiyah dan Ilmu Keguruan IAIM Sinjai. 2023;2:95–101. https://doi.org/10.47435/sentikjar.v2i0.1825 .

Behera RK, Bala PK, Dhir A. The emerging role of cognitive computing in healthcare: a systematic literature review. Int J Med Inform. 2019;129:154–66. https://doi.org/10.1016/j.ijmedinf.2019.04.024 .

Chaka C. Detecting AI content in responses generated by ChatGPT, YouChat, and Chatsonic: the case of five AI content detection tools. J Appl Learn Teach. 2023. https://doi.org/10.37074/jalt.2023.6.2.12 .

Chiu TKF, Xia Q, Zhou X, Chai CS, Cheng M. Systematic literature review on opportunities, challenges, and future research recommendations of artificial intelligence in education. Comput Educ Artif Intell. 2023;4:100118. https://doi.org/10.1016/j.caeai.2022.100118 .

Choi EPH, Lee JJ, Ho M, Kwok JYY, Lok KYW. Chatting or cheating? The impacts of ChatGPT and other artificial intelligence language models on nurse education. Nurse Educ Today. 2023;125:105796. https://doi.org/10.1016/j.nedt.2023.105796 .

Cotton D, Cotton PA, Shipway JR. Chatting and cheating: ensuring academic integrity in the era of ChatGPT. Innov Educ Teach Int. 2023. https://doi.org/10.1080/14703297.2023.2190148 .

Crawford J, Cowling M, Allen K. Leadership is needed for ethical ChatGPT: Character, assessment, and learning using artificial intelligence (AI). J Univ Teach Learn Pract. 2023. https://doi.org/10.53761/1.20.3.02 .

Creswell JW. Educational research: planning, conducting, and evaluating quantitative and qualitative research [Ebook]. 4th ed. London: Pearson Education; 2015.

Curry D. ChatGPT Revenue and Usage Statistics (2023)—Business of Apps. 2023. https://www.businessofapps.com/data/chatgpt-statistics/

Day T. A preliminary investigation of fake peer-reviewed citations and references generated by ChatGPT. Prof Geogr. 2023. https://doi.org/10.1080/00330124.2023.2190373 .

De Castro CA. A Discussion about the Impact of ChatGPT in education: benefits and concerns. J Bus Theor Pract. 2023;11(2):p28. https://doi.org/10.22158/jbtp.v11n2p28 .

Deng X, Yu Z. A meta-analysis and systematic review of the effect of Chatbot technology use in sustainable education. Sustainability. 2023;15(4):2940. https://doi.org/10.3390/su15042940 .

Eke DO. ChatGPT and the rise of generative AI: threat to academic integrity? J Responsib Technol. 2023;13:100060. https://doi.org/10.1016/j.jrt.2023.100060 .

Elmoazen R, Saqr M, Tedre M, Hirsto L. A systematic literature review of empirical research on epistemic network analysis in education. IEEE Access. 2022;10:17330–48. https://doi.org/10.1109/access.2022.3149812 .

Farrokhnia M, Banihashem SK, Noroozi O, Wals AEJ. A SWOT analysis of ChatGPT: implications for educational practice and research. Innov Educ Teach Int. 2023. https://doi.org/10.1080/14703297.2023.2195846 .

Fergus S, Botha M, Ostovar M. Evaluating academic answers generated using ChatGPT. J Chem Educ. 2023;100(4):1672–5. https://doi.org/10.1021/acs.jchemed.3c00087 .

Fink A. Conducting research literature reviews: from the Internet to paper. SAGE Publications; 2010.

Firaina R, Sulisworo D. Exploring the usage of ChatGPT in higher education: frequency and impact on productivity. Buletin Edukasi Indonesia (BEI). 2023;2(01):39–46. https://doi.org/10.56741/bei.v2i01.310 .

Firat M. How chat GPT can transform autodidactic experiences and open education. Department of Distance Education, Open Education Faculty, Anadolu University; 2023. https://orcid.org/0000-0001-8707-5918

Firat M. What ChatGPT means for universities: perceptions of scholars and students. J Appl Learn Teach. 2023. https://doi.org/10.37074/jalt.2023.6.1.22 .

Fuchs K. Exploring the opportunities and challenges of NLP models in higher education: is Chat GPT a blessing or a curse? Front Educ. 2023. https://doi.org/10.3389/feduc.2023.1166682 .

García-Peñalvo FJ. La percepción de la inteligencia artificial en contextos educativos tras el lanzamiento de ChatGPT: disrupción o pánico. Educ Knowl Soc. 2023;24: e31279. https://doi.org/10.14201/eks.31279 .

Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor A, Chartash D. How does ChatGPT perform on the United States medical Licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9: e45312. https://doi.org/10.2196/45312 .

Hashana AJ, Brundha P, Ayoobkhan MUA, Fazila S. Deep learning in ChatGPT: a survey. In: 2023 7th International Conference on Trends in Electronics and Informatics (ICOEI). IEEE; 2023. pp. 1001–1005. https://doi.org/10.1109/icoei56765.2023.10125852

Hirsto L, Saqr M, López-Pernas S, Valtonen T. A systematic narrative review of learning analytics research in K-12 and schools. Proceedings. 2022. https://ceur-ws.org/Vol-3383/FLAIEC22_paper_9536.pdf

Hisan UK, Amri MM. ChatGPT and medical education: a double-edged sword. J Pedag Educ Sci. 2023;2(01):71–89. https://doi.org/10.13140/RG.2.2.31280.23043/1 .

Hopkins AM, Logan JM, Kichenadasse G, Sorich MJ. Artificial intelligence chatbots will revolutionize how cancer patients access information: ChatGPT represents a paradigm-shift. JNCI Cancer Spectr. 2023. https://doi.org/10.1093/jncics/pkad010 .

Househ M, AlSaad R, Alhuwail D, Ahmed A, Healy MG, Latifi S, Sheikh J. Large Language models in medical education: opportunities, challenges, and future directions. JMIR Med Educ. 2023;9: e48291. https://doi.org/10.2196/48291 .

Ilkka T. The impact of artificial intelligence on learning, teaching, and education. Minist de Educ. 2018. https://doi.org/10.2760/12297 .

Iqbal N, Ahmed H, Azhar KA. Exploring teachers’ attitudes towards using CHATGPT. Globa J Manag Adm Sci. 2022;3(4):97–111. https://doi.org/10.46568/gjmas.v3i4.163 .

Irfan M, Murray L, Ali S. Integration of Artificial intelligence in academia: a case study of critical teaching and learning in Higher education. Globa Soc Sci Rev. 2023;8(1):352–64. https://doi.org/10.31703/gssr.2023(viii-i).32 .

Jeon JH, Lee S. Large language models in education: a focus on the complementary relationship between human teachers and ChatGPT. Educ Inf Technol. 2023. https://doi.org/10.1007/s10639-023-11834-1 .

Khan RA, Jawaid M, Khan AR, Sajjad M. ChatGPT—Reshaping medical education and clinical management. Pak J Med Sci. 2023. https://doi.org/10.12669/pjms.39.2.7653 .

King MR. A conversation on artificial intelligence, Chatbots, and plagiarism in higher education. Cell Mol Bioeng. 2023;16(1):1–2. https://doi.org/10.1007/s12195-022-00754-8 .

Kooli C. Chatbots in education and research: a critical examination of ethical implications and solutions. Sustainability. 2023;15(7):5614. https://doi.org/10.3390/su15075614 .

Kuhail MA, Alturki N, Alramlawi S, Alhejori K. Interacting with educational chatbots: a systematic review. Educ Inf Technol. 2022;28(1):973–1018. https://doi.org/10.1007/s10639-022-11177-3 .

Lee H. The rise of ChatGPT: exploring its potential in medical education. Anat Sci Educ. 2023. https://doi.org/10.1002/ase.2270 .

Li L, Subbareddy R, Raghavendra CG. AI intelligence Chatbot to improve students learning in the higher education platform. J Interconnect Netw. 2022. https://doi.org/10.1142/s0219265921430325 .

Limna P. A review of artificial intelligence (AI) in education during the digital era. 2022. https://ssrn.com/abstract=4160798

Lo CK. What is the impact of ChatGPT on education? A rapid review of the literature. Educ Sci. 2023;13(4):410. https://doi.org/10.3390/educsci13040410 .

Luo W, He H, Liu J, Berson IR, Berson MJ, Zhou Y, Li H. Aladdin’s genie or pandora’s box For early childhood education? Experts chat on the roles, challenges, and developments of ChatGPT. Early Educ Dev. 2023. https://doi.org/10.1080/10409289.2023.2214181 .

Meyer JG, Urbanowicz RJ, Martin P, O’Connor K, Li R, Peng P, Moore JH. ChatGPT and large language models in academia: opportunities and challenges. Biodata Min. 2023. https://doi.org/10.1186/s13040-023-00339-9 .

Mhlanga D. Open AI in education, the responsible and ethical use of ChatGPT towards lifelong learning. Soc Sci Res Netw. 2023. https://doi.org/10.2139/ssrn.4354422 .

Neumann M, Rauschenberger M, Schön EM. "We need to talk about ChatGPT": the future of AI and higher education. 2023. https://doi.org/10.1109/seeng59157.2023.00010

Nolan B. Here are the schools and colleges that have banned the use of ChatGPT over plagiarism and misinformation fears. Business Insider . 2023. https://www.businessinsider.com

O’Leary DE. An analysis of three chatbots: BlenderBot, ChatGPT and LaMDA. Int J Intell Syst Account, Financ Manag. 2023;30(1):41–54. https://doi.org/10.1002/isaf.1531 .

Okoli C. A guide to conducting a standalone systematic literature review. Commun Assoc Inf Syst. 2015. https://doi.org/10.17705/1cais.03743 .

OpenAI. 2023. https://openai.com/blog/chatgpt

Perkins M. Academic integrity considerations of AI large language models in the post-pandemic era: ChatGPT and beyond. J Univ Teach Learn Pract. 2023. https://doi.org/10.53761/1.20.02.07 .

Plevris V, Papazafeiropoulos G, Rios AJ. Chatbots put to the test in math and logic problems: a preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard. arXiv preprint. 2023. https://doi.org/10.48550/arxiv.2305.18618

Rahman MM, Watanobe Y. ChatGPT for education and research: opportunities, threats, and strategies. Appl Sci. 2023;13(9):5783. https://doi.org/10.3390/app13095783.

Ram B, Verma P. Artificial intelligence AI-based Chatbot study of ChatGPT, google AI bard and baidu AI. World J Adv Eng Technol Sci. 2023;8(1):258–61. https://doi.org/10.30574/wjaets.2023.8.1.0045 .

Rasul T, Nair S, Kalendra D, Robin M, de Oliveira Santini F, Ladeira WJ, Heathcote L. The role of ChatGPT in higher education: benefits, challenges, and future research directions. J Appl Learn Teach. 2023. https://doi.org/10.37074/jalt.2023.6.1.29 .

Ratnam M, Sharm B, Tomer A. ChatGPT: educational artificial intelligence. Int J Adv Trends Comput Sci Eng. 2023;12(2):84–91. https://doi.org/10.30534/ijatcse/2023/091222023 .

Ray PP. ChatGPT: a comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet Things Cyber-Phys Syst. 2023;3:121–54. https://doi.org/10.1016/j.iotcps.2023.04.003 .

Roumeliotis KI, Tselikas ND. ChatGPT and Open-AI models: a preliminary review. Future Internet. 2023;15(6):192. https://doi.org/10.3390/fi15060192 .

Rudolph J, Tan S, Tan S. War of the chatbots: Bard, Bing Chat, ChatGPT, Ernie and beyond. The new AI gold rush and its impact on higher education. J Appl Learn Teach. 2023. https://doi.org/10.37074/jalt.2023.6.1.23 .

Ruiz LMS, Moll-López S, Nuñez-Pérez A, Moraño J, Vega-Fleitas E. ChatGPT challenges blended learning methodologies in engineering education: a case study in mathematics. Appl Sci. 2023;13(10):6039. https://doi.org/10.3390/app13106039 .

Sallam M, Salim NA, Barakat M, Al-Tammemi AB. ChatGPT applications in medical, dental, pharmacy, and public health education: a descriptive study highlighting the advantages and limitations. Narra J. 2023;3(1): e103. https://doi.org/10.52225/narra.v3i1.103 .

Salvagno M, Taccone FS, Gerli AG. Can artificial intelligence help for scientific writing? Crit Care. 2023. https://doi.org/10.1186/s13054-023-04380-2 .

Saqr M, López-Pernas S, Helske S, Hrastinski S. The longitudinal association between engagement and achievement varies by time, students’ profiles, and achievement state: a full program study. Comput Educ. 2023;199:104787. https://doi.org/10.1016/j.compedu.2023.104787 .

Saqr M, Matcha W, Uzir N, Jovanović J, Gašević D, López-Pernas S. Transferring effective learning strategies across learning contexts matters: a study in problem-based learning. Australas J Educ Technol. 2023;39(3):9.

Schöbel S, Schmitt A, Benner D, Saqr M, Janson A, Leimeister JM. Charting the evolution and future of conversational agents: a research agenda along five waves and new frontiers. Inf Syst Front. 2023. https://doi.org/10.1007/s10796-023-10375-9 .

Shoufan A. Exploring students’ perceptions of CHATGPT: thematic analysis and follow-up survey. IEEE Access. 2023. https://doi.org/10.1109/access.2023.3268224 .

Sonderegger S, Seufert S. Chatbot-mediated learning: conceptual framework for the design of Chatbot use cases in education. Gallen: Institute for Educational Management and Technologies, University of St; 2022. https://doi.org/10.5220/0010999200003182 .


Strzelecki A. To use or not to use ChatGPT in higher education? A study of students’ acceptance and use of technology. Interact Learn Environ. 2023. https://doi.org/10.1080/10494820.2023.2209881 .

Su J, Yang W. Unlocking the power of ChatGPT: a framework for applying generative AI in education. ECNU Rev Educ. 2023. https://doi.org/10.1177/20965311231168423 .

Sullivan M, Kelly A, McLaughlan P. ChatGPT in higher education: considerations for academic integrity and student learning. J Appl Learn Teach. 2023;6(1):1–10. https://doi.org/10.37074/jalt.2023.6.1.17.

Szabo A. ChatGPT is a breakthrough in science and education but fails a test in sports and exercise psychology. Balt J Sport Health Sci. 2023;1(128):25–40. https://doi.org/10.33607/bjshs.v127i4.1233 .

Taecharungroj V. “What can ChatGPT do?” analyzing early reactions to the innovative AI chatbot on Twitter. Big Data Cognit Comput. 2023;7(1):35. https://doi.org/10.3390/bdcc7010035 .

Tam S, Said RB. User preferences for ChatGPT-powered conversational interfaces versus traditional methods. Biomed Eng Soc. 2023. https://doi.org/10.58496/mjcsc/2023/004 .

Tedre M, Kahila J, Vartiainen H. Exploration on how co-designing with AI facilitates critical evaluation of ethics of AI in craft education. In: Langran E, Christensen P, Sanson J, editors. Proceedings of Society for Information Technology and Teacher Education International Conference. 2023. pp. 2289–2296.

Tlili A, Shehata B, Adarkwah MA, Bozkurt A, Hickey DT, Huang R, Agyemang B. What if the devil is my guardian angel: ChatGPT as a case study of using chatbots in education. Smart Learn Environ. 2023. https://doi.org/10.1186/s40561-023-00237-x.

Uddin SMJ, Albert A, Ovid A, Alsharef A. Leveraging ChatGPT to aid construction hazard recognition and support safety education and training. Sustainability. 2023;15(9):7121. https://doi.org/10.3390/su15097121.

Valtonen T, López-Pernas S, Saqr M, Vartiainen H, Sointu E, Tedre M. The nature and building blocks of educational technology research. Comput Hum Behav. 2022;128:107123. https://doi.org/10.1016/j.chb.2021.107123.

Vartiainen H, Tedre M. Using artificial intelligence in craft education: crafting with text-to-image generative models. Digit Creat. 2023;34(1):1–21. https://doi.org/10.1080/14626268.2023.2174557.

Ventayen RJM. OpenAI ChatGPT generated results: similarity index of artificial intelligence-based contents. Soc Sci Res Netw. 2023. https://doi.org/10.2139/ssrn.4332664.

Wagner MW, Ertl-Wagner BB. Accuracy of information and references using ChatGPT-3 for retrieval of clinical radiological information. Can Assoc Radiol J. 2023. https://doi.org/10.1177/08465371231171125.

Wardat Y, Tashtoush MA, AlAli R, Jarrah AM. ChatGPT: a revolutionary tool for teaching and learning mathematics. Eurasia J Math Sci Technol Educ. 2023;19(7):em2286. https://doi.org/10.29333/ejmste/13272.

Webster J, Watson RT. Analyzing the past to prepare for the future: writing a literature review. Manag Inf Syst Quart. 2002;26(2):3.

Xiao Y, Watson ME. Guidance on conducting a systematic literature review. J Plan Educ Res. 2017;39(1):93–112. https://doi.org/10.1177/0739456x17723971.

Yan D. Impact of ChatGPT on learners in a L2 writing practicum: an exploratory investigation. Educ Inf Technol. 2023. https://doi.org/10.1007/s10639-023-11742-4.

Yu H. Reflection on whether Chat GPT should be banned by academia from the perspective of education and teaching. Front Psychol. 2023;14:1181712. https://doi.org/10.3389/fpsyg.2023.1181712.

Zhu C, Sun M, Luo J, Li T, Wang M. How to harness the potential of ChatGPT in education? Knowl Manag ELearn. 2023;15(2):133–52. https://doi.org/10.34105/j.kmel.2023.15.008.


Funding

The paper is co-funded by the Academy of Finland (Suomen Akatemia) Research Council for Natural Sciences and Engineering through the project Towards Precision Education: Idiographic Learning Analytics (TOPEILA), decision number 350560.

Author information

Authors and affiliations

School of Computing, University of Eastern Finland, 80100, Joensuu, Finland

Yazid Albadarin, Mohammed Saqr, Nicolas Pope & Markku Tukiainen


Contributions

YA contributed to the literature search, data analysis, discussion, and conclusion, as well as to the writing, editing, and finalization of the manuscript. MS contributed to the study design, conceptualization, funding acquisition, project administration, resource allocation, supervision, validation, literature search, and analysis of results, and took part in writing, revising, and approving the final manuscript. NP contributed to the results and discussion, provided supervision, and took part in writing, revising, and approving the final manuscript. MT contributed to the study's conceptualization, resource management, and supervision, and took part in writing, revising, and approving the manuscript.

Corresponding author

Correspondence to Yazid Albadarin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Table 4.

The data presented in Table 4 were synthesized by first identifying relevant studies through searches of five databases (ERIC, Scopus, Web of Knowledge, Dimensions.ai, and Lens.org) using the keywords "ChatGPT" and "education". Inclusion/exclusion criteria were then applied, and data extraction followed Creswell's [15] coding techniques to capture key information and identify common themes across the included studies.
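To make the screening step above concrete, the following minimal sketch shows how the keyword filtering and cross-database de-duplication described in this appendix could be scripted in Python. It is an illustration only: the file name, column names, and the eligibility rule are hypothetical stand-ins, not the authors' actual pipeline; the review's real inclusion/exclusion judgments and thematic coding were performed by human reviewers.

import csv

# Hypothetical column names for a CSV export merged from the five databases.
KEYWORDS = ("chatgpt", "education")

def matches_keywords(record):
    """Keep a record only if its title or abstract mentions every keyword."""
    text = " ".join((record.get("title", ""), record.get("abstract", ""))).lower()
    return all(kw in text for kw in KEYWORDS)

def is_eligible(record):
    """Toy stand-in for inclusion/exclusion criteria (not the authors' rules)."""
    try:
        year = int(record.get("year", "0"))
    except ValueError:
        return False
    return 2022 <= year <= 2023 and record.get("study_type", "").lower() == "empirical"

def screen(path):
    """De-duplicate by DOI across databases, then apply both filters."""
    seen_dois = set()
    included = []
    with open(path, newline="", encoding="utf-8") as f:
        for record in csv.DictReader(f):
            doi = record.get("doi", "").strip().lower()
            if doi:
                if doi in seen_dois:
                    continue  # already retrieved from another database
                seen_dois.add(doi)
            if matches_keywords(record) and is_eligible(record):
                included.append(record)
    return included

if __name__ == "__main__":
    for rec in screen("merged_search_export.csv"):  # hypothetical file name
        print(rec.get("title", ""))

Such a script can only pre-filter records; screening titles and abstracts against the full criteria and the subsequent thematic coding remain manual, reviewer-driven steps.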

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Albadarin, Y., Saqr, M., Pope, N. et al. A systematic literature review of empirical research on ChatGPT in education. Discov Educ 3, 60 (2024). https://doi.org/10.1007/s44217-024-00138-2


Received: 22 October 2023

Accepted: 10 May 2024

Published: 26 May 2024

DOI: https://doi.org/10.1007/s44217-024-00138-2


Keywords

  • Large language models
  • Educational technology
  • Systematic review


