• Research article
  • Open access
  • Published: 15 February 2021

Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

  • Alan Brnabic 1 &
  • Lisa M. Hess   ORCID: orcid.org/0000-0003-3631-3941 2  

BMC Medical Informatics and Decision Making volume  21 , Article number:  54 ( 2021 ) Cite this article

31k Accesses

61 Citations

3 Altmetric

Background

Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated into applications that inform patient-provider decision making.

Methods

This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting the eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist.

Results

A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages and approaches were used across the identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation, but only two conducted external validation. Most studies utilized a single algorithm; only eight studies applied multiple machine learning algorithms to the data. More than 50% of published studies failed to meet seven of the items on the Luo checklist.

Conclusions

A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. To ensure that decisions for patient care are made with the highest quality evidence, multiple machine learning approaches should be applied, the model selection strategy should be clearly defined, and both internal and external validation should be performed. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.

Background

Traditional methods of analyzing large real-world databases (big data) and other observational studies focus on outcomes that inform decisions at the population level. The findings from real-world studies are relevant to populations as a whole, but the ability to predict or provide meaningful evidence at the patient level is much less well established, owing to the complexity of clinical decision making and the variety of factors taken into account by the health care provider [1, 2]. Using traditional methods that produce population estimates and measures of variability, it is very challenging to accurately predict how any one patient will fare, even when applying findings from subgroup analyses. The care of patients is nuanced, and multiple non-linear, interconnected factors must be taken into account in decision making. When the available data are relevant only at the population level, health care decision making is less informed as to the optimal course of care for a given patient.

Clinical prediction models are an approach to utilizing patient-level evidence to help inform health care decision makers about patient care. These models, also known as prediction rules or prognostic models, have been used for decades by health care professionals [3]. Traditionally, these models combine patient demographic, clinical and treatment characteristics in a statistical or mathematical model, usually regression, classification or neural networks, and handle a limited number of predictor variables (usually fewer than 25). The Framingham Heart Study is a classic example of the use of longitudinal data to build a traditional decision-making model. Multiple risk calculators and estimators have been built to predict a patient's risk of a variety of cardiovascular outcomes, such as atrial fibrillation and coronary heart disease [4, 5, 6]. In general, these studies use multivariable regression to evaluate risk factors identified in the literature. Based on these findings, a scoring system is derived for each factor to predict the likelihood of an adverse outcome from a patient's score across all risk factors evaluated.
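To make the scoring-system idea concrete, the sketch below derives integer risk points from logistic regression coefficients. It is a minimal illustration on synthetic data, assuming scikit-learn is available; it is not the Framingham algorithm, and the variables, effect sizes and scaling are arbitrary.

```python
# Minimal, illustrative sketch only: deriving integer risk "points" from
# logistic regression coefficients, in the spirit of (but not identical to)
# the risk calculators described above. All data and variables are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
age = rng.normal(55, 10, n)          # years
sbp = rng.normal(130, 15, n)         # systolic blood pressure, mmHg
smoker = rng.integers(0, 2, n)       # 0/1
X = np.column_stack([age, sbp, smoker])

# Synthetic outcome loosely related to the risk factors (assumed effect sizes)
true_logit = -12 + 0.08 * age + 0.03 * sbp + 0.7 * smoker
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

model = LogisticRegression(max_iter=1000).fit(X, y)

# Convert each estimated coefficient into integer points per unit of the
# predictor, scaling the largest effect to 10 points.
coefs = model.coef_[0]
points_per_unit = np.round(coefs / np.abs(coefs).max() * 10).astype(int)
print(dict(zip(["age", "sbp", "smoker"], points_per_unit)))
```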

With the advent of more complex data collection and readily available data sets for patients in routine clinical care, both sample sizes and the number of potential predictor variables (such as genomic data) can exceed the tens of thousands, establishing the need for alternative approaches that can rapidly process large amounts of information. Artificial intelligence (AI), and particularly machine learning (a subset of AI), is increasingly being utilized in clinical research for prediction models, pattern recognition and deep-learning techniques that combine complex information, for example genomic and clinical data [7, 8, 9]. In the health care sciences, these methods are applied in place of a human expert to perform tasks that would otherwise take considerable time and expertise and would be prone to error. The underlying concept is that a machine will learn by trial and error from the data itself to make predictions, without a pre-defined set of rules for decision making. Simply put, machine learning can be understood as "learning from data" [8].

There are two broad types of learning from data: unsupervised and supervised. Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labelled responses. The most common unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data. Supervised learning involves making a prediction based on a set of pre-specified input and output variables. A number of statistical tools are used for supervised learning. Some examples include traditional statistical prediction methods such as regression models (e.g., regression splines, projection pursuit regression, penalized regression) that involve fitting a model to data, evaluating the fit and estimating parameters that are later used in a predictive equation. Other tools include tree-based methods (e.g., classification and regression trees [CART] and random forests), which successively partition a data set based on the relationships between predictor variables and a target (outcome) variable. Other examples include neural networks, discriminant functions and linear classifiers, and support vector classifiers and machines. Often, predictive tools are built using various forms of model aggregation (or ensemble learning) that combine models based on resampled or re-weighted data sets; different types of models can also be fitted to the same data and combined using model averaging.
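As a concrete illustration of the two learning paradigms, the sketch below fits an unsupervised clustering model and two supervised tree-based learners to the same synthetic feature matrix. It assumes scikit-learn is available; the data, class counts and hyperparameters are arbitrary and chosen only for demonstration.

```python
# Illustrative sketch of unsupervised vs. supervised learning on synthetic
# data; scikit-learn is assumed to be available and all settings are arbitrary.
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Unsupervised: find structure without using the outcome labels
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", {int(c): int((clusters == c).sum()) for c in set(clusters)})

# Supervised: learn to predict the labelled outcome from the predictors
cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("CART training accuracy:", cart.score(X, y))
print("random forest training accuracy:", forest.score(X, y))
```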

Classical statistical regression methods used for prediction modeling are well understood in the statistical sciences and by the scientific community that employs them. These methods tend to be transparent and are usually hypothesis driven, but they offer limited flexibility and can overlook complex associations when a large number of variables is investigated. In addition, when using classic regression modeling, choosing the 'right' model is not straightforward. Non-traditional machine learning algorithms and approaches may overcome some of these limitations of classical regression models in this new era of big data, but they are not a complete solution, as they must be considered in the context of the limitations of the data used in the analysis [2].

While machine learning methods can be used for population-based models as well as to inform patient-provider decision making, it is important to note that the data, model, and outputs used to inform the care of an individual patient must meet the highest standards of research quality, as the choice made will likely have an impact on both short- and long-term patient outcomes. While a range of uncertainty can be expected for population-based estimates, the risk of error for patient-level models must be minimized to ensure quality patient care. The risks and concerns of utilizing machine learning for individual patient decision making have been raised by ethicists [10]. These risks include, but are not limited to, the lack of transparency, limited data regarding the confidence of the findings, and the risk of reducing patient autonomy by relying on data in a way that may foster a more paternalistic model of health care. These are all important and valid concerns, and therefore the use of machine learning for patient care must meet the highest standards to ensure that shared, not simply informed, evidence-based decision making is supported by these methods.

A systematic literature review published in 2018 evaluated the statistical methods that have been used to enable large, real-world databases to inform care at the patient-provider level [11]. Briefly, that study identified a total of 115 articles, most commonly using logistic regression (n = 52, 45.2%), Cox regression (n = 24, 20.9%), and linear regression (n = 17, 14.8%). However, several studies were observed to utilize novel statistical approaches such as machine learning, recursive partitioning, and the development of mathematical algorithms to predict patient outcomes. More recently, publications have emerged describing the use of Individualized Treatment Recommendation algorithms and Outcome Weighted Learning for personalized medicine using large observational databases [12, 13]. Therefore, this systematic literature review was designed to pursue this observation further, to more comprehensively evaluate the use of machine learning methods to support patient-provider decision making, and to critically evaluate the strengths and weaknesses of these methods. For the purposes of this work, data supporting patient-provider decision making were defined as those that provided information specifically on a treatment or intervention choice; while both population-based and risk-estimator data are certainly valuable for patient care and decision making, this study was designed to evaluate data that would specifically inform a choice made by the patient with the provider. The overarching goal is to provide evidence of how large datasets can be used to inform decisions at the patient level using machine learning-based methods, and to evaluate the quality of such work to support informed decision making.

Methods

This study originated from a systematic literature review conducted in MEDLINE and PsycINFO; a refreshed search was conducted in September 2020 to obtain newer publications (Table 1). Eligible studies were those that analyzed prospective or retrospective observational data, reported quantitative results, and described statistical methods specifically applicable to patient-level decision making. Specifically, patient-level decision making referred to studies that provided data for or against a particular intervention at the patient level, so that the data could be used to inform decision making at the patient-provider level. Studies did not meet this criterion if only population-based estimates, mortality risk predictors, or satisfaction with care were evaluated. Additionally, studies designed to improve diagnostic tools and those evaluating health care system quality indicators did not meet the patient-provider decision-making criterion. Eligible statistical methods for this study were limited to machine learning-based approaches. Eligibility was assessed by two reviewers and any discrepancies were discussed; a third reviewer was available to serve as a tie breaker in case of differing opinions. The final set of eligible publications was then abstracted into a Microsoft Excel document. Study quality was evaluated using a modified Luo scale, which was developed specifically as a tool to standardize high-quality publication of machine learning models [14]. A modified version of this tool was utilized for this study; specifically, the optional items were removed, and three items were clarified: item 6 (define the prediction problem) was redefined as "define the model," item 7 (prepare data for model building) was renamed "model building and validation," and item 8 (build the predictive model) was renamed "model selection" to more succinctly state what was being evaluated under each criterion. Data were abstracted, and both the extracted data and the Luo checklist items were reviewed and verified by a second reviewer to ensure data comprehensiveness and quality. In all cases of differences in eligibility assessment or data entry, the reviewers met and reached agreement on the final set of data to be included in the database for data synthesis, with a third reviewer utilized as a tie breaker in case of discrepancies. Data were summarized descriptively and qualitatively in the following categories: publication and study characteristics; patient characteristics; statistical methodologies used, including statistical software packages; strengths and weaknesses; and interpretation of findings.

Results

The search strategy was run on September 1, 2020 and identified a total of 34 publications that utilized machine learning methods for individual patient-level decision making (Fig. 1). The most common reason for study exclusion, as expected, was failure to meet the patient-level decision making criterion. A summary of the characteristics of the eligible studies and the patient data is included in Table 2. Most of the real-world data sources were retrospective databases or designs (n = 27, 79.4%), primarily utilizing electronic health records. Six analyses utilized prospective cohort studies and one utilized data from a cross-sectional study.

Figure 1. PRISMA diagram of screening and study identification

General approaches to machine learning

The types of classification or prediction machine learning algorithms are reported in Table 2. These included decision tree/random forest analyses (19 studies) [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33] and neural networks (19 studies) [24,25,26,27,28,29,30,32,34,35,36,37,38,39,40,41,42,43,44]. Other approaches included latent growth mixture modeling [45], support vector machine classifiers [46], LASSO regression [47], boosting methods [23], and a novel Bayesian approach [26, 40, 48]. Within these analytical approaches, a variety of methods were used to evaluate model fit, such as the Akaike Information Criterion, the Bayesian Information Criterion, and the Lo-Mendell-Rubin likelihood ratio test [22, 45, 47]. While most studies reported the area under the curve (AUC) of receiver operating characteristic (ROC) curves (Table 3), analyses also included sensitivity/specificity [16, 19, 24, 30, 41, 42, 43], positive predictive value [21, 26, 32, 38, 40, 41, 42, 43], and a variety of less common approaches such as the geometric mean [16], the Matthews correlation coefficient (which ranges from -1.0, completely erroneous prediction, to +1.0, perfect prediction) [46], defining true/false negatives/positives by means of a confusion matrix [17], calculating the root mean square error of the predicted versus original outcome profiles [37], or identifying the model with the best average performance during training and cross-validation [36].
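For reference, the performance measures listed above can all be computed from predicted and observed labels, as in the generic sketch below. This assumes scikit-learn and synthetic predictions; it does not reproduce the code of any reviewed study.

```python
# Generic illustration of the performance metrics mentioned above, computed on
# synthetic predictions; not taken from any of the reviewed studies.
import numpy as np
from sklearn.metrics import (confusion_matrix, matthews_corrcoef,
                             mean_squared_error, roc_auc_score)

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 200)
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, 200), 0, 1)  # predicted probabilities
y_pred = (y_score >= 0.5).astype(int)                               # predicted classes

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("AUC of the ROC curve:", roc_auc_score(y_true, y_score))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("positive predictive value:", tp / (tp + fp))
print("Matthews correlation coefficient:", matthews_corrcoef(y_true, y_pred))
print("root mean square error:", mean_squared_error(y_true, y_score) ** 0.5)
```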

Statistical software packages

The statistical programs used to perform machine learning varied widely across these studies; no consistencies were observed (Table 2). As noted above, one study using decision tree analysis used Quinlan's C5.0 decision tree algorithm [15], while a second used an earlier version of this program (C4.5) [20]. Other decision tree analyses utilized various versions of R [18, 19, 22, 24, 27, 47], International Business Machines (IBM) Statistical Package for the Social Sciences (SPSS) [16, 17, 33, 47], or the Azure Machine Learning Platform [30], or programmed the model in Python [23, 25, 46]. Artificial neural network analyses used Neural Designer [34] or Statistica V10 [35]. Six studies did not report the software used for analysis [21, 31, 32, 37, 41, 42].

Families of machine learning algorithms

As also summarized in Table 2, more than one third of all publications (n = 13, 38.2%) applied only one family of machine learning algorithm to model development [16,17,18,19,20,34,37,41,42,43,46,48], and only four studies utilized five or more methods [23, 25, 28, 45]. One study applied an ensemble of six different algorithms, with the software set to run 200 iterations [23], and another ran seven algorithms [45].

Internal and external validation

Evaluation of study publication quality identified the most common gap as the lack of external validation, which was conducted in only two studies [15, 20]. Seven studies predefined the success criteria for model performance [20, 21, 23, 35, 36, 46, 47], and five studies discussed the generalizability of the model [20, 23, 34, 45, 48]. Six studies [17, 18, 21, 22, 35, 36] discussed the balance between model accuracy and model simplicity or interpretability, which is also a criterion of quality publication in the Luo scale [14]. The checklist items that were least frequently met are presented in Fig. 2. The complete quality assessment for each item in the checklist is included in Additional file 1: Table S1.

Figure 2. Least frequently met study quality items, modified Luo scale [14]

There were a variety of approaches taken to validate the models developed (Table 3). Internal validation was performed in all studies, most commonly by splitting the data into training and testing/validation sets. The cohort splitting approach was conducted in multiple ways: a 2:1 split [26], a 60/40 split [21, 36], a 70/30 split [16, 17, 22, 30, 33, 35], a 75/25 split [27, 40], an 80/20 split [46], a 90/10 split [25, 29], splitting the data based on site of care [48], a 2/1/1 split for training, testing and validation [38], and a 60/20/20 split in which the third group was used for model selection prior to validation [34]. Nine studies did not specifically mention the splitting approach used [15, 18, 19, 20, 24, 29, 39, 45, 47], but most of these noted the use of k-fold cross-validation. One training set corresponded to 90% of the sample [23], whereas a second study was less clear, as input data were at the observation level with multiple observations per patient, and 3 of the 15 patients were included in the training set [37]. The remaining studies did not specifically state that the data were split into testing and validation samples, but most specified that they performed five-fold cross-validation (including one that mentioned cohort splitting only in general terms) [18, 45] or ten-fold cross-validation [15, 19, 20, 28].
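The internal validation strategies described above reduce to two generic operations, a single cohort split and k-fold cross-validation, sketched below on synthetic data with scikit-learn. The 70/30 ratio, the classifier and k = 5 are illustrative choices only.

```python
# Generic illustration of internal validation: a single 70/30 cohort split and
# five-fold cross-validation. Synthetic data; classifier and ratios are
# illustrative choices only (other studies used 60/40, 75/25, 80/20, 90/10).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)

# Single 70/30 training/testing split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# k-fold cross-validation (five- and ten-fold were the most common choices)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("5-fold cross-validated AUC:", scores.mean())
```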

External validation was conducted in only two studies (5.9%). Hische and colleagues conducted a decision tree analysis designed to identify patients with impaired fasting glucose [20]. Their model was developed in a cohort of patients from the Berlin Potsdam Cohort Study (n = 1527) and was found to have a positive predictive value of 56.2% and a negative predictive value of 89.1%. The model was then tested on an independent cohort from the Dresden Cohort (n = 1998) with a family history of type II diabetes. In external validation, the positive predictive value was 43.9% and the negative predictive value was 90.4% [20]. Toussi and colleagues conducted both internal and external validation in their decision tree analysis to evaluate individual physician prescribing behaviors using a database of 463 patient electronic medical records [15]. For the internal validation step, the cross-validation option of Quinlan's C5.0 decision tree learning algorithm was used, as the study sample was too small to split into testing and validation samples; external validation was conducted by comparing outcomes to published treatment guidelines. Unfortunately, they found little concordance between physician behavior and guidelines, potentially because the timing of the data did not match the time period in which the guidelines were implemented, emphasizing the need for a contemporaneous external control [15].

Handling of missing values

Missing values were addressed in most studies (n = 21, 61.8%) in this review, but the thirteen remaining studies did not mention whether there were missing data or how they were handled (Table 3). For those that reported methods related to missing data, a wide variety of approaches were used in real-world datasets. The full information maximum likelihood method was used to estimate model parameters in the presence of missing data during model development by Hertroijs and colleagues, but patients with missing covariate values at baseline were excluded from the validation of the model [45]. In another study, missing covariate values were included in the models as a discrete category [48]. Four studies removed patients with missing data from the model [46], resulting in the loss of 16-41% of the sample in three of these studies [17, 36, 47]. Missing data for the primary outcome variables were reported for 59% (men) and 70% (women) of participants in a study of diabetes [16]. In this study, single imputation was used: CART (IBM SPSS Modeler V14.2.03) for continuous variables and a weighted k-nearest neighbor approach (RapidMiner V5) for categorical variables [16]. Other studies reported exclusion of missing data but not the specific impact on sample size [29, 31, 38, 44]. Imputation was conducted in a variety of ways for studies with missing data [22, 25, 28, 33]. Single imputation was used in the study by Bannister and colleagues, followed by multiple imputation in the final model to evaluate differences in model parameters [22]. One study imputed with a standard last-imputation-forward approach [26]. Spline techniques were used to impute missing data in the training set of one study [37]. Missingness was largely retained as an informative variable, with only variables missing for 85% or more of participants excluded, by Alaa et al. [23], while Hearn et al. used a combination of imputation and exclusion strategies [40]. Lastly, missing or incomplete data were imputed using a model-based approach by Toussi et al. [15] and using an optimal-impute algorithm by Bertsimas et al. [21].
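The missing-data strategies most often reported (complete-case exclusion, single imputation, and k-nearest-neighbour imputation) can be expressed generically as in the sketch below, which uses synthetic data and scikit-learn; it is not a reproduction of any study's pipeline.

```python
# Illustrative sketch of three missing-data strategies reported above:
# complete-case exclusion, mean (single) imputation, and weighted
# k-nearest-neighbour imputation. Synthetic data; no study-specific settings.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a1c", "bmi", "age", "sbp"])
df = df.mask(rng.random(df.shape) < 0.15)   # introduce ~15% missingness at random

# 1. Complete-case analysis: drop any row with a missing value
complete_cases = df.dropna()
print("rows retained after exclusion:", len(complete_cases), "of", len(df))

# 2. Single imputation with the column mean
mean_imputed = SimpleImputer(strategy="mean").fit_transform(df)

# 3. Weighted k-nearest-neighbour imputation
knn_imputed = KNNImputer(n_neighbors=5, weights="distance").fit_transform(df)
print("missing values remaining after imputation:", int(np.isnan(knn_imputed).sum()))
```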

Strengths and weaknesses noted by authors

Publications summarized the strengths and weaknesses of the machine learning methods employed. Low complexity and simplicity of machine learning-based models were noted as strengths of this approach [15, 20]. Machine learning approaches were noted to be both powerful and efficient methods to apply to large datasets [19]. One study noted that parameters significant at the patient level were included even though they would not have been significant at the broader population level under traditional regression-based model development, and would therefore have been excluded using traditional approaches [34]. One publication noted that the value of machine learning is highly dependent on the model selection strategy and parameter optimization, and that machine learning in and of itself will not provide better estimates unless these steps are conducted properly [23].

Even when properly planned, machine learning approaches are not without issues that deserve attention in future studies that employ these techniques. Within the eligible publications, the weaknesses noted included overfitting the model by including too much detail [15]. Additional limitations arise from the data sources used for machine learning, such as the lack of availability of all desired variables and missing data, which can affect the development and performance of these models [16, 34, 36, 48]. The lack of all relevant variables was noted as a particular concern for retrospective database studies, where the investigator is limited to what has been recorded [26, 28, 29, 38, 40]. Importantly, and consistent with the observations of this review, the lack of external validation was stated as a limitation [28, 30, 38, 42].

Limitations can also arise on the part of the research team: both clinical and statistical expertise are needed in the development and execution of studies using machine learning-based methodology, and users are warned against applying these methods blindly [22]. The importance of including clinical and statistical experts in the research team was noted in one study and highlighted as a strength of that work [21].

Discussion

This study systematically reviewed and summarized the methods and approaches used for machine learning as applied to observational datasets that can inform patient-provider decision making. Machine learning methods have been applied much more broadly across observational studies than in the context of individual decision making, so this summary does not necessarily apply to all machine learning-based studies. The focus of this work is on an area that remains largely unexplored: how to use large datasets in a manner that can inform and improve patient care and support shared decision making with reliable evidence that is applicable to the individual patient. Multiple publications cite the limitations of using population-based estimates for individual decisions [49, 50, 51]. Specifically, a summary statistic at the population level does not apply to each person in that cohort. Population estimates represent a point on a potentially wide distribution, and any one patient could fall anywhere within that distribution, far from the point estimate. At the other extreme, case reports and case series provide very specific individual-level data but are not generalizable to other patients [52]. This review and summary provide guidance and suggestions on best practices to improve, and hopefully increase, the use of these methods to provide data and models that inform patient-provider decision making.

It was common for single modeling strategies to be employed within the identified publications. It has long been known that single-algorithm approaches to estimation can produce a fair amount of uncertainty and variability [53]. To overcome this limitation, multiple algorithms and multiple iterations of the models need to be performed. This, combined with the more powerful analytics available in recent years, provides a new standard for machine learning algorithm choice and development. While in some cases a single model may fit the data well and provide an accurate answer, the certainty of the model can be supported through novel approaches, such as model averaging [54]. Few studies in this review combined multiple families of modeling strategies with multiple iterations of the models. This should become a best practice in the future and is recommended as an additional criterion to assess study quality among machine learning-based modeling [54].
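One simple way to combine several families of algorithms, in the spirit of the model-averaging recommendation above, is soft voting (probability averaging) across heterogeneous classifiers. The sketch below is an illustration on synthetic data with scikit-learn; the choice of member models is arbitrary, and other aggregation schemes (stacking, frequentist model averaging) are equally valid.

```python
# Illustrative sketch of combining multiple model families rather than relying
# on a single algorithm, here via soft voting (probability averaging) across
# heterogeneous classifiers. Synthetic data; member models are arbitrary.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

members = [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("gradient boosting", GradientBoostingClassifier(random_state=0)),
]
ensemble = VotingClassifier(estimators=members, voting="soft")

for name, model in members + [("averaged ensemble", ensemble)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: cross-validated AUC = {auc:.3f}")
```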

External validation is critical to ensure model accuracy, but it was rarely conducted in the publications included in this review. The reasons for this could be many, such as the lack of appropriate datasets or a lack of awareness of the importance of external validation [55]. As model development using machine learning increases, external validation is needed before models are applied in any patient-provider setting. The generalizability of models is largely unknown without these data. Publications that did not conduct external validation also did not note the need for it to be completed; generalizability was discussed in only five studies, one of which had also conducted external validation. Of the remaining four studies, only one noted generalizability in terms of the need for future external validation [48]. Another review that was more broadly conducted to evaluate machine learning methods similarly found a low rate of external validation (6.6%, versus 5.9% in this study) [56]; it also showed that prediction accuracy was lower under external validation than under cross-validation alone. The current review, with its focus on machine learning to support decision making at a practical level, suggests that external validation is an important gap that should be filled before these models are used for patient-provider decision making.

Luo and others suggest that k-fold validation may be used with proper stratification of the response variable as part of the model selection strategy [14, 55]. The studies identified in this review generally conducted five- or ten-fold cross-validation. There is no formal rule for the selection of the value of k, which is typically based on the size of the dataset; as k increases, bias is reduced, but in turn variance increases. While this trade-off has to be accounted for, k = 5-10 has been found to be reasonable for most study purposes [57].
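A minimal sketch of stratified k-fold cross-validation on synthetic, imbalanced data follows; comparing k = 5 and k = 10 shows how the spread of fold-level estimates changes with k. scikit-learn is assumed, and all settings are illustrative.

```python
# Minimal sketch of stratified k-fold cross-validation, which preserves the
# outcome prevalence within each fold. Comparing k = 5 and k = 10 on the same
# synthetic, imbalanced data illustrates the trade-off discussed above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=15, weights=[0.8, 0.2],
                           random_state=0)
model = LogisticRegression(max_iter=1000)

for k in (5, 10):
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"k = {k}: mean AUC = {scores.mean():.3f}, SD across folds = {scores.std():.3f}")
```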

The evidence from identified publications suggests that the ethical concerns of lack of transparency and failure to report confidence in the findings are largely warranted. These limitations can be addressed through the use of multiple modeling approaches (to clarify the ‘black box’ nature of these approaches) and by including both external and high k-fold validation (to demonstrate the confidence in findings). To ensure these methods are used in a manner that improves patient care, the expectations of population-based risk prediction models of the past are no longer sufficient. It is essential that the right data, the right set of models, and appropriate validation are employed to ensure that the resulting data meet standards for high quality patient care.

This study did not evaluate the quality of the underlying real-world data used to develop, test or validate the algorithms. While not directly part of the evaluation in this review, researchers should be aware that all limitations of real-world data sources apply regardless of the methodology employed. When observational datasets are used for machine learning-based research, the investigator should be aware of the extent to which the methods depend on the data structure and availability, and should evaluate a proposed data source to ensure it is appropriate for the machine learning project [45]. Importantly, databases should be evaluated to fully understand the variables included, as well as variables that may have prognostic or predictive value but are not included in the dataset. The lack of important variables remains a concern with the use of retrospective databases for machine learning. Concerns with confounding (particularly unmeasured confounding), bias (including immortal time bias), and the patient selection criteria for inclusion in the database must also be evaluated [58, 59]. These factors should be considered prior to implementing these methods, but they are not always at the forefront of consideration when machine learning approaches are applied. The Luo checklist is a valuable tool to ensure that any machine learning study meets high research standards for patient care, and it importantly includes the evaluation of missing or potentially incorrect data (i.e., outliers) and generalizability [14]. This should be supplemented by a thorough evaluation of the candidate data prior to the modeling work, and by ensuring that multiple modeling methods are applied.

Conclusions

This review found a wide variety of approaches, methods, statistical software and validation strategies employed in the application of machine learning methods to inform patient-provider decision making. Based on these findings, there is a need to ensure that multiple modeling approaches are employed in the development of machine learning-based models for patient care, which requires the highest research standards to reliably support shared evidence-based decision making. Models should be evaluated with clear criteria for model selection, and both internal and external validation are needed before these models are applied to inform patient care. Few studies have yet reached that bar of evidence to inform patient-provider decision making.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Abbreviations

AI: Artificial intelligence

AUC: Area under the curve

CART: Classification and regression trees

LASSO: Logistic least absolute shrinkage and selection operator

References

Steyerberg EW, Claggett B. Towards personalized therapy for multiple sclerosis: limitations of observational data. Brain. 2018;141(5):e38.

Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, et al. From hype to reality: data science enabling personalized medicine. BMC Med. 2018;16(1):150.

Steyerberg EW. Clinical prediction models. Berlin: Springer; 2019.

Schnabel RB, Sullivan LM, Levy D, Pencina MJ, Massaro JM, D’Agostino RB Sr, et al. Development of a risk score for atrial fibrillation (Framingham Heart Study): a community-based cohort study. Lancet. 2009;373(9665):739–45.

D’Agostino RB, Wolf PA, Belanger AJ, Kannel WB. Stroke risk profile: adjustment for antihypertensive medication. Framingham Study Stroke. 1994;25(1):40–3.

Framingham Heart Study: Risk Functions 2020. https://www.framinghamheartstudy.org/ .

Gawehn E, Hiss JA, Schneider G. Deep learning in drug discovery. Mol Inf. 2016;35:3–14.

Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019;18(6):463–77.

Marcus G. Deep learning: A critical appraisal. arXiv preprint arXiv:180100631. 2018.

Grote T, Berens P. On the ethics of algorithmic decision-making in healthcare. J Med Ethics. 2020;46(3):205–11.

Brnabic A, Hess L, Carter GC, Robinson R, Araujo A, Swindle R. Methods used for the applicability of real-world data sources to individual patient decision making. Value Health. 2018;21:S102.

Fu H, Zhou J, Faries DE. Estimating optimal treatment regimes via subgroup identification in randomized control trials and observational studies. Stat Med. 2016;35(19):3285–302.

Liang M, Ye T, Fu H. Estimating individualized optimal combination therapies through outcome weighted deep learning algorithms. Stat Med. 2018;37(27):3869–86.

Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res. 2016;18(12):e323.

Toussi M, Lamy J-B, Le Toumelin P, Venot A. Using data mining techniques to explore physicians’ therapeutic decisions when clinical guidelines do not provide recommendations: methods and example for type 2 diabetes. BMC Med Inform Decis Mak. 2009;9(1):28.

Ramezankhani A, Hadavandi E, Pournik O, Shahrabi J, Azizi F, Hadaegh F. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study. BMJ Open. 2016;6(12):e013336.

Pei D, Zhang C, Quan Y, Guo Q. Identification of potential type II diabetes in a Chinese population with a sensitive decision tree approach. J Diabetes Res. 2019;2019:4248218.

Neefjes EC, van der Vorst MJ, Verdegaal BA, Beekman AT, Berkhof J, Verheul HM. Identification of patients with cancer with a high risk to develop delirium. Cancer Med. 2017;6(8):1861–70.

Mubeen AM, Asaei A, Bachman AH, Sidtis JJ, Ardekani BA, Initiative AsDN. A six-month longitudinal evaluation significantly improves accuracy of predicting incipient Alzheimer’s disease in mild cognitive impairment. J Neuroradiol. 2017;44(6):381–7.

Hische M, Luis-Dominguez O, Pfeiffer AF, Schwarz PE, Selbig J, Spranger J. Decision trees as a simple-to-use and reliable tool to identify individuals with impaired glucose metabolism or type 2 diabetes mellitus. Eur J Endocrinol. 2010;163(4):565.

Bertsimas D, Dunn J, Pawlowski C, Silberholz J, Weinstein A, Zhuo YD, et al. Applied informatics decision support tool for mortality predictions in patients with cancer. JCO Clin Cancer Inform. 2018;2:1–11.

Bannister CA, Halcox JP, Currie CJ, Preece A, Spasic I. A genetic programming approach to development of clinical prediction models: a case study in symptomatic cardiovascular disease. PLoS ONE. 2018;13(9):e0202685.

Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF, van der Schaar M. Cardiovascular disease risk prediction using automated machine learning: a prospective study of 423,604 UK Biobank participants. PLoS ONE. 2019;14(5):e0213653.

Baxter SL, Marks C, Kuo TT, Ohno-Machado L, Weinreb RN. Machine learning-based predictive modeling of surgical intervention in glaucoma using systemic data from electronic health records. Am J Ophthalmol. 2019;208:30–40.

Dong Y, Xu L, Fan Y, Xiang P, Gao X, Chen Y, et al. A novel surgical predictive model for Chinese Crohn’s disease patients. Medicine (Baltimore). 2019;98(46):e17510.

Hill NR, Ayoubkhani D, McEwan P, Sugrue DM, Farooqui U, Lister S, et al. Predicting atrial fibrillation in primary care using machine learning. PLoS ONE. 2019;14(11):e0224582.

Kang AR, Lee J, Jung W, Lee M, Park SY, Woo J, et al. Development of a prediction model for hypotension after induction of anesthesia using machine learning. PLoS ONE. 2020;15(4):e0231172.

Karhade AV, Ogink PT, Thio Q, Cha TD, Gormley WB, Hershman SH, et al. Development of machine learning algorithms for prediction of prolonged opioid prescription after surgery for lumbar disc herniation. Spine J. 2019;19(11):1764–71.

Kebede M, Zegeye DT, Zeleke BM. Predicting CD4 count changes among patients on antiretroviral treatment: Application of data mining techniques. Comput Methods Programs Biomed. 2017;152:149–57.

Kim I, Choi HJ, Ryu JM, Lee SK, Yu JH, Kim SW, et al. A predictive model for high/low risk group according to oncotype DX recurrence score using machine learning. Eur J Surg Oncol. 2019;45(2):134–40.

Kwon JM, Jeon KH, Kim HM, Kim MJ, Lim S, Kim KH, et al. Deep-learning-based out-of-hospital cardiac arrest prognostic system to predict clinical outcomes. Resuscitation. 2019;139:84–91.

Kwon JM, Lee Y, Lee Y, Lee S, Park J. An algorithm based on deep learning for predicting in-hospital cardiac arrest. J Am Heart Assoc. 2018;7(13):26.

Scheer JK, Smith JS, Schwab F, Lafage V, Shaffrey CI, Bess S, et al. Development of a preoperative predictive model for major complications following adult spinal deformity surgery. J Neurosurg Spine. 2017;26(6):736–43.

Lopez-de-Andres A, Hernandez-Barrera V, Lopez R, Martin-Junco P, Jimenez-Trujillo I, Alvaro-Meca A, et al. Predictors of in-hospital mortality following major lower extremity amputations in type 2 diabetic patients using artificial neural networks. BMC Med Res Methodol. 2016;16(1):160.

Rau H-H, Hsu C-Y, Lin Y-A, Atique S, Fuad A, Wei L-M, et al. Development of a web-based liver cancer prediction model for type II diabetes patients by using an artificial neural network. Comput Methods Programs Biomed. 2016;125:58–65.

Ng T, Chew L, Yap CW. A clinical decision support tool to predict survival in cancer patients beyond 120 days after palliative chemotherapy. J Palliat Med. 2012;15(8):863–9.

Pérez-Gandía C, Facchinetti A, Sparacino G, Cobelli C, Gómez E, Rigla M, et al. Artificial neural network algorithm for online glucose prediction from continuous glucose monitoring. Diabetes Technol Therapeut. 2010;12(1):81–8.

Azimi P, Mohammadi HR, Benzel EC, Shahzadi S, Azhari S. Use of artificial neural networks to decision making in patients with lumbar spinal canal stenosis. J Neurosurg Sci. 2017;61(6):603–11.

Bowman A, Rudolfer S, Weller P, Bland JDP. A prognostic model for the patient-reported outcome of surgical treatment of carpal tunnel syndrome. Muscle Nerve. 2018;58(6):784–9.

Hearn J, Ross HJ, Mueller B, Fan CP, Crowdy E, Duhamel J, et al. Neural networks for prognostication of patients with heart failure. Circ. 2018;11(8):e005193.

Isma’eel HA, Cremer PC, Khalaf S, Almedawar MM, Elhajj IH, Sakr GE, et al. Artificial neural network modeling enhances risk stratification and can reduce downstream testing for patients with suspected acute coronary syndromes, negative cardiac biomarkers, and normal ECGs. Int J Cardiovasc Imaging. 2016;32(4):687–96.

Isma’eel HA, Sakr GE, Serhan M, Lamaa N, Hakim A, Cremer PC, et al. Artificial neural network-based model enhances risk stratification and reduces non-invasive cardiac stress imaging compared to Diamond-Forrester and Morise risk assessment models: a prospective study. J Nucl Cardiol. 2018;25(5):1601–9.

Jovanovic P, Salkic NN, Zerem E. Artificial neural network predicts the need for therapeutic ERCP in patients with suspected choledocholithiasis. Gastrointest Endosc. 2014;80(2):260–8.

Zhou HF, Huang M, Ji JS, Zhu HD, Lu J, Guo JH, et al. Risk prediction for early biliary infection after percutaneous transhepatic biliary stent placement in malignant biliary obstruction. J Vasc Interv Radiol. 2019;30(8):1233-41.e1.

Hertroijs DF, Elissen AM, Brouwers MC, Schaper NC, Köhler S, Popa MC, et al. A risk score including body mass index, glycated haemoglobin and triglycerides predicts future glycaemic control in people with type 2 diabetes. Diabetes Obes Metab. 2018;20(3):681–8.

Oviedo S, Contreras I, Quiros C, Gimenez M, Conget I, Vehi J. Risk-based postprandial hypoglycemia forecasting using supervised learning. Int J Med Inf. 2019;126:1–8.

Khanji C, Lalonde L, Bareil C, Lussier MT, Perreault S, Schnitzer ME. Lasso regression for the prediction of intermediate outcomes related to cardiovascular disease prevention using the TRANSIT quality indicators. Med Care. 2019;57(1):63–72.

Anderson JP, Parikh JR, Shenfeld DK, Ivanov V, Marks C, Church BW, et al. Reverse engineering and evaluation of prediction models for progression to type 2 diabetes: an application of machine learning using electronic health records. J Diabetes Sci Technol. 2016;10(1):6–18.

Patsopoulos NA. A pragmatic view on pragmatic trials. Dialogues Clin Neurosci. 2011;13(2):217–24.

Lu CY. Observational studies: a review of study designs, challenges and strategies to reduce confounding. Int J Clin Pract. 2009;63(5):691–7.

Morgenstern H. Ecologic studies in epidemiology: concepts, principles, and methods. Annu Rev Public Health. 1995;16(1):61–81.

Vandenbroucke JP. In defense of case reports and case series. Ann Intern Med. 2001;134(4):330–4.

Buckland ST, Burnham KP, Augustin NH. Model selection: an integral part of inference. Biometrics. 1997;53:603–18.

Zagar A, Kadziola Z, Lipkovich I, Madigan D, Faries D. Evaluating bias control strategies in observational studies using frequentist model averaging 2020 (submitted).

Kang J, Schwartz R, Flickinger J, Beriwal S. Machine learning approaches for predicting radiation therapy outcomes: a clinician’s perspective. Int J Radiat Oncol Biol Phys. 2015;93(5):1127–35.

Scott IM, Lin W, Liakata M, Wood J, Vermeer CP, Allaway D, et al. Merits of random forests emerge in evaluation of chemometric classifiers by external validation. Anal Chim Acta. 2013;801:22–33.

Kuhn M, Johnson K. Applied predictive modeling. Berlin: Springer; 2013.

Hess L, Winfree K, Muehlenbein C, Zhu Y, Oton A, Princic N. Debunking Myths While Understanding Limitations. Am J Public Health. 2020;110(5):E2-E.

Thesmar D, Sraer D, Pinheiro L, Dadson N, Veliche R, Greenberg P. Combining the power of artificial intelligence with the richness of healthcare claims data: Opportunities and challenges. PharmacoEconomics. 2019;37(6):745–52.

Acknowledgements

Not applicable.

No funding was received for the conduct of this study.

Author information

Authors and Affiliations

Eli Lilly and Company, Sydney, NSW, Australia

Alan Brnabic

Eli Lilly and Company, Indianapolis, IN, USA

Lisa M. Hess

Contributions

AB and LMH contributed to the design, implementation, analysis and interpretation of the data included in this study. AB and LMH wrote, revised and finalized the manuscript for submission. AB and LMH have both read and approved the final manuscript.

Corresponding author

Correspondence to Lisa M. Hess .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

Authors are employees of Eli Lilly and Company and receive salary support in that role.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Table S1. Study quality of eligible publications, modified Luo scale [14].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Brnabic, A., Hess, L.M. Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making. BMC Med Inform Decis Mak 21 , 54 (2021). https://doi.org/10.1186/s12911-021-01403-2

Download citation

Received : 07 July 2020

Accepted : 20 January 2021

Published : 15 February 2021

DOI : https://doi.org/10.1186/s12911-021-01403-2

Keywords

  • Machine learning
  • Decision making
  • Decision tree
  • Random forest
  • Automated neural network

  • Open access
  • Published: 01 February 2021

An open source machine learning framework for efficient and transparent systematic reviews

  • Rens van de Schoot   ORCID: orcid.org/0000-0001-7736-2091 1 ,
  • Jonathan de Bruin   ORCID: orcid.org/0000-0002-4297-0502 2 ,
  • Raoul Schram 2 ,
  • Parisa Zahedi   ORCID: orcid.org/0000-0002-1610-3149 2 ,
  • Jan de Boer   ORCID: orcid.org/0000-0002-0531-3888 3 ,
  • Felix Weijdema   ORCID: orcid.org/0000-0001-5150-1102 3 ,
  • Bianca Kramer   ORCID: orcid.org/0000-0002-5965-6560 3 ,
  • Martijn Huijts   ORCID: orcid.org/0000-0002-8353-0853 4 ,
  • Maarten Hoogerwerf   ORCID: orcid.org/0000-0003-1498-2052 2 ,
  • Gerbrich Ferdinands   ORCID: orcid.org/0000-0002-4998-3293 1 ,
  • Albert Harkema   ORCID: orcid.org/0000-0002-7091-1147 1 ,
  • Joukje Willemsen   ORCID: orcid.org/0000-0002-7260-0828 1 ,
  • Yongchao Ma   ORCID: orcid.org/0000-0003-4100-5468 1 ,
  • Qixiang Fang   ORCID: orcid.org/0000-0003-2689-6653 1 ,
  • Sybren Hindriks 1 ,
  • Lars Tummers   ORCID: orcid.org/0000-0001-9940-9874 5 &
  • Daniel L. Oberski   ORCID: orcid.org/0000-0001-7467-2297 1 , 6  

Nature Machine Intelligence volume  3 ,  pages 125–133 ( 2021 ) Cite this article

81k Accesses

312 Citations

167 Altmetric

  • Computational biology and bioinformatics
  • Computer science
  • Medical research

A preprint version of the article is available at arXiv.

To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.

With the emergence of online publishing, the number of scientific manuscripts on many topics is skyrocketing 1 . All of these textual data present opportunities to scholars and practitioners while simultaneously confronting them with new challenges. Scholars often conduct systematic reviews and meta-analyses to develop comprehensive overviews of the relevant topics 2 . The process entails several explicit and, ideally, reproducible steps, including identifying all likely relevant publications in a standardized way, extracting data from eligible studies and synthesizing the results. Systematic reviews differ from traditional literature reviews in that they are more replicable and transparent 3 , 4 . Such systematic overviews of literature on a specific topic are pivotal not only for scholars, but also for clinicians, policy-makers, journalists and, ultimately, the general public 5 , 6 , 7 .

Given that screening the entire research literature on a given topic is too labour intensive, scholars often develop quite narrow searches. Developing a search strategy for a systematic review is an iterative process aimed at balancing recall and precision 8 , 9 ; that is, including as many potentially relevant studies as possible while simultaneously limiting the total number of studies retrieved. The vast number of publications in the field of study often leads to a relatively precise search, with the risk of missing relevant studies. The process of systematic reviewing is error prone and extremely time intensive 10 . In fact, if the literature of a field is growing faster than the amount of time available for systematic reviews, adequate manual review of this field then becomes impossible 11 .

The rapidly evolving field of machine learning has aided researchers by allowing the development of software tools that assist in developing systematic reviews 11 , 12 , 13 , 14 . Machine learning offers approaches to overcome the manual and time-consuming screening of large numbers of studies by prioritizing relevant studies via active learning 15 . Active learning is a type of machine learning in which a model can choose the data points (for example, records obtained from a systematic search) it would like to learn from and thereby drastically reduce the total number of records that require manual screening 16 , 17 , 18 . In most so-called human-in-the-loop 19 machine-learning applications, the interaction between the machine-learning algorithm and the human is used to train a model with a minimum number of labelling tasks. Unique to systematic reviewing is that not only do all relevant records (that is, titles and abstracts) need to be seen by a researcher, but an extremely diverse range of concepts also needs to be learned, thereby requiring flexibility in the modelling approach as well as careful error evaluation 11 . In the case of systematic reviewing, the algorithm(s) are interactively optimized for finding the most relevant records, instead of finding the most accurate model. The term researcher-in-the-loop was introduced 20 as a special case of human-in-the-loop with three unique components: (1) the primary output of the process is a selection of the records, not a trained machine learning model; (2) all records in the relevant selection are seen by a human at the end of the process 21 ; (3) the use case requires a reproducible workflow and complete transparency 22 .

Existing tools that implement such an active learning cycle for systematic reviewing are described in Table 1 ; see the Supplementary Information for an overview of all of the software that we considered (note that this list was based on a review of software tools 12 ). However, existing tools have two main drawbacks. First, many are closed source applications with black box algorithms, which is problematic as transparency and data ownership are essential in the era of open science 22 . Second, to our knowledge, existing tools lack the necessary flexibility to deal with the large range of possible concepts to be learned by a screening machine. For example, in systematic reviews, the optimal type of classifier will depend on variable parameters, such as the proportion of relevant publications in the initial search and the complexity of the inclusion criteria used by the researcher 23 . For this reason, any successful system must allow for a wide range of classifier types. Benchmark testing is crucial to understand the real-world performance of any machine learning-aided system, but such benchmark options are currently mostly lacking.

In this paper we present an open source machine learning-aided pipeline with active learning for systematic reviews called ASReview. The goal of ASReview is to help scholars and practitioners to get an overview of the most relevant records for their work as efficiently as possible while being transparent in the process. The open, free and ready-to-use software ASReview addresses all of the concerns mentioned above: it is open source, uses active learning, and allows multiple machine learning models. It also has a benchmark mode, which is especially useful for comparing and designing algorithms. Furthermore, it is intended to be easily extensible, allowing third parties to add modules that enhance the pipeline. Although we focus this paper on systematic reviews, ASReview can handle any text source.

In what follows, we first present the pipeline for manual versus machine learning-aided systematic reviews. We then show how ASReview has been set up and how ASReview can be used in different workflows by presenting several real-world use cases. We subsequently demonstrate the results of simulations that benchmark performance and present the results of a series of user-experience tests. Finally, we discuss future directions.

Pipeline for manual and machine learning-aided systematic reviews

The pipeline of a systematic review without active learning traditionally starts with researchers doing a comprehensive search in multiple databases 24 , using free-text words as well as controlled vocabulary to retrieve potentially relevant references. The researcher then typically verifies that the key papers they expect to find are indeed included in the search results. The researcher downloads a file with records containing the text to be screened into a reference manager; in the case of systematic reviewing, this contains the titles and abstracts (and potentially other metadata such as the authors' names, journal name and DOI) of potentially relevant references. Ideally, two or more researchers then screen the records' titles and abstracts on the basis of the eligibility criteria established beforehand 4 . After all records have been screened, the full texts of the potentially relevant records are read to determine which of them will ultimately be included in the review. Most records are excluded in the title and abstract phase. Typically, only a small fraction of the records belong to the relevant class, making title and abstract screening an important bottleneck in the systematic reviewing process 25 . For instance, a recent study analysed 10,115 records and excluded 9,847 after title and abstract screening, a drop of more than 95% 26 . ASReview therefore focuses on this labour-intensive step.

The research pipeline of ASReview is depicted in Fig. 1 . The researcher starts with a search exactly as described above and subsequently uploads a file containing the records (that is, metadata containing the text of the titles and abstracts) into the software. Prior knowledge is then selected, which is used for training of the first model and for presenting the first record to the researcher. As screening is a binary classification problem, the reviewer must select at least one key record to include and at least one to exclude on the basis of background knowledge. More prior knowledge may result in improved efficiency of the active learning process.

Figure 1. The research pipeline of ASReview. The symbols indicate whether the action is taken by a human, a computer, or whether both options are available.

A machine learning classifier is trained to predict study relevance (labels) from a representation of the record text (feature space) on the basis of the prior knowledge. We have purposefully chosen not to include an author name or citation network representation in the feature space to prevent authority bias in the inclusions. In the active learning cycle, the software presents one new record to be screened and labelled by the user. The user's binary label (1 for relevant versus 0 for irrelevant) is subsequently used to train a new model, after which a new record is presented to the user. This cycle continues until a user-specified stopping criterion has been reached. The user then has a file with (1) records labelled as either relevant or irrelevant and (2) unlabelled records ordered from most to least probable to be relevant as predicted by the current model. This set-up helps the user to move through a large database much more quickly than in the manual process, while the decision process remains transparent.
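The cycle described above can be illustrated with a minimal Python sketch, assuming scikit-learn with TF–IDF features, a naive Bayes classifier and certainty-based sampling (the default model components described in the next section). This is not the ASReview code base; the `ask_human` callable, which stands in for the reviewer's labelling decision, and the function name `screen` are hypothetical.

```python
# Minimal sketch of a certainty-based active learning cycle (not the ASReview implementation).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def screen(texts, prior_relevant, prior_irrelevant, ask_human, n_queries=100):
    """texts: list of title+abstract strings; prior_*: indices labelled as prior knowledge;
    ask_human: callable returning 1 (relevant) or 0 (irrelevant) for a record index."""
    X = TfidfVectorizer(lowercase=True).fit_transform(texts)  # feature matrix is built once
    labels = {i: 1 for i in prior_relevant}
    labels.update({i: 0 for i in prior_irrelevant})

    for _ in range(n_queries):
        train_idx = list(labels)
        clf = MultinomialNB().fit(X[train_idx], [labels[i] for i in train_idx])
        unlabelled = [i for i in range(len(texts)) if i not in labels]
        if not unlabelled:
            break
        proba = clf.predict_proba(X[unlabelled])[:, 1]
        query = unlabelled[int(np.argmax(proba))]  # certainty-based: most likely to be relevant
        labels[query] = ask_human(query)           # human-in-the-loop labelling step
    return labels
```

After the loop stops, the remaining unlabelled records can be ranked by the last model's predicted relevance, which corresponds to the ordered output file described above.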

Software implementation for ASReview

The source code 27 of ASReview is available open source under an Apache 2.0 license, including documentation 28 . Compiled and packaged versions of the software are available on the Python Package Index 29 or Docker Hub 30 . The free and ready-to-use software ASReview implements oracle, simulation and exploration modes. The oracle mode is used to perform a systematic review with interaction by the user, the simulation mode is used for simulation of the ASReview performance on existing datasets, and the exploration mode can be used for teaching purposes and includes several preloaded labelled datasets.

The oracle mode presents records to the researcher, who then classifies them. Multiple file formats are supported: (1) RIS files, as used by digital libraries such as IEEE Xplore, Scopus and ScienceDirect; the citation managers Mendeley, RefWorks, Zotero and EndNote also support the RIS format. (2) Tabular datasets with the .csv, .xlsx and .xls file extensions. CSV files should be comma separated and UTF-8 encoded; for CSV files, the software accepts a set of predetermined labels in line with the ones used in RIS files. Each record in the dataset should hold the metadata on, for example, a scientific publication. The mandatory metadata are text, for example the titles or abstracts of scientific papers. If both are available, both are used to train the model, but at least one is needed. An advanced option is available that splits the titles and abstracts in the feature-extraction step and weights the two feature matrices independently (for TF–IDF only). Other metadata such as author, date, DOI and keywords are optional and are not used for training the models. When using ASReview in the simulation or exploration mode, an additional binary variable is required to indicate historical labelling decisions. This column, which is automatically detected, can also be used in the oracle mode as background knowledge for the previous selection of relevant papers before entering the active learning cycle. If unavailable, the user has to select at least one relevant record, which can be identified by searching the pool of records. At least one irrelevant record should also be identified; the software allows the user to search for specific records or presents random records, which are most likely to be irrelevant given the extremely imbalanced data.
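As a hedged illustration of these metadata requirements, the following Python sketch checks a tabular dataset before screening; the column names `title` and `abstract` and the function name `load_records` are assumptions for illustration and do not reflect how ASReview handles columns internally.

```python
# Illustrative pre-screening check of a tabular dataset; column names are assumed.
import pandas as pd

def load_records(path):
    df = pd.read_csv(path, encoding="utf-8")  # CSV input should be UTF-8 encoded
    parts = [df[col].fillna("").astype(str) for col in ("title", "abstract") if col in df.columns]
    if not parts:
        raise ValueError("At least a 'title' or an 'abstract' column is required for training.")
    # Combine whatever text is available; other metadata (authors, DOI, keywords) is not used for training.
    text = parts[0] if len(parts) == 1 else parts[0].str.cat(parts[1], sep=" ")
    return df, text
```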

The software has a simple yet extensible default model: a naive Bayes classifier, TF–IDF feature extraction, a dynamic resampling balance strategy 31 and certainty-based sampling 17 , 32 for the query strategy. These defaults were chosen on the basis of their consistently high performance in benchmark experiments across several datasets 31 . Moreover, the low computation time of these default settings makes them attractive in applications, given that the software should be able to run locally. Users can change the settings, shown in Table 2 , and technical details are described in our documentation 28 . Users can also add their own classifiers, feature extraction techniques, query strategies and balance strategies.

ASReview has a number of implemented features (see Table 2 ). First, there are several classifiers available: (1) naive Bayes; (2) support vector machines; (3) logistic regression; (4) neural networks; (5) random forests; (6) LSTM-base, which consists of an embedding layer, an LSTM layer with one output, a dense layer and a single sigmoid output node; and (7) LSTM-pool, which consists of an embedding layer, an LSTM layer with many outputs, a max pooling layer and a single sigmoid output node. The feature extraction techniques available are Doc2Vec 33 , embedding LSTM, embedding with IDF or TF–IDF 34 (the default is unigram, with the option to run n -grams while other parameters are set to the defaults of Scikit-learn 35 ) and sBERT 36 . The available query strategies for the active learning part are (1) random selection, ignoring model-assigned probabilities; (2) uncertainty-based sampling, which chooses the most uncertain record according to the model (that is, closest to 0.5 probability); (3) certainty-based sampling (max in ASReview), which chooses the record most likely to be included according to the model; and (4) mixed sampling, which uses a combination of random and certainty-based sampling.

There are several balance strategies that rebalance and reorder the training data. This is necessary because the data are typically extremely imbalanced. We have implemented the following balance strategies: (1) full sampling, which uses all of the labelled records; (2) undersampling the irrelevant records so that the included and excluded records are in some particular ratio (closer to one); and (3) dynamic resampling, a novel method similar to undersampling in that it decreases the imbalance of the training data 31 . However, in dynamic resampling, the number of irrelevant records is decreased, whereas the number of relevant records is increased by duplication such that the total number of records in the training data remains the same. The ratio between relevant and irrelevant records is not fixed over iterations, but is dynamically updated depending on the number of labelled records, the total number of records and the ratio between relevant and irrelevant records. Details on all of the described algorithms can be found in the code and documentation referred to above.
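To make the idea concrete, the following is a simplified Python sketch of dynamic resampling as described above, not the exact ASReview algorithm: irrelevant records are undersampled and relevant records are duplicated so that the training set size is roughly preserved while the imbalance is reduced. The fixed `target_ratio` parameter is a placeholder; in ASReview the ratio is updated dynamically.

```python
# Simplified sketch of dynamic resampling; the real implementation updates the ratio dynamically.
import random

def rebalance(relevant_idx, irrelevant_idx, target_ratio=0.5, seed=42):
    """Return a resampled list of training indices with (roughly) the same total size,
    but a larger share of relevant records."""
    rng = random.Random(seed)
    total = len(relevant_idx) + len(irrelevant_idx)
    n_rel = max(len(relevant_idx), int(target_ratio * total))
    n_irr = total - n_rel
    resampled_rel = [relevant_idx[i % len(relevant_idx)] for i in range(n_rel)]  # duplicate relevant records
    resampled_irr = rng.sample(irrelevant_idx, min(n_irr, len(irrelevant_idx)))  # undersample irrelevant records
    return resampled_rel + resampled_irr
```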

By default, ASReview converts the records' texts into a document-term matrix; terms are converted to lowercase and no stop words are removed by default (but this can be changed). As the document-term matrix is identical in each iteration of the active learning cycle, it is generated in advance of model training and stored in the (active learning) state file. Each row of the document-term matrix can easily be requested from the state file. Records are internally identified by their row number in the input dataset. In oracle mode, the record that is selected to be classified is retrieved from the state file, and the record text and other metadata (such as title and abstract) are retrieved from the original dataset (from the file or the computer's memory). ASReview can run on a local computer or on a (self-hosted) local or remote server. Data (all records and their labels) remain on the user's computer. Data ownership and confidentiality are crucial and no data are processed or used in any way by third parties. This is unique in comparison with some of the existing systems, as shown in the last column of Table 1 .

Real-world use cases and high-level function descriptions

Below we highlight a number of real-world use cases and high-level function descriptions for using the pipeline of ASReview.

ASReview can be integrated into classic systematic reviews or meta-analyses. Such reviews or meta-analyses entail several explicit and reproducible steps, as outlined in the PRISMA guidelines 4 . Scholars identify all likely relevant publications in a standardized way, screen retrieved publications to select eligible studies on the basis of defined eligibility criteria, extract data from eligible studies and synthesize the results. ASReview fits into this process, particularly in the abstract screening phase. ASReview does not replace the initial step of collecting all potentially relevant studies. As such, results from ASReview depend on the quality of the initial search process, including the selection of databases 24 and the construction of comprehensive searches using keywords and controlled vocabulary. However, ASReview can be used to broaden the scope of the search (by keyword expansion or by omitting limitations in the search query), resulting in a higher number of initial papers and limiting the risk of missing relevant papers during the search (that is, placing more focus on recall than on precision).

Furthermore, many reviewers nowadays move towards meta-reviews when analysing very large literature streams, that is, systematic reviews of systematic reviews 37 . This can be problematic as the various reviews included could use different eligibility criteria and are therefore not always directly comparable. Owing to the efficiency of ASReview, scholars using the tool could instead conduct the study by analysing the underlying papers directly rather than relying on the systematic reviews. Furthermore, ASReview supports the rapid update of a systematic review. The included papers from the initial review are used to train the machine learning model before screening of the updated set of papers starts. This allows the researcher to quickly screen the updated set of papers on the basis of decisions made in the initial run.

As an example case, consider the current literature on COVID-19 and the coronavirus. An enormous number of papers are being published on COVID-19, and it is very time consuming to manually find relevant papers (for example, to develop treatment guidelines). This is especially problematic when urgent overviews are required. Medical guidelines rely on comprehensive systematic reviews, but the medical literature is growing at a breakneck pace and the quality of the research is not universally adequate for summarization into policy 38 . Such reviews must entail adequate protocols with explicit and reproducible steps, including identifying all potentially relevant papers, extracting data from eligible studies, assessing potential for bias and synthesizing the results into medical guidelines. Researchers need to screen (tens of) thousands of COVID-19-related studies by hand to find relevant papers to include in their overview. Using ASReview, this can be done far more efficiently by selecting key papers that match their (COVID-19) research question in the first step; this starts the active learning cycle and leads to the most relevant COVID-19 papers for their research question being presented next. A plug-in was therefore developed for ASReview 39 , which contains three databases that are updated automatically whenever a new version is released by the owners of the data: (1) the CORD-19 database, developed by the Allen Institute for AI, containing publications on COVID-19 and other coronavirus research (for example, SARS and MERS) from PubMed Central, the WHO COVID-19 database of publications, the preprint servers bioRxiv and medRxiv, and papers contributed by specific publishers 40 . The CORD-19 dataset is updated daily by the Allen Institute for AI and is likewise updated daily in the plug-in. (2) In addition to the full dataset, we automatically construct a daily subset of the database with studies published after December 1st, 2019 to search for relevant papers published during the COVID-19 crisis. (3) A separate dataset of COVID-19-related preprints, containing metadata of preprints from over 15 preprint servers across disciplines, published since January 1st, 2020 41 . The preprint dataset is updated weekly by its maintainers and then automatically updated in ASReview as well. As this dataset is not readily available to researchers through regular search engines (for example, PubMed), its inclusion in ASReview provides added value to researchers interested in COVID-19 research, especially if they want a quick way to screen preprints specifically.

Simulation study

To evaluate the performance of ASReview on a labelled dataset, users can employ the simulation mode. As an example, we ran simulations based on four labelled datasets with version 0.7.2 of ASReview. All scripts to reproduce the results in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 , whereas the results are available at OSF ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 .

First, we analysed the performance for a systematic review of studies that performed viral metagenomic next-generation sequencing in common livestock such as cattle, small ruminants, poultry and pigs 44 . Studies were retrieved from Embase ( n  = 1,806), Medline ( n  = 1,384), Cochrane Central ( n  = 1), Web of Science ( n  = 977) and Google Scholar ( n  = 200, the top relevant references). After deduplication, this led to 2,481 studies obtained in the initial search, of which 120 were inclusions (4.84%).

A second simulation study was performed on the results for a systematic review of studies on fault prediction in software engineering 45 . Studies were obtained from the ACM Digital Library, IEEExplore and the ISI Web of Science. Furthermore, a snowballing strategy and a manual search were conducted, resulting in a total of 8,911 publications, of which 104 were included in the systematic review (1.2%).

A third simulation study was performed on a review of longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure 46 , 47 ; 5,782 studies were obtained by searching PubMed, Embase, PsycINFO and Scopus and through a snowballing strategy in which both the references and the citations of the included papers were screened. Thirty-eight studies were included in the review (0.66%).

A fourth simulation study was performed on the results for a systematic review on the efficacy of angiotensin-converting enzyme inhibitors, from a study collecting various systematic review datasets from the medical sciences 15 . The collection is a subset of 2,544 publications from the TREC 2004 Genomics Track document corpus 48 . This is a static subset from all MEDLINE records from 1994 through 2003, which allows for replicability of results. Forty-one publications were included in the review (1.6%).

Performance metrics

We evaluated the four datasets using three performance metrics. We first assess the work saved over sampling (WSS), which is the percentage reduction in the number of records that need to be screened, achieved by using active learning instead of screening records at random. WSS is measured at a given level of recall of relevant records, for example 95%, indicating the reduction in screening effort at the cost of failing to detect 5% of the relevant records. For some researchers it is essential that all relevant literature on the topic is retrieved; this entails that the recall should be 100% (that is, WSS@100%). We also propose a metric giving the proportion of relevant references found after having screened the first 10% of the records (RRF10%). This is a useful metric for getting a quick overview of the relevant literature.
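Both metrics can be computed directly from the order in which the records are screened. The following Python sketch is one possible formulation consistent with the definitions above; the argument `labels_in_screen_order` (the true relevance labels, ordered as the records were presented) and the function names are illustrative assumptions.

```python
# Sketch of the WSS and RRF metrics as defined in the text.
import numpy as np

def wss(labels_in_screen_order, recall_level=0.95):
    """Work saved over sampling at a given recall level (for example, WSS@95%)."""
    labels = np.asarray(labels_in_screen_order)
    n, n_rel = len(labels), labels.sum()
    target = int(np.ceil(recall_level * n_rel))
    # number of records screened until the target number of relevant records is found
    n_screened = int(np.argmax(np.cumsum(labels) >= target)) + 1
    # random screening needs on average recall_level * n records to reach the same recall
    return recall_level - n_screened / n

def rrf(labels_in_screen_order, fraction=0.10):
    """Proportion of all relevant records found after screening the first `fraction` of records."""
    labels = np.asarray(labels_in_screen_order)
    cutoff = int(np.ceil(fraction * len(labels)))
    return labels[:cutoff].sum() / labels.sum()
```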

For every dataset, 15 runs were performed with one random inclusion and one random exclusion (see Fig. 2 ). The classical review performance with randomly found inclusions is shown by the dashed line. The average work saved over sampling at 95% recall for ASReview is 83% and ranges from 67% to 92%. Hence, 95% of the eligible studies will be found after screening only between 8% and 33% of the studies. Furthermore, the proportion of relevant abstracts found after reading 10% of the abstracts ranges from 70% to 100%. In short, our software would have saved many hours of work.

Figure 2. a–d, Results of the simulation study for a systematic review of studies that performed viral metagenomic next-generation sequencing in common livestock (a), a systematic review of studies on fault prediction in software engineering (b), a review of longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure (c), and a systematic review on the efficacy of angiotensin-converting enzyme inhibitors (d). Fifteen runs (shown as separate lines) were performed for every dataset, with only one random inclusion and one random exclusion. The classical review performance with randomly found inclusions is shown by the dashed lines.

Usability testing (user experience testing)

We conducted a series of user experience tests to learn from end users how they experience the software and implement it in their workflow. The study was approved by the Ethics Committee of the Faculty of Social and Behavioral Sciences of Utrecht University (ID 20-104).

Unstructured interviews

The first user experience (UX) test—carried out in December 2019—was conducted with an academic research team in a substantive research field (public administration and organizational science) that has conducted various systematic reviews and meta-analyses. The team was composed of three university professors (ranging from assistant to full) and three PhD candidates. In one 3.5 h session, the participants used the software and provided feedback via unstructured interviews and group discussions. The goal was to gather feedback on installing the software and on testing its performance on their own data. After these sessions, we prioritized the feedback in a meeting with the ASReview team, which resulted in the releases of v0.4 and v0.6. An overview of all releases can be found on GitHub 27 .

A second UX test was conducted with four experienced researchers developing medical guidelines based on classical systematic reviews, and two experienced reviewers working at a pharmaceutical non-profit organization who work on updating reviews with new data. In four sessions, held in February to March 2020, these users tested the software following our testing protocol. After each session we implemented the feedback provided by the experts and asked them to review the software again. The main feedback was about how to upload datasets and select prior papers. Their feedback resulted in the release of v.0.7 and v.0.9.

Systematic UX test

In May 2020 we conducted a systematic UX test. Two groups of users were distinguished: a group of inexperienced users and a group of experienced users who had already worked with ASReview. Owing to the COVID-19 lockdown, the usability tests were conducted via video calls in which one person gave instructions to the participant and one person observed, known as human-moderated remote testing 49 . During the tests, one person (S.H.) asked the questions and helped the participant with the tasks, while the other person (M.H.), a user experience professional at the IT department of Utrecht University, observed and took notes.

To analyse the notes, thematic analysis was used, which is a method of analysing data by dividing the information into themes that each have a distinct meaning 50 , using the NVivo 12 software 51 . When something went wrong, the text was coded as 'showstopper'; when something did not go smoothly, the text was coded as 'doubtful'; and when something went well, the subject was coded as 'superb'. The features the participants requested for future versions of the ASReview tool were discussed with the lead engineer of the ASReview team and were submitted to GitHub as issues or feature requests.

The answers to the quantitative questions can be found at the Open Science Framework 52 . The participants ( N  = 11) rated the tool with a grade of 7.9 (s.d. = 0.9) on a scale from one to ten (Table 3 ). The inexperienced users on average rated the tool with an 8.0 (s.d. = 1.1, N  = 6). The experienced users on average rated the tool with a 7.8 (s.d. = 0.9, N  = 5). The participants described the usability test with words such as helpful, accessible, fun, clear and obvious.

The UX tests resulted in the new releases v0.10 and v0.10.1 and the major release v0.11, which is a major revision of the graphical user interface. The documentation has been upgraded to make installing and launching ASReview more straightforward. We made setting up a project, selecting a dataset and finding past knowledge more intuitive and flexible. We also added a project dashboard with information on screening progress and advanced settings.

Continuous input via the open source community

Finally, the ASReview development team receives continuous feedback from the open science community about, among other things, the user experience. In every new release we implement features listed by our users. Recurring UX tests are performed to keep up with the needs of users and improve the value of the tool.

We designed a system to accelerate the step of screening titles and abstracts to help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible. Our system uses active learning to train a machine learning model that predicts relevance from texts using a limited number of labelled examples. The classifier, feature extraction technique, balance strategy and active learning query strategy are flexible. We provide an open source software implementation, ASReview, which can be applied across a wide range of real-world systematic reviewing applications. Based on our experiments, ASReview provides default parameter settings that exhibited good performance on average across the applications we examined. However, we stress that in practical applications these defaults should be carefully examined; for this purpose, the software provides a simulation mode to users. We encourage users and developers to perform further evaluation of the proposed approach in their application, and to take advantage of the open source nature of the project by contributing further developments.

Drawbacks of machine learning-based screening systems, including our own, remain. First, although the active learning step greatly reduces the number of manuscripts that must be screened, it also prevents a straightforward evaluation of the system’s error rates without further onerous labelling. Providing users with an accurate estimate of the system’s error rate in the application at hand is therefore a pressing open problem. Second, although, as argued above, the use of such systems is not limited in principle to reviewing, no empirical benchmarks of actual performance in these other situations yet exist to our knowledge. Third, machine learning-based screening systems automate the screening step only; although the screening step is time-consuming and a good target for automation, it is just one part of a much larger process, including the initial search, data extraction, coding for risk of bias, summarizing results and so on. Although some other works, similar to our own, have looked at (semi-)automating some of these steps in isolation 53 , 54 , to our knowledge the field is still far removed from an integrated system that would truly automate the review process while guaranteeing the quality of the produced evidence synthesis. Integrating the various tools that are currently under development to aid the systematic reviewing pipeline is therefore a worthwhile topic for future development.

Possible future research could also focus on performance when identifying full-text articles with different document lengths and domain-specific terminologies, or even other types of text, such as newspaper articles and court cases. When the selection of prior knowledge is not possible on the basis of expert knowledge, alternative methods could be explored. For example, unsupervised learning or pseudolabelling algorithms could be used to improve training 55 , 56 . In addition, as the NLP community pushes forward the state of the art in feature extraction methods, these can easily be added to our system as well. In all cases, performance benefits should be carefully evaluated using benchmarks for the task at hand. To this end, common benchmark challenges should be constructed that allow for an even comparison of the various tools now available. To facilitate such a benchmark, we have constructed a repository of publicly available systematic reviewing datasets 57 .

The future of systematic reviewing will involve interaction with machine learning algorithms to deal with the enormous increase in available text. We invite the community to contribute to open source projects such as our own, as well as to common benchmark challenges, so that we can provide measurable and reproducible improvements over current practice.

Data availability

The results described in this paper are available at the Open Science Framework ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 . The answers to the quantitative questions of the UX test can be found at the Open Science Framework (OSF.IO/7PQNM) 52 .

Code availability

All code to reproduce the results described in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 . All code for the software ASReview is available under an Apache 2.0 license ( https://doi.org/10.5281/zenodo.3345592 ) 27 , is maintained on GitHub 63 and includes documentation ( https://doi.org/10.5281/zenodo.4287120 ) 28 .

Bornmann, L. & Mutz, R. Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J. Assoc. Inf. Sci. Technol. 66 , 2215–2222 (2015).


Gough, D., Oliver, S. & Thomas, J. An Introduction to Systematic Reviews (Sage, 2017).

Cooper, H. Research Synthesis and Meta-analysis: A Step-by-Step Approach (SAGE Publications, 2015).

Liberati, A. et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J. Clin. Epidemiol. 62 , e1–e34 (2009).

Boaz, A. et al. Systematic Reviews: What have They Got to Offer Evidence Based Policy and Practice? (ESRC UK Centre for Evidence Based Policy and Practice London, 2002).

Oliver, S., Dickson, K. & Bangpan, M. Systematic Reviews: Making Them Policy Relevant. A Briefing for Policy Makers and Systematic Reviewers (UCL Institute of Education, 2015).

Petticrew, M. Systematic reviews from astronomy to zoology: myths and misconceptions. Brit. Med. J. 322 , 98–101 (2001).

Lefebvre, C., Manheimer, E. & Glanville, J. in Cochrane Handbook for Systematic Reviews of Interventions (eds. Higgins, J. P. & Green, S.) 95–150 (John Wiley & Sons, 2008); https://doi.org/10.1002/9780470712184.ch6 .

Sampson, M., Tetzlaff, J. & Urquhart, C. Precision of healthcare systematic review searches in a cross-sectional sample. Res. Synth. Methods 2 , 119–125 (2011).

Wang, Z., Nayfeh, T., Tetzlaff, J., O’Blenis, P. & Murad, M. H. Error rates of human reviewers during abstract screening in systematic reviews. PLoS ONE 15 , e0227742 (2020).

Marshall, I. J. & Wallace, B. C. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst. Rev. 8 , 163 (2019).

Harrison, H., Griffin, S. J., Kuhn, I. & Usher-Smith, J. A. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med. Res. Methodol. 20 , 7 (2020).

O’Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. & Ananiadou, S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst. Rev. 4 , 5 (2015).

Wallace, B. C., Trikalinos, T. A., Lau, J., Brodley, C. & Schmid, C. H. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinf. 11 , 55 (2010).

Cohen, A. M., Hersh, W. R., Peterson, K. & Yen, P.-Y. Reducing workload in systematic review preparation using automated citation classification. J. Am. Med. Inform. Assoc. 13 , 206–219 (2006).

Kremer, J., Steenstrup Pedersen, K. & Igel, C. Active learning with support vector machines. WIREs Data Min. Knowl. Discov. 4 , 313–326 (2014).

Miwa, M., Thomas, J., O’Mara-Eves, A. & Ananiadou, S. Reducing systematic review workload through certainty-based screening. J. Biomed. Inform. 51 , 242–253 (2014).

Settles, B. Active Learning Literature Survey (Minds@UW, 2009); https://minds.wisconsin.edu/handle/1793/60660

Holzinger, A. Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. 3 , 119–131 (2016).

Van de Schoot, R. & De Bruin, J. Researcher-in-the-loop for Systematic Reviewing of Text Databases (Zenodo, 2020); https://doi.org/10.5281/zenodo.4013207

Kim, D., Seo, D., Cho, S. & Kang, P. Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf. Sci. 477 , 15–29 (2019).

Nosek, B. A. et al. Promoting an open research culture. Science 348 , 1422–1425 (2015).

Kilicoglu, H., Demner-Fushman, D., Rindflesch, T. C., Wilczynski, N. L. & Haynes, R. B. Towards automatic recognition of scientifically rigorous clinical research evidence. J. Am. Med. Inform. Assoc. 16 , 25–31 (2009).

Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta‐analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11 , 181–217 (2020).

Borah, R., Brown, A. W., Capers, P. L. & Kaiser, K. A. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open 7 , e012545 (2017).

de Vries, H., Bekkers, V. & Tummers, L. Innovation in the Public Sector: a systematic review and future research agenda. Public Adm. 94 , 146–166 (2016).

Van de Schoot, R. et al. ASReview: Active Learning for Systematic Reviews (Zenodo, 2020); https://doi.org/10.5281/zenodo.3345592

De Bruin, J. et al. ASReview Software Documentation 0.14 (Zenodo, 2020); https://doi.org/10.5281/zenodo.4287120

ASReview PyPI Package (ASReview Core Development Team, 2020); https://pypi.org/project/asreview/

Docker container for ASReview (ASReview Core Development Team, 2020); https://hub.docker.com/r/asreview/asreview

Ferdinands, G. et al. Active Learning for Screening Prioritization in Systematic Reviews—A Simulation Study (OSF Preprints, 2020); https://doi.org/10.31219/osf.io/w6qbg

Fu, J. H. & Lee, S. L. Certainty-enhanced active learning for improving imbalanced data classification. In 2011 IEEE 11th International Conference on Data Mining Workshops 405–412 (IEEE, 2011).

Le, Q. V. & Mikolov, T. Distributed representations of sentences and documents. Preprint at https://arxiv.org/abs/1405.4053 (2014).

Ramos, J. Using TF–IDF to determine word relevance in document queries. In Proc. 1st Instructional Conference on Machine Learning Vol. 242, 133–142 (ICML, 2003).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).


Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using siamese BERT-networks. Preprint at https://arxiv.org/abs/1908.10084 (2019).

Smith, V., Devane, D., Begley, C. M. & Clarke, M. Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Med. Res. Methodol. 11 , 15 (2011).

Wynants, L. et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. Brit. Med. J . 369 , 1328 (2020).

Van de Schoot, R. et al. Extension for COVID-19 Related Datasets in ASReview (Zenodo, 2020). https://doi.org/10.5281/zenodo.3891420 .

Lu Wang, L. et al. CORD-19: The COVID-19 open research dataset. Preprint at https://arxiv.org/abs/2004.10706 (2020).

Fraser, N. & Kramer, B. Covid19_preprints (FigShare, 2020); https://doi.org/10.6084/m9.figshare.12033672.v18

Ferdinands, G., Schram, R., Van de Schoot, R. & De Bruin, J. Scripts for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (Zenodo, 2020); https://doi.org/10.5281/zenodo.4024122

Ferdinands, G., Schram, R., van de Schoot, R. & de Bruin, J. Results for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (OSF, 2020); https://doi.org/10.17605/OSF.IO/2JKD6

Kwok, K. T. T., Nieuwenhuijse, D. F., Phan, M. V. T. & Koopmans, M. P. G. Virus metagenomics in farm animals: a systematic review. Viruses 12 , 107 (2020).

Hall, T., Beecham, S., Bowes, D., Gray, D. & Counsell, S. A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38 , 1276–1304 (2012).

van de Schoot, R., Sijbrandij, M., Winter, S. D., Depaoli, S. & Vermunt, J. K. The GRoLTS-Checklist: guidelines for reporting on latent trajectory studies. Struct. Equ. Model. Multidiscip. J. 24 , 451–467 (2017).


van de Schoot, R. et al. Bayesian PTSD-trajectory analysis with informed priors based on a systematic literature search and expert elicitation. Multivar. Behav. Res. 53 , 267–291 (2018).

Cohen, A. M., Bhupatiraju, R. T. & Hersh, W. R. Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In Proc. 13th Text Retrieval Conference (TREC, 2004).

Vasalou, A., Ng, B. D., Wiemer-Hastings, P. & Oshlyansky, L. Human-moderated remote user testing: protocols and applications. In 8th ERCIM Workshop, User Interfaces for All Vol. 19 (ERCIM, 2004).

Joffe, H. in Qualitative Research Methods in Mental Health and Psychotherapy: A Guide for Students and Practitioners (eds Harper, D. & Thompson, A. R.) Ch. 15 (Wiley, 2012).

NVivo v. 12 (QSR International Pty, 2019).

Hindriks, S., Huijts, M. & van de Schoot, R. Data for UX-test ASReview - June 2020. OSF https://doi.org/10.17605/OSF.IO/7PQNM (2020).

Marshall, I. J., Kuiper, J. & Wallace, B. C. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J. Am. Med. Inform. Assoc. 23 , 193–201 (2016).

Nallapati, R., Zhou, B., dos Santos, C. N., Gulcehre, Ç. & Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proc. 20th SIGNLL Conference on Computational Natural Language Learning 280–290 (Association for Computational Linguistics, 2016).

Xie, Q., Dai, Z., Hovy, E., Luong, M.-T. & Le, Q. V. Unsupervised data augmentation for consistency training. Preprint at https://arxiv.org/abs/1904.12848 (2019).

Ratner, A. et al. Snorkel: rapid training data creation with weak supervision. VLDB J. 29 , 709–730 (2020).

Systematic Review Datasets (ASReview Core Development Team, 2020); https://github.com/asreview/systematic-review-datasets

Wallace, B. C., Small, K., Brodley, C. E., Lau, J. & Trikalinos, T. A. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In Proc. 2nd ACM SIGHIT International Health Informatics Symposium 819–824 (Association for Computing Machinery, 2012).

Cheng, S. H. et al. Using machine learning to advance synthesis and use of conservation and environmental evidence. Conserv. Biol. 32 , 762–764 (2018).

Yu, Z., Kraft, N. & Menzies, T. Finding better active learners for faster literature reviews. Empir. Softw. Eng . 23 , 3161–3186 (2018).

Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan—a web and mobile app for systematic reviews. Syst. Rev. 5 , 210 (2016).

Przybyła, P. et al. Prioritising references for systematic reviews with RobotAnalyst: a user study. Res. Synth. Methods 9 , 470–488 (2018).

ASReview: Active learning for Systematic Reviews (ASReview Core Development Team, 2020); https://github.com/asreview/asreview


Acknowledgements

We would like to thank the Utrecht University Library, focus area Applied Data Science, and departments of Information and Technology Services, Test and Quality Services, and Methodology and Statistics, for their support. We also want to thank all researchers who shared data, participated in our user experience tests or who gave us feedback on ASReview in other ways. Furthermore, we would like to thank the editors and reviewers for providing constructive feedback. This project was funded by the Innovation Fund for IT in Research Projects, Utrecht University, the Netherlands.

Author information

Authors and Affiliations

Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, the Netherlands

Rens van de Schoot, Gerbrich Ferdinands, Albert Harkema, Joukje Willemsen, Yongchao Ma, Qixiang Fang, Sybren Hindriks & Daniel L. Oberski

Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Jonathan de Bruin, Raoul Schram, Parisa Zahedi & Maarten Hoogerwerf

Utrecht University Library, Utrecht University, Utrecht, the Netherlands

Jan de Boer, Felix Weijdema & Bianca Kramer

Department of Test and Quality Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Martijn Huijts

School of Governance, Faculty of Law, Economics and Governance, Utrecht University, Utrecht, the Netherlands

Lars Tummers

Department of Biostatistics, Data management and Data Science, Julius Center, University Medical Center Utrecht, Utrecht, the Netherlands

Daniel L. Oberski


Contributions

R.v.d.S. and D.O. originally designed the project, with later input from L.T. J.d.Br. is the lead engineer, software architect and supervises the code base on GitHub. R.S. coded the algorithms and simulation studies. P.Z. coded the very first version of the software. J.d.Bo., F.W. and B.K. developed the systematic review pipeline. M.Huijts led the UX tests and was supported by S.H. M.Hoogerwerf developed the architecture of the produced (meta)data. G.F. conducted the simulation study together with R.S. A.H. performed the literature search comparing the different tools together with G.F. J.W. designed all the artwork and helped with formatting the manuscript. Y.M. and Q.F. are responsible for the preprocessing of the metadata under the supervision of J.d.Br. R.v.d.S., D.O. and L.T. wrote the paper with input from all authors. Each co-author has written parts of the manuscript.

Corresponding author

Correspondence to Rens van de Schoot .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Jian Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Overview of software tools supporting systematic reviews.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

van de Schoot, R., de Bruin, J., Schram, R. et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell 3 , 125–133 (2021). https://doi.org/10.1038/s42256-020-00287-7


Received : 04 June 2020

Accepted : 17 December 2020

Published : 01 February 2021

Issue Date : February 2021

DOI : https://doi.org/10.1038/s42256-020-00287-7




Methodology

  • Open access
  • Published: 28 April 2022

An intelligent literature review: adopting inductive approach to define machine learning applications in the clinical domain

  • Renu Sabharwal   ORCID: orcid.org/0000-0001-9728-8001 1 &
  • Shah J. Miah 1  

Journal of Big Data volume  9 , Article number:  53 ( 2022 ) Cite this article

7926 Accesses

10 Citations

Metrics details

Big data analytics utilizes different techniques to transform large volumes of big datasets. These analytics techniques use various computational methods, such as Machine Learning (ML), to convert raw data into valuable insights. ML assists individuals in performing work activities intelligently, which empowers decision-makers. Since academics and industry practitioners have a growing interest in ML, various existing review studies have explored different applications of ML for enhancing knowledge about specific problem domains. However, in most cases existing studies are limited by the lack of a holistic, automated approach. While several researchers have developed techniques to automate the systematic literature review process, these also seem to lack transparency and guidance for future researchers. This research aims to promote the utilization of intelligent literature reviews for researchers by introducing a step-by-step automated framework. We offer an intelligent literature review to obtain in-depth analytical insight into ML applications in the clinical domain in order to (a) develop the intelligent literature framework using a traditional literature review and Latent Dirichlet Allocation (LDA) topic modeling, (b) analyze research documents using a traditional systematic literature review to reveal ML applications, and (c) identify topics from the documents using LDA topic modeling. We used the PRISMA framework for the review to harness samples sourced from four major databases (IEEE, PubMed, Scopus, and Google Scholar) published between 2016 and September 2021. The framework comprises two parts: (a) a traditional systematic literature review consisting of three stages (planning, conducting, and reporting) and (b) LDA topic modeling consisting of three steps (pre-processing, topic modeling, and post-processing). The intelligent literature review framework transparently and reliably reviewed 305 sample documents.

Introduction

Organizations are continuously harnessing the power of big data by adopting different ML techniques. Insights captured from big data may create a great impact in reshaping their business operations and processes. As a vital set of techniques, big data analytics methods are used to transform complicated and huge amounts of data, known as 'Big Data', in order to uncover hidden patterns, new learnings, untold facts or associations, anomalies, and other insights [ 41 ]. Big Data refers to an enormous amount of data that a traditional database management system cannot handle; in most cases, traditional software functions are inadequate to analyze or process it. Big data are characterized by the 5 V's, which refer to volume, variety, velocity, veracity, and value [ 22 ]. ML is a vital approach for designing useful big data analytics techniques and is a rapidly growing sub-field of information sciences that deals with all of these characteristics. ML employs numerous methods for machines to learn from past experiences (e.g., past datasets), reducing the burden of writing explicit code as in traditional programming [ 7 , 26 ]. Clinical care enterprises face a huge challenge due to the increasing demand for big data processing to improve clinical care outcomes. For example, an electronic health record contains a huge amount of patient information, drug administration records, and imaging data from various modalities. The variety and quantity of such data make the clinical domain an ideal topic for appraising the value of ML in research.

Among existing ML approaches, Oala et al. [ 35 ] proposed an algorithmic framework that gives a path towards the effective and reliable application of ML in the healthcare domain. In conjunction with their systematic review, our research offers a smart literature review that combines a traditional literature review following the PRISMA framework guidelines with topic modeling using LDA, focusing on the clinical domain. Most of the existing literature focused on the healthcare domain [ 14 , 42 , 49 ] is more inclusive and of a broader scope, covering a wide range of medical activities, whereas our research focuses primarily on the clinical domain, which assists in diagnosing and treating patients and includes the clinical aspects of medicine.

As clinical research has developed, the area has become increasingly attractive to clinical researchers, in particular for gaining insight into ML applications in clinical practice. This is because of its practical relevance to patients, professionals, clinical application designers, and other specialists, supported by the ubiquity of clinical disease management techniques. Although advantages are presumed for the target audience, such as improved self-management abilities (self-efficacy and engagement) and physical or mental quality of life for long-term ill patients, and, for clinical care specialists, better decision making and care support for patients, clinical care has not previously been assessed and conceptualized as a well-defined and essential sub-field of health care research. It is therefore important to portray similar studies utilizing different types of review approaches with respect to the utilization of ML/DL and its value. Table 1 presents some examples of existing studies with various focal points and review approaches in the domain.

Although the existing studies included in Table 1 give an understanding of designated aspects of ML/DL utilization in clinical care, they show a lack of focus on how the key points addressed in existing ML/DL research are developing. Further to this, they indicate a clear need for an understanding of the multidisciplinary affiliations and profiles of ML/DL that could provide significant knowledge to new specialists or professionals in this space. For instance, Brnabic and Hess [ 8 ] recommended a direction for future research by stating that "Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms" (p. 1).

ML tools have become a central focus of modern biomedical research because of better access to large datasets, exponentially increasing processing power, and key algorithmic developments allowing ML models to handle increasingly challenging data [ 19 ]. Different ML approaches can analyze huge amounts of data, including difficult and unusual patterns. Most studies have focused on ML and its impacts on clinical practices [ 2 , 9 , 10 , 24 , 26 , 34 , 43 ]. Fewer studies have examined the utilization of ML algorithms [ 11 , 20 , 45 , 48 ] for more holistic benefits for clinical researchers.

ML has become an interdisciplinary science that integrates computer science, mathematics, and statistics. It is also a methodology for building smart machines for artificial intelligence. Its applications comprise algorithms, assortments of instructions to perform specific tasks, crafted to learn from data independently without human intervention. Over time, ML algorithms improve their prediction accuracy without the need for reprogramming. Based on this, we offer an intelligent literature review using a traditional literature review and Latent Dirichlet Allocation (LDA) topic modeling in order to meet knowledge demands in the clinical domain. Theoretical considerations guide the current study because previous literature provides a strong foundation for future IS researchers to investigate ML in the clinical sector. The main aim of this study is to develop an intelligent literature framework using a traditional literature review. For this purpose, we employed four digital databases (IEEE, Google Scholar, PubMed, and Scopus) and then performed LDA topic modeling, which may assist healthcare or clinical researchers in analyzing many documents intelligently with little effort and time.

Traditional systematic literature review is time-consuming and constrained by limited processing capacity, resulting in fewer sample documents being investigated, and risks becoming obsolete. Academic and practitioner researchers are frequently required to discover, organize, and comprehend new and unexplored research areas. In a traditional literature review that involves an enormous number of papers, the researcher must either restrict the number of documents to review a priori or analyze them using other methods.

The proposed intelligent literature review approach consists of Part A and Part B, a combination of a traditional systematic literature review and topic modeling, which may assist future researchers in using appropriate technology, producing accurate results, and saving time. We present the framework in Fig.  1 below.

Figure 1. Proposed intelligent literature review framework.

The traditional literature review identified 534,327 articles across Scopus (24,498), IEEE (2,558), PubMed (11,271), and Google Scholar (496,000), which went through three stages (planning the review, conducting the review, and reporting the review). This resulted in 305 articles, on which we performed topic modeling using LDA.

We follow traditional systematic literature review methodologies [ 25 , 39 , 40 ], including the PRISMA framework [ 37 ]. We review four digital databases and develop three stages entailing planning, conducting, and reporting the review (Fig.  2 ).

Figure 2. The three stages of the traditional literature review.

Planning the review

Research articles : the research articles were classified using the keywords listed in Tables 2 and 3 below.

Digital databases : four databases (IEEE, PubMed, Scopus, and Google Scholar) were used to collect details for reviewing research articles.

Review protocol development : we first searched Scopus and found many studies relevant to this review. We then searched PubMed, IEEE, and Google Scholar for articles and extracted only relevant papers matching our keywords and review context, based on their full-text availability.

Review protocol evaluation : to support the selection of research articles and the inclusion and exclusion criteria, the quality of the articles was explored and assessed to appraise their suitability and impartiality [ 44 ]. Only articles with the keywords "machine learning" and "clinical" in the document title and abstract were selected.

Conducting the review

The second step is conducting the review, which includes a description of the search syntax and data synthesis.

Search syntax : Table 4 details the syntax used to select the research articles.

Data synthesis

We used a qualitative meta-synthesis technique to understand the methodology, algorithms, applications, qualities, results, and current research impediments. Qualitative meta-synthesis is a coherent approach for analyzing data across qualitative studies [ 4 ]. Our first search identified 534,327 papers, comprising Scopus (24,498), IEEE (2,558), PubMed (11,271), and Google Scholar (496,000) articles with the selected keywords. After subjecting this dataset to our inclusion and exclusion criteria, articles were reduced to Scopus (181), IEEE (62), PubMed (37), and Google Scholar (46) (Fig.  3 ).

Figure 3. PRISMA framework of the traditional literature review.

Reporting the review

This section presents the results of the traditional literature review.

Demonstration of findings

A search including a linear literature search and citation chaining was performed in the digital databases, and the resulting papers were thoroughly analyzed to choose only the most pertinent articles; finally, 305 articles were included for the Part B review. Information from these articles was classified, organized, and presented to demonstrate the findings.

Report the findings

A word cloud generated from the selected 305 research articles gives an overview of word frequency within those articles (Fig. 4 ). The chosen articles then move to the next step, in which the PDF files are converted to text documents for LDA topic modeling.

Figure 4. Word cloud of the 305 selected articles.

Conversion of PDF files to a text document

Python code, shared on GitHub ( https://github.com/MachineLearning-UON/Topic-modeling-using-LDA.git ), is used to convert the PDF files. A single text document is prepared from the 305 research papers collected in the traditional literature review.
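As an illustration only (the authors' actual conversion script is in the GitHub repository linked above), a minimal PDF-to-text sketch using the pypdf package might look as follows; the folder path and output filename are placeholders:

from pathlib import Path
from pypdf import PdfReader  # pip install pypdf

def pdfs_to_single_text(pdf_folder: str, output_file: str) -> None:
    """Extract text from every PDF in a folder and append it to one text file."""
    with open(output_file, "w", encoding="utf-8") as out:
        for pdf_path in sorted(Path(pdf_folder).glob("*.pdf")):
            reader = PdfReader(str(pdf_path))
            text = "\n".join(page.extract_text() or "" for page in reader.pages)
            out.write(text + "\n")

# Example usage (paths are placeholders):
# pdfs_to_single_text("papers/", "corpus.txt")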

Topic modelling for intelligent literature review

Our intelligent literature review is developed using a combination of traditional literature review and topic modeling [ 22 ]. Topic modeling is a probabilistic text-mining technique widely used in computer science for text mining and information retrieval. It has been applied in numerous papers [ 1 , 5 , 17 , 36 ] using various ML algorithms [ 38 ] such as Latent Semantic Indexing (LSI), Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), Parallel Latent Dirichlet Allocation (PLDA), and the Pachinko Allocation Model (PAM). We based our methodological framework on LDA because it is the most widely and easily used [ 13 , 17 , 21 ] and conceptually simple [ 6 ] approach. LDA is an unsupervised, probabilistic ML algorithm that discovers topics by calculating patterns of word co-occurrence across many documents, or a corpus [ 16 ]. Each LDA topic is distributed across each document as a probability.

While there are numerous ways of conducting a systematic literature review, most strategies require considerable time and prior knowledge of the area. One study examined the cost of various text categorization strategies, analyzing the assumptions and expense of each [ 5 ]. Notably, except for manually reading the articles and topic modeling, all the strategies require prior knowledge of the articles' categories and high pre-examination costs. Topic modeling, however, can be automated and makes better use of researchers' time, making it a natural fit as part of an intelligent literature review. Topic modeling has been used in several papers to categorize research papers, as presented in Table 5 .

The articles/papers analyzed in the above table are speeches, web documents, web posts, press releases, and newspapers. However, none of them developed a framework that performs a traditional literature review from digital databases and then applies topic modeling to save time. One study did examine the utilization of LDA in academia and explored four parameters—text pre-processing, model parameter selection, reliability, and validity [ 5 ]. Topic modeling identifies patterns of repeated words across a corpus of documents; patterns of word co-occurrence are conceived as hidden ‘topics’ present in the corpus. First, documents must be modified to be machine-readable, with only their most informative features used for topic modeling. We modify documents in a three-stage process entailing pre-processing, topic modeling, and post-processing, as defined in Fig.  1 earlier.

The utilization of topic modeling presents an opportunity for researchers to use advanced technology in the literature review process. However, topic modeling typically requires statistical and programming skills that not all researchers have. Therefore, we have shared the code on GitHub with default parameters for future researchers.

Pre-processing

Székely and Vom Brocke [ 46 ] explained that pre-processing is a seven-step process, described below and shown in Fig.  1 as Part B (a code sketch of these steps follows the list):

Load data—The text data file is imported using the python command.

Optical character recognition—using word cloud, characters are recognized.

Filtering non-English words—non-English words are removed.

Document tokenization—Split the text into sentences and the sentences into words. Lowercase the words and remove punctuation.

Text cleaning—the text is cleaned using the Porter stemmer.

Word lemmatization—words in the third person are changed to the first person, and past and future verb tenses are changed into the present.

Stop word removal—All stop words are removed.
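A minimal sketch of steps 4–7 (tokenization, cleaning, lemmatization, and stop-word removal) using NLTK and gensim is shown below; the exact settings used in the study are those in the GitHub repository:

import gensim
from nltk.corpus import stopwords                        # requires nltk.download('stopwords')
from nltk.stem import PorterStemmer, WordNetLemmatizer   # requires nltk.download('wordnet')

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def preprocess(document: str) -> list:
    """Tokenize, lowercase, strip punctuation, remove stop words, lemmatize, and stem."""
    tokens = gensim.utils.simple_preprocess(document, deacc=True)
    tokens = [t for t in tokens if t not in stop_words]
    tokens = [lemmatizer.lemmatize(t, pos="v") for t in tokens]   # verb tenses to present
    return [stemmer.stem(t) for t in tokens]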

Topic modelling using LDA

The research articles selected for LDA topic modeling are described in Table 5 . The LDA model results present the coherence score for all the selected topics and a list of the most frequently used words for each.

Post-processing

The goal of the post-processing stage is to identify and label the topics that are relevant for use in the literature review. The result of the LDA model is a list of topics and the probability of each topic for each document (paper). This list is used to assign each paper to a topic by sorting on the highest topic probability per paper. Each topic therefore contains documents that are similar to each other. To reduce the risk of error in topic identification, a combination of inspecting the most frequent words for each topic and reviewing individual papers is used. After the topic review, the topics are presented in the literature review.

Following the intelligent literature review, the results of the LDA model should be approved or validated by statistical, semantic, or predictive means. Statistical validation applies mutual information tests of result fit to model assumptions; semantic validation requires hand-coding to decide whether the meaning of specific words varies significantly, and as expected, with assignment to different topics (the approach used in the current study to validate the LDA model results); and predictive validation checks whether events that ought to have increased the prevalence of a particular topic, if our interpretations are right, did so [ 6 , 21 ].

LDA assumes that each word in each document is generated by a topic, and each topic is a distribution over the keywords. So we have two matrices:

\(\theta_{td} = P(t \mid d)\), which is the probability distribution of topics in documents, and

\(\Phi_{wt} = P(w \mid t)\), which is the probability distribution of words in topics.

The probability of a word given a document, \(P(w \mid d)\), is then equal to

$$P(w \mid d) = \sum_{t = 1}^{T} P(w \mid t, d)\,P(t \mid d),$$

where \(T\) is the total number of topics; likewise, let us assume there are \(W\) keywords across all the documents.

If we assume conditional independence, we can say that

$$P(w \mid t, d) = P(w \mid t),$$

and hence \(P(w \mid d)\) is equal to

$$P(w \mid d) = \sum_{t = 1}^{T} \Phi_{wt}\,\theta_{td},$$

that is, the dot product of \(\theta_{td}\) and \(\Phi_{wt}\) over the topics \(t\).

Our systematic literature review identified 305 research papers from the traditional literature review. After executing LDA topic modeling, only 115 articles showed relevance to our topic, "machine learning applications in the clinical domain". The following stages present the LDA topic modeling process.

The 305 research papers were loaded into a Python environment and converted into a single text file. The seven pre-processing steps described earlier were then carried out.

  • Topic modeling

The two main inputs of the LDA topic model are the dictionary (id2word) and the corpus (doc_term_matrix). The LDA model is created by running the following:

import gensim

# Creating the object for the LDA model using the gensim library
LDA = gensim.models.ldamodel.LdaModel

# Build the LDA model
lda_model = LDA(corpus=doc_term_matrix, id2word=dictionary, num_topics=20,
                random_state=100, chunksize=1000, passes=50, iterations=100)

In this model, ‘num_topics’ = 20, ‘chunksize’ is the number of documents used in each training chunk, and ‘passes’ is the total number of training passes.
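The dictionary and corpus referenced above are not shown in the snippet; as an assumed but typical construction with gensim (the study's exact code is in the GitHub repository), they might be built from the pre-processed token lists as follows:

from gensim import corpora

# tokenized_docs: list of token lists produced by the pre-processing steps
dictionary = corpora.Dictionary(tokenized_docs)                        # id2word mapping
doc_term_matrix = [dictionary.doc2bow(doc) for doc in tokenized_docs]  # bag-of-words corpus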

Firstly, the LDA model is built with 20 topics; each topic is represented by a combination of 20 keywords, with each keyword contributing a certain weight to a topic. Topics are viewed and interpreted in the LDA model, such as Topic 0, represented as below:

(0, '0.005*"analysis" + 0.005*"study" + 0.005*"models" + 0.004*"prediction" + 0.003*"disease" + 0.003*"performance" + 0.003*"different" + 0.003*"results" + 0.003*"patient" + 0.002*"feature" + 0.002*"system" + 0.002*"accuracy" + 0.002*"diagnosis" + 0.002*"classification" + 0.002*"studies" + 0.002*"medicine" + 0.002*"value" + 0.002*"approach" + 0.002*"variables" + 0.002*"review"'),

Our approach to finding the ideal number of topics is to construct LDA models with different numbers of topics K and select the model with the highest coherence value. Selecting the value of K at which the rapid growth of topic coherence ends ordinarily offers meaningful and interpretable topics. Picking a considerably higher value can provide more granular sub-topics, but if the selected K is too large, keywords are repeated across multiple topics.
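A minimal sketch of this K-selection step using gensim's CoherenceModel is shown below; tokenized_docs, dictionary, and doc_term_matrix are the objects sketched above, and the candidate range of K is an arbitrary choice for illustration:

from gensim.models import LdaModel, CoherenceModel

def coherence_for_k(k: int) -> float:
    """Train an LDA model with k topics and return its c_v coherence score."""
    model = LdaModel(corpus=doc_term_matrix, id2word=dictionary,
                     num_topics=k, random_state=100, passes=10)
    cm = CoherenceModel(model=model, texts=tokenized_docs,
                        dictionary=dictionary, coherence="c_v")
    return cm.get_coherence()

scores = {k: coherence_for_k(k) for k in range(5, 21)}
best_k = max(scores, key=scores.get)   # K where coherence peaks or levels off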

Model perplexity and topic coherence values are − 8.855378536321144 and 0.3724024189689453, respectively; lower perplexity indicates a better model. Topics and associated keywords were then examined in an interactive chart using the pyLDAvis package. With 20 topics, the chart shows the most salient terms, but the 20 topics overlap each other, as shown in Fig.  5 , meaning keywords are repeated across topics; we therefore decided to use num_topics = 9 and present the corresponding pyLDAvis figure below. Each bubble on the left-hand side of the plot represents a topic; the bigger the bubble, the more prevalent that topic is. A good topic model has fairly big, non-overlapping bubbles dispersed throughout the graph instead of grouped in one quadrant. A topic model with too many topics will typically have many overlapping, small bubbles clustered in one region of the graph, as shown in Fig.  6 .
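A hedged sketch of producing this interactive chart with the standard pyLDAvis–gensim integration is shown below; the output filename is an assumption, and the study's exact plotting code is in the GitHub repository:

import pyLDAvis
import pyLDAvis.gensim_models  # use pyLDAvis.gensim in older pyLDAvis versions

vis = pyLDAvis.gensim_models.prepare(lda_model, doc_term_matrix, dictionary)
pyLDAvis.save_html(vis, "lda_topics.html")  # open in a browser to explore the topics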

figure 5

PyLDAvis graph with 20 topics in the clinical domain

figure 6

PyLDAvis graph with nine vital topics in the clinical domain

Each bubble represents a generated topic. The larger the bubble, the higher the percentage of keywords in the corpus belonging to that topic, which can be seen in the GitHub file. Blue bars represent the overall occurrence of each word in the corpus. If no topic is selected, the blue bars of the most frequently used words are displayed, as depicted in Fig.  6 .

The further the bubbles are from each other, the more different they are. For example, we can tell that topic 1 is about patient information and studies that utilized deep learning to analyze disease, which can be seen in the GitHub code ( https://github.com/MachineLearning-UON/Topic-modeling-using-LDA.git ) and is presented in Fig.  7 .

figure 7

PyLDAvis graph with topic 1

Red bars give the estimated number of times a given topic produced a given term. As can be seen from Fig.  7 , the word 'analysis' appears around 4000 times in the corpus, and the term is used about 1000 times within topic 1. The word with the longest red bar is the one used most by the keywords belonging to that topic.

A good topic model has big, non-overlapping bubbles dispersed throughout the chart; as we can see from Fig.  6 , the bubbles here are clustered in one place. One of the practical applications of topic modeling is discovering the dominant topic of a given document: we find the topic number with the highest percentage contribution to that document, as shown in Fig.  8 .

figure 8

Dominant topics with topic percentage contribution
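The dominant topic per document shown in Fig. 8 can be recovered from the trained model; a minimal sketch, reusing the objects defined above, is:

def dominant_topic(bow) -> tuple:
    """Return (topic_id, probability) of the highest-contributing topic for one document."""
    topic_probs = lda_model.get_document_topics(bow)
    return max(topic_probs, key=lambda pair: pair[1])

dominant = [dominant_topic(bow) for bow in doc_term_matrix]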

The next stage is to process the discoveries and find a satisfactory description of the topics. The most frequent words are evaluated in combination to distinguish each topic. For example, the most frequent words for the papers in topic 2 are "study" and "analysis", which are frequent terms for ML usage in the clinical domain.

The topic names are displayed with topic numbers from 0 to 8 in Table 6 , which includes the topic number and topic words.

The results show the percentage of each topic across all documents: topic 0 and topic 6 have the highest percentages, appearing in 58 and 57 of the 115 documents, respectively. The result of this research is an overview of the research areas within the paper corpus, represented by nine topics.

This paper presented a new methodology that is uncommon in scholarly publications. The methodology utilizes ML to investigate sample articles/papers to distinguish research directions. Even though the structure of the ML-based methodology has its restrictions, the outcomes and its ease of use leave a promising future for topic modeling-based systematic literature reviews.

The principal benefit of the methodological framework is that it gives information about an enormous number of papers, with little effort on the researcher's part, before time-consuming manual work is undertaken. By utilizing the framework, it is possible to rapidly explore a wide range of paper corpora and assess where the researcher's time and concentration should be spent. This is particularly significant for a junior researcher with minimal prior knowledge of a research field. If default parameters and cleaning settings can be found for the steps in the framework, a completely automatic grouping of papers could be enabled; to date, limited work has been presented that achieves such an overview of research directions.

From a literature review viewpoint, the advantage of the proposed framework is that the inclusion and exclusion of papers for a literature review is deferred to a later stage, when more information is available, resulting in a more informed decision process. The framework supports reproducibility, as every step in the systematic review process can be reproduced, which ultimately provides transparency. The whole process has been demonstrated as a proof of concept on GitHub for future researchers.

This study has introduced an intelligent literature review framework that uses ML to analyze existing research documents or articles. We demonstrate how topic modeling can assist the literature review by reducing the manual screening of huge quantities of literature, making more efficient use of researcher time. The LDA algorithm is provided with default parameters and data cleaning steps, reducing the effort required to review the literature. An additional advantage of our framework is that the intelligent literature review offers accurate results in little time, combining the traditional way of analyzing literature with LDA topic modeling.

This framework is constructed in a step-by-step manner. Researchers can use it efficiently because it requires less technical knowledge than other ML approaches, and there is no restriction on the number of research papers it can process. This research extends knowledge from similar studies in this field [ 12 , 22 , 23 , 26 , 30 , 46 ] that present topic modeling. The study acknowledges the inspiring concept of the smart literature review defined by Asmussen and Møller [ 3 ], who previously provided a brief description of how LDA is utilized in topic modeling. Our research follows this basic idea but enhances its significance, broadening its scale and focusing on a specific domain, the clinical domain, to produce insights from existing research articles. For instance, Székely and Vom Brocke [ 46 ] utilized natural language processing to analyze 9514 sustainability reports published between 1999 and 2015; they identified 42 topics but did not develop a framework for future researchers, which we consider a significant gap in the research. Similarly, Kushwaha et al. [ 22 ] used a network analysis approach to analyze ten years of papers without providing a clear, transparent outcome (e.g., how the research produces an outcome step by step). Likewise, Asmussen and Møller [ 3 ] developed a smart literature review framework that was limited to analyzing 650 sample articles through a single method. In our research, by contrast, we developed an intelligent literature review that combines the traditional approach and LDA topic modeling, so that future researchers can gain effective knowledge of the literature review process as it becomes state of the art in their research domains.

Our research developed a more effective intelligent framework that combines traditional literature review and topic modeling using LDA, providing more accurate and transparent results. The results are shared via public access on GitHub at https://github.com/MachineLearning-UON/Topic-modeling-using-LDA.git .

This paper focused on creating a methodological framework to empower researchers, diminishing the requirement for manually scanning documents and making it possible to examine a practically limitless number of papers. It assists in capturing insights from an enormous number of papers more quickly, more transparently, and more reliably. The proposed framework utilizes the LDA topic model, which gathers related documents into topics.

A framework employing topic modeling to rapidly and reliably investigate a practically limitless number of papers, reducing the need to read them individually, has been developed. Topic modeling using the LDA algorithm can assist future researchers, who often need an outline of various research fields with minimal pre-existing knowledge. The proposed framework can enable researchers to review more papers in less time with more accuracy. Our intelligent literature review framework includes a holistic literature review process (planning, conducting, and reporting the review) and LDA topic modeling (pre-processing, topic modeling, and post-processing stages), which concluded that 115 research articles are relevant to the search.

The automation of topic modeling with default parameters could also be explored to help non-technical researchers explore topics or related keywords in any problem domain. For future directions, two principal points should be addressed. First, researchers in other fields should apply the proposed framework to acquire information about its practical usage and gain ideas for further advancement of the framework. Second, research into how to automatically specify model parameters could greatly enhance the ease of use of topic modeling for non-specialist researchers, as the choice of model parameters strongly affects the outcome of the framework.

Future research may utilize more ML analytics tools as complete solution artifacts to analyze different forms of big data. This could involve adopting design science research methodologies, benefiting design researchers who are interested in building ML-based artifacts [ 15 , 28 , 29 , 31 , 32 , 33 ].

Availability of data and materials

Data will be supplied upon request.

LDA is a probabilistic method for topic modeling in text analysis, providing both a predictive and latent topic representation.

Abbreviations

IEEE: The Institute of Electrical and Electronics Engineers

ML: Machine learning

LDA: Latent Dirichlet Allocation

Organizational Capacity

LSI: Latent Semantic Indexing

LSA: Latent Semantic Analysis

NMF: Non-Negative Matrix Factorization

PLDA: Parallel Latent Dirichlet Allocation

PAM: Pachinko Allocation Model

Abuhay TM, Kovalchuk SV, Bochenina K, Mbogo G-K, Visheratin AA, Kampis G, et al. Analysis of publication activity of computational science society in 2001–2017 using topic modelling and graph theory. J Comput Sci. 2018;26:193–204.


Adlung L, Cohen Y, Mor U, Elinav E. Machine learning in clinical decision making. Med. 2021;2(6):642–65.

Asmussen CB, Møller C. Smart literature review: a practical topic modeling approach to exploratory literature review. J Big Data. 2019;6(1):1–18.

Beck CT. A meta-synthesis of qualitative research. MCN Am J Mater Child Nurs. 2002;27(4):214–21.

Behera RK, Bala PK, Dhir A. The emerging role of cognitive computing in healthcare: a systematic literature review. Int J Med Informatics. 2019;129:154–66.

Blei DM. Probabilistic topic models. Commun ACM. 2012;55(4):77–84.

Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993–1022.


Brnabic A, Hess LM. Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making. BMC Med Inform Decis Mak. 2021;21(1):1–19.

Cabitza F, Locoro A, Banfi G. Machine learning in orthopedics: a literature review. Front Bioeng Biotechnol. 2018;6:75.

Chang C-H, Lin C-H, Lane H-Y. Machine learning and novel biomarkers for the diagnosis of Alzheimer’s disease. Int J Mol Sci. 2021;22(5):2761.

Connor KL, O’Sullivan ED, Marson LP, Wigmore SJ, Harrison EM. The future role of machine learning in clinical transplantation. Transplantation. 2021;105(4):723–35.

Dias R, Torkamani A. Artificial intelligence in clinical and genomic diagnostics. Genome Med. 2019;11(1):1–12.

DiMaggio P, Nag M, Blei D. Exploiting affinities between topic modeling and the sociological perspective on culture: application to newspaper coverage of US government arts funding. Poetics. 2013;41(6):570–606.

Forest P-G, Martin D. Fit for Purpose: Findings and recommendations of the external review of the Pan-Canadian Health Organizations: Summary Report: Health Canada Ottawa, ON; 2018.

Genemo H, Miah SJ, McAndrew A. A design science research methodology for developing a computer-aided assessment approach using method marking concept. Educ Inf Technol. 2016;21(6):1769–84.

Greene D, Cross JP. Exploring the political agenda of the european parliament using a dynamic topic modeling approach. Polit Anal. 2017;25(1):77–94.

Grimmer J. A Bayesian hierarchical topic model for political texts: measuring expressed agendas in Senate press releases. Polit Anal. 2010;18(1):1–35.

Grimmer J, Stewart BM. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit Anal. 2013;21(3):267–97.

Hassan N, Slight R, Weiand D, Vellinga A, Morgan G, Aboushareb F, et al. Preventing sepsis; how can artificial intelligence inform the clinical decision-making process? A systematic review. Int J Med Inform. 2021;150:104457.

Hirt R, Koehl NJ, Satzger G, editors. An end-to-end process model for supervised machine learning classification: from problem to deployment in information systems. Designing the Digital Transformation: DESRIST 2017 Research in Progress Proceedings of the 12th International Conference on Design Science Research in Information Systems and Technology Karlsruhe, Germany 30 May-1 Jun; 2017: Karlsruher Institut für Technologie (KIT).

Koltsova O, Koltcov S. Mapping the public agenda with topic modeling: the case of the Russian live journal. Policy Internet. 2013;5(2):207–27.

Kushwaha AK, Kar AK, Dwivedi YK. Applications of big data in emerging management disciplines: a literature review using text mining. Int J Inf Manag Data Insights. 2021;1(2):100017.


Li S, Wang H. Traditional literature review and research synthesis. The Palgrave handbook of applied linguistics research methodology. 2018:123–44.

Magrabi F, Ammenwerth E, McNair JB, De Keizer NF, Hyppönen H, Nykänen P, et al. Artificial intelligence in clinical decision support: challenges for evaluating AI and practical implications. Yearb Med Inform. 2019;28(01):128–34.

Maier D, Waldherr A, Miltner P, Wiedemann G, Niekler A, Keinert A, et al. Applying LDA topic modeling in communication research: toward a valid and reliable methodology. Commun Methods Meas. 2018;12(2–3):93–118.

Mårtensson G, Ferreira D, Granberg T, Cavallin L, Oppedal K, Padovani A, et al. The reliability of a deep learning model in clinical out-of-distribution MRI data: a multicohort study. Med Image Anal. 2020;66:101714.

Mendo IR, Marques G, de la Torre DI, López-Coronado M, Martín-Rodríguez F. Machine learning in medical emergencies: a systematic review and analysis. J Med Syst. 2021;45(10):1–16.

Miah SJ. An ontology based design environment for rural business decision support. Nathan: Griffith University Nathan; 2008.

Miah SJ, A new semantic knowledge sharing approach for e-government systems. 4th IEEE International Conference on Digital Ecosystems and Technologies; 2010: IEEE.

Miah SJ, Camilleri E, Vu HQ. Big Data in healthcare research: a survey study. J Comput Inf Syst. 2021. https://doi.org/10.1080/08874417.2020.1858727 .

Miah SJ, Gammack J, Kerr D, Ontology development for context-sensitive decision support. Third International Conference on Semantics, Knowledge and Grid (SKG 2007); 2007: IEEE.

Miah SJ, Gammack JG. Ensemble artifact design for context sensitive decision support. Australas J Inf Syst. 2014. https://doi.org/10.3127/ajis.v18i2.898 .

Miah SJ, Gammack JG, McKay J. A metadesign theory for tailorable decision support. J Assoc Inf Syst. 2019;20(5):4.

Mimno D, Blei D, editors. Bayesian checking for topic models. Proceedings of the 2011 conference on empirical methods in natural language processing; 2011.

Oala L, Murchison AG, Balachandran P, Choudhary S, Fehr J, Leite AW, et al. Machine learning for health: algorithm auditing & quality control. J Med Syst. 2021;45(12):1–8.

Ouhbi S, Idri A, Fernández-Alemán JL, Toval A. Requirements engineering education: a systematic mapping study. Requir Eng. 2015;20(2):119–38.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2020;372:n71.

Quinn KM, Monroe BL, Colaresi M, Crespin MH, Radev DR. How to analyze political attention with minimal assumptions and costs. Am J Polit Sci. 2010;54(1):209–28.

Rowley J, Slack F. Conducting a literature review. Management research news. 2004.

Rozas LW, Klein WC. The value and purpose of the traditional qualitative literature review. J Evid Based Soc Work. 2010;7(5):387–99.

Sabharwal R, Miah SJ. A new theoretical understanding of big data analytics capabilities in organizations: a thematic analysis. J Big Data. 2021;8(1):1–17.

Salazar-Reyna R, Gonzalez-Aleu F, Granda-Gutierrez EM, Diaz-Ramirez J, Garza-Reyes JA, Kumar A. A systematic literature review of data science, data analytics and machine learning applied to healthcare engineering systems. Management Decision. 2020.

Shah P, Kendall F, Khozin S, Goosen R, Hu J, Laramie J, et al. Artificial intelligence and machine learning in clinical development: a translational perspective. NPJ Digit Med. 2019;2(1):1–5.

Sone D, Beheshti I. Clinical application of machine learning models for brain imaging in epilepsy: a review. Front Neurosci. 2021;15:761.

Spasic I, Nenadic G. Clinical text data in machine learning: systematic review. JMIR Med Inform. 2020;8(3):e17984.

Székely N, Vom Brocke J. What can we learn from corporate sustainability reporting? Deriving propositions for research and practice from over 9,500 corporate sustainability reports published between 1999 and 2015 using topic modelling technique. PLoS ONE. 2017;12(4):e0174807.

Verma D, Bach K, Mork PJ, editors. Application of machine learning methods on patient reported outcome measurements for predicting outcomes: a literature review. Informatics; 2021: Multidisciplinary Digital Publishing Institute.

Weng W-H. Machine learning for clinical predictive analytics. Leveraging data science for global health. Cham: Springer; 2020. p. 199–217.


Yin Z, Sulieman LM, Malin BA. A systematic literature review of machine learning in online personal health data. J Am Med Inform Assoc. 2019;26(6):561–76.


Acknowledgements

Not applicable.

Author information

Authors and affiliations

Newcastle Business School, The University of Newcastle, Newcastle, NSW, Australia

Renu Sabharwal & Shah J. Miah


Contributions

The first author conducted the research, while the second author has ensured quality standards and rewritten the entire findings linking to underlying theories. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Renu Sabharwal .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Sabharwal, R., Miah, S.J. An intelligent literature review: adopting inductive approach to define machine learning applications in the clinical domain. J Big Data 9 , 53 (2022). https://doi.org/10.1186/s40537-022-00605-3

Download citation

Received : 18 November 2021

Accepted : 06 April 2022

Published : 28 April 2022

DOI : https://doi.org/10.1186/s40537-022-00605-3


  • Clinical research
  • Systematic literature review


  • Open access
  • Published: 19 September 2024

Machine learning in business and finance: a literature review and research opportunities

  • Hanyao Gao 1 ,
  • Gang Kou 2 ,
  • Haiming Liang 1 ,
  • Hengjie Zhang 3 ,
  • Xiangrui Chao 1 ,
  • Cong-Cong Li 5 &
  • Yucheng Dong 1 , 4  

Financial Innovation volume  10 , Article number:  86 ( 2024 ) Cite this article

2081 Accesses

4 Altmetric

Metrics details

This study provides a comprehensive review of machine learning (ML) applications in the fields of business and finance. First, it introduces the most commonly used ML techniques and explores their diverse applications in marketing, stock analysis, demand forecasting, and energy marketing. In particular, this review critically analyzes over 100 articles and reveals a strong inclination toward deep learning techniques, such as deep neural, convolutional neural, and recurrent neural networks, which have garnered immense popularity in financial contexts owing to their remarkable performance. This review shows that ML techniques, particularly deep learning, demonstrate substantial potential for enhancing business decision-making processes and achieving more accurate and efficient predictions of financial outcomes. In particular, ML techniques exhibit promising research prospects in cryptocurrencies, financial crime detection, and marketing, underscoring the extensive opportunities in these areas. However, some limitations regarding ML applications in the business and finance domains remain, including issues related to linguistic information processes, interpretability, data quality, generalization, and the oversights related to social networks and causal relationships. Thus, addressing these challenges is a promising avenue for future research.

Introduction

The rapid development of information and database technologies, coupled with notable progress in data analysis methods and computer hardware, has led to an exponential increase in the application of ML techniques in various areas, including business and finance (Ghoddusi et al. 2019 ; Gogas and Papadimitriou 2021 ; Chen et al. 2022 ; Hoang and Wiegratz 2022 ; Nazareth and Ramana 2023 ; Ozbayoglu et al. 2020 ; Xiao and Ke 2021 ). The progress in ML techniques in business and finance applications, such as marketing, e-commerce, and energy, has been highly successful, yielding promising results (Athey and Imbens 2019 ). Compared to traditional econometric models, ML techniques can more effectively handle large amounts of structured and unstructured data, enabling rapid decision-making and forecasting. These benefits stem from ML techniques’ ability to avoid making specific assumptions about the functional form, parameter distribution, or variable interactions and instead focus on making accurate predictions about the dependent variables based on other variables.

Exploring scientific databases, such as the Thomson Reuters Web of Science, reveals a significant exponential increase in the utilization of ML in business and finance. Figure  1 illustrates the outcomes of an inquiry into fundamental ML applications in emerging business and financial domains over the past few decades. Numerous studies in this field have applied ML techniques to resolve business and financial problems. Table 1 lists some of their applications. Boughanmi and Ansari ( 2021 ) developed a multimodal ML framework that integrates different types of non-parametric data to accommodate diverse effects. Additionally, they combined multimedia data in creative product settings and applied their model to predict the success of musical albums and playlists. Zhu et al. ( 2021 ) asserted that accurate demand forecasting is critical for supply chain efficiency, especially for the pharmaceutical supply chain, owing to its unique characteristics. However, a lack of sufficient data has prevented forecasters from pursuing advanced models. Accordingly, they proposed a demand forecasting framework that “borrows” time-series data from many other products and trains the data with advanced ML models. Yan and Ouyang ( 2018 ) proposed a time-series prediction model that combines wavelet analysis with a long short-term memory neural network to capture the complex features of financial time series and showed that this neural network had a better prediction effect. Zhang et al. ( 2020a , b ) employed a Bayesian learning model with a rich dataset to analyze the decision-making behavior of taxi drivers in a large Asian city to understand the key factors that drive the supply side of urban mobility markets.

figure 1

Trend of articles on applied ML techniques in business and finance (2007–2021)

Several review papers have explored the potential of ML to enhance various domains, including agriculture (Raj et al. 2015 ; Coble et al. 2018 ; Kamilaris and Prenafeta-Boldu 2018 ; Storm et al. 2020 ), economic analysis (Einav and Levin 2014 ; Bajari et al. 2015 ; Grimmer 2015 ; Nguyen et al. 2020 ; Nosratabadi et al. 2020 ), and financial crisis prediction (Lin et al. 2012 ; Canhoto 2021 ; Dastile et al. 2020 ; Nanduri et al. 2020 ). Kou et al. ( 2019 ) conducted a survey encompassing research and methodologies related to the assessment and measurement of financial systemic risk that incorporated various ML techniques, including big data analysis, network analysis, and sentiment analysis. Meng and Khushi ( 2019 ) reviewed articles that focused on stock/forex prediction or trading, where reinforcement learning served as the primary ML method. Similarly, Nti et al. ( 2020 ) reviewed approximately 122 pertinent studies published in academic journals over an 11-year span, concentrating on the application of ML to stock market prediction.

Despite these valuable contributions, it is worth noting that the existing review papers primarily concentrate on specific issues within the realm of business and finance, such as the financial system and stock market. Consequently, although a substantial body of research exists in this area, a comprehensive and systematic review of the extensive applications of ML in various aspects of business and finance is lacking. In addition, existing review articles do not provide a comprehensive review of common ML techniques utilized in business and finance. To bridge the aforementioned gaps in the literature, we aim to provide an all-encompassing and methodological review of the extensive spectrum of ML applications in the business and finance domains. To begin with, we identify the most commonly utilized ML techniques in the business and finance domains. Then we introduce the fundamental ML concepts and frequently employed techniques and algorithms. Next, we systematically examine the extensive applications of ML in various sub-domains within business and finance, including marketing, stock markets, e-commerce, cryptocurrency, finance, accounting, credit risk management, and energy. We critically analyze the existing research that explores the implementation of ML techniques in business and finance to offer valuable insights to researchers, practitioners, and decision-makers, thereby facilitating better-informed decision-making and driving future research directions in this field.

The remainder of this paper is organized as follows. Section “ Keywords, distribution of articles, and common technologies in the application of ML techniques in business and finance ” outlines the literature retrieval process and presents the statistical findings from the literature analysis, including an analysis of common application trends and ML techniques. Section “ Machine learning: a brief introduction ” introduces fundamental concepts and terminology related to ML. Sections “ Supervised learning ” and “ Unsupervised learning ” explore in-depth common supervised and unsupervised learning techniques, respectively. Section “ Applications of machine learning techniques in business and finance ” discusses the most recent applications of ML in business and finance. Section “ Critical discussions and future research directions ” discusses some limitations of ML in this domain and analyzes future research opportunities. Finally, “ Conclusions ” section concludes.

Keywords, distribution of articles, and common technologies in the application of ML techniques in business and finance

The primary focus of this review is to explore the advancements in ML in business- and finance-related fields involving ML applications in various market-related issues, including prices, investments, and customer behaviors. This review employs the following strategies to identify existing literature. Initially, we identify relevant journals known for publishing papers that utilize ML techniques to address business and finance problems, such as the UTD-24. Table 2 lists the keywords used in the literature search. During the search process, we input various combinations of ML keywords and business/finance keywords, such as “support vector machine” and “marketing.” By cross-referencing the selected journals and keywords and thoroughly examining the citations of highly cited papers, we aimed to achieve a comprehensive and unbiased representation of the current literature.

After identifying journals and keywords, we searched for articles in the Thomson Reuters Web of Science and Elsevier Scopus databases using the same set of keywords. Once the collection phase was complete, the filtering process was initiated. Initially, duplicate articles were excluded to ensure that only unique articles remained for further analysis. Subsequently, we carefully reviewed the full text of each article to eliminate irrelevant or inappropriate items and thus ensure that the final selection comprised relevant and meaningful literature.

Figure  2 illustrates the process of article selection for the review. In the identification phase, we retrieved 154 articles from the search and identified an additional 37 articles through reference checking. During the second phase, duplicates and inappropriate articles were filtered out, resulting in a total of 68 articles eligible for inclusion in this study. Based on the review of these articles, we categorized them into seven different applications: stock market, marketing, e-commerce, energy marketing, cryptocurrency, accounting, and credit risk management, as depicted in Fig.  3 and Tables 3 , 4 , 5 , 6 , 7 , 8 and 9 . Statistical analyses have revealed that ML research in the business and finance domain is predominantly concentrated in the areas of stock market and marketing. The research on e-commerce, cryptocurrency, and energy market applications is nearly equivalent in quantity. Conversely, articles focusing on accounting and credit risk management applications are relatively limited. Figure  4 provides a summary of the ML techniques employed in the reviewed articles. Deep learning, support vector machine, and decision tree methods emerged as the most prominent research technologies. In contrast, the application of unsupervised learning techniques, such as k-means and reinforcement learning, were less common.

figure 2

Flow diagram for article identification and filtering

figure 3

Number of papers employing ML techniques

figure 4

Prominent methods applied in the business and finance domains

Machine learning: a brief introduction

This section introduces the basic concepts of ML, including its goals and terminology. Thereafter, we present the model selection method and how to improve the performance.

Goals and terminology

The key objective in various scientific disciplines is to model the relationships between multiple explanatory variables and a set of dependent variables. When a theoretical mathematical model is established, researchers can use it to predict or control desired variables. However, in real-world scenarios, the underlying model is often too complex to be formulated as a closed-form input–output relationship. This complexity has led researchers in the field of ML to focus on developing algorithms (Wu et al. 2008 ; Chao et al. 2018 ). The primary goal of these algorithms is to predict certain variables based on other variables or to classify units using limited information; for example, they can be used to classify handwritten digits based on pixel values. ML techniques can automatically construct computational models that capture the intricate relationships present in available data by maximizing the problem-dependent performance criterion or minimizing the error term, which allows them to establish a robust representation of the underlying relationships.

In the context of ML, the sample used to estimate the parameters is usually referred to as a “training sample,” and the procedure for estimating the parameters is known as “training.” Let N be the sample size, k be the number of features, and q be the number of all possible outcomes. ML can be classified into two main types: supervised and unsupervised. In supervised learning problems, we know both the feature \({\mathbf{X}}_{i} = (x_{i1} ,...,x_{ik} ),\; \, i = 1,2,...,N\) and the outcome \(Y_{i} = (y_{i1} ,y_{i2} ,...,y_{iq} )\) , where \(y_{ij}\) represents the outcome of \(y_{i}\) in the dimension \(j\) . For example, in a recommendation system, the quality of product can be scored from 1 to 5, indicating that “q” equals 5. In unsupervised learning problems, we only observe the features \({\mathbf{X}}_{i}\) (input data) and aim to group them into clusters based on their similarities or patterns.

Cross-validation, overfitting, and regularization

Cross-validation is frequently used for model selection in ML: the technique is applied to each candidate model, and the one with the lowest expected out-of-sample prediction error is selected.

The ML literature shows significantly higher concern about overfitting than the standard statistics or econometrics literature. In the ML community, the degrees of freedom are not explicitly considered, and many ML methods involve a large number of parameters, which can potentially lead to negative degrees of freedom.

Limiting overfitting is commonly achieved through regularization in ML, which controls the complexity of a model. As stated by Vapnik ( 2013 ), the regularization theory was one of the first signs of intelligent inference. The complexity of the model describes its ability to approximate various functions. As the complexity increases, the risk of overfitting also increases, whereas less complex and more regularized models may lead to underfitting. Regularization is often implemented by selecting a parsimonious number of variables and using specific functional forms without explicitly controlling for overfitting. Instead of directly optimizing an objective function, a regularization term is added to the objective function, which penalizes the complexity of the model. This approach encourages the model to generalize better and avoids overfitting by promoting simpler and more interpretable solutions.

Here, we provide an example to illustrate how regularization works. The following linear regression model was used:

where N is the sample size, k is the numbers of features, and q is the number of all possible outcomes. The variable \(y_{{ij}} (i = 1,2,...,N,\quad j = 1,2,...,q)\) represents the outcome of \(y_{i}\) in the j th dimension. Additionally, \(b_{pj} (p = 1,2,...,k,j = 1,2,...,q)\) represents the coefficient of feature p in the j th dimension. By using vector notations, \({{\varvec{\upsigma}}} = (\sigma_{1} ,...,\sigma_{q} )^{{ \top }}\) , \({\mathbf{b}} = (b_{{11}} ,b_{{21}} ,...,b_{{k1}} ,b_{{12}} ,b_{{22}} ,...,b_{{k2}} ,...,b_{{1q}} ,b_{{2q}} ,...,b_{{kq}} )^{{ \top }}\) and \(Y_{i} = (y_{i1} ,y_{i2} ,...,y_{iq} )\) , we can rewrite Eq. ( 1 ) as follows:

where \({\mathbf{b}}\) is the solution of

\(\lambda\) is a penalty parameter that can be selected through out-of-sample cross-validation to optimize the model’s out-of-sample predictive performance.

Supervised learning

This section introduces common supervised learning technologies. Compared to traditional statistics, supervised learning methods exhibit certain desired properties when optimizing predictions in large datasets, such as transaction and financial time series data. In business and finance, supervised learning models have proven to be among the most effective tools for detecting credit card fraud (Lebichot et al. 2021 ). In the following subsections, we briefly describe the commonly used supervised ML methods for business and finance.

Shrinkage methods

The traditional least-squares method often yields complex models with an excessive number of explanatory variables. In particular, when the number of features, k , is large compared to the sample size N , the least-squares estimator, \({\hat{\mathbf{b}}}\) , does not have good predictive properties, even if the conditional mean of the outcome is linear. To address this problem, regularization is typically used to adjust the estimation parameters dynamically and reduce the complexity of the model. The shrinkage method is the most common regularization method and can reduce the values of the parameters to be estimated. Shrinkage methods, such as ridge regression (Hoerl and Kennard 1970 ) and least absolute shrinkage and selection operator (LASSO) (Tibshirani 1996 ), are linear regression models that add a penalty term to the size of the coefficients. This penalty term pushes the coefficients towards zero, effectively shrinking their values. Shrinkage methods can be effectively used to predict continuous outcomes or classification tasks, particularly when dealing with datasets containing numerous explanatory variables.

Compared to the traditional approach that estimates the regression function using least squares,

$$\hat{\mathbf{b}} = \mathop{\arg\min}\limits_{\mathbf{b}} \sum\limits_{i = 1}^{N} {\left( {Y_{i} - {\mathbf{X}}_{i} {\mathbf{b}}} \right)^{2} } ,$$

shrinkage methods add a penalty term that shrinks \({\mathbf{b}}\) toward zero, aiming to minimize the following objective function:

$$\mathop{\min}\limits_{\mathbf{b}} \;\sum\limits_{i = 1}^{N} {\left( {Y_{i} - {\mathbf{X}}_{i} {\mathbf{b}}} \right)^{2} } + \lambda \left\| {\mathbf{b}} \right\|_{q} ,$$

where \(\left\| {\mathbf{b}} \right\|_{q} = \sum\nolimits_{p = 1}^{k} {\left| {b_{p} } \right|^{q} }\) . When \(q = 1\) , this formulation leads to the LASSO; when \(q = 2\) , it reduces to ridge regression.
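As an illustrative sketch only (synthetic data, not drawn from any reviewed study), these two shrinkage estimators can be fit with scikit-learn, with the penalty parameter \(\lambda\) (called alpha in scikit-learn) selected by cross-validation:

import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))          # N = 200 observations, k = 50 features
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)

# Penalty strength chosen by cross-validation
lasso = LassoCV(cv=5).fit(X, y)                              # q = 1 penalty
ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)     # q = 2 penalty

print("LASSO non-zero coefficients:", np.sum(lasso.coef_ != 0))
print("Ridge chosen alpha:", ridge.alpha_)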

Tree-based method

Regression trees (Breiman et al. 1984 ) and random forests (Breiman 2001 ) are effective methods for estimating regression functions with minimal tuning, especially when out-of-sample predictive abilities are required. Considering a sample \((x_{i1} ,...,x_{ik} ,Y_{i} )\) for \(i = 1,2,...,N\) , the idea of a regression tree is to split the sample into subsamples where the regression functions are being estimated. The splits process is sequential and based on feature value \(x_{ij}\) exceeding threshold \(c\) . Let \(R_{1} (j,c)\) and \(R_{2} (j,c)\) be two sets based on the feature \(j\) and threshold \(c\) , where \(R_{1} (j,c) = \left\{ {{\mathbf{X}}_{i} |x_{ij} \le c} \right\}\) and \(R_{2} (j,c) = \left\{ {{\mathbf{X}}_{i} |x_{ij} > c} \right\}\) . Naturally, the dataset \(R\) is divided into two parts, \(R_{1}\) and \(R_{2}\) , based on the chosen feature and threshold.

Let \(c_{1} = \frac{1}{{|R_{1} |}}\sum\nolimits_{{{\mathbf{X}}_{i} \in R_{1} }} {x_{ij} }\) and \(c_{2} = \frac{1}{{|R_{2} |}}\sum\nolimits_{{{\mathbf{X}}_{i} \in R_{2} }} {x_{ij} }\) , where \(| \bullet |\) refer to the cardinality of the set. Then we can construct the following optimization model to calculate the errors of the \(R_{1}\) and \(R_{2}\) datasets:

For all \(x_{ij}\) and threshold \(c \in ( - \infty , + \infty )\) , the method finds the optimal feature \(j^{*}\) and threshold \(c^{*}\) that minimizes errors and splits the sample into subsets based on these criteria. By selecting the best feature and threshold, the method obtains the optimal classification of \(R_{1}^{*}\) and \(R_{2}^{*}\) . This process is repeated recursively, leading to further splits that minimize the squared error and improve the overall model performance. However, researchers should be cautious about overfitting, wherein the model fits the training data too closely and fails to generalize well to new data. To address this issue, a penalty term can be added to the objective function to encourage simpler and more regularized models. The coefficients of the model are then selected through cross-validation, optimizing the penalty parameter to achieve the best trade-off between model complexity and predictive performance on new, unseen data. This helps prevent overfitting and ensures that the model's performance is robust and reliable.

Random forest builds on the tree algorithm to better estimate the regression function. This approach smooths the regression function by averaging across multiple trees, thus exhibiting two distinct differences. First, instead of using the original sample, each tree is constructed based on a bootstrap sample or a subsample of the data, a technique known as “bagging.” Second, at each stage of building a tree, the splits are not optimized over all possible features (covariates) but rather over a random subset of the features. Consequently, feature selection varies in each split, which enhances the diversity of the individual trees.
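A brief illustrative sketch of such a bagged tree ensemble with scikit-learn, again on synthetic data rather than data from any reviewed study:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is grown on a bootstrap sample; max_features restricts the
# candidate features considered at every split, decorrelating the trees.
forest = RandomForestRegressor(n_estimators=300, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print("Out-of-sample R^2:", forest.score(X_test, y_test))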

Deep learning and neural networks

Deep learning and neural networks have been proven to be highly effective in complex settings. However, it is worth noting that the practical implementation of deep learning often demands a considerable amount of tuning compared to other methods, such as decision trees or random forests.

Deep neural networks

As with any other supervised learning method, deep neural networks (DNNs) can be viewed as a straightforward mapping \(y=f(x;\theta )\) from the input feature vector \(x\) to the output vector or scalar \(y\) , governed by the unknown parameters \(\theta\) . This mapping typically consists of layers that form chain-like structures. Figure  5 illustrates the structure of the DNN. For a DNN with \(L\) layers, the structure can be represented as

$$y = f(x;\theta ) = f^{(L)} \left( {f^{(L - 1)} \left( { \cdots f^{(2)} \left( {f^{(1)} (x)} \right)} \right)} \right).$$

figure 5

Structure of DNN

In a fully connected DNN, the \(i\) th layer has a structure given by \(h^{(i)} = f^{(i)} (x) = g^{(i)} ({\mathbf{W}}^{(i)} h^{(i - 1)} + {\mathbf{b}}^{(i)} )\) , where \({\mathbf{W}}\) is the matrix of unknown parameters and \({\mathbf{b}}^{\left( i \right)}\) is the vector of bias factors. A typical choice for \(g^{\left( i \right)}\) , called the “activation function,” can be a rectified linear unit, a tanh transformation function, or a sigmoid function. The 0th layer \(h^{(0)} = x\) represents the input vector. The row dimension of \(b\) or the column dimension of \({\mathbf{W}}\) specifies the number of neurons in each layer. The weight matrix \({\mathbf{W}}\) is learned by minimizing a loss function, which can be the mean squared error for regression tasks or the cross-entropy for classification tasks. In particular, when the DNN has one layer, \(y\) is scalar; with a linear or logistic activation function, we obtain a linear or logistic regression.
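For illustration, a small fully connected network of this form, with two hidden layers and ReLU activations, could be written in PyTorch as follows; the layer sizes and data are arbitrary example values, not taken from the reviewed papers:

import torch
from torch import nn

k, q = 30, 5          # number of input features and output dimensions (example values)

# Chain of layers h^(i) = g(W^(i) h^(i-1) + b^(i)) with ReLU activations
model = nn.Sequential(
    nn.Linear(k, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, q),
)

x = torch.randn(8, k)      # a mini-batch of 8 input vectors
y_hat = model(x)           # forward pass: shape (8, q)
loss = nn.functional.mse_loss(y_hat, torch.randn(8, q))  # squared-error loss for regression
loss.backward()            # gradients used to update the weights during training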

Convolutional neural networks

Although neural networks have many different architectures, the two most classical and relevant are convolutional neural networks (CNNs) and recurrent neural networks (RNNs). A classical CNN structure, which contains three main components—convolutional, pooling, and fully connected layers—is shown in Fig.  6 . In contrast to the previously mentioned fully connected structure, in the convolutional layer, each neuron connects with only a small fraction of the neurons from the former layer; however, they share the same parameters. Therefore, sparse connections and parameter sharing significantly reduces the number of estimated parameters.

figure 6

Structure of CNN

Different layers play different roles in the training process and are introduced in more detail as follows:

Convolutional layer : This layer comprises a collection of trained filters that are used to extract features from the input data. Assuming that \(X\) is the input and there are \(k\) filters, the output of the \(j\) th filter in the convolutional layer can be formulated as

$$y_{j} = f\left( {\omega_{j} * X + b_{j} } \right),\quad j = 1,2,...,k,$$

where \(\omega_{j}\) and \(b_{j}\) denote the weights and bias, respectively; \(f\) represents the activation function; and \(*\) denotes the convolutional operator.

Pooling layer : This layer reduces the features and parameters of the network. The most popular pooling methods are the maximum and average pooling.

CNNs are designed to handle one-dimensional time-series data or images. Intuitively, each convolutional layer can be considered a set of filters that move across images or shift along time sequences. For example, some filters may learn to detect textures, whereas others may identify specific shapes. Each filter generates a feature map, and the subsequent convolutional layer integrates these features to create a more complex structure, resulting in a map of learned features. Suppose that \(S\) is a \(p \times p\) pooling window. Then the average pooling process can be formulated as

$$a_{S} = \frac{1}{N}\sum\limits_{(i,j) \in S} {x_{ij} } ,$$

where \(x_{ij}\) is the activation value at location \((i,j)\) , and \(N\) is the total number of activations in \(S\) .

Recurrent neural networks

Recurrent neural networks (RNNs) are well suited for processing sequential data, dynamic relations, and long-term dependencies. RNNs, particularly those employing long short-term memory (LSTM) cells, have become popular and have shown significant potential in natural language processing (Schmidhuber 2015 ). A key feature of this architecture is its ability to maintain past information over time using a cell-state vector. In each time step, new variables are combined with past information in the cell vector, enabling the RNN to learn how to encode information and determine which encoded information should be retained or forgotten. Similar to CNNs, RNN benefit from parameter sharing, which allows them to detect specific patterns in sequential data.

Figure  7 illustrates the structure of the LSTM network, which contains a memory unit \({C}_{t}\) , a hidden state \({h}_{t}\) , and three types of gates. Index \(t\) refers to the time step. At each step \(t\) , the LSTM combines input \({x}_{t}\) with the previous hidden state \({h}_{t-1}\) , calculates the activations of all gates, and updates the memory unit and hidden state accordingly.

figure 7

Structure of LSTM

The computations of the LSTM network can be written as \(f_{t} = \sigma (W_{f} x_{t} + \omega_{f} h_{t - 1} + b_{f} )\) , \(i_{t} = \sigma (W_{i} x_{t} + \omega_{i} h_{t - 1} + b_{i} )\) , \(O_{t} = \sigma (W_{O} x_{t} + \omega_{O} h_{t - 1} + b_{O} )\) , \(C_{t} = f_{t} \circ C_{t - 1} + i_{t} \circ \tanh (W_{C} x_{t} + \omega_{C} h_{t - 1} + b_{C} )\) , and \(h_{t} = O_{t} \circ \tanh (C_{t} )\) ,

where \(W\) and \(\omega\) denote the weights applied to the current input and to the previous hidden state, respectively; the subscripts \(f,i,{\text{ and }}O\) refer to the forget, input, and output gate vectors, respectively; \(b\) indicates the biases; \(\sigma\) is the sigmoid function; and \(\circ\) is an element-wise multiplication.
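
The following Keras sketch shows how an LSTM of this kind is typically applied to one-step-ahead forecasting of a univariate series (a common setting in the studies reviewed later). The synthetic random-walk series, window length, and layer width are illustrative assumptions, not values from any cited paper.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Toy example: one-step-ahead forecasting of a univariate series with an LSTM.
rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=500))          # synthetic "price" path
window = 20
X = np.stack([series[i:i + window] for i in range(len(series) - window)])[..., None]
y = series[window:]

model = models.Sequential([
    layers.Input(shape=(window, 1)),
    layers.LSTM(32),                               # gates, memory cell C_t and hidden state h_t handled internally
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[-1:]))                       # forecast for the next step
```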

Wavelet neural networks

Wavelet neural networks (Zhang and Benveniste  1992 ) use the wavelet function as the activation function, thus combining the advantages of both the wavelet transform and neural networks. The structure of a wavelet neural network is based on the backpropagation neural network, and the transfer function of the hidden-layer neurons is the mother wavelet function. For input features \({\mathbf{x}} = (x_{1} ,...,x_{n} )\) , the output of the hidden layer can be expressed as \(h(j) = h_{j} \left( {\frac{{\sum\nolimits_{i = 1}^{n} {\omega_{ij} x_{i} } - b_{j} }}{{a_{j} }}} \right)\) ,

where \(h(j)\) is the output value for neuron \(j\) , \(h_{j}\) is the mother wavelet function, \(\omega_{ij}\) is the weight between the input and hidden layers, \(b_{j}\) is the shift factor, and \(a_{j}\) is the stretch factor for \(h_{j}\) .
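
A minimal NumPy sketch of this hidden-layer computation is shown below. The Mexican-hat (Ricker) mother wavelet is an assumption for illustration; the reviewed studies use various mother wavelets, and all weights, shifts, and stretches here are random placeholders.

```python
import numpy as np

def mexican_hat(t):
    """Mexican-hat (Ricker) wavelet, one common choice for the mother wavelet h_j."""
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)

def wavelet_hidden_layer(x, omega, b, a):
    """h(j) = h_j((sum_i omega_ij * x_i - b_j) / a_j) for each hidden neuron j."""
    z = (omega.T @ x - b) / a          # shift by b_j, stretch by a_j
    return mexican_hat(z)

rng = np.random.default_rng(2)
x = rng.normal(size=5)                 # n = 5 input features
omega = rng.normal(size=(5, 3))        # weights between input and 3 hidden neurons
b = rng.normal(size=3)                 # shift factors
a = np.abs(rng.normal(size=3)) + 0.5   # stretch factors (kept positive)
print(wavelet_hidden_layer(x, omega, b, a))
```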

Support vector machine and kernels

Support vector machines (SVMs) are flexible classification methods (Cortes and Vapnik 1995 ). Let us consider a binary classification problem in which we have \(N\) observations \({\mathbf{X}}_{i}\) , each with \(k\) features, and binary labels \(y_{i} \in \{ - 1,1\}\) . A hyperplane \(\{ x \in {\mathbb{R}}^{k} :w^{{ \top }} x + b = 0\}\) is then defined, which induces the binary classifier \({\text{sgn}} (w^{{ \top }} {\mathbf{X}}_{i} + b)\) . The goal of the SVM is to find a hyperplane such that the observations can be separated into the two classes + 1 and − 1. Among all separating hyperplanes, the SVM selects the one that maximizes the distance to the closest samples. Typically, a small set of samples lies exactly at this maximal margin; these are referred to as “support vectors.”

The above-mentioned process can be written as the following optimization model:

To solve the above optimization model, we rewrite it in terms of Lagrangian multipliers as follows:

where \(\alpha_{i}\) is the Lagrangian multiplier associated with the original constraint \(Y_{i} (\omega^{{ \top }} {\mathbf{X}}_{i} + b) \ge 1\) . The model above is equivalent to

We can obtain the Lagrangian multiplier \({{\varvec{\upalpha}}} = (\alpha_{1} ,...,\alpha_{N} )\) from Model ( 15 ), and then \(\widehat{b}\) can be solved from \(\sum\nolimits_{i = 1}^{N} {\hat{\alpha }_{i} (Y_{i} (\omega^{{ \top }} {\mathbf{X}}_{i} + b) - 1)} = 0\) . Furthermore, we can obtain the classifier:

Traditional SVM assumes linearly separable training samples. However, SVM can also deal with non-linear cases by mapping the original covariates to a new feature space using the function \(\phi ({\mathbf{X}}_{i} )\) and then finding the optimal hyperplane in this transformed feature space; that is, \(f(x_{i} ) = \omega^{{ \top }} \phi (x_{i} ) + b\) . Thus, the optimization problem in the transformed feature space can be formulated as

where \(K({\mathbf{X}}_{i} ,{\mathbf{X}}_{j} ) = \phi ({\mathbf{X}}_{i} )^{{ \top }} \phi ({\mathbf{X}}_{j} )\) . The kernel function \(K( \bullet )\) can be linear, polynomial, or sigmoid. Once the kernel function is determined, we can solve for the values of the Lagrangian multipliers \(\alpha\) . Then \(\widehat{b}\) can be solved from \(\sum\nolimits_{i = 1}^{N} {\hat{\alpha }_{i} (Y_{i} (\omega^{{ \top }} {\mathbf{X}}_{i} + b) - 1)} = 0\) , which allows us to derive the classifier \(f({\mathbf{x}}) = {\text{sgn}} \left( {\sum\nolimits_{i = 1}^{N} {\hat{\alpha }_{i} Y_{i} K({\mathbf{X}}_{i} ,{\mathbf{x}})} + \widehat{b}} \right)\) .
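
In practice, the dual problem and the kernel trick are handled by standard libraries. The scikit-learn sketch below (synthetic data and default hyperparameters, purely illustrative) contrasts the linear, polynomial, and sigmoid kernels mentioned above and reports the number of support vectors each solution retains.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data standing in for N observations with k features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for kernel in ("linear", "poly", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)   # dual problem solved internally; support vectors stored
    print(kernel,
          "test accuracy:", round(clf.score(X_te, y_te), 3),
          "support vectors:", int(clf.n_support_.sum()))
```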

Bayesian classifier

A Bayesian network is a graphical model that represents the probabilistic relationships among a set of features (Friedman et al. 1997 ). The Bayesian network structure is a directed acyclic graph. Formally, a Bayesian network is a pair \(B = \left\langle {G,\Theta } \right\rangle\) , where \(G\) is a directed acyclic graph whose nodes represent the random variables \(\left( {X_{1} ,...,X_{n} } \right)\) and whose edges represent the dependencies between variables, and \(\Theta\) is the set of parameters that quantify the graph.

Assume that there are \(q\) labels; that is, \({\mathbf{Y}} = \{ c_{1} ,...,c_{q} \}\) , \(\lambda_{ij}\) is the loss caused by misclassifying a sample with the true label \(c_{j}\) as \(c_{i}\) , and \({\mathbb{X}}\) represents the sample space. Then, based on the posterior probability \(P(c_{i} |{\mathbf{x}})\) , we can calculate the expected loss (conditional risk) of classifying sample \({\mathbf{x}}\) into the label \(c_{i}\) as \(R(c_{i} |{\mathbf{x}}) = \sum\nolimits_{j = 1}^{q} {\lambda_{ij} P(c_{j} |{\mathbf{x}})}\) .

Therefore, the aim of the Bayesian classifier is to find a criterion \(h:{\mathbb{X}} \to {\mathbf{Y}}\) that minimizes the total risk \(R(h) = {\mathbb{E}}_{{\mathbf{x}}} \left[ {R(h({\mathbf{x}})|{\mathbf{x}})} \right]\) .

Obviously, for each sample \({\mathbf{x}}\) , if \(h\) minimizes the conditional risk \(R(h({\mathbf{x}})|{\mathbf{x}})\) , then the total risk \(R(h)\) is also minimized. This leads to the Bayes decision rule: to minimize the total risk, we classify each sample into the label that minimizes the conditional risk, namely \(h^{*} ({\mathbf{x}}) = \mathop {\arg \min }\limits_{{c \in {\mathbf{Y}}}} R(c|{\mathbf{x}})\) .

We refer to \(h^{*}\) as the Bayes-optimal classifier and \(R(h^{*} )\) as the Bayes risk.
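
Naïve Bayes is a simple member of this family in which the features are assumed conditionally independent given the label; under 0–1 loss, predicting the label with the highest posterior minimizes the conditional risk. The scikit-learn sketch below (Iris data used purely as a stand-in) illustrates this.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = GaussianNB().fit(X_tr, y_tr)
# predict_proba returns the posterior P(c_i | x); the predicted label maximizes the
# posterior, i.e., minimizes the conditional risk under 0-1 loss.
print(clf.predict_proba(X_te[:3]))
print(clf.predict(X_te[:3]), "accuracy:", round(clf.score(X_te, y_te), 3))
```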

K-nearest neighbor

The K-nearest neighbor (KNN) algorithm is a lazy-learning algorithm because it defers the induction process until classification is required (Wettschereck et al. 1997 ). A lazy-learning algorithm requires less computation time during the training process than eager-learning algorithms such as decision trees, neural networks, and Bayesian networks. However, it may require additional time during the classification phase.

The KNN algorithm is based on the assumption that instances close to each other in a feature space are likely to have similar properties. If instances with the same classification label are found nearby, an unlabeled instance can be assigned the same class label as its nearest neighbors. KNN locates the k nearest instances to the unlabeled instance and determines its label by observing the most frequent class label among these neighbors.

The choice of k significantly affects the performance of the KNN algorithm. Let us discuss the performance of KNN when \(k = 1\) . Given a sample \({\mathbf{x}}\) and its nearest sample \({\mathbf{z}}\) , the probability of error can be expressed as follows:

Suppose the samples are independent and identically distributed. For any \({\mathbf{x}}\) and any positive number \(\delta\) , there always exists at least one sample \({\mathbf{z}}\) within a distance of \(\delta\) from \({\mathbf{x}}\) . Let \(c^{*} ({\mathbf{x}}) = \mathop {\arg \max }\limits_{{c \in {\mathbf{Y}}}} P(c|{\mathbf{x}})\) denote the outcome of the Bayes-optimal classifier. Then we have:

According to (23), despite the simplicity of KNN, the generalization error is no more than twice that of the Bayes-optimal classifier.
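
Because the result above depends on k, a common practical step is to compare several values of k by cross-validation. The scikit-learn sketch below (breast-cancer data used only as a stand-in, with feature standardization assumed) illustrates this.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The choice of k strongly affects performance; compare a few values by 5-fold cross-validation.
for k in (1, 5, 15):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(knn, X, y, cv=5)
    print(f"k={k}: mean accuracy {scores.mean():.3f}")
```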

Unsupervised learning

In unsupervised learning, researchers can only access observations without any labeled information, and their primary interest lies in partitioning a sample into subsamples or clusters. Unsupervised learning methods are particularly useful in descriptive tasks because they aim to find relationships in a data structure without measuring an outcome. Several approaches commonly used in business and finance research fall under the umbrella of unsupervised learning, including k-means clustering and reinforcement learning. Accordingly, unsupervised learning can also support qualitative business and finance research. For example, it can be particularly beneficial during stakeholder analysis, when stakeholders must be mapped and classified according to certain predefined attributes. It can also be useful for customer management: a company can employ an unsupervised ML method to cluster its customers, which informs its marketing strategy for specific groups and leads to a competitive advantage. This section introduces unsupervised learning technologies that are widely used in business and finance.

K-means clustering

The K-means algorithm aims to find K points in the sample space and to assign each sample to its closest point. Using an iterative method, the values of the cluster centers are updated step by step to achieve the best clustering result. When partitioning the feature space into K clusters, the K-means algorithm selects centroids \(b_{1} ,...,b_{K}\) and assigns observations to clusters based on their proximity to these centroids. The algorithm proceeds as follows. First, we begin with the K centroids \(b_{1} ,...,b_{K}\) , which are initially scattered throughout the feature space. Next, in accordance with the chosen centroids, each observation \(X_{i}\) is assigned to the cluster that minimizes the distance between the observation and the centroid of the cluster; that is, \(c_{i} = \mathop {\arg \min }\limits_{{j \in \{ 1,...,K\} }} \left\| {X_{i} - b_{j} } \right\|^{2}\) .

Next, we update each centroid by computing the average of \(X_{i}\) across each cluster, \(b_{j} = \frac{{\sum\nolimits_{i} {I(c_{i} = j)X_{i} } }}{{\sum\nolimits_{i} {I(c_{i} = j)} }}\) ,

where \(I( \bullet )\) is the indicator function. When choosing the number of clusters K, we must exercise caution because no cross-validation method is available to compare candidate values.
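
Because K cannot be chosen by cross-validation, a heuristic such as the silhouette score is often used to compare candidate values. The scikit-learn sketch below (synthetic blob data standing in for, say, customer features) illustrates this; the candidate range of K is an arbitrary assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=600, centers=4, random_state=0)   # synthetic "customer" features

# Compare candidate numbers of clusters with the silhouette score (higher is better).
for k in (2, 3, 4, 5, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(silhouette_score(X, km.labels_), 3))
```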

Reinforcement learning

Reinforcement learning (RL) draws inspiration from the trial-and-error procedure conducted by Thorndike in his 1898 study of cat behavior. Originating from animal learning, RL aims to mimic human behavior by making decisions that maximize profits through interactions with the environment. Mnih et al. ( 2015 ) proposed deep RL by employing a deep Q-network to create an agent that outperformed a professional player in a game and further advanced the field of RL.

In deep RL, the learning algorithm plays an essential role in improving efficiency. These algorithms can be categorized into three types: value-based, policy-based, and model-based RL, as illustrated in Fig.  8 .

Fig. 8 Learning algorithm-based reinforcement learning

RL consists of four components—agent, state, action, and reward—with the agent at its core. When an action leads to a profitable state, the agent receives a reward; otherwise, the action is discouraged. In RL, an agent is defined as any decision-maker, while everything else is considered the environment. The interactions between the environment and the agent are described by the state \(s\) , action \(a\) , and reward \(r\) . At time step \(t\) , the environment is in state \(s_{t}\) , and the agent takes action \(a_{t}\) . Consequently, the environment transitions to state \(s_{t + 1}\) and rewards the agent with \(r_{t + 1}\) .

The agent’s decision is formalized by a policy \(\pi\) , which maps state \(s\) to action \(a\) . The policy is deterministic when the probability of choosing action \(a\) in state \(s\) equals one (i.e., \(\pi (a|s) = p(a|s) = 1\) ); it is stochastic when \(p(a|s) < 1\) . Policy \(\pi\) can be defined as the probability distribution over all actions selected from a certain state \(s\) , as follows:

where \(\Delta_{\pi }\) represents all possible actions of \(\pi\) .

In each step, the agent receives an immediate reward \(r_{t + 1}\) until it reaches the final state \(s_{T}\) . However, the immediate reward does not ensure a long-term profit. To address this, a generalized return value \(R_{t}\) is used at time step \(t\) , defined as \(R_{t} = \sum\nolimits_{k = 0}^{T - t - 1} {\gamma^{k} r_{t + k + 1} }\) ,

where \(0 \le \gamma \le 1\) . The agents become more farsighted when \(\gamma\) approaches 1, and more shortsighted when it approaches 0.

The next step is to define a score function \(V\) to estimate the goodness of a state, \(V_{\pi } (s) = E_{\pi } [R_{t} |s_{t} = s]\) .

Then, we determine the goodness of a state–action pair \((s,a)\) through \(Q_{\pi } (s,a) = E_{\pi } [R_{t} |s_{t} = s,a_{t} = a]\) .

Next, we assess the relative goodness of two policies:

Finally, we can expand \(V_{\pi } (s)\) and \(Q_{\pi } (s,a)\) through \(R_{t}\) to represent the relationship between \(s\) and \(s_{t + 1}\) as

where \(W_{{s \to s^{\prime}|a}} = E[r_{t + 1} |s_{t} = s,a_{t} = a,s_{t + 1} = s^{\prime}]\) . By solving ( 31 ) and ( 32 ), we obtain \(V_{\pi }\) and \(Q_{\pi }\) , respectively.
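
Tabular Q-learning, the precursor of the deep Q-network mentioned above, is a simple value-based algorithm that estimates \(Q_{\pi}(s,a)\) directly from interaction. The NumPy sketch below runs it on a toy five-state chain in which moving right from the last state yields a reward of 1; the environment, rewards, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

# Value-based RL: tabular Q-learning on a toy 5-state chain environment.
n_states, n_actions = 5, 2          # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(3)

for episode in range(500):
    s = 0
    for _ in range(20):
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if (s == n_states - 1 and a == 1) else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(np.round(Q, 2))               # the greedy policy should favor moving right
```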

Restricted Boltzmann machines

As Fig. 9 shows, a restricted Boltzmann machine (RBM) can be considered an undirected neural network with two layers, called the “hidden” and “visible” layers. The hidden layer is used to detect features, whereas the visible layer receives the input (training) data. Given \(n\) visible units \(v\) and \(m\) hidden units \(h\) , the energy function is given by \(E(v,h) = - \sum\nolimits_{i = 1}^{n} {a_{i} v_{i} } - \sum\nolimits_{j = 1}^{m} {b_{j} h_{j} } - \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {v_{i} \alpha_{ij} h_{j} } }\) ,

where \(\alpha_{ij}\) is the weight between visible unit \(i\) and hidden unit \(j\) , and \(a_{i}\) and \(b_{j}\) are the biases for \(v\) and \(h\) , respectively.

Fig. 9 Structure of RBM
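
The NumPy sketch below simply evaluates this energy function for a random binary configuration; the unit counts and all parameter values are placeholders, and training (e.g., by contrastive divergence) is not shown.

```python
import numpy as np

def rbm_energy(v, h, a, b, alpha):
    """Energy E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i alpha_ij h_j."""
    return -(a @ v) - (b @ h) - v @ alpha @ h

rng = np.random.default_rng(4)
n_visible, n_hidden = 6, 3
v = rng.integers(0, 2, size=n_visible).astype(float)   # binary visible units (input data)
h = rng.integers(0, 2, size=n_hidden).astype(float)    # binary hidden units (detected features)
a = rng.normal(size=n_visible)                          # visible biases a_i
b = rng.normal(size=n_hidden)                           # hidden biases b_j
alpha = rng.normal(size=(n_visible, n_hidden))          # weights alpha_ij

print(rbm_energy(v, h, a, b, alpha))
```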

Applications of machine learning techniques in business and finance

This section reviews the application status of ML in the following fields: marketing, the stock market, e-commerce, cryptocurrency, finance, accounting, credit risk management, and the energy economy.

Marketing

ML is an innovative technology that can potentially improve forecasting models and assist in management decision-making. ML applications can be highly beneficial in the marketing domain, which relies heavily on building accurate predictive models from databases. Compared with the traditional statistical approaches to forecasting consumer behavior, ML technology, which researchers have recently adopted, offers several distinctive advantages for data mining with large, noisy databases (Sirignano and Cont 2019 ). An early example of ML in marketing can be found in the work of Zahavi and Levin ( 1997 ), who used neural networks (NNs) to model consumer responses to direct marketing. Compared with statistical approaches, simple forms of NNs are free from assumptions of normality or complete data, making them particularly robust in handling noisy data. Recently, as shown in Table 3, ML techniques have been used predominantly to study customer behaviors and demands. These applications enable marketers to gain valuable insights and make data-driven decisions to optimize marketing strategies.

Consumer behavior refers to the actions taken by consumers to request, use, and dispose of consumer goods, as well as the decision-making process that precedes and determines these actions. In the context of direct marketing, Cui et al. ( 2006 ) proposed Bayesian networks that learn by evolutionary programming to model consumer responses to direct marketing using a large direct marketing dataset. In the supply chain domain, Melancon et al. ( 2021 ) used gradient-boosted decision trees to predict service-level failures in advance and provide timely alerts to planners for proactive actions. Regarding unsupervised learning in consumer behavior analysis, Dingli et al. ( 2017 ) implemented a CNN and an RBM to predict customer churn. However, they found that their performance was comparable to that of supervised learning when introducing added complexity in specific operations and settings. Overall, ML techniques have demonstrated their potential for understanding and predicting consumer behavior, thereby enabling businesses to make informed decisions and optimize their marketing strategies (Machado and Karray 2022 ; Mao and Chao 2021 ).

Predicting consumer demand plays a critical role in helping enterprises efficiently arrange production and generate profits. Timoshenko and Hauser ( 2019 ) used a CNN to facilitate qualitative analysis by selecting the content for an efficient review. Zhang et al. ( 2020a , b ) used a Bayesian learning model with a rich dataset to analyze the decision-making behavior of taxi drivers in a large Asian city to understand the key factors that drive the supply side of urban mobility markets. Ferreira et al. ( 2016 ) employed ML techniques to estimate historical lost sales and predict future demand for new products. For consumer demand-level prediction, most of the research we reviewed used supervised learning technologies, because learning consumers' consumption preferences requires historical consumer data, and clustering consumers alone is insufficient to predict their consumption levels.

Stock market

ML applications in the stock market have gained immense popularity, with the majority focusing on financial time series for stock price predictions. Table 4 summarizes the reviewed articles that employed ML methods in stock market studies, including references, research objectives, data sources, applied techniques, and journals. Investing in the stock market can be highly profitable but also entails risk. Therefore, investors always try to determine and estimate stock values before taking any action. Researchers have mostly used ML techniques to predict stock prices (Bennett et al. 2022 ; Moon and Kim 2019 ). However, predicting stock values can be challenging due to the influence of uncontrollable economic and political factors that make it difficult to identify future market trends. Additionally, financial time-series data are often noisy and non-stationary, rendering traditional forecasting methods less reliable for stock value predictions. Researchers have explored ML in sentiment analysis to identify future trends in the stock market (Baba and Sevil 2021 ). Furthermore, other studies have focused on objectives such as algorithmic trading, portfolio management, and S&P 500 index trend prediction using ML techniques (Cuomo et al. 2022 ; Go and Hong 2019 ).

Various ML techniques have been successfully applied for stock price predictions. Fischer and Krauss ( 2018 ) applied LSTM networks to predict the out-of-sample directional movements of the constituent stocks of the S&P 500 from 1992 to 2015, demonstrating that LSTM networks outperform memory-free classification methods. Wu et al. ( 2021 ) applied LASSO, random forest, gradient boosting, and a DNN to cross-sectional return predictions in hedge fund selection and found that ML techniques significantly outperformed four styles of hedge fund research indices in almost all situations. Bao et al. ( 2017 ) fed high-level denoising features into the LSTM to forecast the next day’s closing price. Sabeena and Venkata ( 2019 ) proposed a modified adversarial-network-based framework that integrated a gated recurrent unit and a CNN to acquire data from online financial sites and processed the obtained information using an adversarial network to generate predictions. Song et al. ( 2019 ) used deep learning methods to predict future stock prices. Sohangir et al. ( 2018 ) applied several NN models to stock market opinions posted on StockTwits to determine whether deep learning models could be adapted to improve the performance of sentiment analysis on StockTwits. Bianchi et al. ( 2021 ) showed that extreme trees and NNs provide strong statistical evidence in favor of bond return predictability. Vo et al. ( 2019 ) proposed a deep responsible investment portfolio model containing an LSTM network to predict stock returns. All of these stock price applications use supervised learning techniques and financial time-series data to supervise learning. In contrast, it is challenging to apply unsupervised learning methods, particularly clustering, in this domain (Chullamonthon and Tangamchit 2023 ). However, RL still has certain applications in the stock markets. Lei ( 2020 ) combined deep learning and RL models to develop a time-driven, feature-aware joint deep RL model for financial time-series forecasting in algorithmic trading, thus demonstrating the potential of RL in this domain.

Additionally, the evidence suggests that hybrid LSTM methods can outperform other single-supervised ML methods in certain scenarios. Thus, in applying ML to the stock market, researchers have explored the combination of LSTM with different methods to develop hybrid models for improved performance. For instance, Tamura et al. ( 2018 ) used LSTM to predict stock prices and reported that the accuracy test results outperformed those of other models, indicating the effectiveness of the hybrid LSTM approach in stock price prediction.

Researchers have explored various hybrid approaches that combine wavelet transforms and LSTM with other techniques to predict stock prices and financial time series. Bao et al. ( 2017 ) established a new method for predicting stock prices that integrated wavelet transforms, stacked autoencoders, and LSTM. In the first stage, they decomposed the stock price time series to eliminate noise. In the next stage, predictive features for the stock price were created. Finally, LSTM was applied to predict the next day’s closing price based on the features from the previous stage. The authors claimed that their model outperformed state-of-the-art models in terms of predictive accuracy and profitability. To address the non-linearity and non-stationary characteristics of financial time series, Yan and Ouyang ( 2018 ) integrated wavelet analysis with LSTM to forecast the daily closing price of the Shanghai Composite Index. Their proposed model outperformed multilayer perceptron, SVM, and KNN models with respect to finding patterns in financial time-series data. Fang et al. ( 2019 ) developed a methodology to predict exchange-traded fund (ETF) option prices by integrating LSTM with support vector regression (SVR). They used two LSTM-SVR models to model the final transaction price. In the second-generation LSTM-SVR model, the hidden state vectors of the LSTM and the seven factors affecting the option price were used as SVR inputs. Their proposed model outperformed other methods, including LSTM and RF, in predicting option prices.

E-commerce

Online shopping, which allows users to purchase products from companies via the Internet, falls under the umbrella of e-commerce. In today’s rapidly evolving online shopping landscape, companies employ effective methods to recognize their buyers’ purchasing patterns, thereby enhancing the overall client experience. Customer reviews play a crucial role in this process, as they are not only used by companies to improve their products and services but also by customers to assess the quality of a product and make informed purchase decisions (Da et al. 2022 ). Consequently, the decision-making process is significantly improved through the analysis of reviews that provide valuable insights to customers.

Traditionally, enterprises’ e-commerce strategic planning involves assessing the performance of organizational e-commerce adoption behavior at the strategic level. In this context, the decision-making process exhibits typical behavioral characteristics. With regard to organizations’ adoption of technology, it is important to note that the entity adopting the technology is no longer an individual but the organization as a whole. However, technology adoption decisions are still made by people within an organization, and these decisions are influenced by individual cognitive factors (Zha et al. 2021 ). Individuals involved in the decision-making process have their own perspectives, beliefs, and cognitive biases, which can significantly impact an organization’s technology adoption choices and strategies (Li et al. 2019 ; Xu et al. 2021 ). Therefore, the behavioral perspective of technology acceptance provides a new lens for e-commerce strategic planning research. Research on technology acceptance has long been constrained by the limitations of traditional strategic e-commerce planning; with the development of ML, different general models of information-technology acceptance behavior are now commonly explored.

Table 5 provides a summary of the aforementioned studies. Cui et al. ( 2021 ) constructed an e-commerce product marketing model based on an SVM to improve the marketing effects of e-commerce products. Pang and Zhang ( 2021 ) built an SVM model to more effectively solve the decision support problem of e-commerce strategic planning. To increase buyers’ trust in the quality of the products and encourage online purchases, Saravanan and Charanya ( 2018 ) designed an algorithm that categorizes products based on several criteria, including reviews and ratings from other users. They proposed a hybrid feature-extraction method using an SVM to classify and separate products based on their features, best product ratings, and positive reviews. Wang et al. ( 2018a , b , c ) employed LSTM to improve the effectiveness and efficiency of mapping customer requirements to design parameters. The results of their model revealed the superior performance of the RNN over the KNN. Xu et al. ( 2019 ) designed an advanced credit risk evaluation system for e-commerce platforms to minimize the transaction risks associated with buyers and sellers. To this end, they employed a hybrid ML model combining a decision tree and an artificial neural network (DT-ANN) and found that it achieved high accuracy and outperformed other models, such as logistic regression and a dynamic Bayesian network. Cai et al. ( 2018 ) used deep RL to develop an algorithm to address the allocation of impression problems on e-commerce websites such as www.taobao.com , www.ebay.com , and www.amazon.com . In this algorithm, buyers are allocated to sellers based on their impressions and strategies to maximize the income of the platform. To do so, they applied a gated recurrent unit, and their findings demonstrated that it outperformed a deep deterministic policy gradient. Wu and Yan ( 2018 ) claimed that the main assumption of current production recommender models for e-commerce websites is that all historical user data are recorded. In practice, however, many platforms fail to capture such data. Consequently, they devised a list-wise DNN to model the temporal online behavior of users and offered recommendations for anonymous users.

Accounting

In the accounting field, ML techniques are employed to detect fraud and estimate accounting indicators. Most companies’ financial statements contain accounts or disclosure amounts that require estimation. Accounting estimates are pervasive in financial statements and often significantly impact a company’s financial position and operational results. The evolution of financial reporting frameworks has led to the increased use of fair value measurements, which necessitate estimation. Because many financial statement items are based on subjective managerial estimates, ML has the potential to provide an independent estimate generator (Kou et al. 2021 ).

Chen and Shi ( 2020 ) utilized bagging and boosting ensemble strategies to develop two models: bagged-proportion support vector machines (pSVM) and boosted-pSVMs. Using datasets from LibSVM, they tested their models and demonstrated that ensemble learning strategies significantly enhanced model performance in bankruptcy prediction. Lin et al. ( 2019 ) emphasized the importance of finding the best match between feature selection and classification techniques to improve the prediction performance of bankruptcy prediction models. Their results revealed that using a genetic algorithm as the wrapper-based feature selection method, combined with naïve Bayes and support vector machine classifiers, resulted in remarkable predictive performance. Faris et al. ( 2019 ) investigated a combination of resampling (oversampling) techniques and multiple feature selection methods to improve the accuracy of bankruptcy prediction. According to their findings, employing the oversampling technique and the AdaBoost ensemble method with a reduced error pruning (REP) tree provided reliable and promising results for bankruptcy prediction.

The earlier studies by Perols ( 2011 ) and Perols et al. ( 2017 ) were among the first to predict accounting fraud. Two recent studies by Bao et al. ( 2020 ) and Bertomeu et al. (2020) used various accounting variables to improve the detection of ongoing irregularities. Bao et al. ( 2020 ) employed ensemble learning to develop a fraud-prediction model that demonstrated superior performance compared to logistic regression and support vector machine models with a financial kernel. Huang et al. ( 2014 ) used Bayesian networks to extract textual opinions, and their findings showed that this approach outperformed both general and finance-specific dictionary-based approaches. Ding et al. ( 2020 ) used insurance companies’ data on loss reserve estimates and realizations and documented that the loss estimates generated by ML were superior to the actual managerial estimates reported in financial statements in four of the five insurance lines examined.

Many companies commission accounting firms to handle accounting and bookkeeping and provide them access to transaction data, documentation, and other relevant information. Mapping daily financial transactions to accounts is one of the most common accounting tasks. Therefore, Jorgensen and Igel ( 2021 ) devised an ML system based on random forests to automate the mapping of financial transfers to the appropriate accounts. Their approach achieved an impressive accuracy of 80.50%, outperforming baseline methods that either excluded transaction text or relied on lexical bag-of-words text representations. This success indicates the potential of ML to streamline accounting processes and increase the efficiency of financial transaction mapping. Table 6 summarizes the ML techniques described in the “Accounting” section.

Credit risk management

The scoring process is an essential part of the credit risk management system used by financial institutions to predict the risk of loan applications, because credit scores imply a certain probability of default. Hence, credit scoring models have been widely developed and investigated for the credit approval assessment of new applicants. This process uses a statistical model that considers both the application and performance data of a credit or loan applicant to estimate the likelihood of default, which is the most significant factor used by lenders to prioritize applicants in decision-making. Given the substantial volume of decisions involved in the consumer lending business, it is necessary to rely on models and algorithms rather than on human discretion (Bao et al. 2019 ; Husmann et al. 2022 ; Liu et al. 2019 ). Furthermore, such algorithmic decisions are based on “hard” information, such as consumer credit file characteristics collected by credit bureau agencies.

Supervised and unsupervised ML methods are widely used for credit risk management. Supervised ML techniques are used in credit scoring models to determine the relationships between customer features and credit default risk and subsequently predict classifications. Unsupervised techniques, mainly clustering algorithms, are used as data mining techniques to group samples into clusters (Wang et al. 2019 ). Hence, unsupervised learning techniques often complement supervised techniques in credit risk management.

Despite their high accuracy, ML models are often difficult to interpret, whereas financial institutions must maintain transparency in their decision-making processes. Fortunately, researchers have shown that rules can be extracted from ML models to mitigate this lack of transparency without compromising accuracy (Baesens et al. 2003 ). Table 7 summarizes recent applications of ML methods in credit risk management. Liu et al. ( 2022 ) used KNN, SVM, and random forest models to predict the default probability of online loan borrowers and compared their prediction performance with that of a logistic model. Khandani et al. ( 2010 ) applied regression trees to construct non-linear, non-parametric forecasting models for consumer credit risk.
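
To illustrate the kind of model comparison described above (e.g., in Liu et al. 2022), the following scikit-learn sketch contrasts logistic regression, KNN, SVM, and random forest by cross-validated AUC. It is not the cited authors' code: the imbalanced synthetic data stand in for borrower records, and all settings are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced "default vs. non-default" data standing in for borrower records.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC {auc:.3f}")
```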

Cryptocurrency

A cryptocurrency is a digital or virtual currency used to securely exchange and transfer assets. Cryptography is used to securely transfer assets, control and regulate the addition of cryptocurrencies, and secure their transactions (Garcia et al. 2014 ); hence, the term “cryptocurrency.” In contrast to standard currencies, which depend on the central banking system, cryptocurrencies are founded on the principle of decentralized control (Zhao 2021 ). Owing to its uncontrolled and untraceable nature, the cryptocurrency market has evolved exponentially over a short period. The growing interest in cryptocurrencies in the fields of economics and finance has drawn the attention of researchers in this domain. However, the applications of cryptocurrencies and associated technologies are not limited to financing. There is a significant body of computer science literature that focuses on the supporting technologies of cryptocurrencies, which can lead to innovative and efficient approaches for handling Bitcoin and other cryptocurrencies, as well as addressing their price volatility and other related technologies (Khedr et al. 2021 ).

Generating an accurate prediction model for such complex problems is challenging. As a result, cryptocurrency price prediction is still in its nascent stages and further research efforts are required to explore this area. In recent years, ML has become one of the most popular approaches for cryptocurrency price prediction owing to its ability to identify general trends and fluctuations. Table 8 presents a survey of cryptocurrency price prediction research using ML methods. Derbentsev et al. ( 2019 ) presented a short-term forecasting model to predict the cryptocurrency prices of Ripples, Bitcoin, and Ethereum using an ML approach. Greaves and Au ( 2015 ) applied blockchain data to Bitcoin price predictions and employed various ML techniques, including SVM, ANN, and linear and logistic regression. Among the ML classifiers used, the NN classifier with two hidden layers achieved the highest price accuracy of 55%, followed by logistic regression and SVM. Additionally, the research mentioned an analysis using several tree-based models and KNN.

The most recent LSTM networks appear to be more suitable and convenient for handling sequential data, such as time series. Lahmiri and Bekiros ( 2019 ) were the first to use LSTM to predict digital currency prices, focusing on the three currencies that were most widely used at the time of their study: Bitcoin, Ripple, and digital cash. In their study, long memory was used to assess the market efficiency of cryptocurrencies, and the inherent non-linear dynamics encompassing chaoticity and fractality were examined to gauge the predictability of digital currencies. Chowdhury et al. ( 2020 ) applied LSTM to the indices and constituents of cryptocurrencies to predict prices. Furthermore, Altan et al. ( 2019 ) built a novel hybrid forecasting model based on LSTM to predict digital currency time series.

Energy economics

The existing applications of ML techniques in energy economics can be classified into two major categories: energy price prediction and energy demand prediction. Energy prices typically exhibit complex features, such as non-linearity, lag dependence, and non-stationarity, which present challenges for simple traditional models (Chen et al. 2018 ). Owing to their high flexibility, ML techniques can provide superior prediction performance. In energy demand prediction, lagged values of consumption and socioeconomic and technological variables, such as GDP per capita, population, and technology trends, are typically utilized. Table 9 presents a summary of these studies. A critical distinction between “price” and “consumption” prediction is that the latter is not subject to market efficiency dynamics: the prediction of consumption has little effect on agents’ actual consumption, whereas price prediction tends to offset itself by creating opportunities for traders to exploit this information.

Predicting prices in energy markets is a complicated process because prices are subject to physical constraints on electricity generation and transmission and market power potential (Young et al. 2014 ). Predicting prices using ML techniques is one of the oldest applications in energy economics. In the early 2000s, a wave of studies attempted to forecast electricity prices using conventional ANN techniques. Ding ( 2018 ) combined ensemble empirical mode decomposition and an artificial NN to forecast international crude oil prices. Zhang et al. ( 2020a , b ) employed the LSTM method to forecast day-ahead electricity prices in a deregulated electricity market. They also investigated the intricate dependence structure within the price-forecasting model. Peng et al. ( 2018 ) applied LSTM with a differential evolution algorithm to predict electricity prices. Lago et al. ( 2018 ) first proposed a DNN to improve the predictive accuracy in a local market and then proposed a second model that simultaneously predicts prices from two markets to further improve the forecasting accuracy. Huang and Wang ( 2018 ) proposed a model that combines wavelet NNs with random time-effective functions to improve the prediction accuracy of crude oil price fluctuations.

Understanding the future energy demand and consumption is essential for short- and long-term planning. A wide range of users, including government agencies, local development authorities, financial institutions, and trading institutions, are interested in obtaining realistic forecasts of future consumption portfolios (Lei et al. 2020 ). For demand prediction, Chen et al. ( 2018 ) used ridge regression to combine extreme gradient boosting forest and feedforward deep networks to predict the annual household electricity consumption. Wang et al. ( 2018a , b , c ) first built a model using a self-adaptive multi-verse optimizer to optimize the SVM and then employed it to predict China’s primary energy consumption.

Critical discussions and future research directions

ML techniques have proven valuable in establishing computational models that capture complex relationships with the available data. Consequently, ML has become a useful tool in business and finance. This section critically discusses the existing research and outlines future directions.

Critical discussions

Although ML techniques are widely employed in business and finance, several issues need to be addressed.

Linguistic information is abundant in business and finance, encompassing online commodity comments and investors’ emotional responses in the stock market. Nonetheless, the existing research has predominantly concentrated on processing numerical data. When juxtaposed with numerical information, linguistic data harbor intricate characteristics, notably personalized individual semantics (Li et al. 2022a , b ; Zhang et al. 2021a , b ; Hoang and Wiegratz 2022 ).

The integration of ML into business and finance can lead to interpretability issues. In ML, an interpretable model refers to one in which a human observer can readily comprehend how the model transforms an observation into a prediction (Freitas 2014 ). Typically, decision-makers are hesitant to accept recommendations generated by ML techniques unless they can grasp the reasoning behind them. Unfortunately, the existing research in business and finance, particularly those employing DNNs, has seldom emphasized the interpretability of their models.

Social networks are prevalent in the marketing domain within businesses (Zha et al. 2020 ). For instance, social networks exist among consumers, whose purchasing behavior is influenced by the opinions of trusted peers or friends. However, the existing research that applies ML to marketing has predominantly concentrated on personal customer attributes, such as personality, purchasing power, and preferences (Dong et al. 2021 ). Regrettably, the potential impact of social networks and their influence on customer behavior have been largely overlooked in these studies.

ML techniques typically focus on exploring the statistical relationships between dependent and independent variables and emphasize feature correlations. However, in business and finance applications, causal relationships exist between variables. For instance, consider a study suggesting that girls who have breakfast tend to have lower weights than those who do not, based on which one might conclude that having breakfast aids in weight loss. However, in reality, these two events may only exhibit a correlation rather than causation (Yao et al. 2021 ). Causality plays a significant role in the performance of ML techniques. However, many current business and finance applications have failed to account for this crucial factor. Ignoring causality may lead to misleading conclusions and hinder accurate modeling of real-world scenarios. Therefore, incorporating causality into ML methodologies within the business and finance domains is essential for enhancing the reliability and validity of predictive models and decision-making processes.

In the emerging cryptocurrency field, traditional statistical methods are simple to implement and interpret but require many unrealistic statistical assumptions, making ML the most suitable technology in this field. Nevertheless, although many ML techniques exist, challenges remain in accurately predicting cryptocurrency prices, and most techniques require further investigation.

In recent years, rapid growth in digital payments has led to significant shifts in fraud and financial crimes (Canhoto 2021 ; Prusti et al. 2022 ; Wang et al. 2023 ). While some studies have shown the effective use of ML in detecting financial crimes, there remains a limitation in the research dedicated to this area. As highlighted by Pourhabibi et al. ( 2020 ), the complex nature of financial crime detection applications poses challenges in terms of deploying and achieving the desired detection performance levels. These challenges are manifested in two primary aspects. First, ML solutions encounter substantial pressure to deliver real-time responses owing to the constraints of processing data in real time. Second, in addition to inherent data noise, criminals often attempt to introduce deceptive data to obfuscate illicit activities (Pitropakis et al. 2019 ). Regrettably, few studies have investigated the robustness and performance of the underlying algorithmic solutions when confronted with data quality issues.

In the finance domain, an important limitation of the current literature on energy and ML is that most works highlight the computer science perspective to optimize computational parameters (e.g., the accuracy rate), while finance intuition may be ignored.

Future research directions

Thus, we propose that future research on this topic follow the directions below:

As analyzed above, abundant linguistic information exists in business and finance. Consequently, leveraging natural language processing technology to handle and analyze linguistic data in these domains represents a highly promising research direction.

The amalgamation of theoretical models with ML techniques is an important research topic. The incorporation of interpretable models can help open the black box of ML-driven analyses, thereby elucidating the underlying reasoning behind the results. Consequently, introducing interpretable models into business and finance applications of ML can yield substantial benefits.

Consumer interactions and behaviors are often intertwined within social networks, making it crucial to incorporate social network dynamics when modeling their influence on consumer behavior. Introducing the social network aspect into ML models has tremendous potential for enhancing marketing strategies and outcomes (Trandafili and Biba 2013 ).

Causality has garnered increasing attention in the field of ML in recent years. Accordingly, we believe it is an intriguing avenue to explore when applying ML to address problems in business and finance.

Further studies need to include all relevant factors affecting market mood and track them over a longer period to understand the anomalous behavior of cryptocurrencies and their prices. We recommend that researchers analyze LSTM variants, such as CNN-LSTM and encoder–decoder LSTM, in future research and compare the results to obtain further insights and improve price prediction. In addition, researchers can apply sentiment analysis to collect social signals, which can be further enhanced by improving the quality of content and using more content sources. Another area of opportunity is the use of more specialized models with different types of architectures.

Graph NNs and emerging adaptive solutions provide important opportunities for shaping the future of fraud and financial crime detection owing to their parallel structures. Because of the complexity of digital transaction processing and the ever-changing nature of fraud, robustness should be treated as the primary design goal when applying ML to detect financial crimes. Finally, focusing on real-time responses and data noise issues is necessary to improve the performance of current ML solutions for financial crime detection.

Currently, the application of unsupervised learning methods in different areas, such as marketing and risk management, is limited. Some problems related to marketing and customer management could be analyzed using clustering techniques, such as K-means, to segment clients by different demographic or behavioral characteristics and by their likelihood of default or switching companies. In energy risk management, extreme events can be identified as outliers using principal component analysis or ranking algorithms.

Conclusions

ML techniques have already made notable contributions to business and finance, and their use for addressing issues in these domains is increasing significantly. This review discusses advancements in ML in business and finance by examining seven research directions: cryptocurrency, marketing, e-commerce, energy economics, the stock market, accounting, and credit risk management. Models such as DNNs, CNNs, RNNs, random forests, and SVMs are highlighted in almost every domain of business and finance. Finally, we analyze some limitations of existing studies and suggest several avenues for future research. This review helps researchers understand the progress of ML applications in business and finance, thereby promoting further developments in these fields.

Availability of data and materials

Not applicable.

Abbreviations

ML: Machine learning

LSTM: Long short-term memory

SVM: Support vector machine

RBM: Restricted Boltzmann machine

LASSO: Least absolute shrinkage and selection operator

Agarwal S (2022) Deep learning-based sentiment analysis: establishing customer dimension as the lifeblood of business management. Glob Bus Rev 23(1):119–136

Ahmadi E, Jasemi M, Monplaisir L, Nabavi MA, Mahmoodi A, Jam PA (2018) New efficient hybrid candlestick technical analysis model for stock market timing on the basis of the support vector machine and heuristic algorithms of imperialist competition and genetic. Expert Syst Appl 94:21–31

Akyildirim E, Goncu A, Sensoy A (2021) Prediction of cryptocurrency returns using machine learning. Ann Oper Res 297(1–2):34

Alobaidi MH, Chebana F, Meguid MA (2018) Robust ensemble learning framework for day-ahead forecasting of household-based energy consumption. Appl Energy 212:997–1012

Altan A, Karasu S, Bekiros S (2019) Digital currency forecasting with chaotic meta-heuristic bio-inspired signal processing techniques. Chaos Solitons Fractals 126:325–336

Athey S, Imbens GW (2019) Machine learning methods that economists should know about. Annu Rev Econ 11:685–725

Baba B, Sevil G (2021) Bayesian analysis of time-varying interactions between stock returns and foreign equity flows. Financ Innov 7(1):51

Baesens B, Setiono R, Mues C, Vanthienen J (2003) Using Neural Network Rule Extraction and Decision Tables for Credit-Risk Evaluation. Manage Sci 49(3):312–329

Bajari P, Nekipelov D, Ryan SP, Yang MY (2015) Machine learning methods for demand estimation. Am Econ Rev 105(5):481–485

Bao W, Yue J, Rao YL (2017) A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS ONE 12(7):24

Bao W, Lianju N, Yue K (2019) Integration of unsupervised and supervised machine learning algorithms for credit risk assessment. Expert Syst Appl 128:301–315

Bao Y, Ke BIN, Li BIN, Yu YJ, Zhang JIE (2020) Detecting accounting fraud in publicly traded U.S. firms using a machine learning approach. J Acc Res 58(1):199–235

Bennett S, Cucuringu M, Reinert G (2022) Lead–lag detection and network clustering for multivariate time series with an application to the US equity market. Mach Learn 111(12):4497–4538

Bianchi D, Buchner M, Tamoni A (2021) Bond risk premiums with machine learning. Rev Financ Stud 34(2):1046–1089

Boughanmi K, Ansari A (2021) Dynamics of musical success: a machine learning approach for multimedia data fusion. J Mark Res 58(6):1034–1057

Breiman L (2001) Random forests. Mach Learn 45(1):5–32

Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Chapman and Hall, Wadsworth

Cai Q, Filos-Ratsikas A, Tang P, Zhang Y (2018) Reinforcement mechanism design for e-commerce. In: Proceedings of the 2018 world wide web conference, pp 1339–1348

Canhoto AI (2021) Leveraging machine learning in the global fight against money laundering and terrorism financing: an affordances perspective. J Bus Res 131:441–452

Chen KL, Jiang JC, Zheng FD, Chen KJ (2018) A novel data-driven approach for residential electricity consumption prediction based on ensemble learning. Energy 150:49–60

Chao X, Kou G, Li T, Peng Y (2018) Jie Ke versus AlphaGo: a ranking approach using decision making method for large-scale data with incomplete information. Eur J Oper Res 265(1):239–247

Chen Z, Chen W, Shi Y (2020) Ensemble learning with label proportions for bankruptcy prediction. Expert Syst Appl 146:113155

Chen H, Fang X, Fang H (2022) Multi-task prediction method of business process based on BERT and transfer learning. Knowl Based Syst 254:109603

Chen MR, Dautais Y, Huang LG, Ge JD (2017) Data driven credit risk management process: a machine learning approach. Paper presented at the international conference on software and system process Paris, France

Chong E, Han C, Park FC (2017) Deep learning networks for stock market analysis and prediction: methodology, data representations, and case studies. Expert Syst Appl 83:187–205

Chowdhury R, Rahman MA, Rahman MS, Mahdy MRC (2020) An approach to predict and forecast the price of constituents and index of cryptocurrency using machine learning. Physica A 551:17

Chullamonthon P, Tangamchit P (2023) Ensemble of supervised and unsupervised deep neural networks for stock price manipulation detection. Expert Syst Appl 220:119698

Coble KH, Mishra AK, Ferrell S, Griffin T (2018) Big data in agriculture: a challenge for the future. Appl Econ Perspect Policy 40(1):79–96

Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

Cui G, Wong ML, Lui HK (2006) Machine learning for direct marketing response models: Bayesian networks with evolutionary programming. Manag Sci 52(4):597–612

Cui F, Hu HH, Xie Y (2021) An intelligent optimization method of e-commerce product marketing. Neural Comput Appl 33(9):4097–4110

Cuomo S, Gatta F, Giampaolo F, Iorio C, Piccialli F (2022) An unsupervised learning framework for marketneutral portfolio. Expert Syst Appl 192:116308

Da F, Kou G, Peng Y (2022) Deep learning based dual encoder retrieval model for citation recommendation. Technol Forecast Soc 177:121545

Dastile X, Celik T, Potsane M (2020) Statistical and machine learning models in credit scoring: A systematic literature survey. Appl Soft Comput 91:21

Derbentsev V, Datsenko N, Stepanenko O, Bezkorovainyi V (2019) Forecasting cryptocurrency prices time series using machine learning approach. In: SHS web of conferences, vol 65, p 02001

Ding YS (2018) A novel decompose-ensemble methodology with AIC-ANN approach for crude oil forecasting. Energy 154:328–336

Ding KX, Lev B, Peng X, Sun T, Vasarhelyi MA (2020) Machine learning improves accounting estimates: evidence from insurance payments. Rev Acc Stud 25(3):1098–1134

Dingli A, Fournier KS (2017) Financial time series forecasting - a deep learning approach. Int J Mach Learn Comput 7(5):118–122

Dingli A, Marmara V, Fournier NS (2017) Comparison of deep learning algorithms to predict customer churn within a local retail industry. Int J Mach Learn Comput 7(5):128–132

Dong YC, Li Y, He Y, Chen X (2021) Preference-approval structures in group decision making: axiomatic distance and aggregation. Decis Anal 18(4):273–295

Einav L, Levin J (2014) Economics in the age of big data. Science 346(6210):715

Fang Y, Chen J, Xue Z (2019) Research on quantitative investment strategies based on deep learning. Algorithms 12(2):35

Faris H, Abukhurma R, Almanaseer W, Saadeh M, Mora AM, Castillo PA, Aljarah I (2019) Improving financial bankruptcy prediction in a highly imbalanced class distribution using oversampling and ensemble learning: A case from the Spanish market. Prog Artif Intell 9:1–23

Ferreira KJ, Lee BHA, Simchi-Levi D (2016) Analytics for an online retailer: demand forecasting and price optimization. Manuf Serv Oper Manag 18(1):69–88

Fischer T, Krauss C (2018) Deep learning with long short-term memory networks for financial market predictions. Eur J Oper Res 270(2):654–669

Freitas AA (2014) Comprehensible classification models: a position paper. SIGKDD Explor Newsl 15(1):1–10

Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29(2–3):131–163

Garcia D, Tessone CJ, Mavrodiev P, Perony N (2014) The digital traces of bubbles: feedback cycles between socio-economic signals in the bitcoin economy. J R Soc Interface 11(99):20140623

Ghoddusi H, Creamer GG, Rafizadeh N (2019) Machine learning in energy economics and finance: a review. Energy Econ 81:709–727

Go YH, Hong JK (2019) Prediction of stock value using pattern matching algorithm based on deep learning. Int J Recent Technol Eng 8:31–35

Gogas P, Papadimitriou T (2021) Machine learning in economics and finance. Comput Econ 57(1):1–4

Goncalves R, Ribeiro VM, Pereira FL, Rocha AP (2019) Deep learning in exchange markets. Inf Econ Policy 47:38–51

Greaves A, Au B (2015) Using the bitcoin transaction graph to predict the price of bitcoin

Grimmer J (2015) We are all social scientists now: how big data, machine learning, and causal inference work together. PS Polit Sci Polit 48(1):80–83

Gu SH, Kelly B, Xiu DC (2020) Empirical Asset Pricing via Machine Learning. Rev Financ Stud 33(5):2223–2273

Hoang D, Wiegratz K (2022) Machine learning methods in finance: Recent applications and prospects. Eur Financ Manag 29(5):1657–1701

Hoerl AE, Kennard RW (1970) Ridge regression—biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

Huang LL, Wang J (2018) Global crude oil price prediction and synchronization-based accuracy evaluation using random wavelet neural network. Energy 151:875–888

Huang AH, Zang AY, Zheng R (2014) Evidence on the information content of text in analyst reports. Account Rev 89(6):2151–2180

Husmann S, Shivarova A, Steinert R (2022) Company classification using machine learning. Expert Syst Appl 195:116598

Jiang ZY, Liang JJ (2017) Cryptocurrency portfolio management with deep reinforcement learning. In: Paper presented at the intelligent systems conference, London, England

Johari SN, Farid FH, Nasrudin N, Bistamam NL, Shuhaili NS (2018) Predicting Stock Market Index Using Hybrid Intelligence Model. Int J Eng Technol 7:36

Jorgensen RK, Igel C (2021) Machine learning for financial transaction classification across companies using character-level word embeddings of text fields. Intell Syst Account Financ Manag 28(3):159–172

Kamilaris A, Prenafeta-Boldu FX (2018) Deep learning in agriculture: a survey. Comput Electron Agric 147:70–90

Khandani AE, Kim AJ, Lo AW (2010) Consumer credit-risk models via machine-learning algorithms. J Bank Financ 34(11):2767–2787

Khedr AM, Arif I, Raj PVP, El-Bannany M, Alhashmi SM, Sreedharan M (2021) Cryptocurrency price prediction using traditional statistical and machine-learning techniques: a survey. Intell Syst Account Financ Manag 28(1):3–34

Kim JJ, Cha SH, Cho KH, Ryu M (2018) Deep reinforcement learning based multi-agent collaborated network for distributed stock trading. Int J Grid Distrib Comput 11(2):11–20

Kou G, Chao XR, Peng Y, Alsaadi FE, Herrera-Viedma E (2019) Machine learning methods for systemic risk analysis in financial sectors. Technol Econ Dev Eco 25(5):716–742

Kou G, Xu Y, Peng Y, Shen F, Chen Y, Chang K, Kou S (2021) Bankruptcy prediction for SMEs using transactional data and two-stage multiobjective feature selection. Decis Support Syst 140:113429

Ladyzynski P, Zbikowski K, Gawrysiak P (2019) Direct marketing campaigns in retail banking with the use of deep learning and random forests. Expert Syst Appl 134:28–35

Lago J, De Ridder F, Vrancx P, De Schutter B (2018) Forecasting day-ahead electricity prices in Europe: the importance of considering market integration. Appl Energy 211:890–903

Lahmiri S, Bekiros S (2019) Cryptocurrency forecasting with deep learning chaotic neural networks. Chaos Solitons Fractals 118:35–40

Lebichot B, Paldino GM, Siblini W, Guelton LH, Oblé F, Bontempi G (2021) Incremental learning strategies for credit cards fraud detection. Int J Data Sci Anal 12:165–174

Lei ZZ (2020) Research and analysis of deep learning algorithms for investment decision support model in electronic commerce. Electron Commer Res 20(2):275–295

Lei K, Zhang B, Li Y, Yang M, Shen Y (2020) Time-driven feature-aware jointly deep reinforcement learning for financial signal representation and algorithmic trading. Expert Syst Appl 140:14

Li CC, Dong YC, Xu YJ, Chiclana F, Herrera-Viedma E, Herrera F (2019) An overview on managing additive consistency of reciprocal preference relations for consistency-driven decision making and Fusion: Taxonomy and future directions. Inf Fusion 52:143–156

Li CC, Dong YC, Liang H, Pedrycz W, Herrera F (2022a) Data-driven method to learning personalized individual semantics to support linguistic multi-attribute decision making. Omega 111:102642

Li CC, Dong YC, Pedrycz W, Herrera F (2022b) Integrating continual personalized individual semantics learning in consensus reaching in linguistic group decision making. IEEE Trans Syst Man Cybern Syst 52(3):1525–1536

Lima MSM, Eryarsoy E, Delen D (2021) Predicting and explaining pig iron production on charcoal blast furnaces: a machine learning approach. INFORMS J Appl Anal 51(3):213–235

Lin WY, Hu YH, Tsai CF (2012) Machine learning in financial crisis prediction: a survey. IEEE Trans Syst Man Cybern Syst C 42(4):421–436

Lin WC, Lu YH, Tsai CF (2019) Feature selection in single and ensemble learning-based bankruptcy prediction models. Expert Syst 36:e12335

Liu YT, Zhang HJ, Wu YZ, Dong YC (2019) Ranking range based approach to MADM under incomplete context and its application in venture investment evaluation. Technol Econ Dev Eco 25(5):877–899

Liu Y, Yang ML, Wang YD, Li YS, Xiong TC, Li AZ (2022) Applying machine learning algorithms to predict default probability in the online credit market: evidence from China. Int Rev Financ Anal 79:14

Long W, Lu ZC, Cui LX (2019) Deep learning-based feature engineering for stock price movement prediction. Knowl Based Syst 164:163–173

Ma XM, Lv SL (2019) Financial credit risk prediction in internet finance driven by machine learning. Neural Comput Appl 31(12):8359–8367

Machado MR, Karray S (2022) Applying hybrid machine learning algorithms to assess customer risk-adjusted revenue in the financial industry. Electron Commer Res Appl 56:101202

Mao ST, Chao XL (2021) Dynamic joint assortment and pricing optimization with demand learning. Manuf Serv Oper Manag 23(2):525–545

Melancon GG, Grangier P, Prescott-Gagnon E, Sabourin E, Rousseau LM (2021) A machine learning-based system for predicting service-level failures in supply chains. INFORMS J Appl Anal 51(3):200–212

Meng TL, Khushi M (2019) Reinforcement learning in financial markets. Data 4(3):110

Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533


Moews B, Herrmann JM, Ibikunle G (2019) Lagged correlation-based deep learning for directional trend change prediction in financial time series. Expert Syst Appl 120:197–206

Moon KS, Kim H (2019) Performance of deep learning in prediction of stock market volatility. Econ Comput Econ Cybern Stud 53(2):77–92


Nanduri J, Jia YT, Oka A, Beaver J, Liu YW (2020) Microsoft uses machine learning and optimization to reduce e-commerce fraud. Informs J Appl Anal 50(1):64–79

Nazareth N, Ramana RYV (2023) Financial applications of machine learning: a literature review. Expert Syst Appl 219:119640

Nguyen TT, Nguyen ND, Nahavandi S (2020) Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications. IEEE Trans Cybern 50(9):3826–3839

Nosratabadi S, Mosavi A, Duan P, Ghamisi P, Filip F, Band SS, Reuter U, Gama J, Gandomi AH (2020) Data science in economics: comprehensive review of advanced machine learning and deep learning methods. Mathematics 8(10):1799

Nti IK, Adekoya AF, Weyori BA (2020) A systematic review of fundamental and technical analysis of stock market predictions. Artif Intell Rev 53(4):3007–3057

Ozbayoglu AM, Gudelek MU, Sezer OB (2020) Deep learning for financial applications: a survey. Appl Soft Comput 93:106384

Padilla N, Ascarza E (2021) Overcoming the cold start problem of customer relationship management using a probabilistic machine learning approach. J Mark Res 58(5):981–1006

Pang H, Zhang WK (2021) Decision support model of e-commerce strategic planning enhanced by machine learning. Inf Syst E-Bus Manag 21(1):11

Paolanti M, Romeo L, Martini M, Mancini A, Frontoni E, Zingaretti P (2019) Robotic retail surveying by deep learning visual and textual data. Robot Auton Syst 118:179–188

Peng L, Liu S, Liu R, Wang L (2018) Effective long short-term memory with differential evolution algorithm for electricity price prediction. Energy 162:1301–1314

Perols J (2011) Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Auditing J Pract Th 30:19–50

Perols JL, Bowen RM, Zimmermann C, Samba B (2017) Finding needles in a haystack: using data analytics to improve fraud prediction. Acc Rev 92(2):221–245

Pfeiffer J, Pfeiffer T, Meissner M, Weiss E (2020) Eye-tracking-based classification of information search behavior using machine learning: evidence from experiments in physical shops and virtual reality shopping environments. Inf Syst Res 31(3):675–691

Pitropakis N, Panaousis E, Giannetsos T, Anastasiadis E, Loukas G (2019) A taxonomy and survey of attacks against machine learning. Comput Sci Rev 34:100199

Pourhabibi T, Ong KL, Kam BH, Boo YL (2020) Fraud detection: a systematic literature review of graph-based anomaly detection approaches. Decis Support Syst 133:113303

Prusti D, Behera RK, Rath SK (2022) Hybridizing graph-based Gaussian mixture model with machine learning for classification of fraudulent transactions. Comput Intell 38(6):2134–2160

Rafieian O, Yoganarasimhan H (2021) Targeting and privacy in mobile advertising. Mark Sci 40(2):193–218

Raj MP, Swaminarayan PR, Saini JR, Parmar DK (2015) Applications of pattern recognition algorithms in agriculture: a review. Int J Adv Netw Appl 6(5):2495–2502

Sabeena J, Venkata SRP (2019) A modified deep learning enthused adversarial network model to predict financial fluctuations in stock market. Int J Eng Adv Technol 8:2996–3000

Saravanan V, Charanya SK (2018) E-Commerce Product Classification using Lexical Based Hybrid Feature Extraction and SVM. Int J Innov Technol Explor Eng 9(1):1885–1891

Schmidhuber J (2015) Deep learning in neural networks: An overview. Neural Networks 61:85–117

Simester D, Timoshenko A, Zoumpoulis SI (2020) Targeting prospective customers: robustness of machine-learning methods to typical data challenges. Manag Sci 66(6):2495–2522

Singh R, Srivastava S (2017) Stock prediction using deep learning. Multimed Tools Appl 76(18):18569–18584

Sirignano J, Cont R (2019) Universal features of price formation in financial markets: perspectives from deep learning. Quant Financ 19(9):1449–1459

Sohangir S, Wang DD, Pomeranets A, Khoshgoftaar TM (2018) Big data: deep learning for financial sentiment analysis. J Big Data 5(1):25

Song Y, Lee JW, Lee J (2019) A study on novel filtering and relationship between input-features and target-vectors in a deep learning model for stock price prediction. Appl Intell 49(3):897–911

Storm H, Baylis K, Heckelei T (2020) Machine learning in agricultural and applied economics. Eur Rev Agric Econ 47(3):849–892

Tamura K, Uenoyama K, Iitsuka S, Matsuo Y (2018) Model for evaluation of stock values by ensemble model using deep learning. Trans Jpn Soc Artif Intell 2018:33

Tashiro D, Matsushima H, Izumi K, Sakaji H (2019) Encoding of high-frequency order information and prediction of short-term stock price by deep learning. Quant Financ 19(9):1499–1506

Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B Stat Methodol 58(1):267–288

Timoshenko A, Hauser JR (2019) Identifying customer needs from user-generated content. Mark Sci 38(1):1–20

Trandafili E, Biba M (2013) A review of machine learning and data mining approaches for business applications in social networks. Int J E Bus Res (IJEBR) 9(1):36–53

Valencia F, Gomez-Espinosa A, Valdes-Aguirre B (2019) Price movement prediction of cryptocurrencies using sentiment analysis and machine learning. Entropy 21(6):12

Vapnik V (2013) The nature of statistical learning theory. Springer, Berlin

Vo NNY, He X, Liu S, Xu, G (2019) Deep learning for decision making and the optimization of socially responsible investments and portfolio. Decis Support Syst 124:113097. https://doi.org/10.1016/j.dss.2019.113097

Wang XY, Luo DK, Zhao X, Sun Z (2018b) Estimates of energy consumption in China using a self-adaptive multi-verse optimizer-based support vector machine with rolling cross-validation. Energy 152:539–548

Wang Y, Mo DY, Tseng MM (2018c) Mapping customer needs to design parameters in the front end of product design by applying deep learning. CIRP Ann 67(1):145–148

Wang B, Ning LJ, Kong Y (2019) Integration of unsupervised and supervised machine learning algorithms for credit risk assessment. Expert Syst Appl 128:301–315

Wang WY, Li WZ, Zhang N, Liu KC (2020) Portfolio formation with preselection using deep learning from long-term financial data. Expert Syst Appl 143:17

Wang C, Zhu H, Hu R, Li R, Jiang C (2023) LongArms: fraud prediction in online lending services using sparse knowledge graph. IEEE Trans Big Data 9(2):758–772

Wang Q, Li BB, Singh PV (2018) Copycats vs. original mobile apps: a machine learning copycat-detection method and empirical analysis. Inf Syst Res 29(2):273–291

Weng B, Lu L, Wang X, Megahed FM, Martinez W (2018) Predicting short-term stock prices using ensemble methods and online data sources. Expert Syst Appl 112:258–273

Wettschereck D, Aha DW, Mohri T (1997) A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms. Artif Intell Rev 11(1–5):273–314

Wu WB, Chen JQ, Yang ZB, Tindall ML (2021) A cross-sectional machine learning approach for hedge fund return prediction and selection. Manage Sci 67(7):4577–4601

Wu X, Kumar V, Ross Quinlan J, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou ZH, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inform Syst 14:1–37

Wu C, Yan M (2018) Session-aware Information Embedding for E-commerce Product Recommendation. In: Paper presented at the Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, Singapore

Xiao F, Ke J (2021) Pricing, management and decision-making of financial markets with artificial intelligence: introduction to the issue. Financ Innov 7(1):85


Xu YZ, Zhang JL, Hua Y, Wang LY (2019) Dynamic credit risk evaluation method for e-commerce sellers based on a hybrid artificial intelligence model. Sustainability 11:5521

Xu WJ, Chen X, Dong YC, Chiclana F (2021) Impact of decision rules and non-cooperative behaviors on minimum consensus cost in group decision making. Group Decis Negot 30(6):1239–1260

Yan HJ, Ouyang HB (2018) Financial time series prediction based on deep learning. Wirel Pers Commun 102(2):683–700

Yao LY, Chu ZX, Li S, Li YL, Gao J, Zhang AD (2021) A survey on causal inference. ACM Trans Knowl Discov Data 15(5):1–46

Yoganarasimhan H (2020) Search personalization using machine learning. Manag Sci 66(3):1045–1070

Young D, Poletti S, Browne O (2014) Can agent-based models forecast spot prices in electricity markets? Evidence from the New Zealand electricity market. Energy Econ 45:419–434

Zahavi JN, Levin I (1997) Applying neural computing to target marketing. J Direct Mark 11(4):76–93

Zha QB, Kou G, Zhang HJ, Liang HM, Chen X, Li CC, Dong YC (2020) Opinion dynamics in finance and business: a literature review and research opportunities. Financ Innov 6(1):44

Zha QB, Dong YC, Zhang HJ, Chiclana F, Herrera-Viedma E (2021) A personalized feedback mechanism based on bounded confidence learning to support consensus reaching in group decision making. IEEE Trans Syst Man Cybern Syst 51(6):3900–3910

Zhang QG, Benveniste A (1992) Wavelet networks. IEEE Trans Neural Netw 3(6):889–898


Zhang C, Li R, Shi H, Li FR (2020a) Deep learning for day-ahead electricity price forecasting. IET Smart Grid 3(4):462–469

Zhang YJ, Li BB, Krishnan R (2020b) Learning Individual behavior using sensor data: the case of global positioning system traces and taxi drivers. Inf Syst Res 31(4):1301–1321

Zhang B, Tan RH, Lin CJ (2021a) Forecasting of e-commerce transaction volume using a hybrid of extreme learning machine and improved moth-flame optimization algorithm. Appl Intell 51(2):952–965

Zhang HJ, Li CC, Liu YT, Dong YC (2021b) Modelling personalized individual semantics and consensus in comparative linguistic expression preference relations with self-confidence: An optimization-based approach. IEEE Trans Fuzzy Syst 29:627–640

Zhao L (2021) The function and impact of cryptocurrency and data technology in the context of financial technology: introduction to the issue. Financ Innov 7(1):84

Zhu XD, Ninh A, Zhao H, Liu ZM (2021) Demand forecasting with supply-chain information and machine learning: evidence in the pharmaceutical industry. Prod Oper Manag 30(9):3231–3252


Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (No. 72271171), the National Outstanding Youth Science Fund Project of the National Natural Science Foundation of China (No. 71725001), Sichuan University (No. sksy12021-02), and the Open Project of Xiangjiang Laboratory (No. 22XJ03028).

Author information

Authors and Affiliations

Business School, Sichuan University, Chengdu, 610065, China

Hanyao Gao, Haiming Liang, Xiangrui Chao & Yucheng Dong

School of Business Administration, Faculty of Business Administration, Southwestern University of Finance and Economics, Chengdu, 611130, China

Gang Kou

Business School, Hohai University, Nanjing, 211100, China

Hengjie Zhang

Xiangjiang Laboratory, Changsha, 410205, China

Yucheng Dong

School of Economics and Management, Southwest Jiaotong University, Chengdu, 610031, China

Cong-Cong Li


Contributions

HG, GK and YD contributed to the completion of the idea and writing of this paper. HG, GK and YD contributed to the discussion of the content of the organization and HL and HZ contributed to the improvement of the text of the manuscript. HG and HL contributed to Methodology. XC, and CL contributed to the literature collection of this paper. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Gang Kou or Yucheng Dong .

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Gao, H., Kou, G., Liang, H. et al. Machine learning in business and finance: a literature review and research opportunities. Financ Innov 10 , 86 (2024). https://doi.org/10.1186/s40854-024-00629-z


Received : 10 June 2022

Accepted : 07 February 2024

Published : 19 September 2024

DOI : https://doi.org/10.1186/s40854-024-00629-z




Published on 23.10.2024 in Vol 12 (2024)

Accelerating Evidence Synthesis in Observational Studies: Development of a Living Natural Language Processing–Assisted Intelligent Systematic Literature Review System

Authors of this article:


  • Frank J Manion 1 , PhD ; 
  • Jingcheng Du 1 , PhD ; 
  • Dong Wang 2 , PhD ; 
  • Long He 1 , MS ; 
  • Bin Lin 1 , MS ; 
  • Jingqi Wang 1 , PhD ; 
  • Siwei Wang 1 , MS ; 
  • David Eckels 2 , BA ; 
  • Jan Cervenka 2 ; 
  • Peter C Fiduccia 2 , PhD ; 
  • Nicole Cossrow 2 ; 
  • Lixia Yao 2 , PhD

1 IMO Health, 9600 W Bryn Mawr Ave # 100, Rosemont, IL, United States

2 Merck & Co, Inc, 126 East Lincoln Ave, Rahway, NJ, United States

Corresponding Author:

Dong Wang, PhD

Background: Systematic literature review (SLR), a robust method to identify and summarize evidence from published sources, is considered to be a complex, time-consuming, labor-intensive, and expensive task.

Objective: This study aimed to present a solution based on natural language processing (NLP) that accelerates and streamlines the SLR process for observational studies using real-world data.

Methods: We followed an agile software development and iterative software engineering methodology to build a customized intelligent end-to-end living NLP-assisted solution for observational SLR tasks. Multiple machine learning–based NLP algorithms were adopted to automate article screening and data element extraction processes. The NLP prediction results can be further reviewed and verified by domain experts, following the human-in-the-loop design. The system integrates explainable artificial intelligence to provide evidence for NLP algorithms and add transparency to extracted literature data elements. The system was developed based on 3 existing SLR projects of observational studies, including the epidemiology studies of human papillomavirus–associated diseases, the disease burden of pneumococcal diseases, and cost-effectiveness studies on pneumococcal vaccines.

Results: Our Intelligent SLR Platform covers major SLR steps, including study protocol setting, literature retrieval, abstract screening, full-text screening, data element extraction from full-text articles, results summary, and data visualization. The NLP algorithms achieved accuracy scores of 0.86-0.90 on article screening tasks (framed as text classification tasks) and macroaverage F1 scores of 0.57-0.89 on data element extraction tasks (framed as named entity recognition tasks).

Conclusions: Cutting-edge NLP algorithms expedite SLRs for observational studies, allowing scientists more time to focus on the quality of data and the synthesis of evidence. Aligning with the living SLR concept, the system has the potential to update literature data prospectively and continuously and to enable scientists to easily stay current with the literature related to observational studies.

Introduction

Systematic literature reviews (SLRs) are widely recognized as a robust method to identify and summarize evidence from published sources [ 1 ]. However, conducting an SLR can be a complex, time-consuming, labor-intensive, and expensive task, depending on the breadth of the topic and the level of granularity or resolution of the review needed [ 2 , 3 ]. One recent study estimated that the time and cost required to conduct an SLR can be as high as 1.72 person-years of scientist effort and approximately $140,000 per review [ 4 ]. Because SLRs are so resource intensive, it is difficult to keep them up to date; once an SLR is complete and new literature is published, it quickly becomes incomplete and obsolete.

Natural language processing (NLP) refers to artificial intelligence (AI) technologies that can extract structured information from textual documents such as medical charts, lab results, and many other types of unstructured text. NLP has significantly advanced a variety of biomedical applications in recent years. There is considerable community interest in using AI such as machine learning (ML) and NLP to improve automation in aspects of literature reviews [ 2 , 5 - 7 ]. For example, Thomas et al used NLP to identify randomized controlled trials for Cochrane reviews, and Wallace et al developed methods to extract sentences from literature related to clinical trial reports. There is also SLR management software, such as Rayyan.ai [ 8 ], that leverages NLP to expedite certain SLR steps, including article screening.

Despite these existing efforts, there is a lack of systematic, integrated NLP solutions that cover all aspects of the SLR process, which has prevented the wide adoption of such tools in SLR projects.

Thus, in this study, we evaluated an intelligent SLR system (hereinafter referred to as ISLR) for observational SLR tasks. The use of NLP improves efficiency, while the human-in-the-loop approach improves accuracy and reduces errors. The system uses cutting-edge NLP tools that employ ML and deep learning (DL) approaches to expedite the time-consuming processes involved in an SLR by making a series of learned recommendations to the end user. The purpose of this study is to evaluate an AI tool that accelerates and streamlines the SLR process and to demonstrate the validity of this tool in 3 use cases.

Workflow and System Architecture

ISLR has 2 major views that target 2 types of users across the observational SLR lifecycle: (1) an intelligent SLR workbench for literature reviewers who conduct routine literature reviews, and (2) a living literature data dashboard for researchers and analysts who focus on analyzing SLR data and keeping up to date on new evidence. Figure 1 shows the overall architecture, including the 2 major views and the data flow of the SLR system. ISLR integrates AI technologies and an SLR workflow management system to support literature collection, screening, and data extraction. The living literature dashboard continuously searches for and updates the SLR, allowing users to interactively navigate the updated literature and develop new insights.


Reliable NLP systems depend heavily on the development of a reasonable workflow, user interfaces, and high-performance NLP algorithms. To develop the system and define the system workflow and user interfaces, we collaborated with end users who are experts in SLR using an iterative approach that employed industry-standard agile methodology. The team identified 6 major functional areas that were essential for the application: (1) protocol specification assistance, (2) literature search and indexing, (3) abstract screening with NLP assistance, (4) support for full-text searching, uploading, and screening, (5) full-text data element extraction using NLP assistance to identify and extract relevant data elements from full-text and embedded tables, and (6) literature data visualization to enable users to assess the SLR results and perform data discovery. Figure 2 shows the system workflow and the embedded NLP services to expedite two of the most time-consuming steps, which are article screening and data element extraction.


Development and Validation of NLP Algorithms

As mentioned earlier, 2 sets of NLP algorithms are required for a specific SLR project, including abstract screening and full-text data element extraction. Figure 3 outlines the NLP algorithm development process for these 2 steps separately. For abstract screening, the first step is to annotate and build a corpus that includes the abstract text, citation metadata, and inclusion/exclusion status. Once the corpus is prepared, NLP algorithm training, evaluation, and selection can be performed, and the best-performing algorithms will be chosen for deployment.


Similar to abstract screening, the NLP algorithms for full-text data element extraction also require a complete NLP development lifecycle. Unlike abstract screening, where labeled corpora may be available from previous SLR projects, data annotation is required to curate a labeled data set for training and evaluating the NLP algorithms. The best-performing algorithms will be selected for deployment after evaluation. Figure 3 also details the NLP algorithm development and validation process for SLR projects.

Three previously completed SLRs were used to guide and validate NLP development. These 3 projects included: (1) the prevalence of human papillomavirus (HPV) detected in head and neck squamous cell carcinomas (referred to as HPV Prevalence); (2) the epidemiology of pneumococcal disease (referred to as Pneumococcal Epidemiology); and (3) the economic burden of pneumococcal disease (referred to as Pneumococcal Economic Burden). The inclusion and exclusion criteria for these 3 SLRs can be found in Table S1 in Multimedia Appendix 1.

Developing the Abstract Screening Corpora

Abstract screening was treated as a binary document classification task, ie, inclusion or exclusion of the article based on the abstract. Consequently, it was necessary to select and train NLP models for the task that demonstrated adequate performance and had a reasonable computational time. The annotated screening literature sets from the 3 previous SLRs were used as the gold standard to train and validate models, including 1697, 207, and 421 articles for HPV Prevalence, Pneumococcal Epidemiology, and Pneumococcal Economic Burden, respectively. The corpora contained citation metadata, including title, authors, Medical Subject Heading terms [ 9 ], and the text of the corresponding abstracts.

Developing the Full-Text Data Element Extraction Corpora

We selected 190, 25, and 24 full-text articles for annotation for HPV Prevalence, Pneumococcal Epidemiology, and Pneumococcal Economic Burden, respectively. Based on the key outcome variables defined in the 3 SLRs, we annotated 12 types of data elements, ranging from information common to observational studies in general, such as Study Population, to disease-specific information, such as HPV Lab Technique and Pneumococcal Disease Type.

Abstract Screening NLP Algorithms

For abstract screening, the NLP model classifies each article for its relevance based on its title, abstract, and other citation metadata. To build the abstract screening module, we evaluated 4 traditional ML-based document classification algorithms, XGBoost [ 10 ], support vector machines [ 11 ], logistic regression [ 12 ], and random forest [ 13 ], on the binary inclusion/exclusion classification task for abstract screening. The abstract screening corpora were used to evaluate the NLP models by calculating the standard metrics of precision (fraction of relevant instances among the retrieved instances, also called positive predictive value), recall (fraction of relevant instances that were retrieved, also called sensitivity), accuracy, and F1 score (the harmonic mean of precision and recall). The full feature set includes title, abstract, authors, keywords, journal, Medical Subject Heading terms, and publication type. We concatenated all features and extracted the term frequency-inverse document frequency vector as the feature representation.
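
As a concrete illustration of this setup, the following is a minimal sketch, assuming scikit-learn and xgboost, of term frequency-inverse document frequency feature extraction over concatenated citation fields followed by the 4 candidate classifiers; the toy abstracts, labels, and hyperparameters are illustrative only and do not reflect the study's actual configuration.

```python
# Minimal sketch of the abstract screening setup, assuming scikit-learn and xgboost.
# The toy corpus, labels, and hyperparameters are illustrative only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

# Each record concatenates title, abstract, and other citation metadata into one string.
documents = [
    "Incidence of invasive pneumococcal disease in older adults ...",
    "Serotype distribution of pneumococcal pneumonia after vaccine introduction ...",
    "In vitro characterization of a novel polymer coating ...",
    "A murine model of autoimmune encephalomyelitis ...",
]
labels = [1, 1, 0, 0]  # 1 = include for full-text review, 0 = exclude

# Term frequency-inverse document frequency vectors over the concatenated fields.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(documents)

candidates = {
    "xgboost": XGBClassifier(n_estimators=100),
    "support_vector_machine": LinearSVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100),
}

# Fit each candidate and classify a new, unseen abstract.
new_abstract = vectorizer.transform(["Epidemiology of pneumococcal meningitis in children ..."])
for name, model in candidates.items():
    model.fit(X, labels)
    decision = "include" if model.predict(new_abstract)[0] == 1 else "exclude"
    print(f"{name}: {decision}")
```

In practice, each candidate would be scored against the held-out annotated corpora on the precision, recall, accuracy, and F1 metrics described above before a model is chosen for deployment.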

Data Element Extraction NLP Algorithms

To construct the module for data element extraction, we treated the problem of data element recognition and extraction as a named entity recognition (NER) problem, which aims to recognize the mentions of entities from the text [ 14 ]. We evaluated a series of NLP algorithms consisting of ML and DL algorithms to recognize and extract data elements from full text, including (1) conditional random fields (CRFs), a classic statistical sequence modeling algorithm that has been widely applied to NER tasks [ 15 , 16 ]; (2) long short-term memory (LSTM), a variation of recurrent neural networks that has achieved remarkable success in NER tasks [ 17 , 18 ]; and (3) “Clinical BERT (Bidirectional Encoder Representations from Transformers)” [ 19 ], a novel transformer-based DL model. Standard metrics, including precision , recall , accuracy , and F1 scores , were calculated.
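
For the CRF variant, a minimal sketch of this BIO-tagged NER framing is shown below, assuming the sklearn-crfsuite package; the tokenization, handcrafted features, and tag names (LAB_TECHNIQUE, COHORT_SIZE) are illustrative and not the study's actual annotation schema.

```python
# Sketch of data element extraction framed as BIO-tagged named entity recognition
# with a CRF, assuming the sklearn-crfsuite package. Tag names are illustrative.
import sklearn_crfsuite

def token_features(tokens, i):
    """Simple per-token features; real systems add richer context and lexicon features."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:].lower(),
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# One toy annotated sentence: "HPV DNA was detected by PCR in 120 patients"
sentence = ["HPV", "DNA", "was", "detected", "by", "PCR", "in", "120", "patients"]
tags = ["O", "O", "O", "O", "O", "B-LAB_TECHNIQUE", "O", "B-COHORT_SIZE", "O"]

X_train = [[token_features(sentence, i) for i in range(len(sentence))]]
y_train = [tags]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)

# Predict BIO tags for a new sentence; entities are read off contiguous B-/I- spans.
test = ["Genotyping", "was", "performed", "by", "PCR"]
print(crf.predict([[token_features(test, i) for i in range(len(test))]]))
```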

Ethical Considerations

This is not applicable as this study is not human subjects research.

Here, we report the results of the construction of the annotation corpora and the results of the NLP algorithm for abstract screening and data element extraction, respectively.

Abstract Screening Corpora Description

The HPV Prevalence corpus we constructed from the existing SLR project contained 1697 total citations, of which 538 were included and 1159 were excluded based on the study criteria. The constructed Pneumococcal Epidemiology corpus contained 207 citations, of which 85 were included and 122 were excluded. The constructed Pneumococcal Economic Burden corpus contained 421 citations, of which 79 were included and 342 were excluded.

Abstract Screening NLP Evaluation Results

Extensive studies have shown the superiority of transformer-based DL models for many NLP tasks [ 20 - 23 ]. Based on our experiments, however, adding features to the pretrained language models did not significantly boost their performance. The performance comparison results for each task are shown in Table 1. XGBoost achieved the highest accuracy on the HPV Prevalence and Pneumococcal Economic Burden tasks, while a support vector machine achieved the highest accuracy on the Pneumococcal Epidemiology task. XGBoost was ultimately chosen for deployment due to its better generalizability.

Full-Text Data Element Extraction Corpora Description

The human annotators annotated 190, 25, and 24 full-text articles for the HPV Prevalence, Pneumococcal Epidemiology, and Pneumococcal Economic Burden tasks, respectively. Across these full-text articles, 4498, 579, and 252 entity mentions were annotated for the 3 projects, respectively. However, the distribution of annotated entities is highly imbalanced. For example, data elements such as HPV Lab Technique and HPV Sample Type were very prevalent, whereas data elements such as Maximum/Minimum Age in Study Cohort were rarely annotated in the corpora.

Results of the Full-Text Screening and Data Element Extraction NLP Methods

Tables 2 and 3 show the comparison of NLP performance among CRF, LSTM, and BERT on the 3 corpora. For each of the 3 corpora used to train the NLP models, LSTM demonstrated superiority over the conventional ML algorithm (ie, CRF) on entity recognition. Among the DL models, we did not observe a significant improvement in F1 scores from the BERT model; it achieved similar or worse performance on most data elements. Performance across tasks varies, primarily due to the availability of annotated data: on average, model performance on HPV Prevalence is higher than on Pneumococcal Epidemiology and Pneumococcal Economic Burden, as HPV Prevalence has the largest annotated data set. Due to the highly imbalanced distribution of annotated entities, we also observe variation in performance across data elements for the same model. For example, in the Pneumococcal Epidemiology task, the LSTM model achieved an F1 score of 0.412 for Study Cohort and 0.768 for Pneumococcal Disease Type.

[Tables 2 and 3, which compare the CRF (conditional random field), LSTM (long short-term memory), and BERT (Bidirectional Encoder Representations from Transformers) models on the 3 corpora, are not reproduced here.]
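
For scoring such extraction output, the sketch below computes entity-level precision, recall, and F1, including the macroaverage F1 reported above, assuming the seqeval package (the article does not name its scoring implementation) and illustrative tag sequences.

```python
# Entity-level scoring of BIO tag sequences, assuming the seqeval package.
# The gold and predicted sequences below are made up for illustration.
from seqeval.metrics import classification_report, f1_score

y_true = [["O", "B-PNEUMO_DISEASE_TYPE", "I-PNEUMO_DISEASE_TYPE", "O", "B-STUDY_COHORT"]]
y_pred = [["O", "B-PNEUMO_DISEASE_TYPE", "I-PNEUMO_DISEASE_TYPE", "O", "O"]]

print(classification_report(y_true, y_pred))        # per-entity-type precision/recall/F1
print(f1_score(y_true, y_pred, average="macro"))    # macroaverage F1 across entity types
```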

Final NLP Algorithm Selection

NLP algorithms were needed for 2 tasks in the ISLR system: abstract screening and data element extraction. Abstract screening was treated as a classification task. Based on our experimental results, XGBoost was selected for this task due to its good performance in our document classification experiments and its lower computational complexity compared with DL-based models. For the data element extraction task, LSTM was selected over CRF and BERT for the same reasons.

ISLR System Components

Study Protocol Specification

Study protocol specification is one of the first steps in an SLR project. Users can upload a PDF document to the system that describes the SLR study protocol for reference. The SLR system has a default list of data elements with their descriptions and answer types (eg, free text, multiple choice, and checkbox), which will be extracted from full-text PDFs of articles. The system also allows users to create and modify the list. At the end of the project, all the extracted data elements can be exported in a structured format.
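
The sketch below shows, purely hypothetically, how such a data element list might be represented and exported in a structured format; the element names, descriptions, and answer choices are illustrative and not the actual ISLR schema.

```python
# Hypothetical representation of protocol-defined data elements; illustrative only.
import json

data_elements = [
    {"name": "Study Population", "description": "Population enrolled in the study",
     "answer_type": "free text"},
    {"name": "Pneumococcal Disease Type", "description": "Reported disease manifestation",
     "answer_type": "multiple choice",
     "choices": ["invasive pneumococcal disease", "pneumonia", "otitis media"]},
    {"name": "Cohort Size", "description": "Number of subjects analyzed",
     "answer_type": "free text"},
]

# Extracted values can later be exported alongside these definitions as structured data.
print(json.dumps(data_elements, indent=2))
```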

Literature Search

The ISLR system is integrated with the PubMed E-utilities application programming interface, which enables users to perform direct searches on PubMed. Citation metadata such as abstracts, titles, journals, and authors can be retrieved from PubMed and indexed in the system for further screening and data element extraction. Additionally, the system provides an option for users to retrieve this citation metadata by uploading a list of individual PubMed IDs.
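
The underlying E-utilities calls can be sketched as follows, assuming the requests package; the query string is only an example, and production use should add an NCBI API key and respect the service's rate limits.

```python
# Minimal sketch of querying NCBI E-utilities, the public API behind this step.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
query = '"pneumococcal disease"[Title/Abstract] AND epidemiology[MeSH Terms]'

# 1. esearch returns the PubMed IDs (PMIDs) matching the query.
search = requests.get(
    f"{EUTILS}/esearch.fcgi",
    params={"db": "pubmed", "term": query, "retmode": "json", "retmax": 20},
).json()
pmids = search["esearchresult"]["idlist"]

# 2. efetch returns citation metadata (title, abstract, journal, authors) as XML;
#    the same call serves users who upload a list of individual PMIDs.
records = requests.get(
    f"{EUTILS}/efetch.fcgi",
    params={"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"},
).text
print(records[:500])
```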

Abstract Screening

The purpose of abstract screening is to review collected articles’ relevance based on their title, abstract, and other relevant metadata, such as journal names, article types, and keywords. The relevant articles will be included for the following full-text screening and data element extraction steps. NLP services are provided at this step to make recommendations on whether a particular article should be included for full-text review. The supporting information (eg, salient words that are impactful to inclusion and exclusion) for the NLP recommendation will also be shown to provide explainable evidence. Human experts can further review the predictions for each article and decide on abstract screening status (keep or exclude). Figure 4 shows the abstract screening interface demonstrating prediction results and relevant terms discovered by the NLP algorithms.
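
The article does not specify how the salient words are derived; one simple approach, sketched below under that assumption for a linear model over term frequency-inverse document frequency features, is to rank vocabulary terms by their learned coefficients. The toy abstracts and labels are illustrative only.

```python
# One possible way to surface salient inclusion/exclusion words for a linear model;
# this is an assumption for illustration, not the method used by the ISLR system.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

abstracts = [
    "HPV prevalence in oropharyngeal carcinoma patients",
    "Cost of pneumococcal pneumonia hospitalization in adults",
    "In vitro assay development for a novel compound",
]
labels = [1, 1, 0]  # 1 = include, 0 = exclude

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(abstracts)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Large positive coefficients push toward inclusion; large negative ones toward exclusion.
terms = np.array(vectorizer.get_feature_names_out())
order = np.argsort(clf.coef_[0])
print("words pushing toward exclusion:", terms[order[:5]])
print("words pushing toward inclusion:", terms[order[-5:]])
```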


Full-Text Searching, Uploading, and Screening

This step aims to identify full-text PDF documents for each included article and further screen their relevance based on the SLR study protocol. Only the articles that are deemed relevant after this stage will be included in the final full-text data element extraction step. The process of locating full-text PDF documents for each article can be time-consuming. The ISLR system integrates with PubMed Central to automatically find and collect full-text PDFs if they are publicly available. However, for articles whose full-text PDFs are not publicly available, users need to manually locate the articles through publishers and upload the corresponding PDFs to the system through the provided user interface.

Full-Text Data Element Extraction

Extracting full-text data elements is a time-consuming process in SLR projects. It requires reviewing the full-text article and extracting multiple relevant pieces of information defined in the study protocol. These data elements are often found in various sections of an article, including tables. The ISLR system uses Amazon Textract [ 24 ] for optical character recognition to extract text and tables from PDF files, followed by NLP services to further extract information from both text and tables. The NLP services can recommend potential answers for each data element, and human experts can review, select, and modify the extracted information. Figure 5 shows a screenshot of the user interface for this step.
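
A minimal sketch of this OCR step, assuming boto3 and a PDF already uploaded to an S3 bucket (the bucket and object names below are placeholders), might look like the following; multi-page article PDFs go through Textract's asynchronous document analysis API.

```python
# Sketch of extracting text and tables from an article PDF with Amazon Textract,
# assuming boto3 and an S3 upload; bucket and object key are placeholders.
import time
import boto3

textract = boto3.client("textract")

job = textract.start_document_analysis(
    DocumentLocation={"S3Object": {"Bucket": "my-slr-bucket", "Name": "articles/example.pdf"}},
    FeatureTypes=["TABLES"],  # request table structure in addition to raw text
)

# Poll until the job finishes (a production system would use SNS notifications).
while True:
    result = textract.get_document_analysis(JobId=job["JobId"])
    if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

# LINE blocks carry the recovered text; TABLE/CELL blocks describe table layout.
# Pagination via NextToken is omitted here for brevity.
blocks = result.get("Blocks", [])
lines = [b["Text"] for b in blocks if b["BlockType"] == "LINE"]
tables = [b for b in blocks if b["BlockType"] == "TABLE"]
print(len(lines), "text lines and", len(tables), "tables detected")
```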


Data Summary and Visualization

The ISLR system offers interactive dashboards to end users, such as researchers, for exploring the SLR results and data. These dashboards allow users to apply data filters, such as study location and cohort size, to refine their search results. For each data element extracted from full-text articles, users can click on the element to navigate to the corresponding article, ensuring traceability and appropriate references to source documents in the SLR project. Additionally, the dashboards recommend recent relevant articles and suggest articles that may require full-text screening. Figure 6 displays the major functions and screenshots of the dashboard.


Principal Findings

As described in the introduction, conducting an SLR is complex and expensive. There is also rapid growth in the number of available publications and other data, such as the clinical trial reports used in the article search and screening processes, with an average annual growth rate of around 5% for the life sciences [25]. Consequently, there is considerable community interest in applying various types of automation, including AI, DL, and NLP, to the multiple tasks required for producing an accurate SLR [ 2 , 5 - 7 ].

An important consideration for using the results of an SLR is how often the SLR is updated and hence how timely and complete its data are with respect to the real-world evidence. The “living” ISLR system addresses the difficulty of updating an SLR by providing an automated workflow, including review tools, to detect when new data are available and to trigger at least a semi-automated update process for expedited review. The system is also expandable to cover additional data elements of interest by updating the existing NLP pipelines.

The major accomplishments of the ISLR system include improvements in time, efficiency, cost, completeness of evidence, and error avoidance through techniques that assist researchers with decision-making (the so-called human-in-the-loop approach). The ISLR system is aligned with the living SLR concept, as it supports rapid updates of existing literature data. Additionally, since the classification and data element extraction tasks are maintained by the system, their results can be used to retrain the classification and NLP algorithms on a routine basis. Consequently, the performance of the system should improve over time.

The focus of this work was to evaluate an intelligent system that includes all major steps of an SLR with humans in the loop. The corpora evaluated in this study mostly focus on health economics and outcomes research in specific therapeutic areas. The generalizability of the learning algorithms to other domains would benefit from further formal examination. Because we have not yet conducted a time analysis of an SLR study conducted both manually and with this tool, we are unable to precisely quantify the time savings from the ISLR system. In addition, our NLP technologies are limited to extracting relevant information directly from the text and are not able to reason over long contexts to support complex data element extraction, such as GRADE (Grading of Recommendations, Assessment, Development, and Evaluation) or RoB2 (Risk of Bias 2). Recent advances in large language models, such as generative pretrained transformer 4, bring expert-level performance to NLP technologies on various professional and academic benchmarks. Given their high performance, generalizability, and reasoning capacity, it would be interesting to further assess the efficacy and accuracy of large language models in various SLR tasks and complex data element extraction.

As an early and innovative attempt to automate the SLR lifecycle through NLP technologies, ISLR does not yet fully support PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) reporting. We plan to continuously iterate on ISLR to cover the PRISMA checklist and report generation in the future. In addition, we have not yet conducted formal usability studies of the user interface, although agile methods involving iterative refinement of the interface through input from domain experts in SLR were employed throughout the software development process.

Conclusions

Our ISLR system is a user-centered, end-to-end intelligent solution to automate and accelerate the SLR process and supports “living” SLRs with humans in the loop. The system integrates cutting-edge ML- and DL-based NLP algorithms to make recommendations on article screening and data element extraction, which allow the system to prospectively and continuously update relevant literature in a timely fashion. This allows scientists to have more time to focus on the quality of data and the synthesis of evidence and to stay current with literature related to observational studies.

Acknowledgments

This research was supported by Merck Sharp & Dohme LLC, a subsidiary of Merck & Co, Inc, Rahway, NJ.

The content is the sole responsibility of the authors and does not necessarily represent the official views of Merck & Co, Inc, Rahway, NJ, or Melax Tech.

Data Availability

The annotated corpora underlying this article are available on GitHub [ 25 ].

Authors' Contributions

Study concept and design: JD and LY. Corpus preparation: DW, JD, and LY. Experiments: JD and BL. Draft of the manuscript: FJM, JD, DW, NC, and LY. Acquisition, analysis, or interpretation of data: JD, DW, NC, and LY. Critical revision of the manuscript for important intellectual content: all authors. Study supervision: JD, LY, and NC.

Conflicts of Interest

DW, JC, DE, NC, PCF, and LY are employees of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA. JD, BL, SW, XW, LH, JW, and FJM are employees of IMO.

Multimedia Appendix 1: Inclusion and exclusion criteria for the 3 systematic literature review projects.

References

  • Munn Z, Stern C, Aromataris E, Lockwood C, Jordan Z. What kind of systematic review should I conduct? A proposed typology and guidance for systematic reviewers in the medical and health sciences. BMC Med Res Methodol. Jan 10, 2018;18(1):5. [ CrossRef ] [ Medline ]
  • Tsafnat G, Glasziou P, Choong MK. Systematic review automation technologies. Syst Rev. 2014;3(74). URL: https://link.springer.com/article/10.1186/2046-4053-3-74 [ CrossRef ]
  • Higgins J, Thomas J, editors. Cochrane Handbook for Systematic Reviews of Interventions, Version 6.5. 2024. URL: https://training.cochrane.org/handbook/current [Accessed 2024-10-17]
  • Michelson M, Reuter K. The significant cost of systematic reviews and meta-analyses: a call for greater involvement of machine learning to assess the promise of clinical trials. Contemp Clin Trials Commun. Dec 2019;16:100443. [ CrossRef ] [ Medline ]
  • Michelson M, Ross M, Minton S. AI2 leveraging machine-assistance to replicate a systematic review. Value Health. May 2019;22:S34. [ CrossRef ]
  • Del Fiol G, Michelson M, Iorio A, Cotoi C, Haynes RB. A deep learning method to automatically identify reports of scientifically rigorous clinical research from the biomedical literature: comparative analytic study. J Med Internet Res. Jun 25, 2018;20(6):e10281. [ CrossRef ] [ Medline ]
  • Elliott JH, Turner T, Clavisi O, et al. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med. Feb 2014;11(2):e1001603. [ CrossRef ] [ Medline ]
  • Rayyan - Intelligent systematic review. Rayyan. 2021. URL: https://www.rayyan.ai/ [Accessed 2024-04-23]
  • Medical Subject Headings. National Library of Medicine. 2024. URL: https://www.nlm.nih.gov/mesh/meshhome.html [Accessed 2022-05-30]
  • Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Presented at: KDD ’16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Aug 13-17, 2016:785-794; San Francisco, CA. [ CrossRef ]
  • Noble WS. What is a support vector machine? Nat Biotechnol. Dec 2006;24(12):1565-1567. [ CrossRef ] [ Medline ]
  • Kleinbaum DG, Klein M. Logistic Regression: A Self-Learning Text. Springer; 2010. URL: https://link.springer.com/book/10.1007/978-1-4419-1742-3 [Accessed 2022-05-30]
  • Pal M. Random forest classifier for remote sensing classification. Int J Remote Sens. 2005;26(1):217-222. [ CrossRef ]
  • Nadeau D, Sekine S. A survey of named entity recognition and classification. Lingvist Investig. Aug 15, 2007;30(1):3-26. [ CrossRef ]
  • Lafferty J, McCallum A, Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. 2001. Presented at: ICML ’01: Proceedings of the Eighteenth International Conference on Machine Learning; Jun 28 to Jul 1, 2001:282-289; San Francisco, CA. URL: http://www.cs.columbia.edu/~jebara/6772/papers/crf.pdf
  • Lin S, Ng JP, Pradhan S, et al. Extracting formulaic and free text clinical research articles metadata using conditional random fields. In: Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents. Association for Computational Linguistics; 2010:90-95. URL: https://aclanthology.org/W10-1114 [Accessed 2022-08-07]
  • Chiu JPC, Nichols E. Named entity recognition with bidirectional LSTM-CNNs. arXiv. Preprint posted online on Nov 26, 2015. URL: https://arxiv.org/abs/1511.08308 [Accessed 2024-10-17]
  • Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. arXiv. Preprint posted online on Mar 4, 2016. URL: https://arxiv.org/abs/1603.01360 [Accessed 2024-10-17]
  • Alsentzer E, Murphy JR, Boag W, et al. Publicly available clinical BERT embeddings. arXiv. Preprint posted online on Apr 6, 2019. URL: https://arxiv.org/abs/1904.03323 [Accessed 2024-10-17] [ CrossRef ]
  • Devlin J, Chang MW, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv. Preprint posted online on Oct 11, 2019. URL: https://arxiv.org/abs/1810.04805 [Accessed 2024-10-17]
  • Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. Feb 15, 2020;36(4):1234-1240. URL: https://academic.oup.com/bioinformatics/article/36/4/1234/5566506 [ CrossRef ] [ Medline ]
  • Gu Y, Tinn R, Cheng H, et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthcare. Jan 31, 2022;3(1):1-23. [ CrossRef ]
  • Chen Q, Du J, Allot A, et al. LitMC-BERT: transformer-based multi-label classification of biomedical literature with an application on COVID-19 literature curation. arXiv. Preprint posted online on Apr 19, 2022. URL: https://arxiv.org/abs/2204.08649 [Accessed 2024-10-17]
  • Amazon Textract. Amazon Web Services. URL: https://aws.amazon.com/textract/ [Accessed 2022-08-08]
  • Merck/NLP-SLR-corpora. GitHub. URL: https://github.com/Merck/NLP-SLR-corpora [Accessed 2024-10-17]


Edited by Caroline Perrin; submitted 17.11.23; peer-reviewed by Shinichi Matsuda, Sicheng Zhou; final revised version received 24.04.24; accepted 23.07.24; published 23.10.24.

© Frank J Manion, Jingcheng Du, Dong Wang, Long He, Bin Lin, Jingqi Wang, Siwei Wang, David Eckels, Jan Cervenka, Peter C Fiduccia, Nicole Cossrow, Lixia Yao. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 23.10.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/ , as well as this copyright and license information must be included.


Computer Science > Software Engineering

Title: Security of Language Models for Code: A Systematic Literature Review

Abstract: Language models for code (CodeLMs) have emerged as powerful tools for code-related tasks, outperforming traditional methods and standard machine learning approaches. However, these models are susceptible to security vulnerabilities, drawing increasing research attention from domains such as software engineering, artificial intelligence, and cybersecurity. Despite the growing body of research focused on the security of CodeLMs, a comprehensive survey in this area remains absent. To address this gap, we systematically review 67 relevant papers, organizing them based on attack and defense strategies. Furthermore, we provide an overview of commonly used language models, datasets, and evaluation metrics, and highlight open-source tools and promising directions for future research in securing CodeLMs.


Machine Learning and Artificial Intelligence Supported Machining: A Review and Insights for Future Research

  • Review Paper
  • Published: 22 October 2024



  • Javvadi Eswara Manikanta 1 ,
  • Nitin Ambhore   ORCID: orcid.org/0000-0001-8468-8057 4 ,
  • Amol Dhumal 2 ,
  • Naveen Kumar Gurajala 3 &
  • Ganesh Narkhede 2  

Industry 4.0 and 5.0 have led to the extensive implementation of Artificial Intelligence (AI) and Machine Learning (ML). AI and ML represent a significant breakthrough in numerous fields by enabling more efficient data processing, offering enhancements across various services, and providing automation that replicates the learning process of machines, thereby enhancing system accuracy. In machining processes, AI and ML play crucial roles in predicting cutting forces and tool wear and in optimizing machining parameters. By employing advanced ML systems, machining operations can achieve longer cutting tool lifespans and increased efficiency. Additionally, these systems enable the prediction and enhancement of surface quality in machined components, contributing to overall part quality improvement. Furthermore, ML techniques are instrumental in analyzing and reducing power consumption during machining operations by predicting the energy consumption patterns of machine tools. This paper reviews the applications of AI and ML in machining operations and suggests future research directions. By examining recent achievements in the available literature, it aims to advance the research field by offering innovative concepts and approaches for integrating AI and ML into the machining industries.
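
As a purely illustrative sketch of the kind of prediction task described above, the following fits a random forest regressor to synthetic data relating cutting parameters to surface roughness; the data, model choice, and units are assumptions for illustration, not results from any cited study.

```python
# Illustrative only: predicting surface roughness from cutting parameters on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
cutting_speed = rng.uniform(50, 300, n)   # m/min
feed_rate = rng.uniform(0.05, 0.4, n)     # mm/rev
depth_of_cut = rng.uniform(0.2, 2.0, n)   # mm

# Synthetic ground truth: roughness rises with feed rate and depth of cut, falls with speed.
roughness = 2.5 * feed_rate + 0.4 * depth_of_cut - 0.002 * cutting_speed + rng.normal(0, 0.05, n)

X = np.column_stack([cutting_speed, feed_rate, depth_of_cut])
X_train, X_test, y_train, y_test = train_test_split(X, roughness, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Mean absolute error (um):", mean_absolute_error(y_test, model.predict(X_test)))
```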


B. Ordek, Y. Borgianni, E. Coatanea, Machine learning-supported manufacturing: a review and directions for future research. Prod. Manuf. Res. 12 (1) (2024). https://doi.org/10.1080/21693277.2024.2326526

P.W. Khan, Y. Kim, Y.C. Byun, S.J. Lee, Influencing factors evaluation of machine learning-based energy consumption prediction. Energies. 14 , 7167 (2021)

Article   Google Scholar  

M. Soori, B. Arezoo, R. Dastres, Machine learning and artificial intelligence in CNC machine tools, a review. Sustainable Manuf. Service Econ. 2 (100009), 100009 (2023). https://doi.org/10.1016/j.smse.2023.100009

J. Cao, X. Xia, L. Wang, Z. Zhang, X. Liu, A novel CNC milling energy consumption prediction method based on program parsing and parallel neural network. Sustainability. 13 , 13918 (2021)

Y. He, P. Wu, Y. Li, Y. Wang, F. Tao, Y. Wang, A generic energy prediction model of machine tools using deep learning algorithms. Appl. Energy. 275 , 115402 (2020)

J.E. Manikanta, B.N. Raju, C. Prasad, B.S.S.P. Sankar, Machining performance on SS304 using nontoxic, biodegradable vegetable-based cutting fluids. Chem. Data Collections. 42 (100961), 100961 (2022). https://doi.org/10.1016/j.cdc.2022.100961

Z. Liu, Y. Guo, A hybrid approach to integrate machine learning and process mechanics for the prediction of specific cutting energy. CIRP Ann. 67 , 57–60 (2018)

A. Fertig, M. Weigold, Y. Chen, Adv. Industrial Manuf. Eng. 4 (100074), 100074 (2022). https://doi.org/10.1016/j.aime.2022.100074 . Machine Learning based quality prediction for milling processes using internal machine tool data

K. Ullrich, von M. Elling, K. Gutzeit, M. Dix, M. Weigold, J.C. Aurich, R. Wertheim, I.S. Jawahir, H. Ghadbeigi, AI-based optimisation of total machining performance: a review. CIRP J. Manufact. Sci. Technol. 50 , 40–54 (2024). https://doi.org/10.1016/j.cirpj.2024.01.012

S.J. Plathottam, A. Rzonca, R. Lakhnori, C.O. Iloeje, A review of artificial intelligence applications in manufacturing operations. J. Adv. Manuf. Process. 5 (3) (2023). https://doi.org/10.1002/amp2.10159

J.V. Abellan-Nebot, F. Romero Subirón, A review of machining monitoring systems based on artificial intelligence process models. Int. J. Adv. Manuf. Technol. 47 (1–4), 237–257 (2010). https://doi.org/10.1007/s00170-009-2191-8

S.K. Choudhury, G. Bartarya, Role of temperature and surface finish in predicting tool wear using neural network and design of experiments. Int. J. Mach. Tools Manuf. 43 (7), 747–753 (2003). https://doi.org/10.1016/s0890-6955(02)00166-9

V. Vishnu, K.G. Varghese, B. Gurumoorthy, Energy prediction in process planning of five-axis machining by data-driven modelling. Procedia CIRP. 93 , 862–867 (2020)

W.H. Choi, J. Kim, J.Y. Lee, Development of fault diagnosis models based on predicting energy consumption of a machine tool spindle. Procedia Manuf. 51 , 353–358 (2020)

J.M. Lee, D.K. Choi, J. Kim, C.N. Chu, Real-time tool breakage monitoring for NC milling process. CIRP Annals Manuf. Technol. 44 (1), 59–62 (1995). https://doi.org/10.1016/s0007-8506(07)62275-6

N.R. Abburi, U.S. Dixit, A knowledge-based system for the prediction of surface roughness in turning process. Robot. Comput. Integr. Manuf. 22 (4), 363–372 (2006). https://doi.org/10.1016/j.rcim.2005.08.002

J.E. Manikanta, B.N. Raju, N. Ambhore, S. Santosh, Optimizing sustainable machining processes: a comparative study of multi-objective optimization techniques for minimum quantity lubrication with natural material derivatives in turning SS304. Int. J. Interact. Des. Manuf. (IJIDeM). 18 (2), 789–800 (2024). https://doi.org/10.1007/s12008-023-01706-w

R.E. Haber, J.E. Jiménez, C.R. Peres, J.R. Alique, An investigation of tool-wear monitoring in a high-speed machining process. Sens. Actuators A Phys. 116 (3), 539–545 (2004). https://doi.org/10.1016/j.sna.2004.05.017

Y.M. Niu, Y.S. Wong, G.S. Hong, An intelligent sensor system approach for reliable tool flank wear recognition. Int. J. Adv. Manuf. Technol. 14 (2), 77–84 (1998). https://doi.org/10.1007/bf01322215

P.G. Benardos, G.C. Vosniakos, Prediction of surface roughness in CNC face milling using neural networks and Taguchi’s design of experiments. Robot. Comput. Integr. Manuf. 18 (5–6), 343–354 (2002). https://doi.org/10.1016/s0736-5845(02)00005-4

J.Z. Zhang, J.C. Chen, The development of an in-process surface roughness adaptive control system in end milling operations. Int. J. Adv. Manuf. Technol. 31 (9–10), 877–887 (2007). https://doi.org/10.1007/s00170-005-0262-z

Y.M. Ertekin, Y. Kwon, T.-L. Tseng, Identification of common sensory features for the control of CNC milling operations under varying cutting conditions. Int. J. Mach. Tools Manuf. 43 (9), 897–904 (2003). https://doi.org/10.1016/s0890-6955(03)00087-7

B. Bahr, S. Motavalli, T. Arfi, Sensor fusion for monitoring machine tool conditions. Int. J. Comput. Integr. Manuf. 10 (5), 314–323 (1997). https://doi.org/10.1080/095119297131066

M. Kehayov, L. Holder, V. Koch, Application of artificial intelligence technology in the manufacturing process and purchasing and supply management. Procedia Comput. Sci. 200 , 1209–1217 (2022). https://doi.org/10.1016/j.procs.2022.01.321

S.W. Kim, J.H. Kong, S.W. Lee, S. Lee, Recent advances of artificial intelligence in manufacturing industrial sectors: a review. Int. J. Precis. Eng. Manuf. 23 (1), 111–129 (2022). https://doi.org/10.1007/s12541-021-00600-3

H. Tercan, T. Meisen, Machine learning and deep learning based predictive quality in manufacturing: a systematic review. J. Intell. Manuf. 33 (7), 1879–1905 (2022). https://doi.org/10.1007/s10845-022-01963-8

R. Rai, M.K. Tiwari, D. Ivanov, A. Dolgui, Machine learning in manufacturing and industry 4.0 applications. Int. J. Prod. Res. 59 (16), 4773–4778 (2021). https://doi.org/10.1080/00207543.2021.1956675

M. Elahi, S.O. Afolaranmi, J.L. Martinez Lastra, J.A. Perez Garcia, A comprehensive literature review of the applications of AI techniques through the lifecycle of industrial equipment. Discover Artif. Intell. 3 (1) (2023). https://doi.org/10.1007/s44163-023-00089-x

G. Schuh, C. Reuter, J.-P. Prote, F. Brambring, J. Ays, Increasing data integrity for improving decision making in production planning and control. CIRP Annals Manuf. Technol. 66 (1), 425–428 (2017). https://doi.org/10.1016/j.cirp.2017.04.003

J.E. Manikanta, N. Ambhore, C. Nikhare, Application of sustainable techniques in grinding process for enhanced machinability: a review. J. Brazilian Soc. Mech. Sci. Eng. 46 (4) (2024). https://doi.org/10.1007/s40430-024-04801-5

F. Tao, Q. Qi, A. Liu, A. Kusiak, Data-driven smart manufacturing. J. Manuf. Syst. 48 , 157–169 (2018). https://doi.org/10.1016/j.jmsy.2018.01.006

R. Cioffi, M. Travaglioni, G. Piscitelli, A. Petrillo, F. De Felice, Artificial intelligence and machine learning applications in smart production: Progress, trends, and directions. Sustainability. 12 (2), 492 (2020). https://doi.org/10.3390/su12020492

T. Wuest, D. Weimer, C. Irgens, K.-D. Thoben, Machine learning in manufacturing: advantages, challenges, and applications. Prod. Manuf. Res. 4 (1), 23–45 (2016). https://doi.org/10.1080/21693277.2016.1192517

E. Kuljanic, G. Totis, M. Sortino, Development of an intelligent multisensor chatter detection system in milling. Mech. Syst. Signal Process. 23 (5), 1704–1718 (2009). https://doi.org/10.1016/j.ymssp.2009.01.003

G. Kant, K.S. Sangwan, Predictive modelling and optimization of machining parameters to minimize surface roughness using artificial neural network coupled with genetic algorithm. Procedia CIRP. 31 , 453–458 (2015a). https://doi.org/10.1016/j.procir.2015.03.043

M. Imad, A. Hosseini, H.A. Kishawy, Optimization methodologies in intelligent machining systems - A review. IFAC-PapersOnLine. 52 (10), 282–287 (2019). https://doi.org/10.1016/j.ifacol.2019.10.043

J.E. Manikanta, C. Nikhare, N.K. Gurajala, N. Ambhore, R.R. Mohan, A review on hybrid nanofluids: Preparation methods, Thermo physical properties and applications. Iran. J. Sci. Technol. Trans. Mech. Eng. (2024). https://doi.org/10.1007/s40997-024-00772-z

M. Chen, C. Wang, Q. An, W. Ming, Tool path strategy and cutting process monitoring in intelligent machining. Front. Mech. Eng. 13 (2), 232–242 (2018). https://doi.org/10.1007/s11465-018-0469-y

B. Huang, S. Zhang, R. Huang, X. Li, Y. Zhang, J. Liang, An effective numerical control machining process optimization approach of part with complex pockets for numerical control process reuse. IEEE Access. 7 , 45146–45165 (2019). https://doi.org/10.1109/access.2019.2908877

M.S. Alajmi, A.M. Almeshal, Modeling of cutting force in the turning of AISI 4340 using Gaussian process regression algorithm. Appl. Sci. 11 (9), 4055 (2021)

M.C. Yesilli, F.A. Khasawneh, A. Otto, Topological feature vectors for chatter detection in turning processes. Int. J. Adv. Manuf. Technol. 1–27 (2022)

Z. Jurkovic, G. Cukor, M. Brezocnik, T. Brajkovic, A comparison of machine learning methods for cutting parameters prediction in high speed turning process. J. Intell. Manuf. 29 , 1683–1693 (2018)

S. Masoudi, M. Sima, M. Tolouei-Rad, Comparative study of ANN and ANFIS models for predicting temperature in machining. J. Eng. Sci. Technol. 13 (1), 211–225 (2018)

N. Xie, J. Zhou, B. Zheng, An energy-based modeling and prediction approach for surface roughness in turning. Int. J. Adv. Manuf. Technol. 96 , 2293–2306 (2018)

T. Zhou, L. He, J. Wu, F. Du, Z. Zou, Prediction of surface roughness of 304 stainless steel and multi-objective optimization of cutting parameters based on GA-GBRT. Appl. Sci. 9 (18), 3684 (2019)

X.A. Vasanth, P.S. Paul, A.S. Varadarajan, A neural network model to predict surface roughness during turning of hardened SS410 steel. Int. J. Syst. Assur. Eng. Manage. 11 , 704–715 (2020)

S. Vaishnav, A. Agarwal, K.A. Desai, Machine learning-based instantaneous cutting force model for end milling operation. J. Intell. Manuf. 31 , 1353–1366 (2020)

A. Yeganefar, S.A. Niknam, R. Asadi, The use of support vector machine, neural network, and regression analysis to predict and optimize surface roughness and cutting forces in milling. Int. J. Adv. Manuf. Technol. 105 , 951–965 (2019)

A. Saadallah, F. Finkeldey, K. Morik, P. Wiederkehr, Stability prediction in milling processes using a simulation-based machine learning approach. Procedia CIRP. 72 , 1493–1498 (2018)

P. Charalampous, Prediction of cutting forces in milling using machine learning algorithms and finite element analysis. J. Mater. Eng. Perform. 30 , 2002–2013 (2021)

H.O. Unver, B. Sener, A novel transfer learning framework for chatter detection using convolutional neural networks. J. Intell. Manuf. 34 , 1–20 (2021)

J. Wang, B. Zou, M. Liu, Y. Li, H. Ding, K. Xue, Milling force prediction model based on transfer learning and neural network. J. Intell. Manuf. 32 , 947–956 (2021)

S. Mahata, P. Shakya, N.R. Babu, P.K. Prakasam, In-process characterization of surface finish in cylindrical grinding process using vibration and power signals. Procedia CIRP. 88 , 335–340 (2020)

H. Safarzadeh, M. Leonesio, G. Bianchi, M. Monno, Roundness prediction in centreless grinding using physics-enhanced machine learning techniques. Int. J. Adv. Manuf. Technol. 112 , 1051–1063 (2021)

E. Sauter, E. Sarikaya, M. Winter, K. Wegener, In-process detection of grinding burn using machine learning. Int. J. Adv. Manuf. Technol. 115 , 2281–2297 (2021)

E. Sauter, M. Winter, K. Wegener, Analysis of robustness and transferability in feature-based grinding burn detection. Int. J. Adv. Manuf. Technol. 120 (3–4), 2587–2602 (2022)

A. Ouladmansour, O. Ameur-Zaimeche, R. Kechiched, S. Heddam, D.A. Wood, Integrating drilling parameters and machine learning tools to improve real-time porosity prediction of multi-zone reservoirs. Case study: Rhourd Chegga oilfield, Algeria. Geoenergy Sci. Eng. 223 , 211511 (2023)

S. Schorr, M. Moller, J. Heib, D. Bähre, Quality prediction of drilled and reamed bores based on torque measurements and the machine learning method of random forest. Procedia Manuf. 48 , 894–901 (2020)

A. Ziegenbein, A. Fertig, J. Metternich, M. Weigold, Data-based process analysis in machining production: case study for quality determination in a drilling process. Procedia CIRP. 93 , 1472 (2020)

M. Brillinger, M. Wuwer, M. Abdul Hadi, F. Haas, Energy prediction for CNC machining with machine learning. CIRP J. Manufact. Sci. Technol. 35 , 715–723 (2021). https://doi.org/10.1016/j.cirpj.2021.07.014

S. Hu, F. Liu, Y. He, T. Hu, An on-line approach for energy efficiency monitoring of machine tools. J. Clean. Prod. 27 , 133–140 (2012). https://doi.org/10.1016/j.jclepro.2012.01.013

G. Kant, K.S. Sangwan, Predictive modelling for energy consumption in machining using artificial neural network. Procedia CIRP. 37 , 205–210 (2015). https://doi.org/10.1016/j.procir.2015.08.081

S. Pervaiz, I. Deiab, A. Rashid, M. Nicolescu, Prediction of energy consumption and environmental implications for turning operation using finite element analysis. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 229 (11), 1925–1932 (2015). https://doi.org/10.1177/0954405414541105

S.-J. Shin, J. Woo, S. Rachuri, Energy efficiency of milling machining: component modeling and online optimization of cutting parameters. J. Clean. Prod. 161 , 12–29 (2017). https://doi.org/10.1016/j.jclepro.2017.05.013

N. Sihag, K.S. Sangwan, An improved micro analysis-based energy consumption and carbon emissions modeling approach for a milling center. Int. J. Adv. Manuf. Technol. 104 (1–4), 705–721 (2019). https://doi.org/10.1007/s00170-019-03807-x

Z. Zhou, B. Yao, W. Xu, L. Wang, Condition monitoring towards energy-efficient manufacturing: a review. Int. J. Adv. Manuf. Technol. 91 (9–12), 3395–3415 (2017). https://doi.org/10.1007/s00170-017-0014-x

S. Pawanr, G.K. Garg, S. Routroy, Development of a transient energy prediction model for machine tools. Procedia CIRP. 98 , 678–683 (2021). https://doi.org/10.1016/j.procir.2021.01.174

S. Pawanr, G.K. Garg, S. Routroy, A novel approach to model the energy consumption of machine tools for machining cylindrical parts. J. Manuf. Process. 84 , 28–42 (2022). https://doi.org/10.1016/j.jmapro.2022.09.040

M.P. Sealy, Z.Y. Liu, D. Zhang, Y.B. Guo, Z.Q. Liu, Energy consumption and modeling in precision hard milling. J. Clean. Prod. 135 , 1591–1601 (2016). https://doi.org/10.1016/j.jclepro.2015.10.094

Q. Xiao, C. Li, Y. Tang, L. Li, L. Li, A knowledge-driven method of adaptively optimizing process parameters for energy efficient turning. Energy. 166 , 142–156 (2019). https://doi.org/10.1016/j.energy.2018.09.191

D.Y. Pimenov, A. Bustillo, S. Wojciechowski, V.S. Sharma, M.K. Gupta, M. Kuntoğlu, Artificial intelligence systems for tool condition monitoring in machining: analysis and critical review. J. Intell. Manuf. 47 , 1–43 (2022)

G. Serin, B. Sener, A.M. Ozbayoglu, H.O. Unver, Review of tool condition monitoring in machining and opportunities for deep learning. Int. J. Adv. Manuf. Technol. 109 , 953–974 (2020)

S. Ravikumar, K. Ramachandran, Tool wear monitoring of multipoint cutting tool using sound signal features signals with machine learning techniques. Mater. Today Proc. 5 , 25720–25729 (2018)

V. Parwal, B. Rout, Machine learning based approach for process supervision to predict tool wear during machining. Procedia CIRP. 98 , 133–138 (2021)

P.J. Bagga, M.A. Makhesana, A.D. Pala, K.C. Chauhan, K.M. Patel, A novel computer vision-based machine learning approach for online tool wear monitoring in machining, (2021)

V.F. Sousa, F.J. Silva, J.S. Fecheira, H.M. Lopes, R.P. Martinho, R.B. Casais, L.P. Ferreira, Cutting forces assessment in CNC machining processes: a critical review. Sensors. 20 , 4536 (2020)

Y. Tao, Z. Li, P. Hu, F.W. Chen, B.F. Ju, Y.L. Chen, High-accurate cutting forces estimation by machine learning with voice coil motor-driven fast tool servo for micro/nano cutting. Precis. Eng. 79 , 291–299 (2023)

S. Vaishnav, A. Agarwal, K.A. Desai, Machine learning-based instantaneous cutting force model for end milling operation. J. Intell. Manuf. 31 , 1353–1366 (2020)

Y. Zhang, X. Xu, Machine learning cutting force, surface roughness, and tool life in high speed turning processes. Manuf. Lett. 29 , 84–89 (2021)

B. Peng, T. Bergs, D. Schraknepper, F. Klocke, B. Döbbeler, A hybrid approach using machine learning to predict the cutting forces under consideration of the tool wear. Procedia CIRP. 82 , 302–307 (2019)

E.G. Plaza, P.N. López, Analysis of cutting force signals by wavelet packet transform for surface roughness monitoring in CNC turning. Mech. Syst. Signal Process. 98 , 634–651 (2018)

K. Xu, Y. Li, J. Zhang, G. Chen, Force Net: an offline cutting force prediction model based on neuro-physical learning approach. J. Manuf. Syst. 61 , 1–15 (2021)

L.W. Tseng, T.S. Hu, Y.C. Hu, A smart tool holder calibrated by machine learning for measuring cutting force in fine turning and its application to the specific cutting force of low carbon steel S15C. Machines. 9 , 190 (2021)

G. Terrazas, G. Martínez-Arellano, P. Benardos, S. Ratchev, Online tool wear classification during dry machining using real time cutting force measurements and a CNN approach. J. Manuf. Mater. Process. 2 , 72 (2018)

G. Kucukyildiz, H.G. Demir, A multistage cutting tool fault diagnosis algorithm for the involute form cutter using cutting force and vibration signals spectrum imaging and convolutional neural networks. Arab. J. Sci. Eng. 46 , 11819–11833 (2021)

J. Moore, J. Stammers, J. Dominguez-Caballero, The application of machine learning to sensor signals for machine tool and process health assessment. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 235 , 1543–1557 (2021)

A. Jimenez-Cortadi, I. Irigoien, F. Boto, B. Sierra, G. Rodriguez, Predictive maintenance on the machining process and machine tool. Appl. Sci. 10 , 224 (2019)

B. Luo, H. Wang, H. Liu, B. Li, F. Peng, Early fault detection of machine tools based on deep learning and dynamic identification. IEEE Trans. Ind. Electron. 66 , 509–518 (2018)

E. Traini, G. Bruno, G. D’antonio, F. Lombardi, Machine learning framework for predictive maintenance in milling. IFAC-PapersOnLine. 52 , 177–182 (2019)

J. Diaz-Rozo, C. Bielza, P. Larrañaga, Machine learning-based CPS for clustering high throughput machining cycle conditions. Procedia Manuf. 10 , 997–1008 (2017)

U.L. Adizue, A.D. Tura, E.O. Isaya et al., Surface quality prediction by machine learning methods and process parameter optimization in ultra-precision machining of AISI D2 using CBN tool. Int. J. Adv. Manuf. Technol. 129 , 1375–1394 (2023). https://doi.org/10.1007/s00170-023-12366-1

Acknowledgements

Not applicable.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Mechanical Engineering, Shri Vishnu Engineering College for Women, Bhimavaram, 534202, India

Javvadi Eswara Manikanta

Department of Mechanical Engineering, Vishwakarma Institute of Information Technology, Pune, 411048, India

Amol Dhumal & Ganesh Narkhede

Department of Mechanical Engineering, CMR College of Engineering & Technology, Hyderabad, Telangana, 501401, India

Naveen Kumar Gurajala

Department of Mechanical Engineering, Vishwakarma Institute of Technology, Pune, 411037, India

Nitin Ambhore

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Javvadi Eswara Manikanta, Nitin Ambhore, Amol Dhumal, Naveen Kumar Gurajala, and Ganesh Narkhede. The first draft of the manuscript was written by Javvadi Eswara Manikanta, Nitin Ambhore, and Amol Dhumal, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Nitin Ambhore.

Ethics declarations

Ethical Approval and Conflicts of Interest

The authors declare that they have no financial interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Manikanta, J.E., Ambhore, N., Dhumal, A. et al. Machine Learning and Artificial Intelligence Supported Machining: A Review and Insights for Future Research. J. Inst. Eng. India Ser. C (2024). https://doi.org/10.1007/s40032-024-01118-z

Received: 19 July 2024

Accepted: 08 October 2024

Published: 22 October 2024

DOI: https://doi.org/10.1007/s40032-024-01118-z

Keywords

  • Artificial intelligence
  • Machine learning
  • Machining operations
  • Industry 4.0 and 5.0