Research article | Open access | Published: 19 March 2019

Machine learning in medicine: a practical introduction

Jenni A. M. Sidey-Gibbons (1) & Chris J. Sidey-Gibbons (2, 3, 4; ORCID: orcid.org/0000-0002-4732-7305)

BMC Medical Research Methodology, volume 19, Article number: 64 (2019)


Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely-available open source software and public domain data.

We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized General Linear Model regression (GLMs), Support Vector Machines (SVMs) with a radial basis function kernel, and single-layer Artificial Neural Networks. The publicly-available dataset describing the breast mass samples (N = 683) was randomly split into evaluation (n = 456) and validation (n = 227) samples.

We trained algorithms on data from the evaluation sample before they were used to predict the diagnostic outcome in the validation dataset. We compared the predictions made on the validation datasets with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models. We explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment.

The trained algorithms were able to classify cell nuclei with high accuracy (.94–.96), sensitivity (.97–.99), and specificity (.85–.94). Maximum accuracy (.96) and area under the curve (.97) were achieved using the SVM algorithm. Prediction performance increased marginally (accuracy = .97, sensitivity = .99, specificity = .95) when the algorithms were arranged into a voting ensemble.

Conclusions

We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles which we demonstrate here can be readily applied to other complex tasks, including natural language processing and image recognition.


Driven by an increase in computational power, storage, memory, and the generation of staggering volumes of data, computers are being used to perform a wide range of complex tasks with impressive accuracy. Machine learning (ML) is the name given to both the academic discipline and the collection of techniques which allow computers to undertake complex tasks. As an academic discipline, ML comprises elements of mathematics, statistics, and computer science. Machine learning is the engine which is helping to drive advances in the development of artificial intelligence. It is impressively employed in both academia and industry to drive the development of ‘intelligent products’ with the ability to make accurate predictions using diverse sources of data [ 1 ]. To date, the key beneficiaries of the 21st-century explosion in the availability of big data, ML, and data science have been industries which were able to collect these data and hire the necessary staff to transform their products. The learning methods developed in and for these industries offer tremendous potential to enhance medical research and clinical care, especially as providers increasingly employ electronic health records.

Two areas which may benefit from the application of ML techniques in the medical field are diagnosis and outcome prediction. This includes identifying patients at high risk of medical emergencies such as relapse or transition into another disease state. ML algorithms have recently been employed to classify skin cancer from images with accuracy comparable to that of a trained dermatologist [ 2 ] and to predict the progression from pre-diabetes to type 2 diabetes using routinely-collected electronic health record data [ 3 ].

Machine learning is increasingly employed in combination with Natural Language Processing (NLP) to make sense of unstructured text data. By combining ML with NLP techniques, researchers have been able to derive new insights from clinical incident reports [ 4 ], social media activity [ 5 , 6 ], doctor performance feedback [ 7 ], and patient reports after successful cancer treatments [ 8 ]. Automatically generated information from unstructured data could be exceptionally useful not only for gaining insight into quality, safety, and performance, but also for early diagnosis. Recently, an automated analysis of free speech collected during in-person interviews predicted transition to psychosis with perfect accuracy in a group of high-risk youths [ 9 ].

Machine learning will also play a fundamental role in the development of learning healthcare systems, which describe environments that align science, informatics, incentives, and culture for continuous improvement and innovation. In a practical sense, these systems, which could operate at any scale from small group practices to large national providers, will combine diverse data sources with complex ML algorithms. The result will be a continuous source of data-driven insights to optimise biomedical research, public health, and health care quality improvement [ 10 ].

Machine learning

Machine learning techniques are based on algorithms: sets of mathematical procedures which describe the relationships between variables. This paper will explain the process of developing (known as training) and validating an algorithm to predict the malignancy of a sample of breast tissue based on its characteristics. Though algorithms work in different ways depending on their type, there are notable commonalities in the way in which they are developed. Though the complexities of ML algorithms may appear esoteric, they often bear more than a subtle resemblance to conventional statistical analyses.

Given the commonalities shared between statistical and ML techniques, the boundary between the two may seem fuzzy or ill-defined. One way to delineate these bodies of approaches is to consider their primary goals. The goal of statistical methods is inference: to reach conclusions about populations, or to derive scientific insights from data which are collected from a representative sample of that population. Though many statistical techniques, such as linear and logistic regression, are capable of creating predictions about new data, the motivator of their use as a statistical methodology is to make inferences about the relationships between variables. For example, if we were to create a model which described the relationship between clinical variables and mortality following organ transplant surgery, we would need insight into the factors which distinguish low mortality risk from high if we were to develop interventions to improve outcomes and reduce mortality in the future. In statistical inference, therefore, the goal is to understand the relationships between variables.

Conversely, in the field of ML the primary concern is accurate prediction: the ‘what’ rather than the ‘how’. For example, in image recognition the relationship between the individual features (pixels) and the outcome is of little relevance if the prediction is accurate. This is a critical facet of ML techniques, as the relationships between many inputs, such as the pixels in images or video and geo-location data, are complex and usually non-linear. It is exceptionally difficult to describe in a coherent way the relationships between predictors and outcomes both when the relationships are non-linear and when there are a large number of predictors, each of which makes a small individual contribution to the model.

Fortunately for the medical field, many relationships of interest are reasonably straightforward, such as those between body mass index and diabetes risk or tobacco use and lung cancer. Because of this, these relationships can often be reasonably well explained using relatively simple models. In many popular applications of ML, such as optimizing navigation, translating documents, and identifying objects in videos, understanding the relationship between features and outcomes is of less importance, which allows the use of complex non-linear algorithms. Given this key difference, it might be useful for researchers to consider that algorithms exist on a continuum between those which are easily interpretable (i.e., Auditable Algorithms) and those which are not (i.e., Black Boxes), presented visually in Fig.  1 .

Figure 1. The complexity/interpretability trade-off in machine learning tools

Interesting questions remain as to when a conventional statistical technique becomes a ML technique. In this work, we will show that computational enhancements to traditional statistical techniques, such as elastic net regression, allow these algorithms to perform well with big data. However, a fuller discussion of the similarities and differences between ML and conventional statistics is beyond the purview of the current paper; interested readers are directed to materials which develop the ideas discussed here [ 11 ]. It should also be acknowledged that whilst the ’Black Box’ concept does generally apply to models which utilize non-linear transformations, such as neural networks, work is being carried out to facilitate feature identification in complex algorithms [ 12 ].

The majority of ML methods can be categorised into two types of learning technique: those which are supervised and those which are unsupervised. Both are introduced in the following sections.

Supervised learning

Supervised ML refers to techniques in which a model is trained on a range of inputs (or features) which are associated with a known outcome. In medicine, this might represent training a model to relate a person’s characteristics (e.g., height, weight, smoking status) to a certain outcome (onset of diabetes within five years, for example). Once the algorithm is successfully trained, it will be capable of making outcome predictions when applied to new data. Predictions which are made by models trained using supervised learning can be either discrete (e.g., positive or negative, benign or malignant) or continuous (e.g., a score from 0 to 100).

A model which produces discrete categories (sometimes referred to as classes) is referred to as a classification algorithm. Examples include algorithms which predict whether a tumour is benign or malignant, or which establish whether comments written by a patient convey a positive or negative sentiment [ 2 , 6 , 13 ]. In practice, classification algorithms return the probability of a class (between 0 for impossible and 1 for definite). Typically, we would transform any probability greater than .50 into a class of 1, but this threshold may be altered to improve algorithm performance as required. This paper provides an example of a classification algorithm in which a diagnosis is predicted.

A model which returns a prediction of a continuous value is known as a regression algorithm. The use of the term regression in ML varies from its use in statistics, where regression is often used to refer to both binary outcomes (i.e., logistic regression) and continuous outcomes (i.e., linear regression). In ML, an algorithm which is referred to as a regression algorithm might be used to predict an individual’s life expectancy or tolerable dose of chemotherapy.

Supervised ML algorithms are typically developed using a dataset which contains a number of variables and a relevant outcome. For some tasks, such as image recognition or language processing, the variables (which would be pixels or words) must be processed by a feature selector. A feature selector picks identifiable characteristics from the dataset which then can be represented in a numerical matrix and understood by the algorithm. In the examples above, a feature may be the colour of a pixel in an image or the number of times that a word appears in a given text. Using the same examples, outcomes may be whether an image shows a malignant or benign tumour or whether transcribed interview responses indicate predisposition to a mental health condition.

Once a dataset has been organised into features and outcomes, a ML algorithm may be applied to it. The algorithm is iteratively improved to reduce the error of prediction using an optimization technique.

Note that, when training ML algorithms, it is possible to over-fit the algorithm to the nuances of a specific dataset, resulting in a prediction model that does not generalise well to new data. The risk of over-fitting can be mitigated using various techniques. Perhaps the most straightforward approach, which will be employed in this work, is to split our dataset into two segments, a training segment and a testing segment, to ensure that the trained model can generalize to predictions beyond the training sample. Each segment contains a randomly-selected proportion of the features and their related outcomes. Associating certain features, or characteristics, with a specific outcome in the training segment is known as training the algorithm. Once training is completed, the algorithm is applied to the features in the testing dataset without their associated outcomes. The predictions made by the algorithm are then compared to the known outcomes of the testing dataset to establish model performance. This is a necessary step to increase the likelihood that the algorithm will generalise well to new data. This process is illustrated graphically in Fig.  2 .

Figure 2. Overview of supervised learning. (a) Training. (b) Validation. (c) Application of algorithm to new data

Unsupervised machine learning

In contrast with supervised learning, unsupervised learning does not involve a predefined outcome. In unsupervised learning, patterns are sought by algorithms without any input from the user. Unsupervised techniques are thus exploratory and used to find undefined patterns or clusters which occur within datasets. These techniques are often referred to as dimension reduction techniques and include processes such as principal component analysis, latent Dirichlet analysis, and t-Distributed Stochastic Neighbour Embedding (t-SNE) [ 14 – 16 ]. Unsupervised learning techniques are not discussed at length in this work, which focusses primarily on supervised ML. However, unsupervised methods are sometimes employed in conjunction with the methods used in this paper to reduce the number of features in an analysis, and are therefore worth mentioning. By compressing the information in a dataset into fewer features, or dimensions, issues including multicollinearity and high computational cost may be avoided. A visual illustration of an unsupervised dimension reduction technique is given in Fig.  3 . In this figure, the raw data (represented by various shapes in the left panel) are presented to the algorithm, which then groups the data into clusters of similar data points (represented in the right panel). Note that data which do not have sufficient commonality with the clustered data are typically excluded, thereby reducing the number of features within the dataset.

Figure 3. A visual illustration of an unsupervised dimension reduction technique

Unsupervised learning algorithms, like the supervised algorithms described earlier, share many similarities with statistical techniques which will be familiar to medical researchers. Unsupervised learning techniques make use of algorithms similar to those used for clustering and dimension reduction in traditional statistics. Those familiar with Principal Component Analysis and factor analysis will already be familiar with many of the techniques used in unsupervised learning.

What this paper will achieve

This paper provides a pragmatic example using supervised ML techniques to derive classifications from a dataset containing multiple inputs. The first algorithm we introduce, the regularized logistic regression, is very closely related to multivariate logistic regression. It is distinguished primarily by the use of a regularization function which both reduces the number of features in the model and attenuates the magnitude of their coefficients. Regularization is, therefore, suitable for datasets which contain many variables and missing data (known as high sparsity datasets ), such as the term-document matrices which are used to represent text in text mining studies.

The second algorithm, the Support Vector Machine (SVM), gained popularity in the ML community for its high performance in deriving accurate predictions in situations where the relationship between features and the outcome is non-linear. It uses a mathematical transformation known as the kernel trick, which we describe in more detail below.

Finally, we introduce an Artificial Neural Network (ANN), whose complex architecture and heavily modifiable parameters have led to its widespread use in many challenging applications, including image and video recognition. The addition of specialized architectures, such as recurrent or convolutional networks, has resulted in impressive performance on a range of tasks. Being highly parametrized models, ANNs are prone to over-fitting. Their performance may be improved using a regularization technique, such as DropConnect.

The ultimate goal of this manuscript is to imbue clinicians and medical researchers with a foundational understanding of what ML is and how it may be used, as well as the practical skills to develop, evaluate, and compare their own algorithms to solve prediction problems in medicine.

How to follow this paper

We provide a conceptual introduction alongside practical instructions using code written for the R Statistical Programming Environment, which may be easily modified and applied to other classification or regression tasks. This code will act as a framework upon which researchers can develop their own ML studies. The models presented here may be fitted to diverse types of data and are, with minor modifications, suitable for analysing text and images.

This paper is divided into sections which describe the typical stages of a ML analysis: preparing data, training algorithms, validating algorithms, assessing algorithm performance, and applying new data to the trained models.

Throughout the paper, examples of the R code used to run the analyses are presented. The code is given in full in Additional file  2 , and the data used for these analyses are available in Additional file 1 .

The dataset used in this work is the Breast Cancer Wisconsin Diagnostic Data Set. This dataset is publicly available from the University of California Irvine (UCI) Machine Learning Repository [ 17 ]. It consists of characteristics, or features, of cell nuclei taken from breast masses which were sampled using fine-needle aspiration (FNA), a common diagnostic procedure in oncology. The clinical samples used to form this dataset were collected from January 1989 to November 1991. Relevant features from digitised images of the FNA samples were extracted through the methods described in Refs. [ 13 , 18 , 19 ]. An example of one of the digitised images from an FNA sample is given in Fig.  4 .

Figure 4. An example of an image of a breast mass from which dataset features were extracted

A total of 699 samples were used to create this dataset. This number will be referred to as the number of instances . Each instance has an I.D. number, diagnosis, and set of features attributed to it. While the Sample I.D. is unique to an instance, the diagnosis, listed as class in the dataset, can be either malignant or benign, depending on whether the FNA was found to be cancerous. In this dataset, 241 instances were diagnosed as malignant, and 458 instances were found to be benign. Malignant cases have a class of four, and benign cases have a class of two. This class, or diagnosis, is the outcome of the instance.

The features of the dataset are characteristics identified or calculated from each FNA image. There are nine features in this dataset, and each is valued on a scale of 1 to 10 for a particular instance, 1 being the closest to benign and 10 being the most malignant [ 18 ]. Features range from descriptors of cell characteristics, such as Uniformity of Cell Size and Uniformity of Cell Shape , to more complex cytological characteristics such as Clump Thickness and Marginal Adhesion . All nine features, along with the Instance No., Sample I.D., and Class are listed in Table  1 . The full dataset is a matrix of 699 × 12 (one identification number, nine features, and one outcome per instance).

This dataset is simple and therefore computationally efficient. The relatively low number of features and instances means that the analysis provided in this paper can be conducted using most modern PCs without long computing times. Although the principles are the same as those described throughout the rest of this paper, using large datasets to train machine learning algorithms can be computationally intensive and, in some cases, require many days to complete. The principles illustrated here apply to datasets of any size.

The R Statistical Programming Language is an open-source tool for statistics and programming which was developed as an extension of the S language. R is supported by a large community of active users and hosts several excellent packages for ML which are both flexible and easy to use. R is a computationally efficient language which is readily comprehensible without special training in computer science. The R language is similar to many other statistical programming languages, including MATLAB, SAS, and STATA. Packages for R are arranged into different task views on the Comprehensive R Archive Network. The Machine Learning and Statistical Learning task view currently lists almost 100 packages dedicated to ML.

Many, if not most, R users access the R environment using RStudio, an open-source integrated developer environment (IDE) which is designed to make working in R more straightforward. We recommend that readers of the current paper download the latest version of both R and RStudio and access the environment through the RStudio application. Both R and RStudio are free to use and available for use under an open-source license.

Conducting a machine learning analysis

The following section will take you through the necessary steps of a ML analysis using the Wisconsin Cancer dataset.

1. Importing and preparing the dataset.

2. Training the ML algorithms.

3. Testing the ML algorithms.

4. Assessing the sensitivity, specificity and accuracy of the algorithms.

5. Plotting receiver operating characteristic curves.

6. Applying new data to the trained models.

1. Importing and preparing the dataset.

The dataset can be downloaded directly from the UCI repository using the code in Fig.  5 .

Figure 5. Import the data and label the columns
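
The figure itself reproduces the authors’ code only as an image. As a guide, a minimal sketch of this step might look like the following; the repository URL and the column labels are assumptions based on the UCI documentation and Table 1, not a copy of the original listing.

```r
# Download the Wisconsin Breast Cancer dataset from the UCI repository
# (URL and column labels assumed from the repository documentation)
url <- paste0("https://archive.ics.uci.edu/ml/machine-learning-databases/",
              "breast-cancer-wisconsin/breast-cancer-wisconsin.data")
data <- read.csv(url, header = FALSE)
colnames(data) <- c("id", "thickness", "cell_size", "cell_shape", "adhesion",
                    "epithelial_size", "bare_nuclei", "bland_chromatin",
                    "normal_nucleoli", "mitoses", "class")
```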

We first modify the data by re-scoring missing data from ‘?’ to NA, removing any rows with missing data, and re-scoring the class variable from 2 and 4 to 0 and 1, where 0 indicates that the tumour was benign and 1 indicates that it was malignant. Recall that a dataset with many missing data points is referred to as a sparse dataset. In this dataset there are a small number of cases (n = 16) with at least one missing value. To simplify the analytical steps, we will remove these cases using the code in Fig.  6 .

Figure 6. Remove missing items and re-score the outcome data
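
A sketch of this cleaning step, assuming the data were imported as above (the ‘?’ entries cause the bare_nuclei column to be read as text, hence the explicit conversion):

```r
data[data == "?"] <- NA                           # re-score missing values from '?' to NA
data <- na.omit(data)                             # remove the 16 cases with missing values
data$bare_nuclei <- as.numeric(data$bare_nuclei)  # convert back to numeric after removing '?'
data$class <- ifelse(data$class == 4, 1, 0)       # 1 = malignant, 0 = benign
```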

Datasets used for supervised ML are most easily represented in a matrix similar to the way Table  1 is presented. Of the n columns, n −1 are populated with features, and the single remaining column contains the outcome. Each row contains an individual instance. The features which make up the training dataset may also be described as inputs or variables, and are denoted in code as x . The outcomes may be referred to as the label or the class, and are denoted using y .

Recall that it is necessary to train a supervised algorithm on a training dataset in order to ensure it generalises well to new data. The code in Fig.  7 divides the dataset into the two required segments: one containing 67% of the dataset, to be used for training, and the other containing the remaining 33%, to be used for evaluation.

Figure 7. Split the data into training and testing datasets
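
One way to produce this 67/33 split is sketched below; the seed value is our own arbitrary choice, included so that the split is reproducible.

```r
set.seed(42)                                   # arbitrary seed, for reproducibility
train_rows <- sample(nrow(data), size = floor(0.67 * nrow(data)))
x_train <- as.matrix(data[train_rows, 2:10])   # the nine features
y_train <- data$class[train_rows]
x_test  <- as.matrix(data[-train_rows, 2:10])
y_test  <- data$class[-train_rows]
```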

2. Training the ML algorithms

Now that we have arranged our dataset into a suitable format, we may begin training our algorithms. The ML algorithms which we will use are listed below and detailed in the following sections.

Logistic regression using Generalised Linear Models (GLMs) with L1 (Least Absolute Shrinkage and Selection Operator; LASSO) regularisation.

Support Vector Machines (SVMs) with a radial basis function (RBF) kernel.

Artificial Neural Networks (ANNs) with a single hidden layer.

Regularised regression using Generalised Linear Models (GLMs)

Regularised General Linear Models (GLMs) have demonstrated excellent performance in some complex learning problems, including predicting individual traits from on-line digital footprints [ 20 ], classifying open-text reports of doctors’ performance [ 7 ], and identifying prostate cancer by desorption electro-spray ionization mass spectrometric imaging of small metabolites and lipids [ 21 ].

When fitting GLMs using datasets which have a large number of features and substantial sparsity, model performance may be increased when the contribution of each of the included features to the model is reduced (or penalised) using regularisation, a process which also reduces the risk of over-fitting. Regularisation effectively reduces both the number of coefficients in the model and their magnitudes, making it especially suitable for big datasets that may have more features than instances. In this example, feature selection is guided by the Least Absolute Shrinkage and Selection Operator (LASSO). Other forms of regularisation are available, including Ridge Regression and the Elastic Net (a linear blend of both Ridge and LASSO regularisation) [ 22 ]. An accessible, up-to-date summary of LASSO and other regularisation techniques is given in Ref. [ 23 ].

Regularised GLMs are operationalised in R using the glmnet package [ 24 ]. The code in Fig.  9 demonstrates how the GLM algorithm is fitted to the training dataset. In the glmnet package, the form of regularisation is chosen using the numerical parameter alpha: an alpha value of 1 selects LASSO regularisation, an alpha value of 0 selects Ridge regularisation, and a value between 0 and 1 selects a linear blend of the two techniques known as the Elastic Net [ 22 ].

n-fold cross-validation is used to ascertain the optimal value of lambda ( λ ), the regularisation parameter. The value of λ which minimizes prediction error is stored in the glm_model$lambda.min object. The larger the λ value, the greater the effect of regularisation upon the number of features in the model and their respective coefficients. Figure  8 shows the effect of different levels of log( λ ). The optimal value of log( λ ) is indicated using the vertical broken line (shown here at x = −5.75). The rightmost dotted line indicates the most parsimonious value of log( λ ) which is within 1 standard deviation of the absolute minimum value. Note that the random nature of cross-validation means that values of log( λ ) may differ slightly between analyses. The integers given above Fig.  8 (0–9) relate to the number of features included in the model. The code shown in Fig.  9 fits the GLM algorithm to the data and extracts the minimum value of λ and the weights of the coefficients.

Figure 8. Regression coefficients for the GLM model. The figure shows the coefficients of the nine model features for different values of log( λ ). log( λ ) values are given on the lower x-axis and the number of features in the model is displayed above the figure. As the size of log( λ ) decreases, the number of variables in the model (i.e. those with a nonzero coefficient) increases, as does the magnitude of each coefficient. The vertical dotted line indicates the value of log( λ ) at which the accuracy of the predictions is maximized

Figure 9. Fit the GLM model to the data and extract the coefficients and the minimum value of lambda
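
A sketch of this step using cv.glmnet(), which performs the cross-validation described above; family = "binomial" specifies logistic regression.

```r
library(glmnet)
glm_model <- cv.glmnet(x_train, y_train, alpha = 1, family = "binomial")  # alpha = 1: LASSO
glm_model$lambda.min               # value of lambda which minimises cross-validated error
coef(glm_model, s = "lambda.min")  # coefficient weights at that value of lambda
```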

Figure  10 shows the cross-validation curves for different levels of log( λ ). This figure can be plotted using the code in Fig.  11 .

Figure 10. Cross-validation curves for the GLM model. The cross-validation curves are shown as red dots, with the upper and lower standard deviations shown as error bars

Figure 11. Plot the cross-validation curves for the GLM algorithm
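
For a cv.glmnet object, the base plot() method produces this figure directly:

```r
plot(glm_model)  # cross-validation error (red dots) with error bars, against log(lambda)
```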

Figure  8 shows the magnitude of the coefficients for each of the variables within the model for different values of log( λ ). The vertical dotted line indicates the value of log( λ ) which minimises the mean squared error established during cross-validation. This line can be added to the figure using the abline() function, as shown in Fig.  12 .

Figure 12. Plot the coefficients and their magnitudes
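
A sketch of this plotting step; glmnet.fit is the full fitted model stored inside the cross-validated object.

```r
plot(glm_model$glmnet.fit, xvar = "lambda", label = TRUE)  # coefficient paths against log(lambda)
abline(v = log(glm_model$lambda.min), lty = 2)             # dotted line at the optimal log(lambda)
```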

Support Vector Machines (SVMs)

Support Vector Machine (SVM) classifiers operate by separating the two classes using a linear decision boundary called the hyperplane. The hyperplane is placed at a location that maximises the distance between the hyperplane and the nearest instances of each class [ 25 ].

Fig.  13 depicts an example of a linear hyperplane that perfectly separates two classes; maximising the width of the decision boundary in this way optimises the generalisability of the model to new data. In real-world examples, however, it may not be possible to separate the two classes adequately using a linear hyperplane. Rather than employing a non-linear separator, such as a high-order polynomial, SVM techniques use a method to transform the feature space such that the classes do become linearly separable. This technique, known as the kernel trick, is demonstrated in Fig.  14 .

Figure 13. An SVM hyperplane. The hyperplane maximises the width of the decision boundary between the two classes

Figure 14. The kernel trick. The kernel trick modifies the feature space, allowing separation of the classes with a linear hyperplane

Fig.  14 shows an example of two classes that are not separable using a linear separator. By projecting the data to x², they become linearly separable using the y = 5 hyperplane. A popular method for kernel transformation in high-dimensional space is the radial basis function (RBF).

The SVM algorithm is fitted to the data using a function, given in Fig.  15 , which is arranged in a similar way to the regularised regression shown above.

Figure 15. Fit the SVM algorithm to the data
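
The paper does not name the SVM implementation used, but the kernel options quoted below match the svm() function in the e1071 package, which we therefore assume in this sketch.

```r
library(e1071)  # implementation assumed, not stated in the original text
svm_model <- svm(x = x_train, y = y_train, kernel = "radial")  # RBF kernel
```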

Further exploration of SVMs which fit separating hyperplanes following different feature-space transformations is possible by altering the kernel argument to “linear”, “radial”, “polynomial”, or “sigmoid”.

Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) are algorithms which are loosely modelled on the neuronal structure observed in the mammalian cortex. Neural networks are arranged with a number of input neurons, representing the information taken from each of the features in the dataset, which feed into any number of hidden layers before passing to an output layer in which the final decision is presented. As information passes through the ’neurons’, or nodes, it is multiplied by the weight of the neuron (plus a constant bias term) and transformed by an activation function. The activation function applies a non-linear transformation using a simple equation, shown in Eq. 1 .
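
Equation 1 itself does not survive in this version of the text. A common choice of activation function, which we assume here purely for illustration, is the logistic sigmoid, which squashes a node’s weighted input z into the range (0, 1):

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$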

In recurrent ANNs, the prediction errors are fed back through the network and the weights of each neural connection are altered until the error level is minimised, a process known as backpropagation [ 26 ].

Deep Neural Networks (DNNs) are neural networks which have many hidden layers. Deep learning, which may utilise DNNs, has produced impressive results when employed in complex tasks using very high dimensional data, such as image recognition [ 27 ] and computer-assisted diagnosis of melanoma [ 2 ].

DNNs are heavily parametrised and, as a result, can be prone to over-fitting models to data. Regularisation can, as with the GLM algorithm described above, be used to prevent this. Other strategies to improve performance include dropout regularisation, in which some number of randomly-selected units are omitted from the hidden layers during training [ 28 ].

The code in Fig.  16 demonstrates how to fit a neural network. This is straightforward, requiring only that the x and y datasets be defined, along with the number of units in the hidden layer using the size argument.

Figure 16. Fit the ANN algorithm to the data
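
A sketch using the nnet package (named in the testing section below); the number of hidden units here is our own illustrative choice.

```r
library(nnet)
ann_model <- nnet(x = x_train, y = y_train, size = 5)  # size = 5 hidden units (illustrative)
```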

3. Testing the ML algorithms

In order to test the performance of the trained algorithms, it is necessary to compare the predictions which the algorithm makes on data other than the data upon which it was trained with the true outcomes for those data, which are known to us but were not exposed to the algorithm. To accomplish this in the R programming environment, we create a vector of model predictions using the x_test matrix, which can be compared to the y_test vector to establish performance metrics. This is easily achievable using the predict() function, which is included in the stats package in the R distribution. The nnet package contains a minor modification to the predict() function, and as such the type argument is set to ‘raw’, rather than ‘response’, for the neural network. This code is given in Fig.  17 .

Figure 17. Extract predictions from the trained models on the new data
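
A sketch of this step; pred_y_svm is the object name used later in the text, while the other two names are our assumptions.

```r
pred_y_glm <- as.vector(predict(glm_model, newx = x_test, type = "response", s = "lambda.min"))
pred_y_svm <- as.numeric(predict(svm_model, x_test))
pred_y_ann <- as.vector(predict(ann_model, x_test, type = "raw"))  # nnet uses 'raw'
```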

4. Assessing the sensitivity, specificity and accuracy of the algorithms

Machine learning algorithms for classification are typically evaluated using simple methodologies that will be familiar to many medical researchers and clinicians. In the current study, we will use sensitivity, specificity, and accuracy to evaluate the performance of the three algorithms. Sensitivity is the proportion of true positives that are correctly identified by the test, specificity is the proportion of true negatives that are correctly identified by the test, and accuracy is the proportion of the time that the classifier is correct [ 29 ]. Equations used to calculate sensitivity, specificity, and accuracy are given below.
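
Writing TP, TN, FP, and FN for the numbers of true positives, true negatives, false positives, and false negatives respectively, the standard definitions are:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$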

Confusion matrices

Data from classifiers are often represented in a confusion matrix in which the classifications made by the algorithm (e.g., pred_y_svm ) are compared to the true classifications (which the algorithms were blinded to) in the dataset (i.e., y_test ). Once populated, the confusion matrix provides all of the information needed to calculate sensitivity, specificity, and accuracy manually. An example of an unpopulated confusion matrix is demonstrated in Table  2 .

Confusion matrices can be easily created in R using the caret package. The confusionMatrix() function creates a confusion matrix and calculates sensitivity, specificity, and accuracy. The confusionMatrix() function requires a binary input for the predictors, whereas the predict() functions used earlier produce a vector of continuous values between 0 and 1, in which a larger value reflects greater certainty that the sample was positive. Before evaluating a binary classifier, a cut-off threshold must be decided upon. The round() function used in the code shown in Fig.  18 effectively sets a threshold of > .50 for a positive prediction by rounding values ≤ .50 down to 0 and values > .50 up to 1. While this is sufficient for this teaching example, users may wish to evaluate the optimal threshold for a positive prediction, as this may differ from .50. The populated confusion matrix for this example is shown in Table  3 and is displayed alongside sensitivity, specificity, and accuracy.

Figure 18. Create confusion matrices for the three algorithms
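
A sketch of this step; the explicit factor levels ensure that the predictions and the true outcomes are compared on the same scale.

```r
library(caret)
# round() applies the .50 threshold described above
confusionMatrix(factor(round(pred_y_glm), levels = c(0, 1)),
                factor(y_test, levels = c(0, 1)), positive = "1")
confusionMatrix(factor(round(pred_y_svm), levels = c(0, 1)),
                factor(y_test, levels = c(0, 1)), positive = "1")
confusionMatrix(factor(round(pred_y_ann), levels = c(0, 1)),
                factor(y_test, levels = c(0, 1)), positive = "1")
```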

5. Plotting receiver operating characteristic curves

Receiver operating characteristic (ROC) curves illustrate the relationship between a model’s sensitivity (plotted on the y -axis) and specificity (plotted on the x -axis); code to draw them using the pROC package is given in Fig.  19 , with example output in Fig.  20 . The grey diagonal line reflects as-good-as-chance performance, and any curve plotted to the left of that line is performing better than chance. Interpretation of ROC curves is facilitated by calculating the area under each curve (AUC) [ 30 ]. The AUC gives a single value which summarises the probability that a random sample would be correctly classified by each algorithm. In this example all models perform very well, but the SVM algorithm shows the best performance, with AUC = .97, compared to the ANN (AUC = .95) and the LASSO-regularized regression (AUC = .94).

Figure 19. Draw receiver operating characteristic curves and calculate the area under them
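
A sketch of this step using pROC; the colour choices are our own.

```r
library(pROC)
roc_glm <- roc(y_test, pred_y_glm)
roc_svm <- roc(y_test, pred_y_svm)
roc_ann <- roc(y_test, pred_y_ann)
plot(roc_glm)                              # sensitivity against specificity
lines(roc_svm, col = "red")
lines(roc_ann, col = "blue")
auc(roc_glm); auc(roc_svm); auc(roc_ann)   # area under each curve
```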

Figure 20. Receiver operating characteristic curves

6. Applying new data to the trained models

Despite many similarities, ML is differentiated from statistical inference by its focus on predicting real-life outcomes from new data. As such, we develop models not to infer the relationships between variables but rather to produce reliable predictions from new data (though, as we have demonstrated, prediction and inference are not mutually exclusive).

In order to use the trained models to make predictions from new data, we need to construct either a vector (if there is a single new case) or a matrix (if there are multiple new cases). We need to ensure that the new data are entered into the model in the same order as the x_train and x_test matrices. In this case, we need to enter new data in the order of thickness , cell size , cell shape , adhesion , epithelial size , bare nuclei , bland chromatin , normal nucleoli , and mitoses . The code in Fig.  21 demonstrates how these data are represented in a manner that allows them to be processed by the trained model. Note that all three algorithms return predictions that suggest there is a near-certainty that this particular sample is malignant.

Figure 21. Apply new data to the trained and validated algorithm
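
A sketch of this step; the feature values below are illustrative only and are not taken from the original figure.

```r
# One new case, in the same feature order as x_train (values are illustrative)
new_case <- matrix(c(9, 9, 9, 8, 9, 9, 9, 8, 5), nrow = 1,
                   dimnames = list(NULL, colnames(x_train)))
predict(glm_model, newx = new_case, type = "response", s = "lambda.min")
predict(svm_model, new_case)
predict(ann_model, new_case, type = "raw")
```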

Additional ML techniques

Reducing prediction error: the case for ensembles

When working to maximise the performance of a predictive model, it can be beneficial to group different algorithms together to create a more robust prediction in a process known as ensemble learning [ 24 ]. There are too many ensemble techniques to adequately summarize here, but more information can be found in Ref. [ 23 ].

The principle of ensemble learning can be demonstrated using an un-weighted voting algorithm written in R. The code in Fig.  22 demonstrates the process of developing both an averaging and a voting algorithm.

Figure 22. Create predictions from the ensemble
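
A sketch of both ensembles, built from the prediction vectors created earlier:

```r
# Averaging ensemble: mean of the three predicted probabilities
pred_y_avg <- (pred_y_glm + pred_y_svm + pred_y_ann) / 3

# Un-weighted voting ensemble: majority vote of the three thresholded predictions
votes <- round(pred_y_glm) + round(pred_y_svm) + round(pred_y_ann)
pred_y_vote <- as.numeric(votes >= 2)
```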

Natural language processing

Another common use for classification algorithms is in Natural Language Processing (NLP), the branch of ML in which computers are taught to interpret linguistic data. One popular example of NLP is in sentiment analysis, which involves ML algorithms trained to classify texts into different categories relating to the sentiment they convey; usually positive, negative, or neutral. We will give an overview of how features can be extracted from text and then used in the framework we have introduced above.

A linguistic dataset (also known as a corpus ) comprises a number of distinct documents . The documents can be broken down into smaller tokens of text, such as the individual words contained within. These tokens can be used as the features in a ML analysis as demonstrated above. In such an analysis, we arrange the x_train matrix such that the rows represent the individual documents and the tokenized features are represented in the columns. This arrangement for linguistic analysis is known as a term-document matrix (TDM).

In its most basic form, each row of the TDM represents a simple count of the words which were used in a document. In this case, the width of the TDM is equal to the number of unique words in the entire corpus and, for each document, the value of any given cell will be 0 if the word does not appear in that document or 1 if it does. Arranging documents this way leads to two issues: firstly, the majority of the matrix likely contains null values (an issue known as sparsity ); and secondly, many of the documents contain the most common words in a language (e.g., “the”, “a”, or “and”), which are not very informative in analysis. Refining the TDM using term frequency-inverse document frequency (TF-IDF) weighting can reduce the value of common words in the matrix, which may be less informative, and increase the value of less common words, which may be more informative. It is also possible to remove uninformative words using a pre-defined dictionary known as a stop words dictionary.

In a TDM, words can be tokenized individually, known as unigrams , or as groups of sequential words, known as n-grams , where n is the number of words extracted in the token (i.e., bi-gram or tri-gram extraction). Such extraction can mitigate issues caused by grammatical nuances such as negation (e.g., “I never said she stole my money.”). Some nuances are more difficult to analyse robustly, especially those used commonly in spoken language, such as emphasis or sarcasm. For example, the sentence above about the stolen money could have at least seven different meanings depending on where the emphasis was placed.

A TDM can be easily developed in R using the tools provided in the tm package. In Table  4 , we demonstrate a simple unigram (single-word) TDM without TF-IDF weighting.

The code in Fig.  23 demonstrates the process of creating a term-document matrix for a vector of open-text comments called ’comments’. Modifications are made to the open-text comments, including the removal of punctuation and weighting using the TF-IDF technique. The final matrix, which is saved to an object named ’x’, could then be linked to a vector of outcomes ‘y’ and used to train and validate machine learning algorithms using the process described above.

Figure 23. Create a term document matrix
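
A sketch of this step; the text mentions removing punctuation and TF-IDF weighting, while the lower-casing and stop-word removal are common pre-processing steps that we assume here.

```r
library(tm)
# 'comments' is a character vector of open-text comments, as in the text
corpus <- VCorpus(VectorSource(comments))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # stop-words dictionary
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))
x <- as.matrix(dtm)  # rows are documents, columns are tokens; ready to bind to outcomes 'y'
```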

Once created, documents in the TDM can be combined with a vector of outcomes using the cbind() function, as shown in Table  4 , and processed in the same way as demonstrated in Fig.  7 . Interested readers can explore the informative tm package documentation to learn more about term-document matrices [ 31 ].

When trained on a proportion of the dataset, the three algorithms were able to classify cell nuclei in the remainder of the dataset with high accuracy (.94–.96), sensitivity (.97–.99), and specificity (.85–.94). Though each algorithm performed well individually, maximum accuracy (.96) and area under the curve (.97) were achieved using the SVM algorithm (see Table  3 ).

Model performance was marginally increased when the three algorithms were arranged into a voting ensemble, with an overall accuracy of .97, sensitivity of .99, and specificity of .95 (see the attached R code for further details).

Machine learning has the potential to transform the way that medicine works [ 32 ]; however, increased enthusiasm has hitherto not been met by increased access to training materials aimed at the knowledge and skill sets of medical practitioners.

In this paper, we introduce basic ML concepts within a context which medical researchers and clinicians will find familiar and accessible. We demonstrate three commonly-used algorithms, a regularized general linear model, a support vector machine (SVM), and an artificial neural network, used to classify tumour biopsies with high accuracy as either benign or malignant. Our results show that all three algorithms can perform with high accuracy, sensitivity, and specificity despite substantial differences in the way that they work. The best-performing algorithm, the SVM, is very similar to the method demonstrated by Wolberg and Mangasarian, who used different versions of the same dataset with fewer observations to achieve similar results [ 18 , 33 ]. It is noteworthy that the LASSO-regularized linear regression also performed exceptionally well whilst preserving the ability to understand which features were guiding the predictions (see Table  5 ). In contrast, the archetypal ’black box’ of the heavily-parametrized neural network could not improve classification accuracy.

In parallel to our analysis, we demonstrate techniques which can be applied with commonly-used and open-source software (the R environment) and which do not require prior experience with command-line computing. The presented code is designed to be re-usable and easily adaptable, so that readers may apply these techniques to their own datasets. With some modification, the same code may be used to develop linguistic classifiers or object recognition algorithms using open-text or image-based data respectively. Though the R environment now provides many options for advanced ML analyses, including deep learning, the framework of the code can be easily translated to other programming languages, such as Python, if desired. After working through the examples in this paper, we suggest that users apply their knowledge to problems within their own datasets. Doing so will elucidate the specific issues which need to be overcome and will form a foundation for continued learning in this area. Further information can be found in any number of excellent textbooks, websites, and online courses. Additional practice datasets can be obtained from the University of California Irvine Machine Learning Repository, which, at the time of writing, includes an additional 334 datasets suitable for classification tasks, including 35 which contain open-text data [ 17 ].

Further, this paper acts to demystify ML and endow clinicians and researchers without previous ML experience with the ability to critically evaluate these techniques. This is particularly important because, without a clear understanding of the way in which algorithms are trained, medical practitioners are at risk of relying too heavily on tools which might not always perform as expected. In their paper demonstrating a multi-surface pattern separation technique using a similar dataset, Wolberg and Mangasarian stress the importance of training algorithms on data which do not themselves contain errors; their model was unable to achieve perfect performance because a sample in the dataset appeared to have been incorrectly extracted from an area beyond the tumour. The oft-told parable of the failure of the Google Flu Trends model offers an accessible example of the risks and consequences posed by a lack of understanding of ML models deployed ostensibly to improve health [ 34 ]. In short, the Google Flu Trends model was not generalizable over time, as the Google Search data it was trained on were temporally sensitive. Looking to applications of ML beyond the medical field offers further insight into some of the risks that these algorithms might engender. For example, concerns have been raised about predictive policing algorithms and, in particular, the risk of entrenching certain prejudices in an algorithm which may be apparent in police practice. Though the evidence of whether predictive policing algorithms lead to biases in practice is unclear [ 35 ], it stands to reason that if biases exist in routine police work, then models taught to recognize patterns in routinely collected data would have no means to exclude these biases when making predictions about future crime risk. Similar bias-based risks have been identified in some areas of medical practice and, if left unchecked, threaten the ethical use of data-driven automation in those areas [ 36 ]. An understanding of the way ML algorithms are trained is essential to minimize and mitigate the risks of entrenching biases in predictive algorithms in medicine.

The approach which we have taken in this paper entails some notable strengths and weaknesses. We have chosen to use a publicly-available dataset which contains a relatively small number of inputs and cases. The data are arranged in such a way as to allow those trained in medical disciplines to easily draw parallels between familiar statistical and novel ML techniques. Additionally, the compact dataset enables short computational times on almost all modern computers. A caveat of this approach is that many of the nuances and complexities of ML analyses, such as sparsity or high dimensionality, are not well represented in the data. Despite the omission of these common features of a ML dataset, we are confident that users who have worked through the examples given here with the code provided in the appendix will be well placed to further develop their skills working on more complex datasets using the scalable code framework which we provide. These data also usefully demonstrate an important principle of ML: more complex algorithms do not necessarily beget more useful predictions.

We look toward a future of medical research and practice greatly enhanced by the power of ML. In the provision of this paper, we hope that the enthusiasm for new and transformative ML techniques is tempered by a critical appreciation for the way in which they work and the risks that they could pose.

Abbreviations

ANN: Artificial neural network

AUC: Area under the curve

FNA: Fine needle aspiration

GLM: Generalized linear model

IDE: Integrated developer environment

LASSO: Least absolute shrinkage and selection operator

RBF: Radial basis function

ROC: Receiver operating characteristic

SVM: Support vector machine

TF-IDF: Term frequency-inverse document frequency

TDM: Term document matrix

t-SNE: t-distributed stochastic neighbour embedding

UCI: University of California, Irvine

References

1. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science. 2015; 349(6245):255–60. https://doi.org/10.1126/science.aaa8415.

2. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017; 542(7639):115–8. https://doi.org/10.1038/nature21056.

3. Anderson J, Parikh J, Shenfeld D. Reverse engineering and evaluation of prediction models for progression to type 2 diabetes: application of machine learning using electronic health records. J Diabetes. 2016.

4. Ong M-S, Magrabi F, Coiera E. Automated identification of extreme-risk events in clinical incident reports. J Am Med Inform Assoc. 2012; 19(e1):e110–8.

5. Greaves F, Ramirez-Cano D, Millett C, Darzi A, Donaldson L. Use of sentiment analysis for capturing patient experience from free-text comments posted online. J Med Internet Res. 2013; 15(11):239. https://doi.org/10.2196/jmir.2721.

6. Hawkins JB, Brownstein JS, Tuli G, Runels T, Broecker K, Nsoesie EO, McIver DJ, Rozenblum R, Wright A, Bourgeois FT, Greaves F. Measuring patient-perceived quality of care in US hospitals using Twitter. BMJ Qual Saf. 2016; 25(6):404–13. https://doi.org/10.1136/bmjqs-2015-004309.

7. Gibbons C, Richards S, Valderas JM, Campbell J. Supervised machine learning algorithms can classify open-text feedback of doctor performance with human-level accuracy. J Med Internet Res. 2017; 19(3):65. https://doi.org/10.2196/jmir.6533.

8. Wagland R, Recio-Saucedo A, Simon M, Bracher M, Hunt K, Foster C, Downing A, Glaser A, Corner J. Development and testing of a text-mining approach to analyse patients’ comments on their experiences of colorectal cancer care. BMJ Qual Saf. 2015. https://doi.org/10.1136/bmjqs-2015-004063.

9. Bedi G, Carrillo F, Cecchi GA, Slezak DF, Sigman M, Mota NB, Ribeiro S, Javitt DC, Copelli M, Corcoran CM. Automated analysis of free speech predicts psychosis onset in high-risk youths. npj Schizophr. 2015; 1(1):15030. https://doi.org/10.1038/npjschz.2015.30.

10. Friedman CP, Wong AK, Blumenthal D. Achieving a nationwide learning health system. Sci Transl Med. 2010; 2(57):57cm29.

11. Beam A, Kohane I. Big data and machine learning in health care. J Am Med Assoc. 2018; 319(13):1317–8.

12. Lei T, Barzilay R, Jaakkola T. Rationalizing neural predictions. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16); 2016. p. 1135–44. https://doi.org/10.1145/2939672.2939778.

13. Mangasarian OL, Street WN, Wolberg WH. Breast cancer diagnosis and prognosis via linear programming. AAAI; 1994. p. 83–6.

14. Jolliffe I. Principal component analysis. In: Wiley StatsRef: Statistics Reference Online. Chichester: John Wiley & Sons, Ltd; 2014.

15. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003; 3(Jan):993–1022.

16. van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008; 9(Nov):2579–605.

17. Lichman M. UCI Machine Learning Repository: Breast Cancer Wisconsin (Diagnostic) Data Set. 2014. http://archive.ics.uci.edu/ml. Accessed 8 Aug 2017.

18. Wolberg WH, Mangasarian OL. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc Natl Acad Sci USA. 1990; 87:9193–6.

19. Bennett KP. Decision tree construction via linear programming. University of Wisconsin-Madison Department of Computer Sciences; 1992. p. 97–101.

20. Kosinski M, Stillwell D, Graepel T. Private traits and attributes are predictable from digital records of human behavior. Proc Natl Acad Sci. 2013; 110(15):5802–5. https://doi.org/10.1073/pnas.1218772110.

21. Banerjee S, Zare RN, Tibshirani RJ, Kunder CA, Nolley R, Fan R, Brooks JD, Sonn GA. Diagnosis of prostate cancer by desorption electrospray ionization mass spectrometric imaging of small metabolites and lipids. Proc Natl Acad Sci USA. 2017; 114(13):3334–9. https://doi.org/10.1073/pnas.1700677114.

22. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B. 2005; 67:301–20.

23. Efron B, Hastie T. Computer age statistical inference. 1st ed. Cambridge: Cambridge University Press; 2016.

24. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. New York: Springer Series in Statistics; 2001.

25. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995; 20(3):273–97. https://doi.org/10.1007/BF00994018.

26. Hecht-Nielsen R. Theory of the backpropagation neural network. In: Proceedings of the International Joint Conference on Neural Networks; 1989. p. 593–605. https://doi.org/10.1109/IJCNN.1989.118638.

27. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems; 2012. p. 1097–105.

28. Dahl GE, Sainath TN, Hinton GE. Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing; 2013. p. 8609–13. https://doi.org/10.1109/ICASSP.2013.6639346.

29. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986; 327(8476):307–10. https://doi.org/10.1016/S0140-6736(86)90837-8.

30. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982; 143(1):29–36. https://doi.org/10.1148/radiology.143.1.7063747.

31. Meyer D, Hornik K, Feinerer I. Text mining infrastructure in R. J Stat Softw. 2008; 25(5):1–54.

32. Darcy AM, Louie AK, Roberts LW. Machine learning and the profession of medicine. J Am Med Assoc. 2016; 315(6):551. https://doi.org/10.1001/jama.2015.18421.

33. Wolberg WH, Street WN, Mangasarian OL. Machine learning techniques to diagnose breast cancer from image-processed nuclear features of fine needle aspirates. Cancer Lett. 1994; 77(2-3):163–71. https://doi.org/10.1016/0304-3835(94)90099-X.

34. Lazer D, Kennedy R, King G, Vespignani A. The parable of Google Flu: traps in big data analysis. Science. 2014; 343(6176):1203–5. https://doi.org/10.1126/science.1248506.

35. Brantingham PJ, Valasik M, Mohler GO. Does predictive policing lead to biased arrests? Results from a randomized controlled trial. Stat Public Policy. 2018; 5(1):1–6. https://doi.org/10.1080/2330443X.2018.1438940.

36. Haider AH, Chang DC, Efron DT, Haut ER, Crandall M, Cornwell EE. Race and insurance status as risk factors for trauma mortality. Arch Surg. 2008; 143(10):945. https://doi.org/10.1001/archsurg.143.10.945.


Acknowledgments

We acknowledge and thank the investigators, scientists, and developers who have contributed to the scientific community by making their data, code, and software freely available. We thank our colleagues in Cambridge, Boston, and beyond who provided critical insight into this work.

CSG was funded by National Institute for Health Research Trainees Coordinating Centre Fellowships (NIHR-PDF-2014-07-028 and NIHR-CDF-2017-10-19). The funders had no role in the design or execution of this study.

Availability of data and materials

In this manuscript we use de-identified data from a public repository [ 17 ]. The data are included on the BMC Med Res Method website. As such, ethical approval was not required.

Author information

Authors and Affiliations

Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, UK

Jenni A. M. Sidey-Gibbons

Department of Surgery, Harvard Medical School, 25 Shattuck Street, Boston, 01225, Massachusetts, USA

Chris J. Sidey-Gibbons

Department of Surgery, Brigham and Women’s Hospital, 75 Francis Street, Boston, 01225, Massachusetts, USA

University of Cambridge Psychometrics Centre, Trumpington Street, Cambridge, CB2 1AG, UK


Contributions

JSG contributed to the conception and design of the work, interpretation of data and presentation of results, and drafted the manuscript. CSG contributed to the conception and design of the work, conducted the analyses, and drafted the manuscript. Both JSG and CSG approve of the final versions and agree to be accountable for their own contributions. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Chris J. Sidey-Gibbons .

Ethics declarations

Ethics approval and consent to participate; consent for publication

All contributing parties consent for the publication of this work.

Competing interests

The authors report no competing interests relating to this work.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1.

Breast Cancer Wisconsin Dataset. Anonymised dataset used in this work. (CSV 24.9 kb)

Additional file 2

R Markdown Supplementary Material. R Code accompanying the work described in this paper and its output. (PDF 207 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Sidey-Gibbons, J., Sidey-Gibbons, C. Machine learning in medicine: a practical introduction. BMC Med Res Methodol 19 , 64 (2019). https://doi.org/10.1186/s12874-019-0681-4


Received: 11 June 2018

Accepted: 14 February 2019

Published: 19 March 2019

DOI: https://doi.org/10.1186/s12874-019-0681-4


Keywords

  • Medical informatics
  • Classification
  • Supervised machine learning
  • Programming languages
  • Computer-assisted
  • Decision making




  • Review Article
  • Published: 04 July 2022

Shifting machine learning for healthcare from development to deployment and from models to data

  • Angela Zhang   ORCID: orcid.org/0000-0003-0906-6770 1 , 2 , 3 , 4 ,
  • Lei Xing   ORCID: orcid.org/0000-0003-2536-5359 5 ,
  • James Zou   ORCID: orcid.org/0000-0001-8880-4764 4 , 6 &
  • Joseph C. Wu   ORCID: orcid.org/0000-0002-6068-8041 1 , 3 , 7 , 8  

Nature Biomedical Engineering volume  6 ,  pages 1330–1345 ( 2022 ) Cite this article

42k Accesses

26 Citations

216 Altmetric

Metrics details

  • Biomedical engineering
  • Computational science
  • Machine learning
  • Medical imaging
  • Translational research

In the past decade, the application of machine learning (ML) to healthcare has helped drive the automation of physician tasks as well as enhancements in clinical capabilities and access to care. This progress has emphasized that, from model development to model deployment, data play central roles. In this Review, we provide a data-centric view of the innovations and challenges that are defining ML for healthcare. We discuss deep generative models and federated learning as strategies to augment datasets for improved model performance, as well as the use of the more recent transformer models for handling larger datasets and enhancing the modelling of clinical text. We also discuss data-focused problems in the deployment of ML, emphasizing the need to efficiently deliver data to ML models for timely clinical predictions and to account for natural data shifts that can deteriorate model performance.


In the past decade, machine learning (ML) for healthcare has been marked by particularly rapid progress. Initial groundwork has been laid for many healthcare needs that promise to improve patient care, reduce healthcare workload, streamline healthcare processes and empower the individual 1 . In particular, ML for healthcare has been successful in the translation of computer vision through the development of image-based triage 2 and second readers 3 . There has also been rapid progress in the harnessing of electronic health records 4 , 5 (EHRs) to predict the risk and progression of many diseases 6 , 7 . A number of software platforms for ML are beginning to make their way into the clinic 8 . In 2018, IDx-DR, which detects diabetic retinopathy, became the first ML system for healthcare that the United States Food and Drug Administration approved for clinical use 8 . Babylon 9 , a chatbot triage system, has partnered with the United Kingdom's National Health Service. Furthermore, Viz.ai 10 , 11 has rolled out its triage technology to more than 100 hospitals in the United States.

As ML systems begin to be deployed in clinical settings, the defining challenge of ML in healthcare has shifted from model development to model deployment. In bridging the gap between the two, another trend has emerged: the importance of data. We posit that large, well-designed, well-labelled, diverse and multi-institutional datasets drive performance in real-world settings far more than model optimization 12 , 13 , 14 , and that these datasets are critical for mitigating racial and socioeconomic biases 15 . We realize that such rich datasets are difficult to obtain, owing to clinical limitations of data availability, patient privacy and the heterogeneity of institutional data frameworks. Similarly, as ML healthcare systems are deployed, the greatest challenges in implementation arise from problems with the data: how to efficiently deliver data to the model to facilitate workflow integration and make timely clinical predictions? Furthermore, once implemented, how can model robustness be maintained in the face of the inevitability of natural changes in physician and patient behaviours? In fact, the shift from model development to deployment is also marked by a shift in focus: from models to data.

In this Review, we build on previous surveys 1 , 16 , 17 and take a data-centric approach to reviewing recent innovations in ML for healthcare. We first discuss deep generative models and federated learning as strategies for creating larger and enhanced datasets. We also examine the more recent transformer models for handling larger datasets. We end by highlighting the challenges of deployment, in particular, how to process and deliver usable raw data to models, and how data shifts can affect the performance of deployed models.

Deep generative models

Generative adversarial networks (GANs) are among the most exciting innovations in deep learning in the past decade. They offer the capability to create large amounts of synthetic yet realistic data. In healthcare, GANs have been used to augment datasets 18 , alleviate the problems of privacy-restricted 19 and unbalanced datasets 20 , and perform image-modality-to-image-modality translation 21 and image reconstruction 22 (Fig. 1 ). GANs aim to model and sample from the implicit density function of the input data 23 . They consist of two networks that are trained in an adversarial process under which one network, the ‘generator’, generates synthetic data while the other network, the ‘discriminator’, discriminates between real and synthetic data. The generative model aims to implicitly learn the data distribution from a set of samples to further generate new samples drawn from the learned distribution, while the discriminator pushes the generator network to sample from a distribution that more closely mirrors the true data distribution.
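For concreteness, this adversarial training procedure can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the fully connected architectures, dimensions and hyperparameters are assumptions made for brevity, not choices drawn from any of the studies cited in this Review.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 128

# Generator: maps latent noise to synthetic samples.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)

# Discriminator: outputs a raw logit scoring 'real' vs 'synthetic'.
discriminator = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_batch):
    n = real_batch.size(0)
    real_labels, fake_labels = torch.ones(n, 1), torch.zeros(n, 1)

    # 1) Discriminator step: learn to separate real from generated samples.
    fake = generator(torch.randn(n, latent_dim)).detach()  # block generator grads
    loss_d = (bce(discriminator(real_batch), real_labels) +
              bce(discriminator(fake), fake_labels))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator step: produce samples the discriminator labels as real.
    loss_g = bce(discriminator(generator(torch.randn(n, latent_dim))), real_labels)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```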

figure 1

a , GANs can be used to augment datasets to increase model performance and anonymize patient data. For example, they have been used to generate synthetic images of benign and malignant lesions from real images 183 . b , GANs for translating images acquired with one imaging modality into another modality 51 . Left to right: input CT image, generated MR image and reference MR image. c , GANs for the denoising and reconstruction of medical images 184 . Left, low-dose CT image of a patient with mitral valve prolapse, serving as the input into the GAN. Right, corresponding routine-dose CT image and the target of the GAN. Middle, GAN-generated denoised image resembling that obtained from routine-dose CT imaging. The yellow arrows indicate a region that is distinct between the input image (left) and the target denoised image (right). d , GANs for image classification, segmentation and detection 39 . Left, input image of T2 MRI slice from the multimodal brain-tumour image-segmentation benchmark dataset. Middle, ground-truth segmentation of the brain tumour. Right, GAN-generated segmentation image. Yellow, segmented tumour; blue, tumour core; and red, Gd-enhanced tumour core. e , GANs can model a spectrum of clinical scenarios and predict disease progression 66 . Top: given an input MR image (denoted by the arrow), DaniGAN can generate images that reflect neurodegeneration over time. Bottom, difference between the generated image and the input image. ProGAN, progressive growing of generative adversarial network; DaniNet, degenerative adversarial neuroimage net. Credit: Images (‘Examples’) reproduced with permission from: a , ref. 183 , Springer Nature Ltd; b , ref. 51 , under a Creative Commons licence CC BY 4.0 ; c , ref. 184 , Wiley; d , ref. 39 , Springer Nature Ltd; e , ref. 66 , Springer Nature Ltd.

Over the years, a multitude of GANs have been developed to overcome the limitations of the original GAN (Table 1 ), and to optimize its performance and extend its functionalities. The original GAN 23 suffered from unstable training and low image diversity and quality 24 . In fact, training two adversarial models is, in practice, a delicate and often difficult task. The goal of training is to achieve a Nash equilibrium between the generator and the discriminator networks. However, simultaneously obtaining such an equilibrium for networks that are inherently adversarial is difficult and, if achieved, the equilibrium can be unstable (that is, it can be suddenly lost after model convergence). This has also led to sensitivity to hyperparameters (making the tuning of hyperparameters a precarious endeavour) and to mode collapse, which occurs when the generator produces a limited and repeated number of outputs. To remedy these limitations, changes have been made to GAN architectures and loss functions. In particular, the deep convolutional GAN (DCGAN 25 ), a popular GAN often used for medical-imaging tasks, aimed to combat instability by introducing key architecture-design decisions, including the replacement of fully connected layers with convolutional layers, and the introduction of batch normalization (to standardize the inputs to a layer when training deep neural networks) and ReLU (rectified linear unit) activation. The Laplacian pyramid of adversarial networks GAN (LAPGAN 26 ) and the progressively growing GAN (ProGAN 27 ) build on DCGAN to improve training stability and image quality. Both LAPGAN and ProGAN start with a small image, which promotes training stability, and progressively grow the image into a higher-resolution image.
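A minimal sketch of a DCGAN-style generator may help make these design decisions concrete: transposed convolutional layers replace fully connected ones, with batch normalization and ReLU activations between them. The channel sizes and the 32 x 32 output resolution below are illustrative assumptions; medical-imaging GANs typically target higher resolutions.

```python
import torch.nn as nn

# Input: a latent vector of shape (batch, 100, 1, 1).
dcgan_generator = nn.Sequential(
    # Project the latent vector to a 4x4 feature map.
    nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(512), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),  # 4x4  -> 8x8
    nn.BatchNorm2d(256), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),  # 8x8  -> 16x16
    nn.BatchNorm2d(128), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 1, 4, 2, 1, bias=False),    # 16x16 -> 32x32
    nn.Tanh(),  # single-channel synthetic image scaled to [-1, 1]
)
```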

The conditional GAN (cGAN 28 ) and the auxiliary classifier GAN (AC-GAN 29 ) belong to a subtype of GANs that enable the model to be conditioned on external information to create synthetic data of a specific class or condition. This was found to improve the quality of the generated samples and increase the capability to handle the generation of multimodal data. The pix2pix GAN 30 , which is conditioned on images, allows for image-to-image translation (also across imaging modalities) and has been popular in healthcare applications.
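Mechanically, class conditioning is simple: an embedding of the class label is concatenated with the latent noise vector (and, symmetrically, with the discriminator's input), so the generator can be asked for a specific class, such as a particular lesion type. A minimal sketch with hypothetical dimensions:

```python
import torch
import torch.nn as nn

n_classes, latent_dim = 3, 64  # e.g., three lesion classes; illustrative sizes

label_embedding = nn.Embedding(n_classes, n_classes)

def conditional_input(z, labels):
    # z: (batch, latent_dim) noise; labels: (batch,) integer class ids.
    # Concatenating the label embedding conditions the generator on the class.
    return torch.cat([z, label_embedding(labels)], dim=1)

z = torch.randn(8, latent_dim)
labels = torch.randint(0, n_classes, (8,))
g_input = conditional_input(z, labels)  # shape: (8, latent_dim + n_classes)
```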

A recent major architectural change to GANs involves attention mechanisms. Attention was first introduced to facilitate language translation and has rapidly become a staple in deep-learning models, as it can efficiently capture longer-range global and spatial relations from input data. The incorporation of attention into GANs has led to the development of self-attention GANs (SAGANs) 31 , 32 and BigGAN 33 ; the latter scales up SAGAN to achieve state-of-the-art performance.

Another primary strategy to mitigate the limitations of GANs involves improving the loss function. Early GANs used the Jensen-Shannon divergence and the Kullback-Leibler divergence as loss functions to minimize the difference between the distributions of the synthetic data and the real data. However, the Jensen-Shannon divergence fails in scenarios where there is little or no overlap between the distributions, while minimizing the Kullback-Leibler divergence can lead to mode collapse. To address these problems, a number of GANs have used alternative loss functions. The most popular are arguably the Wasserstein GAN (WGAN 34 ) and the Wasserstein GAN with gradient penalty (WGAN-GP 35 ). The Wasserstein distance measures the cost of transforming one distribution into the other and has been shown to provide smoother gradients. Additional popular strategies that improve GAN performance without modifying the model architecture include spectral normalization and varying how frequently the discriminator is updated with respect to the generator.
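As an illustration, the WGAN-GP penalty can be written compactly: the critic's gradient norm is pushed towards 1 on random interpolations between real and synthetic samples. The sketch below assumes vector-valued (batch, features) inputs and uses a penalty coefficient of 10, a commonly cited default; both are illustrative.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    # Random interpolation between each real and synthetic sample.
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    # Gradient of the critic's output with respect to the interpolated input.
    grads, = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,  # needed so the penalty itself is differentiable
    )
    # Penalize deviation of the gradient norm from 1.
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```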

The explosive progress of GANs has spawned many more offshoots of the original GAN, as documented by the diverse models that now populate the GAN Model Zoo 36 .

Augmenting datasets

In the past decade, many deep-learning models for medical-image classification 3 , 37 , segmentation 38 , 39 and detection 40 have achieved physician-level performance. However, the success of these models is ultimately beholden to large, diverse, balanced and well-labelled datasets. This is a bottleneck that extends across domains, yet it is particularly restrictive in healthcare applications, where collecting comprehensive datasets comes with unique obstacles. In particular, large amounts of standardized clinical data are difficult to obtain, and this is exacerbated by the reality that clinical data often reflect the patient population of one or a few institutions (with the data sometimes overrepresenting common diseases or healthy populations, making the sampling of rarer conditions more difficult). Datasets with high class imbalance or insufficient variability can lead to poor model performance, generalization failures, unintentional modelling of confounders 41 and propagation of biases 42 . To mitigate these problems, clinical datasets can be augmented using standard data-manipulation techniques, such as the flipping, rotation, scaling and translation of images 43 . However, these methods yield limited performance gains and generate highly correlated training data.
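These standard manipulations are typically a few lines in modern frameworks. The torchvision-based sketch below covers flipping, rotation, scaling and translation; the parameter ranges are illustrative and would need clinical validation so that augmented images remain anatomically plausible.

```python
from torchvision import transforms

# Standard data-manipulation augmentations; ranges are illustrative.
standard_augmentation = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomAffine(
        degrees=10,              # small rotations
        translate=(0.05, 0.05),  # small translations
        scale=(0.95, 1.05),      # mild rescaling
    ),
    transforms.ToTensor(),
])
```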

GANs offer potent solutions to these problems. GANs can be used to augment training data to improve model performance. For example, a convolutional neural network (CNN) for the classification of liver lesions, trained on both synthetically and traditionally augmented data, boosted the performance of the model by 10% with respect to a CNN trained on only traditionally augmented datasets 18 . Moreover, when generating synthetic data across data classes, developing a generator for each class can result in higher model performance 20 , 44 , as was shown via the comparison of two variants of GANs (a DCGAN that generated labelled examples for each of three lesion classes separately and an AC-GAN that incorporated class conditioning to generate labelled examples) 18 .

The aforementioned studies involved class-balanced datasets but did not address medical data with either simulated or real class imbalances. In an assessment of the capability of GANs to alleviate the shortcomings of unbalanced chest-X-ray datasets 20 , it was found that a classifier trained on real unbalanced datasets augmented with DCGAN-generated data outperformed models trained on the unbalanced and balanced versions of the original dataset. Although there was an increase in classification accuracy across all classes, the greatest increase in performance was seen in the most imbalanced classes (pneumothorax and oedema), which had just one-fourth the number of training cases of the next-smallest class.

Protecting patient privacy

The protection of patient privacy is often a leading concern when developing clinical datasets 45 . Sharing patient data when generating multi-institution clinical datasets can pose a risk to patient privacy 46 . Even if privacy protocols are followed, patient characteristics can sometimes be inferred from the ML model and its outputs 47 , 48 . In this regard, GANs may provide a solution. Data created by GANs cannot be attributed to a single patient, as they synthesize data that reflect the patient population in aggregate. GANs have thus been used as a patient-anonymization tool to generate synthetic data for model training 9 , 49 . Although models trained on just synthetic data can perform poorly, models trained on synthetic data and fine-tuned with 10% real data resulted in similar performance to models trained on real datasets augmented with synthetic data 19 . Similarly, using synthetic data generated from GANs to train an image-segmentation model was sufficient to achieve 95% of the accuracy of the same model trained on real data 49 . Hence, using synthetic data during model development can mitigate potential patient-privacy violations.

Image-to-image translation

One exciting use of GANs involves image-to-image translation. In healthcare, this capability has been used to translate between imaging modalities—between computed tomography (CT) and magnetic resonance (MR) images 21 , 49 , 50 , 51 , between CT and positron emission tomography (PET) 52 , 53 , 54 , between MR and PET 55 , 56 , 57 , and between T1 and T2 MR images 58 , 59 . Translation between imaging modalities can reduce the need for additional costly and time-intensive image acquisitions, can be used in scenarios where imaging is not possible (as is the case for MR imaging in individuals with metal implants) and can expand the types of training data that can be created from existing image datasets. There are two predominant strategies for image-to-image translation: paired-image training (with pix2pix 30 ) and unpaired training (with CycleGAN 60 ). For example, pix2pix was used to generate synthetic CT images for accurate MR-based dose calculations for the pelvis 61 . Similarly, using paired magnetic resonance angiography and MR images, pix2pix was modified to generate a model for the translation of T1 and T2 MR images to retrospectively inspect vascular structures 62 .

Obtaining paired images can be difficult in scenarios involving moving organs or multimodal medical images that are in three dimensions and do not have cross-modality paired data. In such cases, one can use CycleGAN 60 , which handles image-to-image translation on unpaired images. A difficulty with unpaired images is the lack of ground-truth labels for evaluating the accuracy of the predictions (yet real cardiac MR images have been used to compare the performance of segmentation models trained on synthetic cardiac MR images translated from CT images 49 ). Another common problem is the need to avoid geometric distortions that destroy anatomical structures. Limitations with geometric distortions can be overcome by using two auxiliary mappings to constrain the geometric invariance of synthetic data 21 .
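The core of CycleGAN is its cycle-consistency constraint: translating an image to the other modality and back should reproduce the original, which is what removes the need for paired images. A minimal sketch of that constraint follows, assuming two image-to-image generators defined elsewhere (G for CT-to-MR and F for MR-to-CT); the loss weight is illustrative.

```python
import torch.nn.functional as F_nn

def cycle_consistency_loss(G, F, ct_batch, mr_batch, lam=10.0):
    # CT -> synthetic MR -> reconstructed CT, and the reverse cycle.
    ct_reconstructed = F(G(ct_batch))
    mr_reconstructed = G(F(mr_batch))
    # L1 distance between each original image and its round-trip reconstruction.
    return lam * (F_nn.l1_loss(ct_reconstructed, ct_batch) +
                  F_nn.l1_loss(mr_reconstructed, mr_batch))
```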

Opportunities

In the context of clinical datasets, GANs have primarily been used to augment or balance the datasets, and to preserve patient privacy. Yet a burgeoning application of GANs is their use to systematically explore the entire terrain of clinical scenarios and disease presentations. Indeed, GANs can be used to generate synthetic data to combat model deterioration in the face of domain shifts 63 , 64 , for example, by creating synthetic data that simulate variable lighting or camera distortions, or that imitate data collected from devices from different vendors or from different imaging modalities. Additionally, GANs can be used to create data that simulate the full spectrum of clinical scenarios and disease presentations, from dangerous and rare clinical scenarios such as incorrect surgery techniques 63 , to modelling the spectrum of brain-tumour presentation 19 , to exploring the disease progression of neurodegenerative diseases 65 , 66 .

However, GANs can suffer from training instability and low image diversity and quality. These limitations could hamper the deployment of GANs in clinical practice. For example, one hope for image-to-image translation in healthcare involves the creation of multimodality clinical images (from CT and MR, for example) for scenarios in which only one imaging modality is possible. However, GANs are currently limited in the size and quality of the images that they can produce. This raises the question of whether these images can realistically be used clinically when medical images are typically generated at high resolution. Moreover, there may be regulatory hurdles involved in approving ML healthcare models that have been trained on synthetic data. This is further complicated by the current inability to robustly evaluate and control the quality of GANs and of the synthetic data that they generate 67 . Still, in domains unrelated to healthcare, GANs have been used to make tangible improvements to deployed models 68 . These successes may lay a foundation for the real-world application of GANs in healthcare.

Federated learning

When using multi-institutional datasets, model training is typically performed centrally: data siloed in individual institutions are aggregated into a single server. However, data used in such ‘centralized training’ represent a fraction of the vast amount of clinical data that could be harnessed for model development. Yet, openly sharing and exchanging patient data is restricted by many legal, ethical and administrative constraints; in fact, in many jurisdictions, patient data must remain local.

Federated learning is a paradigm for training ML models when decentralized data are used collaboratively under the orchestration of a central server 69 , 70 (Fig. 2 ). In contrast to centralized training, where data from various locations are moved to a single server to train the model, federated learning allows for the data to remain in place. At the start of each round of training, the current copy of the model is sent to each location where the training data are stored. Each copy of the model is then trained and updated using the data at each location. The updated models are then sent from each location back to the central server, where they are aggregated into a global model. The subsequent round of training follows, the newly updated global model is distributed again, and the process is repeated until model convergence or training is stopped. At no point do the data leave a particular location or institution, and only individuals associated with an institution have direct access to its data. This mitigates concerns about privacy breaches, minimizes costs associated with data aggregation, and allows training datasets to quickly scale in size and diversity. The successful implementation of federated learning could transform how deep-learning models for healthcare are trained. Here we focus on two applications: cross-silo federated learning and cross-device federated learning (Table 2 ).

figure 2

Multiple institutions collaboratively train an ML model. Federated learning begins when each institution notifies a central server of their intention to participate in the current round of training. Upon notification, approval and recognition of the institution, the central server sends the current version of the model to the institution (step 1). Then, the institution trains the model locally using the data available to it (step 2). Upon completion of local training, the institution sends the model back to the central server (step 3). The central server aggregates all of the models that have been trained locally by each of the individual institutions into a single updated model (step 4). This process is repeated in each round of training until model training concludes. At no point during any of the training rounds do patient data leave the institution (step 5). The successful implementation of federated learning requires healthcare-specific federated learning frameworks that facilitate training, as well as institutional infrastructure for communication with the central server and for locally training the model.
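A minimal sketch of one such training round, in the style of federated averaging (FedAvg), is shown below. The `local_train` method is a hypothetical stand-in for each institution's local training step; only model weights, never patient data, cross institutional boundaries.

```python
import copy

def federated_round(global_model, institutions):
    updates, sizes = [], []
    for inst in institutions:
        local_model = copy.deepcopy(global_model)         # step 1: distribute model
        state, n_samples = inst.local_train(local_model)  # step 2: train on local data
        updates.append(state)                             # step 3: return weights only
        sizes.append(n_samples)

    # Step 4: aggregate locally trained weights, weighted by local dataset size.
    total = sum(sizes)
    new_state = {
        key: sum((n / total) * state[key].float()
                 for state, n in zip(updates, sizes))
        for key in updates[0]
    }
    global_model.load_state_dict(new_state)
    return global_model
```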

Cross-silo federated learning

Cross-silo federated learning is an increasingly attractive solution to the shortcomings of centralized training 71 . It has been used to leverage EHRs to train models to predict hospitalization due to heart disease 72 , to promote the development of ‘digital twins’ or ‘Google for patients’ 73 , and to develop a Coronavirus disease 2019 (COVID-19) chest-CT lesion segmenter 74 . Recent efforts have focused on empirically evaluating model-design parameters, and on logistical decisions to optimize model performance and overcome the unique implementation challenges of federated learning, such as bottlenecks in protecting privacy and in tackling the statistical heterogeneity of the data 75 , 76 .

One concern with federated learning, compared with centralized training, is that models may encounter more severe domain shifts or overfitting. However, models trained through federated learning were found to achieve 99% of the performance of traditional centralized training, even with imbalanced datasets or with relatively few samples per institution, showing that federated learning can be realistically implemented without sacrificing performance or generalization 77 , 78 .

Although federated learning offers greater privacy protection because patient data are no longer being transmitted, there are risks of privacy breaches 79 . Communicating model updates during the training process can reveal sensitive information to third parties or to the central server. In certain instances, data leakage can occur, such as when ML models ‘memorize’ datasets 80 , 81 , 82 and when access to model parameters and updates can be used to infer the original dataset 83 . Differential privacy 84 can further reinforce privacy protection for federated learning 70 , 85 , 86 . Selective parameter sharing 87 and the sparse vector technique 88 are two strategies for achieving greater privacy, but at the expense of model performance (this is consistent with differential-privacy findings in domains outside of medicine and healthcare 80 , 89 ).
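To illustrate how differential privacy can be layered onto this process, the sketch below clips each institution's model update and adds calibrated Gaussian noise before the update is shared, in the spirit of DP-SGD-style mechanisms. The clipping norm and noise multiplier are illustrative; a real deployment would derive them from an explicit privacy budget.

```python
import torch

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1):
    # update: dict of parameter-name -> tensor, the institution's local update.
    flat = torch.cat([p.flatten() for p in update.values()])
    scale = min(1.0, clip_norm / (flat.norm(2).item() + 1e-12))  # clip total L2 norm
    return {
        # Gaussian noise with std proportional to the clipping norm.
        name: p * scale + noise_multiplier * clip_norm * torch.randn_like(p)
        for name, p in update.items()
    }
```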

Another active area of research for federated learning in healthcare involves the handling of data that are neither independent nor identically distributed (non-IID data). Healthcare data are particularly susceptible to this problem, owing to a higher prevalence of certain diseases in certain institutions (which can cause label-distribution skew) or to institution-specific data-collection techniques (leading to ‘same label, different features’ or to ‘same features, different label’). Many federated learning strategies assume IID data, but non-IID data can pose a very real problem in federated learning; for example, it can cause the popular federated learning algorithm FedAvg 70 to fail to converge 90 . The predominant strategies for addressing this issue have involved the reframing of the data to achieve a uniform distribution (consensus solutions) or the embracing of the heterogeneity of the data 69 , 91 , 92 (pluralistic solutions). In healthcare, the focus has been on consensus solutions involving data sharing (a small subset of training data is shared among all institutions 93 , 94 ).

Cross-device federated learning to handle health data from individuals

‘ Smart’ devices can produce troves of continuous, passive and individualized health data that can be leveraged to train ML models and deliver personalized health insights for each user 1 , 16 , 39 , 95 , 96 . As smart devices become increasingly widespread, and as computing and sensor technology become more advanced and cheaper to mass-produce, the amount of health data will grow exponentially. This will accentuate the challenges of aggregating large quantities of data into a single location for centralized training and exacerbate privacy concerns (such as any access to detailed individual health data by large corporations or governments).

Cross-device federated learning was developed to address the increasing amounts of data that are being generated ‘at the edge’ (that is, by decentralized smart devices), and has been deployed on millions of smart devices; for example, for voice recognition (by Apple, for the voice assistant Siri 97 ) and to improve query suggestions (by Google, for the Android operating system 98 ).

The application of cross-device federated learning to train healthcare models for smart devices is an emerging area of research. For example, using a human-activity-recognition dataset, a global model (FedHealth) was pre-trained on 80% of the data before being deployed for local training and aggregation 99 . The aggregated model was then sent back to each user and fine-tuned on user-specific data to develop a personalized model for that user. Model personalization resolves issues arising from the highly different probability distributions that may arise between individual users and the global model. This training strategy outperformed non-federated learning by 5.3%.

Limitations and opportunities

In view of the initial promises and successes of federated learning, the next few years will be defined by progress towards the implementation of federated learning in healthcare. This will require a high degree of coordination across institutions at each step of the federated learning process. Before training, medical data will need to undergo data normalization and standardization. This can be challenging, owing to differences in how data are collected, stored, labelled and partitioned across institutions. Current data pre-processing pipelines could be adapted to create multi-institutional training datasets, yet in federated learning, the responsibility shifts from a central entity to each institution individually. Hence, methods to streamline and validate these processes across institutions will be essential for the successful implementation of federated learning.

Another problem concerns the inability of the developer of the model to directly inspect data during model development. Data inspection is critical for troubleshooting and for identifying mislabelled data as well as general trends. Tools that use GANs to create synthetic data resembling the original training data 101 , or that derive population-level summary statistics from the data (such as Federated Analytics, developed by Google 100 ), can be helpful. However, it is currently unclear whether tools that have been developed for cross-device settings can be applied to cross-silo healthcare settings while preserving institutional privacy.

Furthermore, federated learning will require robust frameworks for the implementation of federated networks. Much of this software is proprietary, and many of the open-source frameworks are intended primarily for research use. The primary concerns of federated learning can be addressed by frameworks designed to reinforce patient privacy, facilitate model aggregation and tackle the challenges of non-IID data.

One main hurdle is the need for each participating healthcare institution to acquire the necessary infrastructure. This implies ensuring that each institution has the same federated learning framework and version, that stable and encrypted network communication is available to send and receive model updates from the central server, and that the computing capabilities (institutional graphics processing units or access to cloud computing) are sufficient to train the model. Although most large healthcare institutions may have the necessary infrastructure in place, it has typically been optimized to store and handle data centrally. The adaptation of infrastructure to handle the requirements of federated learning requires coordinated effort and time.

A number of federated learning initiatives in healthcare are underway. Specifically, the Federated Tumour Segmentation Initiative (a collaboration between Intel and the University of Pennsylvania) trains lesion-segmentation models collaboratively across 29 international healthcare institutions 102 . This initiative focuses on finding the optimal algorithm for model aggregation, as well as on ways to standardize training data from various institutions. In another initiative (a collaboration of NVIDIA and several institutions), federated learning was used to train mammography-classification models 103 . These efforts may establish blueprints for coordinated federated networks applied to healthcare.

Natural language processing

Harnessing natural language processing (NLP)—the automated understanding of text—has been a long-standing goal for ML in healthcare 1 , 16 , 17 . NLP has enabled the automated translation of doctor–patient interactions to notes 5 , 104 , 105 , the summarization of clinical notes 106 , the captioning of medical images 107 , 108 and the prediction of disease progression 6 , 7 . However, the inability to efficiently train models using the large datasets needed to achieve adept natural-language understanding has limited progress. In this section, we provide an overview of two recent innovations that have transformed NLP: transformers and transfer learning for NLP. We also discuss their applications in healthcare.

Transformers

When modelling sequential data, recurrent neural networks (RNNs) have been the predominant choice of neural network. In particular, long short-term memory networks 109 and gated recurrent units 110 were staple RNNs in modelling EHR data, as these networks can model the sequential nature of clinical data 111 , 112 and clinical text 5 , 104 , 105 , 113 . However, RNNs harbour several limitations 114 . Namely, RNNs process data sequentially rather than in parallel. This restricts the size of the input datasets and of the networks, which limits the complexity of the features and the range of relations that can be learned 114 . Hence, RNNs are difficult to train, deploy and scale, and are suboptimal for capturing long-range and global patterns in data. However, learning global or long-range relationships is often needed when learning language representations. For example, sentences far removed from a word may be important for providing context for that word, and previous clinical events can inform clinical decisions made years later. For a period, CNNs, which are adept at parallelization, were used to overcome some of the limitations of RNNs 115 , but were found to be inefficient when modelling longer global dependencies.

In 2017, a research team at Google (the Google Brain team) released the transformer, a landmark model that has revolutionized NLP 116 . Compared with RNN and CNN models, transformers are more parallelizable and less computationally complex at each layer, and thus can handle larger training datasets and learn longer-range and global relations. The use of only attention layers for the encoders and decoders, forgoing RNNs and CNNs entirely, was critical to the success of transformers. Attention was introduced and refined 117 , 118 to handle bottlenecks in sequence-to-sequence RNNs 110 , 119 . Attention modules allow models to globally relate different positions of a sequence to compute a richer representation of the sequence 116 , and do so in parallel, allowing for increased computing efficiency and for the embedding of longer-range relations in the input sequence (Fig. 3 ).

figure 3

a , The original transformer model performs language translation, and contains encoders that convert the input into an embedding and decoders that convert the embedding into the output. b , The transformer model uses attention mechanisms within its encoders and decoders. The attention module is used in three places: in the encoder (for the input sentence), in the decoder (for the output sentence) and in the encoder–decoder attention of the decoder (for embeddings passed from the encoder). c , The key component of the transformer block is the attention module. Briefly, attention is a mechanism to determine how much weight to place on input features when creating embeddings for downstream tasks. For NLP, this involves determining how much importance to place on surrounding text when creating a representation for a particular word. To learn the weights, the attention mechanism assigns a score to each pair of words from an input phrase to determine how strongly the words should influence the representation. To obtain the score, the transformer model first decomposes the input into three vectors: the query vector ( Q ; the word of interest), the key vector ( K ; surrounding words) and the value vector ( V ; the contents of the input) (1). Next, the dot product is taken between the query and key vectors (2) and then scaled to stabilize training (3). The SoftMax function is then applied to normalize the scores and ensure that they add to 1 (4). The output SoftMax score is then multiplied by the value vector to apply a weighted focus to the input (5). The transformer model has multiple attention mechanisms (termed attention heads); each learns a separate representation for the same word, which therefore increases the relations that can be learned. Each attention head is composed of stacked attention layers. The output of each attention mechanism is concatenated into a single matrix (6) that is fed into the downstream feed-forward layer. d , e , Visual representation of what is learned 185 . Lines relate the query (left) to the words that are attended to the most (right). Line thickness denotes the magnitude of attention, and colours represent the attention head. d , The learned attention in one attention-mechanism layer of one head. e , Examples of what is learned by each layer of each attention head. Certain layers learn to attend to the next word (head 2, layer 0) or to the previous word (head 0, layer 0). f , Workflow for applying a transformer language model to a clinical task. Matmul, matrix multiplication; (CLS), classification token placed at the start of a sentence to store the sentence-level embedding; (SEP), separation token placed at the end of a sentence. BERT, bidirectional encoder representations from transformers; MIMIC, Medical Information Mart for Intensive Care.
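Steps 1–5 of the attention computation described in the caption above reduce to a few lines of code. The sketch below implements single-head scaled dot-product attention; multi-head attention runs several of these in parallel and concatenates the outputs (step 6).

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (batch, seq_len, d_k) query, key and value matrices (step 1).
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # dot product (2), scaling (3)
    weights = torch.softmax(scores, dim=-1)            # normalize scores to sum to 1 (4)
    return weights @ V                                 # weighted focus on the values (5)
```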

Transfer learning for NLP

Simultaneous and subsequent work following the release of the transformer resolved another main problem in NLP: the formalization of the process of transfer learning. Transfer learning has been used most extensively in computer vision, owing to the success of the ImageNet challenge, which made pre-trained CNNs widely available 120 . Transfer learning has enabled the broader application of deep learning in healthcare 17 , as researchers can fine-tune a pre-trained CNN adept at image classification on a smaller clinical dataset to accomplish a wide spectrum of healthcare tasks 3 , 37 , 121 , 122 . Until recently, robust transfer learning for NLP models was not possible, which limited the use of NLP models in domain-specific applications. A series of recent milestones have enabled transfer learning for NLP. The problem of identifying the ideal pre-training language task for deep-learning NLP models (for example, masked-language modelling, in which missing words are predicted from the surrounding context, or next-sentence prediction, in which the model predicts whether two sentences follow one another) was solved by universal language model fine-tuning (ULMFiT 123 ) and embeddings from language models (ELMo 124 ). The generative pre-trained transformer (GPT 125 ) from OpenAI and the bidirectional encoder representations from transformers (BERT 126 ) from Google Brain then applied the methods formalized by ULMFiT and ELMo to transformer models, delivering pre-trained models that achieved unprecedented capabilities on a series of NLP tasks.

Transformers for the understanding of clinical text

Following the success of transformers for NLP, their potential to handle domain-specific text, specifically clinical text, was quickly assessed. The performances of the transformer-based model BERT, the RNN-based model ELMo and traditional word-vector embeddings 127 , 128 at clinical-concept extraction (the identification of medical problems, tests and treatments) from EHR data were evaluated 106 . BERT outperformed traditional word vectors by a substantial margin and was more computationally efficient than ELMo (it achieved higher performance with fewer training iterations) 129 , 130 , 131 , 132 . Pre-training on a dataset of 2 million clinical notes (from the Medical Information Mart for Intensive Care 132 ; MIMIC-III) increased the performance of all NLP models. This suggests that contextual embeddings encode valuable semantic information not accounted for in traditional word representations 106 . However, the performance of MIMIC-III BERT began to decline after reaching its optimum; this is perhaps indicative of the model losing information learned from the large open corpus and converging towards a model similar to one initialized from scratch 106 . Hence, there may be a fine balance between learning from a large open-domain corpus and a domain-specific clinical corpus. This may be a critical consideration when applying pre-trained models to healthcare tasks.

To facilitate the further application of clinically pre-trained BERT 129 to downstream clinical tasks, a BERT pre-trained on large clinical datasets was publicly released. Because transformers and deep NLP models are resource-intensive to train (training the BERT model can cost US$50,000–200,000 133 ; and pre-training BERT on clinical datasets required 18 d of continuous training, an endeavour that may be out of the reach of many institutions), openly releasing pre-trained clinical models can facilitate widespread advancements of NLP tasks in healthcare. Other large and publicly available clinically pre-trained models (Table 3 ) are ClinicalBERT 130 , BioBERT 134 and SciBERT 135 .
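In practice, reusing such a publicly released checkpoint is straightforward with standard tooling. The sketch below uses the Hugging Face transformers API; the model identifier shown is an assumed example of one public ClinicalBERT release, and the two-label readmission task is purely illustrative (the classification head is randomly initialized and must be fine-tuned before use).

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "emilyalsentzer/Bio_ClinicalBERT"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2,  # e.g., 30-day readmission: yes/no (illustrative)
)

inputs = tokenizer(
    "Patient admitted with acute chest pain; troponin elevated.",
    return_tensors="pt", truncation=True, padding=True,
)
logits = model(**inputs).logits  # fine-tune on labelled notes before relying on this
```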

The release of clinically pre-trained models has spurred downstream clinical applications. ClinicalBERT, a BERT model pre-trained on MIMIC-III data using masked-language modelling and next-sentence prediction, was evaluated on the downstream task of predicting 30 d readmission 130 . Compared with previous models 136 , 137 , ClinicalBERT can dynamically predict readmission risk during a patient’s stay and uses clinical text rather than structured data (such as laboratory values, or codes from the international classification of diseases). This shows the power of transformers to unlock clinical text, a comparatively underused data source in EHRs. Similarly, clinical text from EHRs has been harnessed using SciBERT for the automated extraction of symptoms from COVID-19-positive and COVID-19-negative patients to identify the most discerning clinical presentation 138 . ClinicalBERT has also been adapted to extract anginal symptoms from EHRs 139 . Others have used enhanced clinical-text understanding for the automatic labelling and summarization of clinical reports. BioBERT and ClinicalBERT have been harnessed to extract labels from radiology text reports, enabling an automatic clinical summarization tool and labeller 140 . Transformers have also been used to improve clinical questioning and answering 141 , in clinical voice assistants 142 , 143 , in chatbots for patient triage 144 , 145 , and in medical-image-to-text translation and medical-image captioning 146 .

Transformers for the modelling of clinical events

In view of their adeptness at modelling the sequential nature of clinical text, transformers have also been harnessed to model the sequential nature of clinical events 147 , 148 , 149 , 150 , 151 . A key challenge of modelling clinical events is properly capturing long-term dependencies—that is, previous clinical procedures that may preclude future downstream interventions. Transformers are particularly adept at exploring longer-range relationships and were recently used to develop BEHRT 152 , which leverages the parallels between sequences in natural language and clinical events in EHRs to portray diagnoses as words, visits as sentences and a patient's medical history as a document 152 . When used to predict the likelihood of 301 conditions in future visits, BEHRT achieved an 8–13.2% improvement over the existing state-of-the-art EHR model 152 . BEHRT was also used to predict the incidence of heart failure from EHR data 153 .

Data-limiting factors in the deployment of ML

The past decade of research in ML in healthcare has focused on model development, and the next decade will be defined by model deployment into clinical settings 42 , 45 , 46 , 154 , 155 . In this section, we discuss two data-centric obstacles in model deployment: how to efficiently deliver raw clinical data (Table 4 ) to models, and how to monitor and correct for natural data shifts that deteriorate model performance.

Delivering data to models

A main obstacle to model deployment is associated with how to efficiently transform raw, unstructured and heterogeneous clinical data into structured data that can be inputted into ML models. During model development, pre-processed structured data are directly inputted into the model. However, during deployment, minimizing the delay between the acquisition of raw data and the delivery of structured inputs requires an adept data pipeline for collecting data from their source, and for ingesting, preparing and transforming the data (Fig. 4 ). An ideal system would need to be high-throughput, have low latency and be scalable to a large number of data sources. A lack of optimization can result in major sources of inefficiency and delayed predictions from the model. In what follows, we detail the challenges of building a pipeline for clinical data and give an overview of the key components of such a pipeline.

figure 4

Delivering data to a model is a key bottleneck in obtaining timely and efficient inferences. ML models require input data that are organized, standardized and normalized, often in tabular format. Therefore, it is critical to establish a pipeline for organizing and storing heterogeneous clinical data. The data pipeline involves collecting, ingesting and transforming clinical data from an assortment of data sources. Data can be housed in data lakes, in data warehouses or in both. Data lakes are central repositories to store all forms of data, raw and processed, without any predetermined organizational structure. Data in data lakes can exist as a mix of binary data (for example, images), structured data, semi-structured data (such as tabular data) and unstructured data (for example, documents). By contrast, data warehouses store cleaned, enriched, transformed and structured data with a predetermined organizational structure.

The fundamental challenge of creating an adept data pipeline arises from the need to anticipate the heterogeneity of the data. ML models often require a set of specific clinical inputs (for example, blood pressure and heart rate), which are extracted from a suite of dynamically changing health data. However, it is difficult to extract the relevant data inputs. Clinical data vary in volume and velocity (the rate that data are generated), thus prompting the question of how frequently data should be collected. Furthermore, clinical data can vary in veracity (data quality), thus requiring different pre-processing steps. Moreover, the majority of clinical data exist in an unstructured format that is further complicated by the availability of hundreds of EHR products, each with its own clinical terminology, technical specifications and capabilities 156 . Therefore, how to precisely extract data from a spectrum of unstructured EHR frameworks becomes critical.

Data heterogeneity must be carefully accounted for when designing the data pipeline, as it can influence throughput, latency and other performance factors. The data pipeline starts with the process of data ingestion (by which raw clinical data are moved from the data source and into the pipeline), a primary bottleneck in the throughput of the data through the pipeline. In particular, handling peaks of data generation may require the design and implementation of scalable ways to support a variable number of connected objects 157 . Such data-elasticity issues can take advantage of software frameworks that scale up or down in real time to more effectively use computer resources in cloud data centres 158 .

After the data enter the pipeline, the data-preparation stage involves the cleansing, denoising, standardization and shaping of the data into structured data that are ready for consumption by the ML system. In studies that developed data pipelines to handle healthcare data 156 , 159 , 160 , the data-preparation stage was found to regulate the latency of the data pipeline, as latency depended on the efficiency of the data queue, the streaming of the data and the database for storing the computation results.

A final consideration is how data should move throughout the data pipeline; specifically, whether data should move in discrete batches or in continuous streams. Batch processing involves collecting and moving source data periodically, whereas stream processing involves sourcing, moving and processing data as soon as they are created. Batch processing has the advantages of being high-throughput, comprehensive and economical (and hence may be advantageous for scalability), whereas stream processing occurs in real time (and thus may be required for time-sensitive predictions). Many healthcare systems use a combination of batch processing and stream processing 160 .
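The contrast between the two modes can be made concrete with a small sketch. The `store`, `stream` and `model` objects below are hypothetical stand-ins; the point is the structural trade-off between periodic bulk pulls and per-record processing.

```python
import time

def batch_job(model, store, interval_s=3600):
    # Batch processing: periodically pull everything generated since the last
    # run and score it in bulk. High-throughput and economical, but each
    # prediction can lag by up to one interval.
    while True:
        for record in store.fetch_since(time.time() - interval_s):  # hypothetical API
            model.predict(record)
        time.sleep(interval_s)

def stream_consumer(model, stream):
    # Stream processing: each record is scored as soon as it is created,
    # enabling time-sensitive predictions at the cost of throughput.
    for record in stream:
        model.predict(record)
```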

Established data pipelines are being harnessed to support real-time healthcare modelling. In particular, Columbia University Medical Center, in collaboration with IBM, is streaming physiological data from patients with brain injuries to predict adverse neurological complications up to 48 h before existing methods can 161 . Similarly, Yale School of Medicine has used a data pipeline to support real-time data acquisition for predicting the number of beds available, handling care for inpatients and patients in the intensive care unit (such as managing ventilator capacity) and tracking the number of healthcare providers exposed to COVID-19 161 . However, optimizing the components of the data pipeline, particularly for numerous concurrent ML healthcare systems, remains a challenging task.

Deployment in the face of data shifts

A main obstacle in deploying ML systems for healthcare has been maintaining model robustness when faced with data shifts 162 . Data shifts occur when differences or changes in healthcare practices or in patient behaviour cause the deployment data to differ substantially from the training data, resulting in the distribution of the deployment data diverging from the distribution of the training data. This can lead to a decline in model performance. Also, failure to correct for data shifts can lead to the perpetuation of algorithmic biases, missing critical diagnoses 163 and unnecessary clinical interventions 164 .

In healthcare, data shifts are common occurrences and exist primarily along the axes of institutional differences (such as local clinical practices, or different instruments and data-collection workflows), epidemiological shifts, temporal shifts (for example, changes in physician and patient behaviours over time) and differences in patient demographics (such as race, gender and age). A recent case study 165 characterizing data shifts caused by institutional differences reported that pneumothorax classifiers trained on individual institutional datasets declined in performance when evaluated on data from external institutions. Similar phenomena have been observed in a number of studies 41 , 163 , 166 . Institutional differences are among the most patent causes of data shifts because they frequently harbour underlying differences in patient demographics, disease incidence and data-collection workflows. For example, in an analysis of chest-X-ray classifiers and their potential to generalize to other institutions, it was found that one institution collected chest X-rays using portable radiographs, whereas another used stationary radiographs 41 . This led to differences in disease prevalence (33% vs 2% for pneumonia) and patient demographics (average age of 63 vs 45), as portable radiographs were primarily used for inpatients who were too sick to be transported, whereas stationary radiographs were used primarily in outpatient settings. Similarly, another study found that different image-acquisition and image-processing techniques caused the deterioration of the performance of breast-mammography classifiers to random performance (areas under the receiver operating characteristic curve of 0.4–0.6) when evaluated on datasets from four external institutions and countries 163 . However, it is important to note that the models evaluated were trained on data collected during the 1990s and were externally tested on datasets created in 2014–2017. The decline in performance owing to temporal shifts is particularly relevant; if deployed today, models that have been trained on older datasets would be making inferences on newly generated data.

Studies that have characterized temporal shifts have provided insights into the conditions under which deployed ML models should be re-evaluated. An evaluation of models that used data collected over a period of 9 years found that model performance deteriorated substantially, drifting towards overprediction as early as one year after model development 167 . For the MIMIC-III dataset 132 (commonly used for the development of models to predict clinical outcomes), an assessment of the effects of temporal shifts on model performance over time showed that, whereas all models experienced a moderate decline over time, the most significant drop in performance occurred owing to a shift in clinical practice, when EHRs transitioned systems 164 (from CareVue to MetaVision). A modern-day analogy would be how ML systems for COVID-19 (ref. 168 ) that were trained on data 169 acquired during the early phase of the pandemic and before the availability of COVID-19 vaccines would perform when deployed in the face of shifts in disease incidence and presentation.

Data shifts and model deterioration can also occur when models are deployed on patients whose gender, racial or socioeconomic backgrounds differ from those of the patient population that the model was trained on. Indeed, ML models have been shown to be biased against individuals of certain races 170 or genders 42 , or of particular religious 171 or socioeconomic 15 backgrounds. For example, a large-scale algorithm used across many health institutions to identify patients with complex health needs underpredicted the health needs of African American patients and failed to triage them for necessary care 172 . Using non-representative or non-inclusive training datasets can constitute an additional source of gender, racial or socioeconomic bias. Popular chest-X-ray datasets used to train classifiers have been shown to be heavily unbalanced 15 : 67.6% of the patients in these datasets are Caucasian, and only 8.98% are insured through Medicare. Unsurprisingly, the performance of models trained on these datasets deteriorates for non-Caucasian subgroups, and especially for Medicare patients 15 . Similarly, skin-lesion classifiers that were trained primarily on images of one skin tone decline in performance when evaluated on images of other skin tones 173 ; in this case, the drop in performance can be attributed to variations in disease presentation that are not captured when certain patient populations are not adequately represented in the training dataset 174 .
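Such disparities can be surfaced by routinely stratifying performance metrics by demographic attributes. The base-R sketch below (entirely simulated; the two groups, their sizes and the noisier feature for the under-represented group are assumptions made for illustration) fits a classifier and then compares sensitivity across subgroups, the kind of audit that revealed the gaps described above.

```r
# Hypothetical sketch: auditing classifier performance across patient subgroups.
set.seed(3)
n     <- 4000
group <- sample(c("A", "B"), n, replace = TRUE, prob = c(0.9, 0.1))  # B under-represented
y     <- rbinom(n, 1, 0.3)
# The predictive feature carries less signal for the under-represented group.
x     <- rnorm(n, mean = y * ifelse(group == "A", 2.0, 0.7))

fit  <- glm(y ~ x, family = binomial)
pred <- as.integer(predict(fit, type = "response") > 0.5)

# Sensitivity by subgroup: a large gap suggests the model under-serves a group.
tapply(pred[y == 1], group[y == 1], mean)
```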

These findings exemplify two underlying limitations of ML models: they can propagate existing healthcare biases at scale, and insufficient diversity in the training datasets can prevent model outputs from generalizing adequately to different patient populations. Training models on multi-institutional datasets can be highly effective at combating model deterioration 15 , and directly correcting for known biases in the training data can also mitigate their impact 171 . Data shifts can likewise be addressed proactively during model development 175 , 176 , 177 , 178 or retroactively, by surveilling for shifts during model deployment 179 . A proactive attitude towards recognizing and addressing potential biases and data shifts will remain imperative.

Substantial progress in the past decade has laid a foundation of knowledge for the application of ML to healthcare. In pursuing the deployment of ML models, it is clear that success is dictated by how data are collected, organized, protected, moved and audited. In this Review, we have highlighted methods that can address these challenges. The emphasis will eventually shift to how to build the tools, infrastructure and regulations needed to efficiently deploy innovations in ML in clinical settings. A central challenge will be the implementation and translation of these advances into healthcare in the face of their current limitations: for instance, GANs applied to medical images are currently limited by image resolution and image diversity, and can be challenging to train and scale; federated learning promises to alleviate problems associated with small single-institution datasets, yet it requires robust frameworks and infrastructure; and large language models trained on large public datasets can subsume racial and ethnic biases 171 .

Another central consideration is how to handle the regulatory assessment of ML models for healthcare applications. Current regulation and approval processes are being adapted to meet the emerging needs; in particular, initiatives are attempting to address data shifts and patient representation in the training datasets 165 , 180 , 181 . However, GANs, federated learning and transformer models add complexities to the regulatory process. Few healthcare-specific benchmarking datasets exist to evaluate the performance of these ML systems during clinical deployment. Moreover, the assessment of the performance of GANs is hampered by the lack of efficient and robust metrics to evaluate, compare and control the quality of synthetic data.

Notwithstanding the challenges, the fact that analogous ML technologies are being used daily by millions of individuals in other domains, most prominently in smartphones 100 , search engines 182 and self-driving vehicles 68 , suggests that the hurdles to the deployment and regulation of ML for healthcare can also be overcome.

Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25 , 44–56 (2019).

Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316 , 2402–2410 (2016).

Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 , 115–118 (2017).

Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. npj Digit. Med. 1 , 18 (2018).

Rajkomar, A. et al. Automatically charting symptoms from patient-physician conversations using machine learning. JAMA Intern. Med. 179 , 836–838 (2019).

Henry, K. E., Hager, D. N., Pronovost, P. J. & Saria, S. A targeted real-time early warning score (TREWScore) for septic shock. Sci. Transl. Med. 7 , 299ra122 (2015).

Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. C. & Faisal, A. A. The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 24 , 1716–1720 (2018).

Abràmoff, M. D., Lavin, P. T., Birch, M., Shah, N. & Folk, J. C. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. npj Digit. Med. 1 , 39 (2018).

Iacobucci, G. Babylon Health holds talks with ‘significant’ number of NHS trusts. Brit. Med. J. 368 , m266 (2020).

Hale, C. Medtronic to distribute Viz.ai’s stroke-spotting AI imaging software. Fierce Biotech (23 July 2019); https://www.fiercebiotech.com/medtech/medtronic-to-distribute-viz-ai-s-stroke-spotting-ai-imaging-software

Hassan, A. E. et al. Early experience utilizing artificial intelligence shows significant reduction in transfer times and length of stay in a hub and spoke model. Interv. Neuroradiol. 26 , 615–622 (2020).

Ting, D. S. W. et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 318 , 2211–2223 (2017).

McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature 577 , 89–94 (2020).

Liu, Y. et al. A deep learning system for differential diagnosis of skin diseases. Nat. Med. 26 , 900–908 (2020).

Seyyed-Kalantari, L., Liu, G., McDermott, M., Chen, I. Y. & Ghassemi, M. CheXclusion: fairness gaps in deep chest X-ray classifiers. Pac. Symp. Biocomput. 26 , 232–243 (2021).

Yu, K.-H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2 , 719–731 (2018).

Esteva, A. et al. A guide to deep learning in healthcare. Nat. Med. 25 , 24–29 (2019).

Frid-Adar, M. et al. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321 , 321–331 (2018).

Shin, H.-C. et al. Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In Simulation and Synthesis in Medical Imaging SASHIMI 2018 (eds Gooya, A., Goksel, O., Oguz, I. & Burgos, N.) 1–11 (Springer Cham, 2018).

Salehinejad, H., Valaee, S., Dowdell, T., Colak, E. & Barfett, J. Generalization of deep neural networks for chest pathology classification in X-rays using generative adversarial networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 990–994 (ieeexplore.ieee.org, 2018).

Zhang, Z., Yang, L. & Zheng, Y. Translating and segmenting multimodal medical volumes with cycle-and shape-consistency generative adversarial network. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition 9242–9251 (IEEE, 2018).

Xu, F., Zhang, J., Shi, Y., Kang, K. & Yang, S. A fast low-rank matrix factorization method for dynamic magnetic resonance imaging restoration. In 5th International Conference on Big Data Computing and Communications (BIGCOM) 38–42 (2019).

Goodfellow, I. J. et al. Generative adversarial networks. In Advances in Neural Information Processing Systems 27 (eds Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. & Weinberger, K. Q.) Paper 1384 (Curran, 2014).

Wang, Z., She, Q. & Ward, T. E. Generative adversarial networks in computer vision: a survey and taxonomy. ACM Comput. Surv. 54 , 1–38 (2021).

Radford, A., Metz, L. & Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. Preprint at https://arxiv.org/abs/1511.06434v2 (2016).

Denton, E. L., Chintala, S. & Fergus, R. Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems 28 (eds Cortes, C., Lawrence, N., Lee, D., Sugiyama, M. & Garnett, R.) Paper 903 (Curran, 2015).

Karras, T., Aila, T., Laine, S. & Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations 2018 Paper 447 (ICLR, 2018).

Mirza, M. & Osindero, S. Conditional generative adversarial nets. Preprint at https://arxiv.org/abs/1411.1784v1 (2014).

Odena, A., Olah, C. & Shlens, J. Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning (eds. Precup, D. & Teh, Y. W.) 2642–2651 (PMLR, 2017).

Isola, P., Zhu, J.-Y., Zhou, T. & Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 5967–5976 (2018).

Zhang, H., Goodfellow, I., Metaxas, D. & Odena, A. Self-attention generative adversarial networks. In Proceedings of the 36th International Conference on Machine Learning (eds. Chaudhuri, K. & Salakhutdinov, R.) 7354–7363 (PMLR, 2019).

Wu, Y., Ma, Y., Liu, J., Du, J. & Xing, L. Self-attention convolutional neural network for improved MR image reconstruction. Inf. Sci. 490 , 317–328 (2019).

Brock, A., Donahue, J. & Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations Paper 564 (ICLR, 2019).

Arjovsky, M., Chintala, S. & Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning (eds. Precup, D. & Teh, Y. W.) 214–223 (PMLR, 2017).

Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V. & Courville, A. C. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems 30 (eds. Guyon, I. et al.) Paper 2945 (Curran, 2017).

Hindupur, A. The-gan-zoo. https://github.com/hindupuravinash/the-gan-zoo (2018).

Rajpurkar, P. et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 15 , e1002686 (2018).

Ouyang, D. et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature 580 , 252–256 (2020).

Xue, Y., Xu, T., Zhang, H., Long, L. R. & Huang, X. SegAN: adversarial network with multi-scale L1 loss for medical image segmentation. Neuroinformatics 16 , 383–392 (2018).

Haque, A., Milstein, A. & Fei-Fei, L. Illuminating the dark spaces of healthcare with ambient intelligence. Nature 585 , 193–202 (2020).

Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 15 , e1002683 (2018).

Zou, J. & Schiebinger, L. AI can be sexist and racist — it’s time to make it fair. Nature 559 , 324–326 (2018).

Perez, L. & Wang, J. The effectiveness of data augmentation in image classification using deep learning. Preprint at https://arxiv.org/abs/1712.04621v1 (2017).

Madani, A., Moradi, M., Karargyris, A. & Syeda-Mahmood, T. Semi-supervised learning with generative adversarial networks for chest X-ray classification with ability of data domain adaptation. In IEEE 15th International Symposium on Biomedical Imaging (ISBI) 1038–1042 (IEEE, 2018).

He, J. et al. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 25 , 30–36 (2019).

Kelly, C. J., Karthikesalingam, A., Suleyman, M., Corrado, G. & King, D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 17 , 195 (2019).

Rocher, L., Hendrickx, J. M. & de Montjoye, Y.-A. Estimating the success of re-identifications in incomplete datasets using generative models. Nat. Commun. 10 , 3069 (2019).

Schwarz, C. G. et al. Identification of anonymous MRI research participants with face-recognition software. N. Engl. J. Med. 381 , 1684–1686 (2019).

Chartsias, A., Joyce, T., Dharmakumar, R. & Tsaftaris, S. A. Adversarial image synthesis for unpaired multi-modal cardiac data. in Simulation and Synthesis in Medical Imaging (eds. Tsaftaris, S. A., Gooya, A., Frangi, A. F. & Prince, J. L.) 3–13 (Springer International Publishing, 2017).

Emami, H., Dong, M., Nejad-Davarani, S. P. & Glide-Hurst, C. K. Generating synthetic CTs from magnetic resonance images using generative adversarial networks. Med. Phys. https://doi.org/10.1002/mp.13047 (2018).

Jin, C.-B. et al. Deep CT to MR synthesis using paired and unpaired data. Sensors 19 , 2361 (2019).

Bi, L., Kim, J., Kumar, A., Feng, D. & Fulham, M. In Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment (eds. Cardoso, M. J. et al.) 43–51 (Springer International Publishing, 2017).

Ben-Cohen, A. et al. Cross-modality synthesis from CT to PET using FCN and GAN networks for improved automated lesion detection. Eng. Appl. Artif. Intell. 78 , 186–194 (2019).

Armanious, K. et al. MedGAN: medical image translation using GANs. Comput. Med. Imaging Graph. 79 , 101684 (2020).

Choi, H. & Lee, D. S. Alzheimer’s Disease Neuroimaging Initiative. Generation of structural MR images from amyloid PET: application to MR-less quantification. J. Nucl. Med. 59 , 1111–1117 (2018).

Wei, W. et al. Learning myelin content in multiple sclerosis from multimodal MRI through adversarial training. In Medical Image Computing and Computer Assisted Intervention — MICCAI 2018 (eds. Frangi, A. F., Schnabel, J. A., Davatzikos, C., Alberola-López, C. & Fichtinger, G.) 514–522 (Springer Cham, 2018).

Pan, Y. et al. Synthesizing missing PET from MRI with cycle-consistent generative adversarial networks for Alzheimer’s disease diagnosis. In Medical Image Computing and Computer Assisted Intervention — MICCAI 2018 (eds. Frangi, A. F., Schnabel, J. A., Davatzikos, C., Alberola-López, C. & Fichtinger, G.) 455–463 (Springer Cham, 2018).

Welander, P., Karlsson, S. & Eklund, A. Generative adversarial networks for image-to-image translation on multi-contrast MR images - a comparison of CycleGAN and UNIT. Preprint at https://arxiv.org/abs/1806.07777v1 (2018).

Dar, S. U. H. et al. Image synthesis in multi-contrast MRI with conditional generative adversarial networks. IEEE Trans. Med. Imaging 38 , 2375–2388 (2019).

Zhu, J.-Y., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV) (IEEE, 2017); https://doi.org/10.1109/iccv.2017.244

Maspero, M. et al. Dose evaluation of fast synthetic-CT generation using a generative adversarial network for general pelvis MR-only radiotherapy. Phys. Med. Biol. 63 , 185001 (2018).

Olut, S., Sahin, Y.H., Demir, U., Unal, G. Generative adversarial training for MRA image synthesis using multi-contrast MRI. In PRedictive Intelligence in MEdicine. PRIME 2018. Lecture Notes in Computer Science (eds Rekik, I., Unal, G., Adeli, E. & Park, S.) (Springer Cham, 2018); https://doi.org/10.1007/978-3-030-00320-3_18

Chen, R. J., Lu, M. Y., Chen, T. Y., Williamson, D. F. K. & Mahmood, F. Synthetic data in machine learning for medicine and healthcare. Nat. Biomed. Eng. 5 , 493–497 (2021).

Kanakasabapathy, M. K. et al. Adaptive adversarial neural networks for the analysis of lossy and domain-shifted datasets of medical images. Nat. Biomed. Eng. 5 , 571–585 (2021).

Bowles, C., Gunn, R., Hammers, A. & Rueckert, D. Modelling the progression of Alzheimer’s disease in MRI using generative adversarial networks. In Medical Imaging 2018: Image Processing (eds. Angelini, E. D. & Landman, B. A.) 397–407 (International Society for Optics and Photonics, 2018).

Ravi, D., Alexander, D.C., Oxtoby, N.P. & Alzheimer’s Disease Neuroimaging Initiative. Degenerative adversarial neuroImage nets: generating images that mimic disease progression. In Medical Image Computing and Computer Assisted Intervention — MICCAI 2019. Lecture Notes in Computer Science. (eds Shen, D. et al) 164–172 (Springer, 2019).

Borji, A. Pros and cons of GAN evaluation measures. Comput. Vis. Image Underst. 179 , 41–65 (2019).

Vincent, J. Nvidia uses AI to make it snow on streets that are always sunny. The Verge https://www.theverge.com/2017/12/5/16737260/ai-image-translation-nvidia-data-self-driving-cars (2017).

Kairouz, P. et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. https://doi.org/10.1561/2200000083 (2021)

McMahan, B., Moore, E., Ramage, D., Hampson, S. & Aguera y Arcas, B. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (eds. Singh, A. & Zhu, J.) 1273–1282 (ML Research Press, 2017).

Li, X. et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Med. Image Anal. 65 , 101765 (2020).

Brisimi, T. S. et al. Federated learning of predictive models from federated Electronic Health Records. Int. J. Med. Inform. 112 , 59–67 (2018).

Lee, J. et al. Privacy-preserving patient similarity learning in a federated environment: development and analysis. JMIR Med. Inform. 6 , e20 (2018).

Dou, Q. et al. Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study. npj Digit. Med. 4 , 60 (2021).

Silva, S. et al. Federated learning in distributed medical databases: meta-analysis of large-scale subcortical brain data. In 2019 IEEE 16th International Symposium on Biomedical Imaging ISBI 2019 18822077 (IEEE, 2019).

Sheller, M. J., Reina, G. A., Edwards, B., Martin, J. & Bakas, S. Multi-institutional deep learning modeling without sharing patient data: a feasibility study on brain tumor segmentation. Brainlesion 11383 , 92–104 (2019).

Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10 , 12598 (2020).

Sarma, K. V. et al. Federated learning improves site performance in multicenter deep learning without data sharing. J. Am. Med. Inform. Assoc. 28 , 1259–1264 (2021).

Li, W. et al. Privacy-preserving federated brain tumour segmentation. In Machine Learning in Medical Imaging (eds. Suk, H.-I., Liu, M., Yan, P. & Lian, C.) 133–141 (Springer International Publishing, 2019).

Shokri, R., Stronati, M., Song, C. & Shmatikov, V. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy SP 2017 3–18 (IEEE, 2017).

Fredrikson, M., Jha, S. & Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security 1322–1333 (Association for Computing Machinery, 2015).

Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64 , 107–115 (2021).

Zhu, L., Liu, Z. & Han, S. Deep leakage from gradients. In Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) Paper 8389 (Curran, 2019)

Abadi, M. et al. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security 308–318 (Association for Computing Machinery, 2016).

Brendan McMahan, H. et al. A general approach to adding differential privacy to iterative training procedures. Preprint at https://arxiv.org/abs/1812.06210v2 (2018).

McMahan, H. B., Ramage, D., Talwar, K. & Zhang, L. Learning differentially private recurrent language models. In ICLR 2018 Sixth International Conference on Learning Representations Paper 504 (ICLR, 2018).

Shokri, R. & Shmatikov, V. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security 1310–1321 (Association for Computing Machinery, 2015).

Lyu, M., Su, D. & Li, N. Understanding the sparse vector technique for differential privacy. Proc. VLDB Endow. 10 , 637–648 (2017).

Hitaj, B., Ateniese, G. & Perez-Cruz, F. Deep models under the GAN: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security 603–618 (Association for Computing Machinery, 2017).

Li, X., Huang, K., Yang, W., Wang, S. & Zhang, Z. On the convergence of FedAvg on Non-IID Data. In ICLR 2020 Eighth International Conference on Learning Representations Paper 261 (2020).

Smith, V., Chiang, C.-K., Sanjabi, M. & Talwalkar, A. S. Federated multi-task learning. In Advances in Neural Information Processing Systems 30 (eds Guyon, I. et al.) Paper 2307 (NeurIPS, 2017).

Xu, J. et al. Federated learning for healthcare informatics. J. Healthc. Inform. Res. 5 , 1–19 (2021).

Huang, L. et al. LoAdaBoost: loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data. PLoS ONE 15 , e0230706 (2020).

Zhao, Y. et al. Federated learning with non-IID data. Preprint at https://arxiv.org/abs/1806.00582v1 (2018).

Torres-Soto, J. & Ashley, E. A. Multi-task deep learning for cardiac rhythm detection in wearable devices. npj Digit. Med. 3 , 116 (2020).

Turakhia, M. P. et al. Rationale and design of a large-scale, app-based study to identify cardiac arrhythmias using a smartwatch: The Apple Heart Study. Am. Heart J. 207 , 66–75 (2019).

Synced. Apple reveals design of its on-device ML system for federated evaluation and tuning SyncedReview https://syncedreview.com/2021/02/19/apple-reveals-design-of-its-on-device-ml-system-for-federated-evaluation-and-tuning (2021).

McMahan, B. & Ramage, D. Federated learning: collaborative machine learning without centralized training data Google AI Blog https://ai.googleblog.com/2017/04/federated-learning-collaborative.html (2017).

Chen, Y., Qin, X., Wang, J., Yu, C. & Gao, W. FedHealth: a federated transfer learning framework for wearable healthcare. IEEE Intell. Syst. 35 , 83–93 (2020).

Ramage, D. & Mazzocchi, S. Federated analytics: collaborative data science without data collection Google AI Blog https://ai.googleblog.com/2020/05/federated-analytics-collaborative-data.html (2020).

Augenstein, S. et al. Generative models for effective ML on private, decentralized datasets. In ICLR 2020 Eighth International Conference on Learning Representations Paper 1448 (ICLR, 2020).

Pati, S. et al. The federated tumor segmentation (FeTS) challenge. Preprint at https://arxiv.org/abs/2105.05874v2 (2021).

Flores, M. Medical institutions collaborate to improve mammogram assessment AI with Nvidia Clara federated learning The AI Podcast https://blogs.nvidia.com/blog/2020/04/15/federated-learning-mammogram-assessment/ (2020).

Kannan, A., Chen, K., Jaunzeikare, D. & Rajkomar, A. Semi-supervised learning for information extraction from dialogue. In Proc. Interspeech 2018 2077–2081 (ISCA, 2018); https://doi.org/10.21437/interspeech.2018-1318

Chiu, C.-C. et al. Speech recognition for medical conversations. Preprint at https://arxiv.org/abs/1711.07274v2 ; https://doi.org/10.1093/jamia/ocx073 (2017).

Si, Y., Wang, J., Xu, H. & Roberts, K. Enhancing clinical concept extraction with contextual embeddings. J. Am. Med. Inform. Assoc. 26 , 1297–1304 (2019).

Shin, H.-C. et al. Learning to read chest X-rays: recurrent neural cascade model for automated image annotation. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2016); https://doi.org/10.1109/cvpr.2016.274

Wang, X., Peng, Y., Lu, L., Lu, Z. & Summers, R. M. TieNet: text-image embedding network for common thorax disease classification and reporting in chest X-rays. In IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018 (IEEE, 2018); https://doi.org/10.1109/cvpr.2018.00943

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9 , 1735–1780 (1997).

Cho, K. et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Moschitti, A., Pang, B. & Daelemans, W.) 1724–1734 (Association for Computational Linguistics, 2014).

Lipton, Z. C., Kale, D. C., Elkan, C. & Wetzel, R. Learning to diagnose with LSTM recurrent neural networks. Preprint at https://arxiv.org/abs/1511.03677v7 (2015).

Choi, E., Bahadori, M. T., Schuetz, A., Stewart, W. F. & Sun, J. Doctor AI: predicting clinical events via recurrent neural networks. JMLR Workshop Conf. Proc. 56 , 301–318 (2016).

Zhu, H., Paschalidis, I. C. & Tahmasebi, A. Clinical concept extraction with contextual word embedding. Preprint at https://doi.org/10.48550/arXiv.1810.10566 (2018).

Cho, K., van Merriënboer, B., Bahdanau, D. & Bengio, Y. On the properties of neural machine translation: encoder–decoder approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (eds Wu, D., Carpuat, M., Carreras, X. & Vecchi, E. M.) 103–111 (Association for Computational Linguistics, 2014).

Gehring, J., Auli, M., Grangier, D., Yarats, D. & Dauphin, Y. N. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1243–1252 (PMLR, 2017).

Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (eds Guyon, I. et al.) Paper 3058 (Curran, 2017).

Bahdanau, D., Cho, K. H. & Bengio, Y. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations ICLR 2015 (ICLR, 2015).

Luong, T., Pham, H. & Manning, C. D. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (eds Màrquez, L., Callison-Burch, C. & Su, J.) 1412–1421 (Association for Computational Linguistics, 2015); https://doi.org/10.18653/v1/d15-1166

Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27 (eds Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. & Weinberger, K. Q.) Paper 1610 (Curran, 2014).

Krizhevsky, A., Sutskever, I. & Hinton, G. E. in Advances in Neural Information Processing Systems 25 (eds Bartlett, P. et al.) 1097–1105 (Curran, 2012).

Kiani, A. et al. Impact of a deep learning assistant on the histopathologic classification of liver cancer. npj Digit. Med. 3 , 23 (2020).

Park, S.-M. et al. A mountable toilet system for personalized health monitoring via the analysis of excreta. Nat. Biomed. Eng. 4 , 624–635 (2020).

Howard, J. & Ruder, S. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (eds Gurevych, I. & Miyao, Y.) 328–339 (Association for Computational Linguistics, 2018).

Peters, M. E. et al. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Walker, M., Ji, H. & Stent, A.) 2227–2237 (Association for Computational Linguistics, 2018).

Brown, T. et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems 33 (eds. Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F. & Lin, H.) 1877–1901 (Curran, 2020).

Kenton, J. D. M.-W. C. & Toutanova, L. K. BERT: pre-training of deep bidirectional transformers for language understanding. in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Burstein, J., Doran, C. & Solorio, T.) 4171–4186 (Association for Computational Linguistics, 2019).

Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. Preprint at https://arxiv.org/abs/1301.3781v3 (2013).

Pennington, J., Socher, R. & Manning, C. GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (eds Moschitti, A., Pang, B., Daelemans, W.) 1532–1543 (Association for Computational Linguistics, 2014).

Alsentzer, E. et al. Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop (eds Rumshisky, A., Roberts, K., Bethard, S. & Naumann, T.) 72–78 (Association for Computational Linguistics, 2019).

Huang, K., Altosaar, J. & Ranganath, R. ClinicalBERT: modeling clinical notes and predicting hospital readmission. Preprint at https://arxiv.org/abs/1904.05342v3 (2019).

Peng, Y., Yan, S. & Lu, Z. Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. In Proceedings of the 18th BioNLP Workshop and Shared Task (eds Demner-Fushman, D., Bretonnel Cohen, K., Ananiadou, S. & Tsujii, J.) 58–65 (Association for Computational Linguistics, 2019).

Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3 , 160035 (2016).

Sharir, O., Peleg, B. & Shoham, Y. The cost of training NLP models: a concise overview. Preprint at https://arxiv.org/abs/2004.08900v1 (2020).

Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36 , 1234–1240 (2020).

Beltagy, I., Lo, K. & Cohan, A. SciBERT: A pretrained language model for scientific text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (eds Inui, K., Jiang, J., Ng, V. & Wan, X.) 3615–3620 (Association for Computational Linguistics, 2019).

Futoma, J., Morris, J. & Lucas, J. A comparison of models for predicting early hospital readmissions. J. Biomed. Inform. 56 , 229–238 (2015).

Caruana, R. et al. Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1721–1730 (Association for Computing Machinery, 2015).

Wagner, T. et al. Augmented curation of clinical notes from a massive EHR system reveals symptoms of impending COVID-19 diagnosis. Elife 9 , e58227 (2020).

Eisman, A. S. et al. Extracting angina symptoms from clinical notes using pre-trained transformer architectures. AMIA Annu. Symp. Proc. 2020 , 412–421 (American Medical Informatics Association, 2020).

Smit, A. et al. Combining automatic labelers and expert annotations for accurate radiology report labeling using BERT. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (eds Webber, B., Cohn, T., He, Y. & Liu, Y.) 1500–1519 (Association for Computational Linguistics, 2020).

Soni, S. & Roberts, K. Evaluation of dataset selection for pre-training and fine-tuning transformer language models for clinical question answering. In Proc. 12th Language Resources and Evaluation Conference 5532–5538 (European Language Resources Association, 2020).

Sezgin, E., Huang, Y., Ramtekkar, U. & Lin, S. Readiness for voice assistants to support healthcare delivery during a health crisis and pandemic. npj Digit. Med. 3 , 122 (2020).

Sakthive, V., Kesaven, M. P. V., William, J. M. & Kumar, S. K. M. Integrated platform and response system for healthcare using Alexa. Int. J. Commun. Computer Technol. 7 , 14–22 (2019).

Comstock, J. Buoy Health, CVS MinuteClinic partner to send patients from chatbot to care. mobihealthnews https://www.mobihealthnews.com/content/buoy-health-cvs-minuteclinic-partner-send-patients-chatbot-care (2018).

Razzaki, S. et al. A comparative study of artificial intelligence and human doctors for the purpose of triage and diagnosis. Preprint at https://doi.org/10.48550/arXiv.1806.10698 (2018).

Xiong, Y., Du, B. & Yan, P. Reinforced transformer for medical image captioning. In Machine Learning in Medical Imaging (eds. Suk, H.-I., Liu, M., Yan, P. & Lian, C.) 673–680 (Springer International Publishing, 2019).

Meng, Y., Speier, W., Ong, M. K. & Arnold, C. W. Bidirectional representation learning from transformers using multimodal electronic health record data to predict depression. IEEE J. Biomed. Health Inform. 25 , 3121–3129 (2021).

Choi, E. et al. Learning the graphical structure of electronic health records with graph convolutional transformer. Proc. Conf. AAAI Artif. Intell. 34 , 606–613 (2020).

Li, F. et al. Fine-tuning bidirectional encoder representations from transformers (BERT)–based models on large-scale electronic health record notes: an empirical study. JMIR Med. Inform. 7 , e14830 (2019).

Rasmy, L., Xiang, Y., Xie, Z., Tao, C. & Zhi, D. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digit. Med. 4 , 86 (2021).

Shang, J., Ma, T., Xiao, C. & Sun, J. Pre-training of graph augmented transformers for medication recommendation. in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (ed. Kraus, S.) 5953–5959 (International Joint Conferences on Artificial Intelligence Organization, 2019); https://doi.org/10.24963/ijcai.2019/825

Li, Y. et al. BEHRT: transformer for electronic health records. Sci. Rep. 10 , 7155 (2020).

Rao, S. et al. BEHRT-HF: an interpretable transformer-based, deep learning model for prediction of incident heart failure. Eur. Heart J. 41 (Suppl. 2), ehaa946.3553 (2020).

Qian, X. et al. Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nat. Biomed. Eng. 5 , 522–532 (2021).

Xing, L., Giger, M. L. & Min, J. K. Artificial Intelligence in Medicine: Technical Basis and Clinical Applications (Academic Press, 2020).

Reisman, M. EHRs: the challenge of making electronic data usable and interoperable. P. T. 42 , 572–575 (2017).

Cortés, R., Bonnaire, X., Marin, O. & Sens, P. Stream processing of healthcare sensor data: studying user traces to identify challenges from a big data perspective. Procedia Comput. Sci. 52 , 1004–1009 (2015).

Zhang, F., Cao, J., Khan, S. U., Li, K. & Hwang, K. A task-level adaptive MapReduce framework for real-time streaming data in healthcare applications. Future Gener. Comput. Syst. 43–44 , 149–160 (2015).

El Aboudi, N. & Benhlima, L. Big data management for healthcare systems: architecture, requirements, and implementation. Adv. Bioinformatics 2018 , 4059018 (2018).

Ta, V.-D., Liu, C.-M. & Nkabinde, G. W. Big data stream computing in healthcare real-time analytics. In IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA) 37–42 (ieeexplore.ieee.org, 2016).

Data-Driven Healthcare Organizations Use Big Data Analytics for Big Gains White Paper (IBM Software, 2017); https://silo.tips/download/ibm-software-white-paper-data-driven-healthcare-organizations-use-big-data-analy

Futoma, J., Simons, M., Panch, T., Doshi-Velez, F. & Celi, L. A. The myth of generalisability in clinical research and machine learning in health care. Lancet Digit. Health 2 , e489–e492 (2020).

Wang, X. et al. Inconsistent performance of deep learning models on mammogram classification. J. Am. Coll. Radiol. 17 , 796–803 (2020).

Nestor, B., McDermott, M. B. A. & Boag, W. Feature robustness in non-stationary health records: caveats to deployable model performance in common clinical machine learning tasks. Preprint at https://doi.org/10.48550/arXiv.1908.00690 (2019).

Wu, E. et al. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat. Med . https://doi.org/10.1038/s41591-021-01312-x (2021).

Barish, M., Bolourani, S., Lau, L. F., Shah, S. & Zanos, T. P. External validation demonstrates limited clinical utility of the interpretable mortality prediction model for patients with COVID-19. Nat. Mach. Intell. 3 , 25–27 (2020).

Davis, S. E., Lasko, T. A., Chen, G., Siew, E. D. & Matheny, M. E. Calibration drift in regression and machine learning models for acute kidney injury. J. Am. Med. Inform. Assoc. 24 , 1052–1061 (2017).

Wang, G. et al. A deep-learning pipeline for the diagnosis and discrimination of viral, non-viral and COVID-19 pneumonia from chest X-ray images. Nat. Biomed. Eng. 5 , 509–521 (2021).

Ning, W. et al. Open resource of clinical data from patients with pneumonia for the prediction of COVID-19 outcomes via deep learning. Nat. Biomed. Eng. 4 , 1197–1207 (2020).

Koenecke, A. et al. Racial disparities in automated speech recognition. Proc. Natl Acad. Sci. USA 117 , 7684–7689 (2020).

Abid, A., Farooqi, M. & Zou, J. Large language models associate muslims with violence. Nat. Mach. Intell. 3 , 461–463 (2021).

Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366 , 447–453 (2019).

Adamson, A. S. & Smith, A. Machine learning and health care disparities in dermatology. JAMA Dermatol . 154 , 1247–1248 (2018).

Han, S. S. et al. Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. J. Invest. Dermatol. 138 , 1529–1538 (2018).

Subbaswamy, A., Adams, R. & Saria, S. Evaluating model robustness and stability to dataset shift. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (eds. Banerjee, A. & Fukumizu, K.) 2611–2619 (PMLR, 2021).

Izzo, Z., Ying, L. & Zou, J. How to learn when data reacts to your model: performative gradient descent. In Proceedings of the 38th International Conference on Machine Learning (eds. Meila, M. & Zhang, T.) 4641–4650 (PMLR, 2021).

Ghorbani, A., Kim, M. & Zou, J. A Distributional framework for data valuation. In Proceedings of the 37th International Conference on Machine Learning (eds. Iii, H. D. & Singh, A.) 3535–3544 (PMLR, 2020).

Zhang, L., Deng, Z., Kawaguchi, K., Ghorbani, A. & Zou, J. How does mixup help with robustness and generalization? In International Conference on Learning Representations 2021 Paper 2273 (ICLR, 2021).

Schulam, P. & Saria, S. Can you trust this prediction? Auditing pointwise reliability after learning. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (eds. Chaudhuri, K. & Sugiyama, M.) 1022–1031 (PMLR, 2019).

Liu, X. et al. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat. Med. 26 , 1364–1374 (2020).

Cruz Rivera, S. et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat. Med. 26 , 1351–1363 (2020).

Nayak, P. Understanding searches better than ever before. Google The Keyword https://blog.google/products/search/search-language-understanding-bert/ (2019).

Baur, C., Albarqouni, S. & Navab, N. in OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis (eds Stoyanov, D. et al.) 260–267 (Springer International Publishing, 2018).

Kang, E., Koo, H. J., Yang, D. H., Seo, J. B. & Ye, J. C. Cycle-consistent adversarial denoising network for multiphase coronary CT angiography. Med. Phys. 46 , 550–562 (2019).

Vig, J. A multiscale visualization of attention in the transformer model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (Costa-jussà, M. R. & Alfonseca, E.) 37–42 (Association for Computational Linguistics, 2019).

Acknowledgements

This work was supported in part by the National Institutes of Health via grants F30HL156478 (to A.Z.), R01CA227713 (to L.X.), R01CA256890 (to L.X.), P30AG059307 (to J.Z.), U01MH098953 (to J.Z.), P01HL141084 (to J.C.W), R01HL163680 (to J.C.W), R01HL130020 (to J.C.W), R01HL146690 (to J.C.W.) and R01HL126527 (to J.C.W.); by the National Science Foundation grant CAREER1942926 (to J.Z.); and by the American Heart Association grant 17MERIT3361009 (to J.C.W.). Figures were created with BioRender.com.

Author information

Authors and affiliations

Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, CA, USA

Angela Zhang & Joseph C. Wu

Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA

Angela Zhang

Greenstone Biosciences, Palo Alto, CA, USA

Department of Computer Science, Stanford University, Stanford, CA, USA

Angela Zhang & James Zou

Department of Radiation Oncology, School of Medicine, Stanford University, Stanford, CA, USA

Department of Biomedical Informatics, School of Medicine, Stanford University, Stanford, CA, USA

Departments of Medicine, Division of Cardiovascular Medicine Stanford University, Stanford, CA, USA

Joseph C. Wu

Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA

Contributions

A.Z. and J.C.W. drafted the manuscript. All authors contributed to the conceptualization and editing of the manuscript.

Corresponding authors

Correspondence to Angela Zhang or Joseph C. Wu .

Ethics declarations

Competing interests

J.C.W. is a co-founder and scientific advisory board member of Greenstone Biosciences. The other authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Pearse Keane, Faisal Mahmood and Hadi Shafiee for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, A., Xing, L., Zou, J. et al. Shifting machine learning for healthcare from development to deployment and from models to data. Nat. Biomed. Eng 6 , 1330–1345 (2022). https://doi.org/10.1038/s41551-022-00898-y

Received : 24 January 2021

Accepted : 03 May 2022

Published : 04 July 2022

Issue Date : December 2022

DOI : https://doi.org/10.1038/s41551-022-00898-y

Artificial intelligence in healthcare: transforming the practice of medicine

Junaid Bajwa

A Microsoft Research, Cambridge, UK

Usman Munir

B Microsoft Research, Cambridge, UK

Aditya Nori

C Microsoft Research, Cambridge, UK

Bryan Williams

D University College London, London, UK and director, NIHR UCLH Biomedical Research Centre, London, UK

Artificial intelligence (AI) is a powerful and disruptive area of computer science, with the potential to fundamentally transform the practice of medicine and the delivery of healthcare. In this review article, we outline recent breakthroughs in the application of AI in healthcare, describe a roadmap to building effective, reliable and safe AI systems, and discuss the possible future direction of AI augmented healthcare systems.

Introduction

Healthcare systems around the world face significant challenges in achieving the ‘quadruple aim’ for healthcare: improve population health, improve the patient's experience of care, enhance caregiver experience and reduce the rising cost of care. 1–3 Ageing populations, the growing burden of chronic diseases and the rising costs of healthcare globally are challenging governments, payers, regulators and providers to innovate and transform models of healthcare delivery. Moreover, against a backdrop now catalysed by the global pandemic, healthcare systems find themselves challenged to ‘perform’ (deliver effective, high-quality care) and ‘transform’ care at scale by leveraging real-world, data-driven insights directly into patient care. The pandemic has also highlighted the shortages in the healthcare workforce and the inequities in access to care, previously articulated by The King's Fund and the World Health Organization (Box 1). 4,5

Workforce challenges in the next decade

The application of technology and artificial intelligence (AI) in healthcare has the potential to address some of these supply-and-demand challenges. The increasing availability of multi-modal data (genomics, economic, demographic, clinical and phenotypic) coupled with technology innovations in mobile, internet of things (IoT), computing power and data security herald a moment of convergence between healthcare and technology to fundamentally transform models of healthcare delivery through AI-augmented healthcare systems.

In particular, cloud computing is enabling the transition of effective and safe AI systems into mainstream healthcare delivery. Cloud computing provides the computing capacity for the analysis of very large amounts of data, at higher speeds and lower costs compared with the historic ‘on premises’ infrastructure of healthcare organisations. Indeed, we observe that many technology providers are increasingly seeking to partner with healthcare organisations to drive AI-driven medical innovation enabled by cloud computing and technology-related transformation (Box 2). 6–8

Quotes from technology leaders

Here, we summarise recent breakthroughs in the application of AI in healthcare, describe a roadmap to building effective AI systems and discuss the possible future direction of AI augmented healthcare systems.

What is artificial intelligence?

Simply put, AI refers to the science and engineering of making intelligent machines, through algorithms or a set of rules, which the machine follows to mimic human cognitive functions, such as learning and problem solving. 9 AI systems have the potential to anticipate problems or deal with issues as they come up and, as such, operate in an intentional, intelligent and adaptive manner. 10 AI's strength is in its ability to learn and recognise patterns and relationships from large multidimensional and multimodal datasets; for example, AI systems could translate a patient's entire medical record into a single number that represents a likely diagnosis. 11,12 Moreover, AI systems are dynamic and autonomous, learning and adapting as more data become available. 13

AI is not one ubiquitous, universal technology; rather, it represents several subfields (such as machine learning and deep learning) that, individually or in combination, add intelligence to applications. Machine learning (ML) refers to the study of algorithms that allow computer programs to automatically improve through experience. 14 ML itself may be categorised as ‘supervised’, ‘unsupervised’ and ‘reinforcement learning’ (RL), and there is ongoing research in various sub-fields including ‘semi-supervised’, ‘self-supervised’ and ‘multi-instance’ ML; a minimal code sketch contrasting the first two categories follows the list below.

  • Supervised learning leverages labelled data (annotated information); for example, using labelled X-ray images of known tumours to detect tumours in new images. 15
  • ‘Unsupervised learning’ attempts to extract information from data without labels; for example, categorising groups of patients with similar symptoms to identify a common cause. 16
  • In RL, computational agents learn by trial and error, or by expert demonstration. The algorithm learns by developing a strategy to maximise rewards. Of note, major breakthroughs in AI in recent years have been based on RL.
  • Deep learning (DL) is a class of algorithms that learns by using a large, many-layered collection of connected processors and exposing these processors to a vast set of examples. DL has emerged as the predominant method in AI today, driving improvements in areas such as image and speech recognition. 17,18
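The sketch below, in base R with simulated patient data (the marker names, group means and cluster count are invented for illustration), contrasts the first two categories: a supervised model is fitted to known diagnoses, whereas unsupervised clustering groups similar patients without ever seeing the labels.

```r
# Hypothetical sketch: supervised vs unsupervised learning on simulated data.
set.seed(10)
df <- data.frame(
  marker1 = c(rnorm(50, 0), rnorm(50, 2)),
  marker2 = c(rnorm(50, 0), rnorm(50, 2)),
  disease = rep(c(0, 1), each = 50)        # labels: known diagnoses
)

# Supervised: learn to predict the label from the features.
clf <- glm(disease ~ marker1 + marker2, data = df, family = binomial)

# Unsupervised: group similar patients without using the labels at all.
clusters <- kmeans(df[, c("marker1", "marker2")], centers = 2)
table(cluster = clusters$cluster, disease = df$disease)  # clusters vs labels
```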

How to build effective and trusted AI-augmented healthcare systems?

Despite more than a decade of significant focus, the use and adoption of AI in clinical practice remains limited, with many AI products for healthcare still at the design and develop stage. 19–22 While there are different ways to build AI systems for healthcare, far too often there are attempts to force square pegs into round holes: that is, to find healthcare problems to which AI solutions can be applied, without due consideration of the local context (such as clinical workflows, user needs, trust, safety and ethical implications).

We hold the view that AI amplifies and augments, rather than replaces, human intelligence. Hence, when building AI systems in healthcare, it is key to not replace the important elements of the human interaction in medicine but to focus it, and improve the efficiency and effectiveness of that interaction. Moreover, AI innovations in healthcare will come through an in-depth, human-centred understanding of the complexity of patient journeys and care pathways.

In Fig 1, we describe a problem-driven, human-centred approach, adapted from frameworks by Wiens et al, Care and Sendak, to building effective and reliable AI-augmented healthcare systems. 23–25

Fig 1. Multi-step, iterative approach to build effective and reliable AI-augmented systems in healthcare.

Design and develop

The first stage is to design and develop AI solutions for the right problems using a human-centred AI and experimentation approach and engaging appropriate stakeholders, especially the healthcare users themselves.

Stakeholder engagement and co-creation

Build a multidisciplinary team including computer and social scientists, operational and research leadership, clinical stakeholders (physicians, caregivers and patients) and subject experts (eg biomedical scientists); the team would include authorisers, motivators, financiers, conveners, connectors, implementers and champions. 26 A multi-stakeholder team brings the technical, strategic and operational expertise to define problems, goals, success metrics and intermediate milestones.

Human-centred AI

A human-centred AI approach combines an ethnographic understanding of health systems with AI. Through user-designed research, first understand the key problems (we suggest using a qualitative study design to understand ‘what is the problem’, ‘why is it a problem’, ‘to whom does it matter’, ‘why has it not been addressed before’ and ‘why is it not getting attention’), including the needs, constraints and workflows in healthcare organisations, and the facilitators and barriers to the integration of AI within the clinical context. After defining the key problems, the next step is to identify which of them are appropriate for AI to solve, and whether applicable datasets are available to build and later evaluate the AI. By contextualising algorithms in an existing workflow, AI systems would operate within existing norms and practices to ensure adoption, providing appropriate solutions to existing problems for the end user.

Experimentation

The focus should be on piloting of new stepwise experiments to build AI tools, using tight feedback loops from stakeholders to facilitate rapid experiential learning and incremental changes. 27 The experiments would allow the trying out of new ideas simultaneously, exploring to see which one works, learn what works and what doesn't, and why. 28 Experimentation and feedback will help to elucidate the purpose and intended uses for the AI system: the likely end users and the potential harm and ethical implications of AI system to them (for instance, data privacy, security, equity and safety).

Evaluate and validate

Next, we must iteratively evaluate and validate the predictions made by the AI tool to test how well it is functioning. This is critical, and evaluation is based on three dimensions: statistical validity, clinical utility and economic utility.

  • Statistical validity is understanding the performance of AI on metrics of accuracy, reliability, robustness, stability and calibration; a minimal sketch of such checks follows this list. High model performance in retrospective, in silico settings is not sufficient to demonstrate clinical utility or impact.
  • To determine clinical utility, evaluate the algorithm in a real-time environment on a hold-out and temporal validation set (eg longitudinal and external geographic datasets) to demonstrate clinical effectiveness and generalisability. 25
  • Economic utility quantifies the net benefit relative to the cost from the investment in the AI system.
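As a hedged illustration of the statistical-validity dimension, the base-R sketch below (fully simulated data; the 70/30 split and the 0.5 decision threshold are arbitrary choices for the example) computes accuracy, sensitivity and specificity on a hold-out set and performs a simple decile-based calibration check.

```r
# Hypothetical sketch: statistical validity of a classifier on a hold-out set.
set.seed(20)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 2 * x))

train <- sample(n, 700)                                  # 70/30 split
fit   <- glm(y ~ x, family = binomial, subset = train)
prob  <- predict(fit, newdata = data.frame(x = x[-train]), type = "response")
pred  <- as.integer(prob > 0.5)
truth <- y[-train]

c(accuracy    = mean(pred == truth),
  sensitivity = mean(pred[truth == 1] == 1),
  specificity = mean(pred[truth == 0] == 0))

# Calibration: observed event rates within deciles of predicted risk should
# increase roughly in line with the predicted probabilities themselves.
tapply(truth, cut(prob, quantile(prob, 0:10 / 10), include.lowest = TRUE), mean)
```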

Scale and diffuse

Many AI systems are initially designed to solve a problem at one healthcare system based on the patient population specific to that location and context. Scale up of AI systems requires special attention to deployment modalities, model updates, the regulatory system, variation between systems and reimbursement environment.

Monitor and maintain

Even after an AI system has been deployed clinically, it must be continually monitored and maintained, using effective post-market surveillance to detect risks and adverse events. Healthcare organisations, regulatory bodies and AI developers should cooperate to collate and analyse the relevant datasets for AI performance, clinical and safety-related risks, and adverse events. 29

What are the current and future use cases of AI in healthcare?

AI can enable healthcare systems to achieve their 'quadruple aim' by democratising and standardising a future of connected and AI-augmented care, precision diagnostics, precision therapeutics and, ultimately, precision medicine (Table 1). 30 Research on the application of AI in healthcare continues to accelerate rapidly, with potential use cases being demonstrated across the healthcare sector (both physical and mental health) including drug discovery, virtual clinical consultation, disease diagnosis, prognosis, medication management and health monitoring.

Table 1. Widescale adoption and application of artificial intelligence in healthcare. Timings are illustrative of widescale adoption of the proposed innovation, taking into account challenges, the regulatory environment and use at scale.

We describe a non-exhaustive suite of AI applications in healthcare in the near term, medium term and longer term, for the potential capabilities of AI to augment, automate and transform medicine.

AI today (and in the near future)

Currently, AI systems are not reasoning engines, ie they cannot reason the same way as human physicians, who can draw upon 'common sense' or 'clinical intuition and experience'. 12 Instead, AI resembles a signal translator, translating patterns from datasets. AI systems today are beginning to be adopted by healthcare organisations to automate time-consuming, high-volume, repetitive tasks. Moreover, there is considerable progress in demonstrating the use of AI in precision diagnostics (eg diabetic retinopathy and radiotherapy planning).

AI in the medium term (the next 5–10 years)

In the medium term, we propose that there will be significant progress in the development of powerful algorithms that are efficient (eg require less data to train), able to use unlabelled data, and can combine disparate structured and unstructured data including imaging, electronic health data, multi-omic, behavioural and pharmacological data. In addition, healthcare organisations and medical practices will evolve from being adopters of AI platforms, to becoming co-innovators with technology partners in the development of novel AI systems for precision therapeutics.

AI in the long term (>10 years)

In the long term, AI systems will become more intelligent, enabling AI healthcare systems to achieve a state of precision medicine through AI-augmented healthcare and connected care. Healthcare will shift from the traditional one-size-fits-all form of medicine to a preventative, personalised, data-driven disease management model that achieves improved patient outcomes (improved patient and clinical experiences of care) in a more cost-effective delivery system.

Connected/augmented care

AI could significantly reduce inefficiency in healthcare, improve patient flow and experience, and enhance caregiver experience and patient safety through the care pathway; for example, AI could be applied to the remote monitoring of patients (eg intelligent telehealth through wearables/sensors) to identify and provide timely care of patients at risk of deterioration.

In the long term, we expect healthcare clinics, hospitals, social care services, patients and caregivers all to be connected to a single, interoperable digital infrastructure using passive sensors in combination with ambient intelligence. 31 Two AI applications in connected care follow.

Virtual assistants and AI chatbots

AI chatbots (such as those used in Babylon ( www.babylonhealth.com ) and Ada ( https://ada.com )) are being used by patients to identify symptoms and recommend further actions in community and primary care settings. AI chatbots can be integrated with wearable devices such as smartwatches to provide insights to both patients and caregivers in improving their behaviour, sleep and general wellness.

Ambient and intelligent care

We also note the emergence of ambient sensing without the need for any peripherals.

  • Emerald ( www.emeraldinno.com ): a wireless, touchless sensor and machine learning platform for remote monitoring of sleep, breathing and behaviour, founded by Massachusetts Institute of Technology faculty and researchers.
  • Google Nest: claims to monitor sleep (including sleep disturbances such as cough) using motion and sound sensors. 32
  • A recently published article explores the use of smart speakers to monitor heart rhythms contactlessly. 33
  • Automation and ambient clinical intelligence: AI systems leveraging natural language processing (NLP) technology have the potential to automate administrative tasks such as documenting patient visits in electronic health records, optimising clinical workflow and enabling clinicians to focus more time on caring for patients (eg Nuance Dragon Ambient eXperience ( www.nuance.com/healthcare/ambient-clinical-intelligence.html )).

Precision diagnostics

Diagnostic imaging

The automated classification of medical images is the leading AI application today. A recent review of AI/ML-based medical devices approved in the USA and Europe from 2015–2020 found that more than half (129 (58%) devices in the USA and 126 (53%) devices in Europe) were approved or CE marked for radiological use. 34 Studies have demonstrated AI's ability to meet or exceed the performance of human experts in image-based diagnoses from several medical specialties including pneumonia in radiology (a convolutional neural network trained with labelled frontal chest X-ray images outperformed radiologists in detecting pneumonia), dermatology (a convolutional neural network was trained with clinical images and was found to classify skin lesions accurately), pathology (one study trained AI algorithms with whole-slide pathology images to detect lymph node metastases of breast cancer and compared the results with those of pathologists) and cardiology (a deep learning algorithm diagnosed heart attack with a performance comparable with that of cardiologists). 35–38

We recognise that there are some exemplars in this area in the NHS (eg University of Leeds Virtual Pathology Project and the National Pathology Imaging Co-operative) and expect widescale adoption and scaleup of AI-based diagnostic imaging in the medium term. 39 We provide two use cases of such technologies.

Diabetic retinopathy screening

Key to reducing preventable, diabetes-related vision loss worldwide is screening individuals for the detection and prompt treatment of diabetic retinopathy. However, screening is costly given the substantial number of patients with diabetes and the limited eye-care workforce worldwide. 40 Research studies on automated AI algorithms for diabetic retinopathy in the USA, Singapore, Thailand and India have demonstrated robust diagnostic performance and cost effectiveness. 41–44 Moreover, the Centers for Medicare & Medicaid Services approved Medicare reimbursement for the use of the Food and Drug Administration-approved AI algorithm 'IDx-DR', which demonstrated 87% sensitivity and 90% specificity for detecting more-than-mild diabetic retinopathy. 45

Improving precision and reducing waiting times for radiotherapy planning

An important AI application is assisting clinicians with image preparation and planning tasks for radiotherapy cancer treatment. Currently, segmentation of the images is a time-consuming and laborious task, performed manually by an oncologist using specially designed software to draw contours around the regions of interest. The AI-based InnerEye open-source technology can cut this preparation time for head and neck, and prostate, cancer by up to 90%, meaning that waiting times for starting potentially life-saving radiotherapy treatment can be dramatically reduced (Fig 2). 46,47


Fig 2. Potential applications for the InnerEye deep learning toolkit include quantitative radiology for monitoring tumour progression, planning for surgery and radiotherapy planning. 47

Precision therapeutics

To make progress towards precision therapeutics, we need to considerably improve our understanding of disease. Researchers globally are exploring the cellular and molecular basis of disease, collecting a range of multimodal datasets that can lead to digital and biological biomarkers for diagnosis, severity and progression. Two important future AI applications include immunomics / synthetic biology and drug discovery.

Immunomics and synthetic biology

Through the application of AI tools to multimodal datasets, we may in future be able to better understand the cellular basis of disease and the clustering of diseases and patient populations, enabling more targeted preventive strategies; for example, using immunomics to diagnose disease and better predict care and treatment options. This will be revolutionary for multiple standards of care, with particular impact in the cancer, neurological and rare disease space, personalising the experience of care for the individual.

AI-driven drug discovery

AI will drive significant improvement in clinical trial design and the optimisation of drug manufacturing processes and, in general, any combinatorial optimisation process in healthcare could be replaced by AI. We have already seen the beginnings of this with DeepMind's recent AlphaFold announcements, which set the stage for better understanding disease processes, predicting protein structures and developing more targeted therapeutics (for both rare and more common diseases; Fig 3). 48,49


Fig 3. An overview of the main neural network model architecture for AlphaFold. 49 MSA = multiple sequence alignment.

Precision medicine

New curative therapies

Over the past decade, synthetic biology has produced developments like CRISPR gene editing and some personalised cancer therapies. However, the life cycle for developing such advanced therapies is still extremely inefficient and expensive.

In future, with better access to data (genomic, proteomic, glycomic, metabolomic and bioinformatic), AI will allow us to handle far more systematic complexity and, in turn, help us transform the way we understand, discover and affect biology. This will improve the efficiency of the drug discovery process by helping to predict earlier which agents are more likely to be effective, and also to better anticipate adverse drug effects, which have often thwarted the further development of otherwise effective drugs at a costly late stage in the development process. This, in turn, will democratise access to novel advanced therapies at a lower cost.

AI-empowered healthcare professionals

In the longer term, healthcare professionals will leverage AI in augmenting the care they provide, allowing them to provide safer, standardised and more effective care at the top of their licence; for example, clinicians could use an ‘AI digital consult’ to examine ‘digital twin’ models of their patients (a truly ‘digital and biomedical’ version of a patient), allowing them to ‘test’ the effectiveness, safety and experience of an intervention (such as a cancer drug) in the digital environment prior to delivering the intervention to the patient in the real world.

We recognise that there are significant challenges related to the wider adoption and deployment of AI into healthcare systems. These challenges include, but are not limited to, data quality and access, technical infrastructure, organisational capacity, and ethical and responsible practices in addition to aspects related to safety and regulation. Some of these issues have been covered, but others go beyond the scope of this current article.

Conclusion and key recommendations

Advances in AI have the potential to transform many aspects of healthcare, enabling a future that is more personalised, precise, predictive and portable. It is unclear if we will see an incremental adoption of new technologies or radical adoption of these technological innovations, but the impact of such technologies and the digital renaissance they bring requires health systems to consider how best they will adapt to the changing landscape. For the NHS, the application of such technologies truly has the potential to release time for care back to healthcare professionals, enabling them to focus on what matters to their patients and, in the future, leveraging a globally democratised set of data assets comprising the ‘highest levels of human knowledge’ to ‘work at the limits of science’ to deliver a common high standard of care, wherever and whenever it is delivered, and by whoever. 50 Globally, AI could become a key tool for improving health equity around the world.

As much as the last 10 years have been about the rollout of digitised health records for the purposes of efficiency (and, in some healthcare systems, billing/reimbursement), the next 10 years will be about the insight and value society can gain from these digital assets, how these can be translated into better clinical outcomes with the assistance of AI, and the subsequent creation of novel data assets and tools. It is clear that we are at a turning point in the convergence of the practice of medicine and the application of technology, and although there are multiple opportunities, there are formidable challenges to overcome in implementing such innovation in the real world and at scale. A key to delivering this vision will be an expansion of translational research in the field of healthcare applications of artificial intelligence. Alongside this, we need investment in the upskilling of a healthcare workforce and future leaders who are digitally enabled, and who understand and embrace, rather than are intimidated by, the potential of an AI-augmented healthcare system.

Healthcare leaders should consider (as a minimum) these issues when planning to leverage AI for health:

  • processes for ethical and responsible access to data: healthcare data is highly sensitive, inconsistent, siloed and not optimised for the purposes of machine learning development, evaluation, implementation and adoption
  • access to domain expertise / prior knowledge to make sense and create some of the rules which need to be applied to the datasets (to generate the necessary insight)
  • access to sufficient computing power to generate decisions in real time, which is being transformed exponentially with the advent of cloud computing
  • research into implementation: critically, we must consider, explore and research issues which arise when you take the algorithm and put it in the real world, building ‘trusted’ AI algorithms embedded into appropriate workflows.


Systematic Mapping Study of AI/Machine Learning in Healthcare and Future Directions

  • Survey Article
  • Published: 16 September 2021
  • Volume 2, article number 461 (2021)

Cite this article


  • Gaurav Parashar   ORCID: orcid.org/0000-0003-4869-1819 1 ,
  • Alka Chaudhary 1 &
  • Ajay Rana 1  

10 Citations

3 Altmetric

Explore all metrics

This study attempts to categorise research conducted in the area of the use of machine learning in healthcare using a systematic mapping study methodology. We reviewed literature from top journals, articles, and conference papers using the keywords use of machine learning in healthcare . We queried Google Scholar, which returned 1400 papers, and then categorised the results on the basis of the objective of the study, the methodology adopted, the type of problem attempted and the disease studied. As a result, we were able to categorise the studies into five categories, namely interpretable ML, evaluation of medical images, processing of EHR, security/privacy frameworks, and transfer learning. We also found that most authors have studied cancer, while epilepsy was among the least studied diseases; that evaluation of medical images is the most researched category; and that a newer field of research, interpretable ML/explainable AI, is gaining momentum. Our intent is to give future researchers a fair idea of the field and its future directions.



Introduction

Artificial intelligence (AI) can be defined as a field in which machines demonstrate intelligence by learning for themselves; John McCarthy described it as deploying various techniques and algorithms to understand human intelligence, although the field is not confined to that goal. Even if we do not specifically program a machine, if it can automatically learn and improve itself, this defines intelligent behaviour in a machine. Machine learning (ML) is a specific subfield of AI concerned with techniques that can automatically learn from experience.

The use of machine learning in healthcare has shown many promising solutions, which has created confidence in the field. Researchers have used ICT tools with ML to develop solutions that increase the effectiveness of earlier methods and procedures. The field of healthcare has also shown tremendous improvement in precision and speed after the adoption of big data, ICT, and AI/machine learning (ML). These tools have greatly helped physicians and healthcare professionals in their day-to-day work, in research, and in testing the effects of biomedicine on humans using simulations. Every detail of the patient is recorded by doctors, along with other information such as clinical notes, prescriptions, medical test results, diagnoses, X-rays, MRI scans, sonographic images, etc. This data becomes a huge repository of information which, if churned, could give us better insights for treatment and fruitful suggestions and recommendations in diagnosis; the progressive pattern of one disease could be correlated with another disease and may lead to new treatment procedures, and much more. There is also a chance that a healthcare professional overlooks a symptom which, if not addressed early, could lead to loss of life. Therefore, tools like AI/ML can help deliver better healthcare services.

The use of tools like IBM Watson (Footnote 1) and Google DeepMind [ 1 ] has shown impressive results in healthcare. On top of these tools, researchers and developers have designed applications which harness their capabilities to provide personalised patient care, better drug discovery, and improved healthcare organisational performance. According to Wired (Footnote 2), Google DeepMind was used to identify protein structures associated with SARS-CoV-2 and understand how the virus functions. One of the oldest scientific puzzles, the 'protein folding problem', was also solved by Google DeepMind, paving the way for faster drug development and better treatment. Other contributions in the field of healthcare include the use of association rules (AR), which helped analyse malaria in Brazil [ 2 ], and, according to [ 3 ], diagnosing X-ray images to reveal the respiratory condition of patients, which supported better healthcare services.

ML can be applied to varied fields like defence, automation, finance, automobile, and manufacturing to perform tasks like classification, clustering, and forecasting. It can be categorised into three types: supervised, unsupervised, and reinforcement learning.

In supervised learning, algorithms learn from labelled datasets and prepare a model. After training, we give the model data which it has not seen earlier but which belongs to the same category, so that it can correctly classify it. In unsupervised learning, algorithms learn by themselves, analysing the data to prepare a model which can then be used to correctly cluster the elements. Lastly, in reinforcement learning, the machine learns from its mistakes by maximising rewards and reducing penalties.
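
As a minimal illustration of the first two categories (the synthetic dataset and the choice of scikit-learn models are our own assumptions, made only for demonstration), the sketch below contrasts supervised classification with unsupervised clustering:

```python
# Minimal sketch: supervised learning (labelled data) versus unsupervised
# learning (no labels) on the same synthetic dataset.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Supervised: learn from labelled examples, then classify unseen inputs.
clf = LogisticRegression().fit(X, y)
print("supervised predictions:", clf.predict(X[:5]))

# Unsupervised: no labels are given; the algorithm groups the data itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster assignments:  ", km.labels_[:5])
```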

In this paper, we aim to categorise papers on healthcare and machine learning. With the use of AI and ML in healthcare, there have been significant changes in the lives of healthcare professionals. The accuracy of medical diagnosis has increased, healthcare professionals have an assistant on which they can rely, and they can predict diseases like pneumonia, cancer, heart disease, tumours, COVID-19, and many more with better accuracy and precision than before.

In this paper, we attempt to categorise research done on the use of machine learning in healthcare. To the best of our knowledge, this type of categorisation has not been done earlier, and this attempt can become the basis of future research in the field. We categorise papers on the basis of the objective of the study, the methodology adopted, the type of problem attempted, etc. The methodology is discussed in section " Research Methodology ", the reviewed papers in section " Literature Survey ", the findings in section " Results ", and the conclusions in section " Conclusion ".

Research Methodology

This section describes the systematic mapping procedure adopted to study the use of AI/ML in the healthcare domain. The study was conducted using the keywords “Machine Learning” OR “Healthcare” . The search was conducted on Google Scholar and considered only results from Nature, Wiley periodicals, Elsevier, Taylor and Francis, IEEE transactions, ACM, SVN, IET, and ArXiv. The following steps were carried out: (1) definition of research questions; (2) conducting the search for primary studies; (3) screening of papers for inclusion and exclusion; (4) keywording using abstracts; (5) data extraction and mapping of studies. These steps were proposed by [ 4 ].

Figure 1: The systematic mapping process

The Systematic Mapping Process

We adopted the systematic mapping process from [ 4 ] and applied it to this study on the use of ML in healthcare.

A systematic mapping study is a well-defined, comprehensive overview of a particular research topic. According to [ 5 ], it helps researchers conduct a verifiable, unbiased literature review, find research gaps through critical examination of the literature, collate evidence, and reduce reviewer selection bias and publication bias with transparent inclusion and exclusion criteria.

The process is described here:

We first define the research questions and the scope of the study.

With respect to the questions framed in the previous step, the search is conducted and the literature is collected.

Proper screening is done to check whether the selected literature is related to the research question and scope of the research.

Abstracts and keywords are scanned for a critical survey of the content.

In a spreadsheet, collected data is mapped with the RQs.

Figure 1 shows the process we implemented in this study.

Definition of Research Questions

The main intent of the study is to find out the uses of ML in the field of healthcare. To start with, we formulated three research questions (see Table  1 ) based on the topic of the study. The major goals of a systematic mapping study are:

Provide an overview of the research area

Determine the quantity and type of research, and the results available

Identify the journals in which research on the topic has been published

Therefore, on the basis of the above goals, the following research questions have been formed.

RQ1: What type of research has been conducted on the use of AI/ML in healthcare? Rationale: This question aims to find the type of research conducted in the healthcare domain; we need to find the papers published on the topic.

RQ2: What are the broad categories of papers published under the topic? Rationale: This question arises from the outcome of RQ1, which yields research papers; we then need to find the broad category under which each paper lies.

RQ3: What are the different diseases that have been studied, and the total number of papers published on each? Rationale: After categorising the papers, we need to find the different diseases studied in the research done by other researchers. The main intent is to find the least studied diseases, which can become a starting point for new research.

Conduct Search for Primary Studies

To conduct the search we followed the steps:

Prepare the search string with respect to the different databases (as described in Table  2 ). Since we used only Google Scholar, we used a broad search string to cover all papers containing the keywords healthcare and machine learning .

Execute the search and collect the results (see Fig.  2 ).

Categorise the results by studying the papers and grouping them together on the basis of the disease studied (as mentioned in Table  6 ) and the intent of the paper (as mentioned in Table  5 ).

Figure 2: Raw text from Google Scholar results

We took 1400 search results from Google Scholar and transferred them to a spreadsheet, based on the query mentioned in Table  2 . These roughly 1400 entries are drilled down further in the next section by excluding entries not related to the study.

Screening of Papers for Inclusion and Exclusion (Relevant Papers)

In this step we exclude all papers that are not relevant to the study. By this we mean papers that are not related to the RQs (refer to Table  5 ) and papers that are not from Nature, Wiley periodicals, Elsevier, Taylor and Francis, IEEE transactions, ACM, SVN, IET, and ArXiv; these are excluded from the final list.

Using the above criteria, we retained the entries satisfying the inclusion criteria (refer to Table  3 ). After applying the exclusion criteria, we drilled down to the entries we finally considered, which numbered 42 .

Keywording Using Abstracts

For our study, we followed a systematic process of classifying the results from Google Scholar. For keywording, we followed these steps:

The results collected from the previous step are analysed by surveying the abstracts.

Abstracts are surveyed for keywords and content, and the context of the study is then evaluated.

Group the results on the basis of context and keywords (refer to Table  4 ).

Data Extraction and Mapping of Studies

We collected all the information in a spreadsheet with fields such as serial number, paper title, abstract, keywords, year of publication, authors, name of publisher, name of periodical/journal/conference, major findings, and major shortcomings. After that, we mapped the RQs (see Table  1 ) to each entry.

Literature Survey

Interpretable Machine Learning

Interpretable models are those which explain themselves; examples include linear regression, logistic regression and decision trees. For instance, if we use a decision tree model, we can easily extract its decision rules as explanations for the model's predictions.
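
As a minimal sketch of this idea (the public diagnostic dataset and the depth limit are our own illustrative choices), the code below fits a shallow decision tree and prints its decision rules as a human-readable explanation:

```python
# Minimal sketch: fit a shallow decision tree on a public diagnostic
# dataset and print the learned decision rules as an explanation.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# export_text renders the fitted tree as nested if/else rules.
print(export_text(tree, feature_names=list(data.feature_names)))
```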

In [ 6 ] the authors referred to the use of ML in healthcare with an emphasis on interpretability. Interpretable ML refers to models which can provide a rationale for the predictions they make. A basic impediment to the adoption of ML in healthcare is its black-box nature. Since we want to develop ML as a tool that can act as an assistant to physicians, we need to make its output more explainable; merely providing metrics like AUC, recall, precision, and F-score may not suffice. We need to develop more interpretable models that can themselves provide explanations for their predictions. The authors of [ 7 ] proposed a method which assigns importance values to features, making the output interpretable. The authors of [ 8 ] developed reasoning through the use of visual indicators, making the model interpretable. In [ 9 ] the authors derived an interpretable tree from a decision forest, making it understandable by humans. As proposed in [ 10 , 11 ], interpretable ML models help develop reasonable, data-driven decision support systems that result in personalised decisions.

The authors of [ 12 ] applied deep learning to the medical data of patients to develop interpretable predictions for decision support.

Evaluation of Medical Images

In this category, the authors discuss the evaluation of medical images for better diagnosis using machine learning models.

In [ 13 , 14 , 15 , 16 , 17 ] the authors used deep learning models and neural networks for disease classification and organ segmentation, and compared the results with the diagnoses of healthcare professionals to assess diagnostic accuracy. In [ 18 ] the authors proposed a novel colour deconvolution for stain separation and colour normalisation of images. In [ 19 ] the authors compared five colour normalisation algorithms and found that stain colour normalisation algorithms with high stain segmentation accuracy and low computational complexity performed best. In their review paper, the authors of [ 20 ] compared different image segmentation methods and related studies in medical imaging for AI-assisted diagnosis of COVID-19. In [ 21 ] the authors explained AI, ML, DL, and CNNs and the use of these techniques in imaging. In [ 22 ] the authors discussed an image enhancement method with noise suppression that enhances low-light regions.
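
As a hedged sketch of the kind of model used in this category (the architecture, input size and two-class output are illustrative assumptions, not the setups of the cited studies), a small convolutional classifier can be defined as follows:

```python
# Minimal sketch: a small convolutional network of the kind used to
# classify medical images. Architecture and shapes are illustrative only.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),        # e.g. grayscale patches
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. disease vs normal
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```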

Processing of Electronic Health Record (EHR)

In this category, we compiled papers that process the electronic health records of patients.

In [ 17 ] the authors proposed the diagnosis of pneumonia in a resource-constrained environment. The authors of [ 23 , 24 , 25 ] discussed the processing of electronic health records and used ML algorithms to categorise disease. The authors of [ 26 ] trained their proposed model on a large dataset and performed regression and classification to check its effectiveness and accuracy. In [ 27 ], a medical recommendation system was proposed using a fast Fourier transformation coupled with a machine learning ensemble model; the system uses this model for disease risk prediction to provide medical recommendations, such as medical tests, for chronic heart disease patients. In [ 28 ] the authors proposed using the graphical structure of electronic health records to find hidden structure within them. In [ 29 ] the authors proposed a model that helps physicians evaluate the quality of evidence for better decision making, using natural language processing to assess the risk of bias in textual data.

Security/Privacy Framework

Under this category, we summarise papers related to security and privacy frameworks for safeguarding health records transferred over a network or the internet.

The authors of [ 30 ] researched a novel design for a smart and secure healthcare information system adopting machine learning. It also employed advanced security mechanisms to handle the big data of the healthcare industry. This framework used many security tools to protect the data, such as encryption, activity monitoring, access control, and other mechanisms. The paper [ 31 ] discussed a privacy-preserving collaborative model using ML tools for medical diagnosis systems.

Most privacy protection methods are centralised. There is a need for decentralised systems that can help mitigate several challenges, such as single points of failure, modification of records, privacy preservation, and improper information exchange, which may put patients' lives at risk. To address these, many researchers have proposed different algorithms [ 32 , 33 , 34 , 35 ]. Models like VERTIGO, GLORE, and WebDISCO were designed for privacy preservation and predictive modelling. These models aim to preserve privacy by sending partially trained machine learning models rather than patient data; this way the information is protected and trust develops between the different parties.
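
To illustrate the underlying idea that sites exchange model parameters rather than patient records, the toy sketch below averages locally trained coefficients; this is our own simplified illustration, not the actual algorithm of VERTIGO, GLORE or WebDISCO:

```python
# Toy sketch: each site trains locally; only model coefficients (never
# patient records) are shared and averaged into a global model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical stand-in data for three hospital sites.
sites = [(rng.normal(size=(100, 5)), rng.integers(0, 2, 100)) for _ in range(3)]

local_coefs, local_intercepts = [], []
for X, y in sites:
    clf = LogisticRegression(max_iter=500).fit(X, y)
    local_coefs.append(clf.coef_)            # only parameters leave the site
    local_intercepts.append(clf.intercept_)

global_coef = np.mean(local_coefs, axis=0)   # simple parameter averaging
global_intercept = np.mean(local_intercepts, axis=0)
print("shared global coefficients:", global_coef.ravel())
```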

Many other distributed privacy-preserving models, such as ModelChain, EXPLORER, and Distributed Autonomous Online Learning, are based on blockchain technology, which they use to update models sequentially across sites.

Secure multiparty computation (SMC) for privacy preservation, which performs computations on encrypted data containing personally identifiable information, has opened a new dimension. Data is a very precious commodity; therefore, techniques like privacy-preserving scoring of tree ensembles [ 36 ] are designed to provide a framework of cryptographic protocols for processing data securely.

Transfer Learning

In this category, we summarise research papers related to transfer learning. Transfer learning is a technique in which we gain knowledge from one problem and use that knowledge to solve a different but related problem. In [ 37 ] the authors proposed a technique for handling missing data from a transfer learning perspective: the proposed classifier learns weights on the complete portion of the dataset and then transfers them to the target domain. In [ 38 ] the authors used a transfer learning approach to predict breast cancer using a model trained on another task: a model trained on the ImageNet database, containing 1.2 million images, is used as a feature extractor and combined with other components to perform classification. The authors of [ 39 , 40 ] use data generated by different wearable devices with federated learning, and then build a machine learning model by transfer learning; the study was applied to diagnosing Parkinson's disease.
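
A minimal sketch of the feature-extraction style of transfer learning described above, assuming TensorFlow/Keras is available; the MobileNetV2 backbone, the random stand-in images and the downstream classifier are our own illustrative assumptions, not the setups used in the cited studies:

```python
# Minimal sketch: reuse an ImageNet-pretrained network as a frozen feature
# extractor and train a small classifier on top of its features.
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

# Pretrained backbone without its classification head; global average
# pooling turns each image into a fixed-length feature vector.
backbone = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
backbone.trainable = False  # keep the transferred knowledge frozen

# Hypothetical stand-in data: 20 images with binary labels.
images = np.random.rand(20, 224, 224, 3).astype("float32") * 255.0
labels = np.random.randint(0, 2, size=20)

features = backbone.predict(
    tf.keras.applications.mobilenet_v2.preprocess_input(images))
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print("training accuracy:", clf.score(features, labels))
```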

After the systematic mapping process, we obtained the categories of research literature mentioned in Table  5 , which describes each category and the total number of papers under it.

From Table  5 we can clearly observe that most research is being done on the evaluation of medical images, which might be due to the availability of datasets for research purposes. The processing of EHR, the second most researched category, might again be explained by dataset availability. Interpretable ML is a new field that is slowly gaining momentum; researchers are taking an interest because it provides a rationale for the outcome of a model's results, a very important attribute in domains where high stakes are at risk, such as healthcare, defence, and finance. Lastly, transfer learning concerns using the knowledge of one domain in another related domain; in our view, researchers use this technique mainly to test results, and it therefore has a very limited amount of research.

From Table  6 it is clearly evident that the most researched disease is cancer, with 38 papers, while pneumonia (4), Alzheimer's disease (3), Parkinson's disease (2) and epilepsy (2) are the least researched diseases. These results were extracted from the 1400 papers downloaded from Google Scholar.

In this paper, we have provided a brief overview of the directions of research in the healthcare domain using machine learning. As described earlier, these papers can show researchers paths where they can work. The result is based on a literature review of around 1400 papers, filtered down to 42 papers. As described in section " Literature Survey ", we categorised the research into five broad areas and found that most of the research is done in the field of evaluation of medical images, in which authors researched many diseases like cancer, heart disease, COVID-19, and Parkinson's disease. Authors used different kinds of datasets, such as images, voice, and electronic health records, and used them to predict these diseases with machine learning/AI. As described in section " Processing of Electronic Health Record (EHR) ", the second major contribution is in that category. We conclude that very little research has been done in the area covered by section " Interpretable Machine Learning ", so this area can be chosen for further research.

Footnote 1: https://www.healthcareglobal.com/technology-and-ai-3/four-ways-which-watson-transforming-healthcare-sector

Footnote 2: https://www.wired.co.uk/article/ai-healthcare-boom-deepmind

Powles J, Hodson H. Google DeepMind and healthcare in an age of algorithms. Health Technol. 2017;7(4):351.


Baroni L, Salles R, Salles S, Guedes G, Porto F, Bezerra E, Barcellos C, Pedroso M, Ogasawara E. An analysis of malaria in the Brazilian Legal Amazon using divergent association rules. J Biomed Inf. 2020;108:103512.

Rajpurkar P, Irvin J, Zhu K, Yang B, Mehta H, Duan T, Ding D, Bagul A, Langlotz C, Shpanskaya K, et al. 2017. arXiv:1711.05225 .

Petersen K, Feldt R, Mujtaba S, Mattsson M. In: 12th international conference on evaluation and assessment in software engineering (EASE) 12; 2008. pp. 1–10.

Haddaway NR, Westgate MJ. Predicting the time needed for environmental systematic reviews and systematic maps. Conserv Biol. 2019;33(2):434.

Ahmad MA, Eckert C, Teredesai A. In: Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics; 2018. pp. 559–60.

Lundberg S, Lee SI. 2017. arXiv:1705.07874 .

Yu F, Ip HH. Semantic content analysis and annotation of histological images. Comput Biol Med. 2008;38(6):635.

Sagi O, Rokach L. Explainable decision forest: transforming a decision forest into an interpretable tree. Inf Fusion. 2020;61:124.

Stiglic G, Kocbek P, Fijacko N, Zitnik M, Verbert K, Cilar L. Interpretability of machine learning-based prediction models in healthcare. Wiley Interdiscipl Rev Data Min Knowl Disc. 2020;10(5):e1379.


Ribeiro MT, Singh S, Guestrin C. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining; 2016. pp. 1135–44.

Rebane J, Samsten I, Papapetrou P. Exploiting complex medical data with interpretable deep learning for adverse drug event predictio. Artif Intell Med. 2020;109:101942.

Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, Mahendiran T, Moraes G, Shamdas M, Kern C, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Dig Health. 2019;1(6):e271.

Panayides AS, Amini A, Filipovic ND, Sharma A, Tsaftaris SA, Young A, Foran D, Do N, Golemati S, Kurc T, et al. AI in medical imaging informatics: current challenges and future directions. IEEE J Biomed Health Inf. 2020;24(7):1837.

Oktay O, Ferrante E, Kamnitsas K, Heinrich M, Bai W, Caballero J, Cook SA, De Marvao A, Dawes T, Oregan DP, et al. Anatomically constrained neural networks (ACNNs): application to cardiac image enhancement and segmentation. IEEE Trans Med Imaging. 2017;37(2):384.

Jeyaraj PR, Nadar ERS. Deep Boltzmann machine algorithm for accurate medical image analysis for classification of cancerous region. Cogn Comput Syst. 2019;1(3):85.

Harmon SA, Sanford TH, Xu S, Turkbey EB, Roth H, Xu Z, Yang D, Myronenko A, Anderson V, Amalou A, et al. A systematic review of antibody mediated immunity to coronaviruses: kinetics, correlates of protection, and association with severity. Nature Commun. 2020;11(1):1.

Zheng Y, Jiang Z, Zhang H, Xie F, Shi J, Xue C. Adaptive color deconvolution for histological WSI normalization. Comput Methods Progr Biomed. 2019;170:107.

Hoffman RA, Kothari S, Wang MD. In: 2014 36th annual international conference of the IEEE engineering in medicine and biology society, IEEE; 2014. pp. 194–7.

Feng S, et al. 2020. arXiv:2004.02731 .

Currie G, Hawk KE, Rohren E, Vial A, Klein R. Machine learning and deep learning in medical imaging: intelligent imaging. J Med Imaging Radiat Sci. 2019;50(4):477.

Xia W, Chen EC, Peters T. Endoscopic image enhancement with noise suppression. Healthcare Technol Lett. 2018;5(5):154.

Capotorti A. Probabilistic inconsistency correction for misclassification in statistical matching, with an example in health care. Int J Gener Syst. 2020;49(1):32.


Li JP, Haq AU, Din SU, Khan J, Khan A, Saboor A. Heart disease identification method using machine learning classification in e-healthcare. IEEE Access. 2020;8:107562.

Naydenova E, Tsanas A, Casals-Pascual C, De Vos M. In: 2015 IEEE global humanitarian technology conference (GHTC), IEEE; 2015. pp. 377–84.

Haq AU, Li JP, Memon MH, Malik A, Ahmad T, Ali A, Nazir S, Ahad I, Shahid M, et al. Feature selection based on L1-norm support vector machine and effective recognition system for Parkinson disease using voice recordings. IEEE Access. 2019;7:37718.

Zhang J, Lafta RL, Tao X, Li Y, Chen F, Luo Y, Zhu X. Coupling a fast fourier transformation with a machine learning ensemble model to support recommendations for heart disease patients in a telehealth environment. IEEE Access. 2017;5:10674.

Choi E, Xu Z, Li Y, Dusenberry M, Flores G, Xue E, Dai A. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34; 2020. pp. 606–13.

Pereira RG, Castro GZ, Azevedo P, Tôrres L, Zuppo I, Rocha T, Júnior AAG. In: 2020 IEEE 33rd international symposium on computer-based medical systems (CBMS), IEEE; 2020. pp. 1–6.

Kaur P, Sharma M, Mittal M. Big data and machine learning based secure healthcare framework. Procedia Comput Sci. 2018;132:1049.

Wang F, Zhu H, Liu X, Lu R, Hua J, Li H, Li H. Privacy-preserving collaborative model learning scheme for E-healthcare. IEEE Access. 2019;7:166054.

Wu Y, Jiang X, Kim J, Ohno-Machado L. Grid Binary LOgistic REgression (GLORE): building shared models without sharing data. J Am Med Inf Assoc. 2012;19(5):758.

Kuo TT, Ohno-Machado L. Modelchain: decentralized privacy-preserving healthcare predictive modeling framework on private blockchain networks. 2018.

Wang S, Jiang X, Wu Y, Cui L, Cheng S, Ohno-Machado L. Expectation propagation logistic regression (explorer): distributed privacy-preserving online model learning. J Biomed Inf. 2013;46(3):480.

Li Y, Jiang X, Wang S, Xiong H, Ohno-Machado L. Vertical grid logistic regression (vertigo). J Am Med Inf Assoc. 2016;23(3):570.

Fritchman K, Saminathan K, Dowsley R, Hughes T, De Cock M, Nascimento A, Teredesai A. In: 2018 IEEE international conference on big data (Big Data); 2018. pp. 2413–22. https://doi.org/10.1109/BigData.2018.8622627 .

Wang G, Lu J, Choi KS, Zhang G. A transfer-based additive LS-SVM classifier for handling missing data. IEEE Trans Cybern. 2018;50(2):739.

Dey N, Das H, Naik B, Behera HS. Big data analytics for intelligent healthcare management. Cambridge: Academic Press; 2019.

Chen Y, Qin X, Wang J, Yu C, Gao W, Chen Y, Qin X, Wang J, Yu C, Gao W. Fedhealth: a federated transfer learning framework for wearable healthcare. IEEE Intell Syst. 2020;35(4):83.


Author information

Authors and affiliations

AIIT, AMITY University, Noida, Uttar Pradesh, India

Gaurav Parashar, Alka Chaudhary & Ajay Rana


Corresponding author

Correspondence to Gaurav Parashar .

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Intelligent Systems” guest edited by Geetha Ganesan, Lalit Garg, Renu Dhir, Vijay Kumar and Manik Sharma.


About this article

Parashar, G., Chaudhary, A. & Rana, A. Systematic Mapping Study of AI/Machine Learning in Healthcare and Future Directions. SN COMPUT. SCI. 2 , 461 (2021). https://doi.org/10.1007/s42979-021-00848-6


Received : 04 August 2021

Accepted : 01 September 2021

Published : 16 September 2021

DOI : https://doi.org/10.1007/s42979-021-00848-6


  • Machine learning (ML)
  • Transfer learning (TL)
  • Interpretable ML
  • Electronic health records (EHR)
  • Security framework
  • Privacy framework
  • Open access
  • Published: 29 August 2023

Healthcare predictive analytics using machine learning and deep learning techniques: a survey

  • Mohammed Badawy   ORCID: orcid.org/0000-0001-9494-1386 1 ,
  • Nagy Ramadan 1 &
  • Hesham Ahmed Hefny 2  

Journal of Electrical Systems and Information Technology volume  10 , Article number:  40 ( 2023 ) Cite this article

10k Accesses

4 Citations

Metrics details

Healthcare prediction has been a significant factor in saving lives in recent years. In the healthcare domain, intelligent systems for analyzing complicated data relationships and transforming them into actionable information for use in prediction are developing rapidly. Consequently, artificial intelligence is rapidly transforming the healthcare industry, and with it comes the role of machine learning- and deep learning-based systems in diagnosing and predicting diseases, whether from clinical data or from images; these systems provide tremendous clinical support by simulating human perception and can even diagnose diseases that are difficult for human intelligence to detect. Predictive analytics for healthcare is a critical imperative in the healthcare industry. It can significantly affect the accuracy of disease prediction, which may save patients' lives in the case of an accurate and timely prediction or, on the contrary, endanger them in the case of an incorrect prediction. Therefore, diseases must be accurately predicted and estimated, and reliable, efficient methods for healthcare predictive analysis are essential. This paper therefore presents a comprehensive survey of existing machine learning and deep learning approaches utilized in healthcare prediction and identifies the inherent obstacles to applying these approaches in the healthcare domain.

Introduction

Each day, human existence evolves, yet the health of each generation either improves or deteriorates. There are always uncertainties in life, and we occasionally encounter individuals with fatal health problems due to the late detection of disease. Among the adult population, chronic liver disease affects more than 50 million individuals worldwide; however, if the sickness is diagnosed early, it can be halted. Disease prediction based on machine learning can be utilized to identify common diseases at an earlier stage. Currently, health is often a secondary concern, which has led to numerous problems. Many patients cannot afford to see a doctor, and others are extremely busy and on a tight schedule, yet ignoring recurring symptoms for an extended length of time can have significant health repercussions [ 1 ].

Diseases are a global issue; thus, medical specialists and researchers are exerting their utmost efforts to reduce disease-related mortality. In recent years, predictive analytic models have played a pivotal role in the medical profession because of the increasing volume of healthcare data from a wide range of disparate and incompatible data sources. Nonetheless, processing, storing, and analyzing the massive amount of historical data and the constant inflow of streaming data created by healthcare services has become an unprecedented challenge for traditional database storage [ 2 , 3 , 4 ]. Medical diagnosis is a form of problem-solving and a crucial, significant issue in the real world. Illness diagnosis is the process of translating observational evidence into disease names; the evidence comprises data received from evaluating a patient and substances generated from the patient, while illnesses are conceptual medical entities that account for anomalies in the observed evidence [ 5 ].

Healthcare is the collective effort of society to ensure, provide, finance, and promote health. In the twentieth century, there was a significant shift toward the ideal of wellness and the prevention of sickness and incapacity. The delivery of healthcare services entails organized public or private efforts to aid persons in regaining health and preventing disease and impairment [ 6 ]. Health care can be described as standardized rules that help evaluate actions or situations that affect decision-making [ 7 ]. Healthcare is a multi-dimensional system. The basic goal of health care is to diagnose and treat illnesses or disabilities. A healthcare system’s key components are health experts (physicians or nurses), health facilities (clinics and hospitals that provide medications and other diagnostic services), and a funding institution to support the first two [ 8 ].

With the introduction of systems based on computers, the digitalization of all medical records and the evaluation of clinical data in healthcare systems have become widespread routine practices. The phrase "electronic health records" was chosen by the Institute of Medicine, a division of the National Academies of Sciences, Engineering, and Medicine, in 2003 to define the records that continued to enhance the healthcare sector for the benefit of both patients and physicians. Electronic Health Records (EHR) are "computerized medical records for patients that include all information in an individual's past, present, or future that occurs in an electronic system used to capture, store, retrieve, and link data primarily to offer healthcare and health-related services," according to Murphy, Hanken, and Waters [ 8 ].

Daily, healthcare services produce an enormous amount of data, making it increasingly complicated to analyze and handle in "conventional ways." Using machine learning and deep learning, this data may be properly analyzed to generate actionable insights. In addition, genomics, medical data, social media data, environmental data, and other data sources can be used to supplement healthcare data. Figure  1 provides a visual picture of these data sources. The four key healthcare applications that can benefit from machine learning are prognosis, diagnosis, therapy, and clinical workflow, as outlined in the following section [ 9 ].

Figure 1: Illustration of heterogeneous sources contributing to healthcare data [ 9 ]

The long-term investment in developing novel technologies based on machine learning as well as deep learning techniques to improve individuals' health via the prediction of future events reflects the increased interest in predictive analytics techniques for enhancing healthcare. Clinical predictive models, as they were formerly referred to, assisted in the diagnosis of people with an increased probability of disease. These prediction algorithms are utilized to make clinical treatment decisions and counsel patients based on patient characteristics [ 10 ].

The concept of medical care is used to stress the organization and administration of curative care, which is a subset of health care. The ecology of medical care was first introduced by White in 1961. White also proposed a framework for perceiving patterns of health concerning symptoms experienced by populations of interest, along with individuals’ choices in getting medical treatment. In this framework, it is possible to calculate the proportion of the population that used medical services over a specific period of time. The "ecology of medical care" theory has become widely accepted in academic circles over the past few decades [ 6 ].

Medical personnel usually face new problems, changing tasks, and frequent interruptions because of the system's dynamism and scalability. This variability often makes disease recognition a secondary concern for medical experts. Moreover, the clinical interpretation of medical data is a challenging task from an epistemological point of view. This not only applies to professionals with extensive experience but also to representatives, such as young physician assistants, with varied or little experience [ 11 ]. The limited time available to medical personnel, the speedy progression of diseases, and the fluctuating patient dynamics make diagnosis a particularly complex process. However, a precise method of diagnosis is critical to ensuring speedy treatment and, thus, patient safety [ 12 ].

Predictive analytics for health care is a critical industry requirement. It can have a significant impact on the accuracy of disease prediction, which can save patients' lives in the case of an accurate and timely prediction but can endanger them in the case of an incorrect prediction. Diseases must therefore be accurately predicted and estimated; as a result, dependable and efficient methods for healthcare predictive analysis are required.

The purpose of this paper is to present a comprehensive review of common machine learning and deep learning techniques that are utilized in healthcare prediction, in addition to identifying the inherent obstacles that are associated with applying these approaches in the healthcare domain.

The rest of the paper is organized as follows: Section " Background " gives a theoretical background on artificial intelligence, machine learning, and deep learning techniques. Section " Disease prediction with analytics " outlines the survey methodology and presents a literature review of machine learning as well as deep learning approaches employed in healthcare prediction. Section " Results and Discussion " discusses the results of previous work related to healthcare prediction. Section " Challenges " covers the existing challenges related to the topic of this survey. Finally, section " Conclusion " concludes the paper.

The extensive research and development of cutting-edge tools based on machine learning and deep learning for predicting individual health outcomes demonstrate the increased interest in predictive analytics techniques to improve health care. Clinical predictive models assisted physicians in better identifying and treating patients who were at a higher risk of developing a serious illness. Based on a variety of factors unique to each individual patient, these prediction algorithms are used to advise patients and guide clinical practice.

Artificial intelligence (AI) is the ability of a system to interpret data, and it makes use of computers and machines to improve humans' capacity for decision-making, problem-solving, and technological innovation [ 13 ]. Figure  2 depicts machine learning and deep learning as subsets of AI.

Figure 2: AI, ML, and DL

Machine learning

Machine learning (ML) is a subfield of AI that aims to develop predictive algorithms based on the idea that machines should have the capability to access data and learn on their own [ 14 ]. ML utilizes algorithms, methods, and processes to detect basic correlations within data and create descriptive and predictive tools that process those correlations. ML is usually associated with data mining, pattern recognition, and deep learning. Although there are no clear boundaries between these areas and they often overlap, it is generally accepted that deep learning is a relatively new subfield of ML that uses extensive computational algorithms and large amounts of data to define complex relationships within data. As shown in Fig.  3 , ML algorithms can be divided into three categories: supervised learning, unsupervised learning, and reinforcement learning [ 15 ].

Figure 3: Different types of machine learning algorithms

Supervised learning

Supervised learning is an ML approach for investigating the input–output relationship of a system based on a given set of training examples pairing inputs with outputs [ 16 ]. The model is trained with a labeled dataset, much as a student learns fundamental math from a teacher; this kind of learning requires labeled data with the expected correct answers for the algorithm's output [ 17 ]. The most widely used supervised learning techniques include linear regression, logistic regression, decision trees, random forests, support vector machines, K-nearest neighbors, and naive Bayes.

A. Linear regression

Linear regression is a statistical method commonly used in predictive investigations. It forecasts the dependent (output) variable Y based on the independent (input) variable X. Assuming continuous, real, numeric parameters, the connection between X and Y is represented as shown in Eq.  1 :

Y = mX + c      (1)

where m indicates the slope and c indicates the intercept. From Eq.  1 , the association between the independent parameter (X) and the dependent parameter (Y) can be inferred [ 18 ].

The advantage of linear regression is that it is straightforward to learn and that overfitting can easily be reduced through regularization. One drawback is that it is not suitable for nonlinear relationships, and because it greatly simplifies real-world problems it is not recommended for most practical applications [ 19 ]. The implementation tools utilized in linear regression are Python, R, MATLAB, and Excel.
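Since Python is among the tools named above, a minimal scikit-learn sketch of fitting Eq. 1 follows; the synthetic data and the true slope and intercept (2 and 1) are illustrative assumptions, not taken from any surveyed study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data: y = 2x + 1 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                    # independent variable X
y = 2.0 * X.ravel() + 1.0 + rng.normal(0, 1, size=100)   # dependent variable Y

model = LinearRegression().fit(X, y)
print("slope m:", model.coef_[0])          # estimate of m in Y = mX + c
print("intercept c:", model.intercept_)    # estimate of c
```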

As shown in Fig. 4, observations (highlighted in red) deviate randomly (deviations shown in green) from the underlying relationship (shown in yellow) between the independent variable (x) and the dependent variable (y) [ 20 ].

figure 4

Linear regression model

B. Logistic regression

Logistic regression, also known as the logistic model, investigates the correlation between several independent variables and a categorical dependent variable and calculates the probability of an event by fitting the data to a logistic curve [ 21 ]. The dependent variable must be binary, i.e., have only two outcomes: true or false, 0 or 1, yes or no. Logistic regression is used to predict categorical variables and to solve classification problems. It can be implemented using various tools such as R, Python, Java, and MATLAB [ 18 ]. Logistic regression has many benefits; for example, it captures the relationship between the dependent and independent variables well and is simple to understand. On the other hand, it can only predict discrete outputs, is not suited to nonlinear data, and is sensitive to outliers [ 22 ].
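As a concrete illustration of fitting a binary outcome to a logistic curve, here is a minimal scikit-learn sketch; the two-feature synthetic data and the rule generating the labels are assumptions made purely for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative binary outcome (0/1) driven by two independent variables
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # assumed labeling rule, for demo only

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:3]))   # event probabilities from the logistic curve
print(clf.predict(X[:3]))         # predicted class labels (0 or 1)
```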

C. Decision tree

The decision tree (DT) is a supervised learning technique used for classification. It splits on attribute values in order, either ascending or descending [ 23 ]. As a tree-based strategy, DT defines each path starting from the root using a data-separating sequence until a Boolean conclusion is attained at the leaf node [ 24 , 25 ]. DT is a hierarchical representation of knowledge interactions that contains nodes and links: nodes represent goals, while links represent the relations employed to classify [ 26 , 27 ]. An example of a DT is presented in Fig. 5.

figure 5

Example of a DT

DTs have various drawbacks, such as complexity that grows with the number of labels, small modifications to the data that may lead to a different tree architecture, and longer processing time to train the data [ 18 ]. The implementation tools used in DT are Python (Scikit-Learn), RStudio, Orange, KNIME, and Weka [ 22 ].
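A minimal scikit-learn decision-tree sketch is given below; the choice of the Iris dataset and the depth limit of 3 are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Root-to-leaf paths in the fitted tree encode the data-separating sequence
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```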

D. Random forest

Random forest (RF) is a straightforward technique that usually produces accurate results. It may be utilized for both classification and regression. The algorithm produces an ensemble of DTs and blends them [ 28 ].

In the RF classifier, the higher the number of trees in the forest, the more accurate the results. RF generates a collection of DTs, called the forest, and combines them to achieve more accurate predictions. Each DT is built on only a random part of the given dataset, and the RF brings the trees together to reach the optimal decision [ 18 ].

As indicated in Fig. 6, RF randomly selects a subset of features from the data, and from each subset it generates n random trees [ 20 ]. RF then combines the results from all DTs into the final output.

figure 6

Random forest architecture

Two parameters are used for tuning RF models: mtry, the number of randomly selected features considered at each split; and ntree, the number of trees in the model. The mtry parameter involves a trade-off: large values raise the correlation between trees but enhance per-tree accuracy [ 29 ].
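In scikit-learn, one plausible mapping of these two parameters is n_estimators for ntree and max_features for mtry; the sketch below is illustrative, and the dataset and parameter values are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# n_estimators plays the role of ntree; max_features plays the role of mtry
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)
print("cross-validated accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```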

The RF works with a labeled dataset to make predictions and build a model. The final model is utilized to classify unlabeled data. The model integrates the concept of bagging with a random selection of features to build variance-controlled DTs [ 30 ].

RF offers significant benefits. First, it can be utilized for determining the relevance of the variables in a regression or classification task [ 31 , 32 ]. This relevance is measured on a scale based on the impurity drop at each node used for data segmentation [ 33 ]. Second, it handles missing values in the data automatically and resolves the overfitting problem of DT. Finally, RF can efficiently handle huge datasets. On the other hand, RF suffers from drawbacks; for example, it needs more computation and resources to generate the output, and it requires more training effort due to the multiple DTs involved. The implementation tools used in RF are Python Scikit-Learn and R [ 18 ].

E. Support vector machine

The support vector machine (SVM) is a supervised ML technique for classification problems and regression models. SVM is a linear model that offers solutions to both linear and nonlinear problems, as shown in Fig. 7. Its foundation is the idea of margin calculation: the dataset is divided into several classes to build relations between them [ 18 ].

figure 7

Support vector machine

SVM is a statistics-based learning method that follows the principle of structural risk minimization and aims to locate decision bounds, also known as hyperplanes, that can optimally separate classes by finding a hyperplane in a usable N-dimensional space that explicitly classifies data points [ 34 , 35 , 36 ]. SVM indicates the decision boundary between two classes by defining the value of each data point, in particular the support vector points placed on the boundary between the respective classes [ 37 ].

SVM has several advantages; for example, it works well with both semi-structured and unstructured data. The kernel trick is a strong point of SVM: with the right kernel it can handle complex problems and high-dimensional data. Furthermore, SVM generalizes well, with less risk of overfitting. On the other hand, SVM has downsides: training time increases on large datasets, choosing the right kernel function is difficult, and it does not work well with noisy data. Implementation tools used in SVM include SVMlight with C, LibSVM with Python, MATLAB or Ruby, SAS, Kernlab, Scikit-Learn, and Weka [ 22 ].
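Using scikit-learn (one of the tools listed above), an RBF-kernel SVM might be set up as in the following sketch; the dataset, the feature-scaling step, and C = 1.0 are illustrative choices, not recommendations from the surveyed studies.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel (the "kernel trick"); scaling helps the margin-based optimization
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```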

F. K-nearest neighbor

K-nearest neighbor (KNN) is an "instance-based" or non-generalizing learning algorithm, often known as a "lazy learning" algorithm [ 38 ]. KNN is used for solving classification problems. To anticipate the target label of new test data, KNN computes the distances from the test point to the labeled training data for a given value of K, as shown in Fig. 8. It then finds the K nearest data points and determines the label of the new test point from them. To choose the number of nearest training data points, KNN commonly uses the rule of thumb k = √n, where n is the size of the dataset [ 22 ].

figure 8

K-nearest neighbor

KNN has many benefits; for example, it is sufficiently powerful if the training data are large, it is simple and flexible with respect to attributes and distance functions, and it can handle multi-class datasets. Its drawbacks include the difficulty of choosing an appropriate K value, the tedium of selecting a distance function suited to a particular dataset, and a relatively high computation cost, since distances to all training data points must be computed. The implementation tools used in KNN are Python (Scikit-Learn), WEKA, R, KNIME, and Orange [ 22 ].
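The following scikit-learn sketch applies the k = √n rule of thumb mentioned above; the dataset and train/test split are illustrative assumptions.

```python
import math
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Rule of thumb from the text: k = sqrt(n), with n the training-set size
k = max(1, round(math.sqrt(len(X_train))))
knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
print(f"k={k}, test accuracy:", knn.score(X_test, y_test))
```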

G. Naive Bayes

Naive Bayes (NB) is based on the probabilistic model of Bayes' theorem and is simple to set up, since no complex iterative parameter estimation is required, which makes it suitable for huge datasets [ 39 ]. NB determines the class membership degree based on a given class designation [ 40 ]. It scans the data once, and thus classification is easy [ 41 ]. Simply put, the NB classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. It is widely used for text classification [ 42 ].

NB has great benefits: it is easy to implement, can provide good results even with little training data, can manage both continuous and discrete data, is well suited to multi-class prediction problems, and is insensitive to irrelevant features. On the other hand, NB has the following drawbacks: it assumes that all features are independent, which is rarely true in real-world problems; it suffers from the zero-frequency problem; and its predictions are not always accurate. Implementation tools are WEKA, Python, RStudio, and Mahout [ 22 ].
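A minimal Gaussian naive Bayes sketch in scikit-learn is shown below; GaussianNB is only one of several NB variants, and the dataset choice is an illustrative assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GaussianNB handles continuous features; each feature is treated as independent
nb = GaussianNB().fit(X_train, y_train)
print("test accuracy:", nb.score(X_test, y_test))
```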

To summarize the previously discussed models, Table 1 demonstrates the advantages and disadvantages of each model.

Unsupervised learning

Unlike supervised learning, there are no correct answers and no teacher in unsupervised learning [ 42 ]. It follows the concept that a machine can learn to understand complex processes and patterns on its own, without external guidance. This approach is particularly useful in cases where experts do not know what to look for in the data and the data themselves do not include target labels. The machine predicts the outcome based on past experiences and learns to predict the real-valued outcome from the information previously provided, as shown in Fig. 9.

figure 9

Workflow of unsupervised learning [ 23 ]

Unsupervised learning is widely used in the processing of multimedia content, as clustering and partitioning of data in the absence of class labels is often a requirement [ 43 ]. Some of the most popular unsupervised learning-based approaches are k-means, principal component analysis (PCA), and the apriori algorithm.

A. K-means

The k-means algorithm is the most common partitioning method [ 44 ] and one of the most popular unsupervised learning algorithms for the well-known clustering problem. The procedure classifies a given dataset into a preselected number ( k ) of clusters [ 45 ]. The pseudocode of the k-means algorithm is shown in Pseudocode 1.

Pseudocode 1. The k-means algorithm

K-means has several benefits: it is more computationally efficient than hierarchical clustering when the number of variables is large, it yields more compact clusters than hierarchical methods when a small k is used, and it is easy to implement and to interpret the clustering results. However, k-means also has disadvantages, such as the difficulty of predicting the value of K, and its performance is affected because different starting partitions lead to different final clusters. The algorithm converges only to a local optimum, and there is no single solution for a given K value, so it must be run multiple times (20–100 times) with different initializations and the result with the minimum cost J chosen [ 19 ].
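The multiple-restart advice above can be expressed directly in scikit-learn, where n_init reruns the algorithm from different initializations and keeps the solution with the minimum J (exposed as inertia_); the synthetic blobs below are an illustrative assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init restarts k-means from 20 different initializations and keeps the
# clustering with the minimum cost J (inertia_)
km = KMeans(n_clusters=3, n_init=20, random_state=0).fit(X)
print("J (inertia):", km.inertia_)
print("first ten labels:", km.labels_[:10])
```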

B. Principal component analysis

In modern data analysis, principal component analysis (PCA) is an essential tool as it provides a guide for extracting the most important information from a dataset, compressing the data size by keeping only those important features without losing much information, and simplifying the description of a dataset [ 46 , 47 ].

PCA is frequently used to reduce data dimensions before applying classification models. Moreover, unsupervised methods such as dimensionality reduction and clustering algorithms are commonly used for data visualization, detection of common trends or behaviors, and decreasing the quantity of data, to name a few [ 48 ].

For example, PCA can convert 2D data into 1D data by changing the set of variables into new orthogonal variables known as principal components (PCs) [ 23 ]. In PCA, data dimensions are reduced to make calculations faster and easier. To illustrate how PCA works, consider 2D data: plotted on a graph, they occupy two axes; after applying PCA, the data become 1D. This process is illustrated in Fig. 10 [ 49 ].

figure 10

Visualization of data before and after applying PCA [ 49 ]
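The 2D-to-1D example above can be sketched in scikit-learn as follows; the correlated synthetic data are an illustrative assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative correlated 2D data, as in the example above
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2.0 * x + rng.normal(scale=0.3, size=200)])

pca = PCA(n_components=1)          # keep one principal component: 2D -> 1D
X_1d = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
```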

C. Apriori algorithm

The apriori algorithm is an important algorithm, first introduced by R. Agrawal and R. Srikant and published in [ 50 , 51 ].

The principle of the apriori algorithm is to generate candidate ( k  + 1)-itemsets from frequent k -itemsets. It uses an iterative, level-wise search in which frequent k -itemsets are employed to explore ( k  + 1)-itemsets. First, the set of frequent 1-itemsets is produced by scanning the dataset to count each item and collecting the items that meet minimum support; the resulting set is called L1. L1 is then used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on until no more frequent k -itemsets are found. Finding each Lk requires a full scan of the dataset. To improve the efficiency of the level-wise generation of frequent itemsets, a key property called the apriori property is used to reduce the search space: all non-empty subsets of a frequent itemset must also be frequent. A two-step technique, consisting of join and prune operations, is used to identify the frequent itemsets [ 52 ].
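To make the join-and-prune, level-wise search concrete, here is a small self-contained Python sketch of the apriori procedure; the toy market-basket transactions and the 0.5 minimum-support threshold are illustrative assumptions, and production code would normally use an optimized library instead.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise (join-and-prune) search for frequent itemsets."""
    n = len(transactions)
    items = {frozenset([i]) for t in transactions for i in t}
    # L1: frequent 1-itemsets
    freq = {s for s in items if sum(s <= t for t in transactions) / n >= min_support}
    result, k = set(freq), 2
    while freq:
        # Join step: build candidate k-itemsets from frequent (k-1)-itemsets
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Prune step (apriori property): every (k-1)-subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        # Keep candidates that meet minimum support (requires a full data scan)
        freq = {c for c in candidates
                if sum(c <= t for t in transactions) / n >= min_support}
        result |= freq
        k += 1
    return result

baskets = [{"milk", "bread"}, {"milk", "diapers"},
           {"milk", "bread", "diapers"}, {"bread"}]
print(apriori([frozenset(b) for b in baskets], min_support=0.5))
```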

Although it is simple, the apriori algorithm suffers from several drawbacks. The main limitation is the costly time needed to handle many candidate sets containing redundant itemsets. It also performs poorly with low minimum-support thresholds or large itemsets, and multiple scans of the data are needed for mining, which usually yields irrelevant items, in addition to difficulties in discovering individual elements of events [ 53 , 54 ].

To summarize the previously discussed models, Table 2 demonstrates the advantages and disadvantages of each model.

Reinforcement learning

Reinforcement learning (RL) differs from supervised and unsupervised learning. It is a goal-oriented learning approach in which an agent (controller) takes responsibility for the learning process in order to achieve a goal. The agent chooses actions, and as a result the environment changes its state and returns rewards, which are positive or negative numerical values. The agent's goal is to maximize the rewards accumulated over time. A task is a complete specification of an environment that identifies how rewards are generated [ 55 ]. Some of the most popular reinforcement learning-based algorithms are the Q-learning algorithm and Monte Carlo tree search (MCTS).

A. Q-learning

Q-learning is a type of model-free RL. It can be considered an asynchronous dynamic programming approach. It enables agents to learn how to act optimally in Markovian domains by exploring the effects of actions, without the need to build domain maps [ 56 ]. It represents an incremental method of dynamic programming that imposes low computational requirements. It works through successive improvement of the assessment of the quality of individual actions in particular states [ 57 ].
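The incremental update at the heart of Q-learning can be sketched on a toy problem as follows; the 5-state corridor environment, the learning rate, discount factor, and exploration rate are all illustrative assumptions, not part of any surveyed system.

```python
import numpy as np

# Toy 5-state corridor: start at state 0, actions 0 = left / 1 = right,
# reward 1 on reaching the terminal state at the right end.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.3    # illustrative hyperparameters
rng = np.random.default_rng(0)

for episode in range(300):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Core update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))   # the "right" column should dominate after training
```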

In information theory, Q-learning is strongly employed, and other related investigations are underway. Recently, Q-learning combined with information theory has been employed in different disciplines such as natural language processing (NLP), pattern recognition, anomaly detection, and image classification [ 58 , 59 , 60 ]. Moreover, a framework has been created to provide a satisfying response based on the user's utterance using RL in a voice interaction system [ 61 ]. Furthermore, a high-resolution deep learning-based prediction system for local rainfall has been constructed [ 62 ].

The advantage of Q-learning is that reward values can be identified effectively in a multi-agent environment, as the agents in ant Q-learning interact with each other. The problem with Q-learning is that its output can get stuck in a local minimum, as agents just take the shortest path [ 63 ].

B. Monte Carlo tree search

Monte Carlo tree search (MCTS) is an effective technique for solving sequential decision problems. Its strategy is based on an intelligent tree search that balances exploration and exploitation. MCTS performs random sampling in the form of simulations and keeps statistics of actions in order to make better-informed choices in each subsequent iteration. MCTS is a decision-making algorithm employed for searching huge, complex, tree-like spaces. In such trees, each node refers to a state, also referred to as a problem configuration, while edges represent transitions from one state to another [ 64 ].

The MCTS is related directly to cases that can be represented by a Markov decision process (MDP), which is a type of discrete-time random control process. Some modifications of the MCTS make it possible to apply it to partially observable Markov decision processes (POMDP) [ 65 ]. Recently, MCTS coupled with deep RL became the base of AlphaGo developed by Google DeepMind and documented in [ 66 ]. The basic MCTS method is conceptually simple, as shown in Fig.  11 .

figure 11

Basic MCTS process

The tree is constructed progressively and unevenly. At each iteration of the method, the tree policy is used to select the most critical node of the current tree; the tree policy seeks to strike a balance between exploration and exploitation. A simulation is then run from the selected node, and the search tree is updated according to the obtained results. This comprises adding a child node that matches the selected node's action and updating its ancestors' statistics. During the simulation, moves are made according to some default policy, which in the simplest case is to make uniform random moves. The benefit of MCTS is that there is no need to evaluate the values of intermediate states, which significantly minimizes the amount of required domain knowledge [ 67 ].
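The four phases just described (selection, expansion, simulation, backpropagation) are sketched below on a deliberately tiny counting game; the game itself, the UCB1 exploration constant, and the iteration budget are illustrative assumptions, not part of any surveyed system.

```python
import math
import random

# Toy game: a counter starts at 0; players alternately add 1 or 2, and
# whoever moves the counter to exactly TARGET wins (a tiny Nim variant).
TARGET = 10

def moves(state):
    return [m for m in (1, 2) if state + m <= TARGET]

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.wins = [], 0, 0.0
        self.untried = moves(state)

def uct_search(root_state, iterations=3000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCB1 until a node with untried moves
        while not node.untried and node.children:
            node = max(node.children, key=lambda c: c.wins / c.visits
                       + math.sqrt(2 * math.log(node.visits) / c.visits))
        # 2. Expansion: add one child for an untried move
        if node.untried:
            child = Node(node.state + node.untried.pop(), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: uniform-random default policy to the end of the game
        state, mover, winner = node.state, 1, 0   # mover 1 = node player's opponent
        while state < TARGET:
            state += random.choice(moves(state))
            winner = mover
            mover ^= 1
        node_player_won = (winner == 0)
        # 4. Backpropagation: update statistics along the path to the root,
        #    flipping the perspective at each level
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if node_player_won else 0.0
            node_player_won = not node_player_won
            node = node.parent
    best = max(root.children, key=lambda c: c.visits)
    return best.state - root_state   # move leading to the most-visited child

print("best first move from 0:", uct_search(0))
```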

To summarize the previously discussed models, Table 3 demonstrates the advantages and disadvantages of each model.

Deep learning

Over the past decades, ML has had a significant impact on our daily lives, with examples including efficient computer vision, web search, and optical character recognition. By applying ML approaches, human-level AI has also advanced [ 68 , 69 , 70 ]. However, when it comes to the mechanisms of human information processing (such as sound and vision), the performance of traditional ML algorithms is far from satisfactory. The idea of deep learning (DL) was formed in the late 20th century, inspired by the deep hierarchical structures of human voice recognition and production systems. The DL breakthrough came in 2006, when Hinton built a deep-structured learning architecture called the deep belief network (DBN) [ 71 ].

The performance of classifiers using DL has been extensively improved with the increased complexity of data compared to classical learning methods. Figure 12 shows the performance of classic ML algorithms and DL methods [ 72 ]. The performance of typical ML algorithms plateaus once a training data threshold is reached, whereas DL continues to improve as the amount and complexity of data increase [ 73 ].

figure 12

Performance of deep learning concerning the complexity of data

DL (deep ML, or deep-structured learning) is a subset of ML that involves a collection of algorithms attempting to represent high-level abstractions in data through a model that has complicated structures or is otherwise composed of numerous nonlinear transformations. The most important characteristic of DL is the depth of the network. Another essential aspect of DL is its ability to replace handcrafted features with features generated by efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction [ 74 ].

DL has significantly advanced the latest technologies in a variety of applications, including machine translation, speech, and visual object recognition, NLP, and text automation, using multilayer artificial neural networks (ANNs) [ 15 ].

The different DL designs of the past two decades offer enormous potential for employment in various sectors such as automatic voice recognition, computer vision, NLP, and bioinformatics. This section discusses the most common DL architectures, such as convolutional neural networks (CNNs), long short-term memory (LSTM), and recurrent convolutional neural networks (RCNNs) [ 75 ].

A. Convolutional neural network

CNNs are special types of neural networks inspired by the human visual cortex and used in computer vision. They are feed-forward neural networks in which information flows exclusively in the forward direction [ 76 ]. CNNs are frequently applied in face recognition, human organ localization, text analysis, and biological image recognition [ 77 ].

Since CNN was first created in 1989, it has performed well in disease diagnosis over the past three decades [ 78 ]. Figure 13 depicts the general architecture of a CNN composed of feature extractors and a classifier. In the feature-extraction layers, each layer of the network accepts the output of the previous layer as input and passes its output on to the next layer. A typical CNN architecture consists of three types of layers: convolution, pooling, and classification. At the network's low and middle levels there are two types of layers: convolutional layers and pooling layers. Even-numbered layers are used for convolutions, while odd-numbered layers are used for pooling operations. The output nodes of the convolution and pooling layers are organized in a two-dimensional plane called a feature map. Each layer level is typically generated by combining one or more previous layers [ 79 ].

figure 13

Architecture of CNN [ 79 ]

CNNs have many benefits: they resemble the human visual processing system, are highly optimized for processing 2D and 3D images, and are effective at learning and extracting abstract information from 2D data. The max-pooling layer in CNN is efficient at absorbing shape variations. Furthermore, CNNs are built from sparse connections with tied weights and contain far fewer parameters than a fully connected network of similar size. CNNs are trained with a gradient-based learning algorithm and are less susceptible to the vanishing gradient problem because the gradient-based approach trains the entire network to directly minimize the error criterion, allowing CNNs to produce highly optimized weights [ 79 ].
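A minimal Keras sketch of the convolution-pooling-classification stack described above follows; the input size, layer widths, and 10-class output are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Convolution -> pooling -> classification, mirroring the three layer types above
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),        # small grayscale images (assumed size)
    layers.Conv2D(16, 3, activation="relu"),  # convolutional feature extraction
    layers.MaxPooling2D(),                    # pooling
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # classification into 10 assumed classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```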

B. Long short-term memory

LSTM is a special type of recurrent neural network (RNN) with internal memory and multiplicative gates. Since the original LSTM introduction in 1997 by Sepp Hochreiter and Jürgen Schmidhuber, a variety of LSTM cell configurations have been described [ 80 ].

LSTM has contributed to the development of well-known software such as Alexa, Siri, Cortana, Google Translate, and Google voice assistant [ 81 ]. LSTM is an implementation of RNN with a special connection between nodes. The special components within the LSTM unit include the input, output, and forget gates. Figure  14 depicts a single LSTM cell.

figure 14

LSTM unit [ 82 ]

x_t = input vector at time t
h_(t-1) = previous hidden state
c_(t-1) = previous memory state
h_t = current hidden state
c_t = current memory state
[×] = multiplication operation
[+] = addition operation

LSTM is an RNN module that handles the vanishing gradient problem; in general, an RNN uses LSTM to eliminate propagation errors, which allows the RNN to learn over multiple time steps. The basic principle of LSTM is the cell state, which holds information outside the recurrent network. A cell is like a memory in a computer: it decides when data should be stored, written, read, or erased via the LSTM gates [ 82 ]. Many network architectures use LSTM, such as bidirectional LSTM, hierarchical and attention-based LSTM, convolutional LSTM, autoencoder LSTM, grid LSTM, and cross-modal and associative LSTM [ 83 ].
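A minimal Keras sketch of an LSTM sequence classifier follows; the sequence length, feature count, and random training data are illustrative assumptions, since the gating logic itself is handled inside the LSTM layer.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Sequence classifier: the gated cell state carries information across time steps
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20, 8)),           # 20 time steps, 8 features (assumed shape)
    layers.LSTM(32),                         # input, forget, and output gates inside
    layers.Dense(1, activation="sigmoid"),   # one binary label per sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.rand(64, 20, 8).astype("float32")   # illustrative random sequences
y = np.random.randint(0, 2, size=(64, 1))
model.fit(X, y, epochs=1, verbose=0)
```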

Bidirectional LSTM networks move the state vector forward and backward in time, so dependencies are considered in both temporal directions. As a result of the backward state propagation, expected future correlations can be included in the network's current output [ 84 ]. Bidirectional LSTM encapsulates spatially and temporally scattered information and can tolerate incomplete inputs via a flexible cell-state-vector propagation communication mechanism. Based on the detected gaps in the data, this filtering mechanism reidentifies the connections between cells for each data sequence. The architecture is depicted in Fig. 15. A bidirectional network is used in [ 83 ] to process properties from multiple dimensions in a parallel and integrated architecture.

figure 15

(left) Bidirectional LSTM and (right) filter mechanism for processing incomplete data [ 84 ]
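In Keras, the forward and backward passes described above are obtained by wrapping an LSTM layer in Bidirectional, as in this illustrative sketch; the shapes are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Bidirectional runs one forward-in-time and one backward-in-time LSTM and
# concatenates their outputs, so both temporal directions are considered
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20, 8)),           # assumed sequence shape
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```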

Hierarchical LSTM networks solve multi-dimensional problems by breaking them down into subproblems and organizing them in a hierarchical structure. This has the advantage of focusing on a single or multiple subproblems. This is accomplished by adjusting the weights within the network to generate a certain level of interest [ 83 ]. A weighting-based attention mechanism that analyzes and filters input sequences is also used in hierarchical LSTM networks for long-term dependency prediction [ 85 ].

Convolutional LSTM reduces and filters input data collected over a longer period using convolutional operations applied in LSTM networks or in the LSTM cell architecture directly. Furthermore, due to their distinct characteristics, convolutional LSTM networks are useful for modeling many quantities, such as spatially and temporally distributed relationships. Many quantities can thus be predicted collectively in terms of a reduced feature representation; decoding or decoherence layers are then required to predict the different output quantities not as features but based on their parent units [ 83 ].

The LSTM autoencoder solves the problem of predicting high-dimensional parameters by shrinking and expanding the network [ 86 ]. The autoencoder architecture is separately trained with the aim of accurate reconstruction of the input data as reported in [ 87 ]. Only the encoder is used during testing and commissioning to extract the low-dimensional properties that are transmitted to the LSTM. The LSTM was extended to multimodal prediction using this strategy. To compress the input data and cell states, the encoder and decoder are directly integrated into the LSTM cell architecture. This combined reduction improves the flow of information in the cell and results in an improved cell state update mechanism for both short-term and long-term dependency [ 83 ].

Grid long short-term memory is a network of LSTM cells organized into a multi-dimensional grid that can be applied to sequences, vectors, or higher-dimensional data like images [ 88 ]. Grid LSTM has connections to the spatial or temporal dimensions of input sequences. Thus, connections of different dimensions within cells extend the normal flow of information. As a result, grid LSTM is appropriate for the parallel prediction of several output quantities that may be independent, linear, or nonlinear. The network's dimensions and structure are influenced by the nature of the input data and the goal of the prediction [ 89 ].

A novel method for the collaborative prediction of numerous quantities is the cross-modal and associative LSTM. It uses several standard LSTMs to separately model different quantities. To calculate the dependencies of the quantities, these LSTM streams communicate with one another via recursive connections. The chosen layers' outputs are added as new inputs to the layers before and after them in other streams. Consequently, a multimodal forecast can be made. The benefit of this approach is that the correlation vectors that are produced have the same dimensions as the input vectors. As a result, neither the parameter space nor the computation time increases [ 90 ].

C. Recurrent convolution neural network

CNN is a key method for handling various computer vision challenges. In recent years, a new generation of CNNs has been developed, the recurrent convolutional neural network (RCNN), which is inspired by large-scale recurrent connections in the visual systems of animals. The recurrent convolutional layer (RCL) is the key feature of RCNN; it integrates recurrent connections among neurons within the standard convolutional layer. As the number of recurrent computations increases, the receptive fields (RFs) of neurons in the RCL expand without bound, which is contrary to biological fact [ 91 ].

The RCNN prototype was proposed by Ming Liang and Xiaolin Hu [ 92 , 93 ], and its structure is illustrated in Fig. 16, in which both the feed-forward and recurrent connections have local connectivity and shared weights among distinct sites. This design is quite similar to the recurrent multilayer perceptron (RMLP) concept, which is often used for dynamic control [ 94 , 95 ] (Fig. 16, middle). As with the distinction between MLP and CNN, the primary difference is that full connections in RMLP are replaced with shared local connections. For this reason, the proposed model is known as RCNN [ 96 ].

figure 16

Illustration of the architectures of CNN, RMLP, and RCNN [ 85 ]

figure 17

Illustration of the total number of reviewed papers

The main unit of RCNN is the RCL, which evolves over discrete time steps. RCNN offers three basic advantages. First, it allows each unit to incorporate context from an arbitrarily wide area in the current layer. Second, the recurrent connections increase the depth of the network while keeping the number of tunable parameters constant through weight sharing, which is consistent with the trend of modern CNN architectures to grow deeper with a relatively limited number of parameters. Third, RCNN unfolded in time is a CNN with many paths between the input layer and the output layer, which facilitates learning: longer paths make it possible for the model to learn very complex features, while shorter paths improve gradient backpropagation during training [ 91 ].

To summarize the previously discussed models, Table 4 demonstrates the advantages and disadvantages of each model.

Disease prediction with analytics

The studies discussed in this paper have been presented and published in high-quality journals and international conferences by IEEE, Springer, Elsevier, and other major scientific publishers such as Hindawi, Frontiers, Taylor & Francis, and MDPI. The search engines used were Google Scholar, Scopus, and ScienceDirect. All papers selected cover the period from 2019 to 2022. Machine learning, deep learning, health care, surgery, cardiology, radiology, hepatology, and nephrology are some of the terms used to search for these studies. The studies chosen for this survey are concerned with the use of machine learning and deep learning algorithms in healthcare prediction, and both empirical and review articles on these topics were considered. This section discusses existing research efforts that address healthcare prediction using various ML and DL techniques, with a detailed discussion of the methods and algorithms used for prediction, the performance metrics, and the tools used in each model.

ML-based healthcare prediction

To predict diabetes patients, the authors of [ 97 ] utilized a framework to develop and evaluate ML classification models such as logistic regression, KNN, SVM, and RF. The methods were implemented on the Pima Indian Diabetes Database (PIDD), which has 768 rows and 9 columns, and delivered a forecast accuracy of 83%. The results indicate that logistic regression outperformed the other ML algorithms. However, only a structured dataset was selected and unstructured data were not considered; the model should also be implemented in other healthcare domains, such as heart disease and COVID-19, and other factors should be considered for diabetes prediction, such as family history of diabetes, smoking habits, and physical inactivity.

The authors of [ 98 ] created a diagnosis system that uses two different datasets (from Frankfurt Hospital in Germany and the PIDD provided by the UCI ML repository) and four prediction models (RF, SVM, NB, and DT) to predict diabetes. The SVM algorithm performed with an accuracy of 83.1 percent. Some aspects of this study could be improved: a DL approach to predicting diabetes might achieve better results, and the model should be tested in other healthcare domains, such as heart disease and COVID-19 prediction datasets.

In [ 99 ], the authors proposed three ML methods (logistic regression, DT, and boosted RF) to assess COVID-19 using open data resources from Mexico and Brazil. To predict recovery and death, the proposed model incorporates only the COVID-19 patient's geographical, social, and economic conditions, along with clinical risk factors, medical reports, and demographic data. On the dataset utilized, the model for Mexico has 93 percent accuracy and an F1 score of 0.79, while the Brazil model has 69 percent accuracy and an F1 score of 0.75. The three ML algorithms were examined, and the results showed that logistic regression is the best way to process the data. The authors should address authentication and privacy management of the generated data.

A new model for predicting type 2 diabetes using a network approach and ML techniques (logistic regression, SVM, NB, KNN, decision tree, RF, XGBoost, and ANN) was presented by the authors of [ 100 ]. To predict the risk of type 2 diabetes, the healthcare data of 1,028 type 2 diabetes patients and 1,028 non-type 2 diabetes patients were extracted from de-identified data. The experimental findings reveal the models' effectiveness, with an area under the curve (AUC) varying from 0.79 to 0.91; the RF model achieved higher accuracy than the others. This study relies only on a dataset of hospital admission and discharge summaries from one insurance company, so external hospital visits and information from other insurers are missing for people with multiple insurance providers.

The authors of [ 101 ] proposed a healthcare management system that patients can use to schedule appointments with doctors and verify prescriptions. It supports ML to detect ailments and determine medicines. ML models including DT, RF, logistic regression, and NB classifiers were applied to diabetes, heart disease, chronic kidney disease, and liver datasets. Among all models, logistic regression had the highest accuracy, 98.5 percent, on the heart dataset, while the DT classifier had the lowest, at 92 percent. On the liver dataset, logistic regression achieved the maximum accuracy of 75.17 percent. On the chronic renal disease dataset, logistic regression, RF, and Gaussian NB all achieved an accuracy of 1; an accuracy of 100 percent should be verified with k-fold cross-validation to test the reliability of the models. On the diabetes dataset, random forest achieved the maximum accuracy of 83.67 percent. The authors should include a hospital directory so that various hospitals and clinics can be accessed through a single portal; additionally, image datasets could be included to allow image processing of reports and the deployment of DL to detect diseases.

In [ 102 ], the authors developed an ML model to predict the occurrence of type 2 diabetes in the following year (Y + 1) using factors from the present year (Y). The dataset was obtained as electronic health records from a private medical institute between 2013 and 2018. The authors applied logistic regression, RF, SVM, XGBoost, and ensemble ML algorithms to predict non-diabetic, prediabetes, and diabetes outcomes. Feature selection was applied to distinguish the three classes efficiently; FPG, HbA1c, triglycerides, BMI, gamma-GTP, gender, age, uric acid, smoking, drinking, physical activity, and family history were among the features selected. According to the experimental results, the maximum accuracy was 73 percent from RF, while the lowest was 71 percent from the logistic regression model. The authors used only one dataset, so additional data sources should be applied to verify the models developed in this study.

The authors of [ 103 ] classified the diabetes dataset using SVM and NB algorithms with feature selection to improve the model's accuracy. The PIDD, taken from the UCI repository, was used for analysis. The authors employed k-fold cross-validation for training and testing; the SVM classifier performed better than the NB method, offering around 91 percent correct predictions. However, the authors acknowledge that the work should be extended to a newer dataset containing additional attributes and rows.

The authors of [ 104 ] applied k-means clustering, an unsupervised ML algorithm, to detect heart disease in its earliest stages using the UCI heart disease dataset, with PCA used for dimensionality reduction. The method demonstrated early cardiac disease prediction with 94.06 percent accuracy. The authors should apply the proposed technique with more than one algorithm and more than one dataset.

In [ 105 ], the authors constructed a predictive model for the classification of diabetes data using the logistic regression classification technique. The dataset includes 459 patients for training and 128 cases for testing. The prediction accuracy using logistic regression was 92%. The main limitation of this research is that the authors did not compare the model with other diabetes prediction algorithms, so its performance cannot be confirmed.

The authors of [ 106 ] developed a prediction model that analyzes the user's symptoms and predicts the disease using ML algorithms (DT classifier, RF classifier, and NB classifier). The purpose of this study was to solve health-related problems by allowing medical professionals to predict diseases at an early stage. The dataset is a sample of 4920 patient records with 41 illnesses diagnosed. A total of 41 disorders were included as a dependent variable. All algorithms achieved the same accuracy score of 95.12%. The authors noticed that overfitting occurred when all 132 symptoms from the original dataset were assessed instead of 95 symptoms. That is, the tree appears to remember the dataset provided and thus fails to classify new data. As a result, just 95 symptoms were assessed during the data-cleansing process, with the best ones being chosen.

In [ 107 ], the authors built a decision-making system that assists practitioners in anticipating cardiac problems with exact classification through a simpler method and delivers automated predictions about the condition of the patient's heart. Four algorithms (KNN, RF, DT, and NB) were implemented on the Cleveland Heart Disease dataset. The accuracy varies across classification methods; the maximum accuracy, almost 94 percent, was obtained with the KNN algorithm combined with the correlation factor. The authors should extend the presented technique to leverage more than one dataset and forecast different diseases.

The authors of [ 108 ] used the Cleveland dataset, which includes 303 cases and 76 attributes, to test four different classification strategies: NB, SVM, DT, and KNN. Only 14 of the 76 attributes were put through the testing process. The authors performed data preprocessing to remove noisy data. KNN obtained the greatest accuracy, at 90.79 percent. More sophisticated models are needed to improve the accuracy of early heart disease prediction.

The authors of [ 109 ] proposed a model to predict heart disease using a cardiovascular dataset classified through the application of supervised machine learning algorithms (DT, NB, logistic regression, RF, SVM, and KNN). The results reveal that the DT classification model predicted cardiovascular disorders better than the other algorithms, with an accuracy of 73 percent. The authors highlighted that ensemble ML techniques employing the CVD dataset could generate a better illness prediction model.

In [ 110 ], the authors attempted to increase the accuracy of heart disease prediction by applying logistic regression to a healthcare dataset to determine whether patients have heart illness. The dataset was acquired from an ongoing cardiovascular study of residents of Framingham, Massachusetts. The model reached a prediction accuracy of 87 percent. The authors acknowledge that the model could be improved with more data and the use of more ML models.

Because breast cancer affects one in every 28 women in India, the author of [ 111 ] presented an accurate classification technique to examine a breast cancer dataset containing 569 rows and 32 columns, and similarly employed a heart disease dataset and a lung cancer dataset. The research offered a novel approach to feature selection based on genetic algorithms combined with SVM classification. The classifier accuracies were 81.82 percent for lung cancer and 78.93 percent for diabetes. Note that the size, kind, and source of the data used are not indicated.

In [ 112 ], the authors predicted the risk factors that cause heart disease using the k-means clustering algorithm, analyzed with a visualization tool, on a Cleveland heart disease dataset containing 76 features of 303 patients; 209 records with 8 attributes were used, such as age, chest pain type, blood pressure, blood glucose level, resting ECG, heart rate, and four types of chest pain. The authors forecast cardiac disease by considering only the primary characteristics of the four types of chest discomfort, using k-means clustering, a common unsupervised ML technique.

The aim of [ 113 ] was to report the advantages of a variety of data mining (DM) methods and validated heart disease survival prediction models. From their observations, the authors propose that logistic regression and NB achieve the highest accuracy when applied to a high-dimensional dataset (the Cleveland hospital dataset), while DT and RF produce better results on low-dimensional datasets. RF delivers more accuracy than the DT classifier because it is an optimized learning algorithm. The authors mention that this work could be extended to other ML algorithms and that the model could be developed in a distributed environment such as Map-Reduce, Apache Mahout, and HBase.

In [ 114 ], the authors proposed a single algorithm, named hybridization, that combines the used techniques into one algorithm to predict heart disease. The presented method has three phases: a preprocessing phase, a classification phase, and a diagnosis phase. They employed the Cleveland database and the NB, SVM, KNN, NN, J4.8, RF, and GA algorithms. NB and SVM always perform better than the others, whereas the rest depend on the specified features; the results attained an accuracy of 89.2 percent. Note that the dataset is small; hence, the system could not be trained adequately, which limited the accuracy of the method.

Using six algorithms (logistic regression, KNN, DT, SVM, NB, and RF), the authors of [ 115 ] explored different data representations to better understand how to use clinical data for predicting liver disease. The original dataset, taken from the northeast of Andhra Pradesh, India, includes data on 583 liver patients, of whom 75.64 percent are male and 24.36 percent female. The analysis indicated that the logistic regression classifier delivers the highest accuracy of 75 percent, based on the F1 measure, in forecasting liver illness, while NB gives the lowest accuracy of 53 percent. The authors studied only a few prominent supervised ML algorithms; more algorithms could be used to create a more exact liver disease prediction model and steadily improve performance.

In [ 116 ], the authors aimed to predict coronary heart disease (CHD) based on historical medical data using ML technology. The goal of this study was to use three supervised learning approaches, NB, SVM, and DT, to find correlations in CHD data that could improve prediction rates. The dataset, obtained from the KEEL repository, contains a retrospective sample of males from a high-risk heart disease region in the Western Cape of South Africa. NB achieved the highest accuracy among the three models, while SVM and DT J48 outperformed NB with a specificity rate of 82 percent but showed an inadequate sensitivity rate of less than 50 percent.

With the help of DM and network analysis methods, the authors of [ 117 ] created a chronic disease risk prediction framework, developed and evaluated in the Australian healthcare system, to predict type 2 diabetes risk. They used a private healthcare funds dataset from Australia that spans six years and three different predictive algorithms (regression, parameter optimization, and DT). The prediction accuracy ranges from 82 to 87 percent. The dataset's source is hospital admission and discharge summaries; as a result, it does not provide information about general physician visits or future diagnoses.

DL-based healthcare prediction

With the help of DL algorithms such as CNN for automatic feature extraction and illness prediction, and KNN for distance calculation to locate the exact match in the dataset and produce the final sickness prediction, the authors of [ 118 ] proposed a system for predicting patients with the more common chronic diseases. The dataset structure combines disease symptoms, a person's living habits, and doctor consultations, which is acceptable for this general disease prediction. The Indian chronic kidney disease dataset used in this study comprises 400 instances, 24 attributes, and 2 classes, retrieved from the UCI ML repository. Finally, a comparative study of the proposed system with other algorithms, such as NB, DT, and logistic regression, was demonstrated: the proposed system gives an accuracy of 95 percent, higher than the other methods. The proposed technique should be applied using more than one dataset.

In [ 119 ], the authors developed a DL approach that uses chest radiography images to differentiate between patients with mild, pneumonia, and COVID-19 infections, providing a valid mechanism for COVID-19 diagnosis. Image-enhancement techniques were used in the proposed system to increase the intensity of the chest X-ray images and eliminate noise. Two distinct DL approaches based on a pretrained neural network model (ResNet-50) for COVID-19 identification using chest X-ray (CXR) pictures are proposed to minimize overfitting and increase the overall capabilities of the suggested DL systems. The authors emphasized that tests using a vast and challenging dataset encompassing many COVID-19 cases are necessary to establish the efficacy of the suggested system.

Diabetes disease prediction was the topic of [ 120 ], in which the authors presented a cuckoo search-based deep LSTM classifier for prediction. The deep convLSTM classifier is used with cuckoo search optimization, a nature-inspired method, to accurately predict disease by transferring information and thereby reducing time consumption. The PIMA dataset is used to predict the onset of diabetes; the data were provided by the National Institute of Diabetes and Digestive and Kidney Diseases. The dataset is made up of independent variables, including insulin level, age, and BMI index, as well as one dependent variable. The new technique was compared with traditional methods, and the results showed that the proposed method achieved 97.591 percent accuracy, 95.874 percent sensitivity, and 97.094 percent specificity. The authors noted that more datasets are needed, as well as new approaches to improve the classifier's effectiveness.

In [ 121 ], the authors presented a wavelet-based convolutional neural network to handle data limitations during the rapid emergence of COVID-19. By investigating the influence of discrete wavelet transform decomposition up to 4 levels, the model demonstrated the capability of multi-resolution analysis for detecting COVID-19 in chest X-rays. The wavelet sub-bands are the CNN's inputs at each decomposition level. COVID-19 chest X-ray-12 (COVID-CXR-12) is a collection of 1,944 chest X-ray pictures divided into 12 groups, compiled from two open-source datasets (a National Institutes of Health dataset containing X-rays of pneumonia-related diseases, and a COVID-19 dataset collected from the Radiological Society of North America). The suggested model, COVID-Neuro wavelet, was trained alongside other well-known ImageNet pre-trained models on COVID-CXR-12. The authors state that they hope to investigate the effects of wavelet functions other than the Haar wavelet.

A CNN framework for COVID-19 identification that makes use of computed tomography images was developed by the authors of [ 122 ]. The proposed framework employs a public CT dataset of 2482 CT images from patients of both classes. The system attained an accuracy of 96.16 percent and a recall of 95.41 percent after training with only 20 percent of the dataset. The authors stated that the use of the framework should be extended to multimodal medical images in the future.

Using an LSTM network enhanced with two processes to perform multi-label classification based on patients' clinical visit records, the authors of [ 123 ] performed multi-disease prediction for intelligent clinical decision support. A massive dataset of electronic health records was collected from a prominent hospital in southeast China. According to the model evaluation results, the suggested LSTM approach outperforms several standard and DL models in predicting future disease diagnoses: the F1 score rises from 78.9 percent and 86.4 percent with the state-of-the-art conventional and DL models, respectively, to 88.0 percent with the suggested technique. The authors stated that the model's prediction performance could be enhanced further by including new input variables, and that to reduce computational complexity, the method uses only one data source.

In [ 124 ], the authors introduced an approach for creating a supervised ANN structure based on subnets (groups of neurons) instead of layers, which effectively predicted disease in the case of small datasets. The model was evaluated on textual data and compared with multilayer perceptron (MLP) and LSTM recurrent neural network models using three small-scale, publicly accessible benchmark datasets. On the Iris dataset, the experimental findings for classification reached 97 percent accuracy, compared to 92 percent for a three-layer RNN (LSTM); on the diabetes dataset, the model had a lower error rate (81) than the RNN (LSTM), which had a high error rate of 84, and the MLP. However, this method is not suitable for larger datasets and has not been validated on large textual and image datasets.

The authors of [ 125 ] presented a novel AI- and Internet of Things (IoT)-convergence-based disease detection model for a smart healthcare system. Data collection, preprocessing, categorization, and parameter optimization are the stages of the proposed model. IoT devices such as wearables and sensors collect the data, which AI algorithms then use to diagnose diseases. The forest technique is then used to remove any outliers found in the patient data. Healthcare data were used to assess the performance of the CSO-LSTM model. During the study, the CSO-LSTM model achieved a maximum accuracy of 96.16 percent for heart disease diagnosis and 97.26 percent for diabetes diagnosis. This method offered greater prediction accuracy for heart disease and diabetes diagnosis, but there was no feature selection mechanism; hence, it requires extensive computation.

The global health crisis posed by coronaviruses was the subject of [ 126 ]. The authors aimed to detect disease in people whose X-rays had been selected as potential COVID-19 candidates. Chest X-rays of people with COVID-19, viral pneumonia, and healthy people are included in the dataset. The study compared the performance of two DL approaches, CNN and RNN, and a total of 657 chest X-ray images were evaluated using DL techniques for the diagnosis of COVID-19. VGG19 was the most successful model, with a 95 percent accuracy rate; it successfully categorizes COVID-19 patients, healthy individuals, and viral pneumonia cases. InceptionV3 was the least successful approach on this dataset. According to the authors, the success percentage can be improved by better data collection; in addition to chest radiography, lung tomography can be used, and creating numerous DL models could enhance the success ratio and performance.

In [ 127 ], the authors developed a method based on the RNN algorithm for predicting blood glucose levels for diabetics up to one hour in the future, which requires the patient's glucose level history. The Ohio T1DM dataset for blood glucose level prediction, which includes blood glucose values for six people with type 1 diabetes, was used to train and assess the approach. The distribution features were further refined with studies that revealed the certainty-estimate nature of the procedure. The authors point out that they can only evaluate prediction targets with enough glucose level history; thus, they cannot anticipate the initial levels after a gap, which degrades the quality of the prediction.

To build a new deep anomaly detection model for fast, reliable screening, the authors of [ 128 ] used an 18-layer residual CNN pre-trained on ImageNet with a different anomaly detection mechanism for the classification of COVID-19. On the X-ray dataset, which contains 100 images from 70 COVID-19 patients and 1431 images from 1008 non-COVID-19 pneumonia subjects, the model obtains a sensitivity of 90.00 percent with a specificity of 87.84 percent, or a sensitivity of 96.00 percent with a specificity of 70.65 percent. The authors noted that the model still has certain flaws, such as missing 4 percent of COVID-19 cases and having a 30 percent false positive rate. In addition, more clinical data are required to confirm and improve the model's usefulness.

In [ 129 ], the authors developed COVIDX-Net, a novel DL framework that allows radiologists to diagnose COVID-19 in X-ray images automatically. Seven algorithms (MobileNetV2, ResNetV2, VGG19, DenseNet201, InceptionV3, Inception, and Xception) were evaluated using a small dataset of 50 images. Each deep neural network model can classify the patient's status as a negative or positive COVID-19 case based on the normalized intensities of the X-ray image. The F1 scores for the VGG19 and dense convolutional network (DenseNet) models were 0.89 and 0.91, respectively, while the InceptionV3 model had the weakest classification performance, with an F1 score of 0.67.

The authors of [ 130 ] designed a DL approach for delivering 30-min predictions of future glucose levels based on a dilated RNN (DRNN). The performance of the DRNN models was evaluated using data from two electronic health record datasets: OhioT1DM from clinical trials and an in silico dataset from the UVA-Padova simulator. It outperformed established glucose prediction approaches such as neural networks (NNs), support vector regression (SVR), and autoregressive (ARX) models. The results demonstrated significantly improved glucose prediction performance, although some limits remain: the authors created a data-driven model that relies heavily on past EHRs, and the quality of the data has a significant impact on the accuracy of the prediction. Clinical datasets are limited in number and often restricted, and because certain data fields are entered manually, they are occasionally incorrect.

In [ 131 ], the authors utilized a deep neural network (DNN) to predict stroke mortality from the medical histories and health behaviors of 15,099 stroke patients recorded in large-scale electronic health data. The data were collected by the Korea Centers for Disease Control and Prevention from 2013 to 2016, covering roughly 150 hospitals in the country, all having more than 100 beds. The DL model used 11 variables: gender, age, type of insurance, mode of admission, necessary brain surgery, region, length of hospital stay, hospital location, number of hospital beds, stroke type, and the Charlson comorbidity index (CCI). To automatically create features from the data and identify risk factors for stroke, the researchers combined a DNN with scaled principal component analysis (PCA): the DNN examines the variables of interest, while scaled PCA is used to improve the DNN's continuous inputs. The data were divided into a training set (66%) and a testing set (34%), with 30% of the training samples used for validation. The sensitivity, specificity, and AUC values were 64.32%, 85.56%, and 83.48%, respectively.
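
The scaled-PCA-plus-DNN pipeline described here can be sketched as follows. The placeholder data, the number of principal components, and the network widths are assumptions; the 66/34 split and the 30% validation share follow the text.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder for the 15,099 x 11 matrix of admission variables and the
# in-hospital death labels used in [ 131 ].
rng = np.random.default_rng(0)
X = rng.normal(size=(15099, 11))
y = rng.integers(0, 2, size=15099)

# 66/34 train/test split, as described in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.34, stratify=y, random_state=0)

# "Scaled PCA": standardize, then project (the study applies this to the
# continuous inputs; here it is applied to the whole placeholder matrix).
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=8).fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))

model = models.Sequential([
    layers.Input(shape=(Z_train.shape[1],)),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of stroke death
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
# 30% of the training samples held out for validation, as in the study.
model.fit(Z_train, y_train, validation_split=0.3, epochs=10, batch_size=128)
print(model.evaluate(Z_test, y_test))
```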

The authors of [ 132 ] proposed GluNet, an approach to glucose forecasting. This method uses a personalized DNN to forecast the probabilistic distribution of short-term glucose measurements for people with type 1 diabetes based on their historical data, including insulin doses, meal information, glucose measurements, and a variety of other factors. It employs recent DL techniques in four components: data preprocessing, label transform/recovery, a dilated CNN, and post-processing. The authors ran the models on the subjects of the OhioT1DM dataset. In a comprehensive comparison on the virtual adult participants, the outcomes revealed significant improvements over previous procedures in root mean square error (RMSE) and time lag at the 30- and 60-min prediction horizons (PH). If the PH is properly matched to the lag between input and output, the user can learn to control the system more effectively and the model achieves good performance. GluNet was additionally validated on two clinical datasets, where it reported RMSE and time-lag results at the 30- and 60-min PH. The authors point out that the model does not incorporate physiological knowledge, and that they still need to test GluNet with larger prediction horizons and use it to predict overnight hypoglycemia.
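
GluNet's dilated CNN component can be approximated with stacked causal convolutions whose dilation doubles at each layer, giving a long receptive field over the input history without recurrence. The history length, feature set, and channel counts below are assumptions, and the probabilistic output head is simplified to a point forecast.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

HISTORY, FEATURES = 48, 3   # 4 h of 5-min samples: glucose, insulin, carbs

inputs = layers.Input(shape=(HISTORY, FEATURES))
x = inputs
# Doubling dilation rates: each layer sees twice as far into the past.
for dilation in (1, 2, 4, 8):
    x = layers.Conv1D(filters=32, kernel_size=2, padding="causal",
                      dilation_rate=dilation, activation="relu")(x)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1)(x)     # glucose 30 min ahead (point forecast)

glunet_like = models.Model(inputs, outputs)
glunet_like.compile(optimizer="adam", loss="mse")
glunet_like.summary()
```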

The authors of [ 133 ] proposed VMD-IPSO-LSTM, a short-term blood glucose prediction model. First, the blood glucose signal was decomposed with the variational modal decomposition (VMD) technique to obtain intrinsic modal functions (IMF) in different frequency bands. Long short-term memory (LSTM) networks then formed a prediction mechanism for each IMF of the blood glucose signal. Because the time window length, learning rate, and neuron count are difficult to set, an improved particle swarm optimization (PSO) approach was used to optimize these parameters. The improved LSTM network predicted each IMF, and in the final step the predicted subsequences were superimposed to arrive at the ultimate prediction result. Data from 56 of 451 diabetes mellitus patients were chosen as experimental data. The experiments revealed improved prediction accuracy at 30, 45, and 60 min; the RMSE and MAPE were lower than those of VMD-PSO-LSTM, VMD-LSTM, and plain LSTM, indicating that the suggested model is effective. The longer prediction horizon and the higher accuracy of the predictions give patients and doctors more time to improve the effectiveness of diabetes therapy and manage blood glucose levels. The authors noted remaining challenges, such as increased computation volume and operation time, and plan to reduce the time needed for short-term glucose estimation.
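
The decompose-predict-superimpose structure can be sketched as one LSTM per IMF whose forecasts are summed. The placeholder IMFs below stand in for a real VMD of the glucose signal, and the PSO hyperparameter search is omitted.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

WINDOW = 12  # past samples per component forecast (an assumption)

def windows(series: np.ndarray):
    """Slice one IMF into (window, next-value) supervised pairs."""
    X = np.array([series[t:t + WINDOW] for t in range(len(series) - WINDOW)])
    return X[..., None], series[WINDOW:]

def fit_component_lstm(imf: np.ndarray) -> tf.keras.Model:
    X, y = windows(imf)
    m = models.Sequential([layers.Input(shape=(WINDOW, 1)),
                           layers.LSTM(32),
                           layers.Dense(1)])
    m.compile(optimizer="adam", loss="mse")
    m.fit(X, y, epochs=10, verbose=0)
    return m

# Placeholder IMFs standing in for a real VMD of the glucose signal.
t = np.linspace(0, 20, 1000)
imfs = [np.sin(2 * np.pi * f * t) for f in (0.1, 0.5, 2.0)]

component_models = [fit_component_lstm(imf) for imf in imfs]

def predict_next(histories):
    """One length-WINDOW window per IMF; the forecast is the superposition."""
    parts = [m.predict(h[None, :, None], verbose=0)[0, 0]
             for m, h in zip(component_models, histories)]
    return float(np.sum(parts))

print(predict_next([imf[-WINDOW:] for imf in imfs]))
```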

To speed up diagnosis and cut down on mistakes, the authors of [ 134 ] proposed a new paradigm for primary COVID-19 detection based on a radiological review of chest radiographs (chest X-rays). The authors used a dataset of chest X-rays from verified COVID-19 patients (408 images), confirmed pneumonia patients (4273 images), and healthy people (1590 images), 6271 people in total, to perform three-class image classification. To fulfill this task, they used a CNN with transfer learning. Across all folds of the data, the model's accuracy ranged from 93.90% to 98.37%; even the lowest accuracy, 93.90%, is still quite good. The authors acknowledge a restriction, however, particularly when it comes to adopting such a model on a large scale for practical use.
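
The per-fold accuracy reporting can be reproduced with a stratified k-fold loop. As a lightweight stand-in for fine-tuning a CNN in every fold, this sketch scores a logistic regression on placeholder CNN embeddings, a deliberate simplification of the paper's transfer-learning setup; the fold count and feature dimensionality are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# Placeholders standing in for CNN embeddings of the 6271 images and their
# three-class labels (0 = normal, 1 = pneumonia, 2 = COVID-19).
rng = np.random.default_rng(0)
feats = rng.normal(size=(6271, 512))
labels = rng.integers(0, 3, size=6271)

accs = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for tr, te in skf.split(feats, labels):
    clf = LogisticRegression(max_iter=2000).fit(feats[tr], labels[tr])
    accs.append(accuracy_score(labels[te], clf.predict(feats[te])))

print(f"per-fold accuracy range: {min(accs):.4f} to {max(accs):.4f}")
```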

In [ 135 ], the authors proposed DL models for predicting the number of COVID-19-positive cases in Indian states. The Ministry of Health and Family Welfare dataset contains time series of confirmed COVID-19 cases for each of the 32 regions, 28 states and 4 union territories, since March 14, 2020. This dataset was used to conduct an exploratory analysis of the growth in the number of positive cases in India. RNN-based LSTMs were used as prediction models: deep LSTM, convolutional LSTM, and bidirectional LSTM variants were tested on the 32 states/union territories, and the model with the best accuracy was chosen based on absolute error. Bidirectional LSTM produced the best performance in terms of prediction errors, while convolutional LSTM produced the worst. Daily and weekly forecasts were calculated for all states, and the bi-LSTM produced accurate results (error below 3%) for short-term prediction (1-3 days).
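
A minimal bi-LSTM forecaster of this kind might look as follows. The synthetic case series, window length, and training schedule are assumptions; the study additionally evaluates deep and convolutional LSTM variants and repeats the procedure per state.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

WINDOW = 14  # two weeks of daily counts as input (an assumption)

def supervised(series: np.ndarray):
    X = np.array([series[t:t + WINDOW] for t in range(len(series) - WINDOW)])
    return X[..., None], series[WINDOW:]

# Placeholder standing in for one state's daily confirmed-case series.
days = np.arange(300)
daily_cases = np.exp(days / 80.0) * (1 + 0.1 * np.sin(days / 7.0))

X, y = supervised(daily_cases)

model = models.Sequential([
    layers.Input(shape=(WINDOW, 1)),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(1),                  # next-day confirmed cases
])
model.compile(optimizer="adam", loss="mae")  # absolute error, as in [ 135 ]
model.fit(X, y, epochs=20, verbose=0)

# Short-term (1-day-ahead) forecast from the latest window.
next_day = model.predict(daily_cases[-WINDOW:][None, :, None], verbose=0)
print(float(next_day))
```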

With the goal of increasing the reliability and precision of type 1 diabetes predictions, the authors of [ 136 ] proposed a new method based on CNNs and DL. The key step was extracting the patient's behavioral pattern; numerous observations of identical behaviors were used to fill gaps in the data. The proposed model was trained and verified using data from 759 people with type 1 diabetes who visited Sheffield Teaching Hospitals between 2013 and 2015. Each item in the training set comprised a subject's type 1 diabetes test, demographic data (age, gender, years with diabetes), and the final 84 days (12 weeks) of self-monitored blood glucose (SMBG) measurements preceding the test. The authors report that prediction accuracy deteriorates in the presence of insufficient data and certain physiological specificities.

The authors of [ 137 ] constructed a framework using the Pima Indian Diabetes Database (PIDD). PIDD participants are all female and at least 21 years old; the database comprises 768 instances, 268 samples diagnosed as diabetic and 500 not diagnosed as diabetic. The eight characteristics most predictive of diabetes were used as input features. The accuracy of functional classifiers such as ANN, NB, DT, and DL lies between 90 and 98 percent; among the four, DL achieved the best result for diabetes onset on the PIMA dataset, with an accuracy rate of 98.07 percent. The technique uses a variety of classifiers to predict the disease accurately, but it failed to diagnose it at an early stage.
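
A comparison of this kind is straightforward to sketch with scikit-learn. The CSV path and the held-out split are assumptions about the usual public 768-by-8 version of the dataset, and an MLP stands in for the DL classifier.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("pima_diabetes.csv")     # 768 rows, 8 features + Outcome
X, y = df.drop(columns="Outcome"), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

classifiers = {
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(max_depth=5, random_state=0),
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32, 16),
                                       max_iter=1000, random_state=0)),
}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    print(name, round(accuracy_score(y_te, clf.predict(X_te)), 4))
```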

To summarize all previous works discussed in this section, we categorize them by disease, along with the techniques used to predict each disease, the datasets used, and the main findings, as shown in Table 5.

Results and discussion

This study conducted a systematic review to examine the latest developments in ML and DL for healthcare prediction, focusing on healthcare forecasting and on how ML and DL can provide relevant and robust predictions. A total of 41 papers were reviewed, 21 on ML and 20 on DL, as depicted in Fig. 17.

In this study, the reviewed papers were classified by the disease predicted; as a result, five diseases were discussed: diabetes, COVID-19, heart disease, liver disease, and chronic kidney disease. Table 6 shows the number of reviewed papers for each disease in addition to the prediction techniques adopted for each.

Table 6 provides a comprehensive summary of the various ML and DL models used for disease prediction. It indicates the number of studies conducted on each disease, the techniques employed, and the highest level of accuracy attained. As shown in Table 6, the optimal diagnostic accuracy varies by disease. For diabetes, a DL model achieved a 98.07% accuracy rate. For COVID-19, a logistic regression model reached 98.5% accuracy. The CSO-LSTM model achieved an accuracy of 96.16% for heart disease. For liver disease, the accuracy of the logistic regression model was 75%. For predicting multiple diseases, the accuracy of the logistic regression model was 98.5%. It is essential to note that these are merely the best accuracies among the studies included in this survey, and the size and quality of the datasets used to train and validate the models must also be considered: models trained on larger and more diverse datasets are more likely to generalize well to new data. Overall, the results in Table 6 indicate that ML and DL models can predict disease accurately; when selecting a model for a specific disease, the available models and techniques should be weighed carefully.

Although ML and DL have made incredible strides in recent years, they still have a long way to go before they can effectively solve the fundamental problems plaguing healthcare systems. Some of the challenges associated with implementing ML and DL approaches in healthcare prediction are discussed here.

Streaming biomedical data are the primary challenge to be handled: significant amounts of new medical data are generated rapidly, and the healthcare industry as a whole is evolving quickly. Examples of such real-time biological signals include measurements of blood pressure, oxygen saturation, and glucose levels. While some variants of DL architecture have attempted to address this problem, many challenges remain before rapidly evolving, massive streams of data can be analyzed effectively; these include memory consumption, feature selection, missing data, and computational complexity. Another challenge for ML and DL is tackling the complexity of the healthcare domain itself.

Healthcare and biomedical research present more intricate challenges than many other fields. Much remains unknown about the origins, transmission, and cures of many highly diverse diseases, and it is hard to collect sufficient data because there are not always enough patients. This issue can be mitigated, however: the small number of patients necessitates exhaustive patient profiling, innovative data processing, and the incorporation of additional datasets. Researchers can process each dataset independently using an appropriate DL technique and then combine the results in a unified model that consolidates the patient data.

Conclusion

The use of ML and DL techniques for healthcare prediction has the potential to change the way traditional healthcare services are delivered. In ML and DL applications, healthcare data are deemed the most significant component of medical care systems. This paper presented a comprehensive review of the most significant ML and DL techniques employed in healthcare predictive analytics and discussed the obstacles and challenges of applying these techniques in the healthcare domain. As a result of this survey, a total of 41 papers covering the period from 2019 to 2022 were selected and thoroughly reviewed, and the methodology of each paper was discussed in detail. The reviewed studies showed that AI techniques (ML and DL) play a significant role in accurately diagnosing diseases and in helping to anticipate and analyze healthcare data by linking hundreds of clinical records and rebuilding a patient's history from these data. This work advances research in the field of healthcare predictive analytics using ML and DL approaches and contributes to the literature and future studies by serving as a resource for other academics and researchers.

Availability of data and materials

Not applicable.

Abbreviations

AI: Artificial Intelligence
ML: Machine Learning
DT: Decision Tree
EHR: Electronic Health Records
RF: Random Forest
SVM: Support Vector Machine
KNN: K-Nearest Neighbor
NB: Naive Bayes
RL: Reinforcement Learning
NLP: Natural Language Processing
MCTS: Monte Carlo Tree Search
POMDP: Partially Observable Markov Decision Processes
DL: Deep Learning
DBN: Deep Belief Network
ANN: Artificial Neural Networks
CNN: Convolutional Neural Networks
LSTM: Long Short-Term Memory
RCNN: Recurrent Convolution Neural Networks
RNN: Recurrent Neural Networks
RCL: Recurrent Convolutional Layer
RD: Receptive Domains
RMLP: Recurrent Multilayer Perceptron
PIDD: Pima Indian Diabetes Database
CHD: Coronary Heart Disease
CXR: Chest X-Ray
MLP: Multilayer Perceptrons
IoT: Internet of Things
DRNN: Dilated RNN
NN: Neural Networks
SVR: Support Vector Regression
PCA: Principal Component Analysis
DNN: Deep Neural Network
PH: Prediction Horizons
RMSE: Root Mean Square Error
IMF: Intrinsic Modal Functions
VMD: Variational Modal Decomposition
SMBG: Self-Monitored Blood Glucose

References

Latha MH, Ramakrishna A, Reddy BSC, Venkateswarlu C, Saraswathi SY (2022) Disease prediction by stacking algorithms over big data from healthcare communities. Intell Manuf Energy Sustain: Proc ICIMES 2021(265):355

Van Calster B, Wynants L, Timmerman D, Steyerberg EW, Collins GS (2019) Predictive analytics in health care: how can we know it works? J Am Med Inform Assoc 26(12):1651–1654

Sahoo PK, Mohapatra SK, Wu SL (2018) SLA based healthcare big data analysis and computing in cloud network. J Parallel Distrib Comput 119:121–135

Thanigaivasan V, Narayanan SJ, Iyengar SN, Ch N (2018) Analysis of parallel SVM based classification technique on healthcare using big data management in cloud storage. Recent Patents Comput Sci 11(3):169–178

Elmahdy HN (2014) Medical diagnosis enhancements through artificial intelligence

Xiong X, Cao X, Luo L (2021) The ecology of medical care in Shanghai. BMC Health Serv Res 21:1–9

Donev D, Kovacic L, Laaser U (2013) The role and organization of health care systems. Health: systems, lifestyles, policies, 2nd edn. Jacobs Verlag, Lage, pp 3–144

Murphy G F, Hanken M A, & Waters K A (1999) Electronic health records: changing the vision

Qayyum A, Qadir J, Bilal M, Al-Fuqaha A (2020) Secure and robust machine learning for healthcare: a survey. IEEE Rev Biomed Eng 14:156–180

El Seddawy AB, Moawad R, Hana MA (2018) Applying data mining techniques in CRM

Wang Y, Kung L, Wang WYC, Cegielski CG (2018) An integrated big data analytics-enabled transformation model: application to health care. Inform Manag 55(1):64–79

Mirbabaie M, Stieglitz S, Frick NR (2021) Artificial intelligence in disease diagnostics: a critical review and classification on the current state of research guiding future direction. Heal Technol 11(4):693–731

Tang R, De Donato L, Besinović N, Flammini F, Goverde RM, Lin Z, Wang Z (2022) A literature review of artificial intelligence applications in railway systems. Transp Res Part C: Emerg Technol 140:103679

Singh G, Al’Aref SJ, Van Assen M, Kim TS, van Rosendael A, Kolli KK, Dwivedi A, Maliakal G, Pandey M, Wang J, Do V (2018) Machine learning in cardiac CT: basic concepts and contemporary data. J Cardiovasc Comput Tomograph 12(3):192–201

Kim KJ, Tagkopoulos I (2019) Application of machine learning in rheumatic disease research. Korean J Intern Med 34(4):708

Liu B (2011) Web data mining: exploring hyperlinks, contents, and usage data. Spriger, Berlin

Haykin S, Lippmann R (1994) Neural networks, a comprehensive foundation. Int J Neural Syst 5(4):363–364

Gupta M, Pandya SD (2022) A comparative study on supervised machine learning algorithm. Int J Res Appl Sci Eng Technol (IJRASET) 10(1):1023–1028

Ray S (2019) A quick review of machine learning algorithms. In: 2019 international conference on machine learning, big data, cloud and parallel computing (COMITCon) (pp 35–39). IEEE

Srivastava A, Saini S, & Gupta D (2019) Comparison of various machine learning techniques and its uses in different fields. In: 2019 3rd international conference on electronics, communication and aerospace technology (ICECA) (pp 81–86). IEEE

Park HA (2013) An introduction to logistic regression: from basic concepts to interpretation with particular attention to nursing domain. J Korean Acad Nurs 43(2):154–164

Obulesu O, Mahendra M, & Thrilok Reddy M (2018) Machine learning techniques and tools: a survey. In: 2018 international conference on inventive research in computing applications (ICIRCA) (pp 605–611). IEEE

Dhall D, Kaur R, & Juneja M (2020) Machine learning: a review of the algorithms and its applications. Proceedings of ICRIC 2019: recent innovations in computing 47–63

Yang F J (2019) An extended idea about Decision Trees. In: 2019 international conference on computational science and computational intelligence (CSCI) (pp 349–354). IEEE

Eesa AS, Orman Z, Brifcani AMA (2015) A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems. Expert Syst Appl 42(5):2670–2679

Shamim A, Hussain H, & Shaikh M U (2010) A framework for generation of rules from Decision Tree and decision table. In: 2010 international conference on information and emerging technologies (pp 1–6). IEEE

Eesa AS, Abdulazeez AM, Orman Z (2017) A dids based on the combination of cuttlefish algorithm and Decision Tree. Sci J Univ Zakho 5(4):313–318

Bakyarani ES, Srimathi H, Bagavandas M (2019) A survey of machine learning algorithms in health care. Int J Sci Technol Res 8(11):223

Resende PAA, Drummond AC (2018) A survey of random forest based methods for intrusion detection systems. ACM Comput Surv (CSUR) 51(3):1–36

Breiman L (2001) Random forests. Mach learn 45:5–32

Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844

Hofmann M, & Klinkenberg R (2016) RapidMiner: data mining use cases and business analytics applications. CRC Press

Chow CKCN, Liu C (1968) Approximating discrete probability distributions with dependence trees. IEEE Trans Inf Theory 14(3):462–467

Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 2(2):121–167

Han J, Pei J, Kamber M (1999) Data mining: concepts and techniques. 2011

Cortes C, Vapnik V (1995) Support-vector networks. Mach learn 20:273–297

Aldahiri A, Alrashed B, Hussain W (2021) Trends in using IoT with machine learning in health prediction system. Forecasting 3(1):181–206

Sarker IH (2021) Machine learning: Algorithms, real-world applications and research directions. SN Comput Sci 2(3):160

Ting K M, & Zheng Z (1999) Improving the performance of boosting for naive Bayesian classification. In: Methodologies for knowledge discovery and data mining: third Pacific-Asia conference, PAKDD-99 Beijing, China, Apr 26–28, 1999 proceedings 3 (pp 296–305). Springer Berlin Heidelberg

Oladipo ID, AbdulRaheem M, Awotunde JB, Bhoi AK, Adeniyi EA, Abiodun MK (2022) Machine learning and deep learning algorithms for smart cities: a start-of-the-art review. In: IoT and IoE driven smart cities, pp 143–162

Shailaja K, Seetharamulu B, & Jabbar M A Machine learning in healthcare: a review. In: 2018 second international conference on electronics, communication and aerospace technology (ICECA) 2018 Mar 29 (pp 910–914)

Mahesh B (2020) Machine learning algorithms-a review. Int J Sci Res (IJSR) 9:381–386

Greene D, Cunningham P, & Mayer R (2008) Unsupervised learning and clustering. Mach learn Techn Multimed: Case Stud Organ Retriev 51–90

Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Inc, USA

Kodinariya TM, Makwana PR (2013) Review on determining number of cluster in K-means clustering. Int J 1(6):90–95

Smith LI (2002) A tutorial on principal components analysis

Mishra SP, Sarkar U, Taraphder S, Datta S, Swain D, Saikhom R, Laishram M (2017) Multivariate statistical data analysis-principal component analysis (PCA). Int J Livestock Res 7(5):60–78

Kamani M, Farzin Haddadpour M, Forsati R, and Mahdavi M (2019) "Efficient Fair Principal Component Analysis." arXiv e-prints: arXiv-1911.

Dey A (2016) Machine learning algorithms: a review. Int J Comput Sci Inf Technol 7(3):1174–1179

Agrawal R, Imieliński T, & Swami A (1993) Mining association rules between sets of items in large databases. In: proceedings of the 1993 ACM SIGMOD international conference on Management of data (pp 207–216)

Agrawal R, & Srikant R (1994) Fast algorithms for mining association rules. In: Proceeding of 20th international conference very large data bases, VLDB (Vol 1215, pp 487-499)

Singh J, Ram H, Sodhi DJ (2013) Improving efficiency of apriori algorithm using transaction reduction. Int J Sci Res Publ 3(1):1–4

Al-Maolegi M, & Arkok B (2014) An improved Apriori algorithm for association rules. arXiv preprint arXiv:1403.3948

Abaya SA (2012) Association rule mining based on Apriori algorithm in minimizing candidate generation. Int J Sci Eng Res 3(7):1–4

Coronato A, Naeem M, De Pietro G, Paragliola G (2020) Reinforcement learning for intelligent healthcare applications: a survey. Artif Intell Med 109:101964

Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8:279–292

Jang B, Kim M, Harerimana G, Kim JW (2019) Q-learning algorithms: a comprehensive classification and applications. IEEE access 7:133653–133667

Achille A, Soatto S (2018) Information dropout: Learning optimal representations through noisy computation. IEEE Trans Pattern Anal Mach Intell 40(12):2897–2905

Williams G, Wagener N, Goldfain B, Drews P, Rehg J M, Boots B, & Theodorou E A (2017) Information theoretic MPC for model-based reinforcement learning. In: 2017 IEEE international conference on robotics and automation (ICRA) (pp 1714–1721). IEEE

Wilkes JT, Gallistel CR (2017) Information theory, memory, prediction, and timing in associative learning. Comput Models Brain Behav 29:481–492

Ning Y, Jia J, Wu Z, Li R, An Y, Wang Y, Meng H (2017) Multi-task deep learning for user intention understanding in speech interaction systems. In: Proceedings of the AAAI conference on artificial intelligence (Vol 31, No. 1)

Shi X, Gao Z, Lausen L, Wang H, Yeung DY, Wong WK, Woo WC (2017) Deep learning for precipitation nowcasting: a benchmark and a new model. In: Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (Eds) Advances in neural information processing systems, vol 30. Curran Associates, Inc.,. https://proceedings.neurips.cc/paper_files/paper/2017/file/a6db4ed04f1621a119799fd3d7545d3d-Paper.pdf

Juang CF, Lu CM (2009) Ant colony optimization incorporated with fuzzy Q-learning for reinforcement fuzzy control. IEEE Trans Syst, Man, Cybernet-Part A: Syst Humans 39(3):597–608

Świechowski M, Godlewski K, Sawicki B, Mańdziuk J (2022) Monte Carlo tree search: a review of recent modifications and applications. Artif Intell Rev 56:1–66

Lizotte DJ, Laber EB (2016) Multi-objective Markov decision processes for data-driven decision support. J Mach Learn Res 17(1):7378–7405

Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Hassabis D (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489

Browne CB, Powley E, Whitehouse D, Lucas SM, Cowling PI, Rohlfshagen P, Colton S (2012) A survey of monte carlo tree search methods. IEEE Trans Comput Intell AI Games 4(1):1–43

Ling ZH, Kang SY, Zen H, Senior A, Schuster M, Qian XJ, Deng L (2015) Deep learning for acoustic modeling in parametric speech generation: a systematic review of existing techniques and future trends. IEEE Signal Process Magaz 32(3):35–52

Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117

Yu D, Deng L (2010) Deep learning and its applications to signal and information processing [exploratory dsp]. IEEE Signal Process Mag 28(1):145–154

Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554

Goyal P, Pandey S, Jain K, Goyal P, Pandey S, Jain K (2018) Introduction to natural language processing and deep learning. Deep Learn Nat Language Process: Creat Neural Netw Python 1–74. https://doi.org/10.1007/978-1-4842-3685-7

Mathew A, Amudha P, Sivakumari S (2021) Deep learning techniques: an overview. Adv Mach Learn Technol Appl: Proc AMLTA 2020:599–608

Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT press, USA

Gomes L (2014) Machine-learning maestro Michael Jordan on the delusions of big data and other huge engineering efforts. IEEE Spectrum 20. https://spectrum.ieee.org/machinelearning-maestro-michael-jordan-on-the-delusions-of-big-data-and-other-huge-engineering-efforts

Huang G, Liu Z, Van Der Maaten L, & Weinberger K Q (2017) Densely connected convolutional networks. In: proceedings of the IEEE conference on computer vision and pattern recognition (pp 4700–4708)

Yap MH, Pons G, Marti J, Ganau S, Sentis M, Zwiggelaar R, Marti R (2017) Automated breast ultrasound lesions detection using convolutional neural networks. IEEE J Biomed Health Inform 22(4):1218–1226

Hayashi Y (2019) The right direction needed to develop white-box deep learning in radiology, pathology, and ophthalmology: a short review. Front Robot AI 6:24

Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, Asari VK (2019) A state-of-the-art survey on deep learning theory and architectures. Electronics 8(3):292

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

Smagulova K, James AP (2019) A survey on LSTM memristive neural network architectures and applications. Eur Phys J Spec Topics 228(10):2313–2324

Setyanto A, Laksito A, Alarfaj F, Alreshoodi M, Oyong I, Hayaty M, Kurniasari L (2022) Arabic language opinion mining based on long short-term memory (LSTM). Appl Sci 12(9):4140

Lindemann B, Müller T, Vietz H, Jazdi N, Weyrich M (2021) A survey on long short-term memory networks for time series prediction. Procedia CIRP 99:650–655

Cui Z, Ke R, Pu Z, & Wang Y (2018) Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. arXiv preprint arXiv:1801.02143

Villegas R, Yang J, Zou Y, Sohn S, Lin X, & Lee H (2017) Learning to generate long-term future via hierarchical prediction. In: international conference on machine learning (pp 3560–3569). PMLR

Gensler A, Henze J, Sick B, & Raabe N (2016) Deep learning for solar power forecasting—an approach using autoencoder and LSTM neural networks. In: 2016 IEEE international conference on systems, man, and cybernetics (SMC) (pp 002858–002865). IEEE

Lindemann B, Fesenmayr F, Jazdi N, Weyrich M (2019) Anomaly detection in discrete manufacturing using self-learning approaches. Procedia CIRP 79:313–318

Kalchbrenner N, Danihelka I, & Graves A (2015) Grid long short-term memory. arXiv preprint arXiv:1507.01526

Cheng B, Xu X, Zeng Y, Ren J, Jung S (2018) Pedestrian trajectory prediction via the social-grid LSTM model. J Eng 2018(16):1468–1474

Veličković P, Karazija L, Lane N D, Bhattacharya S, Liberis E, Liò P & Vegreville M (2018) Cross-modal recurrent models for weight objective prediction from multimodal time-series data. In: proceedings of the 12th EAI international conference on pervasive computing technologies for healthcare (pp 178–186)

Wang J, Hu X (2021) Convolutional neural networks with gated recurrent connections. IEEE Trans Pattern Anal Mach Intell 44(7):3421–3435

Liang M, & Hu X (2015) Recurrent convolutional neural network for object recognition. In: proceedings of the IEEE conference on computer vision and pattern recognition (pp 3367–3375)

Liang M, Hu X, Zhang B (2015) Convolutional neural networks with intra-layer recurrent connections for scene labeling. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (Eds) Advances in Neural Information Processing Systems, vol 28. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2015/file/9cf81d8026a9018052c429cc4e56739b-Paper.pdf

Fernandez B, Parlos A G, & Tsai W K (1990) Nonlinear dynamic system identification using artificial neural networks (ANNs). In: 1990 IJCNN international joint conference on neural networks (pp 133–141). IEEE

Puskorius GV, Feldkamp LA (1994) Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks. IEEE Trans Neural Netw 5(2):279–297

Rumelhart DE (1986) Learning representations by error propagation. In: DE Rumelhart and JL McClelland & PDP Research Group, eds, Parallel distributed processing: explorations in the microstructure of cognition. Bradford Books MITPress, Cambridge, Mass

Krishnamoorthi R, Joshi S, Almarzouki H Z, Shukla P K, Rizwan A, Kalpana C, & Tiwari B (2022) A novel diabetes healthcare disease prediction framework using machine learning techniques. J Healthcare Eng. https://doi.org/10.1155/2022/1684017

Edeh MO, Khalaf OI, Tavera CA, Tayeb S, Ghouali S, Abdulsahib GM, Louni A (2022) A classification algorithm-based hybrid diabetes prediction model. Front Publ Health 10:829510

Iwendi C, Huescas C G Y, Chakraborty C, & Mohan S (2022) COVID-19 health analysis and prediction using machine learning algorithms for Mexico and Brazil patients. J Experiment Theor Artif Intell 1–21. https://doi.org/10.1080/0952813X.2022.2058097

Lu H, Uddin S, Hajati F, Moni MA, Khushi M (2022) A patient network-based machine learning model for disease prediction: the case of type 2 diabetes mellitus. Appl Intell 52(3):2411–2422

Chugh M, Johari R, & Goel A (2022) MATHS: machine learning techniques in healthcare system. In: international conference on innovative computing and communications: proceedings of ICICC 2021, Volume 3 (pp 693–702). Springer Singapore

Deberneh HM, Kim I (2021) Prediction of type 2 diabetes based on machine learning algorithm. Int J Environ Res Public Health 18(6):3317

Gupta S, Verma H K, & Bhardwaj D (2021) Classification of diabetes using Naive Bayes and support vector machine as a technique. In: operations management and systems engineering: select proceedings of CPIE 2019 (pp 365–376). Springer Singapore

Islam M T, Rafa S R, & Kibria M G (2020) Early prediction of heart disease using PCA and hybrid genetic algorithm with k-means. In: 2020 23rd international conference on computer and information technology (ICCIT) (pp 1–6). IEEE

Qawqzeh Y K, Bajahzar A S, Jemmali M, Otoom M M, Thaljaoui A (2020) Classification of diabetes using photoplethysmogram (PPG) waveform analysis: logistic regression modeling. BioMed Res Int. https://doi.org/10.1155/2020/3764653

Grampurohit S, Sagarnal C (2020) Disease prediction using machine learning algorithms. In: 2020 international conference for emerging technology (INCET) (pp 1–7). IEEE

Moturi S, Srikanth Vemuru DS (2020) Classification model for prediction of heart disease using correlation coefficient technique. Int J 9(2). https://doi.org/10.30534/ijatcse/2020/185922020

Barik S, Mohanty S, Rout D, Mohanty S, Patra A K, & Mishra A K (2020) Heart disease prediction using machine learning techniques. In: advances in electrical control and signal systems: select proceedings of AECSS 2019 (pp 879–888). Springer, Singapore

Princy R J P, Parthasarathy S, Jose P S H, Lakshminarayanan A R, & Jeganathan S (2020) Prediction of cardiac disease using supervised machine learning algorithms. In: 2020 4th international conference on intelligent computing and control systems (ICICCS) (pp 570–575). IEEE

Saw M, Saxena T, Kaithwas S, Yadav R, & Lal N (2020) Estimation of prediction for getting heart disease using logistic regression model of machine learning. In: 2020 international conference on computer communication and informatics (ICCCI) (pp 1–6). IEEE

Soni VD (2020) Chronic disease detection model using machine learning techniques. Int J Sci Technol Res 9(9):262–266

Indrakumari R, Poongodi T, Jena SR (2020) Heart disease prediction using exploratory data analysis. Procedia Comput Sci 173:130–139

Wu C S M, Badshah M, & Bhagwat V (2019) Heart disease prediction using data mining techniques. In: proceedings of the 2019 2nd international conference on data science and information technology (pp 7–11)

Tarawneh M, & Embarak O (2019) Hybrid approach for heart disease prediction using data mining techniques. In: advances in internet, data and web technologies: the 7th international conference on emerging internet, data and web technologies (EIDWT-2019) (pp 447–454). Springer International Publishing

Rahman AS, Shamrat FJM, Tasnim Z, Roy J, Hossain SA (2019) A comparative study on liver disease prediction using supervised machine learning algorithms. Int J Sci Technol Res 8(11):419–422

Gonsalves A H, Thabtah F, Mohammad R M A, & Singh G (2019) Prediction of coronary heart disease using machine learning: an experimental analysis. In: proceedings of the 2019 3rd international conference on deep learning technologies (pp 51–56)

Khan A, Uddin S, Srinivasan U (2019) Chronic disease prediction using administrative data and graph theory: the case of type 2 diabetes. Expert Syst Appl 136:230–241

Alanazi R (2022) Identification and prediction of chronic diseases using machine learning approach. J Healthcare Eng. https://doi.org/10.1155/2022/2826127

Gouda W, Almurafeh M, Humayun M, Jhanjhi NZ (2022) Detection of COVID-19 based on chest X-rays using deep learning. Healthcare 10(2):343

Kumar A, Satyanarayana Reddy S S, Mahommad G B, Khan B, & Sharma R (2022) Smart healthcare: disease prediction using the cuckoo-enabled deep classifier in IoT framework. Sci Progr. https://doi.org/10.1155/2022/2090681

Monday H N, Li J P, Nneji G U, James E C, Chikwendu I A, Ejiyi C J, & Mgbejime G T (2021) The capability of multi resolution analysis: a case study of COVID-19 diagnosis. In: 2021 4th international conference on pattern recognition and artificial intelligence (PRAI) (pp 236–242). IEEE

Al Rahhal MM, Bazi Y, Jomaa RM, Zuair M, Al Ajlan N (2021) Deep learning approach for COVID-19 detection in computed tomography images. Cmc-Comput Mater Continua 67(2):2093–2110

Men L, Ilk N, Tang X, Liu Y (2021) Multi-disease prediction using LSTM recurrent neural networks. Expert Syst Appl 177:114905

Ahmad U, Song H, Bilal A, Mahmood S, Alazab M, Jolfaei A & Saeed U (2021) A novel deep learning model to secure internet of things in healthcare. Mach Intell Big Data Anal Cybersec Appl 341–353

Mansour RF, El Amraoui A, Nouaouri I, Díaz VG, Gupta D, Kumar S (2021) Artificial intelligence and internet of things enabled disease diagnosis model for smart healthcare systems. IEEE Access 9:45137–45146

Sevi M, & Aydin İ (2020) COVID-19 detection using deep learning methods. In: 2020 international conference on data analytics for business and industry: way towards a sustainable economy (ICDABI) (pp 1–6). IEEE

Martinsson J, Schliep A, Eliasson B, Mogren O (2020) Blood glucose prediction with variance estimation using recurrent neural networks. J Healthc Inform Res 4:1–18

Zhang J, Xie Y, Pang G, Liao Z, Verjans J, Li W, Xia Y (2020) Viral pneumonia screening on chest X-rays using confidence-aware anomaly detection. IEEE Trans Med Imaging 40(3):879–890

Hemdan E E D, Shouman M A, & Karar M E (2020) Covidx-net: a framework of deep learning classifiers to diagnose covid-19 in x-ray images. arXiv preprint arXiv:2003.11055

Zhu T, Li K, Chen J, Herrero P, Georgiou P (2020) Dilated recurrent neural networks for glucose forecasting in type 1 diabetes. J Healthc Inform Res 4:308–324

Cheon S, Kim J, Lim J (2019) The use of deep learning to predict stroke patient mortality. Int J Environ Res Public Health 16(11):1876

Li K, Liu C, Zhu T, Herrero P, Georgiou P (2019) GluNet: a deep learning framework for accurate glucose forecasting. IEEE J Biomed Health Inform 24(2):414–423

Wang W, Tong M, Yu M (2020) Blood glucose prediction with VMD and LSTM optimized by improved particle swarm optimization. IEEE Access 8:217908–217916

Rashid N, Hossain M A F, Ali M, Sukanya M I, Mahmud T, & Fattah S A (2020) Transfer learning based method for COVID-19 detection from chest X-ray images. In: 2020 IEEE region 10 conference (TENCON) (pp 585–590). IEEE

Arora P, Kumar H, Panigrahi BK (2020) Prediction and analysis of COVID-19 positive cases using deep learning models: a descriptive case study of India. Chaos, Solitons Fractals 139:110017

Zaitcev A, Eissa MR, Hui Z, Good T, Elliott J, Benaissa M (2020) A deep neural network application for improved prediction of HbA1c in type 1 diabetes. IEEE J Biomed Health Inform 24(10):2932–2941

Naz H, Ahuja S (2020) Deep learning approach for diabetes prediction using PIMA Indian dataset. J Diabetes Metab Disord 19:391–403

Acknowledgements

Author information

Authors and affiliations

Department of Information Systems and Technology, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza, Egypt

Mohammed Badawy & Nagy Ramadan

Department of Computer Sciences, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza, Egypt

Hesham Ahmed Hefny

Contributions

MB wrote the main text of the manuscript; NR and HAH revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mohammed Badawy.

Ethics declarations

Competing interests

The authors declare that they have no competing interests. All authors approved the final manuscript.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article

Badawy, M., Ramadan, N. & Hefny, H.A. Healthcare predictive analytics using machine learning and deep learning techniques: a survey. Journal of Electrical Systems and Inf Technol 10 , 40 (2023). https://doi.org/10.1186/s43067-023-00108-y

Download citation

Received : 27 December 2022

Accepted : 31 July 2023

Published : 29 August 2023

DOI : https://doi.org/10.1186/s43067-023-00108-y


Keywords

  • Healthcare prediction
  • Artificial intelligence (AI)
  • Machine learning (ML)
  • Deep learning (DL)
  • Medical diagnosis
