
99 machine learning case studies from 91 enterprises by 2024.

  • 99 use cases in 17 industries
  • 14 business processes in 14 business functions
  • Implementations in 91 companies in 20 countries
  • 10 benefits
  • Growth over 6 years
  • 9 vendors that created these case studies

Which industries leverage machine learning?

The most common industry leveraging machine learning is financial services, which is mentioned in 19% of case studies.

The most common industries using machine learning are:

  • Financial services

Which business functions leverage machine learning?

The most common business function leveraging machine learning is analytics, which is mentioned in 14 case studies.


Which processes leverage machine learning?

The top process reported in machine learning case studies is Credit appraisal.

Most common business processes using machine learning are:

  • Credit appraisal
  • Financial planning & analysis
  • Marketing analytics
  • Data quality management
  • Innovation management
  • Product development
  • Shipping / transportation management
  • Financial risk management
  • Customer journey mapping
  • Incident management
  • Performance management
  • Sales forecasting
  • Campaign management
  • Data governance

What is the geographical distribution of machine learning case studies?


Countries that use machine learning most commonly are listed below.

  • United States of America
  • United Kingdom

What are machine learning’s use cases?

The most common use case of machine learning is customer segmentation, which is mentioned in 27% of case studies.

What are machine learning’s benefits?

The most common benefit of machine learning is time saving, which is mentioned in 24% of case studies.

How are machine learning case studies growing?

Growth by vendor.

Leading vendors in terms of case study contributions to machine learning are:

  • CognitiveScale

Growth over time

Years in which:

  • The first case study in our database was published: 2016
  • The most machine learning case studies were published: 2019
  • The highest year-over-year increase in the number of case studies was reported: 2018
  • The largest year-over-year decrease in the number of case studies was reported: 2020

Comprehensive list of machine learning case studies

AIMultiple identified 99 case studies in machine learning covering 10 benefits and 99 use cases. You can learn more about these case studies in the table below:

Our research on machine learning software

If you want to learn more about machine learning software, you can also check our related research articles that can assist you in your decision:


Machine Learning Case Studies with Powerful Insights

Machine learning is revolutionizing how different industries function, from healthcare to finance to transportation. If you're curious about how this technology is applied in real-world scenarios, look no further. In this blog, we'll explore some exciting machine learning case studies that showcase the potential of this powerful emerging technology.

Machine learning-based applications have quickly transformed how work gets done in the technological world. The technology is changing the way we work, live, and interact with the world around us, and it is revolutionizing industries, from personalized recommendations on streaming platforms to self-driving cars.

But while the technology of artificial intelligence and machine learning may seem abstract or daunting to some, its applications are incredibly tangible and impactful. Data scientists use machine learning algorithms to predict equipment failures in manufacturing, improve cancer diagnoses in healthcare, and even detect fraudulent activity in finance. If you're interested in learning more about how machine learning is applied in real-world scenarios, you are on the right page. This blog will explore in depth how machine learning applications are used for solving real-world problems.

Machine Learning Case Studies

We'll start with a few case studies from GitHub that examine how machine learning is being used by businesses to retain their customers and improve customer satisfaction. We'll also look at how machine learning, with the help of the Python programming language, is being used to detect and prevent fraud in the financial sector and how it can save companies millions of dollars in losses. Next, we will examine how top companies use machine learning to solve various business problems. Additionally, we'll explore how machine learning is used in the healthcare industry and how this technology can improve patient outcomes and save lives.

By going through these case studies, you will better understand how machine learning is transforming work across different industries. So, let's get started!

Table of Contents

  • Machine Learning Case Studies on GitHub
  • Machine Learning Case Studies in Python
  • Company-Specific Machine Learning Case Studies
  • Machine Learning Case Studies in Biology and Healthcare
  • AWS Machine Learning Case Studies
  • Azure Machine Learning Case Studies
  • How to Prepare for a Machine Learning Case Studies Interview

This section has machine learning case studies along with their GitHub repository that contains the sample code.

1. Customer Churn Prediction

Predicting customer churn is essential for businesses interested in retaining customers and maximizing their profits. By leveraging historical customer data, machine learning algorithms can identify patterns and factors that are correlated with churn, enabling businesses to take proactive steps to prevent it.

Customer Churn Prediction Machine Learning Case Study

In this case study, you will study how a telecom company uses machine learning for customer churn prediction. The available data contains information about the services each customer signed up for, their contact information, monthly charges, and their demographics. The goal is to first analyze the data at hand with the methods used in Exploratory Data Analysis, which will assist in picking a suitable machine learning algorithm. The five machine learning models used in this case study are AdaBoost, Gradient Boost, Random Forest, Support Vector Machines, and K-Nearest Neighbors. These models are used to determine which customers are at risk of churn.
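For a flavor of what the modeling step can look like, here is a minimal scikit-learn sketch that compares these classifiers with cross-validated accuracy. It uses synthetic stand-in data rather than the actual telecom dataset, so the feature matrix and labels are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the preprocessed telecom features and churn labels
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

models = {
    "AdaBoost": AdaBoostClassifier(),
    "Gradient Boost": GradientBoostingClassifier(),
    "Random Forest": RandomForestClassifier(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

for name, model in models.items():
    # 5-fold cross-validated accuracy as a quick, like-for-like comparison
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```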

By using machine learning for churn prediction, businesses can better understand customer behavior, identify areas for improvement, and implement targeted retention strategies. It can result in increased customer loyalty, higher revenue, and a better understanding of customer needs and preferences. This case study example will help you understand how machine learning is a valuable tool for any business looking to improve customer retention and stay ahead of the competition.

GitHub Repository: https://github.com/Pradnya1208/Telecom-Customer-Churn-prediction  


2. Market Basket Analysis

Market basket analysis is a common application of machine learning in retail and e-commerce, where it is used to identify patterns and relationships between products that are frequently purchased together. By leveraging this information, businesses can make informed decisions about product placement, promotions, and pricing strategies.

Market Basket Analysis Machine Learning Case Study

In this case study, you will utilize the EDA methods to carefully analyze the relationships among different variables in the data. Next, you will study how to use the Apriori algorithm to identify frequent itemsets and association rules, which describe the likelihood of a product being purchased given the presence of another product. These rules can generate recommendations, optimize product placement, and increase sales, and they can also be used for customer segmentation.  
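If you want to try this before opening the repository, here is a minimal sketch using the mlxtend library on a toy one-hot encoded basket table (the item names and thresholds are made up for illustration):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy one-hot encoded transactions: True = the item appears in that basket
baskets = pd.DataFrame(
    [[True, True, False], [True, True, True], [False, True, True], [True, False, True]],
    columns=["bread", "butter", "jam"],
)

# Frequent itemsets appearing in at least 50% of baskets
frequent_itemsets = apriori(baskets, min_support=0.5, use_colnames=True)

# Association rules filtered by a minimum confidence of 0.6
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```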

Using machine learning for market basket analysis allows businesses to understand customer behavior better, identify cross-selling opportunities, and increase customer satisfaction. It has the potential to result in increased revenue, improved customer loyalty, and a better understanding of customer needs and preferences. 

GitHub Repository: https://github.com/kkrusere/Market-Basket-Analysis-on-the-Online-Retail-Data

3. Predicting Prices for Airbnb

Airbnb is a tech company that enables hosts to rent out their homes, apartments, or rooms to guests interested in temporary lodging. One of the key challenges hosts face is setting the right rental price. With the help of machine learning, hosts can get rough estimates of rental prices based on various factors such as location, property type, amenities, and availability.

The first step in this case study is to clean the dataset by handling missing values, duplicates, and outliers. In the same step, the data is transformed and prepared for modeling with the help of feature engineering methods. The next step is to perform EDA to understand how the rental listings are spread across different cities in the US. Next, you will learn how to visualize how prices change over time, looking at trends for different seasons, months, days of the week, and times of the day.

The final step involves implementing ML models like linear regression (ridge and lasso), Naive Bayes, and Random Forests to produce price estimates for listings. You will learn how to compare the outcome of these models and evaluate their performance.
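As a rough illustration of that comparison step, the sketch below fits ridge, lasso, and random forest regressors on synthetic stand-in data and reports a hold-out mean absolute error; the real case study would use the engineered Airbnb listing features instead.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the engineered listing features and listing prices
X, y = make_regression(n_samples=500, n_features=15, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for name, model in [
    ("Ridge", Ridge(alpha=1.0)),
    ("Lasso", Lasso(alpha=0.1)),
    ("Random Forest", RandomForestRegressor(n_estimators=200, random_state=0)),
]:
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae:.2f}")
```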

GitHub Repository: https://github.com/samuelklam/airbnb-pricing-prediction  


4. Titanic Disaster Analysis

The Titanic Machine Learning Case Study is a classic example in the field of data science and machine learning. The study is based on the dataset of passengers aboard the Titanic when it sank in 1912. The study's goal is to predict whether a passenger survived or not based on their demographic and other information.

The dataset contains information on 891 passengers, including their age, gender, ticket class, fare paid, as well as whether or not they survived the disaster. The first step in the analysis is to explore the dataset and identify any missing values or outliers. Once this is done, the data is preprocessed to prepare it for modeling.

Titanic Disaster Analysis Machine Learning Case Study

The next step is to build a predictive model using various machine learning algorithms, such as logistic regression, decision trees, and random forests. These models are trained on a subset of the data and evaluated on another subset to ensure they can generalize well to new data.

Finally, the model is used to make predictions on a test dataset, and the model performance is measured using various metrics such as accuracy, precision, and recall. The study results can be used to improve safety protocols and inform future disaster response efforts.
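The evaluation step described above boils down to a few lines of scikit-learn. The sketch below uses synthetic stand-in data of the same size as the Titanic dataset, so treat the printed numbers as illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Synthetic stand-in for the preprocessed passenger features and survival labels
X, y = make_classification(n_samples=891, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))
```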

GitHub Repository: https://github.com/ashishpatel26/Titanic-Machine-Learning-from-Disaster  


If you are looking for a sample machine learning case study in Python, keep reading this space.

5. Loan Application Classification

Financial institutions receive a large number of loan requests from borrowers, and making a decision on each request is a crucial task. Manually processing these requests can be time-consuming and error-prone, so there is increasing demand for machine learning to automate and improve this process.

Loan Application Classification Machine Learning Case Study

You can work on this Loan Dataset on Kaggle to get started on one of the most practical case studies in the financial industry. The dataset contains 614 rows and 13 columns. Follow the steps below to get started on this case study.

Analyze the dataset and explore how various factors such as gender, marital status, and employment affect the loan amount and status of the loan application .

Select the features to automate the process of classification of loan applications.

Apply machine learning models such as logistic regression, decision trees, and random forests to the features and compare their performance using statistical metrics.

This case study falls under the umbrella of supervised learning problems in machine learning and demonstrates how ML models are used to automate tasks in the financial industry.
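A minimal sketch of such a pipeline is shown below. It assumes a handful of column names similar to those in the Kaggle loan dataset (treat them as illustrative) and shows how categorical features can be one-hot encoded before fitting a logistic regression classifier.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

# Tiny illustrative frame; the real Kaggle dataset has 614 rows and 13 columns
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Married": ["Yes", "No", "Yes", "No"],
    "ApplicantIncome": [5000, 3000, 4000, 6000],
    "LoanAmount": [130, 70, 110, 150],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})
X = df.drop(columns="Loan_Status")
y = (df["Loan_Status"] == "Y").astype(int)

# One-hot encode the categorical columns and impute missing numeric values
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Gender", "Married"]),
    ("num", SimpleImputer(strategy="median"), ["ApplicantIncome", "LoanAmount"]),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)
print(model.predict(X))
```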


6. Computer Price Estimation

Whenever one thinks of buying a new computer, the first thing that comes to mind is to curate a list of hardware specifications that best suit their needs. The next step is browsing different websites and looking for the cheapest option available. Performing all these processes can be time-consuming and require a lot of effort. But you don’t have to worry as machine learning can help you build a system that can estimate the price of a computer system by taking into account its various features.

Computer Price Estimation Machine Learning Case Study

This sample basic computer dataset on Kaggle can help you develop a price estimation model that analyzes historical data and identifies patterns and trends in the relationship between computer specifications and prices. By training a machine learning model on this data, the model can learn to make accurate price predictions for new or unseen computer configurations. Machine learning algorithms such as K-Nearest Neighbors, Decision Trees, Random Forests, AdaBoost, and XGBoost can effectively capture complex relationships between features and prices, leading to more accurate price estimates.
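As a quick sketch of that comparison (using scikit-learn's gradient boosting in place of XGBoost, and synthetic stand-in data instead of the Kaggle dataset):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor, GradientBoostingRegressor

# Synthetic stand-in for numeric hardware specs (RAM, storage, clock speed, ...) and prices
X, y = make_regression(n_samples=400, n_features=10, noise=15.0, random_state=7)

models = [
    ("KNN", KNeighborsRegressor()),
    ("Decision Tree", DecisionTreeRegressor(random_state=7)),
    ("Random Forest", RandomForestRegressor(random_state=7)),
    ("AdaBoost", AdaBoostRegressor(random_state=7)),
    ("Gradient Boosting", GradientBoostingRegressor(random_state=7)),
]

for name, model in models:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.3f}")
```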

Besides saving time and effort compared to manual estimation methods, this project also has a business use case as it can provide stakeholders with valuable insights into market trends and consumer preferences.

7. House Price Prediction

Here is a machine learning case study that aims to predict the median value of owner-occupied homes in Boston suburbs based on various features such as crime rate, number of rooms, and pupil-teacher ratio.

House Price Prediction  Machine Learning Case Study

Start working on this study by collecting the data from the publicly available UCI Machine Learning Repository, which contains information about 506 neighborhoods in the Boston area. The dataset includes 13 features such as per capita crime rate, average number of rooms per dwelling, and the proportion of owner-occupied units built before 1940. You can gain more insights into this data by using EDA techniques. Then prepare the dataset for implementing ML models by handling missing values, converting categorical features to numerical ones, and scaling the data.

Use machine learning algorithms such as Linear Regression, Lasso Regression, and Random Forest to predict house prices for different neighborhoods in the Boston area. Select the best model by comparing the performance of each one using metrics such as mean squared error, mean absolute error, and R-squared.
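The sketch below shows what that model comparison can look like. Recent scikit-learn releases no longer bundle the Boston dataset, so the built-in California housing data is used here as a stand-in target.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# California housing as a stand-in; swap in the Boston data from the UCI repository if you prefer
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

for name, model in [
    ("Linear Regression", LinearRegression()),
    ("Lasso Regression", Lasso(alpha=0.01)),
    ("Random Forest", RandomForestRegressor(n_estimators=100, random_state=3)),
]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name}: MSE={mean_squared_error(y_test, pred):.3f} "
          f"MAE={mean_absolute_error(y_test, pred):.3f} R2={r2_score(y_test, pred):.3f}")
```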

This section has machine learning case studies of different firms across various industries.

8. Machine Learning Case Study on Dell

Dell Technologies is a multinational technology company that designs, develops, and sells computers, servers, data storage devices, network switches, software, and other technology products and services. Dell is one of the world's most prominent PC vendors and serves customers in over 180 countries. Because data is integral to how Dell markets its products, the marketing team required a data-focused solution that would improve response rates and demonstrate why some words and phrases are more effective than others.

Machine Learning Case Study on Dell

Dell partnered with Persado, a firm that uses AI to create marketing content. Persado helped Dell revamp its email marketing strategy and leverage data analytics to capture its audience's attention. The partnership resulted in a noticeable increase in customer engagement, with page visits up by 22% on average and a 50% average increase in click-through rate (CTR).

Dell currently relies on ML methods to improve their marketing strategy for emails, banners, direct mail, Facebook ads, and radio content.


9. Machine Learning Case Study on Harley Davidson

In the current environment, it is challenging to stand out with traditional marketing alone, which makes Albert, an artificial intelligence-powered marketing platform, appealing for a business like Harley Davidson. Thanks to machine learning and artificial intelligence, robots are now directing traffic, creating news stories, working in hotels, and even running McDonald's restaurants.

Albert can be applied to many marketing channels, including email and social media. It automatically prepares customized creative copy and forecasts which customers are most likely to convert.

Machine Learning Case Study on Harley Davidson

Harley Davidson is one of the companies that has made use of Albert. The business examined customer data to identify the behavior of past clients who made purchases and spent more time than usual across different pages on the website. With this knowledge, Albert divided the customer base into groups and adjusted the scale of test campaigns accordingly.

Results reveal that using Albert increased Harley Davidson's sales by 40%. The brand also saw a 2,930% spike in leads, 50% of which came from very effective "lookalikes" found by machine learning and artificial intelligence.

10. Machine Learning Case Study on Zomato

Zomato is a popular online platform that provides restaurant search and discovery services, online ordering and delivery, and customer reviews and ratings. Founded in India in 2008, the company has expanded to over 24 countries and serves millions of users globally. Over the years, it has become a popular choice for consumers to browse the ratings of different restaurants in their area. 

Machine Learning Case Study on Zomato

To provide the best restaurant options to its customers, Zomato hand-picks the restaurants likely to perform well in the future. Machine learning can help Zomato make such decisions by considering different restaurant features. You can work on this sample Zomato Restaurants Data and experiment with how machine learning can be useful to Zomato. The dataset has the details of 9,551 restaurants. The first step should involve careful analysis of the data and identifying outliers and missing values in the dataset. Treat them using statistical methods and then use regression models to predict the ratings of different restaurants.

The Zomato Case study is one of the most popular machine learning startup case studies among data science enthusiasts.

11. Machine Learning Case Study on Tesla

Tesla, Inc. is an American electric vehicle and clean energy company founded in 2003 by Elon Musk. The company designs, manufactures, and sells electric cars, battery storage systems, and solar products. Tesla has pioneered the electric vehicle industry and has popularized high-capacity lithium-ion batteries and regenerative braking systems. The company strongly focuses on innovation, sustainability, and reducing the world's dependence on fossil fuels.

Tesla uses machine learning in various ways to enhance the performance and features of its electric vehicles. One of the most notable applications of machine learning at Tesla is in its Autopilot system, which uses a combination of cameras, sensors, and machine learning algorithms to enable advanced driver assistance features such as lane centering, adaptive cruise control, and automatic emergency braking.

Machine Learning Case Study on Tesla

Tesla's Autopilot system uses deep neural networks to process large amounts of real-world driving data and accurately predict driving behavior and potential hazards. It enables the system to learn and adapt over time, improving its accuracy and responsiveness.

Additionally, Tesla also uses machine learning in its battery management systems to optimize the performance and longevity of its batteries. Machine learning algorithms are used to model and predict the behavior of the batteries under different conditions, enabling Tesla to optimize charging rates, temperature control, and other factors to maximize the lifespan and performance of its batteries.


12. Machine Learning Case Study on Amazon

Amazon Prime Video uses machine learning to ensure high video quality for its users. The company has developed a system that analyzes video content and applies various techniques to enhance the viewing experience.

Machine Learning Case Study on Amazon

The system uses machine learning algorithms to automatically detect and correct issues such as unexpected black frames, blocky frames, and audio noise. For detecting block corruption, residual neural networks are used. After training the algorithm on the large dataset, a threshold of 0.07 was set for the corrupted-area ratio to mark the areas of the frame that have block corruption. For detecting unwanted noise in the audio, a model based on a pre-trained audio neural network is used to classify a one-second audio sample into one of these classes: audio hum, audio distortion, audio diss, audio clicks, and no defect. The lip sync is handled using the SynNet architecture.
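The block-corruption decision rule described above is easy to picture in code. The sketch below assumes a detector has already produced a per-block corruption probability map for one frame; the grid size and the 0.5 cutoff are invented for illustration, and only the 0.07 corrupted-area threshold comes from the case study.

```python
import numpy as np

# Hypothetical per-block corruption probabilities for a single video frame,
# e.g. the output of a residual-network block-corruption detector
rng = np.random.default_rng(0)
block_scores = rng.random((45, 80))          # 45 x 80 grid of blocks (illustrative size)

corrupted_blocks = block_scores > 0.5        # blocks the detector considers corrupted
corrupted_area_ratio = corrupted_blocks.mean()

# The frame is marked as having block corruption if the ratio exceeds 0.07
frame_is_corrupted = corrupted_area_ratio > 0.07
print(corrupted_area_ratio, frame_is_corrupted)
```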

By using machine learning to optimize video quality, Amazon can deliver a consistent and high-quality viewing experience to its users, regardless of the device or network conditions they are using. It helps maintain customer satisfaction and loyalty and ensures that Amazon remains a competitive video streaming market leader.

Machine learning applications are not limited to financial and tech use cases; the technology is also used in the healthcare industry. Here are a few machine learning case studies that showcase its use in the biology and healthcare domains.

13. Microbiome Therapeutics Development

The development of microbiome therapeutics involves the study of the interactions between the human microbiome and various diseases and identifying specific microbial strains or compositions that can be used to treat or prevent these diseases. Machine learning plays a crucial role in this process by enabling the analysis of large, complex datasets and identifying patterns and correlations that would be difficult or impossible to detect through traditional methods.

Machine Learning in Microbiome Therapeutics Development

Machine learning algorithms can analyze microbiome data at various levels, including taxonomic composition, functional pathways, and gene expression profiles. These algorithms can identify specific microbial strains or communities associated with different diseases or conditions and can be used to develop targeted therapies.

Besides that, machine learning can be used to optimize the design and delivery of microbiome therapeutics. For example, machine learning algorithms can be used to predict the efficacy of different microbial strains or compositions and optimize these therapies' dosage and delivery mechanisms.

14. Mental Illness Diagnosis

Machine learning is increasingly being used to develop predictive models for diagnosing and managing mental illness. One of the critical advantages of machine learning in this context is its ability to analyze large, complex datasets and identify patterns and correlations that would be difficult for human experts to detect.

Machine learning algorithms can be trained on various data sources, including clinical assessments, self-reported symptoms, and physiological measures such as brain imaging or heart rate variability. These algorithms can then be used to develop predictive models to identify individuals at high risk of developing a mental illness or who are likely to experience a particular symptom or condition.

Machine Learning Case Study for Mental Illness Diagnosis

One example of machine learning being used to predict mental illness is in the development of suicide risk assessment tools. These tools use machine learning algorithms to analyze various risk factors, such as demographic information, medical history, and social media activity, to identify individuals at risk of suicide. These tools can be used to guide early intervention and support for individuals struggling with mental health issues.

One can also build a chatbot using machine learning and natural language processing that analyzes a user's responses and recommends the steps they can immediately take.


15. 3D Bioprinting

Another popular subject in the biotechnology industry is bioprinting. Based on a computerized blueprint, the printer prints biological tissues like skin, organs, blood vessels, and bones layer by layer using cells and biomaterials, also known as bioinks.

Printed tissues can be produced more ethically and economically than by relying on organ donations. Additionally, synthetic tissue constructs can be used for drug testing instead of testing on animals or people. Due to its tremendous complexity, the entire technology is still in its early stages of maturity. Data science is one of the most essential components for handling the complexity of the printing process.

3D Bioprinting  Machine Learning Case Study

The qualities of the bioinks, which have inherent variability, and the many printing parameters are just a couple of the variables that affect the printing process and quality. For instance, Bayesian optimization improves the likelihood of producing usable output and optimizes the printing process.

A crucial element of the procedure is the printing speed. To estimate the optimal speed, Siamese network models are used. Convolutional neural networks are applied to layer-by-layer photographs of the tissue to detect material or tissue abnormalities.

In this section, you will find a list of machine learning case studies that have utilized Amazon Web Services to create machine learning based solutions.

16. Machine Learning Case Study on AutoDesk

Autodesk is a US-based software company that provides solutions for 3D design, engineering, and entertainment industries. The company offers a wide range of software products and services, including computer-aided design (CAD) software, 3D animation software, and other tools used in architecture, construction, engineering, manufacturing, media and entertainment industries.

Autodesk utilizes machine learning (ML) models that are constructed on Amazon SageMaker, a managed ML service provided by Amazon Web Services (AWS), to assist designers in categorizing and sifting through a multitude of versions created by generative design procedures and selecting the most optimal design.  ML techniques built with Amazon SageMaker help Autodesk progress from intuitive design to exploring the boundaries of generative design for their customers to produce innovative products that can even be life-changing. As an example, Edera Safety, a design studio located in Austria, created a superior and more effective spine protector by utilizing Autodesk's generative design process constructed on AWS.

17. Machine Learning Case Study on Capital One

Capital One is a financial services company in the United States that offers a range of financial products and services to consumers, small businesses, and commercial clients. The company provides credit cards, loans, savings and checking accounts, investment services, and other financial products and services.

Capital One leverages AWS to transform data into valuable insights using machine learning, enabling the company to innovate rapidly on behalf of its customers.  To power its machine-learning innovation, Capital One utilizes a range of AWS services such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon Relational Database Service (Amazon RDS), and AWS Lambda. AWS is enabling Capital One to implement flexible DevOps processes, enabling the company to introduce new products and features to the market in just a few weeks instead of several months or years. Additionally, AWS assists Capital One in providing data to and facilitating the training of sophisticated machine-learning analysis and customer-service solutions. The company also integrates its contact centers with its CRM and other critical systems, while simultaneously attracting promising entry-level and mid-career developers and engineers with the opportunity to gain knowledge and innovate with the most up-to-date cloud technologies.

18. Machine Learning Case Study on BuildFax

In 2008, BuildFax began by collecting widely scattered building permit data from different parts of the United States and distributing it to various businesses, including building inspectors, insurance companies, and economic analysts. Today, it offers custom-made solutions to these professionals along with several other services, including indices that monitor trends such as commercial construction and housing remodels.

Machine Learning Case Study on BuildFax

Source: aws.amazon.com/solutions/case-studies

The primary customer base of BuildFax is insurance companies that spend billions of dollars on roof losses. BuildFax assists its customers in developing policies and premiums by evaluating roof losses for them. Initially, it relied on general data and ZIP codes to build predictive models, but these did not prove useful because they were inaccurate and somewhat complex. The company therefore needed a solution that could support more accurate, property-specific estimates, and it chose Amazon Machine Learning for predictive modeling. By employing Amazon Machine Learning, the company can offer insurance companies and builders personalized roof-age and job-cost estimates that are specific to a particular property, without having to depend on more generalized estimates based on ZIP codes. It now utilizes customer data and data from public sources to create predictive models.


This section will present you with a list of machine learning case studies that showcase how companies have leveraged Microsoft Azure Services for completing machine learning tasks in their firm.

19. Machine Learning Case Study for an Enterprise Company

Consider a company (an Azure customer) in the Electronic Design Automation industry that provides software, hardware, and IP for electronic systems and semiconductor companies. Its finance team was struggling to manage account receivables efficiently, so it wanted to use machine learning to predict payment outcomes and reduce outstanding receivables. The team faced a major challenge with managing change data capture using Azure Data Factory. A3S provided a solution by automating data migration from SAP ECC to Azure Synapse and offering fully automated analytics as a service, which helped the company streamline its account receivables management. The company was able to go from data ingestion to analytics within a week and plans to use A3S for other analytics initiatives.

20. Machine Learning Case Study on Shell

Royal Dutch Shell, a global company whose operations span oil wells to retail petrol stations, is using computer vision technology to automate safety checks at its service stations. In partnership with Microsoft, it has developed a project called Video Analytics for Downstream Retail (VADR) that uses machine vision and image processing to detect dangerous behavior and alert station staff. It uses OpenCV and Azure Databricks in the background, highlighting how Azure can be used for custom applications. Once the project shows decent results in the countries where it has been deployed (Thailand and Singapore), Shell plans to expand VADR globally.

21. Machine Learning Case Study on TransLink

TransLink, a transportation company in Vancouver, deployed 18,000 different sets of machine learning models using Azure Machine Learning to predict bus departure times and determine bus crowdedness. The models take into account factors such as traffic, bad weather and at-capacity buses. The deployment led to an improvement in predicted bus departure times of 74%. The company also created a mobile app that allows people to plan their trips based on how at-capacity a bus might be at different times of day.

22. Machine Learning Case Study on XBox

Microsoft Azure Personaliser is a cloud-based service that uses reinforcement learning to select the best content for customers based on up-to-date information about them, the context, and the application. Custom recommender services can also be created using Azure Machine Learning. The Xbox One group used Cognitive Services Personaliser to find content suited to each user, which resulted in a 40% increase in user engagement compared to a random personalisation policy on the Xbox platform.

All the case studies mentioned in this blog will help you explore how machine learning is applied to solve real problems across different industries. But if you are preparing for an interview and intend to show that you have mastered the art of implementing ML algorithms, you must not stop here: practice more such case studies in machine learning.

And if you have decided to dive deeper into machine learning, data science, and big data, be sure to check out ProjectPro, which offers a repository of solved projects in data science and big data. With a wide range of projects, you can explore different techniques and approaches and build your machine learning and data science skills. Our repository has a project for each one of you, irrespective of your academic and professional background, and the customized learning paths are likely to help you make your mark in this newly emerging field. So why wait? Start exploring today and see what you can accomplish with big data and data science!


1. What is a case study in machine learning?

A case study in machine learning is an in-depth analysis of a real-world problem or scenario, where machine learning techniques are applied to solve the problem or provide insights. Case studies can provide valuable insights into the application of machine learning and can be used as a basis for further research or development.

2. What is a good use case for machine learning?

A good use case for machine learning is any scenario with a large and complex dataset and where there is a need to identify patterns, predict outcomes, or automate decision-making based on that data. It could include fraud detection, predictive maintenance, recommendation systems, and image or speech recognition, among others.

3. What are the 3 basic types of machine learning problems?

The three basic types of machine learning problems are supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the algorithm is trained on labeled data. In unsupervised learning, the algorithm seeks to identify patterns in unstructured data. In reinforcement learning, the algorithm learns through trial and error based on feedback from the environment.

4. What are the 4 basics of machine learning?

The four basics of machine learning are data preparation, model selection, model training, and model evaluation. Data preparation involves collecting, cleaning, and preparing data for use in training models. Model selection involves choosing the appropriate algorithm for a given task. Model training involves optimizing the chosen algorithm to achieve the desired outcome. Model evaluation consists of assessing the performance of the trained model on new data.





Open access | Published: 09 September 2022

Machine learning in project analytics: a data-driven framework and case study

Shahadat Uddin, Stephen Ong & Haohui Lu

Scientific Reports, volume 12, Article number: 15252 (2022)


Subjects: Applied mathematics, Computational science

The analytic procedures incorporated to facilitate the delivery of projects are often referred to as project analytics. Existing techniques focus on retrospective reporting and understanding the underlying relationships to make informed decisions. Although machine learning algorithms have been widely used in addressing problems within various contexts (e.g., streamlining the design of construction projects), limited studies have evaluated pre-existing machine learning methods within the delivery of construction projects. Due to this, the current research aims to contribute further to this convergence between artificial intelligence and the execution of construction projects through the evaluation of a specific set of machine learning algorithms. This study proposes a machine learning-based data-driven research framework for addressing problems related to project analytics. It then illustrates an example of the application of this framework. In this illustration, existing data from an open-source data repository on construction projects and cost overrun frequencies was studied in which several machine learning models (Python's Scikit-learn package) were tested and evaluated. The data consisted of 44 independent variables (from materials to labour and contracting) and one dependent variable (project cost overrun frequency), which was categorised for processing under several machine learning models. These models include support vector machine, logistic regression, k-nearest neighbour, random forest, stacking (ensemble) model and artificial neural network. Feature selection and evaluation methods, including the Univariate feature selection, Recursive feature elimination, SelectFromModel and confusion matrix, were applied to determine the most accurate prediction model. This study also discusses the generalisability of using the proposed research framework in other research contexts within the field of project management. The proposed framework, its illustration in the context of construction projects and its potential to be adopted in different contexts will significantly contribute to project practitioners, stakeholders and academics in addressing many project-related issues.


Introduction

Successful projects require the presence of appropriate information and technology 1 . Project analytics provides an avenue for informed decisions to be made through the lifecycle of a project. Project analytics applies various statistics (e.g., earned value analysis or Monte Carlo simulation) among other models to make evidence-based decisions. They are used to manage risks as well as project execution 2 . There is a tendency for project analytics to be employed due to other additional benefits, including an ability to forecast and make predictions, benchmark with other projects, and determine trends such as those that are time-dependent 3 , 4 , 5 . There has been increasing interest in project analytics and how current technology applications can be incorporated and utilised 6 . Broadly, project analytics can be understood on five levels 4 . The first is descriptive analytics which incorporates retrospective reporting. The second is known as diagnostic analytics , which aims to understand the interrelationships and underlying causes and effects. The third is predictive analytics which seeks to make predictions. Subsequent to this is prescriptive analytics , which prescribes steps following predictions. Finally, cognitive analytics aims to predict future problems. The first three levels can be applied with ease with the help of technology. The fourth and fifth steps require data that is generally more difficult to obtain as they may be less accessible or unstructured. Further, although project key performance indicators can be challenging to define 2 , identifying common measurable features facilitates this 7 . It is anticipated that project analytics will continue to experience development due to its direct benefits to the major baseline measures focused on productivity, profitability, cost, and time 8 . The nature of project management itself is fluid and flexible, and project analytics allows an avenue for which machine learning algorithms can be applied 9 .

Machine learning within the field of project analytics falls into the category of cognitive analytics, which deals with problem prediction. Generally, machine learning explores the possibilities of computers to improve processes through training or experience 10 . It can also build on the pre-existing capabilities and techniques prevalent within management to accomplish complex tasks 11 . Due to its practical use and broad applicability, recent developments have led to the invention and introduction of newer and more innovative machine learning algorithms and techniques. Artificial intelligence, for instance, allows for software to develop computer vision, speech recognition, natural language processing, robot control, and other applications 10 . Specific to the construction industry, it is now used to monitor construction environments through a virtual reality and building information modelling replication 12 or risk prediction 13 . Within other industries, such as consumer services and transport, machine learning is being applied to improve consumer experiences and satisfaction 10 , 14 and reduce the human errors of traffic controllers 15 . Recent applications and development of machine learning broadly fall into the categories of classification, regression, ranking, clustering, dimensionality reduction and manifold learning 16 . Current learning models include linear predictors, boosting, stochastic gradient descent, kernel methods, and nearest neighbour, among others 11 . Newer and more applications and learning models are continuously being introduced to improve accessibility and effectiveness.

Specific to the management of construction projects, other studies have also been made to understand how copious amounts of project data can be used 17 , the importance of ontology and semantics throughout the nexus between artificial intelligence and construction projects 18 , 19 as well as novel approaches to the challenges within this integration of fields 20 , 21 , 22 . There have been limited applications of pre-existing machine learning models on construction cost overruns. They have predominantly focussed on applications to streamline the design processes within construction 23 , 24 , 25 , 26 , and those which have investigated project profitability have not incorporated the types and combinations of algorithms used within this study 6 , 27 . Furthermore, existing applications have largely been skewed towards one type or another 28 , 29 .

In addition to the frequently used earned value method (EVM), researchers have been applying many other powerful quantitative methods to address a diverse range of project analytics research problems over time. Examples of those methods include time series analysis, fuzzy logic, simulation, network analytics, and network correlation and regression. Time series analysis uses longitudinal data to forecast an underlying project's future needs, such as the time and cost 30 , 31 , 32 . Few other methods are combined with EVM to find a better solution for the underlying research problems. For example, Narbaev and De Marco 33 integrated growth models and EVM for forecasting project cost at completion using data from construction projects. For analysing the ongoing progress of projects having ambiguous or linguistic outcomes, fuzzy logic is often combined with EVM 34 , 35 , 36 . Yu et al. 36 applied fuzzy theory and EVM for schedule management. Ponz-Tienda et al. 35 found that using fuzzy arithmetic on EVM provided more objective results in uncertain environments than the traditional methodology. Bonato et al. 37 integrated EVM with Monte Carlo simulation to predict the final cost of three engineering projects. Batselier and Vanhoucke 38 compared the accuracy of the project time and cost forecasting using EVM and simulation. They found that the simulation results supported findings from the EVM. Network methods are primarily used to analyse project stakeholder networks. Yang and Zou 39 developed a social network theory-based model to explore stakeholder-associated risks and their interactions in complex green building projects. Uddin 40 proposed a social network analytics-based framework for analysing stakeholder networks. Ong and Uddin 41 further applied network correlation and regression to examine the co-evolution of stakeholder networks in collaborative healthcare projects. Although many other methods have already been used, as evident in the current literature, machine learning methods or models are yet to be adopted for addressing research problems related to project analytics. The current investigation is derived from the cognitive analytics component of project analytics. It proposes an approach for determining hidden information and patterns to assist with project delivery. Figure  1 illustrates a tree diagram showing different levels of project analytics and their associated methods from the literature. It also illustrates existing methods within the cognitive component of project analytics to where the application of machine learning is situated contextually.

Figure 1. A tree diagram of different project analytics methods, showing where the current study belongs. Although earned value analysis is commonly used in project analytics, we do not include it in this figure since it is used in the first three levels of project analytics.

Machine learning models have several notable advantages over traditional statistical methods that play a significant role in project analytics 42 . First, machine learning algorithms can quickly identify trends and patterns by simultaneously analysing a large volume of data. Second, they are more capable of continuous improvement. Machine learning algorithms can improve their accuracy and efficiency for decision-making through subsequent training from potential new data. Third, machine learning algorithms efficiently handle multi-dimensional and multi-variety data in dynamic or uncertain environments. Fourth, they are well suited to automating various decision-making tasks. For example, machine learning-based sentiment analysis can easily detect a negative tweet and can automatically take further necessary steps. Last but not least, machine learning has been helpful across various industries, from defence to education 43 . Current research has seen the development of several different branches of artificial intelligence (including robotics, automated planning and scheduling and optimisation) within safety monitoring, risk prediction, cost estimation and so on 44 . This has progressed from the applications of regression on project cost overruns 45 to the current deep-learning implementations within the construction industry 46 . Despite this, the uses remain largely limited and are still in a developmental state. The benefits of applications are noted, such as optimising and streamlining existing processes; however, high initial costs form a barrier to accessibility 44 .

The primary goal of this study is to demonstrate the applicability of different machine learning algorithms in addressing problems related to project analytics. Limitations in applying machine learning algorithms within the context of construction projects have been explored previously. However, preceding research has mainly been conducted to improve the design processes specific to construction 23 , 24 , and those investigating project profitabilities have not incorporated the types and combinations of algorithms used within this study 6 , 27 . For instance, preceding research has incorporated a different combination of machine-learning algorithms in research of predicting construction delays 47 . This study first proposed a machine learning-based data-driven research framework for project analytics to contribute to the proposed study direction. It then applied this framework to a case study of construction projects. Although there are three different machine learning algorithms (supervised, unsupervised and semi-supervised), the supervised machine learning models are most commonly used due to their efficiency and effectiveness in addressing many real-world problems 48 . Therefore, we will use machine learning to represent supervised machine learning throughout the rest of this article. The contribution of this study is significant in that it considers the applications of machine learning within project management. Project management is often thought of as being very fluid in nature, and because of this, applications of machine learning are often more difficult 9 , 49 . Further to this, existing implementations have largely been limited to safety monitoring, risk prediction, cost estimation and so on 44 . Through the evaluation of machine-learning applications, this study further demonstrates a case study for which algorithms can be used to consider and model the relationship between project attributes and a project performance measure (i.e., cost overrun frequency).

Machine learning-based framework for project analytics

When and why machine learning for project analytics

Machine learning models are typically used for research problems that involve predicting the classification outcome of a categorical dependent variable. Therefore, they can be applied in the context of project analytics if the underlying objective variable is a categorical one. If that objective variable is non-categorical, it must first be converted into a categorical variable. For example, if the objective or target variable is the project cost, we can convert this variable into a categorical variable by taking only two possible values. The first value would be 0 to indicate a low-cost project, and the second could be 1 for showing a high-cost project. The average or median cost value for all projects under consideration can be considered for splitting project costs into low-cost and high-cost categories.
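As a minimal illustration of this conversion (not part of the original study, and with invented cost figures), the continuous project cost can be split at the median into a binary class label:

```python
import pandas as pd

# Hypothetical project costs (in $ millions); any continuous objective measure works the same way
costs = pd.Series([1.2, 3.4, 0.8, 5.1, 2.2, 4.0])

# 0 = low-cost project, 1 = high-cost project, split at the median cost
cost_class = (costs > costs.median()).astype(int)
print(cost_class.tolist())   # [0, 1, 0, 1, 0, 1]
```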

For data-driven decision-making, machine learning models are advantageous. This is because traditional statistical methods (e.g., ordinary least square (OLS) regression) make assumptions about the underlying research data to produce explicit formulae for the objective target measures. Unlike these statistical methods, machine learning algorithms figure out patterns on their own directly from the data. For instance, for a non-linear but separable dataset, an OLS regression model will not be the right choice due to its assumption that the underlying data must be linear. However, a machine learning model can easily separate the dataset into the underlying classes. Figure  2 (a) presents a situation where machine learning models perform better than traditional statistical methods.

Figure 2. (a) An illustration showing the superior performance of machine learning models compared with traditional statistical models using an abstract dataset with two attributes (X1 and X2). The data points within this abstract dataset consist of two classes: one represented with a transparent circle and the second class illustrated with a black-filled circle. These data points are non-linear but separable. Traditional statistical models (e.g., ordinary least square regression) will not accurately separate these data points. However, any machine learning model can easily separate them without making errors; and (b) Traditional programming versus machine learning.

Similarly, machine learning models are compelling if the underlying research dataset has many attributes or independent measures. Such models can identify features that significantly contribute to the corresponding classification performance regardless of their distributions or collinearity. Traditional statistical methods are prone to biased results when there is a correlation between independent variables. Current machine learning-based studies specific to project analytics have been largely limited. Despite this, there have been tangential studies on the use of artificial intelligence to improve cost estimations as well as risk prediction 44 . Additionally, models have been implemented in the optimisation of existing processes 50 .

Machine learning versus traditional programming

Machine learning can be thought of as a process of teaching a machine (i.e., computers) to learn from data and adjust or apply its present knowledge when exposed to new data 42 . It is a type of artificial intelligence that enables computers to learn from examples or experiences. Traditional programming requires some input data and some logic in the form of code (program) to generate the output. Unlike traditional programming, the input data and their corresponding output are fed to an algorithm to create a program in machine learning. This resultant program can capture powerful insights into the data pattern and can be used to predict future outcomes. Figure  2 (b) shows the difference between machine learning and traditional programming.

Proposed machine learning-based framework

Figure  3 illustrates the proposed machine learning-based research framework of this study. The framework starts with breaking the project research dataset into the training and test components. As mentioned in the previous section, the research dataset may have many categorical and/or nominal independent variables, but its single dependent variable must be categorical. Although there is no strict rule for this split, the training data size is generally more than or equal to 50% of the original dataset 48 .
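For illustration (not part of the original study), the split can be produced with scikit-learn's train_test_split; the synthetic dataset below stands in for a project dataset with 44 attributes and a binary outcome.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a project dataset: 44 independent variables, one binary outcome
X, y = make_classification(n_samples=200, n_features=44, random_state=42)

# 70/30 split, stratified so both outcome classes appear in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)
```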

Figure 3. The proposed machine learning-based data-driven framework.

Machine learning algorithms can handle variables that have only numerical outcomes. So, when one or more of the underlying categorical variables have a textual or string outcome, we must first convert them into the corresponding numerical values. Suppose a variable can take only three textual outcomes (low, medium and high). In that case, we could consider, for example, 1 to represent low , 2 to represent medium , and 3 to represent high . Other statistical techniques, such as the RIDIT (relative to an identified distribution) scoring 51 , can also be used to convert ordered categorical measurements into quantitative ones. RIDIT is a parametric approach that uses probabilistic comparison to determine the statistical differences between ordered categorical groups. The remaining components of the proposed framework have been briefly described in the following subsections.
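For example, the simple ordinal mapping described above can be expressed as follows (the variable and its values are illustrative):

```python
import pandas as pd

# A categorical variable with three ordered textual outcomes
severity = pd.Series(["low", "high", "medium", "low", "high"])

# Map the ordered categories to numerical values: low -> 1, medium -> 2, high -> 3
severity_encoded = severity.map({"low": 1, "medium": 2, "high": 3})
print(severity_encoded.tolist())   # [1, 3, 2, 1, 3]
```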

Model-building procedure

The next step of the framework is to follow the model-building procedure to develop the desired machine learning models using the training data. The first step of this procedure is to select suitable machine learning algorithms or models. Among the available machine learning algorithms, the commonly used ones are support vector machine, logistic regression, k -nearest neighbours, artificial neural network, decision tree and random forest 52 . One can also select an ensemble machine learning model as the desired algorithm. An ensemble machine learning method uses multiple algorithms, or the same algorithm multiple times, to achieve better predictive performance than could be obtained from any of the constituent learning models alone 52 . Three widely used ensemble approaches are bagging, boosting and stacking. In bagging, the research dataset is divided into different equal-sized subsets, and the underlying machine learning algorithm is then applied to these subsets for classification. In boosting, random samples of the dataset are selected and then fitted and trained sequentially with different models, each compensating for the weaknesses observed in the previously used model. Stacking combines different weak machine learning models in a heterogeneous way to improve predictive performance. For example, the random forest algorithm is an ensemble of different decision tree models 42 .
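
The short sketch below illustrates the bagging and boosting styles with scikit-learn on a synthetic dataset; the estimators and parameter values are illustrative assumptions rather than settings used in this study.

```python
# Hedged sketch of bagging and boosting with scikit-learn defaults.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Bagging: the same base learner (a decision tree by default) is fitted
# on bootstrapped subsets of the training data.
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: learners are fitted sequentially, each one compensating for
# the errors of the previous one.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```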

Second, each selected machine learning model will be processed through the k -fold cross-validation approach to improve predictive efficiency. In k -fold cross-validation, the training data is divided into k folds. In each iteration, k−1 folds are used to train the selected machine learning models, and the remaining fold is used for validation purposes. This process continues until each of the k folds has been used once for validation. The final predictive efficiency of the trained models is based on the average of the outcomes of these iterations. In addition to this average value, researchers often report the standard deviation of the results across iterations as part of the predictive training efficiency. Supplementary Fig 1 shows an illustration of the k -fold cross-validation.
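
The following sketch shows k-fold cross-validation with scikit-learn; the random forest model and the synthetic data are placeholders assumed only for illustration.

```python
# Five-fold cross-validation: each fold is used once for validation while
# the remaining folds are used for training.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0)

scores = cross_val_score(model, X, y, cv=5)
print("mean accuracy:", scores.mean(), "standard deviation:", scores.std())
```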

Third, most machine learning algorithms require pre-defined values for several of their parameters, known as hyperparameters; the process of finding suitable values is known as hyperparameter tuning. The settings of these parameters play a vital role in the achieved performance of the underlying algorithm. For a given machine learning algorithm, the optimal values for these parameters can differ from one dataset to another. The same algorithm therefore needs to be run multiple times with different parameter values to find its optimal parameter values for a given dataset. Many approaches are available in the literature, such as grid search 53 , to find the optimal parameter values. In grid search, the hyperparameter space is divided into a discrete grid, and each grid point represents a specific combination of the underlying model parameters. The parameter values of the point that results in the best performance are the optimal parameter values 53 .
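
A grid search over a small, illustrative parameter grid might look like the sketch below; the grid shown is an assumption and not the one reported in the study's supplementary material.

```python
# Hedged sketch of hyperparameter tuning with an exhaustive grid search.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```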

Testing of the developed models and reporting results

Once the desired machine learning models have been developed using the training data, they need to be tested using the test data. The underlying trained model is then applied to predict its dependent variable for each data instance. Therefore, for each data instance, two categorical outcomes will be available for its dependent variable: one predicted using the underlying trained model, and the other is the actual category. These predicted and actual categorical outcome values are used to report the results of the underlying machine learning model.

The fundamental tool for reporting results from machine learning models is the confusion matrix, which consists of four integer values 48 . The first value represents the number of positive cases correctly identified as positive by the underlying trained model (true-positive). The second value indicates the number of positive instances incorrectly identified as negative (false-negative). The third value represents the number of negative cases incorrectly identified as positive (false-positive). Finally, the fourth value indicates the number of negative instances correctly identified as negative (true-negative). Researchers also use a few performance measures based on the four values of the confusion matrix to report machine learning results. The most used measure is accuracy, which is the ratio of the number of correct predictions (true-positive + true-negative) to the total number of data instances (the sum of all four values of the confusion matrix). Other measures commonly used to report machine learning results are precision, recall and F1-score. Precision refers to the ratio between true-positives and the total number of positive predictions (i.e., true-positive + false-positive), and is often used to indicate the quality of a positive prediction made by a model 48 . Recall, also known as the true-positive rate, is calculated by dividing true-positives by the number of data instances that should have been predicted as positive (i.e., true-positive + false-negative). F1-score is the harmonic mean of precision and recall, i.e., (2 × Precision × Recall)/(Precision + Recall), and the error rate equals 1 − Accuracy.
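
These measures can be computed directly from the predicted and actual labels, as in the brief sketch below; y_true and y_pred are small placeholder vectors standing in for the test-phase outcomes.

```python
# Confusion matrix and derived measures for a small placeholder example.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# scikit-learn orders the binary confusion matrix as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FN, FP, TN:", tp, fn, fp, tn)
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```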

Another essential tool for reporting machine learning results is variable or feature importance, which identifies the independent variables (features) contributing most to the classification performance. The importance of a variable refers to how much a given machine learning algorithm relies on that variable in making accurate predictions 54 . A widely used related technique is principal component analysis, which reduces the dimensionality of the data while minimising information loss, eventually increasing the interpretability of the underlying machine learning outcome. It further helps in finding the important features in a dataset as well as plotting them in 2D and 3D 54 .
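
A brief sketch of both tools is given below: impurity-based feature importances from a fitted random forest and a two-component PCA projection. The data are synthetic and the choices are illustrative only.

```python
# Feature importance from a fitted model and PCA for dimensionality reduction.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X, y)
print("feature importances:", rf.feature_importances_)

# PCA projects the data onto two components while retaining most of the
# variance, which helps with 2D plotting and interpretation.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
```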

Ethical approval

Ethical approval was not required for this study since it used publicly available data for research investigation purposes. All research was performed in accordance with relevant guidelines and regulations.

Informed consent

Due to the nature of the data sources, informed consent was not required for this study.

Case study: an application of the proposed framework

This section illustrates an application of this study’s proposed framework (Fig.  3 ) in a construction project context. We apply this framework to classify projects into two classes based on their cost overrun experience. Projects that rarely experience a cost overrun belong to the first class (Rare class), and the second class contains those projects that often experience a cost overrun (Often class). In doing so, we consider a list of independent variables or features.

Data source

The research dataset is taken from an open-source data repository, Kaggle 55 . This survey-based research dataset was collected to explore the causes of project cost overrun in Indian construction projects 45 . It consists of 44 independent variables (features) and one dependent variable. The independent variables cover a wide range of cost overrun factors, from materials and labour to contractual issues and the scope of the work. The dependent variable is the frequency of experiencing project cost overrun (rare or often). The dataset size is 139; 65 instances belong to the rare class, and the remaining 74 belong to the often class. We converted each categorical variable with a textual or string outcome into an appropriate numerical value range to prepare the dataset for machine learning analysis. For example, we used 1 and 2 to represent the rare and often classes, respectively. The correlation matrix among the 44 features is presented in Supplementary Fig 2 .
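
A hedged sketch of this preparation step is shown below; the CSV file name and the dependent-variable column name are assumptions for illustration and may differ from the actual Kaggle repository.

```python
# Hypothetical sketch: load the survey data and encode the dependent variable.
import pandas as pd

df = pd.read_csv("survey_on_road_construction_delay.csv")  # assumed file name

# Encode the dependent variable: 1 for the rare class, 2 for the often class.
# The column name "frequency" is an assumption.
df["frequency"] = df["frequency"].map({"rare": 1, "often": 2})

X = df.drop(columns=["frequency"])
y = df["frequency"]
print(X.shape, y.value_counts().to_dict())
```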

Machine learning algorithms

This study considered six machine learning algorithms to explore the causes of project cost overrun using the research dataset mentioned above: support vector machine, logistic regression, k -nearest neighbours and random forest, together with an artificial neural network and a stacking ensemble, each described below.

Support vector machine (SVM) is a supervised learning approach for classifying data. For instance, if one wants to determine which projects are likely to be programmatically successful based on precedent data, SVM provides a practical approach for prediction. SVM functions by assigning labels to objects 56 . The comparison attributes are used to separate these objects into different groups or classes by maximising their marginal distances and minimising the classification errors. The attributes are plotted multi-dimensionally, allowing a separation line, known as a hyperplane (see Supplementary Fig 3 (a)), to distinguish between the underlying classes or groups 52 . Support vectors are the data points that lie closest to the decision boundary on both sides. In Supplementary Fig 3 (a), they are the circles (both transparent and shaded) close to the hyperplane. Support vectors play an essential role in deciding the position and orientation of the hyperplane. Various computational methods, including kernel functions that create more derived attributes, are applied to accommodate this process 56 . Support vector machines are not limited to binary classes but can also be generalised to a larger variety of classifications; this is accomplished through the training of separate SVMs 56 .

Logistic regression (LR) builds on the linear regression model and predicts the outcome of a dichotomous variable 57 , for example, the presence or absence of an event. It can be visualised with a scatterplot showing the connection between the dependent variable and one or more independent variables (see Supplementary Fig 3 (b)). The LR model fits the data to a sigmoidal (logistic) curve instead of fitting it to a straight line, with the natural logarithm of the odds modelled as a linear function of the predictors. It provides a value between 0 and 1 that is interpreted as the probability of class membership. Best estimates are determined iteratively, refining approximate estimates until a level of stability is reached 58 . Generally, LR offers a straightforward approach for determining and observing interrelationships, and it is more efficient than ordinary regression for dichotomous outcomes 59 .

The k -nearest neighbours (KNN) algorithm plots prior information and applies a specific sample size ( k ) to the plot to determine the most likely class 52 . This method finds the nearest training examples using a distance measure, and the final classification is made by counting the most common class (votes) among the specified sample. As illustrated in Supplementary Fig 3 (c), the four nearest neighbours within the small circle are three grey squares and one white square; the majority class is grey, so KNN will predict the instance (i.e., Χ ) as grey. If we instead look at the larger circle of the same figure, the nearest neighbours consist of ten white squares and four grey squares; the majority class is white, so KNN will classify the instance as white. KNN’s advantages lie in its ability to produce a simplified result and to handle missing data 60 . In summary, KNN utilises similarities (as well as differences) and distances when developing models.

Random forest (RF) is a machine learning method that consists of many decision trees. A decision tree is a tree-like structure where each internal node represents a test on an input attribute; it may have multiple internal nodes at different levels, and the leaf or terminal nodes represent the decision outcomes. Each tree produces an outcome for the given input vector, and the forest aggregates these outcomes: for numerical outcomes it considers the average value, and for discrete (categorical) outcomes it considers the number of votes 52 . Supplementary Fig 3 (d) shows three decision trees to illustrate the function of a random forest. The outcomes from trees 1, 2 and 3 are class B, class A and class A, respectively; according to the majority vote, the final prediction will be class A. Because it considers specific attributes at each split, RF can have a tendency to emphasise some attributes over others, which may result in attributes being unevenly weighted 52 . Advantages of the random forest include its ability to handle multidimensionality and multicollinearity in data, despite its sensitivity to sampling design.

Artificial neural network (ANN) simulates the way in which human brains work. This is accomplished by modelling logical propositions and incorporating weighted inputs, a transfer function and an output 61 (Supplementary Fig 3 (e)). It is advantageous because it can model non-linear relationships and handle multivariate data 62 . ANN learns through three major avenues: error back-propagation (supervised), the Kohonen network (unsupervised) and the counter-propagation ANN (supervised) 62 . ANNs can therefore be trained in either a supervised or an unsupervised manner. ANN has been used in a myriad of applications ranging from pharmaceuticals 61 to electronic devices 63 . It also possesses great levels of fault tolerance 64 and learns by example and through self-organisation 65 .

Ensemble techniques are a type of machine learning methodology in which numerous base classifiers are combined to generate an optimal model 66 . The combined model compensates for the weaknesses of each individual learner, resulting in a more powerful model with improved performance. The stacking model is a general architecture comprising two classifier levels: base classifiers and a meta-learner 67 . The base classifiers are trained with the training dataset, and a new dataset is constructed for the meta-learner; this new dataset is then used to train the meta-classifier. This study uses four models (SVM, LR, KNN and RF) as base classifiers and LR as the meta-learner, as illustrated in Supplementary Fig 3 (f).
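
A minimal sketch of this stacking design with scikit-learn is given below: SVM, LR, KNN and RF as base classifiers and LR as the meta-learner. The hyperparameters are library defaults and the data are synthetic, not the tuned values or dataset from the study.

```python
# Stacking ensemble: four base classifiers with a logistic regression meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

base_classifiers = [
    ("svm", SVC()),
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(random_state=0)),
]
stack = StackingClassifier(estimators=base_classifiers,
                           final_estimator=LogisticRegression(max_iter=1000))
print("stacked accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```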

Feature selection

Feature selection is the process of selecting the optimal subset of features that significantly influence the predicted outcomes; it can increase model performance and save running time. This study considers three different feature selection approaches: Univariate feature selection (UFS), Recursive feature elimination (RFE) and the SelectFromModel (SFM) approach. UFS examines each feature separately to determine the strength of its relationship with the response variable 68 . This method is straightforward to use and comprehend and helps acquire a deeper understanding of data. In this study, we calculate the chi-square statistic between each feature and the response variable. RFE is a type of backwards feature elimination in which the model is first fit using all features in the given dataset and the least important features are then removed one by one 69 . The model is refit at each step until the desired number of features, determined by a parameter, is left over. SFM chooses effective features based on the feature importance of the best-performing model 70 . This approach selects features by establishing a threshold based on the feature importances indicated by the model on the training set: features whose importance exceeds the threshold are retained, while the rest are discarded. In this study, we apply SFM after comparing the performance of the machine learning methods, and then train the best-performing model again using the features selected by SFM.
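
The sketch below shows one way to apply the three approaches with scikit-learn on synthetic data; the feature counts, default thresholds and the use of a random forest as the underlying estimator are illustrative assumptions.

```python
# Hedged sketch of UFS (chi-square), RFE and SelectFromModel.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, chi2

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X = np.abs(X)  # the chi-square test requires non-negative feature values

# Univariate feature selection with the chi-square statistic.
ufs = SelectKBest(chi2, k=10).fit(X, y)

# Recursive feature elimination: drop the least important features one by one.
rfe = RFE(RandomForestClassifier(random_state=0), n_features_to_select=10).fit(X, y)

# SelectFromModel: keep features whose importance exceeds a threshold.
sfm = SelectFromModel(RandomForestClassifier(random_state=0)).fit(X, y)

for name, selector in [("UFS", ufs), ("RFE", rfe), ("SFM", sfm)]:
    print(name, "selected feature indices:", np.where(selector.get_support())[0])
```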

Findings from the case study

We split the dataset 70:30 for training and testing of the selected machine learning algorithms. We used Python’s Scikit-learn package for implementing these algorithms 70 . Using the training data, we first developed six models based on the six algorithms described above. We used fivefold cross-validation with accuracy as the target measure. Then, we applied these models to the test data. We also performed all required hyperparameter tuning for each algorithm to obtain the best possible classification outcome. Table 1 shows the performance outcomes for each algorithm during the training and test phases. The hyperparameter settings for each algorithm are listed in Supplementary Table 1 .
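
A compressed sketch of this pipeline is shown below: a 70:30 split, a small grid search with fivefold cross-validation per algorithm, and evaluation on the held-out test set. The synthetic data and the parameter grids are assumptions, not the settings of Supplementary Table 1.

```python
# Hedged sketch of the training and testing pipeline for several algorithms.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for the 139-instance survey dataset with 44 features.
X, y = make_classification(n_samples=139, n_features=44, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

algorithms = {
    "SVM": (SVC(), {"C": [0.1, 1, 10]}),
    "LR": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "RF": (RandomForestClassifier(random_state=0), {"n_estimators": [100, 200]}),
}
for name, (model, grid) in algorithms.items():
    search = GridSearchCV(model, grid, cv=5).fit(X_train, y_train)
    print(name, "train CV accuracy:", round(search.best_score_, 4),
          "test accuracy:", round(search.score(X_test, y_test), 4))
```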

As revealed in Table 1 , random forest outperformed the other algorithms in terms of accuracy for both the training and test phases, with an accuracy of 78.14% and 77.50% for the training and test phases, respectively. The second-best performer in the training phase is k -nearest neighbours (76.98%); for the test phase, the support vector machine, k -nearest neighbours and artificial neural network tie for second best (72.50% each).

Since random forest showed the best performance, we explored it further. We applied the three approaches (UFS, RFE and SFM) for feature optimisation on the random forest. The result is presented in Table 2 . SFM shows the best outcome among these three approaches: its accuracy is 85.00%, whereas the accuracies of UFS and RFE are 77.50% and 72.50%, respectively. As can be seen in Table 2 , the accuracy for the testing phase increases from 77.50% in Table 1 (b) to 85.00% with the SFM feature optimisation. Table 3 shows the 19 selected features from the SFM output. Out of 44 features, SFM found that 19 play a significant role in predicting the outcomes.

Further, Fig.  4 illustrates the confusion matrix when the random forest model with the SFM feature optimiser was applied to the test data. There are 18 true-positive, five false-negative, one false-positive and 16 true-negative cases. Therefore, the accuracy for the test phase is (18 + 16)/(18 + 5 + 1 + 16) = 85.00%.

Figure 4. Confusion matrix results based on the random forest model with the SFM feature optimiser (1 for the rare class and 2 for the often class).

Figure  5 illustrates the top-10 most important features or variables based on the random forest algorithm with the SFM optimiser. We used feature importance based on the mean decrease in impurity to identify this list of important variables. Mean decrease in impurity computes each feature’s importance as the sum over the number of splits that include the feature, in proportion to the number of samples it splits 71 . According to this figure, the delays in decision making attribute contributed most to the classification performance of the random forest algorithm, followed by the cash flow problem and construction cost underestimation attributes. The current construction project literature also highlights these top-10 factors as significant contributors to project cost overrun. For example, using construction project data from Jordan, Al-Hazim et al. 72 ranked 20 causes of cost overrun, including causes similar to those identified here.
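
Extracting and ranking such importances from a fitted random forest can be done as in the sketch below; the feature names are hypothetical placeholders for the survey attributes.

```python
# Rank features by mean decrease in impurity from a fitted random forest.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=139, n_features=19, random_state=0)
feature_names = [f"factor_{i}" for i in range(X.shape[1])]  # hypothetical names

rf = RandomForestClassifier(random_state=0).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(10))
```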

Figure 5. Feature importance (top-10 out of 19) based on the random forest model with the SFM feature optimiser.

Further, we conducted a sensitivity analysis of the model’s ten most important features (from Fig.  5 ) to explore how a change in each feature affects cost overrun. We utilised the partial dependence plot (PDP), a typical visualisation tool for non-parametric models 73 , to display the outcomes of this analysis. A PDP can demonstrate whether the relation between the target and a feature is linear, monotonic, or more complicated. The result of the sensitivity analysis is presented in Fig.  6 . For the ‘delays in decision making’ attribute, the PDP shows that the probability stays below 0.4 until the rating value reaches three and increases thereafter. A higher value for this attribute indicates a higher risk of cost overrun. In contrast, no significant differences can be seen for the remaining nine features as their values change.
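
A partial dependence plot for a single feature can be produced with scikit-learn's inspection module, as in the hedged sketch below; the fitted model and the feature index are placeholders.

```python
# Partial dependence of the predicted outcome on one feature.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay

X, y = make_classification(n_samples=139, n_features=19, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# Average the model's predictions over the data while varying feature 0.
PartialDependenceDisplay.from_estimator(rf, X, features=[0])
plt.show()
```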

Figure 6. The result of the sensitivity analysis from the partial dependency plot tool for the ten most important features.

Summary of the case study

We illustrated an application of the proposed machine learning-based research framework in classifying construction projects. RF showed the highest accuracy in predicting the test dataset. For a new data instance with values for the 19 selected features but no information on its classification, RF can identify its class ( rare or often ) correctly with a probability of 85.00%. If more data are provided to the machine learning algorithms, in addition to the 139 instances of the case study, their accuracy and efficiency in making project classifications will improve with subsequent training. For example, if we provide 100 more data instances, these algorithms will have an additional 70 instances for training with a 70:30 split. This capacity for continuous improvement puts machine learning algorithms in a superior position over traditional methods. In the current literature, some studies explore the factors contributing to project delay or cost overrun; in most cases, they applied factor analysis or other related statistical methods for research data analysis 72 , 74 , 75 . In addition to identifying important attributes, the proposed machine learning-based framework identified the ranking of factors and showed how eliminating less important factors affects the prediction accuracy when applied to this case study.

We shared the Python software developed to implement the machine learning algorithms considered in this case study on GitHub 76 , a software hosting website. A user-friendly version of this software can be accessed at https://share.streamlit.io/haohuilu/pa/main/app.py . The accuracy findings from this link could be slightly different from one run to another due to the hyperparameter settings of the corresponding machine learning algorithms.

Due to their robust prediction ability, machine learning methods have already gained wide acceptability across a broad range of research domains. On the other hand, EVM is the most commonly used method in project analytics due to its simplicity and ease of interpretability 77 . Substantial research efforts have been made to improve its generalisability over time. For example, Naeni et al. 34 developed a fuzzy approach for earned value analysis to make it suitable for analysing project scenarios with ambiguous or linguistic outcomes. Acebes 78 integrated Monte Carlo simulation with EVM for project monitoring and control for a similar purpose. Another prominent method frequently used in project analytics is time series analysis, which is compelling for the longitudinal prediction of project time and cost 30 . However, as evident in the current literature, not much effort has been made to bring machine learning into project analytics for addressing project management research problems. This research makes a significant attempt to contribute to filling this gap.

Our proposed data-driven framework includes only the fundamental model development and application components for machine learning algorithms. It does not include some advanced-level machine learning methods; this study intentionally did not consider them for the proposed model since they are required only in particular designs of machine learning analysis. For example, the framework does not contain any methods or tools to handle the data imbalance issue. Data imbalance refers to a situation where the research dataset has an uneven distribution of the target class 79 . For example, a binary target variable will cause a data imbalance issue if one of its class labels has a very high number of observations compared with the other class. Commonly used techniques to address this issue are undersampling and oversampling. The undersampling technique decreases the size of the majority class, whereas the oversampling technique randomly duplicates the minority class until the class distribution becomes balanced 79 . The class distribution of the case study did not produce any data imbalance issues.
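
For contexts where imbalance does arise, the two resampling strategies can be sketched with the imbalanced-learn package as below; this package is assumed here only for illustration and is not part of the study's framework.

```python
# Hedged sketch of random oversampling and undersampling with imbalanced-learn.
from collections import Counter
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
print("original class counts:", Counter(y))

X_over, y_over = RandomOverSampler(random_state=0).fit_resample(X, y)
print("after oversampling:", Counter(y_over))

X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("after undersampling:", Counter(y_under))
```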

This study considered only six fundamental machine learning algorithms for the case study, although many other such algorithms are available in the literature. For example, it did not consider the extreme gradient boosting (XGBoost) algorithm. XGBoost is based on the decision tree algorithm, similar to the random forest algorithm 80 , and has become dominant in applied machine learning due to its performance and speed. Naïve Bayes and convolutional neural networks are other popular machine learning algorithms that were not considered when applying the proposed framework to the case study. In addition to the three feature selection methods, multi-view learning can be adopted when applying the proposed framework. Multi-view learning is another direction in machine learning that considers learning with multiple views of the existing data with the aim of improving predictive performance 81 , 82 . Similarly, although we considered five performance measures, there are other potential candidates; one such example is the area under the receiver operating characteristic curve, which measures the ability of the underlying classifier to distinguish between classes 48 . We leave these as potential application scope when applying our proposed framework in other project contexts in future studies.

Although this study used only one case study for illustration, our proposed research framework can be used in other project analytics contexts. In such an application context, the underlying research goal should be to predict the outcome classes and find the attributes playing a significant role in making correct predictions. For example, by considering two types of projects based on the time required to complete them (e.g., on-time and delayed ), the proposed framework can develop machine learning models that predict the class of a new data instance and find the attributes contributing most to this prediction performance. This framework can also be used at any stage of a project. For example, the framework’s results allow project stakeholders to screen projects for excessive cost overruns and forecast budget loss at bidding and before contracts are signed. In addition, various factors that contribute to project cost overruns can be identified at an earlier stage; these elements emerge at each stage of a project’s life cycle. The framework’s feature importance helps project managers locate the critical contributors to cost overrun.

This study has made an important contribution to the current project analytics literature by considering the applications of machine learning within project management. Project management is often thought of as being very fluid in nature, and because of this, applications of machine learning are often more difficult. Further, existing implementations have largely been limited to safety monitoring, risk prediction and cost estimation. Through the evaluation of machine learning applications, this study further demonstrates how algorithms can be used to model the relationship between project attributes and cost overrun frequency.

The applications of machine learning in project analytics are still undergoing constant development. Within construction projects, its applications have been largely limited and focused on profitability or the design of structures themselves. In this regard, our study made a substantial effort by proposing a machine learning-based framework to address research problems related to project analytics. We also illustrated an example of this framework’s application in the context of construction project management.

Like any other research, this study has a few limitations that could provide scope for future research. First, the framework does not include a few advanced machine learning techniques, such as methods for handling data imbalance and kernel density estimation. Second, we considered only one case study to illustrate the application of the proposed framework. Illustrations of this framework using case studies from different project contexts would confirm its robust application. Finally, this study did not consider all machine learning models and performance measures available in the literature for the case study. For example, we did not consider the Naïve Bayes model and the precision measure in applying the proposed research framework to the case study.

Data availability

This study obtained research data from publicly available online repositories. We mentioned their sources using proper citations. Here is the link to the data https://www.kaggle.com/datasets/amansaxena/survey-on-road-construction-delay .

Venkrbec, V. & Klanšek, U. In: Advances and Trends in Engineering Sciences and Technologies II 685–690 (CRC Press, 2016).


Damnjanovic, I. & Reinschmidt, K. Data Analytics for Engineering and Construction Project Risk Management (Springer, 2020).


Singh, H. Project Management Analytics: A Data-driven Approach to Making Rational and Effective Project Decisions (FT Press, 2015).

Frame, J. D. & Chen, Y. Why Data Analytics in Project Management? (Auerbach Publications, 2018).

Ong, S. & Uddin, S. Data Science and Artificial Intelligence in Project Management: The Past, Present and Future. J. Mod. Proj. Manag. 7 , 26–33 (2020).

Bilal, M. et al. Investigating profitability performance of construction projects using big data: A project analytics approach. J. Build. Eng. 26 , 100850 (2019).


Radziszewska-Zielina, E. & Sroka, B. Planning repetitive construction projects considering technological constraints. Open Eng. 8 , 500–505 (2018).

Neely, A. D., Adams, C. & Kennerley, M. The Performance Prism: The Scorecard for Measuring and Managing Business Success (Prentice Hall Financial Times, 2002).

Kanakaris, N., Karacapilidis, N., Kournetas, G. & Lazanas, A. In: International Conference on Operations Research and Enterprise Systems. 135–155 Springer.

Jordan, M. I. & Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science 349 , 255–260 (2015).


Shalev-Shwartz, S. & Ben-David, S. Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, 2014).


Rahimian, F. P., Seyedzadeh, S., Oliver, S., Rodriguez, S. & Dawood, N. On-demand monitoring of construction projects through a game-like hybrid application of BIM and machine learning. Autom. Constr. 110 , 103012 (2020).

Sanni-Anibire, M. O., Zin, R. M. & Olatunji, S. O. Machine learning model for delay risk assessment in tall building projects. Int. J. Constr. Manag. 22 , 1–10 (2020).

Cong, J. et al. A machine learning-based iterative design approach to automate user satisfaction degree prediction in smart product-service system. Comput. Ind. Eng. 165 , 107939 (2022).

Li, F., Chen, C.-H., Lee, C.-H. & Feng, S. Artificial intelligence-enabled non-intrusive vigilance assessment approach to reducing traffic controller’s human errors. Knowl. Based Syst. 239 , 108047 (2021).

Mohri, M., Rostamizadeh, A. & Talwalkar, A. Foundations of Machine Learning (MIT press, 2018).


Whyte, J., Stasis, A. & Lindkvist, C. Managing change in the delivery of complex projects: Configuration management, asset information and ‘big data’. Int. J. Proj. Manag. 34 , 339–351 (2016).

Zangeneh, P. & McCabe, B. Ontology-based knowledge representation for industrial megaprojects analytics using linked data and the semantic web. Adv. Eng. Inform. 46 , 101164 (2020).

Akinosho, T. D. et al. Deep learning in the construction industry: A review of present status and future innovations. J. Build. Eng. 32 , 101827 (2020).

Soman, R. K., Molina-Solana, M. & Whyte, J. K. Linked-Data based constraint-checking (LDCC) to support look-ahead planning in construction. Autom. Constr. 120 , 103369 (2020).

Soman, R. K. & Whyte, J. K. Codification challenges for data science in construction. J. Constr. Eng. Manag. 146 , 04020072 (2020).

Soman, R. K. & Molina-Solana, M. Automating look-ahead schedule generation for construction using linked-data based constraint checking and reinforcement learning. Autom. Constr. 134 , 104069 (2022).

Shi, F., Soman, R. K., Han, J. & Whyte, J. K. Addressing adjacency constraints in rectangular floor plans using Monte-Carlo tree search. Autom. Constr. 115 , 103187 (2020).

Chen, L. & Whyte, J. Understanding design change propagation in complex engineering systems using a digital twin and design structure matrix. Eng. Constr. Archit. Manag. (2021).

Allison, J. T. et al. Artificial intelligence and engineering design. J. Mech. Des. 144 , 020301 (2022).

Dutta, D. & Bose, I. Managing a big data project: The case of ramco cements limited. Int. J. Prod. Econ. 165 , 293–306 (2015).

Bilal, M. & Oyedele, L. O. Guidelines for applied machine learning in construction industry—A case of profit margins estimation. Adv. Eng. Inform. 43 , 101013 (2020).

Tayefeh Hashemi, S., Ebadati, O. M. & Kaur, H. Cost estimation and prediction in construction projects: A systematic review on machine learning techniques. SN Appl. Sci. 2 , 1–27 (2020).

Arage, S. S. & Dharwadkar, N. V. In: International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC). 594–599 (IEEE, 2017).

Cheng, C.-H., Chang, J.-R. & Yeh, C.-A. Entropy-based and trapezoid fuzzification-based fuzzy time series approaches for forecasting IT project cost. Technol. Forecast. Soc. Chang. 73 , 524–542 (2006).

Joukar, A. & Nahmens, I. Volatility forecast of construction cost index using general autoregressive conditional heteroskedastic method. J. Constr. Eng. Manag. 142 , 04015051 (2016).

Xu, J.-W. & Moon, S. Stochastic forecast of construction cost index using a cointegrated vector autoregression model. J. Manag. Eng. 29 , 10–18 (2013).

Narbaev, T. & De Marco, A. Combination of growth model and earned schedule to forecast project cost at completion. J. Constr. Eng. Manag. 140 , 04013038 (2014).

Naeni, L. M., Shadrokh, S. & Salehipour, A. A fuzzy approach for the earned value management. Int. J. Proj. Manag. 29 , 764–772 (2011).

Ponz-Tienda, J. L., Pellicer, E. & Yepes, V. Complete fuzzy scheduling and fuzzy earned value management in construction projects. J. Zhejiang Univ. Sci. A 13 , 56–68 (2012).

Yu, F., Chen, X., Cory, C. A., Yang, Z. & Hu, Y. An active construction dynamic schedule management model: Using the fuzzy earned value management and BP neural network. KSCE J. Civ. Eng. 25 , 2335–2349 (2021).

Bonato, F. K., Albuquerque, A. A. & Paixão, M. A. S. An application of earned value management (EVM) with Monte Carlo simulation in engineering project management. Gest. Produção 26 , e4641 (2019).

Batselier, J. & Vanhoucke, M. Empirical evaluation of earned value management forecasting accuracy for time and cost. J. Constr. Eng. Manag. 141 , 05015010 (2015).

Yang, R. J. & Zou, P. X. Stakeholder-associated risks and their interactions in complex green building projects: A social network model. Build. Environ. 73 , 208–222 (2014).

Uddin, S. Social network analysis in project management–A case study of analysing stakeholder networks. J. Mod. Proj. Manag. 5 , 106–113 (2017).

Ong, S. & Uddin, S. Co-evolution of project stakeholder networks. J. Mod. Proj. Manag. 8 , 96–115 (2020).

Khanzode, K. C. A. & Sarode, R. D. Advantages and disadvantages of artificial intelligence and machine learning: A literature review. Int. J. Libr. Inf. Sci. (IJLIS) 9 , 30–36 (2020).

Loyola-Gonzalez, O. Black-box vs. white-box: Understanding their advantages and weaknesses from a practical point of view. IEEE Access 7 , 154096–154113 (2019).

Abioye, S. O. et al. Artificial intelligence in the construction industry: A review of present status, opportunities and future challenges. J. Build. Eng. 44 , 103299 (2021).

Doloi, H., Sawhney, A., Iyer, K. & Rentala, S. Analysing factors affecting delays in Indian construction projects. Int. J. Proj. Manag. 30 , 479–489 (2012).

Alkhaddar, R., Wooder, T., Sertyesilisik, B. & Tunstall, A. Deep learning approach’s effectiveness on sustainability improvement in the UK construction industry. Manag. Environ. Qual. Int. J. 23 , 126–139 (2012).

Gondia, A., Siam, A., El-Dakhakhni, W. & Nassar, A. H. Machine learning algorithms for construction projects delay risk prediction. J. Constr. Eng. Manag. 146 , 04019085 (2020).

Witten, I. H. & Frank, E. Data Mining: Practical Machine Learning Tools and Techniques (Morgan Kaufmann, 2005).

Kanakaris, N., Karacapilidis, N. I. & Lazanas, A. In: ICORES. 362–369.

Heo, S., Han, S., Shin, Y. & Na, S. Challenges of data refining process during the artificial intelligence development projects in the architecture engineering and construction industry. Appl. Sci. 11 , 10919 (2021).


Bross, I. D. How to use ridit analysis. Biometrics 14 , 18–38 (1958).

Uddin, S., Khan, A., Hossain, M. E. & Moni, M. A. Comparing different supervised machine learning algorithms for disease prediction. BMC Med. Inform. Decis. Mak. 19 , 1–16 (2019).

LaValle, S. M., Branicky, M. S. & Lindemann, S. R. On the relationship between classical grid search and probabilistic roadmaps. Int. J. Robot. Res. 23 , 673–692 (2004).

Abdi, H. & Williams, L. J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2 , 433–459 (2010).

Saxena, A. Survey on Road Construction Delay , https://www.kaggle.com/amansaxena/survey-on-road-construction-delay (2021).

Noble, W. S. What is a support vector machine?. Nat. Biotechnol. 24 , 1565–1567 (2006).


Hosmer, D. W. Jr., Lemeshow, S. & Sturdivant, R. X. Applied Logistic Regression Vol. 398 (John Wiley & Sons, 2013).

LaValley, M. P. Logistic regression. Circulation 117 , 2395–2399 (2008).


Menard, S. Applied Logistic Regression Analysis Vol. 106 (Sage, 2002).

Batista, G. E. & Monard, M. C. A study of K-nearest neighbour as an imputation method. His 87 , 48 (2002).

Agatonovic-Kustrin, S. & Beresford, R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J. Pharm. Biomed. Anal. 22 , 717–727 (2000).

Zupan, J. Introduction to artificial neural network (ANN) methods: What they are and how to use them. Acta Chim. Slov. 41 , 327–327 (1994).


Hopfield, J. J. Artificial neural networks. IEEE Circuits Devices Mag. 4 , 3–10 (1988).

Zou, J., Han, Y. & So, S.-S. Overview of artificial neural networks. Artificial Neural Networks . 14–22 (2008).

Maind, S. B. & Wankar, P. Research paper on basic of artificial neural network. Int. J. Recent Innov. Trends Comput. Commun. 2 , 96–100 (2014).

Wolpert, D. H. Stacked generalization. Neural Netw. 5 , 241–259 (1992).

Pavlyshenko, B. In: IEEE Second International Conference on Data Stream Mining & Processing (DSMP). 255–258 (IEEE).

Jović, A., Brkić, K. & Bogunović, N. In: 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). 1200–1205 (IEEE, 2015).

Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 46 , 389–422 (2002).


Pedregosa, F. et al. Scikit-learn: Machine learning in python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).


Louppe, G., Wehenkel, L., Sutera, A. & Geurts, P. Understanding variable importances in forests of randomized trees. Adv. Neural. Inf. Process. Syst. 26 , 431–439 (2013).

Al-Hazim, N., Salem, Z. A. & Ahmad, H. Delay and cost overrun in infrastructure projects in Jordan. Procedia Eng. 182 , 18–24 (2017).

Breiman, L. Random forests. Mach. Learn. 45 , 5–32. https://doi.org/10.1023/A:1010933404324 (2001).

Shehu, Z., Endut, I. R. & Akintoye, A. Factors contributing to project time and hence cost overrun in the Malaysian construction industry. J. Financ. Manag. Prop. Constr. 19 , 55–75 (2014).

Akomah, B. B. & Jackson, E. N. Contractors’ perception of factors contributing to road project delay. Int. J. Constr. Eng. Manag. 5 , 79–85 (2016).

GitHub: Where the world builds software , https://github.com/ .

Anbari, F. T. Earned value project management method and extensions. Proj. Manag. J. 34 , 12–23 (2003).

Acebes, F., Pereda, M., Poza, D., Pajares, J. & Galán, J. M. Stochastic earned value analysis using Monte Carlo simulation and statistical learning techniques. Int. J. Proj. Manag. 33 , 1597–1609 (2015).

Japkowicz, N. & Stephen, S. The class imbalance problem: A systematic study. Intell. data anal. 6 , 429–449 (2002).

Chen, T. et al. Xgboost: extreme gradient boosting. R Packag. Version 0.4–2.1 1 , 1–4 (2015).

Guarino, A., Lettieri, N., Malandrino, D., Zaccagnino, R. & Capo, C. Adam or Eve? Automatic users’ gender classification via gestures analysis on touch devices. Neural Comput. Appl. 1–23 (2022).

Zaccagnino, R., Capo, C., Guarino, A., Lettieri, N. & Malandrino, D. Techno-regulation and intelligent safeguards. Multimed. Tools Appl. 80 , 15803–15824 (2021).


Acknowledgements

The authors acknowledge the insightful comments from Prof Jennifer Whyte on an earlier version of this article.

Author information

Authors and Affiliations

School of Project Management, The University of Sydney, Level 2, 21 Ross St, Forest Lodge, NSW, 2037, Australia

Shahadat Uddin, Stephen Ong & Haohui Lu


Contributions

S.U.: Conceptualisation; Data curation; Formal analysis; Methodology; Supervision; and Writing (original draft, review and editing). S.O.: Data curation; and Writing (original draft, review and editing). H.L.: Methodology; and Writing (original draft, review and editing). All authors reviewed the manuscript.

Corresponding author

Correspondence to Shahadat Uddin .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Uddin, S., Ong, S. & Lu, H. Machine learning in project analytics: a data-driven framework and case study. Sci Rep 12 , 15252 (2022). https://doi.org/10.1038/s41598-022-19728-x


Received : 13 April 2022

Accepted : 02 September 2022

Published : 09 September 2022

DOI : https://doi.org/10.1038/s41598-022-19728-x


This article is cited by

Evaluation and prediction of time overruns in Jordanian construction projects using coral reefs optimization and deep learning methods

  • Jumana Shihadeh
  • Ghyda Al-Shaibie
  • Hamza Al-Bdour

Asian Journal of Civil Engineering (2024)

A robust, resilience machine learning with risk approach: a case study of gas consumption

  • Mehdi Changizi
  • Sadia Samar Ali

Annals of Operations Research (2024)

Unsupervised machine learning for disease prediction: a comparative performance analysis using multiple datasets

  • Shahadat Uddin

Health and Technology (2024)

Prediction of SMEs’ R&D performances by machine learning for project selection

  • Hyoung Sun Yoo
  • Ye Lim Jung
  • Seung-Pyo Jun

Scientific Reports (2023)

A robust and resilience machine learning for forecasting agri-food production

  • Amin Gholamrezaei
  • Kiana Kheiri

Scientific Reports (2022)




Software Engineering for Machine Learning: A Case Study



Machine Learning Foundations: A Case Study Approach


About this course

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies. At the end of the first course you will have studied how to predict house prices based on house-level features, analyze sentiment from user reviews, retrieve documents of interest, recommend products, and search for images. Through hands-on practice with these use cases, you will be able to apply machine learning methods in a wide range of domains. This first course treats the machine learning method as a black box. Using this abstraction, you will focus on understanding tasks of interest, matching these tasks to machine learning tools, and assessing the quality of the output. In subsequent courses, you will delve into the components of this black box by examining models and algorithms. Together, these pieces form the machine learning pipeline, which you will use in developing intelligent applications.

Learning Outcomes: By the end of this course, you will be able to:

  • Identify potential applications of machine learning in practice.
  • Describe the core differences in analyses enabled by regression, classification, and clustering.
  • Select the appropriate machine learning task for a potential application.
  • Apply regression, classification, clustering, retrieval, recommender systems, and deep learning.
  • Represent your data as features to serve as input to machine learning models.
  • Assess the model quality in terms of relevant error metrics for each task.
  • Utilize a dataset to fit a model to analyze new data.
  • Build an end-to-end application that uses machine learning at its core.
  • Implement these techniques in Python.


Other Courses in this Specialization

Machine Learning: Classification


Machine Learning: Regression


Machine Learning: Clustering & Retrieval



Computer and Data Sciences

Artificial Intelligence and Machine Learning


Artificial Intelligence is poised to revolutionize our world, our societies and our lives through myriad applications, from healthcare and transportation to data science and cybersecurity. At the same time, fundamental scientific questions remain to be answered about what makes up an intelligent system and how to realize intelligent behavior through efficient computation.

Researchers at Case Western Reserve University are working to answer the fundamental questions behind AI as well as apply AI and machine learning methods to applications. To better understand these  questions, we study how our brains encode sensory information, how people should interact with AI assistants, how to build AI systems that can automatically learn to decompose problems and communicate in natural language, and how to build secure AI systems. 

Among applications, we apply AI and machine learning methods to healthcare, bioinformatics, software engineering, computer networks, cybersecurity, cognitive science and more. 

Faculty who conduct research in Artificial Intelligence and Machine Learning


M. Cenk Cavusoglu

Develops next-generation medical robotic systems for surgery and image-guided interventions


Vipin Chaudhary

High Performance Computing and Applications to Science, Engineering, Biology, and Medicine; Artificial Intelligence/Machine Learning/Data Science; Computer Assisted Diagnosis and Interventions; Medical Image Processing; Computer Architecture; Quantum Computing.


Harold Connamacher

Applies theoretical computer science techniques to discover problem structures and improve algorithm performance


Mehmet Koyutürk

Develops algorithms for transforming "big" biological data into systems biology knowledge


Sanmukh Kuppannagari

AI/ML Acceleration on Heterogeneous Platforms; Parallel Computing; Reconfigurable Computing; Combinatorial Optimization


Michael Lewicki

Develops theoretical models of computation and representation in sensory coding and perception


Develops computational approaches and software tools for genomics, bioinformatics and systems biology, and creates computational solutions for big data analytics


Develops and analyzes algorithms for intelligent adaptive systems


Devise, design, and develop novel data management and analysis techniques and tools to support data and knowledge management and exploration.


Quantum Machine Learning: A Review and Case Studies

Amine Zeguendry

1 Laboratoire d’Ingénierie des Systèmes d’Information, Faculty of Sciences, Cadi Ayyad University, Marrakech 40000, Morocco

Mohamed Quafafou

2 Laboratoire des Sciences de l’Information et des Systèmes, Unité Mixte de Recherche 7296, Aix-Marseille University, 13007 Marseille, France

Associated Data

The notebooks used during this study can be provided after contacting the authors.

Despite its undeniable success, classical machine learning remains a resource-intensive process. The practical computational effort for training state-of-the-art models can now only be handled by high-speed computer hardware. As this trend is expected to continue, it should come as no surprise that an increasing number of machine learning researchers are investigating the possible advantages of quantum computing. The scientific literature on Quantum Machine Learning is now enormous, and a review of its current state that can be comprehended without a physics background is necessary. The objective of this study is to present a review of Quantum Machine Learning from the perspective of conventional techniques. Starting with a research path that leads from fundamental quantum theory to Quantum Machine Learning algorithms from a computer scientist’s perspective, we discuss a set of basic algorithms for Quantum Machine Learning, which are the fundamental building blocks of Quantum Machine Learning algorithms. We implement Quanvolutional Neural Networks (QNNs) on a quantum computer to recognize handwritten digits, and compare their performance to that of their classical counterpart, Convolutional Neural Networks (CNNs). Additionally, we implement the QSVM on the breast cancer dataset and compare it to the classical SVM. Finally, we implement the Variational Quantum Classifier (VQC) and many classical classifiers on the Iris dataset to compare their accuracies.

1. Introduction

Machine Learning is a subset of Artificial Intelligence (AI) that aims to create models that learn from previous experience without being explicitly programmed, and it has been used extensively in several scientific and technical fields, including natural language processing, medical diagnostics, computer vision, data mining, and so on. Many machine learning problems require the use of linear algebra to execute matrix operations, since data are described as matrices. Performing these operations on traditional computers, however, requires a significant amount of time and computational resources. Quantum Computing is an ambitious new field that combines computer science, mathematics, and physics. It investigates ways to use some of the special properties of quantum physics to build quantum computers that take advantage of quantum bits (qubits), which can hold combinations of 0 and 1 in superposition at the same time. As a result, quantum computers can handle and process large matrices, as well as accelerate various linear algebra operations, significantly improving traditional machine learning applications. Theoretically, they should solve problems that belong to complexity classes that traditional computers, even giant supercomputers, will never be able to solve. Grover’s algorithm, for instance, has demonstrated a quadratic speedup for exploring unstructured databases [ 1 ], while Shor’s algorithm [ 2 ] illustrates that quantum computing may provide an exponential speedup in solving the traditionally difficult problem of large integer factorization. These algorithms take advantage of key characteristics of quantum computation, including quantum superposition, quantum measurement, and quantum entanglement.

While machine learning is being limited by a lack of computing power, researchers are exploring the prospects of combining quantum computing and machine learning to handle classical data using machine learning algorithms. This combination of machine learning theory with the characteristics of quantum computing is a new research sub-discipline called Quantum Machine Learning (QML). Therefore, QML aims to build quantum applications for diverse machine learning algorithms, using the processing power of quantum computers and the scalability and learning capacity of machine learning algorithms.

Quantum variants of several popular machine learning algorithms have already been developed. A Quantum Neural Network (QNN) was described by Narayanan and Menneer [ 3 ], who presented the theoretical design of a QNN architecture and discussed how the system’s components might perform relative to their traditional counterparts. Quantum Support Vector Machines (QSVM) were proposed by Rebentrost et al. [ 4 ] for solving least-squares SVM using the HHL algorithm [ 5 ] for matrix inversion in order to generate the hyperplane. In 2014, Wiebe et al. [ 6 ] provided a quantum version of k-nearest neighbors that computes the nearest neighbors based on the Euclidean distance between the data locations, together with amplitude estimation, which eliminates the requirement for measurement. In 2018, Dang et al. [ 7 ] also introduced an image classification model based on quantum k-nearest neighbors and parallel computing; their model improved categorization precision and productivity. Schuld et al. [ 8 ] proposed quantum linear regression as a version of classical linear regression that offers an exponential speedup with respect to the number of feature dimensions N when the data are presented as quantum information. The quantum decision tree classifier developed by Lu et al. [ 9 ] employs quantum fidelity measurement and quantum entropy impurity.

Lloyd et al. [ 10 ] described several quantum machine learning techniques for cluster detection and cluster assignment. A quantized version of Lloyd’s method [ 11 ] is provided as part of their k-means clustering algorithm. Furthermore, Kerenidis et al. [ 12 ] have suggested Q-means, a quantized form of the k-means method with results and convergence similar to the traditional k-means algorithm. Aïmeur et al. [ 13 ] introduced quantum k-medians clustering, which uses Grover’s search algorithm to locate the cluster’s median. In 2014, Lloyd et al. [ 14 ] developed Quantum Principal Component Analysis (QPCA), which identifies the eigenvectors corresponding to the large eigenvalues of an unknown state exponentially faster than any known classical approach.

Within machine learning, there is a further field known as Reinforcement Learning, built on continuous learning through exploration of the environment. A few quantum variants of conventional Reinforcement Learning algorithms exist, including the Quantum Reinforcement Learning introduced by Dong et al. [ 15 ], which makes use of quantum parallelism and the superposition principle; they discovered that probability amplitudes and quantum parallelism may improve the learning speed. For solving Dynamic Programming problems, which are deterministic forms of the Markov decision problems studied in reinforcement learning, several quantum algorithms have been suggested in [ 16 ]. McKiernan et al. [ 17 ] developed a general algorithm to improve hybrid quantum-classical computing using reinforcement learning.

Deep Learning is a more recent machine learning sub-discipline. Deep Learning techniques, which demand a significant amount of storage and computation time, are now being implemented on quantum computers. Examples include Quantum Generative Adversarial Networks (Quantum GAN) [ 18 , 19 , 20 ], with an implementation in [ 21 ] that uses a superconducting quantum processor to learn and generate real-world handwritten digit pictures; Quantum Wasserstein Generative Adversarial Networks (Quantum WGAN) [ 22 ], which improve the scalability and stability of training quantum generative adversarial models on quantum machines; Quantum Boltzmann Machines [ 23 , 24 ]; Quantum Autoencoders [ 25 , 26 ]; and Quantum Convolutional Neural Networks [ 27 , 28 , 29 ], the latter of which is implemented in the benchmark section.

Optimization is also important in conventional machine learning, since training machine learning models is, most of the time, based on optimizing cost functions. Numerical optimization techniques form a major field of study that tries to enhance the computations of optimization algorithms. Quantum-enhanced Optimization (QEO) [ 30 ], a sub-field of quantum computing, intends to augment these techniques even further, similarly to conventional optimization. The Quantum Approximate Optimization Algorithm and Quantum Gradient Descent [ 31 , 32 ] are two well-known examples of this type, and they are used within quantum neural networks such as Quantum Boltzmann Machines [ 24 ].

In this paper, we aim to introduce quantum computing to the field of machine learning, from the basics all the way to its applications. The remainder of this work is structured as follows. In Section 2 , we explore the background of quantum computing, the architecture of quantum computers, and an introduction to quantum algorithms. In Section 3 , we introduce several fundamental quantum routines that are the building blocks of QML algorithms and can primarily offer a performance boost over conventional machine learning algorithms, and we discuss a few popular quantum machine learning algorithms. In Section 4 , we implement three machine learning algorithms in order to compare the performance of each algorithm with its classical counterpart. In the final sections, ahead of the conclusions and perspectives, we discuss some quantum computer challenges.

2. Background

Quantum computing uses quantum physics principles, including superposition, entanglement, and quantum measurement, to process data. Superposition is the quantum mechanical feature that permits objects to exist in several states simultaneously. Entanglement may join two or even more quantum particles in full synchronization, despite being on different sides of the universe. Quantum measurement is a process that transforms quantum information into classical information. This section provides an introduction to the basics of quantum computing, from Dirac notations through quantum algorithms.

2.1. Dirac (Bra-Ket) Notation

States and operators in quantum mechanics are represented as vectors and matrices, respectively. Instead of utilizing standard linear algebra symbols, Dirac notation is used to represent the vectors.

Let a and b be in ℂ²:

  • Ket: $|a\rangle = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$. (1)

  • Bra: $\langle b| = \begin{pmatrix} b_1^* & b_2^* \end{pmatrix}$. (2)

Note that the complex conjugate of any complex number can be generated by inverting the sign of its imaginary component; for example, the complex conjugate of b = a + i·d is b* = a − i·d.

  • Bra-Ket: Inner product $\langle b|a\rangle = a_1 b_1^* + a_2 b_2^* = \langle a|b\rangle^*$. (3)
  • Ket-Bra: Outer product $|a\rangle\langle b| = \begin{pmatrix} a_1 b_1^* & a_1 b_2^* \\ a_2 b_1^* & a_2 b_2^* \end{pmatrix}$. (4)

2.2. Qubit

The bit is the standard measure of information in classical computing. It can exist in one of two states: 0 or 1. Quantum computers, similarly, use a “qubit”, also called a “quantum bit” [ 33 ], which can represent, for instance, the likelihood of an electron’s “spin up” or “spin down” when passing through a magnetic field. The spin may then be thought of as the analogue of the value of a bit in traditional computing. A qubit may be modeled in the complex two-dimensional Hilbert space ℂ². An instantaneous qubit state can be expressed as a vector in this two-dimensional Hilbert space.

An inner product between two vectors that represent qubit states in Hilbert space allows us to identify their relative positions. 〈 a | b 〉 represents the inner product of the vectors | a 〉 and | b 〉 ; it equals 0 if | a 〉 and | b 〉 are orthogonal and 1 if | a 〉 = | b 〉 .

The states |0⟩ and |1⟩ can be represented with the vectors $|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. These two are known as the computational basis of a two-level system. These vectors in the state space can then be acted upon by matrix-based operators.

A qubit can be in the state |0⟩, the state |1⟩, or a superposition of both,

|ψ⟩ = c₁|0⟩ + c₂|1⟩,

where the coefficients c₁, c₂ ∈ ℂ are called amplitudes. According to the Born rule, the sum of the squared magnitudes of the amplitudes of all possible states in a superposition equals 1:

|c₁|² + |c₂|² = 1.

When we perform a measurement, we obtain a single bit of information: 0 or 1. The most basic measurement is in the computational basis (Z-basis: |0⟩, |1⟩). For example, the result of measuring the state c₁|0⟩ + c₂|1⟩ in the computational basis is 0 with probability |c₁|² and 1 with probability |c₂|².

There are numerous different bases; however, below are some of the most common ones:

  • X-Basis: |+⟩ := (1/√2)(|0⟩ + |1⟩), |−⟩ := (1/√2)(|0⟩ − |1⟩)
  • Y-Basis: |+i⟩ := (1/√2)(|0⟩ + i|1⟩), |−i⟩ := (1/√2)(|0⟩ − i|1⟩)

According to the Born rule, the probability that a state |ψ⟩ collapses during a measurement in the basis {|b⟩, |b′⟩} onto the state |b⟩ is given by P(b) = |⟨b|ψ⟩|², with Σ_b P(b) = 1. For example, we measure the state |ψ⟩ = (1/√3)(|0⟩ + √2|1⟩) in the {|0⟩, |1⟩} basis,

P(0) = |⟨0|(1/√3)(|0⟩ + √2|1⟩)|² = |(1/√3)⟨0|0⟩ + (√2/√3)⟨0|1⟩|² = 1/3 and P(1) = 1 − 1/3 = 2/3, where ⟨0|0⟩ = 1 (normalized) and ⟨0|1⟩ = 0 (orthogonal); thus, after measurement, the qubit is more likely to be found in the state |1⟩.
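As a purely classical illustration (a minimal sketch of our own, not part of the original derivation; the variable names are arbitrary), the following NumPy snippet reproduces the Born-rule probabilities P(0) = 1/3 and P(1) = 2/3 for the state above:

```python
import numpy as np

# State |psi> = (1/sqrt(3))|0> + sqrt(2/3)|1>, written as an amplitude vector.
psi = np.array([1 / np.sqrt(3), np.sqrt(2 / 3)], dtype=complex)

# Born rule: the probability of outcome b is |<b|psi>|^2.
basis_0 = np.array([1, 0], dtype=complex)
basis_1 = np.array([0, 1], dtype=complex)

p0 = abs(np.vdot(basis_0, psi)) ** 2   # ~0.333
p1 = abs(np.vdot(basis_1, psi)) ** 2   # ~0.667

print(p0, p1, np.isclose(p0 + p1, 1.0))
```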

2.3. Quantum Circuit

Quantum circuits are represented using circuit diagrams. These diagrams are built and read from left to right. Barenco et al. [ 34 ] defined several of the basic operators that we now use in quantum circuits today. Two binary operators were added to this set by Toffoli and Fredkin [ 35 , 36 ]. We start building a quantum circuit with a line representing a circuit wire. As seen in Figure 1 a, a ket on the left side of the wire indicates the state in which the wire was initially prepared.

Figure 1. Representations of wires of a quantum circuit. ( a ) Single quantum circuit wire; ( b ) Quantum circuit wire with n qubits.

The qubit stays in the state in which it was originally created if there is no operator on the line. This indicates that the qubit state is maintained by the quantum computer.

The number of qubits prepared in that state is denoted by a slash with an n across the wire, as shown in Figure 1 b.

A quantum circuit is a computational process that combines traditional real-time computation with coherent quantum gates acting on quantum data such as qubits.

2.4. Quantum Gates

Quantum gates or operators fundamentally involve the modification of one or more qubits. Single-qubit gates are represented as a box with the letter of the operator straddling the line. An operator box connecting two quantum wires is the basis of a binary gate.

2.4.1. Single Qubit Gates

The Pauli matrices are the first of the three sets of operators we examine; they are used to represent certain typical quantum gates:

$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$

When we apply X to |0⟩, we obtain

X|0⟩ = |1⟩.

As we can see, this gate flips the amplitudes of the |0⟩ and |1⟩ states. In a quantum circuit, the symbol in Figure 2 a represents the Pauli-X gate.

Figure 2. Circuit representations of the four most used gates in quantum circuits. ( a ) The Pauli-X gate; ( b ) The Pauli-Y gate; ( c ) The Pauli-Z gate; ( d ) The CNOT gate.

Consequently, when it is applied to the |1⟩ state, we obtain X|1⟩ = |0⟩.

In Figure 2 b, the circuit design for the Y operator is displayed.

By applying the Pauli-Z gate to a computational basis state |u⟩, u ∈ {0, 1}, we obtain the result shown below:

Z|u⟩ = (−1)^u |u⟩.

For the particular case u = 0, we present it in matrix form:

$Z|0\rangle = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = |0\rangle.$

In the case where u = 1, we have

$Z|1\rangle = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \end{pmatrix} = -|1\rangle.$

The Pauli-Z gate’s circuit diagram is shown in Figure 2 c.

The matrix below represents the phase shift gate:

$P(\varphi) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\varphi} \end{pmatrix},$

where φ denotes the phase shift over a period of 2π. Typical instances include the Pauli-Z gate, where φ = π, the T gate, where φ = π/4, and the S gate, where φ = π/2.

When the Hadamard gate is applied to the state |0⟩, we obtain

H|0⟩ = (1/√2)(|0⟩ + |1⟩) = |+⟩,

and applied to the state |1⟩, we obtain

H|1⟩ = (1/√2)(|0⟩ − |1⟩) = |−⟩.

As can be seen, the Hadamard gate maps a computational basis state into an equal superposition of the two basis states.
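As a minimal sketch (our own illustration, not part of the original text), the single-qubit gates of this subsection can be checked numerically by applying their matrices to the computational basis vectors:

```python
import numpy as np

# Computational basis states.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Pauli and Hadamard gates as 2x2 unitary matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

print(X @ ket0)   # -> |1>  (bit flip)
print(Z @ ket1)   # -> -|1> (phase flip)
print(H @ ket0)   # -> (|0> + |1>)/sqrt(2)
print(H @ ket1)   # -> (|0> - |1>)/sqrt(2)
```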

2.4.2. Multi Qubit States and Gates

A single bit offers two possible states, and, as we saw before, a single-qubit state is described by two complex amplitudes. Similarly, two bits can be in one of four states: 00, 01, 10, or 11. Four complex amplitudes are therefore needed to describe the state of two qubits. These amplitudes are stored in a 4D vector:

$|\psi\rangle = c_{00}|00\rangle + c_{01}|01\rangle + c_{10}|10\rangle + c_{11}|11\rangle = \begin{pmatrix} c_{00} \\ c_{01} \\ c_{10} \\ c_{11} \end{pmatrix}.$

The tensor product ⊗ can be used to characterize the combined state of two separate qubits |v⟩ = v₁|0⟩ + v₂|1⟩ and |u⟩ = u₁|0⟩ + u₂|1⟩:

$|v\rangle \otimes |u\rangle = \begin{pmatrix} v_1 u_1 \\ v_1 u_2 \\ v_2 u_1 \\ v_2 u_2 \end{pmatrix}.$

States of this form (|v⟩ ⊗ |u⟩) are called uncorrelated, but there are also bipartite states which cannot be expressed as |v⟩ ⊗ |u⟩; these states are correlated and sometimes entangled.

The CNOT or CX gate is one of several quantum gates that act on multiple qubits: if the first (control) qubit is in state |1⟩, the NOT operation is applied to the second (target) qubit; otherwise, the target is left unaltered. Using the CNOT gate, we can entangle two qubits in a quantum circuit. This gate is described by the following matrix:

$CNOT = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$

As an instance, the CNOT gate may be applied to the state |10⟩ as follows:

CNOT|10⟩ = |11⟩.

Table 1 reveals that the output state of the target qubit matches that of a conventional XOR gate: the target qubit is |0⟩ when both inputs are the same and |1⟩ when the inputs differ. We represent the CNOT gate with the circuit diagram in Figure 2 d.

Table 1. CNOT gate truth table.

Control in | Target in | Control out | Target out
|0⟩ | |0⟩ | |0⟩ | |0⟩
|0⟩ | |1⟩ | |0⟩ | |1⟩
|1⟩ | |0⟩ | |1⟩ | |1⟩
|1⟩ | |1⟩ | |1⟩ | |0⟩

2.5. Representation of Qubit States

The state of a qubit may be represented in different ways. Dirac notation allows us to express this state in a readable form. A qubit in state |0⟩, for instance, will transfer to state |1⟩ after the application of the X operator.

In Figure 3 , a state of a single qubit is represented by the Bloch sphere. Quantum states are represented by vectors that extend from the origin to a certain point on the surface of the Bloch sphere. The top and bottom antipodes of the sphere are | 0 〉 and | 1 〉 , respectively.

Figure 3. Bloch sphere representation of the state of a qubit. Reprinted with permission from Ref. [ 37 ]. 2014, Anton Frisk Kockum.

On the Bloch sphere, we may write any pure state (i.e., a qubit state specified by a vector of norm 1; a mixed state, in contrast, is a state that combines numerous pure quantum states) as seen below:

|ψ⟩ = cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩,

with θ ∈ [0, π], which determines the probabilities of measuring the state |0⟩ as P(0) = cos²(θ/2) and the state |1⟩ as P(1) = sin²(θ/2), and φ ∈ [0, 2π], which describes the relative phase. All of these pure states lie on the surface of a Bloch sphere of radius |r⃗| = 1. The Bloch vector gives such a state’s coordinates:

  • |0⟩: θ = 0, φ arbitrary → r⃗ = (0, 0, 1);
  • |1⟩: θ = π, φ arbitrary → r⃗ = (0, 0, −1);
  • |+⟩: θ = π/2, φ = 0 → r⃗ = (1, 0, 0);
  • |−⟩: θ = π/2, φ = π → r⃗ = (−1, 0, 0);
  • |+i⟩: θ = π/2, φ = π/2 → r⃗ = (0, 1, 0);
  • |−i⟩: θ = π/2, φ = 3π/2 → r⃗ = (0, −1, 0).
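These coordinates can be checked numerically: for a pure state |ψ⟩, the Bloch vector is simply (⟨X⟩, ⟨Y⟩, ⟨Z⟩). The following NumPy sketch (our own illustration; the helper bloch_vector is hypothetical) reproduces two of the entries listed above:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(psi):
    """Return (r_x, r_y, r_z) = (<X>, <Y>, <Z>) for a pure single-qubit state."""
    return tuple(np.real(np.vdot(psi, P @ psi)) for P in (X, Y, Z))

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)       # |+>
minus_i = np.array([1, -1j], dtype=complex) / np.sqrt(2)  # |-i>

print(bloch_vector(plus))     # ~(1, 0, 0)
print(bloch_vector(minus_i))  # ~(0, -1, 0)
```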

The Bloch sphere is only capable of representing the state of a single qubit. Therefore, the Q-sphere is used for multi-qubit states (and single qubits as well). On the Q-sphere:

Figure 4. Representations of single-qubit and multi-qubit states on the Q-sphere. ( a ) Representation of a superposition state; ( b ) Representation of three-qubit states.

  • The south pole represents the state |1⟩;
  • The north pole represents the state |0⟩;
  • The size of the blobs is related to the likelihood that the relevant state will be measured;
  • The color indicates the relative phase compared to the state |0⟩.

In Figure 4 b, we plot all basis states as equally distributed points on the sphere, with |0⟩^⊗n at the north pole, |1⟩^⊗n at the south pole, and all other states aligned on parallels, such that the number of “1”s on each latitude is constant and increases from north to south.

2.6. Entanglement

Quantum entanglement is a phenomenon wherein quantum particles interact and must be described by reference to one another despite being far apart. At the point of measurement, if one of the entangled particles in a pair is determined to be in the ‘down’ spin state, this information is instantly reflected in the other particle, which now adopts the opposite spin state of ‘up’. Even if entangled particles were positioned on opposite corners of the universe, they would remain “connected”. This example is significant because it demonstrates how wave–particle duality may be used to allow qubits to interact within quantum algorithms via interference. Consider, for example, the state (1/√2)(|00⟩ + |11⟩) (which is a Bell state); it has a 50 percent chance of being measured in state |00⟩ and an equal chance of being measured in state |11⟩. The measurement of one qubit causes the superposition to collapse and seems to have an instantaneous impact on the other.

The Bell states, also called EPR pairs, are quantum states with two qubits that represent the simplest (and most extreme) forms of quantum entanglement; they are a form of normalized and entangled basis vectors. This normalization indicates that the particle’s overall probability of being in one of the states stated is 1: 〈 ψ | ψ 〉 = 1 [ 38 ].

Four fully entangled states known as Bell states exist, and they constitute an orthonormal basis (i.e., a basis in which all of the vectors have unit norm and are orthogonal to one another):

  • 1. |ψ₀₀⟩ = (1/√2)(|00⟩ + |11⟩).
  • 2. |ψ₀₁⟩ = (1/√2)(|01⟩ + |10⟩).
  • 3. |ψ₁₀⟩ = (1/√2)(|00⟩ − |11⟩).
  • 4. |ψ₁₁⟩ = (1/√2)(|01⟩ − |10⟩).

Although there are a variety of ways to create entangled Bell states using quantum circuits, the most basic uses a computational basis as input and includes a Hadamard gate and a CNOT gate (see Figure 5 a).

Figure 5. Quantum circuits of the Bell state and Bell measurement. ( a ) Bell state; ( b ) Bell measurement.

The circuit calculation results are displayed in Table 2 .

Table 2. The results of the Bell state circuit computation.

Input | Output
|00⟩ | (1/√2)(|00⟩ + |11⟩) = |ψ₀₀⟩
|01⟩ | (1/√2)(|01⟩ + |10⟩) = |ψ₀₁⟩
|10⟩ | (1/√2)(|00⟩ − |11⟩) = |ψ₁₀⟩
|11⟩ | (1/√2)(|01⟩ − |10⟩) = |ψ₁₁⟩

Conversely, a Bell measurement is a standard measurement whose classical outputs i and j correspond to a measurement of the state |ψᵢⱼ⟩. Figure 5 b represents the circuit of the Bell measurement.
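A small state-vector sketch (our own illustration, not taken from the original text) of the Bell-state circuit of Figure 5 a, applying the Hadamard gate to the first qubit and then CNOT, recovers the amplitudes 1/√2 on |00⟩ and |11⟩:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# |00> -> (H on qubit 1) -> CNOT gives the Bell state (|00> + |11>)/sqrt(2).
initial = np.kron(ket0, ket0)
bell = CNOT @ np.kron(H, I) @ initial

print(np.round(bell, 3))  # [0.707, 0, 0, 0.707] over the basis |00>, |01>, |10>, |11>
```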

2.7. Quantum Computer

Quantum computers are processing machines that take advantage of quantum physics properties. This may be extremely beneficial for some tasks, where they can greatly outperform the most powerful supercomputers. A quantum computer can be considered a co-processor of a traditional computer, much as a GPU is for video games or for training neural networks in deep learning. As shown in Figure 6 , a traditional computer closely controls the quantum computer’s operations by triggering at precise rates the qubit operations performed by the quantum gates. This trigger takes into consideration the execution time of the quantum gates as well as the known coherence time of the qubits, i.e., the duration for which the qubits remain in a superposition state. In addition to its classical control computer, the quantum computer includes several components, which we analyze one by one below:

  • Quantum registers are simply collections of qubits. In November 2022, the benchmarked record was 433 qubits, announced by IBM. Quantum registers store the information manipulated in the computer and exploit the principle of superposition, allowing a large number of values to coexist in these registers and to be operated on simultaneously;
  • Quantum gates are physical systems acting on the qubits of the quantum registers, to initialize them and to perform computational operations on them. These gates are applied in an iterative way according to the algorithms to be executed;
  • At the conclusion of the sequential execution of quantum gates, the measurement interface permits the retrieval of the calculations’ results. Typically, this cycle of setup, computation, and measurement is repeated several times to assess the outcome. We then obtain an average value between 0 and 1 for each qubit of the quantum computer registers. The values received by the physical reading devices are then translated into digital values and sent to the traditional computer, which controls the whole system and permits the interpretation of the data. In common cases, such as at D-Wave or IBM, which are the giants of quantum computer building, the calculation is repeated at least 1024 times in the quantum computer;
  • Quantum chipsets include quantum registers, quantum gates, and measurement devices when it comes to superconducting qubits. Current chipsets are not very large: they are the size of a full-frame photo sensor, or double that for the largest of them. The latest powerful quantum chip, the 433-qubit Osprey, is around the size of a quarter;
  • Refrigerated enclosure generally holds the inside of the computer at temperatures near absolute zero. It contains part of the control electronics and the quantum chipset to avoid generating disturbances that prevent the qubits from working, especially at the level of their entanglement and coherence, and to reduce the noise of their operation;
  • Electronic write and read circuits in the refrigerated enclosure control the physical devices needed to initialize, update, and read the state of the qubits.

Figure 6. The quantum computer’s architecture.

Quantum computers today are built based on atoms (e.g., cold atoms, trapped ions, and nuclear magnetic resonance), electrons (e.g., superconducting, silicon, and topological qubits), or photons (e.g., linear optics). D-Wave, IBM, and Google have all shown significant quantum computer development progress. The D-Wave model is built using the quantum adiabatic technique, which handles problems involving optimization or probabilistic sampling, whereas the IBM model mostly deals with non-adiabatic models. Most of these companies provide a quantum computer simulator that runs on a classical computer to test code before executing it on a quantum device. It is possible to use this simulator in either a local environment or the cloud. It cannot handle true quantum states because it is operating on a classical computer, but it is useful for testing code syntax and flow.

2.8. Quantum Algorithms

The creation of quantum algorithms necessitates a higher level of expertise than traditional algorithms and programs. Quantum computers will require a new generation of mathematicians and developers capable of reasoning with the mathematical formalization of quantum programming. Furthermore, these algorithms must be more efficient than those designed for traditional computers or supercomputers. Given that quantum computing employs a distinct method of computation, it is only reasonable to ask what kinds of problems may now be solved in this new environment that could not be solved on a traditional computer. To do this, we must review complexity theory.

Complexity theory focuses on the scalability and computational cost of algorithms, both in general and for specific problems. Scalability refers to the time and/or space necessary as the volume or complexity of a computation’s goal increases. Using Big-O notation, an algorithm that is O(n³) is considered “harder” than one that is O(n²), since the former will generally require more operations than the latter, regardless of the time at which these operations are executed. A specific complexity class is composed of problems that share similar characteristics of difficulty. The most significant categories of complexity are discussed below:

  • Polynomial time (P): Problems solvable in a polynomial amount of time. In other terms, a traditional computer is capable of resolving the problem in a reasonable time;
  • Non-deterministic Polynomial time (NP): The collection of decision problems that a nondeterministic Turing machine can solve in polynomial time. P is a subset of NP;
  • NP-Complete: X is considered NP-Complete only if the following requirements are met: ( i ) X is in NP, and ( i i ) all NP problems are reducible to X in polynomial time. We assert that X is NP-hard if only ( i i ) holds and not necessarily ( i );
  • Polynomial Space (PSPACE): This category is concerned with memory resources instead of time. PSPACE is the category of decision problems that can be solved by an algorithm whose total space utilization is always polynomially bounded by the instance size;
  • Bounded-error Probabilistic Polynomial time (BPP): The collection of decision problems that may be handled in polynomial time by a probabilistic Turing machine with a maximum error probability of 1/3;
  • Bounded-error Quantum Polynomial time (BQP): A decision problem is in BQP if a quantum computer can solve it in polynomial time with high probability. BQP is the basic complexity category of problems that quantum computers can effectively solve. It corresponds to the classical BPP class at the quantum level;
  • Exact Quantum Polynomial time (EQP or QP): The class of decision problems that a quantum computer can solve in polynomial time with probability 1. This is the quantum counterpart of the P complexity class.

The potential uses of quantum computing range from decrypting cryptography systems to the creation of novel medications. These applications use quantum algorithms, which are programs that exploit quantum gates on a quantum computer in order to obtain a speedup or other advantages over traditional algorithms. An important consideration in the creation of quantum algorithms is to ensure that they are more efficient than their counterparts optimized for traditional computers. Theories exist to verify this by evaluating the exponential, polynomial, logarithmic, or linear scaling of the computation time as a measure of the task’s complexity, or a combination of all four. Currently, there are four primary classes of quantum algorithms: search algorithms based on those of Deutsch–Jozsa [ 39 ] and Grover [ 1 ]; algorithms based on Quantum Fourier Transforms (QFT), such as Shor’s algorithm [ 2 ], which is used to factor integers; quantum annealing algorithms, which search for an equilibrium point of a complex system, as in neural network training, optimal path search in networks, or process optimization; and quantum simulation algorithms, which are used to simulate the interactions between atoms in various molecular structures. Quantum chemistry, which simulates the impact of chemical stimulation on a huge number of atomic particles, is one particularly exciting topic in quantum simulation.

One of the first quantum algorithms invented is that of David Deutsch, called the Deutsch–Jozsa algorithm [ 39 ], co-invented with Richard Jozsa. This algorithm identifies the behavior of a “black box” function, called an “oracle” (a black box that is frequently used in quantum algorithms to evaluate functions on qubits), which we know in advance either returns the same value, 0 or 1, for all of its inputs, or returns 0 and 1 in equal parts. The algorithm thus makes it possible to know whether the function is balanced or constant. It is implemented on a set of n qubits: all of the input qubits are set to zero, except one ancilla, which is initialized to 1; they are then each put in superposition between 0 and 1 using Hadamard gates. The qubits thus simultaneously hold all of the 2^(n+1) possible combinations of values. It is easy to understand why this quantum algorithm is much more efficient than its conventional counterpart: in conventional computation, more than half of the possible input values would have to be scanned sequentially, whereas, in the quantum version, they are all scanned at the same time. This is an example of a strong algorithm that has no known practical use. Moreover, there are classical probabilistic algorithms that erase a good part of the quantum power gain of the Deutsch–Jozsa algorithm.

Grover’s algorithm, created in 1996, is the other popular algorithm [ 1 ]. It allows for a fast quantum search in a database. A bit like the Deutsch–Jozsa algorithm, it scans a list of elements to find those that verify a specific condition, and it uses the superposition of qubit states to speed up the processing compared to a traditional sequential search. The improvement in performance is significant for an unsorted database. Grover’s algorithm also uses an “oracle” or “black box” function that indicates whether a set of input qubits verifies a search condition or not, such as verifying that a given phone number appears in a list of phone numbers. In such a case, the function compares the phone number searched for with the one submitted to it, answering one if they are identical and zero otherwise. The black box is a quantum box and evaluates this function for the 2^N possible values of an N-qubit register at the same time. It therefore outputs a one exactly once and zeros elsewhere. The computing time is determined by the square root of the list size, while a classical approach has a computation time proportional to the size of the list. Going from a time N to √N is therefore an interesting gain. This algorithm can then be integrated into other algorithms, such as those that discover the optimal path in a graph or the minimum or maximum of a series of N numbers.

The quantum Fourier transform (QFT) was invented by Don Coppersmith in 1994 [ 40 ]. It is used in various other algorithms and in particular in Shor’s algorithm, which is a factoring algorithm for integers. On a quantum computer, the QFT over N amplitudes can be performed in roughly log²(N) time, whereas the most well-known classical fast Fourier transform requires N × log(N) time. IBM performed one of the initial implementations of Shor’s technique in 2001, using an experimental quantum computer with seven qubits to factor the integer 15. In 2012, Nanyang et al. [ 41 ] claimed the factorization of 143 using an NMR adiabatic quantum computer at ambient temperature (300 K). Using a D-Wave 2X quantum computer, the 18-bit number 200,099 was successfully factored in April 2016 using quantum annealing [ 42 ]. Late in 2019, an industry collaboration determined that 1,099,551,473,989 equals 1,048,589 times 1,048,601 [ 43 ]. Note that Shor’s algorithm also allows for breaking cryptography based on elliptic curves [ 44 ], which competes with RSA cryptography. Incidentally, a part of the cryptography used in the Bitcoin protocol would also fall to Shor’s algorithm [ 45 ]. To the purely quantum algorithms (based only on quantum gates), we must also add hybrid algorithms, which combine traditional algorithms and quantum algorithms or are based on quantum-classical gates; this is notably the situation of the Variational Quantum Eigensolver (VQE) [ 46 ], which allows the solution of chemical simulation problems as well as neural network training. There are also quantum-inspired algorithms, which are algorithms for classical computers inspired by quantum algorithms for solving complex problems.

3. Quantum Machine Learning

There are four strategies for combining machine learning and quantum computing, based on whether the data were created by a classical (C) or quantum (Q) system, and whether the computer that processes the data is classical (C) or quantum (Q), as shown in Figure 7 .

Figure 7. Four different ways to combine quantum computing with machine learning. Reprinted with permission from Ref. [ 47 ]. 2017, Maria Schuld.

The scenario CC refers to classical data processed classically. This is the traditional technique of machine learning; however, in this case, it refers to machine learning that borrows approaches derived from quantum information research. For instance, tensor networks, which were designed to address quantum many-body problems, have been employed in training neural networks [ 48 ]. In addition, there are various ‘quantum-inspired’ machine learning algorithms with varying degrees of quantum theoretical background.

The scenario QC examines how machine learning might aid quantum computing. For instance, if we wish to obtain a complete explanation of the state of a computing device by taking a few measurements, we may analyze the measurement data using machine learning [ 49 ]. There are several applications for machine learning to distinguish either quantum states generated by a quantum source or manipulations carried out by a quantum experiment [ 50 , 51 ].

The scenario CQ uses quantum computing to examine conventional datasets. Observations from classical computers, which include images and text, are supplied to a quantum device for evaluation. The primary objective of the CQ methodology is the development of quantum algorithms to be used in data mining, for which the community has offered a variety of solutions. They might be adaptations of traditional machine learning models to function with quantum algorithms, or they can be entirely novel creations inspired by the characteristics of quantum computing.

The last scenario, QQ, examines the processing of “quantum data” by a quantum device. There are two possible interpretations of this. The information may be obtained through a quantum experiment involving a quantum system, with the subsequent experimental measurements fed into a different quantum computer. In a second interpretation, a quantum computer simulates the behavior of the quantum system and then uses the state of this system directly as the input to a QML algorithm performed on the same computer.

Nowadays, Quantum Machine Learning (QML) algorithms follow three distinct strategies. The first strategy comprises quantum machine learning algorithms proper, i.e., quantum versions of classical machine learning algorithms implemented on a real quantum computer, including QSVM [ 4 ], the Quantum Neural Network [ 52 ], and Quantum Linear Regression [ 8 ]. The second strategy is quantum-inspired machine learning, which leverages the concepts of quantum computing to enhance traditional machine learning algorithms. A few examples of such algorithms include a quantum-inspired binary classifier [ 53 ], the Helstrom Quantum Centroid, which is a quantum-inspired binary supervised learning classifier [ 54 ], Quantum-inspired Support Vector Machines [ 55 ], the Quantum Nearest Mean Classifier [ 56 ], the quantum-inspired K Nearest-Neighbor [ 57 ], quantum algorithms for ridge regression [ 58 ], and quantum-inspired neural networks [ 59 ]. The third strategy, hybrid classical-quantum machine learning, merges classical and quantum algorithms to improve performance and reduce the cost of learning; examples include using a quantum circuit to propose a novel variational quantum classifier [ 60 ], the variational quantum SVM and the SVM quantum kernel-based algorithm [ 61 ], a quantum KNN algorithm [ 62 ], and a hybrid quantum-computer-based quantum version of nonlinear regression [ 63 ].

In our study, we concentrate on the CQ scenario. There are two distinct approaches to the development of quantum machine learning models. The first approach involves running traditional machine learning algorithms on quantum computers or simulators in an effort to achieve algorithmic speedups. This approach needs to translate conventional data into quantum data, a process known as quantum encoding. The other approach is to create QML algorithms based on quantum subroutines, including Grover’s algorithm, the HHL algorithm, quantum phase estimation, and the Variational Quantum Circuit (VQC), which we cover later in detail.

Figure 8 illustrates the processing strategies used in conventional machine learning and QML. In traditional machine learning, data are a direct input to the algorithm, which then analyses the data and generates an output. QML, on the other hand, demands the initial encoding of the data into quantum data. QML receives quantum data as input, processes it, and generates quantum data as output. The quantum data are then converted to conventional data.

Figure 8. Processing techniques of conventional machine learning and quantum machine learning. CD represents classical data and QD represents quantum data.

Quantum Machine Learning requires a more intricate encoding of classical to quantum data than classical machine learning.

3.1. Quantum Encoding

The majority of machine learning models depend on datasets of samples {xᵢ}, i ∈ [N], where xᵢ ∈ ℝᵈ for every i. Additionally, one requires classical outcomes such as labels or classes {yᵢ}, i ∈ [N], where yᵢ ∈ ℝ^d′ for every i. It appears that the ability to deal with conventional data will be required for the vast majority of practical applications of quantum machine learning [ 64 ].

In a Hilbert space, there are several strategies for encoding data from conventional data to quantum data. In other words, data encoding requires putting conventional data into a quantum computer as quantum states. We describe three methods. We recommend [ 65 ] for information on different strategies.

3.1.1. Basis Encoding

Basis encoding associates classical bit strings with the computational basis states of an n-qubit system. Therefore, the traditional data should be binary strings, and the encoded quantum state is the bit-wise conversion of a binary string into the corresponding states of the quantum subsystems. For example, x = 1001 is encoded by the 4-qubit quantum state |1001⟩; one bit of classical data is thus encoded by one quantum subsystem. In some ways, it is the most basic type of encoding, since every bit is effectively replaced with a qubit and computations are carried out simultaneously on all bits in superposition. Consequently, the bit encoding for a vector works as follows: a register of about m + ⌈log(d)⌉ qubits can hold a vector x = (x₁, …, x_d) ∈ ℝᵈ as a quantum superposition of bit strings, where each instance is written as a binary string of N bits, xᵢ = (b₁, …, b_j, …, b_N) for j = 1, …, N with b_j ∈ {0, 1},

where m denotes the number of qubits used for the precision, while d represents the number of samples. To encode the vector x = (2, 3) in basis encoding, we must first transform it into a binary format with 2 bits per entry, x₁ = 10 and x₂ = 11. The corresponding basis encoding makes use of two qubits and represents the data as below:

|ψ⟩ = (1/√2)(|10⟩ + |11⟩).
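A minimal sketch of this example in NumPy (our own illustration; the variable names are arbitrary): the samples 2 and 3 are written as 2-bit strings and placed in an equal superposition over a 2-qubit register:

```python
import numpy as np

# Samples x1 = 2 -> '10', x2 = 3 -> '11', each written with 2 bits.
samples = [2, 3]
n_bits = 2
bitstrings = [format(x, f"0{n_bits}b") for x in samples]   # ['10', '11']

# Equal superposition of the encoded basis states on a 2-qubit register.
state = np.zeros(2 ** n_bits, dtype=complex)
for b in bitstrings:
    state[int(b, 2)] = 1.0
state /= np.linalg.norm(state)

print(bitstrings)          # ['10', '11']
print(np.round(state, 3))  # amplitudes 0.707 on |10> and |11>
```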

3.1.2. Amplitude Encoding

To employ qubits to represent classical vectors or matrices, the most efficient encoding is by far the amplitude encoding approach. It is the theoretical link between quantum computing and linear algebra that exploits quantum features to the maximum. A normalized classical d-dimensional data point x is represented by the amplitudes of an n-qubit quantum state |ψ_x⟩ = Σᵢ xᵢ|i⟩, with d = 2ⁿ, where xᵢ is the i-th element of x and |i⟩ is the i-th computational basis state. In this case, xᵢ may be any numeric value, such as a floating-point or integer value. For instance, suppose we want to use amplitude encoding to encode the four-dimensional array x = (1.0, 3.0, −2.5, 0.0). The first step is to normalize this array, which is achieved by setting x_norm = (1/√16.25)(1.0, 3.0, −2.5, 0.0). Two qubits are used to encode x_norm in the corresponding amplitude encoding as follows:

|ψ_x⟩ = (1/√16.25)(1.0 |00⟩ + 3.0 |01⟩ − 2.5 |10⟩ + 0.0 |11⟩).

This resulting state encodes, at the same time, the matrix below:

$A = \frac{1}{\sqrt{16.25}}\begin{pmatrix} 1.0 & 3.0 \\ -2.5 & 0.0 \end{pmatrix},$

where the first qubit represents the row index and the second qubit represents the column index.
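The normalization step of amplitude encoding can be checked classically; the following NumPy sketch (our own illustration) encodes the array x = (1.0, 3.0, −2.5, 0.0) used above:

```python
import numpy as np

x = np.array([1.0, 3.0, -2.5, 0.0])

# Amplitude encoding: normalize so the squared amplitudes sum to 1,
# then interpret the 4 entries as amplitudes of a 2-qubit state.
norm = np.linalg.norm(x)            # sqrt(16.25)
psi = x / norm

print(norm ** 2)                    # 16.25
print(np.round(psi, 4))             # amplitudes of |00>, |01>, |10>, |11>
print(np.isclose(np.sum(psi ** 2), 1.0))
```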

3.1.3. Qsample Encoding

Qsample encoding is a hybrid of amplitude and basis encoding: it combines a real amplitude vector with a conventional binary probability distribution. Suppose an amplitude vector v = (v₁, …, v_{2ⁿ})ᵀ describes an n-qubit quantum state. Measuring the n qubits is then comparable to sampling a series of bits of size n with the discrete probabilities p₁ = |v₁|², …, p_{2ⁿ} = |v_{2ⁿ}|². This means that the amplitude vector may be used to characterize the distribution of a classical discrete random variable [ 66 ]. Given a discrete classical probability distribution p₁, …, p_{2ⁿ} over binary strings, the corresponding quantum state is:

|ψ⟩ = Σᵢ √pᵢ |i⟩.

3.2. Essential Quantum Routines for QML

In this subsection, we discuss a few of the most basic quantum algorithms which aid in the development of quantum machine learning algorithms.

3.2.1. HHL Algorithm

The Harrow–Hassidim–Lloyd (HHL) algorithm [ 5 ] is a quantum algorithm for solving a system of linear equations. Solving a system of linear equations is equivalent to the matrix inversion problem: given a matrix M and a vector v, the objective is to determine the vector x such that

M x = v,  i.e.,  x = M⁻¹ v.

Numerous machine learning algorithms determine their parameters θ by solving the matrix inversion problem Mθ = v, which makes this algorithm essential for quantum machine learning. In general, the matrix M represents the input characteristics of the training data points expressed in the matrix X, and the training data vector v is built from the data point matrix X and the target vector Y. In linear regression, in which the predicted output is y = θᵀx, determining θ amounts to solving the matrix inversion problem presented below:

(XᵀX) θ = XᵀY;

thus, M = XᵀX and v = XᵀY in linear regression.
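For reference, the classical counterpart of the inversion problem that HHL targets in this setting is the normal-equation solve below; this NumPy sketch (our own illustration, with synthetic data) solves (XᵀX)θ = XᵀY directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2*x1 - 3*x2 + noise.
X = rng.normal(size=(100, 2))
Y = X @ np.array([2.0, -3.0]) + 0.01 * rng.normal(size=100)

# Classical counterpart of the inversion HHL targets: solve (X^T X) theta = X^T Y.
M = X.T @ X
v = X.T @ Y
theta = np.linalg.solve(M, v)

print(np.round(theta, 3))   # ~[2, -3]
```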

For the HHL algorithm, we must identify one or more operators capable of transforming the state |v⟩ into the solution vector θ. Clearly, M = XᵀX would have to be included in one of the operators. Unless M is unitary, we cannot apply it directly as a quantum operator. However, if M is Hermitian, we may identify it with the Hamiltonian H of a quantum system. As a reminder, a linear operator or matrix H is Hermitian only if its complex conjugate transpose H† is identical to H. Even if M is not Hermitian, a Hermitian operator M̃ may be defined as shown below:

$\tilde{M} = \begin{pmatrix} 0 & M \\ M^{\dagger} & 0 \end{pmatrix}.$

Since M̃ is now Hermitian, it has the following eigenvalue decomposition:

$\tilde{M} = \sum_{i} \lambda_i |u_i\rangle\langle u_i|,$

in which the eigenvectors |uᵢ⟩ provide an orthonormal basis. The vector |v⟩ may be expressed as follows in the orthonormal basis |uᵢ⟩:

$|v\rangle = \sum_{i} \beta_i |u_i\rangle. \quad (16)$

The following is the answer to the inverse problem:

$|x\rangle = \tilde{M}^{-1} |v\rangle. \quad (17)$

Given the spectral decomposition of the Hermitian matrix M̃, its inverse is given as follows:

$\tilde{M}^{-1} = \sum_{i} \frac{1}{\lambda_i} |u_i\rangle\langle u_i|. \quad (18)$

The answer |x⟩ is obtained by putting the value of M̃⁻¹ of Equation ( 18 ) as well as |v⟩ of Equation ( 16 ) into Equation ( 17 ), which can be seen below:

$|x\rangle = \sum_{i} \frac{\beta_i}{\lambda_i} |u_i\rangle. \quad (19)$

Equation ( 19 ) demonstrates that, if we could move from the eigenstates |uᵢ⟩ to (1/λᵢ)|uᵢ⟩, we would indeed be near the answer. Quantum phase estimation (which we discuss later) is one approach for achieving this objective: it applies the unitary operator U = e^{−iM̃t} to the state |v⟩, represented as a superposition of the basis states |uᵢ⟩, in order to estimate the eigenvalue λᵢ associated with each |uᵢ⟩. The eigenvalues may then be inverted from λᵢ|uᵢ⟩ to (1/λᵢ)|uᵢ⟩ by a controlled rotation. Let us remember that the state |v⟩ must have unit norm before quantum phase estimation can be performed on it. Compared to its classical counterpart, the algorithm claims exponential gains in speed for certain problem formulations: the procedure requires O((log N)²) quantum operations, whereas the conventional computation requires O(N log N) operations.

3.2.2. Grover’s Algorithm

The speed at which database components may be accessed is one of the possible benefits of quantum computing over traditional computing. One such algorithm is Grover’s algorithm, which may yield a quadratic speedup when searching a database. Grover’s algorithm [ 1 ] employs the amplitude amplification approach [ 67 ], which is used in database search as well as in several other applications.

Assume we have a database containing N = 2ⁿ entries and we wish to find the one indexed by w, which we refer to as the winner. The N items may be labeled by the computational basis states |x⟩ of the n input quantum bits. The oracle evaluates a function f on each computational basis state |x⟩ ∈ {0, 1}ⁿ, returning f(x) = 1 for the winner and 0 for the other elements. In the paradigm of quantum computing, the oracle U_w may be seen as the unitary operator that acts on the computational basis state |x⟩ as seen below:

U_w |x⟩ = (−1)^{f(x)} |x⟩.

The oracle has the following impact on the winning element |w⟩ specified by the computational basis state:

U_w |w⟩ = −|w⟩.

Now that we are familiar with the oracle, let us examine the stages of Grover’s method. Let |s⟩ represent the uniform superposition of every state:

|s⟩ = (1/√N) Σ_x |x⟩,

and the operator

U_s = 2|s⟩⟨s| − I

is called the Grover diffusion operator. We can consider the winner |w⟩ and an extra state |s′⟩ which lies in the span of |w⟩ and |s⟩, is orthogonal to |w⟩, and is obtained from |s⟩ by eliminating |w⟩ and rescaling.

This superposition |s⟩, which is easily produced as |s⟩ = H^⊗n |0⟩^⊗n, is the starting point for the amplitude amplification technique, as shown in Figure 9 .

Figure 9. Geometric visualization and the condition of the amplitude of the state |s⟩.

The left chart corresponds to the two-dimensional plane spanned by the orthogonal vectors |w⟩ and |s′⟩, which allows for describing the beginning state as |s⟩ = sin θ |w⟩ + cos θ |s′⟩, where θ = arcsin⟨s|w⟩ = arcsin(1/√N). The right picture is a bar chart of the amplitudes of the state |s⟩.

Geometrically, applying the oracle U_w corresponds to a reflection of the state |s⟩ about |s′⟩. This transformation means that the amplitude in front of the |w⟩ state becomes negative, which in turn implies that the average amplitude (shown by a dashed line in Figure 10 ) has been reduced.

Figure 10. Geometric visualization and the condition of the amplitude after the implementation of the U_w operator.

Applying the diffusion operator U_s then completes the Grover iteration, mapping the state to U_s U_w |s⟩, which corresponds to a rotation by an angle 2θ toward |w⟩, as shown in Figure 11 .

Figure 11. Geometric visualization and the condition of the amplitude after the implementation of the U_s operator.

The state will have rotated by approximately r × 2θ after r applications of this step, where r = (π/4)√(2ⁿ) ≡ O(√N) [ 38 ].

  • 3. The final measurement will give the state |w⟩ with probability P(w) ≥ 1 − sin²(θ) = 1 − 1/2ⁿ.
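A classical state-vector simulation of these two reflections (our own sketch, on a toy database of N = 8 entries; it only simulates the amplitudes, it is not an implementation on a quantum device) illustrates how the winner’s probability is amplified after O(√N) iterations:

```python
import numpy as np

n = 3                       # 3 qubits -> N = 8 database entries
N = 2 ** n
winner = 5                  # index of the marked element |w>

# Uniform superposition |s> and the two reflections of one Grover iteration.
s = np.ones(N) / np.sqrt(N)
oracle = np.eye(N)
oracle[winner, winner] = -1                 # U_w: flips the sign of |w>
diffusion = 2 * np.outer(s, s) - np.eye(N)  # U_s = 2|s><s| - I

state = s.copy()
r = int(round(np.pi / 4 * np.sqrt(N)))      # ~O(sqrt(N)) iterations
for _ in range(r):
    state = diffusion @ (oracle @ state)

probs = state ** 2
print(r, np.argmax(probs), round(probs[winner], 3))  # 2 iterations, index 5, ~0.945
```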

3.2.3. Quantum Phase Estimation

Quantum phase estimation is a quantum algorithm that employs the Quantum Fourier Transform (QFT) to extract information encoded in the phase φ of an amplitude α = |α|e^{iφ} of a state. The QFT exponentially accelerates the process of translating a vector encoded in a quantum state into Fourier space. It is often utilized in QML algorithms to retrieve the information contained in the eigenvalues of operators that encode details of the data points. Phase estimation essentially consists in identifying the eigenvalues of a matrix U, represented as an operator in the quantum circuit; hence, this operator must be unitary. We denote its eigenvectors by |u_j⟩ and its eigenvalues by e^{iθ_j}, so that U|u_j⟩ = e^{iθ_j}|u_j⟩. Given an eigenvector and an additional register |u_j⟩|0⟩ as input, the algorithm returns |u_j⟩|θ_j⟩. Since the eigenvectors constitute a basis, every state |ψ⟩ can be expressed as |ψ⟩ = Σ_{j∈[n]} α_j|u_j⟩; the most intriguing aspect of phase estimation is therefore its application in superposition. The circuit diagram of quantum phase estimation is shown in Figure 12 .

Figure 12. Quantum phase estimation circuit. QFT⁻¹ is the inverse QFT.

One has the option of measuring the final output to obtain a classical description of the eigenvalues or retaining it as a quantum state for further computation. Recall that this circuit requires the sequential controlled application of the unitary U; for some circuits U, this can be an expensive operation.
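Classically, the quantities that phase estimation extracts are simply the phases of the eigenvalues of U. The following NumPy sketch (our own illustration, using an arbitrary single-qubit Z rotation as the unitary) computes them directly for comparison:

```python
import numpy as np

# A small unitary: a single-qubit rotation about Z by angle 2*pi*0.3.
theta = 0.3
U = np.diag([np.exp(-1j * np.pi * theta), np.exp(1j * np.pi * theta)])

# Classically, the phases QPE estimates are the angles of U's eigenvalues.
eigvals, eigvecs = np.linalg.eig(U)
phases = np.angle(eigvals) / (2 * np.pi)   # each phase expressed as a fraction of a turn

print(np.round(phases, 3))                 # ~[-0.15, 0.15]
```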

3.2.4. Variational Quantum Circuit

Variational quantum circuits (VQC) [ 68 ] are the method that has attracted the most interest from researchers. They were proposed with the variational quantum eigensolver (VQE) method for quantum chemistry [ 46 ] and with the quantum approximate optimization algorithm (QAOA) [ 31 ] for optimization, drawing on ideas from classical machine learning. The latter is used in many related machine learning applications [ 27 , 69 , 70 ]. Variational quantum circuits have universal qualities [ 71 , 72 ] and have already shown positive outcomes in real-world tests [ 73 ], although their natures are vastly different. They are generally based on the approach shown in Figure 13 . The ansatz is a small circuit comprised of numerous gates with adjustable parameters, such as the angle of a gate that controls a rotation. The resulting quantum state is then measured, and the outcome is compared with the correct answer for the intended problem (classification, regression); this measure is referred to as the loss or the objective function. Initial results are poor, since the parameters are almost random. An optimization is then performed on a traditional computer in order to provide a hopefully improved set of parameters for the experiment to test. This process is repeated until the circuit produces acceptable results.

Figure 13. Representation of a variational quantum circuit optimization scheme.

When performing a VQC, it is necessary to estimate the gradients of the cost function with respect to every parameter. In conventional neural networks, this is typically achieved through backpropagation using analytic procedures. With a VQC, these procedures become excessively complicated, and we are unable to access intermediate quantum states unless we measure them. At present, the accepted approach is referred to as the parameter shift rule [ 73 ], and it requires applying the circuit twice for each parameter and measuring its outcome each time. In contrast, traditional deep learning just requires a single forward and backward pass through the network to collect all of the thousands of gradients. The parameter shift rule can be parallelized over several simulators or quantum devices; however, this might be impractical for a large number of parameters.
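A minimal sketch of the parameter-shift rule for a single parameter (our own illustration, assuming the textbook case of an RY(θ) rotation followed by a ⟨Z⟩ measurement, whose expectation value is cos θ; the helper names are hypothetical):

```python
import numpy as np

def expectation_z(theta):
    """<Z> after RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>; equals cos(theta)."""
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    Z = np.diag([1.0, -1.0])
    return float(psi @ Z @ psi)

def parameter_shift_grad(f, theta, shift=np.pi / 2):
    """Gradient from two extra circuit evaluations at theta +/- shift."""
    return (f(theta + shift) - f(theta - shift)) / 2

theta = 0.7
print(parameter_shift_grad(expectation_z, theta))  # ~ -sin(0.7) = -0.644
print(-np.sin(theta))                              # analytic gradient for comparison
```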

Table 3 presents a comprehensive overview of the current QML algorithms and indicates the quantum routines they are based on or have used.

Table 3. An overview of current quantum machine learning algorithms.

3.3. QML Algorithms

3.3.1. Quantum Support Vector Machines

Support Vector Machines, generally known as SVMs, are a type of supervised machine learning algorithm that may be used to handle problems involving linear discrimination. The approach involves establishing a hyperplane that differentiates between two different classes of feature vectors; this hyperplane then serves as a decision boundary for the categorization of further data. The SVM aims to maximize the distance (margin) between the hyperplane and the support vectors, i.e., the data points located nearest to it. Depending on the kernel employed by the SVM method, the objective function is sometimes convex and sometimes not. Non-convex objective functions tend to trap the optimizer in local optima; hence, the conventional SVM compromises between optimization efficiency and accuracy. QSVM uses Grover’s algorithm as a quantum subroutine for minimization, which makes it possible for non-convex cost functions to converge to their global optimal value [ 79 ]. We describe the quantum SVM algorithm as follows. For clarity, consider linear SVMs, with hyperplanes given below:

θ · x + c = 0.

Recall that c is a constant while x and θ are vectors. If ‖θ‖ is minimized, then the margin may be relied upon to accurately categorize the data. All of this is characterized by the following optimization problem:

min_{θ, c} (1/2)‖θ‖²,

subject to one constraint:

y(i)(θ · x(i) + c) ≥ 1,

for the training data i = {1, …, M} and y(i) ∈ {−1, 1}. Using Lagrange multipliers α(i), the constraint may be included in an objective function, and, as a consequence, the problem may be described as follows:

F(θ, c, α) = (1/2)‖θ‖² − Σᵢ α(i) [y(i)(θ · x(i) + c) − 1].

It is important to note that only the α(i) corresponding to the support vectors x(i) are non-zero. The derivatives listed below are set to zero in order to optimize the objective function F:

∂F/∂θ = θ − Σᵢ α(i) y(i) x(i) = 0,  ∂F/∂c = −Σᵢ α(i) y(i) = 0.

As a result, we can describe the weights as:

θ = Σᵢ α(i) y(i) x(i),

as well as the dual problem as:

max_α Σᵢ α(i) − (1/2) Σᵢ Σⱼ α(i) α(j) y(i) y(j) (x(i) · x(j)),

assuming that:

α(i) ≥ 0,

for every i in the training set i = 1, …, M, as well as:

Σᵢ α(i) y(i) = 0.

It is possible to introduce nonlinearity into the optimization problem by generalizing it to arbitrary kernel functions K(x(i), x(j)). The kernel is used in place of the dot product in the preceding dual problem:

max_α Σᵢ α(i) − (1/2) Σᵢ Σⱼ α(i) α(j) y(i) y(j) K(x(i), x(j)).

For instance, we may define the Gaussian kernel function by:

K(x(i), x(j)) = exp(−‖x(i) − x(j)‖² / (2σ²)).

This necessitates additional Euclidean distance computations. A detailed explanation of each stage of the algorithm is given below:

  • 1. Kernel function and parameter initialization: Each parameter utilized by a kernel function must have its value initialized. Choose a kernel function relevant to the problem at hand and then generate the corresponding kernel matrix;
  • 2. Parameters and classical information represented by quantum states: In this stage, the objective function is segmented and its components are recorded in qubits. Binary strings may be used to represent the conventional data: x → b = (b₁, b₂, …, b_m)ᵀ, (36) with bᵢ ∈ {0, 1} for i = 1, …, m. The binary strings may then be directly converted into k-qubit quantum states: |b₁, b₂, …, b_m⟩, (37) which form a Hilbert space of 2ᵏ dimensions spanned by the basis {|00…0⟩, |10…0⟩, …, |11…1⟩};
  • 3. The quantum minimization subroutine explores the objective function space: Grover’s technique determines the optimal values of α(i) that resolve for θ and c by searching the space of all possible objective function values. By first generating a superposition of all potential inputs to a quantum oracle O expressing the objective function, this procedure achieves a global minimum for the SVM optimization problem. Measuring the output of this subroutine yields the answer with a high degree of probability.

The Grover approach decreases the time complexity of the given SVM optimization problem from O(N) to O(√N), in which N is the training vector’s dimension, and provides a global minimum. The computation of the kernel matrix is one of the most time-consuming aspects of any SVM algorithm, with a computational complexity of O(M²N). The approach has the same limitations as the quantum subroutine GroverOptim, namely that the algorithm may not produce acceptable results in the presence of quantum noise. It is assumed that the objective function of the SVM is provided as an input to the algorithm by a quantum oracle. Another quantum technique for SVM, introduced in [ 4 ], demonstrates an unconstrained exponential speedup, as we shall discuss below.
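As a classical reference for step 1, the following NumPy sketch (our own illustration; the helper gaussian_kernel_matrix is hypothetical) builds the Gaussian kernel matrix defined above for a small set of training vectors:

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_dists = np.sum(X ** 2, axis=1, keepdims=True) \
             + np.sum(X ** 2, axis=1) - 2 * X @ X.T
    return np.exp(-sq_dists / (2 * sigma ** 2))

X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])
K = gaussian_kernel_matrix(X, sigma=1.0)
print(np.round(K, 3))   # symmetric M x M matrix with ones on the diagonal
```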

3.3.2. Quantum Least Square SVM

The quantum variant of the least squares SVM is a quantum machine learning breakthrough in which Rebentrost et al. [ 4 ] obtained a significant exponential speedup in the formulation of the support vector machine. This algorithm transforms the optimization problem into a system of linear equations that includes the kernel matrix. By using the least squares formulation and accelerating the kernel matrix computations, we obtain a significantly more efficient method for solving the system of linear equations. We use quantum advantages in the following areas to improve the efficiency of the SVM:

  • Quantum random access memory (QRAM) data translation: Preparing the collected data as input to the quantum device for computation is among the difficult tasks in QML. QRAM aids in transforming a collection of classical data into its quantum counterpart. QRAM requires O(log₂ d) steps to retrieve data from storage in order to reconstruct a state, with d representing the feature vector’s dimension;
  • Computation of the kernel matrix: The kernel matrix is mostly determined by the evaluation of dot products. Therefore, if we obtain a speedup of the dot product in the quantum approach, this results in an overall speedup of the computation of the kernel matrix. With the use of quantum characteristics, the authors of [ 5 ] provide a quicker method for calculating dot products. A quantum paradigm is also used for the computation of the normalized kernel matrix inverse K⁻¹. As previously said, QRAM requires only O(log₂ d) steps to reconstruct a state; hence, a simple dot product using QRAM requires O(ε⁻¹ log₂ d) steps, where ε represents the desired level of accuracy;
  • Least squares formulation: The quantum implementation of exponentially faster eigenvector computation makes the speedup conceivable during the training phase, for the matrix inversion method and for non-sparse density matrices [ 5 ].

The formulation for the least square SVM is as follows:

using $\hat{A} = A/\mathrm{Tr}(A)$, i.e., the matrix $A$ normalized by its trace, where $\mathrm{Tr}(A)$ denotes the trace of $A$. The aim is to solve the optimization problem $\hat{A}\,|b, \vec{\alpha}\rangle = |\vec{y}\rangle$, where $b$ and $\vec{\alpha}$ are the model parameters and $K$ denotes the kernel matrix.
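For reference, the following is a minimal classical sketch of the same least-squares system that the quantum routine is designed to solve via quantum matrix inversion. It assumes the standard LS-SVM block structure $F = \begin{pmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K + \gamma^{-1} I \end{pmatrix}$ with an RBF kernel; the kernel choice, the regularization parameter `gamma`, and the helper names are illustrative assumptions, not details taken from the text above.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix (illustrative choice)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def ls_svm_train(X, y, gamma=1.0, sigma=1.0):
    """Classically solve the LS-SVM linear system F [b, alpha]^T = [0, y]^T."""
    M = X.shape[0]
    K = rbf_kernel(X, sigma)
    F = np.zeros((M + 1, M + 1))
    F[0, 1:] = 1.0                       # top row: [0, 1^T]
    F[1:, 0] = 1.0                       # first column: [0, 1]
    F[1:, 1:] = K + np.eye(M) / gamma    # kernel block with regularization
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(F, rhs)        # O(M^3) classically; quantum inversion replaces this
    b, alpha = sol[0], sol[1:]
    return b, alpha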

3.3.3. Quantum Linear Regression

The aim of any regression problem is to accurately predict the continuous value of a target variable $y_i \in \mathbb{R}$ from a collection of input features $x^{(1)}, x^{(2)}, \ldots, x^{(N)}$, which may be expressed as an $N$-dimensional input vector $x \in \mathbb{R}^N$. Linear regression assumes that the output is a linear combination of the input features plus an unavoidable error term $\epsilon_i$, as shown below:

The parameters of the model we would like to learn are the values $\theta_i$ associated with each feature, together with the intercept $b$. If the parameters $\theta_i$, $i \in \{1, 2, \ldots, N\}$, are collected into a vector $\theta \in \mathbb{R}^N$, the linear relation from Equation ( 39 ) may be written compactly as follows:

The term $y_i\,|\,x_i$ denotes the value of $y_i$ conditioned on $x_i$. Therefore, $\epsilon_i$ is the component with zero correlation with the input features and, as a result, it cannot be learned. Nevertheless, given $x_i$, the term $\theta^T x_i + b$ can be computed exactly. Assuming that the error $\epsilon_i$ has a normal distribution with zero mean and finite standard deviation $\sigma_i$, the following expression may be written:

and $\theta^T x_i + b$ is constant for a given feature vector $x_i$; hence, we may write:

Given the input features, the target label $y_i$ follows a normal distribution with mean $\theta^T x_i + b$ and standard deviation $\sigma$. As shown below, in linear regression the conditional mean of this distribution is used to make predictions:

Minimizing the squared error term $\epsilon_i$ at every data point allows the model parameters $\theta$ and $b$ to be identified. For notational convenience, the bias $b$ may be absorbed into the parameter vector $\theta$ by associating it with a constant feature equal to 1. This yields the prediction $\hat{y}_i = \theta^T x_i$, where $\theta$ and $x_i$ are now $(N+1)$-dimensional vectors. With this simplification, the system of equations describing the $M$ data points may be expressed in matrix notation as follows:

Equation ( 44 ) may be rewritten using the matrix of input feature vectors $X \in \mathbb{R}^{M \times (N+1)}$ and the prediction vector $\hat{Y} \in \mathbb{R}^M$:

Therefore, if the vector $Y \in \mathbb{R}^M$ collects the actual targets $y_i$ over all $M$ data points, the error vector $\epsilon \in \mathbb{R}^M$ can be computed as follows:

The loss function may then be expressed as the average of the squared prediction errors over all data points:

The parameters are determined by minimizing the loss $L(\theta)$ with respect to $\theta$. The minimum is found by setting the gradient of $L(\theta)$ with respect to $\theta$ equal to the zero vector, as illustrated below:

The matrix $X^T X$ is Hermitian and may therefore be treated as the Hamiltonian of a quantum system. Using the HHL technique, the matrix inversion problem posed by Equation ( 48 ) can be solved to obtain the model parameters $\theta$.
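For comparison, a minimal classical sketch of this solution via the normal equation $\theta = (X^T X)^{-1} X^T Y$ is given below; the HHL-based quantum routine replaces the inversion of $X^T X$. The use of the pseudo-inverse and the helper names are illustrative choices.

```python
import numpy as np

def fit_linear_regression(X, Y):
    """Classical normal-equation solution theta = (X^T X)^{-1} X^T Y.

    X is the M x N feature matrix; a constant column of ones is appended so
    that the bias b is absorbed into theta, as described above. The quantum
    approach replaces the inversion of X^T X with HHL-style matrix inversion.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append the constant feature
    theta = np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ Y    # pseudo-inverse for numerical stability
    return theta

def predict(X, theta):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ theta
```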

3.3.4. Quantum K-Means Clustering

Grover’s search algorithm and the quantum Euclidean distance computation may be combined to provide the quantum equivalent of conventional k-means clustering. The procedure is identical to that of the traditional k-means algorithm, except that quantum routines rather than classical ones are used to execute each step. The steps are detailed below.

  • 1. Initialization: Set the $k$ cluster centroids $u_1, u_2, \ldots, u_k \in \mathbb{R}^n$ using a heuristic comparable to that of the traditional k-means algorithm; for instance, $k$ data points may be randomly selected as the initial centroids;
  • (a) For every data point $x_i \in \mathbb{R}^n$, whose magnitude $\|\vec{x}_i\|_2$ is stored classically and whose unit-norm version $|x_i\rangle$ is stored as a quantum state, the distance to each of the $k$ cluster centroids is computed using the quantum Euclidean distance routine: $d(i,j) = \|x_i - u_j\|^2 = 4Z\,(P(|0\rangle) - 0.5)$, (49) with $j \in \{1, 2, \ldots, k\}$;
  • (b) Apply Grover's search technique to assign every data point $x_i$ to one of the $k$ clusters. The oracle implemented in Grover's search must take the distances $d(i,j)$ and assign the proper cluster $c_i$ according to $c_i = \arg\min_j \|x_i - u_j\|^2$, with $c_i \in \{1, 2, \ldots, k\}$; (50)
  • (c) After every data point $x_i$ has been assigned to its cluster $c_i \in \{1, 2, \ldots, k\}$, the centroid of each cluster is recomputed as the mean $u_j = \frac{1}{N_j}\sum_{c_i = j} x_i$, (51) where $N_j$ is the number of data points assigned to cluster $j$.

The method converges when the cluster assignments no longer change between iterations. Conventional k-means clustering has a per-iteration complexity of $O(MNk)$: the cost of computing the distance from one data point to one centroid is $O(N)$, where $N$ is the number of features; with $k$ clusters this becomes $O(Nk)$ per data point, and $O(MNk)$ over all $M$ data points. The advantage of the quantum k-means implementation is that the per-iteration cost of computing the quantum Euclidean distances is only $O(Mk\log N)$, which matters for a large feature dimension $N$. Note that, for both the conventional and the quantum versions, the cost of assigning every data point to the closest cluster via distance minimization is not taken into account here; in this respect, if built appropriately, Grover's technique used to assign data points to their clusters may give an additional speedup.
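A classical skeleton of this iteration is sketched below, with comments marking where the quantum Euclidean-distance routine and the Grover-based assignment would be substituted; the initialization and convergence test are illustrative choices, not part of the algorithm specification above.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    M, N = X.shape
    centroids = X[rng.choice(M, size=k, replace=False)]   # step 1: random initialization
    labels = np.zeros(M, dtype=int)
    for _ in range(n_iter):
        # step (a): distances d(i, j); the quantum version estimates these with
        # the swap-test-based Euclidean distance routine
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # step (b): assignment c_i = argmin_j d(i, j); Grover-style minimum
        # finding would replace this classical argmin
        labels = np.argmin(d, axis=1)
        # step (c): recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):          # convergence: assignments stable
            break
        centroids = new_centroids
    return centroids, labels
```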

3.3.5. Quantum Principal Component Analysis

Classical principal component analysis (PCA) is a crucial dimensionality reduction technique. PCA works by diagonalizing the covariance matrix of the data, $C = \sum_j \vec{u}_j \vec{u}_j^T$, which summarizes how the various data components are related to one another. Writing the covariance matrix in terms of its eigenvectors and eigenvalues gives $C = \sum_k e_k \vec{v}_k \vec{v}_k^\dagger$, where $e_k$ is the eigenvalue corresponding to the eigenvector $\vec{v}_k$. The principal components are the few eigenvectors whose eigenvalues are large compared to the rest, and each principal component is regarded as a new feature vector.

The traditional PCA technique has a runtime complexity of $O(d^2)$, where $d$ is the dimension of the (Hilbert) space. For quantum PCA [ 14 ], a randomly selected data vector is transferred into a quantum state using QRAM. The density matrix of the resulting state is $\rho = \frac{1}{N}\sum_j |u_j\rangle\langle u_j|$, with $N$ the cardinality of the input vector collection. Applying density matrix exponentiation, together with repeated sampling of the data and the quantum phase estimation technique, the input vectors can be decomposed into their principal components. The quantum PCA method has a runtime complexity of $O((\log d)^2)$.
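For comparison, a minimal classical PCA sketch via eigendecomposition of the covariance matrix is given below; quantum PCA replaces this eigendecomposition with density-matrix exponentiation and quantum phase estimation. The function name and the number of components are illustrative.

```python
import numpy as np

def pca(X, n_components=2):
    """Classical PCA: project X onto its leading principal components."""
    Xc = X - X.mean(axis=0)                       # center the data
    C = (Xc.T @ Xc) / (Xc.shape[0] - 1)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
    components = eigvecs[:, order[:n_components]] # keep the largest eigenvalues
    return Xc @ components                        # projection onto the principal components
```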

4. ML vs. QML Benchmarks

In this section, we implement the Variational Quantum Classifier and compare it with several classical classifiers; next, we implement the QSVM and compare it with the classical SVM; finally, the implementation of the Quantum Convolutional Neural Network is discussed in the last subsection.

4.1. Variational Quantum Classifier

The Variational Quantum Classifier (VQC) is based on a function $f(x, \theta) = y$ that can be implemented on a quantum computer using a circuit $S_x$ to encode the input data $x$ into a quantum state (in particular, into the amplitudes of the state), followed by a parameterized quantum circuit $U_\theta$ and, finally, a single-qubit measurement. This measurement yields the likelihood of the VQC predicting '0' or '1', which can be used as the prediction of a binary classifier. The circuit parameters $\theta$ are trainable, and a variational technique may be employed to train them [ 77 ]. The four steps of the VQC are shown in Figure 14 and explained below.

Figure 14. The classification with the quantum classifier consists of four phases, each represented by a distinct color, and can be examined from three different perspectives: a formal mathematical diagram, a quantum circuit diagram, and a graphical neural network diagram [ 77 ].

  • State preparation: To encode classical data into quantum states, we use particular operations that allow us to work with the data in a quantum circuit. As mentioned earlier, quantum encoding is one of these methods; it represents classical data as a quantum state in Hilbert space through a quantum feature map. Recall that a feature map is a mathematical mapping that embeds the data into a higher dimensional space, in our case the space of quantum states. It is similar to a variational circuit whose parameters are determined by the input data. It is essential to emphasize that a variational circuit depends on parameters that can be optimized using classical methods;
  • The model circuit: The next step is the model circuit, i.e., the classifier itself. A parameterized unitary operator $U_\theta$ is applied to the feature vector $\varphi(x)$, which is now a state vector of an n-qubit system in Hilbert space, producing a new vector $\varphi'$. The model circuit is composed of gates that transform the input state; these gates are built on unitary operations and depend on external, adjustable parameters. In other words, $U_\theta$, which consists of a series of unitary gates, maps the prepared state $|\varphi(x)\rangle$ into another vector $\varphi'$;
  • Measurement: Measurements are taken to extract information from the quantum system. Although a quantum system has an infinite number of potential states, only a limited amount of information can be recovered from a quantum measurement. Note that the number of measured outcomes equals the number of qubits;
  • Post-processing: Finally, the results are post-processed, including a learnable bias parameter and a step function that maps the result to the outcome 0 or 1.

4.1.1. Implementation

The architecture that we have implemented is inspired by [ 77 ], as already mentioned. The goal is to encode the real input vectors as amplitude vectors (amplitude encoding) and to train the model to distinguish the first two classes of the Iris dataset.

  • Dataset: Our classification dataset is made up of three sorts of irises (Setosa, Versicolour, and Virginica) and contains four features: sepal length, sepal width, petal length, and petal width. In our implementation, we used the first two classes:
  • - VQC: Variational quantum classifiers commonly define a “layer” whose fundamental circuit design is replicated to build the variational circuit. Our circuit layer is composed of an arbitrary (trainable) rotation on each qubit and a CNOT gate that entangles each qubit with its neighbor; the classical data were encoded as amplitude vectors via amplitude encoding (a minimal sketch of this setup is given after this list);
  • - SVC: A support vector classifier (SVC) implemented with the sklearn Python library;
  • - Decision Tree: A non-parametric learning algorithm which predicts the target variable by learning decision rules;
  • - Naive Bayes: Naive Bayes classifiers apply Bayes’ theorem under the assumption of conditional independence between each pair of features;
  • Experimental Environment: We used Jupyter Notebook and PennyLane [ 80 ] (a cross-platform Python framework for programming quantum computers) to develop all the code, and executed it on IBM Quantum simulators [ 81 ]. We implemented three classical Scikit-learn [ 82 ] algorithms to accomplish the same classification task on a conventional computer, in order to compare their performance with that of the VQC. The final value of the cost function and the test accuracy were used as metrics for evaluating the implemented algorithms.
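The following is a minimal PennyLane sketch of such a variational classifier, in the spirit of the setup described above (amplitude encoding of the four Iris features into two qubits, layered rotations with CNOT entanglement, a learnable bias, and a squared loss). The number of layers, the optimizer, the step size, and the number of iterations are illustrative assumptions, not the exact configuration used in our experiments.

```python
import pennylane as qml
from pennylane import numpy as np
from sklearn.datasets import load_iris

n_qubits, n_layers = 2, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def circuit(weights, x):
    # Amplitude encoding: the 4 Iris features become the amplitudes of 2 qubits
    qml.AmplitudeEmbedding(x, wires=range(n_qubits), normalize=True)
    for layer_weights in weights:
        for w in range(n_qubits):
            qml.Rot(*layer_weights[w], wires=w)   # trainable rotation on each qubit
        qml.CNOT(wires=[0, 1])                    # entangling gate
    return qml.expval(qml.PauliZ(0))

def classifier(weights, bias, x):
    return circuit(weights, x) + bias             # post-processing with a learnable bias

def cost(weights, bias, X, Y):
    # Squared loss between labels in {-1, +1} and the circuit outputs
    return sum((y - classifier(weights, bias, x)) ** 2 for x, y in zip(X, Y)) / len(X)

# First two Iris classes, labels mapped to -1 / +1
data = load_iris()
X = np.array(data.data[:100], requires_grad=False)
Y = np.array(np.where(data.target[:100] == 0, -1.0, 1.0), requires_grad=False)

weights = 0.01 * np.random.randn(n_layers, n_qubits, 3, requires_grad=True)
bias = np.array(0.0, requires_grad=True)
opt = qml.NesterovMomentumOptimizer(stepsize=0.5)
for _ in range(20):
    weights, bias, _, _ = opt.step(cost, weights, bias, X, Y)

preds = np.array([np.sign(classifier(weights, bias, x)) for x in X])
print("training accuracy:", np.mean(preds == Y))
```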

4.1.2. Results

As can be plainly observed from Table 4, our VQC performs slightly better than the best classical result obtained with the SVC, in terms of both accuracy and cost function value. The probability of the classical models guessing the correct class is below 0.96, whereas the quantum classifier reaches an accuracy of 1 with a cost function value of 0.23, indicating that the quantum classifier made smaller errors on the data than the SVC and the Decision Tree.

Table 4. Results of the classification algorithms for the Iris dataset.

One may object that the effort required to employ quantum algorithms is excessive. However, the whole process must be considered: we reduced the dimension of the dataset so that it could be handled with only two qubits, which in turn allows a smaller circuit depth for preparing the quantum states. Despite these restrictions, the results are promising.

4.2. SVM vs. QSVM

Quantum Support Vector Machine (QSVM) is a quantum variant of the standard SVM algorithm that uses quantum principles to perform its computations. QSVM exploits quantum hardware and software to enhance the performance of classical SVM algorithms running on classical computers with CPUs or GPUs. The first step of QSVM is to encode the classical data, which carry specific attributes (class, features, dimension, etc.) describing what we wish to classify. Current quantum hardware has not yet reached full capability, as only a limited number of qubits is accessible; therefore, the number of features must be reduced to make the data compatible with the available qubits. Principal component analysis (PCA) is typically employed in machine learning for this kind of reduction. Next, the classical data must be mapped to quantum inputs so that they can be processed by the quantum computer; this mapping is performed by a quantum feature map acting on the classical data. Finally, the machine learning algorithm itself needs to be implemented to attain the best classification result. This implementation covers several steps, including splitting the data into training and testing sets, determining the number of qubits to be employed, and defining the data classes. Additionally, the algorithm and its parameters must be specified, such as the number of iterations and the depth of the circuit.
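To make the kernel-estimation step concrete, the following is a minimal sketch of a quantum kernel computed with PennyLane and fed to a classical SVC with a precomputed kernel. The angle-embedding feature map, the two-qubit register, and the placeholder names `X_train`, `y_train` are illustrative assumptions, not the exact Qiskit-based setup used in our experiments.

```python
import pennylane as qml
from pennylane import numpy as np
from sklearn.svm import SVC

n_qubits = 2  # data reduced to 2 features (e.g. via PCA), one qubit per feature
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def kernel_circuit(x1, x2):
    # Feature map on x1 followed by the adjoint feature map on x2; the probability
    # of measuring |0...0> estimates the kernel value |<phi(x2)|phi(x1)>|^2
    qml.AngleEmbedding(x1, wires=range(n_qubits))
    qml.adjoint(qml.AngleEmbedding)(x2, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

def quantum_kernel(A, B):
    # Gram matrix between two sets of samples, entry by entry
    return np.array([[kernel_circuit(a, b)[0] for b in B] for a in A])

# Usage with a classical SVM on the precomputed quantum kernel (placeholders):
# K_train = quantum_kernel(X_train, X_train)
# clf = SVC(kernel="precomputed").fit(K_train, y_train)
# K_test = quantum_kernel(X_test, X_train)
# accuracy = clf.score(K_test, y_test)
```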

4.2.1. Implementation

The objective of this implementation is to compare the performance of the classical SVM with that of the QSVM, executed on a quantum computer using a dataset that represents the real world.

  • Breast cancer dataset: The Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the UCI machine learning repository is a classification dataset that includes breast cancer case metrics. There are two categories, benign and malignant. The dataset contains 569 records described by 31 characteristics of a tumor, among which are the mean radius, mean perimeter, and mean texture;

Figure 15. The set of data (20 total) after performing PCA.

The number of qubits needed is proportional to the data dimensionality. Data are encoded by setting the angles of unitary gates to specific values. QSVM uses a quantum processor to estimate the kernel in the feature space; during the training step, the kernel is estimated and the support vectors are obtained. To apply the quantum kernel method directly, the training and test data must be encoded into proper quantum states. Using amplitude encoding, the training samples may be encoded in a superposition in one register, while the test examples are encoded in a second register.

4.2.2. Results

To compare their performance, we implemented the classical SVM on the breast cancer dataset from the Scikit-learn library on a classical computer, and the QSVM on a quantum simulator in the IBM cloud using its quantum machine learning library Qiskit [ 81 ].

As we can observe from Figure 16, the QSVM predicted the classes of our test data accurately, with an accuracy of around 90%. Notice that the diagonal of the kernel matrix contains only black squares, meaning that each point is at zero distance from itself. Moreover, what the kernel matrices in Figure 16 and Figure 17 display are the kernel distances computed in the higher dimensional space. We may therefore say that, with QSVM, our dataset becomes easy to classify thanks to the quantum feature map, which encodes the data in a higher dimensional space.

Figure 16. Kernel matrix of QSVM, with an accuracy of around 90%.

Figure 17. Kernel matrix of SVM, with an accuracy of around 85%.

4.3. CNN vs. QCNN

Convolutional Neural Networks (CNNs) are a prominent model in computer vision that has shown great potential for various machine learning tasks, notably in the area of image recognition. The capacity of such networks to extract features from the input in a typical hierarchical structure provides most of their advantages. Several transformation layers, mainly the convolutional layer that gives the model its name, are employed to extract these features. To run a CNN-style model on a quantum computer, we implement another kind of transformation layer, named the quanvolutional layer or quantum convolution, used in the Quanvolutional Neural Network (QNN) model introduced by Henderson et al. [ 29 ]. Quanvolutional layers act on the input data by encoding it through a series of arbitrary quantum circuits, in a manner analogous to random convolutional filter layers. In this subsection, we briefly review the architecture of Convolutional Neural Networks (CNNs) and then that of Quanvolutional Neural Networks (QNNs), also known as Quantum Convolutional Neural Networks (QCNNs). Then, we evaluate the performance advantage of quantum transformations by comparing the CNN and QNN models on the MNIST dataset.

Convolutional neural networks (CNNs) are a specialized sort of neural network, built primarily for time series or image processing. They are presently the most frequently used models for image recognition applications, and their abilities have been applied in several fields such as gravitational wave detection or autonomous vision. Despite these advances, CNNs suffer from a computational limitation which makes deep CNNs extremely expensive in practice. As Figure 18 illustrates, a convolutional neural network generally consists of three components, even though the architectural implementation varies considerably:

Figure 18. Representation of a CNN’s architecture. Source: Mathworks.

  • 1. Input: The most popular input is an image, although significant work has also been carried out on so-called 3D convolutional neural networks, which can handle either volumetric data (three spatial dimensions) or videos (two spatial dimensions + one temporal dimension). For the majority of implementations, the input needs to be adjusted to correspond to the specifics of the CNN used. This includes cropping, lowering the size of the image, identifying a particular region of interest, and also normalizing pixel values to specified ranges. Images, and more generally the layers of a network, may be represented as tensors. A tensor is a generalization of a matrix to extra dimensions. For example, an image of height H and width W may be viewed as a matrix in $\mathbb{R}^{H \times W}$, wherein every pixel represents a greyscale value between 0 and 255. Furthermore, the three RGB (Red Green Blue) color channels should be taken into account, simply by stacking the matrix three times, once per color. A whole image is thus viewed as a three-dimensional tensor in $\mathbb{R}^{H \times W \times D}$, wherein D represents the number of channels;
  • 2. Feature extraction layers, which alternate the following operations:
  • - Convolution Layer: Each $l$-th layer is convolved with a collection of filters named kernels, and the result of this operation forms the $(l+1)$-th layer. The convolution with a single kernel may be viewed as a feature detector that scans all sections of the input. If the feature described by a kernel, for example a vertical edge, is present in some area of the input, the corresponding location of the output takes a high value. The output is known as the feature map of the convolution;
  • - Activation Function: Just as in ordinary neural networks, we add certain nonlinearities, also called activation functions, which are necessary for a neural network to be capable of learning arbitrary functions. In CNN implementations, every convolution is frequently followed by a ReLU (Rectified Linear Unit), a basic function that sets all negative values of the output to zero and leaves the positive values unchanged;
  • - Pooling Layer: A downsampling step that reduces the dimensions of the layer, making the computation cheaper. It also gives the CNN the capability to learn representations invariant to small translations. In almost all cases, either Maximum Pooling or Average Pooling is used: Maximum Pooling replaces a subsection of $P \times P$ elements with its largest value, while Average Pooling replaces it with the average of its values. Note that the value of a pixel relates to how strongly a certain feature is present in the preceding convolution layer;
  • 3. Classification/Fully Connected Layer: After a given set of convolution layers, the input has been processed sufficiently for a fully connected network to be deployed. The weights link every input to every output, where the inputs are all components of the preceding layer. The final layer must include a single node for each possible label, and the node value may be read as the probability that the input image belongs to the corresponding class (a minimal sketch of such an architecture is given below this list).
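A minimal PyTorch sketch of such a convolution–activation–pooling–fully-connected stack is shown below; the layer sizes and the 28 × 28 grayscale input are illustrative assumptions only, not the architecture used later in our experiments.

```python
import torch.nn as nn

# Minimal CNN: convolution -> ReLU -> pooling, repeated, then a fully connected
# classifier with one output node per class (10 here, illustrative).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 5 * 5, 10),   # spatial size 5x5 for a 28x28 grayscale input
    nn.Softmax(dim=1),           # class probabilities
)
```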

The Quanvolutional Neural Networks (QNNs) are essentially a variant of classical convolutional neural networks with an extra transformation layer, known as the quanvolutional layer or quantum convolution, composed of quanvolutional filters [ 29 ]. When applied to an input data tensor, each of these filters generates a feature map by transforming spatially local subsections of the input. However, unlike the element-by-element matrix multiplication performed by a traditional convolutional filter, a quanvolutional filter transforms the input data using a quantum circuit, which may be structured or random. In our implementation, for simplicity and to create a baseline, we employ random quantum circuits for the quanvolutional filters, as proposed in [ 29 ], rather than circuits with a specific structure.

This technique for converting classical data using quanvolutional filters can be formalized as follows and illustrated in Figure 19 :

Figure 19. ( A ) A quanvolutional layer inside a network architecture. The quanvolutional layer has a series of quanvolutional filters (three in this case) that transform the input data into different output feature maps. ( B ) The transformation of classical data at the input and output of an arbitrary quantum circuit within the quanvolutional filter [ 29 ].

  • 1. Let us begin with a basic quanvolutional filter. This filter employs a random quantum circuit $q$ that takes as input subsections of the images in the dataset. Each input is denoted by $u_x$, with each $u_x$ being a two-dimensional matrix of size $n$-by-$n$, where $n$ is greater than 1;
  • 2. Although there are various methods for “encoding” $u_x$ as an initial state of $q$, we choose one encoding method $e$ for every quanvolutional filter and denote the resulting state by $i_x = e(u_x)$;
  • 3. The quantum circuit $q$ is then applied to the initial state $i_x$, and the outcome of the quantum computation is the quantum state $o_x = q(i_x) = q(e(u_x))$;
  • 4. To decode the quantum state $o_x$, we use a decoding method $d$ that converts the quantum output into a classical output through a set of measurements, which guarantees that the output of the quanvolutional filter is of the same type as the output of a simple classical convolution. The decoded state is $f_x = d(o_x) = d(q(e(u_x)))$, in which $f_x$ is a scalar value;
  • 5. Lastly, the full transformation performed by the ’quanvolutional filter transformation’ [ 29 ] on a data point $u_x$ is denoted by $Q$ and described as $f_x = Q(u_x, e, q, d)$. Figure 19 B represents one basic quanvolutional filter, showing the encoding, the applied circuit and the decoding process.

4.3.1. Implementation

The aim of this implementation is to demonstrate the practicality of including quanvolutional layers in a classical CNN model, the feasibility of employing this quanvolutional technique on real-world datasets, and the usefulness of the features created by the quanvolutional transformation.

  • Dataset: We used a subset of the MNIST (Modified National Institute of Standards and Technology) dataset, which includes 70,000 handwritten digits as 28-by-28 pixel grayscale images;
  • - The CNN model: We used a simple, purely classical convolutional neural network: a fully connected layer with ten output nodes followed by a final softmax activation function;
  • - The QNN model: A CNN with one quanvolutional layer is the simplest basic quanvolutional neural network. The single quantum layer comes first, and the rest of the model is identical to the traditional CNN model;
  • Experimental Environment: The PennyLane library [ 80 ] was used to create the implementation, which was then run on the IBM quantum computer simulator QASM;
  • - A quantum circuit is built from a small section of the input image, in our case a 2 × 2 square, using a rotation encoding layer of $R_y$ gates (with angles scaled by a factor $\pi$);
  • - The system performs a quantum computation associated with a unitary U. A variational quantum circuit or, more generally, a randomized circuit can generate this unitary;
  • - The quantum system is measured in the computational basis, which gives a set of four classical expectation values. The measurement outcomes can then be post-processed classically;
  • - Each value is mapped to a distinct channel of one output pixel, as in a classical convolution layer;
  • - By repeating the procedure on different patches, the whole input image can be scanned, yielding a multi-channel output object;
  • Other quantum or classical layers can be added after the quanvolutional layer (a minimal sketch of this quanvolution step is given below this list).
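The following is a minimal PennyLane sketch of the quanvolution step described above (2 × 2 patches, $R_y$ encoding scaled by $\pi$, one random layer, four expectation values mapped to four channels). The single random layer, the stride of 2, the 28 × 28 input size, and the helper names `quanv_circuit` / `quanvolution` are illustrative assumptions rather than the authors' exact code.

```python
import pennylane as qml
from pennylane import numpy as np

n_layers = 1
dev = qml.device("default.qubit", wires=4)
rand_params = np.random.uniform(high=2 * np.pi, size=(n_layers, 4))

@qml.qnode(dev)
def quanv_circuit(patch):
    # Encode the four pixel values of a 2x2 patch as Ry rotations (angles scaled by pi)
    for j in range(4):
        qml.RY(np.pi * patch[j], wires=j)
    # Random variational circuit acting as the quanvolutional filter
    qml.RandomLayers(rand_params, wires=list(range(4)))
    # Four expectation values -> four output channels
    return [qml.expval(qml.PauliZ(j)) for j in range(4)]

def quanvolution(image):
    """Apply the 2x2 quanvolutional filter with stride 2 to a 28x28 image."""
    out = np.zeros((14, 14, 4))
    for r in range(0, 28, 2):
        for c in range(0, 28, 2):
            patch = [image[r, c], image[r, c + 1],
                     image[r + 1, c], image[r + 1, c + 1]]
            results = quanv_circuit(patch)
            for ch in range(4):
                out[r // 2, c // 2, ch] = results[ch]
    return out
```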

4.3.2. Results

Before analyzing the performance of the QNN model against the traditional CNN, we should first confirm that the model works as intended: the addition of the quanvolutional layer results in a higher accuracy, and the more training iterations there are, the more accurate the model becomes. The value of the loss function for the QNN model is quite low, which shows that our model makes few errors during testing. As shown in Table 5 below, the QNN has a slightly higher accuracy than the CNN, which indicates that our QNN model predicted well on test data not used during training. We can conclude that the generated quantum features were effective in constructing features of the MNIST dataset for classification purposes.

Table 5. Results of the benchmarking experiments.

In Figure 20, we clearly observe the reduction in resolution and certain local deformations produced by our quanvolutional layer. In addition, the general shape of the input image is preserved, as expected for a convolution layer.

Figure 20. Under each input, the four output channels produced by our quantum convolution are shown as grayscale images.

Figure 21 displays the outcomes of this implementation. Our QNN model typically provides superior learning performance compared to the CNN model. The presented results suggest that the quantum convolution layer can boost learning performance.

Figure 21. Comparative performance of the QCNN model and the CNN model.

5. General Discussion

As shown in the results above, the three quantum algorithms outperformed their classical counterparts in terms of accuracy and cost function value. However, it is challenging to affirm in practice that a QML algorithm will always be better than a classical machine learning algorithm, since quantum computing is still at the stage of noisy intermediate-scale quantum (NISQ) and small-scale devices, which restricts us to a limited number of qubits and few features; processing large datasets on quantum devices is impracticable because a large amount of important information is lost. QML faces a number of challenges that need to be solved at both the hardware and software levels. Quantum computers with a high qubit count, and interface devices such as quantum random access memory (qRAM) that enable classical information to be stored in quantum states, represent the biggest and most significant challenges. Small-scale quantum computers with 50–150 qubits are commonly accessible through quantum cloud computing, and their hardware differs: there are ion-trap devices (IonQ) and superconducting devices (IBM or Rigetti), as well as devices based on topological qubits, NV centers (diamond), photons, quantum dots and neutral atoms. When it comes to physically implementing quantum devices, each company settles on one or more technologies; each has benefits and drawbacks, and there is no consensus on or dominance of one in the literature. However, due to the size of these devices, the complexity of their implementation, and their ongoing development, everyday access to them is not yet feasible. An upgrade in quantum hardware will enable an improvement in the performance of the applied algorithms. Because the number of possible output labels is proportional to the number of qubits in the device, most of the algorithms used, such as image classification with the QCNN, are constrained by the number of qubits. A vast selection of quantum simulators is also available on many platforms, which enables testing quantum algorithms as if on real devices and even imitating their noise.

6. Conclusions and Perspectives

This article reviews a selection of the most recent research on QML algorithms for a variety of problems where they are more accurate and effective than their counterparts in classical computing, focusing especially on the quantum variants of certain supervised machine learning algorithms that rely on the VQC. In addition, we discuss numerous ways of mapping data into quantum computers, and we cover quantum machine learning building blocks such as quantum subroutines. We then compare the performance of several QML algorithms, namely QSVM, VQC, and QNN, with that of their classical counterparts.

The QSVM algorithm exceeded the classical SVM in performance and speedup, although only when run on a small subset of the dataset. Most quantum machine learning techniques are at some point restricted by the absence of an appropriate quantum RAM (QRAM), which facilitates the mapping of classical input to a quantum computer. The number of practical machine learning algorithms in the area of QML is expected to rise as QRAM implementations reach a stable state.

The VQC obtains a clear advantage over certain classical classifiers when implemented on a quantum computer with a small circuit depth, employing only two qubits to encode a subset of the dataset. Furthermore, we aim for similar results on larger datasets once a larger number of qubits becomes available.

The results produced by the QNN demonstrate that the model can be a solution for several classification tasks, and that it is a more efficient and effective learning model when combined with a classical CNN model. Moreover, the QNN model may also deliver highly efficient results in much more sophisticated and large-scale training on quantum computers of the NISQ (Noisy Intermediate-Scale Quantum) era.

In conclusion, comprehensive benchmarks on large datasets are needed to systematically examine the influence of growing input data. In addition, in the near term, quantum encoding (feature maps) is a strong and theoretically interesting approach to consider in order to successfully implement quantum machine learning algorithms.

As future work, we are currently developing new quantum machine learning algorithms able to handle large real-world datasets on currently available quantum computers. Our next objective is to examine the implementation of classical machine learning models on quantum computers.

Funding Statement

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Machine Learning and image analysis towards improved energy management in Industry 4.0: a practical case study on quality control

Original Article | Open access | Published: 13 May 2024 | Volume 17, article number 48 (2024)


  • Mattia Casini 1 ,
  • Paolo De Angelis 1 ,
  • Marco Porrati 2 ,
  • Paolo Vigo 1 ,
  • Matteo Fasano 1 ,
  • Eliodoro Chiavazzo 1 &
  • Luca Bergamasco   ORCID: orcid.org/0000-0001-6130-9544 1  


With the advent of Industry 4.0, Artificial Intelligence (AI) has created a favorable environment for the digitalization of manufacturing and processing, helping industries to automate and optimize operations. In this work, we focus on a practical case study of a brake caliper quality control operation, which is usually accomplished by human inspection and requires a dedicated handling system, with a slow production rate and thus inefficient energy usage. We report on a developed Machine Learning (ML) methodology, based on Deep Convolutional Neural Networks (D-CNNs), to automatically extract information from images and automate the process. A complete workflow has been developed on the target industrial test case. In order to find the best compromise between accuracy and computational demand of the model, several D-CNN architectures have been tested. The results show that a judicious choice of the ML model, with proper training, allows fast and accurate quality control; thus, the proposed workflow could be implemented for an ML-powered version of the considered problem. This would eventually enable a better management of the available resources, in terms of time consumption and energy usage.


Introduction

An efficient use of energy resources in industry is key for a sustainable future (Bilgen, 2014 ; Ocampo-Martinez et al., 2019 ). The advent of Industry 4.0, and of Artificial Intelligence, have created a favorable context for the digitalisation of manufacturing processes. In this view, Machine Learning (ML) techniques have the potential for assisting industries in a better and smart usage of the available data, helping to automate and improve operations (Narciso & Martins, 2020 ; Mazzei & Ramjattan, 2022 ). For example, ML tools can be used to analyze sensor data from industrial equipment for predictive maintenance (Carvalho et al., 2019 ; Dalzochio et al., 2020 ), which allows identification of potential failures in advance, and thus to a better planning of maintenance operations with reduced downtime. Similarly, energy consumption optimization (Shen et al., 2020 ; Qin et al., 2020 ) can be achieved via ML-enabled analysis of available consumption data, with consequent adjustments of the operating parameters, schedules, or configurations to minimize energy consumption while maintaining an optimal production efficiency. Energy consumption forecast (Liu et al., 2019 ; Zhang et al., 2018 ) can also be improved, especially in industrial plants relying on renewable energy sources (Bologna et al., 2020 ; Ismail et al., 2021 ), by analysis of historical data on weather patterns and forecast, to optimize the usage of energy resources, avoid energy peaks, and leverage alternative energy sources or storage systems (Li & Zheng, 2016 ; Ribezzo et al., 2022 ; Fasano et al., 2019 ; Trezza et al., 2022 ; Mishra et al., 2023 ). Finally, ML tools can also serve for fault or anomaly detection (Angelopoulos et al., 2019 ; Md et al., 2022 ), which allows prompt corrective actions to optimize energy usage and prevent energy inefficiencies. Within this context, ML techniques for image analysis (Casini et al., 2024 ) are also gaining increasing interest (Chen et al., 2023 ), for their application to e.g. materials design and optimization (Choudhury, 2021 ), quality control (Badmos et al., 2020 ), process monitoring (Ho et al., 2021 ), or detection of machine failures by converting time series data from sensors to 2D images (Wen et al., 2017 ).

Incorporating digitalisation and ML techniques into Industry 4.0 has led to significant energy savings (Maggiore et al., 2021 ; Nota et al., 2020 ). Projects adopting these technologies can achieve an average of 15% to 25% improvement in energy efficiency in the processes where they were implemented (Arana-Landín et al., 2023 ). For instance, in predictive maintenance, ML can reduce energy consumption by optimizing the operation of machinery (Agrawal et al., 2023 ; Pan et al., 2024 ). In process optimization, ML algorithms can improve energy efficiency by 10-20% by analyzing and adjusting machine operations for optimal performance, thereby reducing unnecessary energy usage (Leong et al., 2020 ). Furthermore, the implementation of ML algorithms for optimal control can lead to energy savings of 30%, because these systems can make real-time adjustments to production lines, ensuring that machines operate at peak energy efficiency (Rahul & Chiddarwar, 2023 ).

In automotive manufacturing, ML-driven quality control can lead to energy savings by reducing the need for redoing parts or running inefficient production cycles (Vater et al., 2019 ). In high-volume production environments such as consumer electronics, novel computer-based vision models for automated detection and classification of damaged packages from intact packages can speed up operations and reduce waste (Shahin et al., 2023 ). In heavy industries like steel or chemical manufacturing, ML can optimize the energy consumption of large machinery. By predicting the optimal operating conditions and maintenance schedules, these systems can save energy costs (Mypati et al., 2023 ). Compressed air is one of the most energy-intensive processes in manufacturing. ML can optimize the performance of these systems, potentially leading to energy savings by continuously monitoring and adjusting the air compressors for peak efficiency, avoiding energy losses due to leaks or inefficient operation (Benedetti et al., 2019 ). ML can also contribute to reducing energy consumption and minimizing incorrectly produced parts in polymer processing enterprises (Willenbacher et al., 2021 ).

Here we focus on a practical industrial case study of brake caliper processing. In detail, we focus on the quality control operation, which is typically accomplished by human visual inspection and requires a dedicated handling system. This eventually implies a slower production rate and inefficient energy usage. We thus propose the integration of an ML-based system to automatically perform the quality control operation, without the need for a dedicated handling system and thus with reduced operation time. To this end, we rely on ML tools able to analyze and extract information from images, that is, deep convolutional neural networks, D-CNNs (Alzubaidi et al., 2021 ; Chai et al., 2021 ).

Figure 1. Sample 3D model (GrabCAD) of the considered brake caliper: (a) part without defects, and (b) part with three sample defects, namely a scratch, a partially missing letter in the logo, and a circular painting defect (shown by the yellow squares, from left to right respectively).

A complete workflow for the purpose has been developed and tested on a real industrial test case. This includes: a dedicated pre-processing of the brake caliper images, their labelling and analysis using two dedicated D-CNN architectures (one for background removal, and one for defect identification), and post-processing and analysis of the neural network output. Several different D-CNN architectures have been tested, in order to find the best model in terms of accuracy and computational demand. The results show that a judicious choice of the ML model, with proper training, allows fast and accurate recognition of possible defects. The best-performing models, indeed, reach over 98% accuracy on the target criteria for quality control, and take only a few seconds to analyze each image. These results make the proposed workflow compliant with typical industrial expectations; therefore, in perspective, it could be implemented for an ML-powered version of the considered industrial problem. This would eventually allow better performance of the manufacturing process and, ultimately, a better management of the available resources in terms of time consumption and energy expense.

Figure 2. Different neural network architectures: convolutional encoder (a) and encoder-decoder (b).

The industrial quality control process that we target is the visual inspection of manufactured components, to verify the absence of possible defects. Due to industrial confidentiality reasons, a representative open-source 3D geometry (GrabCAD ) of the considered parts, similar to the original one, is shown in Fig. 1 . For illustrative purposes, the clean geometry without defects (Fig.  1 (a)) is compared to the geometry with three possible sample defects, namely: a scratch on the surface of the brake caliper, a partially missing letter in the logo, and a circular painting defect (highlighted by the yellow squares, from left to right respectively, in Fig.  1 (b)). Note that, one or multiple defects may be present on the geometry, and that other types of defects may also be considered.

Within the industrial production line, this quality control is typically time consuming, and requires a dedicated handling system with the associated slow production rate and energy inefficiencies. Thus, we developed a methodology to achieve an ML-powered version of the control process. The method relies on data analysis and, in particular, on information extraction from images of the brake calipers via Deep Convolutional Neural Networks, D-CNNs (Alzubaidi et al., 2021 ). The designed workflow for defect recognition is implemented in the following two steps: 1) removal of the background from the image of the caliper, in order to reduce noise and irrelevant features in the image, ultimately rendering the algorithms more flexible with respect to the background environment; 2) analysis of the geometry of the caliper to identify the different possible defects. These two serial steps are accomplished via two different and dedicated neural networks, whose architecture is discussed in the next section.

Convolutional Neural Networks (CNNs) pertain to a particular class of deep neural networks for information extraction from images. The feature extraction is accomplished via convolution operations; thus, the algorithms receive an image as an input, analyze it across several (deep) neural layers to identify target features, and provide the obtained information as an output (Casini et al., 2024 ). Regarding this latter output, different formats can be retrieved based on the considered architecture of the neural network. For a numerical data output, such as that required to obtain a classification of the content of an image (Bhatt et al., 2021 ), e.g. correct or defective caliper in our case, a typical layout of the network involving a convolutional backbone, and a fully-connected network can be adopted (see Fig. 2 (a)). On the other hand, if the required output is still an image, a more complex architecture with a convolutional backbone (encoder) and a deconvolutional head (decoder) can be used (see Fig. 2 (b)).

As previously introduced, our workflow targets the analysis of the brake calipers in a two-step procedure: first, the removal of the background from the input image (e.g. Fig. 1 ); second, the geometry of the caliper is analyzed and the part is classified as acceptable or not depending on the absence or presence of any defect, respectively. Thus, in the first step of the procedure, a dedicated encoder-decoder network (Minaee et al., 2021 ) is adopted to classify the pixels in the input image as brake or background. The output of this model will then be a new version of the input image, where the background pixels are blacked. This helps the algorithms in the subsequent analysis to achieve a better performance, and to avoid bias due to possible different environments in the input image. In the second step of the workflow, a dedicated encoder architecture is adopted. Here, the previous background-filtered image is fed to the convolutional network, and the geometry of the caliper is analyzed to spot possible defects and thus classify the part as acceptable or not. In this work, both deep learning models are supervised , that is, the algorithms are trained with the help of human-labeled data (LeCun et al., 2015 ). Particularly, the first algorithm for background removal is fed with the original image as well as with a ground truth (i.e. a binary image, also called mask , consisting of black and white pixels) which instructs the algorithm to learn which pixels pertain to the brake and which to the background. This latter task is usually called semantic segmentation in Machine Learning and Deep Learning (Géron, 2022 ). Analogously, the second algorithm is fed with the original image (without the background) along with an associated mask, which serves the neural networks with proper instructions to identify possible defects on the target geometry. The required pre-processing of the input images, as well as their use for training and validation of the developed algorithms, are explained in the next sections.
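For illustration only, a minimal sketch of this two-stage inference pipeline is given below, using off-the-shelf torchvision models as stand-ins (torchvision's DeepLabV3, rather than the DeepLabV3+ variant adopted later, and a ResNet101 classifier). The model choices, class counts and the `inspect` helper are assumptions for the sketch, not the exact implementation developed in this work.

```python
import torch
import torchvision

# Stage 1: semantic segmentation separating caliper pixels from background
# (num_classes=2 -> {background, caliper}).
segmenter = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=2)
# Stage 2: binary classification of the background-filtered image
# (acceptable vs. defective), e.g. with a ResNet101 backbone.
classifier = torchvision.models.resnet101(num_classes=2)
segmenter.eval()
classifier.eval()

def inspect(image: torch.Tensor) -> torch.Tensor:
    """image: (1, 3, H, W) float tensor. Returns the class logits (1, 2)."""
    with torch.no_grad():
        seg_logits = segmenter(image)["out"]            # (1, 2, H, W)
        mask = seg_logits.argmax(dim=1, keepdim=True)   # 1 where caliper is predicted
        filtered = image * mask                         # black out background pixels
        return classifier(filtered)                     # acceptable / defective logits
```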

Image pre-processing

Machine Learning approaches rely on data analysis; thus, the quality of the final results is well known to depend strongly on the amount and quality of the available data for training of the algorithms (Banko & Brill, 2001 ; Chen et al., 2021 ). In our case, the input images should be well-representative for the target analysis and include adequate variability of the possible features to allow the neural networks to produce the correct output. In this view, the original images should include, e.g., different possible backgrounds, a different viewing angle of the considered geometry and a different light exposure (as local light reflections may affect the color of the geometry and thus the analysis). The creation of such a proper dataset for specific cases is not always straightforward; in our case, for example, it would imply a systematic acquisition of a large set of images in many different conditions. This would require, in turn, disposing of all the possible target defects on the real parts, and of an automatic acquisition system, e.g., a robotic arm with an integrated camera. Given that, in our case, the initial dataset could not be generated on real parts, we have chosen to generate a well-balanced dataset of images in silico , that is, based on image renderings of the real geometry. The key idea was that, if the rendered geometry is sufficiently close to a real photograph, the algorithms may be instructed on artificially-generated images and then tested on a few real ones. This approach, if properly automatized, clearly allows to easily produce a large amount of images in all the different conditions required for the analysis.

In a first step, starting from the CAD file of the brake calipers, we worked manually using the open-source software Blender (Blender ), to modify the material properties and achieve a realistic rendering. After that, defects were generated by means of Boolean (subtraction) operations between the geometry of the brake caliper and ad-hoc geometries for each defect. Fine tuning on the generated defects has allowed for a realistic representation of the different defects. Once the results were satisfactory, we developed an automated Python code for the procedures, to generate the renderings in different conditions. The Python code allows to: load a given CAD geometry, change the material properties, set different viewing angles for the geometry, add different types of defects (with given size, rotation and location on the geometry of the brake caliper), add a custom background, change the lighting conditions, render the scene and save it as an image.

In order to make the dataset as varied as possible, we introduced three light sources into the rendering environment: a diffuse natural lighting to simulate daylight conditions, and two additional artificial lights. The intensity of each light source and the viewing angle were then made to vary randomly, to mimic different daylight conditions and illuminations of the object. This procedure was designed to provide different situations akin to real use, and to make the model invariant to lighting conditions and camera position. Moreover, to provide additional flexibility to the model, the training dataset of images was virtually expanded using data augmentation (Mumuni & Mumuni, 2022 ), where saturation, brightness and contrast were made to vary randomly during training operations. This procedure has allowed us to consistently increase the number and variety of the images in the training dataset.
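A minimal sketch of such on-the-fly photometric augmentation with torchvision transforms is given below; the jitter ranges, the rotation range and the folder name are illustrative assumptions, not the values used in this work.

```python
import torchvision
import torchvision.transforms as T

# Random photometric jitter applied on-the-fly during training
augment = T.Compose([
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),  # illustrative ranges
    T.RandomRotation(degrees=10),                                  # small viewpoint jitter
    T.ToTensor(),
])
# Example usage on a folder of rendered training images (hypothetical path):
# train_dataset = torchvision.datasets.ImageFolder("renders/train", transform=augment)
```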

The developed automated pre-processing steps easily allow for batch generation of thousands of different images to be used for training of the neural networks. This possibility is key for proper training of the neural networks, as the variability of the input images allows the models to learn all the possible features and details that may change during real operating conditions.

Figure 3. Examples of the ground truth for the two target tasks: background removal (a) and defect recognition (b).

The first tests using such a virtual database showed that, although the generated images were very similar to real photographs, the models were not able to properly recognize the target features in the real images. Thus, in an attempt to get closer to a proper set of real images, we decided to adopt a hybrid dataset, where the virtually generated images were mixed with the few available real ones. However, given that some possible defects were missing in the real images, we also decided to manipulate the images to introduce virtual defects on real images. The obtained dataset finally included more than 4,000 images, of which 90% were rendered and 10% were obtained from real images. To avoid possible bias in the training dataset, defects were present in 50% of the cases in both the rendered and real image sets. Thus, in the overall dataset, the real original images with no defects were 5% of the total.

Along with the code for the rendering and manipulation of the images, dedicated Python routines were developed to generate the corresponding data labelling for the supervised training of the networks, namely the image masks. Particularly, two masks were generated for each input image: one for the background removal operation, and one for the defect identification. In both cases, the masks consist of a binary (i.e. black and white) image where all the pixels of a target feature (i.e. the geometry or defect) are assigned unitary values (white); whereas, all the remaining pixels are blacked (zero values). An example of these masks in relation to the geometry in Fig. 1 is shown in Fig. 3 .

All the generated images were then down-sampled, that is, their resolution was reduced to avoid unnecessarily large computational times and (RAM) memory usage, while maintaining the required level of detail for training of the neural networks. Finally, the input images and the related masks were split into a mosaic of smaller tiles, to achieve a suitable size for feeding the images to the neural networks with even lower requirements on the RAM memory. All the tiles were processed, and the whole image was reconstructed at the end of the process to visualize the overall final results.
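A minimal sketch of the tiling and reassembly steps is given below; the 256-pixel tile size and the helper names are illustrative assumptions, as the actual tile size is not specified above.

```python
import numpy as np

def split_into_tiles(image: np.ndarray, tile: int = 256):
    """Split an (H, W, C) image into tile x tile patches plus their grid positions,
    so that the full-size prediction can be reassembled afterwards."""
    H, W = image.shape[:2]
    tiles, positions = [], []
    for r in range(0, H - tile + 1, tile):
        for c in range(0, W - tile + 1, tile):
            tiles.append(image[r:r + tile, c:c + tile])
            positions.append((r, c))
    return tiles, positions

def reassemble(tiles, positions, shape):
    """Place the (possibly processed) tiles back into a full-size array."""
    out = np.zeros(shape, dtype=tiles[0].dtype)
    for t, (r, c) in zip(tiles, positions):
        out[r:r + t.shape[0], c:c + t.shape[1]] = t
    return out
```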

Figure 4. Confusion matrix for accuracy assessment of the neural network models.

Choice of the model

Within the scope of the present application, a wide range of possibly suitable models is available (Chen et al., 2021 ). In general, the choice of the best model for a given problem should be made on a case-by-case basis, considering an acceptable compromise between the achievable accuracy and the computational complexity/cost. Models that are too simple can indeed be very fast in their response, yet have a reduced accuracy. On the other hand, more complex models can generally provide more accurate results, although typically requiring larger amounts of data for training, and thus longer computational times and energy expense. Hence, testing has the crucial role of allowing identification of the best trade-off between these two extreme cases. A benchmark for model accuracy can generally be defined in terms of a confusion matrix, where the model response is summarized into the following possibilities: True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN). This concept is summarized in Fig. 4 . For the background removal, Positive (P) stands for pixels belonging to the brake caliper, while Negative (N) stands for background pixels. For the defect identification model, Positive (P) stands for non-defective geometries, whereas Negative (N) stands for defective geometries. With respect to these two cases, the True/False statements stand for correct or incorrect identification, respectively. The model accuracy can therefore be assessed as the fraction of correct predictions (Géron, 2022 ), i.e. \(A = (TP + TN)/(TP + TN + FP + FN)\).

Based on this metrics, the accuracy for different models can then be evaluated on a given dataset, where typically 80% of the data is used for training and the remaining 20% for validation. For the defect recognition stage, the following models were tested: VGG-16 (Simonyan & Zisserman, 2014 ), ResNet50, ResNet101, ResNet152 (He et al., 2016 ), Inception V1 (Szegedy et al., 2015 ), Inception V4 and InceptionResNet V2 (Szegedy et al., 2017 ). Details on the assessment procedure for the different models are provided in the Supplementary Information file. For the background removal stage, the DeepLabV3 \(+\) (Chen et al., 2018 ) model was chosen as the first option, and no additional models were tested as it directly provided satisfactory results in terms of accuracy and processing time. This gives preliminary indication that, from the point of view of the task complexity of the problem, the defect identification stage can be more demanding with respect to the background removal operation for the case study at hand. Besides the assessment of the accuracy according to, e.g., the metrics discussed above, additional information can be generally collected, such as too low accuracy (indicating insufficient amount of training data), possible bias of the models on the data (indicating a non-well balanced training dataset), or other specific issues related to missing representative data in the training dataset (Géron, 2022 ). This information helps both to correctly shape the training dataset, and to gather useful indications for the fine tuning of the model after its choice has been made.
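As a small worked example of this accuracy metric, the snippet below computes the confusion-matrix entries and the corresponding accuracy with scikit-learn on toy labels; the label convention (1 = non-defective/Positive, 0 = defective/Negative) mirrors the definition above, and the toy arrays are purely illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Toy example: 1 = non-defective (Positive), 0 = defective (Negative)
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)   # equals accuracy_score(y_true, y_pred)
print(f"TP={tp} TN={tn} FP={fp} FN={fn} accuracy={accuracy:.2f}")
```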

Background removal

An initial bias of the background-removal model arose from the color of the original target geometry (red): the model identified possible red spots in the background as part of the target geometry, which is an unwanted output. To improve the model flexibility, and thus its accuracy in identifying the background, the training dataset was expanded using data augmentation (Géron, 2022). This technique artificially increases the size of the training dataset by applying various transformations to the available images, with the goal of improving the performance and generalization ability of the models. The approach typically involves applying geometric and/or color transformations to the original images; in our case, these account for different viewing angles of the geometry, different light exposures, and different color reflections and shadowing effects. These improvements of the training dataset proved effective for the background removal operation, with a final validation accuracy above 99% and a model response time of around 1-2 seconds. An example of the output of this operation for the geometry in Fig. 1 is shown in Fig. 5.
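A minimal sketch of this kind of augmentation pipeline is shown below; the specific transforms and their parameters are our own assumptions, chosen only to mirror the viewing-angle, exposure and color effects mentioned in the text:

```python
# Minimal sketch of geometric and color data augmentation for the training set.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=20),                      # viewing angle
    transforms.RandomPerspective(distortion_scale=0.3, p=0.5),  # viewing angle
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.05),           # exposure / color
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```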

While the results obtained were satisfactory for the original (red) color of the calipers, we also tested the ability of the model to handle brake calipers of other colors. To this end, the model was trained and tested on a grayscale version of the caliper images, which completely removes any possible bias of the model towards a specific color. In this case, the validation accuracy of the model still ranged above 99%; this approach is therefore particularly interesting for making the model suitable for background removal even on images including calipers of different colors.

Fig. 5: Target geometry after background removal

Defect recognition

An overview of the performance of the tested models for the defect recognition operation on the original geometry of the caliper is reported in Table 1 (see also the Supplementary Information file for more details on the assessment of the different models). The results report the achieved validation accuracy (\(A_v\)) and the number of parameters (\(N_p\)), the latter being the total number of trainable parameters of each model (Géron, 2022). Here, this quantity is adopted as an indicator of the complexity of each model.
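As an aside, the number of trainable parameters of a candidate architecture can be queried directly from an off-the-shelf implementation; the sketch below uses the torchvision models as stand-ins for the networks in Table 1 (Inception V4 and InceptionResNet V2 are not available in torchvision and would need another source):

```python
# Minimal sketch: count trainable parameters (N_p) for some of the candidate
# architectures. weights=None just means randomly initialized weights; the
# parameter count is the same either way.
from torchvision import models

candidates = {
    "VGG-16": models.vgg16(weights=None),
    "ResNet50": models.resnet50(weights=None),
    "ResNet101": models.resnet101(weights=None),
    "ResNet152": models.resnet152(weights=None),
}

for name, model in candidates.items():
    n_p = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"{name}: {n_p / 1e6:.1f} M trainable parameters")
```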

Fig. 6: Accuracy (a) and loss function (b) curves for the ResNet101 model during training

As the results in Table 1 show, the VGG-16 model was quite imprecise on our dataset, eventually showing underfitting (Géron, 2022). Thus, we opted for the ResNet and Inception families of models. Both families proved suitable for handling our dataset, with slightly less accurate results provided by ResNet50 and Inception V1. The best results were obtained using ResNet101 and Inception V4, with very high final accuracy and fast processing times (on the order of \(\sim \)1 second). Finally, the ResNet152 and InceptionResNet V2 models proved slightly too complex, or too slow, for our case; they provided excellent results, but with longer response times (on the order of \(\sim \)3-5 seconds). The response time is affected by the complexity (\(N_p\)) of the model itself and by the hardware used. In our work, GPUs were used for training and testing all the models, and the hardware conditions were kept the same for all models.

Based on the results obtained, the ResNet101 model was chosen as the best solution for our application in terms of accuracy and reduced complexity. After fine-tuning operations, the accuracy obtained with this model reached nearly 99% on both the validation and test datasets. The latter includes real target images that the model had never seen before; it can thus be used to test the ability of the model to generalize the information learned during the training/validation phase.
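A minimal transfer-learning sketch in this spirit is shown below; it is an assumption-based illustration (ImageNet-pretrained weights, a two-class head, an initially frozen backbone), not the authors' exact fine-tuning recipe:

```python
# Minimal sketch: start from a pretrained ResNet101 and replace the final
# layer with a two-class head (defective / non-defective).
import torch.nn as nn
from torchvision import models

model = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)   # binary classification head

# Optionally freeze the backbone and fine-tune only the new head at first
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")
```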

The trends of the accuracy increase and of the loss function decrease during training of the ResNet101 model on the original geometry are shown in Fig. 6(a) and (b), respectively. The loss function quantifies the error between the output predicted by the model during training and the actual target values in the dataset. In our case, the loss function is computed using the cross-entropy function, and the model is optimized with the Adam optimizer (Géron, 2022). The error is expected to decrease during training, which eventually leads to more accurate predictions of the model on previously unseen data. The combination of the accuracy and loss-function trends, along with other control parameters, is typically monitored to evaluate the training process and to avoid, e.g., under- or over-fitting problems (Géron, 2022). As Fig. 6(a) shows, the accuracy experiences a sudden step increase during the very first training epochs (an epoch being one complete pass of the model over the training database (Géron, 2022)). The accuracy then increases smoothly with the epochs, until an asymptotic value is reached for both the training and validation accuracy. These trends in the two accuracy curves can generally be associated with proper training; in particular, the closeness of the two curves may be interpreted as an absence of under-fitting problems. On the other hand, Fig. 6(b) shows that the loss function curves are close to each other, with a monotonically decreasing trend. This can be interpreted as an absence of over-fitting problems, and thus as an indication of proper training of the model.
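The corresponding training loop, with the cross-entropy loss and Adam optimizer named above, could look like the following sketch; the learning rate and number of epochs are assumptions, and `model`, `train_loader` and `val_loader` are taken from the previous sketches:

```python
# Minimal sketch of a training loop with cross-entropy loss and Adam,
# monitoring the validation accuracy at each epoch.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)                       # ResNet101 from the previous sketch
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

for epoch in range(20):                        # assumed number of epochs
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    # validation accuracy, to be monitored together with the loss curves
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    print(f"epoch {epoch}: validation accuracy = {correct / total:.3f}")
```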

Fig. 7: Final results of the defect identification analysis: (a) considered input geometry; (b), (c) and (d) identification of a scratch on the surface, a partially missing logo, and a painting defect, respectively (highlighted in the red frames)

Finally, an example output of the overall analysis is shown in Fig. 7, where the considered input geometry is shown in (a), along with the identification of the defects in (b), (c) and (d) obtained from the developed protocol. Note that the different defects have been separated into several panels for illustrative purposes; however, the analysis identifies all defects on one single image. In this work, a binary classification was performed on the considered brake calipers, where the output of the models discriminates between defective and non-defective components based on the presence or absence of any of the considered defects. Fine-tuning of this discrimination ultimately rests with the user's requirements: the model output is the probability (from 0 to 100%) that a defect is present, so the discrimination between a defective and a non-defective part depends on the user's choice of the acceptance threshold for the considered part (50% in our case). Therefore, stricter or looser criteria can be readily adopted. Eventually, for particularly complex cases, multiple models may also be used concurrently for the same task, with the final output defined based on a cross-comparison of the results from the different models. As a last remark on the proposed procedure, note that here we adopted a binary classification based on the presence or absence of any defect; however, a further classification could also be implemented to distinguish among different types of defects (multi-class classification) on the brake calipers.
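The acceptance-threshold logic can be illustrated with a short sketch; which output index corresponds to the defective class, and the threshold value itself, are user choices (assumptions here) rather than fixed properties of the method:

```python
# Minimal sketch of the acceptance-threshold logic: the model outputs a
# probability that the part is defective, and a user-defined threshold
# (50% here, as in the text) yields the binary decision.
import torch
import torch.nn.functional as F

ACCEPT_THRESHOLD = 0.5       # stricter or looser criteria can be set by the user

def is_defective(model, image_batch, defect_class: int = 1) -> torch.Tensor:
    # defect_class is an assumed index; it depends on the dataset's label order
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(image_batch), dim=1)[:, defect_class]
    return probs >= ACCEPT_THRESHOLD   # True where the part is flagged as defective
```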

Energy saving

Illustrative scenarios

Given that the proposed tools have not yet been implemented and tested within a real industrial production line, we analyze here three prospective scenarios to provide a practical example of the potential for energy savings in an industrial context. Specifically, we consider a generic brake caliper assembly line formed by 14 stations, as outlined in Table 1 of the work by Burduk and Górnicka (2017). This assembly line features a critical inspection station dedicated to defect detection, around which we construct three distinct scenarios to compare traditional human-based control operations with a quality control system augmented by the proposed Machine Learning (ML) tools, namely:

First Scenario (S1): Human-Based Inspection. The traditional approach involves a human operator responsible for the inspection tasks.

Second Scenario (S2): Hybrid Inspection. This scenario introduces a hybrid inspection system where our proposed ML-based automatic detection tool assists the human inspector. The ML tool analyzes the brake calipers and alerts the human inspector only when it encounters difficulties in identifying defects, specifically when the probability of a defect being present or absent falls below a certain threshold. This collaborative approach aims to combine the precision of ML algorithms with the experience of human inspectors, and can be seen as a possible transition scenario between human-based and fully automated quality control.

Third Scenario (S3): Fully Automated Inspection. In the final scenario, we conceive a completely automated defect inspection station powered exclusively by our ML-based detection system. This setup eliminates the need for human intervention, relying entirely on the capabilities of the ML tools to identify defects.

For simplicity, we assume that all the stations are aligned in series without buffers, minimizing unnecessary complications in our estimations. To quantify the beneficial effects of implementing ML-based quality control, we adopt the Overall Equipment Effectiveness (OEE) as the primary metric for the analysis. OEE is a comprehensive measure derived from the product of three critical factors, as outlined by Nota et al. (2020), namely \(OEE = A \times P \times Q\), with: Availability (\(A\), the ratio of operating time to planned production time); Performance (\(P\), the ratio of actual output to the theoretical maximum output); and Quality (\(Q\), the ratio of good units to the total units produced). In this section, we discuss how each of these factors is calculated for the various scenarios.

To calculate the Availability (\(A\)), we consider an 8-hour work shift (\(t_{shift}\)) with 30 minutes of breaks (\(t_{break}\)), during which we assume that production stops (except in the fully automated scenario), and 30 minutes of scheduled downtime (\(t_{sched}\)) required for machine cleaning and startup procedures. For the unscheduled downtime (\(t_{unsched}\)), primarily due to machine breakdowns, we assume an average breakdown probability (\(\rho _{down}\)) of 5% for each machine, with an average repair time of one hour per incident (\(t_{down}\)). Based on these assumptions, since the Availability represents the ratio of the run time (\(t_{run}\)) to the production time (\(t_{pt}\)), it can be calculated using the following formula:

\(A = \frac{t_{run}}{t_{pt}} = \frac{t_{pt} - t_{unsched}}{t_{pt}}\)

with the unscheduled downtime being computed as follows:

\(t_{unsched} = \left[ 1-\left( 1-\rho _{down}\right) ^{N} \right] t_{down}\)

where N is the number of machines in the production line and \(1-\left( 1-\rho _{down}\right) ^{N}\) represents the probability that at least one machine breaks during the work shift. For the sake of simplicity, the \(t_{down}\) is assumed constant regardless of the number of failures.

Table 2 presents the numerical values used to calculate the Availability in the three scenarios. In the second scenario, integrating the automated station leads to a decrease in this first factor of the OEE analysis, which can be attributed to the additional station for automated quality control (and its related potential failures); this ultimately increases the estimated unscheduled downtime. In the third scenario, the detrimental effect of the additional station offsets the beneficial effect of the automated quality control in removing the production pauses during operator breaks; the Availability of the third scenario is therefore substantially equivalent to that of the first one (baseline).
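The Availability estimate can be reproduced with a few lines of code; note that how the operator breaks and the extra ML station enter each scenario below reflects our reading of the text and is an assumption, not the authors' published calculation:

```python
# Illustrative Availability sketch with the assumptions stated in the text:
# 8 h shift, 30 min breaks, 30 min scheduled downtime, 5% breakdown
# probability per machine, 1 h average repair time.
T_SHIFT = 8 * 3600          # s
T_BREAK = 30 * 60           # s (assumed to stop production except in S3)
T_SCHED = 30 * 60           # s
RHO_DOWN = 0.05             # breakdown probability per machine per shift
T_DOWN = 3600               # s, average repair time

def availability(n_machines: int, breaks_stop_production: bool) -> float:
    t_pt = T_SHIFT - T_SCHED - (T_BREAK if breaks_stop_production else 0)
    t_unsched = (1 - (1 - RHO_DOWN) ** n_machines) * T_DOWN
    return (t_pt - t_unsched) / t_pt

print("S1:", availability(n_machines=14, breaks_stop_production=True))
print("S2:", availability(n_machines=15, breaks_stop_production=True))   # extra ML station
print("S3:", availability(n_machines=15, breaks_stop_production=False))  # no break pauses
```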

The second factor of OEE, Performance (\(P\)), assesses the operational efficiency of the production equipment relative to its maximum designed speed (\(t_{line}\)). This evaluation accounts for reductions in cycle speed and minor stoppages, collectively termed speed losses. These losses are challenging to estimate in advance, as Performance is typically measured using historical data from the production line. For this analysis, we hypothesize a reasonable estimate of 60 seconds lost to speed losses (\(t_{losses}\)) in each work cycle. Although this assumption may appear strong, it will become evident later that, within the context of this analysis (particularly regarding the impact of automated inspection on energy savings), the Performance, like the Availability, is only marginally influenced by the introduction of an automated inspection station. To account for the effect of automated inspection on the assembly line speed, we keep the time required by the other 13 stations (\(t^*_{line}\)) constant while varying the time allocated for visual inspection (\(t_{inspect}\)). According to Burduk and Górnicka (2017), the total operation time of the production line, excluding inspection, is 1263 seconds, with manual visual inspection taking 38 seconds. For the fully automated third scenario, we assume an inspection time of 5 seconds, which encloses the photo collection, pre-processing, ML analysis, and post-processing steps. In the second scenario, instead, we add an additional time to the purely automatic case to account for the cases in which the confidence of the ML model falls below 90%. We assume this happens once in every 10 inspections, which is a conservative estimate, higher than what we observed during model testing; this results in adding 10% of the human inspection time to the fully automated time. Thus, when \(t_{losses}\) is known, the Performance can be expressed as the ratio of the ideal cycle time to the actual cycle time:

\(P = \frac{t^*_{line} + t_{inspect}}{t^*_{line} + t_{inspect} + t_{losses}}\)

The calculated values for the Performance are presented in Table 3. We can note that the change in inspection time has a negligible impact on this factor, since it does not affect the speed losses; at least to our knowledge, there is no clear evidence to suggest that the introduction of a new inspection station would alter these losses. Moreover, given the specific linear layout of the considered production line, the change in inspection time has only a marginal effect on the production speed. However, this approach could potentially bias our scenarios towards always favouring automation. To evaluate this hypothesis, a sensitivity analysis exploring scenarios where the production line operates at a faster pace is discussed in the next subsection.

The last factor, Quality (\(Q\)), quantifies the ratio of compliant products to the total products manufactured, effectively filtering out items that fail to meet the quality standards due to defects. Given the objective of our automated algorithm, we anticipate this factor of the OEE to be significantly enhanced by implementing the ML-based automated inspection station. To estimate it, we assume a constant defect probability for the production line (\(\rho _{def}\)) of 5%. Consequently, the number of defective products (\(N_{def}\)) during the work shift is calculated as \(N_{unit} \cdot \rho _{def}\), where \(N_{unit}\) represents the average number of units (brake calipers) assembled on the production line during the run time, defined as:

\(N_{unit} = \frac{t_{run}}{t^*_{line} + t_{inspect} + t_{losses}}\)

To quantify the defective units identified, we consider the inspection accuracy (\(\rho _{acc}\)): for human visual inspection, the typical accuracy is 80% (Sundaram & Zeid, 2023), while for the ML-based station we use the accuracy of our best model, i.e., 99%. Additionally, we account for the probability of the station mistakenly flagging a defect-free caliper as defective, i.e., the false-negative rate (\(\rho _{FN}\)).

In the absence of any reasonable evidence to justify a bias towards one type of mistake over the other, we assume a uniform distribution of the errors for both human and automated inspections, i.e., we set \(\rho ^{H}_{FN} = \rho ^{ML}_{FN} = \rho _{FN} = 50\%\). Thus, the number of final compliant goods (\(N_{goods}\)), i.e., the calipers that are identified as quality-compliant, can be calculated as:

\(N_{goods} = N_{unit} - N_{detect}\)

where \(N_{detect}\) is the total number of units flagged as defective, comprising \(TN\) (true negatives, i.e. correctly identified defective calipers) and \(FN\) (false negatives, i.e. defect-free calipers mistakenly flagged as defective). The Quality factor can then be computed as:

\(Q = \frac{N_{goods}}{N_{unit}}\)

Table  4 summarizes the Quality factor calculation, showcasing the substantial improvement brought by the ML-based inspection station due to its higher accuracy compared to human operators.

Fig. 8: Overall Equipment Effectiveness (OEE) analysis for the three scenarios (S1: Human-Based Inspection, S2: Hybrid Inspection, S3: Fully Automated Inspection). The height of the bars represents the percentage of the three factors A (Availability), P (Performance) and Q (Quality), read on the left axis. The green bars indicate the OEE value, derived from the product of these three factors. The red line shows the recall rate, i.e. the probability that a defective product is rejected by the client, with values displayed on the right (red) axis

Finally, we can determine the Overall Equipment Effectiveness by multiplying the three factors computed above. Additionally, we can estimate the recall rate (\(\rho _{R}\)), which reflects the rate at which a customer might reject products. This is derived from the difference between the total number of defective units, \(N_{def}\), and the number of units correctly identified as defective, \(TN\), indicating the potential for defective brake calipers to bypass the inspection process. Figure 8 summarizes the outcomes of the three scenarios. It is crucial to note that the scenarios incorporating the automated defect detector, S2 and S3, significantly enhance the Overall Equipment Effectiveness, primarily through substantial improvements in the Quality factor. Among these, the fully automated inspection scenario, S3, emerges as a slightly superior option, thanks to the additional benefits of removing the breaks and increasing the speed of the line. However, given the several assumptions required for this OEE study, these results should be interpreted as illustrative and considered primarily as a comparison with the baseline scenario. To analyze the sensitivity of the outlined scenarios to the adopted assumptions, we investigate the influence of the line speed and of the human accuracy on the results in the next subsection.
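Putting the three factors together, an illustrative (and heavily assumption-laden) reconstruction of the whole OEE estimate could look as follows; it reuses the `availability()` sketch above, and the Performance, Quality and recall expressions are our reconstructions rather than the authors' exact formulas:

```python
# Illustrative OEE reconstruction; reuses T_SHIFT, T_BREAK, T_SCHED and
# availability() from the previous sketch. Lines marked "assumed" are our
# reading of the text, not the authors' published expressions.
T_LOSSES = 60      # s, assumed speed losses per work cycle
RHO_DEF = 0.05     # defect probability per unit
RHO_FN = 0.5       # assumed share of inspection errors that reject good parts

def oee(n_machines, breaks_stop, t_inspect, rho_acc, t_line_star=1263.0):
    a = availability(n_machines, breaks_stop)
    t_pt = T_SHIFT - T_SCHED - (T_BREAK if breaks_stop else 0)
    t_run = a * t_pt
    cycle = t_line_star + t_inspect + T_LOSSES
    p = (t_line_star + t_inspect) / cycle        # Performance (assumed form)
    n_unit = t_run / cycle                       # units per shift (assumed form)
    n_def = RHO_DEF * n_unit
    tn = rho_acc * n_def                         # defective units correctly flagged
    fn = (1 - rho_acc) * RHO_FN * n_unit         # good units wrongly flagged (assumed)
    q = (n_unit - tn - fn) / n_unit              # Quality = N_goods / N_unit
    recall = (n_def - tn) / n_unit               # defective units reaching the client (assumed)
    return a * p * q, recall

scenarios = {
    "S1 (human)":     dict(n_machines=14, breaks_stop=True,  t_inspect=38.0, rho_acc=0.80),
    "S2 (hybrid)":    dict(n_machines=15, breaks_stop=True,  t_inspect=5.0 + 0.1 * 38, rho_acc=0.99),
    "S3 (automated)": dict(n_machines=15, breaks_stop=False, t_inspect=5.0,  rho_acc=0.99),
}
for label, kwargs in scenarios.items():
    value, rec = oee(**kwargs)
    print(f"{label}: OEE = {value:.3f}, recall rate = {rec:.2%}")
# With these assumptions, S3 improves the OEE by roughly 10% relative to S1,
# consistent in order of magnitude with the gain quoted in the Conclusions.
```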

Sensitivity analysis

The scenarios described previously are illustrative and based on several simplifying hypotheses. One such hypothesis is that the production chain operates entirely in series, with each station awaiting the arrival of the workpiece from the preceding station, resulting in a relatively slow production pace (a total line time of 1263 seconds). This setup can be quite different from reality, where slower operations can be accelerated by installing additional machines in parallel to balance the workload and enhance productivity. Moreover, we adopted a literature value of 80% for the accuracy of the human visual inspector, as reported by Sundaram and Zeid (2023). However, this accuracy can vary significantly due to factors such as the experience of the inspector and the defect type.

Fig. 9: Effect of the assembly time of the stations (excluding visual inspection), \(t^*_{line}\), and of the human inspection accuracy, \(\rho _{acc}\), on the OEE analysis. Subplot (a) shows the difference between scenario S2 (Hybrid Inspection) and the baseline scenario S1 (Human Inspection), while subplot (b) displays the difference between scenario S3 (Fully Automated Inspection) and the baseline. The maps indicate in red the values of \(t^*_{line}\) and \(\rho _{acc}\) where the integration of automated inspection stations can significantly improve the OEE, and in blue where it may lower the score. The dashed lines denote the break-even points, and the circled points pinpoint the values used in the "Illustrative scenarios" subsection

A sensitivity analysis on these two factors was conducted to address these variations. The assembly time of the stations (excluding visual inspection), \(t^*_{line}\) , was varied from 60 s to 1500 s, and the human inspection accuracy, \(\rho _{acc}\) , ranged from 50% (akin to a random guesser) to 100% (representing an ideal visual inspector); meanwhile, the other variables were kept fixed.

The comparison of the OEE enhancement for the two scenarios employing ML-based inspection against the baseline scenario is displayed in the two maps in Fig. 9. As the figure shows, owing to the high accuracy and rapid response of the proposed automated inspection station, the region where its introduction may benefit the assembly line, and the related energy savings (indicated in red shades), is significantly larger than the region where its introduction could degrade performance (indicated in blue shades). However, it can also be observed that the automated inspection could be superfluous, or even detrimental, in those scenarios where human accuracy and assembly speed are already very high, indicating an already highly optimized workflow. In these cases, and particularly for very fast production lines, short quality-control times can be expected to be key (beyond accuracy) for the optimization.
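Such a sweep can be scripted directly on top of the hedged `oee()` sketch above; the grid resolution below is arbitrary, while the parameter ranges follow the text:

```python
# Illustrative sensitivity sweep over t*_line (60-1500 s) and the human
# inspection accuracy (50-100%), reusing the assumption-laden oee() sketch.
import numpy as np

t_star_grid = np.linspace(60, 1500, 40)      # s, assembly time excluding inspection
rho_acc_grid = np.linspace(0.50, 1.00, 40)   # human inspection accuracy

delta_oee = np.zeros((rho_acc_grid.size, t_star_grid.size))
for i, rho_acc in enumerate(rho_acc_grid):
    for j, t_star in enumerate(t_star_grid):
        baseline, _ = oee(14, True, t_inspect=38.0, rho_acc=rho_acc, t_line_star=t_star)
        automated, _ = oee(15, False, t_inspect=5.0, rho_acc=0.99, t_line_star=t_star)
        delta_oee[i, j] = automated - baseline   # > 0 where automation helps (red region)

# The zero-level contour of delta_oee plays the role of the break-even lines in Fig. 9(b).
```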

Finally, it is important to remark that the blue region (the area below the dashed break-even lines) might expand if the accuracy of the neural networks for defect detection turns out to be lower when implemented in a real production line. This would call for new rounds of active learning and for an increase in the ratio of real images in the database, to eventually enhance the performance of the ML model.

Conclusions

Industrial quality control on manufactured parts is typically performed by human visual inspection. This usually requires a dedicated handling system and generally results in a slower production rate, with an associated non-optimal use of energy resources. Based on a practical test case for quality control in brake caliper manufacturing, in this work we have reported on a workflow for the integration of Machine Learning methods to automate the process. The proposed approach relies on image analysis via Deep Convolutional Neural Networks. These models can efficiently extract information from images, and thus potentially represent a valuable alternative to human inspection.

The proposed workflow relies on a two-step procedure on the images of the brake calipers: first, the background is removed from the image; second, the geometry is inspected to identify possible defects. These two steps are accomplished by two dedicated neural network models, an encoder-decoder network and an encoder network, respectively. Training these neural networks typically requires a large number of images representative of the problem. Given that such a database is not always readily available, we have presented and discussed an alternative methodology for the generation of the input database using 3D renderings. While integration of the database with real photographs was required for optimal results, this approach allowed fast and flexible generation of a large base of representative images. The pre-processing steps required to feed the data to the neural networks, and the training of the networks, have also been discussed.

Several models have been tested and evaluated, and the best one for the considered case has been identified. The obtained accuracy for defect identification reaches \(\sim \)99% on the tested cases. Moreover, the response of the models on each image is fast (on the order of a few seconds), which makes them compatible with the most typical industrial expectations.

To provide a practical example of the possible energy savings when implementing the proposed ML-based methodology for quality control, we have analyzed three prospective industrial scenarios: a baseline scenario, where quality control tasks are performed by a human inspector; a hybrid scenario, where the proposed ML automatic detection tool assists the human inspector; and a fully automated scenario, where we envision a completely automated defect inspection. The results show that the proposed tools may help increase the Overall Equipment Effectiveness by up to \(\sim \)10% with respect to the considered baseline scenario. However, a sensitivity analysis on the speed of the production line and on the accuracy of the human inspector has also shown that the automated inspection could be superfluous, or even detrimental, in those cases where human accuracy and assembly speed are already very high. In these cases, reducing the time required for quality control can be expected to be the major controlling parameter (beyond accuracy) for optimization.

Overall, the results show that, with proper tuning, these models may represent a valuable resource for integration into production lines, with positive outcomes on the overall effectiveness, ultimately leading to a better use of energy resources. To this end, while the practical implementation of the proposed tools can be expected to require limited investments (e.g., a portable camera, a dedicated workstation and an operator with proper training), in-field tests on a real industrial line would be required to confirm the potential of the proposed technology.

Agrawal, R., Majumdar, A., Kumar, A., & Luthra, S. (2023). Integration of artificial intelligence in sustainable manufacturing: Current status and future opportunities. Operations Management Research, 1–22.

Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., Santamaría, J., Fadhel, M. A., Al-Amidie, M., & Farhan, L. (2021). Review of deep learning: Concepts, cnn architectures, challenges, applications, future directions. Journal of big Data, 8 , 1–74.

Angelopoulos, A., Michailidis, E. T., Nomikos, N., Trakadas, P., Hatziefremidis, A., Voliotis, S., & Zahariadis, T. (2019). Tackling faults in the industry 4.0 era-a survey of machine—learning solutions and key aspects. Sensors, 20 (1), 109.

Arana-Landín, G., Uriarte-Gallastegi, N., Landeta-Manzano, B., & Laskurain-Iturbe, I. (2023). The contribution of lean management—industry 4.0 technologies to improving energy efficiency. Energies, 16 (5), 2124.

Badmos, O., Kopp, A., Bernthaler, T., & Schneider, G. (2020). Image-based defect detection in lithium-ion battery electrode using convolutional neural networks. Journal of Intelligent Manufacturing, 31 , 885–897. https://doi.org/10.1007/s10845-019-01484-x

Banko, M., & Brill, E. (2001). Scaling to very very large corpora for natural language disambiguation. In Proceedings of the 39th annual meeting of the association for computational linguistics (pp. 26–33).

Benedetti, M., Bonfà, F., Introna, V., Santolamazza, A., & Ubertini, S. (2019). Real time energy performance control for industrial compressed air systems: Methodology and applications. Energies, 12 (20), 3935.

Bhatt, D., Patel, C., Talsania, H., Patel, J., Vaghela, R., Pandya, S., Modi, K., & Ghayvat, H. (2021). Cnn variants for computer vision: History, architecture, application, challenges and future scope. Electronics, 10 (20), 2470.

Bilgen, S. (2014). Structure and environmental impact of global energy consumption. Renewable and Sustainable Energy Reviews, 38 , 890–902.

Blender. (2023). Open-source software. https://www.blender.org/ . Accessed 18 Apr 2023.

Bologna, A., Fasano, M., Bergamasco, L., Morciano, M., Bersani, F., Asinari, P., Meucci, L., & Chiavazzo, E. (2020). Techno-economic analysis of a solar thermal plant for large-scale water pasteurization. Applied Sciences, 10 (14), 4771.

Burduk, A., & Górnicka, D. (2017). Reduction of waste through reorganization of the component shipment logistics. Research in Logistics & Production, 7 (2), 77–90. https://doi.org/10.21008/j.2083-4950.2017.7.2.2

Carvalho, T. P., Soares, F. A., Vita, R., Francisco, R. d. P., Basto, J. P., & Alcalá, S. G. (2019). A systematic literature review of machine learning methods applied to predictive maintenance. Computers & Industrial Engineering, 137, 106024.

Casini, M., De Angelis, P., Chiavazzo, E., & Bergamasco, L. (2024). Current trends on the use of deep learning methods for image analysis in energy applications. Energy and AI, 15 , 100330. https://doi.org/10.1016/j.egyai.2023.100330

Chai, J., Zeng, H., Li, A., & Ngai, E. W. (2021). Deep learning in computer vision: A critical review of emerging techniques and application scenarios. Machine Learning with Applications, 6 , 100134.

Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV) (pp. 801–818).

Chen, L., Li, S., Bai, Q., Yang, J., Jiang, S., & Miao, Y. (2021). Review of image classification algorithms based on convolutional neural networks. Remote Sensing, 13 (22), 4712.

Chen, T., Sampath, V., May, M. C., Shan, S., Jorg, O. J., Aguilar Martín, J. J., Stamer, F., Fantoni, G., Tosello, G., & Calaon, M. (2023). Machine learning in manufacturing towards industry 4.0: From 'for now' to 'four-know'. Applied Sciences, 13(3), 1903. https://doi.org/10.3390/app13031903

Choudhury, A. (2021). The role of machine learning algorithms in materials science: A state of art review on industry 4.0. Archives of Computational Methods in Engineering, 28 (5), 3361–3381. https://doi.org/10.1007/s11831-020-09503-4

Dalzochio, J., Kunst, R., Pignaton, E., Binotto, A., Sanyal, S., Favilla, J., & Barbosa, J. (2020). Machine learning and reasoning for predictive maintenance in industry 4.0: Current status and challenges. Computers in Industry, 123 , 103298.

Fasano, M., Bergamasco, L., Lombardo, A., Zanini, M., Chiavazzo, E., & Asinari, P. (2019). Water/ethanol and 13x zeolite pairs for long-term thermal energy storage at ambient pressure. Frontiers in Energy Research, 7 , 148.

Géron, A. (2022). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow . O’Reilly Media, Inc.

GrabCAD. (2023). Brake caliper 3D model by Mitulkumar Sakariya from the GrabCAD free library (non-commercial public use). https://grabcad.com/library/brake-caliper-19 . Accessed 18 Apr 2023.

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

Ho, S., Zhang, W., Young, W., Buchholz, M., Al Jufout, S., Dajani, K., Bian, L., & Mozumdar, M. (2021). Dlam: Deep learning based real-time porosity prediction for additive manufacturing using thermal images of the melt pool. IEEE Access, 9 , 115100–115114. https://doi.org/10.1109/ACCESS.2021.3105362

Ismail, M. I., Yunus, N. A., & Hashim, H. (2021). Integration of solar heating systems for low-temperature heat demand in food processing industry-a review. Renewable and Sustainable Energy Reviews, 147 , 111192.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521 (7553), 436–444.

Leong, W. D., Teng, S. Y., How, B. S., Ngan, S. L., Abd Rahman, A., Tan, C. P., Ponnambalam, S., & Lam, H. L. (2020). Enhancing the adaptability: Lean and green strategy towards the industry revolution 4.0. Journal of cleaner production, 273 , 122870.

Liu, Z., Wang, X., Zhang, Q., & Huang, C. (2019). Empirical mode decomposition based hybrid ensemble model for electrical energy consumption forecasting of the cement grinding process. Measurement, 138 , 314–324.

Li, G., & Zheng, X. (2016). Thermal energy storage system integration forms for a sustainable future. Renewable and Sustainable Energy Reviews, 62 , 736–757.

Maggiore, S., Realini, A., Zagano, C., & Bazzocchi, F. (2021). Energy efficiency in industry 4.0: Assessing the potential of industry 4.0 to achieve 2030 decarbonisation targets. International Journal of Energy Production and Management, 6 (4), 371–381.

Mazzei, D., & Ramjattan, R. (2022). Machine learning for industry 4.0: A systematic review using deep learning-based topic modelling. Sensors, 22 (22), 8641.

Md, A. Q., Jha, K., Haneef, S., Sivaraman, A. K., & Tee, K. F. (2022). A review on data-driven quality prediction in the production process with machine learning for industry 4.0. Processes, 10 (10), 1966. https://doi.org/10.3390/pr10101966

Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz, N., & Terzopoulos, D. (2021). Image segmentation using deep learning: A survey. IEEE transactions on pattern analysis and machine intelligence, 44 (7), 3523–3542.

Mishra, S., Srivastava, R., Muhammad, A., Amit, A., Chiavazzo, E., Fasano, M., & Asinari, P. (2023). The impact of physicochemical features of carbon electrodes on the capacitive performance of supercapacitors: a machine learning approach. Scientific Reports, 13 (1), 6494. https://doi.org/10.1038/s41598-023-33524-1

Mumuni, A., & Mumuni, F. (2022). Data augmentation: A comprehensive survey of modern approaches. Array, 16 , 100258. https://doi.org/10.1016/j.array.2022.100258

Mypati, O., Mukherjee, A., Mishra, D., Pal, S. K., Chakrabarti, P. P., & Pal, A. (2023). A critical review on applications of artificial intelligence in manufacturing. Artificial Intelligence Review, 56 (Suppl 1), 661–768.

Narciso, D. A., & Martins, F. (2020). Application of machine learning tools for energy efficiency in industry: A review. Energy Reports, 6 , 1181–1199.

Nota, G., Nota, F. D., Peluso, D., & Toro Lazo, A. (2020). Energy efficiency in industry 4.0: The case of batch production processes. Sustainability, 12 (16), 6631. https://doi.org/10.3390/su12166631

Ocampo-Martinez, C., et al. (2019). Energy efficiency in discrete-manufacturing systems: Insights, trends, and control strategies. Journal of Manufacturing Systems, 52 , 131–145.

Pan, Y., Hao, L., He, J., Ding, K., Yu, Q., & Wang, Y. (2024). Deep convolutional neural network based on self-distillation for tool wear recognition. Engineering Applications of Artificial Intelligence, 132 , 107851.

Qin, J., Liu, Y., Grosvenor, R., Lacan, F., & Jiang, Z. (2020). Deep learning-driven particle swarm optimisation for additive manufacturing energy optimisation. Journal of Cleaner Production, 245 , 118702.

Rahul, M., & Chiddarwar, S. S. (2023). Integrating virtual twin and deep neural networks for efficient and energy-aware robotic deburring in industry 4.0. International Journal of Precision Engineering and Manufacturing, 24 (9), 1517–1534.

Ribezzo, A., Falciani, G., Bergamasco, L., Fasano, M., & Chiavazzo, E. (2022). An overview on the use of additives and preparation procedure in phase change materials for thermal energy storage with a focus on long term applications. Journal of Energy Storage, 53 , 105140.

Shahin, M., Chen, F. F., Hosseinzadeh, A., Bouzary, H., & Shahin, A. (2023). Waste reduction via image classification algorithms: Beyond the human eye with an ai-based vision. International Journal of Production Research, 1–19.

Shen, F., Zhao, L., Du, W., Zhong, W., & Qian, F. (2020). Large-scale industrial energy systems optimization under uncertainty: A data-driven robust optimization approach. Applied Energy, 259 , 114199.

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 .

Sundaram, S., & Zeid, A. (2023). Artificial Intelligence-Based Smart Quality Inspection for Manufacturing. Micromachines, 14 (3), 570. https://doi.org/10.3390/mi14030570

Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. (2017). Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI conference on artificial intelligence (vol. 31).

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–9).

Trezza, G., Bergamasco, L., Fasano, M., & Chiavazzo, E. (2022). Minimal crystallographic descriptors of sorption properties in hypothetical mofs and role in sequential learning optimization. npj Computational Materials, 8 (1), 123. https://doi.org/10.1038/s41524-022-00806-7

Vater, J., Schamberger, P., Knoll, A., & Winkle, D. (2019). Fault classification and correction based on convolutional neural networks exemplified by laser welding of hairpin windings. In 2019 9th International Electric Drives Production Conference (EDPC) (pp. 1–8). IEEE.

Wen, L., Li, X., Gao, L., & Zhang, Y. (2017). A new convolutional neural network-based data-driven fault diagnosis method. IEEE Transactions on Industrial Electronics, 65 (7), 5990–5998. https://doi.org/10.1109/TIE.2017.2774777

Willenbacher, M., Scholten, J., & Wohlgemuth, V. (2021). Machine learning for optimization of energy and plastic consumption in the production of thermoplastic parts in sme. Sustainability, 13 (12), 6800.

Zhang, X. H., Zhu, Q. X., He, Y. L., & Xu, Y. (2018). Energy modeling using an effective latent variable based functional link learning machine. Energy, 162 , 883–891.

Acknowledgements

This work has been supported by GEFIT S.p.a.

Open access funding provided by Politecnico di Torino within the CRUI-CARE Agreement.

Author information

Authors and affiliations.

Department of Energy, Politecnico di Torino, Turin, Italy

Mattia Casini, Paolo De Angelis, Paolo Vigo, Matteo Fasano, Eliodoro Chiavazzo & Luca Bergamasco

R &D Department, GEFIT S.p.a., Alessandria, Italy

Marco Porrati

Corresponding author

Correspondence to Luca Bergamasco .

Ethics declarations

Conflict of interest statement.

The authors declare no competing interests.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 354 KB)

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Casini, M., De Angelis, P., Porrati, M. et al. Machine Learning and image analysis towards improved energy management in Industry 4.0: a practical case study on quality control. Energy Efficiency 17 , 48 (2024). https://doi.org/10.1007/s12053-024-10228-7

Received : 22 July 2023

Accepted : 28 April 2024

Published : 13 May 2024

DOI : https://doi.org/10.1007/s12053-024-10228-7

  • Industry 4.0
  • Energy management
  • Artificial intelligence
  • Machine learning
  • Deep learning
  • Convolutional neural networks
  • Computer vision

Artificial intelligence  is being used in healthcare for everything from answering patient questions to assisting with surgeries and developing new pharmaceuticals.

According to  Statista , the artificial intelligence (AI) healthcare market, which was valued at $11 billion in 2021, is projected to be worth $187 billion by 2030. That massive increase means we will likely continue to see considerable changes in how medical providers, hospitals, pharmaceutical and biotechnology companies, and others in the healthcare industry operate.

Better  machine learning (ML)  algorithms, more access to data, cheaper hardware, and the availability of 5G have contributed to the increasing application of AI in the healthcare industry, accelerating the pace of change. AI and ML technologies can sift through enormous volumes of health data—from health records and clinical studies to genetic information—and analyze it much faster than humans.

Healthcare organizations are using AI to improve the efficiency of all kinds of processes, from back-office tasks to patient care. The following are some examples of how AI might be used to benefit staff and patients:

  • Administrative workflow:  Healthcare workers spend a lot of time doing paperwork and other administrative tasks. AI and automation can help perform many of those mundane tasks, freeing up employee time for other activities and giving them more face-to-face time with patients. For example, generative AI can help clinicians with note-taking and content summarization, helping keep medical records as thorough as possible. AI might also help with accurate coding and sharing of information between departments and billing.
  • Virtual nursing assistants:  One study found that  64% of patients  are comfortable with the use of AI for around-the-clock access to answers that support nurses provide. AI virtual nurse assistants—which are AI-powered chatbots, apps, or other interfaces—can be used to help answer questions about medications, forward reports to doctors or surgeons and help patients schedule a visit with a physician. These sorts of routine tasks can help take work off the hands of clinical staff, who can then spend more time directly on patient care, where human judgment and interaction matter most.
  • Dosage error reduction:  AI can be used to help identify errors in how a patient self-administers medication. One example comes from a study in  Nature Medicine , which found that up to 70% of patients don’t take insulin as prescribed. An AI-powered tool that sits in the patient’s background (much like a wifi router) might be used to flag errors in how the patient administers an insulin pen or inhaler.
  • Less invasive surgeries:  AI-enabled robots might be used to work around sensitive organs and tissues to help reduce blood loss, infection risk and post-surgery pain.
  • Fraud prevention:  Fraud in the healthcare industry is enormous, at $380 billion/year, and raises the cost of consumers’ medical premiums and out-of-pocket expenses. Implementing AI can help recognize unusual or suspicious patterns in insurance claims, such as billing for costly services or procedures that are not performed, unbundling (which is billing for the individual steps of a procedure as though they were separate procedures), and performing unnecessary tests to take advantage of insurance payments.

A recent study found that  83% of patients  report poor communication as the worst part of their experience, demonstrating a strong need for clearer communication between patients and providers. AI technologies like  natural language processing  (NLP), predictive analytics, and  speech recognition  might help healthcare providers have more effective communication with patients. AI might, for instance, deliver more specific information about a patient’s treatment options, allowing the healthcare provider to have more meaningful conversations with the patient for shared decision-making.

According to  Harvard’s School of Public Health , although it’s early days for this use, using AI to make diagnoses may reduce treatment costs by up to 50% and improve health outcomes by 40%.

One use case example is out of the  University of Hawaii , where a research team found that deploying  deep learning  AI technology can improve breast cancer risk prediction. More research is needed, but the lead researcher pointed out that an AI algorithm can be trained on a much larger set of images than a radiologist—as many as a million or more radiology images. Also, that algorithm can be replicated at no cost except for hardware.

An  MIT group  developed an ML algorithm to determine when a human expert is needed. In some instances, such as identifying cardiomegaly in chest X-rays, they found that a hybrid human-AI model produced the best results.

Another  published study  found that AI recognized skin cancer better than experienced doctors.  US, German and French researchers used deep learning on more than 100,000 images to identify skin cancer. Comparing the results of AI to those of 58 international dermatologists, they found AI did better.

As health and fitness monitors become more popular, more people use apps that track and analyze details about their health. They can share these real-time data sets with their doctors to monitor health issues and provide alerts in case of problems.

AI solutions—such as big data applications, machine learning algorithms and deep learning algorithms—might also be used to help humans analyze large data sets to help clinical and other decision-making. AI might also be used to help detect and track infectious diseases, such as COVID-19, tuberculosis, and malaria.

One benefit the use of AI brings to health systems is making gathering and sharing information easier. AI can help providers keep track of patient data more efficiently.

One example is diabetes. According to the  Centers for Disease Control and Prevention , 10% of the US population has diabetes. Patients can now use wearable and other monitoring devices that provide feedback about their glucose levels to themselves and their medical team. AI can help providers gather, store, and analyze that information, and provide data-driven insights drawn from vast numbers of people. Using this information can help healthcare professionals determine how to better treat and manage diseases.

Organizations are also starting to use AI to help improve drug safety. The company SELTA SQUARE, for example, is  innovating the pharmacovigilance (PV) process , a legally mandated discipline for detecting and reporting adverse effects from drugs, then assessing, understanding, and preventing those effects. PV demands significant effort and diligence from pharma producers because it’s performed from the clinical trials phase all the way through the drug’s lifetime availability. Selta Square uses a combination of AI and automation to make the PV process faster and more accurate, which helps make medicines safer for people worldwide.

Sometimes, AI might reduce the need to test potential drug compounds physically, which is an enormous cost-savings.  High-fidelity molecular simulations  can run on computers without incurring the high costs of traditional discovery methods.

AI also has the potential to help humans predict toxicity, bioactivity, and other characteristics of molecules or create previously unknown drug molecules from scratch.

As AI becomes more important in healthcare delivery and more AI medical applications are developed, ethical and regulatory governance must be established. Issues that raise concern include the possibility of bias, lack of transparency, privacy concerns regarding data used for training AI models, and safety and liability issues.

“AI governance is necessary, especially for clinical applications of the technology,” said Laura Craft, VP Analyst at  Gartner . “However, because new AI techniques are largely new territory for most [health delivery organizations], there is a lack of common rules, processes, and guidelines for eager entrepreneurs to follow as they design their pilots.”

The World Health Organization (WHO) spent 18 months deliberating with leading experts in ethics, digital technology, law, and human rights and various Ministries of Health members to produce a report that is called  Ethics & Governance of Artificial Intelligence for Health . This report identifies ethical challenges to using AI in healthcare, identifies risks, and outlines six  consensus principles  to ensure AI works for the public’s benefit:

  • Protecting autonomy
  • Promoting human safety and well-being
  • Ensuring transparency
  • Fostering accountability
  • Ensuring equity
  • Promoting tools that are responsive and sustainable

The WHO report also provides recommendations that ensure governing AI for healthcare both maximizes the technology’s promise and holds healthcare workers accountable and responsive to the communities and people they work with.

AI provides opportunities to help reduce human error, assist medical professionals and staff, and provide patient services 24/7. As AI tools continue to develop, there is potential to use AI even more in reading medical images, X-rays and scans, diagnosing medical problems and creating treatment plans.

AI applications continue to help streamline various tasks, from answering phones to analyzing population health trends (and likely, applications yet to be considered). For instance, future AI tools may automate or augment more of the work of clinicians and staff members. That will free up humans to spend more time on more effective and compassionate face-to-face professional care.

When patients need help, they don’t want to (or can’t) wait on hold. Healthcare facilities’ resources are finite, so help isn’t always available instantaneously or 24/7—and even slight delays can create frustration and feelings of isolation or cause certain conditions to worsen.

IBM® watsonx Assistant™ AI healthcare chatbots  can help providers do two things: keep their time focused where it needs to be and empower patients who call in to get quick answers to simple questions.

IBM watsonx Assistant  is built on deep learning, machine learning and natural language processing (NLP) models to understand questions, search for the best answers and complete transactions by using conversational AI.

New machine learning algorithm promises advances in computing

Digital twin models may enhance future autonomous systems.

Systems controlled by next-generation computing algorithms could give rise to better and more efficient machine learning products, a new study suggests.

Using machine learning tools to create a digital twin, or a virtual copy, of an electronic circuit that exhibits chaotic behavior, researchers found that they were successful at predicting how it would behave and using that information to control it.

Many everyday devices, like thermostats and cruise control, utilize linear controllers -- which use simple rules to direct a system to a desired value. Thermostats, for example, employ such rules to determine how much to heat or cool a space based on the difference between the current and desired temperatures.

Yet because of how straightforward these algorithms are, they struggle to control systems that display complex behavior, like chaos.

As a result, advanced devices like self-driving cars and aircraft often rely on machine learning-based controllers, which use intricate networks to learn the optimal control algorithm needed to best operate. However, these algorithms have significant drawbacks, the most demanding of which is that they can be extremely challenging and computationally expensive to implement.

Now, having access to an efficient digital twin is likely to have a sweeping impact on how scientists develop future autonomous technologies, said Robert Kent, lead author of the study and a graduate student in physics at The Ohio State University.

"The problem with most machine learning-based controllers is that they use a lot of energy or power and they take a long time to evaluate," said Kent. "Developing traditional controllers for them has also been difficult because chaotic systems are extremely sensitive to small changes."

These issues, he said, are critical in situations where milliseconds can make a difference between life and death, such as when self-driving vehicles must decide to brake to prevent an accident.

The study was published recently in Nature Communications.

Compact enough to fit on an inexpensive computer chip capable of balancing on your fingertip and able to run without an internet connection, the team's digital twin was built to optimize a controller's efficiency and performance, which researchers found resulted in a reduction of power consumption. It achieves this quite easily, mainly because it was trained using a type of machine learning approach called reservoir computing.

"The great thing about the machine learning architecture we used is that it's very good at learning the behavior of systems that evolve in time," Kent said. "It's inspired by how connections spark in the human brain."

Although similarly sized computer chips have been used in devices like smart fridges, according to the study, this novel computing ability makes the new model especially well-equipped to handle dynamic systems such as self-driving vehicles as well as heart monitors, which must be able to quickly adapt to a patient's heartbeat.

"Big machine learning models have to consume lots of power to crunch data and come out with the right parameters, whereas our model and training is so extremely simple that you could have systems learning on the fly," he said.

To test this theory, researchers directed their model to complete complex control tasks and compared its results to those from previous control techniques. The study revealed that their approach achieved a higher accuracy at the tasks than its linear counterpart and is significantly less computationally complex than a previous machine learning-based controller.

"The increase in accuracy was pretty significant in some cases," said Kent. Though the outcome showed that their algorithm does require more energy than a linear controller to operate, this tradeoff means that when it is powered up, the team's model lasts longer and is considerably more efficient than current machine learning-based controllers on the market.

"People will find good use out of it just based on how efficient it is," Kent said. "You can implement it on pretty much any platform and it's very simple to understand." The algorithm was recently made available to scientists.

Outside of inspiring potential advances in engineering, there's also an equally important economic and environmental incentive for creating more power-friendly algorithms, said Kent.

As society becomes more dependent on computers and AI for nearly all aspects of daily life, demand for data centers is soaring, leading many experts to worry over digital systems' enormous power appetite and what future industries will need to do to keep up with it.

And because building these data centers as well as large-scale computing experiments can generate a large carbon footprint, scientists are looking for ways to curb carbon emissions from this technology.

To advance their results, future work will likely be steered toward training the model to explore other applications like quantum information processing, Kent said. In the meantime, he expects that these new elements will reach far into the scientific community.

"Not enough people know about these types of algorithms in the industry and engineering, and one of the big goals of this project is to get more people to learn about them," said Kent. "This work is a great first step toward reaching that potential."

This study was supported by the U.S. Air Force's Office of Scientific Research. Other Ohio State co-authors include Wendson A.S. Barbosa and Daniel J. Gauthier.

Story Source:

Materials provided by Ohio State University. Original written by Tatyana Woodall. Note: Content may be edited for style and length.

Journal Reference:

  • Robert M. Kent, Wendson A. S. Barbosa, Daniel J. Gauthier. Controlling chaos using edge computing hardware. Nature Communications, 2024; 15(1). DOI: 10.1038/s41467-024-48133-3

Related machine learning case study resources

  1. 16 Real World Case Studies of Machine Learning

    Machine Learning Case Studies in Life Science and Biology 7. Development of Microbiome Therapeutics. With advances in technology, we have studied and identified a vast number of microorganisms, the so-called microbiota such as bacteria, fungi, viruses, and other single-celled organisms in our bodies. All the genes of the microbiota are ...

  2. 99 Machine Learning Case Studies from 91 Enterprises by 2024

    AIMultiple analyzed 99 machine learning case studies for data-driven insights. They highlight 99 use cases in 17 industries, 14 business processes in 14 business functions, implementations in 91 companies in 20 countries, 10 benefits, growth over 6 years, and the 9 vendors that created these case studies.

  3. Machine Learning Case Studies with Powerful Insights

    The Titanic Machine Learning Case Study is a classic example in the field of data science and machine learning. The study is based on the dataset of passengers aboard the Titanic when it sank in 1912. The study's goal is to predict whether a passenger survived or not based on their demographic and other information. (An illustrative baseline for this task is sketched after this list.)

  4. 8 Exciting Case Studies of Machine Learning Applications in Life

    Consumer-facing apps are typically conversational chatbots enhanced with machine learning algorithms. The app analyzes the consumer's spoken language and offers recommendations for help. As the recommendations must be based on scientific evidence, the interaction and response of proposals and the individual language pattern must be ...

  5. Machine Learning Foundations: A Case Study Approach

    This first course treats the machine learning method as a black box. Using this abstraction, you will focus on understanding tasks of interest, matching these tasks to machine learning tools, and assessing the quality of the output. In subsequent courses, you will delve into the components of this black box by examining models and algorithms.

  6. 10 Wonderful Machine Learning Case Studies From Tech Company ...

    Another wonderful thing about this post is that it also covers personalization to rank results differently for different users. 6. From shallow to deep learning in fraud. (Hao Yi Ong, Lyft ...

  7. Machine learning in project analytics: a data-driven framework and case

    This study considered four machine learning algorithms to explore the causes of project cost overrun using the research dataset mentioned above. They are support vector machine, logistic ...

  8. Machine Learning Case-Studies

    Genetic Algorithms + Neural Networks = Best of Both Worlds. Learn how Neural Network training can be accelerated using Genetic Algorithms! Suryansh S. Mar 26, 2018. Real-world case studies on applications of machine learning to solve real problems.

  9. Challenges in Deploying Machine Learning: A Survey of Case Studies

    This case study emphasizes the importance of context-aware personalization for ML systems' interfaces, one of the key observations delivered by "Project explAIn". 7.4 Security. Machine Learning opens up opportunities for new types of security attacks across the whole ML deployment workflow. Specialized adversarial attacks for ML can ...

  10. Machine Learning Foundations: A Case Study Approach

    Machine Learning Foundations: A Case Study Approach is a 6-week introductory machine learning course offered by the University of Washington on Coursera. It is the first course in a 5-part Machine Learning specialization. The course provides a broad overview of key areas in machine learning, including regression, classification, clustering ...

  11. Case Study

    This blog post is co-authored by Guillermo Ribeiro, Sr. Data Scientist at Cepsa. Machine learning (ML) has rapidly evolved from being a fashionable trend emerging from academic environments and innovation departments to becoming a key means to deliver value across businesses in every industry. This transition from experiments in laboratories to ...

  12. PDF Software Engineering for Machine Learning: A Case Study

    Fig. 1. The nine stages of the machine learning workflow. Some stages are data-oriented (e.g., collection, cleaning, and labeling) and others are model-oriented (e.g., model requirements, feature engineering, training, evaluation, deployment, and monitoring). There are many feedback loops in the workflow.

  13. A case study comparing machine learning with statistical ...

    Time series forecasting is one of the most active research topics. Machine learning methods have been increasingly adopted to solve these predictive tasks. However, in a recent work, evidence was shown that these approaches systematically present a lower predictive performance relative to simple statistical methods. In this work, we counter these results. We show that these are only valid ... (A toy version of such a baseline-versus-ML comparison is sketched after this list.)

  14. Physics-informed machine learning: case studies for weather and climate

    3. Physics-informed machine learning: case studies in emulation, downscaling and forecasting. In this section, we introduce 10 case studies representing the three application areas in §2c that use the key PIML approaches described in §2b to address critical challenges in weather and climate modelling.

  15. Software Engineering for Machine Learning: A Case Study

    Recent advances in machine learning have stimulated widespread interest within the Information Technology sector on integrating AI capabilities into software and services. This goal has forced organizations to evolve their development processes. We report on a study that we conducted on observing software teams at Microsoft as they develop AI-based applications. We consider a nine-stage ...

  16. The Big Book of Machine Learning Use Cases

    The world of machine learning is evolving so quickly that it's challenging to find real-life use cases that are relevant to your day-to-day work. That's why we've created this comprehensive guide you can start using right away. Get everything you need — use cases, code samples and notebooks — so you can start putting the Databricks ...

  17. Machine Learning Foundations: A Case Study Approach

    In this course, you will get hands-on experience with machine learning from a series of practical case-studies. At the end of the first course you will have studied how to predict house prices ...

  18. Machine Learning Case Study: A data-driven approach to predict the

    As our dataset has 10 categorical features, we will need to encode these features into a numerical representation to apply the machine learning models. For this case study we will look into two encoding schemes and compare the results of both the encoding schemes at the end of the case study. The two encoding schemes are: 1. (Two common encoding schemes are illustrated in a sketch after this list.)

  19. 10 everyday machine learning use cases

    10 everyday machine learning use cases. Machine learning (ML) —the artificial intelligence (AI) subfield in which machines learn from datasets and past experiences by recognizing patterns and generating predictions—is a $21 billion global industry projected to become a $209 billion industry by 2029. Here are some real-world applications of ...

  20. 5 Machine Learning Case Studies to explore the Power of Technology

    Here are the five best machine learning case studies explained: 1. Machine Learning Case Study on Dell. The multinational leader in technology, Dell, empowers people and communities from across the globe with superior software and hardware. Since data is a core part of Dell's hard drive, their marketing team needed a data-driven solution that ...

  22. Artificial Intelligence and Machine Learning

    Researchers at Case Western Reserve University are working to answer the fundamental questions behind AI as well as apply AI and machine learning methods to applications. To better understand these questions, we study how our brains encode sensory information, how people should interact with AI assistants, how to build AI systems that can ...

  23. Quantum Machine Learning: A Review and Case Studies

    The notebooks used during this study can be provided after contacting the authors. ... This is the traditional technique of machine learning; however, in this case, it refers to machine learning using quantum information research-derived approaches. Tensor networks, which are designed to address quantum many-body processes, ...

  24. Electronics

    Growing battery use in energy storage and automotive industries demands advanced Battery Management Systems (BMSs) to estimate key parameters like the State of Charge (SoC) which are not directly measurable using standard sensors. Consequently, various model-based and data-driven approaches have been developed for their estimation. Among these, the latter are often favored due to their high ...

  25. Machine Learning and image analysis towards improved energy ...

    In this work, we focus on a practical case study of a brake caliper quality control operation, which is usually accomplished by human inspection and requires a dedicated handling system, with a slow production rate and thus inefficient energy usage. We report on a developed Machine Learning (ML) methodology, based on Deep Convolutional Neural ...

  26. The Benefits of AI in Healthcare

    Better machine learning (ML) algorithms, more access to data, cheaper hardware, and the availability of 5G have contributed to the increasing application of AI in the healthcare industry, accelerating the pace of change. AI and ML technologies can sift through enormous volumes of health data—from health records and clinical studies to genetic ...

  27. New machine learning algorithm promises advances in computing

    Ohio State University. "New machine learning algorithm promises advances in computing." ScienceDaily. ScienceDaily, 9 May 2024. <www.sciencedaily.com / releases / 2024 / 05 / 240509155536.htm ...

  28. Ethics book recommendations from the Poe Business Ethics Center

    Ethics and Military Strategy in the 21st Century, by George Lucas, Jr. This book examines the importance of "military ethics" in the formulation and conduct of contemporary military strategy. Clausewitz's original analysis of war relegated ethics to the sidelines in favor of political realism, interpreting the proper use of military ...

  29. Managing Fabric Data Pipelines: a step-by-step guide to source control

    Introduction. In the post Microsoft Fabric: Integration with ADO Repos and Deployment Pipelines - A Power BI Case Study, we have outlined key best practices for utilizing the seamless integration between Fabric and GIT via Azure DevOps repositories and the use of Fabric Deployment Pipelines, both features intended to improve collaborative development and agile application publishing in the ...
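
Item 3 above describes the classic Titanic survival-prediction exercise but does not include code. The sketch below is a generic, illustrative baseline rather than ProjectPro's actual pipeline: it loads the public Titanic dataset bundled with seaborn, imputes missing values, one-hot encodes the categorical columns, and fits a scikit-learn logistic regression. The column choices and model settings are assumptions.

```python
# Generic baseline for the Titanic survival task (not ProjectPro's exact pipeline).
# Assumes seaborn's bundled "titanic" dataset and a simple logistic regression.
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = sns.load_dataset("titanic")
features = ["pclass", "sex", "age", "sibsp", "parch", "fare", "embarked"]
X, y = df[features], df["survived"]

numeric = ["age", "sibsp", "parch", "fare"]
categorical = ["pclass", "sex", "embarked"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

model = Pipeline([("prep", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
model.fit(X_train, y_train)
print("holdout accuracy:", round(model.score(X_test, y_test), 3))
```

Bundling the imputation and scaling steps inside the pipeline keeps the example short while ensuring the test split never leaks into preprocessing.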
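
Item 13 above concerns head-to-head comparisons between machine learning methods and simple statistical baselines for time series forecasting. The cited paper's data and models are not reproduced here; the toy sketch below (made-up seasonal data, a seasonal-naive baseline, and a random forest on lagged values) only illustrates the shape of such a comparison.

```python
# Toy comparison of a simple statistical baseline vs. an ML model for
# one-step-ahead forecasting. Illustration only, not the cited paper's setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
t = np.arange(600)
y = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)  # seasonal series

# Statistical baseline: seasonal naive (repeat the value from one period ago).
season = 24
naive_pred = y[-100 - season:-season]
actual = y[-100:]

# ML model: random forest trained on lagged values of the series.
lags = 24
X = np.column_stack([y[i:i + len(y) - lags] for i in range(lags)])
target = y[lags:]
split = len(target) - 100
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:split], target[:split])
ml_pred = model.predict(X[split:])

print("seasonal-naive MAE:", round(float(np.mean(np.abs(naive_pred - actual))), 4))
print("random-forest MAE:", round(float(np.mean(np.abs(ml_pred - actual))), 4))
```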
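
Item 18 above is cut off before naming its two encoding schemes, so they remain unspecified here. As a generic illustration only, the sketch below contrasts two widely used options, one-hot encoding and integer (label-style) encoding, on a made-up categorical column.

```python
# Two common ways to turn a categorical feature into numbers.
# The referenced case study does not say which two schemes it compared;
# one-hot and integer encoding below are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({"contract_type": ["monthly", "yearly", "monthly", "two_year"]})

# Scheme A: one-hot encoding (one binary column per category; no implied order).
print(pd.get_dummies(df, columns=["contract_type"]))

# Scheme B: integer / label encoding (compact single column, but implies an order).
codes, categories = pd.factorize(df["contract_type"])
print(df.assign(contract_code=codes))
print("categories:", list(categories))
```

One-hot encoding avoids imposing an artificial order on the categories at the cost of extra columns, while integer encoding stays compact but can mislead models that treat the codes as ordered values.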