Cloud Computing: Recently Published Documents


Simulation and performance assessment of a modified throttled load balancing algorithm in cloud computing environment

Load balancing is crucial in cloud computing for ensuring scalability and reliability, minimizing response and processing times, and maximizing resource utilization. However, the load fluctuation that accompanies the distribution of a huge number of requests among a set of virtual machines (VMs) is challenging and needs effective and practical load balancers. In this work, a two-listed throttled load balancer (TLT-LB) algorithm is proposed and simulated using the CloudAnalyst simulator. The TLT-LB algorithm modifies the conventional throttled load balancer (TLB) algorithm to improve the distribution of tasks between different VMs. The performance of the TLT-LB algorithm has been evaluated against the TLB, round robin (RR), and active monitoring load balancer (AMLB) algorithms using two different configurations. Interestingly, TLT-LB significantly balances the load between the VMs, reducing the loading gap between the most heavily and most lightly loaded VMs to 6.45%, compared to 68.55% for the TLB and AMLB algorithms. Furthermore, the TLT-LB algorithm considerably reduces the average response time and processing time compared to the TLB, RR, and AMLB algorithms.
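
The abstract gives no pseudocode for TLT-LB, but the core idea of replacing the conventional throttled scan of a VM state table with two lists can be sketched in Python as follows. This is an illustrative assumption of how such a balancer might be structured, not the authors' implementation:

from collections import deque

class TwoListThrottledBalancer:
    # Sketch of a two-list throttled balancer: VMs move between an
    # 'available' and a 'busy' list instead of a single state table
    # being rescanned on every request (the conventional TLB behaviour).
    def __init__(self, vm_ids):
        self.available = deque(vm_ids)  # VMs ready to accept a task
        self.busy = set()               # VMs currently executing tasks

    def assign(self):
        # Return a VM for the next request, or None if all are busy.
        if not self.available:
            return None                 # caller should queue or retry
        vm = self.available.popleft()   # O(1) pick instead of a scan
        self.busy.add(vm)
        return vm

    def release(self, vm):
        # Called when a VM finishes; appending to the tail rotates the
        # load across VMs rather than re-hitting the same machine.
        self.busy.discard(vm)
        self.available.append(vm)

balancer = TwoListThrottledBalancer(["vm0", "vm1", "vm2"])
print(balancer.assign())   # vm0
print(balancer.assign())   # vm1
balancer.release("vm0")
print(balancer.assign())   # vm2

Rotating released VMs to the tail of the available list spreads assignments evenly, which is plausibly what narrows the load gap reported above.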

An improved forensic-by-design framework for cloud computing with systems engineering standard compliance

Reliability of trust management systems in cloud computing

Cloud computing is an innovation that delivers services such as software, platform, and infrastructure over the web. This computing structure is widespread and dynamic, works on the pay-per-use model, and supports virtualization. Cloud computing is expanding quickly among consumers, and many organizations offer services through the web. It provides adaptable, on-demand services but still has various security threats. Its dynamic nature lets it be customized to client and provider requirements, which is an outstanding benefit of cloud computing. On the other hand, this also creates trust issues and concerns such as security, privacy, identity, and legitimacy. Thus, the huge challenge in the cloud environment is selecting a perfect provider. For this, the trust mechanism plays a critical part, based on the evaluation of QoS and feedback ratings. Nonetheless, various difficulties are still present in trust management systems for monitoring and evaluating QoS. This paper discusses the current obstructions present in trust systems. The objective of this paper is to review the available trust models. Issues like insufficient trust between provider and client, which have created problems in data sharing, are likewise addressed here. Further, it lays out the limitations and possible enhancements to help researchers who intend to investigate this topic.

Cloud Computing Adoption in the Construction Industry of Singapore: Drivers, Challenges, and Strategies

An extensive review of web-based multi-granularity service composition

The paper reviews efforts to compose SOAP, non-SOAP, and non-web services. Traditionally, efforts focused on composing SOAP services and did not include RESTful and non-web services. A SOAP service uses a structured exchange methodology for dealing with web services, while a non-SOAP service follows a different approach. The paper reviews invoking and composing a combination of SOAP, non-SOAP, and non-web services into a composite process to execute complex tasks on various devices. It also shows the systematic integration of SOAP, non-SOAP, and non-web services, describing the composition of heterogeneous services from the perspective of resource consumption, in contrast to the services conventionally used. The paper further compares and reviews different layout models for the discovery, selection, and composition of services in cloud computing. Recent research trends in service composition are identified, and research on microservices is evaluated and presented in tables and graphs.

Integrated Blockchain and Cloud Computing Systems: A Systematic Survey, Solutions, and Challenges

Cloud computing is a network model of on-demand access to shared pools of configurable computing resources. Compared with conventional service architectures, cloud computing introduces new security challenges in secure service management and control, privacy protection, data integrity protection in distributed databases, data backup, and synchronization. Blockchain can be leveraged to address these challenges, partly due to underlying characteristics such as transparency, traceability, decentralization, security, immutability, and automation. We present a comprehensive survey of how blockchain is applied to provide security services in the cloud computing model, and we analyze the research trends of blockchain-related techniques in current cloud computing models. During the review, we also briefly investigate how cloud computing can affect blockchain, especially the performance improvements that cloud computing can provide for it. Our contributions include the following: (i) summarizing the possible architectures and models for integrating blockchain and cloud computing, and the roles of cloud computing in blockchain; (ii) classifying and discussing recent, relevant works based on different blockchain-based security services in the cloud computing model; (iii) briefly investigating what improvements cloud computing can provide for blockchain; (iv) introducing the current development status of the industry and major cloud providers in combining cloud and blockchain; (v) analyzing the main barriers and challenges of integrated blockchain and cloud computing systems; and (vi) providing recommendations for future research on the integration of blockchain and cloud systems.

Cloud Computing and Undergraduate Researches in Universities in Enugu State: Implication for Skills Demand

Cloud building block chip for creating FPGA and ASIC clouds

Hardware-accelerated cloud computing systems based on FPGA chips (FPGA clouds) or ASIC chips (ASIC clouds) have emerged as a new technology trend for power-efficient acceleration of various software applications. However, the operating systems and hypervisors currently used in cloud computing will lead to power, performance, and scalability problems in an exascale cloud computing environment. Consequently, the present study proposes a parallel hardware hypervisor system, implemented entirely in special-purpose hardware, that virtualizes application-specific multi-chip supercomputers to enable virtual supercomputers to share available FPGA and ASIC resources in a cloud system. Beyond the virtualization of multi-chip supercomputers, the system's other unique features include simultaneous migration of multiple communicating hardware tasks and on-demand increase or decrease of the hardware resources allocated to a virtual supercomputer. By partitioning the flat hardware design of the proposed hypervisor system into multiple partitions and applying the chip unioning technique to them, the study introduces a cloud building block chip that can be used to create FPGA or ASIC clouds as well. Single-chip and multi-chip verification studies have been done to verify the functional correctness of the hypervisor system, which consumes only a fraction (10%) of hardware resources.

Study On Social Network Recommendation Service Method Based On Mobile Cloud Computing

Cloud-based network virtualization in IoT with OpenStack

In cloud computing deployments, specifically in the Infrastructure-as-a-Service (IaaS) model, networking is one of the core facilities provided to users. The IaaS approach ensures significant flexibility and manageability, since the networking resources and topologies are entirely under the users' control. In this context, considerable efforts have been devoted to promoting the cloud paradigm as a suitable solution for managing IoT environments. Deep and genuine integration between the two ecosystems, cloud and IoT, may only be attainable at the IaaS level. For extending the IoT domain's capabilities with cloud-based mechanisms akin to the IaaS model, network virtualization is a fundamental enabler of infrastructure-oriented IoT deployments. Indeed, an IoT deployment without networking resilience and adaptability is unsuitable for meeting user-level demands and service requirements; such a limitation restricts IoT-based services to very specific, statically defined scenarios, limiting the plurality and diversity of use cases. This article presents a cloud-based approach to network virtualization in an IoT context using the de facto standard IaaS middleware, OpenStack, and its networking subsystem, Neutron. OpenStack is extended to enable the instantiation of virtual/overlay networks between cloud-based instances (e.g., virtual machines, containers, and bare-metal servers) and/or geographically distributed IoT nodes deployed at the network edge.
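
As a rough illustration of the kind of Neutron operation such an extension builds on, the sketch below creates an overlay network and subnet through the openstacksdk client. The cloud name, resource names, and CIDR are assumptions for the example, and a configured clouds.yaml with valid credentials is required:

import openstack

# Connect using credentials from clouds.yaml; "iot-cloud" is a placeholder.
conn = openstack.connect(cloud="iot-cloud")

# Ask Neutron for a virtual/overlay network that IoT-side instances can join.
net = conn.network.create_network(name="iot-overlay")
subnet = conn.network.create_subnet(
    network_id=net.id,
    name="iot-overlay-subnet",
    ip_version=4,
    cidr="10.20.0.0/24",  # illustrative address range
)
print(net.id, subnet.cidr)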


  • Original Papers
  • Open access
  • Published: 20 April 2010

Cloud computing: state-of-the-art and research challenges

  • Qi Zhang,
  • Lu Cheng &
  • Raouf Boutaba

Journal of Internet Services and Applications, volume 1, pages 7–18 (2010)


Cloud computing has recently emerged as a new paradigm for hosting and delivering services over the Internet. It is attractive to business owners because it eliminates the need for users to plan ahead for provisioning and allows enterprises to start small and increase resources only when service demand rises. However, although cloud computing offers huge opportunities to the IT industry, its development is currently in its infancy, with many issues still to be addressed. In this paper, we present a survey of cloud computing, highlighting its key concepts, architectural principles, state-of-the-art implementations, and research challenges. The aim of this paper is to provide a better understanding of the design challenges of cloud computing and to identify important research directions in this increasingly important area.


Author information

Authors and affiliations

University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1

Qi Zhang, Lu Cheng & Raouf Boutaba


Corresponding author

Correspondence to Raouf Boutaba.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Zhang, Q., Cheng, L. & Boutaba, R. Cloud computing: state-of-the-art and research challenges. J Internet Serv Appl 1, 7–18 (2010). https://doi.org/10.1007/s13174-010-0007-6


Received: 08 January 2010

Accepted: 25 February 2010

Published: 20 April 2010

Issue Date: May 2010

DOI: https://doi.org/10.1007/s13174-010-0007-6


  • Cloud computing
  • Data centers
  • Virtualization


Special issue on "artificial intelligence in cloud computing"

  • Published: 09 August 2021
  • Volume 105, pages 507–511 (2023)


  • Sabah Mohammed,
  • Wai Chi Fang &
  • Carlos Ramos


A Correction to this article was published on 18 January 2023.

Cloud computing equips artificial intelligence (AI) with tremendous power and is considered one of the most important catalysts for developing innovative smart applications. With its potential to change the way data are stored and processed across various geographies, the scope and impact of AI have reached a larger market. Across the cloud models, AI developers and consumers have started to create an ecosystem that improves the lives of millions; digital assistants like Siri, Google Home, and Amazon's Alexa now blend AI and cloud computing into our everyday lives. AI practitioners on the Infrastructure as a Service (IaaS) cloud model can use advanced infrastructure facilities (CPU, GPU, memory, disk, network, and O/S) without waiting for an infrastructure team to prepare them. Moreover, with the Platform as a Service (PaaS) cloud model, AI practitioners can use a variety of AI algorithms and data science services, including Jupyter notebooks and data catalog services, to develop a new generation of smart applications. Additionally, consumers of the Software as a Service (SaaS) cloud model can employ and embed AI services within their applications (e.g., smart buildings).

Before SaaS, software and data were only "on premise". SaaS moved everything to the cloud, bringing collaboration and efficiency as well as shared talent. With AI, the next step is "smart SaaS", as services can begin to use AI and machine learning more widely to create a better consumer experience. Cloud computing, meanwhile, keeps adding capabilities that can fuel more ambitious AI applications. With capabilities like containerization, developers can isolate applications to fit different computing environments and platforms, and with Kubernetes, the deployment, scaling, and management of containerized applications can be automated, so that containerized applications can run on different cloud providers without worrying about the compute environment.

At this large scale of research and development, AI capabilities working in the cloud computing environment make organizations more efficient, strategic, and insight-driven. Cloud computing offers businesses more flexibility, agility, and cost savings by hosting data and applications in the cloud. Artificial intelligence capabilities are now layering onto cloud computing, helping companies manage their data, find patterns and insights in information, deliver better customer experiences, and optimize workflows [1].

Artificial intelligence is being embedded into cloud computing infrastructures to help streamline workloads, automate repetitive tasks, and monitor, manage, and even self-heal systems when an issue occurs. Future research on the empowerment of AI through cloud computing is without limit. This special issue aims to gather researchers and practitioners working in the field to present and discuss challenges, share original research and practical experiences, and provide the latest and most innovative contributions. We would like to thank our reviewers, who played an important role in selecting and commenting on the various submissions to this special issue. We would also like to thank the EiC of the Computing journal and the wonderful staff who helped us produce this special issue.

1 In this special issue

The first paper, by Weiwei Lin et al., proposes a novel workload-aware power measuring framework. The authors first introduce separate power consumption models for different workload types based on their impacts on server components, and then present the adaptive workload-aware power consumption measuring method (WSPM) for cloud servers, which proactively selects an appropriate power model for the upcoming workload through workload clustering, forecasting, and classification.
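
The paper's actual power models are not reproduced in this summary, so the sketch below only illustrates the selection mechanism: cluster historical workloads, then route an incoming workload to the power model of its cluster. The features and the linear models are assumptions for illustration:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Assumed workload features per interval: [cpu_util, mem_util, disk_io, net_io]
history = rng.random((200, 4))
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit(history)

# One assumed linear power model per workload cluster: P = base + coeffs @ x
power_models = {
    0: (70.0, np.array([120.0, 10.0, 5.0, 3.0])),
    1: (75.0, np.array([60.0, 40.0, 25.0, 5.0])),
    2: (65.0, np.array([80.0, 25.0, 10.0, 15.0])),
}

def predict_power(workload):
    # Select the model for the upcoming workload via its cluster label.
    label = int(clusters.predict(workload.reshape(1, -1))[0])
    base, coeffs = power_models[label]
    return base + coeffs @ workload

print(predict_power(rng.random(4)))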

Hamza Turabieh et al. examined students' performance inside an educational organization to reduce the probability of student failure and to enhance understanding of the students' learning process. To achieve this, the authors applied an enhanced wrapper feature selection method to identify the most valuable attributes (features) affecting students' performance. They propose a modified version of the HHO algorithm, hybridized with the kNN algorithm. The proposed contribution controls population diversity and prevents premature convergence of HHO by injecting the current population with new solutions once all solutions belong to one cluster and are stuck in a local optimum. The modified HHO is used as a feature selection algorithm for student performance prediction; it enhances the original HHO and supports the claim that controlling population diversity improves the exploration process of the HHO algorithm.
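
The sketch below shows only the wrapper-evaluation idea, scoring a candidate feature subset by cross-validated kNN accuracy; a random search stands in for the HHO optimizer and its diversity injection, and a built-in dataset stands in for the student data:

import numpy as np
from sklearn.datasets import load_wine  # stand-in dataset (assumption)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(1)

def fitness(mask):
    # Wrapper fitness: kNN accuracy on the selected feature columns.
    if not mask.any():
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask], y, cv=5).mean()

best_mask, best_score = None, -1.0
for _ in range(50):                      # random search replaces HHO here
    mask = rng.random(X.shape[1]) < 0.5  # binary feature-selection vector
    score = fitness(mask)
    if score > best_score:
        best_mask, best_score = mask, score

print(round(best_score, 3), np.flatnonzero(best_mask))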

Thuy Thi Le et al. investigated the object coreference resolution challenge in the context of opinion mining and proposed the CROAS model. CROAS combines machine learning, deep learning, ontology-based reference, graph-based reference, and dependency grammar for object coreference resolution. Specifically, a powerful new language representation method and machine learning support the object classification of CROAS.

The fourth paper, by Guang-Ho Cha, proposes a similarity ranking technique that exploits the entire network structure of similarity relationships for multimedia databases, particularly image databases. The main problem in similarity ranking for multimedia is the semantic gap between the characteristics automatically computed from the multimedia data and a human's interpretation of the multimedia itself: similarity semantics usually rest on high-level human interpretation, which automatically computed low-level multimedia properties may not reflect. The paper assumes that the meaning of multimedia is affected by the context, i.e., the similarity relationships within a dataset, and therefore proposes a ranking technique that captures these semantics from a large multimedia dataset.
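
One common way to exploit the whole network of similarity relationships rather than isolated pairwise scores is PageRank-style propagation over the similarity graph. The sketch below, with a toy similarity matrix, is an illustrative analogue of this idea rather than the paper's exact method:

import numpy as np

S = np.array([[1.0, 0.8, 0.1, 0.0],   # pairwise similarities of 4 images
              [0.8, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.7],
              [0.0, 0.1, 0.7, 1.0]])
P = S / S.sum(axis=1, keepdims=True)   # row-normalized transition matrix

query = np.array([1.0, 0.0, 0.0, 0.0]) # restart mass at the query image
scores, alpha = query.copy(), 0.85
for _ in range(100):                    # propagate similarity through graph
    scores = alpha * P.T @ scores + (1 - alpha) * query

print(np.argsort(-scores))              # context-aware ranking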

In the paper by Deguang You et al., a novel approach to CPU load prediction for cloud servers combining denoising and error correction is proposed. The two key techniques in this method are filtering and recombining the noisy signal before prediction, and error correction after prediction. For the former, CEEMDAN is used to decompose the original signal into multiple IMF components, screen the effective IMFs, and recombine them using a Fréchet calculation. For the latter, the error is predicted ahead of the actual load prediction from historical CPU load error data; a preliminary prediction allows an initial round of error correction. Experiments comparing runs under different configurations were designed to test the efficacy of the proposed approach.
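
The sketch below illustrates only the error-correction step on synthetic data: a secondary model learns the primary predictor's historical errors and corrects the next forecast. A naive persistence forecast stands in for the primary predictor, and the CEEMDAN denoising stage is omitted:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
t = np.arange(300, dtype=float)
load = 50 + 10 * np.sin(t / 10) + rng.normal(0, 2, t.size)  # toy CPU load

primary = load[:-1]                   # persistence: predict x[t+1] = x[t]
errors = load[1:] - primary           # historical one-step errors

# Secondary model: predict the next error from the last three errors.
X = np.array([errors[i:i + 3] for i in range(len(errors) - 3)])
y = errors[3:]
err_model = LinearRegression().fit(X, y)

next_primary = load[-1]               # preliminary prediction for t+1
next_error = err_model.predict(errors[-3:].reshape(1, -1))[0]
print(next_primary + next_error)      # corrected forecast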

The authors Sun-Young Ihm et al. propose an unbalanced-hierarchical (UB-H) layer. This method increases the total number of layers and reduces index building time compared to the UB-Layer and the convex hull method. It first divides the dimensions of the input data hierarchically into two or three sub-datasets, then builds a sub-convex hull for each sub-dataset and constructs the final UB-H index by combining the sub-convex hulls.

The seventh paper, by Sandeep Kumar Sood et al., proposes a fog-based intelligent healthcare system that diagnoses possible DeV infection in individuals using a Naive Bayesian network and generates real-time diagnostic, suggestive, and emergency alerts to the concerned stakeholders (individuals, government agencies, and health organizations). The system advises individuals diagnosed with possible DeV infection to medically confirm the infection by consulting doctors and undergoing the recommended laboratory tests. It uses an environment event index (EEI) to ascertain the health sensitivity of a possibly infected individual with respect to undesired environmental events, and generates emergency alerts to doctors or caregivers so that timely remedial action can be taken. The system also pinpoints DeV-infected and risk-prone areas on Google Maps using SNA and provides an efficient warning alert system for visitors to and residents of those areas. It helps prevent the further spread of DeV by alerting uninfected individuals and government healthcare agencies, and aids effective, precautionary control of the infection.
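
A minimal sketch of the alert-generation idea follows, using a Gaussian Naive Bayes classifier over invented features and thresholds; the paper's actual clinical attributes, EEI computation, and alert logic are not reproduced:

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
# Assumed features: [temperature_C, platelets_k, rash(0/1), area_risk]
X = np.vstack([
    rng.normal([37.0, 250, 0.1, 0.2], [0.4, 40, 0.3, 0.1], (100, 4)),
    rng.normal([39.5, 90, 0.8, 0.7], [0.5, 30, 0.3, 0.1], (100, 4)),
])
y = np.array([0] * 100 + [1] * 100)   # 0 = healthy, 1 = possible infection

model = GaussianNB().fit(X, y)

def triage(features, emergency=0.8):
    # Map the posterior probability to the three alert levels.
    p = model.predict_proba(np.asarray(features).reshape(1, -1))[0, 1]
    if p >= emergency:
        return f"EMERGENCY alert to caregivers (p={p:.2f})"
    if p >= 0.5:
        return f"Suggestive alert: consult a doctor (p={p:.2f})"
    return f"Diagnostic log only (p={p:.2f})"

print(triage([39.8, 85, 1, 0.8]))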

Ruiping Wang et al. propose an illumination-robust feature detection method that consists of two parts: front-end EIRFT and back-end ATFAST feature detection. The EIRFT effectively improves image quality, while ATFAST solves the problem that traditional FAST algorithms cannot extract enough feature points from underexposed and overexposed images, by improving the threshold function of the traditional FAST method. In the experimental section, the proposed method is shown to have excellent stability and illumination robustness. In terms of the number of repeated features and the repeatability rate, the proposed algorithm also shows significant advantages over state-of-the-art feature-based and learning-based detection methods in the underexposure case.
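
The sketch below shows the knob that an adaptive-threshold FAST variant turns: OpenCV's FAST detector with an explicit threshold, where lowering the threshold recovers keypoints on a synthetic underexposed frame. EIRFT and the paper's actual threshold function are not reproduced:

import cv2
import numpy as np

img = np.full((200, 200), 30, np.uint8)   # synthetic underexposed frame
cv2.circle(img, (100, 100), 40, 55, -1)   # low-contrast structure

for threshold in (40, 10):                # default-like vs. lowered threshold
    fast = cv2.FastFeatureDetector_create(threshold=threshold)
    keypoints = fast.detect(img, None)
    print(f"threshold={threshold}: {len(keypoints)} keypoints")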

Girish L et al. propose a model for anomaly detection in an OpenStack cloud environment. The proposed model uses stacked and bidirectional LSTM models to build the neural network. For the experiment, data were collected from OpenStack using collectd; the collected dataset has 10 features plus class labels. Using the LSTM neural network, the authors were able to detect anomalies in the OpenStack environment.
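
A minimal sketch of such a stacked, bidirectional LSTM classifier over windows of 10 metrics is given below; the layer sizes, window length, and toy data are assumptions rather than the paper's configuration:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

window, n_features = 30, 10            # 30 time steps of 10 metrics
model = keras.Sequential([
    layers.Input(shape=(window, n_features)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),  # stacked
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(1, activation="sigmoid"),  # anomaly probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

X = np.random.rand(256, window, n_features).astype("float32")  # toy windows
y = (X.mean(axis=(1, 2)) > 0.5).astype("float32")              # toy labels
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1], verbose=0))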

A novel streamlined sensor data processing method called the Evolutionary Expand-and-Contract Instance-based Learning algorithm (EEAC-IBL) is proposed by Shimin Hu et al. The multivariate data stream is first expanded into many subspaces; the subspaces corresponding to the characteristics of the features are then selected and condensed into a significant feature subset. The selection operates stochastically rather than deterministically, using evolutionary optimization to approximate the best subgroup. Data stream mining follows, with machine learning for activity recognition performed on the fly. This approach is suitable for extreme connectivity scenarios where precise feature selection is not required and the relative importance of each feature among the sensor data changes over time. The stochastic approximation method is fast and accurate, offering an alternative to traditional machine learning methods for smart-home activity recognition applications.

The paper "A Novel Indoor Localization System Using Machine Learning Based on Bluetooth Low Energy with Cloud Computing" by Quanyi Hu et al. proposes a novel indoor localization system for multi-indoor environments using cloud computing. Prior studies show persistent concerns about avoiding signal occlusion and interference in a single indoor environment; this work finds general rules that make the system immune to interference generated by occlusion in multi-indoor environments. A convenient way to deploy the Bluetooth Low Energy (BLE) devices, which mainly collect information to assist localization, is worked out. A neural-network-based classification is proposed to improve localization accuracy, compared against several algorithms whose performance is discussed. The authors also design a distributed data storage structure and establish a platform with Redis that takes the storage load into account.
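
As an illustration of the classification framing, the sketch below trains a small neural network on synthetic BLE RSSI fingerprints; the beacon count, room set, and network size are assumptions:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
n_beacons, rooms = 8, ["lobby", "lab", "office"]
# Each room gets a characteristic mean RSSI (dBm) per beacon.
means = rng.uniform(-90, -50, (len(rooms), n_beacons))
X = np.vstack([rng.normal(means[i], 4, (200, n_beacons))
               for i in range(len(rooms))])
y = np.repeat(rooms, 200)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                    random_state=0)
clf.fit(Xtr, ytr)
print(clf.score(Xte, yte))             # localization accuracy on toy data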

The paper by Junsheng Xiao et al. proposes a few-shot segmentation network for skin lesion segmentation that requires only a few pixel-level annotations. First, the co-occurrence region between the support image and the query image is obtained and used as a prior mask to exclude irrelevant background regions. Second, the results are concatenated and sent to the inference module to predict the segmentation of the query image. Third, the network is retrained with the support and query roles reversed, which benefits from its symmetrical structure.

Change history

18 January 2023

A Correction to this paper has been published: https://doi.org/10.1007/s00607-023-01149-x

Alton L (2019) 4 Ways AI is improving cloud computing, Community Connection, June 5, 2019. https://community.connection.com/4-ways-ai-is-improving-cloud-computing/


Author information

Authors and affiliations

Lakehead University, Thunder Bay, Canada

Sabah Mohammed

National Chiao Tung University, Hsinchu City, Taiwan

Wai Chi Fang

ISEP, Polytechnic of Porto, Porto, Portugal

Carlos Ramos


Corresponding author

Correspondence to Sabah Mohammed.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Mohammed, S., Fang, W.C. & Ramos, C. Special issue on "artificial intelligence in cloud computing". Computing 105, 507–511 (2023). https://doi.org/10.1007/s00607-021-00985-z


Accepted: 13 July 2021

Published: 09 August 2021

Issue Date: March 2023

DOI: https://doi.org/10.1007/s00607-021-00985-z


Analysis and Research on Green Cloud Computing


Advances, Systems and Applications

  • Open access
  • Published: 23 December 2019

Load balancing in cloud computing – A hierarchical taxonomical classification

  • Shahbaz Afzal (ORCID: orcid.org/0000-0002-1217-9357) &
  • G. Kavitha

Journal of Cloud Computing, volume 8, Article number: 22 (2019)


The load unbalancing problem is a multi-variant, multi-constraint problem that degrades the performance and efficiency of computing resources. Load balancing techniques provide solutions for the two undesirable facets of load unbalancing: overloading and under-loading. Despite the importance of load balancing techniques, to the best of our knowledge there is no comprehensive, extensive, systematic, and hierarchical classification of the existing load balancing techniques. Further, the factors that cause the load unbalancing problem are neither studied nor considered in the literature. This paper presents a detailed encyclopedic review of load balancing techniques. The advantages and limitations of existing methods are highlighted, and the crucial challenges are addressed so that efficient load balancing algorithms can be developed in the future. The paper also suggests new insights into load balancing in cloud computing.

Introduction

Cloud computing is an Internet-based network technology that has grown rapidly with advances in communication technology, providing services to customers with various requirements through online computing resources. It provisions both hardware and software applications, along with software development platforms and testing tools, as resources [1, 2]. Such resource delivery is accomplished through services: the first falls under the category of Infrastructure as a Service (IaaS), while the latter two fall under Software as a Service (SaaS) and Platform as a Service (PaaS), respectively [3]. Cloud computing is an on-demand, network-enabled computing model that shares resources as services billed on a pay-as-you-go (PAYG) plan [4]. Some of the giant players in this technology are Amazon, Microsoft, Google, SAP, Oracle, VMware, Salesforce, IBM, and others [1, 2]; the majority of these cloud providers are high-tech IT organizations. The cloud computing model is viewed under two different headings. The first is the service delivery model, which defines the type of service offered by a typical cloud provider; based on this aspect, there are three popular service models: SaaS, PaaS, and IaaS [5, 6]. The other aspect of the cloud computing model concerns its scale of use, affiliation, ownership, size, and access: the official National Institute of Standards and Technology (NIST) definition of cloud computing outlines four cloud deployment models, namely private, public, community, and hybrid clouds [7].

A cloud computing model is efficient if its resources are utilized in the best possible way, and such efficient utilization can be achieved by employing and maintaining proper management of cloud resources. Resource management is achieved by adopting robust resource scheduling, allocation, and scalability techniques. These resources are provided to customers in the form of Virtual Machines (VMs) through a process known as virtualization, which makes use of an entity (software, hardware, or both) known as a hypervisor [8]. The greatest advantage of cloud computing is that a single-user physical machine is transformed into multiuser virtual machines [9, 10]. The Cloud Service Provider (CSP) plays a crucial role in service delivery to users, a complex task given the available virtual resources. While serving user requests, some VMs receive heavy traffic of user tasks and some receive less. As a result, the CSP is left with unbalanced machines that have a large gradient of user tasks and resource utilization [11].

The problem of load unbalancing is an undesirable event on the CSP side that degrades the performance and efficacy of the computing resources, along with the guaranteed Quality of Service (QoS) in the agreed Service Level Agreement (SLA) between consumer and provider. Under these circumstances the need for load balancing (LB) arises, a topic of particular research interest. Load balancing in cloud computing can be done at the physical machine level or the VM level [2].

A task utilizes the resources of a VM, and when a bunch of tasks arrives at a VM, its resources become exhausted, meaning no resource is available to handle additional task requests. When such a situation arises, the VM is said to have entered an overloaded state. At this point, tasks will either suffer from starvation or end up in deadlock with no hope of completion. Consequently, tasks must be migrated to another resource on another VM. The workload migration process includes three basic steps: load balancing, which checks the current load on a machine's resources; resource discovery, which finds another suitable resource; and workload migration, which moves extra tasks to available resources. These operations are performed by three different units, commonly known as the load balancer, resource discovery, and task migration units, respectively. A minimal sketch of the three steps, with assumed thresholds and a toy VM model, is given below:
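
OVERLOAD, UNDERLOAD = 0.8, 0.2         # assumed utilization thresholds

def check_load(vm):                    # step 1: load balancer
    util = vm["used"] / vm["capacity"]
    if util > OVERLOAD:
        return "overloaded"
    return "under-loaded" if util < UNDERLOAD else "balanced"

def discover_target(vms):              # step 2: resource discovery
    candidates = [v for v in vms if check_load(v) != "overloaded"]
    return min(candidates, key=lambda v: v["used"] / v["capacity"],
               default=None)

def migrate(task_size, src, dst):      # step 3: workload migration
    src["used"] -= task_size
    dst["used"] += task_size

vms = [{"name": "vm1", "used": 9.0, "capacity": 10.0},
       {"name": "vm2", "used": 2.0, "capacity": 10.0}]
src = vms[0]
if check_load(src) == "overloaded":
    target = discover_target(vms[1:])
    if target is not None:
        migrate(2.0, src, target)
print([(v["name"], v["used"]) for v in vms])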

Load balancing is the process of redistributing workload in a distributed system like cloud computing, ensuring no computing machine is overloaded, under-loaded, or idle [12, 13]. Load balancing tries to improve constrained parameters like response time, execution time, and system stability, thereby improving the performance of the cloud [14, 15]. It is an optimization technique in which task scheduling is an NP-hard problem. A large number of load balancing approaches have been proposed by researchers, with most of the focus on task scheduling, task allocation, resource scheduling, resource allocation, and resource management. To the best of our knowledge, there is no in-depth, comprehensive literature on the factors that cause the load unbalancing situation, and the existing survey papers on load balancing do not provide a proper systematic classification of methods and techniques. The main aim of this paper is to review the existing work along with its advantages and pitfalls. A comparison is also made among different existing load balancing techniques and the challenges faced in cloud load balancing. The survey also outlines factors responsible for the load unbalancing problem and suggests methods that can be used in future work. The contributions of this paper are summarized as follows:

Explore the factors that cause the load unbalancing problem in cloud computing.

Provide a systematic overview of the existing approaches in the load balancing process and the way in which these approaches have been used in the cloud technology.

Provide an in-depth classification of different load balancing techniques, methods, strategies, and algorithms.

Analyze the challenges faced by researchers in developing an efficient load balancing algorithm.

The remainder of the paper is structured as follows. Section "Load balancing model background" gives a brief description of the load balancing model in cloud computing. Section "Related work" highlights related studies. The research methodology is discussed in section "Research methodology". Section "Proposed classification of load balancing algorithms" proposes a taxonomy-based classification. The results are evaluated in section "Results and discussion", while section "Discussion on open issues on load balancing in cloud computing" discusses open issues in cloud load balancing. Finally, section "Conclusion and future work" concludes our work and points out some future directions.

Load balancing model background

In this section a two-level load balancing architecture model for imbalanced clouds, aimed at achieving the best load shedding, is presented in Fig. 1; it is a modified version of the architecture given by Gupta et al. [16]. The virtual machine manager and virtual machine monitor are abstracted in this model. First-level load balancing is performed at the Physical Machine (PM) level and second-level load balancing at the VM level. Based on this, there are two task migration sets:

Intra VM task migration

Inter VM task migration

Fig. 1: Two-level load balancing architecture

The request generator generates user requests, i.e., user tasks that need computing resources for their execution. The data center controller is in charge of task management. The load balancer checks which VM to assign to a given user task. The first-level load balancer balances the workload on individual physical machines by distributing it among their associated virtual machines. The second-level load balancer balances the workload across the virtual machines of different physical machines.

Activities involved in load balancing

Scheduling and allocating tasks to VMs based on their requirements constitutes the cloud computing workload. The load balancing process involves the following activities [2]:

Identification of user task requirements

This phase identifies the resource requirements of the user tasks to be scheduled for execution on a VM.

Identification of resource details of a VM

This checks the status of the resource details of a VM: the current resource utilization and the unallocated resources. Based on this phase, the status of a VM can be determined as balanced, overloaded, or under-loaded with respect to a threshold.

Task scheduling

Once the resource details of a VM are identified, the tasks are scheduled to appropriate resources on appropriate VMs by a scheduling algorithm.

Resource allocation

The resources are allocated to scheduled tasks for execution, accomplished via a resource allocation policy. A large number of scheduling and allocation policies have been proposed in the literature. While scheduling is required to speed up execution, the allocation policy is used for proper resource management and for improving resource performance. The strength of a load balancing algorithm is determined by the efficacy of its scheduling algorithm and allocation policy [17, 18, 19].

Migration is an important phase of the load balancing process in the cloud, and the latter is incomplete without the former. Migration in the cloud is of two kinds, based on the entity considered: VM migration and task migration. VM migration is the movement of a VM from one physical host to another to get rid of the overloading problem, and is categorized into live and non-live VM migration. Likewise, task migration is the movement of tasks across VMs and is of two types: intra-VM task migration and inter-VM task migration. A large number of migration approaches have been proposed in the literature, and an efficient migration technique leads to efficient load balancing. From the extensive survey it is concluded that task migration is more time- and cost-effective than VM migration, and the trend has shifted from VM to task migration [20, 21, 22, 23, 24].

Related work

In general, a lot of work has been done in the field of cloud computing, particularly in scheduling (tasks, VMs, and compute), resource provisioning, resource management, energy management, and load balancing. However, load balancing has drawn keen attention among researchers because of its importance in cloud computing between the stakeholders, i.e., the Cloud Service Provider and the Cloud Service Consumer. Based on analysis of the existing review literature, one reason presented is the absence of a proper classification of the different approaches. A thorough review of the existing work in the literature is presented in this section.

Ghomi et al. [25] presented a survey of load balancing algorithms in cloud computing. The authors classified task scheduling and load balancing algorithms into seven categories: Hadoop MapReduce load balancing, agent-based, natural phenomena-based, application-oriented, general, network-aware, and workflow-specific load balancing, which in the literature fall under two domains based on the system state and on who initiated the process. Within each category, the different algorithms are grouped together and their advantages and limitations listed. Meanwhile, Milani et al. [26] reviewed existing load balancing techniques and, based on their survey, grouped the existing algorithms into three broad domains: static, dynamic, and hybrid. The authors formalized relevant questions about load balancing and addressed key concerns regarding its importance, the expected level of the metrics, and the role and challenges faced in load balancing. A proper search operation, assisted by Boolean operations in the search strings, was followed to retrieve the most relevant content from different publishing sources, and the selection phase was executed with a Quality Assessment Checklist (QAC). However, the two surveys examined a limited set of QoS metrics (response time, makespan, scalability, resource utilization, migration time, throughput, and energy saving), leaving a gap regarding other important QoS metrics like migration cost, service-level violations, degree of balance, and task rejection ratio. This gap in metric selection is overcome in this survey.

Kalra and Singh [27] conducted a comparative study of various scheduling algorithms for cloud and grid computing, considering five fundamental meta-heuristic methods: Ant Colony Optimization (ACO), Genetic Algorithm (GA), Particle Swarm Optimization (PSO), League Championship Algorithm (LCA), and the BAT algorithm. A thorough comparison is made among the techniques; however, the work is limited to scheduling algorithms based on meta-heuristic techniques. The survey also concentrates on evolutionary algorithms only and lacks a broad classification.

Mesbahi and Rahmani [28] classified load balancing algorithms into three categories (general algorithm-based, architecture-based, and artificial intelligence-based) and studied the basic requirements and essentials in designing and implementing a desired load balancer for a cloud provider. Like previous studies, this paper considers the static/dynamic categorization as the broad classification. However, the authors suggested key challenges in designing load balancing algorithms and judged, on the basis of their study, that algorithms which are dynamic, distributed, and non-cooperative are best.

Kanakala et al. [29] proposed a classification of existing load balancing algorithms, grouped into static and dynamic algorithms like those discussed in previous studies. They also identified challenges in solving the load balancing problem; among them are the geographical distribution of nodes, migration time, system performance, energy management, and security, which have long been listed in the literature. The authors compared existing load balancing algorithms on the basis of certain QoS metrics like throughput, speed, response time, and migration time, and concluded that there are trade-offs among the metrics. The limitation of the paper is that only eight load balancing algorithms are compared from a vast set of algorithms.

Shah et al. [30] provide a comprehensive overview of load balancing algorithms. The different load balancing methods were classified as static or dynamic based on the state of the system, and as homogeneous or heterogeneous based on VM type uniformity. Performance metrics were also used to classify the methods, and the advantages and disadvantages of each algorithm were discussed. The paper does not, however, address the literature in a systematic manner.

Neghabi et al. [31] presented a well-defined, systematic review of load balancing techniques in software-defined networks, broadly classifying them into deterministic and non-deterministic approaches and investigating the associated metrics in depth. The study poses important questions and tries to answer them along the dimensions of significance, metric analysis, role, and the challenges faced in load balancing of software-defined networks. It presents detailed advantages and limitations of the existing literature in communication networks and holds a strong foundation and solid correlation among load balancing metrics, despite not being specific to the cloud computing domain. Also, the study is based on a single-level classification rather than a hierarchical one.

From the survey papers listed above, it is concluded that the existing surveys lack a good classification system. A criterion is fixed for classification purposes, but no generalization and specialization characteristics are drawn, which eventually leads to inadequate and insufficient conclusions. Further, existing review articles do not examine some important parameters, such as the algorithmic complexity of load balancing algorithms or the frequency of occurrence of load balancing metrics in the literature. The existing survey papers also lack a full description of the QoS metric set; new metrics (such as migration cost, service-level violations, degree of balance, and task rejection ratio) should have been introduced. A taxonomy-based classification is proposed in this paper to prove its effectiveness over the existing literature, and a classification of QoS metrics into performance metrics and economic metrics is also proposed [32]. So, to guide future researchers in developing efficient, robust, fault-tolerant, and advanced load balancing algorithms and to give them new insights into future work, a taxonomy-based classification system is introduced in this paper. The proposed classification methodology is based on various characteristics of load balancing algorithms: 'nature of algorithm', 'state of the algorithm', 'trait used for load balancing', 'mode of execution', 'type', 'functionality', and 'technique used by algorithm'.

Research methodology

To dig deep into the roots of the load balancing process and what causes the load unbalancing problem, a proper research methodology was followed. The literature survey was conducted in accordance with a general research strategy that outlines the way the load unbalancing problem is undertaken and identifies the methods, theories, algorithms, approaches, and paradigms used. The load unbalancing problem was studied in accordance with the constructive generic framework (CGF) methodology [33], in which it is broken down into sub-processes, i.e., the factors, variables, and parameters associated with load balancing. The literature study was further enhanced by following the research guidelines for a Systematic Literature Review (SLR) as contemplated by Kitchenham, with a special focus on research related to the load balancing mechanism in the cloud [26, 34]. An SLR is a repeatable research method that can be replicated by other researchers to explore more knowledge.

To highlight the importance of load balancing in cloud computing, a set of questions was framed to address the key issues and challenges in load unbalancing.

Question identification

A set of questions that need to be answered before going into the load balancing process was identified from the literature survey. Some of these questions have been answered in the literature while others have not. The questions are as follows:

RQ1: What causes the load unbalancing problem? This question asks why the load unbalancing problem happens, which involves identifying the factors responsible for it. The question cannot be answered until each individual factor is considered and studied in detail; the load balancing process is incomplete without clear knowledge of the variables leading to unbalancing. This is of prime importance, and to date no study has considered this question, so this paper presents an answer to it.

RQ2: Why is load balancing the need of the hour in cloud computing? This question addresses the issues and challenges faced by cloud service providers.

RQ3: Does load balancing consider the evaluation of single-objective (single-attribute) or multi-objective (multi-attribute) function(s)? This question classifies the existing load balancing algorithms into single-objective and multi-objective approaches.

RQ4: What is the time complexity of the load balancing algorithm? This question concerns the amount of time a load balancing algorithm takes to complete the load balancing process. Algorithmic complexity is not used as a standard for classifying LB algorithms in the existing literature, yet an algorithm must run with real-time complexity to be of practical use.

This section explores the causes of the load unbalancing problem in IaaS clouds and answers RQ1. The following factors cause the load unbalancing problem in IaaS clouds:

The dynamic nature of user tasks.

The unpredictable and probabilistic traffic flow to a cloud provider.

Lack of a robust, accurate, and efficient mapper and generator function to map tasks to the appropriate resources.

The scheduling process itself is an NP-hard problem.

The heterogeneous nature of user tasks demanding varying resource requirements.

The uneven and non-uniform distribution of tasks across computing resources, along with their dependencies, also contributes to the load unbalancing situation.

Load balancing is a promising solution to the load unbalancing problem that arises under the circumstances discussed in this section. This section answers RQ2 regarding the importance of load balancing in cloud computing. A load balancing algorithm has to enhance response time, cost of execution, execution time, throughput, fault tolerance, migration time, degree of balance, makespan, resource utilization, and scalability, while at the same time reducing resource wastage, migration cost, power consumption, energy consumption, carbon emission, and SLA violations. Degradation in these factors leads to poor Quality of Service (QoS) for the CSC and a drop in economy, in the form of profit, for the CSP. Keeping QoS and economy in view, it has become a big challenge for CSPs to provide QoS according to the guaranteed SLA. However, improving performance and economic metrics in one go is still a milestone for researchers, leading to the conclusion that load balancing, like scheduling, is an NP-hard problem: as one specific metric is improved, the associated metrics begin to diminish and a bottleneck persists, making load unbalancing a multi-constrained, multi-objective problem.

This section discusses the classification of load balancing approaches into single-objective and multi-objective on the basis of the number of objective functions solved by a particular algorithm, and answers RQ3. RQ3 is also an elucidation of RQ2, considering load unbalancing as a multi-objective problem. To date there exists no perfect load balancing algorithm in the literature that takes all of the metrics into account in a single algorithm. Some researchers propose single-objective algorithms to speed up a single metric, while others try to improve more than one metric at a time. The limitation of single-objective approaches is that introducing them all into the load balancing process would result in huge architectural design complexity and may be impractical. So prime attention has shifted from single-objective to multi-objective approaches. Table 3 reviews different existing approaches based on the single-objective or multi-objective function(s) they solve.

RQ4 addresses the time complexity of the algorithms used in the load balancing process, which should be considered a benchmark for determining the performance of a load balancing algorithm. However, we could not find much literature reporting the algorithmic complexity of the approaches used. Of the 35 top studies examined in this research, only 7 (20%) consider algorithmic complexity in their work, and this figure may drop further as the search space grows.

Milani et al. [26] identified three primary questions in the existing literature and justified them in their work. The questions were formalized as follows:

What is the consequence for load balancing of the growth in cloud users? The authors pointed out that from 2010 to 2015 there was a marked rise in research papers on load balancing, following a positive exponential curve; our work finds the trend continuing through 2016, 2017, and 2018. This shows the growing importance of load balancing in cloud computing as the number of users increases.

What is the capability of present load balancing approaches to meet the primary load balancing metrics? The question was answered and validated through the argument that dynamic load balancing algorithms are more practical, robust, efficient and fault tolerant than static ones.

What are the problems, issues, challenges and solutions identified in load balancing for future trends? The limitations and advantages of the existing approaches were listed and based on that challenges faced by researchers were discussed.

Data collection search process

The data collection search process gathered papers from reputed sources, journals, and publications in five of the most authentic and scientifically peer-reviewed databases: IEEE Xplore Digital Library, Science Direct, ACM Digital Library, Springer, and Elsevier. The search was organized in June 2018, covering data from 2010 to June 2018. The data sources consist of review and survey papers, journal papers, and conference papers, excluding book chapters. A well-organized search process was adopted to retrieve relevant data, starting with fundamental terms and moving to advanced ones. The search strings were framed for the source databases with inclusion and exclusion criteria similar to those used by [26]. Search keywords were formed along with their synonyms to increase the search space. Initially, basic terms and keywords were used in search query processing, such as "Load Balancing in Cloud Computing", "Workload Distribution in Cloud Computing", "Resource Distribution in Cloud", "Task scheduling", "Migration process in Cloud Computing", "Resource utilization in Cloud", "Resource allocation policies in Cloud", and "Load scheduling in Cloud". Later, advanced terms were used, combining the basic keywords in query operations with the Boolean operators "OR" and "AND" [26, 35] to narrow the search space to relevant data. As an illustration, keywords such as "Resource allocation AND Task scheduling", "Task Migration", "Task scheduling", and "Resource utilization" were combined. Advanced search operations were then employed to collect the most reliable papers, such as Inspec controlled and non-controlled keywords followed by advanced filters. For example, the IEEE Xplore "Advanced keyword/phrase" option was used, which includes two sub-options, "metadata only" and "full text and metadata", each supporting the three Boolean operators "AND", "OR", and "NOT". Similarly, command search and citation search options were also used.

Proposed classification of load balancing algorithms

In this section, load balancing algorithms are classified based on various criteria, following a proposed top-down classification process. A limitation of existing review papers is that they offer no proper hierarchical, taxonomical classification of load balancing algorithms, which makes it difficult to identify where a particular algorithm belongs in a taxonomy. The criteria used for classification are 'nature of algorithm', 'state of algorithm', 'trait used for load balancing', 'type of load balancing', and 'technique used in load balancing'. For the first time in the literature, this work provides an in-depth analysis of LB algorithms that previous studies were lacking. Based on the nature of the algorithm, load balancing algorithms are either proactive or reactive; this is the first broad categorization in our taxonomy and has not, to date, been shown in any of the literature. Based on the state of the system, LB algorithms are static, dynamic, or hybrid. On the basis of the trait used in load balancing, LB algorithms are classified as scheduling and allocation algorithms. On the basis of type, they are grouped as VM LB, CPU LB, task LB, server LB, network LB, and normal cloud LB algorithms. On the basis of functionality, load balancing methods are grouped as hardware load balancing and elastic load balancing, with the latter further divided into network load balancing, application load balancing, and classic load balancing. Based on the technique, load balancing algorithms are classified as machine learning, evolutionary, nature inspired, mathematically derived, and swarm based techniques.
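To make the hierarchy concrete, the sketch below encodes the proposed taxonomy as a nested Python dictionary. The level and leaf names are taken from the criteria above, while the structure itself is only an illustrative rendering, not an artifact of the original survey.

```python
# Illustrative encoding of the proposed taxonomy; names follow the
# classification criteria described in the text.
TAXONOMY = {
    "nature of algorithm": ["proactive", "reactive"],
    "state of algorithm": ["static", "dynamic", "hybrid"],
    "trait used for load balancing": ["scheduling", "allocation"],
    "type of load balancing": [
        "VM LB", "CPU LB", "task LB",
        "server LB", "network LB", "normal cloud LB",
    ],
    "functionality": {
        "hardware load balancing": [],
        "elastic load balancing": [
            "network load balancing",
            "application load balancing",
            "classic load balancing",
        ],
    },
    "technique used in load balancing": [
        "machine learning", "evolutionary", "nature inspired",
        "mathematically derived", "swarm based",
    ],
}

# A given algorithm is then located by its position under each criterion,
# e.g. a dragonfly-based balancer (hypothetical placement for illustration):
dragonfly = {
    "nature of algorithm": "proactive",
    "state of algorithm": "dynamic",
    "technique used in load balancing": "swarm based",
}
```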

Nature of the algorithm

The first categorization of load balancing algorithms in this work is based on the nature of the algorithm, under which LB algorithms are classified as proactive or reactive approaches. Although new to cloud load balancing, this distinction is well established in other fields of technology, particularly in communication and networking for mobile ad hoc networks (MANETs), where the nature of routing protocols has been extensively studied under these two variants [36].

A proactive LB technique takes action by causing change rather than only reacting to change when it happens. It is intended to avoid a problem in advance rather than waiting until the problem arises, and it aims at identifying and exploiting opportunities and taking preemptive action against potential problems and threats. A limitation of the existing literature is that few proactive approaches have been used, and those in a traditional manner with no novel concepts. Table 1 depicts the proactive approaches among existing LB approaches. Polepally et al. [37] proposed a dragonfly optimization and constraint measure-based LB approach in cloud computing that distributes load uniformly across VMs with minimal power consumption. Xiao et al. [38] proposed a game-theory-based fairness-aware LB algorithm to minimize expected response time while maintaining fairness; the Nash equilibrium point of the game corresponds to load balancing at the optimal level.
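As a minimal sketch of the proactive idea, the toy balancer below forecasts each VM's load from recent history and places an arriving task on the VM with the lowest predicted load, acting before any machine actually overloads. All names and the moving-average forecast are illustrative assumptions, not taken from the surveyed algorithms.

```python
# Proactive placement sketch: act on *predicted* load, not observed overload.
from statistics import mean

def predict_load(history, window=5):
    """Naive forecast: moving average of the last `window` load samples."""
    recent = history[-window:]
    return mean(recent) if recent else 0.0

def proactive_assign(task_cost, vm_histories):
    """Pick the VM with the lowest forecast load and record the new sample."""
    target = min(vm_histories, key=lambda vm: predict_load(vm_histories[vm]))
    vm_histories[target].append(predict_load(vm_histories[target]) + task_cost)
    return target

vms = {"vm1": [0.6, 0.7, 0.8], "vm2": [0.3, 0.2, 0.25]}
print(proactive_assign(0.1, vms))  # -> "vm2" (lowest forecast load)
```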

Reactive approaches act in response to a situation rather than controlling it: the load unbalancing problem is solved as it arises, after its consequences are already visible. Most load balancing algorithms fall under this category. The main flaw evident in the literature is that the load unbalancing problem is left to happen, and researchers then propose approaches to tackle it by optimizing some load balancing parameter(s) [32], as given in Fig. 2. Table 2 discusses the reactive approaches among existing LB approaches; e.g., Adhikari et al. [39] proposed a heuristic-based scheduling and load balancing algorithm for IaaS clouds to minimize task completion time, makespan, and waiting time, and to increase resource utilization. Proactive approaches are more effective than reactive ones, as the former try to avoid the problem in advance while the latter provide a solution only after the problem occurs.
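By contrast, a reactive balancer lets the imbalance occur, detects it against a threshold, and then migrates work away from the hot machine. The sketch below (hypothetical names, unit-sized tasks) illustrates that after-the-fact correction:

```python
# Reactive rebalancing sketch: correct an overload only after detecting it.
def rebalance(vm_tasks, threshold=8):
    """Move one task at a time from overloaded VMs to the least loaded VM."""
    migrations = []
    for vm in sorted(vm_tasks, key=vm_tasks.get, reverse=True):
        while vm_tasks[vm] > threshold:
            target = min(vm_tasks, key=vm_tasks.get)
            if target == vm:
                break  # nowhere less loaded to migrate to
            vm_tasks[vm] -= 1       # task leaves the hot VM...
            vm_tasks[target] += 1   # ...and incurs migration cost here
            migrations.append((vm, target))
    return migrations

tasks = {"vm1": 12, "vm2": 3, "vm3": 5}
print(rebalance(tasks), tasks)  # vm1 sheds tasks only after it overloads
```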

figure 2

Load Balancing Metrics [ 32 ]

State of the algorithm

On the basis of the state information of the system that an algorithm relies on, LB algorithms are widely classified as static, dynamic, and hybrid. The existing literature shows this to be the most widely used classification system for LB algorithms, and the majority of comparative studies on load balancing place this category at the top of their taxonomies. In static load balancing, traffic load is distributed uniformly across the servers by an algorithm that has prior knowledge of system resources and task requirements; tasks are scheduled to VMs for execution at compile time. The advantage of static algorithms is their low complexity, but they suffer from the fatal limitation of being unable to move a task to another machine while its execution is in progress. Static algorithms do not consider the current state of the system and require advance knowledge of machines and tasks, such as task resource requirements, communication time, processing power of nodes, memory, storage, and bandwidth capacity. Because migration is not possible during task execution, static LB algorithms are not suitable for a distributed system like the cloud, where the system state changes dynamically.

Further, on the basis of the mode of task execution, dynamic algorithms are grouped into offline (batch) mode and online (live) mode, as shown in Fig. 3. In batch mode, tasks are allocated only at predefined instants, whereas in online mode a user task is mapped to a VM as soon as it enters the scheduler. Dynamic load balancing algorithms are comparatively complex: they handle incoming traffic at run time and can change the state of a running task at any point. Dynamic load balancing takes the current state of the system into consideration and can deal with unpredictable processing load. Its advantage is that tasks can move dynamically from an overloaded machine to an under-loaded one, but such algorithms are more complex and harder to design than static LB algorithms.
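A minimal sketch contrasting the two classes follows: the static balancer fixes the task-to-VM mapping up front with no run-time state, while the dynamic online balancer consults current loads on every arrival. Both functions are illustrative assumptions rather than algorithms from the reviewed papers.

```python
import itertools

def static_round_robin(tasks, vms):
    """Static: assign tasks cyclically at 'compile time'; ignores actual load."""
    ring = itertools.cycle(vms)
    return {task: next(ring) for task in tasks}

def dynamic_online_assign(task_cost, vm_loads):
    """Dynamic online: map an arriving task to the currently least loaded VM."""
    target = min(vm_loads, key=vm_loads.get)
    vm_loads[target] += task_cost
    return target

print(static_round_robin(["t1", "t2", "t3"], ["vm1", "vm2"]))
loads = {"vm1": 0.9, "vm2": 0.2}
print(dynamic_online_assign(0.3, loads), loads)  # picks vm2, updates its load
```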

figure 3

Load balancing algorithms on the basis of nature and state of the system

However, dynamic LB algorithms are more efficient in terms of performance, accuracy, and functionality. Static load balancing algorithms work smoothly when nodes have small load variations but cannot operate in varying load environments. Figure 3 shows the load balancing taxonomy on the basis of the nature and state of the algorithm.

Trait used for load balancing

The algorithms in this category are classified as scheduling and allocation algorithms. Allocation and scheduling algorithms in the cloud are classified based on the current state of the VM and accordingly can be static or dynamic. Allocation and scheduling policies play a vital role in resource management and performance monitoring of the cloud, which in turn affects the QoS delivered to the user. The scheduling policies decompose into three activities: task scheduling, resource scheduling, and VM scheduling; likewise, the allocation policies decompose into task allocation, resource allocation, and VM allocation.

Task scheduling is the method of assigning user tasks to relevant computing resources for execution, while resource scheduling is the process of planning, managing, and monitoring computing resources for task execution. VM scheduling is the process of creating, destroying, and managing VMs within a physical host, as well as managing VMs during the migration process across hosts. Task allocation is the act of allocating a task to the resource on which it is supposed to execute; resource allocation is the act of allocating a resource to a task for its completion. Task allocation and resource allocation are thus inverses of each other. VM allocation is the allocation of a virtual machine to a user or a set of users. Figure 4 shows the load balancing algorithms on the basis of the trait used.

figure 4

Load balancing on the basis of trait used

Functionality

On the basis of functionality, load balancers are classified as hardware load balancers and elastic load balancers, as depicted in Fig. 5. Hardware load balancers are concerned with the distribution of workload at the hardware level, i.e., memory, storage, and CPU. Elastic Load Balancing automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, and IP addresses, and can handle varying loads of user application traffic in a single availability zone or across multiple availability zones. Elastic Load Balancing offers three types of load balancers that all feature the high availability, automatic scaling, and robust security necessary to make user applications fault tolerant. The Application Load Balancer operates at the request level (layer 7), routing traffic to targets (EC2 instances, containers, and IP addresses) based on the content of the request. Ideal for advanced load balancing of HTTP and HTTPS traffic, it provides advanced request routing targeted at the delivery of modern application architectures, including microservices and container-based applications, and improves application security by ensuring that the latest SSL/TLS ciphers and protocols are used at all times. Network load balancers are implemented at the transport layer of the OSI model and can handle millions of requests per second; network load balancing is popularly used by Microsoft Azure and AWS in their deployment models and distributes traffic among servers using the TCP/IP protocol suite. The Classic Load Balancer provides basic load balancing across multiple Amazon EC2 instances, operates at both the request level and the connection level, and is intended for applications that were built within the EC2-Classic network.

figure 5

Load balancing on the basis of functionality and type

Type of load balancing

On the basis of type, LB algorithms are classified as VM LB, CPU LB, task LB, server LB, network LB, and normal cloud LB, as shown in Fig. 5. VM load balancing is the process of redistributing VMs from overloaded nodes to under-loaded nodes; it was first introduced as a new inbox feature in Windows Server 2016 to optimize node utilization in a failover cluster. VM load balancing identifies over-committed nodes and redistributes their VMs to under-committed nodes, live-migrating VMs from a node exceeding its threshold to a newly added node in the cluster; it is achieved through the VM migration process. CPU load balancing is the process of keeping the load on a CPU within its threshold limit. Task load balancing is the distribution of tasks across VMs, from overloaded machines to under-loaded ones. Server LB is the proper distribution of the total incoming load of a datacenter or server farm across its servers. Network LB is concerned with the management of incoming traffic without the use of complex protocols.
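As an illustration of threshold-driven VM load balancing, the sketch below plans live migrations of whole VMs from over-committed nodes to the most under-committed node; the thresholds, node layout, and smallest-VM-first policy are assumptions for the example, not details of the Windows Server feature.

```python
# Threshold-based VM migration planning sketch (hypothetical names).
def plan_vm_migrations(nodes, high=0.80, low=0.50):
    """nodes maps node -> {vm: cpu_share}; returns (vm, src, dst) moves."""
    util = lambda n: sum(nodes[n].values())
    moves = []
    for src in [n for n in nodes if util(n) > high]:
        # Migrate the smallest VM first to minimise disruption.
        for vm in sorted(nodes[src], key=nodes[src].get):
            dst = min(nodes, key=util)
            if util(dst) >= low or dst == src:
                break  # no sufficiently under-committed destination left
            share = nodes[src].pop(vm)
            nodes[dst][vm] = share
            moves.append((vm, src, dst))
            if util(src) <= high:
                break  # source node is back under its threshold
    return moves

cluster = {"nodeA": {"vm1": 0.5, "vm2": 0.4}, "nodeB": {"vm3": 0.2}}
print(plan_vm_migrations(cluster), cluster)  # vm2 moves from nodeA to nodeB
```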

Technique used in load balancing

On the basis of the technique used, load balancing algorithms are classified into heuristic and meta-heuristic techniques, and optimization techniques.

A heuristic approach is a practical problem-solving method that is not guaranteed to be optimal, perfect, logical, or rational but is sufficient to reach an immediate goal. Finding an optimal solution may be impossible or impractical, particularly for load balancing, which is an NP-hard problem, so heuristics play an important role in speeding up the search for a decent solution. Heuristic methods follow strategies derived from previous experience with similar problems, and they play a crucial role in the load balancing process in sorting out the various issues faced by CSPs. A large body of research applies heuristic and meta-heuristic approaches to cloud load balancing, and we classify these methods into nature inspired algorithms and classical algorithms; the nature inspired algorithms are subdivided into evolutionary algorithms and swarm based algorithms.
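A classic example of such a heuristic is Longest Processing Time first (LPT): greedily place the longest remaining task on the currently least loaded VM. It is not guaranteed optimal (the underlying problem is NP-hard) but yields a decent makespan quickly; the sketch below is a minimal Python rendering.

```python
import heapq

def lpt_schedule(task_times, num_vms):
    """Greedily place the longest remaining task on the least loaded VM."""
    vms = [(0.0, i) for i in range(num_vms)]  # (current load, vm id) min-heap
    heapq.heapify(vms)
    assignment = {}
    for t, cost in sorted(enumerate(task_times), key=lambda x: -x[1]):
        load, vm = heapq.heappop(vms)   # least loaded VM so far
        assignment[t] = vm
        heapq.heappush(vms, (load + cost, vm))
    makespan = max(load for load, _ in vms)
    return assignment, makespan

assign, makespan = lpt_schedule([7, 3, 5, 2, 8], num_vms=2)
print(assign, makespan)  # makespan 13 for these tasks on two VMs
```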

Optimization techniques are used to find optimal solutions to a problem. In cloud load balancing they are broadly classified as classical and non-classical optimization techniques, and the algorithms can be either stochastic or deterministic. A further division separates constrained from unconstrained algorithms, each of which can pursue a single objective or multi-criteria optimization. Multi-criteria optimization is further classified into multi-attribute and multi-objective optimization, and the multi-objective algorithms may be machine-learning-based, nature inspired, swarm based, or mathematically derived load balancing algorithms. Figure 6 shows the load balancing algorithms on the basis of the technique used.
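The simplest scalarization of such a multi-objective choice is a weighted sum, sketched below; the metric names and weights are illustrative assumptions, and real multi-objective optimizers (e.g., Pareto-based evolutionary methods) are considerably more sophisticated.

```python
# Weighted-sum scalarization sketch: collapse several metrics into one score.
WEIGHTS = {"response_time": -0.4, "power": -0.2, "utilization": +0.4}

def score(vm_metrics):
    """Higher is better; negative weights penalise cost-like metrics."""
    return sum(WEIGHTS[m] * v for m, v in vm_metrics.items())

candidates = {
    "vm1": {"response_time": 0.9, "power": 0.5, "utilization": 0.7},
    "vm2": {"response_time": 0.4, "power": 0.6, "utilization": 0.6},
}
best = max(candidates, key=lambda vm: score(candidates[vm]))
print(best)  # vm2: its lower response time outweighs its higher power draw
```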

figure 6

Load balancing algorithms on the basis of technique

Tables 1 and 2 list the characteristics of the proactive and reactive approaches in the related literature along various dimensions, together with the strengths and weaknesses of each approach. Table 3 depicts the approaches under investigation as single objective or multi-objective and also highlights the implementation platform, tool, and simulation environment under which each approach was studied. Finally, Table 4 presents the essential load balancing metrics analyzed in the existing approaches.

Results and discussion

This section outlines the results of the comparative analysis of different load balancing approaches in cloud computing. Figure 7(a) shows the percentage of the various scheduling types in proactive load balancing approaches: task scheduling and resource scheduling, each with a 45.45% share, are considered most often, with less attention to VM scheduling at 9.09%. From Fig. 7(b) it is evident that most reactive approaches in the existing literature address task scheduling (51.85%), followed by VM scheduling (25.93%) and resource scheduling (22.22%). Figure 8 shows the percentage of research articles on cloud load balancing that report algorithmic complexity: 80% of articles did not consider algorithmic complexity, while only 20% define it in their work. Figure 9(a) shows that proactive approaches are always dynamic in nature, while Fig. 9(b) shows that most reactive approaches are dynamic (68%), followed by static (20%) and hybrid (12%). Figure 10(a) shows that 60% of proactive approaches are multi-objective and 40% single objective; likewise, 56% of reactive approaches are multi-objective and 44% single objective, as depicted in Fig. 10(b). Figure 11 displays the testing environments used to evaluate the performance metrics: the CloudSim simulator is the most widely used, accounting for 33.33% of experimental implementations, followed by the CloudAnalyst simulator at 19.44%; C/C++ and Matlab implementations amount to 11.11% each, while others constitute 19.44%. Real-time implementations of cloud load balancing approaches are rare, constituting only 5.56%. Figure 12 depicts the share of LB metrics in the existing approaches, where response time, execution time, resource utilization, makespan, scalability, and execution cost are the most widely discussed, at 13.39%, 11.81%, 11.02%, 9.45%, 9.45%, and 8.66%, respectively.

figure 7

Percentage based on scheduling trait (task scheduling, VM scheduling, and resource scheduling) in proactive and reactive approaches

figure 8

Percentage of research articles designating algorithmic complexity

figure 9

Percentage based on state of algorithm in proactive (dynamic and static) and reactive (dynamic, static, and hybrid) approaches

figure 10

Percentage of multi-objective and single objective algorithms in proactive and reactive approaches

figure 11

Experimental platforms for cloud load balancing approaches

figure 12

Percentage of LB metrics in existing approaches

Discussion on open issues on load balancing in cloud computing

The review presented in this article addresses some important issues that have not been given due consideration in the existing survey or technical literature but that cloud load balancing rigorously demands. We therefore discuss some open research issues in this section.

The complexity of an algorithm is a pivotal element in determining the performance of any load balancing algorithm. Of the 35 technical articles considered in this study, only 7 (20%) report the corresponding algorithmic complexity, while 28 (80%) do not. Since the majority of works omit algorithmic complexity, we suggest that future researchers adopt it as a benchmark for developing new load balancing approaches with improved practicality.

A reactive approach to load balancing always features migration, in particular task migration, and migrating tasks always incurs a cost known as the migration cost. The study shows that little of the cloud load balancing literature focuses on migration cost, or on service level violations, task rejection ratio, and power consumption. Developing reactive approaches with minimal migration cost is therefore an important direction for future research.

Further, the study shows that the majority of works focus primarily on certain metrics while avoiding other important ones. Of the 16 metrics collected in this study, most existing works on cloud load balancing feature 6 as key evaluation parameters: response time (13.39%), execution time (11.81%), resource utilization (11.02%), makespan (9.45%), scalability (9.45%), and execution cost (8.66%), as depicted in Fig. 12. The remaining metrics account for only 36.22%: throughput (7.87%), overhead (7.09%), fault tolerance (4.72%), degree of balance (4.72%), migration time (3.93%), power consumption (3.14%), waiting time (2.36%), task rejection ratio (1.50%), and service level violation (0.78%). Considering these neglected metrics in future works is another insight for future researchers.

Conclusion and future work

This work presents a comparative study of load balancing approaches in the reviewed articles. The problem of load unbalancing in cloud computing was discussed along with the driving factors that lead to it, and an abstracted load balancing model was briefly presented together with the activities involved in the load balancing process. A proper research methodology was followed, in which the problem was studied under the guidelines of a Constructive Generic Framework (CGF), further reinforced by the Systematic Literature Review (SLR) methodology. We framed a set of problem-related questions and discussed them in the work. The data for this study were gathered from five reputed databases: IEEE Xplore Digital Library, Science Direct, ACM Digital Library, Springer, and Elsevier; the search process was assisted by different tools and advanced filter options, and data were collected for the period from 2010 to June 2018. A multilevel taxonomy-based classification was proposed, in which the classification process rests on five criteria, the most important being 'nature of algorithm'. Based on this criterion, we classified 35 articles into two broad categories: 10 are proactive and 25 are reactive in nature. The statistics showed that proactive approaches are 100% dynamic, whereas reactive approaches need not be; we thus generalized that all proactive approaches are dynamic, but not all dynamic approaches are proactive. The study also revealed that task scheduling has been given the most importance in both proactive and reactive approaches, contributing 45.45% and 51.85%, respectively.

The challenges of load balancing algorithms are explored in this work in order to suggest more efficient load balancing methods in the future. The majority of the reviewed articles do not consider significant, fundamental QoS metrics for investigation; essential metrics such as migration time, migration cost, power consumption, service level violation, task rejection ratio, and degree of balance are not discussed in full depth. Further, our study revealed that algorithmic complexity is given little attention in determining the performance of load balancing algorithms, with 80% of works not considering it in their performance evaluations. Also, the majority of existing load balancing approaches have been implemented on simulator platforms, which together constitute 94.44% of implementations; real-time implementation is very rare (5.56%) and should be encouraged in future work.

From the review conducted during this research, we conclude that many issues remain open in the load balancing process; they can be bridged in the future by efficient and sophisticated load balancing algorithms, most importantly along the dimensions of additional QoS metrics and algorithmic complexity evaluation. The survey also presents, within the taxonomy, several families of algorithms that can guide future researchers in dealing with the load unbalancing problem effectively, such as nature inspired algorithms, machine learning, and mathematically derived algorithms (Markov chains, game theory).

Availability of data and materials

The data have been gathered from the research papers and articles mentioned in Table 1, Table 2, Table 3, and Table 4.

Abbreviations

IaaS: Infrastructure as a Service

SaaS: Software as a Service

PaaS: Platform as a Service

PAYG: Pay-as-you-go

NIST: National Institute of Standards and Technology

VM: Virtual Machine

CSP: Cloud Service Provider

CSC: Cloud Service Consumer

QoS: Quality of Service

SLA: Service Level Agreement

ALO: Ant Lion Optimizer

CGF: Constructive Generic Framework

SLR: Systematic Literature Review

LB: Load Balancing

PM: Physical Machine

MANET: Mobile Ad hoc Network

VPC: Virtual Private Cloud

Pradhan P, Behera PK, Ray BNB (2016) Modified round Robin algorithm for resource allocation in cloud computing. Proced Comp Sci 85:878–890


Mishra SK, Sahoo B, Parida PP (2018) Load balancing in cloud computing: a big picture. J King Saud Univ Comp Infor Sci:1–32

Reddy VK, Rao BT, Reddy LSS (2011) Research issues in cloud computing. Glob J Comp Sci Technol 11(11):70–76


Bohn RB, Messina J, Liu F, Tong J, Mao J (2011) NIST cloud computing reference architecture. In: Proceedings of IEEE 7th world congress on services (SERVICES’11), Washington, DC, USA, Jul. 2011, pp 594–596

Bokhari MU, Shallal QM, Tamandani YK (2016, March) Cloud computing service models: a comparative study. In: 3rd international conference on computing for sustainable global development (INDIACom), 16–18, March 2016, pp 890–895

Mahmood Z (2011, August) Cloud computing: characteristics and deployment approaches. In: 2011 IEEE 11th international conference on Computer and Information Technology (CIT), pp 121–126

Buyya R, Vecchiola C, Selvi ST (2013) Mastering cloud computing: foundations and applications programming. Morgan Kaufmann, USA, 2013


Jain N, Choudhary S (2016, March) Overview of virtualization in cloud computing. In: Symposium on colossal data analysis and networking (CDAN), pp 1–4

Alouane M, El Bakkali H (2016, May) Virtualization in cloud computing: no hype vs HyperWall new approach. In: 2016 International Conference on Electrical and Information Technologies (ICEIT), pp 49–54

Rimal BP, Choi E, Lumb I (2009, August) A taxonomy and survey of cloud computing systems. In: Fifth international joint conference on INC, IMS and IDC, 2009. NCM’09, pp 44–51

Afzal S, Kavitha G (2018, December) Optimization of task migration cost in infrastructure cloud computing using IMDLB algorithm. In: 2018 International Conference on Circuits and Systems in Digital Enterprise Technology (ICCSDET), pp 1–6

Achar R, Thilagam PS, Soans N, Vikyath PV, Rao S, Vijeth AM (2013, December) Load balancing in cloud based on live migration of virtual machines. In: 2013 annual IEEE India Conference (INDICON), pp 1–5

Magalhães D, Calheiros RN, Buyya R, Gomes DG (2015) Workload modeling for resource usage analysis and simulation in cloud computing. Comp Elect Eng 47:69–81

Dam S, Mandal G, Dasgupta K, Dutta P (2015, February) Genetic algorithm and gravitational emulation based hybrid load balancing strategy in cloud computing. In: Proceedings of the 2015 third international conference on computer, communication, control and information technology (C3IT), pp 1–7

Dave A, Patel B, Bhatt G (2016, October) Load balancing in cloud computing using optimization techniques: a study. In: International Conference on Communication and Electronics Systems (ICCES), pp 1–6

Gupta H, Sahu K (2014) Honey bee behavior based load balancing of tasks in cloud computing. Int J Sci Res 3(6)

Mishra SK, Puthal D, Sahoo B, Jena SK, Obaidat MS (2017) An adaptive task allocation technique for green cloud computing. J Supercomp 405:1–16

Ibrahim AH, Faheem HEDM, Mahdy YB, Hedar AR (2016) Resource allocation algorithm for GPUs in a private cloud. Int J Cloud Comp 5(1–2):45–56

Jebalia M, Ben Letafa A, Hamdi M, Tabbane S (2015) An overview on coalitional game-theoretic approaches for resource allocation in cloud computing architectures. Int J Cloud Comp 4(1):63–77

Noshy M, Ibrahim A, Ali HA (2018) Optimization of live virtual machine migration in cloud computing: a survey and future directions. J Netw Comput Appl:1–10

Gkatzikis L, Koutsopoulos I (2013) Migrate or not? Exploiting dynamic task migration in mobile cloud computing systems. IEEE Wirel Commun 20(3):24–32

Jamshidi P, Ahmad A, Pahl C (2013) Cloud migration research: a systematic review. IEEE Trans Cloud Comp 1(2):142–157

Raviteja S, Atmakuri R, Vengaiah C (2017) A review on cloud computing migration and issues

Shamsinezhad E, Shahbahrami A, Hedayati A, Zadeh AK, Banirostam H (2013) Presentation methods for task migration in cloud computing by combination of Yu router and post-copy. Int J Comp Sci Iss 10(4):98

Ghomi EJ, Rahmani AM, Qader NN (2017) Load-balancing algorithms in cloud computing: a survey. J Netw Comput Appl 88:50–71

Milani AS, Navimipour NJ (2016) Load balancing mechanisms and techniques in the cloud environments: systematic literature review and future trends. J Netw Comput Appl 71:86–98

Kalra M, Singh S (2015) A review of metaheuristic scheduling techniques in cloud computing. Egypt Inform J 16(3):275–295

Mesbahi M, Rahmani AM (2016) Load balancing in cloud computing: a state of the art survey. Int J Mod Educ Comp Sci 8(3):64

Kanakala VR, Reddy VK, Karthik K (2015, March) Performance analysis of load balancing techniques in cloud computing environment. In: 2015 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), pp 1–6

Shah JM, Kotecha K, Pandya S, Choksi DB, Joshi N (2017, May) Load balancing in cloud computing: methodological survey on different types of algorithm. In: 2017 International Conference on Trends in Electronics and Informatics (ICEI), pp 100–107

Neghabi AA, Navimipour NJ, Hosseinzadeh M, Rezaee A (2018) Load balancing mechanisms in the software defined networks: a systematic and comprehensive review of the literature. IEEE Access 6:14159–14178

Afzal S, Kavitha G A taxonomic classification of load balancing metrics: a systematic review

Vacca JR (2009) Computer and information security handbook. Morgan Kauffman, Burlington, MA, p 208

Kitchenham B (2004) Procedures for performing systematic reviews. Keele, UK, Keele University, 33(2004), 1–26

Soltani Z, Navimipour NJ (2016) Customer relationship management mechanisms: a systematic review of the state of the art literature and recommendations for future research. Comput Hum Behav 61:667–688

Pandey K, Swaroop A (2011) A comprehensive performance analysis of proactive, reactive and hybrid manets routing protocols. arXiv preprint arXiv:1112.5703

Polepally V, Chatrapati KS (2017) Dragonfly optimization and constraint measure-based load balancing in cloud computing. Cluster Comp:1–13

Xiao Z, Tong Z, Li K, Li K (2017) Learning non-cooperative game for load balancing under self-interested distributed environment. Appl Soft Comput 52:376–386

Adhikari M, Amgoth T (2018) Heuristic-based load-balancing algorithm for IaaS cloud. Futur Gener Comput Syst 81:156–165

Kumar M, Dubey K, Sharma SC (2018) Elastic and flexible deadline constraint load balancing algorithm for cloud computing. Proced Comp Sci 125:717–724

Borovskiy V, Wust J, Schwarz C, Koch W, Zeier A (2011) A linear programming approach for optimizing workload distribution in a cloud. Cloud Comp:127–132

Krishna PV (2013) Honey bee behavior inspired load balancing of tasks in cloud computing environments. Appl Soft Comput 13(5):2292–2303

Li K, Xu G, Zhao G, Dong Y, Wang D (2011, August). Cloud task scheduling based on load balancing ant colony optimization. In: 2011 sixth annual ChinaGrid conference, pp. 3–9

Singh A, Juneja D, Malhotra M (2015) Autonomous agent based load balancing algorithm in cloud computing. Proced Comp Sci 45:832–841

Lavanya M, Vaithiyanathan V (2015) Load prediction algorithm for dynamic resource allocation. Indian J Sci Technol 8(35)

Chen SL, Chen YY, Kuo SH (2017) CLB: a novel load balancing architecture and algorithm for cloud services. Comp Elect Eng 58:154–160

Ashouraei M, Khezr SN, Benlamri R, Navimipour NJ (2018, August) A new SLA-aware load balancing method in the cloud using an improved parallel task scheduling algorithm. In: 2018 IEEE 6th international conference on future internet of things and cloud (FiCloud), pp 71–76

Kumar M, Sharma SC (2017) Dynamic load balancing algorithm for balancing the workload among virtual machine in cloud computing. Proced Comp Sci 115(C):322–329

Rajput SS, Kushwah VS (2016, December) A genetic based improved load balanced min-min task scheduling algorithm for load balancing in cloud computing. In: 2016 8th international conference on Computational Intelligence and Communication Networks (CICN), pp 677–681

Tang L, Li Z, Ren P, Pan J, Lu Z, Su J, Meng Z (2017) Online and offline based load balance algorithm in cloud computing. Knowl-Based Syst 138:91–104

Ramezani F, Lu J, Hussain FK (2014) Task-based system load balancing in cloud computing using particle swarm optimization. Int J Parallel Prog 42(5):739–754

Vanitha M, Marikkannu P (2017) Effective resource utilization in cloud environment through a dynamic well-organized load balancing algorithm for virtual machines. Comp Elec Eng 57:199–208

Dasgupta K, Mandal B, Dutta P, Mandal JK, Dam S (2013) A genetic algorithm (ga) based load balancing strategy for cloud computing. Proced Technol 10:340–347

Cho KM, Tsai PW, Tsai CW, Yang CS (2015) A hybrid meta-heuristic algorithm for VM scheduling with load balancing in cloud computing. Neural Comput & Applic 26(6):1297–1309

Dam S, Mandal G, Dasgupta K, Dutta P (2015, February) Genetic algorithm and gravitational emulation based hybrid load balancing strategy in cloud computing. In: 2015 third international conference on computer, communication, control and information technology (C3IT), pp 1–7

Vasudevan SK, Anandaram S, Menon AJ, Aravinth A (2016) A novel improved honey bee based load balancing technique in cloud computing environment. Asian J Infor Technol 15(9):1425–1430

Kapur R (2015, August) A workload balanced approach for resource scheduling in cloud computing. In: 2015 eighth international conference on contemporary computing (IC3), pp 36–41

Panwar R, Mallick B (2015, October) Load balancing in cloud computing using dynamic load management algorithm. In: 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), pp 773–778

Sharma S, Luhach AK, Abdhullah SS (2016) An optimal load balancing technique for cloud computing environment using bat algorithm. Indian J Sci Technol 9(28)

Ajit M, Vidya G (2013, July) VM level load balancing in cloud environment. In: 2013 fourth International Conference on Computing,Communications and Networking Technologies (ICCCNT), pp 1–5

Mondal B, Choudhury A (2015) Simulated annealing (SA) based load balancing strategy for cloud computing. Int J Comp Sci Info Technol 6(4):3307–3312

Pasha N, Agarwal A, Rastogi R (2014) Round robin approach for VM load balancing algorithm in cloud computing environment. Int J Adv Res Comp Sci Soft Eng 4(5):34–39

Gulati A, Chopra RK (2013) Dynamic round robin for load balancing in a cloud computing. IJCSMC 2(6):274–278

Galloway JM, Smith KL, Vrbsky SS (2011, October) Power aware load balancing for cloud computing. In: proceedings of the world congress on engineering and computer science, Vol. 1, pp 19–21

Garg S, Gupta DV, Dwivedi RK (2016, November) Enhanced active monitoring load balancing algorithm for virtual machines in cloud computing. In: International conference on System Modeling & Advancement in Research Trends (SMART), pp 339–344

Tripathi AM, Singh S (2018) PMAMA: priority-based modified active monitoring load balancing algorithm in cloud computing. J Adv Res Dynam Cont Syst:809–823

Singh AN, Prakash S (2018) WAMLB: weighted active monitoring load balancing in cloud computing. In: Big data analytics. Springer, Singapore, pp 677–685

Patel G, Mehta R, Bhoi U (2015) Enhanced load balanced min-min algorithm for static meta task scheduling in cloud computing. Proced Comp Sci 57:545–553

Chen H, Wang F, Helian N, Akanmu G (2013, February) User-priority guided min-min scheduling algorithm for load balancing in cloud computing. In: 2013 national conference on parallel computing technologies (PARCOMPTECH), pp 1–8

Mathur S, Larji AA, Goyal A (2017, June) Static load balancing using ASA max-min algorithm. Int J Res Appl Sci Eng Technol

Devi DC, Uthariaraj VR (2016) Load balancing in cloud computing environment using improved weighted round robin algorithm for non-preemptive dependent tasks. Sci World J


Acknowledgements

The authors are grateful to the editor and anonymous referees for their valuable comments and suggestions. Only the authors are responsible for the views expressed and mistakes made.

Author information

Authors and Affiliations

Department of Information Technology, B S Abdur Rahman Crescent Institute of Science and Technology, Chennai, 600048, India

Shahbaz Afzal & G. Kavitha


Contributions

Designed the study: SA. Collected the data from different sources: SA and GK. Analysis and interpretation of data: SA. Drafting of manuscript: GK. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shahbaz Afzal.

Ethics declarations

Competing interests.

The authors declare that there are no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Afzal, S., Kavitha, G. Load balancing in cloud computing – A hierarchical taxonomical classification. J Cloud Comp 8, 22 (2019). https://doi.org/10.1186/s13677-019-0146-7


Received: 21 January 2019

Accepted: 25 November 2019

Published: 23 December 2019

DOI: https://doi.org/10.1186/s13677-019-0146-7


  • Cloud computing
  • Classification
  • Cloud service consumer
  • Cloud service provider
  • Quality of service
  • Load unbalancing
  • Load balancing


The rise of cloud computing: data protection, privacy, and open research challenges—a systematic literature review (SLR)

This article has been retracted.

Junaid Hassan

1 Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Chiniot-Faisalabad Campus, Chiniot 35400, Pakistan

Danish Shehzad

2 Department of Computer Science, Superior University, Lahore 54000, Pakistan

Usman Habib

3 Faculty of Computer Sciences and Engineering, GIK Institute of Engineering Sciences and Technology, Topi, Swabi 23640, Khyber Pakhtunkhwa, Pakistan

Muhammad Umar Aftab

Muhammad Ahmad

Ramil Kuleev

4 Institute of Software Development and Engineering, Innopolis University, Innopolis 420500, Russia

Manuel Mazzara

Associated data.

The data used to support the findings of this study are provided in this article.

Cloud computing is the long-standing dream of computing as a utility, where users can store their data remotely in the cloud and enjoy on-demand services and high-quality applications from a shared pool of configurable computing resources. The privacy and security of data are thus of utmost importance to all of its users, regardless of the nature of the data being stored. In cloud computing environments this is especially critical because data is stored in various locations, even around the world, and users have no physical access to their sensitive data. Therefore, we need data protection techniques to protect the sensitive data that is outsourced to the cloud. In this paper, we conduct a systematic literature review (SLR) of the data protection techniques that protect sensitive data outsourced to cloud storage. The main objective of this research is to synthesize, classify, and identify important studies in the field. Accordingly, an evidence-based approach is used, and the preliminary results are based on answers to four research questions. Out of 493 research articles, 52 studies were selected. These 52 papers use different data protection techniques, which can be divided into two main categories, namely noncryptographic techniques and cryptographic techniques. Noncryptographic techniques consist of data splitting, data anonymization, and steganographic techniques, whereas cryptographic techniques consist of encryption, searchable encryption, homomorphic encryption, and signcryption. We compare all of these techniques in terms of data protection accuracy, overhead, and operations on masked data. Finally, we discuss the future research challenges facing the implementation of these techniques.

1. Introduction

Recent advances have given rise to the popularity and success of cloud computing. It is a new computing and business model that provides on-demand storage and computing resources. The main objective of cloud computing is to gain financial benefits as cloud computing offers an effective way to reduce operational and capital costs. Cloud storage is a basic service of cloud computing architecture that allows users to store and share data over the internet. Some of the advantages of cloud storage are offsite backup, efficient and secure file access, unlimited data storage space, and low cost of use. Generally, cloud storage is divided into five categories: (1) private cloud storage, (2) personal cloud storage, (3) public cloud storage, (4) community cloud storage, and (5) hybrid cloud storage.

However, when we outsource data and business applications to a third party, security and privacy issues become a major concern [ 1 ]. Before outsourcing private data to the cloud, there is a need to protect private data by applying different data protection techniques, which we will discuss later in this SLR. After outsourcing the private data to the cloud, sometimes the user wants to perform certain operations on their data, such as secure search. Therefore, while performing such operations on private data, the data needs to be protected from intruders so that intruders cannot hack or steal their sensitive information.

Cloud computing has many advantages thanks to its rich technical resources: it makes it possible to store large amounts of data, perform computation on data, and offer many other services. In addition, the cloud computing platform reduces the cost of services and solves the problem of limited resources by sharing important resources among different users. Performance and resource reliability require that the platform be able to tackle security threats [2]. In recent years, cloud computing has become one of the most important topics in security research, covering software security, network security, and data storage security.

The National Institute of Standards and Technology (NIST) defines cloud computing as [3] "a model for easy access, ubiquitous, resource integration, and on-demand access that can be easily delivered through various types of service providers." Cloud computing follows the Pay as You Go (PAYG) mechanism, in which users pay only for the services they use. The PAYG model gives users the ability to develop platforms and storage and to customize software according to the needs of the end-user or client. These advantages are the reason the research community has put so much effort into this modern concept [4].

Security is attained by achieving confidentiality, integrity, and data availability. Cloud users want assurance that their data will be safe while using cloud services. Various types of attack are launched against a user's private data, such as intrusion attacks, hacking, theft of private data, and denial of service attacks; 57% of companies report security breaches when using cloud services [5]. Data privacy is even more critical than data security because cloud service providers (CSPs) have full access to all of a cloud user's data and can monitor their activities, compromising the user's privacy. For example, suppose a user is diabetic: by analyzing what the user searches for most and which medicines they use, the CSP can derive sensitive information about that individual and could share it with a medicine company or an insurance company [6]. Another problem is that users cannot fully trust the CSP, which raises many legal issues; because of this mistrust, many users will not store their personal or sensitive data in the cloud. One way to solve this problem is for the user to install a proxy on their side: the proxy takes the user's data, encrypts and stores it using data protection techniques, and only then sends it to the untrusted CSP [7].
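A minimal sketch of that proxy idea, assuming the Python `cryptography` package: data is encrypted locally with a symmetric key that never leaves the user/proxy side, so the CSP only ever stores ciphertext. The `upload_to_cloud` call is a hypothetical stand-in for any storage API.

```python
from cryptography.fernet import Fernet  # AES-based authenticated encryption

key = Fernet.generate_key()   # stays on the user/proxy side, never uploaded
cipher = Fernet(key)

def outsource(plaintext: bytes) -> bytes:
    token = cipher.encrypt(plaintext)   # the CSP sees only this token
    # upload_to_cloud(token)            # hypothetical storage call
    return token

def retrieve(token: bytes) -> bytes:
    return cipher.decrypt(token)        # only the key holder can decrypt

token = outsource(b"blood glucose: 7.1 mmol/L")
assert retrieve(token) == b"blood glucose: 7.1 mmol/L"
```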

The recent Google privacy policy is that any user can use any Google service free of cost; however, Google monitors their activity through their data in order to improve its services [8]. In this paper, we compare the different types of data protection techniques that provide privacy and security for data stored in the cloud. Many papers discuss outsourcing data storage to the cloud [9, 10]; we also discuss how to secure the data once outsourced. Most papers describe cloud data security against external intruder attacks [11, 12]; this paper discusses not only attacks from outside intruders and the mechanisms for securing against them but also insider attacks from the CSP itself. Many surveys cover data privacy through cryptographic techniques [13, 14]. These techniques are very powerful for data protection and provide significant results; however, they require key management, and some cloud functionalities do not work over cryptographically protected data. In this paper, we also discuss steganographic techniques. To the best of our knowledge, no study discusses all the conventional and nonconventional security techniques together; therefore, all the data protection techniques need to be combined in one paper.

The rest of this paper is organized as follows: Section 3 describes the research methodology, consisting of the inclusion and exclusion criteria, quality assessment criteria, study selection process, research questions, and data extraction process; it also discusses assumptions and requirements for data protection in the cloud. Section 4 presents all the cryptographic and noncryptographic techniques used for data protection in the cloud and discusses the demographic characteristics of the relevant studies, considering four aspects: (i) publication trend, (ii) publication venues (proceedings and journals), (iii) number of citations, and (iv) author information; it also compares all of these data protection techniques. Lastly, in Section 5, we discuss the results and present the conclusion and future work.

2. Related Work

The first access control mechanism and data integrity scheme in the provable data possession (PDP) model is proposed in [15], which provides two mobile applications based on the RSA algorithm. Like the PDP model, the author of [16] proposed a proof of retrievability (PoR) scheme used to ensure the integrity of remote data; PoR efficiency is improved by integrating a shorter authentication tag with the PoR system [17]. A more flexible PDP scheme is proposed by the author of [18], which uses symmetric key encryption techniques to support dynamic operations. A PDP protocol with flexible functionality, in which blocks can be added at run time, is developed in [19]. A new PDP system with a different data structure is introduced in [20], improving flexibility and performance; similarly, another PDP model with a different data structure is designed to handle its data functionality [21]. To improve data accuracy, the author of [22] designed a multireplica data verification scheme that fully supports dynamic data updates.

A unique data integration protocol for multicloud servers is developed in [23]. The author of [24] also considers the complex case where multiple copies are stored with multiple CSPs and builds a solid system to ensure the integrity of all copies at once. A proxy PDP scheme [25] is proposed that supports delegation of data checking, using concessions to verify auditor consent. In addition, removing the restrictions on the verifier strengthens the scheme, and a separate PDP certification system is proposed [26]. To maintain the security of information, a concept for information security is proposed and a PDP protocol for public auditing is developed [27]. To resolve the certificate management issue, a PDP system with data protection is introduced [28].

Identity-based cryptography, in which a user's unique identity is used as input to generate a secret key, is developed in [29]. Another PDP protocol is recommended to ensure confidentiality [30]. The author of [31] proposed a scheme in which tags are generated through the ring signature technique for group-based data sharing, supporting public auditing and maintaining user privacy. A new PDP system is introduced for data sharing over the cloud while maintaining user privacy [32]; additionally, it supports dynamic groups and allows users to exit or join the group at any time. Another PDP system [33], based on broadcast encryption and supporting dynamic groups [34], is introduced. The issue of user revocation is raised in [35], which proposes a PDP scheme that removes the user from the CSP using the proxy signature method. A PDP-based group data protocol was developed to track user privacy and identity [36]. A PDP system [37] is proposed for data sharing between multiple senders. The author of [38] provides an SEPDP system while maintaining data protection; however, the author of [39] proved that the scheme proposed in [38] is vulnerable to malicious counterfeiting by the CSP. A collision-resistant user revocable public auditing (CRUPA) system [40] is introduced for managing data shared in groups, and another scheme [41] is introduced to ensure the integrity of mobile data terminals in cloud computing.

To address the PKI issue, identity-based encryption [42] is designed to enhance the PDP protocol and maintain user privacy in a dynamic community. Before sharing user-sensitive data with third parties or researchers, data owners must ensure that the privacy of that data is protected, which can be done using data anonymization techniques [43]. In recent years, the research community has focused on the privacy-preserving data publishing (PPDP) area and developed several approaches for tabular data and social networks (SNs) [44–49]. There are two popular settings in PPDP: one is interactive, and the other is noninteractive [50]. The k-anonymity model [51] and its extensions are most commonly used in the noninteractive setting of PPDP [52–56], while differential privacy (DP) [57] and DP-based methods are used extensively in the interactive setting [58–60]. Meanwhile, several studies report DP-based approaches for the noninteractive setting [61]. Researchers have also extended the concepts used to anonymize tabular data to protect the privacy of SN users [62–64].
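As a minimal sketch of the k-anonymity property itself: a table is k-anonymous when every combination of quasi-identifier values occurs in at least k records. The column names below are illustrative assumptions.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff each quasi-identifier combination appears >= k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

table = [
    {"zip": "354**", "age": "30-39", "disease": "diabetes"},
    {"zip": "354**", "age": "30-39", "disease": "flu"},
    {"zip": "354**", "age": "40-49", "disease": "cancer"},
]
print(is_k_anonymous(table, ["zip", "age"], k=2))  # False: one lone 40-49 group
```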

Most images on the internet are stored in compressed form; hence, various studies design techniques for AMBTC-compressed images (absolute moment block truncation coding). Data concealment has become an active research area: we can hide data by embedding confidential information in a cover image, and the result is a stego image. There are two types of data hiding schemes: irreversible schemes [65–68] and reversible data hiding schemes [69–71]. A ciphertext encrypted for one recipient can be re-encrypted for another by a semitrusted proxy without decryption [72]. The first concrete construction of a collusion-resistant unidirectional identity-based proxy re-encryption scheme, for both selective and adaptive identity, is proposed in [73]. One widely used family of data hiding schemes is histogram shifting [74–76]. A histogram-shifting data hiding scheme that exploits the pixel histogram of the cover image is introduced in [77]. When big and diverse data are distributed everywhere, we cannot control malicious attacks; therefore, we need cryptosystems to protect our data [78–80].
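A minimal embed-only sketch of histogram shifting on a grayscale image, assuming an (almost) empty histogram bin exists above the peak bin: pixels between the peak and that zero bin shift up by one to free the slot next to the peak, and each peak-valued pixel then carries one payload bit. Extraction and full reversibility, which the cited schemes provide, are omitted here.

```python
import numpy as np

def hs_embed(img, bits):
    img = img.astype(np.int32).copy()
    hist = np.bincount(img.ravel(), minlength=256)
    peak = int(hist.argmax())                         # most frequent gray level
    zero = int(hist[peak + 1:].argmin()) + peak + 1   # emptiest level above peak
    img[(img > peak) & (img < zero)] += 1             # shift to open slot peak+1
    flat = img.ravel()                                # view into img
    carriers = np.flatnonzero(flat == peak)[: len(bits)]
    # bit 0 keeps the pixel at `peak`; bit 1 moves it to `peak + 1`
    flat[carriers] += np.asarray(bits[: len(carriers)], dtype=np.int32)
    return img.astype(np.uint8), peak, zero  # keep (peak, zero) for extraction

stego, peak, zero = hs_embed(np.random.randint(0, 200, (64, 64)), [1, 0, 1, 1])
```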

Some identity-based signature (IBS) schemes [81–84] based on bilinear pairing are introduced. Authentication schemes based on bilinear pairing over elliptic curves are more efficient and safer than the traditional public key infrastructure [85, 86]. The paper [87] proposed a privacy-preserving proxy re-encryption scheme for public cloud access control. A differential attack is performed on one-to-many order-preserving encryption (OPE) by exploiting the differences of the ordered ciphertexts in [88]. Another scheme consists of a cancelable biometric template protection mechanism based on format-preserving encryption and Bloom filters [89]. Some researchers also use pairing-free identity-based signature schemes [90–93]. A lightweight proxy re-encryption scheme with certificate-based and incremental cryptography for fog-enabled e-healthcare is proposed in [94].

3. Research Methodology

The objective of this SLR is to identify, investigate, and evaluate the existing research on data storage security in cloud computing. An SLR is a fair and unbiased way of evaluating existing techniques, providing a complete, evidence-based search on a specific topic. To date, no SLR has been conducted on data storage security techniques that covers both the cryptographic and the noncryptographic techniques; this SLR fills that gap. It follows the SLR guidelines provided by Kitchenham [95], and to strengthen our evidence, we also follow the study provided by [96]. Our SLR consists of three phases, namely planning, conducting, and reporting, as shown in Figure 1.

Figure 1. Review procedure.

3.1. Research Questions

The primary research question of this systematic literature review is “What types of data protection techniques have been proposed in cloud computing?” It is further divided into the four RQs listed below.

  •   RQ1: What types of data protection techniques have been proposed in cloud computing?
  •   RQ2: What are the demographic characteristics of the relevant studies?
  •   RQ3: Which data protection technique provides the most data protection among all the techniques?
  •   RQ4: What are the primary findings, research challenges, and directions for future research in the field of data privacy in cloud computing?

3.2. Electronic Databases

Six electronic databases were selected to collect primary search articles; all six are well reputed in the domain of cloud computing. Most of the relevant articles were taken from two of them, IEEE and Elsevier. All the electronic databases used in this research process are given in Table 1.

Table 1. Database sources.

3.3. Research Terms

First, a title-based search is done on the different electronic databases given in Table 1, and the most closely related studies are retained. The search uses a string of the form (p1 OR p2 OR … OR pn) AND (t1 OR t2 OR … OR tn). This string is constructed following the population, intervention, control, and outcomes (PICO) structure, of which the population, intervention, and outcome elements are used here. The database search queries are given in Table 2.

  •   Population: “cloud computing”
  •   Intervention: “data security,” “data privacy,” “data integrity”
  •   Using the PICO structure, we construct a generic query for the electronic databases: ((“Document Title”: cloud∗) AND (“Document Title”: data AND (privacy OR protect∗ OR secure∗ OR integrity∗))). A small sketch of assembling this string is given after this list.
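
To make the construction concrete, the following minimal Python sketch assembles the generic query string from the PICO elements above; the term lists and the field tag are taken from this section, while the function name and structure are our own illustration.

```python
# Minimal sketch: assembling the boolean search string from the PICO elements.
# The field tag ("Document Title") varies between electronic databases.

population = ["cloud*"]
intervention = ["privacy", "protect*", "secure*", "integrity*"]

def build_query(population, intervention, field='"Document Title"'):
    pop = " OR ".join(f"{field}: {p}" for p in population)
    intr = " OR ".join(intervention)
    return f"(({pop}) AND ({field}: data AND ({intr})))"

print(build_query(population, intervention))
# (("Document Title": cloud*) AND ("Document Title": data AND
#  (privacy OR protect* OR secure* OR integrity*)))
```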

Table 2. Database search queries.

3.4. Procedure of Study Selection

The procedure of study selection is described in Figure 2. This procedure has three phases: the first is exclusion based on the title, in which articles with irrelevant titles are excluded; the second is exclusion based on the abstract, in which only the articles with the most relevant abstracts are kept; and the last is exclusion based on the full text, which also applies the quality assessment criteria.

Figure 2. Study selection procedure.

3.5. Eligibility Control

In this phase, all the selected papers are read in full, and the relevant papers are selected to further process our SLR. Table 3 shows the final papers selected from each database based on the inclusion and exclusion criteria, which are given in Table 4.

Table 3. Results from electronic databases.

Table 4. Inclusion and exclusion criteria.

3.6. Inclusion and Exclusion Criteria

The inclusion and exclusion criteria define eligibility for basic study selection. We apply them to the studies that remain after reading the abstracts of the papers. Table 4 outlines the conditions we applied to the articles; after applying them, we obtain the relevant articles that are finally added to our SLR. The search period is from 2010 to 2021, and most of the papers included in our SLR are from 2015 onward.

We apply the inclusion and exclusion criteria in the third phase of the study selection process and get 139 results. After applying the quality criteria, we finally get the 52 articles included in this SLR. Most of the articles are taken from the Elsevier and IEEE electronic databases; IEEE is the largest venue for data storage security in cloud computing. The ratio of selected articles from the different electronic databases is shown in Figure 3.

Figure 3. Percentage of selected studies.

3.7. Quality Assessment Criteria

Quality assessment is done in the third phase of the study selection process. A 0-1 scale is used for the quality assessment (QA) of the articles.

Poor-quality articles score 0 points, and good-quality articles score 1 point; only articles scoring 1 point are included in this SLR. Applying the quality assessment criteria to all the articles leaves 52 articles. All the selected papers have validity and novelty with respect to different data protection techniques, and relevance is also checked during quality assessment, which ensures that all the articles relate to the topic of this SLR (data storage protection and privacy in cloud computing). The quality checking (QC) criteria are given in Table 5.

Table 5. Quality checking criteria.

3.8. Taxonomy of the Data Protection Techniques

In this section, all the data protection techniques are depicted in Figure 4, arranged and classified into their related categories. The purpose of the taxonomy is to give an organized overview of all the data protection techniques, which are mainly divided into two categories, namely (1) noncryptographic techniques and (2) cryptographic techniques.

Figure 4. Taxonomy of the data protection techniques.

4. Results and Discussions

Data protection on the cloud is achieved through a third-party proxy that is trusted by the user. The trusted proxy is not a physical entity; it is a logical entity that can be deployed on the user end (e.g., on the user's personal computer) or at a location the user trusts. Most local proxies are provided as an additional service or an additional module (such as a browser plugin). For proxies to fulfill the objective of data protection, the following requirements must be met:

  • User privilege. There are several objectives of user privilege or user empowerment; the main one is to increase users' trust in the data protection proxies used by the cloud.
  • Transparency. Another important objective is that when users outsource their sensitive data to trusted proxies, their data should remain the same and should not be altered.
  • Low overhead. Cloud computing provides large computing power and cost-saving resources. However, increasing data security should not increase computation overhead; we want to minimize the computation overhead on the proxies.
  • Cloud functionalities preservation. This is the most important objective. Users encrypt their sensitive data on their personal computers with different encryption techniques to increase protection; however, having done so, they cannot avail some of the cloud functionalities because of compatibility issues [97]. Hence, this is the main issue.

Figure 5 shows the data workflow for protecting sensitive data on the cloud using a local proxy. Different assumptions are made for data protection, and some of them are discussed below.

  • Curious CSPs: this is the most commonly used model in cloud computing [98]. The cloud service provider honestly fulfills its responsibilities, i.e., it does not interfere in user activities and only follows the standard protocols. However, an honest CSP may still be curious to analyze users' queries and their sensitive data, which violates the protocol and compromises user privacy. We can counter this by applying data protection techniques on the user end to protect users' sensitive data from the CSPs.
  • In some cases, CSPs may collaborate with the data protection proxies present on the users' side to increase the level of trust between users and CSPs, because better trust can motivate more users to move to the cloud. This collaboration requires the CSPs to provide users a stable interface for storing, searching, and computing on their data.
  • A multicloud approach to cloud computing infrastructure has also been proposed to improve performance; multiple cloud computing services are provided in the same heterogeneous architecture [19]. A multicloud gives the user multiple different places to store data at desired locations. There are several benefits to using a multicloud, e.g., it reduces reliance on a single CSP, which increases flexibility.

Figure 5. Data workflow on cloud using local proxy.

4.1. RQ1: What Types of Data Protection Techniques Have Been Proposed in Cloud Computing?

In this section, we discuss all the techniques for data storage security over the cloud, divided into two main categories, namely (i) cryptographic techniques and (ii) noncryptographic techniques. The local proxy uses different techniques to protect the data stored on the cloud, and because the data is masked, not all the advantages of cloud services can be gained. Therefore, we analyze and compare all these techniques based on the following criteria: (i) the data accuracy each technique retains, (ii) the data protection level each technique provides, (iii) the functionalities each scheme allows on masked and unmasked data, and (iv) the overhead of encrypting and decrypting data over the cloud.

4.1.1. Noncryptographic Techniques

Several noncryptographic techniques exist; we discuss them in this paper as follows:

(1) Data Anonymization. Data anonymization is a data privacy technique used to protect a user's personal information. It conceals the identifiers or attributes that could reveal a person's identity; this can be done by removing or hiding those identifiers or attributes, or by encrypting them. The purpose of data anonymization is to hide the identity of the person in some way. Data anonymity means the user's personal data is altered in such a way that the person cannot be identified directly or indirectly, and the CSP cannot retrieve anyone's personal information. Data anonymization techniques were developed in the field of statistical disclosure control and are most often used when sensitive data is outsourced, for example, for testing purposes. Data anonymization is graphically represented in Figure 6.

Figure 6. Data anonymization flow diagram.

For example, if doctors want to diagnose certain diseases, details of these diseases are required, and this information comes from the patients who suffer from them; however, it is illegal to share or disclose anyone's personal information. For this purpose, we use a data anonymization technique to conceal the person's personal information before outsourcing the data. In some cases, the CSP additionally wants to analyze the user's masked data. In data anonymization, attributes are the most important part; attributes can include name, age, gender, address, salary, etc. Table 6 shows the identifier classification.

Table 6. Identifier classification.

Data anonymization can be performed horizontally or vertically on this table, and also on a record or a group of records. The attributes are further classified into the following categories.

  •   Sensitive attributes: these possess sensitive information about a person, such as salary, disease information, phone number, etc. These attributes are strongly protected by applying protection techniques.
  •   Nonsensitive attributes: these attributes do not belong to any sensitive category and do not disclose the identity of a person.
  •   Identifiers: identifiers directly identify a person, such as an ID card number, name, or social security number. Because of their presence, relationships between different attributes can be detected; hence, these identifiers must be replaced or anonymized.
  •   Quasi-identifiers: quasi-identifiers are attributes that are available publicly, such as zip code, designation, gender, etc. Separately, these identifiers cannot reveal a personal identity; combined, however, they may reveal the identity of the person. Hence, we want to separate quasi-identifiers to avoid disclosure. A minimal sketch of detecting such risky combinations follows this list.
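
As an illustration of why quasi-identifier combinations matter, the following minimal Python sketch (with hypothetical records and column names) flags combinations shared by fewer than k records, i.e., violations of the k-anonymity model [51] discussed earlier:

```python
# Hypothetical records: a minimal sketch of checking k-anonymity over the
# quasi-identifier columns described above (zip code and gender here).
from collections import Counter

records = [
    {"name": "A", "zip": "54000", "gender": "F", "salary": 90},
    {"name": "B", "zip": "54000", "gender": "F", "salary": 75},
    {"name": "C", "zip": "54001", "gender": "M", "salary": 60},
]

def violates_k_anonymity(records, quasi_ids, k):
    """Return the quasi-identifier combinations shared by fewer than k records."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return [combo for combo, count in groups.items() if count < k]

print(violates_k_anonymity(records, ["zip", "gender"], k=2))
# [('54001', 'M')] -> this record is re-identifiable and needs generalization
```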

There are two main categories of data masking: (1) perturbative masking and (2) nonperturbative masking.

  • (1) Perturbative Masking
  •   In perturbative masking, data is altered or masked with dummy datasets: original data is replaced with dummy data that looks like the original with some added noise. The masked data preserves the statistical properties of the original data; nonperturbative masking, by contrast, does not preserve them, because only perturbative masking replaces data with structurally similar dummy data.
  • Data swapping
  •   In data swapping, values are randomly exchanged between different records [99]. If numerical values are present in the dataset, they can only be changed within certain limits; otherwise, the meaning of the data changes, and the masked data no longer resembles the original. For attributes that can be ranked, a value is replaced with a nearby ranked value, since a very large difference between ranks is not suitable [100]. In data swapping, higher-level attributes are swapped [101], and individual values are not changed.
  • Noise Addition
  •   In this mechanism, some noise is added to the original dataset to alter the original data. Noise is typically added to continuous data [102], and it can be added to all the attributes present in the original dataset, both sensitive attributes and quasi-attributes.
  • Microaggregation
  •   In this technique, related records are clustered into groups, and each group releases average values for its records [103]. The more similar the records within a group, the higher the data utility. We can cluster the data in many ways, including categorical versions [104]. Microaggregation is performed on quasi-attributes to protect them from reidentification, and protecting the quasi-attributes in turn protects all the other attributes from reidentification. We can also minimize reidentification by data clustering [105].
  • Pseudonymization
  •   In this method, the original data is replaced with artificial datasets [106]: each attribute value in the original data is replaced with a pseudonym, making the data less identifiable.
  • (2) Nonperturbative Masking
  •   Nonperturbative masking does not distort the original data; instead, masked data is created by reduction or suppression of the original data, which changes its statistical properties [107].
  • Bucketization
  •   In this method, original data is stored in different buckets that are protected through encryption [108]; in this way, the sensitive attributes can be protected.
  •   Data slicing is a method in which a larger group of data is divided into smaller slices or segments [109]. The sensitive attributes and the quasi-attributes are placed in different slices, so the identity of a person cannot be disclosed from any individual slice.
  •   Sampling is a technique built on the population and sample concept: the entire dataset is the population, and the masked data is a sample of it. We make different samples of the original data; a smaller sample provides more protection [110].
  • Generalization
  •   It is a technique in which attribute values are replaced with more general ones, or, when a quasi-attribute value is rare, dummy attributes that look like quasi-attributes are added to the record, so that reidentification becomes more difficult [111]. By applying generalization to data, we can protect the identity of a person because it weakens the relationship between the quasi-attributes. A minimal sketch of the two masking styles follows this list.
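
The following minimal Python sketch contrasts the two masking styles described above; the noise scale and the age bins are illustrative assumptions, not parameters from the cited schemes.

```python
# A minimal sketch contrasting noise addition (perturbative) and
# generalization (nonperturbative). Values and ranges are illustrative.
import random

def add_noise(salaries, scale=0.05):
    # Perturbative: replace each value with value + small Gaussian noise,
    # preserving the overall statistical shape of the column.
    return [s + random.gauss(0, scale * s) for s in salaries]

def generalize_age(age, bin_width=10):
    # Nonperturbative: replace the exact age with a coarser range,
    # reducing detail instead of distorting it.
    low = (age // bin_width) * bin_width
    return f"{low}-{low + bin_width - 1}"

print(add_noise([50_000, 62_000, 71_000]))   # e.g., [50231.4, 61875.2, ...]
print(generalize_age(37))                    # '30-39'
```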

The summary of data anonymization techniques is given in Table 7 .

Table 7. Summary of the data anonymization techniques.

(2) Data Splitting. Data splitting is a technique in which sensitive data is divided into different fragments [112] to protect it from unauthorized access. We first split the data into fragments and then store these fragments randomly on different clouds. Even if an intruder gains access to a single fragment, the intruder cannot identify the person. For example, a fragment containing the salary figures of an organization is useless to an intruder who does not know which salary belongs to which person. Hence, data splitting is a very useful technique for protecting data stored on the cloud.

A local proxy can outsource data to the cloud without splitting it; it can also split the data and then outsource the fragments to the same cloud under different accounts of the same CSP, or store them on different cloud platforms run by different CSPs that provide some of the same services. Data is split before being stored in different locations so that even if some fragment of the data becomes known to an intruder, no one can be identified from it.

First, the local proxy retrieves sensitive data from the user and calculates the disclosure risk. The user can define a privacy level, which identifies the sensitive attributes that could reveal someone's identity; these are the quasi-attributes or quasi-identifiers. Next, the local proxy decides the number of fragments into which the sensitive data will be split and the number of locations needed to store those fragments, so that no one can reveal a person's identity. All the information about the data-splitting arrangement is stored at the local proxy, and the system must still function properly and answer queries on time. The local proxy then stores the fragments in different cloud databases, where they are free from disclosure. The data-splitting mechanism supports almost all the functions of the cloud; hence, we can use almost all the services provided by the CSP while storing data in split form.

When the user wants to retrieve the original data, a query is processed on the local proxy: the data storage locations are retrieved from the local database, the query is replicated once per fragment, and these queries are forwarded to the relevant CSPs. Each CSP returns a set of results representing a partial view of the complete result, and the proxy finally merges the partial results according to the criteria used to split the data and returns the complete result to the user (a minimal sketch of this scatter/gather workflow is given below). Most fragments are stored on the different cloud databases in their original structure, so computation on an individual fragment can be performed easily. However, no algorithm currently exists for computations that span several fragments, as these require communication between different CSPs; such algorithms still need to be developed. Redundancy of the proxy metadata and backup policies are essential to ensure the robustness of the mechanism. Data splitting is graphically represented in Figure 7.
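
The following minimal Python sketch mimics this scatter/gather workflow with in-memory dictionaries standing in for CSP stores; the function names and placement policy are our own illustration, not a protocol from the cited works.

```python
# A minimal sketch of the proxy workflow described above: fragments are
# spread over several (here simulated) CSPs, and retrieval merges the
# partial results. Real deployments would call the CSPs' storage APIs.

csps = [{}, {}, {}]                     # three simulated cloud stores
locations = {}                          # proxy metadata: key -> fragment placement

def store(key, fragments):
    locations[key] = []
    for i, frag in enumerate(fragments):
        csp_index = i % len(csps)       # spread fragments across CSPs
        csps[csp_index][(key, i)] = frag
        locations[key].append(csp_index)

def retrieve(key):
    partial = [csps[c][(key, i)] for i, c in enumerate(locations[key])]
    return b"".join(partial)            # merge partial results in order

store("record-42", [b"alice|", b"54000|", b"90k"])
assert retrieve("record-42") == b"alice|54000|90k"
```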

Figure 7. Data-splitting flow diagram.

The summary of data splitting is given in Table 8. Different data-splitting techniques are used for the protection of data stored on the cloud; some of them are given below.

  • Byte level splitting
  •   In this type, all the sensitive data is converted into bytes [113], which are randomly shuffled and then recombined into fixed-length fragments that are stored on different clouds.
  • Privacy level splitting
  •   In this mechanism, the user chooses the privacy level of each file to be stored on a cloud database [114]. A privacy level is attached to each file, and using it, the user can decide that files with a higher privacy level should be stored on the more trusted clouds.
  • Byte level splitting with replication
  •   Byte-level data splitting is combined with data replication to improve both performance and security. The authors of [115] proposed an algorithm that stores the data fragments on different clouds at a certain distance from one another; by doing this, we can avoid attacks in which the intruder aggregates the split fragments.
  • Byte level splitting with encryption
  •   Byte-level data splitting with encryption was proposed in [116, 117]. In this scheme, every fragment of data is encrypted to enhance the security of sensitive data: the data is split into bytes, which are randomly shuffled and finally recombined. This type of data splitting is suitable for binary or multimedia files that are not processed through the cloud.
  •   Another problem is choosing the fragment length at which the data cannot be reidentified, i.e., the identity of a person cannot be revealed. If the length is too short, the probability of disclosure increases; if it is too long, the fragments are difficult to handle. Hence, the length should be chosen so that handling remains practical while the identity of a person is still protected.
  •   There is another type of data splitting in which data is split by attributes. Attribute-level splitting is performed in two ways: horizontal splitting and vertical splitting. These types of splitting are mostly done on structured databases, and they provide strong privacy.
  • Vertical splitting
  •   In vertical data splitting [118, 119], quasi-identifiers are divided so that all the risky attributes end up in different fragments, preventing reidentification. Some sensitive fragments additionally require encryption; we can encrypt these fragments with encryption algorithms or other privacy methods to increase the security level. A minimal sketch of byte-level splitting follows this list.
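
The following minimal Python sketch illustrates byte-level splitting with a recorded shuffle; using a seed as the proxy-held metadata is our simplification for illustration, not the construction of [113].

```python
# A minimal sketch of byte-level splitting as described above: bytes are
# shuffled with a reproducible permutation (kept by the local proxy) and
# cut into fixed-length fragments for storage on different clouds.
import random

def split_bytes(data: bytes, frag_len: int, seed: int):
    perm = list(range(len(data)))
    random.Random(seed).shuffle(perm)     # seed acts as proxy-held metadata
    shuffled = bytes(data[i] for i in perm)
    return [shuffled[i:i + frag_len] for i in range(0, len(shuffled), frag_len)]

def rejoin(fragments, seed: int):
    shuffled = b"".join(fragments)
    perm = list(range(len(shuffled)))
    random.Random(seed).shuffle(perm)     # rebuild the same permutation
    out = bytearray(len(shuffled))
    for dst, src in zip(perm, shuffled):
        out[dst] = src                    # invert the shuffle
    return bytes(out)

frags = split_bytes(b"sensitive payroll data", frag_len=8, seed=1234)
assert rejoin(frags, seed=1234) == b"sensitive payroll data"
```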

Table 8. Summary of the data-splitting techniques.

A solution for splitting sensitive data without encrypting the fragments is proposed in [120]. This mechanism is suitable for data on which some computation is to be performed, because computation cannot be performed directly on encrypted data. Another technique [121] demonstrates the redaction and sanitization of documents; it identifies all sensitive attributes and protects the data in most documents.

Schemes that use vertical splitting to protect data are faster than other splitting techniques, because the fragments consist of one or a few attributes and involve no masking or encryption, so computation is easy. There is also a type of encryption in which we do not have to decrypt and re-encrypt for every computation, called homomorphic encryption: all modifications are performed on the encrypted data, the actual data is not touched, and the final result is preserved [122].

(3) Steganography. Steganography is the practice of concealing a message within another message or a physical object. In computing contexts, a video, audio, image, message, or computer file is concealed within another image, message, or file. The steganography flow diagram is depicted in Figure 8. There are two main types of steganography, namely (1) linguistic steganography and (2) technical steganography, described as follows:

  • (1) Linguistic Steganography
  •   Semagrams: these use images and symbols alone to cover the data. There are two types of semagrams [123]: a visual semagram, in which the message is conveyed visually, and a text semagram, in which the font, color, or symbols of the text message are changed.
  •   Open codes: here, the real message is hidden from the intruder by embedding the original message in a legitimate-looking carrier [124]. The open code technique is further divided into two types: jargon codes and covered ciphers.
  • (2) Technical Steganography
  • Text steganography
  •   In this type, we change some textual characteristics of the text, such as the font, color, or symbols of the text message [127]. Three coding techniques are used to change these textual features: (1) line-shift coding, (2) word-shift coding, and (3) feature coding.
  • Image steganography
  •   It is the most popular type of steganography. Image steganography refers to the process of hiding sensitive data inside an image file [128]. The transformed image (the stego image) is expected to look very similar to the original image because its visible features remain the same. Image steganography is divided into three parts: (1) least significant bit coding, (2) masking and filtering, and (3) transformations.
  • Audio steganography
  •   Audio steganography is a technique used to transmit secret data by modifying a digitized audio signal in an imperceptible manner [129]. The following types of audio steganography exist: (1) least significant bit coding, (2) phase coding, (3) spread spectrum, and (4) echo hiding.
  • Video steganography
  •   Video steganography combines both image and audio steganography [130]. A video consists of many frames, so video steganography can hide a large amount of data in carrier frames; we select the specific frames in which to hide the sensitive data.
  •   Methods
  • Frequency Domain
  •   A frequency-domain steganography technique is used for hiding a large amount of data with no loss of the secret message, good invisibility, and high security [131]. In the frequency domain, we change the magnitudes of the discrete cosine transform (DCT) coefficients of the cover image. There are two frequency-domain types: (1) discrete cosine transformation and (2) discrete wavelet transformation.
  • Spatial Domain
  •   The spatial domain is based on the physical locations of pixels in an image [132]. Spatial domain techniques manipulate pixel values directly, minimizing the changes in the stego image created from the cover image. Some spatial domain methods are as follows: (1) least significant bit, (2) pixel value differencing, (3) pixel indicator, (4) gray level modification, and (5) quantized indexed modulation. A minimal sketch of least-significant-bit coding follows this list.
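
The following minimal Python sketch illustrates least-significant-bit coding on a byte array standing in for pixel data; it is a toy for illustration, not one of the schemes cited above.

```python
# A minimal sketch of least-significant-bit (LSB) coding, the first spatial
# domain method listed above: each bit of the secret message replaces the
# lowest bit of one cover byte (e.g., one pixel channel), leaving the cover
# visually almost unchanged.

def embed(cover: bytearray, secret: bytes) -> bytearray:
    bits = [(byte >> i) & 1 for byte in secret for i in range(8)]
    assert len(bits) <= len(cover), "cover too small"
    stego = bytearray(cover)
    for pos, bit in enumerate(bits):
        stego[pos] = (stego[pos] & 0xFE) | bit   # overwrite the lowest bit
    return stego

def extract(stego: bytearray, n_bytes: int) -> bytes:
    bits = [b & 1 for b in stego[:n_bytes * 8]]
    return bytes(sum(bits[i * 8 + j] << j for j in range(8)) for i in range(n_bytes))

cover = bytearray(range(200))                    # stand-in for pixel data
stego = embed(cover, b"key")
assert extract(stego, 3) == b"key"
```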

Figure 8. Steganography flow diagram.

The summary of the steganographic techniques is given in Table 9 .

Table 9. Summary of the steganographic techniques.

4.1.2. Cryptographic Techniques

Cryptography is the most important and most widely used technique for security purposes. In cryptography, plaintext is converted into ciphertext using a key and an encryption algorithm. Cryptographic techniques are the most secure of all the security techniques; hence, they are widely used for data storage security over the cloud, and different objectives, such as data confidentiality and data integrity, can be achieved by applying them. Because of the increase in the number of data breaches in the last few years, some cloud service providers are shifting toward cryptographic techniques to achieve more security; the most commonly used cryptographic technique is AES [133]. Key management is an important issue in cryptographic techniques, because if the key is obtained by an intruder, all the data can be stolen; hence, key protection and key management, which are mostly the responsibility of the CSP, are critical. Cryptographic techniques also protect the user from an untrusted CSP: sometimes a CSP outsources sensitive data without the user's permission, which is illegal, and applying cryptography on the user side is the best way to guard against this. However, the user faces some difficulties when using cryptographic techniques; for example, to make even a small update, the user must first decrypt the data, which is very costly. In general, cryptographic techniques give a higher level of security at the expense of performance, and the right balance depends on the level of security the user wants to achieve. In this paper, we focus on the four main functionalities required of cloud computing when using cryptographic techniques. Figure 9 shows the flow diagram of encryption.

Figure 9. Encryption flow diagram.

Some of the main functionalities of cryptographic functions are given below.

  • Search on encrypted data
  •   If users want to retrieve their data stored in a cloud database, they generate a query, run it on the local proxy server, and search for the data they want. Searching over encrypted data is a very important part of cryptography, because every user who stores sensitive data in a cloud database wants to retrieve it via queries; however, performing such a search without first decrypting the data is very difficult.
  • Storage control
  • Sometimes the user wants to store data in a desired location or trusted database. Hence, the user must have full control over the storage of data.
  • Access control
  • It is a very important control and is referred to as data access restriction. Sometimes, the user does not want to share a private file publicly. Hence, access control is an important functionality.
  • Computation on data
  •   Data computation is the main functionality of cloud computing. Sometimes the user wants to perform some computation on data stored in a cloud database. If the data is encrypted, one option is for the user to decrypt the entire dataset, perform the computation, and then re-encrypt and store it again, which is very expensive computationally; the alternative is to compute directly on the ciphertexts, as homomorphic encryption (discussed below) allows.

Some of the cryptographic techniques are as follows:

(1) Homomorphic Encryption . Homomorphic encryption is a form of encryption that permits users to perform computations on encrypted data without decrypting it. These resulting computations are left in an encrypted form, which, when decrypted, result in an identical output to that produced had the operations been performed on the unencrypted data. There are some types of homomorphic encryption that are described below.

  • Partial Homomorphic Encryption
  •   In partial homomorphic encryption, only one arithmetic operation, addition or multiplication, can be performed. If combining ciphertexts corresponds to addition of the plaintexts, the scheme is called additively homomorphic, and if it corresponds to multiplication of the plaintexts, the scheme is called multiplicatively homomorphic. Two multiplicative homomorphic schemes are given in [134, 135]; Paillier [136] is an additive homomorphic scheme.
  • Somewhat Homomorphic Encryption
  •   This technique allows both addition and multiplication operations, but only a limited number of them, because each operation adds noise, and too much noise changes the structure of the original data. Somewhat homomorphic encryption schemes are presented by the authors of [137, 138]. In these schemes, the encryption and decryption time grows as the number of multiplication operations increases; to avoid this, only a limited number of mathematical operations are allowed.
  • Fully Homomorphic Encryption
  •   This technique allows an unlimited number of addition and multiplication operations, performed on bits in the form of XOR and AND gates [139]. Fully homomorphic encryption requires a much higher computation time to encrypt and decrypt data, so it is not yet practical for real-life applications. It uses a bootstrapping algorithm to refresh ciphertexts and control the noise when a large number of multiplication operations is performed. Homomorphic encryption thus represents a trade-off between supported operations and speed: a limited number of arithmetic operations if low computation cost is wanted, and a large number of operations if high security and functionality are wanted, depending on the needs of the user. A toy sketch of additive homomorphic encryption follows this list.
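
The following toy Python sketch (Python 3.9+) illustrates the additive homomorphic property with a Paillier-style construction [136]; the tiny primes are insecure and for illustration only.

```python
# An insecure toy of the additive (Paillier-style) scheme mentioned above:
# multiplying two ciphertexts decrypts to the SUM of the plaintexts, so the
# cloud can add values without ever seeing them.
import math, random

p, q = 17, 19                        # toy primes; real keys use ~1024-bit primes
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                 # valid because g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n

a, b = encrypt(20), encrypt(22)
assert decrypt((a * b) % n2) == 42   # homomorphic addition on ciphertexts
```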

(2) Searchable Encryption. A searchable encryption technique is proposed by the author of [140]. In this technique, data is encrypted before being stored in the cloud database. The advantage is that searches over the cloud database can then be performed securely.

  • Searchable Asymmetric Encryption
  •   Over the past two decades, searchable encryption has received much attention, mostly for the multiwriter, single-reader case. Searchable encryption in the public-key setting is called public-key encryption with keyword search (PEKS) [141].
  • Searchable Symmetric Encryption
  •   Symmetric-key algorithms use the same key for message encryption and ciphertext decryption; the keys can be identical, or there can be a simple transformation to go between the two. Verifiable searchable symmetric encryption, a key cloud security technique, allows users to retrieve encrypted data from the cloud with keywords and verify the accuracy of the returned results. A scheme for keyword search over dynamic encrypted cloud data with symmetric-key-based verification is proposed in [142]. A minimal sketch of a symmetric searchable index follows this list.
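
The following minimal Python sketch illustrates the idea behind a symmetric searchable index using keyed HMAC tokens; it is a simplification for illustration, not the verifiable scheme of [142], and it leaks access patterns.

```python
# A minimal sketch of searchable symmetric encryption: the proxy indexes
# each document under keyed keyword tokens (HMAC), so the cloud can match
# a search token against the index without learning the keywords. Document
# bodies would additionally be encrypted; that step is omitted for brevity.
import hmac, hashlib, os

key = os.urandom(32)                                  # held by the local proxy

def token(keyword: str) -> bytes:
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()

index = {}                                            # stored at the CSP

def add_document(doc_id, keywords):
    for kw in keywords:
        index.setdefault(token(kw), []).append(doc_id)

def search(keyword):                                  # proxy sends only the token
    return index.get(token(keyword), [])

add_document("doc1", ["payroll", "2021"])
add_document("doc2", ["invoice", "2021"])
print(search("2021"))   # ['doc1', 'doc2'] -- the CSP never saw the word "2021"
```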

(3) Encryption . In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can decipher a ciphertext back to plaintext and access the original information.

  • Symmetric Key Encryption
  •   Only one key is used in symmetric encryption to encrypt and decrypt the message, so the two communicating parties must exchange the key before it can be used for decryption. This differs from asymmetric encryption, where a pair of keys is used to encrypt and decrypt messages. A secure transmission method for network communication data based on a symmetric key encryption algorithm is proposed in [143] (a minimal sketch of symmetric encryption is given after this list).
  • Public Key Encryption
  •   The public-key encryption scheme is proposed by the author of [144]. In this scheme, the receiver creates a key pair consisting of a public key, which is known to everyone, and a private key, which is kept secret. The sender encrypts the data using the receiver's public key and sends the encrypted data to the receiver, who decrypts it using the private key. In this way, two parties can communicate securely.
  • Identity-Based Encryption
  •   Identity-based encryption is proposed by the author of [145]. A set of users is registered in a database, and a unique identity, such as a name or an e-mail address, is assigned to each registered user by the admin controlling the scheme. As in public-key encryption, there is a key pair, but here the public key is the user's identity, while the private key is kept secret; unlike in public-key encryption, the receiver cannot generate its own key pair, since a central authority generates and manages the users' identities. Identity-based encryption is improved by the author of [146]. Its main advantage is that anyone can derive the public key of a given identity with the help of the central authority.
  • Attribute-Based Encryption
  •   The authors of [147, 148] propose a technique called attribute-based encryption. Like identity-based encryption, it depends on a central authority, which generates the private keys and distributes them to all the registered users. A user whose attributes satisfy the access policy can decrypt a message; a user without the required attributes cannot. Attribute-based encryption is useful when the number of registered users is very large. It comes in two forms: key-policy and ciphertext-policy.
  • Functional Encryption
  •   A functional encryption technique [149, 150] subsumes identity-based encryption, attribute-based encryption, and public-key encryption; all the functionalities of these three techniques together make up functional encryption. All the private keys are generated by the central authority and are associated with a specific function. Functional encryption is a very powerful encryption technique that is used in many applications.
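
The following minimal sketch shows symmetric encryption using the Fernet recipe of the `cryptography` package (an assumed dependency, not a scheme from the cited papers): one shared key both encrypts and decrypts.

```python
# A minimal sketch of symmetric encryption as described above: the same
# key encrypts and decrypts, so it must be shared over a secure channel.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # shared secret between the two parties
f = Fernet(key)

ciphertext = f.encrypt(b"patient record #17")
assert f.decrypt(ciphertext) == b"patient record #17"
```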

(4) Signcryption. Signcryption is a public-key primitive that functions simultaneously as a digital signature and a cipher; digital signatures and encryption are two basic cryptographic tools that together can ensure confidentiality, integrity, and nonrepudiation. In [151], a new signcryption scheme based on efficiently verifiable credentials is proposed; the system performs signing and encryption together and can also provide an encryption-only or signature-only form when needed. The paper [152] proposes a lightweight certificate-based signcryption scheme (CBSS) with proxy re-encryption for smart devices connected to an IoT network, to reduce computation and communication costs; to ensure the security and efficiency of the proposed CBSS scheme, an 80-bit security parameter is used. Reference [153] proposes an access control scheme for the IoT environment using an efficient and robust signcryption scheme; besides security services, such as resistance to attacks, confidentiality, integrity, and nonrepudiation, its computation and communication costs are low compared to existing schemes. Reference [154] gives informal and formal security proofs of the proposed scheme; the Automated Validation of Internet Security Protocols and Applications (AVISPA) tool is used for the formal security analysis, which confirms that the proposed CB-PS scheme can potentially be implemented for resource-constrained, low-computing electronic devices in e-prescription systems. The proposed scheme of [155] introduces a new concept that does not require a reliable channel: the key generation center sends part of the private key to users publicly. A baseline sketch contrasting signcryption with naive sign-then-encrypt is given below. The summary of the cryptographic schemes is given in Table 10.
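
Signcryption proper is a single primitive; as a point of reference, the following sketch shows the naive sign-then-encrypt baseline it improves on, built from Ed25519 signatures and Fernet encryption in the assumed `cryptography` package, not the construction of the cited schemes.

```python
# Naive sign-then-encrypt baseline: two separate logical steps (signature,
# then encryption). Signcryption achieves both goals in one step at lower cost.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()   # sender's long-term signing key
shared_key = Fernet.generate_key()           # symmetric key shared with receiver

def sign_then_encrypt(message: bytes) -> bytes:
    sig = signing_key.sign(message)          # 64-byte Ed25519 signature
    return Fernet(shared_key).encrypt(sig + message)

def decrypt_then_verify(blob: bytes) -> bytes:
    data = Fernet(shared_key).decrypt(blob)
    sig, message = data[:64], data[64:]
    signing_key.public_key().verify(sig, message)   # raises if tampered
    return message

assert decrypt_then_verify(sign_then_encrypt(b"prescription")) == b"prescription"
```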

Table 10. Summary of the cryptographic techniques.

All the data storage protection techniques for cloud computing discussed above fall into three main categories, namely (i) data splitting, (ii) data anonymization, and (iii) cryptography. We examine these techniques from several points of view, e.g., the overhead on the local proxy, the computation cost, support for search on encrypted data, the data accuracy each technique retains, the data protection level each technique provides, and the functionalities each masking approach supports. Considering these aspects, we can analyze all the data protection techniques. Cryptography provides high-level security but limited cloud functionality and a high cost for computing on cloud data; data splitting provides a low computation cost but a low level of security; and data anonymization comes in two forms, perturbative and nonperturbative masking, where perturbative masking replaces data with dummy values, so security is high, but some functionalities cannot be performed.

4.2. RQ2: What are the Demographic Characteristics of the Relevant Studies?

We answer this question by considering the four following aspects: (i) publication trend, (ii) publication venues (proceeding and journals), (iii) number of citations, and (iv) author information.

4.2.1. Publication Trend

From 2010 to 2021, we found 52 papers in top-ranked journals and conferences. From 2010 to 2017, work on cloud computing data security grew slowly, but after 2017, activity increased sharply: 37 of the papers were published from 2018 to 2021, and most of the work, including the highest-ranked studies, was published in 2021. Figure 10 shows the publication trend from 2010 onward. Most of the articles were published in journals, with the highest number, 6 papers, published in IEEE Access.

Figure 10. Number of publications per year.

4.2.2. Publication Venues

There are different types of publication venues, including book chapters, conference proceedings, journals, workshop proceedings, and symposium proceedings. The number of publications per venue in our SLR is given in Figure 11. We have a total of 52 papers after applying the inclusion and exclusion criteria of Section 3.

Figure 11. Publication venues.

Of the 52 papers, none are published as book chapters or in symposium proceedings, 1 is published in workshop proceedings, 43 are published in journals, and 8 are published in conference proceedings. The most active journals in cloud data security are listed in Table 11.

Table 11. Top 5 most active journals.

The most active journal is IEEE Access, with 6 papers. The Journal of Cryptology is the second most active journal in the field of data storage security and privacy in cloud computing, with 3 papers, followed by Information Fusion, also with 3 papers. The fourth journal is Information Sciences, with 2 papers, and the fifth is IEEE Transactions on Knowledge and Data Engineering, also with 2 papers. The most active conferences are given in Table 12.

Table 12. Top 5 most active conferences.

4.2.3. Number of Citations

The number of citations of a paper is also an indicator of its quality: the more citations, the higher the quality. Table 13 lists the most influential authors, and Figure 12 shows the number of citations of all the papers used in this SLR. A few papers have more than 100 citations each, indicating very high quality; these papers are [105, 118, 124, 139].

Figure 12. Number of citations of the papers.

Table 13. Top 10 most influential authors in data protection in cloud computing.

4.2.4. Author Information

Some authors are especially active in this area. To identify them, we list the top 10 authors in the field of data protection and privacy in cloud computing, together with their numbers of publications, in Table 13.

4.3. RQ3: Which Data Protection Technique Provides the Most Data Protection Among All the Techniques?

We answer this question by comparing the techniques against five criteria: (i) local proxy overhead, (ii) data accuracy retained, (iii) level of data protection, (iv) transparency, and (v) operations supported.

4.3.1. Comparison of Data Protection Techniques

In this section, we compare all the data protection techniques discussed in this SLR and assess which technique provides the most protection. We compare these techniques based on the five criteria above: (i) local proxy overhead, (ii) data accuracy retained, (iii) level of data protection, (iv) transparency, and (v) operations supported. Table 14 provides a brief comparison of all the data protection techniques discussed in this SLR. We now discuss these five criteria one by one in more detail.

  • Encryption. The overhead on the local proxy for encryption is very high, because the data is stored encrypted: if the user wants to update the data, the user must first decrypt it, perform the update, and then encrypt it again. This requires a lot of time, and all of the work is performed by the local proxy.
  • Data Splitting
  • The overhead on a local proxy for data splitting is very low. The local proxy overhead remains constant while splitting data into fragments.
  • Anonymization
  • The overhead on a local proxy for anonymization is average, because most anonymization methods require quasilinear computation in the number of records to generate the anonymized dataset. Once the anonymized data is generated and stored in the cloud database, there is no further overhead on the local proxy.
  • Homomorphic Encryption
  • The overhead on local proxies for homomorphic encryption is very high because homomorphic encryption involves a large number of mathematical operations. Therefore, there is a lot of overhead on local proxies for homomorphic encryption.
  • Steganography
  • The overhead on the local proxy for steganography is not too high, as the data is simply concealed inside a cover for secure communication. However, because of the complexity of the operations, the transform domain techniques impose more local proxy overhead than the spatial domain techniques.
  • Signcryption
  • The overhead on the local proxy for signcryption is higher than for simple encryption, because in signcryption, signing and encryption are performed in a single logical step; this extra operation raises the overhead on the local proxy above that of simple encryption.
  • The data accuracy level for encryption is very high, because the data is encrypted using well-defined algorithms: the sensitive data encrypted by the sender is decrypted by the receiver using a key, and it cannot be read by anyone who does not have the secret key.
  • The data accuracy level for data splitting is average, because the data is present in the form of fragments, and the CSP can easily access individual fragments. Both encryption and data splitting are reversible methods; hence, the original data can be retrieved easily.
  • The data accuracy level for data anonymization is very low, because anonymization is not reversible: data is replaced with dummy data and cannot be retrieved back.
  • The data accuracy level for homomorphic encryption is very high because data is encrypted by applying some algorithms.
  • The data accuracy level for steganography is very low compared to the cryptographic techniques, because the data is embedded inside the cover of another medium, and any change in the cover during transmission changes the concealed data. Therefore, it is hard to guarantee a high accuracy level in steganography: the stego image containing the secret data is transmitted over the communication channel, and the receiver extracts the concealed data from the cover, so accurate recovery of the data depends on accurate transmission of the cover.
  • The data accuracy level for signcryption is also very high, because in signcryption, confidentiality and authentication are achieved. Therefore, we can also verify the identity of the sender.
  • The level of data protection is very high for encryption techniques, because the data is changed into ciphertext that cannot be understood. Identification of the data is impossible without decryption using the secret key, since decryption is infeasible without that key.
  • The level of data protection for data splitting is lower than for cryptographic techniques, because the data is split into fragments that retain the original form of the data; if an intruder hacks or steals these fragments, the entire data can be easily read. Hence, the data protection level is not as high as for encryption methods.
  • The level of data protection for data anonymization is lower than for cryptographic techniques, because anonymization relies on protecting the quasi-identifiers; if the quasi-identifiers are not strongly protected, there is a chance of reidentification of a person's sensitive data.
  • The level of data protection is very high for homomorphic encryption techniques, because the data is changed into ciphertext, which cannot be understood.
  • The data protection level for steganography is medium, because the data is embedded inside the cover of another medium: the stego image containing the secret data is transmitted over the communication channel, and the receiver extracts the concealed data from the cover. The concealment of the data thus provides a degree of secure transmission.
  • The data protection level for signcryption is also very high, because in signcryption, both confidentiality and authentication are achieved. Therefore, we can also verify the identity of the sender.
  • There is no transparency for encrypted data, because encryption requires key management: the local proxy needs to keep records of all the keys and manage them.
  • There is no transparency for the data-splitting mechanism, because data is split into fragments that the local proxy stores in different locations, and a record of the location of every fragment must be kept.
  • Anonymization is fully transparent, because there is no need for the local proxy to keep records of data storage. The anonymized data is statistically similar to the original data, so the CSP can also perform computation and analysis on it.
  • There is no transparency for homomorphically encrypted data, because encryption requires key management; the local proxy needs to keep records of all the keys.
  • In steganography, as compared to the other data protection techniques, the main aim is to transmit data without letting the attacker know that any transmission is taking place, since the data is concealed inside the cover of another medium. Data transmission in steganography is fully transparent: no key management is required, and there is no need to keep track of data storage.
  • There is no transparency for the signcrypted data, because in signcryption, there is a need for key management. Hence, the local proxy needs to keep the records of all the keys and also manage all these keys.
  • Only the data storage operation is supported on encrypted data: if the user wants to update encrypted data stored in a cloud database, the user must first decrypt it and then perform the update. No modification operations can be performed directly on encrypted data.
  • All operations can be performed on split data, because in data splitting the data remains in its original form. Hence, data storage, search, update, and computation are all supported.
  • Anonymization comes in two forms: data masking and data non-masking. If the data is non-masked, both data storage and search can be performed on it; otherwise, only data storage is supported.
  • Homomorphically encrypted data supports data storage and also computation, because homomorphic encryption allows certain operations to be performed directly on the ciphertext without decrypting it first (see the homomorphic sketch after this list).
  • A stego image only supports the data storage operation: if the user wants to update the data hidden in a stego image, the user must first extract the data from the stego image and can only then modify it.
  • Only the data storage operation is supported on signcrypted data: if the user wants to update signcrypted data stored in a cloud database, the user must first unsigncrypt the data and can only then perform the update.

Comparison of data protection techniques.
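
To make the contrast above concrete, here is a minimal Python sketch (ours, not taken from any surveyed paper) that stores the same record once encrypted and once split into plaintext fragments; it assumes the third-party cryptography package for the Fernet cipher.

    # Contrast: encryption hides content without the key; naive data splitting
    # leaves every fragment readable. Assumes: pip install cryptography
    from cryptography.fernet import Fernet

    record = b"patient=alice;diagnosis=diabetes"

    # Encryption: the stored object is ciphertext, unreadable without the key.
    key = Fernet.generate_key()               # secret key held by the local proxy
    ciphertext = Fernet(key).encrypt(record)
    print(b"diabetes" in ciphertext)          # False: content is not recognizable

    # Data splitting: fragments keep their original form, so a stolen fragment
    # is directly readable by an intruder.
    fragments = [record[i:i + 8] for i in range(0, len(record), 8)]
    print(fragments)

    # Recovery: decryption needs the secret key; splitting needs only the
    # fragment locations and their order.
    assert Fernet(key).decrypt(ciphertext) == record
    assert b"".join(fragments) == record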
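The difference in supported operations can likewise be illustrated with textbook RSA, whose ciphertexts happen to be multiplicatively homomorphic. This is an insecure toy (tiny primes, no padding), chosen only because it runs with the Python standard library; it is not a scheme proposed in the surveyed literature.

    # Toy homomorphic computation: multiplying RSA ciphertexts multiplies the
    # underlying plaintexts, so the CSP can compute without ever decrypting.
    p, q = 61, 53                       # toy primes; real keys use ~2048-bit moduli
    n, e = p * q, 17                    # public key (n, e)
    d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

    def enc(m: int) -> int:
        return pow(m, e, n)

    def dec(c: int) -> int:
        return pow(c, d, n)

    c1, c2 = enc(7), enc(6)
    c_prod = (c1 * c2) % n              # computed on ciphertexts alone
    assert dec(c_prod) == 7 * 6         # decrypts to the product, 42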

5. Conclusion and Future Work

5.1. RQ4: What Are the Primary Findings, Research Challenges, and Directions for Future Work in the Field of Data Privacy in Cloud Computing?

5.1.1. Conclusion and Research Challenges

In this SLR, we have systematically presented the data privacy techniques related to data storage in cloud computing, and we have also presented a comparison among all the protection techniques with respect to five criteria: (i) local proxy overhead, (ii) data accuracy retained, (iii) level of data protection, (iv) transparency, and (v) operations supported. We found research gaps in all of these techniques: data splitting, anonymization, steganography, encryption, homomorphic encryption, and signcryption.

  • There is a very strong need to develop ad hoc protocols for communicating the data-splitting fragments that are stored on different CSPs, and likewise for communication between the CSPs themselves. Non-cryptographic techniques are faster but do not provide enough security, so security can be improved by developing such methods for data-splitting techniques.
  • Anonymization techniques work effectively on small amounts of data but not on big data, so there is a research gap in developing anonymization techniques with more efficient performance. Schemes that provide stronger protection for the quasi-identifiers are also needed; current anonymization techniques are still immature (see the anonymization sketch after this list).
  • One limitation of steganography is that it only defends against a third party who does not know steganography is in use; a third party who does can extract the data exactly as the recipient would. Encryption is therefore always used together with steganography (see the steganography sketch after this list), and steganography techniques that can protect sensitive data from such third parties still need to be developed.
  • There is a need to develop cryptographic techniques that take less time than existing ones to perform search and computation operations on encrypted data. Cryptographic techniques provide high security but low computational utility, so it is a research gap to develop techniques that combine high security with greater efficiency.
  • The complexity of homomorphic encryption and decryption is far greater than that of conventional encryption and decryption, which makes it unsuitable for many applications, such as healthcare and other time-sensitive workloads. There is therefore an urgent need for homomorphic encryption schemes with low complexity and computation cost.
  • Signcryption is used to verify and authenticate users, and it provides both confidentiality and authentication. Its main limitation, however, is that the encryption algorithms used in signcryption have very high computation costs, so signcryption schemes built on low-cost encryption algorithms need to be developed.
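
As a concrete illustration of the anonymization gap above, the toy Python sketch below (ours, not a surveyed scheme) masks the direct identifier and generalizes the quasi-identifiers; the CSP can still aggregate over the coarsened values, which is why anonymization stays fully transparent, but weak generalization invites re-identification.

    # Toy anonymization: mask direct identifiers, generalize quasi-identifiers.
    records = [
        {"name": "Alice", "zip": "14623", "age": 34, "diagnosis": "flu"},
        {"name": "Bob",   "zip": "14627", "age": 37, "diagnosis": "asthma"},
    ]

    def anonymize(r: dict) -> dict:
        return {
            "name": "***",                      # mask the direct identifier
            "zip": r["zip"][:3] + "**",         # generalize: 146**
            "age": f"{r['age'] // 10 * 10}s",   # generalize: 30s
            "diagnosis": r["diagnosis"],        # sensitive value kept for analysis
        }

    anon = [anonymize(r) for r in records]
    print(anon)   # counts per ZIP prefix or age band still work on this data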
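The steganography limitation above can also be shown in a few lines. The sketch below (our simplified byte-level LSB embedding over a random cover, standing in for image pixel data) demonstrates that anyone who knows the embedding rule can extract the payload, which is exactly why the payload should be encrypted before it is hidden.

    # Simplified LSB steganography: the embedding rule is public knowledge,
    # so extraction needs no secret. Encrypt the payload before embedding.
    import os

    def embed(cover: bytearray, payload: bytes) -> bytearray:
        bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
        assert len(bits) <= len(cover), "cover too small"
        stego = bytearray(cover)
        for pos, bit in enumerate(bits):
            stego[pos] = (stego[pos] & 0xFE) | bit   # overwrite the LSB
        return stego

    def extract(stego: bytearray, length: int) -> bytes:
        out = bytearray()
        for i in range(length):
            byte = 0
            for j in range(8):
                byte |= (stego[i * 8 + j] & 1) << j
            out.append(byte)
        return bytes(out)

    secret = b"meet at dawn"                          # in practice: ciphertext
    cover = bytearray(os.urandom(len(secret) * 8))    # stand-in for pixel bytes
    stego = embed(cover, secret)
    assert extract(stego, len(secret)) == secret      # rule known => extractable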

Acknowledgments

This research was financially supported by The Analytical Center for the Government of the Russian Federation (Agreement nos. 70-2021-00143 dd. 01.11.2021, IGK 000000D730321P5Q0002).

Data Availability

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Cybersecurity professor and Ph.D. students to present autonomous driving research at ACM MobiSys ’24 Conference

Paper marks the first time someone from RIT has published at the highly selective conference.


Fawad Ahmad, assistant professor in the Department of Computer Science and ESL Global Cybersecurity Institute at RIT, along with computing and information sciences Ph.D. candidates Kaleem Nawaz Khan and Ali Khalid, will present new research at the prestigious ACM International Conference on Mobile Systems, Applications, and Services (ACM MobiSys ’24 Conference) in Tokyo, Japan, in June. The conference is highly selective, with an acceptance rate of only 16%. Their paper, VRF: Vehicle Road-side Point Cloud Fusion, marks the first time RIT researchers have published at the conference.

The paper addresses solutions for occlusions and blind spots that hinder autonomous driving. The RIT team, along with collaborators Yash Turkar and Karthik Dantu from the University at Buffalo, has developed a system that enables road-side mounted sensors to share and combine their 3D data with the vehicle's own sensors. This creates a more complete picture of the surroundings, extending the car's "vision" beyond its own limitations.

“Our system, VRF, shares and fuses 3D views from road-side sensors to a vehicle in real time with high accuracy,” said Ahmad. “VRF is particularly impressive in that it can share and fuse 3D data in under 20 milliseconds, which is crucial for real-time decision making in self-driving cars. At the same time, it maintains high accuracy, achieving positioning within 5 centimeters.”

With VRF, vehicles gain a more complete understanding of their surroundings, giving them more time to react to external events, make more informed decisions, and hence drive more safely.

The conference takes place June 3–7, 2024.


