
Open access
Published: 29 March 2024

Reliable water quality prediction and parametric analysis using explainable AI models

M. K. Nallakaruppan 1,
E. Gangadevi 2,
M. Lawanya Shri 1,
Balamurugan Balusamy 3,
Sweta Bhattacharya 1 &
Shitharth Selvarajan 4, 5

Scientific Reports volume 14, Article number: 7520 (2024)


Electrical and electronic engineering
Engineering

Water consumption underpins the physical health of most living species, so managing its purity and quality is essential: contaminated water has the potential to cause adverse health and environmental consequences. This creates a pressing need to measure, control and monitor the quality of water. The primary contaminant present in water is Total Dissolved Solids (TDS), which is hard to filter out. Apart from solids, water can carry various other substances such as potassium, sodium, chlorides, lead, nitrate, cadmium, arsenic and other pollutants. The proposed work aims to automate water quality estimation through Artificial Intelligence and uses Explainable Artificial Intelligence (XAI) to explain the most significant parameters contributing to the potability of water and the estimation of impurities. XAI provides the transparency and justifiability of a white-box model, whereas a Machine Learning (ML) model on its own is a black box that cannot describe the reasoning behind its classification. The proposed work uses various ML models, namely Logistic Regression, Support Vector Machine (SVM), Gaussian Naive Bayes, Decision Tree (DT) and Random Forest (RF), to classify whether the water is drinkable. XAI representations such as the force plot, test patch, summary plot, dependency plot and decision plot, generated with the SHAP (Shapley) explainer, explain the significant features, prediction score, feature importance and justification behind the water quality estimation. The RF classifier is selected for the explanation and yields an optimum Accuracy and F1-Score of 0.9999, with Precision and Recall of 0.9997 and 0.998 respectively. Thus, the work is an exploratory analysis of the estimation and management of water quality, with indicators presented alongside their significance. This is an emerging area of research, with a vision of addressing water quality in the future as well.


Introduction

Water covers the major part of our earth and is extremely important for the survival of all human and animal species. The planet holds roughly 326 million cubic miles of water, which covers almost 71% of its surface, and about 97% of that water is seawater. Only 0.5% of the earth's water is accessible fresh water, while the remaining 2.5% is either trapped in glaciers, polar ice caps, the atmosphere or soil, is polluted, or lies beneath the earth’s surface far beyond human reach. If the global water supply were 100 L, the amount of drinking water would be only about 0.003 L, which is just a teaspoon. Therefore, the management and preservation of drinking water is regarded as a top priority. Given the extremely limited amount of water that is accessible for use, it is among the most critical issues for mankind to address. The quantum of water around the world is represented in Table 1.

Water is a common and crucial resource shared among all humans, animals, and plants, and it is a necessity for all species. Each of these species has its own water quality needs. For human consumption, the best quality is soft water with Total Dissolved Solids (TDS) between 50 mg/dL and 150 mg/dL; the next acceptable level for humans is between 150 mg/dL and 300 mg/dL. Plants need water between 700 mg/dL and 800 mg/dL. Animals, especially cattle, consume water of around 1000 mg/dL. It is thus evident from these observations that water quality management is essential to ensure sustainability and a healthy life on Earth. Water quality prediction matters at a global level for many reasons. First, access to clean and safe water is a basic human necessity, and water quality prediction helps guarantee the availability of potable water for societies worldwide. Water quality is also tied to public health, as polluted water may cause waterborne diseases that could affect millions of people globally. A sustainable environment, which preserves ecosystems and biodiversity, is an important aspect of human well-being. The significance of water quality assessment is recognised by various organizations globally. The WHO (World Health Organization), UNEP (United Nations Environment Programme), EPA (United States Environmental Protection Agency), EEA (European Environment Agency), IWA (International Water Association) and WEF (Water Environment Federation) are committed to water quality assessment and to mitigation strategies for water quality challenges. Poor water quality affects public health globally, spreading diseases like typhoid, dysentery, cholera, dengue and malaria and posing substantial risks worldwide.

Advances in computing technologies and artificial intelligence have elevated the standards of water quality assessment 1. Measurements and estimates of water quality have become easier to calculate and more accurate, especially with the development of Industry 4.0 standards and Internet of Things (IoT) sensors. With the integration of IoT sensors, AI serves solely as a supporting tool to automate water quality checks 2. Classification and regression models based on machine learning help in determining the water quality. Depending on the outcomes, classification results are either binary or multi-class. Real-time sensor data are collected, given feature labels, and then classified based on the importance of the feature labels. Earlier, these measurements were carried out with fuzzy-based decision support systems 3 using subjective decision-making models. AI development has made it possible to classify and analyse quality aspects quantitatively. The accuracy of water quality assessment is validated using performance metrics such as accuracy, precision, recall, and F1-score. AI models also support such quantitative analysis, the classification of water sources, and the prediction of drinkable water, as well as identifying the mixing of buoyant pollutants in water sources 4.
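The four performance metrics named above can be computed directly from the counts of a binary confusion matrix. The following is a minimal sketch under stated assumptions: the function name `classification_metrics` and the toy labels (1 for potable, 0 for not potable) are illustrative and are not taken from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels invented for illustration (1 = potable, 0 = not potable).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall next to accuracy matters for potability data, where one class may dominate and accuracy alone can look better than the classifier really is.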

Despite their success in automating tasks and making water quality predictions using diverse models, AI models lack transparency and are considered black boxes: decisions are derived, but the reasoning behind them is not revealed. Present-generation validation frameworks for water quality management need justifiability, transparency and explainability, which Explainable AI (XAI) based systems can provide. XAI is a white-box technology that addresses the uncertainty related to the classification and regression problems of AI. XAI applies a model-agnostic approach, where the machine learning models can be treated independently for interpretation. Additionally, XAI discusses how the model is chosen, how it works, and how it performs categorization. By assessing a problem’s feature weights, XAI can also determine a feature’s relevance. This clarifies how a feature value relates to a certain target class classification. As an example, XAI uses models like Partial Dependency Plots (PDP) 5, which describe the relationship between the features using lasso functions. Such a model may identify the linear relationship between two characteristics of water quality data and explain their correlation. In XAI, models like Local Interpretable Model-agnostic Explanations (LIME) explain the relationship between a single feature and relevant others through a local surrogate. That is, for a single row of the dataset, it is possible to relate the target attribute to the other independent variables. LIME in this regard can be used to explain the target classification for a single row instance of the water quality data 6. The XAI used in the proposed work employs both local and global surrogates and includes SHAP.
The model offers a solution that takes into account the importance of each feature in determining the target as well as the dependency between features, the relationship between features, and the explanation of decisions through a variety of plots, including force plots, summary plots, dependency plots, and decision plots. The framework is very adaptable and capable of giving a thorough explanation of the characteristics of the water quality and how they affect the classification of the water quality.
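The feature attributions behind such plots are Shapley values. A minimal sketch of the exact computation is given below, assuming a hypothetical scoring function `toy_score` over three feature names; neither the function nor its weights come from the study, and SHAP libraries approximate this quantity rather than enumerating every coalition.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature for a coalition value function."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        result[f] = total
    return result


def toy_score(known):
    """Hypothetical potability score given the set of known features."""
    score = 0.0
    if "ph" in known:
        score += 0.2
    if "sulfate" in known:
        score += 0.1
    if "ph" in known and "sulfate" in known:
        score += 0.1  # interaction term, shared equally by both features
    return score


phi = shapley_values(toy_score, ["ph", "sulfate", "turbidity"])
print(phi)
```

The attributions sum to the difference between the score with all features and the score with none, which is the property that lets a force plot decompose a single prediction feature by feature.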

Advantages of the proposed model

Explainable AI plays an important role in improving the interpretability of predictions made by machine learning models. More transparent predictions are generated by these models. In the proposed approach, the authors have employed LIME and SHAP to interpret predictions achieved from machine learning, which identifies inputs as an important metric for selecting the features. By applying the XAI approach, the proposed model provides deep insights into the features and allows informed decision-making in water management processes.

Contributions of the paper

The following points describe the contribution of the proposed work.

The proposed work offers a comprehensive analysis and white-box description of the classification problem for water quality.

The framework incorporates extensive pre-processing of the dataset to ensure it is fit to be fed into the XAI model.

Imputation of missing data is carried out to increase the accuracy of the findings.

The proposed work identifies the most significant features, along with the feature importance, feature dependencies, and feature weights, which enable optimized classification of the water quality dataset.

The proposed approach employs both model-based and model-agnostic interpretations, using model-based ML implementations and model-agnostic XAI implementations.
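The imputation step listed among the contributions can be sketched as follows. This is an illustration under a stated assumption: the text does not name the imputation strategy, so column-mean imputation is used here as one common choice, and the helper `impute_column_means` and the sample values are hypothetical.

```python
from statistics import mean


def impute_column_means(rows, columns):
    """Replace None with the mean of the observed values in each column."""
    means = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        means[col] = mean(observed) if observed else None
    return [
        {col: (r[col] if r[col] is not None else means[col])
         for col in columns}
        for r in rows
    ]


# Invented sample rows; None marks a missing measurement.
samples = [
    {"ph": 7.0, "sulfate": 330.0},
    {"ph": None, "sulfate": 350.0},
    {"ph": 8.0, "sulfate": None},
]
filled = impute_column_means(samples, ["ph", "sulfate"])
print(filled)
```

In practice the means would be estimated on the training portion only and then applied to the test portion, so that no information leaks from test data into training.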

Organization of the paper

Section “Introduction” introduces the research problem and describes the unique contributions of the paper. Section “Introduction” also presents the literature review of related problems on water quality, in the related works subsection, with an exhaustive survey of the various applications and case studies pertaining to water quality management using AI and machine learning approaches. Section “System model and architecture” describes the methods applied in the proposed work, with the mathematical model and the algorithm of the proposed work. Section “Results” describes the results of various ML and XAI models with relevant tables and graphs. Section “Discussion” provides the comparative analysis of the results with a discussion of challenges and solutions of the proposed work. Section “Conclusion” concludes the paper with future directions.

Related works

Lu et al. 7 examined the central environmental protection inspection (CEPI) after its implementation and investigated the causes of transboundary water contamination. The triple difference technique (DDD) was used to assess how the CEPI affected pollution, how significantly water pollution decreased, and the significance of CEPI laws for addressing transboundary pollution. Halder et al. 8 reported that the Turag River’s neighbouring communities are suffering from major health problems as a result of water contamination. The river’s water quality was unsuitable for sustaining household use and aquatic life. The study noted that the values for turbidity, total dissolved solids (TDS), chloride (Cl-), chemical oxygen demand (COD), carbon dioxide (CO2), and biochemical oxygen demand (BOD) are higher than the standard permissible limits, which may result in health problems like respiratory illnesses, diarrhoea, cholera, dengue, malaria, anaemia, and skin problems. A study evaluating metal pollution management and mitigation tactics for soil and water was presented by Wang et al. 9. It discussed the remediation of metal contamination in water and soil using chemical, physical, and biological approaches, and examined the current methods for reducing heavy metal pollution of soil and water. Elehinafe et al. 10 discussed the importance of water contamination and examined the main cause of water scarcity. Their work discussed the effect of hazardous chemicals on water, including pesticides, heavy metals, and micro-pollutants, and outlined the numerous technologies currently available to eliminate hazardous materials and provide sustainable clean water resources. Mu et al. 11 investigated farmers’ readiness to implement Rural Water Pollution Control (RWPC).
This study examines farmers’ viewpoints to improve the quality of life for locals who reside in rural regions and avoid water contamination. To analyse the contributions of contaminants, Wang et al. 12 developed a unique contaminant flux variable model for river water quality assessment. The framework effectively identified the sources of pollution and evaluated the efficacy of projects designed to reduce water pollution. Zadeh et al. 13 proposed WQPs for estimating chemical oxygen demand and biochemical oxygen demand using the MKSVR algorithm. PSO algorithm is used for solving optimization problems. The multiple kernel support vector regression (MKSVR) is compared with SVR and Random Forest Regression and achieves a better accuracy level for BOD prediction. Nagaf et al. 14 presented a framework for assessing the WQI values based on the NSF guidelines. This framework uses four data-driven models such as EPR, M5 MT, GEP and MARS for predicting WQI values in the Karun River. The classification uses 12 water quality parameters and missing values were extracted from the image analysis. Zadeh et al. 15 proposed a model that utilizes gene expression programming, evolutionary polynomial regression, and model trees for predicting WQPs. The biochemical oxygen demand, dissolved oxygen and chemical oxygen demand are used for estimation with nine parameters. The gamma test is used for determining important parameters. Najaf et al. 16 proposed a water quality predicting framework for estimating the water quality index in the Hudson River based on Canadian Council of Ministers of the Environment (CCME) guidelines. The four artificial intelligence techniques M5 MT, Multivariate Adaptive Regression Spline, Evolutionary Polynomial Regression and Gene Expression Programming are used with Landsat 8 OLI-TIRS images. The results proved that the MARS technique achieved the best outcome compared to other models.

Chowdhury et al. 17 emphasized the sources of water contamination caused by densely populated industrial areas located close to water bodies. The main causes of water contamination are dangerous chemicals and heavy metals. According to the study, pesticides used by farmers, including different types of carbamate and organophosphorus pesticides, are the main causes of water contamination on agricultural land. Ahivar et al. 18 examined the use of heavy metal pollution indices (HPIs) in soil, water, and sediments. For assessing metal contamination, the HPI is considered a crucial instrument. Each method’s pollution index is assessed to interpret the pollution levels, and guidance is offered on selecting HPIs based on the parameters and standards for evaluating the quality of water and soil. Chen et al. 19 presented a study that used various mathematical and statistical approaches to check the quality of water. The factors indicating water pollution and its seasonal characteristics were evaluated to reduce river water pollution. Principal Component Analysis, Cluster Analysis, Network Analysis and Co-Occurrence Analysis were carried out to find the potential source of river water pollution. Fan et al. 20 examined the quality of water using several mathematical and statistical techniques. To lessen river water pollution, the variables implicated in contamination and the seasonal traits were assessed. To identify a likely cause of river water pollution, Principal Component Analysis, Cluster Analysis, Network Analysis, and Co-Occurrence Analysis were performed. Wang et al. 21 formulated performance indices for explaining the scale effects of the Water-Energy-Pollution nexus (InWEP). The Nexus Pressure Index (NPI) and Nexus Coupling Index (NCI) were used to represent the pollution pressure and the interaction relations.
The factors for InWEP were analysed using the Structural Equation Model (SEM) considering four objects, namely enterprises, countries, industrial zones and cities. The performance of InWEP was evaluated on the metrics of efficiency, structure and location. To evaluate the quality of groundwater in the areas surrounding an industrial metropolis, Asomaku 22 evaluated the water pollution indices. Nine samples from three landfills were used in the analysis of the groundwater’s chemical and metal characteristics. The study in Balaram et al. 23 explored many elements that have an impact on water quality, including climate change, industry, aquaculture, mining, and agriculture. For the quantitative and qualitative evaluation of hazardous metals, metal species, isotopes, and other contaminants present in water, various ICP-MS techniques are applied. Yuan et al. 24 proposed a water quality monitoring framework using biological sensors for water quality assessment. Borzooei et al. 25 presented a study to estimate the frequency of weather events that affect wastewater assessment. A time series data mining approach was used for categorizing the dry and wet weather events. Noori et al. 26 presented a report on the decline of groundwater recharge in Iran. The study reports that the average amount of groundwater recharge is more than the annual runoff. The study in 4 utilized a WCSPH (weakly compressible smoothed particle hydrodynamics) model for simulating near-shore hydrodynamics, and conducted experimental and numerical evaluations to detect the causes of the mixing of buoyant pollutants in coastal water sources. Yeganeh-Bakhtiar 27 presented a framework using MOS (Model Output Statistics) for establishing the statistical relationships between predictors and predictands.

When evaluating water quality using factors like toxicity and pollutants, computer vision and biological sensor systems are utilised in tandem. A microfluidic chip with sensors, which monitors the water samples, is used to retrieve the important data from images taken by a microscope. Figure 1 describes various factors causing water pollution in smart cities, including construction activities, atmospheric deposition, natural factors, municipal wastewater, stormwater runoff, incorrect waste disposal, industrial discharges, and agricultural runoff. Jeihouni et al. 28 implemented and compared five data mining techniques, including the Ordinary Decision Tree (ODT), Random Forest (RF), Chi-square Automatic Interaction Detector (CHAID), Iterative Dichotomiser 3 (ID3), and Random tree, to identify high-quality water zones. Eight parameters are used in the evaluation process while deriving rules. Compared to the remaining models, the RF performed well, with an accuracy rate of 97.10%. Lee et al. 29 implemented a framework for evaluating the quality of groundwater using a Self-Organizing Map (SOM) technique and fuzzy c-means clustering (FCM). The two methods are employed to describe the complex nature of groundwater. SOM employed 91 neurons to categorise 343 groundwater samples, and FCM grouped the water sources into three groups. Agarwal et al. 30 proposed an AI-based water evaluation technique to predict the water quality index using Particle Swarm Optimization (PSO), Naïve Bayes Classifier (NBC), and Support Vector Machine (SVM). PSO was used to optimize the classifiers: the PSO-optimized NBC obtained 92.8% accuracy and the PSO-optimized SVM obtained 77.60% accuracy. Table 3 illustrates various existing state-of-the-art techniques proposed for assessing water quality, together with their advantages and research gaps.

Figure 1 illustrates the factors causing water pollution. The factors include industrial discharges, agricultural runoff, municipal wastewater, stormwater, improper waste disposal, oil spills and chemical spills, construction waste, and atmospheric deposition. Understanding these factors is crucial for protecting public health and ecosystems, for sustainable development, for creating public awareness, and for pollution prevention.

Figure 1. Factors causing water pollution.

Figure 2 depicts the physical parameters required for evaluating the quality of water, such as Temperature, Turbidity, Conductivity, Odour and Color, represented in percentages. Examining the physical parameters is essential for identifying the potential hazards that lead to poor water quality and for protecting ecosystem health.

Figure 2. Physical parameters.

Figure 3 depicts the necessary chemical parameters, such as pH, Dissolved Oxygen (DO), Total Dissolved Solids (TDS), Nutrients (nitrogen and phosphorus), Total Suspended Solids (TSS), Heavy Metals, and Organic Matter (OM), as well as Chemical Oxygen Demand (COD) and Biochemical Oxygen Demand (BOD) with percentages, that must be measured in order to assess the water’s quality.

Figure 3. Chemical parameters.

Figure 4 presents various supervised learning models for estimating water quality, including Random Forest, Support Vector Machine (SVM), Decision Trees, Neural Networks, and Gradient Boosting Approaches like XGBoost and AdaBoost.

Figure 4. Supervised learning models.

Figure 5 represents various unsupervised learning models such as Principal Component Analysis, Cluster Analysis and Self-Organizing Maps (SOM) for addressing the quality of the water. PCA is a dimensionality reduction approach mainly utilized for analyzing the high dimensional datasets. Cluster analysis techniques are used primarily for grouping water samples based on similarities. SOM technique is principally used for organizing the water quality data.

Figure 5. Unsupervised learning models.

Figure 6 highlights various hybrid ML models, such as ensemble models with Reinforcement Learning (RL), for evaluating the quality of water. The choice among machine learning models can be verified against the application, the parameters used to determine the quality of the water, and the size and quality of the dataset, based on the assessment of the performance metrics.

Figure 6. Hybrid ML models.

The motivation for the proposed research, along with the research gap analysis against similar existing research works, is discussed in Table 2. The comparative analysis of similar existing works is presented in Table 3. These two discussions provide a comprehensive understanding of the requirements for the design and implementation of the proposed system.

Table 3 reviews similar literature on various machine learning models such as DT, RF, DCF, SVM, and so on. The table also covers various deep learning models, such as Artificial Neural Networks (ANN), Probabilistic Neural Networks (PNN) and Convolutional Neural Networks (CNN), and statistical regression models such as the Autoregressive Integrated Moving Average (ARIMA). It further lists the research gaps identified and addressed in the proposed work. These models were mostly numerical evaluations with regression analysis. The proposed system is a classifier that deploys an XAI framework to discuss the impact of the parameters that determine the potability of the water from an end-user perspective. This is a step towards achieving environmental sustainability in water conservation and harvesting.

Statement of objectives

The proposed work offers a comprehensive analysis and white-box description of the classification problem for water quality. The framework incorporates extensive pre-processing of the dataset to ensure it fits into the XAI model. Imputation of missing data is carried out to increase the accuracy of the findings. The proposed work identifies the most significant features, along with the feature importance, feature dependencies, and feature weights, which enable optimized classification of the water quality dataset. The proposed approach employs both model-based and model-agnostic interpretations, using model-based ML implementations 46 and model-agnostic XAI implementations. The quality of water is greatly challenged by innumerable influencing factors. These factors vary from condition to condition and place to place. For example, microplastics (MP) are emerging pollutants in the marine environment with potential toxic effects on littoral and coastal ecosystems 47, as is the mixing of buoyant pollutants in water sources 4. Laboratory evaluations show the presence of polyethene (PE) particles in ocean waves with a wave steepness Sop of 2–5%, and their transport could cause severe water pollution on the seashores 48. These measurements require quantification and feature analysis when they are evaluated with AI. This is where XAI plays a vital role in measuring the order and degree of the pollutants causing the quantifiable pollution in the water.

Case studies

Importance of XAI in Water Quality Assessment: The following case studies delineate the advent of the potential impact of XAI, with a groundbreaking revolution in water quality assessment.

Case Study 1: Pollution of Ganges 49. This case study emphasises the Ganga River pollution issue in India, which has an extremely detrimental impact on humans and the entire ecosystem. The Ganga River is polluted by industrial, animal, and human waste. The main source of pollutants is industrial rubber waste, followed by leather and plastic manufacturers who dump their untreated wastewater into the river. The Ganga Action Plan was developed by the Indian government to combat Ganga pollution. This implies the need to reinforce environmental restrictions to improve river quality.

Materials and methods

An effective policy for health protection should thus emphasize providing access to safe drinking water regardless of social and economic diversity. Previous studies show that, in some places, investments in access to clean water and sanitation yield economic benefits for the country. Water quality is a significant aspect of environmental health and public safety, as it determines the suitability of water for numerous purposes, such as drinking, agriculture, industry, and recreation. The key indicators related to water quality are its physical, chemical, and biological characteristics and its sources of pollution. The dependent target class is potability. The independent features are pH value, hardness, solids (Total Dissolved Solids, TDS), chloramines, sulfate, conductivity, organic carbon, trihalomethanes, and turbidity. Water’s potability indicates its purity and safety for ingestion. The parameters used, their WHO limits, and the hyper-parametric analysis are listed in Table 4, and the feature descriptions of the parameters are listed in Table 5.

XAI framework facilitates transparent and interpretable explanations of the outcome generated by the ML algorithm-based frameworks. XAI can thus be applied in the present context of water quality assessment to ensure accurate decision-making, thereby, enabling trustworthiness, enhancement of transparency and interpretability of the behaviour of the model.

Hydro-climatic application

XAI framework can be used to solve Hydro-Climatic problems 50 with diverse spatio-temporal scales. XAI is utilized to unveil the nonlinear correlative causes, in which the performance of the model is enhanced. It enables the users to discover new knowledge and further easily understand the rationale behind the decision outcomes.

Groundwater potential predictions

The XAI approach can explain the decisions made by ML models for groundwater potential prediction. The user can easily interpret the outcomes and further comprehend the underlying rationale for an outcome in the realm of water quality evaluation for the conservation and sustainability of water management.

Water quality predictions

The XAI framework can forecast water quality using metrics and factors with interpretable results. Water quality assessment managers can comprehend the variables and parameters behind the outcomes. This helps quality managers mitigate water quality issues.

Flood hazard risk predictions

Floods can trigger landslides from excessive rainfall. Flooding causes countless casualties and property damage. Disaster warning systems need a flood risk assessment. XAI can forecast rapid water depths and provide timely, interpretable alerts to protect public health and safety.

Environmental impact assessment

The XAI approach can be used for assessing the environmental impact of water pollution incidents and for providing insight for mitigation and management. It enhances transparency and accountability by providing insights into the factors and parameters influencing environmental conditions. The analysis provided by the XAI model helps the stakeholders to identify the most significant factors contributing towards the environmental outcome.

System model and architecture

System model.

Worldwide, numerous water bodies are contaminated by a variety of anthropogenic and natural processes, resulting in a variety of health problems for humans. Water quality therefore requires rigorous monitoring and management to prevent pollution. In accordance with WHO guidelines, polluted water must be treated using proper water treatment techniques before consumption. Water is contaminated by the incessant addition of toxic chemicals and microbes, and by the relentless addition of local and industrial sewage sludge, trash, and other hazardous waste that is toxic to humans and society. Many uncertainties need to be quantified for all machine learning models. These include uncertainties in selecting and gathering the training data, in the completeness and accuracy of the training data, in understanding the machine learning models with their performance bounds and drawbacks, and finally in the operational data. To minimize these challenges, ad hoc steps such as studying model variability and sensitivity analysis are applied. In recent years, the validation of water quality has gained momentum because of ever-increasing water pollutants, which spoil water intended for domestic use and irrigation. Water quality indices (WQIs) are used worldwide very efficiently for assessing the quality of both groundwater and other relevant water sources. Machine learning techniques, together with explainable AI, play a substantial role in identifying the quality of water. Figure 7 depicts the overall architecture of the proposed framework of our study. The dataset used in the study is split in the ratio 70:30, wherein 70% is used for training and 30% is used for testing. The model is trained using decision tree, random forest, SVM, logistic regression, and Naive Bayes algorithms.
XAI model is implemented in the framework wherein LIME and Shapely are used to provide explainability and interpretability to the results generated by the machine learning model .
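As a minimal sketch of the workflow described above, the following Python code performs a 70:30 split and fits the five classifiers with scikit-learn. The synthetic data from `make_classification`, the default hyperparameters, and the names `X`, `y`, `models` and `scores` are assumptions made for illustration; the dataset and settings of the study are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a table with nine features and a binary target.
X, y = make_classification(n_samples=500, n_features=9, n_informative=5,
                           random_state=0)

# 70% of the rows for training, 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
}

# Fit each model on the training part and score it on the held-out part.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The accuracies printed by this sketch describe the synthetic data only and say nothing about the results reported in the study.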

Interfacing ML algorithms with XAI.

Decision tree

The decision tree is defined as a recursive partition of the set of all possible instances 27, 51. The goal of a decision tree is to split the data in the way that results in maximum information gain 52. Let L be a learning sample, \(L = \{(v_{1}, c_{1}), (v_{2}, c_{2}), \ldots , (v_{i}, c_{j})\}\). Here, \(v_{1}, v_{2}, v_{3}, \ldots , v_{i}\) are measurement vectors and \(c_{1}, c_{2}, c_{3}, \ldots , c_{j}\) are class labels. The branch conditions depend on one of the vector variables, denoted as \(s_{i}\) 53. If \(e_{i}\) elements fit class label \(c_{i}\), then \(p_{i}\) is defined as per Eq. ( 1 ).

Entropy measures the randomness of the given samples, and hence the homogeneity of a group of data 54. A lower entropy signifies better homogeneity, so the split that yields the lowest entropy divides the data best.

L represents the data set evaluated by the entropy, ‘i’ denotes the classes in the set L, and \(e_{i}\) indicates the number of data labels that fit class ‘i’ 55. The feature with the least entropy is chosen as the best feature. Information gain quantifies the amount of information that a particular feature provides about the target variable, and hence how much it reduces the uncertainty present in the data set. It is calculated by comparing the entropy of the original data set with the weighted average entropy of the subsets obtained after splitting. Let R be a value of the feature ‘f’, and let \(L^{R}\) denote the subset of L for which f = R 56. After splitting L on the feature, the information gain is given as follows.
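The two quantities can be illustrated with a short Python sketch. It assumes the standard definitions, namely entropy \(-\sum _i p_i \log _2 p_i\) with \(p_i = e_i/|L|\), and information gain as the entropy of L minus the size-weighted entropy of the subsets \(L^{R}\). The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    # p_i = e_i / |L|, where e_i is the number of labels in class i.
    return -sum((e / total) * math.log2(e / total)
                for e in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy of L minus the weighted entropy of the subsets L^R."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Toy example: a made-up binary feature against potability labels.
potability = [1, 1, 0, 0, 1, 0]
feature = ["low", "low", "high", "high", "low", "high"]
print(entropy(potability))                    # 1.0
print(information_gain(potability, feature))  # 1.0
```

In this toy case the feature separates the two classes perfectly, so the gain equals the full entropy of the labels.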

The Gini index evaluates the heterogeneity of a selected node in the decision tree. It measures the probability of wrongly classifying a data point drawn from the node. A value of 0 indicates a pure node, and the value grows as the classes become more evenly distributed, up to a maximum of \(1 - 1/k\) for k classes (0.5 for a binary node). The Gini index is represented as

Here, \(e_{i}\) represents the quantity of data labels. When the data is divided on class d as L1 and L2 with sizes \(s_{1}\) and \(s_{2}\) , Gini is evaluated as
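A minimal Python sketch of both computations follows, assuming the usual definitions: the Gini index of a node is one minus the sum of squared class proportions, and the Gini of a split is the average of the two child values weighted by \(s_{1}\) and \(s_{2}\). The function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index of a node: 1 - sum_i (e_i / |L|)^2."""
    total = len(labels)
    return 1.0 - sum((e / total) ** 2 for e in Counter(labels).values())

def gini_split(left, right):
    """Size-weighted Gini index of a split into L1 and L2."""
    s1, s2 = len(left), len(right)
    return (s1 * gini(left) + s2 * gini(right)) / (s1 + s2)

print(gini([1, 1, 1, 1]))          # 0.0, a pure node
print(gini([1, 1, 0, 0]))          # 0.5, an evenly mixed binary node
print(gini_split([1, 1], [0, 0]))  # 0.0, a split into two pure children
```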

Decision trees are easy to interpret, can manage both numerical and categorical data, and perform feature selection automatically.

Random forest

Random forest is an ensemble method that combines the results of multiple decision trees to compute predictions with enhanced accuracy. Every decision tree is trained on a random subset of the dataset, to achieve diversity between the trees. When the training set contains t samples, ‘n’ samples are drawn with replacement as bootstrap data 57, and each decision tree is built from this bootstrap training data. When there are ‘m’ features, a value \(a \ll m\) is selected so that ‘a’ features are considered at random from the ‘m’. The value ‘a’ is held constant while the tree is grown to its full depth. A new instance is assigned the class that receives the highest vote. The generalization error for the random forest, (GE*), is denoted as

Here, f(X, Y) is the margin function, which measures the extent to which the average number of votes at (X, Y) for the correct class exceeds the average vote for any other class. X denotes the input vector and Y denotes its class label. The margin function is represented as

where ’F’ is for the indicator function. The value for the margin function is indicated as

The strength of the individual classifiers in the random forest and the mean correlation between them are combined to bound the generalization error. Here, p denotes the mean correlation. The upper bound on the generalization error is
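Under the standard random forest analysis, which is assumed here, the quantities named in this subsection can be written as follows. The base classifiers \(h_k\) and the strength \(s\) are symbols introduced for this sketch; f, F and p follow the text.

```latex
% Generalization error: probability that the margin is negative.
GE^{*} = P_{X,Y}\bigl( f(X,Y) < 0 \bigr)

% Margin function over the classifiers h_k, with F the indicator function.
f(X,Y) = \operatorname{av}_{k} F\bigl(h_k(X) = Y\bigr)
         - \max_{j \neq Y} \operatorname{av}_{k} F\bigl(h_k(X) = j\bigr)

% Upper bound, with p the mean correlation and s the strength.
GE^{*} \le \frac{p \, (1 - s^{2})}{s^{2}}
```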

Random forest reduces the over-fitting problem compared to a single decision tree. It can effectively manage high-dimensional data.
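The sampling and voting steps of a random forest can be sketched in plain Python. This is an illustration under simple assumptions, not the implementation used in the study; the helper names and the toy inputs are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, n, rng):
    """Draw n rows with replacement from the training rows."""
    return [rng.choice(rows) for _ in range(n)]

def random_feature_subset(m, a, rng):
    """Pick a of the m feature indices at random (a << m in practice)."""
    return sorted(rng.sample(range(m), a))

def majority_vote(tree_predictions):
    """Return the class that receives the most votes."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = list(range(10))                   # stand-in for 10 training rows
sample = bootstrap_sample(rows, n=10, rng=rng)
features = random_feature_subset(m=9, a=3, rng=rng)
print(len(sample), len(features))        # 10 3
print(majority_vote([1, 0, 1, 1, 0]))    # 1
```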

Support vector machine (SVM)

Let us consider a binary classification problem in which the labels 1 and −1 represent the two classes of the sample variables 58. When the label of the i-th sample is 1, it belongs to the positive class; when it is −1, it belongs to the negative class. Let \(V_i = (X_i, Y_i)\), i = 1, 2, ..., n, with \(Y_{i}\in \{-1,1\}\), where \(X_i\) is the i-th item from the samples and \(Y_i\) is the i-th item of the tests performed 59. To split the samples into two parts, the function \(f(X) = Z^{T}X + b\) is used, where Z is the coefficient vector normal to the hyperplane. The optimal margin is given as

\(\min _{Z,\, b,\, \varepsilon } \left( \frac{1}{2}Z^{T}Z+C\sum _{i=1}^{n}\varepsilon _i\right)\)

subject to:
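For the standard soft-margin formulation, which is assumed here, the constraints take the following form, using the same symbols as the objective:

```latex
Y_i \left( Z^{T} X_i + b \right) \ge 1 - \varepsilon_i,
\qquad \varepsilon_i \ge 0,
\qquad i = 1, \ldots, n
```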

The Lagrangian equation is given as

\(\max _{\alpha } \left( \sum _{i=1}^{n}\alpha _i-\frac{1}{2}\sum _{i,j=1}^{n}\alpha _i\alpha _jY_iY_jX_i^{T}X_j\right)\)

The Lagrangian is maximized over the positive multipliers \(\alpha _i\), subject to \(\sum _{i=1}^{n}\alpha _iY_i=0\) and \(\alpha _i\ge 0\), to obtain the optimal hyperplane 60. The optimal equation is given as

In the above equation, the samples whose Lagrangian multiplier satisfies \(\alpha _i > 0\) lie nearest to the margin of the optimal hyperplane and are denoted as support vectors. A kernel is used to make the data linearly separable so that the expected result for an instance can be evaluated 61. The kernel function is denoted as

The generalized linear equation is changed to represent the non-linear dual Lagrangian \(Lag(\alpha )\) .

\(Lag\left( \alpha \right) = \sum _{i=1}^{n}\alpha _i-\frac{1}{2}\sum _{i,j=1}^{n}\alpha _i\alpha _jY_iY_jK\left( X_i,X_j\right)\)

Subject to:

The Lagrangian equation can be used for the separable case as

The SVM algorithm is very effective when the quantity of features is higher than the number of samples 62 .
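To make the primal objective concrete, the sketch below evaluates \(f(X) = Z^{T}X + b\) and the soft-margin objective for a fixed hyperplane, taking each slack as the hinge value max(0, 1 − Y f(X)). The hyperplane, the samples and the function names are hypothetical, and no optimization is performed.

```python
def decision_function(Z, b, x):
    """f(X) = Z^T X + b for one sample x."""
    return sum(z_k * x_k for z_k, x_k in zip(Z, x)) + b

def soft_margin_objective(Z, b, samples, labels, C):
    """0.5 * Z^T Z + C * sum of slacks, slack_i = max(0, 1 - Y_i f(X_i))."""
    slacks = [max(0.0, 1.0 - y * decision_function(Z, b, x))
              for x, y in zip(samples, labels)]
    return 0.5 * sum(z * z for z in Z) + C * sum(slacks), slacks

# Made-up hyperplane and three 2-D samples with labels in {-1, 1}.
Z, b = [1.0, 0.0], 0.0
samples = [[2.0, 1.0], [-2.0, 0.5], [0.5, 0.0]]
labels = [1, -1, 1]
objective, slacks = soft_margin_objective(Z, b, samples, labels, C=1.0)
print(slacks)     # [0.0, 0.0, 0.5]
print(objective)  # 1.0
```

Only the third sample falls inside the margin, so it alone contributes a non-zero slack.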

Logistic regression

Logistic regression is used for binary classification problems to forecast the probability that an occurrence belongs to a particular class. It is the regression analysis used when the dependent variable is binary. The central idea in logistic regression (logreg) is the natural logarithm of the odds of an event 63. The odds are the ratio of the probability p that the event happens to the probability that it does not happen.

where p is the probability of a positive output and x is the independent variable. \(\alpha\) and \(\beta\) are the logistic regression parameters 64. The above equation is used for finding the probability of occurrence as

\(p = \mathrm{probability}(Y = \text{positive outcome} \mid X = x)\), where x is a specific value

For multiple predictors, a logistic regression equation can be written as

\(p = \mathrm{probability}(Y = \text{positive outcome} \mid X_1 = x_1, \ldots , X_k = x_k)\)

Here, p refers to the probability of the positive occurrence of the event, the Y-intercept is \(\alpha\) , the regression coefficient is \(\beta\) , and e ≈ 2.71828 is the base of the natural logarithm. Logistic regression is applied in various domains such as finance, healthcare, and the social sciences, for example to predict diseases and credit default.
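The relationship between the log-odds and the probability can be sketched in Python, assuming the usual logistic form in which ln(p / (1 − p)) equals \(\alpha\) plus the weighted sum of the predictors. The coefficient values and function names below are hypothetical.

```python
import math

def predict_probability(alpha, betas, xs):
    """p = 1 / (1 + e^-(alpha + sum_k beta_k * x_k))."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """ln(p / (1 - p)), the logit of p."""
    return math.log(p / (1.0 - p))

# Made-up coefficients for two predictors.
p = predict_probability(alpha=-1.0, betas=[0.5, 0.25], xs=[2.0, 0.0])
print(p)            # 0.5
print(log_odds(p))  # 0.0
```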

Naive Bayesian classification

Gaussian Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. It assumes that the features follow a normal distribution 65. It assigns each sample to its most likely class as

If the sample \(Y_{j}\) is a vector, \(x_{j}\) is its \(j^{th}\) value, which can take different values \(y_{j}\) . The attributes are assumed to be conditionally independent given the class, which is shown as

Substituting the above equation into Bayes classification, we get

The Gaussian Naive Bayes algorithm is mainly applied for spam filtering, sentiment analysis, and text classification problems where the features must be continuous and follow the Gaussian distribution 66 .
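A compact Python sketch of the classifier follows. It assumes per-class Gaussian likelihoods with conditionally independent features and adds a small constant to each variance to avoid division by zero; the function names and the toy data are hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def fit(rows, labels):
    """Per class: the prior and a (mean, variance) pair for each feature."""
    model = {}
    for c in sorted(set(labels)):
        subset = [r for r, l in zip(rows, labels) if l == c]
        stats = []
        for column in zip(*subset):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[c] = (len(subset) / len(rows), stats)
    return model

def predict(model, row):
    """Class maximizing log prior + sum of per-feature log likelihoods."""
    def log_posterior(c):
        prior, stats = model[c]
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, (m, v) in zip(row, stats))
    return max(model, key=log_posterior)

# Made-up rows of (pH, hardness) with potability labels.
rows = [[6.8, 180.0], [7.2, 200.0], [4.0, 320.0], [4.5, 300.0]]
labels = [1, 1, 0, 0]
model = fit(rows, labels)
print(predict(model, [7.0, 190.0]))  # 1
print(predict(model, [4.2, 310.0]))  # 0
```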

LIME (Local interpretable model-agnostic explanations)

LIME explains the predictions of any kind of classifier by approximating it locally with an interpretable model. It perturbs the data sample by altering the values of features and monitors the impact on the result, so that each individual prediction can be explained 67. To obtain labels for the perturbed data, the perturbed samples x' are mapped back into the original representation in \({\mathbb {R}}^d\) and passed to the classifier f. Since the samples x' are generated randomly, samples x closer to the instance z being explained are given greater weight. The weight \(\Pi _z(x)\) measures the proximity between z and x. An interpretable model \(g \in G\) , where G is a class of interpretable models, is trained on the weighted samples and the labels produced by f ( x ). The interpretable model \(\xi (x)\) that explains f ( x ) is given as
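In the usual LIME formulation, which is assumed here, the explanation is the model in G that minimizes a locality-weighted loss plus a complexity penalty:

```latex
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; L\left(f, g, \Pi_z\right) + \Omega(g)
```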

L is the loss function, which measures how well g follows f in the neighbourhood of z defined by \(\Pi _z\) . The smaller the loss, the more closely the behaviour of g matches the behaviour of f in that neighbourhood. The complexity of the model, \(\Omega (g)\) , should be low. When g is taken to be a linear function, \(g(x') = \varphi ^T x' + \varphi _0\) , the problem becomes a linear regression task to estimate \(\varphi\) and \(\varphi _0\) .
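The local linear fit can be illustrated in one dimension with plain Python. This sketch assumes Gaussian perturbations around the instance, an exponential proximity kernel, and a weighted least squares fit; the black-box function, the kernel width and all names are hypothetical, and the `lime` library itself is not used.

```python
import math
import random

def black_box(x):
    """Stand-in for the model f being explained."""
    return x ** 2

def proximity(z, x, width=0.5):
    """Exponential kernel: larger weight for samples x near z."""
    return math.exp(-((x - z) ** 2) / (width ** 2))

def local_linear_surrogate(f, z, n_samples=200, spread=1.0, seed=0):
    """Weighted least squares fit of g(x) = phi * x + phi_0 around z."""
    rng = random.Random(seed)
    xs = [z + rng.gauss(0.0, spread) for _ in range(n_samples)]
    ys = [f(x) for x in xs]
    ws = [proximity(z, x) for x in xs]
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    phi = cov / var
    return phi, y_bar - phi * x_bar

phi, phi_0 = local_linear_surrogate(black_box, z=3.0)
print(phi, phi_0)  # the slope should be near the local gradient 2 * z = 6
```

The fitted slope plays the role of \(\varphi\): it describes how the black-box output changes with the feature near the explained instance, not globally.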

SHAP (SHapley Additive exPlanations)

SHAP values quantify the contribution of each feature to the prediction of a specific class 68 . The prediction f ( y ) is approximated by \(s(y')\) , an explanation model over the binary elements \(x' \in \{0,1\}^M\) with attributions \(\phi _i \in {\mathbb {R}}\) , given as

M refers to the number of explanation variables.

where f is the model explained by SHAP, z refers to the variables, and \(z'\) are the variables chosen. The value \(f_y(x') - f_y(x'\setminus i)\) indicates the change in the prediction when feature i is removed from \(x'\) .
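The attribution of each feature can be illustrated by computing exact Shapley values for a small toy model in Python. This is a sketch of the underlying game-theoretic definition, which averages the marginal contribution of a feature over all subsets of the other features; it does not use the `shap` library, and the value function and names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values for a set function value(set) -> float."""
    players = range(n_features)
    phis = []
    for i in players:
        others = [j for j in players if j != i]
        phi = 0.0
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to the subset s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

def toy_value(present):
    """Made-up model output when only the features in `present` are known."""
    out = 0.0
    if 0 in present:
        out += 2.0
    if 1 in present:
        out += 1.0
    if 0 in present and 2 in present:
        out += 1.0  # interaction shared by features 0 and 2
    return out

phis = shapley_values(toy_value, n_features=3)
print(phis)  # approximately [2.5, 1.0, 0.5]
print(sum(phis), toy_value({0, 1, 2}) - toy_value(set()))
```

The attributions sum to the difference between the full prediction and the empty-set baseline, which is the additivity property that the explanation model relies on.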

In this section two algorithms are discussed: Algorithm 1 for the algorithm-based evaluation of water quality and Algorithm 2 for the algorithm-based explanation of water quality. These two algorithms provide a holistic analysis and explanation of water quality management.

Algorithm for water quality classification

Algorithm for water quality Explanation

The water quality is assessed in the proposed work based on nine parameters: pH value, Hardness, Solids (Total Dissolved Solids), Sulphate, Chloramines, Trihalomethanes, Conductivity, Organic carbon, and Turbidity. The target class for this dataset is Potability, which is binary: 0 indicates that the water is not potable and 1 indicates that it is potable.

The dataset had a high number of missing values for Sulphate and fewer missing values for Chloramines and Trihalomethanes. Missing value imputation is therefore performed on all the affected attributes. The target class is converted into a numeric array for processing by the XAI models, using a label encoder in Python. The dataset is split with a ratio of 80:20 for training and testing.
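The two preprocessing steps can be sketched in plain Python. The imputation strategy is not stated in the text, so mean imputation is an assumption here, and the label encoding simply assigns integer codes in sorted order; the function names and values are made up.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def encode_labels(labels):
    """Map each distinct label to an integer code, in sorted order."""
    codes = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels], codes

sulphate = [330.0, None, 350.0, None, 340.0]  # made-up values
print(impute_mean(sulphate))  # [330.0, 340.0, 350.0, 340.0, 340.0]

encoded, codes = encode_labels(["potable", "not potable", "potable"])
print(encoded, codes)  # [1, 0, 1] {'not potable': 0, 'potable': 1}
```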

Correlation analysis is performed on the dataset. The attribute Hardness has the highest correlation with the target attribute Potability, 0.34. The next best correlation value is 0.24, for the attribute Chloramines, followed by 0.21 for Trihalomethanes. Turbidity is the next best parameter, with a correlation value of 0.16. The correlation heat map between the attributes of interest and the target attribute is presented in Fig. 8 .

Correlation analysis for water quality attributes.
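Each cell of such a heat map is a pairwise correlation coefficient. The sketch below assumes the Pearson coefficient, which the text does not name explicitly, and uses made-up values rather than the data of the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hardness = [150.0, 180.0, 210.0, 240.0]  # made-up values
potability = [0, 0, 1, 1]
print(round(pearson(hardness, potability), 3))  # 0.894
```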

The SVM, LR, DT, RF and Gaussian Naive Bayes machine learning models are applied to the training dataset. The SVM did not provide the desired classification and failed to converge for the potable data. The other models generated results within the desired range, which are presented in Table 6 .

The sensitivity and specificity measurements for the Machine learning models are presented in Table 7 . Considering the performance metrics, the results reveal the superiority of the RF model which generates a better outcome in comparison to the other models and thus it has been selected to be fed into the XAI model to provide enhanced interpretability, justifiability and transparency.
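For reference, the metrics compared across the models can be computed from the confusion counts as in the following Python sketch. The labels are made up and the function names are hypothetical; class 1 is treated as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall), specificity and F1."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # made-up labels
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
print(metrics(y_true, y_pred))
```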

The XAI model implementation is performed using SHAP values, with the data handled in pandas. This implementation focuses on the contribution of each feature in determining the target attribute, which is potability. The significance of every feature is assessed through the various plots of SHAP. The first XAI plot generated is the force plot, which provides the minimum and maximum prediction score of the target attribute in a dataset. The blue contour shows a low score and the red contour shows a high score. The attributes at the separation boundary have the highest priority. The force plot is presented in Figs. 9 and 10 .

Force plot for water quality.

Force plot for potability.

The Global surrogate version of the force plot is presented in Fig. 11 . The blue regions indicate no potability and the red-coloured regions indicate potability. The border areas of the intersection show the attributes which have higher significance for the feature selection. The Sulphate value of 444 at the point of intersection indicates its significance in explaining this test patch for the entire dataset.

Test patch for potability.

The next XAI application of SHAP is the summary plot. This plot describes the influence of the features in a binary classification problem, ranking each on a scale from low to high for the two outcomes. The blue contour indicates lower significance towards the prediction and red indicates higher significance. The summary plot is shown in Fig. 12 . Solids, pH, Sulphate, and Hardness show higher significance in determining the output.

Summary plot for potability.

The dependency plot shows the relationship between two features in the dataset. It provides the output in granular form with a variable-like result rather than simply a graph-like result of a Partial Dependency Plot(PDP). The relationship between the Sulphate and Potability is depicted in Fig. 13 . The mid-range of the dataset provides more granular output, which shows that the Sulphate parameter values are more significant in determining the values of potability in the mid-range of the dataset.

Dependency plot for potability.

The decision plot, which displays how the values of the features affect the target, is the final XAI plot. It is a local surrogate plot that explains a single data instance, showing which attribute values drive the model's decision towards 1 or 0. The decision plot for potability 1 is illustrated in Fig. 14 , and that for potability 0 in Fig. 15 .

Decision plot for potability.

The results of the experiment reveal the superiority of the RF model, which generates an accuracy of 0.999, followed by DT with an accuracy of 0.998. The SVM model generates the lowest accuracy, 0.63. The RF is thus chosen for the implementation of the XAI model using SHAP. The comparative analysis of the various models is depicted in Fig. 16 , considering the evaluation metrics accuracy, precision, recall, and F1-score. On all of these performance metrics, the RF model outperforms the other models. Figure 17 shows the comparison of the sensitivity and specificity measures. The RF model is superior in these respects as well. Thus, the discussion offers a visual representation and justification of the reasoning behind the choice of RF for inclusion in the XAI framework to offer explainability.

Comparative analysis of machine learning models used.

Comparative analysis of sensitivity and specificity.

Apart from the selection of the RF model, SHAP provided five different plots to explain the feature importance and relationships. The proposed work presented the force plot, summary plot, test patch, dependency plot, and decision plot. The final decision plot explained how the classification is carried out using the corresponding values of the independent variables. Thus the black-box classification is explained in the white-box context of XAI. The following section describes the challenges and opportunities of the proposed work with an emphasis on future directions.

The proposed work may be influenced by the following challenges, which are described in detail below.

Global unity

For the successful implementation of the system, a unanimously accepted implementation is essential. Unfortunately, water quality estimation and related research are limited to consideration of specific datasets acquired for a particular region, wherein the generated results may differ with the changes in geographic location. Thus the generated results can never be considered suitable on a global scale. The parameters that influence the water quality may also vary across the world, and hence the proposed work can never be considered as a universal solution.

Training and re-training

The qualifying attributes that determine the quality of water vary across the globe and hence the proposed model needs to be re-trained 69 when applied to a new environment of study. This would allow the model to unlearn and re-learn new environments. On the contrary, the complexity of the model would also increase. The accuracy and other performance metrics which are measured in the proposed work may drastically decrease as well in a different environment of study. Thus applying this model to versatile environments is complex and would be a challenging task.

Subjective or quantitative

The trade-off from subjective analysis (which was done through fuzzy-based methods in the form of the Analytical Hierarchy Process (AHP) and The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)) has improved the performance and ability to classify the models with better accuracy. However, the involvement of a subject matter expert is a missing point in the current research. Despite all the implementation and analysis from an engineering perspective, the involvement of an environmental scientist in any aspect of water research would contribute towards the enhancement of research quality.

Confusing solids

The proposed work identifies Solids as the primary influencing factor that affects potability. In real-world applications, solids can take any form. For example, in sewage water treatment plants they can be mud, Fat-Oil-Grease (FOG), or other substances. Every type of solid waste has its own method of filtration and its own impact on water quality, which makes the recordings unstable over time. The attributes studied are thus complex to handle in real-life scenarios, which is an unavoidable limitation.

Environmental challenges

Water resources are under serious threat due to water scarcity, water contamination, water conflicts and climate change. Chemical and municipal wastewater contaminates the water, endangers the life of aquatic organisms and affects their ability to reproduce. This also makes them easier prey for their predators. The food cycle and livelihood of humans are also greatly affected by water contamination. Chemical substances make the water hard to recycle and consume by reducing the regeneration ratios.

Water quality and industrial sustainability

The era of Industry 5.0 focuses on consumer-centric industrial evolution with the idea of environmental sustainability. Futuristic technologies evolve with improvements in technical viability, with the mission of sustainable development in environmental aspects. Water is an irreplaceable and finite resource, yet the demand for it is increasing with industrial evolution, and the water requirements of the manufacturing and production industries will be as essential as ever. The challenge is the enhancement of water harvesting, recycling and conservation. For all of these processes, water quality is the common essential requirement. Thus the quality of water is critical in all futuristic technological developments.

Research finding of the proposed work

The following items are presented as the findings and outcomes of the proposed work:

The proposed work performs an exploratory analysis with XAI implementation providing an ability to improve the reliability of machine learning models providing explanation and transparency to the classification process.

The proposed work acquires data from a single dataset, where the performance of classification yields optimized results. This result may vary if the model is subjected to a different dataset constituting different features and instances.

The XAI reveals the most significant features contributing towards classification results and also explains the same.

The best fitting machine learning model is chosen for the explanation through an exhaustive analysis and evaluation of all the models considering the essential performance metrics. Thus the results produced by SHAPELY can be considered as the most reliable and acceptable.

The proposed work also suggests the importance of the subject matter expert, which can extend the usability of the proposed model at the universal level.

The predictions of the proposed work, with the support of an explainer, help end users and consumers to understand the quality of the water they use.

The features related to the classification and explanation can be further controlled to diminish the levels of chemicals and pollutants in water recycling.

Total dissolved solids quantification and the feature weights for the same determine the levels of filtration and carbon purification required in the recycling plants.

The proposed work brings insights into pollutants on the seashore and into how explainability can support impurity estimation for such conditions as well.

Water quality management impacts almost all aspects of life on earth, and clean water is a basic necessity. The proposed work is extremely relevant in this regard: an exploratory analysis is conducted to analyze and control the factors that deteriorate the quality of water. The impact of these factors is explained using XAI models. The contribution of the XAI model lies in its ability to explain the role of the underlying parameters in the classification of water as potable or not, based on their relative importance and unique properties. The XAI model uses SHAP with the probabilistic prediction generated by the Random Forest classifier. The RF model is chosen because it yields the highest accuracy of 0.999, with sensitivity and specificity of 0.999 and 0.998, which is superior to the other state-of-the-art models considered in the study. This justifies the selection of the RF for XAI implementation. The proposed model identifies the parameter “solid” as the most significant in terms of its impact on the potability of water. The proposed model yields optimized and explainable results for the dataset used in the study. Future work may involve more complex and heterogeneous datasets to generate predictions. In such scenarios, the metric evaluations may differ. The usage of deep learning algorithms could further enhance the examination of the solid sediments and generate classification results based on their mass, dimensions, and shape. The use of XAI in such a model would ensure a better explanation of factors relevant to solid sedimentation in water.

Data availability

The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Zhu, M. et al. A review of the application of machine learning in water quality evaluation. Eco-Environ. Health 1 , 107–116. https://doi.org/10.1016/j.eehl.2022.06.001 (2022).

Miller, M., Kisiel, A., Cembrowska-Lech, D., Durlik, I. & Miller, T. Iot in water quality monitoring are we really here?. Sensors 23 , 960. https://doi.org/10.3390/s23020960 (2023).

Akhtar, N. et al. Modification of the water quality index (wqi) process for simple calculation using the multi-criteria decision-making (mcdm) method: A review. Water 13 , 905. https://doi.org/10.3390/w13070905 (2021).

Abolfathi, S. & Pearson, J. Application of smoothed particle hydrodynamics (sph) in nearshore mixing: A comparison to laboratory data. Coastal Eng. Proc. 35 , 1–13 (2017).

Hájek, M. et al. A European map of groundwater ph and calcium. Earth Syst. Sci. Data 13 , 1089–1105. https://doi.org/10.5194/essd-13-1089-2021 (2021).

Li, L. et al. Interpretable tree-based ensemble model for predicting beach water quality. Water Res. 211 , 118078. https://doi.org/10.1016/j.watres.2022.118078 (2022).

Lu, J. Can the central environmental protection inspection reduce transboundary pollution? Evidence from river water quality data in china. J. Clean. Prod. 332 , 130030 (2022).

Halder, J. N. & Islam, M. N. Water pollution and its impact on the human health. J. Environ. Hum. 2 , 36–46 (2015).

Wang, Z. et al. Overview assessment of risk evaluation and treatment technologies for heavy metal pollution of water and soil. J. Clean. Prod. 379 , 134043 (2022).

Elehinafe, F. B., Agboola, O., Vershima, A. D. & Bamigboye, G. O. Insights on the advanced separation processes in water pollution analyses and wastewater treatment: A review. S. Afr. J. Chem. Eng. 48 , 188–200 (2022).

Mu, L., Mou, M., Tang, H. & Gao, S. Exploring preference and willingness for rural water pollution control: A choice experiment approach incorporating extended theory of planned behaviour. J. Environ. Manag. 332 , 117408 (2023).

Wang, Y., Ding, X., Chen, Y., Zeng, W. & Zhao, Y. Pollution source identification and abatement for water quality sections in Huangshui River Basin, China. J. Environ. Manag. 344 , 118326 (2023).

Najafzadeh, M. & Niazmardi, S. A novel multiple-kernel support vector regression algorithm for estimation of water quality parameters. Nat. Resour. Res. 30 , 3761–3775 (2021).

Najafzadeh, M., Homaei, F. & Farhadi, H. Reliability assessment of water quality index based on guidelines of national sanitation foundation in natural streams: Integration of remote sensing and data-driven models. Artif. Intell. Rev. 54 , 4619–4651 (2021).

Najafzadeh, M., Ghaemi, A. & Emamgholizadeh, S. Prediction of water quality parameters using evolutionary computing-based formulations. Int. J. Environ. Sci. Technol. 16 , 6377–6396 (2019).

Najafzadeh, M. & Basirian, S. Evaluation of river water quality index using remote sensing and artificial intelligence models. Remote Sens. 15 , 2359 (2023).

Chowdhury, M. A. Z. et al. Organophosphorus and carbamate pesticide residues detected in water samples collected from paddy and vegetable fields of the Savar and Dhamrai Upazilas in Bangladesh. Int. J. Environ. Res. Public Health 9 , 3318–3329 (2012).

Ahirvar, B. P., Das, P., Srivastava, V. & Kumar, M. Perspectives of heavy metal pollution indices for soil, sediment, and water pollution evaluation: An insight. Total Environ. Res. Themes 6 , 100039 (2023).

Chen, K., Liu, Q.-M., Peng, W.-H., Liu, Y. & Wang, Z.-T. Source apportionment of river water pollution in a typical agricultural city of Anhui province, Eastern China using multivariate statistical techniques with apcs-mlr. Water Sci. Eng. 16 , 165–174 (2023).

Fan, S. et al. Improved multi-criteria decision making method integrating machine learning for patent competitive potential evaluation: A case study in water pollution abatement technology. J. Clean. Prod. 403 , 136896 (2023).

Wang, Z., Wang, C. & Liu, Y. Evaluation for the nexus of industrial water-energy-pollution: Performance indexes, scale effect, and policy implications. Environ. Sci. Policy 144 , 88–98 (2023).

Asomaku, S. O. Quality assessment of groundwater sourced from nearby abandoned landfills from industrial city in Nigeria: Water pollution indices approach. HydroResearch 6 , 130–137 (2023).

Balaram, V., Copia, L., Kumar, U. S., Miller, J. & Chidambaram, S. Pollution of water resources and application of icp-ms techniques for monitoring and management: A comprehensive review. Geosyst. Geoenviron. 2 , 100210 (2023).

Yuan, F., Huang, Y., Chen, X. & Cheng, E. A biological sensor system using computer vision for water quality monitoring. Ieee Access 6 , 61535–61546 (2018).

Borzooei, S. et al. Impact evaluation of wet-weather events on influent flow and loadings of a water resource recovery facility. In New Trends in Urban Drainage Modelling: UDM 2018 11 706–711 (Springer, 2019).

Noori, R. et al. Decline in Iran’s groundwater recharge. Nat. Commun. 14 , 6674 (2023).

Yeganeh-Bakhtiary, A., EyvazOghli, H., Shabakhty, N., Kamranzad, B. & Abolfathi, S. Machine learning as a downscaling approach for prediction of wind characteristics under future climate change scenarios. Complexity 2022 , 8451812 (2022).

Jeihouni, M., Toomanian, A. & Mansourian, A. Decision tree-based data mining and rule induction for identifying high quality groundwater zones to water supply management: a novel hybrid use of data mining and gis. Water Resour. Manag. 34 , 139–154 (2020).

Lee, K.-J. et al. The combined use of self-organizing map technique and fuzzy c-means clustering to evaluate urban groundwater quality in Seoul Metropolitan City, South Korea. J. Hydrol. 569 , 685–697 (2019).

Agrawal, P. et al. Exploring artificial intelligence techniques for groundwater quality assessment. Water 13 , 1172 (2021).

Wang, Y. et al. Monthly water quality forecasting and uncertainty assessment via bootstrapped wavelet neural networks under missing data for Harbin, China. Environ. Sci. Pollut. Res. 20 , 8909–8923 (2013).

El Bilali, A., Taleb, A. & Brouziyne, Y. Groundwater quality forecasting using machine learning algorithms for irrigation purposes. Agric. Water Manag. 245 , 106625 (2021).

Arabgol, R., Sartaj, M. & Asghari, K. Predicting nitrate concentration and its spatial distribution in groundwater resources using support vector machines (svms) model. Environ. Model. Assess. 21 , 71–82 (2016).

Sajedi-Hosseini, F. et al. A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Sci. Total Environ. 644 , 954–962 (2018).

Ransom, K. M., Nolan, B. T., Stackelberg, P., Belitz, K. & Fram, M. S. Machine learning predictions of nitrate in groundwater used for drinking supply in the conterminous united states. Sci. Total Environ. 807 , 151065 (2022).

Yadav, B., Gupta, P. K., Patidar, N. & Himanshu, S. K. Ensemble modelling framework for groundwater level prediction in urban areas of India. Sci. Total Environ. 712 , 135539 (2020).

Tomić, A. Š, Antanasijević, D., Ristić, M., Perić-Grujić, A. & Pocajt, V. A linear and non-linear polynomial neural network modeling of dissolved oxygen content in surface water: Inter-and extrapolation performance with inputs’ significance analysis. Sci. Total Environ. 610 , 1038–1046 (2018).

Zhi, W. et al. From hydrometeorology to river water quality: Can a deep learning model predict dissolved oxygen at the continental scale?. Environ. Sci. Technol. 55 , 2357–2368 (2021).

Srinivas, R., Bhakar, P. & Singh, A. P. Groundwater quality assessment in some selected area of Rajasthan, India using fuzzy multi-criteria decision making tool. Aquat. Procedia 4 , 1023–1030 (2015).

Haghibi, A. H., Nasrolahi, A. H. & Parsaie, A. Water quality prediction using machine learning. J. Water Qual. Res. 53 , 3–13 (2018).

Liu, M. & Lu, J. Support vector machine-an alternative to artificial neuron network for water quality forecasting in an agricultural nonpoint source polluted river?. Environ. Sci. Pollut. Res. 21 , 11036–11053 (2014).

Chen, K. et al. Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data. Water Res. 171 , 115454 (2020).

Sagan, V. et al. Monitoring inland water quality using remote sensing: Potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing. Earth-Sci. Rev. 205 , 103187 (2020).

Wu, Y., Zhang, X., Xiao, Y. & Feng, J. Attention neural network for water image classification under iot environment. Appl. Sci. 10 , 909 (2020).

Pu, F., Ding, C., Chao, Z., Yu, Y. & Xu, X. Water-quality classification of inland lakes using landsat8 images by convolutional neural networks. Remote Sens. 11 , 1674 (2019).

Donnelly, J., Daneshkhah, A. & Abolfathi, S. Forecasting global climate drivers using gaussian processes and convolutional autoencoders. Eng. Appl. Artif. Intell. 128 , 107536 (2024).

Abolfathi, S., Cook, S., Yeganeh-Bakhtiary, A., Borzooei, S. & Pearson, J. Microplastics transport and mixing mechanisms in the nearshore region. Coast. Eng. Proc. https://doi.org/10.9753/icce.v36v.papers.63 (2021).

Stride, B., Abolfathi, S., Odara, M. G. N., Bending, G. D. & Pearson, J. Modeling microplastic and solute transport in vegetated flows. Water Resour. Res. 59 , e2023WR034653. https://doi.org/10.1029/2023WR034653 (2023).

Unacademy (2022).

Başağaoğlu, H. et al. A review on interpretable and explainable artificial intelligence in hydroclimatic applications. Water 14 , 1230 (2022).

Habib, M., O’Sullivan, J., Abolfathi, S. & Salauddin, M. Enhanced wave overtopping simulation at vertical breakwaters using machine learning algorithms. PLoS ONE 18 , e0289318 (2023).

Mpia, H., Mburu, L. & Mwendia, S. Applying data mining in graduates’ employability: A systematic literature review. Int. J. Eng. Pedag. 13 , 86–108. https://doi.org/10.3991/ijep.v13i2.33643 (2023).

Raileanu, L. E. & Stoffel, K. Theoretical comparison between the gini index and information gain criteria. Ann. Math. Artif. Intell. 41 , 77–93. https://doi.org/10.1023/b:amai.0000018580.96245.c6 (2004).

Article MathSciNet Google Scholar

Gulati, P., Sharma, A. & Gupta, M. Theoretical study of decision tree algorithms to identify pivotal factors for performance improvement: A review. Int. J. Comput. Appl. 141 , 19–25. https://doi.org/10.5120/ijca2016909926 (2016).

Tangirala, S. Evaluating the impact of GINI index and information gain on classification using decision tree classifier algorithm. Int. J. Adv. Comput. Sci. Appl. 11 , 110277. https://doi.org/10.14569/ijacsa.2020.0110277 (2020).

Xu, P. Review on studies of machine learning algorithms. J. Phys. 1187 , 052103. https://doi.org/10.1088/1742-6596/1187/5/052103 (2019).

Purwanto, A. D., Wikantika, K., Deliar, A. & Darmawan, S. Decision tree and random forest classification algorithms for mangrove forest mapping in Sembilang National Park, Indonesia. Remote Sens. 15 , 16. https://doi.org/10.3390/rs15010016 (2022).

Huang, H. et al. A new fruit fly optimization algorithm enhanced support vector machine for diagnosis of breast cancer based on high-level features. BMC Bioinform. https://doi.org/10.1186/s12859-019-2771-z (2019).

Ji, Y. & Sun, S. Multitask multiclass support vector machines: Model and experiments. Pattern Recogn. 46 , 914–924. https://doi.org/10.1016/j.patcog.2012.08.010 (2013).

Übeyli, E. D. ECG beats classification using multiclass support vector machines with error correcting output codes. Dig. Signal Process. 17 , 675–684. https://doi.org/10.1016/j.dsp.2006.11.009 (2007).

Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20 , 273–297. https://doi.org/10.1007/bf00994018 (1995).

Ye, F., Lou, X. Y. & Sun, L. F. An improved chaotic fruit fly optimization based on a mutation strategy for simultaneous feature selection and parameter optimization for SVM and its applications. PLoS ONE 12 , e0173516. https://doi.org/10.1371/journal.pone.0173516 (2017).

Peng, C.-Y.J., Lee, K. L. & Ingersoll, G. M. An introduction to logistic regression analysis and reporting. J. Educ. Res. 96 , 3–14. https://doi.org/10.1080/00220670209598786 (2002).

Park, H.-A. An introduction to logistic regression: From basic concepts to interpretation with particular attention to nursing domain. J. Korean Acad. Nurs. 43 , 154. https://doi.org/10.4040/jkan.2013.43.2.154 (2013).

Article PubMed Google Scholar

Chen, H., Hu, S., Hua, R. & Zhao, X. Improved Naive Bayes classification algorithm for traffic risk management. EURASIP J. Adv. Signal Process. https://doi.org/10.1186/s13634-021-00742-6 (2021).

Shen, J. & Fang, H. Human activity recognition using gaussian Naïve Bayes algorithm in smart home. J. Phys. 1631 , 012059. https://doi.org/10.1088/1742-6596/1631/1/012059 (2020).

Gramegna, A. & Giudici, P. SHAP and LIME: An evaluation of discriminative power in credit risk. Front. Artif. Intell. https://doi.org/10.3389/frai.2021.752558 (2021).

Zaremba, L., Zaremba, C. S. & Suchenek, M. Modification of shapley value and its implementation in decision making. Found. Manag. 9 , 257–272. https://doi.org/10.1515/fman-2017-0020 (2017).

Krishnan, S. R. et al. Smart water resource management using artificial intelligence;a review. Sustainability https://doi.org/10.3390/su142013384 (2022).

Download references


Author information

Authors and affiliations.

School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, 632014, India

M. K. Nallakaruppan, M. Lawanya Shri & Sweta Bhattacharya

Department of Computer Science, Loyola College, Chennai, Tamil Nadu, 600034, India

E. Gangadevi

Shiv Nadar University, Delhi-NCR, 201314, India

Balamurugan Balusamy

School of Built Environment, Engineering and Computing, Leeds Beckett University, Leeds, LS13HE, UK

Shitharth Selvarajan

Department of Computer Science, Kebri Dehar University, Kebri Dehar, Ethiopia

Shitharth Selvarajan

Contributions

All authors contributed equally in this research work.

Corresponding author

Correspondence to Shitharth Selvarajan.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article.

Nallakaruppan, M.K., Gangadevi, E., Shri, M.L. et al. Reliable water quality prediction and parametric analysis using explainable AI models. Sci Rep 14, 7520 (2024). https://doi.org/10.1038/s41598-024-56775-y

Received: 21 October 2023

Accepted: 11 March 2024

Published: 29 March 2024

DOI: https://doi.org/10.1038/s41598-024-56775-y


Water Quality
Explainable AI
Water Quality Prediction


Evaluation of sampling and monitoring designs for water quality

Haggarty, Ruth Alison (2012) Evaluation of sampling and monitoring designs for water quality. PhD thesis, University of Glasgow.

Assessing water quality is of crucial importance to both society and the environment. Deterioration in water quality through issues such as eutrophication presents substantial risk to human health and to plant and animal life, and can have detrimental effects on the local economy. Long-term data records across multiple sites can be used to investigate water quality and risk factors statistically; however, identification of underlying changes can only be successful if a sufficient quantity of data is available. As vast amounts of resources are required for the implementation and maintenance of a monitoring network, it is not logistically or financially possible to monitor all water environments continuously. This raises the question of the optimal design for long-term monitoring networks that are capable of capturing underlying changes. Two of the main design considerations are where to sample and how frequently to sample. The principal aim of this thesis is to use statistical analysis to investigate frequently used environmental monitoring networks, developing new methodology where appropriate, so that the design and implementation of future networks can be made as effective and cost efficient as possible. Using data provided by the Scottish Environment Protection Agency (SEPA), several data sets from Scottish lakes and rivers and a range of determinands are considered in order to explore water quality monitoring in Scotland.

Chapter 1 provides an introduction to environmental monitoring and discusses both existing statistical techniques and the challenges commonly encountered in the analysis of environmental data. Chapter 2 then presents a simulation study designed and implemented to evaluate the nature and statistical power of commonly used environmental sampling and monitoring designs for surface waters. The aim is to answer questions about how many samples the chemical classification of standing waters should be based on, and how appropriate the currently available data in Scotland are for detecting trends and seasonality. The simulation study was constructed to investigate the ability to detect the different underlying features of the data under several different sampling conditions.

After the assessment of how often sampling is required to detect change, the remainder of the thesis addresses some of the questions about where the optimal sampling locations are. The European Union Water Framework Directive (WFD) was introduced in 2003 to set compliance standards for all water bodies across Europe, with an aim to prevent deterioration and ensure all sites reach 'good' status by 2015. One of the features of the WFD is that water bodies can be grouped together, and the classification of all members of the group is then based on the classification of a single representative site. Because sites may be misclassified, one of the key areas of interest is how well the existing groups used by SEPA for classification capture differences between the sites in terms of several chemical determinands. This is explored in Chapter 3, where a functional data analysis approach is taken to investigate some of the features of the existing groupings. An investigation of the effect of temporal autocorrelation on the ability to distinguish groups of sites from one another is also presented there.

It is also of interest to explore whether fewer, or indeed more, groups would be optimal in order to accurately represent the trends and variability in the water quality parameters. Different statistical approaches for grouping standing waters are presented in Chapter 4, where the question of how many groups is statistically optimal is also addressed. As in Chapter 3, these approaches for grouping sites are based on functional data in order to include the temporal dynamics of the variable of interest within any analysis of the group structure obtained. Both hierarchical and model-based functional clustering are considered. The idea of functional clustering is also extended to the multivariate setting, so that information from several determinands of interest can be used in the formation of groups. This is of particular importance because the WFD classification encompasses a range of different determinands.

In addition to the investigation of standing waters, an entirely different type of water quality monitoring network is considered in Chapter 5. While standing waters are assumed to be spatially independent of one another, there are several situations where this assumption is not appropriate and where spatial correlation between locations needs to be accounted for. Further developments of the functional clustering methods explored in Chapter 4 are presented in order to obtain groups of stations that are not only similar in terms of mean levels and temporal patterns of the determinand of interest, but are also spatially homogeneous. The river network data explored in Chapter 5 introduce a set of new challenges for functional clustering that go beyond the inclusion of spatial correlation based on Euclidean distance. Existing methods for estimating spatial correlation are combined with functional clustering approaches and developed to be suitable for application to sites that lie along a river network. The final chapter of this thesis provides a summary of the work presented and a discussion of limitations and suggestions for future directions.
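The kind of simulation study described for Chapter 2 can be illustrated with a minimal Python sketch that estimates the power to detect a linear trend under different sampling frequencies. This is not the thesis's own procedure: the function name `trend_detection_power`, the regression model with one annual harmonic, the slope, the noise level and the normal-approximation threshold of 1.96 are all assumptions chosen for illustration.

```python
import numpy as np


def trend_detection_power(n_years, samples_per_year, slope_per_year,
                          noise_sd=1.0, seasonal_amp=1.0,
                          n_sims=500, seed=0):
    """Fraction of simulated series in which a linear trend is detected."""
    rng = np.random.default_rng(seed)
    n = n_years * samples_per_year
    t = np.arange(n) / samples_per_year  # sampling times in years

    # Design matrix: intercept, linear trend, one annual harmonic.
    # At least 3 samples per year are needed, otherwise the sine and
    # cosine columns are collinear and the matrix cannot be inverted.
    X = np.column_stack([np.ones(n), t,
                         np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    xtx_inv = np.linalg.inv(X.T @ X)

    detected = 0
    for _ in range(n_sims):
        # Hypothetical data model: trend + seasonal cycle + white noise.
        y = (slope_per_year * t
             + seasonal_amp * np.sin(2 * np.pi * t)
             + rng.normal(0.0, noise_sd, size=n))
        beta = xtx_inv @ X.T @ y                    # least-squares fit
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])   # residual variance
        slope_se = np.sqrt(sigma2 * xtx_inv[1, 1])
        # Assumed two-sided test at roughly the 5% level
        # (normal approximation to the t distribution).
        if abs(beta[1] / slope_se) > 1.96:
            detected += 1
    return detected / n_sims


monthly = trend_detection_power(10, 12, slope_per_year=0.08)
quarterly = trend_detection_power(10, 4, slope_per_year=0.08)
no_trend = trend_detection_power(10, 12, slope_per_year=0.0)
print(f"monthly: {monthly:.2f}, quarterly: {quarterly:.2f}, "
      f"no trend: {no_trend:.2f}")
```

Under these assumptions, monthly sampling over ten years should detect the trend more often than quarterly sampling, and the run with no trend gives a rough check on the false detection rate. Real monitoring data would also need to account for autocorrelation, which this sketch ignores.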


Evaluating River Water Quality Modelling Uncertainties at Multiple Time and Space Scales

Camacho Suarez, Vivian V. (2020) Evaluating River Water Quality Modelling Uncertainties at Multiple Time and Space Scales. PhD thesis, University of Sheffield.

Maintaining healthy river ecosystems is crucial for sustaining human needs and biodiversity. Therefore, accurately assessing the ecological status of river systems and their response to short- and long-term pollution events is paramount. Water quality modelling is a useful tool for gaining a better understanding of the river system and for simulating conditions that may not be obtained by field monitoring. Environmental models can be highly unreliable due to our limited knowledge of environmental systems, the difficulty of mathematically and physically representing these systems, and limitations to the data used to develop, calibrate and run these models. The extensive range of physical, biochemical and ecological processes within river systems is represented by a wide variety of models: from simpler one-dimensional advection-dispersion equation (1D ADE) models to complex eutrophication models. Gaining an understanding of uncertainties within catchment water quality models across different spatial and temporal scales for the evaluation and regulation of water compliance is still required. Thus, this thesis work 1) evaluates the impact of parameter uncertainty from the longitudinal dispersion coefficient on the one-dimensional advection-dispersion model and water quality compliance at the reach scale and sub-hourly scale, 2) evaluates the impact of input data uncertainty and the representation of ecological processes on an integrated catchment water quality model, and 3) evaluates the impact of one-dimensional model structures on water quality regulation. Findings from this thesis stress the importance of longitudinal mixing, specifically at sub-daily time scales and at distances from tens to hundreds of meters. Beyond the sub-daily time scale, other biological and ecological processes become more important than longitudinal mixing for representing the seasonal dynamics of dissolved oxygen (DO). The thorough representation of the dominant ecological processes assists in obtaining accurate seasonal patterns even under input data variability. Furthermore, the use of incorrect model structures for water quality evaluation and regulation leads to considerable uncertainty when applying duration-over-threshold regulation within the first hundreds of meters and at the sub-hourly time scale.
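To illustrate why the longitudinal dispersion coefficient matters at short distances and time scales, here is a minimal Python sketch based on the standard one-dimensional advection-dispersion equation, ∂C/∂t + u ∂C/∂x = D ∂²C/∂x², and its well-known analytical solution for an instantaneous release in a uniform channel. It is not taken from the thesis: the function name `ade_pulse` and all parameter values (mass, cross-sectional area, velocity and the two dispersion coefficients) are hypothetical.

```python
import numpy as np


def ade_pulse(x, t, mass, area, velocity, dispersion):
    """Concentration (kg/m^3) at distance x (m) and time t (s) after an
    instantaneous release of `mass` (kg) at x = 0 into a uniform channel
    with cross-sectional `area` (m^2), mean `velocity` (m/s) and
    longitudinal `dispersion` coefficient (m^2/s)."""
    spread = np.sqrt(4.0 * np.pi * dispersion * t)
    return mass / (area * spread) * np.exp(
        -(x - velocity * t) ** 2 / (4.0 * dispersion * t))


# Hypothetical reach: 2 km long, evaluated 30 minutes after the release.
x = np.linspace(0.0, 2000.0, 4001)  # 0.5 m spacing
t = 1800.0
low_d = ade_pulse(x, t, mass=1.0, area=5.0, velocity=0.5, dispersion=2.0)
high_d = ade_pulse(x, t, mass=1.0, area=5.0, velocity=0.5, dispersion=20.0)

print(f"peak location: {x[np.argmax(low_d)]:.1f} m")
print(f"peak with D = 2 m^2/s:  {low_d.max():.2e} kg/m^3")
print(f"peak with D = 20 m^2/s: {high_d.max():.2e} kg/m^3")
```

In this sketch both pulses travel at the same speed, but the larger dispersion coefficient lowers and widens the peak. This is how uncertainty in that single parameter can change whether a concentration threshold appears to be exceeded, and for how long.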


Environmental Pollution in the Moscow Region According to Long-term Roshydromet Monitoring Data

Published: 02 November 2020
Volume 45, pages 523–532 (2020)

G. M. Chernogaeva 1,2,
L. R. Zhuravleva 1,
Yu. A. Malevanov 1,
N. A. Fursov 3,
G. V. Pleshakova 3 &
T. B. Trifilenkova 3

Long-term Roshydromet monitoring data (2009–2018) on the pollution of the atmosphere, soil, and surface water are considered for the Moscow region (Moscow city within its new boundaries and the Moscow oblast). The air quality in the megacity (Moscow) and in background conditions (Prioksko-Terrasny Reserve) is compared.


Author information

Authors and affiliations.

Izrael Institute of Global Climate and Ecology, 107258, Moscow, Russia

G. M. Chernogaeva, L. R. Zhuravleva & Yu. A. Malevanov

Institute of Geography, Russian Academy of Sciences, 119017, Moscow, Russia

G. M. Chernogaeva

Central Administration for Hydrometeorology and Environmental Monitoring, 127055, Moscow, Russia

N. A. Fursov, G. V. Pleshakova & T. B. Trifilenkova


Corresponding author

Correspondence to G. M. Chernogaeva.

About this article

Chernogaeva, G.M., Zhuravleva, L.R., Malevanov, Y.A. et al. Environmental Pollution in the Moscow Region According to Long-term Roshydromet Monitoring Data. Russ. Meteorol. Hydrol. 45, 523–532 (2020). https://doi.org/10.3103/S1068373920080014

Received: 06 February 2020

Revised: 06 February 2020

Accepted: 06 February 2020

Published: 02 November 2020

Issue Date: August 2020

DOI: https://doi.org/10.3103/S1068373920080014


Anthropogenic environmental pollution
Surface water
Urbanized areas
Background conditions

WaterQualityWatch -- Continuous Real-Time Water Quality of Surface Water in the United States


The "Real-time" map tracks short-term changes (over several hours) of water quality. Although the general appearance of the map changes very little from one hour to the next, individual sites may change rapidly in response to major rain events or to reservoir releases. The data used to produce this map are provisional.



COMMENTS

(PDF) An Introduction to Water Quality Analysis
People use water for various activities, and the two main problems man contends with water are the quantity and quality of water [2]. The characteristics of water that relate to its fitness for a ...
PDF Thesis Produced Water Quality Characterization and Prediction For
Thesis: Produced Water Quality Characterization and Prediction for Wattenberg Field. Submitted by Huishu Li, Department of Civil and Environmental Engineering. In partial fulfillment of the requirements for the degree of Master of Science, Colorado State University, Fort Collins, Colorado, Spring 2013. Master's Committee: Advisor: Kenneth H. Carlson
Evaluating Drinking Water Quality Using Water Quality Parameters and
Water is a vital natural resource for human survival as well as an efficient tool of economic development. Drinking water quality is a global issue, with contaminated unimproved water sources and inadequate sanitation practices causing human diseases (Gorchev & Ozolins, 1984; Prüss-Ustün et al., 2019). Approximately 2 billion people consume water that has been tainted with feces.
PDF Assessment of Drinking Water Quality Using Water Quality ...
The water quality index (WQI) model is a commonly used technique for evaluating surface water and groundwater quality. The model mainly employs aggregation techniques to reduce large amounts of data to a single value. The WQI model has been used across the globe to assess ground and surface water using regional standards.
Full article: Overview of water quality modeling
2. Significance of water quality modeling. Water quality management is an essential component of overall integrated water resources management (UNESCO, 2005). The output of water quality models for different pollution scenarios is an imperative component of environmental impact assessment (Q. Wang et al., 2013). Sound water quality is very limited in the world ...
PDF Investigating Perceptions of Well Water Quality in Rural Alberta
safeguard or improve water quality, lies with well owners. The purpose of this thesis was to 1. Describe the perceptions, knowledge, and beliefs rural Albertan residents have of well water quality and whether they associate livestock farming with water well contamination. 2.
A comprehensive review of water quality indices (WQIs ...
Water quality index (WQI) is one of the most used tools to describe water quality. It is based on physical, chemical, and biological factors that are combined into a single value that ranges from 0 to 100 and involves 4 processes: (1) parameter selection, (2) transformation of the raw data into common scale, (3) providing weights and (4) aggregation of sub-index values. The background of WQI ...
PDF Modeling Hydrologic and Water Quality Responses to Changing Climate and
hydrology and water quality. In this study, the future potential impacts of LULC and climate change on the hydrologic regimes and water quality in Wolf Bay watershed, South Alabama ... support and dedication to the completion of this thesis. Thanks are due to Shufen Pan's group for providing land use/cover map of 2005. I also want to thank Dr ...
Assessment and modeling of groundwater quality by using water quality
Water Quality Index (WQI) approach is utilized with the groundwater parameters and spatial distribution maps have been developed using GIS for the obtained indexes. The anthropogenic activities may be the likely cause of poor water quality. The north and north-west regions are influenced by anthropogenic inputs from the leaching of landfill and ...
A review of water quality index models and their use for assessing
The Water Quality Guidelines Task Group of the Canadian Council of Ministers of the Environment developed the CCME WQI in 2001 (Saffran et al., 2001) following a review and revision of the BCWQI model (Lumb et al., 2011). The BCWQI model has been recognized by the CCME since 1990 (Dunn, 1995). In recent times models such as the Liou Index ...
Reliable water quality prediction and parametric analysis using
Water is a common and crucial resource shared among all humans, animals, and plants and is a necessity for all species. Each one of these species has its own respective needs for water quality.
Assessment of physicochemical and bacteriological water quality of
Total hardness (TH), a main water quality parameter, is expressed in terms of CaCO3. In the present study, the values for TH varied from 40 mg/L (Kundi in semi-dry season) to 215 mg/L (Dinki in semi-dry season). All the results were within the standard limits of drinking water quality set by the WHO (300 mg/L) and the national standard (392 mg/L).
Groundwater quality assessment using water quality index (WQI) under
The water quality index (WQI) has been applied to categorize the water quality viz: excellent, good, poor, etc. which is quite useful to infer the quality of water to the people and policy makers in the concerned area. The WQI in the study area ranges from 4.75 to 115.93. The overall WQI in the study area indicates that the groundwater is safe ...
Evaluating Drinking Water Quality Using Water Quality Parameters and
The study considered a combination of users' perceptions with the measured water quality parameters determined using the water quality index (WQI) tool. Data were collected using a cross-sectional research design for a household survey, and water quality samples were collected from improved and unimproved alternative sources.
Evaluation of sampling and monitoring designs for water quality
The final chapter of this thesis provides a summary of the work presented and discussion of limitations and suggestions for future directions. Item Type: Thesis (PhD) Qualification Level: Doctoral. Keywords: monitoring, water quality, river networks, cluster analysis, functional data analysis, Water Framework Directive. Subjects:
PDF Assessment of Water Quality Using Multivariate Statistical Techniques
Proposal of my thesis would not have been possible without the enthusiastic help of Dr. Ruan. I would also like to thank Dr. Lizhu Wang who has ... water quality assessment using multivariate statistical techniques. In this study, water quality data sets obtained during 2008-2010 in the Ying River basin were analyzed.
PDF A Study on the Water Quality of NIT Rourkela
Classification of the water according to hardness. List of Figures: Fig. 2.1, Map of NIT Rourkela. Fig. 4.1, Average temperature of tap water from different areas during winter. Fig. 4.2, Average pH of the water samples from different areas. Fig. 4.3, Average turbidity of the water samples from different areas.
Evaluating River Water Quality Modelling Uncertainties at Multiple Time
Thus, this thesis work 1) evaluates the impact of parameter uncertainty from the longitudinal dispersion coefficient on the one-dimensional advection-dispersion model and water quality compliance at the reach scale and sub-hourly scale, 2) evaluates the impact of input data uncertainty and the representation of ecological processes on an ...
ANNUAL WATER QUALITY REPORT
We are once again proud to present to you our annual water quality report. This edition covers all testing completed from January 1 through December 31, 2020. Over the years, we have dedicated ourselves to producing drinking water that meets all state and federal drinking water standards. We continually strive to adopt new and better methods ...
Environmental Pollution in the Moscow Region According to Long-term
Abstract Long-term Roshydromet monitoring data (2009-2018) on the pollution of the atmosphere, soil, and surface water are considered for the Moscow region (Moscow city within its new boundaries and the Moscow oblast). The air quality in the megacity (Moscow) and in background conditions (Prioksko-Terrasny Reserve) is compared.
Real-time water quality
The "Real-time" map tracks short-term changes (over several hours) of water quality. Although the general appearance of the map changes very little from one hour to the next, individual sites may change rapidly in response to major rain events or to reservoir releases. The data used to produce this map are provisional.
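
Several of the excerpts above describe the water quality index (WQI) as a combination of parameter selection, transformation to a common scale, weighting and aggregation. The following minimal Python sketch shows one way those four steps might fit together. It does not reproduce any of the cited index models: the function names, the linear sub-index rule, the weights and the ideal and limit values are all hypothetical and are not regulatory standards.

```python
def sub_index(value, ideal, limit):
    """Step 2 (assumed rule): map a raw measurement onto a 0-100 scale,
    100 at the ideal value and 0 at or beyond the permissible limit."""
    fraction = abs(value - ideal) / abs(limit - ideal)
    return 100.0 * max(0.0, 1.0 - fraction)


def water_quality_index(sample, spec):
    """Steps 3 and 4: weight each sub-index and aggregate by weighted mean.

    `spec` maps a parameter name to (ideal, limit, weight); choosing which
    parameters appear in it is step 1 (parameter selection)."""
    total_weight = sum(weight for _, _, weight in spec.values())
    return sum(
        weight / total_weight * sub_index(sample[name], ideal, limit)
        for name, (ideal, limit, weight) in spec.items()
    )


# Hypothetical parameter set: (ideal value, permissible limit, weight).
spec = {
    "pH": (7.0, 8.5, 4),
    "turbidity_NTU": (0.0, 5.0, 3),
    "TDS_mg_L": (0.0, 500.0, 3),
}

clean = {"pH": 7.0, "turbidity_NTU": 0.0, "TDS_mg_L": 0.0}
halfway = {"pH": 7.75, "turbidity_NTU": 2.5, "TDS_mg_L": 250.0}
at_limit = {"pH": 8.5, "turbidity_NTU": 5.0, "TDS_mg_L": 500.0}

for label, sample in [("clean", clean), ("halfway", halfway),
                      ("at limit", at_limit)]:
    print(label, round(water_quality_index(sample, spec), 1))
```

In this convention a higher score means better water. Published indices differ in the direction of the scale, the sub-index functions and the aggregation rule, so the sketch only shows the general structure.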
