
Quantitative Biology > Populations and Evolution

Title: Modeling biological networks: from single gene systems to large microbial communities

Abstract: In this research, we study biological networks at different scales: a gene autoregulatory network at the single-cell level and the gut microbiota at the population level. Proteins are the main actors in cells: they are the building blocks and act as enzymes and antibodies. The production of proteins is mediated by transcription factors. In some cases a protein acts as its own transcription factor; this is called autoregulation. It is known that autorepression speeds up the response and that autoactivation can lead to multiple stable equilibria. In this thesis, we study the effects of combining activation and repression in autoregulation. As a case study, we investigate the possible dynamics of the leucine-responsive protein B of the archaeon Sulfolobus solfataricus (Ss-LrpB), a protein that regulates itself in a unique and non-monotonic way via three binding boxes, and we examine under which conditions this type of network leads to oscillations or bistability. In the second part, much larger biological systems are considered. Ecological systems, among them the human gut microbiome, are characterized by heavy-tailed abundance profiles. We study how these distributions can arise from population-based models by adding saturation effects and linear noise. Moreover, we examine different characteristics of experimental time series of microbial communities, such as the noise color and the neutrality of the biodiversity, and look at the influence of the parameters on these characteristics. With the first research topic we aim to lay a foundation for the understanding of non-monotonic gene regulation and to take the first steps toward synthetic biology in archaea. In the second part of the thesis, we investigate experimental time series from complex ecosystems and seek theoretical models that reproduce all observed characteristics, in view of building predictive models.


  • Open access
  • Published: 24 November 2022

Reconstructing gene regulatory networks of biological function using differential equations of multilayer perceptrons

  • Guo Mao 1 ,
  • Ruigeng Zeng 1 ,
  • Jintao Peng 1 ,
  • Zhengbin Pang 1 &
  • Jie Liu 1 , 2  

BMC Bioinformatics volume 23, Article number: 503 (2022)


Building biological networks with a certain function is a challenge in systems biology. For small biological networks (fewer than ten nodes), most methods work by exhaustively enumerating all possible network topologies. This exhaustive approach does not scale to large biological networks. Moreover, regulatory relationships are complex and often nonlinear or non-monotonic, which makes inference with linear models challenging.

In this paper, we propose a multilayer perceptron-based differential equation method, which operates by training a fully connected neural network (NN) to simulate the transcription rate of genes in traditional differential equations. We check whether the regulatory network constructed by the NN method achieves the expected biological function by measuring the overlap between the network discovered by the NN and the network constructed with the Hill function. We further validate our approach on noisy signals, regulator knockouts, and the construction of large-scale gene regulatory networks using the link-knockout technique. Finally, we apply the method to a real dataset (expression of the mesoderm inducer Xenopus Brachyury) to construct the core topology of the gene regulatory network, and find that Xbra is strongly expressed only at moderate levels of activin signaling.

The results demonstrate that this method can identify the underlying network topology and functional mechanisms, and that it also applies to larger and more complex gene network topologies.


The growth and development of organisms, and their responses to internal and external stimuli, are controlled by complex internal regulatory mechanisms, including regulation at the gene level. A gene regulatory network maps these complex regulatory mechanisms at the gene level. At the molecular level, gene function is understood through the interaction behavior of complex networks: cell function is controlled by the interconnections between gene expression mechanisms and gene regulation. The mapping between gene interactions and function is one of the main research topics in systems biology [1].

Cellular networks undergo steady-state or oscillatory stimulation, which provides a way to reconstruct network topology and to understand how the interrelationships among genes allow living organisms to respond accurately to external signals and perform their functions robustly. For example, cellular adaptation [2] refers to the ability of a system to respond to a signal change and then return to the pre-stimulus level, which is key for living systems to perceive large-scale changes [3]. The transient nature of this stimulus response is important to prevent cells from undergoing uncontrolled proliferation or apoptosis [4]. For example, nuclear enrichment of the MAP kinase Hog1 adapts completely to changes in external osmotic pressure and is robust to very low signal fidelity and operating noise [5].

For small networks, enumeration search [6] is effective at listing all possible network topology modules, but for larger and more complex networks enumeration becomes computationally infeasible. The models currently used for gene regulatory network modeling mainly include Boolean networks, Bayesian networks, and differential equations [7]. A Boolean network is a relatively simple model whose simulation of the system is fixed and relatively coarse; a Bayesian network is a probabilistic model that can describe the regulatory network quantitatively and stochastically; differential equations can predict system behavior quantitatively and accurately. For modeling and reconstructing gene regulatory networks from time-series data, most existing methods [8, 9, 10, 11, 12] are based on ordinary differential equations (ODEs).

Ordinary differential equation models include linear and nonlinear differential equations. Linear models have been used to infer large-scale gene regulatory networks because of their simple structure and small number of parameters. For example, Matsumoto et al. [13] proposed the SCODE algorithm, based on linear ordinary differential equations, to study gene regulatory information related to cell differentiation: they first performed single-cell sequencing and then used the algorithm to assess differences in expression patterns between individual cells. Aubin et al. [14] proposed GRISLI, which infers a velocity vector field in the space of scRNA-seq profiles of individual cells, models the dynamics of cell trajectories with a linear ordinary differential equation, and reconstructs the underlying GRN with a sparse regression procedure. Although linear regulatory functions can describe a regulatory system, gene regulatory networks are mostly nonlinear. Many classical nonlinear differential equations that conform to the laws of biochemistry have been proposed to infer GRNs, such as the S-system model [15] and the Hill function. In recent years the S-system model, which follows the theory of biochemical systems [16], has been widely used to infer GRNs and biochemical reactions. Since the structure of the S-system model is fixed, heuristic search algorithms have been used to find its optimal parameters, such as differential evolution (DE) [17], a cooperative coevolutionary algorithm [18], a sensitivity-based incremental evolution method [19], the bat algorithm (BA) [20], an immune algorithm (IA) [21], the firefly algorithm [22], dissipative particle swarm optimization (DPSO) [23], a cockroach genetic algorithm (CGA) [24], and a hybrid of a genetic algorithm (GA) and PSO [25].

Hidde De Jong [7] proposed the Hill function as a regulatory function. Hill functions are considered suitable for building ODE-based GRN models [7, 26] because they can quantify the activating and inhibitory effects of genes. The regulatory function can also be the sigmoid function [27] commonly used in neural networks (the S-type function), whose input-output characteristic is usually expressed by a logistic or tangent curve; it introduces the necessary nonlinearity and defines an upper bound on the rate of change of molecular concentration. The advantage of such a neural-network-based differential equation model is that many effective learning algorithms are available for estimating the parameters of the regulatory network. For example, Mattias Wahde [28] provided a differential equation system based on a feedback neural network, with the logistic sigmoid as regulatory function and a genetic algorithm for parameter estimation. Shen et al. [29] argued that deep learning could search network topologies more effectively, training a deep neural network to find satisfactory topologies from trajectories and errors. The idea is to learn differential equations from data, i.e., to use a neural network to approximate an unknown underlying dynamical system [30, 31]; however, such models are difficult to extend to large numbers of genes and therefore struggle to describe complex system behavior.

Inspired by the above methods, we propose a multilayer perceptron-based differential equation method, which transforms the gene regulatory network (GRN) system into an input-output regression problem: the input is gene expression data and the output is the derivative estimated from the expression data. Our method uses time-series gene expression data to train a regulatory function that simulates the transcription rate of a gene, implemented as a fully connected neural network (NN) with a four-layer structure. The network is trained by using the gene expression at the previous time point to predict the gene expression at the next time point, with the loss between the prediction and the real gene expression used for feedback training. After training, the link-knockout technique sets the expression value of a gene to 0, and the regulatory relationship between genes is determined from the influence of that gene on the synthesis rate (see Materials, Methods and Results for a detailed description). Figure 1a illustrates the overall framework in detail. Figure 1b uses three genes as an example to illustrate the composition of the regulatory relationships. The control-variable method is used to obtain the relationship between the synthesis rate and each gene over time: when the synthesis rate of gene 1 is fixed to 0, that is, taking gene 1 as the stimulus signal, we observe the changes of the three genes and their corresponding synthesis rates over time, and obtain the final regulatory relationships between individual genes from cross-sectional views of the fully connected neural network.
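The regression setup described above can be sketched in a few lines. This is not the paper's implementation: the function names, the forward-difference derivative estimate, and the stand-in dynamics `f` are our own illustrative assumptions. The sketch shows only how (input, target) pairs would be built from a time series and how a trained synthesis-rate function could be iterated one Euler step forward.

```python
import numpy as np

# Hypothetical sketch of the training-data construction described above:
# the network's input is the expression of all genes at time t, and its
# regression target is the derivative estimated from the expression data.
def make_training_pairs(expr, dt):
    """expr: (T, n) array of expression levels; returns (inputs, targets)."""
    deriv = (expr[1:] - expr[:-1]) / dt   # forward-difference derivative estimate
    return expr[:-1], deriv

# One Euler step with a trained synthesis-rate function f (here a stand-in):
def predict_next(f, x_t, dt):
    return x_t + dt * f(x_t)

# Toy example with known linear dynamics so the sketch is runnable:
f = lambda x: -x                          # stand-in for the trained NN
t = np.arange(0.0, 1.0, 0.1)
expr = np.exp(-t)[:, None]                # one gene decaying exponentially
X, dX = make_training_pairs(expr, 0.1)
```

In the paper the role of `f` is played by the trained four-layer network, and the loss compares `predict_next` against the measured next time point.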

In this paper, we verify whether the regulatory network constructed by the NN method achieves the expected biological function by measuring the overlap between the network discovered by the NN and the network constructed with the Hill function (HF). Moreover, our method is verified in three cases: adaptation to a noisy signal, link knockout, and the use of the link-knockout technique to build large-scale gene regulatory networks. We also apply a real dataset (expression of the mesoderm inducer Xenopus Brachyury (XBra)) to construct the core topology of the gene regulatory network. The resulting network topology can be intuitively explained by the concentration changes between genes, and many target functions can be achieved by comparing the resulting network with existing biological networks.

figure 1

Fully connected neural network. a Schematic diagram of the fully connected neural network training the synthesis term f. b Three genes are used as an example. The synthesis term \(f_2\) ( \(f_3\) ) of \(g_2\) ( \(g_3\) ) is evaluated by a fully connected network. \(f_2\) and \(f_3\) (wheat and light blue) can depend on all three variables: \(g_3\) , \(g_2\) and the input signal \(g_1\)

Simulation studies

In this section, we conduct simulation studies to empirically evaluate our proposed framework under different settings. In what follows, we demonstrate the ability of the proposed method to construct gene regulatory networks in three scenarios. The first is a regulatory network that can adapt under the influence of Gaussian white noise. The second is the simulation of link knockout: the regulatory network obtained by training the NN is redundant, and the core gene regulatory network can be obtained by link knockout. The last is the use of link knockouts to construct large-scale gene regulatory networks.

Case one: adaptation

Since adaptive systems often operate in noisy environments, exploring the adaptive properties and noise immunity of the network is the main goal of this section. As can be seen from Fig. 1, the fully connected neural network (NN) transfers hidden information from the previous time point to the next, and we require the output node \(g_3\) to perform the adaptation function (Fig. 1b). The node \(g_2\) has no functional constraint but can play a modulating role. The node \(g_1\) serves as the input signal I.

As shown in Fig. 2(1), under the time evolution of the input signal ( I ) with added noise, the expression level of \(g_3\) (blue line) closely follows its target time course (blue dotted line), with a fast response phase and a slower recovery phase. Figure 2 also shows how \(f_3\) and \(f_2\) change as \(g_3\) , \(g_2\) , and I increase or decrease, which reveals the regulatory logic hidden in the NN and allows the gene regulatory network to be constructed directly (Fig. 2(3)). Our method thus achieves a reliable response to Gaussian white noise.

figure 2

Time evolution under noisy conditions. (1) Under stimulation with Gaussian white noise added to the input signal (red Input line), and without any constraints on g 2 , the time evolution curves of g 3 and g 2 obtained after training the NN; the expression level of g 3 (blue line) basically matches the target time course (target's blue dotted line). (2) Cross-section information obtained by training the NN under noisy conditions. Three panels show the dependence of \(f_3\) , \(f_2\) on I , \(g_3\) or \(g_2\) with the other two variables fixed. (3) The regulatory network obtained from (2)

In the context of biological regulatory networks, we need to verify whether the resulting network is biologically feasible. Here, in order to check whether the network obtained by the NN model (Fig. 1b) is reliable, the f term in Equation 2 is represented by the Hill function, which is widely used in biological regulation modeling. The obtained gene regulatory network (Fig. 1b) can be successfully transferred to the Hill function model (Table 1), and the expected adaptive function is achieved. Similarly, the gene regulatory network obtained under noise (Fig. 2(3)) can be successfully transferred to the Hill function model (Table 2).

Case two: link knockout

According to Formula 5, the regulatory logic of the gene regulatory network can be obtained from the difference in gene expression before and after knocking out a certain edge; for example, the regulatory link from \(g_3\) to \(g_2\) is knocked out by setting \(g_3\) to 0 when calculating \(f_2\) . Four examples of link knockout with \(\lambda\) set to 0 are shown in Fig. 3(1). The first panel shows \(g_3 = 0\) fed into the NN to obtain the value of \(f_2\) : the \(g_2\) expression level increases from the lighter to the brighter orange line, indicating that \(g_3\) inhibits \(g_2\) . The second panel shows \(g_2\) set to 0 and fed into the NN to obtain \(f_3\) : the expression level of \(g_3\) decreases from lighter to darker blue, indicating that \(g_2\) stimulates \(g_3\) . In the third panel, \(g_3\) is set to 0 and fed into the NN to obtain \(f_3\) : the expression level of \(g_3\) increases from lighter to darker blue, indicating \(g_3\) self-inhibition. In the same way, the fourth panel shows \(g_2\) self-activation.
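The sign-reading step described above can be illustrated with a small sketch. The function `f2` below is a hypothetical stand-in for the trained network's synthesis-rate output for \(g_2\): we use a toy Hill-type repression term, chosen by us purely so the example runs, not taken from the paper's fitted model.

```python
import numpy as np

# Toy stand-in for the trained NN's synthesis rate of g2:
# in this illustrative model, g3 represses g2 via a Hill term.
def f2(state):
    g1, g2, g3 = state
    K = 1.0
    return K**2 / (K**2 + g3**2)

def knockout_shift(f, state, idx):
    """Set gene `idx` to 0 (link knockout) and return the change in f."""
    knocked = list(state)
    knocked[idx] = 0.0
    return f(knocked) - f(state)

# If removing g3 *raises* f2, then g3 was inhibiting g2 (and vice versa).
shift = knockout_shift(f2, (0.5, 0.8, 1.2), idx=2)
sign = "inhibits" if shift > 0 else "activates"
```

With the toy repression term, knocking out \(g_3\) increases \(f_2\), so the readout labels the edge inhibitory, mirroring the first panel of Fig. 3(1).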

figure 3

Simulated regulator knockout. (1) The perturbed f function can be iterated to simulate the effect of mutants in which specific regulatory links are deleted. For example, deletion of \(g_3\) 's modulating effect on \(g_3\) leads to an increase in \(g_3\) (from darker to brighter solid green lines), indicating self-inhibition (third panel). The difference in \(g_2\) levels is not important here (dotted line). A similar argument applies to the other three panels. (2) Sensitivity and adaptation error of the network sequence. The panel shows the evolution of the network topology through steps 1 to 4 of the link-knockout procedure. The minimal incoherent feed-forward motif appears naturally (topology #4) before the network has too few links to adapt

As mentioned above, the sparse network is obtained by repeatedly removing unnecessary links in order. Figure 3(2) depicts the sensitivity (response peak) and adaptation error (difference between the pre-stimulus and fully adapted \(g_3\) levels) of the network sequence. The network in panel #1 of Fig. 3(2) contains a basic adaptive motif: an incoherent feed-forward loop. #1 is the regulatory network learned by the NN without any constraints and is redundant. Applying the link-knockout technique to the existing links, we knock out the link whose removal causes the smallest change, retrain the NN after each knockout, and obtain the effective sparse gene regulatory network #4 (equal to #1 in sensitivity and adaptation error). #4 realizes the adaptive function with the fewest links: the minimal incoherent feed-forward network.
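The adaptive behaviour expected of the minimal incoherent feed-forward motif can be sketched numerically. The equations below are a standard textbook "sniffer" IFFL (input activates both the intermediate and the output, while the intermediate removes the output), not the paper's fitted model; parameter values and the Euler integration are our own illustrative choices.

```python
# Sketch of incoherent feed-forward adaptation: after a step in the input,
# g3 spikes (sensitivity) and then returns to its pre-stimulus level
# (near-zero adaptation error).
def simulate_iffl(I_before=1.0, I_after=2.0, dt=0.01, t_end=20.0):
    g2, g3 = I_before, 1.0          # start at the pre-stimulus steady state
    peak = g3
    for _ in range(int(t_end / dt)):
        dg2 = I_after - g2          # g2 slowly tracks the input
        dg3 = I_after - g2 * g3     # g3 is driven by the input, removed by g2
        g2 += dt * dg2
        g3 += dt * dg3
        peak = max(peak, g3)
    return peak, g3

peak, final = simulate_iffl()       # transient peak, then return to baseline
```

In this toy model the steady state of `g3` is independent of the input level, which is exactly the property (response peak plus small adaptation error) tracked in Fig. 3(2).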

Case three: constructing large-scale gene regulatory networks using link knockout

To apply the method to large-scale data, we evaluate the performance of our model on two datasets, each containing time-series expression profiles. Time-series data reflect how the network responds to perturbations and how it recovers after the perturbations are removed. The first is simulated data: we choose the InSilico_Size100 dataset from the DREAM4 In Silico Network Challenge [32]. The second is real data: we select a large-scale E. coli dataset (GSE20305) from the Gene Expression Omnibus (GEO) database [33]. The gold-standard benchmark for E. coli consists partly of the DREAM5 challenge [34] and partly of experimentally verified interactions from RegulonDB [35]. GSE20305 [33] provides real time-series gene expression data of E. coli under different experimental conditions; we choose the data under three conditions (cold stress, heat stress, and oxidative stress) to form our experimental dataset. The specific information for each dataset is shown in Table 3.

To evaluate the effectiveness of our method on the datasets in Table 3, several methods are chosen as baselines:

GENIE3 [36]: an approach to infer gene regulatory networks from gene expression data. It trains a random forest model that predicts the expression of each gene in the dataset, using the expression of transcription factors (TFs) as input.

BiXGBoost [37]: a bidirectional method that considers both candidate regulatory genes and target genes for a specific gene. BiXGBoost uses time information efficiently and integrates XGBoost to evaluate feature importance.

SIGNET [ 38 ]: a deep learning-based framework for capturing complex regulatory relationships between genes under the assumption that the expression levels of transcription factors participating in gene regulation are strong predictors of the expression of their target genes.

GNIPLR [39]: an approach to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projects the gene data twice, using the LASSO projection (LSP) algorithm and a linear projection (LP) approximation, to produce a linear and monotonic pseudo-time series, and then determines the direction of regulation with lagged regression analyses.

PoLoBag [40]: an ensemble regression algorithm in a bagging framework, in which Lasso weights estimated on bootstrap samples are averaged. These bootstrap samples incorporate polynomial features to capture higher-order interactions.

For a fair comparison of the above methods, we always use the default parameters when running each program. We systematically evaluate the models using seven metrics: true positive rate (TPR), false positive rate (FPR), Matthews correlation coefficient (MCC), accuracy (ACC), F-measure (F1), area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPR). Each method outputs all predicted edges with corresponding weights; the higher the weight, the higher the credibility of the regulatory relationship. Since different thresholds yield different GRNs, the FPR, TPR, MCC, ACC, and F1 measures vary accordingly.
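The thresholded evaluation can be sketched as follows: predicted edge weights are ranked, the top-k edges are labelled present, and the confusion-matrix metrics are computed against the gold standard. Function and variable names here are ours, and the tiny arrays are illustrative, not the paper's data.

```python
import numpy as np

# Sketch of the top-k edge evaluation described above.
def edge_metrics(weights, truth, k):
    order = np.argsort(weights)[::-1]
    pred = np.zeros_like(truth)
    pred[order[:k]] = 1                     # top-k edges predicted present
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    tn = np.sum((pred == 0) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    acc = (tp + tn) / len(truth)
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) or 1.0
    mcc = (tp * tn - fp * fn) / np.sqrt(denom)
    return dict(TPR=tpr, FPR=fpr, ACC=acc, F1=f1, MCC=mcc)

# Illustrative call on four candidate edges with two true edges:
m = edge_metrics(np.array([0.9, 0.8, 0.1, 0.2]), np.array([1, 1, 0, 0]), k=2)
```

AUROC and AUPR, being threshold-free, would be computed over all weights rather than on a single top-k cut.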

The experiments are first performed on the five DREAM4 InSilico_Size100 networks. The edge weights predicted by each method are sorted, the first 250 predictions are set to 1, and the rest are set to 0. The five resulting indicators are reported in Table 4. The results in Table 4 show that our method outperforms the comparison methods, indicating that it can construct regulatory networks from time-series gene expression data via the link-knockout technique. To account for different thresholds, we also use the AUROC and AUPR values as evaluation criteria. Due to the randomness of the NN, results differ from run to run; in our experiments each method is run 10 times, and the results are presented in Fig. 4. For Network 4, the GENIE3 method outperforms the rest on AUROC. On InSilico_Size100 Networks 1-3 and 5 our method has the higher average AUROC, and its average AUPR on Networks 1-5 is better than that of the other methods.

figure 4

The AUROC and AUPR of GENIE3, BiXGBoost, SIGNET, GNIPLR, PoLoBag and our method on the five DREAM4 InSilico_Size100 networks

For the E. coli dataset, we sort the edges predicted by each method, set the predicted value of the first 3080 edges to 1 and that of the remaining edges to 0, and compute the FPR, TPR, MCC, ACC, and F1 measures between the predicted and ground-truth labels. The results in Table 5 show that our method performs best. To consider different thresholds, Fig. 5 shows the results of ten averaged runs of all methods. Notably, all methods obtain poor results on the E. coli network, with AUPR values below 0.05 (Fig. 5); this is because AUPR tends to take smaller values on large-scale networks. Compared with the other five methods, our method achieves the best AUROC and the best AUPR. To test efficiency, we compare the running time of the six methods on an Intel(R) Xeon(R) CPU E5-2630 machine with 32 GB RAM. The comparison results on the DREAM4 InSilico_Size100 and E. coli datasets are shown in Table 6, which reports the average running time of each algorithm over 10 executions. Our method is faster than the other state-of-the-art methods.

figure 5

E. coli network with 1484 genes. Each bar represents the performance of one method; the abscissas are the corresponding AUROC (right) and AUPR (left) values

Application to the Xenopus Brachyury (XBra)

In this section, real data are used to demonstrate the effectiveness of our method in a complex situation: the Activin/Gsc/Xbra system. This is a well-studied system, both experimentally and in modeling [41]. Here, the fully connected network model is used to solve the inverse problem of finding the core gene regulatory network, given the observed gene expression of Xenopus Brachyury as the desired output. The results obtained by the NN are compared with the known biological network.

Gene regulation in Fig. 6 follows the NN modeling of Fig. 1a, where g and f are three-dimensional vectors (Activin, Goosecoid, Xbra) with input signal \(g_1\) (Bcd). The expression of the three genes is taken from the study of Green et al. [41], and the morphogen gradient (Bcd) is treated as static. The results of 40 repetitions of NN training all overlapped the target curves (as shown in Fig. 6). When the link-knockout method is applied, the resulting gene regulatory network is consistent with the known network structure. Table 7 lists the frequency with which each link is activated, absent, or inhibited across the 40 training repetitions; coloring each link by its majority label (orange activated, blue inhibited), the majority network (Fig. 6, right) has a structure very similar to the known biological network revealed experimentally [41]. This experiment demonstrates the effectiveness of our method on real data.
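The majority-vote step described above can be sketched as follows: each repeated training run yields a signed edge label (+1 activated, 0 absent, -1 inhibited), and the consensus network keeps the most frequent label per edge. The run count and toy labels below are illustrative, not the paper's results.

```python
import numpy as np

# Consensus over repeated trainings: per edge, keep the most frequent
# signed label among {-1 (inhibited), 0 (absent), +1 (activated)}.
def majority_network(runs):
    """runs: (r, n, n) array of signed adjacency matrices -> (n, n) consensus."""
    runs = np.asarray(runs)
    labels = np.array([-1, 0, 1])
    counts = np.stack([(runs == v).sum(axis=0) for v in labels])  # (3, n, n)
    return labels[np.argmax(counts, axis=0)]

# Five toy runs over a 2-gene network: edge 0->1 is usually activating,
# edge 1->0 usually inhibiting; one run disagrees.
runs = [np.array([[0, 1], [-1, 0]]) for _ in range(4)] \
     + [np.array([[0, -1], [0, 0]])]
consensus = majority_network(runs)
```

With 40 runs instead of 5, this is the aggregation that produces the majority network shown on the right of Fig. 6.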

figure 6

The Activin/Gsc/Xbra system. The Activin gene is activated by the morphogen gradient input signal (Bcd), so it begins to mimic its gradient pattern. Activin activates the Xbra gene and, at a certain threshold, switches on Xbra's positive feedback. Activin also activates the Goosecoid gene, and when the accumulated concentrations of the two genes are high enough, Xbra is forced down. The concentration is highest only on the left side; where the Goosecoid concentration is low its inhibitory effect is weak, so the Xbra gene reaches a stable state

In this paper, we propose a multilayer perceptron-based differential equation method that trains a fully connected neural network (NN) to simulate the transcription rate of genes in traditional differential equations. The validation results show that our algorithm outperforms the other methods; its good performance is attributed to the use of neural networks to simulate unknown dynamical systems. This has several advantages. First, the input-output function of a multilayer perceptron requires no detailed mathematical form: training the neural network establishes the necessary logical connections between input and output nodes without specific constraints. Second, fully connected neural networks speed up model training and scale to large, complex gene regulatory networks. Finally, neural networks are well suited to building gene regulatory networks from time-series gene expression data thanks to their short-term memory properties.

Our goal is to explain intuitively how gene regulatory networks (GRNs) achieve concentration-dependent responses. However, the number of different mechanisms that may exist in cells, such as feedback or local cell-cell communication, remains unclear. Some well-defined biological functions may admit broad kinetic interpretations, even for relatively simple three-gene networks and limited forms of modeling, and more complex cellular processes cannot simply be attributed to activating or repressive regulation. The structure of the regulatory network itself should be re-explored in a more comprehensive context. The method developed here can inform further exploration of reconstructed gene regulatory networks, and an interesting direction for future research is to apply our method to other real-world biological and biomedical data problems.

Conclusions

A long-standing question in biology is how complex biological networks perform complex regulatory functions. One strategy is to exhaustively search all possible biological networks for single or multiple functions, but this is only feasible for small gene networks: even for a network of four genes, the computational complexity of exhaustive search is enormous. In this study, we propose a multilayer perceptron-based differential equation method. Figure 1a illustrates the overall framework. Our method uses time-series gene expression data to train a regulatory function that simulates the transcription rate of a gene, implemented as a fully connected neural network (NN) with a four-layer structure. The network is trained by using the gene expression at the previous time point to predict the gene expression at the next time point, with the loss between the prediction and the real gene expression used for feedback training. After training, the link-knockout technique sets the expression value of a gene to 0, and the regulatory relationship between genes is judged from the effect of that gene on the synthesis rate.

First, we verify the adaptation function of our method: the adaptive function is realized by training a NN, and our method also performs well when Gaussian white noise is present on the internal and external stimulus signals. Then, through the link-knockout technique, redundant links are eliminated from the NN-trained gene regulatory network, yielding an effective core gene regulatory network. Finally, to validate our approach on large-scale datasets, we use the InSilico_Size100 time-series simulation data and the real E. coli dataset, comparing our model with five state-of-the-art methods. Experiments show that our method performs well on all six networks, demonstrating its scalability and adaptability. In addition to the large-scale real dataset, we also validate our method on a real dataset (Xenopus laevis) with five genes to demonstrate its effectiveness. Our method can help discover the regulatory logic and network topology of complex tasks, and for the resulting topologies it is possible to explain intuitively how their structures generate their functions, thus linking network topology to function.

Nonlinear ordinary differential equation models

In a gene regulation system, the time-dependent variable \(x_i\) represents the expression level of the i -th gene at time t ; this variable is non-negative. The regulatory relationships among the n genes in the system can then be expressed by ordinary differential equations:

\(\frac{dx_i}{dt} = f_i(x_1, x_2, \ldots, x_n), \quad i = 1, 2, \ldots, n \qquad (1)\)

The above equations are also called kinetic equations, where \(\frac{dx_i}{dt}\) denotes the rate of change of the expression level of the i-th gene at time t, and \(x_1, x_2, \ldots, x_n\) are the expression levels of the individual genes. The rate of change of the i-th gene at time t therefore depends on the expression levels of the other genes, including its own expression level \(x_i\). The structure of the function \(f_i(x_1, x_2, \ldots, x_n)\) on the right-hand side of Equation 1 encodes the internal regulatory mechanism between genes, that is, the structure of the regulatory network.
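To make the kinetic equations concrete, the sketch below integrates a hypothetical two-gene system of the form \(\frac{dx_i}{dt} = f_i(x)\) with forward Euler. The specific Hill-type terms, rates and initial values are invented for illustration only; the paper learns \(f\) with a neural network rather than fixing it by hand.

```python
import numpy as np

# Toy kinetic system dx_i/dt = f_i(x): a hypothetical two-gene example in
# which gene 1 activates gene 2 and gene 2 represses gene 1 (illustrative
# only; not one of the networks studied in the paper).
def f(x):
    act = x[0]**2 / (1.0 + x[0]**2)      # Hill-type activation of gene 2 by gene 1
    rep = 1.0 / (1.0 + x[1]**2)          # Hill-type repression of gene 1 by gene 2
    return np.array([rep - x[0], act - x[1]])  # synthesis minus linear degradation

def simulate(x0, dt=0.01, steps=2000):
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * f(x)                # forward Euler integration step
        traj.append(x.copy())
    return np.array(traj)

traj = simulate([0.1, 0.9])              # converges to a stable fixed point
```

Because the degradation terms are linear with unit rate, the trajectory relaxes quickly to a steady state, which is the qualitative behavior the kinetic equations are meant to capture.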

In most cases, the interactions between genes exhibit complex nonlinear relationships, so a nonlinear regulation function \(f_i(x)\) better reflects the real situation in the organism; \(f_i\) is usually assumed to be continuously differentiable, monotonic and bounded. Here, we use Hill functions to model the complex GRN structure. The dynamics of this GRN can be modeled as:

\(\frac{dg_i}{dt} = f_i(g_1, \ldots, g_n) - \lambda g_i\),

where the synthesis term \(f_i\) is built from the activation and inhibition terms defined below.

Here \(h_{ij}^ + = \frac{{{b_{ij}}g_j^n}}{{K_{ij}^n + g_j^n}}\) is the activation term and \(h_{il}^ - = \frac{{K_{il}^n}}{{K_{il}^n + g_l^n}}\) is the inhibition term. For simplicity, we set the Hill coefficient \(n = 2\) in the enumeration study. Each activation link \(h_{ij}^ +\) has two parameters, K and b, while each inhibition link \(h_{il}^ -\) has a single parameter K. For each network topology, 100,000 groups of random parameters are sampled, with K and b drawn independently from the exponential distribution \(p(x) = {e^{ - x}}\) (following the study of Ehsan et al. [ 42 ]); a topology is considered 'successful' if no fewer than 2 parameter groups realize the target function. The exhaustive search over Hill-function models in this paper serves only as a verification step, and occasional false positives do not affect our conclusions.
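A minimal sketch of the activation and inhibition terms and of the exponential parameter sampling described above; the expression grid and the NumPy random generator are our own illustrative choices, not part of the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def h_act(g, K, b, n=2):
    # activation term  h+ = b * g^n / (K^n + g^n)
    return b * g**n / (K**n + g**n)

def h_inh(g, K, n=2):
    # inhibition term  h- = K^n / (K^n + g^n)
    return K**n / (K**n + g**n)

# K and b sampled independently from the exponential distribution
# p(x) = exp(-x), as in the enumeration procedure.
K, b = rng.exponential(1.0, size=2)

g = np.linspace(0.0, 5.0, 50)   # toy expression levels
act = h_act(g, K, b)            # increases from 0, saturates at b
inh = h_inh(g, K)               # decreases from 1 toward 0
```

The assertions one would expect hold by construction: the activation term is monotonically increasing and bounded by b, and the inhibition term starts at 1 and decreases.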

Fully connected neural network model

In this paper, a multilayer perceptron is used to learn the gene synthesis rate f from time-series gene expression data. In biological cells, the regulatory relationships between genes may involve time lags. The input layer of the multilayer perceptron in our algorithm is therefore the expression level of all genes at time point t, and the output layer is the synthesis rate f of the corresponding genes at time point t. The activation function of the hidden layers shown in Fig. 1a is ReLU, and the activation function of the output layer is the sigmoid, so the synthesis rate f lies between 0 and 1. They are respectively expressed as:

\(\mathrm{ReLU}(z) = \max(0, z), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}\)
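As a hedged illustration of such a network, the NumPy sketch below implements a fully connected forward pass with a ReLU hidden layer and a sigmoid output. The layer widths, the weight initialization and the class name are our own assumptions, not the paper's exact architecture.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SynthesisRateMLP:
    """Maps gene expression g(t) to synthesis rates f(t) in (0, 1).

    Hypothetical single-hidden-layer sketch: the paper specifies ReLU
    hidden activations and a sigmoid output, not these exact widths.
    """
    def __init__(self, n_genes, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_genes, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, n_genes))
        self.b2 = np.zeros(n_genes)

    def forward(self, g):
        h = relu(g @ self.W1 + self.b1)        # hidden layer, ReLU
        return sigmoid(h @ self.W2 + self.b2)  # output layer, sigmoid -> (0, 1)

net = SynthesisRateMLP(n_genes=3)
f_hat = net.forward(np.array([0.2, 0.5, 0.8]))  # one synthesis rate per gene
```

The sigmoid output guarantees that every predicted synthesis rate lies strictly between 0 and 1, matching the constraint stated above.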

As shown in Fig. 1b, using three genes as an example, the ordinary differential equations of the corresponding gene regulatory network are:

\(\frac{dg_i}{dt} = f_i(g_1, g_2, g_3) - \lambda g_i, \quad i = 1, 2, 3\)

where the function \(f_i(g_1,g_2,g_3)\) contains the regulatory relationships between the genes. The Euclidean distance between \({\hat{g}}(t + dt)\), computed from the ordinary differential equation formula in Fig. 1a, and the gene expression \(g(t + dt)\) at the next time point (time step 1) is used as the training loss function of the NN, \(Loss = \sqrt{ {\textstyle \sum _{t}}({\hat{g}}(t + dt) - g(t+dt))^{2} }\). The term \(\lambda {g_i}\) represents degradation; we simply set \(\lambda = 1\). In reality, the degradation term can be represented by the diagonal of the synthesis term f.
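The prediction and loss computation can be sketched as follows. The expression vectors are made-up toy values; the forward-Euler step with \(\lambda = 1\) and the root-of-sum-of-squares loss follow the equations above.

```python
import numpy as np

lam = 1.0  # degradation rate, set to 1 as in the paper

def predict_next(g_t, f_t, dt=1.0):
    # one forward-Euler step of dg/dt = f(g) - lambda * g
    return g_t + dt * (f_t - lam * g_t)

def loss(g_pred, g_true):
    # root-of-sum-of-squares training loss over time points
    return np.sqrt(np.sum((g_pred - g_true) ** 2))

# toy check with made-up values (not data from the paper):
# with dt = 1 and lambda = 1, the Euler update reduces to g_hat = f_t
g_t = np.array([0.4, 0.6])
f_t = np.array([0.5, 0.5])
g_hat = predict_next(g_t, f_t)
L = loss(g_hat, np.array([0.5, 0.5]))   # zero when prediction matches target
```

In training, this loss would be accumulated over all time points of the series and backpropagated through the network that produces f.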

In Fig. 1b, three genes are used as an example to demonstrate that our method can construct gene regulatory networks intuitively. Taking \(g_1\) as the stimulus signal (so that \(f_1 = 0\)), the neural network training principle is shown in Fig. 1b(1), and Fig. 1b(2) shows the cross-sectional information obtained from the trained NN. Fig. 1b(2) shows that \(f_2\) (blue dotted line) and \(f_3\) (orange dotted line) increase as \(g_1\) increases, that is, \(g_1\) activates \(g_2\) and \(g_1\) activates \(g_3\), with \(g_2\) inhibited by \(g_3\) once \(g_3\) reaches a steady state. The regulatory network extracted from the information in Fig. 1b(2) is an incoherent feedforward loop, and Fig. 1b(3) is the regulatory network obtained from this cross-sectional information.

Link knockout technique

For regulatory networks with many more genes, direct visualization of the f-function is difficult. Once we have a predictive model linking all genes to the synthesis term f, we ask which genes in the gene pool strongly influence f. We therefore introduce the link knockout technique, which performs an in silico gene knockout on the data: the expression of one gene at a time is set to 0, and the expression of the remaining genes is used as input to predict a specific synthesis term. This method improves the ability to reconstruct the regulatory network without reading out the NN weights. A caveat arises when the synthesis rate of \(g_j\) is strongly inhibited by a highly expressed gene \(g_i\): setting the expression of \(g_i\) to 0 makes the predicted synthesis rate \(f_j\) very large. A more accurate measure of the change in \(f_j\) under a fold change in \(g_i\) is therefore:

\(\Delta _{ij} = f_j(g_1, \ldots, \mu g_i, \ldots, g_n) - f_j(g_1, \ldots, g_i, \ldots, g_n)\)

This formula represents the link knockout experiment, where \(\mu\) is a discount factor and \(\mu = 0\) corresponds to a full link knockout. We truncate the domain where transcription factor i binds to gene j. With the regulatory link from node i to j knocked down by the factor \(\mu\), the NN output (the synthesis term \(f_j\)) changes accordingly, and \(\Delta _{ij}\) reflects the regulatory effect of \(g_i\) on \(g_j\). A more intuitive example is shown in Fig. 3, which depicts the mutational trajectory when the regulatory link from \(g_3\) to \(g_2\) is knocked out.

The first panel of Fig. 3(1) shows that an increase in the \(g_3\) level means that \(g_2\) negatively regulates (inhibits) \(g_3\), whereas a decrease in the \(g_3\) level means that \(g_2\) positively regulates (promotes) \(g_3\).
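The knockout measurement described above can be sketched as follows, using a hypothetical analytic function in place of the trained NN; the function, gene values and indices are invented for illustration.

```python
import numpy as np

def knockout_effect(f_model, g, i, j, mu=0.0):
    """Delta_ij: change in synthesis rate f_j when gene i's expression
    is scaled by a factor mu (mu = 0 is a full link knockout).

    f_model is any callable mapping an expression vector to synthesis
    rates; here a hypothetical analytic stand-in replaces the trained NN.
    """
    g_ko = g.copy()
    g_ko[i] *= mu
    return f_model(g_ko)[j] - f_model(g)[j]

# stand-in model: gene 0 activates gene 1; gene 2 is unaffected by gene 0
def f_model(g):
    return np.array([0.5, g[0]**2 / (1 + g[0]**2), 0.3])

g = np.array([1.0, 0.2, 0.4])
d01 = knockout_effect(f_model, g, i=0, j=1)   # negative: activation lost
d02 = knockout_effect(f_model, g, i=0, j=2)   # ~0: no regulation present
```

A clearly negative \(\Delta\) indicates a lost activating input, a clearly positive one a lost inhibiting input, and values near zero suggest no direct regulatory link.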

Availability of data and materials

Our source code and data are available at https://github.com/lhfkd/NNGRN . The InSilico_Size100 time-series data were generated automatically by GeneNetWeaver 2.0 ( http://gnw.sourceforge.net/ ).

Zhang W, Fang J-A, Tang Y. Robust stability for genetic regulatory networks with linear fractional uncertainties. Commun Nonlinear Sci Numer Simul. 2012;17(4):1753–65. https://doi.org/10.1016/j.cnsns.2011.09.026 .

Ma W, Trusina A, El-Samad H, Lim WA, Tang C. Defining network topologies that can achieve biochemical adaptation. Cell. 2009;138(4):760–73. https://doi.org/10.1016/j.cell.2009.06.013 .

Gardner TS, Cantor CR, Collins JJ. Construction of a genetic toggle switch in Escherichia coli . Nature. 2000;403:339–42. https://doi.org/10.1038/35002131 .

Ferrell JE. Perfect and near-perfect adaptation in cell signaling. Cell Syst. 2016;2(2):62–7. https://doi.org/10.1016/j.cels.2016.02.006 .

Muzzey D, Gómez-Uribe CA, Mettetal JT, van Oudenaarden A. A systems-level analysis of perfect adaptation in yeast osmoregulation. Cell. 2009;138(1):160–71. https://doi.org/10.1016/j.cell.2009.04.047 .

Qiao L, Zhao W, Tang C, Nie Q, Zhang L. Network topologies that can achieve dual function of adaptation and noise attenuation. Cell Syst. 2019;9(3):271–85. https://doi.org/10.1016/j.cels.2019.08.006 .

de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002;9(1):67–103. https://doi.org/10.1089/10665270252833208 .

Oates CJ, Dondelinger F, Bayani N, Korkola J, Gray JW, Mukherjee S. Causal network inference using biochemical kinetics. Bioinformatics. 2014;30(17):468–74. https://doi.org/10.1093/bioinformatics/btu452 .

Aderhold A, Husmeier D, Grzegorczyk M. Approximate Bayesian inference in semi-mechanistic models. Stat Comput. 2017;27:1003–40. https://doi.org/10.1007/s11222-016-9668-8 .

Mangan NM, Brunton SL, Proctor JL, Kutz JN. Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Trans Mol Biol Multi-Scale Commun. 2016;2(1):52–63. https://doi.org/10.1109/TMBMC.2016.2633265 .

Brunton SL, Proctor JL, Kutz JN. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc Natl Acad Sci. 2016;113(15):3932–7. https://doi.org/10.1073/pnas.1517384113 .

Penfold CA, Shifaz A, Brown PE, Nicholson A, Wild DL. CSI: a nonparametric bayesian approach to network inference from multiple perturbed time series gene expression data. Stat Appl Genet Mol Biol. 2015;14(3):307–10. https://doi.org/10.1515/sagmb-2014-0082 .

Matsumoto H, Kiryu H, Furusawa C, Ko MSH, Ko SBH, Gouda N, Hayashi T, Nikaido I. SCODE: an efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation. Bioinformatics. 2017;33(15):2314–21. https://doi.org/10.1093/bioinformatics/btx194 .

Aubin-Frankowski P-C, Vert J-P. Gene regulation inference from single-cell RNA-seq data with linear differential equations and velocity inference. Bioinformatics. 2020;36(18):4774–80. https://doi.org/10.1093/bioinformatics/btaa576 .

Ren H-P, Huang X-N, Hao J-X. Finding robust adaptation gene regulatory networks using multi-objective genetic algorithm. IEEE/ACM Trans Comput Biol Bioinf. 2016;13(3):571–7. https://doi.org/10.1109/TCBB.2015.2430321 .

Savageau MA. Finding multiple roots of nonlinear algebraic equations using s-system methodology. Appl Math Comput. 1993;55(2):187–99. https://doi.org/10.1016/0096-3003(93)90020-F .

Koduru P, Das S, Welch S, Roe JL, Lopez-Dee ZP. A co-evolutionary hybrid algorithm for multi-objective optimization of gene regulatory network models. In: Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation. GECCO ’05, pp. 393–399. Association for Computing Machinery, New York, NY, USA 2005. https://doi.org/10.1145/1068009.1068073 .

Kimura S, Ide K, Kashihara A, Kano M, Hatakeyama M, Masui R, Nakagawa N, Yokoyama S, Kuramitsu S, Konagaya A. Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics. 2004;21(7):1154–63. https://doi.org/10.1093/bioinformatics/bti071 .

Hsiao Y-T, Lee W-P. Inferring robust gene networks from expression data by a sensitivity-based incremental evolution method. BMC Bioinform. 2012;13(7):8. https://doi.org/10.1186/1471-2105-13-S7-S8 .

Mandal S, Khan A, Saha G, Pal RK. Reverse engineering of gene regulatory networks based on s-systems and bat algorithm. J Bioinform Comput Biol. 2016;14(03):1650010. https://doi.org/10.1142/S0219720016500104 ( PMID: 26932274 ).

Nakayama T, Seno S, Takenaka Y, Matsuda H. Inference of s-system models of gene regulatory networks using immune algorithm. J Bioinform Comput Biol. 2011;09(supp01):75–86. https://doi.org/10.1142/S0219720011005768 .

Mandal S, Saha G, Pal RK. S-system based gene regulatory network reconstruction using firefly algorithm. In: Proceedings of the 2015 Third International Conference on Computer, Communication, Control and Information Technology (C3IT), 2015;1–5. https://doi.org/10.1109/C3IT.2015.7060217

Palafox L, Noman N, Iba H. Reverse engineering of gene regulatory networks using dissipative particle swarm optimization. IEEE Trans Evol Comput. 2013;17(4):577–87. https://doi.org/10.1109/TEVC.2012.2218610 .

Wu S-J, Wu C-T. Computational optimization for s-type biological systems: Cockroach genetic algorithm. Math Biosci. 2013;245(2):299–313. https://doi.org/10.1016/j.mbs.2013.07.019 .

Hsiao Y-T, Lee W-P. Reverse engineering gene regulatory networks: coupling an optimization algorithm with a parameter identification technique. BMC Bioinform. 2014;15(15):8. https://doi.org/10.1186/1471-2105-15-S15-S8 .

Karlebach G, Shamir R. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol. 2008;9(01):67–103. https://doi.org/10.1038/nrm2503 .

D’Haeseleer P. Reconstructing gene networks from large scale gene expression data. PhD thesis 2000. AAI9993496

Wahde M, Hertz J. Coarse-grained reverse engineering of genetic regulatory networks. Biosystems. 2000;55(1):129–36. https://doi.org/10.1016/S0303-2647(99)00090-8 .

Shen J, Liu F, Tu Y, Tang C. Finding gene network topologies for given biological function with recurrent neural network. Nat Commun. 2021;12(1):3125–37. https://doi.org/10.1038/s41467-021-23420-5 .

Raissi M, Perdikaris P, Karniadakis GE. Multistep neural networks for data-driven discovery of nonlinear dynamical systems. arXiv preprint 2018. arXiv:1801.01236

Rackauckas C, Ma Y, Martensen J, Warner C, Zubov K, Supekar R, Skinner D, Ramadhan A, Edelman A. Universal differential equations for scientific machine learning. arXiv preprint 2020. arXiv:2001.04385

Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G. Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci. 2010;107(14):6286–91. https://doi.org/10.1073/pnas.0913357107 .

Jozefczuk S, Klie S, Catchpole G, Szymanski J, Cuadros-Inostroza A, Steinhauser D, Selbig J, Willmitzer L. Metabolomic and transcriptomic stress response of Escherichia coli . Mol Syst Biol. 2010;6(1):364. https://doi.org/10.1038/msb.2010.18 .

Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, The DREAM5 Consortium, Kellis M, Collins JJ, Stolovitzky G. Wisdom of crowds for robust gene network inference. Nat Methods. 2012;9(8):796–804. https://doi.org/10.1038/nmeth.2016 .

Gama-Castro S, Salgado H, Santos-Zavaleta A, Ledezma-Tejeida D, Muñiz-Rascado L, García-Sotelo JS, Alquicira-Hernández K, Martínez-Flores I, Pannier L, Castro-Mondragón JA, Medina-Rivera A, Solano-Lira H, Bonavides-Martínez C, Pérez-Rueda E, Alquicira-Hernández S, Porrón-Sotelo L, López-Fuentes A, Hernández-Koutoucheva A, Moral-Chávez VD, Rinaldi F, Collado-Vides J. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Res 2015;44(D1):133–143. https://doi.org/10.1093/nar/gkv1156 .

Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE. 2010;5(9):12776. https://doi.org/10.1371/journal.pone.0012776 .

Zheng R, Li M, Chen X, Wu F-X, Pan Y, Wang J. BiXGBoost: a scalable, flexible boosting-based method for reconstructing gene regulatory networks. Bioinformatics. 2018;35(11):1893–900. https://doi.org/10.1093/bioinformatics/bty908 .

Luo Q, Yu Y, Lan X. SIGNET: single-cell RNA-seq-based gene regulatory network prediction using multiple-layer perceptron bagging. Brief Bioinform. 2021. https://doi.org/10.1093/bib/bbab547 .

Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudo-time series data. Bioinformatics. 2021;37(16):2423–31. https://doi.org/10.1093/bioinformatics/btab099 .

Ghosh Roy G, Geard N, Verspoor K, He S. PoLoBag: polynomial Lasso Bagging for signed gene regulatory network inference from expression data. Bioinformatics. 2020;36(21):5187–93. https://doi.org/10.1093/bioinformatics/btaa651 .

Green J. Morphogen gradients, positional information, and xenopus: interplay of theory and experiment. Dev Dyn. 2002;225(4):392–408. https://doi.org/10.1002/dvdy.10170 .

Ehsan Elahi F, Hasan A. A method for estimating hill function-based dynamic models of gene regulatory networks. Royal Society Open Sci. 2018;5(2): 171226. https://doi.org/10.1098/rsos.171226 .

Acknowledgements

The authors would like to thank Xinhai Chen, Jintao Peng and Ruigeng Zeng, as well as the authors of all cited references.

This research work was supported in part by the National Key Research and Development Program of China (2021YFB0300101).

Author information

Authors and affiliations

Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073, China

Guo Mao, Ruigeng Zeng, Jintao Peng, Ke Zuo, Zhengbin Pang & Jie Liu

Laboratory of Software Engineering for Complex System, National University of Defense Technology, Deya Road, Changsha, 410073, China

Contributions

GM, RZ, JP, and KZ contributed the idea, designed the study and revised the manuscript. GM and ZP implemented and performed most of the experiments. GM and JL wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jie Liu .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Mao, G., Zeng, R., Peng, J. et al. Reconstructing gene regulatory networks of biological function using differential equations of multilayer perceptrons. BMC Bioinformatics 23, 503 (2022). https://doi.org/10.1186/s12859-022-05055-5

Received : 16 July 2022

Accepted : 14 November 2022

Published : 24 November 2022

DOI : https://doi.org/10.1186/s12859-022-05055-5


  • Fully connected neural network
  • Biological function
  • Differential equations
  • Dynamical systems
  • Link knockout

Biological network analysis with deep learning

Giulia Muzio and Leslie O’Bray have contributed equally to this work.

Giulia Muzio, Leslie O’Bray, Karsten Borgwardt, Biological network analysis with deep learning, Briefings in Bioinformatics , Volume 22, Issue 2, March 2021, Pages 1515–1530, https://doi.org/10.1093/bib/bbaa257

Recent advancements in experimental high-throughput technologies have expanded the availability and quantity of molecular data in biology. Given the importance of interactions in biological processes, such as the interactions between proteins or the bonds within a chemical compound, this data is often represented in the form of a biological network. The rise of this data has created a need for new computational tools to analyze networks. One major trend in the field is to use deep learning for this goal and, more specifically, to use methods that work with networks, the so-called graph neural networks (GNNs). In this article, we describe biological networks and review the principles and underlying algorithms of GNNs. We then discuss domains in bioinformatics in which graph neural networks are frequently being applied at the moment, such as protein function prediction, protein–protein interaction prediction and in silico drug discovery and development. Finally, we highlight application areas such as gene regulatory networks and disease diagnosis where deep learning is emerging as a new tool to answer classic questions like gene interaction prediction and automatic disease prediction from data.

Understanding many biological processes requires knowledge not only about the biological entities themselves but also the relationships among them. For example, processes such as cell differentiation depend not only on which proteins are present, but also on which proteins bind together. A natural way to represent such processes is as a graph, also called a network, since a graph can model both entities as well as their interactions.

Recent advances in experimental high-throughput technology have vastly increased the data output from interaction screens at a lower cost and resulted in a large amount of such biological network data [ 1 ]. The availability of this data makes it possible to use biological network analysis to tackle many exciting challenges in bioinformatics, such as predicting the function of a new protein based on its structure or anticipating how a new drug will interact with biological pathways. This wealth of new data, combined with recent advances in computing technology that have enabled its fast processing [ 2 , p. 440], has reignited interest in neural networks [ 3–6 ], which date back to the 1970s and 1980s, and set the stage for the emergence of deep neural networks, a.k.a. deep learning, as a new way to address these unsolved problems.

A deep learning model is a neural network composed of multiple layers with (often non-linear) activation functions, whose composition is able to model non-linear dependencies. Deep learning has shown empirically strong performance in multiple fields, such as image analysis [ 7 ] and speech recognition [ 8 ]. One of its strengths is the ability to detect complex patterns in data, making it well suited to bioinformatics, where the data represent complex, interdependent relationships between biological entities and processes that are often intrinsically noisy and occur at multiple scales [ 9 ]. Furthermore, deep learning methods have been extended to graph-structured data, making them a promising technology for tackling biological network analysis problems. The early examples of applying deep learning to biological network data, detailed in this paper, have consistently reported results comparable to or better than existing classical machine learning methods, highlighting its potential in the field.

We begin this paper by introducing biological networks and describing typical learning tasks on networks. Subsequently, we will explain the core concepts underpinning deep learning on graphs, namely graph neural networks (GNNs). Finally, we will discuss the most popular application tasks for GNNs in bioinformatics.

Biological networks

DNA, RNA, proteins and metabolites have crucial roles in the molecular mechanisms of the cellular processes underlying life. Studying their structure and interactions is fundamental for a variety of reasons, including the development of new drugs and discovery of disease pathways. Both the structure and interactions of these entities can be represented using a graph, which is comprised of a set of nodes and a set of edges representing the connections between nodes. For example, molecules can be represented as a graph, where the nodes are the atoms and the edges are the bonds between the atoms. Similarly, many biological processes can be modeled with the entities as nodes and the interactions or relationships among them as edges. The aforementioned representation as a graph is convenient for a variety of reasons. Networks provide a simple and intuitive representation of heterogeneous and complex biological processes [ 10 ]. Moreover, it facilitates modeling and understanding complicated molecular mechanisms through the use of graph theory, machine learning and deep learning techniques.

As seen above, it is possible to define biological networks at different levels of detail. Besides the graph representation of biological actors used in investigating molecular properties and functions, other common biological networks include protein–protein interaction (PPI) networks, gene regulatory networks (GRN) and metabolic networks. Additionally, because of their relevance in contemporary health research, the above definition of a biological network is extended to include drug–drug interaction (DDI) networks. In the following, we provide a brief introduction to each of these networks.

Protein–Protein Interaction Networks PPI networks represent the interactions among proteins [ 11 ]. PPIs are essential for almost all cellular functions [ 12 ], ranging from the assembly of cell structural components, i.e. the cytoskeleton, to processes such as transcription, translation and active transport [ 13 ]. PPIs also include transient interactions, i.e. protein complexes that are formed and broken easily [ 14 ]. In PPI networks, nodes correspond to proteins, while edges represent the interactions between the connected proteins [ 15 ]. An exhaustive graph representation of PPIs would also include the type of interaction, e.g. phosphorylation or binding; in practice, however, this is rarely captured.

Gene Regulatory Networks A GRN represents the complex mechanisms that regulate gene expression, the set of processes which leads to generating proteins from the DNA sequence [ 16 ]. Regulation mechanisms occur at different stages of protein production from DNA, such as during the transcription, translation and splicing phases. An intuitive explanation of these complex and interconnected mechanisms sees proteins both as the product and the controller of gene expression [ 13 ]. In GRNs, each node represents a gene, and a directed link between two genes implies that one gene directly regulates the expression of the other without mediation from other genes [ 17 ].

Metabolic Networks Metabolic networks use graphs to represent metabolism, the set of all chemical reactions that occur within a living organism to maintain life. Metabolic actors are called metabolites, and they represent the intermediate and final products of metabolic reactions. Given their complexity, metabolic networks are usually decomposed into metabolic pathways, i.e. series of chemical reactions that together perform a specific metabolic function [ 18 ]. The graph representation of metabolism maps each metabolite to a node and each reaction to a directed edge labeled with the enzyme acting as the catalyst [ 19 ].

Drug–Drug Interaction Networks The objective of DDI networks is to model the interactions among different drugs [ 20 ]. A DDI network represents drugs as nodes and their interactions as edges. Unlike the previous networks, a DDI network does not represent a biological process; however, since it is a meaningful representation of knowledge about drug interactions, DDI networks are of increasing interest to researchers. Indeed, they are widely investigated in polypharmacy research [ 21 ].

As we have seen, biological networks are a rich way of representing biological data because they capture information not only about the entity itself but also the relationship between those entities. A large amount of information about these networks is already available, and we report on some of the most relevant biological network resources used in the reviewed methods in Table 1 . Besides being an effective representation of a biological process, biological networks also unlock a suite of methods available for drawing new insights from graph data. We will introduce the classical types of problems that can be formulated on such graph-structured data in the following section.

Table 1 lists resources of the most common biological networks used in the reviewed methods. We report the name, a short description, the website and which of the reviewed methods use them. The description indicates whether the resource is a dataset (and therefore easily downloadable) or a database accessible via a web interface. The DrugBank database is included in two sections, since it is used to collect both drug chemical structures and information about DDIs.

Learning tasks on graphs

Learning tasks on graphs are at a high level categorized into node classification, link prediction, graph classification and graph embedding, though as we will discuss, approaches designed for one task can often be adapted to address multiple tasks. We will now explain each task in more detail.

Node Classification A typical task in biological network analysis is predicting the unknown function of a protein based on the functions of its neighbors in a PPI network. This problem, called node classification [ 77 ], is important when an input graph contains some nodes with labels, but many without, and the goal is to classify the remaining unlabeled nodes in the network. This is typically solved through some form of semi-supervised learning, where the algorithm uses the entire network as input during training with the goal of classifying all nodes. Although all nodes will be classified, the loss is calculated only on the nodes with a true label during training, thereby learning from the nodes with labels in order to classify the remaining unlabeled ones.
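A minimal sketch of this masked-loss idea, with invented predictions and labels (a real GNN pipeline would compute the same masking over its own outputs):

```python
import numpy as np

# Semi-supervised node classification: predictions exist for every node,
# but the loss is computed only where a true label is known. Values below
# are illustrative, not from any real network.
pred = np.array([[0.9, 0.1],   # node 0 (labeled, class 0)
                 [0.2, 0.8],   # node 1 (labeled, class 1)
                 [0.6, 0.4]])  # node 2 (unlabeled)
labels = np.array([0, 1, -1])  # -1 marks an unlabeled node
mask = labels >= 0

# cross-entropy over the labeled nodes only
eps = 1e-12
ce = -np.log(pred[mask, labels[mask]] + eps)
loss = ce.mean()
```

Only the two labeled nodes contribute to the loss, yet the model still produces a class prediction for the unlabeled node, which is exactly the semi-supervised setup described above.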

Link Prediction Current knowledge of interactions in biological networks is often incomplete, such as which genes regulate the expression of another in GRNs. Predicting these missing edges, i.e. link prediction [ 78 ], is a common task when working with such data, since it can be used to predict additional edges in a graph, or in the case of a weighted graph, the edge weight itself. This is also often framed as a semi-supervised learning problem, where the known links in a graph are used to predict where additional links may be present, similar to the node classification setup. Alternatively, link prediction can also be framed as a supervised learning problem, where after an embedding is learned for nodes, a secondary model is trained to predict whether there is a link between a given pair of nodes.
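One simple way to score candidate links from learned node embeddings can be sketched as follows; the embeddings here are random stand-ins for ones a trained model would produce, and the sigmoid-of-dot-product scorer is just one common choice.

```python
import numpy as np

# Random stand-in node embeddings (5 nodes, 8 dimensions); a GNN or
# embedding method would supply real ones learned from the graph.
rng = np.random.default_rng(1)
Z = rng.normal(size=(5, 8))

def edge_score(Z, u, v):
    # sigmoid(z_u . z_v): a probability-like score in (0, 1)
    # for the existence of edge (u, v)
    return 1.0 / (1.0 + np.exp(-Z[u] @ Z[v]))

s = edge_score(Z, 0, 1)   # higher score -> more likely missing edge
```

In the supervised framing described above, one could instead feed the pair of embeddings to a secondary classifier trained on known edges and non-edges.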

Graph Classification or Regression When the biological network data is comprised of multiple individual networks, such as a dataset of the 3D structure of molecules, the objective becomes predicting properties of each network, such as a molecule’s solubility or toxicity. This task, called graph classification [ 79 ], takes a dataset of graphs as its input, and then performs classification (or regression) for each individual graph. This is most commonly a supervised learning problem.

Graph Embedding Graph embedding [ 80–82 ] has the goal of finding a lower-dimensional, fixed-size vector representation of a graph, such as a PPI network, or of an element within a network, such as a protein. This is typically achieved through unsupervised learning. Because representing nodes or graphs as fixed-size vectors enables the use of any off-the-shelf machine learning algorithm, learning a graph embedding is often used as a pre-processing step before applying a standard machine learning algorithm to a particular task.

As described above, the graph representation of biological data enables the formulation of many classical learning tasks. While the high-throughput technology available today has resulted in a huge amount of such data, it has further underscored the need for novel computational methods to process and analyze it. These methods need to be both efficient, given the quantity of data, and high performing, in order to effectively replace previous methods. Deep learning can address both needs: it offers scalability for time-consuming tasks and has the potential for strong classification performance, as evidenced by the performance gains it has delivered in other fields. In the next section, we will discuss the principles and fundamental algorithms behind the deep learning approaches used on biological networks.

Deep learning methods operate on vector data, and since graph data cannot directly be converted to a vector, special methods are needed to adapt deep learning methods to work with graphs.

GNNs are a class of such methods that adapt neural network methods to work in the graph domain [ 83 ]. While the field of GNNs encompasses many different sub-architectures, such as recurrent GNNs [ 84 , 85 ], spatial-temporal GNNs [ 86 , 87 ] and graph autoencoders [ 83 ], we focus here on the ones that are currently used in biological network analysis, namely graph embedding techniques [ 80–82 ] and graph convolutional networks (GCNs) [ 83 ]. Although graph embedding techniques are not always considered a subset of GNNs, they are closely related and frequently serve as building blocks for the deep learning applications mentioned in this paper, so we describe them under the umbrella categorization of GNNs. In this section, we will first present the essential notation used when working with graphs and then the fundamental graph embedding and GCN algorithms used in bioinformatics.

We will refer to a graph |$\mathcal{G} = (V, E)$|⁠ , as the set of vertices |$V$|⁠ , with |$|V|=n$|⁠ , and the set of edges |$E$|⁠ , where |$e_{ij} \in E$| indicates an edge between |$v_i$| and |$v_j$|⁠ . Each graph |$\mathcal{G}$| can be represented by its adjacency matrix |$\textbf{A} \in \mathbb{R}^{n \times n}$|⁠ . If the graph is unweighted and undirected, any edge |$e_{ij}$| will be denoted by a |$1$| at |$\textbf{A}_{ij}$| and |$\textbf{A}_{ji}$|⁠ . Graphs with node attributes store these values in an additional matrix |$\textbf{X} \in \mathbb{R}^{n \times d}$|⁠ , where |$d$| is the dimension of the node attributes. While this section deals primarily with homogeneous, unweighted and undirected graphs, it is worth noting the diversity of graph representations. Graphs can be heterogeneous, meaning that their nodes or edges can have multiple types, such as in a knowledge graph [ 88 ]. If |$\mathcal{G}$| is a weighted graph, the entry for edge |$e_{ij}$| in |$\textbf{A}$| will be the edge weight |$w_{ij}$|⁠ , and if |$\mathcal{G}$| is a directed graph, an edge |$e_{ij}$| does not imply an edge |$e_{ji}$|⁠ , meaning |$\textbf{A}$| is not necessarily symmetric.
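As a concrete illustration of this notation, a small undirected, unweighted graph and its symmetric adjacency matrix can be built as follows (the graph itself is arbitrary):

```python
import numpy as np

n = 4                              # |V| = n = 4 vertices
edges = [(0, 1), (1, 2), (2, 3)]   # the edge set E, a simple path graph

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = 1  # edge e_ij ...
    A[j, i] = 1  # ... and, since the graph is undirected, e_ji as well
```

For a weighted graph, the `1` entries would be replaced by the weights `w_ij`; for a directed graph, only `A[i, j]` would be set, so `A` would in general not be symmetric.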

This shows the process of learning a simple graph embedding using DeepWalk. From an input graph, a fixed number of random walks are generated from each node with a predetermined length. The embeddings for each node are then learned using the Skipgram objective, where a node on the random walk is given as input to a single layer neural network. The input is compressed down to a |$d$| -dimensional representation (here, |$d=2$|⁠ ) with an embedding matrix |$W \in \mathbb{R}^{\mid V \mid \times d}$|⁠ , and then used to predict which nodes surround it on the walk. That is, a node |$v_i$| is used to predict the surrounding nodes on the walk within a given context window (here, size two): |$v_{i-2}, v_{i-1}, v_{i+1}$| and |$v_{i+2}$|⁠ . After training, this lower dimensional representation for each node, which can be easily retrieved from |$W$|⁠ , is then used as the embedding for each node. Note that DeepWalk chooses the next node in the random walk uniformly at random, and therefore can return to previous nodes in the walk, whereas node2vec introduces a parameter to control the probability of doing so.
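The walk-generation stage of DeepWalk can be sketched in a few lines; the graph, walk length and number of walks here are illustrative, and the Skipgram training itself is omitted.

```python
import random

# Toy graph as an adjacency list (node -> list of neighbors).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(start, length, rng):
    """Fixed-length walk choosing each next node uniformly among
    the current node's neighbors (so the walk may revisit nodes)."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

rng = random.Random(0)
# A fixed number of walks (here, two) started from every node.
walks = [random_walk(v, 5, rng) for v in adj for _ in range(2)]
# Skipgram then treats each walk like a "sentence": each node is used
# to predict its neighbors within the context window.
```

node2vec differs only in this sampling step, biasing the choice of the next node with its return and in-out parameters instead of sampling uniformly.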

Fundamental algorithms for deep learning on graphs

We will now detail two sub-fields, graph embedding and GCNs, which, in addition to being the most widely used architectures in bioinformatics today, are the fundamental building blocks of many other GNN architectures. The algorithms that we will present can be used to solve the learning tasks presented in the introduction, namely node classification, link prediction, graph classification/regression and graph embedding.

Graph embedding

While graph embedding is often not strictly considered a subset of GNNs, it is intertwined with them, and, given its importance for other GNNs and for bioinformatics, it is considered in detail here. Graph embedding approaches seek to learn a low-dimensional vector representation of a graph or of elements of a graph, such as its nodes. This embedding is typically then re-purposed for use in node or graph classification, or link prediction tasks.

Graph convolutional networks

GCNs are a subset of GNNs that adapt the highly successful convolutional neural network (CNN) architecture [ 92 ] to work on graph-structured data. Whereas CNNs, which are often used with images, can leverage the spatial information and relationships captured in an image because a set of images can be defined on the same regular grid, the ordering of a graph’s adjacency matrix is arbitrary, so graphs do not directly translate to the CNN framework. GCN methods define and use a spectral- or spatial-based convolution over the graph, providing a graph domain analog to the image convolution in CNNs.

Spectral methods, first introduced by Bruna et al. [ 93 ] and later Defferrard et al. [ 94 ], build a convolution by creating a spectral filter defined in the Fourier domain using the graph Laplacian. However, due to the computational complexity of the eigendecomposition of the graph Laplacian required by spectral methods, many more methods have been developed using spatial approaches, where the idea is to learn an embedding for each node by aggregating its neighborhood in each successive layer of the network. By using a permutation-invariant function for the aggregation step, such as the sum or the mean, one circumvents the arbitrary ordering of an adjacency matrix, which is what prevents a graph from being used with a standard CNN. Each additional layer incorporates information from more distant neighborhoods; the |$k^{\textrm{th}} $| layer in the network incorporates the |$k$| -hop neighborhood of a given node. Duvenaud et al. [ 95 ] provided an early example of this, defining a permutation-invariant convolution that operates over all nodes in the graph by summing the features of a node and its neighbors. While initially designed to retrieve a fixed-size vector representation of a graph, i.e. an embedding of the graph, the method was trained on graph regression tasks.
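A minimal numerical sketch of such a spatial layer, assuming a mean aggregator with self-loops followed by a linear map and ReLU (a simplified illustration, not any specific published architecture):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One spatial graph-convolution layer.
    A: (n, n) adjacency matrix, H: (n, d) node features, W: (d, d') weights."""
    A_hat = A + np.eye(len(A))              # include each node itself (self-loops)
    deg = A_hat.sum(axis=1, keepdims=True)
    agg = (A_hat @ H) / deg                 # permutation-invariant mean aggregation
    return np.maximum(agg @ W, 0.0)         # linear map followed by ReLU

# Toy path graph on 3 nodes with 2-dimensional features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
W = np.eye(2)            # identity weights, just to expose the aggregation

H1 = gcn_layer(A, H, W)  # each row now mixes 1-hop neighborhood information
H2 = gcn_layer(A, H1, W) # stacking a second layer reaches the 2-hop neighborhood
```

Because the mean is invariant to the order in which neighbors are listed, relabeling the nodes (permuting rows and columns of `A` consistently with `H`) permutes the output rows in the same way, which is the property the text describes.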

Hamilton et al. [ 64 ] posit a similar idea with their GraphSAGE algorithm, but with the goal of learning a more generalizable and computationally efficient approach to the problem. While the initial goal is node embedding, this is again done with the end goal of another task, such as node classification or link prediction. They achieve this speedup by sampling a node’s neighbors, rather than taking the entire neighborhood, and by learning an aggregation function, for which they considered the mean, max and long short-term memory (LSTM) aggregator functions.
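The neighbor-sampling idea can be sketched as follows, with a toy graph, scalar node features and a mean aggregator; the real GraphSAGE aggregators have learned weights, which are omitted here.

```python
import random

# Toy star graph: node 0 has three neighbors; hypothetical scalar features.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
feats = {0: 1.0, 1: 2.0, 2: 4.0, 3: 6.0}

def sampled_mean_aggregate(node, n_samples, rng):
    """Aggregate over a fixed-size sample of neighbors instead of the
    full neighborhood, which is what makes GraphSAGE-style methods
    efficient on high-degree nodes."""
    nbrs = adj[node]
    sample = rng.sample(nbrs, min(n_samples, len(nbrs)))
    return sum(feats[v] for v in sample) / len(sample)

rng = random.Random(1)
est = sampled_mean_aggregate(0, 2, rng)            # mean over 2 of the 3 neighbors
full = sum(feats[v] for v in adj[0]) / len(adj[0]) # full-neighborhood mean, 4.0
```

The sampled estimate varies around the full-neighborhood mean, trading a little variance for a per-node cost that is bounded regardless of degree.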

A visual depiction of a |$k$| -layer GCN. The input is the adjacency matrix |$\textbf{A} \in \mathbb{R}^{n \times n}$| of a graph and node attribute matrix |$\textbf{X} \in \mathbb{R}^{n \times d}$|⁠ . Each layer of the GCN aggregates over the neighborhood of each node, using the node representations from the previous layer in the network. The aggregations in each layer then pass through an activation function (here, |$ReLU$|⁠ ) before going to the next layer. This network can be used to produce various different outputs: for predicting new edges in the input network (link prediction), classifying individual nodes in the input graph (node classification), or classifying the entire input graph (graph classification). In order to perform graph classification, an additional readout step (here, the sum over all nodes) is required to map the output from |$\mathbb{R}^{n \times c}$| to |$\mathbb{R}^{c}$|⁠ . The color represents the predicted classes for the respective entity in the output.

Gilmer et al. [ 36 ] provide an interpretation of graph convolutions from a message passing point of view, where each node sends and receives messages from its neighbors and thereby updates its state. At the end of the network, a readout step aggregates the node states to the appropriate level of output (e.g. from the node level to the graph level). Impressively, Gilmer et al. show that many of the papers mentioned here translate directly into their framework, and their neural message passing has thus become a leading paradigm in GNNs today. Furthermore, they test various configurations of such a scheme and identify the best configuration for predicting molecular properties.

These approaches to GCNs can also be understood as a neural network analog to the Weisfeiler–Lehman kernel for measuring graph similarity [ 97 , 98 ], which is based on the classic Weisfeiler–Lehman test of isomorphism [ 99 ], a comparison which Kipf and Welling [ 96 ] and Hamilton et al. [ 64 ] make explicitly. By aggregating over all neighbors of a node, using the identity matrix for |$\textbf{W}$|⁠ , and setting |$\sigma $| to an appropriate hash function, one effectively recovers the Weisfeiler–Lehman algorithm. The adaptations in GCNs can therefore be seen as a differentiable and continuous extension of the Weisfeiler–Lehman algorithm and kernel.
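One refinement step of the Weisfeiler–Lehman test can be sketched as follows; the graph and initial labels are illustrative, and Python's built-in `hash` stands in for the hash function that plays the role of the activation in the analogy above.

```python
# Toy path graph on 4 nodes as an adjacency list.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def wl_step(labels):
    """One Weisfeiler-Lehman refinement: replace each node's label with a
    hash of its own label together with the sorted multiset of its
    neighbors' labels (sorting makes the step permutation-invariant)."""
    new = {}
    for v, nbrs in adj.items():
        signature = (labels[v], tuple(sorted(labels[u] for u in nbrs)))
        new[v] = hash(signature)
    return new

labels = {v: 0 for v in adj}  # start with uniform labels
labels = wl_step(labels)
# After one step, the degree-1 endpoints (0 and 3) share a label that
# differs from the label shared by the interior nodes (1 and 2).
```

A GCN layer replaces the hash with a differentiable aggregation and learned weights, which is exactly the "continuous extension" the text describes.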

In an entirely different approach to deep learning on graphs, Niepert et al. [ 30 ] solve the node correspondence problem by imposing an ordering upon the graph, which opens the door to a more traditional CNN structure. Rather than using the full graph as input, their method defines a common fixed-size grid representation for all graphs. The entries in the grid are filled by the |$j$| most important nodes in a graph, according to some predefined importance measure, as well as the |$k$| closest neighbors of each of the |$j$| nodes. Any node and/or edge attributes associated with the nodes in question can also be included. In doing so, graphs of different sizes are all standardized to the same size grid, which enables learning with a standard CNN filter.

In all these approaches, training is done by iteratively calculating a task-specific loss function over all relevant samples (such as the labeled nodes or the graphs). The loss is then propagated back through the network via backpropagation: the gradient of the weights |$\textbf{W}$| is calculated and |$\textbf{W}$| is adjusted according to a pre-defined update equation.
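In miniature, with a deliberately trivial one-parameter linear model and plain gradient descent as the update equation (all values here are illustrative), the loop looks like:

```python
import numpy as np

W = np.array([0.0])                        # the weights to be learned
x, y = np.array([2.0]), np.array([3.0])    # one labeled sample
lr = 0.1                                   # learning rate

for _ in range(100):
    pred = x * W                           # forward pass of the (trivial) model
    grad = 2 * x * (pred - y)              # gradient of the squared-error loss w.r.t. W
    W = W - lr * grad                      # pre-defined update equation
# W converges toward 1.5, where x * W matches the label y.
```

Real GNN training differs only in scale: the forward pass is the stack of graph-convolution layers, the gradient is obtained by backpropagation through them, and the update is usually a variant such as Adam rather than plain gradient descent.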

In reviewing the different applications of deep learning on biological networks, we encountered varying degrees to which network information was included. We therefore had to define what constituted deep learning on a biological network. From the deep learning point of view, we defined this as learning approaches based on a hierarchy of non-linear functions. This review accordingly focuses on deep learning methods and does not summarize methods using classic machine learning algorithms, such as kernel methods, SVMs and random forests, though we will discuss how the new deep learning methods perform relative to their classic counterparts. Secondly, we had to define what qualifies as a biological network, since some methods use features of a graph without explicitly leveraging the graph structure. As an example, one could build a feature vector based on the node label counts of amino acids in a protein. Whether to include such an example is not always straightforward. We ultimately decided to include any method that explicitly discusses or generates features from graph properties.

We will now discuss some of the main use cases of biological network analysis and deep learning. We begin with the more established practices, namely in protein analysis and drug development and discovery. We will then discuss the application areas in which deep learning is emerging as a competitive alternative to current methods, such as in disease diagnosis and the analysis of gene regulatory and metabolic networks. We provide information about the implementations of the various methods in Table 2 in the Supplementary Materials. In general, the performance of the reviewed methods has been assessed using a classic cross-validation framework. Some papers go further and use an additional external validation dataset to test the generalizability of the proposed approach. Furthermore, some works validate their de novo predictions through literature research or by performing lab experiments. When either of these is the case, it is explicitly mentioned.

Proteins play a pivotal role in many biological processes, and thus better understanding their roles and interactions with one another is critical to answering a variety of biological questions. Deep learning has emerged as a promising new way to answer some of these classic questions. In this section we will focus on three main categories of deep learning tasks on proteins: predicting whether a pair of proteins will interact, determining the function of a given protein and predicting the 3D structure of proteins.

Protein interaction prediction

As mentioned in the introduction, nodes in a PPI network are proteins and the edges between nodes represent an interaction. Given a graph of proteins with edges representing known protein interactions, the goal is to predict which other pairs of proteins in the graph are also likely to interact. From a graph-theoretic point of view, this is a link prediction problem. Using GCNs enables these methods to directly incorporate network information, which is typically not included in classical machine learning methods. Traditionally, many methods use the primary structure of amino acid sequences in order to vectorize a protein and perform classification. However, recent methods that leverage the graph structure have shown stronger performance than those using only the sequence information and are discussed in more detail below.

As a broader assessment of classic approaches, Yue et al. [ 41 ] evaluate state-of-the-art network-based methods from other fields on bioinformatics tasks, to provide a baseline performance for the field to improve upon. The evaluated approaches generally combine a network embedding with another deep learning method to predict links in a PPI network; the authors concluded that the more recent neural-network-based embedding approaches showed the most potential on bioinformatics tasks and outperformed the traditional methods.

Liu et al. [ 60 ] augment protein interaction prediction from a pure sequence-based vector approach to one that also incorporates network information using a GCN. They propose learning a representation of each node by using a generic GCN framework on a PPI with an encoding of the primary structure sequences of the protein. The representations of each pair of proteins are later used as the input to a deep neural network to predict whether a pair of proteins will interact. This approach extends the previous work of DeepPPI [ 59 ], which used deep learning on a vector summary of the protein sequences to predict links. DeepPPI outperformed classical methods such as SVM, random forest, and naive Bayes, across a variety of metrics including accuracy, precision and recall. Liu et al.’s model surpassed even DeepPPI’s performance, showing the value of incorporating the network information into the model.

Zhang and Kabuka [ 100 ] attempt to capture the complexity of protein data and directly use topological features by incorporating multiple modalities of the data, such as the first and second order similarity, and the homology features extracted from protein sequences. They pre-process the data by forming a vector summary for each protein based on features such as the amino acid composition and then use a combination of unsupervised and supervised learning approaches to predict the interaction. Besides having better accuracy and precision than classical methods such as nearest neighbor and naive Bayes, they also showed that their method's state-of-the-art prediction performance was maintained across datasets from eight different species.

Protein function prediction

Another area of protein analysis lies in predicting the function of a protein, given that manual assessment of the large amounts of data resulting from high-throughput experiments is slow and costly. There are two typical ways in which this question is posed: as a node classification task or as a graph classification task. As we will discuss below, the new deep learning methods reviewed here are typically compared to state-of-the-art methods based on classical machine learning approaches and are reported to outperform them.

Node Classification In a node classification approach, the input is a PPI where only the function of some nodes (i.e. proteins) is known. The task is to classify the unknown nodes’ function. Some of the previously discussed methods for predicting PPIs were also used to classify the nodes in the network. For example, two of the classic GCN algorithms described in the Section “Graph neural networks,” GraphSAGE [ 64 ] and node2vec [ 54 ], were validated on PPI datasets and used to predict the function of proteins within the network. Additionally, Zhang and Kabuka’s approach [ 100 ] to predict PPIs was also extended to classify the function of a given protein. Similarly, Yue et al. [ 41 ] also evaluate the performance of various network algorithms on the task of node prediction to predict the function of proteins.

In a new approach, Gligorijević et al. [ 71 ] consider the idea of representing PPI networks using multiple representations of the same network, where each network contains different information but shares the same set of nodes. They create a vector representation of each node using Random Walk with Restarts from Cao et al. [ 101 ], then construct a positive point-wise mutual information matrix for each of the adjacency matrices, which is used as the input to a multimodal deep autoencoder. This setup allows multiple PPIs to be given as input and facilitates the integration of all this information, ultimately yielding a low-dimensional vector which is then given to an SVM for protein function classification. The authors found that the deep autoencoder learned a richer and more complex vector embedding of the network, leading to better performance than the previous state-of-the-art methods based on classical machine learning.

Zeng et al. [ 56 ] seek to identify essential proteins from a PPI network. They learn a dense vector representation of each node using node2vec [ 54 ], and combine that with a representation learned from gene expression profiles using an RNN. This is then passed through a regular fully connected network in order to classify each node as an essential or non-essential protein. They again compare their method with classic machine learning approaches such as SVMs, decision trees and random forests, and find that it outperforms all of them across metrics such as accuracy, recall and AUC. Furthermore, through an ablation study the authors revealed that the component most critical to performance was the network embedding of the PPI, showcasing the valuable information captured in a network.

OhmNet [ 65 ] provides yet another approach, learning node representations in an unsupervised manner from multiple layers of PPI networks generated from different tissues. This provides a more informative view of cellular function by incorporating the differences across tissues. The representation is learned from the network architecture, in an extension of node2vec [ 54 ] to multi-scale graphs, and is later used to classify protein function in the network. The authors compare their method against classic approaches such as tensor factorization and SVMs, as well as against baseline network embedding methods like LINE and node2vec, and found superior performance to all of them in terms of AUROC and AUPRC. They attribute the benefit to their multi-scale view of the proteins across tissues, which previous methods often modeled as a single network.

Graph Classification The second type of approach takes the graph of a protein’s secondary structure elements as input and classifies it into a functional group. While there are many classical methods that tackle this problem, as in [ 102 ], deep learning offers an alternative way to address it. Several of the classic GCN methods mentioned in the Section “Graph neural networks” use protein function prediction as an application of their method, such as Niepert et al. [ 30 ]. The formulation of the question is quite similar to that of drug property prediction, discussed further in the subsection “Prediction of drug properties,” except that the task is classification rather than regression. Given the strong overlap, we leave the discussion of specific methods to that subsection.

Protein structure prediction

A related problem to protein function prediction is protein structure prediction. Since the 3D structure of a protein largely informs its function, these two problems are interlinked. Recent work has focused on developing methods to predict the 3D structure of a protein from its genetic sequence, also known as the protein folding problem. Although there were previous efforts to use deep learning to predict residue contacts to help solve the protein folding problem [ 103 , 104 ], AlphaFold [ 76 ] represents a groundbreaking approach that set a new baseline substantially above both deep learning and traditional approaches and is thus the only article we discuss in detail here. AlphaFold, like other approaches, begins with the sequence of amino acids as the basis upon which it predicts the 3D structure. This input is combined with other feature information gathered from protein databases, and a CNN is used to predict the discrete probability distribution of the distances between all pairs of amino acids, as well as the probability distribution of the torsion angles. Predicting the distances and their distributions yielded more informative and accurate results than previous approaches that only predicted whether two residues were in contact. The authors used the distances and the torsion angles, in conjunction with a penalty if the prediction caused atoms to overlap, to define a potential assessing the quality of a prediction. They then performed stochastic gradient descent on this potential to iteratively improve the predicted structures. This approach yielded unprecedented results and gave insight into the potential of deep learning for addressing some of the most challenging bioinformatics problems.

Drug development, discovery and polypharmacy

Deep learning has recently been used to improve two steps of the process of drug discovery and development [ 105 ], namely: (i) screening thousands of chemical compounds to find the ones that react with a previously identified therapeutic target, and (ii) studying the properties of the potential drug candidates, e.g. toxicity, or absorption, distribution, metabolism, and excretion (ADME). There is interest in improving the screening step since it is laborious, expensive and time-consuming. We begin this section by reviewing papers that present deep learning methods as an alternative to the current manual screening process, often called drug-target prediction. Then, we summarize deep learning approaches whose aim is to predict drug properties. Subsequently, we discuss the increased interest in identifying which combinations of drugs, known as polypharmacy, can be effective for treating human diseases whose mechanisms are too complicated to be treated with a single drug [ 106 ]. However, such therapeutics can have undesired side effects due to interactions among the combined drugs [ 107 ]. It is therefore crucial to identify DDIs, which is nearly impossible to do manually. We present the papers that try to address this problem by combining deep learning approaches with DDI networks.

Drug–target prediction

After the identification of a therapeutically relevant target, i.e. a protein, it is essential to properly determine its interactions with different chemical compounds to characterize their binding affinity, i.e. drug-target interactions (DTIs). This testing process is usually referred to as a screening, and its output consists of a list of potential drug candidates showing high binding affinities with the target. As already mentioned, manual screening is expensive and time-consuming since it must be performed on thousands of molecules to find a single drug. Deep learning methods try to overcome this limitation, often using DDI networks. Drug-target interaction prediction within the graph deep learning framework is therefore typically formulated as a link prediction problem. Graph-based deep learning methods have proven capable of effectively tackling the drug-target prediction problem, with various methods achieving superior performance to previous state-of-the-art approaches.

Some of these methods follow a systemic approach, where several biological networks (PPIs, DDIs) are taken into account in order to solve the prediction problem. An interesting paper belonging to this category is from Manoochehri et al. [ 40 ], which proposes an encoder-decoder GCN to predict the interactions among potential drugs and a therapeutic target. The method takes as input a heterogeneous network whose nodes are drugs, proteins, diseases and side effects; edges exist when nodes are connected by a relationship whose interaction type determines the edge label, such as drug-drug and protein-protein similarities, and drug-protein, drug-drug, protein-protein, drug-disease and drug-side effect interactions. The authors combine different data resources in order to construct this network. The encoder takes the described network as input and returns an embedding of the nodes, which is used by the decoder to capture drug-protein interactions. The output of this procedure is the estimated likelihood of the existence of an edge between pairs of proteins and drugs. The stability and flexibility of the proposed method are evaluated on substantial variations of the heterogeneous network.

Zeng et al. [ 28 ] follow a similar systemic approach to solve the DTI prediction problem, proposing a method called deepDTnet. Both [ 40 ] and deepDTnet [ 28 ] outperform the state-of-the-art methods in the field. In addition, deepDTnet is compared to classic machine learning approaches, namely random forests, SVMs, k-nearest neighbors and naive Bayes, and outperforms them on an additional external validation set, demonstrating the generalizability of the method. Additionally, deepDTnet shows higher robustness than the baselines, since it performs well on drugs or targets with high or low connectivity as well as high or low chemical similarity. deepDTnet’s predictions were further validated in an in vivo lab experiment.

Another category of methods characterizes DTIs by considering and analyzing the molecular structure of drugs and targets. In [ 26 ], the authors propose a GCN approach to the DTI prediction problem whose input consists of two graphs, a protein pocket graph and a 2D drug molecular graph. Their method is composed of two steps, namely (i) a preliminary unsupervised phase consisting of an autoencoder used for learning general pocket features, and (ii) a supervised graph convolutional binding classifier. The latter is composed of two GCN models working in parallel, i.e. a pocket and a drug GCN, which extract features from the protein pocket graph and the 2D molecule graph, respectively. A subsequent layer is responsible for integrating the two, generating a joint drug-target fingerprint, which is then classified into “binding” and “non-binding” classes. The authors compare their model with existing deep learning methods and docking programs popular in the field and report better performance. Results obtained on an external validation dataset showed the higher generalizability of [ 26 ] in comparison to the baselines.

Fout et al. [ 75 ] introduce another method to predict whether a given pair of proteins will interact for the purpose of drug-target prediction. In this approach, two graphs are given as input: the ligand protein and the receptor protein. The nodes in both graphs correspond to residues, and each node is connected to the |$k$| closest other nodes, determined by the mean distance between their atoms. Rather than simply predicting whether a pair of proteins interact, their method predicts where specifically on the proteins the interaction will occur. Their method extends the fingerprint method introduced by Duvenaud et al. [ 95 ], but allows different weightings of the center node vs. its neighbors by training separate weights, and enables the inclusion of edge features. This approach outperformed the other state-of-the-art method, which was based on an SVM.

Lastly, PotentialNet is a family of GCNs proposed by Feinberg et al. [108] which differs from the previous approaches in that it considers the non-covalent interactions among different molecules as input, in addition to the graph molecular structure. More specifically, the method includes three stages: (i) a graph convolution over covalent bonds only, (ii) a simultaneous covalent and non-covalent propagation which takes into account the spatial information between atoms, and (iii) a graph gather step performed only on the ligand atoms, whose representation is derived from both bonded ligand information and spatial proximity to protein atoms. The cross-validation strategy in [108] is particularly interesting, since it tests PotentialNet’s generalization capabilities by mimicking realistic DTI prediction scenarios, e.g. predicting affinity properties of unseen molecules. Furthermore, PotentialNet is comparable to the classic machine learning state-of-the-art methods in the molecular affinity prediction field.
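The three-stage scheme can be illustrated with a toy propagation rule. The sketch below is an assumption-laden simplification of PotentialNet: the gate is a generic GRU-like update rather than the exact gated graph network used in [108], and the adjacency matrices, sizes and weights are arbitrary.

```python
import numpy as np

def gated_prop(h, A, W):
    """One propagation step over adjacency A with a simple GRU-like gate."""
    m = A @ h @ W                    # aggregate messages from neighbors
    z = 1.0 / (1.0 + np.exp(-m))     # update gate
    return z * np.tanh(m) + (1 - z) * h

rng = np.random.default_rng(1)
n, d = 8, 5                          # 8 atoms (first 3 = ligand), 5 features
h = rng.normal(size=(n, d))
A_cov = (rng.random((n, n)) < 0.2).astype(float)  # covalent bonds
A_non = (rng.random((n, n)) < 0.2).astype(float)  # spatial (non-covalent) contacts
W = rng.normal(size=(d, d)) * 0.1

h = gated_prop(h, A_cov, W)           # stage (i): covalent edges only
h = gated_prop(h, A_cov + A_non, W)   # stage (ii): covalent + non-covalent
ligand_repr = h[:3].sum(axis=0)       # stage (iii): gather over ligand atoms
```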

End-to-End Drug Discovery & Development While the approaches described above solve just the screening step, Stokes et al. [23] recently introduced an approach to tackle the entire drug discovery and target validation pipeline. Motivated by both the marked increase in antibiotic-resistant bacteria and the difficulty of discovering new antibiotics, the authors propose a deep learning approach to identify molecules showing growth inhibition against a target bacterium, namely E. coli. Their research is directed towards the discovery of candidates whose molecular structure differs from currently available and known antibiotics. Unlike the other drug–target prediction methods, this is a graph classification problem. A directed message passing neural network named Chemprop [109] is trained with a feature-enriched graph representation of molecules labeled according to their action against E. coli. Since this step mainly captures local properties, a global molecular feature representation [110] is also given to the classifier. After the learning step, the obtained classifier is deployed on several chemical libraries, containing more than 107 million molecules, to obtain a list of potential candidate compounds that could be antibacterial against E. coli. The identified molecules are then filtered according to the clinical phase of the investigation and to pre-defined scores penalizing similarity with training molecules and toxicity. This procedure led to the identification of halicin from the Drug Repurposing Hub. Halicin’s properties and mechanisms of action were experimentally investigated, and the results proved its antibacterial activity against E. coli and other bacteria in mice, showing that deep learning can make the antibiotic discovery screening process more time- and cost-effective.
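The directed message passing idea underlying Chemprop, where hidden states live on directed edges and the message into edge (u, v) sums the states of edges entering u except the reverse edge (v, u), can be sketched on a toy molecule. Sizes, weights and iteration count here are illustrative, not Chemprop's actual configuration.

```python
import numpy as np

# Tiny triangle "molecule": directed edges (u, v)
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0)]
rng = np.random.default_rng(2)
d = 4
msg = {e: rng.normal(size=d) for e in edges}  # one hidden vector per directed edge
W = rng.normal(size=(d, d)) * 0.1

for _ in range(3):  # message passing iterations
    new = {}
    for (u, v) in edges:
        # sum messages entering u, excluding the reverse edge v -> u
        inc = sum((msg[(w, x)] for (w, x) in edges if x == u and w != v),
                  np.zeros(d))
        new[(u, v)] = np.tanh(inc @ W)
    msg = new

# Readout: node representation = sum of incoming edge messages
node_repr = np.stack([sum((msg[(w, x)] for (w, x) in edges if x == u),
                          np.zeros(d)) for u in range(3)])
```

A molecule-level vector for classification would then be obtained by pooling the node representations and concatenating the global descriptors mentioned above.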

Prediction of drug properties

After the screening step, which provides a list of molecules showing high affinity with the therapeutic target, the properties of these candidates have to be investigated. This becomes a graph classification or regression problem. We review methods that seek to predict such properties, including the absorption, distribution, metabolism and excretion (ADME), stability, solubility, toxicity and quantum properties of chemical compounds represented as graphs. The methods detailed below achieve competitive results when compared to their classic machine learning counterparts. This highlights the effectiveness of deep learning at capturing meaningful information from the graph structure, and therefore its potential to provide an alternative to classic state-of-the-art methods for predicting drug properties.

ADME prediction is the objective of Chemi-Net [111], a method which combines a GCN with a multi-task deep neural network that can simultaneously solve multiple learning tasks. Chemi-Net’s input is a molecule represented by two feature sets, describing atoms and atom pairs respectively. The first operation projects the assembled atom and atom-pair descriptors onto a 3D space to obtain a molecule-shaped graph structure. The latter undergoes a series of graph convolution operations, whose output is then reduced to a single fixed-size molecule embedding during the readout step. ADME prediction is obtained after this final embedding passes through several fully connected layers. The authors compare the results obtained by employing the GCN’s embeddings, i.e. single-task Chemi-Net, with those achieved when using traditional property descriptors. Chemi-Net outperforms the baseline on almost all datasets, except the small noisy ones. The authors overcome this limitation by means of a multi-task learning framework, which allows them to leverage the information enclosed in large datasets to compensate for the small ones.
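The multi-task idea, one shared embedding feeding several per-endpoint output heads, can be sketched as below. The endpoint names, layer sizes and weights are hypothetical; this is not Chemi-Net's architecture.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 16
emb = rng.normal(size=(10, d))             # 10 molecules, shared graph embeddings
W_shared = rng.normal(size=(d, 8)) * 0.1   # shared fully connected layer

# One small output head per ADME endpoint; endpoint names are illustrative
heads = {t: rng.normal(size=(8, 1)) * 0.1 for t in ("clearance", "solubility")}
hidden = np.maximum(emb @ W_shared, 0.0)   # shared hidden representation
preds = {t: (hidden @ W).ravel() for t, W in heads.items()}
```

Because the shared layer is trained on all endpoints jointly, gradients from data-rich tasks regularize the representation used by data-poor ones.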

Stability is another crucial property to be investigated in the drug discovery and development process. The method proposed in DeepChemStable [112] aims to predict the stability of chemical compounds from their graph representation by combining a GCN and an attention mechanism. The GCN captures the molecular structure at a local level, while the attention mechanism learns the global graph information. DeepChemStable also investigates which features cause the instability of a chemical compound, which yields more interpretable results. The authors contrast DeepChemStable with a naive Bayes-based baseline, showing the potential of the proposed deep learning framework. DeepChemStable and the baseline are comparable in terms of AUC and precision, while DeepChemStable is superior in terms of recall. PotentialNet [108], introduced in the subsection “Drug-target prediction” for DTI prediction, has further applications in drug molecular property prediction, where its performance is also competitive with or superior to existing methods.
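A common way to let an attention mechanism summarize global graph information is an attention-weighted readout over node features, sketched below. This is a generic illustration under assumed shapes and random weights, not DeepChemStable's exact mechanism.

```python
import numpy as np

def attention_readout(h, a, W):
    """Pool node features into one graph vector with learned attention weights."""
    scores = h @ a                       # (n,) attention logits, one per atom
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                 # softmax over nodes
    return (alpha[:, None] * (h @ W)).sum(axis=0)

rng = np.random.default_rng(4)
h = rng.normal(size=(7, 6))              # 7 atoms, 6 local GCN features each
g = attention_readout(h, rng.normal(size=6), rng.normal(size=(6, 12)))
```

The attention weights alpha also offer the kind of interpretability noted above: atoms with large weights are the ones the model considers most responsible for the prediction.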

Additionally, several of the fundamental GCN algorithms have addressed the problem of drug property prediction. As discussed earlier, Duvenaud et al. [95] propose a neural network based approach for computing a fingerprint for each molecule, which is then used to predict drug properties such as solubility, drug efficacy and organic photovoltaic efficiency, and show improved performance relative to the state-of-the-art circular fingerprint method. Kearnes et al. [33] expand upon this idea by performing convolutions on edge information in addition to node information. The Patchy-San algorithm by Niepert et al. [30], also previously discussed, was used to classify molecules according to their carcinogenicity, achieving similar or better classification accuracy than classic kernel-based methods. Finally, as previously mentioned, Gilmer et al. [36] iterate upon existing GNN methods (reframed as message passing) to find the configuration that best predicts molecular properties among the existing deep learning approaches.
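The neural fingerprint of Duvenaud et al. replaces the discrete hashing and indexing of circular fingerprints with smooth counterparts: a neighborhood "hash" computed by a neural layer, and a softmax "index" write into the fingerprint vector. A toy numpy version, with assumed sizes and random weights:

```python
import numpy as np

adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], float)   # tiny 3-atom chain molecule
h = np.eye(3)                        # one-hot atom features
rng = np.random.default_rng(10)
W = rng.normal(size=(3, 3)) * 0.5    # smooth neighborhood "hash" weights
Wout = rng.normal(size=(3, 4))       # soft "index" weights
fp = np.zeros(4)                     # fingerprint of length 4

for _ in range(2):                   # fingerprint "radius"
    h = np.tanh((adj + np.eye(3)) @ h @ W)          # hash atom + neighbors
    s = h @ Wout
    e = np.exp(s - s.max(axis=1, keepdims=True))
    fp += (e / e.sum(axis=1, keepdims=True)).sum(axis=0)  # soft index writes
```

Because every operation is differentiable, the fingerprint can be trained end-to-end for the property of interest, unlike a fixed circular fingerprint.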

DDI prediction

As introduced previously, polypharmacy is a promising treatment approach for complex diseases, but it comes with a cost: the possibility of undesirable interactions among co-administered drugs, i.e. polypharmacy side effects. Such side effects have often been reported by patients affected by multiple illnesses who have been treated with several drugs simultaneously. Since laboratory screenings of DDIs are very challenging and expensive, there is growing interest in studying and predicting drug interactions using computational methods. Therefore, this section reviews deep learning approaches that use biological networks to predict the interaction among drugs, which is usually formulated as a link prediction problem. As detailed below, the reviewed graph-based deep learning methods outperform, often significantly, the classic machine learning and deep learning methods used as baselines, showing that graph-based deep learning approaches can capture meaningful insights into the DDI prediction problem.

Decagon [46] is an innovative GCN method for multi-relational link prediction which operates on large multimodal graphs where nodes, i.e. proteins and drugs, are connected through diverse kinds of edges according to the interaction type. These multimodal networks are constructed by combining PPI, DDI and drug–protein interaction networks. Once the multimodal network is obtained, Decagon performs two main steps: an encoding and a decoding process. The first step is executed by a GCN, which takes the graph as input and produces an embedding for each node. The second step is carried out by a tensor factorization decoder, which models polypharmacy side effects from the node embeddings given as input. One of the major strengths of Decagon is its capability to identify not only the presence of an interaction between drugs, but also its type. Decagon outperforms the state-of-the-art baselines, e.g. classic machine learning approaches for link prediction, methods for representation learning on graphs and methodologies for multi-relational tensor factorization, by an average of 20%, and in some cases by as much as 69%. The authors furthermore note the importance of including the PPI network in such analyses: 68% of drug combinations have no common targets, suggesting that the PPI network may provide the critical link connecting the distinct protein targets of interacting drugs.
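Decagon's decoder scores a candidate (drug u, side effect r, drug v) triple by combining the encoder's node embeddings with a per-relation diagonal matrix and a global interaction matrix. A minimal sketch of this tensor-factorization form, with illustrative dimensions and random parameters in place of learned ones:

```python
import numpy as np

def decode(z_u, z_v, D_r, R):
    """Tensor-factorization decoder: per-side-effect diagonal D_r around a
    global interaction matrix R, scored through a sigmoid."""
    s = z_u @ D_r @ R @ D_r @ z_v
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(5)
d = 8
z = rng.normal(size=(4, d))          # embeddings for 4 drugs (from the GCN encoder)
R = rng.normal(size=(d, d)) * 0.1    # shared across all side effects
D = {r: np.diag(rng.random(d)) for r in range(3)}  # one diagonal per side effect

p = decode(z[0], z[1], D[2], R)      # probability that drugs 0 and 1 cause side effect 2
```

Sharing R across relations while learning only a diagonal per side effect keeps the parameter count manageable even with hundreds of side-effect types.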

Another encoder–decoder method for multi-relational link prediction is presented in [27]. The proposed method, HLP, is designed to operate on a multi-graph representation of DDIs, defined as a network having drugs as nodes and multiple interactions as edges among node pairs. What makes HLP an interesting method is its ability to capture the global graph structure in addition to the local neighborhood information. HLP shows enhanced performance when compared with similar multi-relational link prediction models and with Decagon [46]. However, Decagon is tailored to networks composed of relationships between drugs and also proteins, which according to Decagon’s authors is important to include, while HLP works and is tested on DDI networks only.

Ma et al. [43] propose yet another approach for DDI prediction, integrating multiple sources of information and using an attention mechanism to learn the appropriate weight associated with each view, resulting in interpretable drug similarity measures. They use a GCN architecture to build an autoencoder, with one GCN as the encoder and another as the decoder. Each drug is a node in their graph, but the input contains multiple views defined over the same nodes, where an edge in each view encodes the similarity between the node features of that view. Ultimately, they obtain a node embedding for each drug and recover a single adjacency matrix that captures the information across views, from which drug–drug interactions can be predicted. Ma et al.’s method is compared with several baselines, such as nearest neighbor, label propagation, multiple kernel learning and the non-probabilistic GAE model in [96]. Results show that [43] significantly outperforms the baselines in both the binary and multi-label prediction settings.
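The view-fusion step can be sketched as a softmax-weighted combination of per-view similarity matrices, which is one simple reading of the attention mechanism described above; the weights and matrices below are random placeholders, not the learned ones.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
views = [rng.random((n, n)) for _ in range(3)]   # one similarity matrix per view
views = [(v + v.T) / 2 for v in views]           # symmetrize each view

att = rng.normal(size=3)                         # learned per-view attention logits
w = np.exp(att - att.max())
w /= w.sum()                                     # softmax -> view weights
A_fused = sum(wi * v for wi, v in zip(w, views)) # single fused adjacency matrix
```

The learned weights w are directly inspectable, which is where the interpretability of the drug similarity measures comes from.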

As previously mentioned, DDIs represent a promising research direction for finding therapies for complex diseases. Therefore, besides the prediction of side effects from multiple drugs, many efforts are currently aimed at the discovery of polypharmacy treatments. Jiang et al. [55] propose an approach to predict synergistic drug combinations against different cancer cell lines. The authors formulate the problem as a link prediction task. The input is a heterogeneous network, different for each cancer cell line under study, obtained by combining synergistic DDI, DTI and PPI networks. The method, whose algorithm builds upon Decagon [46], uses a GCN encoder followed by a matrix decoder to predict the synergy score among pairs of drugs. The method proposed by Jiang et al. [55] shows improved performance in comparison to an SVM, random forest, elastic net and feature-based deep learning methods, and is comparable to a state-of-the-art approach popular in the field. Finally, the authors apply the method to predict de novo combinations of drugs and find that some of them have already been reported in the literature as synergistic against cancer.

Another line of research leverages GCN methods for personalized drug combination predictions. An approach sharing this aim is GAMENet [44]. GAMENet combines the patient representation, obtained by an embedding network followed by a dual recurrent neural network, with the network information derived from a memory module. The latter is based on a GCN and captures information from two networks, namely the graph representation of longitudinal patient electronic health records (EHR) and a DDI network.

CompNet, proposed in [45], is another method for supporting doctors in the prescription of drug combinations. In particular, EHR data, prescribed drug records and adverse DDI networks are used for learning patient and drug information representations, which are then combined to obtain the prediction. The module encoding the drug information, referred to as a medicine knowledge graph representation module, is constructed using a relational GCN. Both GAMENet [44] and CompNet [45] are subjected to an ablation study to assess the importance of including DDI information; in both cases, including the DDI network enhances the performance significantly. Furthermore, GAMENet and CompNet outperformed several state-of-the-art and classic machine learning approaches across various effectiveness measures, including F1, Jaccard coefficient and DDI rate. In addition, CompNet’s authors compare its performance with that of GAMENet. CompNet outperforms GAMENet in terms of the Jaccard coefficient, recall, F1 and DDI rate, whereas GAMENet is superior only in terms of precision. CompNet’s authors argue that recall is more important than precision when the aim is to recommend combinations of drugs: such prediction systems are a support tool for doctors, and the objective is therefore to provide them with a wide and comprehensive screening of drug co-administration possibilities, rather than a precise but limited list.

A different way of handling DDI prediction is presented in [25]. The authors propose a method to enhance DDI extraction from texts by using a graph representation of the drugs under study. This approach concatenates the results of a CNN applied to textual drug pairs with those obtained by applying a GCN to their graph molecular structures. Such an approach is motivated by the fact that a lot of information about interactions among different drugs is available in the literature but is not always reported in DDI databases or easily available when prescribing drugs, while at the same time the molecular structure encloses meaningful information for interaction prediction. Results show that [25] has comparable performance to state-of-the-art deep learning approaches, including that of Zeng et al. [113], on which [25] is based and which outperforms the classic machine learning methods used as baselines. Moreover, [25] shows that including information on the molecular structure considerably enhances text-based DDI predictions.

Disease diagnosis

In the last few years, investigating disease diagnosis through deep learning has been of great interest to the research community. However, methods which use graphs, and in particular biological networks, remain a minority. The work proposed in [62] is situated in this small research area. The authors aim to predict lung cancer from a PPI network integrated with gene expression data by using a combination of spectral clustering and CNNs. They try different configurations of the proposed method to identify the one which performs best, and evaluate their method in terms of accuracy, precision and recall.

Additionally, Rhee et al. [72] propose another example of deep learning on biological networks, performing breast cancer sub-type classification. Their method integrates a GCN and a relational network (RN) and takes as input a PPI network enriched with gene expression data. Exploiting the GCN, their approach is capable of learning local graph information, while the RN permits capturing complex patterns among sets of nodes. The GCN and RN outputs are combined to obtain the classification results. The method is compared to SVMs, random forest, k-nearest neighbors and multinomial and Gaussian naive Bayes, with performance assessed through a Monte-Carlo cross-validation experiment. The results show that the proposed method outperforms the baselines across all metrics used, indicating that learning a PPI network feature representation by means of a GCN may significantly help in capturing patterns in gene expression data.
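The relational-network component can be illustrated by the classic RN form: a shared function applied to every ordered pair of node embeddings, then summed. The single-layer pair function below is an assumed simplification of the RN used by Rhee et al.:

```python
import numpy as np

def relation_network(h, W, b):
    """Apply a shared layer g to every ordered pair of node embeddings
    and sum the results into one set-level representation."""
    n = h.shape[0]
    pairs = np.concatenate([np.repeat(h, n, axis=0),   # embedding of node i
                            np.tile(h, (n, 1))],        # embedding of node j
                           axis=1)                      # all (i, j) pairs
    return np.maximum(pairs @ W + b, 0.0).sum(axis=0)   # sum over pairs

rng = np.random.default_rng(9)
h = rng.normal(size=(5, 3))          # per-gene embeddings, e.g. from a GCN
out = relation_network(h, rng.normal(size=(6, 4)) * 0.1, np.zeros(4))
```

Because the pair function is shared and the sum is permutation-invariant, the output depends on which genes co-occur with which, not on their ordering.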

Apart from disease diagnosis using the biological networks described in the introduction, there are also studies that use different types of networks, such as RNA-disease associations or graphs obtained by converting biomedical images, in combination with deep learning techniques. Deep learning is gaining traction in the disease diagnosis research area, and so we report on some of these approaches in the following paragraphs to demonstrate how broad this field is, despite these networks not being conventionally considered biological networks.

The next two examples are applications which employ RNA-disease and gene-disease association networks respectively. Zhang et al. [114] propose a method whose input is a graph representing the associations between diseases and RNAs, named an RNA-disease network. The authors use a GCN combined with a graph attention network to capture both the global and the local structure of the input, with the objective of predicting RNA-disease associations. The objective of Han et al. [115], instead, is to predict gene-disease associations. To this aim, the authors propose a combination of two GCNs and a matrix factorization. Disease and gene features and similarity graphs are given to two parallel GCNs, whose embeddings are combined through an inner product to obtain the prediction. Both [114] and [115] prove more effective at capturing useful information from the RNA- or gene-disease association networks than the methods used as baselines.

Besides that, research in this field has centered around converting biomedical images to graphs and then performing classification. For example, Zhang et al. [116] predict Parkinson’s disease from a graph representation of multimodal neuroimages using a classifier based on a GCN. Marzullo et al. [117] present a GCN working on a graph mapping of MRI images to predict multiple sclerosis. The use of GCNs enhances the performance with respect to the machine learning and/or deep learning baselines for both [116] and [117], showing the potential improvements that GCNs can yield in the image analysis research area.

Another example is [118], whose aim is breast cancer diagnosis from mammogram images with only a few labeled samples. The authors create pseudo-labels for the unlabeled images via graph-based semi-supervised learning, where each node is an image and an edge represents the similarity between images. A CNN is then trained on the individual images using the true and pseudo-labels. This method is a valuable contribution to the area of medical image analysis with deep learning, where large datasets are required for training to be effective. Specifically, the authors develop a strategy to overcome a typical limitation in the field, namely having few labeled data points, by using an algorithm which allows the inclusion of unlabeled data in the training procedure of the deep learning model. Results show the merits of this strategy, which drastically enhances the performance.
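Pseudo-labels of this kind can be produced by classic label propagation over the image-similarity graph; a minimal sketch follows, where the similarity matrix, clamping scheme and iteration count are assumptions for illustration rather than the exact algorithm of [118].

```python
import numpy as np

def propagate(S, y_init, labeled, iters=50):
    """Iteratively spread label mass over a similarity graph; rows with known
    labels are clamped back to their true one-hot labels after every step."""
    y = y_init.copy()
    for _ in range(iters):
        y = S @ y
        y[labeled] = y_init[labeled]  # clamp known labels
    return y.argmax(axis=1)           # hard pseudo-labels

rng = np.random.default_rng(7)
n = 6
S = rng.random((n, n))
S = (S + S.T) / 2                     # symmetric image similarities
S /= S.sum(axis=1, keepdims=True)     # row-normalize
y0 = np.zeros((n, 2))
y0[0, 0] = y0[1, 1] = 1.0             # only images 0 and 1 are labeled
pseudo = propagate(S, y0, labeled=[0, 1])
```

The CNN is then trained on all n images using the propagated labels as if they were ground truth.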

Metabolic networks and GRNs

While less extensively studied, GNNs have also been used for analyzing metabolic and GRNs. These early studies have reported promising results, showing that deep learning’s capability to capture non-linearity in the data can positively affect the study of these complex and meaningful biological networks.

Metabolic Networks Studying and reconstructing metabolic pathways is key to obtaining a better understanding of physiological processes, drug metabolism, toxicity mechanisms and more. To the best of our knowledge, the literature lacks papers investigating these networks using graph-based deep learning methods. It is possible to find plenty of work aiming to analyze, model and reconstruct metabolic pathways, or to predict drug metabolism, but it relies on classical tools [119]. Two recent papers, namely [52] and [51], fit our review topic. The method in [52] aims to predict the metabolic pathway to which a given compound belongs by means of a hybrid approach. It uses a GCN to learn a shape feature representation of a given molecular graph, which is then the input to a random forest performing the classification. The authors compare their method with several state-of-the-art machine learning approaches, showing the positive impact of employing GCNs as a means of capturing insights from the graph representation of the molecules under study. Furthermore, the authors develop a methodology to interpret the feature representation provided by the GCN in terms of chemical structure parameters, such as the diameter.

The objective of the work presented in [51] is different. The authors aim to predict the dynamical properties of metabolic pathways by leveraging the structure of their graph representation using a GNN framework. The graph representing the pathway is a bipartite graph obtained from systems biology markup language models of biochemical pathways using a Petri net modeling approach [120]. The authors contrast the proposed method with a classifier predicting the majority class in the test set and report that their method always outperforms the baseline. The method in [51] represents a computationally efficient alternative to the onerous numerical and stochastic simulations which are often used for assessing the dynamical properties of biochemical pathways.

Gene Regulatory Networks Knowledge about GRNs is essential for gaining insights into complex cellular mechanisms and may be useful for the identification of disease pathways or new therapeutic targets. Therefore, GRNs are widely investigated, with particular interest in inferring, validating and reconstructing them. Such investigations are mostly performed with classic methods, while the number of graph-based deep learning approaches is rather small, as for metabolic networks. To date, curated GRN datasets are not yet available or are difficult to obtain for a large number of organisms [49, 121]. For this reason, GRNs are mostly analyzed with unsupervised methods [121], since supervised techniques, and deep learning in particular, require a large number of well-annotated samples in order to be effective. Additionally, GRN inference is usually accomplished by employing information from gene expression data, which is intrinsically noisy [122] and therefore not ideal for training models. However, some deep learning models, specifically RNNs, report promising results, although they do not use any kind of graph information to perform the task. One example is the work in [122], which enhances the training quality by introducing a non-linear Kalman filter that deals very effectively with the noise in the data.

Despite the limitations discussed above, Turki et al. [49] present an example of a graph-based deep learning approach. The authors use an unsupervised method to obtain a preliminary version of the GRN from gene expression time series data, which is denoised through a cleaning algorithm and then used to train diverse supervised methods to perform link prediction among gene pairs. The proposed data cleaning algorithm is of crucial importance and could positively impact the field of GRN analysis, since it increases the quality of the GRN data. In more detail, the denoised features are obtained by projecting the original features onto the eigenvectors of the distance matrix of the feature vectors, calculated using the Laplacian kernel function. The supervised methods Turki et al. use after cleaning the GRN include SVMs and deep learning approaches, such as a DNN and a deep belief network. The latter two outperform the unsupervised state-of-the-art baseline, although they fail to outperform the linear SVM-based approach.
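The cleaning step can be sketched in a few lines: build a Laplacian-kernel matrix from pairwise feature distances, then project the features onto its leading eigenvectors. The kernel bandwidth and the number of retained eigenvectors below are arbitrary choices for illustration, not Turki et al.'s settings.

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(10, 4))   # noisy expression feature vectors (10 genes)

# Laplacian kernel on pairwise feature distances (bandwidth 1.0 is assumed)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
K = np.exp(-1.0 * D)

# Project the features onto the leading eigenvectors of the kernel matrix
vals, vecs = np.linalg.eigh(K)                 # symmetric -> real spectrum
top = vecs[:, np.argsort(vals)[::-1][:3]]      # 3 leading eigenvectors
X_denoised = top @ (top.T @ X)                 # low-rank reconstruction
```

Discarding the trailing eigen-directions removes the components of the features that behave most like uncorrelated noise, which is what improves the downstream link prediction.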

The promise of deep learning, based on its success in other fields [7, 8], is now also being realized across many different areas of biological network analysis. The methods we reviewed were reported to consistently match or beat previous state-of-the-art methods based on classical machine learning algorithms, providing evidence of one of deep learning’s core advantages: its strong empirical classification performance.

Another advantage of deep learning is its ability to effectively deal with large datasets [123], which can be challenging for classical machine learning methods [123, 124]. Although the training process of deep learning models with huge amounts of data is a non-trivial task, the advances in parallel and distributed computing have made training these large deep learning models possible [125, 126]. The large number of matrix multiplications, high memory requirements and easy parallelizability of neural networks have been particularly well served by the recent breakthroughs in GPU computing [2, p. 440].

Finally, given that deep learning is a learning approach based on a hierarchy of non-linear functions, it is capable of detecting patterns in raw data without explicit feature engineering. While it is not the only method that can handle non-linear relationships, the composition of many simple, non-linear layers makes it particularly adept at learning patterns at different levels of abstraction [126], enabling more complex patterns to be detected.

While deep learning methods are very promising, there are limitations and many open questions to be solved. One of the main problems with deep learning is its lack of interpretability. While there has been some recent progress in this area [127, 128], the black box nature of deep learning algorithms remains a key challenge, particularly in bioinformatics, where one is interested in understanding the mechanisms underlying the biological processes [129, 130]. Additionally, interpretability is critical in the context of models that guide medical decisions, where doctors and patients are often unlikely to trust the output of a deep learning model without sufficient understanding of the prediction process [127].

Another issue is the need for large labeled datasets, since deep neural networks have a large number of parameters and hyperparameters to tune. Although recent advances in technology enable the collection of huge amounts of data, the field of bioinformatics often suffers from quality issues with the data and a lack of reliable labels, since much of the data is unlabeled [127]. In such a scenario, training can be difficult, which limits the effectiveness of deep learning in bioinformatics, as can be seen for example in GRN analysis. Furthermore, not all application areas in bioinformatics have access to large amounts of data. In disease diagnosis, for example, data points can represent individual patients, and therefore amassing the large datasets necessary for deep learning to excel can be challenging. Moreover, access to disease-related data is often limited by privacy restrictions [131], contributing to the limited size of datasets in the field [132]. In such smaller data regimes, classical machine learning methods, which are often available in standard programming libraries, can be a suitable alternative [133], such as graph kernels [98, 102, 134, 135] and their implementations [136].

Despite these challenges, deep learning on graphs is an active area of research and is already achieving exciting results across various bioinformatics disciplines such as proteomics, drug development and discovery, disease diagnosis and more, as we have seen in this review. We can therefore anticipate the continued development of new algorithms, both within and outside bioinformatics, that can be used to analyze biological networks. Moreover, the amount of data generated from recent advancements in high-throughput technology will continue to grow, providing even more opportunities for deep learning to solve existing as well as new problems in biological network analysis.

Biological networks, such as PPI networks, DDI networks and GRNs, are a meaningful way of representing many biological processes, because they can model both the biological entities and the relationships between those entities.

The graph representation of biological networks enables the formulation of classic machine learning tasks in bioinformatics, such as node classification, link prediction and graph classification.

Deep learning methods on graphs, specifically GNNs, are a new way of solving these tasks by capturing hierarchical non-linearities in the data and neighborhood information represented by the network.

GNNs have been successfully applied in several areas of bioinformatics such as protein function prediction in proteomics and polypharmacy prediction in drug discovery & development.

GNNs are also being used to tackle questions across various emerging applications of bioinformatics, such as metabolic pathway prediction in metabolic network analysis.

This work was supported in part from the Alfried Krupp Prize for Young University Teachers of the Alfried Krupp von Bohlen und Halbach-Stiftung (K.B.) and in part from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 813533.

Giulia Muzio is a PhD student at the Machine Learning and Computational Biology Lab at ETH Zürich, whose main research interest is computational biology.

Leslie O’Bray is a PhD student at the Machine Learning and Computational Biology Lab at ETH Zürich, whose main research focus is machine learning on graphs.

Karsten Borgwardt is a Full Professor of Data Mining in the Life Sciences at ETH Zürich since 2017 and the recipient of the 2013 Alfried Krupp Award.

Reuter JA, Spacek D, Snyder MP. High-throughput sequencing technologies. Mol Cell 2015;58(4):586–97.

Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge, MA: MIT Press, 2016. http://www.deeplearningbook.org.

Werbos P. Beyond regression: new tools for prediction and analysis in the behavioral sciences. Ph.D. diss., Harvard University, 1974.

Parker DB. Learning logic. Technical report TR-47. Cambridge, MA: Center of Computational Research in Economics and Management Science, Massachusetts Institute of Technology, 1985.

LeCun Y. Une procédure d’apprentissage pour réseau à seuil assymétrique. In: Proceedings of Cognitiva 85: A la Frontière de l’Intelligence Artificielle, des Sciences de la Connaissance et des Neurosciences [in French], 1985, pp. 599–604.

Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323(6088):533–536.

Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, et al. (eds). Proceedings of the 26th International Conference on Neural Information Processing Systems, 2012, pp. 1097–105.

Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.

Peng GCY, Alber M, Buganza Tepole A, et al. Multiscale modeling meets machine learning: what can we learn? Arch Comput Methods Eng 2020;1–21.

Zhang B, Tian Y, Zhang Z. Network biology in medicine and beyond. Circ Cardiovasc Genet 2014;7(4):536–47.

De Las Rivas J, Fontanillo C. Protein–protein interactions essentials: key concepts to building and analyzing interactome networks. PLoS Comput Biol 2010;6(6):e1000807.

Raman K. Construction and analysis of protein–protein interaction networks. Autom Exp 2010;2(1):2.

Junker BH, Schreiber F. Analysis of Biological Networks. Hoboken, NJ: Wiley-Interscience, 2008.

Perkins JR, Diboun I, Dessailly BH, et al. Transient protein–protein interactions: structural, functional, and network properties. Structure 2010;18:1233–43.

Kurzbach B. Network representation of protein interactions: theory of graph description and analysis. Protein Sci 2016;25(9):1617–27.

Meng J, Huang Y. Gene Regulation. New York, NY: Springer New York, 2013, 797–801.

Wang Y. Gene regulatory networks. In: Encyclopedia of Systems Biology. New York: Springer, 2013, 801–805.

Berg JM, Tymoczko JL, Stryer L. Biochemistry. New York: W.H. Freeman, 2002.

Jeong HH, Tombor B, Albert R, et al. The large-scale organization of metabolic networks. Nature 2000;407:651–4.

Hu T, Hayton WL. Architecture of the drug–drug interaction network. J Clin Pharm Ther 2011;36(2):135–43.

Zhang L, Zhang Y, Zhao P, et al. Predicting drug–drug interactions: an FDA perspective. AAPS J 2009;11(2):300–6.

Corsello SM, Bittker JA, Liu Z, et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat Med 2017;23(4):405–8.

Stokes JM, Yang K, Swanson K, et al. A deep learning approach to antibiotic discovery. Cell 2020;180(4):688–702.

Wishart DS , Knox C , Guo AC , et al.  DrugBank: a comprehensive resource for in silico drug discovery and exploration . Nucleic Acids Res 2006 ; 34 : D668 – 72 .

Asada M , Miwa M , Sasaki Y . Enhancing drug–drug interaction extraction from texts by molecular structure information . In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics , Melbourne, Australia: Association for Computational Linguistics, 2018 , pp. 680 – 5 .

Torng W , Altman RB . Graph convolutional neural networks for predicting drug-target interactions . J Chem Inform Model 2019 ; 59 ( 10 ): 4131 – 49 .

Vaida M , Purcell K . Hypergraph link prediction: learning drug interaction networks embeddings . In: Proceedings of the 18th IEEE International Conference On Machine Learning And Applications (ICMLA) , 2019 , pp. 1860 – 5 .

Zeng X , Zhu S , Lu W , et al.  Target identification among known drugs by deep learning from heterogeneous networks . Chem Sci 2020 ; 11 : 1775 – 97 .

Debnath AK , Lopez de Compadre RL , Debnath G , et al.  Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds . Correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry 1991 ; 34 ( 2 ): 786 – 97 .

Niepert M , Ahmed M , Kutzkov K . Learning convolutional neural networks for graphs . In: Proceedings of the 33rd International Conference on Machine Learning , New York, NY, USA : PMLR , 2016 .

Wale N , Karypis G . Comparison of descriptor spaces for chemical compound retrieval and classification . In: Proceedings of the International Conference on Data Mining (ICDM) , Los Alamitos, CA : IEEE Computer Society , 2006 , 678 – 89 .

Wang Y , Xiao J , Suzek T , et al.  PubChem’s BioAssay database . Nucleic Acids Res 2012 ; 40 ( D1 ): D400 – 12 .

Kearnes S , McCloskey K , Berndl M , et al.  Molecular graph convolutions: moving beyond fingerprints . J Comput Aided Mol Des 2016 ; 30 ( 8 ): 595 – 608 .

Toivonen H , Srinivasan A , King RD , et al.  Statistical evaluation of the predictive toxicology challenge 2000–2001 . Bioinformatics 2003 ; 19 ( 10 ): 1183 – 93 .

Ramakrishnan R , Dral PO , Rupp M , et al.  Quantum chemistry structures and properties of 134 kilo molecules . Sci Data 2014 ; 1 :140022.

Gilmer J , Schoenholz SS , Riley PF , et al.  Neural message passing for quantum chemistry . In: Proceedings of the 34th International Conference on Machine Learning 2017 ; 70 : 1263 – 72 .

Mayr A , Klambauer G , Unterthiner T , et al.  DeepTox: toxicity prediction using deep learning . Frontiers in Environmental Science 2015 ; 3 : 8 .

Knox C , Law V , Jewison T , et al.  Drugbank 3.0: a comprehensive resource for ’omics’ research on drugs . Nucleic Acids Res 2011 ; 39 : D1035 – 41 .

Wishart DS , Feunang Y , Guo A , et al.  DrugBank 5.0: a major update to the DrugBank database for 2018 . Nucleic Acids Res , 46 (D1): D1074 – D1082 , 2018 .

Manoochehri HE , Pillai A , Nourani M . Graph convolutional networks for predicting drug-protein interactions . In: Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) , 2019 , pp. 1223 – 5 .

Yue X , Wang Z , Huang J , et al.  Graph embedding on biomedical networks: methods, applications and evaluations . Bioinformatics 2019 ; 36 ( 4 ): 1241 – 51 .

Tatonetti N , Patrick P , Daneshjou R , et al.  Data-driven prediction of drug effects and interactions . Sci Transl Med , 2012 ; 4 ( 125 ).

Ma T , Xiao C , Zhou J , et al.  Drug similarity integration through attentive multi-view graph auto-encoders . In: Proceedings of the 27th International Joint Conference on Artificial Intelligence , ijcai.org , 2018 , pp. 3477 – 83 .

Shang J , Xiao C , Ma T , et al.  GAMENet: graph augmented memory networks for recommending medication combination . In: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence , Palo Alto, CA, USA : AAAI Press , 2019 , AAAI-19 , pp. 1126 – 33 .

Wang S , Ren P , Chen Z , et al.  Order-free medicine combination prediction with graph convolutional reinforcement learning . In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management , 2019 , pp. 1623 – 32 .

Zitnik M , Agrawal M , Leskovec J . Modeling polypharmacy side effects with graph convolutional networks . Bioinformatics 2018 ; 34 ( 13 ): i457 – 66 .

Greenfield A , Madar A , Ostrer H , et al.  DREAM4: combining genetic and dynamic information to identify biological networks and dynamical models . PLoS One 2010 ; 5 (10):e13397 .

Madar A , Greenfield A , Vanden-Eijnden E , et al.  DREAM3: network inference using dynamic context likelihood of relatedness and the inferelator . PLoS One 2010 ; 5 ( 3 ): 1 – 14 .

Turki T , Wang JTL , Rajikhan I . Inferring gene regulatory networks by combining supervised and unsupervised methods . In: Proceedings of the 15th IEEE International Conference on Machine Learning and Applications (ICMLA) , 2016 , pp. 140 – 5 .

Le Novere N , Bornstein B , Broicher A , et al.  Biomodels database: a free, centralized database of curated, published, quantitative kinetics models of biochemical and cellular systems . Nucleic Acids Res 2006 ; 34 : D689 – 91 .

Bove P , Micheli A , Milazzo P , et al.  Prediction of dynamical properties of biochemical pathways with graph neural networks . In: Proceedings of the 11th International Conference on Bioinformatics Models, Methods and Algorithms , Setúbal, Portugal : SciTePress , 2020 , pp. 32 – 43 .

Baranwal M , Magner A , Elvati P , et al.  A deep learning architecture for metabolic pathway prediction . Bioinformatics 2019 ; 36 ( 8 ): 2547 – 53 .

Kanehisa M , Goto S . KEGG: Kyoto encyclopedia of genes and genomes . Nucleic Acids Res 2000 ; 28 ( 1 ): 27 – 30 .

Grover A , Leskovec J . node2vec: scalable feature learning for networks . In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , New York, NY : Association for Computing Machinery , 2016 , pp. 855 – 64 .

Jiang P , Huang S , Fu Z , et al.  Deep graph embedding for prioritizing synergistic anticancer drug combinations . Comput Struct Biotechnol J 2020 ; 18 : 427 – 38 .

Zeng M , Li M , Wu F-X , et al.  DeepEP: a deep learning framework for identifying essential proteins . BMC Bioinform 2019 ; 20 (16):506) .

Breitkreutz B-J , Stark C , Reguly T , et al.  The BioGRID interaction database: 2008 update . Nucleic Acids Res 2007 ; 36 : D637 – 40 .

Xenarios I , Rice DW , Salwinski L , et al.  DIP: the database of interacting proteins . Nucleic Acids Res 2000 ; 28 ( 1 ): 289 – 91 .

Du X , Sun S , Hu C , et al.  DeepPPI: Boosting prediction of protein—protein interactions with deep neural networks . J Chem Inform Model 2017 ; 57 ( 6 ): 1499 – 510 .

Liu L , Ma Y , Zhu X , et al.  Integrating sequence and network information to enhance protein-protein interaction prediction using graph convolutional networks . In: Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) , Red Hook, NY : Curran Associates , 2019 , pp. 1762 – 8 .

Das J , Yu H . HINT: high-quality protein interactomes and their applications in understanding human disease . BMC Syst Biol 2012 ; 6 : 92 .

Matsubara T , Nacher JC , Ochiai T , et al.  Convolutional neural network approach to lung cancer classification integrating protein interaction network and gene expression profiles . In: Proceedings of the 2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE) , 2018 , pp. 151 – 4 .

Schaefer MH , Fontaine JF , Vinayagam A , et al.  Hippie: integrating protein interaction networks with experiment based quality scores . PLoS One 2012 ; 7 (2) .

Hamilton WL , Ying R , Leskovec J . Inductive representation learning on large graphs . In: Proceedings of the 30th International Conference on Neural Information Processing Systems , 2017 , pp. 1024 – 34 .

Zitnik M , Leskovec J . Predicting multicellular function through multi-layer tissue networks . Bioinformatics 2017 ; 33 ( 14 ): 190 – 8 .

Peri S , Navarro JD , Amanchy R , et al.  Development of human protein reference database as an initial platform for approaching systems biology in humans . Genome Res 2003 ; 13 ( 10 ): 2363 – 71 .

Keshava Prasad TS , Goel R , Kandasamy K , et al.  Human protein reference database—2009 update . Nucleic Acids Res 2009 ; 37 : D767 – 72 .

Licata L , Briganti L , Peluso D , et al.  MINT, the molecular interaction database: 2012 update . Nucleic Acids Res 2012 ; 40 : D857 – 61 .

Cowley MJ , Pinese M , Kassahn KS , et al.  PINA v2.0: mining interactome modules . Nucleic Acids Res 2012 ; 40 : D862 – 5 .

Szklarczyk D , Franceschini A , Wyder S , et al.  String v10: protein-protein interaction networks, integrated over the tree of life . Nucleic Acids Res 2015 ; 43 : D447 – 52 .

Gligorijević V , Barot M , Bonneau R . deepNF: deep network fusion for protein function prediction . Bioinformatics , 34 (22): 3873 – 3881 , 2018 .

Rhee S , Seo S , Kim S . Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification . In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018 ; IJCAI-18 : 3527 – 34 .

Dobson PD , Doig AJ . Distinguishing enzyme structures from non-enzymes without alignments . J Mol Biol 2003 ; 330 ( 4 ): 771 – 83 .

Berman HM , Westbrook J , Feng Z , et al.  The protein data bank . Nucleic Acids Res 2000 ; 28 ( 1 ): 235 – 42 .

Fout A , Byrd J , Shariat B , et al.  Protein interface prediction using graph convolutional networks . In: Proceedings of the 31st International Conference on Neural Information Processing Systems , Red Hook, NY : Curran Associates , 2017 , pp. 6533 – 42 .

Senior AW , Evans R , Jumper J , et al.  Improved protein structure prediction using potentials from deep learning . Nature 2020 ; 577 ( 7792 ): 706 – 10 .

Bhagat S , Cormode G , Muthukrishnan S . Node classification in social networks . In: Aggarwal CC (ed). Social Network Data Analytics . New York, NY: Springer , 2011 , 115 – 48 .

Lü L , Zhou T . Link prediction in complex networks: A survey . Physica A: Statistical Mechanics and its Applications 2011 ; 390 ( 6 ): 1150 – 70 .

Tsuda K , Saigo H . Graph classification. In: Managing and Mining Graph Data . New York, NY: Springer , 2010 , 337 – 63 .

Hamilton W , Ying R , Leskovec J . Representation learning on graphs: Methods and applications . IEEE Data Eng Bull , 2017 .

Cui P , Wang X , Pei J , et al.  A survey on network embedding . IEEE Trans Knowl Data Eng 2019 ; 31 : 833 – 52 .

Cai H , Zheng VW , Chang KC . A comprehensive survey of graph embedding: Problems, techniques, and applications . IEEE Trans Knowl Data Eng 2018 ; 30 ( 9 ): 1616 – 37 .

Wu Z , Pan S , Chen F , et al.  A comprehensive survey on graph neural networks . IEEE Trans Neural Netw Lear Syst 2020 ; 1 – 21 .

Li Y , Tarlow D , Brockschmidt M , et al.  Gated graph sequence neural networks . In: Proceedings from the 4th International Conference on Learning Representations (ICLR) , New York, NY : JMLR: W&CP , 2016 .

Scarselli F , Gori M , Tsoi AC , et al.  The graph neural network model . IEEE Trans Neural Netw 2009 ; 20 ( 1 ): 61 – 80 .

Jain A , Zamir A , Savarese S , et al.  Structural-rnn: Deep leaning on spatio-temporal graphs . In: Computer Vision and Pattern Recognition , Las Vegas, Nevada: IEEE, 2016 .

Li Y , Yu R , Shahabi C , et al.  Diffusion convolutional recurrent neural network: Data-driven traffic forecasting . In: Proceedings from the 6th International Conference on Learning Representations (ICLR) , OpenReview.net , 2018 .

Fensel D , Şimşek U , Angele K , et al.  Introduction: What Is a Knowledge Graph? Cham, Switzerland: Springer International Publishing , 2020 , 1–10 .

Perozzi B , Al-Rfou R , Skiena S . Deepwalk: online learning of social representations . In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD ’14, 2014 , pp. 701 – 710 .

Tang J , Qu M , Wang M , et al.  LINE: large-scale information network embedding . In: Proceedings of the 24th International Conference on World Wide Web , New York, NY, United States : Association for Computing Machinery , 2015 .

Mikolov T , Sutskever I , Chen K , et al.  Distributed representations of words and phrases and their compositionality . In: Proceedings of the 26th International Conference on Neural Information Processing Systems, Volume 2 , Red Hook, NY, United States : Curran Associates Inc., 57 Morehouse Lane , 2013 , 3111 – 9 .

LeCun Y , Boser B , Denker JS , et al.  Backpropagation applied to handwritten zip code recognition . Neural Comput 1989 ; 1 ( 4 ): 541 – 51 .

Bruna J , Zaremba W , Szlam A , et al.  Spectral networks and deep locally connected networks on graphs . In: Proceedings from the 2nd International Conference on Learning Representations (ICLR) , OpenReview.net, 2014 .

Defferrard M , Bresson X , Vandergheynst P . Convolutional neural networks on graphs with fast localized spectral filtering . In: Proceedings of the 29th International Conference on Neural Information Processing Systems , 2016 , 3844 – 52 .

Duvenaud D , Maclaurin D , Aguilera-Iparraguirre J , et al.  Convolutional networks on graphs for learning molecular fingerprints . In: Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 2 , 2015 , 2224 – 32 .

Kipf TN , Welling M . Semi-supervised classification with graph convolutional networks . In: Proceedings from the 5th International Conference on Learning Representations (ICLR) , 2017 .

Shervashidze N , Borgwardt K . Fast subtree kernels on graphs . In: Proceedings of the 23rd International Conference on Neural Information Processing Systems , Red Hook, NY, United States : Curran Associates Inc., 57 Morehouse Lane , 2009 , pp. 1660 – 8 .

Shervashidze N , Schweitzer P , van Leeuwen EJ , et al.  Weisfeiler-Lehman graph kernels . J Mach Learn Res 2011 ; 12 : 2539 – 61 .

Weisfeiler B , Lehman AA . Reduction of a graph to a canonical form and an algebra arising during this reduction . Nauchno-Technicheskaya Informatsia , Ser. 2, 9 , 1968 .

Zhang D , Kabuka M . Multimodal deep representation learning for protein interaction identification and protein family classification . BMC Bioinformatics , 20 (16): 531 , 2019 .

Cao S , Lu W , Xu Q . Deep neural networks for learning graph representations . In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence , Palo Alto, California : The AAAI Press , 2016 , pp 1145–1152 .

Borgwardt KM , Ong CS , Schönauer S , et al.  Protein function prediction via graph kernels . Bioinformatics 2005 ; 21 : i47 – 56 .

Wang S , Sun S , Li Z , et al.  Accurate de novo prediction of protein contact map by ultra-deep learning model . PLoS Comput Biol 2017 ; 13 ( 1 ):e1005324.

Jones DT , Kandathil SM . High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features . Bioinformatics 2018 ; 34 ( 19 ): 3308 – 15 .

Hughes J , Rees S , Kalindjian S , et al.  Principles of early drug discovery . Br J Pharmacol 2011 ; 162 ( 6 ): 1239 – 49 .

Keith CT , Borisy AA , Stockwell BR . Multicomponent therapeutics for networked systems . Nat Rev Drug Discov 2005 ; 4 : 71 – 8 .

Becker ML , Kallewaard M , Caspers PWJ , et al.  Hospitalisations and emergency department visits due to drug-drug interactions: a literature review . Pharmacoepidemiol Drug Safety 2007 ; 16 ( 6 ): 641 – 51 .

Feinberg EN , Sur D , Wu Z , et al.  PotentialNet for molecular property prediction . ACS Central Sci 2018 ; 4 ( 11 ): 1520 – 30 .

Yang K , Swanson K , Jin W , et al.  Analyzing learned molecular representations for property prediction . J Chem Inform Model 2019 ; 59 : 3370 – 88 .

Landrum G . RDKit: open-source cheminformatics . http://www.rdkit.org (24 February 2020, date last accessed).

Liu K , Sun X , Jia L , et al.  Chemi-Net: a molecular graph convolutional network for accurate drug property prediction . Int J Mol Sci 2019 ; 20 : 3389 .

Li X , Yan X , Gu Q , et al.  DeepChemStable: chemical stability prediction with an attention-based graph convolution network . J Chem Inform Model 2019 ; 59 ( 3 ): 1044 – 9 .

Zeng D , Liu K , Lai S , et al.  Relation classification via convolutional deep neural network . In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers Dublin, Ireland : Dublin City University and Association for Computational Linguistics , 2014 , 2335 – 44 .

Zhang J , Hu X , Jiang Z , et al.  Predicting disease-related RNA associations based on graph convolutional attention network . In: Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) , Red Hook, NY : Curran Associates , 2019 , 177 – 82 .

Han P , Yang P , Zhao P , et al.  GCN-MF: disease-gene association identification by graph convolutional networks and matrix factorization . In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , New York, NY, United States : Association for Computing Machinery , 2019 , 705 – 13 .

Zhang M , He L , Chen K , et al.  Multi-view graph convolutional network and its applications on neuroimage analysis for parkinson’s disease . In: AMIA Annual Symposium proceedings , 2018 , pp. 1147–1156 .

Marzullo A , Kocevar G , Stamile C , et al.  Classification of multiple sclerosis clinical profiles via graph convolutional neural networks . Front Neurosci 2019 ; 13 : 594 .

Sun W , Tseng T-L , Zhang J , et al.  Enhancing deep convolutional neural network scheme for breast cancer diagnosis with unlabeled data . Comput Med Imaging Graph 2017 ; 57 : 4 – 9 .

Cuperlovic-Culf M . Machine learning methods for analysis of metabolic data and metabolic pathway modeling . Metabolites 2018 ; 8 : 4 .

Gilbert D , Heiner M , Lehrack S . A unifying framework for modelling and analysing biochemical pathways using petri nets . In: Calder M, Gilmore S (eds). Computational Methods in Systems Biology . Berlin, Heidelberg : Springer , 2007 .

Maetschke SR , Madhamshettiwar PB , Davis MJ , et al.  Supervised, semi-supervised and unsupervised inference of gene regulatory networks . Bioinformatics 2013 ; 15 ( 2 ): 192 – 211 .

Raza K , Alam M . Recurrent neural network based hybrid model for reconstructing gene regulatory network . Comput Biol Chem 2016 ; 64 : 322 – 34 .

Chen X-W , Lin X . Big data deep learning: Challenges and perspectives . IEEE Access 2014 ; 2 : 514 – 25 .

Zhou L , Pan S , Wang J , et al.  Machine learning on big data: Opportunities and challenges . Neurocomputing 2017 ; 237 : 350 – 61 .

Dean J , Corrado G , Monga R , et al.  Large scale distributed deep networks . In: Adv Neural Infor Process Syst , 2012 , pp. 1223–1231 .

LeCun Y , Bengio Y , Hinton G . Deep learning . Nature 2015 ; 521 ( 7553 ): 436 – 44 .

Ching T , Himmelstein DS , Beaulieu-Jones BK , et al.  Opportunities and obstacles for deep learning in biology and medicine . J R Soc Interface 2018 ; 15 : 20170387 .

Li Y , Huang C , Ding L , et al.  Deep learning in bioinformatics: Introduction, application, and perspective in the big data era . Methods 2019 ; 166 : 4 – 21 .

Miotto R , Wang F , Wang S , et al.  Deep learning for healthcare: review, opportunities and challenges . Brief Bioinform 2018 ; 19 ( 6 ): 1236 – 46 .

Zampieri G , Vijayakumar S , Yaneske E , et al.  Machine and deep learning meet genome-scale metabolic modeling . PLoS Comput Biol 2019 ; 15 :e1007084.

Malin BA , Emam KE , O’Keefe CM . Biomedical data privacy: problems, perspectives, and recent advances . J Am Med Inform Assoc 2013 ; 20 ( 1 ): 2 – 6 .

Min S , Lee B , Yoon S . Deep learning in bioinformatics . Brief Bioinform 2016 ; 18 ( 5 ): 851 – 69 .

Playe B , Stoven V . Evaluation of deep and shallow learning methods in chemogenomics for the prediction of drugs specificity . J Cheminform 2020 ; 12 (1):11 .

Gärtner T , Flach P , Wrobel S . On graph kernels: Hardness results and efficient alternatives . In: Learning theory and kernel machines . Springer , 2003 , pp. 129 – 143 .

Borgwardt KM , Kriegel H.-P . Shortest-path kernels on graphs . In: Fifth IEEE International Conference on Data Mining (ICDM’05) . IEEE , 2005 .

Sugiyama M , Ghisu ME , Llinares-López F , et al.  graphkernels: R and Python packages for graph comparison . Bioinformatics , 2017 ; 34 (3): 530 – 532 .

  • Online ISSN 1477-4054
  • Copyright © 2024 Oxford University Press


  • Networks in Systems Biology

Applications for Disease Modeling

  • © 2020
  • Fabricio Alves Barbosa da Silva (ORCID: https://orcid.org/0000-0002-8172-5796)
  • Nicolas Carels (ORCID: https://orcid.org/0000-0003-4547-7155)
  • Marcelo Trindade dos Santos
  • Francisco José Pereira Lopes

Scientific Computing Program (PROCC), Oswaldo Cruz Foundation, Rio de Janeiro, Brazil


CDTS, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil

Department of Computational Modeling, National Laboratory of Scientific Computing, Petrópolis, Brazil; Graduate Program in Nanobiosystems, Federal University of Rio de Janeiro, Duque de Caxias, Brazil

  • Provides a detailed analysis of the latest research into biological networks and disease modeling
  • Gathers contributions from renowned experts in the field
  • Covers topics including networks in systems biology, the computational modeling of multidrug-resistant bacteria, and systems biology of cancer

Part of the book series: Computational Biology (COBO, volume 32)

Table of contents (13 chapters)

Front Matter

Biological Networks and Methods in Systems Biology

Network Medicine: Methods and Applications

  • Italo F. do Valle, Helder I. Nakaya

Computational Tools for Comparing Gene Coexpression Networks

  • Vinícius Carvalho Jardim, Camila Castro Moreno, André Fujita

Functional Gene Networks and Their Applications

  • Hong-Dong Li, Yuanfang Guan

A Review of Artificial Neural Networks for the Prediction of Essential Proteins

  • Kele Belloze, Luciana Campos, Ribamar Matias, Ivair Luques, Eduardo Bezerra

Transcriptograms: A Genome-Wide Gene Expression Analysis Method

  • Rita M. C. de Almeida, Lars L. S. de Souza, Diego Morais, Rodrigo J. S. Dalmolin

A Tutorial on Sobol’ Global Sensitivity Analysis Applied to Biological Models

  • Michel Tosin, Adriano M. A. Côrtes, Americo Cunha

Reaction Network Models as a Tool to Study Gene Regulation and Cell Signaling in Development and Diseases

  • Francisco José Pereira Lopes, Claudio Daniel Tenório de Barros, Josué Xavier de Carvalho, Fernando de Magalhães Coutinho Vieira, Cristiano N. Costa

Disease and Pathogen Modeling

Challenges for the Optimization of Drug Therapy in the Treatment of Cancer

  • Nicolas Carels, Alessandra Jordano Conforte, Carlyle Ribeiro Lima, Fabricio Alves Barbosa da Silva

Opportunities and Challenges Provided by Boolean Modelling of Cancer Signalling Pathways

  • Petronela Buiga, Jean-Marc Schwartz

Integrating Omics Data to Prioritize Target Genes in Pathogenic Bacteria

  • Marisa Fabiana Nicolás, Maiana de Oliveira Cerqueira e Costa, Pablo Ivan P. Ramos, Marcelo Trindade dos Santos, Ernesto Perez-Rueda, Marcelo A. Marti et al.

Modelling Oxidative Stress Pathways

  • Harry Beaven, Ioly Kotta-Loizou

Computational Modeling in Virus Infections and Virtual Screening, Docking, and Molecular Dynamics in Drug Design

  • Rachel Siqueira de Queiroz Simões, Mariana Simões Ferreira, Nathalia Dumas de Paula, Thamires Rocco Machado, Pedro Geraldo Pascutti

Cellular Regulatory Network Modeling Applied to Breast Cancer

  • Luiz Henrique Oliveira Ferreira, Maria Clicia Stelling de Castro, Alessandra Jordano Conforte, Nicolas Carels, Fabrício Alves Barbosa da Silva

Back Matter

  • Systems Biology
  • Precision Medicine
  • Modeling and Simulation of Biological Systems
  • Systems Biology of Cancer

About this book

This book presents a range of current research topics in biological network modeling, as well as its application in studies on human hosts, pathogens, and diseases. Systems biology is a rapidly expanding field that involves the study of biological systems through the mathematical modeling and analysis of large volumes of biological data. Gathering contributions from renowned experts in the field, some of the topics discussed in depth here include networks in systems biology, the computational modeling of multidrug-resistant bacteria, and systems biology of cancer. Given its scope, the book is intended for researchers, advanced students, and practitioners of systems biology. The chapters are research-oriented, and present some of the latest findings on their respective topics.

Editors and Affiliations

Fabricio Alves Barbosa da Silva

Nicolas Carels

Marcelo Trindade dos Santos

Francisco José Pereira Lopes

About the editors

Dr. Fabrício Alves Barbosa da Silva is a Public Health Researcher in the Scientific Computing Program (PROCC) at the Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro, Brazil. His other publications include the Springer title Theoretical and Applied Aspects of Systems Biology (with Dr. Carels).

Dr. Nicolas Carels is a Titular Researcher in the Centre for Technological Development in Health (CDTS) at FIOCRUZ.

Dr. Marcelo Trindade dos Santos is a Full Technologist at the Department of Computational Modeling, National Laboratory of Scientific Computing (LNCC), Petrópolis, Brazil.

Dr. Francisco José Pereira Lopes is an Associate Professor in the Graduate Program in Nanobiosystems at the Federal University of Rio de Janeiro (UFRJ), Duque de Caxias, Brazil.

Bibliographic Information

Book Title: Networks in Systems Biology

Book Subtitle: Applications for Disease Modeling

Editors: Fabricio Alves Barbosa da Silva, Nicolas Carels, Marcelo Trindade dos Santos, Francisco José Pereira Lopes

Series Title: Computational Biology

DOI: https://doi.org/10.1007/978-3-030-51862-2

Publisher: Springer Cham

eBook Packages: Computer Science, Computer Science (R0)

Copyright Information: Springer Nature Switzerland AG 2020

Hardcover ISBN: 978-3-030-51861-5 (published 04 October 2020)

Softcover ISBN: 978-3-030-51864-6 (published 05 October 2021)

eBook ISBN: 978-3-030-51862-2 (published 03 October 2020)

Series ISSN: 1568-2684

Series E-ISSN: 2662-2432

Edition Number: 1

Number of Pages: X, 377

Number of Illustrations: 11 b/w illustrations, 71 illustrations in colour

Topics: Computational Biology/Bioinformatics, Systems Biology, Simulation and Modeling, Health Informatics, Cancer Research


  • Open access
  • Published: 21 February 2023

Single-cell biological network inference using a heterogeneous graph transformer

  • Anjun Ma   ORCID: orcid.org/0000-0001-6269-398X 1 , 2   na1 ,
  • Xiaoying Wang 3   na1 ,
  • Jingxian Li 3 ,
  • Cankun Wang   ORCID: orcid.org/0000-0002-0225-9855 1 ,
  • Tong Xiao 2 ,
  • Yuntao Liu 3 ,
  • Hao Cheng 1 ,
  • Juexin Wang   ORCID: orcid.org/0000-0002-2260-4310 4 , 5 ,
  • Yang Li 1 ,
  • Yuzhou Chang 1 , 2 ,
  • Jinpu Li 5 , 6 ,
  • Duolin Wang   ORCID: orcid.org/0000-0003-1649-460X 4 , 5 ,
  • Yuexu Jiang 4 , 5 ,
  • Li Su 5 , 6 ,
  • Gang Xin 2 ,
  • Shaopeng Gu 1 ,
  • Zihai Li   ORCID: orcid.org/0000-0003-4603-927X 2 ,
  • Bingqiang Liu   ORCID: orcid.org/0000-0002-5734-1135 3   na2 ,
  • Dong Xu   ORCID: orcid.org/0000-0002-4809-0514 4 , 5 , 6   na2 &
  • Qin Ma   ORCID: orcid.org/0000-0002-3264-8392 1 , 2   na2  

Nature Communications, volume 14, Article number: 964 (2023)


  • Cancer genomics
  • Computational models
  • Gene regulatory networks
  • Machine learning

Single-cell multi-omics (scMulti-omics) allows the quantification of multiple modalities simultaneously to capture the intricacy of complex molecular mechanisms and cellular heterogeneity. Existing tools cannot effectively infer the active biological networks in diverse cell types and the response of these networks to external stimuli. Here we present DeepMAPS for biological network inference from scMulti-omics. It models scMulti-omics in a heterogeneous graph and learns relations among cells and genes within both local and global contexts in a robust manner using a multi-head graph transformer. Benchmarking results indicate DeepMAPS performs better than existing tools in cell clustering and biological network construction. It also showcases competitive capability in deriving cell-type-specific biological networks in lung tumor leukocyte CITE-seq data and matched diffuse small lymphocytic lymphoma scRNA-seq and scATAC-seq data. In addition, we deploy a DeepMAPS webserver equipped with multiple functionalities and visualizations to improve the usability and reproducibility of scMulti-omics data analysis.


Introduction

Single-cell sequencing, such as single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq), reshapes the investigation of cellular heterogeneity and yields insights into neuroscience, cancer biology, immuno-oncology, and therapeutic responsiveness 1 , 2 . However, an individual single-cell modality only reflects a snapshot of genetic features and partially depicts the peculiarity of cells, leading to characterization biases in complex biological systems 2 , 3 . Single-cell multi-omics (scMulti-omics) allows the quantification of multiple modalities simultaneously to fully capture the intricacy of complex molecular mechanisms and cellular heterogeneity. Such analyses advance various biological studies when paired with robust computational analysis methods 4 .

The existing tools for integrative analyses of scMulti-omics data, e.g., Seurat 5 , MOFA+ 6 , Harmony 7 , and totalVI 8 , reliably predict cell types and states, remove batch effects, and reveal relationships or alignments among multiple modalities. However, most existing methods do not explicitly consider the topological information sharing among cells and modalities. Hence, they cannot effectively infer the active biological networks of diverse cell types simultaneously with cell clustering and have limited power to elucidate the response of these complex networks to external stimuli in specific cell types.

Recently, graph neural networks (GNN) have shown strength in learning low-dimensional representations of individual cells by propagating neighbor cell features and constructing cell-cell relations in a global cell graph 9 , 10 . For example, our in-house tool scGNN, a GNN model, has demonstrated superior cell clustering and gene imputation performance based on large-scale scRNA-seq data 11 . Furthermore, a heterogeneous graph with different types of nodes and edges has been widely used to model a multi-relational knowledge graph 12 . It provides a natural representation framework for integrating scMulti-omics data and learning the underlying cell-type-specific biological networks. Moreover, a recent development in the attention mechanism for modeling and integrating heterogeneous relationships has made deep learning models explainable and enabled the inference of cell-type-specific biological networks 12 , 13 .

In this work, we developed DeepMAPS (Deep learning-based Multi-omics Analysis Platform for Single-cell data), a heterogeneous graph transformer framework for cell-type-specific biological network inference from scMulti-omics data. This framework uses an advanced GNN model, i.e., a heterogeneous graph transformer (HGT), which has the following advantages: (i) It formulates an all-in-one heterogeneous graph that includes cells and genes as nodes and the relations among them as edges. (ii) The model captures both neighbor and global topological features among cells and genes to construct cell-cell and gene-gene relations simultaneously 9 , 14 , 15 , 16 . (iii) The attention mechanism in this HGT model enables the estimation of the importance of genes to specific cells, which discriminates gene contributions and enhances biological interpretability. (iv) The model is hypothesis-free and does not rely on the constraints of gene co-expression, thus potentially inferring gene regulatory relations that other tools usually cannot find. Notably, DeepMAPS is implemented as a code-free, interactive, and non-programmatic interface, along with a Docker image, to lessen the programming burden of scMulti-omics data analysis.

Overview of DeepMAPS

Overall, DeepMAPS is an end-to-end, hypothesis-free framework to infer cell-type-specific biological networks from scMulti-omics data. There are five major steps in the DeepMAPS framework (Fig.  1 and Methods). (i) Data are preprocessed by removing low-quality cells and lowly expressed genes, and different normalization methods are then applied according to the specific data types. An integrated cell-gene matrix is generated to represent the combined activity of each gene in each cell; different data integration methods are applied for different scMulti-omics data types 5 , 6 , 7 , 8 . (ii) A heterogeneous graph is built from the integrated matrix, with cells and genes as nodes and the presence of genes in cells as edges. (iii) An HGT model is built to jointly learn low-dimensional embeddings for cells and genes and to generate attention scores indicating the importance of a gene to a cell. (iv) Cell clusters and functional gene modules are predicted based on the HGT-learned embeddings and attention scores. (v) Diverse biological networks, e.g., gene regulatory networks (GRNs) and gene association networks, are inferred for each cell type.
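Step (i) can be sketched as below. The filtering thresholds, the `preprocess` helper, and the CPM-style log-normalization are illustrative assumptions for a toy count matrix, not DeepMAPS's actual defaults (those depend on the data type, per the Methods).

```python
import numpy as np

def preprocess(counts, min_genes_per_cell=2, min_cells_per_gene=2):
    """Step (i) sketch: drop low-quality cells and lowly expressed genes,
    then library-size normalize and log-transform. `counts` is cells x genes."""
    counts = np.asarray(counts, dtype=float)
    cell_ok = (counts > 0).sum(axis=1) >= min_genes_per_cell
    gene_ok = (counts > 0).sum(axis=0) >= min_cells_per_gene
    x = counts[cell_ok][:, gene_ok]
    size = x.sum(axis=1, keepdims=True)           # library size per cell
    x = np.log1p(1e4 * x / np.maximum(size, 1))   # CPM-like scaling + log1p
    return x, cell_ok, gene_ok

rng = np.random.default_rng(0)
raw = rng.poisson(1.0, size=(20, 30))             # toy count matrix
mat, kept_cells, kept_genes = preprocess(raw)
```

The retained matrix then feeds the integration step that produces the cell-gene activity matrix.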

figure 1

a The overall framework of DeepMAPS. Five main steps are included to carry out cell clustering and biological gene network inference from the input scMulti-omics data. b The graph autoencoder with an embedded HGT model. The integrated cell-gene matrix is used to build a heterogeneous graph including all cells (green) and genes (purple) as nodes. The HGT model is trained on multiple subgraphs (50 subgraphs as an example) that cover as many nodes of the whole graph as possible. Each subgraph is used to train the model for 100 epochs; thus, the whole training process iterates 5,000 times. The trained model is then applied to the whole graph to learn and update the embedding of each node. c An illustration of the embedding-update process for a target node in a single HGT layer. The red circle in the upper panel indicates the target node, and the black circles indicate the source nodes. Arrows represent the connections between a target node and source nodes. Colored rectangles represent embeddings of different nodes. The zoomed-in detail in the bottom panel shows the message-passing process and attention mechanism. The final output of one HGT layer is an updated node embedding for all nodes. HGT heterogeneous graph transformer.

To learn joint representations of cells and genes, we first generate a cell-gene matrix integrating the information of the input scMulti-omics data. A heterogeneous graph with cell nodes and gene nodes is then constructed, wherein an unweighted cell-gene edge represents the existence of gene activity of a gene in a cell, and the initial embedding of each node is learned from the gene-cell integrated matrix via two-layer GNN graph autoencoders (Methods). Such a heterogeneous graph offers an opportunity to clearly represent and organically integrate scMulti-omics data so that biologically meaningful features can be learned synergistically. The entire heterogeneous graph is then sent to a graph autoencoder to learn the relations between the cells and genes and update the embedding of each node. Here, DeepMAPS adopts a heterogeneous multi-head attention mechanism to model the overall topological information (global relationships) and neighbor message passing (local relationships) on the heterogeneous graph. The heterogeneous graph representation learning provides a way to enable the embedding of cells and genes simultaneously using the transformer in DeepMAPS. The initial graph determines the path of message passing and how the attention scores can be calculated in DeepMAPS.
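A minimal sketch of this graph construction, assuming a dense integrated activity matrix and a simple activity > 0 rule for drawing an unweighted cell-gene edge; `build_hetero_graph` is an illustrative helper, not the package's API.

```python
import numpy as np

def build_hetero_graph(activity, threshold=0.0):
    """Build the cell-gene bipartite graph described above: one node per cell,
    one per gene, and an unweighted edge wherever a gene is active in a cell.
    Returned as a (n_cells + n_genes) square adjacency matrix, cells first."""
    a = np.asarray(activity)                 # cells x genes
    n_cells, n_genes = a.shape
    n = n_cells + n_genes
    adj = np.zeros((n, n), dtype=int)
    edges = a > threshold
    adj[:n_cells, n_cells:] = edges          # cell -> gene
    adj[n_cells:, :n_cells] = edges.T        # gene -> cell (undirected graph)
    return adj

activity = np.array([[1.2, 0.0, 0.4],        # 2 cells x 3 genes
                     [0.0, 0.9, 0.0]])
adj = build_hetero_graph(activity)
```

By construction there are no cell-cell or gene-gene edges; those relations emerge only through message passing over the bipartite structure.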

In each HGT layer, each node (either a cell or a gene) is considered a target, and its 1-hop neighbors are sources. DeepMAPS evaluates the importance of the target's neighbor nodes and the amount of information that can be passed to the target based on the synergy of node embeddings (i.e., attention scores). As a result, cells and genes with highly positively correlated embeddings are more likely to pass messages to each other, thus maximizing the agreement between their embeddings. To make the unsupervised training process feasible on a large heterogeneous graph, DeepMAPS is trained on 50 subgraphs sampled from the heterogeneous graph, covering a minimum of 30% of cells and genes, to learn parameters shared between different nodes; this information is later used when applying the model to the whole graph. As an important training outcome, an attention score is given to represent the importance of a gene to a cell. A high attention score of a gene for a cell implies that the gene is relatively important for defining cell identity and characterizing cell heterogeneity. This discrimination allows for the construction of reliable gene association networks in each cell cluster as the final output of DeepMAPS. We then build a Steiner Forest Problem (SFP) model 17 to identify genes with higher attention scores and embedding features similar to a cell cluster. The gene-gene and gene-cell relations in the optimized solution of the SFP model mirror the embedding similarity of genes and the attention importance of genes to each cell cluster. A gene association network is then established from the genes with the highest importance in characterizing the identity of that cell cluster, based on their attention scores and embedding similarities, and these genes are considered cell-type-active.
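The attention-driven gene selection can be illustrated with a much simpler stand-in for the SFP model: rank genes by their mean attention score within each cell cluster. `top_attention_genes` and the toy attention matrix are purely illustrative.

```python
import numpy as np

def top_attention_genes(attention, cluster_labels, k=2):
    """Rank genes per cell cluster by mean attention score, a simplified
    stand-in for the SFP-based selection of cell-type-active genes.
    `attention` is cells x genes; `cluster_labels` assigns a cluster per cell."""
    attention = np.asarray(attention, dtype=float)
    out = {}
    for c in np.unique(cluster_labels):
        mean_attn = attention[cluster_labels == c].mean(axis=0)
        out[c] = np.argsort(mean_attn)[::-1][:k]   # top-k gene indices
    return out

attn = np.array([[0.9, 0.1, 0.2],
                 [0.8, 0.2, 0.1],
                 [0.1, 0.7, 0.9]])
labels = np.array([0, 0, 1])
top = top_attention_genes(attn, labels, k=1)       # gene 0 defines cluster 0
```

The real pipeline additionally requires the selected genes to have similar embeddings, which is what the SFP optimization enforces.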

DeepMAPS achieves superior performances in cell clustering and biological network inference from scMulti-omics data

We benchmarked the cell clustering performance of DeepMAPS on ten scMulti-omics datasets, including three multiple scRNA-seq datasets (R-bench-1, 2, and 3), three CITE-seq datasets (C-bench-1, 2, and 3), and four matched scRNA-seq and scATAC-seq (scRNA-ATAC-seq) datasets measured from the same cells (A-bench-1, 2, 3, and 4) (Supplementary Data 1). Specifically, the six R-bench and C-bench datasets have benchmark annotations provided in their original manuscripts, while the four A-bench datasets do not. These datasets cover cell numbers ranging from 3,009 to 32,029, average read depths (scRNA-seq data only) ranging from 2,885 to 11,127, and zero-expression rates (scRNA-seq data only) from 82% to 96% (Supplementary Data 1).

We compared DeepMAPS with five benchmarking tools (Seurat v3 and v4 5 , 18 , MOFA+ 6 , TotalVI 8 , Harmony 7 , and GLUE 19 (Methods)) in terms of the Average Silhouette Width (ASW), Calinski-Harabasz index (CH), Davies-Bouldin Index (DBI), and Adjusted Rand Index (ARI) to evaluate cell clustering performance. For each dataset, we trained DeepMAPS on 36 parameter combinations, covering the number of attention heads, the learning rate, and the number of training epochs. To ensure fairness, each benchmarking tool was also tuned with different parameter combinations (Methods). DeepMAPS achieved the best performance among all benchmarked tools on all test datasets in terms of ARI (for R-benches and C-benches) and ASW (for A-benches) (Fig. 2a, Supplementary Figs. 1-3, and Source Data 1-3). We also noticed that Seurat was the second-best-performing tool, with small variances across parameter selections in all benchmark datasets. We selected the default parameters per data type based on the performance of parameter combinations in the grid-search benchmarking: the parameter combination with the highest median ARI/ASW score averaged over all benchmark datasets was taken as the default for the corresponding data type.
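Both headline metrics are available in scikit-learn. The toy labels and points below merely show why ARI needs benchmark labels (it compares two partitions, invariant to label permutation) while ASW does not (it scores geometric separation only).

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, silhouette_score

# ARI: compares a predicted partition against benchmark labels.
truth = np.array([0, 0, 0, 1, 1, 1])
pred = np.array([1, 1, 1, 0, 0, 0])       # same partition, labels swapped
ari = adjusted_rand_score(truth, pred)    # permutation-invariant -> 1.0

# ASW: scores cluster separation from the data alone, no benchmark needed.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
asw = silhouette_score(X, truth)          # well-separated clusters -> near 1
```

This is why the text reports ARI for the annotated R-/C-benches but falls back to ASW for the unannotated A-benches.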

figure 2

a Benchmark cell clustering results of ten datasets, in ARI for the three multiple scRNA-seq data and the three CITE-seq data with benchmark labels, and in ASW for the four scRNA-ATAC-seq data without benchmark labels. Each box showcases the minimum, first quartile, median, third quartile, and maximum ARI or ASW results of a tool using different parameter settings (DeepMAPS: n = 96, Seurat: n = 16 for RNA-RNA and CITE-seq and 36 for RNA-ATAC, Harmony: n = 36, MOFA+: n = 36, TotalVI: n = 48, and GLUE: n = 72). Dots represent outliers. b Results comparison on five independent datasets. No repeated experiment was conducted. c Robustness test of DeepMAPS using the cell cluster leave-out method for the three independent test datasets with benchmark cell labels. p-values were calculated based on a two-tailed t-test. Each box showcases the minimum, first quartile, median, third quartile, and maximum ARI results of a tool performed on different data subsets (R-test: n = 5, C-test: n = 20, and A-test-1: n = 5). Dots represent outliers. d-f UMAP comparison of R-test, C-test, and A-test-1 datasets between DeepMAPS and other tools using the original cell labels. Source data are provided as a Source Data file. ASW average silhouette width, ARI adjusted Rand index.

Additional benchmarking experiments were carried out to justify the selection of different integration methods in DeepMAPS. Specifically, for the analysis of scRNA-ATAC-seq data, we designed an integration method using gene velocity to balance the weights of gene expression and chromatin accessibility in characterizing cell activities and states (Methods). This integration process harmonizes datasets (especially multiple scRNA-seq data) and generates an integrated matrix (with genes as rows and cells as columns) as the input for HGT. Our results showed that, for benchmark data 1 and 2 (A-bench-1 and -2), the velocity-based approach achieved significantly (p-value < 0.05) higher ASW scores than the weighted nearest neighbor (WNN) approach in Seurat v4.0 on all grid-search parameter combinations (Supplementary Fig. 4 and Source Data 4). We reason that, with the inclusion of velocity information, the modality weights of gene expression and chromatin accessibility that contribute to recognizing cell types are better balanced (Supplementary Fig. 5, which compares the modality weight of scATAC-seq in different cell clusters with and without the velocity-weighted balance method). In addition, we compared different clustering methods (i.e., Leiden, Louvain, and SLM) in DeepMAPS and assessed the impact of clustering resolution (i.e., 0.4, 0.8, 1.2, and 1.6) on cell clustering results. We found no significant differences among these clustering methods, with Louvain performing slightly better than the other two (Supplementary Fig. 6 and Source Data 5). Lastly, DeepMAPS achieved higher scores than the other tools at the same clustering resolution. We also found that, in most cases, higher resolutions led to lower cell clustering prediction scores; therefore, we selected a resolution of 0.4 as the default in DeepMAPS (Supplementary Fig. 7 and Source Data 6-8).
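The modality-balancing idea can be shown in a toy form, assuming the per-gene weight is already known (the paper derives it from RNA velocity; `integrate_modalities` is an illustrative helper, not the actual Methods procedure):

```python
import numpy as np

def integrate_modalities(rna, atac_gene_activity, weight):
    """Toy integration: combine gene expression and gene-level chromatin
    activity with a per-gene weight in [0, 1]. Both inputs are genes x cells;
    weight = 1 trusts RNA only, weight = 0 trusts ATAC only."""
    rna = np.asarray(rna, dtype=float)
    atac = np.asarray(atac_gene_activity, dtype=float)
    w = np.clip(np.asarray(weight, dtype=float), 0.0, 1.0)[:, None]
    return w * rna + (1.0 - w) * atac

rna = np.array([[2.0, 0.0], [1.0, 1.0]])    # 2 genes x 2 cells
atac = np.array([[0.0, 2.0], [1.0, 3.0]])
gas = integrate_modalities(rna, atac, weight=[1.0, 0.5])
```

The resulting genes-by-cells matrix plays the role of the integrated GAS matrix fed into the heterogeneous graph.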

We further independently tested our default parameter selection on five independent datasets, named R-test, C-test, A-test-1, −2, and −3, by comparing our results with the same benchmarking tools using their default parameters. For the three test datasets with benchmarking cell labels, DeepMAPS performs the best in terms of ARI score, while for the two scRNA-ATAC-seq datasets without cell labels, the benchmarking tools in the comparison achieve similar performance (Fig.  2b and Source Data  9 ). In order to evaluate the robustness of DeepMAPS, a leave-one-out test was performed on the three independent test datasets with benchmark labels (Fig.  2c and Source Data  10 ). We first removed all cells in a cell cluster based on benchmark labels and then applied DeepMAPS and other tools on the remaining cells. For each dataset, the leave-one-out results of DeepMAPS were better than the other tools with higher ARI scores, indicating that the message passing and attention mechanism used in DeepMAPS maintains cell-cell relations in a robust manner.
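The leave-one-cluster-out protocol can be sketched as below, with KMeans standing in for the clustering backend (an assumption for illustration; DeepMAPS uses its own graph-based clustering):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def leave_one_cluster_out(X, labels, held_out, n_clusters, seed=0):
    """Robustness protocol sketch: drop every cell of one benchmark cluster,
    recluster the remaining cells, and score against the remaining labels."""
    keep = labels != held_out
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    pred = km.fit_predict(X[keep])
    return adjusted_rand_score(labels[keep], pred)

# Three well-separated toy clusters; hold out the third and recluster the rest.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in (0.0, 5.0, 10.0)])
labels = np.repeat([0, 1, 2], 20)
ari = leave_one_cluster_out(X, labels, held_out=2, n_clusters=2)
```

A robust method should keep ARI high on the remaining clusters despite the missing population.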

The cell clustering UMAPs on the three independent datasets with benchmark labels showcased that the latent representations obtained by DeepMAPS better preserve the heterogeneity of scRNA-seq data (Fig. 2d-f). For the R-test dataset, all tools showed the ability to separate mesenchymal, leukocyte, and endothelial cells, but failed to separate urothelium basal cells and bladder cells. However, cells on the DeepMAPS UMAP are more compact, and the bladder cells (red dots) are grouped better than in MOFA+ and Seurat (Fig. 2d). For the C-test dataset, cells in the same cluster are more ordered and compact (e.g., the B cell and NK cell clusters), while cells from different clusters are farther apart from each other on the DeepMAPS UMAP (e.g., the CD8 and CD4 T cell clusters) (Fig. 2e). For the A-test-1 dataset, DeepMAPS was the only tool that accurately separated each cell type. In contrast, Seurat and MOFA+ mistakenly divided the PDX1 or PDX2 population into two clusters and included more mismatches (Fig. 2f).

DeepMAPS can infer statistically significant and biologically meaningful gene association networks from scMulti-omics data

We evaluated the two kinds of biological networks that DeepMAPS can infer, gene association networks and GRNs, in terms of centrality scores and functional enrichment. For the R-test dataset (Fig. 3a) and C-test dataset (Fig. 3b), we used two centrality scores that have been used in previous single-cell gene association network evaluations 20 , closeness centrality (CC) and eigenvector centrality (EC), to compare the gene association networks identified by all the tools in this comparison. CC reflects the average connectivity of a node to all others in a network, and EC reflects the importance of a node based on its connected nodes. Both CC and EC capture a node's influence and help identify genes that may play more critical roles in the network. A gene association network with higher node centrality indicates that the detected genes are more likely to be involved in critical and functional biological systems. We also constructed gene co-expression networks as a background, using all genes in a dataset, by calculating Pearson's correlation coefficients of gene expression within a cell cluster; a p-value of 0.05 was set as the edge cutoff. We compared cell-type-active gene association networks generated by DeepMAPS with those generated by IRIS3 15 and with the background co-expression networks. The average CC and EC of networks constructed by DeepMAPS on the R-test and C-test datasets were significantly higher than those of IRIS3 and the background co-expression networks (Source Data 11). We reason that the gene association network generated by DeepMAPS captures not only co-expression but also high attention impact on cells; thus, genes in the network tend to be more important to the cell type.
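Both centrality scores are available in networkx; the gene names in this toy network are illustrative only and not taken from the datasets above.

```python
import networkx as nx

# Toy gene association network: one hub gene connected to three partners.
# CC rewards short paths to every other node; EC rewards connection to
# well-connected nodes, so both should single out the hub.
G = nx.Graph([("TP53", "MDM2"), ("TP53", "CDKN1A"), ("TP53", "BAX")])
cc = nx.closeness_centrality(G)
ec = nx.eigenvector_centrality(G, max_iter=1000)
```

Averaging such per-node scores over a network gives the kind of summary compared across tools in Fig. 3a, b.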

figure 3

a , b Closeness centrality (CC) and eigenvector centrality (EC) were used to indicate the compactness and importance of genes in the network. We compared our results with IRIS3 and a background network using all genes for the R-test dataset (n = 5) a and C-test dataset (n = 14) b. p-values were calculated using a two-tailed t-test. c Comparison of the number of unique TFs in GRNs that showed significantly enriched biological functions in three public databases. Each box contains the results of six scRNA-ATAC-seq datasets (n = 6). d Comparison of the number of cell-type-specific regulons in GRNs significantly enriched in only one biological function/pathway in the three public databases (n = 6). e The F1 score comparisons of regulons enriched in only one function/pathway using the three databases (n = 6). The mean precision and recall scores of the selected six scRNA-ATAC-seq datasets were max-min scaled and shown in the heatmap, with darker blue indicating higher values and lighter blue indicating lower values. Source data are provided as a Source Data file. Each box in Fig. 3 showcases the minimum, first quartile, median, third quartile, and maximum score of the corresponding criterion. CC closeness centrality, EC eigenvector centrality, CTSR cell-type-specific regulon.

To evaluate whether DeepMAPS can identify biologically meaningful GRNs in specific cell types, we performed enrichment tests on basic gene regulatory modules (i.e., regulons 14 ) against three public functional databases, Reactome 21 , DoRothEA 22 , and TRRUST v2 23 . To avoid any bias in comparison, we compared cell-type-specific GRNs inferred by DeepMAPS with (i) IRIS3 and SCENIC 14 on the scRNA-seq matrix, (ii) IRIS3 and SCENIC on a gene-cell matrix recording the gene activity scores (GAS) calculated in DeepMAPS based on the velocity-based integration method, (iii) MAESTRO 24 on the scATAC-seq matrix, and (iv) MAESTRO on the original scRNA-seq and scATAC-seq matrices. Six datasets collected from human tissue were used (i.e., A-test-1, A-bench-2, A-bench-3, A-bench-4, A-test-1, A-test-2). We first showcased that the GRNs identified by DeepMAPS included more unique transcription factor (TF) regulations than the other tools, except for the enrichment against the DoRothEA database (Fig. 3c and Source Data 12). We reasoned that a highly cell-type-specific regulon (CTSR) should represent only one significantly enriched functionality, whereas a generic regulon might improperly contain genes involved in several pathways. Therefore, we compared the number of CTSRs enriched in exactly one function/pathway across tools. DeepMAPS outperformed (p-value < 0.05) the other tools on most of the six scRNA-ATAC-seq datasets in terms of the number of regulons enriched in only one function/pathway and the enrichment F1 scores (Fig. 3d, e and Source Data 12). For the F1 score of the enrichment test against the TRRUST v2 database, DeepMAPS (median F1 score 0.026) was slightly lower than IRIS3 using the GAS matrix (median F1 score 0.031). We also noticed that no tool achieved good enrichment in the TRRUST v2 database, mainly due to its small number of genes (on average, 10 genes regulated per TF; 795 TFs in total).
SCENIC also showed competitive scaled precision scores (scaled mean: 0.47 for Reactome, 0.66 for DoRothEA, and 0.61 for TRRUST v2) but achieved lower scaled recall scores, making its F1 scores smaller than those of DeepMAPS for most datasets. IRIS3 and SCENIC performed on the GAS matrix showed better enrichment results than on scRNA-seq data alone, indicating that integrating information from scRNA-ATAC-seq data is more useful for GRN inference than using scRNA-seq data alone.
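The enrichment comparison rests on standard precision/recall arithmetic; the numbers below are made up purely to show the F1 computation and the max-min scaling used for the Fig. 3e heatmap.

```python
import numpy as np

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def max_min_scale(x):
    """Scale a score vector to [0, 1], as done for the Fig. 3e heatmap."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

score = f1(0.5, 0.25)                    # -> 1/3
scaled = max_min_scale([0.2, 0.5, 0.8])  # -> [0.0, 0.5, 1.0]
```

The harmonic mean explains the observation above: SCENIC's high precision cannot compensate for low recall, so its F1 stays small.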

DeepMAPS accurately identifies cell types and infers cell-cell communication in PBMC and lung tumor immune CITE-seq data

We present a case study that applies DeepMAPS to a published mixed peripheral blood mononuclear cell (PBMC) and lung tumor leukocyte CITE-seq dataset (10× Genomics online resource, Supplementary Data 1) to demonstrate its capacity for modeling scMulti-omics data in characterizing cell identities. The dataset includes RNAs and proteins measured in 3485 cells. DeepMAPS identified 13 cell clusters, including four CD4+ T cell groups (naive, central memory (CM), tissue-resident memory (TRM), and regulatory (Treg)), two CD8+ T cell groups (CM and TRM), a natural killer cell group, a memory B cell group, a plasma cell group, two monocyte groups, one tumor-associated macrophage (TAM) group, and a dendritic cell (DC) group. We annotated each cluster by visualizing the expression levels of curated marker genes and proteins (Fig. 4a, b and Supplementary Data 2). Compared to cell types identified using proteins or RNA alone, we isolated or more accurately annotated cell populations that could not be characterized by individual-modality analysis. For example, the DC cluster was only successfully identified using the integrated protein and RNA data. By combining signals captured from both RNA and proteins, DeepMAPS successfully identified biologically reasonable and meaningful cell types in the CITE-seq data.

figure 4

a UMAPs for DeepMAPS cell clustering results from integrated RNA and protein data, protein data only, and RNA data only. Cell clusters were annotated based on curated marker proteins and genes. b Heatmap of curated marker proteins and genes that determine the cell clustering and annotation. c Heatmap of the Spearman correlation of top differentially expressed genes and proteins in plasma cells and memory B cells. d UMAP colored by the 51st embedding, indicating distinct embedding representations in plasma cells and memory B cells. e Expression of the top differentially expressed genes and proteins in c as a function of the 51st embedding, showing the pattern relations between plasma cells and memory B cells. Each line represents a gene/protein, colored by cell type. For each gene, a line was drawn using a loess smoothing function based on the corresponding embedding and scaled gene expression in a cell. f-h Similar visualizations for the 56th embedding comparing EM CD8+ T cells and TRM CD8+ T cells, as in c-e. i Two signaling pathways, NECTIN and ALCAM, are shown to indicate the predicted cell-cell communications between two cell clusters. A link between a filled circle (source cluster with highly expressed ligand-coding genes) and an unfilled circle (target cluster with highly expressed receptor-coding genes) indicates potential cell-cell communication via a signaling pathway. Circle colors represent different cell clusters, and circle size represents the number of cells. The two monocyte groups were merged. TRM tissue-resident memory, CM central memory, TAM tumor-associated macrophage, HGT heterogeneous graph transformer.

We then compared the modality correlation between two cell types. We used the top differentially expressed genes and proteins between memory B cells and plasma cells and performed hierarchical clustering of the correlation matrix. The result clearly stratified these features into two anticorrelated modules: one associated with memory B cells and the other with plasma cells (Fig. 4c). Furthermore, we found that the features in the two modules correlated significantly with the axis of maturation captured by our HGT embeddings (Supplementary Fig. 8 and Supplementary Data 3). For example, one HGT embedding (the 51st) showed distinctive differences between plasma cells and memory B cells (Fig. 4d, e). Similar findings were observed when comparing EM CD8+ T cells with TRM CD8+ T cells (Fig. 4f); here, too, it is possible to identify a representative HGT embedding (the 56th) that maintains embedding signals for a defined separation of the two groups (Fig. 4g, h). These results suggest that the difference between any two cell clusters consists of coordinated activation and repression of multiple genes and proteins, leading to a gradual transition in cell state that can be captured by a specific dimension of the DeepMAPS latent HGT space. In addition, we generated gene association networks from genes with high attention scores for EM CD8+ T cells, TRM CD8+ T cells, memory B cells, and plasma cells and observed diverse patterns (Supplementary Fig. 9).
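The check that features track a single latent dimension can be sketched with Spearman correlation; `features_tracking_embedding`, the toy data, and the 0.8 cutoff are illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

def features_tracking_embedding(expr, embedding_dim, threshold=0.8):
    """For each column of `expr` (cells x features), compute its Spearman
    correlation with one latent embedding dimension and flag strong trackers
    (in either direction, capturing anticorrelated modules too)."""
    expr = np.asarray(expr, dtype=float)
    rhos = []
    for j in range(expr.shape[1]):
        rho, _ = spearmanr(expr[:, j], embedding_dim)
        rhos.append(rho)
    rhos = np.array(rhos)
    return rhos, np.flatnonzero(np.abs(rhos) >= threshold)

emb = np.linspace(0.0, 1.0, 8)          # one embedding dimension across cells
expr = np.column_stack([
    emb * 2.0,                          # tracks the axis
    -emb,                               # anticorrelated module
    np.array([0.3, 0.9, 0.1, 0.8, 0.2, 0.7, 0.0, 0.6]),  # unrelated feature
])
rhos, hits = features_tracking_embedding(expr, emb)
```

Features flagged with opposite signs correspond to the two anticorrelated modules seen in Fig. 4c.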

Based on the cell types and raw data of gene and protein expressions, we inferred cell–cell communications and constructed communication networks among different cell types within multiple signaling pathways using CellChat 25 (Fig.  4i ). For example, we observed a CD6-ALCAM signaling pathway existing between DC (source) and TRM CD4 + T cells (target) in the lung cancer tumor microenvironment (TME). Previous studies have shown that ALCAM on antigen-presenting DCs interacts with CD6 on the T cell surface and contributes to T cell activation and proliferation 26 , 27 , 28 . As another example, we identified the involvement of the NECTIN-TIGIT signaling pathway during the interaction between the TAM (source) and TRM CD8 + T cells (target), which is supported by a previous report that NECTIN ( CD155 ) expressed on TAM could be immunosuppressive when interacting with surface receptors, TIGIT, on CD8 + T cells in the lung cancer TME 29 , 30 .

DeepMAPS identifies specific GRNs in diffuse small lymphocytic lymphoma scRNA-seq and scATAC-seq data

To further extend the power of DeepMAPS to GRN inference, we used a single-cell Multiome ATAC + Gene Expression dataset available on the 10× Genomics website (10× Genomics online resource). The raw data are derived from 14,566 cells of a flash-frozen intra-abdominal lymph node tumor from a patient diagnosed with diffuse small lymphocytic lymphoma (DSLL). We integrated gene expression and chromatin accessibility by balancing the weight of each modality of a gene in a cell based on RNA velocity (Fig. 5a and Methods). To build TF-gene linkages, we considered gene expression, gene accessibility, TF-motif binding affinity, peak-to-gene distance, and TF-coding gene expression. Genes found to be regulated by the same TF in a cell cluster are grouped as a regulon. We considered regulons with higher centrality scores to have greater influence on the characterization of the cell cluster. Regulons regulated by the same TF across different cell clusters were compared for differential regulon activity; those with significantly higher regulon activity scores (RAS) were considered cell-type-specific regulons in that cell cluster.
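A deliberately simplified regulon activity score makes the per-cluster comparison concrete: here RAS is just the mean integrated gene activity (GAS) of a regulon's member genes within each cluster. The paper's actual RAS definition is in its Methods; `regulon_activity` and the toy matrix are illustrative only.

```python
import numpy as np

def regulon_activity(gas, gene_idx, cluster_labels):
    """Toy regulon activity score: mean GAS of a regulon's member genes
    (columns `gene_idx`) within each cell cluster. `gas` is cells x genes."""
    gas = np.asarray(gas, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    scores = {}
    for c in np.unique(cluster_labels):
        scores[c] = gas[cluster_labels == c][:, gene_idx].mean()
    return scores

gas = np.array([[4.0, 3.0, 0.1],
                [5.0, 4.0, 0.2],
                [0.5, 0.2, 2.0],
                [0.4, 0.1, 3.0]])
labels = ["state1", "state1", "state2", "state2"]
ras = regulon_activity(gas, gene_idx=[0, 1], cluster_labels=labels)
```

A regulon whose score in one cluster is significantly above all others would be called cell-type-specific for that cluster.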

figure 5

a Conceptual illustration of DeepMAPS analysis of scRNA-ATAC-seq data. Modalities are first integrated based on a velocity-weighted balance. The integrated GAS matrix is then used to build a heterogeneous graph as input to the HGT framework. The cell clusters and gene modules with high attention scores are then used to build TF-gene linkages and determine regulons in each cell cluster. b The UMAP shows the clustering results of DeepMAPS. Cell clusters were manually annotated based on curated marker genes. c The observed and extrapolated future states (arrows) based on the RNA velocity of the normal B cells and the two DSLL states (top panel). Velocity-based trajectory analysis shows the pseudotime from the top to the bottom right (bottom panel). d The 20 TFs selected in each of the three clusters, representing the top 20 regulons with the highest centrality scores. Colors denote regulons uniquely identified in each cluster or shared between clusters. e Regulons in DSLL state-1 showing a significant difference in regulon activity compared to the other clusters. Motif shapes and the numbers of regulated genes are also shown. f Violin plots of the regulon activities of the four regulons compared across the three clusters. g The downstream regulated genes of JUN (the most differentially active regulon in DSLL state-1) in the three clusters. h An illustration of the BAFF signaling pathway identified from GAS-based cell-cell communication prediction using CellChat. The BAFF signaling pathway was found between macrophages and both DSLL states. It further activates the JUN regulon and enables the transcription of genes such as CDK6 . Figure created with BioRender.com. i The ATAC peaks, RNA expression, and GAS level of TNFRSF13B (the gene coding for TACI, the receptor in the BAFF signaling pathway). Source data are provided as a Source Data file.

DeepMAPS identified 11 cell clusters in the DSLL data, all manually annotated based on curated gene markers (Fig. 5b and Supplementary Data 4). Two DSLL-like cell clusters (DSLL state-1 and state-2) were observed. RNA velocity-based pseudotime analysis performed on the three B cell clusters (normal B cells and the two DSLL states) suggested that the two DSLL states were derived from normal B cells and that state-1 arose earlier than state-2, although the two states appeared partially mixed (Fig. 5c). We further selected the top 20 TFs with the highest regulon centrality scores in each of the three cell clusters (Fig. 5d and Source Data 13). Interestingly, these TFs distinguished the normal cluster from the two DSLL states and revealed divergent regulatory patterns between the two DSLL states. Among regulons shared by all three B cell clusters, EGR1, MEF2B, and FOS were transcriptionally active in both normal B and DSLL cells and are responsible for regulating B cell development, proliferation, and germinal center formation 31 , 32 , 33 , 34 . E2F6, ELF3, and KLF16 were shared only by the two DSLL states, with reported roles in tumorigenesis 35 , 36 , 37 , 38 , 39 , 40 . Further, JUN, MAFK, and MAFG, which encode components of activator protein-1 (AP-1) 34 , 41 , 42 , were found to be active in DSLL state-1, while NFKB1, encoding a subunit of the NF-κB protein complex 43 , 44 , was active in DSLL state-2.

We constructed a GRN consisting of the four cell-type-specific regulons (JUN, KLF16, GATA1, and FOS) in DSLL state-1 (Fig. 5e and Supplementary Fig. 10), whose RAS were significantly higher than in normal B cells and DSLL state-2 (Fig. 5f). KLF16 reportedly promotes the proliferation of both prostate 39 and gastric cancer cells 40 . FOS and JUN are transcription factors of the AP-1 family that regulate the oncogenesis of multiple types of lymphomas 34 , 41 , 42 , 45 , and GATA1 is essential for hematopoiesis, with its dysregulation implicated in multiple hematologic disorders and malignancies 46 , 47 . Distinct regulatory patterns were also observed when we zoomed in on a single regulon (Fig. 5g and Supplementary Figs. 11 - 12). As the most active regulon in DSLL state-1, JUN was found to regulate five unique downstream genes and 12 genes shared with DSLL state-2. Downstream genes, including CDK6 33 , 34 , IGF2R 48 , and RUNX1 49 , are critical for cell proliferation, survival, and developmental functions in DSLL.

Moreover, we built connections between upstream cell-cell communication signaling pathways and downstream regulatory mechanisms in DSLL cells. Based on the integrated GAS matrix, we used CellChat 25 to identify cell-cell communication between macrophages and the two DSLL states via the B cell activation factor (BAFF) signaling pathway, with BAFF as the ligand on macrophages and TACI (transmembrane activator and calcium-modulator and cyclophilin ligand interactor) as the receptor on DSLL cells (Fig. 5h). BAFF signaling is critical to the survival and maturation of normal B cells 50 , 51 , while its aberrations contribute to the resistance of malignant B cells to apoptosis 52 , 53 . We observed that the expression of the TACI-coding gene, TNFRSF13B , was markedly higher in the two DSLL states, while the corresponding chromatin accessibility maintained high peaks in state-1 (Fig. 5i). Upon engagement with its ligand, TACI has been reported to transduce the signal and eventually activate the AP-1 54 , 55 and NF-κB 56 , 57 transcriptional complexes for downstream signaling in B cells. JUN (a subunit of AP-1) was identified as the most specific and key regulator in state-1, responsible for cell proliferation and for regulating downstream oncogenes such as CDK6 , which has been reported to promote the proliferation of cancer cells in multiple types of DSLL as well as other hematological malignancies 58 , 59 , 60 . These observations suggest that BAFF signaling first appears in DSLL state-1 and triggers the activation of the JUN regulatory mechanism, leading to high JUN regulon activity. The JUN regulon then accelerates proliferation and oncogenesis specifically in DSLL, leading to a more terminally differentiated stage (state-2). As a result, state-1 comprises cells undergoing rapid proliferation and differentiation, transitioning from normal B cells to mature DSLL.
In short, DeepMAPS can construct GRNs and identify cell-type-specific regulatory patterns to offer a better understanding of cell states and developmental orders in diseased subpopulations.

DeepMAPS provides a multi-functional and user-friendly web portal for analyzing scMulti-omics data

Owing to the complexity of single-cell sequencing data, many webservers and Docker-based tools have been developed in the past three years 61 , 62 , 63 , 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 , 72 , 73 (Supplementary Data 5). However, most of these tools provide only minimal functions, such as cell clustering and differential gene analysis; they do not support the joint analysis of scMulti-omics data and especially lack support for biological network inference. We also recorded the running times of DeepMAPS and the benchmark tools on datasets with cell numbers ranging from 1000 to 160,000 (Supplementary Data 6). The deep learning models (DeepMAPS and TotalVI) have longer running times than Seurat and MOFA+. To this end, we provide a code-free, interactive interface to lessen the programming burden of scMulti-omics analysis (Fig. 6a). The webserver supports the analysis of multiple scRNA-seq data, CITE-seq data, and scRNA-ATAC-seq data using DeepMAPS (Fig. 6b). Some other methods, e.g., Seurat, are also incorporated as alternative approaches for the users' convenience. Three major steps are included in the server: data preprocessing, cell clustering and annotation, and network construction. In addition, the DeepMAPS server supports real-time computing and interactive graph representations. Users may register for an account to obtain their own workspace for storing and sharing analytical results. Beyond these features, the DeepMAPS webserver highlights an additional function for elucidating complex networks in response to external stimuli in specific cell types. The user can upload a metadata file with phenotype information (e.g., cells with and without treatment), then select and re-label the corresponding cells (e.g., CD8+ T cells with treatment and CD8+ T cells without treatment). In this way, DeepMAPS predicts the treatment-related networks in CD8+ T cells.
Examples are given in the online tutorial at https://bmblx.bmi.osumc.edu/tutorial .

Figure 6

a Software-engineering diagram of DeepMAPS and an overview of the framework. b Pipeline illustration of the server, including major steps (left; colors indicate different steps), detailed analyses (middle), and featured figures and tables (right).

DeepMAPS is a deep-learning framework that implements heterogeneous graph representation learning and a graph transformer to study biological networks from scMulti-omics data. By building a heterogeneous graph containing both cells and genes, DeepMAPS identifies their joint embedding simultaneously and enables the inference of cell-type-specific biological networks along with cell types in a single, unified framework. Furthermore, the heterogeneous graph transformer models cell-gene relations through interpretable, uniform meta-relations. In this way, training is substantially shortened while the model can still account for the influence of cells at greater distances in the graph.

By jointly analyzing gene expression and protein abundance, DeepMAPS accurately identified and annotated 13 cell types in a mixed CITE-seq dataset of PBMC and lung tumor leukocytes based on curated markers that could not be fully elucidated using a single modality. We also showed that the embedding features identified by DeepMAPS capture statistically significant signals and amplify them when the original signals are noisy. Additionally, we identified biologically meaningful cell-cell communication pathways between DC and TRM CD4+ T cells based on the gene association networks inferred in the two clusters. For scRNA-ATAC-seq, we employed an RNA velocity-based method to dynamically integrate gene expression and chromatin accessibility, which enhanced the prediction of cell clusters. Using this method, we identified distinct gene regulatory patterns among normal B cells and two DSLL developmental states. We further elucidated the biological connections between cell-cell communications and the downstream GRNs, which helped characterize and define the DSLL states. The identified TFs and genes are potential markers for further validation and immunotherapeutic targets in DSLL treatment.

While DeepMAPS offers advantages and improved performance for analyzing scMulti-omics data, there is still room for improvement. First, computational efficiency for super-large datasets (e.g., more than 1 million cells) may be a practical issue, considering the complexity of the heterogeneous graph representation (which may contain billions of edges). Moreover, DeepMAPS is recommended to run on GPUs, which raises a potential reproducibility problem: different GPU models implement floating-point arithmetic differently, which may influence the precision of loss functions during training, so DeepMAPS may generate slightly different cell clustering and network results on different GPU models. Lastly, the current version of DeepMAPS is based on a bipartite heterogeneous graph of genes and cells, and separate preprocessing and integration steps are required to map different modalities to genes for integration into a cell-gene matrix. To fully achieve an end-to-end framework for scMulti-omics analysis, the bipartite graph can be extended to a multipartite graph in which different modalities are included as disjoint node types (e.g., genes, proteins, or peak regions). Such a multipartite heterogeneous graph could also incorporate knowledge-based biological information, such as known molecular regulations, and more than two modalities in one graph. However, including more node types increases the computational burden geometrically, which will require dedicated model and parameter optimization in the future.

In summary, DeepMAPS is a pioneering study of the integrative analysis of scMulti-omics data and cell-type-specific biological network inference, and it is likely to offer new perspectives on deploying deep learning in single-cell biology. With the development and maintenance of the DeepMAPS webserver, our long-term goal is to create a deep learning-based eco-community for archiving, analyzing, visualizing, and disseminating AI-ready scMulti-omics data.

Data description

We included ten public datasets (i.e., R-bench-1-3, C-bench-1-3, and A-bench-1-4) for grid-test benchmarking of DeepMAPS and existing tools, and an additional five datasets (i.e., R-test-1, C-test-1, and A-test-1-3) for independent tests with optimized parameters. The human PBMC and lung tumor leukocyte CITE-seq data and the 10× lymph node scRNA-ATAC-seq data were used for the two case studies, respectively. All data are publicly available (Supplementary Data 1 and Data Availability).

Data preprocessing and integration

DeepMAPS takes as input the raw count matrices of multiple scRNA-seq (multiple gene expression matrices), CITE-seq (gene and surface protein expression matrices), and scRNA-ATAC-seq (gene expression and chromatin accessibility matrices) data. For each data matrix, we define modality representations (genes, proteins, or peak regions) as rows and cells as columns throughout the paper unless otherwise noted. In each data matrix, a row or a column is removed if it contains fewer than 0.1% non-zero values. Data quality control is carried out with Seurat v3, including but not limited to total read counts, mitochondrial gene ratio, and blacklist ratio. Additional data preprocessing and integration methods are described below.

Multiple scRNA-seq data

Gene expression matrices are log-normalized, and the top 2000 highly variable genes are selected using Seurat v3 18 from each matrix. If a matrix contains fewer than 2000 genes, all of them are selected for integration. We then apply the widely used canonical correlation analysis (CCA) in Seurat to align these matrices and harmonize the scRNA-seq data, leading to a matrix \(X=\{x_{ij} \mid i=1,2,\ldots,I;\ j=1,2,\ldots,J\}\) for \(I\) genes in \(J\) cells.
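As a rough illustration of this preprocessing step, the sketch below log-normalizes a genes-by-cells count matrix and ranks genes by variance. It is a simplified stand-in for Seurat's vst-based highly variable gene selection; the `select_hvg` helper and the scaling factor of 1e4 are illustrative assumptions, not the paper's code.

```python
import numpy as np

def select_hvg(counts, n_top=2000):
    """Log-normalize a genes-x-cells count matrix and pick highly variable genes.

    Simplified stand-in for Seurat's vst-based selection: genes are ranked
    by the variance of their log-normalized expression.
    """
    # Library-size normalize each cell to 10,000 counts, then log1p.
    lib = counts.sum(axis=0, keepdims=True)
    lib[lib == 0] = 1
    norm = np.log1p(counts / lib * 1e4)
    n_top = min(n_top, counts.shape[0])  # keep all genes if fewer than n_top
    order = np.argsort(norm.var(axis=1))[::-1][:n_top]
    return norm[order], order

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(50, 20)).astype(float)
hvg, idx = select_hvg(X, n_top=10)
print(hvg.shape)  # (10, 20)
```

The subsequent CCA alignment across matrices is delegated to Seurat in the paper and is not reproduced here.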

CITE-seq data

Gene and surface protein expression matrices are log-normalized. The top \(I_1\) highly variable genes (\(I_1=2000\)) and \(I_2\) proteins (\(I_2\) is the total number of proteins in the matrix) are selected. The two matrices are then concatenated vertically, leading to a matrix \(X=\{x_{ij} \mid i=1,2,\ldots,I_1+I_2;\ j=1,2,\ldots,J\}\) for \(I_1\) genes and \(I_2\) proteins in \(J\) cells (which can be treated as \(I_1+I_2=I\) genes in \(J\) cells). A centered log-ratio (CLR) transformation is performed on \(X\) as follows:

$$\tilde{x}_{ij}=\ln\frac{x_{ij}}{\big(\prod_{i'\in \mathcal{Z}_j} x_{i'j}\big)^{1/|\mathcal{Z}_j|}},\qquad i\in \mathcal{Z}_j,$$

where \(\mathcal{Z}_j\) represents the set of indices of non-zero genes in cell \(j\), and \(|\cdot|\) denotes the number of elements in the set.
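The per-cell CLR transformation can be sketched as follows; `clr_per_cell` is a hypothetical helper, with the geometric mean taken over the non-zero set of each cell as described above.

```python
import numpy as np

def clr_per_cell(X):
    """Centered log-ratio transform of a features-x-cells matrix, per cell.

    Each non-zero value is divided by the geometric mean of the non-zero
    entries of its cell, then logged; zeros are left as zero.
    """
    out = np.zeros_like(X, dtype=float)
    for j in range(X.shape[1]):
        nz = X[:, j] > 0                       # the non-zero feature set of cell j
        if not nz.any():
            continue
        geo = np.exp(np.log(X[nz, j]).mean())  # geometric mean over that set
        out[nz, j] = np.log(X[nz, j] / geo)
    return out

X = np.array([[1.0, 4.0], [4.0, 0.0], [16.0, 1.0]])
Y = clr_per_cell(X)
print(np.allclose(Y[:, 0].sum(), 0.0))  # CLR values of a fully non-zero cell sum to zero
```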

scRNA-ATAC-seq data

The gene expression matrix \(X^R=\{x^R_{ij} \mid i=1,2,\ldots,I;\ j=1,2,\ldots,J\}\) with \(I\) genes and \(J\) cells is log-normalized. A left-truncated mixture Gaussian (LTMG) model is then used to provide a qualitative representation of each gene over all cells, by modeling how underlying regulatory signals control gene expression in a cell population 74 . Specifically, if gene \(i\) can be represented by \(G_i\) Gaussian distributions over all \(J\) cells, there are potentially \(G_i\) regulatory signals regulating this gene. A matrix \(X^{R'}=\{x^{R'}_{ij}\}\) with the same dimensions as \(X^R\) can then be generated, in which gene expression is labeled by the discrete values \(x^{R'}_{ij}\in\{1,2,\ldots,G_i\}\).
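LTMG fitting itself is involved; as a loose stand-in, the sketch below discretizes one gene's expression across cells into a small number of states with a simple 1-D k-means. This is an illustrative assumption only, not the LTMG algorithm.

```python
import numpy as np

def discretize_gene(expr, n_states=2, n_iter=50, seed=0):
    """Crude stand-in for LTMG: label each cell's expression of one gene
    with one of G_i discrete states via 1-D k-means. (LTMG itself fits a
    left-truncated Gaussian mixture, which is more involved.)"""
    rng = np.random.default_rng(seed)
    centers = rng.choice(expr, size=n_states, replace=False)
    for _ in range(n_iter):
        # Assign each cell to the nearest center, then update the centers.
        labels = np.argmin(np.abs(expr[:, None] - centers[None, :]), axis=1)
        for g in range(n_states):
            if (labels == g).any():
                centers[g] = expr[labels == g].mean()
    return labels + 1  # states numbered 1..G_i as in the text

expr = np.array([0.1, 0.2, 0.15, 5.0, 5.2, 4.9])
states = discretize_gene(expr, n_states=2)
print(sorted(set(states.tolist())))  # [1, 2]
```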

The chromatin accessibility matrix is represented as \(X^A=\{x^A_{kj} \mid k=1,2,\ldots,K;\ j=1,2,\ldots,J\}\) for \(K\) peak regions in \(J\) cells. We annotate peak regions in \(X^A\) with their corresponding genes based on the method described in MAESTRO 24 . Specifically, a regulatory potential weight \(w_{ik}\) for peak \(k\) to gene \(i\) is calculated conditional on the distance of peak \(k\) to gene \(i\) in the genome:

$$w_{ik}=2^{-\frac{d_{ik}}{d_0}},$$
where \(d_{ik}\) is the distance between the center of peak \(k\) and the transcription start site of gene \(i\), and \(d_0\) is the half-decay distance (set to 10 kb). For peaks with \(d_{ik}>150\) kb, \(w_{ik}\) would be less than 0.0005, so we set it to 0 for convenience. In MAESTRO, for peaks located in exon regions, \(d_0\) is 0, so that \(w_{ik}\) is \(1\) according to the formula, and \(w_{ik}\) is instead normalized by the total exon length of gene \(i\). The reason is that, in bulk ATAC-seq data, many highly expressed genes also show ATAC-seq peaks in exon regions, mainly due to transient Pol II and other transcriptional machinery binding. Based on that observation, and to better fit the model to gene expression, MAESTRO adds the signals from the exon regions. However, because reads tend to fall in longer exons more easily than in shorter ones, it normalizes the total reads on exons by the total exon length of each gene to account for background reads. Eventually, a regulatory potential score of peak \(k\) to gene \(i\) in cell \(j\) can be calculated as \(r_{ik|j}=w_{ik}\times x^A_{kj}\). The scATAC-seq matrix \(X^A\) can then be transformed into a gene regulatory potential matrix by summing the regulatory potential scores of all peaks that regulate the same gene:

$$x^{A'}_{ij}=\sum_{k=1}^{K} r_{ik|j}=\sum_{k=1}^{K} w_{ik}\,x^A_{kj},$$

giving rise to the regulatory potential matrix \(X^{A'}=\{x^{A'}_{ij} \mid i=1,2,\ldots,I;\ j=1,2,\ldots,J\}\) for the same \(I\) genes in \(J\) cells as \(X^R\).
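The distance-decay weighting and peak-to-gene aggregation can be sketched as below; `regulatory_potential` is an illustrative helper, and the exon-region special case from MAESTRO is omitted.

```python
import numpy as np

def regulatory_potential(peak_counts, distances, d0=10_000, d_max=150_000):
    """Weight peaks by distance to one gene's TSS and sum into a gene score.

    Implements w_ik = 2^(-d_ik / d0) with weights zeroed beyond 150 kb,
    as described in the text (the exon-region special case is omitted).

    peak_counts: K x J accessibility matrix X^A
    distances:   length-K distances d_ik from each peak to the gene's TSS
    returns:     length-J regulatory potential of that gene across cells
    """
    w = np.power(2.0, -distances / d0)
    w[distances > d_max] = 0.0
    return w @ peak_counts  # sum_k w_ik * x^A_kj

XA = np.array([[2.0, 0.0], [1.0, 4.0], [3.0, 3.0]])  # 3 peaks x 2 cells
d = np.array([0.0, 10_000.0, 200_000.0])
print(regulatory_potential(XA, d))  # third peak is discarded (> 150 kb)
```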

We assume that the activity of a gene in a cell is determined by both gene expression and gene regulatory activity, with different contributions. Unlike the contribution weights determined directly from the expression and chromatin accessibility values in Seurat v4 (weighted nearest neighbor) 5 , we hypothesize that the relative contribution of the expression and chromatin accessibility of a gene in a cell is dynamic rather than static and cannot be accurately determined from a single snapshot of the cell. RNA velocity is determined by the abundance of unspliced and spliced mRNA in a cell. The amount of unspliced mRNA is determined by gene regulation and the gene transcription rate, and the amount of spliced mRNA is determined by the difference between unspliced and degraded mRNA. We reasoned that genes with positive RNA velocities in a cell have a higher potential to drive gene transcription. Thus, their regulatory activity related to chromatin accessibility has a greater influence than gene expression in defining the overall transcriptional activity of the cell at the current snapshot. For genes with negative velocities, the transcription rate tends to decelerate; hence chromatin accessibility has less influence on transcriptional activity than gene expression.

A velocity matrix \(X^V=\{x^V_{ij} \mid i=1,2,\ldots,I;\ j=1,2,\ldots,J\}\) is generated using scVelo with the default parameters 75 . Considering that some genes may fail to obtain valid velocity or regulatory potential values, we simultaneously remove the genes with all-zero rows in \(X^{A'}\) or \(X^V\) from the four matrices \(X^R\), \(X^{R'}\), \(X^{A'}\), and \(X^V\). Without loss of generality, we still use \(I\) and \(J\) to represent the sizes of these new matrices. Furthermore, considering the potential bias when interpreting the velocity of a gene in a cell, we use the LTMG representations \(x^{R'}_{ij}\in\{1,2,\ldots,G_i\}\) to discretize \(x^V_{ij}\). For gene \(i\), let \(\mathcal{J}_g\) be the set of cells in which gene \(i\) has the same LTMG signal \(g\in\{1,2,\ldots,G_i\}\); for the cells in \(\mathcal{J}_g\), we replace the original velocities with the mean velocity of gene \(i\) over these cells. To calculate a velocity weight \(\beta\) of gene \(i\) in cell \(j\), we first extract \(X^V_i=(x^V_{i1},x^V_{i2},\ldots,x^V_{iJ})\), the velocities of gene \(i\) in all cells, and \(X^V_j=(x^V_{1j},x^V_{2j},\ldots,x^V_{Ij})\), the velocities of all genes in cell \(j\). Then, for \(X^V_i\), let \(\mathcal{X}^{V+}_i=\{x^V_{ij} \mid x^V_{ij}>0,\ j=1,2,\ldots,J\}\) collect all cells with positive velocities of gene \(i\) and \(\mathcal{X}^{V-}_i=\{x^V_{ij} \mid x^V_{ij}<0,\ j=1,2,\ldots,J\}\) all cells with negative velocities of gene \(i\).
Similarly, for \(X^V_j\), let \(\mathcal{X}^{V+}_j=\{x^V_{ij} \mid x^V_{ij}>0,\ i=1,2,\ldots,I\}\) collect all genes with positive velocities in cell \(j\) and \(\mathcal{X}^{V-}_j=\{x^V_{ij} \mid x^V_{ij}<0,\ i=1,2,\ldots,I\}\) all genes with negative velocities in cell \(j\). For \(x^V_{ij}>0\), rank \(\mathcal{X}^{V+}_i\) and \(\mathcal{X}^{V+}_j\) from high to low by velocity value, with ranks starting from 1, and calculate the velocity weight as:

where \(a\) is the rank of \(x^V_{ij}\) in \(\mathcal{X}^{V+}_i\) and \(b\) is the rank of \(x^V_{ij}\) in \(\mathcal{X}^{V+}_j\).

Similarly, for \(x^V_{ij}<0\), rank \(\mathcal{X}^{V-}_i\) and \(\mathcal{X}^{V-}_j\) from high to low by the absolute value of the velocities, with ranks starting from 0, and calculate the velocity weight as:

where \(a\) is the rank of \(x^V_{ij}\) in \(\mathcal{X}^{V-}_i\) and \(b\) is the rank of \(x^V_{ij}\) in \(\mathcal{X}^{V-}_j\).

We now generate a gene activity matrix \(X^G=\{x^G_{ij}\}\) that integrates gene expression and chromatin accessibility based on the velocity weight, where \(x^G_{ij}\) is the gene activity score (GAS) of gene \(i\) in cell \(j\):

Construction of gene-cell heterogeneous graph

To simplify notation, we redefine any integrative matrix generated in the previous section as \(X=\{x_{ij} \mid i=1,2,\ldots,I;\ j=1,2,\ldots,J\}\) with \(I\) genes and \(J\) cells, where \(x_{ij}\) represents either the normalized expression (for multiple scRNA-seq and CITE-seq) or the GAS (for scRNA-ATAC-seq) of gene \(i\) in cell \(j\). We use two autoencoders to generate the initial embeddings of cells and genes, respectively. The cell autoencoder reduces the gene dimensions of each cell from \(I\) to 512 and eventually to 256; the gene autoencoder reduces the cell dimensions of each gene from \(J\) to 512 and then 256. Thus, each cell and each gene receives an initial embedding of the same 256 dimensions. The number of lower dimensions is optimized as a hyperparameter that differs for each dataset. The output layer is a reconstructed matrix \(\hat{X}\) with the same dimensions as \(X\). The loss function of the cell autoencoder is the mean squared error (MSE) of \(X\) and \(\hat{X}\):

$$Loss_{cell}=\frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(x_{ij}-\hat{x}_{ij}\right)^{2}.$$

The gene autoencoder learns low-dimensional features of genes from all cells; it has an encoder, a latent space, and a decoder similar to the cell autoencoder, while its input \(X^T\) is the transpose of \(X\). The loss function of the gene autoencoder is

$$Loss_{gene}=\frac{1}{IJ}\sum_{j=1}^{J}\sum_{i=1}^{I}\left(x^{T}_{ji}-\hat{x}^{T}_{ji}\right)^{2},$$

where \(\hat{X^T}\) is the reconstructed matrix in the output layer with the same dimensions as \(X^T\).
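A minimal sketch of the two autoencoders' input/output shapes and MSE losses, with random (untrained) weights and toy dimensions standing in for the 512/256-dimensional layers; in DeepMAPS the weights would be learned by minimizing these losses.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J = 100, 40   # genes x cells (toy sizes)
d1, d2 = 32, 16  # stand-ins for the 512- and 256-dim layers

X = rng.random((I, J))

def autoencode(M, in_dim, rng):
    """Forward pass of a two-layer autoencoder; returns the bottleneck
    embedding (one column per input sample) and the reconstruction MSE.
    Weights are random here; training would minimize the returned MSE."""
    W1, W2 = rng.normal(0, 0.1, (d1, in_dim)), rng.normal(0, 0.1, (d2, d1))
    U1, U2 = rng.normal(0, 0.1, (d1, d2)), rng.normal(0, 0.1, (in_dim, d1))
    Z = W2 @ np.tanh(W1 @ M)      # bottleneck embedding
    M_hat = U2 @ np.tanh(U1 @ Z)  # reconstruction
    return Z, np.mean((M - M_hat) ** 2)

Z_cell, loss_cell = autoencode(X, I, rng)    # cell autoencoder: input is X
Z_gene, loss_gene = autoencode(X.T, J, rng)  # gene autoencoder: input is X^T
print(Z_cell.shape, Z_gene.shape)  # (16, 40) (16, 100)
```

Both embeddings share the bottleneck dimension, mirroring how cells and genes end up with initial embeddings of the same size.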

Definition 1 (Heterogeneous graph): A heterogeneous graph is a graph with multiple types of nodes and/or multiple types of edges. We denote a heterogeneous graph as \(G=(V,\, E,\, A,\, R)\) , where \(V\) represents nodes, \(E\) represents edges, \(A\) represents the node type union, and \(R\) represents the edge type union.

Definition 2 (Node type and edge type mapping function): We define \(\tau \left(v\right):V\to A\) and \(\phi \left(e\right):E\to R\) as the mapping function for node types and edge types, respectively.

Definition 3 (Node meta relation): For a node pair \(v_i\) and \(v_j\) linked by an edge \(e_{i,j}\), the meta relation between \(v_i\) and \(v_j\) is denoted as \(\langle \tau(v_i),\phi(e_{i,j}),\tau(v_j)\rangle\).

Given the integrated matrix \(X\), we construct a bipartite gene-cell heterogeneous graph \(G\) with two node types (cell and gene) and one edge type (gene-cell edge). \(V=V^C\cup V^G\), where \(V^G=\{v^G_i \mid i=1,2,\ldots,I\}\) denotes all genes and \(V^C=\{v^C_j \mid j=1,2,\ldots,J\}\) denotes all cells. \(E=\{e_{i,j}\}\) represents the edges between \(v^G_i\) and \(v^C_j\). For \(x_{ij}>0\), the weight of the corresponding edge is \(\omega(e_{i,j})=1\); otherwise, \(\omega(e_{i,j})=0\).
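Building the bipartite gene-cell graph from the integrated matrix can be sketched as follows (node naming is illustrative):

```python
import numpy as np

def build_gene_cell_graph(X):
    """Build the bipartite gene-cell heterogeneous graph from matrix X:
    an edge with weight 1 connects gene i and cell j whenever x_ij > 0."""
    genes, cells = np.nonzero(X)
    edges = [(f"g{i}", f"c{j}", 1.0) for i, j in zip(genes, cells)]
    node_types = {f"g{i}": "gene" for i in range(X.shape[0])}
    node_types.update({f"c{j}": "cell" for j in range(X.shape[1])})
    return node_types, edges

X = np.array([[0.0, 2.5], [1.2, 0.0]])
types, edges = build_gene_cell_graph(X)
print(len(edges))  # 2 edges: (g0, c1) and (g1, c0)
```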

Joint embedding via a heterogeneous graph transformer

We propose an unsupervised HGT framework 12 , 13 to learn graph embeddings of all the nodes and mine relationships between genes and cells. The input of HGT is the integrated matrix \(X\) , and the outputs are the embeddings of cells and genes and attention scores representing the importance of genes to cells.

Definition 4 (Target node and source node): A node in \({{{{{\bf{V}}}}}}\) is considered as a target node, represented as \({v}_{t}\) , when performing HGT to aggregate information and update embeddings of this node. A node is considered as a source node, represented as \({v}_{s},{v}_{s}\,\ne\, {v}_{t}\) , if there is an edge between \({v}_{s}\) and \({v}_{t}\) in \(E\) , denoted as \({e}_{s,t}\) for convenience.

Definition 5 (Neighborhood graph of a target node): A neighborhood graph of a target node \(v_t\) is induced from \(G\) and denoted as \(G'=(V',E',A',R')\), where \(V'=\{v_t\}\cup\mathcal{N}(v_t)\), \(\mathcal{N}(v_t)\) is the complete set of neighbors of \(v_t\), \(E'=\{e_{i,j}\in E \mid v_i,v_j\in V'\}\), \(A'\) marks the target and source node types, and \(R'\) represents the target-source edge type. \(e_{s,t}\in E'\) represents the edge between \(v_s\) and \(v_t\). As only one edge type is included in \(G\), the node meta relation of \(v_s\) and \(v_t\) is denoted as \(\langle\tau(v_s),\phi(e_{s,t}),\tau(v_t)\rangle\).

Multi-head attention mechanism and linear mapping of vectors

Let \(\mathcal{H}^l\) denote the embedding of the \(l^{th}\) HGT layer (\(l=1,2,\ldots,L\)). The embeddings of \(v_t\) and \(v_s\) on the \(l^{th}\) layer are denoted as \(\mathcal{H}^l[v_t]\) and \(\mathcal{H}^l[v_s]\). A multi-head mechanism is applied to divide both \(\mathcal{H}^l[v_t]\) and \(\mathcal{H}^l[v_s]\) equally into \(H\) heads. Multi-head attention allows the model to jointly attend to information from different embedding subspaces, and each head can run through an attention mechanism in parallel to reduce computational time.

For the \(h^{th}\) head in the \(l^{th}\) HGT layer, \(\mathcal{H}^l[v_t]\) is updated from \(\mathcal{H}^{l-1}[v_t]\) and \(\mathcal{H}^{l-1}[v_s]\); \(\mathcal{H}^0[v_t]\) and \(\mathcal{H}^0[v_s]\) are the initial embeddings of \(v_t\) and \(v_s\), respectively. Three linear projection functions are applied to map node embeddings into the \(h^{th}\) vectors. Specifically, the \(Q\_linear^h_{\tau(v_t)}\) function maps \(v_t\) into the \(h^{th}\) query vector \(Q^h(v_t)\), with dimension \(\mathbb{R}^d\to\mathbb{R}^{d/H}\), where \(d\) is the dimension of \(\mathcal{H}^{l-1}[v_t]\) and \(d/H\) is the vector dimension per head.
Similarly, the \(K\_linear^h_{\tau(v_s)}\) and \(V\_linear^h_{\tau(v_s)}\) functions map the source node \(v_s\) into the \(h^{th}\) key vector \(K^h(v_s)\) and the \(h^{th}\) value vector \(V^h(v_s)\).

Each type of node has a unique linear projection to maximally model the distribution differences.

Heterogeneous mutual attention

To calculate the mutual attention between \(v_t\) and \(v_s\), we introduce the Attention operator, which estimates the importance of each \(v_s\) to \(v_t\):

$$Attention\left(v_s,e_{s,t},v_t\right)=\mathop{Softmax}\limits_{\forall v_s\in\mathcal{N}(v_t)}\Big(\mathop{\parallel}\limits_{h\in[1,H]} ATT\_head^h\left(v_s,e_{s,t},v_t\right)\Big).$$
The attention function can be described as mapping a query vector and a set of key-value pairs to an output for each node pair \(e=(v_s,v_t)\). The overall attention between \(v_t\) and \(v_s\) is the concatenation of the attention weights of all heads, followed by a softmax function; \(\mathop{\parallel}\limits_{H}(\cdot)\) is the concatenation function. The term \(ATT\_head^h(v_s,e_{s,t},v_t)\) is the \(h^{th}\) head attention weight between \(v_t\) and \(v_s\), which can be calculated by:

$$ATT\_head^h\left(v_s,e_{s,t},v_t\right)=\left(K^h(v_s)\,W^{ATT}_{\phi(e_{s,t})}\,Q^h(v_t)^T\right)\cdot\frac{\mu_{\langle\tau(v_s),\phi(e_{s,t}),\tau(v_t)\rangle}}{\sqrt{d/H}}.$$
Here, the similarity between queries and keys is measured, where \(W^{ATT}_{\phi(e_{s,t})}\in\mathbb{R}^{\frac{d}{H}\times\frac{d}{H}}\) is a transformation matrix that captures meta-relation features, \((\cdot)^T\) denotes transposition, and \(\mu\) is a prior tensor denoting the significance of each node meta relation \(\langle\tau(v_s),\phi(e_{s,t}),\tau(v_t)\rangle\), serving as an adaptive scaling of the attention. The concatenation of the attention heads yields the attention coefficients between \(v_s\) and \(v_t\), followed by a Softmax function in Eq. 12.

Heterogeneous message passing

A Message operator is used to extract the message of \(v_s\) that can be passed to \(v_t\). The multi-head \(Message\) is defined by:

$$Message\left(v_s,e_{s,t},v_t\right)=\mathop{\parallel}\limits_{h\in[1,H]} MSG\_head^h\left(v_s,e_{s,t},v_t\right).$$
The \(h^{th}\) head message \(MSG\_head^h\left(v_s,e_{s,t},v_t\right)\) for each edge \((v_s,v_t)\) is defined as:

$$MSG\_head^h\left(v_s,e_{s,t},v_t\right)=V^h(v_s)\,W^{MSG}_{\phi(e_{s,t})},$$
where each source node \(v_s\) in head \(h\) is mapped into a message vector by a linear projection \(V^h(v_s):\mathbb{R}^d\to\mathbb{R}^{d/H}\), and \(W^{MSG}_{\phi(e)}\in\mathbb{R}^{\frac{d}{H}\times\frac{d}{H}}\) is a transformation matrix analogous to \(W^{ATT}_{\phi(e_{s,t})}\).

Target specific aggregation

To update the embedding of \(v_t\), the final step in the \(l^{th}\) HGT layer is to Aggregate the neighbor information obtained in this layer, \(\widetilde{\mathcal{H}^l}[v_t]\), into the target node embedding \(\mathcal{H}^{l-1}[v_t]\):

$$\mathcal{H}^l[v_t]=\theta\cdot ReLU\big(\widetilde{\mathcal{H}^l}[v_t]\big)+\mathcal{H}^{l-1}[v_t],$$
where \(\theta\) is a trainable parameter and \(ReLU\) is the activation function. The final embedding of \(v_t\) is obtained by stacking information through all \(L\) HGT layers; \(L\) is set to 2 in DeepMAPS.
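The attention, message, and aggregation steps can be sketched for a single target node as below. This is a toy stand-in: the projection weights are random rather than learned, \(W^{ATT}\) and \(W^{MSG}\) are set to identity for simplicity, \(\mu\) is a scalar, and the residual ReLU update reflects one plausible reading of the aggregation step (an assumption, not the paper's exact formula).

```python
import numpy as np

rng = np.random.default_rng(0)
d, H = 8, 2      # embedding dimension and number of heads
dh = d // H      # per-head dimension d/H

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# One target node with 3 source neighbors.
h_t = rng.normal(size=d)
h_s = rng.normal(size=(3, d))

# Per-type linear projections (random here; learned in practice).
Wq, Wk, Wv = (rng.normal(0, 0.3, (H, dh, d)) for _ in range(3))
W_att = np.stack([np.eye(dh)] * H)  # W^ATT per head (identity for this sketch)
W_msg = np.stack([np.eye(dh)] * H)  # W^MSG per head (identity for this sketch)
mu = 1.0                            # meta-relation prior, scalar here

new_heads = []
for h in range(H):
    q = Wq[h] @ h_t                 # query vector of the target node
    K = h_s @ Wk[h].T               # key vectors of the source nodes
    V = h_s @ Wv[h].T               # value (message) vectors of the sources
    att = softmax((K @ W_att[h] @ q) * mu / np.sqrt(dh))  # attention over sources
    msg = V @ W_msg[h]              # per-source messages
    new_heads.append(att @ msg)     # attention-weighted aggregation

h_tilde = np.concatenate(new_heads)                # concatenate heads back to d dims
theta = 0.5                                        # trainable scalar in the paper
h_t_new = theta * np.maximum(h_tilde, 0.0) + h_t   # assumed residual update
print(h_t_new.shape)  # (8,)
```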

Determination of gene to cell attention

After the HGT process is completed, we extract the final attention score \(a_{i,j}\) of gene \(i\) to cell \(j\) from the last HGT layer:

HGT training on subgraphs

To improve the efficiency and capability of the HGT model on a giant heterogeneous graph (tens of thousands of nodes and millions of edges), we deploy a modified HGSampling method for subgraph selection and HGT training on multiple mini-batches 12 . For the graph \(G\) with \(I\) genes and \(J\) cells, the union of subgraphs should cover \(a\%\) (set to 30%) of the gene and cell nodes to ensure training power. As such, the sampler constructs a number of small subgraphs (50 in DeepMAPS) from the given heterogeneous graph \(G\) and feeds the subgraphs into the HGT model in different batches using multiple GPUs. Each subgraph includes \(a\%\times I/50\) genes and \(a\%\times J/50\) cells. Taking a cell \(j\) as a target node \(v_t\) and its neighbors \(v_s\in\mathcal{N}(v_t)\), corresponding to genes \(i\), as source nodes, we calculate the sampling probability on the edge \(e_{s,t}\) as:

where \(x({v}_{s},{v}_{t})={x}_{{ij}}\) refers to the expression or GAS value of gene \(i\) in cell \(j\) in the integrated matrix \(X\) . Thus, for each target node \({v}_{t}\) , we randomly select \(\frac{a\%\times I/50}{a\%\times J/50}\) neighbor genes for \({v}_{t}\) based on the sampling probability \({{{{{\bf{Prob}}}}}}\left({{{{{{\bf{e}}}}}}}_{{{{{{\bf{s}}}}}}{{,}}{{{{{\bf{t}}}}}}}\right)\) . HGT parameters, such as \({W}_{\phi \left({e}_{s,t}\right)}^{{ATT}}\) , \({W}_{\phi \left({e}_{s,t}\right)}^{{MSG}}\) , and \(\theta\) , are trained and inherited sequentially from subgraphs 1 to 50 in one epoch. The subgraph training is performed in an unsupervised way with a graph autoencoder (GAE): the HGT is the encoder layer, and the inner product of embeddings is the decoder layer. We calculate the loss function of the GAE as the Kullback-Leibler divergence ( KL ) between the reconstructed matrix \(\hat{X}\) and the integrated matrix \(X\) :

The subgraph training stops when the loss converges or when training reaches 100 epochs, whichever happens first.
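The expression-weighted neighbor sampling can be sketched as follows. This is a simplified illustration under stated assumptions: `expr_in_cell` is a hypothetical name holding the \({x}_{{ij}}\) values of candidate neighbor genes for one target cell, and the sampling probability is taken to be proportional to \({x}_{{ij}}\); the exact normalization used inside the modified HGSampling may differ.

```python
import random

def sample_neighbor_genes(expr_in_cell, n_select, seed=0):
    # sample n_select distinct neighbor genes for one target cell,
    # with probability proportional to the gene's expression/GAS value;
    # n_select must not exceed the number of genes with nonzero value
    rng = random.Random(seed)
    genes = list(expr_in_cell)
    total = sum(expr_in_cell.values())
    probs = [expr_in_cell[g] / total for g in genes]
    chosen = set()
    while len(chosen) < n_select:
        r, acc = rng.random(), 0.0
        for g, p in zip(genes, probs):
            acc += p
            if r <= acc:
                chosen.add(g)
                break
    return chosen

# gene G2 has zero signal in this cell, so it is never sampled
picked = sample_neighbor_genes({"G1": 5.0, "G2": 0.0, "G3": 5.0}, 2)
```

Genes with zero expression are never drawn, so the sampled subgraphs concentrate on informative gene-cell edges.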

Determination of active gene modules in cell clusters

Predict cell clusters.

We deploy Louvain clustering (Seurat v3) to predict cell clusters based on the cell embeddings \({{{{{{\mathcal{H}}}}}}}^{{{{{{\boldsymbol{L}}}}}}}\left[{{{{{{\boldsymbol{v}}}}}}}_{{{{{{\boldsymbol{c}}}}}}}\right]\) generated from the final HGT layer. The resolution of Louvain clustering is determined by a grid-search test over multiple HGT hyperparameter combinations, and we set a clustering resolution of 0.4 as the default.

Identify cell cluster-active gene association network

We use an SFP (Steiner forest problem) model 17 to select genes that highly contribute to cell cluster characterization and to construct cell cluster-active gene association networks. Define a new heterogeneous graph \(\widetilde{G}=\left(V,\,\widetilde{E}\right),V\in {V}^{G}\cup {V}^{C},\widetilde{E}\in {\widetilde{E}}^{1}\cup {\widetilde{E}}^{2}\) , where \({\widetilde{E}}^{1}\) represents the gene-gene relations and \({\widetilde{E}}^{2}\) represents the gene-cell relations. The weight \(\omega ({\widetilde{e}}_{{i}_{1},{i}_{2}}^{1})\) of the edge between \({v}_{{i}_{1}}^{G}\in {V}^{G}\) and \({v}_{{i}_{2}}^{G}\in {V}^{G}\) is the Pearson’s correlation of the HGT embeddings of \({v}_{{i}_{1}}^{G}\) and \({v}_{{i}_{2}}^{G}\) . The weight \(\omega \left({\widetilde{e}}_{i,j}^{2}\right)\) of the edge between \({v}_{i}^{G}\in {V}^{G}\) and \({v}_{j}^{C}\in {V}^{C}\) is the final attention score \({{{{{{\mathcal{a}}}}}}}_{i,j}\) . Only edges with \(\omega \left({\widetilde{e}}_{{i}_{1},{i}_{2}}^{1}\right) \, > \, 0.5\) and \(\omega \left({\widetilde{e}}_{i,j}^{2}\right) \, > \, \mu \left({{{{{{\mathcal{a}}}}}}}_{i,j}\right)+{sd}({{{{{{\mathcal{a}}}}}}}_{i,j})\) , where \(\mu ()\) represents the mean and \({sd}()\) the standard deviation of \({{{{{{\mathcal{a}}}}}}}_{i,j}\) , are kept within a cell cluster. The weights of the remaining edges are then max-min normalized so that the edge with the largest weight is rescaled to 0 and the edge with the smallest weight to 1, ensuring that strong edges carry a low cost in the minimization model below.
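The edge filtering and inverted max-min rescaling can be sketched as below. This is a hedged illustration: `filter_and_rescale`, `gg`, and `gc` are hypothetical names, and edge weights are given directly as dictionaries rather than computed from HGT embeddings and attention scores.

```python
def filter_and_rescale(gene_gene, gene_cell):
    # keep gene-gene edges with correlation > 0.5 and gene-cell edges with
    # attention above mean + one standard deviation, then invert max-min
    # rescale the surviving weights: largest weight -> 0, smallest -> 1,
    # so that strong edges become low-cost for a Steiner-forest search
    att = list(gene_cell.values())
    mu = sum(att) / len(att)
    sd = (sum((a - mu) ** 2 for a in att) / len(att)) ** 0.5
    kept = {e: w for e, w in gene_gene.items() if w > 0.5}
    kept.update({e: w for e, w in gene_cell.items() if w > mu + sd})
    lo, hi = min(kept.values()), max(kept.values())
    return {e: (hi - w) / (hi - lo) for e, w in kept.items()}

gg = {("g1", "g2"): 0.9, ("g1", "g3"): 0.6}                 # correlations
gc = {("g1", "c1"): 0.8, ("g2", "c1"): 0.1, ("g3", "c1"): 0.1}  # attention
cost = filter_and_rescale(gg, gc)
```

The strongest edge ("g1", "g2") gets cost 0 and the weakest kept edge ("g1", "g3") gets cost 1.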

Let \(Z\) be the number of clusters predicted via Louvain clustering, and \({V}^{C\left[z\right]}=\{{v}^{C[z]}\}\) be the node set corresponding to the cells in cluster \(z=1,2,\ldots,Z\) . We then formulate this problem using the combinatorial optimization model defined below

where \({{{{{\mathcal{L}}}}}}({v}_{{j}_{1}}^{C},{v}_{{j}_{2}}^{C})\) is a binary indicator function representing whether the two cell nodes \({v}_{{j}_{1}}^{C}\) and \({v}_{{j}_{2}}^{C}\) can be connected (1) or not (0) in \(\widetilde{G}\) via a path \({\widetilde{E}}_{{j}_{1},{j}_{2}}^{{{{{{\mathcal{L}}}}}}}=\{{\widetilde{e}}_{{i}_{1},{j}_{1}}^{2},\,{\widetilde{e}}_{{i}_{1},{i}_{2}}^{1},\,{\widetilde{e}}_{{i}_{2},{i}_{3}}^{1}\ldots,{\widetilde{e}}_{{i}_{t-1},{i}_{t}}^{1},{\widetilde{e}}_{{i}_{t},{j}_{2}}^{2}\}\) . Denote \({\widetilde{E}}^{{{{{{\mathcal{L}}}}}}}=\{{\widetilde{E}}_{{j}_{1},{j}_{2}}^{{{{{{\mathcal{L}}}}}}}\}\) as the complete collection of such paths connecting \({v}_{{j}_{1}}^{C}\) and \({v}_{{j}_{2}}^{C}\) . The combinatorial optimization model aims to identify the paths connecting \({v}_{{j}_{1}}^{C}\) and \({v}_{{j}_{2}}^{C}\) with the minimum summed edge weight. We consider the gene networks remaining in the SFP result of cluster \(z\) as the cluster-active gene association networks.

Construct GRNs from scRNA-ATAC-seq data

For genes in a cell cluster-active gene association network resulting from SFP, a set of TFs \(q=1,2,\ldots,Q\) can then be assigned to these genes. The TF-peak relations are retrieved by aligning TF binding sites with peak regions in the scATAC-seq data, and the peak-gene relations were established previously when calculating the potential regulation scores \({r}_{{ik|j}}\) (Eq.  3 ). We design a regulatory intensity (RI) score \({{{{{{\mathcal{s}}}}}}}_{i,j,q}\) to quantify the intensity of TF \(q\) in regulating gene \(i\) in cell \(j\) :

where \({b}_{{qk}}^{A}\) is the binding affinity score of TF \(q\) to peak \(k\) . The binding affinity score is calculated in three steps: ( a ) we retrieve the genome browser track file from JASPAR, which stores all known binding sites of each TF; a p -value score is calculated in JASPAR as -log 10 ( p ) × 100, where 0 corresponds to a p -value of 1 and 1,000 corresponds to a p -value <10 −10 , and we remove TF binding sites with p -value scores smaller than 500. ( b ) A TF binding site is kept if it overlaps with any peak region in the scATAC-seq profile; otherwise, it is removed. ( c ) The corresponding p -value score is divided by 100. We define a gene set regulated by the same TF as a regulon.
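The three-step binding-affinity calculation can be sketched as follows, assuming simple half-open genomic intervals. `binding_affinity` is a hypothetical helper; real TF binding sites and peaks would come from the JASPAR track file and the scATAC-seq peak calls.

```python
def binding_affinity(tf_sites, peaks):
    # tf_sites: list of (start, end, p_value_score); peaks: list of (start, end)
    # step (a): drop binding sites with JASPAR p-value score < 500
    # step (b): keep only sites overlapping an scATAC-seq peak
    # step (c): divide the surviving score by 100
    def overlaps(site, peak):
        return site[0] < peak[1] and peak[0] < site[1]
    out = []
    for site in tf_sites:
        if site[2] < 500:
            continue
        if any(overlaps(site, p) for p in peaks):
            out.append(site[2] / 100.0)
    return out

scores = binding_affinity(
    [(100, 110, 800), (300, 310, 400), (500, 510, 900)],
    [(90, 200), (450, 480)],
)
```

Only the first site survives: the second fails the score cutoff and the third overlaps no peak.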

We calculate a regulon activity score (RAS) \({{{{{\mathcal{r}}}}}}\left(q,z\right)\) of a regulon with genes regulated by TF \(q\) in cell cluster \(z\) as:

where \({I}_{q}\) denotes the genes regulated by TF \(q\) in cell cluster \(z\) . We used the Wilcoxon rank-sum test to identify differentially active regulons in a cluster based on the RAS. If the BH-adjusted p -value between cell clusters is less than 0.05 and the log fold change is larger than 0.10, we consider the regulon differentially active in this cluster and define it as a cell-type-specific regulon (CTSR).
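The multiple-testing step can be illustrated with a pure-Python Benjamini-Hochberg (BH) adjustment. The raw p-values would come from the Wilcoxon rank-sum test on RAS values (e.g., via a statistics library), which is not reimplemented here; `bh_adjust` is a hypothetical helper.

```python
def bh_adjust(pvals):
    # Benjamini-Hochberg adjustment: sort p-values, scale each by n/rank,
    # then enforce monotonicity with a running minimum from the largest rank
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adj = [0.0] * n
    running_min = 1.0
    for rank_from_end in range(n - 1, -1, -1):
        i = order[rank_from_end]
        running_min = min(running_min, pvals[i] * n / (rank_from_end + 1))
        adj[i] = running_min
    return adj

adj = bh_adjust([0.01, 0.04, 0.03, 0.5])
```

With the 0.05 cutoff used for CTSRs, only the first regulon here (adjusted p = 0.04) would be called differentially active.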

A GRN in a cell cluster is constructed by merging the regulons in that cluster. The eigenvector centrality ( \({{{{{{\mathcal{c}}}}}}}_{v}\) ) of a TF node \(v\) in the GRN is defined as:

where \({\alpha }_{\max }\) is the eigenvector corresponding to the largest eigenvalue of the weighted adjacency matrix of the GRN. TFs ranked highest by \({{{{{{\mathcal{c}}}}}}}_{v}\) were regarded as master TFs (top 10 by default).
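Given the centrality scores, master TF selection reduces to a top-k ranking; a sketch with hypothetical TF names and scores:

```python
def master_tfs(centrality, k=10):
    # rank TFs by eigenvector centrality c_v and return the top k
    # (top 10 by default in DeepMAPS); centrality maps TF -> c_v
    ranked = sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)
    return [tf for tf, _ in ranked[:k]]

tops = master_tfs({"GATA1": 0.9, "JUNB": 0.7, "KLF16": 0.4}, k=2)
```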

Benchmarking quantification and statistics

Grid-search parameter test for cell clustering on benchmark data.

To determine the default HGT parameters for different data types, we performed a grid-search test on HGT hyperparameters, including the paired number of embeddings and number of heads (91/13, 104/13, 112/16, and 128/16), the learning rate (0.0001, 0.001, and 0.01), and the number of training epochs (50, 75, and 100). Altogether, 36 parameter combinations were tested. For each of the three data types, HGT parameter training was performed on three benchmark datasets, and the default parameter combination was selected based on the highest median score (ARI for multiple scRNA-seq data and CITE-seq data with benchmark labels, and ASW for scRNA-ATAC-seq data without benchmark labels) across the three datasets.
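The grid search enumerates 4 × 3 × 3 = 36 combinations; a sketch with hypothetical scores standing in for the per-dataset ARI/ASW values:

```python
from itertools import product
from statistics import median

embed_heads = ["91/13", "104/13", "112/16", "128/16"]  # embeddings / heads
learning_rates = [0.0001, 0.001, 0.01]
epochs = [50, 75, 100]

combos = list(product(embed_heads, learning_rates, epochs))

# hypothetical scores on the three benchmark datasets for each combination;
# the default combination maximizes the median across datasets
scores = {c: [0.5, 0.6, 0.7] for c in combos}
scores[("112/16", 0.001, 100)] = [0.80, 0.90, 0.85]
best = max(combos, key=lambda c: median(scores[c]))
```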

To assess the performance of DeepMAPS against other scMulti-omics benchmarking tools, we compared DeepMAPS with Seurat (v3.2.3 and v4.0, https://github.com/satijalab/seurat ), MOFA+ (v1.0.0, https://github.com/bioFAM/MOFA2 ), Harmony (v0.1, https://github.com/immunogenomics/harmony ), TotalVI (v0.10.0, https://github.com/YosefLab/scvi-tools ), and GLUE (v0.3.2, https://github.com/gao-lab/GLUE ). Based on their integration capabilities for different data types, DeepMAPS was compared with Seurat v3.2.3 and Harmony on multiple scRNA-seq data; with Seurat v4.0.0, MOFA+, and TotalVI on CITE-seq data; and with Seurat v4.0.0, MOFA+, and GLUE on scRNA-ATAC-seq data. For each benchmarking tool, grid-search tests were also applied to a combination of parameters, such as the number of dimensions used for cell clustering and the clustering resolution.

The default HGT parameter combination selected for each data type was then applied to additional datasets (one multiple scRNA-seq, one CITE-seq, and three scRNA-ATAC-seq datasets) for independent tests. All benchmarking tools used their default parameters.

To justify the selection of integration and cell clustering methods in DeepMAPS, we evaluated cell clustering performance after replacing these methods with several alternatives. Specifically, for data integration, we replaced the CCA method with Harmony integration (multiple scRNA-seq), the CLR method with the Seurat weighted nearest neighbor method (CITE-seq), and the velocity-weighted method with the Seurat weighted nearest neighbor method without velocity (scRNA-ATAC-seq). For cell clustering, we replaced Louvain clustering with the Leiden and smart local moving (SLM) methods. We also compared the influence of the clustering resolution (0.4, 0.8, 1.2, and 1.6) on the cell clustering results in DeepMAPS. Each comparison was performed on all 36 parameter combinations used in the grid-search test. For DeepMAPS without velocity, we simply sum the gene expression matrix from scRNA-seq data and the gene potential activity matrix derived from scATAC-seq data, setting the balance weight introduced by velocity for gene \(i\) in cell \(j\) to 1.

Adjusted rand index (ARI)

ARI computes the similarity between two clusterings by considering all pairs of samples assigned to clusters in the predicted and benchmark labelings, adjusted for random permutation. To calculate the ARI, a contingency table is built to summarize the overlaps between the two cell label lists with \(b\) elements (cells). Each entry denotes the number of cells in common between the two label lists. The ARI is calculated as:

where \(E\left[.\right]\) is the expectation; \({J}_{a}\) is the number of cell pairs assigned to the same cell cluster in both the predicted and the benchmark labels; \({J}_{b}\) is the number of cell pairs assigned to different cell clusters in both; and \({C}_{n}^{2}\) is the number of combinations of selecting two cells from the total of \(n\) cells in the cluster.
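A minimal pure-Python ARI from the contingency table, using the standard pair-counting form (sklearn.metrics.adjusted_rand_score computes the same quantity):

```python
from math import comb
from collections import Counter

def ari(labels_a, labels_b):
    # adjusted Rand index from the contingency table of two label lists
    n = len(labels_a)
    table = Counter(zip(labels_a, labels_b))      # contingency table entries
    a = Counter(labels_a)                          # row sums
    b = Counter(labels_b)                          # column sums
    sum_ij = sum(comb(v, 2) for v in table.values())
    sum_a = sum(comb(v, 2) for v in a.values())
    sum_b = sum(comb(v, 2) for v in b.values())
    expected = sum_a * sum_b / comb(n, 2)          # chance-expected index
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# identical partitions up to relabeling score a perfect 1
perfect = ari([0, 0, 1, 1], [1, 1, 0, 0])
```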

Average Silhouette Width (ASW)

Unlike ARI, which requires known ground truth labels, the silhouette score is a method for interpreting and validating the consistency within clusters of data. The silhouette width indicates how similar an object is to its own cluster (cohesion) compared to other clusters (separation). It ranges from −1 to +1, where a high value indicates that the object is well matched to its cluster. The silhouette score sil(j) is calculated as:

where \(m(j)\) is the average distance between cell \(j\) and all other cells in the same cluster, and \(n\left(j\right)\) is the average distance from \(j\) to all cells in the nearest cluster to which \(j\) does not belong. We calculate the mean silhouette score over all cells as the ASW, representing the silhouette score of the dataset.
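The silhouette computation can be sketched on 1-D toy data with Euclidean distance; `asw` is a hypothetical helper returning the mean silhouette width:

```python
def asw(points, labels):
    # mean silhouette width over all points (1-D points, Euclidean distance)
    clusters = set(labels)
    sils = []
    for j, (p, lab) in enumerate(zip(points, labels)):
        def mean_dist(cluster, exclude_self=False):
            ds = [abs(p - q) for i, (q, l) in enumerate(zip(points, labels))
                  if l == cluster and not (exclude_self and i == j)]
            return sum(ds) / len(ds)
        m = mean_dist(lab, exclude_self=True)                # cohesion m(j)
        n = min(mean_dist(c) for c in clusters if c != lab)  # separation n(j)
        sils.append((n - m) / max(m, n))
    return sum(sils) / len(sils)

# two tight, well-separated clusters give an ASW close to +1
good = asw([0.0, 0.1, 5.0, 5.1], [0, 0, 1, 1])
```

Scrambling the labels on the same points drives the ASW negative.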

Calinski-Harabasz index

The CH index calculates the ratio of the between-cluster dispersion to the within-cluster dispersion over all clusters. A higher CH index indicates better performance. For a dataset E of size \({n}_{E}\) with \(k\) clusters, the CH index is defined as:

where \(t\left({B}_{k}\right)\) is the trace of the between-group dispersion matrix, and \(t\left({W}_{k}\right)\) is the trace of the within-cluster dispersion matrix. \({C}_{q}\) is the set of points in cluster \(q\) , \({c}_{q}\) is the center of cluster \(q\) , \({c}_{E}\) is the center of E, and \({n}_{q}\) is the number of points in cluster \(q\) . \(T\) denotes the matrix transpose.
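On 1-D data the traces reduce to sums of squared deviations, giving this compact sketch (`ch_index` is a hypothetical helper):

```python
def ch_index(points, labels):
    # Calinski-Harabasz index on 1-D points:
    # (trB / (k - 1)) / (trW / (n - k))
    n, ks = len(points), sorted(set(labels))
    k = len(ks)
    c_all = sum(points) / n                      # global center c_E
    trB = trW = 0.0
    for q in ks:
        members = [p for p, l in zip(points, labels) if l == q]
        c_q = sum(members) / len(members)        # cluster center
        trB += len(members) * (c_q - c_all) ** 2  # between-group dispersion
        trW += sum((p - c_q) ** 2 for p in members)  # within-cluster dispersion
    return (trB / (k - 1)) / (trW / (n - k))

good = ch_index([0.0, 0.1, 5.0, 5.1], [0, 0, 1, 1])  # tight clusters
bad = ch_index([0.0, 0.1, 5.0, 5.1], [0, 1, 0, 1])   # scrambled labels
```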

Davies-Bouldin index

The DB index signifies the average ‘similarity’ between clusters, where the similarity compares the distance between clusters with the sizes of the clusters themselves. A lower DB index indicates a model with better separation between clusters. For data with \(k\) clusters, \(i\in k\) and \(j\in k\) , the DB index is defined as:

where \({s}_{i}\) and \({s}_{j}\) are the average distances between each point in cluster \(i\) (respectively \(j\) ) and its cluster centroid, and \({d}_{{ij}}\) is the distance between the centroids of clusters \(i\) and \(j\) .
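A 1-D sketch of the DB index (`db_index` is a hypothetical helper; the spreads \({s}_{i}\) are measured as mean absolute deviation from the centroid):

```python
def db_index(points, labels):
    # Davies-Bouldin index: average over clusters of the worst-case
    # ratio (s_i + s_j) / d_ij against every other cluster
    ks = sorted(set(labels))
    cents, spreads = {}, {}
    for q in ks:
        members = [p for p, l in zip(points, labels) if l == q]
        cents[q] = sum(members) / len(members)
        spreads[q] = sum(abs(p - cents[q]) for p in members) / len(members)
    total = 0.0
    for i in ks:
        total += max((spreads[i] + spreads[j]) / abs(cents[i] - cents[j])
                     for j in ks if j != i)
    return total / len(ks)

good = db_index([0.0, 0.1, 5.0, 5.1], [0, 0, 1, 1])  # low: well separated
```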

Gene association network evaluations

We evaluated the performance of the gene association networks identified by DeepMAPS by comparing them with those from IRIS3 15 and a standard gene co-expression network inferred using all genes. We calculated the closeness centrality and eigenvector centrality of the network generated by each tool. The formulations are given below.

Closeness centrality (CC)

The closeness centrality (CC) 76 of a vertex \(u\) is defined as the inverse of the sum of the lengths of the shortest paths from \(u\) to all other vertices \(v\) in the undirected weighted graph. The formulation is:

where \({d}_{w}\left(u,\,v\right)\) is the shortest weighted path between \(u\) and \(v\) . If there is no path between vertices \(u\) and \(v\) , the total number of vertices is used in the formula instead of the path length. A higher CC indicates that a node is more central in the network, reflecting a more important role of this gene. The CC is calculated using the igraph R package with the function igraph::closeness. We take the average CC over all nodes in a network as the network CC.
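Closeness centrality on a small weighted graph, using Dijkstra shortest paths (a sketch; the disconnected-vertex substitution described above is omitted because this toy graph is connected):

```python
import heapq

def closeness(graph, u):
    # closeness centrality of node u in an undirected weighted graph:
    # inverse of the summed shortest-path distances to all other nodes;
    # graph: {node: {neighbor: weight}}
    dist = {u: 0.0}
    heap = [(0.0, u)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue
        for w, wt in graph[v].items():
            nd = d + wt
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return 1.0 / sum(dist[v] for v in graph if v != u)

g = {"a": {"b": 1.0, "c": 2.0},
     "b": {"a": 1.0, "c": 1.0},
     "c": {"a": 2.0, "b": 1.0}}
cc_a = closeness(g, "a")
```

Node "b" sits on every shortest route in this triangle, so it scores higher than "a".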

Eigenvector centrality (EC)

Eigenvector centrality (EC) 77 scores correspond to the values of the first eigenvector of the graph adjacency matrix. The EC score of \(u\) is defined as:

where \(\lambda\) is the largest eigenvalue associated with the eigenvector \(x=({x}_{1},{x}_{2},\ldots,{x}_{n})\) , and \({a}_{{uv}}\) is an entry of the weighted adjacency matrix of the undirected graph G . A node with a high eigenvector centrality score is connected to many nodes that themselves have high scores. The EC is calculated using the igraph R package with the function igraph::evcent. We take the average EC over all nodes in a network as the network EC.
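Eigenvector centrality by power iteration; a sketch on a star graph where the hub (node 0) should receive the top score. Iterating on A + I instead of A is an implementation convenience of this sketch: the identity shift avoids oscillation on bipartite graphs while leaving the leading eigenvector unchanged.

```python
def eigencentrality(adj, iters=200):
    # power iteration for eigenvector centrality on a weighted adjacency
    # matrix (list of lists); normalized so the maximum entry is 1
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        y = [x[u] + sum(adj[u][v] * x[v] for v in range(n)) for u in range(n)]
        m = max(y)
        x = [val / m for val in y]
    return x

# star graph: hub node 0 connected to leaves 1-3
A = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
ec = eigencentrality(A)
```

The hub converges to score 1 and each leaf to \(1/\sqrt{3}\), matching the exact leading eigenvector of the star graph.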

Evaluations on GRN

For scRNA-ATAC-seq data, we compared the cell-type-specific GRNs inferred by DeepMAPS with those from ( i ) IRIS3 and SCENIC on the scRNA-seq matrix, ( ii ) IRIS3 and SCENIC on the GAS matrix, ( iii ) MAESTRO on the scATAC-seq matrix, and ( iv ) MAESTRO on the original scRNA-seq and scATAC-seq matrices. For each comparison, we set the cell clusters used in the benchmarking tool to be the same as those generated in DeepMAPS to ensure fairness. GRNs generated by each tool were compared against three public functional databases: Reactome 21 , DoRothEA 22 , and TRRUST v2 23 . Only human datasets were used for comparison, as these databases are all human-related. We performed hypergeometric tests on the GRNs resulting from each tool against each database and compared the precision, recall, and F1 score of enriched GRNs and functional terminologies.

Cell cluster leave-out test

For a benchmark dataset with real cell type labels, we removed all cells of one cell type and reran DeepMAPS, traversing all cell types (one at a time) to evaluate the robustness of the ARI. For data without benchmark labels, we removed cells in predicted cell clusters from DeepMAPS and the other benchmarking tools.

DeepMAPS server construction

DeepMAPS is hosted on an HPE XL675d RHEL system with a 2 × 128-core AMD EPYC 7H12 CPU, 64 GB RAM, and 2 × NVIDIA A100 40 GB GPUs. The backend server is written in TypeScript using the NestJS framework. Auth0 is used as an independent module to provide user authentication and authorization services. Redis houses a queue of all pending analysis jobs. There are two types of jobs in DeepMAPS: stateful jobs, handled by the Plumber R package to provide real-time interactive analysis, and stateless jobs, such as CPU-bound bioinformatics pipelines and long-running GPU training tasks, which are constructed using Nextflow. All running jobs are orchestrated using Nomad, allowing each job to be assigned proper cores and storage and keeping jobs scalable based on the server load. The job results are deposited into a MySQL database. The front end is built with NUXT, using Vuetify as the UI library and Apache ECharts and Cytoscape.js for data visualization. The frontend and backend servers communicate via a REST API.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

All data used in this study are from the public domain. The raw data were downloaded from the GEO database under accession numbers GSE84133 (human pancreatic islets scRNA-seq data) and GSE194122 (healthy bone marrow mononuclear cell CITE-seq data). The following datasets were obtained from figshare: the human pancreas scRNA-seq data [ https://figshare.com/articles/dataset/Benchmarking_atlas-level_data_integration_in_single-cell_genomics_-_integration_task_datasets_Immune_and_pancreas_/12420968/8 ], the mouse bladder from the Tabula Muris scRNA-seq data [ https://doi.org/10.6084/m9.figshare.5968960.v1 ], and the human lung adenocarcinoma PBMC CITE-seq data [ https://doi.org/10.6084/m9.figshare.c.5018987.v1 ]. The following paired scRNA-seq and scATAC-seq datasets were obtained from the 10X Genomics website: 3k healthy PBMC data [ https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-no-cell-sorting-3-k-1-standard-2-0-0 ], 10k healthy PBMC data [ https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-10-k-1-standard-2-0-0 ], frozen human healthy brain data [ https://www.10xgenomics.com/resources/datasets/frozen-human-healthy-brain-tissue-3-k-1-standard-2-0-0 ], 10k human PBMC data [ https://www.10xgenomics.com/resources/datasets/10-k-human-pbm-cs-multiome-v-1-0-chromium-x-1-standard-2-0-0 ], healthy PBMC data [ https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-3-k-1-standard-2-0-0 ], fresh embryonic data [ https://www.10xgenomics.com/resources/datasets/fresh-embryonic-e-18-mouse-brain-5-k-1-standard-2-0-0 ], and lymph node data [ https://www.10xgenomics.com/resources/datasets/fresh-frozen-lymph-node-with-b-cell-lymphoma-14-k-sorted-nuclei-1-standard-2-0-0 ]. The scRNA-seq and scATAC-seq cancer cell line data were downloaded from the CNGB Nucleotide Sequence Archive under accession code CNP0000213 . 
All datasets are publicly available without restrictions. Details of data information can be found in Supplementary Data  1 .  Source data are provided with this paper.

Code availability

The Python source code of DeepMAPS Docker is freely available at https://github.com/OSU-BMBL/deepmaps , and the DeepMAPS webserver is freely available at https://bmblx.bmi.osumc.edu/ . The source code is also available on Zenodo 78 .

Stuart, T. & Satija, R. Integrative single-cell analysis. Nat. Rev. Genet. 20 , 257–272 (2019).

Ma, A., McDermaid, A., Xu, J., Chang, Y. & Ma, Q. Integrative methods and practical challenges for single-cell multi-omics. Trends Biotechnol. 38 , 1007–1022 (2020).

Argelaguet, R., Cuomo, A. S. E., Stegle, O. & Marioni, J. C. Computational principles and challenges in single-cell data integration. Nat. Biotechnol. 39 , 1202–1215 (2021).

Teichmann, S. & Efremova, M. Method of the year 2019: single-cell multimodal omics. Nat. Methods 17 , 1 (2020).

Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184 , 3573–3587.e3529 (2021).

Argelaguet, R. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 21 , 111 (2020).

Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16 , 1289–1296 (2019).

Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18 , 272–282 (2021).

Li, Y. et al. Elucidation of biological networks across complex diseases using single-cell omics. Trends Genet. 36 , 951–966 (2020).

Ma, Q. & Xu, D. Deep learning shapes single-cell data analysis. Nat. Rev. Mol. Cell Biol. 23 , 303–304 (2022).

Wang, J. et al. scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses. Nat. Commun. 12 , 1882 (2021).

Hu, Z., Dong, Y., Wang, K. & Sun, Y. Heterogeneous graph transformer. In Proceedings of The Web Conference 2020 2704–2710 (Association for Computing Machinery, Taipei, Taiwan; 2020).

Wang, X. et al. Heterogeneous graph attention network. In The World Wide Web Conference 2022–2032 (Association for Computing Machinery, San Francisco, CA, USA; 2019).

Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14 , 1083–1086 (2017).

Ma, A. et al. IRIS3: integrated cell-type-specific regulon inference server from single-cell RNA-Seq. Nucleic Acids Res. 48 , W275–W286 (2020).

Han, P., Gopalakrishnan, C., Yu, H. & Wang, E. Gene regulatory network rewiring in the immune cells associated with cancer. Genes (Basel) 8 , 308 (2017).

Gassner, E. The Steiner Forest Problem revisited. J. Discret. Algorithms 8 , 154–163 (2010).

Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36 , 411–420 (2018).

Cao, Z.-J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40 , 1458–1466 (2022).

Iacono, G., Massoni-Badosa, R. & Heyn, H. Single-cell transcriptomics unveils gene regulatory network plasticity. Genome Biol. 20 , 110 (2019).

Joshi-Tope, G. et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 33 , D428–D432 (2005).

Garcia-Alonso, L., Holland, C. H., Ibrahim, M. M., Turei, D. & Saez-Rodriguez, J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 29 , 1363–1375 (2019).

Han, H. et al. TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res. 46 , D380–D386 (2018).

Wang, C. et al. Integrative analyses of single-cell transcriptome and regulome using MAESTRO. Genome Biol. 21 , 198 (2020).

Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12 , 1088 (2021).

Ampudia, J. et al. CD6-ALCAM signaling regulates multiple effector/memory T cell functions. J. Immunol. 204 , 150.113–150.113 (2020).

Skonier, J. E. et al. Mutational analysis of the CD6 ligand binding domain. Protein Eng. Des. Selection 10 , 943–947 (1997).

Gimferrer, I. et al. Relevance of CD6-mediated interactions in T cell activation and proliferation. J. Immunol. (Baltim., Md.: 1950) 173 , 2262–2270 (2004).

Johnston, R. J., Lee, P. S., Strop, P. & Smyth, M. J. Cancer immunotherapy and the nectin family. Annu. Rev. Cancer Biol. 5 , 203–219 (2021).

Li, X.-Y. et al. CD155 loss enhances tumor suppression via combined host and tumor-intrinsic mechanisms. J. Clin. Invest. 128 , 2613–2625 (2018).

Gururajan, M. et al. Early growth response genes regulate B cell development, proliferation, and immune response. J. Immunol. (Baltim., Md.: 1950) 181 , 4590–4602 (2008).

Oh, Y.-K., Jang, E., Paik, D.-J. & Youn, J. Early growth response-1 plays a non-redundant role in the differentiation of B cells into plasma cells. Immune Netw. 15 , 161–166 (2015).

Brescia, P. et al. MEF2B instructs germinal center development and acts as an oncogene in B cell lymphomagenesis. Cancer Cell 34 , 453–465.e459 (2018).

Trøen, G. et al. Constitutive expression of the AP-1 transcription factors c-jun, junD, junB, and c-fos and the marginal zone B-cell transcription factor Notch2 in splenic marginal zone lymphoma. J. Mol. Diagn. 6 , 297–307 (2004).

Sánchez-Beato, M. et al. Abnormal PcG protein expression in Hodgkin’s lymphoma. Relation with E2F6 and NFkappaB transcription factors. J. Pathol. 204 , 528–537 (2004).

Saha, A., Robertson, E. S. & Goodrum, F. Mechanisms of B-cell oncogenesis induced by Epstein-Barr virus. J. Virol. 93 , e00238–00219 (2019).

Yachida, S. et al. Genomic sequencing identifies ELF3 as a driver of ampullary carcinoma. Cancer Cell 29 , 229–240 (2016).

Wang, H. et al. Overexpression of ELF3 facilitates cell growth and metastasis through PI3K/Akt and ERK signaling pathways in non-small cell lung cancer. Int. J. Biochem. Cell Biol. 94 , 98–106 (2018).

Zhang, J. et al. KLF16 affects the MYC signature and tumor growth in prostate cancer. Onco Targets Ther. 13 , 1303–1310 (2020).

Ma, P. et al. KLF16 promotes proliferation in gastric cancer cells via regulating p21 and CDK4. Am. J. Transl. Res. 9 , 3027–3036 (2017).

Mathas, S. et al. Aberrantly expressed c-Jun and JunB are a hallmark of Hodgkin lymphoma cells, stimulate proliferation and synergize with NF-κB. EMBO J. 21 , 4104–4113 (2002).

Eferl, R. & Wagner, E. F. AP-1: a double-edged sword in tumorigenesis. Nat. Rev. Cancer 3 , 859–868 (2003).

Nagel, D., Vincendeau, M., Eitelhuber, A. C. & Krappmann, D. Mechanisms and consequences of constitutive NF-κB activation in B-cell lymphoid malignancies. Oncogene 33 , 5655–5665 (2014).

Jost, P. J. & Ruland, J. R. Aberrant NF-κB signaling in lymphoma: mechanisms, consequences, and therapeutic implications. Blood 109 , 2700–2707 (2006).

Garces de Los Fayos Alonso, I. et al. The role of activator protein-1 (AP-1) family members in CD30-positive lymphomas. Cancers (Basel) 10 , 93 (2018).

Crispino, J. D. & Horwitz, M. S. GATA factor mutations in hematologic disease. Blood 129 , 2103–2110 (2017).

Shimizu, R., Engel, J. D. & Yamamoto, M. GATA1-related leukaemias. Nat. Rev. Cancer 8 , 279–287 (2008).

Mosquera Orgueira, A. et al. Detection of new drivers of frequent B-cell lymphoid neoplasms using an integrated analysis of whole genomes. PLoS ONE 16 , e0248886 (2021).

Blyth, K. et al. Runx1 promotes B-cell survival and lymphoma development. Blood Cells Mol. Dis. 43 , 12–19 (2009).

Mackay, F., Schneider, P., Rennert, P. & Browning, J. BAFF AND APRIL: a tutorial on B cell survival. Annu Rev. Immunol. 21 , 231–264 (2003).

Smulski, C. R. & Eibel, H. BAFF and BAFF-receptor in B cell selection and survival. Front. Immunol. 9 , 2285 (2018).

Yang, S., Li, J. Y. & Xu, W. Role of BAFF/BAFF-R axis in B-cell non-Hodgkin lymphoma. Crit. Rev. Oncol. Hematol. 91 , 113–122 (2014).

He, B. et al. Lymphoma B cells evade apoptosis through the TNF family members BAFF/BLyS and APRIL. J. Immunol. (Baltim., Md.: 1950) 172 , 3268–3279 (2004).

Xia, X. Z. et al. TACI is a TRAF-interacting receptor for TALL-1, a tumor necrosis factor family member involved in B cell regulation. J. Exp. Med 192 , 137–143 (2000).

Laâbi, Y., Egle, A. & Strasser, A. TNF cytokine family: more BAFF-ling complexities. Curr. Biol. 11 , R1013–R1016 (2001).

Mackay, F. & Schneider, P. TACI, an enigmatic BAFF/APRIL receptor, with new unappreciated biochemical and biological properties. Cytokine Growth Factor Rev. 19 , 263–276 (2008).

Rihacek, M. et al. B-cell activating factor as a cancer biomarker and its implications in cancer-related Cachexia. Biomed. Res. Int. 2015 , 792187–792187 (2015).

Su, H., Chang, J., Xu, M., Sun, R. & Wang, J. CDK6 overexpression resulted from microRNA‑320d downregulation promotes cell proliferation in diffuse large B‑cell lymphoma. Oncol. Rep. 42 , 321–327 (2019).

Lee, C., Huang, X., Di Liberto, M., Martin, P. & Chen-Kiang, S. Targeting CDK4/6 in mantle cell lymphoma. Ann. Lymphoma 4 , 1 (2020).

Otto, T. & Sicinski, P. Cell cycle proteins as promising targets in cancer therapy. Nat. Rev. Cancer 17 , 93–115 (2017).

Li, K. et al. cellxgene VIP unleashes full power of interactive visualization, plotting and analysis of scRNA-seq data in the scale of millions of cells. bioRxiv , 2020.08.28.270652 (2020).

Pereira, W. et al. Asc-Seurat – Analytical single-cell Seurat-based web application. bioRxiv , 2021.03.19.436196 (2021).

Gardeux, V., David, F. P. A., Shajkofci, A., Schwalie, P. C. & Deplancke, B. ASAP: a web-based platform for the analysis and interactive visualization of single-cell RNA-seq data. Bioinformatics 33 , 3123–3125 (2017).

Li, B. et al. Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq. Nat. Methods 17 , 793–798 (2020).

Hillje, R., Pelicci, P. G. & Luzi, L. Cerebro: interactive visualization of scRNA-seq data. Bioinformatics 36 , 2311–2313 (2019).

Prompsy, P. et al. Interactive analysis of single-cell epigenomic landscapes with ChromSCape. Nat. Commun. 11 , 5702 (2020).

Bolisetty, M. T., Stitzel, M. L. & Robson, P. CellView: Interactive exploration of high dimensional single cell RNA-seq data. bioRxiv , 123810 (2017).

Mohanraj, S. et al. CReSCENT: CanceR Single Cell ExpressioN Toolkit. Nucleic Acids Res. 48 , W372–W379 (2020).

Patel, M. V. iS-CellR: a user-friendly tool for analyzing and visualizing single-cell RNA sequencing data. Bioinformatics 34 , 4305–4306 (2018).

Yousif, A., Drou, N., Rowe, J., Khalfan, M. & Gunsalus, K. C. NASQAR: a web-based platform for high-throughput sequencing data analysis and visualization. BMC Bioinforma. 21 , 267 (2020).

Zhu, Q. et al. PIVOT: platform for interactive analysis and visualization of transcriptomics data. BMC Bioinforma. 19 , 6 (2018).

Innes, B. & Bader, G. scClustViz - Single-cell RNAseq cluster assessment and visualization. F1000Res . 7 , ISCB Comm J-1522 (2018).

Granja, J. M. et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53 , 403–411 (2021).

Wan, C. et al. LTMG: a novel statistical modeling of transcriptional expression states in single-cell RNA-Seq data. Nucleic Acids Res. 47 , e111 (2019).

Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38 , 1408–1414 (2020).

Li, G. S., Li, M., Wang, J. X., Li, Y. H. & Pan, Y. United neighborhood closeness centrality and orthology for predicting essential proteins. IEEE/ACM Trans. Comput. Biol. Bioinform. 17 , 1451–1458 (2020).

Parisutham, N. & Rethnasamy, N. Eigenvector centrality based algorithm for finding a maximal common connected vertex induced molecular substructure of two chemical graphs. J. Mol. Struct. 1244 , 130980 (2021).

Ma, A. et al. Single-cell biological network inference using a heterogeneous graph transformer. Zenodo https://doi.org/10.5281/zenodo.7559037 (2023).

Acknowledgements

This work was supported by awards R01-GM131399 (Q.M.), R35-GM126985 (D.X.), and U54-AG075931 (Q.M.) from the National Institutes of Health. The work was also supported by the award NSF1945971 (Q.M.) from the National Science Foundation. This work was supported by the Pelotonia Institute of Immuno-Oncology (PIIO). The content is solely the responsibility of the authors and does not necessarily represent the official views of the PIIO. In addition, we thank Dr. Fei He from the Northeast Normal University for his valued suggestions in framework construction and data testing.

Author information

These authors contributed equally: Anjun Ma, Xiaoying Wang.

These authors jointly supervised this work: Bingqiang Liu, Dong Xu, Qin Ma.

Authors and Affiliations

Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA

Anjun Ma, Cankun Wang, Hao Cheng, Yang Li, Yuzhou Chang, Shaopeng Gu & Qin Ma

Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA

Anjun Ma, Tong Xiao, Yuzhou Chang, Gang Xin, Zihai Li & Qin Ma

School of Mathematics, Shandong University, Jinan, Shandong, China

Xiaoying Wang, Jingxian Li, Yuntao Liu & Bingqiang Liu

Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA

Juexin Wang, Duolin Wang, Yuexu Jiang & Dong Xu

Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA

Juexin Wang, Jinpu Li, Duolin Wang, Yuexu Jiang, Li Su & Dong Xu

Institute for Data Science and Informatics, University of Missouri, Columbia, MO, USA

Jinpu Li, Li Su & Dong Xu


Contributions

Q.M., B.L., and D.X. conceived the basic idea and designed the framework. X.W. wrote the backbone code of DeepMAPS. C.W. and H.C. built the back-end and front-end servers. S.G. designed the interactive figures on the server. Y.Liu carried out RNA velocity calculations. Y.Li designed the SFP model for gene module predictions. A.M., X.W., and J.L. carried out benchmark experiments. X.W., Y.C., and B.L. performed robustness tests. A.M., J.L., X.W., G.X., Z.L., and T.X. carried out the case study. J.W., D.W., Y.J., J.L., and L.S. performed tool optimizations. A.M. and Q.M. led the figure design and manuscript writing. All authors participated in the interpretation and writing of the manuscript.

Corresponding authors

Correspondence to Bingqiang Liu , Dong Xu or Qin Ma .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Saugato Rahman Dhruba and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information: Peer Review File, Description of Additional Supplementary Information, Supplementary Data 1–6, Reporting Summary, and Source Data.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Ma, A., Wang, X., Li, J. et al. Single-cell biological network inference using a heterogeneous graph transformer. Nat Commun 14 , 964 (2023). https://doi.org/10.1038/s41467-023-36559-0


Received : 16 October 2022

Accepted : 06 February 2023

Published : 21 February 2023

DOI : https://doi.org/10.1038/s41467-023-36559-0

This article is cited by

The diversification of methods for studying cell–cell interactions and communication.

  • Erick Armingol
  • Hratch M. Baghdassarian
  • Nathan E. Lewis

Nature Reviews Genetics (2024)

MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer

  • Xiaoying Wang
  • Maoteng Duan

Nature Communications (2024)

Deciphering cell types by integrating scATAC-seq data with genome sequences

  • Yuansong Zeng
  • Yuedong Yang

Nature Computational Science (2024)

Graph neural network approaches for single-cell data: a recent overview

  • Konstantinos Lazaros
  • Dimitris E. Koumadorakis
  • Aristidis G. Vrahatis

Neural Computing and Applications (2024)

Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data

  • Pengyi Yang

npj Systems Biology and Applications (2023)

Perspective Article

Machine Learning for Causal Inference in Biological Networks: Perspectives of This Challenge


  • Faculty of Computer Science, Free University of Bozen-Bolzano, Piazza Domenicani, Bolzano, Italy

Most machine learning-based methods predict outcomes rather than uncovering causality. Machine learning methods have proved efficient at finding correlations in data, but poor at determining causation. This limitation severely restricts their applicability for inferring causal relationships between the entities of a biological network, and more generally of any dynamical system representable as a network, such as a system of medical intervention strategies and clinical outcomes. For those who want to use the results of network inference not only to understand the mechanisms underlying the dynamics, but also to understand how the network reacts to external stimuli (e.g., environmental factors, therapeutic treatments), tools that can uncover the causal relationships in the data are in high demand. Given the increasing popularity of machine learning techniques in computational biology, and the recent literature proposing their use for the inference of biological networks, we present the challenges that mathematics and computer science research faces in generalizing machine learning into an approach capable of understanding causal relationships, and the prospects that achieving this would open up for the medical application domains of systems biology, whose main paradigm is precisely network biology at any physical scale.

1 Introduction

The availability of big data, the use of (deep) machine learning techniques to process them, and consequently the opportunity to access and/or perform high-performance computing are becoming of crucial importance for biology ( Xu and Jackson, 2019 ), medicine and healthcare ( Bates et al., 2020 ; Prosperi et al., 2020 ). Machine learning aims to develop computer algorithms that improve with experience and with the use of data. Nowadays, machine learning techniques are integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation ( Auslander et al., 2021 ). This integration is not without its challenges and hurdles to overcome, but it is an endeavour that many researchers have decided to actively address given the tantalizing promise of machine learning to enable sophisticated analysis of complex biological systems.

In molecular biology, machine learning techniques are used for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic, or metabolomic data ( Libbrecht and Noble, 2015 ). Xu et al. ( Xu and Jackson, 2019 ) provide a brief but comprehensive overview of the main uses of machine learning in genomic applications. Machine learning has been used to predict the sequence specificities of DNA- and RNA-binding proteins, enhancers, and other regulatory regions ( Libbrecht and Noble, 2015 ; Zou et al., 2018 ) on data generated by omics approaches, such as DNase I hypersensitive sites (DNase-seq), formaldehyde-assisted isolation of regulatory elements with sequencing (FAIRE-seq), assay for transposase-accessible chromatin using sequencing (ATAC-seq), and self-transcribing active regulatory region sequencing (STARR-seq). Machine learning in molecular biology can also be used to build models that predict regulatory elements and non-coding variant effects de novo from a DNA sequence ( Zou et al., 2018 ). Recently, machine learning approaches have been used in population and evolutionary genetics, to address questions such as the identification of regions under purifying selection or selective sweeps. Moreover, machine learning approaches have been used to predict transcript abundance ( Washburn et al., 2019 ), to impute missing SNPs and DNA methylation states ( Sun and Kardia, 2008 ; Angermueller et al., 2017 ), and to accurately call genetic variants present in an individual genome from billions of short, error-prone sequence reads ( Poplin et al., 2018 ).

From applications in genetics and genomics, machine learning has quickly made the leap to applications in medicine: to address disease diagnosis and classification, and, in precision medicine, to assess disease risk, take preventive measures, make diagnoses, and define personalized treatments. However, precision medicine, and medicine more generally, is not only about predicting risks and outcomes, but also about building interventional clinical models. In this regard, Prosperi et al. (2020) point out that interventional clinical predictive models require the correct specification of cause and effect, and the calculation of alternative scenarios. The deduction of cause-and-effect relationships is generally done in experiments or with the help of data-driven approaches. Many questions in biomedical research can only be answered with observational studies. Unfortunately, unlike controlled experiments or well-planned, randomized clinical trials, observational studies are subject to a number of potential problems that may jeopardize the reliability of their results. Factors that may bias the results of observational studies include selection bias, resulting from the way study subjects are recruited or from differing rates of study attendance depending on the subjects' cultural background, perception of the problem, age, or socioeconomic status; information bias; measurement error; and confounders ( Hammer et al., 2009 ). Under these conditions, and without substantial a priori knowledge, causal inference is not feasible. On the other hand, data-driven prediction models, often implemented by machine learning algorithms, should be used with great caution even when they are derived from an experiment in which several biases were minimized, and their results should be subjected to critical review before interpretation. Although these methods are widely used to draw cause-effect relationships, attention must be paid to the fact that neither their parameters nor their predictions necessarily have a causal interpretation ( Prosperi et al., 2020 ). Therefore, the belief that data-driven prediction models allow trustworthy decisions and efficient planning of interventions in precision medicine, and in medicine generally, is doubtful.

The same problem is found in molecular biology, where computational methods for inferring biological networks, such as gene regulatory, protein-protein, metabolic, and signalling networks, have been sought ever since the emergence of the systems biology paradigm in the early 2000s. Critical reviews and comparative analyses of the various methods proposed over the years can be found in numerous reviews and research papers from the last decade, some of them in these references ( Veiga et al., 2010 ; Dongarra et al., 2011 ; Oates and Mukherjee, 2012 ; Omony, 2014 ; Chang et al., 2015 ; Zarayeneh et al., 2016 ; Angermueller et al., 2017 ; Liu et al., 2019 ; Lu et al., 2021 ), which principally present data-integrative statistical methods and methods for network reconstruction and for the causal contextualization of reconstructed networks. The development of experiments that, especially in genetics and genomics, collect large amounts of heterogeneous data, along with the now-real availability of high-performance computers and parallel cloud architectures, has made it possible to apply various machine (and deep) learning methods to the problem of deducing causal relationships in a biological network. Consequently, the scientific literature already offers several articles in which machine (and deep) learning approaches are applied in ways appropriate to the specific context of investigation.

Very recent contributions can be found in ( Yuan and Bar-Joseph, 2019a ; Badsha and Fu, 2019 ; Kishan et al., 2019 ; Li F. et al., 2020 ; Muzio et al., 2020 ; Yazdani et al., 2020 ), and a primer on machine learning for life scientists, including an introduction to deep learning, can be found in ( Kishan et al., 2019 ). In particular, Camacho et al. (2018) discuss opportunities and challenges of the application of machine learning to network biology, and envisage its impact on disease biology, drug discovery, microbiome research, and synthetic biology; the reader can find a comprehensive review of the opportunities at the intersection of machine learning and network biology in ( Camacho et al., 2018 ). Machine learning approaches have been used more for network reconstruction and network inference than for causal inference; the latter task has been partly addressed by informing inference algorithms with a priori knowledge about network nodes and edges. Nevertheless, the current literature provides a very solid and promising basis from which to implement machine learning approaches for causal discovery.

Despite the promising premises that these works hint at, it is the opinion of many experts in artificial intelligence that, in the words of Schölkopf et al. (2021), if we compare what machine learning can do to what animals accomplish, we observe that the former is rather bad at some crucial feats where animals excel ( Schölkopf, 2019 ; Schölkopf et al., 2021 ). These crucial operations include transfer to new problems, and any form of generalization that is not from one data point to the next (sampled from the same distribution), but rather from one problem to the next. Schölkopf et al. (2021) also note that this is not surprising, because machine learning often disregards information that animals use heavily, such as interventions in the world, domain shifts, and temporal structure. When designing machine learning algorithms, these factors are usually treated as a nuisance and a source of noise, and are therefore not modelled. Yet these same factors are what would enable machine and deep learning methods to infer causal structures.

Finally, as we will discuss later in the paper, we recall that in any computational pipeline dedicated to causal inference, methods for identifying causal variables are of strategic importance. In this respect, a literature of advanced computational methods and advanced experimental technology is already available. Notable findings that provide a solid basis on which to develop new methods for identifying causal variables include studies aimed at the identification of cancer biomarkers, such as that of Zhang et al. (2020a) , who proposed a novel method for high-throughput identification of cancer biomarkers in human body fluids. The use of biomarkers in research has grown greatly, given their importance in defining the causal pathway of a disease ( Mayeux, 2004 ). The method in ( Zhang J. et al., 2020 ) integrates physicochemical properties with weighted observed percentages and position-specific scoring matrix profiles to enhance the attributes reflecting the evolutionary conservation of body-fluid-related proteins; least absolute shrinkage and selection operator feature selection is then used to generate the optimal feature subset. The same author ( Zhang et al., 2019 ) published a paper on the collection of data needed for the identification of cancer biomarkers, and another introducing and discussing structure-trained predictors of protein-binding residues. Zhang J. et al. (2017a) also presented a method to detect bioluminescent proteins, which, by virtue of their capability to emit light, can serve as biomarkers and are easily detected in biomedical research, for example in gene expression analysis and signal transduction pathways. Zhang and co-authors ( Zhang J. et al., 2017 ) collected a series of sequence-derived features known to be involved in the structure and function of bioluminescent proteins. These features include amino acid composition, dipeptide composition, sequence motifs, and physicochemical properties. They found a combination of four types of features that outperforms any other combination or individual feature. To remove potentially irrelevant or redundant features, they applied the Fisher Markov Selector together with a Sequential Backward Selection strategy to select the optimal feature subsets.

In this paper, some possible scenarios for the development of current machine learning approaches towards approaches capable of inferring causal relationships between the components of a biological system are presented. The paper does not pretend to cover all the issues involved in the effort to make machine learning capable of causality; such an attempt is largely impossible at present. In fact, it is only in recent years that the use of machine learning techniques has become ubiquitous, and so it is only now that its limitations are beginning to be understood. Instead, we focus on a rather large, topical, and timely slice of current research on causal discovery in machine learning, i.e., the analysis of the issues and possible perspectives that machine learning has in structural causal model inference. Causal inference methods based on Pearl's ( Pearl, 2010 ) and other theoretical work with counterfactuals and structural causal models (see a comprehensive summary in ( Nogueira et al., 2021 ) and seminal works in ( Andrieu et al., 2003 ; Yin and Yao, 2016 )) have recently paved the way for improving machine learning models, especially in biology, biomedicine, and more recently epidemiology ( Triantafillou et al., 2017 ; Glymour et al., 2019 ; Castro et al., 2020 ; Rivas-Barragan et al., 2020 ; Rose, 2020 ; Wilkinson et al., 2020 ; Raita et al., 2021 ): they increase the transparency of model assumptions, help control for confounding variables, support counterfactual reasoning, and ultimately improve our understanding of the effect of training-set selection bias and of causal discovery itself ( Piraino, 2018 ).

The paper is organized as follows: Section 2 introduces some basic concepts of machine learning and the mathematics of structural models, and presents the current state of the art and possible imminent perspectives on the development of machine learning methods for learning causal graphs. In particular, the section proposes the perspective of a strong coupling between meta-modelling and meta-learning to overcome the current limitations of machine learning in performing causal discovery. Section 3 comments on popular machine learning algorithms that have been reformulated specifically for causal discovery, deepens the perspectives presented in Section 2, presents the possible difficulties, and proposes modular meta-learning upstream of meta-modelling as the next challenge and opportunity for the development of machine learning aimed at causal inference. Finally, Section 4 draws some conclusions. The perspectives presented and discussed in this paper are conceived with particular reference to biological networks of any scale (molecular, cellular, ecosystem), but remain valid in other research areas where the problem of inferring causal relationships in a dynamical system is posed.

2 Machine Learning and Structural Causal Models

A machine learning algorithm is a computer program that is able to learn from data. Mitchell ( Mitchell, 1997 ) provided this definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E". A variety of experiences E, tasks T, and performance measures P can be used to build a machine learning algorithm ( Goodfellow et al., 2016 ). Learning is the ability to perform the task; tasks are usually described in terms of how the machine learning system should process an example. An example is a set of features that have been quantitatively measured from some object or event that we want to process with machine learning, and it is usually represented by a vector x ∈ R^n whose entries are features. Machine learning can perform many tasks, such as classification, regression, transcription, machine translation, anomaly detection, imputation of missing values, de-noising, and probability density estimation ( Goodfellow et al., 2016 ), but causal inference is still a challenge for it, because of its inability to generalize from one problem to the next, rather than from one data point to the next (both sampled from the same distribution). Schölkopf et al. (2021) use an evocative term for the first form of generalization: "out-of-distribution" generalization. In order to understand the nature of the problem more fully, it is necessary to start with a formal description of a causal model, which we briefly outline in the following subsection, adopting a notation similar to that of Schölkopf et al. (2021). We then present some perspectives for machine learning of a structural causal model.
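To make the definitions concrete, here is a minimal sketch (our own illustration, not taken from any of the cited works; the data and the threshold rule are invented for the example) of an example vector x ∈ R^n, a classification task T, and a performance measure P:

```python
import numpy as np

# An "example" is a feature vector x in R^n; the task T (here, binary
# classification) specifies how the program should process it, and the
# performance measure P scores predictions against known labels.
X = np.array([[5.1, 3.5], [6.2, 2.9], [4.9, 3.1], [6.7, 3.0]])  # 4 examples, n = 2
y = np.array([0, 1, 0, 1])                                      # labels

def classify(x, threshold=5.5):
    """A trivial decision rule: predict class 1 when the first feature
    exceeds a threshold. 'Experience' E would consist of tuning it."""
    return int(x[0] > threshold)

accuracy = np.mean([classify(x) == t for x, t in zip(X, y)])  # performance P
```

Improving `accuracy` by adjusting the threshold on more data is exactly "improving at T, as measured by P, with experience E".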

2.1 Learning a Structural Causal Model: The Present and the Perspectives

Consider a set of variables X_1, …, X_n associated with the vertices of a directed acyclic graph (DAG), and assume that each observable is generated by the assignment

X_i = f_i(PA_i, U_i),  i = 1, …, n,    (1)

where f_i is a deterministic function depending on X_i's parents in the graph (denoted by PA_i) and on a noise random variable U_i. Directed edges in the graph represent direct causation: the parents are connected to X_i by directed edges and, through (1), directly affect X_i. The presence of a noise term makes X_i itself a random variable, for which it is possible to define a conditional probability distribution P(X_i | PA_i). The noises U_1, …, U_n are assumed to be jointly independent. If they were not, then by the Common Cause Principle there would have to be another variable causing their dependence, and the model in (1) would not be causally sufficient.

The graph structure, along with the joint independence of the noises, implies the factorization of the joint probability P(X_1, X_2, …, X_n) into causal conditionals as follows ( Schölkopf et al., 2021 ):

P(X_1, X_2, …, X_n) = ∏_{i=1}^{n} P(X_i | PA_i).    (2)

The factorization decomposes the joint distribution into conditionals corresponding to the functions f_i in Eq. 1. These assignments can be interpreted as the causal relationships responsible for all statistical dependencies among the variables. A structural causal model allows a straightforward formalization of interventions as operations that modify the arguments of the functions f_i in (1), e.g., changing U_i, or changing the functional form of f_i itself. In particular, changing the form of f_i means changing the dependency of X_i on its parents ( Spirtes et al., 1993 ; Neuberg, 2003 ; Schölkopf et al., 2021 ). It is worth noting that there is a substantial difference between a statistical model and a causal model. If we obtain a causal model through a learning process, causal reasoning allows us to draw conclusions about the effect of interventions and about potential outcomes. Statistical models, by contrast, only allow reasoning about the outcomes of independent, identically distributed experiments.
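These ideas can be illustrated with a small sketch (our own toy example; the chain X1 → X2 → X3 and all functional forms are assumptions chosen for illustration). An intervention replaces one assignment while leaving the other mechanisms, and hence the causal conditionals, intact:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n, do_x1=None):
    """Sample from a toy 3-node SCM X1 -> X2 -> X3.

    Each X_i = f_i(PA_i, U_i) with jointly independent noises U_i.
    Passing do_x1 implements the intervention do(X1 = do_x1): the
    assignment for X1 is replaced, while f_2 and f_3 are untouched.
    """
    u1, u2, u3 = rng.normal(size=(3, n))
    x1 = u1 if do_x1 is None else np.full(n, float(do_x1))
    x2 = 2.0 * x1 + u2            # f_2(PA_2 = X1, U_2)
    x3 = np.tanh(x2) + 0.1 * u3   # f_3(PA_3 = X2, U_3)
    return x1, x2, x3

obs = sample_scm(10_000)               # observational distribution
intv = sample_scm(10_000, do_x1=1.0)   # interventional distribution
```

Under do(X1 = 1), the mean of X2 shifts from about 0 to about 2: a prediction about an intervention that the causal factorization supports, while a purely statistical model of the observational joint does not.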

There are two fundamental elements in a structural causal model: the graph and the functions f_i. The fundamental question, then, is whether both the graph and the functions f_i can be deduced from the data. The question has no easy answer, since deducing the graph in general depends on deducing the functions. A common method to infer the graph from data is to perform conditional independence tests, i.e., to test whether two random variables X and Y are independent given Z. The conditional independences are implied by the causal Markov condition: if G is a causal graph with vertex set V and P is a probability distribution over the vertices in V generated by the causal structure represented by G, then G and P satisfy the causal Markov condition if and only if, for every X ∈ V, X is independent of V ∖ (Descendants(X) ∪ Parents(X)) given Parents(X) ( Hájek, 2011 ). The causal Markov condition holds regardless of the complexity of the functions appearing in a structural causal model ( Schölkopf et al., 2021 ), which is a definite advantage of conditional independence tests. However, it is well known that testing for conditional independence is a hard statistical problem, especially if Z is a continuous random variable ( Shah and Peters, 2020 ). Another problem, highlighted by Schölkopf et al. (2021), is that in the case of only two variables, the ternary concept of conditional independence is not applicable, and the Markov condition thus has no non-trivial implications. A possible solution to both problems is to make assumptions on the functions in the structural causal model. This is the route typically taken by machine learning methods: it is well known that, in the absence of assumptions on the functions f_i, it is not possible to generalize from finite-sample data. To better explain these statements, let us take a practical example.
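Before turning to that example, the conditional independence testing just described can be illustrated with a toy sketch (our own; a linear partial-correlation proxy that is adequate only under roughly linear, Gaussian assumptions, not a general-purpose test):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out z: a simple
    proxy for testing X independent of Y given Z (valid only under
    linear, roughly Gaussian assumptions)."""
    Z = np.column_stack([z, np.ones_like(z)])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(1)
z = rng.normal(size=10_000)
x = z + 0.5 * rng.normal(size=10_000)  # X <- Z
y = z + 0.5 * rng.normal(size=10_000)  # Y <- Z (Z is a common cause)

marginal = np.corrcoef(x, y)[0, 1]   # strongly positive
conditional = partial_corr(x, y, z)  # near zero: X _||_ Y given Z
```

X and Y are marginally correlated through the common cause Z, but their partial correlation given Z is near zero; this is exactly the dependence pattern that constraint-based structure-learning algorithms exploit.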

A non-linear structural causal model is a generative process of the following form ( Galanti et al., 2020 ):

Y = g(f(X), U).    (3)

The functions g : R^{D_f + D_U} → R^{D_Y} and f : R^{D_X} → R^{D_f} are unknown non-linear functions. Here, as in Eq. 1, X is the input random variable and U is the noise variable, independent of X. We say that X ∈ R^{D_X} causes Y ∈ R^{D_Y} if they satisfy a generative process such as Eq. 3.

For any joint distribution P(X, Y) of two random variables X and Y, there is a structural causal model Y = g(f(X), U) such that X ⊥ U and f, g are measurable functions. Therefore, in general, deciding whether X causes Y or vice versa is ill-posed when one is only provided with samples from the joint distribution. However, in the one-dimensional case (i.e., X, Y ∈ R), under reasonable conditions, a representation

Y = f(X) + U,  with U ⊥ X,    (4)

holds only in one direction ( Zhang and Hyvärinen, 2010 ). The model for Y in Eq. 4 is an additive noise model, and thus represents a restricted class of functions f; the restriction is considerable compared to the possibilities offered for f by the model in Eq. 3. Assuming an additive noise model makes it easier to learn functions from data, and can break the symmetry between cause and effect in the two-variable case. Schölkopf et al. (2021) report several literature references showing that, given a distribution over X, Y generated by an additive noise model, one cannot fit an additive noise model in the opposite direction (i.e., with the roles of X and Y interchanged). The assumption can be justified when g depends only weakly on U and the variance of U is not too large.
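The asymmetry can be demonstrated with a crude numerical sketch (our own; the polynomial fit and the squared-residual dependence score are simplifications of a proper independence test such as HSIC, and all functional forms are invented for illustration):

```python
import numpy as np

def anm_score(cause, effect, deg=3):
    """Fit a polynomial regression effect ~ cause and return a crude
    dependence score between squared residuals and squared regressor;
    near zero means the residuals look independent of the input, as an
    additive noise model predicts for the true causal direction."""
    coef = np.polyfit(cause, effect, deg)
    resid = effect - np.polyval(coef, cause)
    return abs(np.corrcoef(resid**2, cause**2)[0, 1])

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 4000)
y = x + x**3 + 0.5 * rng.normal(size=4000)  # true model: X -> Y, additive noise

forward = anm_score(x, y)   # residuals ~ U, independent of X
backward = anm_score(y, x)  # residuals depend on Y: no reverse ANM fits
```

In the true direction the residuals look independent of the input; in the reverse direction they are visibly dependent on the regressor, so an additive noise model fits only one way.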

Restriction of function classes is not the only strategy to simplify causal structure inference. So-called "distribution shifts", i.e., observing the system in different environments and contexts, may also help to infer causal structure. Different contexts can arise, for instance, from different interventions and from heterogeneous non-stationary data ( Zhang K. et al., 2017 ). In a very general scenario, different contexts may require the execution of different tasks and thus prepare the ground for meta-learning, i.e., learning algorithms that learn from the output of other learning algorithms. Meta-learning is derived from meta-modelling, with which it shares many objectives. Both rely on meta-data, i.e., data that describe other data, to model a predefined class of problems. Both are used to define output and input relationships, and may then be used to identify the best model representing the behaviour of the data ( Hartmann et al., 2019 ). Meta-learning studies started in the 1980s and became popular after the works of Schmidhuber ( Schmidhuber, 2015 ) and Bengio ( Goodfellow et al., 2016 ). Interest in meta-learning accelerated especially after the massive adoption of deep learning and advanced machine learning algorithms: the increasing difficulty of training these learning algorithms generated stronger interest in meta-learning studies. Currently, the major application domains of meta-learning for machine learning in bioinformatics have been genomic survival studies in cancer research ( Qiu et al., 2020 ) and the estimation of heterogeneous therapeutic treatment effects ( Künzel et al., 2019 ; Rivas-Barragan et al., 2020 ). Let us therefore look in more detail at the methodology of meta-learning for identifying causal directions.

An interesting prospect for meta-learning is the development of methods for learning conditional probability functions rather than the functions f_i. Learning probability models is a more general approach than learning dynamical functions: it requires less a priori knowledge, allows the model to be questioned in the light of new knowledge thanks to Bayes' theorem, and incorporates a comparison of models and possible scenarios by comparing the likelihoods of the models themselves. To learn the joint distribution of two variables X and Y, we can use their conditional distributions p_{X|Y}(X, Y) and p_{Y|X}(X, Y) alongside their marginal distributions p_X and p_Y. In a Bayesian framework, we can write the two equivalent factorizations

P_{X→Y}(X, Y) = P(X) P(Y | X),  P_{Y→X}(X, Y) = P(Y) P(X | Y).    (5)

Let us assume that the true causal direction is X → Y and use the training distribution p_0(x, y) = p_0(x) p(y|x). Thereafter, the distribution is changed to the transfer distribution p_1(x, y) = p_1(x) p(y|x). According to a recent methodology proposed by Wong and Damjakob (2021), both networks, X → Y and Y → X, are meta-trained on the transfer distribution for N steps according to the following two-step process:

1) The relationship between X and Y is learned using two models: one assumes X causes Y , the other the opposite causal direction;

2) The distribution of X is changed to a transfer distribution; both models are retrained on the new data and the resulting likelihoods are recorded.
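The two-step process can be sketched numerically for two binary variables (our own toy implementation, not the authors' code; the partial-adaptation blending stands in for the few gradient steps used by Bengio et al. (2020) and Wong and Damjakob (2021), and all distributions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n, px1):
    """Sample (x, y) with p(x=1) = px1 and a fixed mechanism p(y|x):
    y equals x with probability 0.9. Only p(x) differs between the
    training and transfer distributions."""
    x = (rng.random(n) < px1).astype(int)
    y = np.where(rng.random(n) < 0.9, x, 1 - x)
    return x, y

def joint(x, y):
    J = np.zeros((2, 2))
    np.add.at(J, (x, y), 1.0)
    return J / J.sum()

# Step 1: fit both factorizations on the training distribution p0.
J0 = joint(*sample(50_000, px1=0.2))
A_px, A_pyx = J0.sum(1), J0 / J0.sum(1, keepdims=True)       # X -> Y: P(X), P(Y|X)
B_py, B_pxy = J0.sum(0), (J0 / J0.sum(0, keepdims=True)).T   # Y -> X: P(Y), P(X|Y)

# Step 2: p(x) shifts; both models take a few partial adaptation steps
# toward the transfer statistics, then are scored on fresh transfer data.
for _ in range(3):
    Jt = joint(*sample(10_000, px1=0.8))
    A_px  = 0.7 * A_px  + 0.3 * Jt.sum(1)
    A_pyx = 0.7 * A_pyx + 0.3 * Jt / Jt.sum(1, keepdims=True)
    B_py  = 0.7 * B_py  + 0.3 * Jt.sum(0)
    B_pxy = 0.7 * B_pxy + 0.3 * (Jt / Jt.sum(0, keepdims=True)).T

xt, yt = sample(20_000, px1=0.8)
L_xy = np.log(A_px[xt] * A_pyx[xt, yt]).sum()   # transfer log-likelihood, X -> Y
L_yx = np.log(B_py[yt] * B_pxy[yt, xt]).sum()   # transfer log-likelihood, Y -> X
```

Because only p(x) changed, the conditional module of the X → Y model is still correct and only its marginal must adapt; after a few steps its transfer log-likelihood L_xy therefore exceeds L_yx, revealing the causal direction.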

The resulting likelihoods are

L_{X→Y} = ∏_{n=1}^{N} P_{X→Y,n}(x_n, y_n),  L_{Y→X} = ∏_{n=1}^{N} P_{Y→X,n}(x_n, y_n),

with (x_n, y_n) drawn from the transfer distribution,

where P_{X→Y,n} denotes the trained Bayesian network after step n. The loss function is calculated as

R = −log[ σ(α) L_{X→Y} + (1 − σ(α)) L_{Y→X} ],    (6)

where α is a structural parameter defining the causal direction and σ(⋅) is a sigmoid function. From Eq. 6, we can see that ∂R/∂α > 0 if L_{X→Y} < L_{Y→X}, i.e. if P_{Y→X} is better at explaining the transfer distribution than P_{X→Y}, in which case gradient descent on R decreases the weight σ(α) assigned to the X → Y hypothesis. Bengio et al. ( Bengio et al., 2020 ) showed that

E[ L_{X→Y}(Data_transfer) ] > E[ L_{Y→X}(Data_transfer) ],  (Eq. 7)

where E[⋅] denotes the expected value and Data_transfer is the data drawn from the transfer distribution. Regarding Eq. 7, Wong et al. ( Wong and Damjakob, 2021 ) observed that the network modelling the correct direction (X → Y) only needs to update its estimate of the unconditional distribution P_{X→Y}(X) from p_0(x) to p_1(x), while the reverse-direction network (Y → X) needs to change both P_{Y→X}(Y) and P_{Y→X}(X | Y); hence the loss (the negative log-likelihood) for the correct direction has a lower expected value. In summary, the meta-training of the candidate networks and the estimation of the loss function, the likelihoods and their expected values provide a methodology to recover causal directions.
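The sign condition on the gradient quoted above can be checked in one step. Assuming the mixture form of the loss from Bengio et al. (2020), R(α) = −log[σ(α) exp(L_{X→Y}) + (1 − σ(α)) exp(L_{Y→X})], differentiation gives

```latex
\frac{\partial R}{\partial \alpha}
  = -\frac{\sigma'(\alpha)\,\bigl(e^{L_{X\to Y}} - e^{L_{Y\to X}}\bigr)}
          {\sigma(\alpha)\,e^{L_{X\to Y}} + \bigl(1-\sigma(\alpha)\bigr)\,e^{L_{Y\to X}}}.
```

Since σ′(α) > 0 and the denominator is positive, ∂R/∂α > 0 exactly when L_{X→Y} < L_{Y→X}, matching the stated inequality.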

The deduction of causal directions with meta-learning is still a new research topic, but meta-learning is in perspective a very promising approach for developing machine learning methods capable of causal discovery. The greatest strength of meta-learning, as we could see in the works of Wong and Damjakob (2021) and Bengio et al. (2020) reported here, lies in not making assumptions on the data distribution itself, but rather on how it changes (e.g., when going from a training distribution to a transfer distribution due to some agent’s actions). However, there is a need to integrate meta-learning into a composite upside-down framework that includes the following phases in the following order: 1) meta-modelling, 2) meta-learning, and finally 3) testing of classes of candidate functions f_i. Instead of posing the problem of learning the causal graph as a problem of determining the f_i functions, one would pose it as an efficient process of learning for learning’s sake, and only at the end of this process, once a set of causal graphs has been obtained, e.g. according to schemes similar to Wong’s ( Wong and Damjakob, 2021 ), would one proceed to the identification of the optimal f_i functions for each of these graphs. At this point the determination of the optimal f_i functions becomes a regression problem on the available experimental data. Figure 1 illustrates a scheme of this approach. Reducing the number of classes of functions in order to allow more nimble machine learning upstream of the whole learning procedure could create biases that are difficult to identify. Indeed, it could happen that the structure of the functions imposed at the beginning for simplifying purposes supplies an incorrect causal model, precisely because it is too simplified or too abstract and therefore far from the physical reality of the task that one wants to learn.
In these cases an evaluation of the goodness of the learning method based on performance measures such as the counts of false positives and false negatives, and/or on other classic tests, might not reveal this problem; in cases of very good performance it could even lead one to think that the functions f_i are descriptive of the physical process that governs the interactions among the nodes of the graph.


FIGURE 1 . Outline of a computational procedure using upstream meta-modelling for the inference of causal structures. Meta-learning is embedded in a composite upside-down framework that includes the following phases in the following order. Step 1: meta-modelling first provides candidate models; step 2: meta-learning learns from data the conditional probability structure of these models, and the structure and parameters of the mathematical relations defining the interactions between nodes (i.e. the functions f_i, i = 1, 2, …, M, with M the number of arcs in the probability graph) are then determined by regression methods (step 3). By employing meta-modelling upstream of meta-learning, and meta-learning itself in place of a direct application of machine learning, this pipeline extends the typical machine learning approach, which poses the problem of structural causal discovery as a problem of learning the functions f_i. In this pipeline, the determination of the optimal f_i functions is posed as a regression problem once meta-modelling and meta-learning have identified the wiring diagrams. The data are essential to the learning and regression procedure, and are typically divided into a training set, a test set and a cross-validation set. Cross-validation is a resampling procedure used to assess machine learning models on a limited data sample; for simplicity, it is shown in this figure as a subset of the dataset. The procedure has a single parameter, K, the number of groups into which the data sample is split (for this reason “cross-validation” usually means K-fold cross-validation). In K-fold cross-validation we have K train-test splits instead of one, so that we train and test the model K times. The purpose is to guard against the case in which, with a single train-test split, the chosen test portion happens to be particularly easy to predict, so that the model performs extremely well on it but not on genuinely new data. The image of the laboratory in this figure is part of the Pixabay free online pictures ( https://pixabay.com/it/ ).
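The K-fold splitting described in the caption can be sketched in a few lines (the function name and the use of a fixed seed are our own):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and yield (train_idx, test_idx) for each of
    the K folds; every sample appears in exactly one test fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# Each of the K train-test pairs is used to fit and score the model once;
# the K scores are then averaged.
splits = list(kfold_indices(10, 5))
```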

In this new perspective the restriction of the classes of functions is the result of a discrimination among the fitting classes of functions, rather than an upstream simplification of the learning process. The starting point is instead meta-modelling, i.e. the construction and development of the frames, rules, constraints, and theories applicable to modelling a predefined class of problems. So, instead of restricting the class of possible functions, we start by considering a set of models for describing a class of problems, which in the language of machine learning derive from different contexts, in turn obtained from different interventions. Meta-learning is then applied to learn the structural causal models outlined by meta-modelling. Preceding the machine learning phase with a meta-modelling phase could be a successful strategy to mimic the learning patterns of humans. Using a metaphor to explain machine learning from meta-models, we could say that people who know how to drive a car can most likely figure out how to drive a truck after a few instructions and a short demonstration. This is a promising and pursuable perspective, but it is not without its challenges. While meta-learning is evolving towards causal-discovery meta-learning, popular classification algorithms, such as k-nearest neighbour (KNN), support vector machine (SVM), and random forests (RF), have been repurposed in the guise of causal KNN, causal SVM and causal RF, as we see in the following sections. The KNN algorithm is the one whose mathematical specification has been most closely adapted to develop its own version for causal inference, and it is to this that we devote more space in the following. The other two approaches have instead been used more in combination with other techniques for causal inference, in order to implement computational pipelines for causal inference.

2.2 Causal K-Nearest-Neighbourhood

Zhou and Kosorok (2017) first, and then Hitsch and Misra ( Hitsch and Misra, 2018 ), introduced a way to use KNN for causal discovery. The method rapidly became popular and has been implemented in common statistics software libraries (see for example ( Kricke and Peschenz, 2019 )). In particular, in the application domain of precision medicine, the former proposed a causal k-nearest neighbour (CKNN) method to estimate the optimal treatment regime and the causal treatment effects within the nearest neighbourhood. The purpose was to tailor treatments to individual patients so as to maximize treatment benefit and reduce bad side-effects. Nevertheless, the method of Zhou and Kosorok (2017) is also applicable to biological networks, such as gene networks or protein-protein networks, when one wants to identify nodes or pathways affected by drug treatments, to design drug repurposing strategies or drug combination predictions, and more generally to identify differences between the networks of different patients. In this respect, we refer the reader to some significant recent studies such as ( Feng et al., 2017 ; Cheng et al., 2019 ; Hasan et al., 2020 ; Lu et al., 2020 ; Adhami et al., 2021 ; Ruiz et al., 2021 ; Somolinos et al., 2021 ). In this context, the crucial step is the treatment selection rule, or optimal treatment regime. By “treatment regime” we mean a decision rule that assigns a treatment to a patient based on his or her clinical or medical characteristics. Similarly, for a biological network, by “treatment regime” we mean a decision rule that assigns a therapeutic intervention to a network based on the state of the network or of some of its pathways (e.g. altered pathways in a disease state). In what follows, we describe and comment on how the decision rule is constructed in a KNN method.

The KNN rule is a classification approach in which a subject is classified by a majority vote of its k neighbours. As noted by Zhou et al. ( Zhou and Kosorok, 2017 ), the rationale of the nearest neighbour rule is that close covariate vectors share similar properties more often than not. We briefly summarize the causal KNN method, using a notation similar to that of Zhou and Kosorok (2017) .

Consider a randomized clinical trial with M treatment arms. Let R ∈ ℝ denote the observed clinical outcome, let A ∈ 𝒜 = {1, …, M} denote the treatment assignment received by the patient, and let X = (X_1, …, X_p)^T ∈ 𝒳 ⊂ ℝ^p, where 𝒳 is compact, denote the covariates describing the features of a node (e.g. the patient’s clinical covariates in a patients-treatments network, gene expression in a gene network, or protein concentrations in a protein-protein network). Let

π_m(x) = P(A = m | X = x)

denote the probability of being assigned treatment m for a node with covariates x. In the Zhou et al. framework this probability is assumed to be predefined in the design. Potential outcomes, denoted by R*(1), …, R*(M), are introduced and are defined as the outcomes that would be observed were a node to receive treatment 1, …, M, respectively. Very often in the literature, we find two assumptions regarding the potential outcomes:

1) Consistency assumption: the potential outcomes and the observed outcomes agree, i.e.,

R = Σ_{m=1}^{M} R*(m) I(A = m),

where I(·) is the indicator function;

2) No unmeasured confounders assumption: conditional on the covariates X, the potential outcomes {R*(1), …, R*(M)} are independent of the treatment assignment A that was actually received.

Let’s make some remarks on these assumptions right away. It is often assumed that assumption 2) holds in the case of randomised trials, but this belief is debatable. We consider it more prudent to state that in randomised trials the number of unmeasured confounders is reduced, but not completely eliminated, and that, in any case, the influence on the predictive ability of the algorithm is given not only by the number of possible confounders, but also by their role in the system studied.

Mathematically, a treatment regime d is a function from the covariates X to the treatment assignment A. For a treatment regime d, we can thus define its potential outcome: let

R*(d) = Σ_{m=1}^{M} R*(m) I(d(X) = m)

be the potential outcome of a treatment regime d, and let E(R*(d) | X = x) denote the expected potential outcome under any regime d. An optimal regime d_opt is a regime that maximizes E(R*(d)), i.e.:

d_opt ∈ argmax_d E(R*(d)).

By the consistency assumption and the no-unmeasured-confounders assumption, we have that

E(R*(d)) = E[ E(R | X, A = d(X)) ].  (Eq. 10)

The causal nearest neighbour algorithm implements the following steps: (i) find a neighbourhood of x in 𝒳; (ii) find an estimate R̂_m(x) of E(R | X = x, A = m) for each arm m in this neighbourhood; and (iii) plug R̂_m(x) into Eq. 10 to obtain the nearest neighbour estimate of the optimal treatment regime, i.e.

d_opt^CKNN(x) ∈ argmax_{m ∈ 𝒜} R̂_m(x).

d_opt^CKNN(x) is called the causal k-nearest neighbour regime ( Zhou and Kosorok, 2017 ).
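A minimal sketch of these steps, with arm-wise nearest-neighbour averaging on a synthetic two-arm trial; the data-generating model, names and parameters are illustrative, not those of Zhou and Kosorok (2017):

```python
import numpy as np

def cknn_regime(X_train, A_train, R_train, x, k=5):
    """Causal k-NN rule: for each arm m, average the outcomes of the k
    nearest neighbours of x that received m, then return the arm with
    the largest estimated outcome (the argmax over the arm estimates)."""
    best_arm, best_val = None, -np.inf
    for m in np.unique(A_train):
        mask = A_train == m
        dists = np.linalg.norm(X_train[mask] - x, axis=1)
        nearest = np.argsort(dists)[: min(k, dists.size)]
        r_hat = R_train[mask][nearest].mean()
        if r_hat > best_val:
            best_arm, best_val = m, r_hat
    return best_arm

# Synthetic randomized trial with two arms: arm 1 helps when the first
# covariate is positive, arm 0 helps otherwise.
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))
A = rng.integers(0, 2, size=n)                  # randomized assignment
R = (A == (X[:, 0] > 0)).astype(float) + 0.1 * rng.normal(size=n)

arm_pos = cknn_regime(X, A, R, np.array([1.5, 0.0]))
arm_neg = cknn_regime(X, A, R, np.array([-1.5, 0.0]))
```

For this toy outcome model, the rule recommends arm 1 for a node with a positive first covariate and arm 0 otherwise, i.e. it recovers the treatment-covariate interaction from local averages alone.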

We refer the reader to the works of Zhou and Kosorok (2017) , Hitsch and Misra ( Hitsch and Misra, 2018 ), and Kricke and Peschenz (2019) for the models used to calculate the arm-wise outcome estimates and for the technical algorithmic details. What is important to note here is that the causal nearest neighbour regimes are calculated by local averaging, and k is a tuning parameter. k must be small enough that local changes of the distribution can be detected; on the other hand, k needs to be large enough that averaging over the arm is effective. This parameter can be estimated by a cross-validation procedure balancing the two requirements, provided we have 1) a sufficiently large sample and 2) sufficient computational resources to deal with large complex networks. It is, however, well known that biological network inference is in many realistic situations an underdetermined problem, since the size of the node covariate sample is small whereas the size of the network is huge. Even assuming that the computational costs can be managed and an optimal covariate sample size can be obtained, we still have to deal with the fact that the potential outcomes of any node depend only on its nearest neighbours. The concept of neighbour is defined through the concept of distance, so ultimately the results of CKNN, in terms of both interpretation and reliability, depend on the definition of distance we use. We point out that by distance here we do not mean the metric, e.g. Euclidean rather than Manhattan or Minkowski distance, but precisely the physical quantities and related processes we use in the definition of these metrics.
For example, using the difference between the expression levels of genes in a gene network, or between protein concentrations in a protein-protein network, might be insufficient for the purposes of causal inference, since cause-effect relations between nodes might not manifest themselves through a variation of this distance, or might not manifest themselves only through it.

2.3 Causal Random Forests

A random forest (RF) algorithm consists of many decision trees, i.e. a “forest” generated by the algorithm itself. The forest is trained through bootstrap aggregating. The RF algorithm establishes the outcome based on the predictions of the decision trees, predicting by taking the mean of the outputs of the various trees. Causal random forests (CRF) have recently been proposed as a causal inference learning method extending random forests. In random forests, the data are repeatedly split in order to minimize the prediction error of an outcome variable. Causal forests are built similarly, except that instead of minimizing prediction error, the data are split so as to maximize the difference across groups in the relationship between an outcome variable and a “treatment” variable. Also in the context of CRF, “treatment” is used in the broadest sense of the term. Causal forests simply estimate heterogeneity in a causal effect. In fact, the term “causal” referring to random forests can be misleading, as causal forests can reveal heterogeneity in a causal effect, but they do not by themselves make the effect causal. There have been interesting approaches to achieving this goal very recently; see for example the work of Li et al. (2020a) , who developed a causal inference model combining Granger causality analysis and a random forest learning model. In the same vein we find the works of Schmidt et al. (2020) and Tsai et al. (2020) , all devoted to identifying cause-effect relationships in climatic phenomena, and at present still not easily generalizable to the inference of biological networks of different kinds, owing to the different physical nature of climatic effects and the large variety of biological interactions.
Nevertheless, on the same methodological line, there have also been important achievements in this direction for gene regulatory networks in the recent past, such as the development of random forest algorithms for gene regulatory network inference by Petralia et al. (2015) , Furqan and Siyal (2016) , Deng et al. (2017) , Huynh-Thu and Geurts (2018) , Kimura et al. (2020) , Zhang et al. (2020a) , and Cassan et al. (2021) . The majority of these methods rely on random forests for causal discovery in gene regulatory networks.

Most of these approaches implement, upstream of the inference process, the integration of large amounts of data of different natures, which are indispensable for inferring causal relationships in structures as complex as biological networks. The complexity of a biological network, be it a gene regulatory network, a signalling network or a metabolic or biochemical network, lies in its size, expressed by the number of nodes and the potential number of arcs, and very often in the potentially non-linear relationships between nodes that challenge the reliability and accuracy of regression techniques. The large amount of heterogeneous data would require the RF algorithm to generate a large number of trees to improve its efficiency. However, it is well known that the main limitation of random forests is that a large number of trees can make the algorithm too slow and ineffective for real-time predictions.

Furthermore, it is also well known that an RF algorithm cannot extrapolate: it can only calculate an average of previously observed labels. This means that, when applied to a regression problem, an RF algorithm provides a range of predictions that is bounded by the highest and lowest labels in the training data. This behaviour is regrettable when the training and prediction inputs differ in their ranges and/or distributions. This is called covariate shift, and it is difficult for most models to handle (including KNN), but especially for RF algorithms, because they only interpolate. The frequency with which the problem of covariate shift may be encountered is also very high when using heterogeneous biological data to aid causal inference in complex biological networks.
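This interpolation-only behaviour is easy to demonstrate with a deliberately miniature tree ensemble (bagged depth-1 regression trees, a simplified stand-in for a full random forest; all names and parameters are our own):

```python
import numpy as np

class StumpForest:
    """Bagged ensemble of depth-1 regression trees: enough structure to
    show that tree ensembles only average labels seen in training."""

    def __init__(self, n_trees=50, seed=0):
        self.n_trees = n_trees
        self.rng = np.random.default_rng(seed)
        self.stumps = []            # (threshold, left_mean, right_mean)

    def fit(self, x, y):
        n = len(x)
        for _ in range(self.n_trees):
            idx = self.rng.integers(0, n, n)        # bootstrap resample
            xb, yb = x[idx], y[idx]
            best = None
            for t in np.unique(xb)[:-1]:            # candidate split points
                left, right = yb[xb <= t], yb[xb > t]
                sse = ((left - left.mean()) ** 2).sum() + \
                      ((right - right.mean()) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, t, left.mean(), right.mean())
            if best is not None:
                self.stumps.append(best[1:])
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        preds = [np.where(x <= t, lo, hi) for t, lo, hi in self.stumps]
        return np.mean(preds, axis=0)

# Train on y = 2x over [0, 1], then predict far outside the range.
x_train = np.linspace(0.0, 1.0, 50)
y_train = 2.0 * x_train
forest = StumpForest().fit(x_train, y_train)
pred = forest.predict([3.0])[0]     # the true value would be 6.0
```

Because every prediction is an average of training labels, the prediction at x = 3 stays within [0, 2] even though the underlying line y = 2x continues to 6; a model that can extrapolate, such as linear regression, would not be clipped in this way.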

The current literature is promising regarding the applications of RF-based methods for causal inference in biological networks, but we believe that there are still many steps to be taken to overcome these limitations and to arrive at a mathematical model underlying RF algorithms that makes these approaches generalizable to networks other than gene regulatory networks, e.g. metabolic and biochemical networks.

2.4 Causal Support Vector Machine

Support vector machines (SVMs) appeared in the early nineties as optimal margin classifiers in the context of Vapnik’s statistical learning theory. The SVM algorithm aims to find a hyperplane in an N -dimensional space (where N is the number of features) that distinctly classifies the data points. SVMs can be used both for classification and regression tasks.

Moguerza and Muñoz (2006) highlight that an advantage of the support vector approach is that sparse solutions to classification and regression problems are usually obtained, i.e. only a few samples are involved in the determination of the classification or regression functions. This fact facilitates the application of SVMs to problems that involve a large amount of data, such as text processing and bioinformatics tasks. However, to the best of our knowledge, their evolution towards causal inference has not yet been developed, although in the literature we find some works in which SVMs are used in combination with other techniques in order to infer the structure of gene regulatory networks. In this regard, we report the work of Gillani et al. (2014) , who proposed CompareSVM, a tool that can be used to infer gene regulatory networks with high accuracy for networks with fewer than 200 nodes. The tool employs an SVM Gaussian kernel for biological datasets (knockout, knockdown, multifactorial and all). The authors state that for large networks, the choice of algorithm depends upon the type of biological condition. Interestingly, they state that since there are variations in prediction accuracy in all inference methods, prediction should be limited to simple networks. Furthermore, they envisage that future work is needed for the development of semi-supervised methods capable of predicting the targets of transcription factors which have no prior known targets.

Another study, representative of the works using SVMs in biological network inference, is the paper of Vert et al. (2007) , who deal with inferring network edges in a supervised way from a set of high-confidence edges, possibly characterized by multiple, heterogeneous data sets (protein sequence, gene expression, etc.). In this setting the authors distinguish between two modes of inference: direct inference, based upon similarities between nodes joined by an edge, and indirect inference, based upon similarities between one pair of nodes and another pair of nodes. Theirs is a supervised approach for the direct case, consisting of learning a distance metric. In this framework, a relaxation of the resulting convex optimization problem leads to an SVM algorithm with a particular kernel for pairs, called “the metric learning pairwise kernel”. The proposed methods hold the promise of being usable by most SVM implementations to solve problems of supervised classification and inference of pairwise relationships from heterogeneous data.

Finally, a recent work by Le Borgne et al. (2021) , not specifically on biological networks but on treatment-effect networks, found that the SVM approach competes with the most powerful recent methods, such as G-computation ( Snowden et al., 2011 ), for small sample sizes of about one hundred nodes when the relationships between the covariates and the outcome are complex. These findings, as well as the literature mentioned in this section, constitute important insights for the development of an efficient future causal version of SVMs.

3 The Challenges of the Modern Machine Learning

The two main challenges that machine learning algorithms have to face are:

• the need for large datasets for training and the high operational costs due to many trials/experiments during the training phase

• the remarkable dependency of the algorithms’ results on the training data, and the risk of over-fitting.

These are the challenges of current machine learning approaches and should be distinguished from the challenges faced by users of these approaches, such as for instance:

• data quality control

• exclusion of irrelevant features.

The collection of high-dimensional data samples cannot, in all areas of application of computational biology, be counted among the challenges of relevance to the user of machine learning, as it is not always experimentally feasible and can often be very expensive. For this reason, when dealing with biological networks, which can be not only gene networks (where indeed in the last decade the volume and heterogeneity of the data have continued to grow strongly), but also biochemical networks, protein-protein interaction networks, metabolic networks, and signalling networks (where the volume of experimental data is lower and the sample size is not always optimal), we prefer to indicate the ability to infer causal structures from a limited amount of data as a challenge that computational procedures must try to win. The challenges for users of machine learning listed here are common to many computational approaches and are addressed by many textbooks and by many innovative solutions presented in papers in the scientific literature. Given the large amount of literature on the subject and the fact that these challenges are not unique to machine learning algorithms, we do not address them in this work. Instead, in Section 3.1, we put into perspective two possible approaches to solving the problems concerning the large amount of training data and the risk of over-fitting, i.e. the lack of predictive and abstractive capabilities. The first perspective concerns meta-modelling upstream of any learning procedure, and in particular modular meta-modelling, while the second perspective concerns meta-learning and modular meta-learning. We believe that these two perspectives can contribute to overcoming the challenges posed to machine learning algorithms.
In particular, the first perspective can be useful in overcoming the first challenge, and the second perspective in overcoming the second, even though only the synergy between the methods proposed in the two perspectives is truly effective in achieving both objectives in network biology. Finally, Section 3.2 reports on one of the major problems that meta-learning methods too will have to solve, namely connecting causal variables to data, and the necessity of relying on both observational and interventional data.

3.1 Modular Meta-Modelling for Modular Meta-Learning

Meta-modelling goes in the direction of reducing the amount of data needed (for this purpose, recent meta-learning algorithms are also equipped with data-augmentation procedures ( Ni et al., 2021 )) and the computational costs and times, but a further step in these directions can be taken by exploiting the modular structure of many physical systems. Biological systems, and specifically biological networks, are known to have a modular organisation. To give an example, there are several studies dealing with the emergence of modularity in gene networks ( Rives and Galitski, 2003 ; Lorenz et al., 2011 ; Zhang and Zhang, 2013 ; Hütt, 2019 ; Serban, 2020 ) and in protein-protein networks ( Zhang et al., 2010 ). Modularity is a ubiquitous phenomenon in various biological systems, both in genotype and in phenotype. Biological modules, which consist of components relatively independent in function as well as in morphology, have facilitated the daily performance of biological systems and their long evolution in history. Indeed, modularity in biological networks is an emergent property, evolved to be highly functional within uncertain environments while remaining remarkably adaptable. Modular organization is one of the main contributors to the robustness and evolvability of biological networks ( Hintze and Adami, 2008 ). In order to understand the intimate connection between the modular organisation of a biological network and the improvement of meta-learning efficiency, we start from the following considerations.
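As a toy illustration of how modular organisation can be read off a network, one can plant two modules in a random graph and recover them by spectral bisection (only one of many community-detection approaches; the graph size and edge probabilities are our own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20                                   # nodes, two planted modules of 10
A = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        same_module = (i < n // 2) == (j < n // 2)
        p_edge = 0.8 if same_module else 0.05   # dense within, sparse between
        if rng.random() < p_edge:
            A[i, j] = A[j, i] = 1.0

# Spectral bisection: the sign pattern of the Fiedler vector (eigenvector
# of the second-smallest Laplacian eigenvalue) splits the graph into two
# weakly interconnected blocks, i.e. the planted modules.
L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
labels = (eigvecs[:, 1] > 0).astype(int)

planted = (np.arange(n) < n // 2).astype(int)
agreement = max(np.sum(labels == planted), np.sum(labels != planted))
```

With such a strong within/between density contrast, the recovered labels agree with the planted partition (up to relabelling) for essentially all nodes.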

An artificial or natural system in a complex world is faced with limited resources. This concerns training data, i.e., we only have limited data for each task/domain, and thus need to find ways of pooling/re-using data ( Schölkopf et al., 2021 ). It also concerns computational resources. A biological network has a limited size, defined by the number of its nodes and arcs, with which to solve the plethora of tasks of daily life. Seen as a computational system that responds to stimuli and reconfigures its algorithms as the surrounding environment changes, it is therefore forced to adapt its topology, and consequently its computational performance, to the environmental conditions and the tasks it has to accomplish. The adaptability and consequent evolvability of a biological network are made possible by the ability of the network to factor out variations of tasks and contexts. The modular structure of a biological network is the key to the mechanisms allowing the network to factor out tasks and contexts.

Future machine learning models that aim to infer a biological network should be implemented so as to learn the modular structure of the biological network itself and also the crosstalks between the network modules. The latter capability in particular is the one that allows machine learning for network inference to be carried out with less data and a limited number of computational resources. It is advisable that machine learning methods for learning the functional modules and their interactions mimic the way the animal brain learns. In fact, modularity is the principle that provides a natural way of achieving compositionality and generalization. An example from Schölkopf’s work ( Schölkopf et al., 2021 ) may help to better understand this statement. If, because of variations in natural lighting, the environment can appear in brightness conditions spanning several orders of magnitude, then the visual processing algorithms in the animal brain factor out these variations, so that the brain does not need to build separate sets of object recognition algorithms for every lighting condition. Building different object recognizers would require considerable computational expense and would involve the risk of not having sufficient computational resources within the physical dimensions of the brain.

The modular structure and the interactions between the modules of a biological network, if reflected in an automatic learning procedure, are what allow artificial intelligence to factor out different variations of tasks and contexts, to save computational resources, and to require less training data. Paraphrasing a statement by Schölkopf ( Schölkopf et al., 2021 ) regarding the components of an AI device, we could say that “if the world is indeed modular, in the sense that components/mechanisms of the world play roles across a range of environments, tasks, and settings, then it would be prudent for a machine learning approach to employ corresponding modules”. This also holds with regard to the capability of a machine learning model to implement causal discovery. Modular meta-learning for causal discovery could be the new research frontier of systems biology. Modular meta-learning would in fact allow learning sets of network modules ( Alet et al., 2018 ) whose dynamic interactions and adaptive reconfiguration mechanisms could be learned subsequently. This perspective is depicted in Figure 2 . A modular network maps the modular structure of the set of functions of a real-world system. Then, modular meta-learning approaches learn the causal structure of the processes within each cluster of the network, and finally meta-learning approaches infer the causal structure of the network connecting the clusters. Regression procedures are applied at the end of modular meta-learning to determine the f_i functions, and at the end of meta-learning to determine the F_i functions describing the dynamics of the crosstalks between network modules.


FIGURE 2 . In many situations, training experience is very expensive. While meta-learning is a strategy to reduce the training-data requirements for a new task, modular meta-learning is a strategy to reduce or save computational resources. Modular meta-learning methods learn sets of network modules of a biological network. This learning scheme aims at mimicking the animal brain, which is capable of factoring out variations of a context or a task, and by virtue of this ability does not need to implement different algorithms in separate anatomical regions to learn each single variation of a context or task. The functional modularity of a real system (here represented as a collection of kitchen utensils with different functions) is first mapped onto a modular network (each module of which performs a different function). The causal structure of the processes within each module can be learned by modular meta-learning methods, and finally the causal structure of the interactions among network clusters is learned by meta-learning approaches, including regression of the functions f_i internal to each module (modular meta-learning) and then of the F_i representing the cross-talks between clusters (global meta-learning). The image of the kitchen utensils in this figure is part of the Pixabay free online pictures ( https://pixabay.com/it/ ).

3.2 The Challenges of Meta-Learning

In bioinformatics over the last 5 years, there have been studies using meta-learning approaches applied to the design of inferential systems (see, for example, Arredondo and Ormazábal (2015), who developed inference systems for the systematic classification of the best candidates for inclusion in bacterial metabolic pathway maps). However, these have mainly been inference procedures for deducing missing knowledge at the level of the network node rather than at the level of the arc and of the interaction mechanism, i.e. the causal relationship between nodes (Hill et al., 2016). Modular meta-learning, in the schemes proposed in the previous section, could be a step towards the evolution of machine learning algorithms for causal discovery. However, these schemes also face challenges. To function at their best, they must rely on observational and interventional data in which the causal variables are observed. It is indeed known that a central problem for AI and causality is the learning of high-level causal variables from low-level observations. In the near future, a key research area at the intersection of modular meta-learning and causal modelling will be the learning of causal variables and their connection to the data and to the tasks/contexts. Connecting causal variables to the data is at the moment an underdetermined problem in machine learning, and more specifically in modular meta-learning, since when a network is trained on sets of tasks, different high-level causal variables may be identified depending on the task. We see this as the next major challenge to be faced by machine learning approaches for causal discovery, including more specialised approaches such as reinforcement learning and deep learning (Luo et al., 2020; Shen et al., 2020). The second major challenge will then be to identify indirect causal relationships, i.e. those relationships that take place through latent mediating variables.
To this end, current research focuses in particular on deep-learning methods, as testified by recent preliminary results published as preprints (e.g. Yuan and Bar-Joseph, 2019b; Fan et al., 2021) and envisaged by previous studies (e.g. Ching et al., 2018; Jin et al., 2020).
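
A minimal numerical illustration of such an indirect relationship (a hypothetical sketch, not a method from the cited literature): in a chain X → M → Y, the marginal X–Y correlation is strong, but it vanishes once the mediator M is conditioned on, and this is the statistical footprint a causal discovery method must recognise to distinguish direct from mediated effects.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Chain X -> M -> Y: Y depends on X only through the mediator M.
x = rng.normal(size=n)
m = 1.5 * x + rng.normal(size=n)   # mediator
y = 2.0 * m + rng.normal(size=n)   # effect, with no direct X term

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing c out of both."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

marginal = np.corrcoef(x, y)[0, 1]      # strong (~0.8 for these coefficients)
conditional = partial_corr(x, y, m)     # near zero: the X-Y link is mediated

print(round(marginal, 2), round(abs(conditional), 3))
```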

4 Conclusion

In the light of the above, the convergence of three research areas, namely experimental research guided by data-acquisition protocols aimed at inferring causal relationships, machine learning, and graphical causal modelling, is becoming increasingly urgent. In the context of this convergence, the limitations of one area will have to be compensated for by the advances of another. For example, experiments in biology or in the clinic can often provide only observational data, due to experimental limitations or ethical codes. It is well known that causal inference from observational data is particularly difficult and that its outputs are mostly unreliable: observational data are affected by biases from confounding, selection and measurement, which can result in an underestimate or overestimate of the effect of interest (Hammerton and Munafò, 2021). In this case, it is expected that in the near future machine learning methods will be able to identify high-level causal variables from low-level data, and that causal modelling approaches will be able to output accurate causal relationships. Meta-modelling and meta-learning are two approaches conceived in the logic of "doing more with less" (Hammerton and Munafò, 2021). This is why they aim to imitate the animal brain in learning causality, focusing on its ability to generalise to different contexts, and thus using a smaller amount of training data and a limited amount of computational resources. However, we should not forget that this advantage can come at a cost. Machine learning is a computational process and, as such, is strongly tied to the computational power and hardware supporting it. Hardware shapes the methods used in the design and development of machine learning models. Characteristics such as the power consumption of chips also define where and how machine learning can be used in the real world.
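
The confounding bias mentioned above can be made concrete with a small simulation (a hypothetical sketch; the variable names and coefficients are illustrative): a confounder Z drives both the exposure T and the outcome Y, so the naive regression of Y on T overestimates the causal effect, while adjusting for Z recovers it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50000

# Confounder Z drives both the exposure T and the outcome Y.
z = rng.normal(size=n)
t = 0.8 * z + rng.normal(size=n)
y = 1.0 * t + 2.0 * z + rng.normal(size=n)  # true causal effect of T on Y is 1.0

# Naive estimate: regress Y on T alone -> biased upward by the confounder.
naive = np.polyfit(t, y, 1)[0]

# Adjusted estimate: regress Y on T and Z jointly -> recovers ~1.0.
X = np.column_stack([t, z, np.ones(n)])
adjusted = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(round(naive, 2), round(adjusted, 2))
```

Here the adjustment is possible only because Z is observed; when the confounder is latent, no regression on the measured variables removes the bias, which is exactly why triangulation across study designs is advocated.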
At present, it is not yet possible to have a clear idea of the components and hardware architecture that computational schemes such as those shown in Figure 1 and Figure 2 might require. This is therefore another line of research that needs to be pursued, not only to understand its impact on artificial intelligence, but also on systems biology and medicine, and, more importantly, on the community.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Author Contributions

The author confirms being the sole contributor of this work and has approved it for publication.

Funding

The study is supported by the COMPANET 2020 Project fund issued to PL by the Faculty of Computer Science of the Free University of Bozen-Bolzano, Italy.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Adhami, M., Sadeghi, B., Rezapour, A., Haghdoost, A. A., and MotieGhader, H. (2021). Repurposing Novel Therapeutic Candidate Drugs for Coronavirus Disease-19 Based on Protein-Protein Interaction Network Analysis. BMC Biotechnol. 21, 22. doi:10.1186/s12896-021-00680-z

Alet, F., Lozano-Pérez, T., and Kaelbling, L. P. (2018). "Modular Meta-Learning," in 2nd Annual Conference on Robot Learning, CoRL 2018, Zürich, Switzerland, 29-31 October 2018, 856–868. Vol. 87 of Proceedings of Machine Learning Research (PMLR).

Andrieu, C., de Freitas, N., Doucet, A., and Jordan, M. I. (2003). An Introduction to MCMC for Machine Learning. Machine Learn. 50, 5–43. doi:10.1023/a:1020281327116

Angermueller, C., Lee, H. J., Reik, W., and Stegle, O. (2017). Erratum to: DeepCpG: Accurate Prediction of Single-Cell DNA Methylation States Using Deep Learning. Genome Biol. 18, 90. doi:10.1186/s13059-017-1233-z

Arredondo, T., and Ormazábal, W. (2015). Meta-learning Framework Applied in Bioinformatics Inference System Design. Int. J. Data Min Bioinform 11, 139–166. doi:10.1504/ijdmb.2015.066775

Auslander, N., Gussow, A. B., and Koonin, E. V. (2021). Incorporating Machine Learning into Established Bioinformatics Frameworks. Int. J. Mol. Sci. 22, 2903. doi:10.3390/ijms22062903

Badsha, M. B., and Fu, A. Q. (2019). Learning Causal Biological Networks with the Principle of Mendelian Randomization. Front. Genet. 10, 460. doi:10.3389/fgene.2019.00460

Bates, D. W., Auerbach, A., Schulam, P., Wright, A., and Saria, S. (2020). Reporting and Implementing Interventions Involving Machine Learning and Artificial Intelligence. Ann. Intern. Med. 172, S137–S144. doi:10.7326/m19-0872

Bengio, Y., Deleu, T., Rahaman, N., Ke, R., Lachapelle, S., Bilaniuk, O., et al. (2020). “A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms,” in The Eighth International Conference on Learning Representations (ICLR 2020) Proceedings , Apr 26th through May 1st , Addis Ababa, Ethiopia .

Camacho, D. M., Collins, K. M., Powers, R. K., Costello, J. C., and Collins, J. J. (2018). Next-generation Machine Learning for Biological Networks. Cell 173, 1581–1592. doi:10.1016/j.cell.2018.05.015

Cassan, O., Lèbre, S., and Martin, A. (2021). Inferring and Analyzing Gene Regulatory Networks from Multi-Factorial Expression Data: a Complete and Interactive Suite. BMC Genomics 22, 387. doi:10.1186/s12864-021-07659-2

Castro, D. C., Walker, I., and Glocker, B. (2020). Causality Matters in Medical Imaging. Nat. Commun. 11, 3673. doi:10.1038/s41467-020-17478-w

Chang, R., Karr, J. R., and Schadt, E. E. (2015). Causal Inference in Biology Networks with Integrated Belief Propagation. Pac. Symp. Biocomput , 359–370. doi:10.1142/9789814644730_0035

Cheng, F., Kovács, I. A., and Barabási, A.-L. (2019). Network-based Prediction of Drug Combinations. Nat. Commun. 10, 1197. doi:10.1038/s41467-019-09186-x

Ching, T., Himmelstein, D. S., Beaulieu-Jones, B. K., Kalinin, A. A., Do, B. T., Way, G. P., et al. (2018). Opportunities and Obstacles for Deep Learning in Biology and Medicine. J. R. Soc. Interf. 15, 20170387. doi:10.1098/rsif.2017.0387

Deng, W., Zhang, K., Busov, V., and Wei, H. (2017). Recursive Random forest Algorithm for Constructing Multilayered Hierarchical Gene Regulatory Networks that Govern Biological Pathways. PLOS ONE 12, e0171532. doi:10.1371/journal.pone.0171532

Dongarra, J., Luszczek, P., Wolf, F., Träff, J. L., Quinton, P., Hellwagner, H., et al. (2011). “Systems Biology, Network Inference in,” in Encyclopedia of Parallel Computing (US: Springer ), 1997–2002. doi:10.1007/978-0-387-09766-4_466

Fan, Z., Kernan, K. F., Benos, P. V., Canna, S. W., Carcillo, J. A., Kim, S., et al. (2021). Causal Inference Using Deep-Learning Variable Selection Identifies and Incorporates Direct and Indirect Causalities in Complex Biological Systems. bioRxiv . doi:10.1101/2021.07.17.452800

Feng, Y., Wang, Q., and Wang, T. (2017). Drug Target Protein-Protein Interaction Networks: A Systematic Perspective. Biomed. Res. Int. 2017, 1–13. doi:10.1155/2017/1289259

Furqan, M. S., and Siyal, M. Y. (2016). Inference of Biological Networks Using Bi-directional Random forest granger Causality. SpringerPlus 5, 514. doi:10.1186/s40064-016-2156-y

Galanti, T., Nabati, O., and Wolf, L. (2020). A Critical View of the Structural Causal Model . arXiv:2002.10007.

Gillani, Z., Akash, M. S., Rahaman, M. D., and Chen, M. (2014). CompareSVM: Supervised, Support Vector Machine (SVM) Inference of Gene Regularity Networks. BMC Bioinformatics 15, 395. doi:10.1186/s12859-014-0395-x

Glymour, C., Zhang, K., and Spirtes, P. (2019). Review of Causal Discovery Methods Based on Graphical Models. Front. Genet. 10, 524. doi:10.3389/fgene.2019.00524

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning . MIT Press . Available at: http://www.deeplearningbook.org .

Hájek, A. (2011). “Conditional Probability,” in Philosophy of Statistics ( Elsevier ), 99–135. doi:10.1016/b978-0-444-51862-0.50003-4

Hammer, G. P., Prel, J.-B. d., and Blettner, M. (2009). Avoiding Bias in Observational Studies. Deutsches Aerzteblatt Online . doi:10.3238/arztebl.2009.0664

Hammerton, G., and Munafò, M. R. (2021). Causal Inference with Observational Data: the Need for Triangulation of Evidence. Psychol. Med. 51, 563–578. doi:10.1017/s0033291720005127

Hartmann, T., Moawad, A., Schockaert, C., Fouquet, F., and Le Traon, Y. (2019). “Meta-modelling Meta-Learning,” in 2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems (MODELS) , September 15-20, 2019 , Munich, Germany ( IEEE ). doi:10.1109/models.2019.00014

Hasan, M. R., Paul, B. K., Ahmed, K., and Bhuyian, T. (2020). Design Protein-Protein Interaction Network and Protein-Drug Interaction Network for Common Cancer Diseases: A Bioinformatics Approach. Inform. Med. Unlocked 18, 100311. doi:10.1016/j.imu.2020.100311

Hill, S. M., Heiser, L. M., Cokelaer, T., Unger, M., Nesser, N. K., Carlin, D. E., et al. (2016). Inferring Causal Molecular Networks: Empirical Assessment through a Community-Based Effort. Nat. Methods 13, 310–318. doi:10.1038/nmeth.3773

Hintze, A., and Adami, C. (2008). Evolution of Complex Modular Biological Networks. Plos Comput. Biol. 4, e23. doi:10.1371/journal.pcbi.0040023

Hitsch, G. J., and Misra, S. (2018). Heterogeneous Treatment Effects and Optimal Targeting Policy Evaluation. SSRN J. . doi:10.2139/ssrn.3111957

Hütt, M.-T. (2019). “Modular Organization and Emergence in Systems Biology,” in Emergence and Modularity in Life Sciences ( Springer International Publishing ), 37–49. doi:10.1007/978-3-030-06128-9_2

Huynh-Thu, V. A., and Geurts, P. (2018). Unsupervised Gene Network Inference with Decision Trees and Random Forests. Methods in Molecular Biology. New York: Springer, 195–215. doi:10.1007/978-1-4939-8882-2_8

Jin, S., Zeng, X., Xia, F., Huang, W., and Liu, X. (2020). Application of Deep Learning Methods in Biological Networks. Brief Bioinform 22, 1902–1917. doi:10.1093/bib/bbaa043

Kishan, K. C., Li, R., Cui, F., Yu, Q., and Haake, A. R. (2019). GNE: a Deep Learning Framework for Gene Network Inference by Aggregating Biological Information. BMC Syst. Biol. 13, 38. doi:10.1186/s12918-019-0694-y

Kimura, S., Fukutomi, R., Tokuhisa, M., and Okada, M. (2020). Inference of Genetic Networks from Time-Series and Static Gene Expression Data: Combining a random-forest-based Inference Method with Feature Selection Methods. Front. Genet. 11, 595912. doi:10.3389/fgene.2020.595912

Kricke, M., and Peschenz, T. (2019). Applied Predictive Analytics Seminar - Causal KNN.

Künzel, S. R., Sekhon, J. S., Bickel, P. J., and Yu, B. (2019). Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning. Proc. Natl. Acad. Sci. U S A. 116, 4156–4165. doi:10.1073/pnas.1804597116

Le Borgne, F., Chatton, A., Léger, M., Lenain, R., and Foucher, Y. (2021). G-computation and Machine Learning for Estimating the Causal Effects of Binary Exposure Statuses on Binary Outcomes. Sci. Rep. 11. doi:10.1038/s41598-021-81110-0

Li, F., Zhu, F., Ling, X., and Liu, Q. (2020a). Protein Interaction Network Reconstruction through Ensemble Deep Learning with Attention Mechanism. Front. Bioeng. Biotechnol. 8, 390. doi:10.3389/fbioe.2020.00390

Li, L., Shangguan, W., Deng, Y., Mao, J., Pan, J., Wei, N., et al. (2020b). A Causal Inference Model Based on Random Forests to Identify the Effect of Soil Moisture on Precipitation. J. Hydrometeorology 21, 1115–1131. doi:10.1175/jhm-d-19-0209.1

Libbrecht, M. W., and Noble, W. S. (2015). Machine Learning Applications in Genetics and Genomics. Nat. Rev. Genet. 16, 321–332. doi:10.1038/nrg3920

Liu, A., Trairatphisan, P., Gjerga, E., Didangelos, A., Barratt, J., and Saez-Rodriguez, J. (2019). From Expression Footprints to Causal Pathways: Contextualizing Large Signaling Networks with CARNIVAL. NPJ Syst. Biol. Appl. 5, 40. doi:10.1038/s41540-019-0118-z

Lorenz, D. M., Jeng, A., and Deem, M. W. (2011). The Emergence of Modularity in Biological Systems. Phys. Life Rev. 8, 129–160. doi:10.1016/j.plrev.2011.02.003

Lu, H., Zhou, Q., He, J., Jiang, Z., Peng, C., Tong, R., et al. (2020). Recent Advances in the Development of Protein-Protein Interactions Modulators: Mechanisms and Clinical Trials. Signal. Transduct Target. Ther. 5, 213. doi:10.1038/s41392-020-00315-3

Lu, J., Dumitrascu, B., McDowell, I. C., Jo, B., Barrera, A., Hong, L. K., et al. (2021). Causal Network Inference from Gene Transcriptional Time-Series Response to Glucocorticoids. Plos Comput. Biol. 17, e1008223. doi:10.1371/journal.pcbi.1008223

Luo, Y., Peng, J., and Ma, J. (2020). When Causal Inference Meets Deep Learning. Nat. Mach Intell. 2, 426–427. doi:10.1038/s42256-020-0218-x

Mayeux, R. (2004). Biomarkers: Potential Uses and Limitations. NeuroRX 1, 182–188. doi:10.1602/neurorx.1.2.182

Mitchell, T. (1997). Machine Learning . New York: McGraw-Hill .

Moguerza, J. M., and Muñoz, A. (2006). Support Vector Machines with Applications. Statist. Sci. 21. doi:10.1214/088342306000000493

Muzio, G., O'Bray, L., and Borgwardt, K. (2020). Biological Network Analysis with Deep Learning. Brief Bioinform 22, 1515–1530. doi:10.1093/bib/bbaa257

Neuberg, L. G. (2003). Causality: Models, Reasoning, and Inference, by Judea Pearl, cambridge university Press, 2000. Econometric Theor. 19, 675–685. doi:10.1017/s0266466603004109

Ni, R., Goldblum, M., Sharaf, A., Kong, K., and Goldstein, T. (2021). "Data Augmentation for Meta-Learning," in Proceedings of the 38th International Conference on Machine Learning, July 18-24, 2021. Editors M. Meila and T. Zhang (PMLR), 8152–8161. Vol. 139 of Proceedings of Machine Learning Research.

Nogueira, A. R., and Gama, J. (2021). Causal Discovery in Machine Learning: Theories and Applications. Jdg 8, 203–231. doi:10.3934/jdg.2021008

Oates, C. J., and Mukherjee, S. (2012). Network Inference and Biological Dynamics. Ann. Appl. Stat. 6, 1209–1235. doi:10.1214/11-aoas532

Omony, J. (2014). Biological Network Inference: A Review of Methods and Assessment of Tools and Techniques. Arrb 4, 577–601. doi:10.9734/arrb/2014/5718

Pearl, J. (2010). An Introduction to Causal Inference. Int. J. Biostat 6, 7. doi:10.2202/1557-4679.1203

Petralia, F., Wang, P., Yang, J., and Tu, Z. (2015). Integrative Random forest for Gene Regulatory Network Inference. Bioinformatics 31, i197–205. doi:10.1093/bioinformatics/btv268

Piraino, D. W. (2018). “Structural Causal Models: A Method to Increase Transparency of Machine Learning Model Assumptions and Increase Rigor of Machine Learning Model Evaluation,” in Abstract Presented at: Society for Imaging Informatics in Medicine Conference on Machine Intelligence in Medical Imaging , 2018 SIIM Conference on Machine Intelligence in Medical Images , September 9-10, 2018 , San Francisco, CA .

Poplin, R., Chang, P. C., Alexander, D., Schwartz, S., Colthurst, T., Ku, A., et al. (2018). A Universal SNP and Small-Indel Variant Caller Using Deep Neural Networks. Nat. Biotechnol. 36, 983–987. doi:10.1038/nbt.4235

Prosperi, M., Guo, Y., Sperrin, M., Koopman, J. S., Min, J. S., He, X., et al. (2020). Causal Inference and Counterfactual Prediction in Machine Learning for Actionable Healthcare. Nat. Mach Intell. 2, 369–375. doi:10.1038/s42256-020-0197-y

Qiu, Y. L., Zheng, H., Devos, A., Selby, H., and Gevaert, O. (2020). A Meta-Learning Approach for Genomic Survival Analysis. Nat. Commun. 11, 6350. doi:10.1038/s41467-020-20167-3

Raita, Y., Camargo, C. A., Liang, L., and Hasegawa, K. (2021). Leveraging "big Data" in Respiratory Medicine - Data Science, Causal Inference, and Precision Medicine. Expert Rev. Respir. Med. 15, 717–721. doi:10.1080/17476348.2021.1913061

Rivas-Barragan, D., Mubeen, S., Guim Bernat, F., Hofmann-Apitius, M., and Domingo-Fernández, D. (2020). Drug2ways: Reasoning over Causal Paths in Biological Networks for Drug Discovery. Plos Comput. Biol. 16, e1008464. doi:10.1371/journal.pcbi.1008464

Rives, A. W., and Galitski, T. (2003). Modular Organization of Cellular Networks. Proc. Natl. Acad. Sci. U S A. 100, 1128–1133. doi:10.1073/pnas.0237338100

Rose, S. (2020). Intersections of Machine Learning and Epidemiological Methods for Health Services Research. Int. J. Epidemiol. 49, 1763–1770. doi:10.1093/ije/dyaa035

Ruiz, C., Zitnik, M., and Leskovec, J. (2021). Identification of Disease Treatment Mechanisms through the Multiscale Interactome. Nat. Commun. 12, 1796. doi:10.1038/s41467-021-21770-8

Schmidhuber, J. (2015). Deep Learning in Neural Networks: An Overview. Neural Netw. 61, 85–117. doi:10.1016/j.neunet.2014.09.003

Schmidt, L., Heße, F., Attinger, S., and Kumar, R. (2020). Challenges in Applying Machine Learning Models for Hydrological Inference: A Case Study for Flooding Events across germany. Water Resour. Res. 11. doi:10.1029/2019wr025924

Schölkopf, B. (2019). Causality for Machine Learning . ArXiv:1911.10500v2.

Schölkopf, B., Locatello, F., Bauer, S., Ke, N. R., Kalchbrenner, N., Goyal, A., et al. (2021). Toward Causal Representation Learning. Proc. IEEE 109, 612–634. doi:10.1109/jproc.2021.3058954

Serban, M. (2020). Exploring Modularity in Biological Networks. Philos. Trans. R. Soc. Lond. B Biol. Sci. 375, 20190316. doi:10.1098/rstb.2019.0316

Shah, R. D., and Peters, J. (2020). The Hardness of Conditional independence Testing and the Generalised Covariance Measure. Ann. Statist. 48, 1514–1538. doi:10.1214/19-aos1857

Shen, X., Ma, S., Vemuri, P., and Simon, G. (2020). Challenges and Opportunities with Causal Discovery Algorithms: Application to Alzheimer's Pathophysiology. Sci. Rep. 10, 2975. doi:10.1038/s41598-020-59669-x

Snowden, J. M., Rose, S., and Mortimer, K. M. (2011). Implementation of G-Computation on a Simulated Data Set: Demonstration of a Causal Inference Technique. Am. J. Epidemiol. 173, 731–738. doi:10.1093/aje/kwq472

Somolinos, F. J., León, C., and Guerrero-Aspizua, S. (2021). Drug Repurposing Using Biological Networks. Processes 9, 1057. doi:10.3390/pr9061057

Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction, and Search, Second Edition. Cambridge, MA: MIT Press , 543.

Sun, Y. V., and Kardia, S. L. (2008). Imputing Missing Genotypic Data of Single-Nucleotide Polymorphisms Using Neural Networks. Eur. J. Hum. Genet. 16, 487–495. doi:10.1038/sj.ejhg.5201988

Triantafillou, S., Lagani, V., Heinze-Deml, C., Schmidt, A., Tegner, J., and Tsamardinos, I. (2017). Predicting Causal Relationships from Biological Data: Applying Automated Causal Discovery on Mass Cytometry Data of Human Immune Cells. Sci. Rep. 7, 12724. doi:10.1038/s41598-017-08582-x

Tsai, W.-P., Fang, K., Ji, X., Lawson, K., and Shen, C. (2020). Revealing Causal Controls of Storage-Streamflow Relationships with a Data-Centric Bayesian Framework Combining Machine Learning and Process-Based Modeling. Front. Water 2, 583000. doi:10.3389/frwa.2020.583000

Veiga, D. F., Dutta, B., and Balázsi, G. (2010). Network Inference and Network Response Identification: Moving Genome-Scale Data to the Next Level of Biological Discovery. Mol. Biosyst. 6, 469–480. doi:10.1039/b916989j

Vert, J. P., Qiu, J., and Noble, W. S. (2007). A New Pairwise Kernel for Biological Network Inference with Support Vector Machines. BMC Bioinformatics 8 Suppl 10, S8. doi:10.1186/1471-2105-8-s10-s8

Washburn, J. D., Mejia-Guerra, M. K., Ramstein, G., Kremling, K. A., Valluru, R., Buckler, E. S., et al. (2019). Evolutionarily Informed Deep Learning Methods for Predicting Relative Transcript Abundance from DNA Sequence. Proc. Natl. Acad. Sci. U S A. 116, 5542–5549. doi:10.1073/pnas.1814551116

Wilkinson, J., Arnold, K. F., Murray, E. J., van Smeden, M., Carr, K., Sippy, R., et al. (2020). Time to Reality Check the Promises of Machine Learning-Powered Precision Medicine. Lancet Digit Health 2, e677–e680. doi:10.1016/s2589-7500(20)30200-4

Wong, J., and Damjakob, D. (2021). A Meta Learning Approach to Discerning Causal Graph Structure . CoRR abs/2106.05859.

Xu, C., and Jackson, S. A. (2019). Machine Learning and Complex Biological Data. Genome Biol. 20, 76. doi:10.1186/s13059-019-1689-0

Yazdani, A., Lu, L., Raissi, M., and Karniadakis, G. E. (2020). Systems Biology Informed Deep Learning for Inferring Parameters and Hidden Dynamics. Plos Comput. Biol. 16, e1007575. doi:10.1371/journal.pcbi.1007575

Yin, Y., and Yao, D. (2016). Causal Inference Based on the Analysis of Events of Relations for Non-stationary Variables. Sci. Rep. 6, 29192. doi:10.1038/srep29192

Yuan, Y., and Bar-Joseph, Z. (2019a). Deep Learning for Inferring Gene Relationships from Single-Cell Expression Data. Proc. Natl. Acad. Sci. U S A. 116, 27151–27158. doi:10.1073/pnas.1911536116

Yuan, Y., and Bar-Joseph, Z. (2019b). Deep Learning for Inferring Gene Relationships from Single-Cell Expression Data. bioRxiv . doi:10.1101/365007

Zarayeneh, N., Oh, J. H., Kim, D., Liu, C., Gao, J., Suh, S. C., and Kang, M. (2016). “Integrative Gene Regulatory Network Inference Using Multi-Omics Data,” in 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) , December 15-18, 2016 , Shenzhen China , 1336–1340. doi:10.1109/BIBM.2016.7822711

Zhang, J., Chai, H., Yang, G., and Ma, Z. (2017a). Prediction of Bioluminescent Proteins by Using Sequence-Derived Features and Lineage-specific Scheme. BMC Bioinformatics 18, 294. doi:10.1186/s12859-017-1709-6

Zhang, K., and Hyvärinen, A. (2010). “Distinguishing Causes from Effects Using Nonlinear Acyclic Causal Models,” in Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008 , December 12, 2008 , Whistler, Canada . Proceedings of Machine Learning Research , 6, 157–164. Available from: https://proceedings.mlr.press/v6/zhang10a.html .

Zhang, J., Zhang, Y., Li, Y., Guo, S., and Yang, G. (2020a). Identification of Cancer Biomarkers in Human Body Fluids by Using Enhanced Physicochemical-Incorporated Evolutionary Conservation Scheme. Curr. Top. Med. Chem. 20, 1888–1897. doi:10.2174/1568026620666200710100743

Zhang, J., Zhang, Y., and Ma, Z. (2019). In Silico prediction of Human Secretory Proteins in Plasma Based on Discrete Firefly Optimization and Application to Cancer Biomarkers Identification. Front. Genet. 10, 542. doi:10.3389/fgene.2019.00542

Zhang, J., and Zhang, S. (2013). “Modular Organization of Gene Regulatory Networks,” in Encyclopedia of Systems Biology (New York: Springer ), 1437–1441. doi:10.1007/978-1-4419-9863-7_473

Zhang, K., Huang, B., Zhang, J., Glymour, C., and Schölkopf, B. (2017b). "Causal Discovery from Nonstationary/heterogeneous Data: Skeleton Estimation and Orientation Determination," in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, August 19-25, 2017, Melbourne, Australia, 1347–1353. doi:10.24963/ijcai.2017/187

Zhang, S., Ning, X. M., Ding, C., and Zhang, X. S. (2010). Determining Modular Organization of Protein Interaction Networks by Maximizing Modularity Density. BMC Syst. Biol. 4 Suppl 2, S10. doi:10.1186/1752-0509-4-s2-s10

Zhang, Y., Chen, Q., Gao, D., and Zou, Q. (2020b). “GRRFNet: Guided Regularized Random forest-based Gene Regulatory Network Inference Using Data Integration,” in 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ( IEEE ). doi:10.1109/bibm49941.2020.9313349

Zhou, X., and Kosorok, M. R. (2017). Causal Nearest Neighbor Rules for Optimal Treatment Regimes.

Zou, J., Huss, M., Abid, A., Mohammadi, P., Torkamani, A., and Telenti, A. (2018). A Primer on Deep Learning in Genomics. Nat. Genet. 51, 12–18. doi:10.1038/s41588-018-0295-5

Keywords: machine learning, deep learning, causality, inference, causal thinking, artificial intelligence, systems biology

Citation: Lecca P (2021) Machine Learning for Causal Inference in Biological Networks: Perspectives of This Challenge. Front. Bioinform. 1:746712. doi: 10.3389/fbinf.2021.746712

Received: 24 July 2021; Accepted: 08 September 2021; Published: 22 September 2021.

Copyright © 2021 Lecca. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Paola Lecca, [email protected]

This article is part of the Research Topic

Inference of Biological Networks

Macquarie University

Data integration and knowledge discovery using biological networks

Table of contents, awarding institution, degree type, department, centre or school, year of award, principal supervisor, additional supervisor 1, former identifiers, usage metrics.

Macquarie University Theses

  • Other education not elsewhere classified

University of Leicester

Biological networks : a thermodynamical approach

Date of award, author affiliation, awarding institution, qualification level, qualification name, administrator link.

  • https://leicester.figshare.com/account/articles/10147859

Usage metrics

University of Leicester Theses

  • Uncategorized

American University

Biological models of artificial networks

The last several years have seen a dramatic increase in the number of neurobiologists building or using computer-based models as a regular part of their efforts to understand how different neural systems function. Computational neuroscience focuses on how the nervous system computes, and several models have been proposed that describe neural behavior at the cellular as well as a network level [11]. Artificial neural networks (ANN), originally inspired by their biological counterpart, are far from being analogous to the biological neural networks. After simple models proved useful, relatively little work has been done to make artificial neural networks more like biological systems. Even as strictly computational devices, current ANN systems have problems, such as local minima, network configuration, and preparation of input data. In this work, Biological network configurations of species like aplysia will be studied to understand the learning processes and to look for ways to incorporate the ideas into artificial networks.

Access statement

Usage metrics.

Theses and Dissertations

  • Biomedical engineering
  • Information and computing sciences

IMAGES

  1. Types of biological networks

    biological networks thesis

  2. Major categories of biological networks. An overview of the major

    biological networks thesis

  3. Modular structures of large biological networks. Here we see graphical

    biological networks thesis

  4. Schematic representation of a multilayered biological network. The

    biological networks thesis

  5. Network analysis in biology

    biological networks thesis

  6. A Comparative Study on Biological Networks / 978-3-8443-9466-5

    biological networks thesis

VIDEO

  1. Introduction to Biological Network Analysis I: Network Basics and Properties

  2. Network biology: Connecting new omics data with existing literature

  3. 2.7 Biological Networks

  4. Introduction to Biological Network Analysis III: Identifying Network Modules

  5. Methods in building and analysing biological networks

  6. 21

COMMENTS

  1. Extracting information from biological networks

    Abstract. Systems biology, the study of biological systems in a holistic manner, has been catalyzed by a dramatic improvement in experimental techniques, coupled with a constantly increasing availability of biological data. The representation and analysis of this heterogeneous data is facilitated by the powerful abstraction of biological networks.

  2. Modeling biological networks: from single gene systems to large

    In this research, we study biological networks at different scales: a gene autoregulatory network at the single-cell level and the gut microbiota at the population level. ... In the second part of the thesis, we investigate experimental time series from complex ecosystems and seek theoretical models reproducing all observed characteristics in ...

  3. PDF BIONIC: Biological Network Integration using Convolutions

    Biological networks constructed from varied data can be used to map cellular function, but each data type has limitations. Network integration promises to address these limitations by combining and automatically weighting input information to obtain a more accurate and comprehensive representation of the underlying biology.

  4. PDF Analysis of Large-Scale Biological Networks with Constraint-Based

    Analysis of Large-Scale Biological Networks with Constraint-Based Approaches over Static Models Carito Guziolowski To cite this version: ... The research I present in this thesis could not have been accomplished without inter-acting with all these people. First, my profound thanks to Anne Siegel, my thesis director, for being so involved ...

  5. PDF The Betweenness Centrality Of Biological Networks

    Of Biological Networks Shivaram Narayanan Thesis submitted to the Faculty of the ... Biological networks have been mainly constructed from experiments or from scien-tic literature. These networks contain interactions between proteins, DNA, RNA, metabolites, etc. There are many publicly available databases of these interaction

  6. Reconstructing gene regulatory networks of biological function using

    Building biological networks with a certain function is a challenge in systems biology. For small biological networks (fewer than ten nodes), most methods work by exhaustively searching all possible network topologies. This exhaustive approach is difficult to scale to large biological networks. Moreover, regulatory relationships are complex and often nonlinear or non ...

  7. Biological network analysis with deep learning

    The graph representation of biological networks enables the formulation of classic machine learning tasks in bioinformatics, such as node classification, link prediction and graph classification. Deep learning methods on graphs, specifically GNNs, are a new way of solving these tasks by capturing hierarchical non-linearities in the data and ...

  8. PDF Biological network evaluation and relation discovery from scientific

    This thesis is submitted to the University of Cambridge for the degree of Doctor of Philosophy. Fitzwilliam College, March 2014. ... Biological network evaluation involves several layers of information. The identification and normalisation of biomedical entities is of high importance. The research work as ...

  9. Networks in Systems Biology: Applications for Disease Modeling

    About this book. This book presents a range of current research topics in biological network modeling, as well as its application in studies on human hosts, pathogens, and diseases. Systems biology is a rapidly expanding field that involves the study of biological systems through the mathematical modeling and analysis of large volumes of ...

  10. PDF Computational Methods for Modelling and Analysing Biological Networks

    Computational Methods for Modelling and Analysing Biological Networks. Thesis for the degree of Doctor of Science in Technology, to be presented with due permission for public examination and criticism in Sähkötalo Building, Auditorium S2, at Tampere University of Technology, on the 27th of March 2015, at 12 noon. Tampereen teknillinen yliopisto - Tampere University of Technology, Tampere 2015.

  11. Plant Networks as Traits and Hypotheses: Moving Beyond Description

    One simplification in the systems biology toolkit is to use biological networks to organize, simplify, and analyze data. These networks can be used to map responses across biological scales, including relationships among cellular entities, such as protein-protein interactions, gene-gene coexpression, and protein:DNA binding. The description ...

  12. Single-cell biological network inference using a heterogeneous graph

    Single-cell multi-omics and deep learning could lead to the inference of biological networks across specific cell types. Here, the authors develop DeepMAPS, a deep learning, graph-based approach ...

  13. Electronic Thesis/Dissertation

    We study the robustness of a biological network from a global perspective, by employing a non-deterministic Markov Chain model, and explore the energy landscape structure for the network. We also propose to take ...

  14. Next-Generation Machine Learning for Biological Networks: Cell

    By enabling one to generate models that learn from large datasets and make predictions on likely outcomes, machine learning can be used to study complex cellular systems such as biological networks. Here, we provide a primer on machine learning for life scientists, including an introduction to deep learning.

  15. Machine Learning for Causal Inference in Biological Networks

    Most machine learning-based methods predict outcomes rather than explain causality. Machine learning methods have proved efficient at finding correlations in data, but poor at determining causation. This issue severely limits the applicability of machine learning methods to infer the causal relationships between the entities of a biological network and, more generally, of ...

  16. Data integration and knowledge discovery using biological networks

    The overall objective of this thesis is to analyse and understand the intricate network of protein interactions inside the cell. Proteins are molecular machines, which interact and communicate to perform different cellular functions. Research effort in molecular and cellular biology enables the detection of molecular interactions on a large scale. The experimental results generated by high ...

  17. Plant Networks as Traits and Hypotheses: Moving Beyond Description

    Biology relies on the central thesis that the genes in an organism encode molecular mechanisms that combine with stimuli and raw materials from the environment to create a final phenotypic expression representative of the genomic programming. While conceptually simple, the genotype-to-phenotype linkage in a eukaryotic organism relies on the interactions of thousands of genes and an environment ...

  18. PDF COMPARATIVE ANALYSIS OF BIOLOGICAL NETWORKS A Thesis Submitted to the

    Comparative Analysis of Biological Networks. Major Professors: Ananth Grama and Wojciech Szpankowski. Recent developments in molecular biology have resulted in experimental data that entail the relationships and interactions between biomolecules. Biomolecular interaction data, generally referred to as biological or cellular networks, are frequently ...

  19. Next-Generation Machine Learning for Biological Networks

    A crucial aspect of deep learning is that the behavior of these layers—that is, how they transform the data—can be learned by the machine rather than defined by the researcher (Angermueller et al., 2016; LeCun et al., 2015). Deep neural networks accomplish this by iteratively tuning their internal parameters to minimize prediction error, typically via a process known as backpropagation.
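
    The iterative tuning this result describes can be sketched in its simplest form: gradient descent on one weight of a single linear unit, which is the core update that backpropagation applies layer by layer in a deep network. The toy dataset and learning rate below are illustrative assumptions, not from the cited work:

    ```python
    # One linear unit y_hat = w * x, trained to fit y = 2x by gradient descent
    # on the mean squared error — the single-parameter core of backpropagation.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy samples of y = 2x

    w = 0.0    # weight to learn, starting from zero
    lr = 0.05  # learning rate (step size)
    for _ in range(500):
        # gradient of mean squared error: d/dw mean((w*x - y)^2) = mean(2*(w*x - y)*x)
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad  # step against the gradient to reduce the error

    print(round(w, 3))  # prints 2.0
    ```

    Each pass reduces the prediction error by nudging the weight opposite its error gradient; a deep network repeats exactly this update for every parameter, with the chain rule propagating the gradient backward through the layers.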

  20. PDF Study of Gene Regulatory Networks Inference Methods from Gene

    Understanding these networks gives a systems-level view of biological organisms and of genetically related diseases. This thesis, focused on network inference from gene expression data, contributes to this field of knowledge by studying different techniques that allow a better reconstruction of GRNs.

  21. Biological Network Distances

    In this thesis I develop a model of evolution in terms of network structure. This model represents biologically relevant mutations in terms of their effect on the network. With this an estimate of a distance between the biological entities can be found in terms of the number of mutations needed to mutate a network into another, or mutate an ...

  22. Biological networks : a thermodynamical approach

    The third challenge is to access the network architecture from expression data, such as mRNA abundances, using suitable techniques. We first show in this thesis that it is possible to generate networks from a thermodynamical viewpoint. This approach allows us to relate the architecture of a network to some constraints.

  23. Biological models of artificial networks

    In this work, biological network configurations of species like Aplysia will be studied to understand the learning processes and to look for ways to incorporate the ideas into artificial networks. (PDF thesis, 2.53 MB, posted 2023-08-04.)

  24. Eghbal Hosseini Thesis Defense: Towards Synergistic Understanding of

    Title: Towards Synergistic Understanding of Language Processing in Biological and Artificial Systems. Abstract: The faculty of language in the human brain relies on a network of frontal and temporal regions, predominantly in the left hemisphere, often defined as the "language network". Despite decades of research aimed at uncovering the ...