Graph Self-Contrast Representation Learning

Graph contrastive learning (GCL) has recently emerged as a promising approach for graph representation learning. Some existing methods adopt the 1-vs-$K$ scheme to construct one positive and $K$ negative samples for each graph, but it is difficult to set $K$. For methods that do not use negative samples, additional strategies are often needed to avoid model collapse, and these can only alleviate the problem to some extent. All these drawbacks adversely affect the generalizability and efficiency of the model. In this paper, to address these issues, we propose a novel graph self-contrast framework, GraphSC, which uses only one positive and one negative sample and adopts the triplet loss as its objective. Specifically, self-contrast has two implications. First, GraphSC generates both positive and negative views of a graph sample from the graph itself via graph augmentation functions of various intensities, and uses them for self-contrast. Second, GraphSC uses the Hilbert-Schmidt Independence Criterion (HSIC) to factorize the representations into multiple factors and proposes a masked self-contrast mechanism to better separate positive and negative samples. Further, since the triplet loss only optimizes the relative distance between the anchor and its positive/negative samples, it is difficult to ensure a small absolute distance between the anchor and the positive sample. Therefore, we explicitly reduce this absolute distance to accelerate convergence. Finally, we conduct extensive experiments to evaluate the performance of GraphSC against 19 other state-of-the-art methods in both unsupervised and transfer learning settings.

I Introduction

Graph self-supervised learning (GSSL) [1, 2, 3] has attracted significant attention in recent years. Compared with traditional semi-supervised and supervised graph learning [4, 5, 6], GSSL seeks to employ supervision extracted from the data itself, which effectively circumvents the need for costly annotated data. In particular, one of the main types of GSSL is graph contrastive learning (GCL) [7, 3], whose core idea is to minimize the distance between representations of different augmented views of the same graph ("positive pairs") and maximize that between augmented views of different graphs ("negative pairs").

[Fig. 1: Performance of GraphCL w.r.t. the number of negative samples $K$ on the COLLAB dataset.]

According to whether negative samples are used, most existing graph-level GCL methods fall into one of two classes. On the one hand, some approaches construct one positive sample and $K$ negative samples for each graph [3, 8, 9], and formulate their objectives based on the normalized temperature-scaled cross entropy loss (NT-Xent) [10], such as GraphCL [3]. However, these methods are easily affected by $K$, and an appropriate $K$ value is usually set empirically, which lacks theoretical support. When $K$ is small, the model might not learn sufficient information to discriminate positive and negative samples; when $K$ is large, it could lead to a large number of false-negative samples and slow convergence. In these methods, for each graph in a batch, the other graphs in the same batch are considered as its negative samples, i.e., $K=B-1$, where $B$ is the batch size. As shown in Fig. 1, the performance of GraphCL is significantly affected by $K$ on the COLLAB dataset. In particular, when $K$ is small, the performance of GraphCL drops drastically. On the other hand, the second type of methods propose not to use negative samples. However, these methods can suffer from a degenerate solution [11], where all outputs "crash" to an undesired constant. To avoid such model collapse, additional strategies have to be applied, such as asymmetric dual encoders [12, 13]. Recently, some studies [14] have shown that although these training strategies can avoid collapse to some extent, they may still cause collapse in partial dimensions of the representation, which leads to worse performance. The main reason for model collapse is the complete non-use of negative samples. Therefore, a research question arises: to avoid both the problem of $K$ selection and the degenerate solution, can we develop a GCL model that constructs only one positive sample and one negative sample for each graph?

Given one positive sample and one negative sample for each graph, a straightforward framework is to use the triplet loss as the objective function. However, the triplet loss is hard to train and mainly suffers from poor local optima and slow convergence, partially because the loss function employs only one negative example and does not interact with other negative classes per update [10]. In short, there are two difficulties: one is to find a valid negative sample, and the other is to solve the hard-to-train problem.

For the first difficulty, hard negative sample mining [15, 16, 17, 18, 19] has been proposed. However, most existing methods applied to graphs perform node-level sampling, and very few target graph-level sampling. Recently, CuCo [20] proposes curriculum contrastive learning, which ranks negative samples from easy to hard and trains on them in order. CGC [21] proposes to obtain reliable counterfactual negative samples by pre-training to help contrastive learning. However, this introduces additional computational overhead that limits the performance of GCL. Inspired by the fact that some substances change their properties in response to external conditions, we propose a simple yet effective method to obtain negative samples from graphs themselves. For example, an enzymatic protein could become a non-enzymatic one after some perturbations. Since the non-enzymatic protein is directly generated from the enzymatic protein, the two can share structural similarities to some degree, which makes the negative sample difficult to discriminate and thereby achieves an effect similar to hard negative sampling.

To address the hard-to-train problem, we consider multiple facets of each graph to construct masked embedding vectors for its positive/negative samples. The self-contrast is then performed not only between the whole embedding vectors, but also between the masked embedding vectors corresponding to each facet. The masked contrast provides more information and speeds up model convergence. Further, optimizing the triplet loss essentially maximizes the distance between positive and negative samples. This amplifies the margin between different classes but cannot ensure that the low-dimensional representations of each class are compact. Therefore, we further shorten the absolute distance between the anchor and the positive sample, which makes each class more compact and brings similar samples closer in the feature space. It also provides shortcuts for model convergence (we will show the experimental results in Section V).

In this paper, we study graph contrastive learning and propose a novel Graph Self-Contrast framework, GraphSC, which follows the pattern of generating positive and negative samples from the samples themselves and conducting self-contrast. For each graph, GraphSC first generates one positive sample and one negative sample from the graph itself, and then self-contrasts the graph with its positive/negative samples as well as their masked embeddings. Inspired by the assumption in [3] that the semantics of a graph will not change under perturbations of a certain strength, we move a step forward and assume that the semantics of a graph will change under strong perturbations. Specifically, we propose to generate two different (positive and negative) views of a graph via graph augmentation functions of various intensities. After that, the original graph and the two generated views are fed into a shared GNN encoder, and sum pooling is used to derive graph-level representations. In particular, we use the representation of the original graph as the anchor, and the representations of the views generated by weak and strong perturbations as a positive sample and a negative sample, respectively. Further, to implement masked self-contrast, we divide the embeddings of positive/negative samples into multiple factors by the Hilbert-Schmidt Independence Criterion (HSIC) [22]. In addition to the contrast between the whole embedding vectors, we mask each factor separately and perform masked self-contrast between the corresponding representations. Moreover, we use the Mean Square Error (MSE) loss / Barlow Twins (BT) loss [23] as a regularization to shorten the absolute distance between the anchor and the positive sample, which leads to better convergence in practice. Finally, we summarize our contributions as follows:

We propose a novel graph self-contrast representation learning framework GraphSC.

We present a simple yet effective method to construct negative samples from graphs themselves in graph-level representation learning.

We use triplet loss in graph contrastive learning and address the hard-to-train problem of triplet loss by putting forward a masked self-contrast mechanism and directly shortening the absolute distance for positive pairs.

We conduct extensive experiments to evaluate the performance of GraphSC in both unsupervised learning and transfer learning settings. Experimental results show that GraphSC performs favorably against other state-of-the-art methods.

II RELATED WORK

II-A Graph self-supervised learning

Graph self-supervised learning [1, 2, 3] aims to extract informative knowledge from graphs through pre-designed pretext tasks without relying on manual labels. It can alleviate the annotation bottleneck, which is one of the main barriers to the practical deployment of deep learning today. According to the objectives of pretext tasks, existing graph self-supervised learning methods can be broadly divided into four categories: (1) generation-based methods [1], which aim to reconstruct the input graph data and use the input data as their supervision signals; (2) auxiliary-property-based methods [24], which obtain graph-related properties from the graph and take them as supervision signals, such as pseudo labels of unlabeled data; (3) contrast-based methods [2, 3], which construct positive and negative pairs for contrast, following the core idea of maximizing the mutual information (MI) [25] between positive pairs and minimizing that between negative pairs; and (4) hybrid methods [26], which integrate various pretext tasks in a multi-task learning fashion. Our proposed method GraphSC is contrast-based, and we next introduce contrast-based methods in detail. For a comprehensive survey on graph self-supervised learning, see [27].

II-B Graph contrastive learning

According to the contrast mode, graph contrastive learning can be mainly divided into three categories: node-node contrast, node-graph contrast and graph-graph contrast.

For node-node contrast, the representative model GRACE [28] first generates two contrastive views of a graph via graph augmentation, and then pulls close the representations of samples in positive pairs while pushing away those of samples in inter-view and intra-view negative pairs. GCA [29] further introduces an adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph, which results in more competitive performance. GCC [30] utilizes random walks as augmentations to extract contextual information. BGRL [31] maximizes the MI between node representations from online and target networks.

There also exist methods [2, 32, 33, 34, 7] that are based on node-graph contrast. For example, DGI [2] learns both local and global semantic information in graphs by contrasting node-level embeddings with the graph-level representation. After that, GIC [32] seeks to additionally capture cluster-level information by first clustering nodes based on their embeddings, and then maximizing the MI between nodes in the same cluster. MVGRL [33] first generates two graph views via graph diffusion and subgraph sampling. Then it trains graph encoders by contrasting node embeddings in one view with the graph-level representation of another view. Further, SUBG-CON [34] uses the triplet loss as its objective function. For each node, it first extracts the top-k most informative neighbors to form a subgraph. Then it pulls close the distance between the representations of the node and the subgraph, and pushes away that between the representations of the node and a randomly selected subgraph.

The third type of methods are based on graph-graph contrast. The early model GraphCL [3] designs four types of graph augmentation (node dropping, edge perturbation, attribute masking and subgraph extraction), and then adopts the NT-Xent loss to learn the graph-level representation. Further, JOAO [35] proposes a unified bi-level optimization framework to automatically select data augmentations. AD-GCL [36] uses adversarial graph augmentation strategies that enable GNNs to avoid capturing redundant information during training. Inspired by Invariant Rationale Discovery (IRD), RGCL [9] puts forward rationale-aware augmentations for graph contrastive learning to preserve the critical information in the graph. There are also methods that do not need data augmentations. For example, SimGRACE [8] feeds the original graph into a GNN encoder and achieves data augmentation through perturbation of the encoder.

III Preliminary

In this section, we introduce basic concepts used in this paper.

III-A Graph Neural Networks (GNNs)

Let $G=(\mathcal{V},\mathcal{E})$ denote an undirected graph, where $\mathcal{V}=\{v_1,v_2,\cdots,v_N\}$ is the node set and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the edge set. We use $X\in\mathbb{R}^{N\times F}$ to denote the node feature matrix, where $F$ is the dimension of node features. Generally, given a GNN model $f(\cdot)$, message propagation in the $l$-th layer can be divided into two operations: one aggregates information from a node's neighbors, and the other updates the node's embedding. Taking node $v_i$ as an example, we formally define these two operations as:

$a_i^{(l)} = \text{AGGREGATE}^{(l)}\big(\{h_j^{(l-1)} : v_j \in \mathcal{N}(v_i)\}\big)$, (1)
$h_i^{(l)} = \text{COMBINE}^{(l)}\big(h_i^{(l-1)}, a_i^{(l)}\big)$, (2)

where $h_i^{(l)}$ is the embedding of node $v_i$ in the $l$-th layer and $\mathcal{N}(v_i)$ is the set of nodes adjacent to $v_i$. $\text{AGGREGATE}^{(l)}(\cdot)$ and $\text{COMBINE}^{(l)}(\cdot)$ are two functions in each GNN layer. After $L$ propagation layers, the output embedding for $G$ is summarized over node embeddings via the READOUT function, which is formulated as:

$h_G = \text{READOUT}\big(\{h_i^{(L)} : v_i \in \mathcal{V}\}\big)$. (3)
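As an illustration, the aggregate/combine/readout pipeline of Eqs. (1)-(3) can be sketched in a few lines of NumPy. The mean aggregator, ReLU update and weight shapes below are illustrative choices, not the paper's actual encoder.

```python
import numpy as np

def aggregate(H, adj, i):
    """Eq. (1): collect messages from node i's neighbors (mean aggregator as an example)."""
    neighbors = np.where(adj[i] > 0)[0]
    return H[neighbors].mean(axis=0) if len(neighbors) else np.zeros(H.shape[1])

def combine(h_i, a_i, W):
    """Eq. (2): update node i's embedding as ReLU(W [h_i ; a_i])."""
    return np.maximum(0.0, W @ np.concatenate([h_i, a_i]))

def readout(H):
    """Eq. (3): sum pooling over node embeddings yields the graph-level representation."""
    return H.sum(axis=0)

# Toy 3-node path graph with 2-d node features
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
H = np.eye(3, 2)                      # initial node features (N=3, F=2)
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))       # maps concat(2+2) -> 4 hidden dims
H1 = np.stack([combine(H[i], aggregate(H, adj, i), W) for i in range(3)])
h_G = readout(H1)                     # graph-level embedding, shape (4,)
```

One such layer, followed by the sum READOUT, already produces a fixed-size vector per graph regardless of the number of nodes.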

III-B Hilbert-Schmidt Independence Criterion

The Hilbert-Schmidt Independence Criterion (HSIC) [22] is a kernel-based measure of dependence between probability distributions. Let $\mathcal{F}$ be a Hilbert space of real-valued functions from a set $\mathcal{X}$ to $\mathbb{R}$. We say $\mathcal{F}$ is a Reproducing Kernel Hilbert Space (RKHS) if, $\forall x\in\mathcal{X}$, the Dirac evaluation operator $\delta_x: \mathcal{F}\rightarrow\mathbb{R}$, which maps $f\in\mathcal{F}$ to $f(x)\in\mathbb{R}$, is a bounded linear functional. In an RKHS, $\forall x\in\mathcal{X}$, there is a mapping $\phi(x)\in\mathcal{F}$ and a unique positive definite kernel $u: \mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$ such that $\langle\phi(x),\phi(x')\rangle_{\mathcal{F}} = u(x,x')$.

Assume that we have two separable RKHSs $\mathcal{F}$, $\mathcal{G}$ and a joint measure $p_{xy}$ over $(\mathcal{X}\times\mathcal{Y}, \Gamma\times\Lambda)$, where $\Gamma$ and $\Lambda$ are the Borel sets on $\mathcal{X}$ and $\mathcal{Y}$, respectively. Then the Hilbert-Schmidt Independence Criterion (HSIC) is defined as the squared Hilbert-Schmidt norm of the associated cross-covariance operator $C_{xy}$:

$\text{HSIC}(p_{xy},\mathcal{F},\mathcal{G}) = \|C_{xy}\|_{\text{HS}}^{2}$, (4)
$C_{xy} = \mathbb{E}_{xy}\big[(\phi(x)-\mu_x)\otimes(\varphi(y)-\mu_y)\big]$. (5)

Here, $\otimes$ is the tensor product, and $\phi(\cdot)$, $\varphi(\cdot)$ are functions that map $x\in\mathcal{X}$ and $y\in\mathcal{Y}$ to the RKHSs $\mathcal{F}$ and $\mathcal{G}$ w.r.t. the kernel functions $u(x,x')=\langle\phi(x),\phi(x')\rangle$ and $s(y,y')=\langle\varphi(y),\varphi(y')\rangle$, respectively. Accordingly, given $m$ i.i.d. samples $(X,Y)=\{(x_1,y_1),\cdots,(x_m,y_m)\}$ drawn from the joint distribution $p_{xy}$, the empirical version of HSIC is given as:

$\text{HSIC}(X,Y) = (m-1)^{-2}\,\text{tr}(KHLH)$, (6)

where $K_{ij}=u(x_i,x_j)$, $L_{ij}=s(y_i,y_j)$, and $H=I-\frac{1}{m}\mathbf{1}\mathbf{1}^{\top}$ is the centering matrix.
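A minimal NumPy implementation of the empirical estimator in Eq. (6), using RBF kernels for both variables (the kernel choice and bandwidth are assumptions for illustration):

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Empirical HSIC of Eq. (6): (m-1)^{-2} tr(K H L H), with H = I - 11^T/m."""
    m = X.shape[0]
    K, L = rbf_kernel(X, sigma), rbf_kernel(Y, sigma)
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(K @ H @ L @ H) / (m - 1) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
noise = rng.standard_normal((200, 1))
dep = hsic(X, X + 0.1 * noise)   # strongly dependent pair -> large HSIC
ind = hsic(X, noise)             # (nearly) independent pair -> HSIC close to 0
```

Larger values indicate stronger statistical dependence; for independent variables the estimate concentrates near zero as $m$ grows.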

[Fig. 2: The overall framework of GraphSC.]

IV Method

In this section, we introduce the GraphSC framework. Given a graph, GraphSC first generates two augmented views as positive and negative samples via weak and strong perturbation, respectively (Step ①). Then the graph and its augmented views are fed into a GNN encoder with shared parameters to obtain the corresponding graph-level representations (Step ②). After these representations are mapped by a projection head, they are self-contrasted (Step ③). At the same time, considering that each graph has multi-facet features, GraphSC factorizes the representations of positive and negative samples using HSIC, and then masks each factor sequentially to generate multiple masked views (Step ④). The representations of anchors and the masked representations of positive and negative samples are contrasted after a projection head (Step ⑤). Finally, GraphSC shortens the absolute distance between an anchor and its positive sample (Step ⑥). The overall framework of GraphSC is given in Fig. 2.

IV-A Data augmentation

To construct positive and negative pairs, most existing graph contrastive learning methods [3, 36] first perform data augmentations on graphs, such as node dropping and edge perturbation. After that, for each graph, its own augmented views form positive samples, while those of other graphs in the same mini-batch are considered as negative samples. Despite their success, these methods are easily affected by the number of negative samples $K$. To mitigate the influence of $K$ on model performance and reduce the number of false negatives, using hard negative samples could be a feasible solution. However, general hard negative mining strategies are either not suitable for graph data [37] or computationally costly [20].

Instead, we construct both views from the graph itself with augmentations of different intensities: the view generated by weak perturbation is taken as the positive sample $G^{+}$, while that generated by strong perturbation is taken as the negative sample $G^{-}$. Formally, for any graph augmentation function $\mathcal{A}(\cdot)$ and two different perturbation rates $r_a$, $r_b$ with $r_a < r_b$, we have

$G^{+} = \mathcal{A}(G, r_a), \qquad G^{-} = \mathcal{A}(G, r_b)$. (7)

In this way, since negative samples are directly constructed from the graphs themselves, they share similarities with the original graphs to some degree. Therefore, these negative samples play the role of hard negatives, which boosts model performance.
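As a sketch of Eq. (7), the snippet below instantiates $\mathcal{A}(\cdot)$ as edge dropping applied at two rates $r_a < r_b$; the concrete augmentation and rate values are illustrative assumptions, since the paper tunes them per dataset.

```python
import random

def edge_perturb(edges, rate, seed=0):
    """One possible A(G, r): drop roughly a `rate` fraction of edges."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() >= rate]

# Eq. (7): the same augmentation family, at weak vs. strong intensity,
# yields the positive view G+ and the negative view G-.
edges = [(i, i + 1) for i in range(100)]   # toy path graph
r_a, r_b = 0.1, 0.3                        # weak vs. strong perturbation rates
g_pos = edge_perturb(edges, r_a)           # positive view G+
g_neg = edge_perturb(edges, r_b)           # negative view G-
assert len(g_neg) <= len(g_pos) <= len(edges)
```

With a shared random seed the strong view removes a superset of the edges removed by the weak view, so $G^{-}$ is strictly "further" from $G$ than $G^{+}$.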

IV-B Model architecture

For each graph $G_i$, the original graph and its two augmented views $G_i^{+}$ and $G_i^{-}$ are fed into a shared GNN encoder followed by sum pooling, yielding the graph-level representations $h_i, h_i^{+}, h_i^{-}$, respectively. We denote the outputs of the first projection head $g_1$ as

$y_i = g_1(h_i)$, (8)

$y_i^{+} = g_1(h_i^{+}), \qquad y_i^{-} = g_1(h_i^{-})$, (9)

where $y_i, y_i^{+}, y_i^{-} \in \mathbb{R}^{d}$. These vectors characterize the overall feature information of the samples and are used for the whole-vector contrast.

A second projection head $g_2$ further maps these representations:

$e_i = g_2(y_i)$, (10)

$e_i^{+} = g_2(y_i^{+}), \qquad e_i^{-} = g_2(y_i^{-})$. (11)

These projected embeddings can be used for masked contrast across various views.

Finally, a third projection head $g_3$ is applied to the anchor and the positive sample:

$z_i = g_3(y_i), \qquad z_i^{+} = g_3(y_i^{+})$, (12)

and the resulting $z_i$ and $z_i^{+}$ are used to shorten the absolute distance between the anchor and the positive sample.

IV-C Contrastive loss

For each graph $G_i$, we first obtain the representations $(h_i, h_i^{+}, h_i^{-})$ through a shared GNN encoder.

We then map them to $(y_i, y_i^{+}, y_i^{-})$, and use the triplet margin loss to enlarge the relative distance between positive and negative sample pairs:

$\mathcal{L}_{tri} = \frac{1}{B}\sum_{i=1}^{B}\max\big\{\|y_i - y_i^{+}\|_2 - \|y_i - y_i^{-}\|_2 + \epsilon,\ 0\big\}$. (13)

Note that $\epsilon$ is the margin.
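A NumPy sketch of this batched triplet margin loss; Euclidean distance is assumed as the metric, and the margin 0.2 follows the value the paper reports using:

```python
import numpy as np

def triplet_loss(y, y_pos, y_neg, margin=0.2):
    """Batched triplet margin loss: mean of max{d(a,p) - d(a,n) + margin, 0}."""
    d_pos = np.linalg.norm(y - y_pos, axis=1)   # anchor-positive distances
    d_neg = np.linalg.norm(y - y_neg, axis=1)   # anchor-negative distances
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

anchor = np.zeros((4, 8))
near = anchor + 0.01    # positive views close to the anchor
far = anchor + 1.0      # negative views pushed beyond the margin
loss_good = triplet_loss(anchor, near, far)   # triplet satisfied -> zero loss
loss_bad = triplet_loss(anchor, far, near)    # triplet violated -> positive loss
```

Note that only the relative distance $d(a,p)-d(a,n)$ enters the loss, which is exactly why an extra term is needed later to control the absolute anchor-positive distance.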

Masked self-contrast: Considering that each graph contains multiple facets of information, we use HSIC to factorize $y_i^{+}$ and $y_i^{-}$ into $n$ factors:

$y_i^{+} = \big[y_{i1}^{+}, y_{i2}^{+}, \cdots, y_{in}^{+}\big], \qquad y_i^{-} = \big[y_{i1}^{-}, y_{i2}^{-}, \cdots, y_{in}^{-}\big]$, (14)

where the factors are denoted as $\{y_{im}^{+}\}_{m=1}^{n}$ and $\{y_{im}^{-}\}_{m=1}^{n}$, respectively. When a factor is masked, the positive sample is still expected to be close to the anchor while the negative sample remains distant. Therefore, we formulate the masked contrastive loss as:

$\ell_{im} = \max\big\{\|e_i - \bar{e}_{im}^{+}\|_2 - \|e_i - \bar{e}_{im}^{-}\|_2 + \epsilon,\ 0\big\}$, (15)

$w_{im} = \frac{\exp(-\delta_{im})}{\sum_{m'=1}^{n}\exp(-\delta_{im'})}, \qquad \delta_{im} = \|e_i - \bar{e}_{im}^{-}\|_2 - \|e_i - \bar{e}_{im}^{+}\|_2$, (16)

$\mathcal{L}_{ma} = \frac{1}{B}\sum_{i=1}^{B}\sum_{m=1}^{n} w_{im}\,\ell_{im}$, (17)

where $\bar{e}_{im}^{+}$ and $\bar{e}_{im}^{-}$ denote the projected representations of the positive/negative samples with the $m$-th factor masked.

Here, the weight $w_{im}$ for the $m$-th factor satisfies $\sum_{m=1}^{n} w_{im}=1$. For a factor that leads to a small relative distance, the model assigns a large weight and thus pays more attention to the corresponding optimization process.
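One way this factor-wise weighting could be realized is sketched below: each factor of the positive/negative embedding is zero-masked in turn, and a softmax over the negated relative distances assigns larger weights to the harder factors. The contiguous-block masking and the softmax form are illustrative assumptions, and the projection heads are omitted for brevity.

```python
import numpy as np

def masked_views(y, n_factors):
    """Return n copies of y, each with one contiguous factor zeroed out."""
    d = y.shape[-1] // n_factors
    views = []
    for m in range(n_factors):
        v = y.copy()
        v[m * d:(m + 1) * d] = 0.0
        views.append(v)
    return views

def masked_contrast_weights(anchor, pos, neg, n_factors=4):
    """Softmax weights over factors: a small relative distance
    d(anchor, masked neg) - d(anchor, masked pos) marks a hard factor
    and receives a large weight."""
    rel = np.array([
        np.linalg.norm(anchor - n_m) - np.linalg.norm(anchor - p_m)
        for p_m, n_m in zip(masked_views(pos, n_factors), masked_views(neg, n_factors))
    ])
    e = np.exp(-rel)
    return e / e.sum()

rng = np.random.default_rng(0)
anchor = rng.standard_normal(16)
pos = anchor + 0.1 * rng.standard_normal(16)   # close to the anchor
neg = rng.standard_normal(16)                  # unrelated view
w = masked_contrast_weights(anchor, pos, neg)  # one weight per factor, sums to 1
```

Each masked view hides one facet, so a per-factor triplet term weighted by `w` focuses the optimization on the facets that are hardest to separate.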

Shorten the absolute distance: To shorten the absolute distance between positive pairs and thus make each class more compact, we propose two models, GraphSC and GraphSC-MSE, which use the Barlow Twins loss and the MSE loss as regularization terms, respectively.

(1) GraphSC: GraphSC pulls close the representations of the anchor and the positive sample after the third projection head $g_3$, and we formulate the objective function $\mathcal{L}_{ab}$ as:

$\mathcal{L}_{ab} = \sum_{j}\big(1 - \mathcal{C}_{jj}\big)^{2} + \beta \sum_{j}\sum_{k\neq j} \mathcal{C}_{jk}^{2}$, (18)

$\mathcal{C}_{jk} = \frac{\sum_{i} z_{i,j}\, z_{i,k}^{+}}{\sqrt{\sum_{i} z_{i,j}^{2}}\,\sqrt{\sum_{i} \big(z_{i,k}^{+}\big)^{2}}}$, (19)

where $\mathcal{C}$ is the cross-correlation matrix computed between $z_i$ and $z_i^{+}$ along the batch dimension, and $\beta$ trades off the invariance and redundancy-reduction terms.

Here, the Barlow Twins loss additionally reduces the redundancy between components of the embedding vectors, which also boosts model performance.
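A compact NumPy sketch of this Barlow Twins regularizer between the anchor and positive projections; the batch standardization and the $\beta$-weighted off-diagonal term follow the original Barlow Twins formulation, while the toy data is illustrative:

```python
import numpy as np

def barlow_twins_loss(z, z_pos, beta=0.013):
    """Drive the cross-correlation matrix of the standardized anchor/positive
    embeddings toward the identity: diagonal -> 1 (invariance),
    off-diagonal -> 0 (redundancy reduction)."""
    z = (z - z.mean(0)) / (z.std(0) + 1e-8)
    z_pos = (z_pos - z_pos.mean(0)) / (z_pos.std(0) + 1e-8)
    c = (z.T @ z_pos) / z.shape[0]                             # cross-correlation matrix
    on_diag = ((1.0 - np.diagonal(c)) ** 2).sum()              # invariance term
    off_diag = (c ** 2).sum() - (np.diagonal(c) ** 2).sum()    # redundancy term
    return on_diag + beta * off_diag

rng = np.random.default_rng(0)
z = rng.standard_normal((64, 8))
loss_aligned = barlow_twins_loss(z, z.copy())                      # identical views
loss_random = barlow_twins_loss(z, rng.standard_normal((64, 8)))   # unrelated views
```

Identical views give a near-zero loss (their cross-correlation diagonal is 1), while unrelated views are penalized, which is exactly the "pull the positive onto the anchor" effect used here.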

(2) GraphSC-MSE: GraphSC-MSE no longer needs the third projection head $g_3$, and the regularization term $\mathcal{L}_{ab}$ can be written as:

$\mathcal{L}_{ab} = \frac{1}{B}\sum_{i=1}^{B}\|y_i - y_i^{+}\|_2^{2}$. (20)

Note that the MSE loss is a widely used distance measure, and GraphSC-MSE allows us to verify the necessity and effectiveness of shortening the absolute distance between the anchor and the positive sample. Finally, our objective function is summarized as:

$\mathcal{L} = \lambda_1 \mathcal{L}_{tri} + \lambda_2 \mathcal{L}_{ma} + \lambda_3 \mathcal{L}_{ab}$, (21)

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyper-parameters that balance the importance of each term.

V EXPERIMENTS

In this section, we conduct experiments on multiple benchmark datasets to evaluate the performance of GraphSC by answering the following research questions.

RQ1. (Generalizability) Does GraphSC outperform other competitors in unsupervised settings?

RQ2. (Transferability) Can GNNs pre-trained with GraphSC show better transferability than competitors?

RQ3. (Effectiveness) Are the individual components of GraphSC really effective?

RQ4. (Convergence) What is the effect of $\mathcal{L}_{ma}$, $\mathcal{L}_{ab}$ and the proposed negative sample generation strategy on the convergence of the model?

RQ5. (Hyperparameter Sensitivity) Is GraphSC sensitive to hyperparameters such as the perturbation intensities $r_a$, $r_b$ and the term weights $\lambda_1$, $\lambda_2$, $\lambda_3$?

TABLE I: Statistics of datasets used for unsupervised learning.

Dataset | Category | Graph Num. | Avg. Node | Avg. Edge
NCI1 | Biochemical Molecules | 4110 | 29.87 | 32.30
PROTEINS | Biochemical Molecules | 1113 | 39.06 | 72.82
DD | Biochemical Molecules | 1178 | 284.32 | 715.66
MUTAG | Biochemical Molecules | 188 | 17.93 | 19.79
COLLAB | Social Networks | 5000 | 74.49 | 2457.78
RDT-B | Social Networks | 2000 | 429.63 | 497.75
RDT-M5K | Social Networks | 4999 | 508.52 | 594.87
IMDB-B | Social Networks | 1000 | 19.77 | 96.53

V-A Experimental Setup

Datasets: For unsupervised learning, we use 8 datasets from the benchmark TU dataset [39], including graph data for various biochemical molecules (i.e., NCI1, PROTEINS, DD, MUTAG) and social networks (i.e., COLLAB, REDDIT-BINARY, REDDIT-MULTI-5K and IMDB-BINARY). For transfer learning, we perform pre-training on ZINC-2M, which samples 2 million unlabeled molecules from ZINC15 [40], and fine-tune the model on 8 datasets including BBBP, Tox21, ToxCast, SIDER, ClinTox, MUV, HIV and BACE. More details can be seen in Table I and Table II.

TABLE II: Statistics of datasets used for transfer learning.

Dataset | Category | Utilization | Graph Num. | Avg. Node | Avg. Degree
ZINC-2M | Biochemical Molecules | Pre-training | 2,000,000 | 26.62 | 57.72
BBBP | Biochemical Molecules | Fine-tuning | 2,039 | 24.06 | 51.90
Tox21 | Biochemical Molecules | Fine-tuning | 7,831 | 18.57 | 38.58
ToxCast | Biochemical Molecules | Fine-tuning | 8,576 | 18.78 | 38.52
SIDER | Biochemical Molecules | Fine-tuning | 1,427 | 33.64 | 70.71
ClinTox | Biochemical Molecules | Fine-tuning | 1,477 | 26.15 | 55.76
MUV | Biochemical Molecules | Fine-tuning | 93,087 | 24.23 | 52.55
HIV | Biochemical Molecules | Fine-tuning | 41,127 | 25.51 | 54.93
BACE | Biochemical Molecules | Fine-tuning | 1,513 | 34.08 | 73.71

Baselines : For unsupervised learning, we compare GraphSC with three kernel-based methods including graphlet kernel (GL) [ 41 ] , Weisfeiler-Lehman kernel (WL) [ 42 ] , and deep graph kernel (DGK) [ 43 ] . Furthermore, we compare GraphSC with other state-of-the-art methods: sub2vec [ 44 ] , graph2vec [ 45 ] , InfoGraph [ 7 ] , GraphCL [ 3 ] , JOAO(v2) [ 35 ] , AD-GCL  [ 36 ] , SimGRACE [ 8 ] , RGCL [ 9 ] and LaGraph [ 46 ] . We also take GraphSC-MSE as our baseline. For transfer learning, we adopt DGI [ 2 ] , EdgePred [ 47 ] , AttrMasking [ 47 ] , ContextPred [ 47 ] , GraphCL [ 3 ] , JOAO(v2) [ 35 ] , AD-GCL  [ 36 ] , SimGRACE [ 8 ] , GraphLoG [ 48 ] and RGCL [ 9 ] , which are the state-of-the-art pre-training paradigms in this area, as our baselines.

Evaluation Protocols: Following the settings of previous works [47, 3, 9], we evaluate the performance and generalizability of the learned representations in both unsupervised and transfer learning settings. In the unsupervised setting, we train GraphSC on the whole dataset to learn graph representations, feed them into a downstream SVM classifier with 10-fold cross-validation, and report the mean accuracy with standard deviation after 5 runs. For transfer learning, we pre-train and fine-tune the GNN encoder on different datasets to evaluate the transferability of the pre-training scheme. The fine-tuning procedure is repeated 10 times with different random seeds, and we report the mean and standard deviation of AUROC scores on each downstream dataset, which is consistent with our baselines.
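The unsupervised evaluation step can be sketched with scikit-learn; the embeddings and labels below are random stand-ins for the learned graph representations:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Random stand-ins for frozen graph-level embeddings and binary labels
rng = np.random.default_rng(0)
emb = rng.standard_normal((120, 32))
labels = (emb[:, 0] > 0).astype(int)   # a toy, linearly separable labeling

# Protocol: fit a downstream SVM on the frozen embeddings with 10-fold CV
scores = cross_val_score(SVC(C=1.0), emb, labels, cv=10)
mean_acc = scores.mean()
```

Crucially, the encoder is frozen here: only the SVM sees the labels, so the score measures the quality of the self-supervised representations.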

Implementation details: We implement GraphSC using PyTorch. The model is initialized by Xavier initialization [49] and trained by Adam [50]. As suggested in [19], we set $\epsilon$ in (13) and (15) to 0.2. Similarly, we set $\beta$ in (18) to 0.013 according to [23]. Other hyperparameters are tuned by grid search. For unsupervised learning, we first tune the learning rate from {0.001, 0.005, 0.01}. For the augmentation function $\mathcal{A}(\cdot)$, we choose from four augmentations and some of their combinations, in line with GraphCL [3]. For the perturbation rates $r_a$ and $r_b$, we tune them from {0.05, 0.1, 0.15, 0.2} and {0.15, 0.2, 0.25, 0.3, 0.35, 0.4}, respectively. In addition, we tune $\lambda_1$, $\lambda_2$ and $\lambda_3$ from {0.001, 0.01, 0.1, 1, 10, 100}. For transfer learning, we pre-train the GNN encoder on the ZINC-2M dataset, setting the learning rate to 0.001, the number of epochs to 80, $r_a$ to 0.1, $r_b$ to 0.25, $\lambda_1$ to 1, $\lambda_2$ to 0.01 and $\lambda_3$ to 0.01. In addition, we use the combination of subgraph sampling and node dropping as the augmentation function, the same as GraphCL [3]. During fine-tuning, we adjust the learning rate and the number of epochs, with grid search ranges {0.0001, 0.0005, 0.001} and {20, 40, 60, 80, 100}, respectively. Since most baseline results are publicly available, we directly report them from the original papers. The results of AD-GCL and GraphLoG are taken from RGCL [9]. For fairness, we run all experiments on a server with 128G memory and a single NVIDIA 2080Ti GPU.
We provide our code and data here: https://anonymous.4open.science/r/GraphSC-8360 .

V-B Unsupervised learning (RQ1)

For unsupervised representation learning, we take the one-hot encodings of node labels and degrees as node feature vectors for molecular datasets and social network datasets, respectively. We summarize the experimental results in Table III. From the table, we see that GraphSC ranks first on 5 out of 8 datasets and achieves competitive results on the other three. For example, the accuracy of GraphSC on the COLLAB dataset is 78.90%, which is >1.2% higher than the runner-up. Moreover, GraphSC also leads on four other datasets (i.e., NCI1, PROTEINS, RDT-B and RDT-M5K) by 0.3%-0.8%. Further, the average rank of GraphSC across all the datasets is 1.5, much better than the runner-up's 2.8. We also notice that GraphSC-MSE, which utilizes the MSE loss to minimize the absolute distance between the anchor and the positive sample, is the runner-up among all the methods. This further verifies the effectiveness of our proposed self-contrast framework, whose gains do not simply originate from the Barlow Twins loss regularization.

TABLE III: Average rank (A.R.) of each method across the eight unsupervised datasets (NCI1, PROTEINS, DD, MUTAG, COLLAB, RDT-B, RDT-M5K, IMDB-B).

Method | A.R.
GL | 13.3
WL | 7.9
DGK | 6.4
sub2vec | 14.5
graph2vec | 11.5
InfoGraph | 7.9
GraphCL | 7.8
JOAO | 8.9
JOAOv2 | 8
AD-GCL | 8.5
SimGRACE | 5.6
RGCL | 5.1
LaGraph | 3
GraphSC-MSE | 2.8
GraphSC | 1.5
TABLE IV: Average AUROC (%) of each method across the eight transfer learning datasets (BBBP, Tox21, ToxCast, SIDER, ClinTox, MUV, HIV, BACE).

Method | AVG.
No Pre-Train | 66.96
DGI | 70.29
EdgePred | 70.28
AttrMasking | 71.15
ContextPred | 70.89
GraphCL | 70.77
JOAO | 71.9
JOAOv2 | 72.12
AD-GCL | 70.90
SimGRACE | -
GraphLoG | 72.16
RGCL | 73.16
GraphSC | 73.37
TABLE V: Ablation results (average accuracy, %) of GraphSC and its variants across the eight unsupervised datasets.

Method | AVG.
GraphSC_rd | 77.04
GraphSC_nm | 77.25
GraphSC_nB | 76.95
GraphSC |

V-C Transfer learning (RQ2)

In transfer learning, we first pre-train a backbone model on ZINC-2M, and then fine-tune the model on 8 multi-task binary classification datasets. All the results w.r.t. the area under the receiver operating characteristic curve (AUROC) on downstream tasks are presented in Table IV, together with the average scores. From the table, we see that GraphSC achieves the highest average score among all methods, as well as the best performance on the BBBP and ToxCast datasets. In the cases where GraphSC is not the winner, the gap between its score and the winner's is small. For example, the gaps between GraphSC and the winner on the SIDER and MUV datasets are only 0.89% and 0.35%, respectively. Further, let us take a closer look at GraphCL, which uses the same augmentation functions as GraphSC. GraphSC leads GraphCL by >6% on both the ClinTox and MUV datasets, and its average score across all datasets is 2.6% higher than GraphCL's.

[Fig. 3: Convergence of GraphSC variants on four datasets.]

V-D Ablation Study (RQ3)

We conduct an ablation study on GraphSC to understand the characteristics of its main components. One variant randomly selects an augmented view of another sample as a negative sample, unlike GraphSC, which uses a negative sample directly constructed from the graph sample itself (see Section IV-A). We call this variant GraphSC_rd (random); it helps us evaluate the validity of our negative generation strategy. Another variant trains the model without masked self-contrast, which helps us understand the importance of masked self-contrast; we call it GraphSC_nm (no masked self-contrast). Finally, to show the importance of the Barlow Twins loss regularization term, we remove $\mathcal{L}_{ab}$ from the objective function and call this variant GraphSC_nB (no Barlow Twins). The results are given in Table V, and we observe:

(1) GraphSC achieves better performance than GraphSC_rd. Since GraphSC_rd randomly selects an augmented view of another sample as the negative, the performance gaps between GraphSC and GraphSC_rd show that GraphSC's negative generation strategy is effective in selecting valid negative samples that improve classification accuracy.
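The negative generation strategy contrasts a weakly perturbed view (positive) with a strongly perturbed view (negative) of the same graph. As a hedged sketch only: the `drop_edges` function and the ratios 0.1/0.6 below are illustrative assumptions, not the paper's actual augmentation functions or tuned intensities.

```python
import random

def drop_edges(edges, ratio, seed=0):
    """Return an augmented edge list with roughly `ratio` of edges removed."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() >= ratio]

# Toy graph given as an edge list.
graph = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]

# Weak perturbation -> positive view; strong perturbation -> negative view.
positive_view = drop_edges(graph, ratio=0.1, seed=1)
negative_view = drop_edges(graph, ratio=0.6, seed=2)

# Both views are subgraphs of the original sample itself.
assert set(positive_view) <= set(graph) and set(negative_view) <= set(graph)
```

Because both views come from the same sample, the negative stays informative (structurally close to the anchor) rather than being an arbitrary other graph.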

(2) GraphSC clearly outperforms GraphSC_nm on all datasets. GraphSC_nm, which removes $\mathcal{L}_{ma}$ from the objective function, does not perform masked self-contrast. This leads to significant performance degradation because it ignores the fact that graph-structured data generally contains multiple aspects of information.
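Masked self-contrast rests on the HSIC-based factorization of representations into multiple factors (reference [22] in the list below). As a minimal sketch of the ingredient involved, not of GraphSC's full mechanism, the biased empirical HSIC estimator with Gaussian kernels can be written as follows (the bandwidth `sigma` and the toy data are illustrative):

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC estimate with Gaussian kernels (Gretton et al., 2005).

    x, y: (n, d) arrays of paired samples. Returns trace(K H L H) / (n - 1)^2,
    which is near zero for independent variables and large for dependent ones.
    """
    def gram(z):
        sq = np.sum(z * z, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T
        return np.exp(-d2 / (2.0 * sigma ** 2))

    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    k, l = gram(x), gram(y)
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2

rng = np.random.default_rng(0)
a = rng.normal(size=(64, 4))
independent = rng.normal(size=(64, 4))            # unrelated to a
dependent = a + 0.01 * rng.normal(size=(64, 4))   # nearly a copy of a

# Dependent pairs score much higher than independent ones.
assert hsic(a, dependent) > hsic(a, independent)
```

Minimizing pairwise HSIC between factors encourages them to capture statistically independent aspects of a graph, which is what makes per-factor (masked) contrast meaningful.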

(3) GraphSC  beats GraphSC_nB across all datasets. In particular, GraphSC significantly outperforms GraphSC_nB on MUTAG and COLLAB. This shows that the Barlow Twins loss regularization is particularly important for model training. When using Barlow Twins loss regularization, the model can explicitly shorten the absolute distance between the anchor and the positive sample. This effectively compensates for the inherent weakness of triplet loss.
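The weakness being compensated is that the plain triplet loss becomes zero once the margin is satisfied, no matter how far the positive still sits from the anchor. A small self-contained sketch illustrates this; the Euclidean metric, margin, and weight `lam` are illustrative, and the paper's actual remedy is the Barlow Twins regularizer rather than this raw distance penalty:

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, pos, neg, margin=1.0):
    """Standard triplet loss: only the *relative* distance matters."""
    return max(euclidean(anchor, pos) - euclidean(anchor, neg) + margin, 0.0)

def triplet_with_absolute_term(anchor, pos, neg, margin=1.0, lam=0.5):
    """Triplet loss plus an explicit penalty on the anchor-positive distance."""
    return triplet_loss(anchor, pos, neg, margin) + lam * euclidean(anchor, pos)

# A triplet for which the margin is already satisfied:
anchor, pos, neg = [0.0, 0.0], [2.0, 0.0], [4.0, 0.0]
assert triplet_loss(anchor, pos, neg) == 0.0               # gradient vanishes here
assert triplet_with_absolute_term(anchor, pos, neg) > 0.0  # still pulls pos closer
```

The extra term keeps a training signal alive for positive pairs even after the relative constraint is met, which is why it accelerates convergence.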

V-E Convergence analysis (RQ4)

We next show how the different components of GraphSC address the training difficulty introduced by the triplet loss objective. To illustrate this difficulty, we consider a further variant of GraphSC that randomly selects an augmented view of another sample as the negative and uses the triplet loss alone to train the GNN encoder. We call it GraphSC_rdt (random selection and triplet loss). Taking it as a reference, we show the convergence results of the GraphSC variants on four datasets in Fig. 3. From the figure, we observe that:

(1) The accuracy of GraphSC_rdt does not increase with more training epochs on any of the four datasets. This shows that GraphSC_rdt is hard to train due to the triplet loss objective.

(2) GraphSC converges faster than GraphSC_nm. Specifically, on the COLLAB and REDDIT-BINARY datasets, GraphSC_nm needs more than 60 epochs to converge, while GraphSC uses only 20. In addition, GraphSC_nm trains very unstably on REDDIT-MULTI-5K. The convergence-speed gaps between GraphSC and GraphSC_nm show that GraphSC's masked self-contrast is very effective in accelerating model convergence.

(3) Compared with GraphSC, GraphSC_nB converges more slowly and performs worse. GraphSC_nB, which only considers the relative distance between the anchor and its positive/negative samples, may not pull positive sample pairs sufficiently close. By explicitly shortening the absolute distance between positive sample pairs, GraphSC converges faster and performs better.
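The $\mathcal{L}_{ab}$ term removed in GraphSC_nB refers to the Barlow Twins redundancy-reduction loss [23], which drives the cross-correlation matrix of the two views' embeddings toward the identity. A minimal numpy sketch of that loss (the weight `lam` and the toy data are illustrative):

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins objective (Zbontar et al., 2021): push the cross-correlation
    matrix of two views' embeddings toward the identity matrix."""
    n = z1.shape[0]
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)   # standardize each dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = (z1.T @ z2) / n                           # (d, d) cross-correlation
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)     # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy term
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(128, 16))
aligned = z + 0.05 * rng.normal(size=(128, 16))   # views nearly identical
shuffled = rng.permutation(z)                      # views unrelated

assert barlow_twins_loss(z, aligned) < barlow_twins_loss(z, shuffled)
```

Because the invariance term directly penalizes any decorrelation between the anchor and its positive view, it acts as the explicit absolute-distance pressure that the relative-only triplet loss lacks.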

(4) GraphSC  also achieves faster convergence than GraphSC_rd. This shows that the negative generation strategy is particularly important for model training. When using our proposed negative generation strategy, the model can obtain informative negative samples, which accelerates convergence.

V-F Hyperparameter sensitivity analysis (RQ5)

• Perturbation intensity: As shown in Fig. 4, we calculate the accuracy difference between GraphSC and GraphCL for different combinations of perturbation strengths. Although the best perturbation settings differ across datasets, GraphSC performs well over a wide range of perturbation strength combinations, which demonstrates the stability of the model.

• Term weights $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$: As can be observed in Fig. 5, GraphSC gives very stable performance over a wide range of values for the term weights $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$. This again shows the robustness of GraphSC.

VI CONCLUSIONS

In this paper, we studied graph-level representation learning and proposed GraphSC, a graph contrastive learning framework that uses the triplet loss as its objective. GraphSC first applies graph augmentation functions of different intensities to obtain positive and negative views of a graph sample from the graph itself. It then factorizes the graph representations into multiple factors and applies a masked self-contrast mechanism to separate positive and negative samples. Moreover, GraphSC explicitly shortens the absolute distance between an anchor and its positive sample, which addresses the limitation that the triplet loss optimizes only the relative distance between the anchor and its positive/negative samples. We conducted extensive experiments in both unsupervised and transfer learning settings, and the results demonstrate the effectiveness and transferability of the proposed framework.

  • [1] T. N. Kipf and M. Welling, “Variational graph auto-encoders,” arXiv preprint arXiv:1611.07308 , 2016.
  • [2] P. Velickovic, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, “Deep graph infomax.” ICLR (Poster) , vol. 2, no. 3, p. 4, 2019.
  • [3] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, “Graph contrastive learning with augmentations,” in NeurIPS , vol. 33, pp. 5812–5823, 2020.
  • [4] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907 , 2016.
  • [5] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903 , 2017.
  • [6] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” arXiv preprint arXiv:1810.00826 , 2018.
  • [7] F.-Y. Sun, J. Hoffmann, V. Verma, and J. Tang, “Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization,” arXiv preprint arXiv:1908.01000 , 2019.
  • [8] J. Xia, L. Wu, J. Chen, B. Hu, and S. Z. Li, “Simgrace: A simple framework for graph contrastive learning without data augmentation,” in Proceedings of the ACM Web Conference 2022 , 2022, pp. 1070–1079.
  • [9] S. Li, X. Wang, Y. Wu, X. He, T.-S. Chua et al. , “Let invariant rationale discovery inspire graph contrastive learning,” arXiv preprint arXiv:2206.07869 , 2022.
  • [10] K. Sohn, “Improved deep metric learning with multi-class n-pair loss objective,” Advances in neural information processing systems , vol. 29, 2016.
  • [11] Y. Zhu, Y. Xu, Q. Liu, and S. Wu, “An empirical study of graph contrastive learning,” arXiv preprint arXiv:2109.01116 , 2021.
  • [12] J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. Richemond, E. Buchatskaya, C. Doersch, B. Avila Pires, Z. Guo, M. Gheshlaghi Azar et al. , “Bootstrap your own latent-a new approach to self-supervised learning,” in NeurIPS , vol. 33, pp. 21 271–21 284, 2020.
  • [13] X. Chen and K. He, “Exploring simple siamese representation learning,” in CVPR , 2021, pp. 15 750–15 758.
  • [14] A. C. Li, A. A. Efros, and D. Pathak, “Understanding collapse in non-contrastive learning,” arXiv preprint arXiv:2209.15007 , 2022.
  • [15] C.-Y. Chuang, J. Robinson, Y.-C. Lin, A. Torralba, and S. Jegelka, “Debiased contrastive learning,” Advances in neural information processing systems , vol. 33, pp. 8765–8775, 2020.
  • [16] J. Robinson, C.-Y. Chuang, S. Sra, and S. Jegelka, “Contrastive learning with hard negative samples,” arXiv preprint arXiv:2010.04592 , 2020.
  • [17] M. Wu, M. Mosse, C. Zhuang, D. Yamins, and N. Goodman, “Conditional negative sampling for contrastive learning of visual representations,” arXiv preprint arXiv:2010.02037 , 2020.
  • [18] Y. Kalantidis, M. B. Sariyildiz, N. Pion, P. Weinzaepfel, and D. Larlus, “Hard negative mixing for contrastive learning,” NeurIPS , vol. 33, pp. 21 798–21 809, 2020.
  • [19] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in CVPR , 2015, pp. 815–823.
  • [20] G. Chu, X. Wang, C. Shi, and X. Jiang, “Cuco: Graph representation with curriculum contrastive learning.” in IJCAI , 2021, pp. 2300–2306.
  • [21] H. Yang, H. Chen, S. Zhang, X. Sun, Q. Li, and G. Xu, “Generating counterfactual hard negative samples for graph contrastive learning,” ArXiv , vol. abs/2207.00148, 2022.
  • [22] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf, “Measuring statistical dependence with hilbert-schmidt norms,” in International conference on algorithmic learning theory .   Springer, 2005, pp. 63–77.
  • [23] J. Zbontar, L. Jing, I. Misra, Y. LeCun, and S. Deny, “Barlow twins: Self-supervised learning via redundancy reduction,” in ICML .   PMLR, 2021, pp. 12 310–12 320.
  • [24] Y. You, T. Chen, Z. Wang, and Y. Shen, “When does self-supervision help graph convolutional networks?” in international conference on machine learning .   PMLR, 2020, pp. 10 871–10 880.
  • [25] R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio, “Learning deep representations by mutual information estimation and maximization,” arXiv preprint arXiv:1808.06670 , 2018.
  • [26] J. Zhang, H. Zhang, C. Xia, and L. Sun, “Graph-bert: Only attention is needed for learning graph representations,” arXiv preprint arXiv:2001.05140 , 2020.
  • [27] Y. Liu, M. Jin, S. Pan, C. Zhou, Y. Zheng, F. Xia, and P. Yu, “Graph self-supervised learning: A survey,” IEEE Transactions on Knowledge and Data Engineering , 2022.
  • [28] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, “Deep graph contrastive representation learning,” arXiv preprint arXiv:2006.04131 , 2020.
  • [29] ——, “Graph contrastive learning with adaptive augmentation,” in WebConf , 2021, pp. 2069–2080.
  • [30] J. Qiu, Q. Chen, Y. Dong, J. Zhang, H. Yang, M. Ding, K. Wang, and J. Tang, “Gcc: Graph contrastive coding for graph neural network pre-training,” in KDD , 2020, pp. 1150–1160.
  • [31] S. Thakoor, C. Tallec, M. G. Azar, R. Munos, P. Veličković, and M. Valko, “Bootstrapped representation learning on graphs,” in ICLR 2021 Workshop on Geometrical and Topological Representation Learning , 2021.
  • [32] C. Mavromatis and G. Karypis, “Graph infoclust: Leveraging cluster-level node information for unsupervised graph representation learning,” arXiv preprint arXiv:2009.06946 , 2020.
  • [33] K. Hassani and A. H. Khasahmadi, “Contrastive multi-view representation learning on graphs,” in International Conference on Machine Learning .   PMLR, 2020, pp. 4116–4126.
  • [34] Y. Jiao, Y. Xiong, J. Zhang, Y. Zhang, T. Zhang, and Y. Zhu, “Sub-graph contrast for scalable self-supervised graph representation learning,” in 2020 IEEE international conference on data mining (ICDM) .   IEEE, 2020, pp. 222–231.
  • [35] Y. You, T. Chen, Y. Shen, and Z. Wang, “Graph contrastive learning automated,” in ICML .   PMLR, 2021, pp. 12 121–12 132.
  • [36] S. Suresh, P. Li, C. Hao, and J. Neville, “Adversarial graph augmentation to improve graph contrastive learning,” Advances in Neural Information Processing Systems , vol. 34, pp. 15 920–15 933, 2021.
  • [37] J. Xia, L. Wu, G. Wang, J. Chen, and S. Z. Li, “Progcl: Rethinking hard negative mining in graph contrastive learning,” in ICML .   PMLR, 2022, pp. 24 332–24 346.
  • [38] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in ICML .   PMLR, 2020, pp. 1597–1607.
  • [39] C. Morris, N. M. Kriege, F. Bause, K. Kersting, P. Mutzel, and M. Neumann, “Tudataset: A collection of benchmark datasets for learning with graphs,” arXiv preprint arXiv:2007.08663 , 2020.
  • [40] T. Sterling and J. J. Irwin, “Zinc 15–ligand discovery for everyone,” Journal of chemical information and modeling , vol. 55, no. 11, pp. 2324–2337, 2015.
  • [41] N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt, “Efficient graphlet kernels for large graph comparison,” in Artificial intelligence and statistics .   PMLR, 2009, pp. 488–495.
  • [42] N. Shervashidze, P. Schweitzer, E. J. Van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “Weisfeiler-lehman graph kernels.” Journal of Machine Learning Research , vol. 12, no. 9, 2011.
  • [43] P. Yanardag and S. Vishwanathan, “Deep graph kernels,” in KDD , 2015, pp. 1365–1374.
  • [44] B. Adhikari, Y. Zhang, N. Ramakrishnan, and B. A. Prakash, “Sub2vec: Feature learning for subgraphs,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining .   Springer, 2018, pp. 170–182.
  • [45] A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, and S. Jaiswal, “graph2vec: Learning distributed representations of graphs,” arXiv preprint arXiv:1707.05005 , 2017.
  • [46] Y. Xie, Z. Xu, and S. Ji, “Self-supervised representation learning via latent graph prediction,” arXiv preprint arXiv:2202.08333 , 2022.
  • [47] W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, and J. Leskovec, “Strategies for pre-training graph neural networks,” arXiv preprint arXiv:1905.12265 , 2019.
  • [48] M. Xu, H. Wang, B. Ni, H. Guo, and J. Tang, “Self-supervised graph-level representation learning with local and global structure,” ArXiv , vol. abs/2106.04113, 2021.
  • [49] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the thirteenth international conference on artificial intelligence and statistics .   JMLR Workshop and Conference Proceedings, 2010, pp. 249–256.
  • [50] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 , 2014.

DOI: 10.1109/ICDM58522.2023.00012
Minjie Chen, Yao Cheng, +2 authors Ming Gao. Published in Industrial Conference on Data…, 5 September 2023.




Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here .

Loading metrics

Open Access

Peer-reviewed

Research Article

Deep graph contrastive learning model for drug-drug interaction prediction

Roles Formal analysis, Validation, Visualization, Writing – original draft

Affiliation College of Information and Intelligence, Hunan Agricultural University, Changsha, China

Roles Formal analysis, Supervision, Writing – review & editing

* E-mail: [email protected] (ZG); [email protected] (CS); [email protected] (XD)

Affiliations School of Computer Science and Engineering, Hunan University of Information Technology, Changsha, China, Key Laboratory of Intelligent Perception and Computing, Hunan University of Information Technology, Changsha, China

Roles Funding acquisition, Methodology, Supervision, Writing – review & editing

Affiliations College of Information and Intelligence, Hunan Agricultural University, Changsha, China, School of Computer Science and Engineering, Hunan University of Information Technology, Changsha, China, Key Laboratory of Intelligent Perception and Computing, Hunan University of Information Technology, Changsha, China

ORCID logo

Roles Formal analysis, Methodology, Validation

Roles Formal analysis, Investigation, Supervision

Affiliation School of Computer Science, University of South China, Hengyang, China

Roles Funding acquisition, Methodology, Project administration, Supervision, Writing – review & editing

Affiliation School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore

  • Zhenyu Jiang, 
  • Zhi Gong, 
  • Xiaopeng Dai, 
  • Hongyan Zhang, 
  • Pingjian Ding, 

PLOS

  • Published: June 17, 2024
  • https://doi.org/10.1371/journal.pone.0304798
  • Reader Comments

Fig 1

Drug-drug interaction (DDI) is the combined effects of multiple drugs taken together, which can either enhance or reduce each other’s efficacy. Thus, drug interaction analysis plays an important role in improving treatment effectiveness and patient safety. It has become a new challenge to use computational methods to accelerate drug interaction time and reduce its cost-effectiveness. The existing methods often do not fully explore the relationship between the structural information and the functional information of drug molecules, resulting in low prediction accuracy for drug interactions, poor generalization, and other issues. In this paper, we propose a novel method, which is a deep graph contrastive learning model for drug-drug interaction prediction (DeepGCL for brevity). DeepGCL incorporates a contrastive learning component to enhance the consistency of information between different views (molecular structure and interaction network), which means that the DeepGCL model predicts drug interactions by integrating molecular structure features and interaction network topology features. Experimental results show that DeepGCL achieves better performance than other methods in all datasets. Moreover, we conducted many experiments to analyze the necessity of each component of the model and the robustness of the model, which also showed promising results. The source code of DeepGCL is freely available at https://github.com/jzysj/DeepGCL .

Citation: Jiang Z, Gong Z, Dai X, Zhang H, Ding P, Shen C (2024) Deep graph contrastive learning model for drug-drug interaction prediction. PLoS ONE 19(6): e0304798. https://doi.org/10.1371/journal.pone.0304798

Editor: Satyaki Roy, The University of Alabama in Huntsville, UNITED STATES

Received: February 20, 2024; Accepted: May 17, 2024; Published: June 17, 2024

Copyright: © 2024 Jiang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data is publicly available at the following URL: https://doi.org/10.6084/m9.figshare.25522918 .

Funding: The author(s) received funding for this work from the following sources: Research Foundation of Hunan Educational Committee, Grant No. 20C1579, Pingjian Ding. Hunan Province Higher Education Reform Research Project, Grant No. HNJG-2021-1242, Zhi Gong. National Natural Science Foundation of China, Grant No. 62002154, Pingjian Ding. Hunan Provincial Natural Science Foundation of China, Grant No. 2021JJ40467, Hongyan Zhang.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Drug–drug interaction (DDI) refers to the phenomenon that occurs when two or more drugs are taken together, resulting in adverse effects on an organism [ 1 , 2 ]. Thus, how to accurately identify drug-drug interactions has become an important research content. Traditional methods which used in drug-drug interaction identification are mainly based on experimental assays and clinical reports [ 3 ]. However, this process would be costly and time-consuming, especially for identifying drug-drug interactions from a large drug space. Computational methods (in silico [ 4 – 6 ]) can be used as an effective and fast alternative to alleviate this problem. Among these methods usually focus on learning single drug properties and lack effective integration of multiple sources of drug-related information, which ultimately limits the predictive capabilities of the model. Therefore, it has become an important research direction in the field of drug discovery to propose an effective and fast calculation method for drug-drug interaction prediction.

In recent years, accumulated research findings have demonstrated promising results in computational-based drug-drug interactions (DDIs) prediction. These achievements are primarily attributed to the rapid advancements in drug molecular property prediction [ 7 – 10 ]. These methods for predicting DDIs can be broadly categorized into two groups: Structure-based methods and network-based methods. Firstly, structure-based methods mainly consider the entire drug as a graph or sequence. For example, some researchers consider atoms as nodes and bonds between atoms as edges, then use a graph neural network (GNN) to learn the representation of each drug [ 11 – 17 ]. Additionally, some models use SMILES (Simplified Molecular Input Line Entry System) [ 18 ] as the input for sequence models (including GRU [ 19 ], LSTM [ 20 ], and Transformer [ 21 ]), then predict the DDIs. In these methods, drugs are treated as independent individuals, and the representation is learned from the drug molecular structure and then transported to the classifier through some aggregating operations. Next, another important method for predicting DDIs is the network-based method. In this kind of method, the authors mainly consider the drug as a node, and then consider the interaction or similarity between drugs as an edge to form a large network, and then use the traditional network science method or the graph neural network method to predict the unknown interaction of drug molecules [ 22 – 24 ]. Although these methods have achieved good performance, they still have some limitations. Firstly, the structure-based methods assume that drugs with similar features will behave similarly in the DDIs, however, there may be a lower similarity between interacting drugs. Meanwhile, the performance of the network-based methods relies on the quality of the interaction network, and it is time-consuming and difficult to build large-scale high-quality networks. 
Second, the drug molecular graph and the drug interaction network contain mutually irreplaceable pharmacological properties, which are very important for predicting DDIs. The drug molecular graph contains information about the drug functional groups that determine the chemical and physical properties of the drug. The topological information between drugs is contained in the interaction network, which contains some specific functions of some drugs. Although these methods obtain great performance in some specific tasks, they focus only on single-view learning and ignore the mutual complementarity of information among multi-view.

Existing research has demonstrated the effectiveness of building models to predict DDIs from multiple perspectives, primarily by aggregating multi-source information, including drug structure information, network topological information, and more [ 25 – 27 ]. For example, MUFFIN [ 28 ] has aggregated molecular structure information and drug topology information to predict DDIs. DSN-DDI [ 29 ] has utilized both local and global representation learning modules, which can learn drug substructures from individual drugs (intra-view) and drug pairs (inter-view) simultaneously. m2vec [ 22 ] has combined drug target networks with SMILES information and then used graph autoencoders to learn the final representation of drugs. The success of these methods confirmed the advantages of predicting DDIs from the multi-view. However, these methods prioritize leveraging multi-view data to improve drug representation, without considering the balance and consistency of multi-source information, and cannot effectively utilize the structure-level and network-level information. Contrastive learning is often used to maximize the mutual information between multiple perspectives. Thus, researchers have used the contrastive learning component to balance and integrate molecular structure information and interaction network information [ 30 ]. Moving forward, if the drug pair can be directly regarded as a whole, the representation vector can be learned at the level of the drug pair, which can be used to model training and DDI prediction, it may provide a new perspective for DDI prediction.

In this paper, we introduce a novel Deep Graph Contrastive Learning model (DeepGCL) for drug-drug interaction prediction. DeepGCL leverages graph contrastive learning to combine both molecular structure features and network topological features. Firstly, DeepGCL constructs the molecular structure graph for each drug and employs a graph convolutional network (GCN) to learn the structural features of the drugs. Then, DeepGCL constructs a subgraph for each drug pair and utilizes GCN to learn the topological features of each drug pair. To better choose a pooling operation, it is important to emphasize that we utilize a virtual node to aggregate the node features of the entire subgraph. Next, the graph contrastive learning model is used to combine the features of drug structure and network topology. Finally, the structural and topological features of the learned drug pairs are integrated for drug-drug interaction prediction. Experimental results demonstrate that DeepGCL achieves the best performance across three real-world datasets. In our study, we performed an ablation analysis which unequivocally demonstrated the essential role of graph contrastive learning in integrating information from various perspectives. Meanwhile, we also conducted experiments to assess the robustness of the DeepGCL model.

DeepGCL framework

The overview of the DeepGCL is shown in Fig 1 . DeepGCL is decomposed into three parts: (1) Topology information learning module ( Fig 1B ). This module mainly uses a graph convolutional network to learn the representation of each node in a local subgraph. (2) Structural information learning module ( Fig 1C ). This module utilizes a graph convolutional network to learn the representation of each drug, and drugs of the drug pair share parameters during the learning process. (3) Graph contrastive learning module and prediction module ( Fig 1D ). This module mainly uses graph contrastive learning and cross-entropy loss to constrain the model iteration and predict the probability of interaction between input drug pairs. Firstly, all drugs form an interconnected network in which the drugs represent nodes, and the edges represent interactions between the drugs ( Fig 1A ). Then, we sample the common H-Hop neighbor nodes from the drug interaction network for any input drug pair to construct a subgraph. Meanwhile, we introduce a virtual node to learn the global features of the subgraph, which is connected to all nodes within the subgraph. Additionally, we utilize the internal structural information of drug molecules to construct molecular graphs. Furthermore, to balance the information between the molecular and subgraphs of the drug pair, we incorporate a graph contrastive learning module that optimizes the model by ensuring consistency in the representation between the molecular and subgraphs. Finally, the prediction of drug-drug interactions combines the molecular structure and topology information of drug pairs.


Fig 1. Overview of the DeepGCL framework. (A) Drug-drug interaction networks: This component illustrates the network of interactions between drugs. (B) Topology information learning module: This module extracts common H-Hop neighbor nodes for drug pairs to form a subgraph. Subsequently, the subgraph is passed through a GCN to generate a global representation for the drug pair. (C) Structural information learning module: This module employs GCNs with shared parameters to acquire representations for drugs within drug pairs. (D) Graph contrastive learning module and prediction module: In this module, graph contrastive learning and cross-entropy loss are employed to regulate the model's training and predict the interaction probability between input drug pairs.

https://doi.org/10.1371/journal.pone.0304798.g001

Subgraph construction and representation learning

In recent years, the use of graph neural networks for analyzing networked graph data has garnered considerable attention. For example, Scorpius constructs a knowledge graph to evaluate the correlation between drugs and diseases [ 31 ]. However, dealing with large-scale networks can pose significant computational challenges, so there is increasing interest in extracting topological information from subgraphs. For instance, DisenCite uses L-hop neighbors from the paper relation network to learn topological information [ 32 ]. Inspired by this, DeepGCL learns topological relationships among drugs within a specific neighborhood by selecting H-Hop neighboring nodes and edges to construct subgraphs. This allows DeepGCL to concentrate on learning local drug pair features from subgraphs without training on the entire drug interaction network. If two drugs lack shared neighbors, the subgraph consists of only those two drugs; otherwise, the shared neighbors are used to form the subgraph.
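The subgraph construction described above can be sketched as follows. This is a minimal pure-Python illustration under our own simplifying assumptions (an adjacency-dict graph, breadth-first H-hop expansion, and a `VIRT` label for the virtual node); the paper's actual implementation details may differ.

```python
from collections import deque

def h_hop_neighbors(adj, start, h):
    """Return all nodes within h hops of `start` in adjacency dict `adj`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == h:
            continue
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return seen

def build_pair_subgraph(adj, u, v, h=2, virtual_node="VIRT"):
    """Subgraph over the common h-hop neighborhood of drugs u and v,
    plus a virtual node connected to every node in the subgraph.
    Node ids are assumed hashable and mutually comparable."""
    common = h_hop_neighbors(adj, u, h) & h_hop_neighbors(adj, v, h)
    nodes = common | {u, v}  # falls back to just the pair if no overlap
    edges = {(a, b) for a in nodes for b in adj.get(a, ())
             if b in nodes and a < b}
    edges |= {(virtual_node, n) for n in nodes}  # virtual node sees all nodes
    return nodes | {virtual_node}, edges
```

With `h=1` the subgraph covers the drugs' shared direct neighborhood; raising `h` grows the subgraph, mirroring the trade-off discussed in the parameter analysis.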


Molecular structure representation


Graph contrastive learning

Due to the scarcity of labeled data, unsupervised learning has been widely applied in few-shot learning [ 37 ], recommendation systems [ 38 , 39 ], and natural language processing [ 40 ]. Among them, KGNN combines graph neural networks and kernel-based networks to effectively utilize both labeled and unlabeled graphs [ 41 ]. Graph contrastive learning represents one of the most advanced unsupervised learning methods, with successful applications in various tasks, including node classification, graph classification, and drug discovery [ 42 – 44 ]. Many graph contrastive learning models build contrastive views via graph augmentations such as edge perturbation [ 45 ], node deletion [ 46 ], and attribute augmentation [ 47 ]. However, perturbing the structure of the input graph to obtain contrasting views may introduce noise, potentially harming performance. DSGC [ 48 ] instead constructs contrasting views in different spaces and combines the advantages of each via graph contrastive learning. Inspired by this, and since both drug molecular graphs and subgraphs contain rich information about drug interactions, we leverage contrastive learning to combine the two and obtain better embeddings.

For drug molecule graphs and subgraphs, after the nonlinear mapping introduced above, we obtain their low-dimensional representations. The representation of the molecular graph contains information on multiple functional groups, while the subgraph contains local topological information within the interaction network. The consistency of the representation vectors can be maximized by adding a contrastive learning component.
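As an illustration of this consistency objective, the following NumPy sketch implements an NT-Xent-style contrastive loss in which the i-th molecular-graph embedding and the i-th subgraph embedding form a positive pair and all other cross-view embeddings in the batch act as negatives. This is one common formulation, not necessarily the exact loss used by DeepGCL.

```python
import numpy as np

def nt_xent(mol_emb, sub_emb, tau=0.5):
    """NT-Xent-style consistency loss between molecular-graph embeddings
    and subgraph embeddings for a batch of drug pairs (rows correspond)."""
    # L2-normalize both views so the dot product is cosine similarity
    a = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    b = sub_emb / np.linalg.norm(sub_emb, axis=1, keepdims=True)
    logits = a @ b.T / tau                         # scaled similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # agreement on the diagonal
```

Minimizing this loss pulls each drug pair's two views together while pushing apart mismatched views, which is the "consistency of the representation vectors" the paragraph above refers to.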


Drug-drug interaction prediction


Experiments

In this section, we demonstrate the performance of the model on three real-world datasets to test its effectiveness in the task of adverse drug reaction classification, answering the following three questions:

  • Q1: How does DeepGCL perform in real-world datasets compared to other models?
  • Q2: Does integrating information from drug molecular graphs and subgraphs improve the performance of the model?
  • Q3: Does adding the contrastive learning component further improve the model's learning ability?

Dataset and baseline

DeepGCL is a binary classification model for detecting drug interactions; we use three real-world datasets (BioSNAP [ 49 ], AdverseDDI [ 50 ], and DrugBank [ 51 ]) to verify its performance. After preprocessing drug SMILES strings with RDKit, we excluded drugs lacking SMILES representations and corresponding molecular data. The positive-to-negative sample ratio is 1:1. Details are shown in Table 1 .


https://doi.org/10.1371/journal.pone.0304798.t001

To verify the validity of DeepGCL and answer Q1, we compare it with two types of models: graph neural network models and network embedding models. The graph neural network models include CSGNN [ 52 ], DeepDDI [ 53 ], DeepDDS [ 12 ] and CASTER [ 8 ]. Among these, DeepDDI and DeepDDS use drug structure information to learn drug representations. CSGNN incorporates a hybrid multi-hop neighborhood aggregator to capture the interrelationships of indirect neighbors in molecular interaction networks. CASTER considers the functional substructures of drugs, uses an autoencoder to learn from chemical structure data, and increases the interpretability of the model by adding a dictionary learning module. The network embedding models include DeepWalk [ 54 ], LINE [ 55 ], node2vec [ 56 ], SDNE [ 57 ], and struc2vec [ 58 ]. DeepWalk preserves the similarity between neighboring nodes, and LINE further preserves the similarity of nodes that have common neighbors. node2vec improves the random walk strategy and enriches the contextual information of the nodes. SDNE is a semi-supervised method that uses autoencoders to simultaneously optimize the similarity of a node's higher-order neighbors and learn local and global node features. struc2vec focuses on the spatial structure of nodes in the network, considering the similarity of nodes in the local topology. In addition, NNPS [ 59 ] constructs initial features by combining information on drug molecular side effects and drug-protein interactions, and then employs neural networks to compute the probabilities of adverse drug reactions for given drug combinations.

Experimental setting

To evaluate our model comprehensively, we randomly split the dataset into a training set, a validation set, and a test set using an 8:1:1 ratio; for each experiment, the dataset is randomly split 5 times. All comparison models were configured with the parameters from their original papers. Following DeepDDS [ 12 ], we train two parameter-sharing 3-layer GCNs with dimensions {78, 156, 128} for learning drug molecular graphs, and a 3-layer GCN encoder with dimensions {166, 332, 128} for drug interaction network subgraphs. The optimizer is Adam and the dropout rate is chosen from {0.2, 0.5, 0.8}. For joint training, we set α = 0.1, β = 1, λ = 1, η = 1. We use the area under the ROC curve (AUC), F1 score (F1), and area under the precision-recall curve (AUPR) as metrics to evaluate the model. Training runs for 100 epochs.
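The 8:1:1 random split can be reproduced with a few lines of standard-library Python; the function name and seed handling below are our own, and repeating with different seeds gives the 5 random splits.

```python
import random

def split_8_1_1(pairs, seed=0):
    """Shuffle drug pairs and split them 8:1:1 into train/validation/test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # deterministic per-seed shuffle
    n = len(pairs)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])
```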

Experimental results

In Table 2 , we report the mean and standard deviation of performance metrics for DeepGCL and the baseline models on three real-world datasets, with superior results highlighted in bold. DeepGCL consistently exhibits strong performance across all datasets, which supports the effectiveness of our approach and answers research question Q1.


Best performance in each metric is shown in bold font.

https://doi.org/10.1371/journal.pone.0304798.t002

As demonstrated in Table 2 , network embedding models such as LINE and SDNE perform comparably to CASTER. This underscores the significance of drug topological information within the drug interaction network, placing it on par with molecular structure information for DDI prediction. While DeepDDS outperforms DeepDDI on the BioSNAP and AdverseDDI datasets, DeepDDI achieves superior performance on DrugBank; both models rely solely on molecular structure information. DeepDDS is based on a neural network architecture and highlights the potential of neural network models in DDI prediction. Notably, DeepDDI's strong performance on DrugBank can be attributed to its integration of additional drug databases for drug structure similarity calculations, which gives it access to a more comprehensive range of drug-related information than the other models. Among the baselines, CSGNN shows consistently stable performance across all three datasets, suggesting that its approach of enhancing communication among higher-order neighbor nodes contributes to its predictive ability. Comparatively, DeepGCL uses a virtual node to connect all nodes in the subgraph, so higher-order neighbors communicate through the virtual node as an intermediary during message passing. DeepGCL outperforms all other compared models: it aggregates drug molecular structure and drug topology information to make up for the limitations of single-molecule-graph learning.

We have incorporated an array of evaluation metrics, including Mean Average Precision (MAP), Mean Reciprocal Ranking (MRR), and HIT@K metrics. DeepGCL consistently demonstrates competitive performance across these diverse evaluation criteria, as evident in the experimental results presented in S1 Table . These metrics are particularly relevant in the context of drug discovery, where the emphasis is often on identifying the most promising drug candidates for further experimentation. DeepGCL showed reliable performance, indicating its ability to identify potentially interacting drug pairs. In practical drug recommendation scenarios, the top-ranked drug pairs are of paramount importance, and our model’s proficiency in this regard further underscores its utility in drug discovery.
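For reference, Hit@K and MRR over per-query ranked candidate lists can be computed as in this minimal sketch. Each inner list holds the true labels of one query's candidates, sorted by predicted score; the function names are ours.

```python
def hit_at_k(ranked_labels, k):
    """Hit@K: fraction of queries whose top-k ranked candidates
    contain at least one true interaction (label 1)."""
    return sum(any(labels[:k]) for labels in ranked_labels) / len(ranked_labels)

def mrr(ranked_labels):
    """Mean reciprocal rank of the first true interaction per query
    (queries with no true interaction contribute 0)."""
    total = 0.0
    for labels in ranked_labels:
        for rank, y in enumerate(labels, start=1):
            if y:
                total += 1.0 / rank
                break
    return total / len(ranked_labels)
```

These ranking metrics reward placing truly interacting drug pairs near the top of the list, matching the drug-recommendation motivation above.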

Furthermore, we analyzed the training time of the model on various datasets to evaluate its computational efficiency. As shown in S2 Table , DeepGCL exhibits advantages in computational efficiency compared to several models, notably NNPS and DeepDDI. This superiority can be attributed to the effectiveness of graph neural networks in learning features from graph-structured data. DeepGCL focuses on molecular structure and network topology to enhance drug interaction prediction accuracy, which consequently affects computational efficiency. However, it remains competitive in both model prediction accuracy and computational efficiency.

Ablation study

To further investigate the necessity and effectiveness of each component of the DeepGCL model and address questions Q2 and Q3, we designed the following variants of DeepGCL for experiments on the three datasets. Each variant was trained five times independently, and we report the mean and standard deviation over the five runs.

DeepGCL without molecular structure learning (DeepGCL w/o molecular) learns drug interaction information only from drug interaction subgraphs to make predictions about drug pairs.

DeepGCL without subgraph learning (DeepGCL w/o subgraph) learns only the embedding representation of the drug from the molecular graph as the representation vector of the drug pair.

DeepGCL without contrastive learning (DeepGCL w/o contrastive) trains the target based on supervised signals, and the acquired drug pair embeddings are used for downstream binary classification.

Fig 2 shows the results of DeepGCL and its variants for AUC, AUPR, and F1 scores. Removing any component from DeepGCL results in weaker performance than the full model, demonstrating the necessity of each component. In the BioSNAP and DrugBank datasets, adding contrastive learning improves performance less markedly than in the AdverseDDI dataset. This disparity can be attributed to the fact that BioSNAP and DrugBank already contain rich drug interaction information; even without contrastive learning, DeepGCL achieves excellent performance by effectively integrating drug structure and interaction information. Nonetheless, adding contrastive learning still yields performance improvements, albeit smaller ones, indicating that integrating information from multiple drug perspectives effectively improves the model's predictive power. By incorporating the contrastive learning component, the model's two encoders can glean deeper insights into the interplay between drug molecules, further enhancing the model's learning capacity. In conclusion, each component of the DeepGCL model is necessary and effective.


https://doi.org/10.1371/journal.pone.0304798.g002

Robustness analysis

Existing deep learning models are susceptible to interference from noise. To verify the robustness of the model, we randomly remove 10%, 20%, 30%, 40%, and 50% of the known associations. The results are shown in Fig 3 . As known associations are removed, all models show a downward trend, but DeepGCL still performs best in all scenarios. Among the graph neural network models, DeepDDS and DeepDDI perform poorly as edges are removed, which can be attributed to data sparsity: these methods start from drug similarity, learn drug embeddings from molecular structure information, and assume that similar drugs behave similarly in DDIs. In comparison, CSGNN shows reliable robustness after edge removal, possibly because its deep mix-hop graph neural network captures higher-order neighbors and thereby alleviates data sparsity. In the AdverseDDI dataset, the performance of DeepWalk and node2vec degrades rapidly as edges are removed; these models assume that nodes with common neighbors are more similar, an assumption easily disturbed by noise. In conclusion, DeepGCL can combine the respective advantages of molecular graphs and interaction networks to improve robustness.
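The edge-removal protocol used in this robustness test can be sketched as follows (standard-library Python; the exact sampling procedure is our own assumption):

```python
import random

def remove_edges(edges, fraction, seed=0):
    """Randomly drop `fraction` of the known interactions and keep the rest,
    simulating the robustness setting with 10%-50% of edges removed."""
    edges = list(edges)
    rng = random.Random(seed)
    n_drop = int(round(fraction * len(edges)))
    dropped = set(rng.sample(range(len(edges)), n_drop))
    return [e for i, e in enumerate(edges) if i not in dropped]
```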


(a) Performance in BioSNAP dataset, (b) Performance in AdverseDDI dataset, (c) Performance in DrugBank dataset.

https://doi.org/10.1371/journal.pone.0304798.g003

Cold start experiments

In Drug-Drug Interaction (DDI) prediction tasks, traditional K-fold cross-validation (CV) can inadvertently introduce information overlap between training and testing sets, potentially inflating results. To address this challenge, we employ two distinct cold start scenarios [ 60 ]: drug-wise CV and pairwise CV. In drug-wise CV, the objective is to predict interactions between known drugs and unknown drugs; in pairwise CV, the goal is to predict interactions exclusively between unknown drugs. We categorize input drugs into two groups: drugs for training (drugs_train) and cold-start drugs (drugs_cold) lacking known interactions in the training set. This yields three distinct DDI subsets: DDI_train (between known drugs), DDI_drugwise (between cold-start and known drugs), and DDI_pairwise (between cold-start drugs). We then train logistic regression classifiers on DDI_train to make predictions in both the drug-wise and pairwise scenarios. The results are presented in Table 3 . All models show a noticeable decline in performance in both scenarios compared to traditional CV. Owing to its ability to learn drug representations from both molecular graphs and interaction networks, DeepGCL exhibits robust performance in both scenarios.
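The partition into the train, drug-wise, and pairwise DDI subsets described above can be sketched as follows, assuming DDIs are given as drug-ID pairs and the cold-start drugs as a set (names are ours):

```python
def cold_start_split(ddis, cold_drugs):
    """Partition DDI pairs by how many of their drugs are cold-start:
    0 cold -> train subset, 1 cold -> drug-wise subset, 2 cold -> pairwise."""
    cold = set(cold_drugs)
    train, drugwise, pairwise = [], [], []
    for u, v in ddis:
        n_cold = (u in cold) + (v in cold)  # 0, 1, or 2 cold drugs in the pair
        (train, drugwise, pairwise)[n_cold].append((u, v))
    return train, drugwise, pairwise
```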


The best score is in bold.

https://doi.org/10.1371/journal.pone.0304798.t003

Visualization

In this section, we analyze the drug pair features learned by DeepGCL. To intuitively observe the relationships between features, we employ dimensionality reduction to visualize them as points in a two-dimensional space. Dimensionality reduction methods are mainly linear or nonlinear [ 61 ]; the linear method LRPER [ 62 ] and the nonlinear method t-SNE [ 63 ] are two popular choices. Given t-SNE's advantage in preserving local structure, we adopt it as our dimensionality reduction tool. Since DeepGCL is a deep learning model, we only compare it with other deep learning models, including DeepDDS, CSGNN, DeepDDI, and CASTER. As shown in Fig 4 , DeepGCL clearly distinguishes between drug pairs with (green) and without (red) interactions. On the BioSNAP dataset, we use the silhouette coefficient to measure the quality of the drug pair representations; the silhouette coefficients of DeepGCL, DeepDDS, CSGNN, DeepDDI, and CASTER are 0.3246, 0.2085, 0.1973, 0.0188, and 0.1944, respectively. This indicates that DeepGCL better extracts the representations of drug pairs.
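The silhouette (contour) coefficient used above to score the 2-D embeddings can be computed as in this pure-Python sketch, using Euclidean distances and the convention that a singleton cluster scores 0:

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b) over all points,
    where a is the mean intra-cluster distance and b is the mean distance
    to the nearest other cluster."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        same = [math.dist(p, q) for q in clusters[l] if q is not p]
        if not same:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values near 1 indicate well-separated clusters of interacting and non-interacting pairs, which is why the higher score for DeepGCL reflects better-structured representations.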


Red points indicate drug pairs without interactions and green points indicate drug pairs with interactions.

https://doi.org/10.1371/journal.pone.0304798.g004

Parameter analysis

DeepGCL uses GCNs to learn semantic information about the graphs, where network depth is crucial to the final learning quality. To verify the effect of network depth on model performance, we build networks of different depths for experiments. The results are shown in Fig 5 , where the horizontal axis corresponds to the depth of the drug molecular graph network and the vertical axis to the depth of the drug interaction graph network; a darker color indicates better model performance.


https://doi.org/10.1371/journal.pone.0304798.g005

In the BioSNAP dataset, the combination of l = 2 and k = 3 works best; in the AdverseDDI dataset, k = 1 and l = 3 works best; and in the DrugBank dataset, l = 2 and k = 2 performs best. After evaluating the various combinations, we ultimately chose l = 2 and k = 3.

DeepGCL constructs subgraphs from shared H-Hop drug neighbors to learn drug topology within the DDI network. Increasing H includes more nodes in the subgraph, enhancing topological understanding. In DeepGCL, we select H from {1, 2} due to computational limits. S1 Fig shows that including 2-Hop neighbors enhances performance on AdverseDDI and BioSNAP. In contrast, DrugBank performs best with only 1-Hop neighbors: its higher node degree means a larger subgraph can introduce noise, counteracting the benefits. In the final model, H is set to 2 for the BioSNAP and AdverseDDI datasets and 1 for the DrugBank dataset.

Conclusion

We present DeepGCL, a novel deep graph contrastive learning framework that integrates drug interaction network topology and molecular structure information. DeepGCL constructs subgraphs from shared H-Hop neighboring nodes in the Drug-Drug Interaction (DDI) network and employs GCNs to obtain representations for drug molecular graphs and subgraphs. A key graph contrastive learning component enhances the consistency of embeddings across these perspectives. DeepGCL consistently demonstrates competitive performance across various metrics. When applied to larger datasets, it learns additional topological information from subgraphs, improving performance at the cost of increased computational complexity. In the future, we will focus on efficient methods [ 64 ] for learning subgraph features, balancing computational efficiency and model performance on large-scale data to improve scalability. In summary, DeepGCL advances drug-drug interaction prediction and maintains a competitive edge in drug interaction research while providing valuable insights.

Supporting information

S1 Fig. The experiment assesses the impact of varying H-Hop neighbors on DeepGCL's performance across multiple datasets.

https://doi.org/10.1371/journal.pone.0304798.s001

S1 Table. Performance comparison of DeepGCL and competitive methods based on evaluation metrics MRR, MAP, and HIT@K.

https://doi.org/10.1371/journal.pone.0304798.s002

S2 Table. Training time of models on different datasets (Seconds).

https://doi.org/10.1371/journal.pone.0304798.s003

  • 7. Wang H, Lian D, Zhang Y, Qin L, Lin X. GoGNN: Graph of Graphs Neural Network for Predicting Structured Entity Interactions. arXiv e-prints; 2020.
  • 8. Huang K, Xiao C, Hoang T, Glass L, Sun J. CASTER: Predicting drug interactions with chemical substructure representation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34; 2020. p. 702–709.
  • 13. Fang X, Liu L, Lei J, He D, Zhang S, Zhou J, et al. ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction. arXiv e-prints; 2021.
  • 14. Chen X, Liu X, Wu J. Drug-drug interaction prediction with graph representation learning. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2019. p. 354–361.
  • 21. Fabian B, Edlich T, Gaspar H, Segler M, Meyers J, Fiscato M, et al. Molecular representation learning with language models and domain-relevant auxiliary tasks. arXiv preprint arXiv:2011.13230; 2020.
  • 22. Purkayastha S, Mondal I, Sarkar S, Goyal P, Pillai JK. Drug-drug interactions prediction based on drug embedding and graph auto-encoder. In: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE). IEEE; 2019. p. 547–552.
  • 23. Ma T, Shang J, Xiao C, Sun J. GENN: Predicting correlated drug-drug interactions with graph energy neural networks. arXiv preprint arXiv:1910.02107; 2019.
  • 30. Wang Y, Min Y, Chen X, Wu J. Multi-view graph contrastive representation learning for drug-drug interaction prediction. In: Proceedings of the Web Conference 2021; 2021. p. 2921–2933.
  • 32. Wang Y, Song Y, Li S, Cheng C, Ju W, Zhang M, et al. DisenCite: Graph-based disentangled representation learning for context-specific citation generation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36; 2022. p. 11449–11458.
  • 34. Ishiguro K, Maeda Si, Koyama M. Graph warp module: an auxiliary module for boosting the power of graph neural networks in molecular graph analysis. arXiv preprint arXiv:1902.01020; 2019.
  • 37. Song Y, Ju W, Tian Z, Liu L, Zhang M, Xie Z. Building Conversational Diagnosis Systems for Fine-Grained Diseases Using Few Annotated Data. In: International Conference on Neural Information Processing. Springer; 2022. p. 591–603.
  • 38. Qin Y, Wang Y, Sun F, Ju W, Hou X, Wang Z, et al. DisenPOI: Disentangling sequential and geographical influence for point-of-interest recommendation. In: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining; 2023. p. 508–516.
  • 39. Ju W, Yi S, Wang Y, Long Q, Luo J, Xiao Z, et al. A survey of data-efficient graph learning. arXiv preprint arXiv:2402.00447; 2024.
  • 40. Liu C, Shen J, Xin H, Liu Z, Yuan Y, Wang H, et al. FIMO: A challenge formal dataset for automated theorem proving. arXiv preprint arXiv:2309.04295; 2023.
  • 41. Ju W, Yang J, Qu M, Song W, Shen J, Zhang M. KGNN: Harnessing kernel-based networks for semi-supervised graph classification. In: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining; 2022. p. 421–429.
  • 43. Shuai J, Zhang K, Wu L, Sun P, Hong R, Wang M, et al. A review-aware graph contrastive learning framework for recommendation. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2022. p. 1283–1293.
  • 44. Yang Y, Huang C, Xia L, Li C. Knowledge graph contrastive learning for recommendation. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2022. p. 1434–1443.
  • 47. Fang Y, Zhang Q, Yang H, Zhuang X, Deng S, Zhang W, et al. Molecular contrastive learning with chemical element knowledge graph. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36; 2022. p. 3968–3976.
  • 48. Yang H, Chen H, Pan S, Li L, Yu PS, Xu G. Dual space graph contrastive learning. In: Proceedings of the ACM Web Conference 2022; 2022. p. 1238–1247.
  • 49. Zitnik M, Sosič R, Leskovec J. BioSNAP Datasets: Stanford Biomedical Network Dataset Collection; 2018. http://snap.stanford.edu/biodata.
  • 52. Zhao C, Liu S, Huang F, Liu S, Zhang W. CSGNN: Contrastive Self-Supervised Graph Neural Network for Molecular Interaction Prediction. In: IJCAI; 2021. p. 3756–3763.
  • 54. Perozzi B, Al-Rfou R, Skiena S. DeepWalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2014. p. 701–710.
  • 55. Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q. LINE: Large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web; 2015. p. 1067–1077.
  • 56. Grover A, Leskovec J. node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 855–864.
  • 57. Wang D, Cui P, Zhu W. Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 1225–1234.
  • 58. Ribeiro LF, Saverese PH, Figueiredo DR. struc2vec: Learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2017. p. 385–394.
  • 64. Zou D, Hu Z, Wang Y, Jiang S, Sun Y, Gu Q. Layer-dependent importance sampling for training deep and large graph convolutional networks. Advances in Neural Information Processing Systems. 2019;32.

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings
  • My Bibliography
  • Collections
  • Citation manager

Save citation to file

Email citation, add to collections.

  • Create a new collection
  • Add to an existing collection

Add to My Bibliography

Your saved search, create a file for external citation management software, your rss feed.

  • Search in PubMed
  • Search in NLM Catalog
  • Add to Search

CLEAR: Cluster-Enhanced Contrast for Self-Supervised Graph Representation Learning

  • PMID: 35675236
  • DOI: 10.1109/TNNLS.2022.3177775

This article studies self-supervised graph representation learning, which is critical to various tasks, such as protein property prediction. Existing methods typically aggregate representations of each individual node as graph representations, but fail to comprehensively explore local substructures (i.e., motifs and subgraphs), which also play important roles in many graph mining tasks. In this article, we propose a self-supervised graph representation learning framework named cluster-enhanced Contrast (CLEAR) that models the structural semantics of a graph from graph-level and substructure-level granularities, i.e., global semantics and local semantics, respectively. Specifically, we use graph-level augmentation strategies followed by a graph neural network-based encoder to explore global semantics. As for local semantics, we first use graph clustering techniques to partition each whole graph into several subgraphs while preserving as much semantic information as possible. We further employ a self-attention interaction module to aggregate the semantics of all subgraphs into a local-view graph representation. Moreover, we integrate both global semantics and local semantics into a multiview graph contrastive learning framework, enhancing the semantic-discriminative ability of graph representations. Extensive experiments on various real-world benchmarks demonstrate the efficacy of the proposed over current graph self-supervised representation learning approaches on both graph classification and transfer learning tasks.

PubMed Disclaimer

Similar articles

  • A Comprehensive Survey on Deep Graph Representation Learning. Ju W, Fang Z, Gu Y, Liu Z, Long Q, Qiao Z, Qin Y, Shen J, Sun F, Xiao Z, Yang J, Yuan J, Zhao Y, Wang Y, Luo X, Zhang M. Ju W, et al. Neural Netw. 2024 May;173:106207. doi: 10.1016/j.neunet.2024.106207. Epub 2024 Feb 27. Neural Netw. 2024. PMID: 38442651 Review.
  • Local structure-aware graph contrastive representation learning. Yang K, Liu Y, Zhao Z, Ding P, Zhao W. Yang K, et al. Neural Netw. 2024 Apr;172:106083. doi: 10.1016/j.neunet.2023.12.037. Epub 2023 Dec 27. Neural Netw. 2024. PMID: 38182463
  • Attribute-driven streaming edge partitioning with reconciliations for distributed graph neural network training. Mu Z, Tang S, Zhuang Y, Yu D. Mu Z, et al. Neural Netw. 2023 Aug;165:987-998. doi: 10.1016/j.neunet.2023.06.026. Epub 2023 Jun 28. Neural Netw. 2023. PMID: 37467586 Review.
  • A multi-view contrastive learning for heterogeneous network embedding. Li Q, Chen W, Fang Z, Ying C, Wang C. Li Q, et al. Sci Rep. 2023 Apr 25;13(1):6732. doi: 10.1038/s41598-023-33324-7. Sci Rep. 2023. PMID: 37185784 Free PMC article.
  • Attention-wise masked graph contrastive learning for predicting molecular property. Liu H, Huang Y, Liu X, Deng L. Liu H, et al. Brief Bioinform. 2022 Sep 20;23(5):bbac303. doi: 10.1093/bib/bbac303. Brief Bioinform. 2022. PMID: 35940592
  • Rumor detection based on Attention Graph Adversarial Dual Contrast Learning. Zhang B, Liu T, Ke Z, Li Y, Silamu W. Zhang B, et al. PLoS One. 2024 Apr 22;19(4):e0290291. doi: 10.1371/journal.pone.0290291. eCollection 2024. PLoS One. 2024. PMID: 38648224 Free PMC article.

LinkOut - more resources

Full text sources.

  • IEEE Engineering in Medicine and Biology Society

Research Materials

  • NCI CPTC Antibody Characterization Program
  • Citation Manager

NCBI Literature Resources

MeSH PMC Bookshelf Disclaimer

The PubMed wordmark and PubMed logo are registered trademarks of the U.S. Department of Health and Human Services (HHS). Unauthorized use of these marks is strictly prohibited.

  • Search Menu

Sign in through your institution

  • Advance Articles
  • Special Issues
  • Author Guidelines
  • Submission Site
  • Open Access
  • Reviewer Guidelines
  • Review and Appeals Process
  • About The Computer Journal
  • About the BCS, The Chartered Institute for IT
  • Editorial Board
  • Advertising and Corporate Services
  • Journals Career Network
  • Self-Archiving Policy
  • Dispatch Dates
  • Journals on Oxford Academic
  • Books on Oxford Academic

BCS, The Chartered Institute for IT

  • < Previous

Detection of E-Commerce Fraud Review via Self-Paced Graph Contrast Learning

  • Article contents
  • Figures & tables
  • Supplementary Data

WeiDong Zhao, XiaoTong Liu, Detection of E-Commerce Fraud Review via Self-Paced Graph Contrast Learning, The Computer Journal , Volume 67, Issue 6, June 2024, Pages 2054–2065, https://doi.org/10.1093/comjnl/bxad123


Recently, graph neural networks (GNNs) have been widely used for e-commerce review fraud detection, aggregating the neighborhood information of nodes across various relationships to highlight the suspiciousness of nodes. However, existing GNN-based detection methods are susceptible to class imbalance and fraud camouflage, which degrade the quality of the constructed graph structure and prevent reliable node embeddings from being learned. To address these problems, we propose a novel e-commerce review fraud detection method based on self-paced graph contrast learning (SPCL-GNN). Firstly, the method constructs a subgraph by initially selecting nodes through a labeled balanced extractor. Secondly, the subgraph connections are filtered and complemented by combining self-paced graph contrast learning with an adaptive neighbor sampler to obtain an optimized graph structure. Thirdly, an attention mechanism is introduced into intra- and inter-relationship aggregation to weight the importance of aggregation under different relationships. Finally, the quality of the node embedding representation is further improved by maximizing the mutual information between the local and global representations. Experimental results on the Amazon and YelpChi datasets show that SPCL-GNN significantly outperforms the baselines.
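SPCL-GNN's actual implementation is not reproduced on this page, but the third step — attention-weighted aggregation over different relationships — can be illustrated with a minimal sketch. All names, shapes, and the concatenation-plus-softmax scoring below are our assumptions, not the authors' code:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_relations(node_emb, rel_means, att_vec):
    """Attention-weighted aggregation across relation-specific
    neighborhood summaries (illustrative sketch only).

    node_emb:  (d,) embedding of the target node
    rel_means: list of (d,) mean neighbor embeddings, one per
               relationship (e.g. same-reviewer, same-product)
    att_vec:   (2d,) attention parameters (would be learned)
    """
    scores = np.array([att_vec @ np.concatenate([node_emb, r])
                       for r in rel_means])
    weights = softmax(scores)          # importance of each relationship
    return weights, sum(w * r for w, r in zip(weights, rel_means))

rng = np.random.default_rng(0)
d = 8
node = rng.standard_normal(d)
rels = [rng.standard_normal(d) for _ in range(3)]   # 3 relationships
w, out = aggregate_relations(node, rels, rng.standard_normal(2 * d))
```

The same scoring pattern, applied within a single relationship over individual neighbors, would give the intra-relationship aggregation mentioned in the abstract.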


DGMem: learning visual navigation policy without any labels by dynamic graph memory

  • Published: 27 June 2024


Wenzhe Cai, Teng Wang, Guangran Cheng, Lele Xu & Changyin Sun (ORCID: orcid.org/0000-0001-9269-334X)


In recent years, learning-based approaches have demonstrated significant promise in addressing intricate navigation tasks. Traditional methods for training deep neural network navigation policies rely on meticulously designed reward functions or extensive teleoperation datasets as navigation demonstrations. However, the former is often confined to simulated environments, and the latter demands substantial human labor, making it a time-consuming process. Our vision is for robots to autonomously learn navigation skills and adapt their behaviors to environmental changes without any human intervention. In this work, we discuss the self-supervised navigation problem and present Dynamic Graph Memory (DGMem), which facilitates training only with on-board observations. With the help of DGMem, agents can actively explore their surroundings, autonomously acquiring a comprehensive navigation policy in a data-efficient manner without external feedback. Our method is evaluated in photorealistic 3D indoor scenes, and empirical studies demonstrate the effectiveness of DGMem.
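DGMem's internals are not given on this page, but the core idea — a topological graph memory whose nodes hold observation embeddings and whose edges record traversals — can be sketched as follows. The merge-by-similarity rule and every name here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

class GraphMemory:
    """Toy topological memory built online from agent observations.

    Nodes store observation embeddings; an edge (i, j) records that
    the agent moved between the places represented by nodes i and j.
    """
    def __init__(self, merge_threshold=0.9):
        self.nodes = []            # list of (d,) embedding vectors
        self.edges = set()         # undirected edges as sorted pairs
        self.merge_threshold = merge_threshold
        self._last = None          # index of the previously visited node

    def _similarity(self, a, b):
        # cosine similarity between two embeddings
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def add_observation(self, emb):
        # Merge into an existing node if one is similar enough,
        # otherwise create a new node; then link it to the last node.
        for i, n in enumerate(self.nodes):
            if self._similarity(emb, n) >= self.merge_threshold:
                idx = i
                break
        else:
            self.nodes.append(emb)
            idx = len(self.nodes) - 1
        if self._last is not None and self._last != idx:
            self.edges.add(tuple(sorted((self._last, idx))))
        self._last = idx
        return idx

mem = GraphMemory(merge_threshold=0.99)
obs_a = np.array([1.0, 0.0])
obs_b = np.array([0.0, 1.0])
i0 = mem.add_observation(obs_a)
i1 = mem.add_observation(obs_b)
i2 = mem.add_observation(obs_a)   # revisiting merges into the first node
```

A navigation policy could then plan over `mem.edges` (e.g. shortest path between memory nodes) instead of raw observations, which is what makes such a memory useful for self-supervised exploration.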


Data Availability

The datasets generated for the current study are available in the Habitat-Lab repository: https://github.com/facebookresearch/habitat-lab/blob/main/DATASETS.md. The code will be made available in the following repository: https://github.com/wzcai99/DGMem-Navigator


Funding: National Natural Science Foundation of China (Grant Nos. 62236002, 62273093, 61921004).

Author information

Authors and affiliations.

School of Automation, Southeast University, Sipailou Road 2, Nanjing, 210096, Jiangsu, China

Wenzhe Cai, Teng Wang, Guangran Cheng, Lele Xu & Changyin Sun


Contributions

All authors contributed to the study conception and design. Implementation, material preparation, and data collection were performed by Wenzhe Cai and Guangran Cheng. The first draft of the manuscript was written by Wenzhe Cai and revised by Teng Wang and Changyin Sun. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Changyin Sun .

Ethics declarations

Ethics approval.

No human subjects or animals are involved in this study.

Conflicts of interest

The authors have no relevant financial or non-financial interests to disclose.


About this article

Cai, W., Wang, T., Cheng, G. et al. DGMem: learning visual navigation policy without any labels by dynamic graph memory. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05323-2


Accepted : 05 February 2024

Published : 27 June 2024

DOI : https://doi.org/10.1007/s10489-024-05323-2


  • Self-supervised learning
  • Reinforcement learning
  • Visual navigation

Self-Supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax

Index terms

  • Applied computing → Life and medical sciences → Health informatics
  • Computing methodologies → Machine learning → Machine learning algorithms


Information

Published in ACM Transactions on Computing for Healthcare (Association for Computing Machinery, New York, NY, United States).

  • University of Pennsylvania, USA
  • William and Mary, USA

Author tags

  • Graph contrastive learning
  • patient representation learning
  • Research-article

Contributors

Other metrics, bibliometrics, article metrics.

  • 0 Total Citations
  • 165 Total Downloads
  • Downloads (Last 12 months) 165
  • Downloads (Last 6 weeks) 60

View Options

Login options.

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

View options.

View or Download as a PDF file.

View online with eReader .

View this article in Full Text.

Share this Publication link

Copying failed.

Share on social media

Affiliations, export citations.

  • Please download or close your previous search result export first before starting a new bulk export. Preview is not available. By clicking download, a status dialog will open to start the export process. The process may take a few minutes but once it finishes a file will be downloadable from your browser. You may continue to browse the DL while the export process is in progress. Download
  • Download citation
  • Copy citation

We are preparing your search results for download ...

We will inform you here when the file is ready.

Your file of search results citations is now ready.

Your search export query has expired. Please try again.


COMMENTS

  1. [2309.02304] Graph Self-Contrast Representation Learning

    Graph Self-Contrast Representation Learning. Graph contrastive learning (GCL) has recently emerged as a promising approach for graph representation learning. Some existing methods adopt the 1-vs-K scheme to construct one positive and K negative samples for each graph, but it is difficult to set K. For those methods that do not use negative ...

  2. Graph Self-Contrast Representation Learning

    •We propose a novel graph self-contrast representation learning framework GraphSC. •We present a simple yet effective method to construct negative samples from graphs themselves in graph-level representation learning. •We use triplet loss in graph contrastive learning and address the hard-to-train problem of triplet loss by putting

  3. Graph Self-Contrast Representation Learning

    Graph contrastive learning (GCL) has recently emerged as a promising approach for graph representation learning. Some existing methods adopt the 1-vs-K scheme to construct one positive and K negative samples for each graph, but it is difficult to set K. For those methods that do not use negative samples, it is often necessary to add additional strategies to avoid model collapse, which could ...

  4. Graph Self-Contrast Representation Learning

    Specifically, self-contrast has two implications. First, GraphSC generates both positive and negative views of a graph sample from the graph itself via graph augmentation functions of various intensities, and use them for self-contrast. Second, GraphSC uses Hilbert-Schmidt Independence Criterion (HSIC) to factorize the representations into ...

  5. Graph Self-Contrast Representation Learning

    Graph contrastive learning (GCL) has recently emerged as a promising approach for graph representation learning. Some existing methods adopt the 1-vs-K scheme to construct one positive and K negative samples for each graph, but it is difficult to set K. For those methods that do not use negative samples, it is often necessary to add additional strategies to avoid model collapse, which could ...

  6. [2309.02304] Graph Self-Contrast Representation Learning

    Graph contrastive learning (GCL) has recently emerged as a promising approach for graph representation learning. Some existing methods adopt the 1-vs- scheme to construct one positive and negative samples for each gra…

  7. Generation-based Multi-view Contrast for Self-supervised Graph

    Graph contrastive learning has made remarkable achievements in the self-supervised representation learning of graph-structured data. By employing perturbation function (i.e., perturbation on the nodes or edges of graph), most graph contrastive learning methods construct contrastive samples on the original graph.

  8. Graph Self-Contrast Representation Learning

    A novel graph self-contrast framework GraphSC is proposed, which only uses one positive and one negative sample, and chooses triplet loss as the objective, and uses Hilbert-Schmidt Independence Criterion to factorize the representations into multiple factors and proposes a masked self-Contrast mechanism to better separate positive and negative samples. Graph contrastive learning (GCL) has ...

  9. Generative Subgraph Contrast for Self-Supervised Graph Representation

    Contrastive learning has shown great promise in the field of graph representation learning. By manually constructing positive/negative samples, most graph contrastive learning methods rely on the vector inner product based similarity metric to distinguish the samples for graph representation. However, the handcrafted sample construction (e.g., the perturbation on the nodes or edges of the ...

  10. Unbiased and augmentation-free self-supervised graph representation

    Graph Contrastive Learning (GCL) is a promising self-supervised method for learning node representations that combines graph convolutional networks (GCN) and contrastive learning. However, existing GCL methods heavily rely on graph structure data and augmentation schemes to learn invariant representations between different augmentation views.

  11. Graph Self-Contrast Representation Learning

    Graph Self-Contrast Representation Learning . Graph contrastive learning (GCL) has recently emerged as a promising approach for graph representation learning. Some existing methods adopt the 1-vs-K scheme to construct one positive and K negative samples for each graph, but it is difficult to set K. ...

  12. Self-Supervised Dynamic Graph Representation Learning via Temporal

    Realistic graphs are often dynamic, which means the interaction between nodes occurs at a specific time. This article proposes a self-supervised dynamic graph representation learning framework DySubC, which defines a temporal subgraph contrastive learning task to simultaneously learn the structural and evolutional features of a dynamic graph.

  13. Dual-channel graph contrastive learning for self-supervised graph-level

    When it comes to the graph-level contrastive learning process, existing efforts [13], [14] usually only adopt node-graph (i.e., local-global) mode to contrast the node representations and corresponding graph-level representations. However, without graph-level contrasting pairs, these methods overwhelmingly focus on capturing graph summary from ...

  14. Tensor Representation Based Multi-View Graph Contrastive Learning for

    To this end, we design a novel tensor representation based multi-view contrastive graph representation learning framework including adaptive data augmentation, high-confidence sample pairs construction, and a simple yet effective self-optimizing module guided by clustering objective function, to address issues of graph contrastive learning in ...

  15. CLEAR: Cluster-Enhanced Contrast for Self-Supervised Graph

    This article studies self-supervised graph representation learning, which is critical to various tasks, such as protein property prediction. Existing methods typically aggregate representations of each individual node as graph representations, but fail to comprehensively explore local substructures (i.e., motifs and subgraphs), which also play important roles in many graph mining tasks. In ...

  16. Generative Subgraph Contrast for Self-Supervised Graph Representation

    To this end, in this paper, we propose a novel adaptive subgraph generation based contrastive learning framework for efficient and robust self-supervised graph representation learning, and the optimal transport distance is utilized as the similarity metric between the subgraphs. It aims to generate contrastive samples by capturing the intrinsic ...

  17. Generative Subgraph Contrast for Self-Supervised Graph Representation

    Graph representation learning [] has received intensive attention in recent years due to its superior performance in various downstream tasks, such as node/graph classification [17, 19], link prediction [] and graph alignment [].Most graph representation learning methods [10, 17, 31] are supervised, where manually annotated nodes are used as the supervision signal.

  18. PDF CLEAR: Cluster-Enhanced Contrast for Self-Supervised Graph

    In contrast to these supervised tasks, our work utilizes graph clustering algorithms in unsupervised scenarios, which efficiently capture semantic information from. a local view. To the best of our knowledge, we are the first to integrate graph clustering into the self-supervised graph representation learning task.

  19. Self-supervised contrastive graph representation with node and graph

    Contrastive graph representation learning (CGRL) is a self-supervised graph learning method. As shown in Fig. 1, it adds a contrastive module after the traditional graph neural network (GNN) as a contrastive loss to optimize the GNN model.It contrasts the original graph with the augmentation graph by positive-positive and positive-negative node pairs.

  20. Self-supervised Graph-level Representation Learning with Adversarial

    To tackle this issue, we propose a Graph Adversarial Contrastive Learning (GraphACL) scheme that learns a bank of negative samples for effective self-supervised whole-graph representation learning. Our GraphACL consists of (i) a graph encoding branch that generates the representations of positive samples and (ii) an adversarial generation ...

  21. Graph Self-Contrast Representation Learning

    Graph contrastive learning (GCL) has recently emerged as a promising approach for graph representation learning. Some existing methods adopt the 1-vs-K scheme to construct one positive and K ...

  22. Sub-graph Contrast for Scalable Self-Supervised Graph Representation

    In this paper, a novel self-supervised representation learning method via Subgraph Contrast, namely \textsc {Subg-Con}, is proposed by utilizing the strong correlation between central nodes and their sampled subgraphs to capture regional structure information. Instead of learning on the complete input graph data, with a novel data augmentation ...

  23. GitHub: Subg-Con implementation

    Here we provide an implementation of Subg-Con in PyTorch and Torch Geometric. The repository is organised as follows: subgcon.py is the implementation of the Subg-Con pipeline; subgraph.py is the implementation of the subgraph extractor; model.py is the implementation of components for Subg-Con, including a GNN layer, a pooling layer, and a scoring function.

  24. Deep graph contrastive learning model for drug-drug interaction

    DSN-DDI has utilized both local and global representation learning modules, which can learn drug substructures from individual drugs (intra-view) and drug pairs (inter-view) simultaneously. m2vec has combined drug target networks with SMILES information and then used graph autoencoders to learn the final representation of drugs. The success of ...

  25. CLEAR: Cluster-Enhanced Contrast for Self-Supervised Graph

    In this article, we propose a self-supervised graph representation learning framework named cluster-enhanced Contrast (CLEAR) that models the structural semantics of a graph from graph-level and substructure-level granularities, i.e., global semantics and local semantics, respectively. Specifically, we use graph-level augmentation strategies ...

  26. Detection of E-Commerce Fraud Review via Self-Paced Graph Contrast Learning

    To address the above problems, we propose a novel e-commerce review fraud detection method based on self-paced graph contrast learning (SPCL-GNN). Firstly, the method constructs a subgraph by initially selecting nodes through a labeled balanced extractor. ... the node embedding representation is further improved by maximizing the mutual ...

  27. DGMem: learning visual navigation policy without any labels ...

    In recent years, learning-based approaches have demonstrated significant promise in addressing intricate navigation tasks. Traditional methods for training deep neural network navigation policies rely on meticulously designed reward functions or extensive teleoperation datasets as navigation demonstrations. However, the former is often confined to simulated environments, and the latter demands ...

  28. [2112.08733] Self-Supervised Dynamic Graph Representation Learning via

    Self-supervised learning on graphs has recently drawn a lot of attention due to its independence from labels and its robustness in representation. Current studies on this topic mainly use static information such as graph structures but cannot well capture dynamic information such as timestamps of edges. Realistic graphs are often dynamic, which means the interaction between nodes occurs at a ...

  29. arXiv:2406.18937v1 [cs.LG] 27 Jun 2024

    In contrast, intra-graph FGL involves each participant owning only a subset of the entire graph, and the objective is to address miss- ... learning to graph-like data for self-supervised methods [Zhu et al., 2021b; Liu et al., 2022a]. Traditional unsupervised methods on graph representation learning approaches [Grover and Leskovec, 2016 ...

  30. Self-Supervised Representation Learning on Electronic Health Records

    Recently, contrastive learning has shown great success in self-supervised representation learning problems. However, complex temporality often degrades the performance. We propose Graph Kernel Infomax, a self-supervised graph kernel learning approach on the graphical representation of EHR, to overcome the previous problems.