Deep Graph Representation Learning and Optimization for Influence Maximization

Chen Ling, Junji Jiang, Junxiang Wang, My Thai, Lukas Xue, James Song, Meikang Qiu, Liang Zhao

Influence maximization (IM) is formulated as selecting a set of initial users from a social network to maximize the expected number of influenced users. Researchers have made great progress in designing various traditional methods, and their theoretical design and performance gain are close to a limit. In the past few years, learning-based IM methods have emerged to achieve stronger generalization ability to unknown graphs than traditional ones. However, the development of learning-based IM methods is still limited by fundamental obstacles, including 1) the difficulty of effectively solving the objective function; 2) the difficulty of characterizing the diversified underlying diffusion patterns; and 3) the difficulty of adapting the solution under various node-centrality-constrained IM variants. To cope with the above challenges, we design a novel framework DeepIM to generatively characterize the latent representation of seed sets, and we propose to learn the diversified information diffusion pattern in a data-driven and end-to-end manner. Finally, we design a novel objective function to infer optimal seed sets under flexible node-centrality-based budget constraints. Extensive analyses are conducted over both synthetic and real-world datasets to demonstrate the overall performance of DeepIM. The code and data are available at: https://github.com/triplej0079/DeepIM.





1 Introduction

As one of the fundamental research problems in network analysis, the objective of Influence Maximization (IM) is to find a set of seed nodes that maximizes the spread of influence in a social network. IM has been extensively studied in recent years due to its large commercial value. For example, consider the case of viral marketing (Chen et al., 2010) for promoting a commercial product, where a company may wish to spread the adoption of a new product from a set of initially selected users, who are expected to spread information about the product across their respective social networks. This cascading process continues until, ultimately, a significant fraction of users have tried the product. Besides viral marketing, IM is also the cornerstone of many other critical applications such as network monitoring (Wang et al., 2017), misinformation containment (Yang et al., 2020), and friend recommendation (Ye et al., 2012).

As a typical type of combinatorial optimization problem, retrieving a (near-)optimal seed set to maximize the influence in a network is challenging due to the stochastic nature of information diffusion and the hardness of the problem. Traditional (non-learning-based) methods for IM (Leskovec et al., 2007; Kempe et al., 2003; Tang et al., 2014, 2015; Nguyen et al., 2016; Saito et al., 2012) have made great progress in the last decade, and Li et al. (2019b) have even achieved exact solutions under specific diffusion models. The commonality of traditional methods is the explicit requirement of an information diffusion model as input. However, the real-world information diffusion process is complex and cannot simply be captured by prescribed diffusion models. With the recent development of machine/deep learning, it is natural to consider a learning-based way to characterize the underlying diffusion process.

While great progress has been made in the field, current learning-based IM solutions are still in their infancy due to the following fundamental obstacles. 1) The difficulty of efficiently optimizing the objective function. Learning-based IM methods tend to solve the discrete problem in continuous space, mostly by leveraging deep network representations (Zhang et al., 2022; Kumar et al., 2022) and deep reinforcement learning (Tian et al., 2020; Li et al., 2022). Even though they can attain performance competitive with traditional methods, their scalability and execution efficiency are problematic due to (a) the need to iteratively update all node embeddings at each action and (b) the #P-hardness of computing the influence spread (Lin et al., 2017). 2) The difficulty of automatically identifying and modeling the actual diffusion process. To maximize the influence spread in a network, the underlying information diffusion pattern is imperative, as it determines the overall propagation outcome. However, neither traditional nor learning-based methods can characterize the underlying diffusion process without heuristics. To work around this, both have leveraged pre-defined diffusion models (e.g., Linear Threshold (LT) and Independent Cascade (IC)) as the input to the combinatorial optimization problem. Although these methods work well when the process follows their heuristics, the real-world diffusion process is far more complex and largely unknown. 3) The difficulty of adapting solutions to various node-centrality-constrained IM problems. Many IM variants involve node centrality, e.g., constraints on the number of seed nodes or on the total degree of seed nodes. Current learning-based IM solutions lack a well-defined paradigm for solving different node-centrality-constrained IM problems, which poses a further challenge to their adaptivity.

To address the above challenges, we propose a novel framework, DeepIM, to solve the IM problem by embedding the initial discrete optimization domain into a larger continuous space. Specifically, we propose to learn the latent representation of seed sets, retaining their expressiveness, and to optimize directly in the continuous space to reduce the problem's hardness. We further design a learning-based diffusion model to characterize the underlying diffusion dynamics in an end-to-end manner. Moreover, we develop a generic seed set inference framework to directly optimize and generate seed set embeddings under a uniform budget constraint. Our contributions are summarized as follows:

Problem. We formulate the learning-based IM problem as embedding the initial discrete optimization domain into continuous space to ease the optimization, and we identify its unique challenges arising from real applications.

Framework. We propose modeling the representation of the seed set in a latent space, and the representation is jointly trained with the model that learns the underlying graph diffusion process in an end-to-end manner.

Adaptivity. We propose a novel constrained optimization objective function to infer the optimal seed set by leveraging deep graph embeddings, which can be applied under arbitrary node-centrality-related constraints.

Evaluation. We conduct extensive experiments over six real-world datasets to demonstrate the performance of the proposed method. Compared with other state-of-the-art methods in various application scenarios, DeepIM achieves the best results in finding a seed set to maximize the influence.

2 Related Work

2.1 Learning-based Influence Maximization

Influence Maximization (IM) was first formulated as a combinatorial optimization problem by Kempe et al. (2003), which has inspired extensive research and applications over the following decades. Most traditional (i.e., non-learning-based) IM methods can be categorized as simulation-based, proxy-based, or heuristic-based. Traditional methods have efficiently achieved near-optimal or exact solutions under specific diffusion models. Note that Du et al. (2014); Vaswani et al. (2017) have alluded to the possibility of learning the influence from cascade data; however, they still assumed that a prescribed model, specifically the coverage function, guides the diffusion pattern. We refer readers to recent surveys (Li et al., 2018; Banerjee et al., 2020) for more detailed reviews of traditional methods.

Learning-based methods use deep learning to address the main drawback of traditional IM methods, namely the lack of generalization ability. Pioneering works (Lin et al., 2015; Ali et al., 2018) first combined reinforcement learning with IM, triggering extensive work that leverages deep reinforcement learning to solve the IM problem. Existing state-of-the-art solutions (Li et al., 2019a; Tian et al., 2020; Manchanda et al., 2020; Li et al., 2022; Chen et al., 2022) follow a similar paradigm: learn latent embeddings of nodes or networks, and take the current node embedding as the state of an agent that chooses the next seed node as its action, where the reward is the node's marginal influence gain. Beyond reinforcement learning-based IM methods, other methods (Kumar et al., 2022; Kamarthi et al., 2019; Panagopoulos et al., 2020) solely leverage graph neural networks to encode social influence into node embeddings and guide the node selection process. The weakness of current learning-based IM methods is also obvious: their model complexity and adaptivity are still not comparable to those of traditional methods. In particular, current ML-based algorithms can neither handle diversified diffusion patterns nor guarantee solution quality and model scalability.

2.2 Graph Neural Networks

Graph Neural Networks (GNNs) (Wu et al., 2020) are a class of deep learning methods designed to perform inference on data described by graphs. The general paradigm of GNNs alternates between node feature transformation and aggregation of neighbor nodes' information. For a $K$-layer GNN, a node aggregates information within its $K$-hop neighborhood. Specifically, the $k$-th layer transformation is:

$$a^{k} = \mathcal{A}^{k}(h^{k-1};\theta^{k}), \quad h^{k} = \mathcal{C}^{k}(a^{k};\theta^{k}), \quad \forall\, 1 \leq k \leq K, \qquad (1)$$

where $a^{k}$ is the aggregated feature and $h^{k}$ is the $k$-th layer node feature. The flexibility of the aggregation function $\mathcal{A}(\cdot)$ and the combine function $\mathcal{C}(\cdot)$ induces different GNN models (Veličković et al., 2017; Kipf & Welling, 2016; Xu et al., 2018; Wang et al., 2022b). The resulting high-level representations of nodes or graphs are utilized for different tasks. GNNs have been applied in various learning tasks such as information diffusion estimation (Chamberlain et al., 2021; Xia et al., 2021; Ko et al., 2020), graph source localization (Wang et al., 2022a; Ling et al., 2022b), deep graph generation (Ling et al., 2021, 2023a, 2023b), and graph analogical reasoning (Ling et al., 2022a). In this work, we leverage GNNs to characterize the underlying diffusion pattern and construct an end-to-end model for estimating the influence.
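To make the aggregate-and-combine scheme of Eq. (1) concrete, the following is a minimal sketch of one such layer in PyTorch; the mean aggregator, layer sizes, and class name are our own illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class AggregateCombineLayer(nn.Module):
    """One GNN layer in the aggregate-and-combine form of Eq. (1):
    a^k = A^k(h^{k-1}; theta^k), then h^k = C^k(a^k; theta^k)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.combine = nn.Linear(in_dim, out_dim)  # C^k: feature transformation

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # A^k: mean-aggregate features over neighbors (adj includes self-loops)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        a = (adj @ h) / deg
        return torch.relu(self.combine(a))

# Usage: layer = AggregateCombineLayer(16, 32); h1 = layer(h0, adj)
# with h0 of shape [N, 16] and adj a dense [N, N] adjacency matrix.
```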

3 Problem Formulation

Given a graph $G = (V, E)$ and a seed node set represented by a binary vector $\mathbf{x} \in \{0,1\}^{|V|}$, the influence spread is estimated by a diffusion model $y = M(\mathbf{x}, G; \theta)$, where $y \in \mathbb{R}_{+}$ measures the total number of infected nodes (Li et al., 2018). Based on this formalization of the influence spread, the IM problem is defined as follows:

Definition 1 (Influence Maximization).

The generic IM problem requires selecting a set of $k$ users from $V$ as the seed set to maximize the influence spread:

$$\tilde{\mathbf{x}} = \operatorname*{arg\,max}_{\mathbf{x}} \; M(\mathbf{x}, G; \theta), \quad \text{s.t. } \lVert \mathbf{x} \rVert_{1} \leq k, \qquad (2)$$

where $\tilde{\mathbf{x}}$ is the optimal seed node set that can produce a maximal influence spread in $G$.
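Since exactly computing the expected spread in Eq. (2) is #P-hard under common diffusion models (as discussed below), traditional methods typically estimate it by Monte Carlo simulation. The following sketch estimates the spread under the Independent Cascade model; the graph encoding and the uniform activation probability p are illustrative assumptions.

```python
import random

def ic_spread(graph: dict, seeds: set, p: float = 0.1, rounds: int = 1000) -> float:
    """Monte Carlo estimate of the expected influence spread under the
    Independent Cascade model; graph maps node -> list of out-neighbors."""
    total = 0
    for _ in range(rounds):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, []):
                    # each newly active node gets one chance per neighbor
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / rounds

# Example: ic_spread({0: [1, 2], 1: [2], 2: []}, {0}, p=0.5)
```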

Intuitively, selecting $\tilde{\mathbf{x}}$ heavily depends on the underlying diffusion process. Many works have developed algorithms with GNNs and reinforcement learning to tackle the problem. However, the expressiveness and generalization capability of existing learning-based IM frameworks are still limited due to the following challenges.

Challenges. Firstly, most existing learning-based IM frameworks compute latent node embeddings in order to select highly influential nodes. However, their objective functions require iteratively updating the latent embedding of every node at each action/optimization step, no matter whether the node is included in the current $\mathbf{x}$. This poses a severe scalability problem when dealing with million-scale networks. Secondly, even with deep node/network embeddings and various reward functions to guide the node selection process, existing frameworks are still tailored to specific diffusion models (e.g., they model $M(\cdot)$ as the explicit IC or LT model). However, these simple diffusion models cannot meet the needs of real applications. Moreover, to ease the computational overhead of the #P-hard influence estimation, learning-based IM methods rely on techniques from traditional methods, such as proxy-based and sampling-based estimation, which makes scalability and generalization even worse. Lastly, there are plenty of node-centrality-constrained IM variants. For example, other than regulating the number of seed nodes, we may also need to regulate the total cost of selecting them. Learning-based IM solutions design different objective functions for different application scenarios and lack a well-defined scheme covering all node-centrality-related constraints.

4 The DeepIM Framework

In this section, we propose the DeepIM framework to ease the computational overhead of learning-based IM methods and automatically identify the underlying diffusion patterns. The framework is divided into two phases: the learning phase characterizes the probability of the observed seed sets and models the underlying information propagation distribution, and the inference phase optimizes the selection of seeds in continuous space to maximize the influence spread.

[Figure 1: Overview of DeepIM: (a) the GNN-based diffusion model $M(\cdot)$; (b) the seed set inference process.]

4.1 Learning Representation of Seed Set

To build an effective and efficient objective function, we propose to characterize the probability $p(\mathbf{x})$ of the seed node set $\mathbf{x}$ given the graph $G$, since learning $p(\mathbf{x})$ helps depict the seed set's underlying nature. However, learning such a probability is not a trivial task, because the nodes within each seed set are inter-connected and highly correlated through the topology of $G$. These connections make the relationships between nodes far more complex and harder to decipher than in other similar combinatorial problems.

Learning Probability over Seed Nodes. Instead of directly modeling the highly intractable probability $p(\mathbf{x})$, we introduce an unobserved latent variable $z$ to represent $\mathbf{x}$ and define a conditional distribution $p(\mathbf{x}|z)$ to quantify the likelihood. The latent variables have a much lower dimension than the observed sub-optimal seed sets, which yields a compressed representation. Particularly, we marginalize over the latent variables to obtain $p(\mathbf{x}) = \int p(\mathbf{x}, z)\,dz = \int p(\mathbf{x}|z)\,p(z)\,dz$. The posterior likelihood $p(z|\mathbf{x}) = p(\mathbf{x}|z) \cdot p(z) / p(\mathbf{x})$ allows us to infer $z$ given the observed seed sets $\mathbf{x}$. In this work, we adopt an autoencoder to generatively infer the posterior, where the encoder $f_{\phi}$ (parameterized by $\phi$) and the decoder $f_{\psi}$ (parameterized by $\psi$) characterize the likelihood of the posterior and the conditional distribution, respectively. The objective of the autoencoder is to maximize the joint likelihood:

$$\max_{\phi, \psi} \; \mathbb{E}_{z \sim p_{\phi}(z \mid \mathbf{x})} \big[ \log p_{\psi}(\mathbf{x} \mid z) \big]. \qquad (3)$$
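A minimal sketch of such a seed-set autoencoder is given below; the MLP architecture, hidden width, and latent dimension are our own assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class SeedSetAutoencoder(nn.Module):
    """Compress a binary seed vector x in {0,1}^N into a latent z (f_phi),
    and decode node-wise selection probabilities p(x|z) (f_psi)."""

    def __init__(self, n_nodes: int, latent_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_nodes, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))             # f_phi
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, n_nodes), nn.Sigmoid())  # f_psi

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)
        return z, self.decoder(z)

def reconstruction_loss(x, x_hat):
    # negative log-likelihood of the Bernoulli conditional p(x|z)
    return nn.functional.binary_cross_entropy(x_hat, x)
```

The sigmoid output lets the decoded vector be read as node-wise selection probabilities, matching the Bernoulli likelihood used in the derivation of $\mathcal{L}_{\text{pred}}$ in Appendix A.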

Learning the End-to-End Diffusion Model. Once we have learned the latent distribution of seed nodes $p(\mathbf{x})$, the next step is to update the seed node set $\mathbf{x}$ in order to increase the marginal gain of the influence spread. Current learning-based IM solutions still assume that the computation of the influence spread (i.e., $M(\mathbf{x}, G; \theta)$) relies on prescribed mathematical models. However, real-world information diffusion is complicated, and it is not easy to determine the most suitable diffusion model in practice. A chosen diffusion model may be misspecified compared to real-world data and lead to large model bias. In addition, the diffusion network structure may also be hidden from us, so we need to learn not only the parameters of the diffusion model but also the diffusion network structure (Du et al., 2014).

To this end, we parameterize $M(\cdot)$ with a GNN: $g_{u}(\mathbf{x}, G; \theta)$ first predicts the infection probability $\tau$ of each node given the seed set $\mathbf{x}$, and $y = g_{r}(\tau; \xi)$, $y \in \mathbb{R}_{+}$, denotes the final information spread, where $g_{r}(\cdot)$ is a normalization function (e.g., the $l_1$ norm) and $\xi$ is the threshold that transforms probabilities into discrete values. The GNN-based $M(\cdot)$ is visualized in Fig. 1(a).
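The sketch below illustrates one way such a GNN-based $M(\cdot)$ could be realized: $g_u$ propagates infection probabilities along edges with learnable per-layer weights, and $g_r$ applies a soft threshold at $\xi$ before an $l_1$-style sum. This is a hedged illustration; the propagation rule and the soft threshold are our assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DiffusionGNN(nn.Module):
    """Sketch of M(x, G; theta) = g_r(g_u(x, G; theta); xi): g_u produces
    per-node infection probabilities tau; g_r soft-thresholds them at xi
    and sums, yielding the estimated spread y."""

    def __init__(self, n_layers: int = 2, xi: float = 0.5):
        super().__init__()
        self.xi = xi
        self.theta = nn.Parameter(torch.rand(n_layers))  # per-layer weights

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        tau = x  # the seed vector doubles as the initial infection probability
        deg = adj.sum(dim=1).clamp(min=1.0)
        for w in self.theta:
            # propagate probability mass to neighbors; keeping w >= 0 makes
            # each step non-decreasing in x (cf. Theorem 1 below)
            tau = torch.clamp(tau + w * (adj @ tau) / deg, max=1.0)
        # g_r: differentiable surrogate of thresholding at xi, then summing
        return torch.sigmoid((tau - self.xi) * 50.0).sum()
```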

Definition 2 (Score Monotonicity and Infection Monotonicity).

Monotonicity is a natural property for modeling the overall diffusion network structure: a monotonic diffusion model indicates that the spread of influence continues to increase as the seed set grows. Intuitively, if we select a larger community $\mathbf{x}'$ as the seed set, then $\mathbf{x}'$ would intrinsically infect no fewer nodes in the whole network than a smaller seed set $\mathbf{x}$ whenever $\mathbf{x} \preceq \mathbf{x}'$. Ensuring both monotonicity properties allows us to better characterize the underlying diffusion network structure and mimic real-world diffusion patterns (Dolhansky & Bilmes, 2016). Hence, we add constraints to make the GNN-based diffusion model $M(\mathbf{x}, G; \theta)$ monotonic during influence spread estimation.
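The formal statement of Definition 2 does not survive in the extracted text; one plausible formalization, consistent with the proof of Theorem 1 in Appendix A (the original wording may differ), is:

```latex
% Plausible reconstruction of Definition 2; the original statement may differ.
Let $\tau = g_u(\mathbf{x}, G;\theta)$ denote the per-node infection scores and
$M(\mathbf{x}, G;\theta) = g_r(\tau;\xi)$ the estimated spread. $M$ is
\emph{infection monotonic} if $\mathbf{x} \preceq \mathbf{x}'$ implies
$g_u(\mathbf{x}, G;\theta) \preceq g_u(\mathbf{x}', G;\theta)$ element-wise, and
\emph{score monotonic} if $\mathbf{x} \preceq \mathbf{x}'$ implies
$M(\mathbf{x}, G;\theta) \leq M(\mathbf{x}', G;\theta)$.
```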

Theorem 1 (Monotonicity of GNN Models).

The GNN-based diffusion model $M(\mathbf{x}, G; \theta) = g_{r}(g_{u}(\mathbf{x}, G; \theta); \xi)$ is both score and infection monotonic if every aggregation function $\mathcal{A}^{k}$, every combine function $\mathcal{C}^{k}$, and the normalization function $g_{r}$ are non-decreasing.

We further illustrate that the well-known Graph ATtention network (GAT) can be score and infection monotonic under the constraint claimed in Theorem 1. Note that we follow the standard network structure of GAT as introduced in the original paper.

Corollary 2 (Monotonicity of GAT).

$M$ is score and infection monotonic when $g_{u}$ is a GAT, if $\theta^{k} \geq 0$ in Eq. (1) and $g_{r}$ is also non-decreasing.

Due to space limitations, the proofs of Theorem 1 and Corollary 2 are provided in Appendix A. According to Theorem 1 and Corollary 2, the GNN-based $M(\mathbf{x}, G; \theta)$ is theoretically guaranteed to retain monotonicity, and the objective of learning $M(\mathbf{x}, G; \theta)$ is to maximize the following probability under a monotonicity constraint:

$$\max_{\theta} \; \mathbb{E}\big[ p(y \mid \mathbf{x}, G; \theta) \big], \quad \text{s.t. } M(\mathbf{x}, G; \theta) \leq M(\mathbf{x}', G; \theta) \;\; \forall\, \mathbf{x} \preceq \mathbf{x}'. \qquad (4)$$
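A simple way to honor the non-negativity condition of Corollary 2 during training is to project the parameters back onto the non-negative orthant after every optimizer step; a minimal sketch (our own, not necessarily the released implementation):

```python
import torch

def project_nonnegative(model: torch.nn.Module) -> None:
    """Enforce theta >= 0 (cf. Corollary 2) by projecting parameters
    onto the non-negative orthant after each optimizer step."""
    with torch.no_grad():
        for param in model.parameters():
            param.clamp_(min=0.0)

# Typical loop: loss.backward(); optimizer.step(); project_nonnegative(model)
```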

End-to-End Learning Objective. Finally, to bridge the representation learning and the learning of the diffusion model, we propose a unified objective function in an end-to-end manner by putting together Eq. (3) and (4):

$$\max_{\phi, \psi, \theta} \; \mathbb{E}\big[ p_{\psi}(\mathbf{x} \mid z) \; p(y \mid \mathbf{x}, G; \theta) \big]. \qquad (5)$$

However, directly optimizing the expectation of this joint probability is computationally difficult. We instead take the negative $\log$ of Eq. (5) and derive its lower bound as the final learning objective according to Jensen's inequality:

$$\min_{\phi, \psi, \theta} \; \mathbb{E}\big[ -\log p_{\psi}(\mathbf{x} \mid z) - \log p(y \mid \mathbf{x}, G; \theta) \big]. \qquad (6)$$
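Putting the pieces together, one training step of this end-to-end objective could look like the sketch below, which realizes $-\log p_{\psi}(\mathbf{x}|z)$ as binary cross-entropy and $-\log p(y|\mathbf{x}, G;\theta)$ as an MSE term. It reuses the hypothetical SeedSetAutoencoder, DiffusionGNN, and project_nonnegative sketches from above and is an assumption-laden illustration rather than the paper's code.

```python
import torch.nn.functional as F

def train_step(ae, diffusion, optimizer, x, y_true, adj):
    """One end-to-end step minimizing
    E[-log p(x|z) - log p(y|x, G; theta)] from the unified objective."""
    optimizer.zero_grad()
    z, x_hat = ae(x)                    # encode/decode the observed seed set
    y_pred = diffusion(x_hat, adj)      # estimated influence spread
    loss = F.binary_cross_entropy(x_hat, x) + F.mse_loss(y_pred, y_true)
    loss.backward()
    optimizer.step()
    project_nonnegative(diffusion)      # keep M monotonic (Corollary 2)
    return loss.item()
```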

4.2 Seed Node Set Inference

To infer high-influence seed node sets in the testing domain, we leverage the latent distribution $p(\mathbf{x})$ of the seed node set and the end-to-end diffusion model $M(\cdot)$ trained jointly via Eq. (6). Firstly, if the autoencoder is well trained and retains both continuity (i.e., two close points in the latent space should not decode to completely different contents) and completeness (i.e., a point sampled from the chosen latent distribution should decode to "meaningful" content), the autoencoder in Eq. (3) can generate contents by exploiting the latent feature space $p(z)$ learned from all of its training examples, i.e., from $p(\mathbf{x})$. Therefore, we propose to instead search for the optimal seed node set $\tilde{\mathbf{x}}$ in the lower-dimensional and less-noisy latent space $p(z)$. The following corollary demonstrates that estimating the influence spread with the latent variable $z$ is equivalent to estimating it with the high-dimensional $\mathbf{x}$ if the autoencoder retains both continuity and completeness.

Corollary 3 (Influence Estimation Consistency).

For any $M(f_{\psi}(z^{(i)}), G; \theta) > M(f_{\psi}(z^{(j)}), G; \theta)$, we have $M(\mathbf{x}^{(i)}, G; \theta) > M(\mathbf{x}^{(j)}, G; \theta)$.

Adaptation to Different IM Variants with Node Centrality Constraints. Since the introduction of IM in (Kempe et al., 2003), IM has been studied under various budget-constrained settings on nodes. To enhance the adaptivity of DeepIM, we design a unified constraint that allows inferring seed sets under various budgets on individual nodes. Specifically, the objective $\mathcal{L}_{\text{pred}}$ is given as:

$$\mathcal{L}_{\text{pred}} = \big\lVert \tilde{y} - M(\mathbf{x}, G; \theta) \big\rVert_{2}^{2} - \log \Big[ \prod\nolimits_{i}^{|V|} f_{\psi}(z_{i})^{x_{i}} \big( 1 - f_{\psi}(z_{i}) \big)^{1 - x_{i}} \Big], \quad \text{s.t. } \sum\nolimits_{i=0}^{|V|} \mathcal{F}(v_{i}, G) \cdot x_{i} \leq k, \qquad (7)$$

where $\sum\nolimits_{i=0}^{|V|} \mathcal{F}(v_{i}, G) \cdot x_{i}$ is a generalized budget constraint applied to individual nodes, and $k$ is the actual budget. For the vanilla IM problem, which only requires selecting a given number of seed nodes, $\sum\nolimits_{i=0}^{|V|} \mathcal{F}(v_{i}, G) \cdot x_{i}$ reduces to $\lVert \mathbf{x} \cdot \mathbf{1} \rVert_{1}$, where $\mathbf{1} \in \{1\}^{N \times 1}$ is an all-one vector indicating that the price of selecting each node is the same. In addition, for node-degree-constrained IM problems (Leskovec et al., 2007; Nguyen et al., 2017), $\sum\nolimits_{i=0}^{|V|} \mathcal{F}(v_{i}, G) \cdot x_{i}$ becomes $\lVert \mathbf{x} \cdot A \rVert_{1}$, where $A \in \{0,1\}^{N \times N}$ is the adjacency matrix of $G$ and $\lVert \mathbf{x} \cdot A \rVert_{1} \leq k$ bounds the $l_1$ norm of the total seed node degree by the budget $k$. The budget constraint $\sum\nolimits_{i=0}^{|V|} \mathcal{F}(v_{i}, G) \cdot x_{i} \leq k$ can also be easily designed, combined, and adapted to solve IM variants with non-uniform prices on each node. With this flexible constraint, the challenge of IM methods' adaptability can be partially addressed.

Here $\tilde{y}$ is set to the optimal influence spread (i.e., $\tilde{y} = |V|$); the full derivation of Eq. (7) is provided in the Appendix. Furthermore, we utilize Projected Gradient Descent and propose a regularization function $\Phi(\mathbf{x})$ to keep the predicted seed set $\mathbf{x}$ in a valid region with respect to different constraints. For example, $\Phi(\mathbf{x})$ can be defined as selecting the $k$ nodes with the highest probabilities when the price of selecting each node in Eq. (7) is equal. $\Phi(\mathbf{x})$ can also be defined as cost-efficiently selecting the top-$k$ nodes from $\mathbf{x} / c(\mathbf{x})$, where $c(\mathbf{x})$ denotes the per-node cost (e.g., node degree). Finally, the optimization procedure is summarized in Algorithm 1. Specifically, we first sample an initial latent variable $z$ (Line 1). In Lines 2-6, we iteratively solve the optimization problem of Eq. (7) via a gradient descent optimizer (e.g., Adam) while regularizing the predicted seed set into a valid region with $\Phi(\cdot)$. Fig. 1(b) illustrates the overall inference process. We provide the derivation details of both Eq. (6) and Eq. (7) in Appendix A.
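For concreteness, a sketch of this inference loop (Algorithm 1) under the uniform-cost constraint is shown below; for brevity it optimizes only the MSE term of $\mathcal{L}_{\text{pred}}$ and uses a top-$k$ projection as $\Phi(\cdot)$. Function and parameter names follow the hypothetical sketches above, not the released implementation.

```python
import torch

def infer_seed_set(ae, diffusion, adj, k: int, n_nodes: int,
                   latent_dim: int = 128, steps: int = 300, lr: float = 0.01):
    """Search the latent space for a high-influence seed set (Algorithm 1):
    optimize z to push the predicted spread toward y_tilde = |V|, then
    project the decoded probabilities onto the top-k constraint (Phi)."""
    z = torch.randn(latent_dim, requires_grad=True)   # Line 1: sample z
    y_tilde = torch.tensor(float(n_nodes))            # optimal spread |V|
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):                            # Lines 2-6
        opt.zero_grad()
        x_prob = ae.decoder(z)                        # f_psi(z) in [0, 1]^N
        loss = (y_tilde - diffusion(x_prob, adj)) ** 2
        loss.backward()
        opt.step()
    with torch.no_grad():
        # Phi: keep the k nodes with the highest decoded probability
        seeds = torch.topk(ae.decoder(z), k).indices
    return seeds
```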

5 Experiment

In this section, we compare the performance of our proposed DeepIM framework across six real networks in maximizing the influence under various settings, followed by a case study that qualitatively demonstrates the performance of DeepIM. Due to the space limit, more details of the experiment setup, hyperparameter settings, dataset descriptions, and comparison methods can be found in the Appendix.

5.1 Experiment Setup

Our primary purpose is to evaluate the expected influence spread as defined in Eq. (2) under various IM application scenarios. Since DeepIM can be easily adapted to different diffusion patterns, we choose two representative models that are commonly used in the IM problem, i.e., the LT and IC models. In addition, we also evaluate the IM problem under the susceptible-infectious-susceptible (SIS) epidemic model (Kermack & McKendrick, 1927), whose major difference is that activated nodes can be de-activated. Due to the space limit, the experiments with this non-progressive diffusion model can be found in the Appendix.


5.2 Comparison Methods.

5.3 Quantitative Analysis.

We evaluate the performance of DeepIM in maximizing the influence against other approaches under various IM application schemes. Each model selects 1%, 5%, 10%, and 20% of the nodes in each dataset as seed nodes; we let each diffusion model simulate until the diffusion process stops and record the average influence spread over 100 rounds. We report the percentage of finally infected nodes (i.e., the number of infected nodes divided by the total number of nodes).


IM under the IC Model. We first examine the effectiveness of DeepIM against the baseline methods under the IC diffusion pattern. As shown in Table 2, DeepIM achieves overall better performance than other methods across all datasets. Among the traditional methods, IMM, OPIM, and SubSIM are three state-of-the-art methods based on reverse reachable set sampling and various approximation techniques, which generate similar results across all datasets; however, they rely on different heuristics to guide the node selection for efficiency and fail to decode the underlying distribution of seed sets. OIM achieves better performance than the other traditional methods on most datasets because it automatically updates the edge weights iteratively. However, the disadvantage of OIM is also obvious: it is tailored to the specific IC diffusion model, which is less applicable in real-world scenarios. Lastly, the learning-based IM methods (IMINFECTOR, PIANO, and ToupleGDD) achieve competitive and generally better performance than traditional ones due to their larger model size and better generalization ability. However, the learning-based methods that rely on reinforcement learning suffer from scalability issues and cannot be applied to large-scale networks (e.g., Digg and Weibo), which makes them hard to use in real-world scenarios. Compared to these learning-based methods, DeepIM offers a more robust way of learning the end-to-end diffusion model and searching for high-influence node sets directly in latent space, which better captures the underlying diffusion dynamics and resolves the scalability issues. In addition, DeepIM$_s$ incorporates a lightweight end-to-end learning-based diffusion model that retains both efficacy and efficiency relative to other learning-based methods.

IM under the LT Model. We then evaluate the final influence spread with respect to the initial seed set size, assuming LT as the diffusion model. As shown in Table 3, DeepIM generates superior seed sets that infect the largest number of nodes, exceeding the other methods by an evident margin across all datasets. Notably, DeepIM demonstrates its superiority on the Synthetic dataset, where it effectively spreads the influence to the whole network when choosing 20% of the nodes as the initial seed set, while the other methods infect at most 70% of the network. Specifically, DeepIM and DeepIM$_s$ exceed the other methods by 200% on average on the Jazz dataset and by 30% on the Synthetic dataset, largely due to the other methods' lack of generalization capability under various diffusion models.

IM with Budget Constraint. We then compare the quality of the seed sets generated by DeepIM and CELF under the IC and LT models with a budget constraint, where the budget is explicitly defined as the node degree in this paper. As can be seen from Fig. 2, our proposed method generally performs better than CELF across networks of different sizes, and the margins are more evident under the LT model (Fig. 2f-2j). In addition, compared to CELF, the growth of influence spread in DeepIM fluctuates less across all datasets, which also demonstrates the stability of DeepIM owing to its capability of identifying the latent distribution of seed sets while honoring the budget constraint.

5.4 Scalability Analysis

We record the runtime of seed set inference as the node size increases, against other learning-based IM solutions. As can be clearly seen in Table 4, DeepIM demonstrates near-linear growth of runtime as the graph size increases. It also achieves a generally shorter inference time (on average 20% faster than the second-fastest method, IMINFECTOR) compared to other learning-based methods. In addition, DeepIM$_s$, coupled with a lightweight end-to-end diffusion model, greatly reduces the computational cost of estimating the expected influence spread and improves inference time by 90% on average over the full DeepIM model.

5.5 Case Study: Graph Diffusion Visualization

Finally, we conduct a case study to examine the distribution of the selected 20% seed nodes as well as the final infection status of all nodes in Fig. 3, where blue nodes indicate initial seed nodes, red nodes indicate nodes infected during the influence spread, and grey nodes represent uninfected nodes. For ease of presentation, we only visualize the results on the Jazz dataset because of its smaller graph size. Overall, DeepIM demonstrates better performance in terms of spreading influence. Due to the space limit, we compare the influence spread between different initial seed set sizes, namely 10% and 20%, in the appendix and provide more discussion there.

6 Conclusion

In this paper, we propose a novel framework to tackle the IM problem in a more robust and generalized way than existing learning-based IM methods. Particularly, to capture the complex nature of the seed set, we propose to characterize the probability of the seed set and directly search for a more optimal seed set in continuous space. Furthermore, to address the challenge of modeling the underlying diffusion pattern, we offer two different learning-based diffusion models to characterize diversified diffusion dynamics with efficiency and efficacy guarantees. Finally, we propose a novel objective function that can be coupled with multiple constraints for seed node set inference, which adapts to different IM application schemes. Extensive experiments and case studies on both synthetic and real-world datasets demonstrate the advantages of DeepIM over existing state-of-the-art methods in maximizing the influence spread.

  • Ali et al. (2018) Ali, K., Wang, C.-Y., and Chen, Y.-S. Boosting reinforcement learning in competitive influence maximization with transfer learning. In 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI) , pp.  395–400. IEEE, 2018.
  • Banerjee et al. (2020) Banerjee, S., Jenamani, M., and Pratihar, D. K. A survey on influence maximization in a social network. KAIS , 62(9):3417–3455, 2020.
  • Chamberlain et al. (2021) Chamberlain, B., Rowbottom, J., Gorinova, M. I., Bronstein, M., Webb, S., and Rossi, E. Grand: Graph neural diffusion. In Proc. of the ICML , pp.  1407–1418, 2021.
  • Chen et al. (2022) Chen, T., Yan, S., Guo, J., and Wu, W. Touplegdd: A fine-designed solution of influence maximization by deep reinforcement learning. arXiv preprint arXiv:2210.07500 , 2022.
  • Chen et al. (2010) Chen, W., Wang, C., and Wang, Y. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proc. of the KDD , pp.  1029–1038, 2010.
  • Dolhansky & Bilmes (2016) Dolhansky, B. W. and Bilmes, J. A. Deep submodular functions: Definitions and learning. Advances in Neural Information Processing Systems , 29, 2016.
  • Du et al. (2014) Du, N., Liang, Y., Balcan, M., and Song, L. Influence function learning in information diffusion networks. In ICML , pp.  2016–2024, 2014.
  • Erdős et al. (1960) Erdős, P., Rényi, A., et al. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci , 5(1):17–60, 1960.
  • Guo et al. (2020) Guo, Q., Wang, S., Wei, Z., and Chen, M. Influence maximization revisited: Efficient reverse reachable set generation with bound tightened. In Proc. of the SIGMOD , pp.  2167–2181, 2020.
  • Kamarthi et al. (2019) Kamarthi, H., Vijayan, P., Wilder, B., Ravindran, B., and Tambe, M. Influence maximization in unknown social networks: Learning policies for effective graph sampling. arXiv preprint arXiv:1907.11625 , 2019.
  • Kempe et al. (2003) Kempe, D., Kleinberg, J., and Tardos, É. Maximizing the spread of influence through a social network. In Proc. of the KDD , 2003.
  • Kermack & McKendrick (1927) Kermack, W. O. and McKendrick, A. G. A contribution to the mathematical theory of epidemics. Proceedings of the royal society of london. Series A, Containing papers of a mathematical and physical character , 115(772):700–721, 1927.
  • Kipf & Welling (2016) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 , 2016.
  • Ko et al. (2020) Ko, J., Lee, K., Shin, K., and Park, N. Monstor: An inductive approach for estimating and maximizing influence over unseen networks. In Proc. of the ASONAM , pp.  204–211, 2020.
  • Kumar et al. (2022) Kumar, S., Mallik, A., Khetarpal, A., and Panda, B. Influence maximization in social networks using graph embedding and graph neural network. Information Sciences , 607:1617–1636, 2022.
  • Lei et al. (2015) Lei, S., Maniu, S., Mo, L., Cheng, R., and Senellart, P. Online influence maximization. In Proc. of the KDD , 2015.
  • Leskovec et al. (2007) Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., and Glance, N. Cost-effective outbreak detection in networks. In Proc. of the KDD , 2007.
  • Li et al. (2019a) Li, H., Xu, M., Bhowmick, S. S., Sun, C., Jiang, Z., and Cui, J. Disco: Influence maximization meets network embedding and deep learning. arXiv preprint arXiv:1906.07378 , 2019a.
  • Li et al. (2022) Li, H., Xu, M., Bhowmick, S. S., Rayhan, J. S., Sun, C., and Cui, J. Piano: Influence maximization meets deep reinforcement learning. IEEE Transactions on Computational Social Systems , 2022.
  • Li et al. (2019b) Li, X., Smith, J. D., Dinh, T. N., and Thai, M. T. Tiptop:(almost) exact solutions for influence maximization in billion-scale networks. IEEE/ACM Transactions on Networking , 27(2):649–661, 2019b.
  • Li et al. (2018) Li, Y., Fan, J., Wang, Y., and Tan, K.-L. Influence maximization on social graphs: A survey. TKDE , 30(10):1852–1872, 2018.
  • Lin et al. (2015) Lin, S.-C., Lin, S.-D., and Chen, M.-S. A learning-based framework to handle multi-round multi-party influence maximization on social networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pp.  695–704, 2015.
  • Lin et al. (2017) Lin, Y., Chen, W., and Lui, J. C. Boosting information spread: An algorithmic approach. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE) , pp.  883–894, 2017.
  • Ling et al. (2021) Ling, C., Yang, C., and Zhao, L. Deep generation of heterogeneous networks. In 2021 IEEE International Conference on Data Mining (ICDM) , pp.  379–388. IEEE, 2021.
  • Ling et al. (2022a) Ling, C., Chowdhury, T., Jiang, J., Wang, J., Zhang, X., Chen, H., and Zhao, L. Deepgar: Deep graph learning for analogical reasoning. In 2022 IEEE International Conference on Data Mining (ICDM) , pp.  1065–1070, Los Alamitos, CA, USA, Dec 2022a.
  • Ling et al. (2022b) Ling, C., Jiang, J., Wang, J., and Zhao, L. Source localization of graph diffusion via variational autoencoders for graph inverse problems. In Proc. of the KDD , 2022b.
  • Ling et al. (2023a) Ling, C., Cao, H., and Zhao, L. Stgen: Deep continuous-time spatiotemporal graph generation. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part III , pp.  340–356, 2023a.
  • Ling et al. (2023b) Ling, C., Yang, C., and Zhao, L. Motif-guided heterogeneous graph deep generation. Knowledge and Information Systems , pp.  1–26, 2023b.
  • Manchanda et al. (2020) Manchanda, S., Mittal, A., Dhawan, A., Medya, S., Ranu, S., and Singh, A. Gcomb: Learning budget-constrained combinatorial algorithms over billion-sized graphs. Advances in Neural Information Processing Systems , 33:20000–20011, 2020.
  • McCallum et al. (2000) McCallum, A. K., Nigam, K., Rennie, J., and Seymore, K. Automating the construction of internet portals with machine learning. Information Retrieval , 3(2):127–163, 2000.
  • Nguyen et al. (2016) Nguyen, H. T., Thai, M. T., and Dinh, T. N. Stop-and-stare: Optimal sampling algorithms for viral marketing in billion-scale networks. In Proc. of the SIGMOD , 2016.
  • Nguyen et al. (2017) Nguyen, H. T., Thai, M. T., and Dinh, T. N. A billion-scale approximation algorithm for maximizing benefit in viral marketing. IEEE/ACM Transactions On Networking , 25(4):2419–2429, 2017.
  • Panagopoulos et al. (2020) Panagopoulos, G., Malliaros, F., and Vazirgiannis, M. Multi-task learning for influence estimation and maximization. IEEE Transactions on Knowledge and Data Engineering , 2020.
  • Rossi & Ahmed (2015) Rossi, R. A. and Ahmed, N. K. The network data repository with interactive graph analytics and visualization. In AAAI , 2015.
  • Saito et al. (2012) Saito, K., Kimura, M., Ohara, K., and Motoda, H. Efficient discovery of influential nodes for sis models in social networks. Knowledge and information systems , 30(3):613–635, 2012.
  • Tang et al. (2018) Tang, J., Tang, X., Xiao, X., and Yuan, J. Online processing algorithms for influence maximization. In Proc. of the SIGMOD , pp.  991–1005, 2018.
  • Tang et al. (2014) Tang, Y., Xiao, X., and Shi, Y. Influence maximization: Near-optimal time complexity meets practical efficiency. In Proc. of the SIGMOD , pp.  75–86, 2014.
  • Tang et al. (2015) Tang, Y., Shi, Y., and Xiao, X. Influence maximization in near-linear time: A martingale approach. In Proc. of the SIGMOD , 2015.
  • Tian et al. (2020) Tian, S., Mo, S., Wang, L., and Peng, Z. Deep reinforcement learning-based approach to tackle topic-aware influence maximization. Data Science and Engineering , 5(1):1–11, 2020.
  • Vaswani et al. (2017) Vaswani, S., Kveton, B., Wen, Z., Ghavamzadeh, M., Lakshmanan, L. V., and Schmidt, M. Model-independent online learning for influence maximization. In ICML , 2017.
  • Veličković et al. (2017) Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. arXiv preprint arXiv:1710.10903 , 2017.
  • Wang et al. (2022a) Wang, J., Jiang, J., and Zhao, L. An invertible graph diffusion neural network for source localization. In Proceedings of the ACM Web Conference 2022 , pp. 1058–1069, 2022a.
  • Wang et al. (2022b) Wang, J., Li, H., Chai, Z., Wang, Y., Cheng, Y., and Zhao, L. Toward quantized model parallelism for graph-augmented mlps based on gradient-free admm framework. IEEE Transactions on Neural Networks and Learning Systems , pp.  1–11, 2022b. doi: 10.1109/TNNLS.2022.3223879.
  • Wang et al. (2017) Wang, Y., Fan, Q., Li, Y., and Tan, K.-L. Real-time influence maximization on dynamic social streams. arXiv preprint arXiv:1702.01586 , 2017.
  • Wu et al. (2020) Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Philip, S. Y. A comprehensive survey on graph neural networks. IEEE TNNLS , 32(1):4–24, 2020.
  • Xia et al. (2021) Xia, W., Li, Y., Wu, J., and Li, S. Deepis: Susceptibility estimation on social networks. In Proc. of the WSDM , pp.  761–769, 2021.
  • Xu et al. (2018) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 , 2018.
  • Yang et al. (2020) Yang, L., Li, Z., and Giua, A. Containment of rumor spread in complex social networks. Information Sciences , 506:113–130, 2020.
  • Ye et al. (2012) Ye, M., Liu, X., and Lee, W.-C. Exploring social influence for recommendation: a generative model approach. In Proc. of the SIGIR , pp.  671–680, 2012.
  • Zhang et al. (2022) Zhang, C., Li, W., Wei, D., Liu, Y., and Li, Z. Network dynamic gcn influence maximization algorithm with leader fake labeling mechanism. IEEE Transactions on Computational Social Systems , 2022.

Appendix A Proofs

The proof of Theorem 1 is as follows.

Proof. We can write $g_{u}(\mathbf{x}, G; \theta) = \mathcal{A}^{1} \circ \mathcal{C}^{1} \circ \mathcal{A}^{2} \circ \mathcal{C}^{2} \circ \cdots \circ \mathcal{A}^{K} \circ \mathcal{C}^{K}$ by iterating Eq. (1) recursively. Because each $\mathcal{A}^{k}$ and $\mathcal{C}^{k}$ is non-decreasing, so is the composition $\mathcal{A}^{1} \circ \mathcal{C}^{1} \circ \cdots \circ \mathcal{A}^{K} \circ \mathcal{C}^{K}$, which is $g_{u}$. Therefore, $M$ is infection monotonic. Because $g_{u}$ and $g_{r}$ are non-decreasing, $M$ is also non-decreasing and hence score monotonic. ∎

The proof of Corollary 2 is demonstrated as follows.

The proof of Corollary 3 is illustrated as follows.

Proof. According to Theorem 1, the GNN-based $M(\mathbf{x}, G; \theta)$ is monotonic. Then for any two $\mathbf{x}^{(i)} > \mathbf{x}^{(j)}$, we have $M(\mathbf{x}^{(i)}, G; \theta) > M(\mathbf{x}^{(j)}, G; \theta)$. If the reconstruction error is minimized during the training of $f_{\psi}(\cdot)$, we also have $f_{\psi}(z^{(i)}) > f_{\psi}(z^{(j)})$. Hence, $M(f_{\psi}(z^{(i)}), G; \theta) > M(f_{\psi}(z^{(j)}), G; \theta)$ also holds. ∎

The derivation of Eq. (7) is as follows.

Since we assume the optimal $\tilde{y} = |V|$ and the predicted $y \in [0, |V|]$, the optimization target is to maximize $y$ until it reaches the fully infected status. Therefore, the first term in Eq. (7) is written as the Mean Squared Error (MSE) loss: $\lVert \tilde{y} - M(\mathbf{x}, G; \theta) \rVert_{2}^{2}$. For the second term in Eq. (7), the value range of $\mathbf{x}$ after the autoencoder is $[0, 1]$, indicating the probability of each node being selected into the seed set. Each $x_{i} \in [0, 1]$ follows a Bernoulli likelihood, so minimizing the negative log-likelihood is equivalent to maximizing the probability mass function. Therefore, the second term can be written as $-\log \big[ \prod_{i}^{|V|} f_{\psi}(z_{i})^{x_{i}} (1 - f_{\psi}(z_{i}))^{1 - x_{i}} \big]$. Adding both terms gives the final expression shown in Eq. (7):

$$\mathcal{L}_{\text{pred}} = \big\lVert \tilde{y} - M(\mathbf{x}, G; \theta) \big\rVert_{2}^{2} - \log \Big[ \prod\nolimits_{i}^{|V|} f_{\psi}(z_{i})^{x_{i}} \big( 1 - f_{\psi}(z_{i}) \big)^{1 - x_{i}} \Big].$$


Appendix B More Experiment

B.1 Dataset Description.

The statistics of the datasets are given in Table 1, and we provide more detailed dataset descriptions as follows. 1) Jazz (Rossi & Ahmed, 2015). A collaboration network of jazz musicians, where each node represents a musician and each edge indicates that two musicians have played together in a band. 2) Cora-ML (McCallum et al., 2000). A network of computer science research papers, where each node represents a paper and each edge indicates that one paper cites the other. 3) Power Grid (Rossi & Ahmed, 2015). The topology network of the US Western States Power Grid, where an edge represents a power supply line and a node is either a generator, a transformer, or a substation. 4) Network Science (Rossi & Ahmed, 2015). A coauthorship network of scientists working on network theory, where nodes represent scientists and edges indicate that two scientists have collaborated. 5) Digg (Panagopoulos et al., 2020). A directed social media network where users follow each other, and a vote on a post allows followers to see the post. 6) Weibo (Panagopoulos et al., 2020). A directed follower network where a cascade is defined by the first tweet and the list of retweets.

B.2 Hyperparameter Setting.

B.3 IM under Non-Progressive Diffusion Model.

We demonstrate the performance of each model under the non-progressive SIS model. As shown in Table 5, we observe a substantial reduction in final influence spread compared to Table 2 and Table 3, mainly because the SIS diffusion model assumes nodes can switch from activated to de-activated with a certain probability. All existing models fail to capture these intrinsically more complicated diffusion dynamics. Nevertheless, DeepIM still exhibits better results under such circumstances, exceeding the others by 10% on average across all datasets. To sum up, by jointly learning the seed set representation and the end-to-end diffusion estimation model, DeepIM demonstrates its robustness through its capability to adapt to various underlying diffusion patterns and produce a generally competitive and stable influence spread.

B.4 Case Study: Graph Diffusion Visualization

Finally, we conduct a case study to examine the distribution of the selected seed nodes as well as the final infection status of all nodes in Fig. 4, where blue nodes indicate initial seed nodes, red nodes indicate nodes infected during the influence spread, and grey nodes represent uninfected nodes. We compare the influence spread between different initial seed set sizes, namely 10% and 20%. For ease of presentation, we only visualize the results on the Jazz dataset because of its smaller graph size.

DeepIM demonstrates an overall better performance in terms of spreading the influence widely. Notably, as can be seen visually from Fig. 4a and 4f, the difference in the final influence spread achieved by DeepIM with different initial seed set sizes is very small, which means DeepIM can attain a better result at a lower cost compared to the others. The visualization is consistent with the results in Table 2.



BibTeX Record

IMAGES

  1. Deep Graph Representation Learning and Optimization for Influence

    deep graph representation learning and optimization for influence maximization

  2. Introduction to Deep Learning for Graphs and Where It May Be Heading

    deep graph representation learning and optimization for influence maximization

  3. A Deep Latent Space Model for Directed Graph Representation Learning

    deep graph representation learning and optimization for influence maximization

  4. Graph Structure Learning for Robust Graph Neural Networks

    deep graph representation learning and optimization for influence maximization

  5. Illustration of graph representation learning input and output

    deep graph representation learning and optimization for influence maximization

  6. DeepWalk Based Influence Maximization (DWIM): Influence Maximization

    deep graph representation learning and optimization for influence maximization

VIDEO

  1. [tt8745] Text-Attributed Graph Representation Learning: Methods, Applications, and Challenges

  2. 【点论文】280 Improving Generalization in Federated Learning by Seeking Flat Minima

  3. Maxim Panov: Link Prediction with Graph Neural Networks

  4. 09 Gradient-Based Optimization in ML & Deep Learning

  5. KDD 2023

  6. [rfp0559] Graph Contrastive Learning with Kernel Dependence Maximization for Social Recommendation

COMMENTS

  1. Deep Graph Representation Learning and Optimization for Influence

    DeepIM is a learning-based method to select optimal seed sets for influence maximization in social networks. It learns the latent representation of seed sets and the diffusion pattern in a data-driven and end-to-end manner, and adapts to node-centrality-based budget constraints.

  2. Deep Graph Representation Learning and Optimization for Influence Maximization

    Influence Maximization (IM) was first formulated as a combinatorial optimization problem by Kempe et al. (2003), and it has inspired extensive research and applications in the decades since. Most traditional (i.e., non-learning-based) IM methods can be categorized as simulation-based, proxy-based, or heuristic-based. (A minimal code sketch of the simulation-based greedy approach appears after this list.)

  3. PDF: Deep Graph Representation Learning and Optimization for Influence Maximization

    Influence Maximization (IM) aims at selecting a subset of users to maximize the spread of information in the network. Learning-based IM solutions improve generalization to unseen graphs, but they also bring new difficulties: effectively and efficiently optimizing the objective function, and automatically identifying and modeling the actual diffusion process.

  4. Deep Graph Representation Learning and Optimization for Influence Maximization

    Figure 4. Visualization of influence spread on the Jazz dataset: node size is determined by node degree, and node color indicates infection status: blue means the node is in the seed set, red means the node is infected, and grey means the node is not infected.

  5. PDF: Deep Graph Representation Learning and Optimization for Influence Maximization

    … and neighbor nodes' information aggregation. For a $K$-layer GNN, a node aggregates information within its $K$-hop neighbors. Specifically, the $k$-th layer transformation is

        $a^k = A^k(h^{k-1}; \theta^k), \quad h^k = C^k(a^k; \theta^k), \quad \forall\, 1 \le k \le K,$  (1)

    where $a^k$ is the aggregated feature and $h^k$ is the resulting node representation at layer $k$. (A toy implementation of this layer appears after this list.)

  6. A Survey on Influence Maximization: From an ML-Based Combinatorial Optimization (ACM TKDD, 2023)

    Cites this work as: Deep graph representation learning and optimization for influence maximization. In International Conference on Machine Learning (ICML'23).

  7. Deep Graph Representation Learning and Optimization for Influence Maximization (NSF Public Access Repository)

    Citation record in the NSF Public Access Repository (NSF-PAR). Award ID(s): 1908594; NSF-PAR ID: 10462547.

  8. Chen Ling (publications page)

    Deep Graph Representation Learning and Optimization for Influence Maximization. Chen Ling, Junji Jiang, Junxiang Wang, My Thai, Lukas Xue, James Song, Meikang Qiu, Liang Zhao. Fortieth International Conference on Machine Learning (ICML), 2023.

  9. Junxiang Wang (author profile)

    Deep graph representation learning and optimization for influence maximization. C Ling, J Jiang, J Wang, MT Thai, R Xue, J Song, M Qiu, L Zhao. International Conference on Machine Learning (ICML), 21350-21361, 2023.

  10. xiaofangxd/Influence-Maximization-Reinforcement-Learning

    | Paper | Venue | Year | Authors |
    |---|---|---|---|
    | A Survey on Influence Maximization: From an ML-Based Combinatorial Optimization | ACM TKDD | 2023 | Li et al. |
    | Deep Graph Representation Learning and Optimization for Influence Maximization | ICML | 2023 | Ling et al. |
    | Influence maximization in social networks using transfer learning via graph-based LSTM | ESWA | 2023 | Kumar et al. |

  11. Deep Graph Representation Learning and Optimization for Influence Maximization

    DOI: 10.48550/arXiv.2305.02200; Corpus ID: 258461396. BibTeX record:

        @article{Ling2023DeepGR,
          title={Deep Graph Representation Learning and Optimization for Influence Maximization},
          author={Chen Ling and Junji Jiang and Junxiang Wang and My T. Thai and Lukas Xue and James Song and Meikang Qiu and Liang Zhao},
          journal={ArXiv},
          year={2023}
        }

  12. Deep Graph Representation Learning and Optimization for Influence Maximization

    Figure 3. Visualization of influence spread under a 20% seed-set budget, comparing (a) DeepIM-20%, (b) OIM-20%, (c) OPIM-20%, (d) SubSIM-20%, and (e) ToupleGDD-20%.
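
Snippet 2 above notes that traditional IM methods are largely simulation-based. As a companion, here is a minimal sketch of the classic Kempe et al. (2003) greedy algorithm with Monte-Carlo spread estimation under the independent cascade model. The activation probability `p`, the networkx toy graph, and all function names are illustrative assumptions rather than details from the paper.

```python
import random
import networkx as nx

def independent_cascade(graph, seeds, p=0.1):
    """One Monte-Carlo run of the independent cascade (IC) model.

    Each newly activated node gets a single chance to activate each
    inactive out-neighbor with probability p. Returns all activated nodes.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.successors(u):
                if v not in active and random.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def greedy_im(graph, budget, p=0.1, mc_runs=200):
    """Simulation-based greedy IM (Kempe et al., 2003): repeatedly add the
    node with the largest Monte-Carlo-estimated marginal gain in spread."""
    seeds = set()
    for _ in range(budget):
        best_node, best_spread = None, -1.0
        for v in graph.nodes():
            if v in seeds:
                continue
            spread = sum(
                len(independent_cascade(graph, seeds | {v}, p))
                for _ in range(mc_runs)
            ) / mc_runs
            if spread > best_spread:
                best_node, best_spread = v, spread
        seeds.add(best_node)
    return seeds

# Illustrative usage on a small random directed graph.
G = nx.gnp_random_graph(100, 0.05, seed=0, directed=True)
print(greedy_im(G, budget=3))
```

The repeated full simulations are exactly why this family of methods is expensive, and why the learning-based approaches surveyed here try to replace the inner Monte-Carlo loop with a learned estimate of spread.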
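
As a companion to the GNN layer in Eq. (1) of snippet 5, here is a toy numpy rendering in which the aggregation A^k is a mean over neighbors and the combination C^k is a ReLU-activated linear map. These specific operator choices are assumptions for illustration; the paper's actual A^k and C^k may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate(H, A_hat):
    """A^k: aggregate each node's neighborhood (here: a mean over neighbors,
    implemented as multiplication by a row-normalized adjacency matrix)."""
    return A_hat @ H

def combine(a, W):
    """C^k: combine the aggregated feature with a learned weight plus ReLU."""
    return np.maximum(a @ W, 0.0)

# Toy graph: 4 nodes on a path, with self-loops, row-normalized.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
A_hat = A / A.sum(axis=1, keepdims=True)

K, d = 3, 8                       # a 3-layer GNN sees 3-hop neighborhoods
H = rng.normal(size=(4, d))       # h^0: initial node features
weights = [0.1 * rng.normal(size=(d, d)) for _ in range(K)]  # theta^k

for k in range(K):                # h^k = C^k(A^k(h^{k-1}; theta^k); theta^k)
    H = combine(aggregate(H, A_hat), weights[k])

print(H.shape)  # (4, 8): each row summarizes a node's 3-hop neighborhood
```

Stacking K such layers is what gives each node the K-hop receptive field that the snippet describes.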