
CARL: Content-Aware Representation Learning for Heterogeneous Networks

Chuxu Zhang
University of Notre Dame
[email protected]

Ananthram Swami
Army Research Laboratory
[email protected]

Nitesh V. Chawla
University of Notre Dame
[email protected]

ABSTRACT

Heterogeneous networks not only present a challenge of heterogeneity in the types of nodes and relations, but also in the attributes and content associated with the nodes. While recent works have looked at representation learning on homogeneous and heterogeneous networks, no work has collectively addressed the following challenges: (a) the heterogeneous structural information of the network, consisting of multiple types of nodes and relations; (b) the unstructured semantic content (e.g., text) associated with nodes; and (c) online updates due to incoming new nodes in a growing network. We address these challenges by developing a Content-Aware Representation Learning model (CARL). CARL performs joint optimization of heterogeneous SkipGram and deep semantic encoding to capture both the heterogeneous structural closeness and the unstructured semantic relations among all nodes in the network, as a function of node content. Furthermore, an additional online update module is proposed for efficiently learning representations of incoming nodes. Extensive experiments demonstrate that CARL outperforms state-of-the-art baselines in various heterogeneous network mining tasks, such as link prediction, document retrieval, node recommendation and relevance search. We also demonstrate the effectiveness of CARL's online update module through a category visualization study.

KEYWORDS

Heterogeneous Information Networks, Representation Learning, Network Embedding

1 INTRODUCTION

Heterogeneous information networks (HetNets) [16, 17], e.g., academic networks, encode rich information through multi-typed nodes, relationships, and attributes or content associated with nodes. For example, an academic network can represent human-human relationships (between authors), human-object relationships (author-paper, author-venue or author-organization), and object-object relationships (paper-paper, paper-venue, paper-organization). The nodes in this case (human and object) can carry attributes or semantic content (such as a paper abstract).


Figure 1: An illustrative example of challenges in content-aware heterogeneous network representation learning. (a) The academic network before time split T, with authors A1-A4, papers P1-P4 and venues V1-V3. (b) The new academic network at or after T, with new author A5 and new paper P5.

Given the multi-typed nodes, relationships, and content at the nodes, feature engineering has presented a unique challenge for network mining tasks such as relation mining [15, 23], relevance search [7, 16] and personalized recommendation [9, 14, 27]. The typical feature engineering activity responds to the requirements of the network mining task for an application, requiring both domain understanding and a large exploratory search space of possible features. Not only is this expensive, but it also may not result in optimal performance.

To that end, we ask the question: Can we generalize the feature engineering activity through representation learning on HetNets that addresses the complexity of multi-typed data in HetNets? With the advent of deep learning, significant effort has been devoted to network representation learning in the last few years, starting with a focus on homogeneous networks [5, 12, 20] and more recently on HetNets [4]. The underlying theme of the models developed in these works is to automate the discovery of useful latent node features that can be further utilized in various network mining problems such as link prediction and node recommendation. However, these methods are limited in truly addressing the challenges of HetNets:

• (C1) HetNets include multiple types of nodes and relations. For example, in Figure 1(a), an academic network involves three types of nodes, i.e., author, paper and venue, which are connected by three types of relations, i.e., author-write-paper, paper-cite-paper and paper-publish-venue. Most of the previous models (e.g., Deepwalk and node2vec) employ homogeneous language models, which makes them difficult to apply to HetNets. Thus challenge 1 is: how to extend the homogeneous language model to heterogeneous network representation learning so as to maintain structural closeness among multiple types of nodes and relations? We build on our prior work metapath2vec [4] for this.

• (C2) HetNets include both structural content (e.g., node types and relation connections) and unstructured semantic content (e.g., text). For example, in Figure 1(a), a paper in the academic network connects to authors and a venue and contains semantic text. The current models depend purely on structural content and cannot leverage unstructured content to infer semantic relations between nodes that are far away from each other in the network. To be more specific (as we will show in Section 4.5), given the query author "Jure Leskovec", conventional techniques (e.g., node2vec and metapath2vec) tend to return authors who collaborated with "Jure" due to structural relations bridged by papers, or return authors who differ from "Jure" in their specific research interests due to structural relations bridged by venues. Thus challenge 2 is: how to effectively incorporate unstructured content of nodes into a representation learning framework for capturing both structural closeness and unstructured semantic relations among all nodes?

• (C3) HetNets can grow with time. Current models are not able to handle this due to the lack of an update strategy, and it is impractical to re-run the model for each new node. For example, in Figure 1(b), new author A5 co-authors with A1 and A4 on new paper P5 after a given time split T. Thus challenge 3 is: how to efficiently learn representations of new nodes in a growing network?

Our proposed method CARL, a content-aware representation learning model for HetNets, addresses these challenges. Specifically, first, we develop a heterogeneous SkipGram model to maintain structural closeness among multiple types of nodes and relations. Next, we design two effective ways, based on deep semantic encoding, to incorporate the unstructured content (i.e., text) of some types of nodes into heterogeneous SkipGram for capturing semantic relations. The negative sampling technique and a walk sampling based strategy are utilized to optimize and train the proposed models. Finally, we develop an online update module to efficiently learn the representation of each new node by using its relations with existing nodes and the learned node representations.

To summarize, the main contributions of our work are:

• We formalize the problem of content-aware representation learning in HetNets and develop a model, CARL, to solve it. CARL performs joint optimization of heterogeneous SkipGram and deep semantic encoding.

• We design the corresponding optimization strategy and training algorithm to effectively learn node representations. The output representations are further utilized in various HetNet mining tasks, such as link prediction, document retrieval, node recommendation and relevance search, which demonstrate the superior performance of CARL over state-of-the-art baselines.

• We propose an online update module in CARL to handle growing networks and conduct a category visualization study to show the effectiveness of this module.

2 PROBLEM DEFINITION

We first introduce the concepts of HetNets and random & meta-path walks, then formally define the problem of content-aware representation learning in HetNets.

Definition 2.1. (Heterogeneous Networks) A heterogeneous network [17] is defined as a network $G = (V, E, O_V, R_E)$ with multiple types of nodes $V$ and links $E$. $O_V$ and $R_E$ represent the sets of object types and relation types. Each node $v \in V$ and each link $e \in E$ is associated with a node type mapping function $\psi_v: V \to O_V$ and a link type mapping function $\psi_e: E \to R_E$, respectively.

For example, in Figure 2(b), the academic network can be seen as a HetNet. The set of node types $O_V$ includes author (A), paper (P) and venue (V). The set of link types $R_E$ includes author-write-paper, paper-cite-paper and paper-publish-venue.

Definition 2.2. (Random Walk) A random walk [5] is defined as a node sequence $S_{v_0} = \{v_0, v_1, v_2, \ldots, v_{L-1}\}$ wherein the $i$-th node $v_{i-1}$ in the walk is randomly selected from the neighbors of its predecessor $v_{i-2}$.

Definition 2.3. (Meta-path Walk) A meta-path walk [4] in a HetNet is defined as a random walk guided by a specific meta-path scheme of the form $P \equiv o_1 \xrightarrow{r_1} o_2 \xrightarrow{r_2} \cdots \xrightarrow{r_{m-1}} o_m$, where $o_i \in O_V$, $r_i \in R_E$, and $r = r_1 * r_2 * \cdots * r_{m-1}$ represents a compositional relation between relation types $r_1$ and $r_{m-1}$. Each meta-path walk recursively samples a specific $P$ until it reaches the given length.

Figure 2(b) shows examples of a random walk and an "APVPA" meta-path walk in the academic network.
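To make the meta-path walk concrete, the following sketch samples an "APVPA"-guided walk over a toy version of the academic network in Figure 2(b). The dictionary-based adjacency layout and function names are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical adjacency structure: neighbors[node_type][node_id] maps to a
# dict of {neighbor_type: [neighbor ids]}. Toy graph mirroring Figure 2(b).
neighbors = {
    "A": {"A1": {"P": ["P1"]}, "A3": {"P": ["P2", "P4"]}},
    "P": {"P1": {"A": ["A1"], "V": ["V1"]},
          "P2": {"A": ["A3"], "V": ["V1"]},
          "P4": {"A": ["A3"], "V": ["V3"]}},
    "V": {"V1": {"P": ["P1", "P2"]}, "V3": {"P": ["P4"]}},
}

def meta_path_walk(start_node, start_type, scheme="APVPA", length=9):
    """Sample a walk whose node types follow `scheme`, repeating the scheme
    (e.g. A-P-V-P-A-P-V-P-A...) until `length` nodes have been visited."""
    walk, cur, cur_type = [start_node], start_node, start_type
    while len(walk) < length:
        # The scheme is symmetric, so it repeats by dropping its last symbol.
        next_type = scheme[len(walk) % (len(scheme) - 1)]
        candidates = neighbors[cur_type].get(cur, {}).get(next_type, [])
        if not candidates:           # dead end: stop early
            break
        cur, cur_type = random.choice(candidates), next_type
        walk.append(cur)
    return walk

print(meta_path_walk("A1", "A"))     # e.g. ['A1', 'P1', 'V1', 'P2', 'A3', ...]
```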

Definition 2.4. (Content-Aware Representation Learning in Heterogeneous Networks) Given a HetNet with both structural and unstructured content at each node, the task is to design a model that learns $d$-dimensional feature representations $\theta \in \mathbb{R}^{|V| \times d}$ ($d \ll |V|$) which encode both structural closeness and unstructured semantic relations. Furthermore, the model should be able to efficiently infer a representation $\theta'_{v'} \in \mathbb{R}^{d}$ for each new node $v'$ by using the learned representations $\theta$.

For example, in the network of Figure 1, author and venue nodes contain only structural content, i.e., node id, node type and link relations with other nodes, while each paper node contains both structural content and unstructured semantic content, e.g., its abstract text. The output $\theta$ denotes the representations of all existing nodes in the same latent space, which can be further utilized in various HetNet mining tasks. Besides, the learned representation $\theta'_{v'}$ of each new node $v'$ can benefit tasks involving $v'$, such as category assignment.

3 CARL FRAMEWORK

We present the framework of content-aware representation learning, which addresses the three challenges described in Section 1.

3.1 Heterogeneous Network Embedding (C1)

Inspired by word2vec [10] for learning distributed representations of words in a corpus, Deepwalk [12] and node2vec [5] leverage SkipGram and random walks to learn node representations. However, we argued that those techniques focus on homogeneous networks and proposed metapath2vec [4] for HetNets by feeding meta-path walks to SkipGram. Similar to metapath2vec, we formulate heterogeneous network representation learning as heterogeneous SkipGram (HSG) to address challenge C1. Specifically, given a HetNet $G = (V, E, O_V, R_E)$, the objective is to maximize the likelihood of each type of context node given the input node $v$:

$$o_1 = \arg\max_{\theta} \prod_{v \in V} \prod_{t \in O_V} \prod_{v_c \in N_t(v)} p(v_c \mid v; \theta) \qquad (1)$$

where $\theta$ contains the representations of all nodes and $N_t(v)$ is the set of $t$-type context nodes of $v$, which can be collected in different ways, such as one-hop neighbors or surrounding neighbors in random walks. For example, in Figure 2(b), A3 is structurally close to other authors (e.g., A1 and A4), papers (e.g., P2 and P4) and venues (e.g., V1 and V3). Thus objective $o_1$ is able to maintain structural closeness among multiple types of nodes and relations in $G$.


Figure 2: An illustrative example of CARL in the academic network: (a) paper semantic encoder based on a gated recurrent neural network; (b) academic network and random & meta-path walks; (c) framework of the proposed models (HSG-SR and SE-HSG).

3.2 Incorporating Deep Semantic Encoder (C2)

The objective $o_1$ formulates structural closeness but ignores unstructured semantic relations. To address challenge C2, we design two ways to incorporate the unstructured content of some types of nodes into heterogeneous network representation learning.

3.2.1 HSG with Unstructured Semantic Regularization (HSG-SR). One way is to tightly couple HSG with the conditional probability of a semantic constraint, which imposes unstructured semantic regularization on objective $o_1$. The objective is defined as:

$$o_2 = \arg\max_{\theta, \Phi} \prod_{v \in V} \prod_{t \in O_V} \prod_{v_c \in N_t(v)} p(v_c \mid v; \theta) \prod_{v \in V_S} p(\theta_v \mid Y_v; \Phi) \qquad (2)$$

where $V_S$ is the set of nodes with unstructured semantic content, $Y_v$ represents the unstructured content of node $v$, and $\Phi$ are the parameters of a deep semantic encoder that will be described later. The conditional probability $p(v_c \mid v; \theta)$ is defined as the heterogeneous softmax function: $p(v_c \mid v; \theta) = \frac{e^{\theta_{v_c} \cdot \theta_v}}{\sum_{v_k \in V_t} e^{\theta_{v_k} \cdot \theta_v}}$, where $V_t$ is the set of $t$-type nodes. Besides, we model the conditional probability $p(\theta_v \mid Y_v; \Phi)$ with a Gaussian prior: $p(\theta_v \mid Y_v; \Phi) = \mathcal{N}(\theta_v \mid E_v, \sigma^2 I)$, where $E_v$ denotes $v$'s semantic representation encoded by the deep learning architecture $f$: $E_v = f(Y_v)$. For example, in the network of Figure 2, $V_S$ in $o_2$ is the set of papers and the formulation involves four kinds of representations, i.e., author representations, venue representations, paper representations and paper semantic representations. Notice that there are two kinds of paper representations; we will use the paper semantic representations for evaluation in Section 4.

3.2.2 Unstructured Semantic Enhanced HSG (SE-HSG). Another way is to concatenate the output of the deep semantic encoder with the input of HSG, which provides unstructured semantic enhancement of objective $o_1$. The objective is defined as:

$$o_3 = \arg\max_{\theta, \Phi} \prod_{v \in V} \prod_{t \in O_V} \prod_{v_c \in N_t(v)} p(v_c \mid v; \theta; Y; \Phi) \qquad (3)$$

where $Y$ is the set of all unstructured semantic content of $V_S$. The conditional probability $p(v_c \mid v; \theta; Y; \Phi)$ is defined as the semantic enhanced heterogeneous softmax function: $p(v_c \mid v; \theta; Y; \Phi) = \frac{e^{\Theta_{v_c} \cdot \Theta_v}}{\sum_{v_k \in V_t} e^{\Theta_{v_k} \cdot \Theta_v}}$, where $\Theta$ denotes the enhanced representations. That is, $\Theta_v = E_v = f(Y_v)$ for $v \in V_S$, otherwise $\Theta_v = \theta_v$. For example, in the network of Figure 2, $Y$ in $o_3$ is the text content of all papers and the formulation involves three kinds of representations, i.e., author representations, venue representations and paper semantic representations. Notice that $\theta$ in $o_3$ only denotes the representations of nodes without unstructured content (e.g., authors and venues in the academic network), which is slightly different from $o_2$.

3.2.3 Unstructured Semantic Content Encoder. Both objectives $o_2$ and $o_3$ involve a deep semantic encoding architecture. To encode the unstructured content of some types of nodes into fixed-length representations $E \in \mathbb{R}^{|V_S| \times d}$, we introduce gated recurrent units (GRU), a specific type of recurrent neural network that has been widely adopted for many applications such as machine translation [3]. Figure 2(a) gives an illustrative example of this encoder for papers in the academic network. To be more specific, each paper's abstract is represented as a sequence of words $\{w_1, w_2, \cdots, w_{t_{max}}\}$, mapped to the word embedding sequence $\{x_1, x_2, \cdots, x_{t_{max}}\}$ trained by word2vec [10], where $t_{max}$ is the maximum length of the text. For each step $t$, with the input word embedding $x_t$ and the previous hidden state vector $h_{t-1}$, the current hidden state vector $h_t$ is updated by $h_t = \mathrm{GRU}(x_t, h_{t-1})$, where the GRU module is defined as:

$$\begin{aligned} z_t &= \sigma(A_z x_t + B_z h_{t-1}) \\ r_t &= \sigma(A_r x_t + B_r h_{t-1}) \\ \tilde{h}_t &= \tanh\left[A_h x_t + B_h (r_t \circ h_{t-1})\right] \\ h_t &= z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t \end{aligned} \qquad (4)$$

where $\sigma$ is the sigmoid function, the $A$ and $B$ matrices are the parameters of the GRU network (i.e., $\Phi$ in objectives $o_2$ and $o_3$ includes $A$ and $B$), the operator $\circ$ denotes element-wise multiplication, and $z_t$ and $r_t$ are the update gate vector and reset gate vector, respectively. The GRU network encodes the word embeddings into deep semantic embeddings $h \in \mathbb{R}^{t_{max} \times d}$, which are passed through a mean pooling layer to obtain the general semantic representation of the paper. All of these steps together constitute the deep semantic encoder $f$. We have also explored other encoding architectures, such as LSTM, bidirectional GRU and attention-based GRU, and obtained similar results. We therefore choose the GRU since it has a concise structure and reduces training time.
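As a simplified illustration, the NumPy sketch below implements the GRU cell of Equation (4) followed by mean pooling. The class name, weight initialization and dimensions are our own illustrative assumptions, not the authors' TensorFlow implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUEncoder:
    """Minimal sketch of the semantic encoder f: a GRU over pre-trained
    word embeddings followed by mean pooling (Equation 4)."""
    def __init__(self, word_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        # A_* act on the input word embedding x_t, B_* on the previous state h_{t-1}.
        self.Az, self.Bz = rng.normal(0, s, (hidden_dim, word_dim)), rng.normal(0, s, (hidden_dim, hidden_dim))
        self.Ar, self.Br = rng.normal(0, s, (hidden_dim, word_dim)), rng.normal(0, s, (hidden_dim, hidden_dim))
        self.Ah, self.Bh = rng.normal(0, s, (hidden_dim, word_dim)), rng.normal(0, s, (hidden_dim, hidden_dim))
        self.hidden_dim = hidden_dim

    def encode(self, X):
        """X: array of shape (t_max, word_dim) holding the word embedding sequence."""
        h = np.zeros(self.hidden_dim)
        states = []
        for x in X:
            z = sigmoid(self.Az @ x + self.Bz @ h)           # update gate
            r = sigmoid(self.Ar @ x + self.Br @ h)           # reset gate
            h_tilde = np.tanh(self.Ah @ x + self.Bh @ (r * h))
            h = z * h + (1.0 - z) * h_tilde                  # Equation (4)
            states.append(h)
        return np.mean(states, axis=0)                       # mean pooling -> E_v = f(Y_v)

# Example: encode a toy 5-word abstract of 300-d word embeddings into a 128-d E_v.
encoder = GRUEncoder(word_dim=300, hidden_dim=128)
E_v = encoder.encode(np.random.randn(5, 300))
```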

3.3 Model Optimization and Training

We leverage the negative sampling technique [10] to optimize the models and introduce a walk sampling based strategy for model training.


3.3.1 Optimization of HSG-SR. By applying negative sampling to the construction of the softmax function, we can approximate the logarithm of $p(v_c \mid v; \theta)$ in objective $o_2$ as:

$$\log\sigma(\theta_{v_c} \cdot \theta_v) + \sum_{m=1}^{M} \mathbb{E}_{v_{c'} \sim P_t(v_{c'})} \log\sigma(-\theta_{v_{c'}} \cdot \theta_v) \qquad (5)$$

where $M$ is the negative sample size and $P_t(v_{c'})$ is the pre-defined sampling distribution w.r.t. the $t$-type nodes. In our case, $M$ has little impact on the performance of the proposed models. Thus we set $M = 1$ and obtain the cross-entropy loss for optimization:

$$\log p(v_c \mid v; \theta) = \log\sigma(\theta_{v_c} \cdot \theta_v) + \log\sigma(-\theta_{v_{c'}} \cdot \theta_v) \qquad (6)$$

That is, for each context node $v_c$ of $v$, we sample a negative node $v_{c'}$ according to $P_t(v_{c'})$. Besides, as $p(\theta_v \mid Y_v; \Phi) = \mathcal{N}(\theta_v \mid E_v, \sigma^2 I)$, the logarithm of $p(\theta_v \mid Y_v; \Phi)$ in objective $o_2$ is equivalent (up to a constant) to:

$$\log p(\theta_v \mid Y_v; \Phi) = -\left[\theta_v - E_v\right]^{T}\left[\theta_v - E_v\right] \qquad (7)$$

where $E_v = f(Y_v)$ and $f$ is the deep semantic encoder. Therefore we rewrite objective $o_2$ as:

$$o_2 = \sum_{\langle v, v_c, v_{c'} \rangle \in T_{walk}} \Big\{ \log\sigma(\theta_{v_c} \cdot \theta_v) + \log\sigma(-\theta_{v_{c'}} \cdot \theta_v) - \gamma \sum_{v^* \in T^{S}_{tri}} \left[\theta_{v^*} - f(Y_{v^*})\right]^{T}\left[\theta_{v^*} - f(Y_{v^*})\right] \Big\} \qquad (8)$$

where $\gamma$ is a trade-off factor and $T_{walk}$ denotes the set of triplets $\langle v, v_c, v_{c'} \rangle$ collected by walk sampling on the HetNet, which will be described later. Besides, $T^{S}_{tri}$ is the set of nodes with unstructured content in each triplet of $T_{walk}$, used for semantic regularization. That is, $\max\{|T^{S}_{tri}|\} = 3$.

3.3.2 Optimization of SE-HSG. Similar to HSG-SR, the logarithm of $p(v_c \mid v; \theta; Y; \Phi)$ in objective $o_3$ is approximated by:

$$\log p(v_c \mid v; \theta; Y; \Phi) = \log\sigma(\Theta_{v_c} \cdot \Theta_v) + \log\sigma(-\Theta_{v_{c'}} \cdot \Theta_v) \qquad (9)$$

where $\Theta_v = E_v = f(Y_v)$ for nodes with unstructured content, otherwise $\Theta_v = \theta_v$. Therefore we rewrite objective $o_3$ as:

$$o_3 = \sum_{\langle v, v_c, v_{c'} \rangle \in T_{walk}} \log\sigma(\Theta_{v_c} \cdot \Theta_v) + \log\sigma(-\Theta_{v_{c'}} \cdot \Theta_v) \qquad (10)$$

As in HSG-SR, $T_{walk}$ is the set of triplets $\langle v, v_c, v_{c'} \rangle$ collected from walk sequences on the HetNet.

3.3.3 Model Training. Both optimized objectives $o_2$ and $o_3$ are accumulated over the set $T_{walk}$. Similar to Deepwalk, node2vec and metapath2vec, we design a walk sampling strategy to generate $T_{walk}$. Specifically, first, we uniformly generate a set of random walks or meta-path walks $S$ in the HetNet. Then, for each node $v$ in $S_i \in S$, we collect each context node $v_c$ that satisfies $dist(v, v_c) \le \tau$, i.e., $v$'s neighbors within distance $\tau$ in $S_i$. For example, in Figure 2(c), the context nodes of A3 ($\tau = 2$) in the sampled walk are V1, P2, P4 and V3. Finally, for each $v_c$, we sample a negative node $v_{c'}$ with the same node type as $v_c$ according to $P_t(v_{c'}) \propto dg_{v_{c'}}^{3/4}$, where $dg_{v_{c'}}$ is the frequency of $v_{c'}$ in $S$. Furthermore, we design a mini-batch based Adam optimizer [8] to train the model. Specifically, at each iteration, we sample a mini-batch of triplets from $T_{walk}$ and accumulate the objective according to equation (8) or (10), then update the parameters via Adam. We repeat the training iterations until the change between two consecutive iterations is sufficiently small. Figure 2(c) shows an illustration of the framework of HSG-SR and SE-HSG on the academic network. The output representations $\theta$ and $E$ can be utilized in various HetNet mining tasks, as we will show in Section 4.

Figure 3: An illustrative example of the online update for a new author in the academic network.
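The triplet collection of Section 3.3.3 can be sketched as follows. The data layout (walks as lists of node ids, a node_type dictionary) and function names are illustrative assumptions; a real implementation would batch the resulting triplets and feed them to equation (8) or (10) with Adam.

```python
import random

def collect_triplets(walks, node_type, tau=7, neg_power=0.75, rng=None):
    """Collect <v, v_c, v_c'> triplets from walk sequences: v_c lies within
    window distance tau of v in a walk, and the negative node v_c' has the
    same type as v_c and is drawn with probability proportional to its walk
    frequency raised to the 3/4 power."""
    rng = rng or random.Random(0)
    # Frequency of each node in the walks, grouped by node type.
    freq = {}
    for walk in walks:
        for n in walk:
            freq[n] = freq.get(n, 0) + 1
    by_type = {}
    for n, c in freq.items():
        by_type.setdefault(node_type[n], []).append((n, c ** neg_power))

    def sample_negative(t):
        nodes, weights = zip(*by_type[t])
        return rng.choices(nodes, weights=weights, k=1)[0]

    triplets = []
    for walk in walks:
        for i, v in enumerate(walk):
            for j in range(max(0, i - tau), min(len(walk), i + tau + 1)):
                if j == i:
                    continue
                vc = walk[j]
                triplets.append((v, vc, sample_negative(node_type[vc])))
    return triplets
```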

3.4 Online Update for New Nodes (C3)

The optimization and training strategies in the previous section learn representations of all existing nodes, but they cannot be employed in an online situation. Considering the growing nature of HetNets, we aim to design an online update module to efficiently learn the representation of each new node, addressing challenge C3.

3.4.1 Representation of New Node with Unstructured Semantic Content. As we introduce the deep architecture $f$ to encode semantic representations of nodes with unstructured content, the optimized parameters $\Phi^*$ can be directly applied to infer the representation of such a new node without an extra training step. That is, $E_{v'} = f^*(Y_{v'})$ for a new node $v'$, where $Y_{v'}$ is the text content of $v'$ and $f^*$ is the learned semantic encoder with $\Phi^*$. For example, in the academic network, we can use the learned paper semantic encoder to infer the representation of each new paper.

3.4.2 Representation of New Node without Unstructured Semantic Content. Inspired by eALS [6] for online recommendation, we make the reasonable assumption that each new node should not change the learned node representations too much from a global perspective. Accordingly, for each incoming node $v'$ without unstructured content, we leverage objective $o_3$ to formulate the following objective:

$$o_4 = \sum_{\langle v', v_c, v_{c'} \rangle \in T^{v'}_{walk}} \log\sigma(\Theta^{*}_{v_c} \cdot \theta'_{v'}) + \log\sigma(-\Theta^{*}_{v_{c'}} \cdot \theta'_{v'}) \qquad (11)$$

where $\theta'_{v'}$ denotes the representation of $v'$ and $\Theta^*$ are the learned node representations from SE-HSG, i.e., $\Theta^*_v = E^*_v = f^*(Y_v)$ for $v \in V_S$, otherwise $\Theta^*_v = \theta^*_v$. That is, we treat $\Theta^*$ as constant values in $o_4$. In addition, we leverage meta-path walks rooted at $v'$ to collect the triplet set $T^{v'}_{walk}$ for the training process of the update module. Specifically, Figure 3 gives an illustrative example of the online update for a new author in the academic network. First, we randomly sample a number of "APVPA" meta-path walks rooted at the new author node A5. The length of each walk equals the window distance $\tau$ ($\tau = 4$) and the nodes in the walk are context nodes of A5, e.g., P5, V3, P3 and A2. For each context node $v_c$, we randomly sample a negative node $v_{c'}$ with the same node type as $v_c$. Furthermore, the SGD optimizer is utilized to repeatedly update $\theta'_{v'}$ for each triplet in $T^{v'}_{walk}$ until the change between two consecutive iterations is sufficiently small. The proposed update module is efficient since only simple meta-path walk sampling and a limited number of update steps on the new node representation are performed. In the experimental evaluation, we only consider new authors who write a new paper at a venue that already exists in the network, making the "APVPA" meta-path walk always feasible.
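A minimal sketch of this online update step is given below, assuming the learned representations are stored in a dictionary Theta_star and the triplets have already been collected from meta-path walks rooted at the new node; the learning rate and stopping rule are illustrative choices, not values reported by the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def online_update(new_node_triplets, Theta_star, dim=128, lr=0.025, max_iter=200, tol=1e-4):
    """Learn theta' for a new node via SGD on Equation (11), treating the
    learned representations Theta_star (node id -> vector) as constants."""
    theta_new = np.random.normal(0, 0.1, dim)
    for _ in range(max_iter):
        prev = theta_new.copy()
        for _, vc, vc_neg in new_node_triplets:
            pos, neg = Theta_star[vc], Theta_star[vc_neg]
            # Gradient of log sigma(pos . theta') + log sigma(-neg . theta') w.r.t. theta'.
            grad = (1.0 - sigmoid(pos @ theta_new)) * pos - sigmoid(neg @ theta_new) * neg
            theta_new += lr * grad   # gradient ascent on objective o4
        if np.linalg.norm(theta_new - prev) < tol:
            break
    return theta_new
```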

4 EXPERIMENTS

In this section, we conduct extensive experiments with the aim of answering the following research questions:

• (RQ1) How does CARL perform vs. state-of-the-art network representation learning models on different HetNet mining tasks, such as link prediction (RQ1-1), document retrieval (RQ1-2) and node recommendation (RQ1-3)? In addition, how do hyper-parameters impact CARL's performance in each task?

• (RQ2) What is the performance difference between CARL and the baselines in relevance search w.r.t. each task in RQ1?

• (RQ3) What is the performance of CARL's online update module on tasks for new nodes, such as category assignment?

Notice that, although our model can be applied to or modified for different HetNets, we focus our experiments on the academic HetNet due to data availability.

4.1 Experimental Setup

4.1.1 Data. We use the AMiner computer science dataset [21], which is publicly available1. To avoid noise, we remove papers published in venues (e.g., workshops) with limited publications and instances without abstract text. In addition, the popular topics of each area change over time. For example, according to our analysis of the data, the most popular topics in data mining changed from web mining and clustering (1996∼2005) to network mining and learning (2006∼2015). To make a thorough evaluation of CARL and verify its effectiveness for networks in different decades, we independently conduct experiments on two datasets, i.e., AMiner-I (1996∼2005) and AMiner-II (2006∼2015). As a result, AMiner-I contains 160,713 authors, 111,409 papers and 150 venues, while AMiner-II contains 571,693 authors, 483,449 papers and 492 venues. The structure of the academic network used in this work is shown in Figure 1.

4.1.2 Comparison Baselines. We compare CARL with four state-of-the-art models, i.e., Deepwalk [12], LINE [20], node2vec [5] and metapath2vec [4]. Notice that we use either random walks (rw) or meta-path walks (mw) to collect context nodes in HSG-SR and SE-HSG, resulting in four variants of CARL: CARL^rw_HSG-SR, CARL^mw_HSG-SR, CARL^rw_SE-HSG and CARL^mw_SE-HSG.

4.1.3 Reproducibility. For a fair comparison, we use the same representation dimension d = 128 for all models. The window size τ = 7, the number of walks per node N = 10 and the walk length L = 30 are used for Deepwalk, node2vec, metapath2vec and CARL. The same set of parameters is used for CARL's online update module. The negative sample size M is set to 5 for node2vec, LINE and metapath2vec. In addition, γ = 1.0 for CARL_HSG-SR, and three meta-path schemes "APA", "APPA" and "APVPA" are jointly used to generate meta-path walks for CARL^mw. We employ TensorFlow to implement all variants of CARL and run them on an NVIDIA TITAN X GPU. Code will be available upon publication.

1 https://aminer.org/citation

4.2 Link Prediction (RQ1-1)

Who will be your academic collaborators? As a response to RQ1-1, we design an experiment to evaluate CARL's performance on the author collaboration link prediction task.

4.2.1 Experimental Setting. Unlike past work [5] that randomly samples a portion of links for training and uses the remainder for evaluation, we consider a more realistic setting that splits training/test data via a given time stamp T. Specifically, first, the network before T is utilized to learn node representations. Then, the collaboration links before T are used to train a binary logistic classifier. Finally, the collaboration relations after T, together with an equal number of randomly sampled non-collaboration links, are used to evaluate the trained classifier. In addition, only new collaborations among current authors (who appear before T) are considered, and duplicated collaborations are removed from the evaluation. For example, in Figure 1(b), A1 and A4 co-author a new paper P5 after T. The classifier aims to predict the new collaboration between A1 and A4 using previous links, e.g., the collaboration between A1 and A3. The representation of a link is formed by the element-wise multiplication of the representations of its two end nodes. We use the Accuracy and F1 score of binary classification as the evaluation metrics. Besides, T is set to 2003/2004 and 2013/2014 for AMiner-I and AMiner-II, respectively.
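A minimal sketch of this evaluation protocol is shown below, using scikit-learn for the binary logistic classifier (scikit-learn and the helper names are our own assumptions; the paper only specifies a binary logistic classifier and the element-wise link representation).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

def link_features(emb, pairs):
    """Represent an author pair by the element-wise (Hadamard) product of the
    two author representations. `emb` maps author id -> NumPy vector."""
    return np.array([emb[a] * emb[b] for a, b in pairs])

def evaluate_collaboration_prediction(emb, train_pairs, train_labels, test_pairs, test_labels):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(link_features(emb, train_pairs), train_labels)    # collaborations before T
    pred = clf.predict(link_features(emb, test_pairs))        # new links after T + sampled non-links
    return accuracy_score(test_labels, pred), f1_score(test_labels, pred)
```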

4.2.2 Results. The performances of the different models are reported in Table 1. According to the table: (a) All variants of CARL perform better than the baselines, demonstrating the effectiveness of incorporating unstructured semantic content to learn author representations; (b) CARL^mw_SE-HSG achieves the best performance in all cases. The average improvements of CARL^mw_SE-HSG over the different baselines range from 10.9% to 41.0% and from 6.7% to 30.9% on AMiner-I and AMiner-II, respectively; (c) CARL^mw outperforms CARL^rw, showing that meta-path walks are better than random walks for collecting context nodes in CARL. In addition, CARL_SE-HSG performs better than CARL_HSG-SR, indicating that concatenating the text encoder with heterogeneous SkipGram is more effective than using text encoding as semantic regularization.

4.2.3 Parameter Sensitivity. We conduct experiments to analyze the impact of two key parameters, i.e., the window size τ of walk sampling and the representation dimension d. We investigate a specific parameter by varying its value while fixing the others. The prediction results of CARL^mw_SE-HSG as a function of τ and d on AMiner-II (T = 2013) are shown in Figure 4. We see that: (a) With increasing τ, accuracy and F1 score increase at first, since a larger window means more useful context information. But when τ goes beyond a certain value, performance decreases slowly with τ, possibly due to uncorrelated noise. The best τ is around 7; (b) Similar to τ, an appropriate value should be set for d such that the best node representations are learned. The optimal d is around 128.

4.3 Document Retrieval (RQ1-2)

Which relevant papers should be retrieved for your query? As a response to RQ1-2, we design an experiment to evaluate CARL's performance on the paper retrieval task.


Table 1: Collaboration prediction results comparison.

AMiner-I:

Model | Accuracy (T = 2004) | F1 (T = 2004) | Accuracy (T = 2003) | F1 (T = 2003) | Gain
Deepwalk | 0.6341 | 0.4323 | 0.6244 | 0.4058 | 41.0%
LINE | 0.6722 | 0.5263 | 0.6714 | 0.5231 | 20.8%
node2vec | 0.6758 | 0.5291 | 0.6821 | 0.5409 | 18.9%
metapath2vec | 0.7013* | 0.5914* | 0.7041* | 0.5935* | 10.9%
CARL^rw_HSG-SR | 0.7302 | 0.6561 | 0.7378 | 0.6623 | –
CARL^mw_HSG-SR | 0.7367 | 0.6618 | 0.7401 | 0.6635 | –
CARL^rw_SE-HSG | 0.7388 | 0.6579 | 0.7419 | 0.6648 | –
CARL^mw_SE-HSG | 0.7482 | 0.6753 | 0.7525 | 0.6881 | –

AMiner-II:

Model | Accuracy (T = 2014) | F1 (T = 2014) | Accuracy (T = 2013) | F1 (T = 2013) | Gain
Deepwalk | 0.6559 | 0.5024 | 0.6487 | 0.4833 | 30.9%
LINE | 0.7034 | 0.6048 | 0.6956 | 0.5898 | 14.3%
node2vec | 0.7136 | 0.6122 | 0.7066 | 0.5965 | 12.7%
metapath2vec | 0.7299* | 0.6628* | 0.7254* | 0.6512* | 6.7%
CARL^rw_HSG-SR | 0.7498 | 0.7061 | 0.7495 | 0.6946 | –
CARL^mw_HSG-SR | 0.7511 | 0.7084 | 0.7503 | 0.6962 | –
CARL^rw_SE-HSG | 0.7562 | 0.7125 | 0.7546 | 0.6978 | –
CARL^mw_SE-HSG | 0.7627 | 0.7208 | 0.7602 | 0.7097 | –

Figure 4: Parameter sensitivity in collaboration prediction: (a) window size τ; (b) representation dimension d (Accuracy and F1 score).

4.3.1 Experimental Setting. As in the previous task, the network before T is utilized to learn node representations. The ground truth of relevance is taken to be the co-cited relation between two papers after T. For example, in Figure 1(b), the new paper P5 cites both previous papers P2 and P3. The model should retrieve P2 when querying P3, or retrieve P3 when querying P2. The relevance score of two papers is defined as the cosine similarity between their representations. We use HitRatio@k as the evaluation metric. Due to the large number of candidate papers, we follow the sampling strategy in [26] to reduce evaluation time. Specifically, for each evaluated paper, we randomly generate 100 negative samples for comparison with the true relevant paper. The hit ratio equals 1 if the true relevant paper is ranked in the top-k list by relevance score, otherwise 0. The overall result is the average value of HitRatio@k over all evaluated papers. Duplicated co-cited relations are removed from the evaluation and k is set to 10 or 20. In addition, T is set to 2003/2004 and 2013/2014 for AMiner-I and AMiner-II, respectively.
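For concreteness, the following sketch computes HitRatio@k for a single query under the sampled-negatives protocol just described; function and variable names are illustrative assumptions.

```python
import numpy as np

def hit_ratio_at_k(query_vec, true_vec, negative_vecs, k=10):
    """Rank the true co-cited paper against the sampled negatives by cosine
    similarity with the query paper; return 1 if it lands in the top-k, else 0."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    true_score = cos(query_vec, true_vec)
    neg_scores = [cos(query_vec, v) for v in negative_vecs]
    rank = 1 + sum(s > true_score for s in neg_scores)   # 1-based rank among all candidates
    return 1 if rank <= k else 0

# The reported metric is the mean of this value over all evaluated papers.
```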

4.3.2 Results. The results are reported in Table 2. From the table: (a) All variants of CARL achieve better performance than the baselines, demonstrating the benefit of incorporating unstructured semantic content to learn paper representations; (b) The average improvements of CARL^mw_SE-HSG over the different baselines range from 2.3% to 16.4% and from 6.3% to 29.1% on AMiner-I and AMiner-II, respectively; (c) CARL_SE-HSG outperforms CARL_HSG-SR, showing that semantic enhanced SkipGram is better than SkipGram with semantic regularization. However, CARL^rw performs close to CARL^mw, indicating that meta-path walks have little impact on this task. This is reasonable since the paper representations depend on the deep semantic encoder in our model.

Table 2: Paper retrieval results comparison.

AMiner-I:

Model | Hit@10 (T = 2004) | Hit@20 (T = 2004) | Hit@10 (T = 2003) | Hit@20 (T = 2003) | Gain
Deepwalk | 0.8120 | 0.8816 | 0.8217 | 0.8967 | 6.7%
LINE | 0.7320 | 0.8130 | 0.7485 | 0.8380 | 16.4%
node2vec | 0.8552* | 0.9148* | 0.8653* | 0.9250* | 2.3%
metapath2vec | 0.8239 | 0.8910 | 0.8366 | 0.9081 | 5.3%
CARL^rw_HSG-SR | 0.8685 | 0.9351 | 0.8696 | 0.9375 | –
CARL^mw_HSG-SR | 0.8673 | 0.9342 | 0.8722 | 0.9389 | –
CARL^rw_SE-HSG | 0.8741 | 0.9412 | 0.8788 | 0.9455 | –
CARL^mw_SE-HSG | 0.8751 | 0.9425 | 0.8783 | 0.9470 | –

AMiner-II:

Model | Hit@10 (T = 2014) | Hit@20 (T = 2014) | Hit@10 (T = 2013) | Hit@20 (T = 2013) | Gain
Deepwalk | 0.7460 | 0.8366 | 0.7392 | 0.8316 | 13.2%
LINE | 0.6502 | 0.7453 | 0.6353 | 0.7382 | 29.1%
node2vec | 0.8041* | 0.8785* | 0.7981* | 0.8749* | 6.3%
metapath2vec | 0.7214 | 0.8136 | 0.7257 | 0.8201 | 15.9%
CARL^rw_HSG-SR | 0.8263 | 0.9162 | 0.8141 | 0.9101 | –
CARL^mw_HSG-SR | 0.8287 | 0.9184 | 0.8153 | 0.9096 | –
CARL^rw_SE-HSG | 0.8619 | 0.9281 | 0.8516 | 0.9243 | –
CARL^mw_SE-HSG | 0.8625 | 0.9278 | 0.8518 | 0.9254 | –

Figure 5: Parameter sensitivity in paper retrieval: (a) window size τ; (b) representation dimension d (Hit@10 and Hit@20).

4.3.3 Parameter Sensitivity. Following the same setup as in link prediction, we investigate the impact of the window size τ and the representation dimension d on CARL^mw_SE-HSG's performance on AMiner-II (T = 2013), as shown in Figure 5. It can be seen that: (a) The results are not very sensitive to τ when τ ≥ 7; as noted above, the paper representations depend on the deep semantic encoder; (b) The dimension d plays a significant role in generating paper representations. The best representations are learned when d is around 128 for the paper retrieval task.

4.4 Node Recommendation (RQ1-3)

Which venues should be recommended to you? As a response to RQ1-3, we design an experiment to evaluate CARL's performance on the venue recommendation task.

4.4.1 Experimental Setting. As in the previous two tasks, the network before T is utilized to learn node representations. The ground truth of recommendation is based on an author's appearance in a venue after T. For example, in Figure 1(b), author A1 writes the new paper P5 at V3 after T, indicating that A1 is likely to accept a recommendation of V3 before T. The preference score is defined as the cosine similarity between the representations of an author and a venue. We use Recall@k as the evaluation metric and k is set to 5 or 10. In addition, duplicated author-venue pairs are removed from the evaluation. The reported score is the average value over all evaluated authors.
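A small sketch of the Recall@k computation for one author under this protocol is shown below; the dictionary-based containers and names are illustrative assumptions.

```python
import numpy as np

def recall_at_k(author_vec, venue_vecs, true_venues, k=5):
    """Rank all venues by cosine similarity with the author representation and
    return the fraction of ground-truth venues (attended after T) in the top-k."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    ranked = sorted(venue_vecs, key=lambda v: cos(author_vec, venue_vecs[v]), reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & set(true_venues)) / len(true_venues)

# The reported Recall@k is the mean of this value over all evaluated authors.
```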


Table 3: Venue recommendation results comparison.

AMiner-I:

Model | Rec@5 (T = 2004) | Rec@10 (T = 2004) | Rec@5 (T = 2003) | Rec@10 (T = 2003) | Gain
Deepwalk | 0.1051* | 0.1628* | 0.0864* | 0.1403* | 18.9%
LINE | 0.0376 | 0.0677 | 0.0388 | 0.0717 | 178.7%
node2vec | 0.0945 | 0.1570 | 0.0774 | 0.1386 | 27.2%
metapath2vec | 0.0878 | 0.1527 | 0.0714 | 0.1395 | 33.3%
CARL^rw_HSG-SR | 0.1156 | 0.1772 | 0.0988 | 0.1593 | –
CARL^mw_HSG-SR | 0.1208 | 0.1813 | 0.1050 | 0.1631 | –
CARL^rw_SE-HSG | 0.1225 | 0.1831 | 0.1054 | 0.1648 | –
CARL^mw_SE-HSG | 0.1250 | 0.1852 | 0.1073 | 0.1667 | –

AMiner-II:

Model | Rec@5 (T = 2014) | Rec@10 (T = 2014) | Rec@5 (T = 2013) | Rec@10 (T = 2013) | Gain
Deepwalk | 0.1039* | 0.1549* | 0.0925* | 0.1384* | 11.7%
LINE | 0.0297 | 0.0470 | 0.0267 | 0.0433 | 275.6%
node2vec | 0.0766 | 0.1182 | 0.0691 | 0.1074 | 47.8%
metapath2vec | 0.0654 | 0.1077 | 0.0608 | 0.1001 | 66.1%
CARL^rw_HSG-SR | 0.1092 | 0.1643 | 0.0977 | 0.1466 | –
CARL^mw_HSG-SR | 0.1116 | 0.1713 | 0.1032 | 0.1526 | –
CARL^rw_SE-HSG | 0.1107 | 0.1725 | 0.1023 | 0.1534 | –
CARL^mw_SE-HSG | 0.1136 | 0.1742 | 0.1045 | 0.1551 | –

Figure 6: Parameter sensitivity in venue recommendation: (a) window size τ; (b) representation dimension d (Rec@5 and Rec@10).


4.4.2 Results. The results are reported in Table 3. From the table: (a) All variants of CARL achieve better performance than the baselines, showing the benefit of incorporating unstructured semantic content for learning author and venue representations; (b) The average improvements of CARL^mw_SE-HSG over the different baselines are significant, ranging from 18.9% to 178.7% and from 11.7% to 275.6% on AMiner-I and AMiner-II; (c) The results of the different variants of CARL are close due to the relatively small recall values. However, we find that CARL^rw_HSG-SR is the worst among the four, indicating that both meta-path walks and semantic enhanced SkipGram help improve the performance of CARL in the venue recommendation task.

4.4.3 Parameter Sensitivity. Figure 6 shows the impact of the window size τ and the feature dimension d on the performance of CARL^mw_SE-HSG on AMiner-II (T = 2013). Accordingly, CARL^mw_SE-HSG achieves the best results when τ is around 7 and d is around 128 for the venue recommendation task.

4.5 Relevance Search: Case Study (RQ2)

To answer RQ2, we present three case studies of relevance search on AMiner-II (T = 2013) to show the performance differences between CARL^mw_SE-HSG and the baselines. The ranking of each search result is based on the cosine similarity of representations.

4.5.1 Relevant Author Search. Table 4 lists the top-5 returned authors for the query author "Jure Leskovec" from CARL^mw_SE-HSG and two baselines, i.e., node2vec and metapath2vec, which achieve relatively better performance on the collaboration prediction task. According to this table: (a) most of the authors returned by node2vec have collaboration relations with "Jure" before T, indicating that node2vec depends heavily on structural closeness and cannot find relevant authors who are far away from "Jure" in the network; (b) metapath2vec returns some authors (e.g., P. Nguyen) who differ from "Jure" in their specific research interests, illustrating that "APVPA" meta-path walks (used by metapath2vec) may collect context nodes that are different from the target node, since it is common that authors bridged by the same venue study different research topics; (c) CARL^mw_SE-HSG not only returns structurally close authors who have collaboration relations with "Jure" before T but also finds farther authors (e.g., D. Romero) who share similar research interests with "Jure", demonstrating that CARL^mw_SE-HSG captures both structural closeness and unstructured semantic relations for learning author representations.

4.5.2 Relevant Paper Search. Table 5 lists the top-5 returned papers for the query paper "When will it happen?" from CARL^mw_SE-HSG and the best baseline in the paper retrieval task, node2vec. From this table: (a) all papers returned by node2vec are written by at least one author of the query paper, showing that node2vec only returns structurally close papers and has difficulty finding farther, semantically related papers; (b) CARL^mw_SE-HSG not only returns structurally close papers that share authors with the query paper but also finds semantically related papers without authorship overlap, showing that CARL^mw_SE-HSG utilizes both structural content and unstructured semantic content for learning paper representations.

4.5.3 Relevant Author-Venue Search. Table 6 lists the top-10 returned venues for the query author "Chi Wang" from CARL^mw_SE-HSG and two baselines, i.e., Deepwalk and node2vec, which have relatively better performance on the venue recommendation task. We find that: (a) Deepwalk and node2vec recommend both data mining (e.g., KDD & ICDM) and database (e.g., SIGMOD & PVLDB) venues to "Chi", since some of his works cite database papers and some of his co-authors focus on database research, which illustrates that both Deepwalk and node2vec return structurally close venues, some of which are not the most suitable ones; (b) most of the venues in CARL^mw_SE-HSG's recommendation list belong to data mining related areas, demonstrating that incorporating semantic content helps learn better representations of authors and venues.

4.6 Category Visualization of New Nodes (RQ3)

The previous experiments and case studies demonstrate the effectiveness of CARL in learning representations of current nodes, i.e., nodes that exist in the HetNet before T. As described in Section 3.4, the learned semantic encoder of CARL can be directly applied to infer the representation of each incoming node with semantic content.


Table 4: Case study of relevant author search. "Coauthor-b?" denotes whether two authors have a collaboration relation before T and "Similar-I?" represents whether two authors have similar research interests.

Query: Jure Leskovec (2009 SIGKDD Dissertation Award, Research Interest: Network Mining & Social Computing)

Rank | node2vec (Coauthor-b? / Similar-I?) | metapath2vec (Coauthor-b? / Similar-I?) | CARL^mw_SE-HSG (Coauthor-b? / Similar-I?)
1 | S. Kairam (✓ / ✓) | L. Backstrom (✓ / ✓) | J. Kleinberg (✓ / ✓)
2 | M. Rodriguez (✓ / ✓) | P. Nguyen (✗ / ✗) | D. Romero (✗ / ✓)
3 | D. Wang (✓ / ✓) | S. Hanhijärvi (✗ / ✗) | A. Dasgupta (✓ / ✓)
4 | J. Yang (✓ / ✓) | S. Myers (✓ / ✓) | L. Backstrom (✓ / ✓)
5 | A. Jaimes (✗ / ✗) | V. Lee (✗ / ✓) | G. Kossinets (✗ / ✓)

Table 5: Case study of relevant paper search.

Query: When will it happen?: relationship prediction in heterogeneous information networks (WSDM 2012, Citation > 180), A: Y. Sun, J. Han, C. Aggarwal, N. Chawla

node2vec:
1. Co-author relationship prediction in heterogeneous bibliographic networks (ASONAM 2011), A: Y. Sun, R. Barber, M. Gupta, C. Aggarwal, J. Han
2. A framework for classification and segmentation of massive audio data streams (KDD 2007), A: C. Aggarwal
3. Mining heterogeneous information networks: the next frontier (KDD 2012), A: J. Han
4. Ranking-based classification of heterogeneous information networks (KDD 2011), A: M. Ji, J. Han, M. Danilevsky
5. Evolutionary clustering and analysis of bibliographic networks (ASONAM 2011), A: M. Gupta, C. Aggarwal, J. Han, Y. Sun

CARL^mw_SE-HSG:
1. Collective prediction of multiple types of links in heterogeneous information networks (ICDM 2014), A: B. Cao, X. Kong, P. Yu
2. Community detection in incomplete information networks (WWW 2012), A: W. Lin, X. Kong, P. Yu, Q. Wu, Y. Jia, C. Li
3. Meta path-based collective classification in heterogeneous information networks (CIKM 2012), A: X. Kong, P. Yu, Y. Ding, D. Wild
4. Ranking-based classification of heterogeneous information networks (KDD 2011), A: M. Ji, J. Han, M. Danilevsky
5. Fast computation of SimRank for static and dynamic information networks (EDBT 2010), A: C. Li, J. Han, G. He, X. Jin, Y. Sun, Y. Yu, T. Wu

Table 6: Case study of relevant author-venue search.

Query: Chi Wang (2015 SIGKDD Dissertation Award, Research Interest: Unstructured Data/Text Mining)

Rank | Deepwalk | node2vec | CARL^mw_SE-HSG
1 | KDD | KDD | KDD
2 | CIKM | WSDM | CIKM
3 | DASFAA | SIGMOD | ICDM
4 | SIGMOD | PVLDB | PAKDD
5 | ICDM | WWW | WWW
6 | WSDM | ICDM | WWWC
7 | ICDE | EDBT | TKDE
8 | PVLDB | PAKDD | KAIS
9 | WWW | GRC | DASFAA
10 | EDBT | FPGA | WSDM

Besides, CARL's online update module can efficiently learn the representation of each new node without semantic content. To answer RQ3 and show the effectiveness of the learned new node representations, we employ the TensorFlow embedding projector to visualize, in sequence, the new paper representations inferred by the paper semantic encoder and the new author representations learned by the online update module. Figure 7 shows the results for new paper and author nodes (appearing after T) of four selected research categories, i.e., Data Mining (DM), Computer Vision (CV), Human Computer Interaction (HCI) and Theory, on AMiner-II (T = 2013). Specifically, we choose three top venues2 for each area. Each new paper is assigned to its venue's area, and the category of each author is assigned to the area with the majority of his/her publications. We randomly sample 100 new papers/authors of each area.

2 DM: KDD, WSDM, ICDM. CV: CVPR, ICCV, ECCV. HCI: CHI, CSCW, UIST. Theory: SODA, STOC, FOCS.

Figure 7: Representation visualizations of new paper and author nodes in four selected research categories: (a) visualization of new papers (2D and 3D); (b) visualization of new authors (2D and 3D).

According to Figure 7: (a) The representations of new papers in the same category cluster closely and can be well discriminated from the others in both the 2D and 3D visualizations, indicating that the semantic encoder achieves satisfactory performance in inferring semantic representations of new papers. Notice that papers belonging to DM have a few intersections with the other three categories, since the semantic content of a few DM papers is quite similar to that of CV, HCI and Theory papers w.r.t. model, application and theoretical basis, respectively. (b) The representations of new authors in the same category are clearly discriminated from the others without intersection in both the 2D and 3D visualizations, which demonstrates the effectiveness of the online update module in learning new author representations.

5 RELATED WORK

In the past decade, many works have been devoted to mining HetNets for different applications, such as relevance search [2, 7, 16, 28], node clustering [17, 18] and personalized recommendation [14, 26, 27].

Network representation learning has gained a lot of attention in the last few years. Walk sampling based models [4, 5, 12] have been proposed to learn vectorized node representations that can be further utilized in various network mining tasks. Specifically, inspired by word2vec [10] for learning distributed representations of words in a text corpus, Perozzi et al. developed the innovative Deepwalk [12], which introduces the node-context concept in networks (in analogy to word-context) and feeds a set of random walks over the network (in analogy to "sentences") to SkipGram for learning node representations. In order to deal with neighborhood diversity, Grover & Leskovec suggested taking biased random walks (a mixture of BFS and DFS) as the input of SkipGram [5]. More recently, we argued that those models are not able to truly tackle network heterogeneity and proposed metapath2vec [4] for heterogeneous network representation learning by feeding meta-path walks to SkipGram. In addition, many other models have been proposed [1, 11, 13, 19, 20, 22, 24, 25], such as PTE [19] for text data embedding, HNE [1] for image-text data embedding and NetMF [13] for unifying network representation models as matrix factorization.

Our work furthers the investigation of network representation learning by developing CARL, a content-aware representation learning model for HetNets. Unlike previous models, CARL leverages both structural closeness and unstructured semantic relations to learn node representations, and contains an online update module for learning representations of new nodes.

6 CONCLUSION

In this paper, we formalize the problem of content-aware representation learning in HetNets and propose a novel model, CARL, to solve it. CARL performs joint optimization of heterogeneous SkipGram and deep semantic encoding to capture both structural closeness and unstructured semantic relations in a HetNet. Furthermore, an online update module is designed to efficiently learn representations of new nodes. Extensive experiments demonstrate that CARL outperforms state-of-the-art baselines in various HetNet mining tasks, such as link prediction, document retrieval, node recommendation and relevance search. Besides, the online update module achieves satisfactory performance, as reflected by the category visualization of new nodes. In the future, we plan to design dynamic heterogeneous network representation learning models that use the time series information of nodes and links.

REFERENCES

[1] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, and Thomas S. Huang. 2015. Heterogeneous network embedding via deep architectures. In KDD. 119–128.
[2] Ting Chen and Yizhou Sun. 2017. Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification. In WSDM. 295–304.
[3] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv:1406.1078 (2014).
[4] Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD. 135–144.
[5] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD. 855–864.
[6] Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast matrix factorization for online recommendation with implicit feedback. In SIGIR. 549–558.
[7] Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, and Xiang Li. 2016. Meta structure: Computing relevance in large heterogeneous information networks. In KDD. 1595–1604.
[8] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980 (2014).
[9] Xiaozhong Liu, Yingying Yu, Chun Guo, and Yizhou Sun. 2014. Meta-path-based ranking with pseudo relevance feedback on heterogeneous graph for citation recommendation. In CIKM. 121–130.
[10] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS. 3111–3119.
[11] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric transitivity preserving graph embedding. In KDD. 1105–1114.
[12] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD. 701–710.
[13] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM.
[14] Xiang Ren, Jialu Liu, Xiao Yu, Urvashi Khandelwal, Quanquan Gu, Lidan Wang, and Jiawei Han. 2014. Cluscite: Effective citation recommendation by information network-based clustering. In KDD. 821–830.
[15] Yizhou Sun, Jiawei Han, Charu C. Aggarwal, and Nitesh V. Chawla. 2012. When will it happen?: relationship prediction in heterogeneous information networks. In WSDM. 663–672.
[16] Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. VLDB 4, 11 (2011), 992–1003.
[17] Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip Yu, and Xiao Yu. 2012. PathSelClus: Integrating Meta-Path Selection with User-Guided Object Clustering in Heterogeneous Information Networks. In KDD. 1348–1356.
[18] Yizhou Sun, Yintao Yu, and Jiawei Han. 2009. Ranking-based clustering of heterogeneous information networks with star network schema. In KDD. 797–806.
[19] Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. Pte: Predictive text embedding through large-scale heterogeneous text networks. In KDD. 1165–1174.
[20] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW. 1067–1077.
[21] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnetminer: extraction and mining of academic social networks. In KDD. 990–998.
[22] Ke Tu, Peng Cui, Xiao Wang, Fei Wang, and Wenwu Zhu. 2018. Structural Deep Embedding for Hyper-Networks. In AAAI.
[23] Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, and Jingyi Guo. 2010. Mining advisor-advisee relationships from research publication networks. In KDD. 203–212.
[24] Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In KDD. 1225–1234.
[25] Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. GraphGAN: Graph Representation Learning with Generative Adversarial Nets. In AAAI.
[26] Carl Yang, Lanxiao Bai, Chao Zhang, Quan Yuan, and Jiawei Han. 2017. Bridging Collaborative Filtering and Semi-Supervised Learning: A Neural Approach for POI Recommendation. In KDD. 1245–1254.
[27] Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. 2014. Personalized entity recommendation: A heterogeneous information network approach. In WSDM. 283–292.
[28] Chuxu Zhang, Chao Huang, Lu Yu, Xiangliang Zhang, and Nitesh Chawla. 2018. Camel: Content-Aware and Meta-path Augmented Metric Learning for Author Identification. In WWW.