community detection in weighted temporal text networks detection inweight… · temporal community...

9
Community Detection in Weighted Temporal Text Networks Yuting Jia, Ying Huang, Yaowei Huang, Bo Wang, Xinbing Wang Shanghai Jiao Tong university {hnxxjyt,hy941001,14330222150355,wb1996,xwang8}@sjtu.edu.cn ABSTRACT e community detection in real-life text networks, the majority of which are weighted temporal text networks, is confronted with two main problems – how to model the weight of edges and how to exploit the temporal information. In this paper, we oer a new perspective to encode the edge weight and temporal information. A probabilistic generative model, named Custom Temporal Community Detection (CTCD) is intro- duced, which views the link between two nodes as a weighted edge with several time stamps. Our model utilizes network, semantic and temporal information simultaneously to extract temporal com- munity aliations for individual user, inuence strength across communities and temporal interested topic in each community. An ecient inference method and corresponding parallel implementa- tion are proposed to scale to large datasets. rough the knowledge extracted by CTCD, we are able to spot the community shi of the individual user and track the develop- ment of the communities over time. Moreover, experiments on two large-scale temporal text networks show that CTCD achieves signicant improvement over state-of-the-art methods on the task of user behavior prediction. CCS CONCEPTS Information systems Clustering; Social networks; Computing methodologies Topic modeling; Mathematics of comput- ing Bayesian networks; Networks Social media networks; KEYWORDS temporal community detection, topic model, data mining, social network ACM Reference format: Yuting Jia, Ying Huang, Yaowei Huang, Bo Wang, Xinbing Wang. 2016. Community Detection in Weighted Temporal Text Networks. In Proceedings of ACM Conference, Washington, DC, USA, July 2017 (Conference’17), 9 pages. DOI: 10.1145/nnnnnnn.nnnnnnn 1 INTRODUCTION From blogs to video-sharing sites to social networks and still oth- ers, online social media have experienced rapid growth over the past half-decade [17]. To extract meaningful knowledge from the available large amounts of data, the network structure inside it proves to be of utmost importance [12]. One way to understand Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the full citation on the rst page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Conference’17, Washington, DC, USA © 2016 Copyright held by the owner/author(s). 978-x-xxxx-xxxx-x/YY/MM. . . $15.00 DOI: 10.1145/nnnnnnn.nnnnnnn network is to identify groups of nodes which share same properties or functions, which is known as communit detection [27]. Detect- ing network communities allows us to discover functionally related objects [13, 14, 25], infer missing aribute values [1, 8], and predict unobserved connections [5]. Most of the real-life networks we are interested in are weighted temporal text networks, such as online social media (e.g. Twier) and the academic citation/coauthor network (e.g. DBLP digital library), as shown in 1(a). e nodes, which in real-life network oen represent users, are associated with several posts. ese posts can be tweets in Twier, papers published in conference and journal etc. e edge between users is associated with a weight. Naturally, each post is designated with a time stamp which indicates the publish time. In addition, each link is also characterized by several time stamps illustrating the time of interactions, which is detailedly formulated in §3.1. However, it is dicult to comprehensively model such networks. In terms of edge weight, for simplication, existing methods oen treat edges equally [18, 22, 25]. And in the few works which take it into consideration, the edge weight is only used to calculate and maximize/minimize some measure of graph, by which the community can be detected [6, 10, 21]. However, the edge weight is an important feature of the network and supposed to be explicitly involved into the model. erefore, we intend to introduce the edge weight into the generative model. Another issue is how to utilize the temporal information. A large amount of works divide the period of time into slices articially and focus on the subnetwork in each time slice [7, 10, 19]. However, time is intrinsically continuous so that the determination of the slice size is tricky. Instead of discretizing time, we employ continuous distribution to model the time aribute of both posts and links. In order to tackle the problems mentioned above within a uni- ed framework, we propose a probabilistic generative model based on Bayesian network, named Custom Temporal Community De- tection (CTCD), which models the edge weight, post time stamps and interaction time stamps directly. Taking both weight infor- mation and temporal information in to account, CTCD is able to extract temporal community aliations for each user, the inuence strength across communities and temporal interested topics in each community at the same time. e expected output of CTCD is demonstrated in 1(b). An ecient Gibbs-sampling-based inference algorithm is designed and corresponding parallel implementation for adapting to large-scale social networks is introduced. Besides, our approach is proved to achieve high accuracy for prediction task. And the extracted knowledge reveals interesting developing paern of communities in the network. To summarize, our contributions are as follows: 1. Novel Perspective. We propose a new idea that model mul- tiple interactions between two users as a weighted edge with two bags of time stamps, which are generated within a topic model

Upload: others

Post on 30-Sep-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Community Detection in Weighted Temporal Text Networks Detection inWeight… · temporal community detection, topic model, data mining, social network ACM Reference format: Yuting

Community Detection in Weighted Temporal Text NetworksYuting Jia, Ying Huang, Yaowei Huang, Bo Wang, Xinbing Wang

Shanghai Jiao Tong university{hnxxjyt,hy941001,14330222150355,wb1996,xwang8}@sjtu.edu.cn

ABSTRACT�e community detection in real-life text networks, the majorityof which are weighted temporal text networks, is confronted withtwo main problems – how to model the weight of edges and howto exploit the temporal information.

In this paper, we o�er a new perspective to encode the edgeweight and temporal information. A probabilistic generative model,named Custom Temporal Community Detection (CTCD) is intro-duced, which views the link between two nodes as a weighted edgewith several time stamps. Our model utilizes network, semanticand temporal information simultaneously to extract temporal com-munity a�liations for individual user, in�uence strength acrosscommunities and temporal interested topic in each community. Ane�cient inference method and corresponding parallel implementa-tion are proposed to scale to large datasets.

�rough the knowledge extracted by CTCD, we are able to spotthe community shi� of the individual user and track the develop-ment of the communities over time. Moreover, experiments ontwo large-scale temporal text networks show that CTCD achievessigni�cant improvement over state-of-the-art methods on the taskof user behavior prediction.

CCS CONCEPTS•Information systems→Clustering; Social networks; •Computingmethodologies →Topic modeling; •Mathematics of comput-ing →Bayesian networks; •Networks →Social media networks;

KEYWORDStemporal community detection, topic model, data mining, socialnetworkACM Reference format:Yuting Jia, Ying Huang, Yaowei Huang, Bo Wang, Xinbing Wang. 2016.Community Detection in Weighted Temporal Text Networks. In Proceedingsof ACM Conference, Washington, DC, USA, July 2017 (Conference’17), 9 pages.DOI: 10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTIONFrom blogs to video-sharing sites to social networks and still oth-ers, online social media have experienced rapid growth over thepast half-decade [17]. To extract meaningful knowledge from theavailable large amounts of data, the network structure inside itproves to be of utmost importance [12]. One way to understand

Permission to make digital or hard copies of part or all of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor pro�t or commercial advantage and that copies bear this notice and the full citationon the �rst page. Copyrights for third-party components of this work must be honored.For all other uses, contact the owner/author(s).Conference’17, Washington, DC, USA© 2016 Copyright held by the owner/author(s). 978-x-xxxx-xxxx-x/YY/MM. . .$15.00DOI: 10.1145/nnnnnnn.nnnnnnn

network is to identify groups of nodes which share same propertiesor functions, which is known as community detection [27]. Detect-ing network communities allows us to discover functionally relatedobjects [13, 14, 25], infer missing a�ribute values [1, 8], and predictunobserved connections [5].

Most of the real-life networks we are interested in are weightedtemporal text networks, such as online social media (e.g. Twi�er)and the academic citation/coauthor network (e.g. DBLP digitallibrary), as shown in 1(a). �e nodes, which in real-life networko�en represent users, are associated with several posts. �ese postscan be tweets in Twi�er, papers published in conference and journaletc. �e edge between users is associated with a weight. Naturally,each post is designated with a time stamp which indicates thepublish time. In addition, each link is also characterized by severaltime stamps illustrating the time of interactions, which is detailedlyformulated in §3.1.

However, it is di�cult to comprehensively model such networks.In terms of edge weight, for simpli�cation, existing methods o�entreat edges equally [18, 22, 25]. And in the few works which takeit into consideration, the edge weight is only used to calculateand maximize/minimize some measure of graph, by which thecommunity can be detected [6, 10, 21]. However, the edge weight isan important feature of the network and supposed to be explicitlyinvolved into the model. �erefore, we intend to introduce the edgeweight into the generative model.

Another issue is how to utilize the temporal information. A largeamount of works divide the period of time into slices arti�ciallyand focus on the subnetwork in each time slice [7, 10, 19]. However,time is intrinsically continuous so that the determination of the slicesize is tricky. Instead of discretizing time, we employ continuousdistribution to model the time a�ribute of both posts and links.

In order to tackle the problems mentioned above within a uni-�ed framework, we propose a probabilistic generative model basedon Bayesian network, named Custom Temporal Community De-tection (CTCD), which models the edge weight, post time stampsand interaction time stamps directly. Taking both weight infor-mation and temporal information in to account, CTCD is able toextract temporal community a�liations for each user, the in�uencestrength across communities and temporal interested topics in eachcommunity at the same time. �e expected output of CTCD isdemonstrated in 1(b). An e�cient Gibbs-sampling-based inferencealgorithm is designed and corresponding parallel implementationfor adapting to large-scale social networks is introduced. Besides,our approach is proved to achieve high accuracy for predictiontask. And the extracted knowledge reveals interesting developingpa�ern of communities in the network.

To summarize, our contributions are as follows:1. Novel Perspective. We propose a new idea that model mul-

tiple interactions between two users as a weighted edge with twobags of time stamps, which are generated within a topic model

Page 2: Community Detection in Weighted Temporal Text Networks Detection inWeight… · temporal community detection, topic model, data mining, social network ACM Reference format: Yuting

Conference’17, July 2017, Washington, DC, USA Y. Jia et al.

1 2:{ , ,..., , }Dw w w t

1 1 2 2:{(t , t ), ( , t ),..., ( , t )}e et t

time

affi

liatio

n

prob

abili

ty

edge weight

users communities communities topics

time

inte

rest

User’s varying community affiliation strength over time

Edge weight distribution of pairs of communities

Community’s varying topic interest over time

(a) (b)Figure 1: �e overview of CTCD: (a) Input weighted temporal text network. (b) Knowledge extracted by CTCD.

framework. By modeling time in this way, our model is able to ex-tract temporal community a�liation strength for each user, but notstatic community a�liation strength in nearly all existing models.

2. Comprehensive Model. Our model incorporates all seman-tic, network and temporal information. Temporal community a�li-ations for each user, the in�uence strength across communities andtemporal interested topics can be uncovered at the same time. Withthe knowledge extracted, we are able to spot the community shi� ofthe individual user and track the development of the communitiesover time.

3. Scalable Inference. We propose an inference method usingGibbs sampling which reaches linear complexity. �en to adapt tolarge-scale social dataset, we develop the parallel implementationand propose several methods to deal with the synchronizationproblem between processes.

�e rest of this paper is organized as follows: §2 summarizes therelated literature; §3 formulates the problem and introduces themodel. §4 describes the inference method and the parallel imple-mentation. §5 develops some user behavior prediction approaches.§6 evaluates the model proposed. And �nally, we conclude thiswork in §7.

2 RELATEDWORK2.1 Weighted NetworkMost of real-life networks are weighted network. For example,in social networks, the user enjoys di�erent degree of intimacyof relations with friends in his contact list and therefore the linkis expected to be characterized with various weight. However,it is di�cult to model such weight of the edge. To simplify theanalysis or design, they are o�en formulated as two-value networks(1 denotes edge existence, otherwise 0) in previous communitydetection works [18, 22, 25], which leads to loss of information.Recently, there has been some researches on community detectionin weighted networks [6, 15, 21]. But all these methods are basedon local property of network like cut, node-strength or local updatelike label propagation. We introduce a new method which considersboth edge weight and global property of network in CTCD. By theempirical result of comparison between CTCD and correspondingunweighted version model, CTCD-Ber, we can get a view of theimprovement brought by taking edge weight into account.

2.2 Temporal Community DetectionTemporal community detection is an important branch of com-munity detection research [2, 4, 10], which aims to identify theprocess that communities emerge, grow, combine, and decay overtime. Most existing works separate the whole network into severalsubnetworks by spli�ing time into slices, then exploit some stati-cal community detection method on each subnetwork and �nallymerge sub-results with some smoothness [7, 10, 19]. However, dis-cretization of time always begs the question of selecting the slicesize, and inevitably the size is too small for some regions and toolarge for others.

In this paper, we encode the time stamps of interactions betweenusers as the a�ribute of edges and model them in a generative andcontinuous manner.

3 CUSTOM TEMPORAL COMMUNITYDETECTION METHOD

3.1 Problem FormulationNotation. In this paper, all vectors are column vectors and denotedby lower-case bold le�ers like θ and π . Calligraphic le�ers are usedto represent sets likeU andV . And for simplicity, the correspond-ing normal le�ers are used to denote their size like V = |V |. �edetails of notations in this paper are listed in table 1.

De�nition 3.1. (WeightedTemporal Network) Consider a real-life network G = (U , E) whereU is a set ofU users and E is a setof E weighted edges.

A directed edge (i, i ′, eii′ , s, s′) ∈ E means the existence of in-

teraction(s) between user i and i ′, the weight of the interaction(s) iseii′ , and this edge contains two bags of time stamps, s and s ′, whichrepresent the out and in time stamps of each interaction betweenuser i and i ′ respectively. In this paper, we set the weight as thenumber of interactions (i.e. the number of pairs of time stamps ofinteractions) between two users. Correspondingly, if there is noany interaction between user i and i ′, eii′ equals to zero and thesetwo bags of time stamps are both empty.

Note that interaction represents di�erent connecting activitiesin di�erent networks. One interaction means one retweeting, coau-thoring or citing behavior. Between two users there may be severalinteractions. For each interaction, there will be two time stamps,

Page 3: Community Detection in Weighted Temporal Text Networks Detection inWeight… · temporal community detection, topic model, data mining, social network ACM Reference format: Yuting

Community Detection in Weighted Temporal Text Networks Conference’17, July 2017, Washington, DC, USA

Table 1: Notations used in this paperSymbol Description

U, C, K, V number of users, communities, and topics and size of vocabularyD ,W , E , S number of posts, words, edges, and interactions(pairs of time

stamps)Di , Ei number of posts by user i , and out edges from user i

Wi j , Sii′ number of words in post j by user i , and interactions between useri and i′

di j the jth post of user ieii′ the weight of the edge between user i and i′ti j the time stamp of post di j

sii′m , s ′ii′m the ‘out ’ and ‘in’ time stamp of mth interaction in edge eii′wi jl the l th word in post di jci j , zi j community and topic associated with post di jдii′ , д′ii′ communities associated with user i and i′ in edge eii′π i multinomial distribution over communities speci�c to user iθ c multinomial distribution over topics speci�c to community cϕk multinomial distribution over words speci�c to topic kψck Beta distribution over time speci�c to community c and topic kηcc′ Poisson distribution over edge weight from community c to c ′δiд Beta distribution over out time speci�c to user i and community дγi′д′ Beta distribution over in time speci�c to user i′ and community д′

out and in, and they may be di�erent. For example, when investi-gating the citing interaction as shown in Figure 2, we de�ne thedirection of the edge from the cited user to the citing user. �eout time stamp, s , denotes the publish time of the cited paper and,correspondingly, the in time stamp, s ′, represents the publish timeof the citing paper. �is se�ing is based on the fact that in citinginteractions, it is user i at time s who in�uences user i ′ at time s ′.So the time stamp of this interaction should not be only s ′. How-ever, in some other kinds of interaction activities like coauthoring,the two time stamps of one interaction are supposed be the same,which are both the publish time of the collaborated paper.

t0

t1

t2t3

t4

t5t6 3

1

2

21

t1

t5

t5,t5,t6

t0,t2,t0

t5,t6

t4,t4t1,t2

t3,t3

t0

t2

s s'iiei iedge:

Figure 2: �e example of citation interaction modeling

De�nition 3.2. (Semantic Information) Each user is associatedwith Di posts, denoted as Di . And each post di j is a bag of wordsalong with a time stamp ti j .

In text network, the available post content information enablesus to extract semantic topics as well as detecting communities. �us,a community can be characterized not only by link structures, butalso by its topic composition. Associating each community with adistribution over topics makes the communities more interpretable.�erefor we de�ne communities as follows:

De�nition 3.3. (Community) A community c ∈ C has two com-ponents: (1) a multinomial distribution over topics, θc , where eachθck represents the probability that community c publishes posts

covering topic k ; (2) ηc , where each ηcc ′ , containing two parame-ters ηcc ′0 and ηcc ′1, represents a Poisson distribution of the weightof edges from community c to community c ′.

In the real life networks, users generally enjoy multiple rolesand get involved in di�erent communities [24]. �erefore a mixed-membership approach is adopted. Furthermore, the user’s mem-bership of communities is not invariable. �e participation andin�uence of one user in a speci�c community constantly changes,and a distribution over time is assigned to each community userpair to depict the dynamic feature of the the membership. for eachuser i , (1) there is a multinomial distribution over communities, π i ,where each πic represents the a�liation strength from user i tocommunity c; (2) for each community c , there are two Beta distri-butions over time, δic and γic , which model the change of user i’sin�uence degree and participation degree over time in communityc .

De�nition 3.4. (Topic) A topic k ∈ K is a multinomial distribu-tion over a collection of words, ϕk , where each ϕkw represents theprobability for word w generated from topic k .

De�nition 3.5. (Community-Topic Temporal Distribution)For each community c and topic k , the popularity of topic k incommunity c obey a Beta distribution over time,ψck .

3.2 Model StructureTo integrate the text, time and network information into a uni�edframework, we propose the probabilistic generative model, CTCD(Custom Temporal Community Detection). �e graphical modelrepresentation of CTCD is shown in 3. �ough the model seemscomplicated on the whole, it consists of three simple component inessential : the user a�liation component models users’ a�liationto communities which act as bridge connecting the other two com-ponents, the text-time component discovers the topics and acquiresvariation of topic popularity within communities, the network-timecomponent considers the link structure and spots the membershipchange of users.

3.2.1 User a�iliation component. Generally, users belong to mul-tiple communities in the real world. We associate each user i with aprobability vector π i , which denotes the probability distribution ofuser i’s a�liation to communities. For each post di j ∈ D, we assignit to a community ci j , which indicates the a�liation of user i whenhe publishes his jth post. And for each edge eii′ ∈ E, we associateit with two communities дii′ and д′ii′ , which respectively representsthe a�liations of user i and i ′ when the interaction happens.

3.2.2 Text-time component. Each post di j contains a bag ofwords {wi j1,wi j2, ...,wi j |di j | }, where |di j | represents the lengthof post di j . Traditional topic models assume that a document isa mixture of topics and each words in the document is generatedfrom a hidden topic. It’s necessary to view the document as a topicmixture when the it is long and possible to exhibit several di�erenttopics. However, when the posts are short texts, such as tweets,micro-blogs and titles of papers, they are usually about a singletopic and assign the posts with one topic is enough.

Besides a bag of words, each post di j also has a time stamp ti j .To model this temporal information, we employ a Beta distributionψck over time to model the temporal variation speci�c to each

Page 4: Community Detection in Weighted Temporal Text Networks Detection inWeight… · temporal community detection, topic model, data mining, social network ACM Reference format: Yuting

Conference’17, July 2017, Washington, DC, USA Y. Jia et al.

i

ijcijz

z

ijlwijt

ck

k

'iig

'iie

'miis 'mii

s

ig

'g'i

cc '

U

iDijW

iE

CK

K

C 2C

'iiT

UC

User affiliationcomponent

Text-timecomponent

Network-timecomponent

'iig

Figure 3: Graphical Model Representation of CTCD. �e la-tent variable д′ is represented by dashed circle because it isdrawn from πi , which is not shown in graph.

community c and each topic k . �us, the publishing time of postdi j , ti j , is sampled from Beta distributionψci jzi j .

3.2.3 Network-time component. Each edge (i, i ′) contains theweight eii′ and two bags of time stamps, sii′ and s ′ii′ , which respec-tively represent the out and in time stamps of interactions betweenuser i and i ′. We use pairwise community Poisson distribution η tomodel the weight of edges between pairs of users. For each edge(i, i ′): (1) the weight eii′ is drawn from ηдii′д′ii′

which representsthe in�uence strength between community дii′ and д′ii′ ; (2) eachout time stamp sii′m is sampled from Beta distribution δiд ; (3) andeach in time stamp s ′ii′m is sampled from Beta distribution γi′д′ .

Social networks are usually sparse, so we only model positiveedges for speedup: the variables дii′ and д′ii′ exist if and only if(i, i ′) ∈ E, i.e., eii′ > 0. As in [17], the negative edges can beavoided by a Bayesian fashion: we use a Gamma prior(λ0, λ1) oneach ηcc ′ , and set λ0 = 0.1 and λ1 = K · ln(Nneд/C

2), whereNneд = U (U − 1) − |E| represents the number of negative edgesand K represents a tunable weight. By the Bayesian fashion, thetime complexity is reduced to be linear with the number of edges.

3.3 Generative Process�e whole generative process is summarized in Algorithm 1. Foruser i , when he publishes a post di j , �rst he selects a communityci j by his community distribution π i , and then selects a topic zi jby topic interest θci j speci�c to community ci j . With the selectedcommunity ci j and topic zi j , words are generated from the topic’sword distribution ϕzi j , and the time stamp is generated from tem-poral distributionψci jzi j speci�c to ci j and zi j . On the other hand,when he interacts with another user i ′, �rst they select commu-nities дii′ and д′ii′ respectively, and then the weight of edge (i, i ′),eii′ , is drawn from community-level in�uence strength distribu-tion ηдii′д′ii′ . With the selected communities, each out time stampsii′m is sampled from temporal in�uence strength δiдii′ speci�c touser i and community дii′ , and each in time stamp s ′ii′m is sampled

Algorithm 1 Generative Process for CTCD1. For each community c=1,2,…,C,

(1) Sample the distribution over topics, θc |α ∼ Dir (α ).(2) For each community c’=1,2,…,C,

(a) Sample the probability of community-community link,ηcc′ |λ ∼ Gamma (λ0, λ1 ).

(3) For each topic k=1,2,…,K,(a) Sample the distribution over time stamps of document,

ψck |ϵ ∼ Beta (ϵ0, ϵ1 ).2. For each topic k=1,2,…,K,

(1) Sample the distribution over words, ϕk |β ∼ Dir (β ).3. For each user i=1,2,…,U,

(1) Sample the distribution over communities, πi |ρ ∼ Dir (ρ ).(2) For each document j=1,2,…,

(a) Sample community indicator, ci j |πi ∼ Mul (πi ).(b) Sample topic indicator, zi j |θci j ∼ Mul (θci j ).(c) Sample time stamp, ti j |ψci j zi j ∼ Beta (ψci j zi j 0, ψci j zi j 1 ).(d) For each word l=1,2,…,

(i) Sample word, wi jl |ϕzi j ∼ Mul (ϕzi j ).(3) For each out edge (i, i’),

(a) Sample community indicator, дii′ |πi ∼ Mul (πi ).(b) Sample community indicator, д′ii′ |πi′ ∼ Mul (πi′ ).(c) Sample edge weight, eii′ |ηдii′д′ii′

∼ Poi (ηдii′д′ii′).

(d) For each pair of time stamp m=1,2,…,(i) Sample out time stamp,

sii′m |δiдii′ ∼ Beta (δiдii′ 0, δiдii′ 1 ).(ii) Sample in time stamp, s ′ii′m |γiд′ii′

∼ Beta (γiд′ii′

0, γiд′ii′

1 ).

from temporal participation strength γi′дii′ speci�c to user i ′ andcommunity д′ii′ .

4 INFERENCE & IMPLEMENTATION4.1 Inference MethodExact inference for CTCD is di�cult because of intractable latentvariables, so we adopt collapsed Gibbs sampling method [16] toachieve an approximate inference. Gibbs sampling is a Markovchain Monte Carlo (MCMC) algorithm for obtaining a sequence ofobservations which are approximated from a speci�ed multivariateprobability distribution, when direct sampling is di�cult. As withother MCMC algorithms, Gibbs sampling generates a Markov chainof samples of latent variables, like c and z in CTCD, each of whichis correlated with nearby samples. And then the samples can beused to infer the distribution of interests, i.e., {π ,θ ,ϕ,η} in CTCD,while Beta distributions {ψ,δ ,γ } are ��ed in inference process.

�e whole inference process is summarized as Algorithm 2. Inthe sampling part of each iteration of Gibbs sampler, for each postdi j , sample both the corresponding community indicator ci j andtopic indicator zi j . And for each edge (i, i ′), sample the correspond-ing community indicators дii′ and д′ii′ . In the ��ing part, �t theBeta distributionsψ according to the sampled c and z, and �t theBeta distributions δ and γ according to the sampled д and д′. Dueto the space limitation, we only show the sampling and ��ingformulas here, omi�ing the details of derivation process.

4.1.1 Simpling ci j for post di j .

P (ci j = c |zi j = k, ti j = t, c−i j , д, z−i j , t −i j , .)

∝n (c )i + ρ

n (·)i +Cρ

n (k )c + α

n (·)c + Kα

tψck0−1i j · (1 − ti j )ψck1−1

B (ψck0, ψck1)

(1)

where n(c )i denotes the number of posts and edges (both in and out)of user i associated with community c; n(k )c denotes the number

Page 5: Community Detection in Weighted Temporal Text Networks Detection inWeight… · temporal community detection, topic model, data mining, social network ACM Reference format: Yuting

Community Detection in Weighted Temporal Text Networks Conference’17, July 2017, Washington, DC, USA

Algorithm 2 Inference for CTCD1: initialize community and topic assignment randomly for all tokens2: for iter = 1 to Niter do3: // sampling part start4: for i = 1 to U do5: for j = 1 to Di do6: draw ci j from P (ci j |zi j = k, ti j = t, c−i j , д, z−i j , t −i j , .)7: draw zi j from P (zi j |ci j = c, ti j = t, c−i j , z−i j , t −i j , w , .)8: for i′ = 1 to U do9: if eii′ > 0 then

10: draw дii′ and д′ii′ from P (дii′, д′ii′ |eii′, д−ii′, c, e, .)11: // sampling part end12: // ��ing part start13: for c = 1 to C do14: for k = 1 to K do15: updateψck by Eq.(4)16: for i = 1 to U do17: for c = 1 to C do18: update δic by Eq.(5)19: update γic by Eq.(6)20: // ��ing part end21: compute the posterior estimates of π , θ , ϕ and η

of posts associated with both community c and topic z; B (x ,y) =∫ 10 tx−1 (1 − t )y−1dt denotes the beta function. �e marginal coun-

ters are represented by dots, e.g., n( ·)i denotes the number of postsand edges (both in and out) of user i associated with any community.All the counters are counted with the post di j excluded.

4.1.2 Sampling zi j for post di j .

P (zi j = k |ci j = c, ti j = t, c−i j , z−i j , t −i j , w , .)

∝n (k )c + α

n (·)c + Kα

tψck0−1i j · (1 − ti j )ψck1−1

B (ψck0, ψck1)

∏Vv=1∏n (v )

i jq=0 (n (v )

k + q + β )

∏n (·)i j

q=0 (n(·)k + q +V β )

(2)

where v denotes the vth word in vocabulary; V denotes the sizeof vocabulary; n(v )i j denotes the number of words v in post di j ;

n(v )k denotes the number of words v associated with topic k . �e

marginal counters are represented by dots and all the counters arecounted with the post di j excluded.

4.1.3 Sampling дii′ and д′ii′ for edge (i, i′).

P (дii′ = c, д′ii′ = c

′ |eii′, д−ii′, c, e, .)

∝n (c )i + σ

n (·)i +Cσ

n (c′)i′ + ρ

n (·)i′ +Cρ

(λ1 + ncc′ )(λ0+nscc′ )

∏eii′−1m=0 (λ0 + nscc′ +m)

(λ1 + ncc′ + 1) (λ0+nscc′+eii′ )

∏sii′m

sδic0−1ii′m (1 − sii′m )δic1−1

B (δic0, δic1)

∏s′ii′m

s′γi′c′0−1ii′m (1 − s′ii′m )γi′c′1−1

B (γi′c′0, γi′c′1)

(3)where ncc ′ and nscc ′ respectively denote the number of edges andinteractions associated with community c and c ′ with edge (i, i ′)excluded.

4.1.4 Fi�ingψck for community c and topic k .

ψck0 =mck (mck (1 −mck )

v2ck

− 1);ψck1 = (1−mck ) (mck (1 −mck )

v2ck

− 1)

(4)wheremck and v2

ck denote the mean and variance of time stampsof posts associated with community c and topic v .

4.1.5 Fi�ing δic and γic for user i and community c .

δic0 =mic (mic (1 −mic )

v2ic

−1); δic1 = (1−mic ) (mic (1 −mic )

v2ic

−1) (5)

where mic and v2ic denote the mean and variance of out time

stamps of user i associated with community c .

γic0 =m′ic (m′ic (1 −m

′ic )

v ′2ic−1);γic1 = (1−m′ic ) (

m′ic (1 −m′ic )

v ′2ic−1) (6)

wherem′ic andv ′2ic denote the mean and variance of in time stampsof user i associated with community c .

With the samples, then we can infer the distribution of interests,i.e., {π ,θ ,ϕ,η} in CTCD. For any single sample, we can estimatethe unknown distributions as follows:

πic =n (c )i + ρ

n (·)i +Cρ

; θck =n (k )c + α

n (·)c + Kα

; ϕkw =n (w )k + β

n (·)k +V β

(7)

However, the distribution η cannot be estimated like the others.�e distributions likeπ can be inferred as Eq.(7) because of the prop-erty of multinomial distribution (i.e. πic = P (ci (Di+1) = c |c,д, i )where ci (Di+1) denotes the community indicator for an unseen postof user i), while η is the Poisson distribution which can only berepresented as the formula:

P (e |дe = c, д′e = c′, д, д′, λ)

=(λ1 + ncc′ )

(λ0+nscc′ )∏e−1m=0 (λ0 + nscc′ +m)

(λ1 + ncc′ + 1) (λ0+nscc′+e )

(8)

where e is an unseen edge; дe and д′e are the two community indi-cators of edge e .

4.2 Time ComplexityNow we analyze the time complexity of our inference algorithm. Itis shown that the time complexity of the whole inference algorithmis O (W + S ), which means that this algorithm scales linearly w.r.t.the size of dataset, i.e. the number of words and interactions. �ereare two parts of the whole algorithm, sampling and ��ing, whichwill be analyzed separately.

Firstly we analyze the sampling part. To sample communityindicator ci j , Eq.(1) need to be calculated. Because all the counterscan be counted in advance and updated in a constant time wheneach ci j is resampled, with all the counters, Eq.(1) can be calculatedin constant time. Sampling all c takes O (D) time. But to sampletopic indicator zi j , when calculating Eq.(2), linear time in termsof the length of post di j costs. So sampling all z takes O (W ) time.Finally we sample community indicator дii′ and д′ii′ according toEq.(3). Calculating Eq.(3) costs O (eii′ + 2Sii′ ) time, which can bealso wri�en as O (Sii′ ). Because all the negative edges are modeledby a Bayesian fashion, the time for sampling all д and д′ is reducedfrom quadratic (O (U 2)) to linear (O (S )). So the sampling part totallytakes O (W + S ) time.

In the ��ing part, counting all the counters takes O (D + S ).Updatingψ takes O (CK ) and updating δ and γ takes O (UC ). Con-sidering that usually C and K won’t be so large that CK < D andUC < S , the ��ing part totally takes O (D + S ) time.

Page 6: Community Detection in Weighted Temporal Text Networks Detection inWeight… · temporal community detection, topic model, data mining, social network ACM Reference format: Yuting

Conference’17, July 2017, Washington, DC, USA Y. Jia et al.

4.3 Parallel ImplementationTo apply CTCD to large-scale social data, parallelizing the inferencealgorithm is inevitable. In the ��ing part, all ��ings are totallyindependent which can be parallelized directly. But in the sam-pling part, a�er each iteration of sampling, the counters have tobe updated and the process synchronization will be the bo�leneck[9, 20]. Note that, as [23], we don’t re�t the Beta distributions a�ereach resampling, but re�t once at the end of each iteration. Nowwe propose three method to solve this synchronization problem.

1. Batch Sampling. At each iteration, �rst calculate all thecounters, and in sampling, all the counters stay no-change. Asthe result, there is no critical section and synchronization problembecause the counters are not updated. �is method means thatall the indicators are resampled simultaneously so that they willbe updated based on same counters. But according to [3], thestochastic updating will converge much faster than batch updating.So the batch sampling may also take more iterations to convergence,which means more time for inference.

2. Exact Synchronization. �is time we do not sample allindicators in batch again, but sample one, update counters once. Inthe synchronization, if we want to keep counters exactly correct,we have to: get lock, update counters and release locks. Althoughall processes share a common global lock is okay, requiring andreleasing a common lock means single processing updating, whichis easily to be the bo�leneck. So we propose a parallel updatingmethod for speedup. It should be noticed that all the indicators arerelative to communities or topics. So we set one lock lc (or lk ) foreach community c and each topic k . If the community indicator ofa post is changed from c to c ′, it will require lock lc and l ′c and thenupdate corresponding counters, and similar to topics.

3. Nearly-Exact Synchronization. �ere are three types ofcounters, speci�c to users, only speci�c to communities/topics andspeci�c to words. (1) For counters speci�c to users like n

(c )i , we

can avoid the synchronization problem by parallelizing as userlevel, i.e., di�erent processes sample indicators for di�erent users.(2) For counters only speci�c to communities or topics like n

(k )c ,

typically the value will be very large because of the low dimensionof communities and topics. So the con�ict is small enough to ignore.(3) For counters speci�c to words like n(v )k , the value will be small,but because of the large vocabulary, the probability for di�erentprocesses to update indicators for a same word will be small enoughto ignore. In a word, we can update the counter directly, withoutany lock, because the error is too small to a�ect the result much.

�ese three di�erent parallel implementations are all practicable,and we implement all of them. �e comparison of these threeimplementations are shown in §6.4

5 USER BEHAVIOR PREDICTION5.1 User Interaction PredictionUser interaction prediction is de�ned to estimate the probability ofan interaction. As in Def 3.1, an interaction can be once retweeting,coauthor, citing and so on. �us, given two users i and i ′, and twotime stamps s and s ′, the probability of an interaction from user i

to user i ′ is computed as follow:P (i, i′, s, s′ |π , η, δ, γ )

=∑д

∑д′

P (д |i )P (д′ |i′)P (eii′ > 0 |д, д′)P (s |i, д)P (s′ |i′, д′)

=∑д

∑д′πiдπi′д′[1 − (

λ1 + ncc′λ1 + ncc′ + 1 )

(λ0+nscc′ )]

b (s ; δiд0, δiд1)b (s′;γi′д′0, γi′д′1)

(9)

where b (x ;α , β ) represents the probability density function of Betadistribution. In this formula, we use P (eii′ > 0) because if thereexists the interaction, the weight of the edge from user i to i ′ willbe at least 1, i.e., eii′ > 0.

5.2 Post Time PredictionPredicting the time stamps of posts [23] is to estimate the publishingtime stamp of an unseen post. Given the words w and the author iof a post d , the time stamp can be chosen as the candidate givingmaximum likelihood:

td = arдmaxt

P (d |t, i, π , θ, ϕ,ψ )

= arдmaxt

∑cP (c |i, π )

∑k

P (k |c, θ )P (w |k, ϕ )P (t |c, k,ψ )

= arдmaxt

∑cπic∑k

θckb (t ;ψck0, ψck1)∏wϕkw

(10)

Because the Beta distribution is a continuous curve, which is notproper for participating calculation. We can select multiple candi-date time stamps from the whole time range, and use the discretevalue of Beta distribution to calculate.

6 EXPERIMENTSIn this section, we �rst quantitatively evaluate the performance ofCTCD for extracting communities and topics. We then qualitativelydemonstrate the interesting knowledge about the developmentpa�ern of communities by some case studies. Finally we comparethe e�ciency and performance of di�erent parallel methods.1

6.1 Experiment Setup6.1.1 Datasets. Our experiments are conducted on two real

temporal text networks, Academia and Weibo.Academia. �is dataset is extracted from data provided by

website Acemap (h�p://acemap.sjtu.edu.cn), which has mergedseveral data sources such as DBLP, Microso� Academic Graph, ACMDigital Library and IEEEXplore. we select out 80271 authors whohave published at least one paper on top journals or conferences incomputer science region. �e posts of each author are the titles ofhis papers. And each interaction between two users means oncecitation between them. Note that posts and interactions which timeis earlier than Jan 1, 2000 are removed. Finally there are 2.4m posts18m words, 10m edges and 22m interactions in the network andthe vocabulary size is 0.1m.

Weibo. Weibo is one of the most popular micro-blog platformin China. We select 25088 active, interacted users and crawledmicro-blogs they published in recent 5 years. In this dataset, eachpost represents one micro-blog and each interaction denotes once1�e code of CTCD and datasets used in the experiments can be downloaded ath�ps://github.com/SamJia/KDD2017-Model-Dataset

Page 7: Community Detection in Weighted Temporal Text Networks Detection inWeight… · temporal community detection, topic model, data mining, social network ACM Reference format: Yuting

Community Detection in Weighted Temporal Text Networks Conference’17, July 2017, Washington, DC, USA

retweeting. �ere are 21m posts, 125m words, 0.6m edges and 3.7minteractions in the network and the vocabulary size is 0.7m.

6.1.2 Baselines. We compare CTCD with another 4 baselinemethods. Cluster A�liation Model for Big Networks (BIGCLAM)[25], Poisson Mixed-Topic Link Model (PMTLM) [28], COmmunityLevel Di�usion (COLD) [18] and Enhanced User-Temporal Modelwith Burst-weighted Smoothing (EUTB) [26]. BIGCLAM is one ofrepresentatives of community detection methods for pure networkbased on a�liation graph model. PMTLM utilizes network andposts while COLD and EUTB consider network, posts and temporalinformation.

In CTCD, the weight of edges are considered and modeled byPoisson distribution, now we propose a Bernoulli version CTCD(CTCD-Ber) for exhibiting the importance of considering edgeweight. In CTCD-Ber,η ∼ Beta(λ0, λ1) and eii′ ∼ Bernoulli (ηдii′д′ii′

),where eii′ is an indicator for edge (i, i ′) which can only be 0 or 1.In this situation, the sampling formula for дii′ and д′ii′ changes to

P (дii′ = c, д′ii′ = c

′ |eii′, д−ii′, c, e, .) ∝n (c )i + σ

n (·)i +Cσ

n (c′)i′ + ρ

n (·)i′ +Cρ

λ0 + ncc′λ0 + λ1 + ncc′

∏sii′m

sδic0−1ii′m (1 − sii′m )δic1−1

B (δic0, δic1)

∏s′ii′m

s′γi′c′0−1ii′m (1 − s′ii′m )γi′c′1−1

B (γi′c′0, γi′c′1)

(11)�e Bernoulli distribution η can be inferred as

ηcc′ =λ0 + ncc′

λ0 + λ1 + ncc′(12)

And the probability of interaction from user i to i ′ changes to:P (i, i′, s, s′ |π , η, δ, γ )

=∑д

∑д′

P (д |i )P (д′ |i′)P (eii′ = 1 |д, д′)P (s |i, д)P (s′ |i′, д′)

=∑д

∑д′πiдπi′д′ηдд′b (s ; δiд0, δiд1)b (s′;γi′д′0, γi′д′1)

(13)

6.2 �antitative EvaluationIn this section, we quantitatively evaluate our model based on theuser behavior prediction methods proposed in §5.

6.2.1 User interaction Prediction. Since here is no pre-de�nedthreshold for user interaction prediction, we adopt area under the re-ceiver operating characteristic curve (AUC) as the evaluation metric.When using normalized units, the AUC is equal to the probabil-ity that a classi�er will rank a randomly chosen positive instancehigher than a randomly chosen negative one (assuming ’positive’ranks higher than ’negative’) [11]. In 5-fold cross validation, eachtime we randomly select 20% positive interactions and 1% negativeinteractions to calculate AUC.

Figure 4 shows the AUC value for �ve models. CTCD-Ber out-performs others in Academia dataset, while CTCD outperformsin Weibo dataset. �ere are 10m edges and 22m interactions inAcademia, on the other hand, 0.6m edges and 3.7m interactions inWeibo. So the average weight of edges in Weibo (6.2) is much largerthan Academia (2.2), which bene�ts the Poisson distribution andleads be�er performance for CTCD in Weibo dataset. Moreover,AUC of BIGCLAM is much lower than all other models, showingthat pure network structure is not such reliable and incorporating

50 100 150# community C

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.00

User

Inte

racti

on

Pre

dic

tion

(AU

C) BIGCLAM

PMTLM

COLD

CTCD-Ber

CTCD

(a) Academia Dataset

50 100 150# community C

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.00

User

Inte

racti

on

Pre

dic

tion

(AU

C) BIGCLAM

PMTLM

COLD

CTCD-Ber

CTCD

(b) Weibo Dataset

Figure 4: the AUC of User Interaction Prediction

information improves the accuracy of models signi�cantly. BothCTCD and CTCD-Ber are much be�er than other models, whichmeans the importance of considering temporal community.

Table 2: �e Accuracy of Post Time PredictionDataset Academia Weibo

# community C 50 100 150 50 100 150EUTB .529 .541 .552 .453 .432 .402COLD .496 .506 .502 .841 .839 .838

CTCD-Ber .624* .628* .629* .863 .866 .864CTCD .621 .627 .629* .875* .872* .873*

6.2.2 Post Time Prediction. Table 2 shows the prediction accu-racy for the four models. For each number of communities, thesame number of topics are set. And for calculating the accuracy, weset the time tolerance as 20%, i.e., the prediction is seen as correctif the di�erence between the predicted time and the real time isless than 20% of the whole time range. Note that here 5-fold crossvalidation is used.

From Table 2, we can know that: (1) CTCD-Ber and CTCD areboth outperform in Academia dataset, while CTCD outperform inWeibo dataset because of the same reason mentioned in last section§6.2.1; (2) the average accuracy of Weibo dataset is much higherthan Academia dataset. Because compared to Weibo where topicsvary fast, the topics of Academia dataset is much stable so that thetime of posts are much harder to predict; (3) the obvious improve-ment from COLD to CTCD shows that signi�cant superiority ofcontinuous distributions to model time.

6.3 �alitative Evaluation6.3.1 Capture the User Flow between Communities. A�er obtain-

ing temporal community a�liations through CTCD, we are able tocapture trend on the community level by analyzing the change ofpeople’s activeness in particular community. As mentioned before,the beta distribution δ model the a�liation from in�uence aspectand γ from the participation aspect. For the reason that the waymodeling in�uence using citations is biased to old documents andtherefore not comprehensive, we only utilize the a�liation strengthfrom the perspective of participation.

One interesting phenomenon is that there are a certain amount ofpeople who show high involvement in one community at �rst andthen appear to be more active in another community. If the numberof people who expose a�liation shi� between two communitiesis large, it indicates that there is a user �ow from one community

Page 8: Community Detection in Weighted Temporal Text Networks Detection inWeight… · temporal community detection, topic model, data mining, social network ACM Reference format: Yuting

Conference’17, July 2017, Washington, DC, USA Y. Jia et al.

F

D

E

C

B

A

02

networkwirelesssensormobileroute 03

optimizationlearning

timecontrollinear 04

datasocial

informationcomputation

web 05

robotinteraction

mobilehumanvirtual 06

graphsecuritybound

computationapproximation

dataweb

searchsemantic

query07 08

programminglogic

servicecheckdesign 09

imagedetect

recognitionfeature

segmentation 10

performancememory

computationparallel

time 01

channelcode

powerdesignMIMO

(a)

(b)

TinyDB: an acquisitional query processing system for sensor networks. (2005)

Sharing aggregate computation for distributed queries. (2007)

MAD skills: new analysis practices for big data. (2009)

Hybrid in-database inference for declarative information extraction. (2010)

TinyDB: an acquisitional query processing system for sensor networks. (2005)

GraphX: a resilient distributed graph system on Spark. (2013)

Querying probabilistic information extraction. (2010)

B

C

C

D

Joseph M. Hellerstein

Michael J. Franklin

(c)

Figure 5: Dynamic pattern:(a) �e user �ow among communities. We represent the communities in the form of pie chart.�e sectors of colors correspond to the topics with the same color stated at bottom. (b) Temporal a�liation strength. Foreach user, the top 2 dominant communities he belongs to are present as well as representative papers. (c) Change of commoninteractions over time. For certain year, the number of common interactions between two users indicates how many timestheir publications within that year reference or are cited by the same authors.

to another. We investigate such user �ow on Academia dataset,se�ing both the topic number and community number ten.

Part of the user �ow we discovered is displayed in Figure 5(a).Since the topic is more interpretable, we present the communityin the form of combination of topics. According to the result, thecommunity B whose interested topics cover data mining, optimiza-tion and information retrieval can be viewed as a rising community.Because the input �ow of community B far outweighs the output�ow and these people from community A C E F all show greattendency to join community B. �us, we expect community B tobe larger and popular. In addition, compared to community B, com-munity C which mainly focuses on wireless sensor network andcommunity E whose major interest is logic programming are losingheat. Such user �ow diagram enable us to track the developmentof the community.

6.3.2 Analysis of Individual User. Zooming into individual user,the change of community a�liation strength over time can be in-vestigated. In Figure 5(b), we pick up two users in the Academiadataset, plot a�liation strength of the top two dominant communi-ties each user belongs to and list some representative papers of thetwo users. Joseph M. Hellerstein and Michael J. Franklin co-authorseveral times and share high degree of a�liation to communityC. Concluded from the a�liation strength plot, up to 2010, JosephM. Hellerstein has almost le� the community C while Michael J.

Iterations0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Norm

alize

d log

lik

elih

ood

NP

P1

P2

P3

(a) Log Likelihood

NP P1 P2 P3

Parallel Implemetation

0

500

1000

1500

2000A

vera

ge T

ime f

or

One Inte

ract

ion(s

) 1987.2

151.2 158.5 157.1

(b) Iteration Time

Figure 6: Comparison of Di�erent Parallel Implementation(NP : No Parallel; P1: Batch Sampling; P2: Exact Synchroniza-tion; P3: Nearly-Exact Synchronization)

Franklin is still in it. �is �nding is in accordance with the sta-tistics as demonstrated in Figure 5(c) – the number of commoninteracted-authors of these two scholars fell from 1280 per yearduring 2001-2006 to 287 per year during 2007-2015.

6.4 Parallel Implementations ComparisonIn this section, we compare three di�erent parallel implementationsdescribed in §4.3. �e log-likelihood and iteration time are recordedon Academia dataset.

6.4.1 Log-Likelihood Comparison. To evaluate the sampling per-formance for di�erent parallel implementations, we calculate the

Page 9: Community Detection in Weighted Temporal Text Networks Detection inWeight… · temporal community detection, topic model, data mining, social network ACM Reference format: Yuting

Community Detection in Weighted Temporal Text Networks Conference’17, July 2017, Washington, DC, USA

log-likelihood for generating the whole training data in the in-ference iterations, as shown in Figure 6(a). Compared to othermethods, P1(Batch Sampling) converge slower, but the convergedlog-likelihood is higher. �is result is in accordance with the fea-tures of batch update and stochastic update that batch update con-verge slower but can reach extreme point, while stochastic updateconverge faster but may never converge (keep oscillating aroundthe extreme point). �e trace of log-likelihood of P2(Exact Synchro-nization) is exactly same to NP (No Parallel), because the synchro-nization is exactly done, which means the sampling process of P2is same to single process version. �e log-likelihood of P3(Nearly-Exact Synchronization) is similar to P2, except few points wherelog-likelihood of P3 is a li�le smaller. �is di�erence should be dueto the errors of approximate counter update.

6.4.2 Iteration Time Comparison. Figure 6(b) gives the averagedtime cost per iteration of di�erent parallel implementations. �ethree parallel implementations (i.e. P1, P2, P3) are all take 16 cores.�e time cost per iteration of di�erent implementations are nearlysame, and the speedup is greater than 12 compared to NP . P1is fastest because it need not update counters and the followingone is P3, which update counters without lock. Although P1 isfaster than the other two parallel implementations, because it takemore iterations for convergence, it isn’t the most e�cient choice.Considering both e�ciency and performance, we should adopt P3(or P2) for �rst several iterations and then switch to P1 for lastseveral iterations.

7 CONCLUSIONIn this paper, we study the problem of temporal community detec-tion in temporal text network. We present CTCD (Custom TemporalCommunity Detection), a probabilistic generative model consider-ing network, text and time, to extract temporal community a�l-iations for each user, the in�uence strength across communitiesand temporal topic interest in each community in a uni�ed way.An inference method for CTCD based on Gibbs sampling is given.What’s more, to adopt to large-scale dataset, we propose the scal-able parallel implementation, in which 3 methods to deal with thesynchronization problem is given and compared. �e extractedknowledge of CTCD can be used to address various practical prob-lem, such as user behavior prediction and user’s community shi�analysis. �e result of experiments on two real-world large-scaletemporal text network demonstrates the e�ectiveness of CTCD.

REFERENCES[1] Ramnath Balasubramanyan and William W. Cohen. 2014. Block-LDA: Jointly

Modeling Entity-Annotated Text and Entity-Entity Links. In Handbook of MixedMembership Models and �eir Applications. 255–273.

[2] Marya Bazzi, Mason A. Porter, Stacy Williams, Mark McDonald, Daniel J. Fenn,and Sam D. Howison. 2016. Community Detection in Temporal MultilayerNetworks, with an Application to Correlation Networks. Multiscale Modeling &Simulation 14, 1 (2016), 1–41.

[3] Leon Bo�ou and Olivier Bousquet. 2008. �e Tradeo�s of Large Scale Learning.In Advances in Neural Information Processing Systems. Vol. 20. 161–168.

[4] Deepayan Chakrabarti, Ravi Kumar, and Andrew Tomkins. 2006. EvolutionaryClustering. In Proceedings of the 12th ACM SIGKDD International Conference onKnowledge Discovery and Data Mining (KDD ’06). 554–560.

[5] Jonathan Chang and David M. Blei. 2010. Hierarchical Relational Models forDocument Networks. �e Annals of Applied Statistics 4, 1 (2010), 124–150.

[6] Duanbing Chen, Mingsheng Shang, Zehua Lv, and Yan Fu. 2010. Detectingoverlapping communities of weighted networks via a local algorithm. Physica

A: Statistical Mechanics and its Applications 389, 19 (2010), 4177 – 4187.[7] Yudong Chen, Vikas Kawadia, and Rahul Urgaonkar. 2013. Detecting Overlapping

Temporal Community Structure in Time-Evolving Networks. CoRR abs/1303.7226(2013). h�p://arxiv.org/abs/1303.7226

[8] Michele Coscia, Giulio Rosse�i, Fosca Gianno�i, and Dino Pedreschi. 2012. DE-MON: A Local-�rst Discovery Method for Overlapping Communities. In Proceed-ings of the 18th ACM SIGKDD International Conference on Knowledge Discoveryand Data Mining (KDD ’12). 615–623.

[9] Padhraic Smyth David Newman and Mark Steyvers. 2006. Scalable Parallel TopicModels. Journal of Intelligence Community Research and Development (2006).

[10] N. Du, X. Jia, J. Gao, V. Gopalakrishnan, and A. Zhang. 2015. Tracking TemporalCommunity Strength in Dynamic Networks. IEEE Transactions on Knowledgeand Data Engineering 27, 11 (Nov 2015), 3125–3137.

[11] Tom Fawce�. 2006. An Introduction to ROC Analysis. Pa�ern Recogn. Le�. 27, 8(June 2006), 861–874.

[12] Santo Fortunato. 2010. Community detection in graphs. Physics Reports 486,3��5 (2010), 75 – 174.

[13] Mario Frank, Andreas P. Streich, David Basin, and Joachim M. Buhmann. 2012.Multi-assignment Clustering for Boolean Data. J. Mach. Learn. Res. 13, 1 (Feb.2012), 459–489.

[14] M. Girvan and M. E. J. Newman. 2002. Community structure in social andbiological networks. PNAS 99, 12 (2002), 7821–7826.

[15] S. Gregory. 2010. Finding overlapping communities in networks by label prop-agation. New Journal of Physics 12, 10 (Oct. 2010), 103018. arXiv:physics.soc-ph/0910.5516

[16] T. L. Gri�ths and M. Steyvers. 2004. Finding scienti�c topics. Proceedings of theNational Academy of Sciences 101, Suppl. 1 (April 2004), 5228–5235.

[17] Qirong Ho, Rong Yan, Rajat Raina, and Eric P. Xing. 2012. Understanding theInteraction between Interests, Conversations and Friendships in Facebook. CoRRabs/1211.0028 (2012).

[18] Zhiting Hu, Junjie Yao, Bin Cui, and Eric Xing. 2015. Community Level Di�usionExtraction. In Proceedings of the 2015 ACM SIGMOD International Conference onManagement of Data (SIGMOD ’15). 1555–1569.

[19] Vikas Kawadia and Sameet Sreenivasan. 2012. Sequential detection of temporalcommunities by estrangement con�nement. Scienti�c Reports 2 (Nov. 2012).h�p://www.nature.com/srep/2012/121109/srep00794/full/srep00794.html

[20] Zhiyuan Liu, Yuzhou Zhang, Edward Y. Chang, and Maosong Sun. 2011. PLDA+:Parallel Latent Dirichlet Allocation with Data Placement and Pipeline Processing.ACM Transactions on Intelligent Systems and Technology, special issue on LargeScale Machine Learning (2011).

[21] Z. Lu, Y. Wen, and G. Cao. 2013. Community detection in weighted networks:Algorithms and applications. In 2013 IEEE International Conference on PervasiveComputing and Communications (PerCom). 179–184.

[22] Gergely Palla, Imre Dernyi, Ills Farkas, and Tams Vicsek. 2005. Uncovering theoverlapping community structure of complex networks in nature and society.Nature 435, 7043 (June 2005), 814–818.

[23] Xuerui Wang and Andrew McCallum. 2006. Topics over Time: A non-MarkovContinuous-time Model of Topical Trends. In Proceedings of the 12th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining(KDD ’06). 424–433.

[24] Jierui Xie, Stephen Kelley, and Boleslaw K. Szymanski. 2011. Overlapping Com-munity Detection in Networks: the State of the Art and Comparative Study.CoRR abs/1110.5813 (2011).

[25] Jaewon Yang and Jure Leskovec. 2013. Overlapping Community Detection atScale: A Nonnegative Matrix Factorization Approach. In Proceedings of the SixthACM International Conference on Web Search and Data Mining (WSDM ’13). 587–596.

[26] H. Yin, B. Cui, H. Lu, Y. Huang, and J. Yao. 2013. A uni�ed model for stable andtemporal topic detection from social media data. In 2013 IEEE 29th InternationalConference on Data Engineering (ICDE). 661–672.

[27] Hongyi Zhang, Irwin King, and Michael Lyu. 2015. Incorporating Implicit LinkPreference Into Overlapping Community Detection. (2015).

[28] Yaojia Zhu, Xiaoran Yan, Lise Getoor, and Cristopher Moore. 2013. ScalableText and Link Analysis with Mixed-topic Link Models. In Proceedings of the 19thACM SIGKDD International Conference on Knowledge Discovery and DataMining(KDD ’13). 473–481.