

Representations and Ensemble Methods for Dynamic Relational Classification

Ryan A. Rossi
Department of Computer Science
Purdue University
[email protected]

Jennifer Neville
Department of Computer Science
Purdue University
[email protected]

Abstract—Temporal networks are ubiquitous and evolve over time by the addition, deletion, and changing of links, nodes, and attributes. Although many relational datasets contain temporal information, the majority of existing techniques in relational learning focus on static snapshots and ignore the temporal dynamics. We propose a framework for discovering temporal representations of relational data to increase the accuracy of statistical relational learning algorithms. The temporal-relational representations serve as a basis for classification, ensembles, and pattern mining in evolving domains. The framework includes (1) selecting the time-varying relational components (links, attributes, nodes), (2) selecting the temporal granularity (i.e., set of timesteps), (3) predicting the temporal influence of each time-varying relational component, and (4) choosing the weighted relational classifier. Additionally, we propose temporal ensemble methods that exploit the temporal dimension of relational data. These ensembles outperform traditional and more sophisticated relational ensembles while avoiding the issue of learning the optimal representation. Finally, the space of temporal-relational models is evaluated using a sample of classifiers. In all cases, the proposed temporal-relational classifiers outperform competing models that ignore the temporal information. The results demonstrate the capability and necessity of the temporal-relational representations for classification, ensembles, and for mining temporal datasets.

Keywords—Time-evolving relational classification; temporal network classifiers; temporal-relational representations; temporal-relational ensembles; statistical relational learning; graphical models; mining temporal-relational datasets

I. INTRODUCTION

Temporal-relational information is seemingly ubiquitous; it is present in domains such as the Internet, citation and collaboration networks, communication/email networks, social networks, and biological networks, among many others. These domains all have attributes, links, and/or nodes changing over time, which are important to model. We conjecture that discovering an accurate temporal-relational representation disambiguates the true nature and strength of links, attributes, and nodes. However, the majority of research in relational learning has focused on modeling static snapshots [1], [2], [3] and has largely ignored the utility of learning and incorporating temporal dynamics into relational representations.

Temporal-relational data has three main components (i.e., attributes, nodes, links) that vary in time. First, the attribute values might change over time (e.g., the research area of an author). Second, links might be created and deleted throughout time (e.g., friendships, or a paper citing a previous paper). Third, nodes might be activated and deactivated throughout time (e.g., a person might not send an email for a few days). Additionally, in a temporal prediction task the attribute to predict changes throughout time (e.g., predicting a network anomaly), whereas in a static prediction task the predictive attribute remains constant.

Consequently, the space of temporal-relational models is defined by considering the set of relational elements that might change over time, such as the attributes, links, and nodes. Additionally, the space of temporal-relational representations depends on a temporal weighting and a temporal granularity. The temporal weighting attempts to predict the influence of the links, attributes, and nodes by decaying the weights of each with respect to time, whereas the temporal granularity restricts the links, attributes, and nodes to some window of time. The optimal temporal-relational representation and the corresponding temporal classifier depend on the particular temporal dynamics of the links, attributes, and nodes present in the data, and also on the domain and type of network (e.g., social networks, biological networks).

In this work, we address the problem of selecting the optimal temporal-relational representation to increase the accuracy of predictive models. The space of temporal-relational representations leads us to propose (1) a temporal-relational classification framework and (2) temporal ensemble methods (e.g., temporally sampling, randomizing, and transforming features) that leverage time-varying links, attributes, and nodes. We evaluate these temporal-relational models on a variety of classification tasks and evaluate each under various constraints. Finally, we explore the utility of the framework for (3) mining temporal datasets and discovering temporal patterns. The results demonstrate the importance and scalability of the temporal-relational representations for classification, ensembles, and for mining temporal datasets.

II. RELATED WORK

Most previous work uses static snapshots or significantly limits the amount of temporal information used for relational learning.


Sharan et al. [4] assume a strict temporal representation that uses kernel estimation for links and includes these estimates in a classifier. They do not consider multiple temporal granularities (all information is used statically), and the attributes and nodes are not weighted. In addition, they focus on only one specific temporal pattern and ignore the rest, whereas we explore many temporal-relational representations and propose a flexible framework capable of capturing the temporal patterns of links, attributes, and nodes. Moreover, they evaluate and consider only static prediction tasks. Other work has focused on discovering temporal patterns between attributes [5]. There are also temporal centrality measures that capture properties of the network structure [6].

III. TEMPORAL-RELATIONAL CLASSIFICATION FRAMEWORK

The temporal-relational classification framework is defined with respect to the possible transformations of links, attributes, or nodes (as a function of time). The temporal weighting (e.g., exponential decay of past information) and temporal granularity (e.g., window of timesteps) of the links, attributes, and nodes form the basis for any arbitrary transformation with respect to the temporal information (see Table I). The discovered temporal-relational representation can be applied for mining temporal patterns, for classification, and as a means for constructing temporal ensembles. An overview of temporal-relational representation discovery is provided below:

1) For each RELATIONAL COMPONENT
   • Links, Attributes, or Nodes
2) Select the TEMPORAL GRANULARITY
   • Timestep t_i
   • Window {t_i, t_{i+1}, ..., t_j}
   • Union T = {t_0, ..., t_n}
3) Select the TEMPORAL INFLUENCE
   • Weighted
   • Uniform
   Repeat steps 1-3 for each relational component.
4) Select the Modified RELATIONAL CLASSIFIER
   • Relational Bayes Classifier (RBC)
   • Relational Probability Trees (RPT)

Table I provides an intuitive view of the possible temporal-relational representations. For instance, the recent TVRC model is a special case of the proposed framework in which the links, attributes, and nodes are unioned and only the links are weighted.
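To make these four selections concrete, the following minimal Python sketch (ours, not from the paper; all names are illustrative) encodes one point in this space of representations, e.g., a TVRC-like configuration that unions all timesteps and exponentially weights only the links.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Tuple

@dataclass
class ComponentSpec:
    """Temporal treatment of one relational component (steps 1-3)."""
    granularity: Literal["timestep", "window", "union"]
    window: Optional[Tuple[int, int]] = None          # (t_i, t_j) if granularity == "window"
    influence: Literal["weighted", "uniform"] = "uniform"
    kernel: Optional[Literal["exponential", "linear", "inverse_linear"]] = None
    theta: Optional[float] = None                     # decay parameter when weighted

@dataclass
class TemporalRelationalModel:
    """One point in the space of temporal-relational representations (step 4 adds the classifier)."""
    links: ComponentSpec
    attributes: ComponentSpec
    nodes: ComponentSpec
    classifier: Literal["RBC", "RPT"]

# Example: a TVRC-like configuration -- union granularity for every component,
# exponentially weighted links, uniform attributes and nodes.
tvrc_like = TemporalRelationalModel(
    links=ComponentSpec("union", influence="weighted", kernel="exponential", theta=0.3),
    attributes=ComponentSpec("union", influence="uniform"),
    nodes=ComponentSpec("union", influence="uniform"),
    classifier="RBC",
)
```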

A. Relational Components: Links, Attributes, Nodes

Table I
TEMPORAL-RELATIONAL REPRESENTATION. Rows: the relational components (Edges, Attributes, Nodes); columns: the temporal granularities (Timestep, Window, Union) under either Uniform or Weighted temporal influence.

The data is represented as an attributed graph D = (G, X). The graph G = (V, E) represents a set of N nodes, such that v_i ∈ V corresponds to node i and each edge e_ij ∈ E corresponds to a link (e.g., an email) between nodes i and j. The attribute set

X = ( X^V = [X_1, X_2, ..., X_{m_v}],  X^E = [X_{m_v+1}, X_{m_v+2}, ..., X_{m_v+m_e}] )

may contain observed attributes on both the nodes (X^V) and the edges (X^E). Below we use X_m to refer to the generic mth attribute on either nodes or edges.

There are three aspects of relational data that may vary over time. First, the values of an attribute X_m may vary over time. Second, the edges may vary over time. This results in a different data graph G_t = (V, E_t) for each timestep t, where the nodes remain constant but the edge set may vary (i.e., E_{t_i} ≠ E_{t_j} for some i, j). Third, a node's existence may vary over time (i.e., objects may be added or deleted). This is also represented as a set of data graphs G′_t = (V_t, E_t), but in this case both the node and edge sets may vary. Let D_t = (G_t, X_t) refer to the dataset at time t, where G_t = (V, E_t, W^E_t) and X_t = (X^V_t, X^E_t, W^X_t). Here W_t refers to a function that assigns weights to the edges and attributes that are used in the classifiers below. We define W^E_t(i, j) = 1 if e_ij ∈ E_t and 0 otherwise. Similarly, we define W^X_t(x^m_i) = 1 if X^m_i = x^m_i ∈ X^m_t and 0 otherwise.

B. Temporal Granularity

Traditionally, relational classifiers have attempted to use all the data [4]. Conversely, the appropriate temporal granularity (i.e., set of timesteps) can be learned to improve classification accuracy. We briefly define the three general classes evaluated in this work for varying the temporal granularity of the links, attributes, and nodes (a selection sketch follows the list):

1) Timestep. The timestep models use only a single timestep t_i for learning.

2) Window. The window models use a sliding window of timesteps {t_i, t_{i+1}, ..., t_j} for learning. The space of window models is by far the largest.

3) Union. The union model uses all the previous temporal information for learning.

The timestep and union models are separated into distinct classes for clarity in evaluation and for pattern mining.
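For instance, under the data structure sketched above (again illustrative code of our own, with assumed names), selecting the edges implied by a chosen temporal granularity might look like this:

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]

def select_edges(edges_by_t: List[Set[Edge]], granularity: str, t: int,
                 window: int = 3) -> Set[Edge]:
    """Return the edge set implied by the chosen temporal granularity at prediction time t.

    granularity: "timestep" (only E_t), "window" (E_{t-window+1} .. E_t),
                 or "union" (E_0 .. E_t).
    """
    if granularity == "timestep":
        steps: Iterable[int] = [t]
    elif granularity == "window":
        steps = range(max(0, t - window + 1), t + 1)
    elif granularity == "union":
        steps = range(0, t + 1)
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    selected: Set[Edge] = set()
    for s in steps:
        selected |= edges_by_t[s]
    return selected
```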


C. Temporal Influence: Links, Attributes, Nodes

The influence of the relational components over time is predicted using temporal weighting. The temporal weights can be viewed as probabilities that a relational component is still active at the current timestep t, given that it was observed at time (t − k). Conversely, the temporal influence of a relational component might be treated uniformly. Additionally, different weighting functions can be chosen for different relational components with varying temporal granularities. For instance, the temporal influence of the links might be predicted using the exponential kernel while the attributes are uniformly weighted but have a different temporal granularity than the links.

1) Weighting. We investigated three temporal weighting functions:

• Exponential Kernel. The exponential kernel weights the recent past highly and decays the weight rapidly as time passes [7]. The kernel function K_E for temporal data is defined as:

K_E(D_i; t, θ) = (1 − θ)^{t−i} θ W_i

• Linear Kernel. The linear kernel decays more gently and retains the historical information longer. The linear kernel K_L for the data is defined as:

K_L(D_i; t, θ) = θ W_i ( (t* − t_i + 1) / (t* − t_o + 1) )

• Inverse Linear Kernel. The inverse linear kernel K_IL lies between the exponential and linear kernels when moderating the contribution of historical information. The inverse linear kernel for the data is defined as:

K_IL(D_i; t, θ) = θ W_i ( 1 / (t_i − t_o + 1) )

2) Uniform. The relational component(s) can be assigned uniform weights across time for the selected temporal granularity (e.g., traditional classifiers assign uniform weights, but they do not select the appropriate temporal granularity).
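The three kernels can be written directly from these definitions; the short Python sketch below is ours (parameter names assumed) and maps an observation weight W_i at time i to a decayed weight at prediction time t:

```python
def exponential_kernel(w_i: float, i: int, t: int, theta: float) -> float:
    """K_E(D_i; t, theta) = (1 - theta)^(t - i) * theta * W_i."""
    return (1.0 - theta) ** (t - i) * theta * w_i

def linear_kernel(w_i: float, i: int, t_star: int, t_o: int, theta: float) -> float:
    """K_L(D_i; t, theta) = theta * W_i * (t* - t_i + 1) / (t* - t_o + 1)."""
    return theta * w_i * (t_star - i + 1) / (t_star - t_o + 1)

def inverse_linear_kernel(w_i: float, i: int, t_o: int, theta: float) -> float:
    """K_IL(D_i; t, theta) = theta * W_i / (t_i - t_o + 1)."""
    return theta * w_i / (i - t_o + 1)

# Example: decay an edge observed 3 steps in the past with theta = 0.3.
w = exponential_kernel(w_i=1.0, i=7, t=10, theta=0.3)   # 0.3 * 0.7**3 ~= 0.103
```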

D. Temporal-Relational Classification

Once the temporal granularity and the temporal weighting are selected for each relational component, a temporal-relational classifier is learned. Modified versions of the RBC [8] and the RPT [9] are applied with the temporal-relational representation. However, any relational model that has been modified to handle weights is suitable for this phase. We extended RBCs and RPTs since they are interpretable, diverse, simple, and efficient. We use k-fold cross-validation to learn the “best” model. Both classifiers are extended for learning and inference through time.

Weighted Relational Bayes Classifier. RBCs extend naive Bayes classifiers to relational settings by treating heterogeneous relational subgraphs as a homogeneous set of attribute multisets. For example, when modeling the dependencies between the topic of a paper and the topics of its references, the topics of those references form multisets of varying size (e.g., {NN, GA}, {NN, NN, RL, NN, GA}). The RBC models these heterogeneous multisets by assuming that each value of the multiset is independently drawn from the same multinomial distribution. This approach is designed to mirror the independence assumption of the naive Bayesian classifier [10]. In addition to the conventional assumption of attribute independence, the RBC also assumes attribute value independence within each multiset. More formally, for a class label C, attributes X, and related items R, the RBC calculates the probability of C for an item i of type G(i) as follows:

P(C_i | X, R) ∝ ∏_{X_m ∈ X_{G(i)}} P(X^m_i | C) ∏_{j ∈ R} ∏_{X_k ∈ X_{G(j)}} P(X^k_j | C) P(C)
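A minimal sketch of a weighted RBC-style estimator, assuming multinomial attributes and per-observation temporal weights (the function and variable names are ours; the paper extends the RBC of [8] rather than using this code):

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def fit_weighted_rbc(samples: Iterable[Tuple[str, List[Tuple[str, str, float]]]],
                     alpha: float = 1.0):
    """Estimate P(C) and P(attribute value | C) from temporally weighted observations.

    Each sample is (class_label, [(attr_name, attr_value, temporal_weight), ...]);
    the sufficient statistics are weighted sums of counts, Laplace-corrected by alpha.
    """
    class_counts: Dict[str, float] = defaultdict(float)
    value_counts: Dict[Tuple[str, str, str], float] = defaultdict(float)  # (class, attr, value)
    values_seen: Dict[str, set] = defaultdict(set)
    for label, observations in samples:
        class_counts[label] += 1.0
        for attr, value, w in observations:
            value_counts[(label, attr, value)] += w
            values_seen[attr].add(value)

    def predict(observations: List[Tuple[str, str, float]]) -> Dict[str, float]:
        """Unnormalized log-scores per class for a new item's weighted attribute multiset."""
        total = sum(class_counts.values())
        scores: Dict[str, float] = {}
        for c, n_c in class_counts.items():
            score = math.log(n_c / total)
            for attr, value, w in observations:
                domain = values_seen[attr]
                if not domain:              # attribute never seen in training; skip it
                    continue
                num = value_counts[(c, attr, value)] + alpha
                den = sum(value_counts[(c, attr, v)] for v in domain) + alpha * len(domain)
                score += w * math.log(num / den)   # the weight moderates each multiset value
            scores[c] = score
        return scores

    return predict
```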

Weighted Relational Probability Trees. RPTs extend standard probability estimation trees to a relational setting in which data instances are heterogeneous and interdependent. The algorithm for learning the structure and parameters of an RPT searches over a space of relational features that use aggregation functions (e.g., AVERAGE, MODE, COUNT) to dynamically propositionalize relational data multisets and create binary splits within the RPT.

Learning. The RBC uses standard maximum-likelihood learning with a Laplace correction for zero values. More specifically, the sufficient statistics for each conditional probability distribution are computed as weighted sums of counts based on the link and attribute weights.

Figure 1. Temporal Link Weighting

Figure 2. Temporal Attribute Weighting

Figure 3. Graph and Attribute Weighting


The RPT uses the standard RPT learning algorithm, except that the aggregate functions are computed after the appropriate link and attribute weights are applied with respect to the selected temporal granularity (shown in Figure 4).

Prediction. For prediction we compute the summary data D^S_t at time t, the timestep for which the model is being applied. The model learned at time (t − 1) is then applied to D^S_t. The weighted classifier is appropriately augmented to incorporate the weights for D^S_t.

Figure 4. (a) The feature calculation that includes only the temporal link weights. (b) The feature calculation that incorporates both the temporal attribute weights and the temporal link weights.

IV. TEMPORAL ENSEMBLE METHODS

Ensemble methods have traditionally been used to improve predictions by considering a weighted vote from a set of classifiers [11]. We propose temporal ensemble methods that exploit the temporal dimension of relational data to construct more accurate predictors. This is in contrast to traditional ensembles, which disregard the temporal information. The temporal-relational classification framework, and in particular the temporal-relational representations of the time-varying links, nodes, and attributes, form the basis of the temporal ensembles (i.e., they are used as a wrapper over our framework). The proposed temporal ensemble techniques are assigned to one of the five methodologies described below.

A. Transforming the Temporal Nodes and Links

The first temporal ensemble method learns a set of classifiers, where each classifier is applied after the links and nodes are sampled from each discrete timestep according to some probability (see the sketch below). This sampling strategy is performed after constructing the temporal-relational representation, where the temporal weighting and temporal granularity have been selected. Additionally, the sampling probabilities for each timestep can be chosen to be biased toward the present or the past. In contrast to applying a sampling strategy across time, we might transform the time-varying nodes and links using the methods described in the framework.
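A hypothetical sketch of this idea: resample the links of each timestep with a probability biased toward the present, build one resampled temporal graph per ensemble member, and average the members' predictions. All names are illustrative, and train_and_predict stands in for any weighted relational classifier from the framework.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]

def time_biased_sample(edges_by_t: List[Set[Edge]], bias: float = 0.8,
                       rng: random.Random = random.Random(0)) -> List[Set[Edge]]:
    """Keep each edge at timestep s with probability bias^(T-1-s), favoring recent timesteps."""
    T = len(edges_by_t)
    return [{e for e in edges_by_t[s] if rng.random() < bias ** (T - 1 - s)}
            for s in range(T)]

def temporal_link_ensemble(edges_by_t: List[Set[Edge]],
                           train_and_predict: Callable[[List[Set[Edge]]], Dict[int, float]],
                           n_members: int = 10) -> Dict[int, float]:
    """Average per-node class scores over classifiers trained on resampled temporal graphs."""
    totals: Dict[int, float] = {}
    for m in range(n_members):
        rng = random.Random(m)
        scores = train_and_predict(time_biased_sample(edges_by_t, rng=rng))
        for node, s in scores.items():
            totals[node] = totals.get(node, 0.0) + s / n_members
    return totals
```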

B. Sampling or Transforming the Temporal Feature Space

The second type of temporal ensemble method transforms the temporal feature space by localizing randomization (for attributes at each timestep), by weighting, or by varying the temporal granularity of the features. Additionally, we might use only one temporal weighting function but learn different decay parameters, or resample from the temporal features. The temporal features could also be clustered (using varying decay parameters or randomizations), similar to the dynamic topic discovery models evaluated later in the paper.

C. Adding Noise or Randomness

A temporal ensemble based on adding noise along the temporal dimension of the data may significantly increase generalization and performance. For example, we may randomly permute a node's feature values across the timesteps (i.e., a node's recent behavior is observed in the past and vice versa), or permute the links between nodes across time.

D. Transforming the Time-Varying Class Labels

These temporal ensemble methods introduce variance in the classifiers by randomly permuting the previously learned labels at t−1 (or more distant) with the true labels at t.

E. Multiple Classification Algorithms and Weightings

A temporal ensemble may be constructed by randomly selecting from a set of classification algorithms (i.e., RPT, RBC, wvRN, RDN) while using equivalent temporal-relational representations, or by varying the representation with respect to the temporal weighting or granularity. Notably, an ensemble using RPT and RBC significantly increases accuracy, most likely due to the diversity of these temporal classifiers (i.e., they correctly predict different instances). Additionally, the temporal classifiers might be assigned weights based on cross-validation (or a Bayesian approach).
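As a rough illustration of this last variant (assumed details, not the paper's implementation), members built from different algorithms or temporal representations can be combined with weights proportional to their cross-validated AUC:

```python
from typing import Callable, Dict, Sequence

Member = Callable[[int], Dict[int, float]]   # maps a timestep t to per-node scores

def cv_weighted_ensemble(members: Sequence[Member],
                         cv_auc: Sequence[float],
                         t: int) -> Dict[int, float]:
    """Combine member predictions at time t, weighting each member by its held-out AUC."""
    total = sum(cv_auc)
    weights = [a / total for a in cv_auc]
    combined: Dict[int, float] = {}
    for w, member in zip(weights, members):
        for node, score in member(t).items():
            combined[node] = combined.get(node, 0.0) + w * score
    return combined
```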

V. METHODOLOGY

We describe the datasets and define a few representativetemporal-relational classifiers from the framework.

A. Datasets

For evaluating the framework, we use a range of both static prediction tasks (i.e., the prediction attribute is constant as a function of time) and temporal prediction tasks (i.e., the prediction attribute changes between timesteps).

PYCOMM Developer Communication Network. We analyze email and bug communication networks extracted from the Python development environment (www.python.org). We use the python-dev mailing list archive for the period 01/01/07-09/30/08. The sample contains 13181 email messages among 1914 users. Bug reports were also collected, and we constructed a second bug discussion network; this sample contained 69435 bug comments among 5108 users. The size of the timesteps is three months.

We also extracted text from the emails and bug messages and use it to dynamically model the topics between individuals and teams. Additionally, we discover temporal centrality attributes (i.e., clustering coefficient, betweenness). The prediction task is whether a developer is effective (i.e., whether a user closed a bug in that timestep).

Table II
GENERATED ATTRIBUTES FROM THE PYCOMM NETWORK

Python Communication Network Attributes

Team Membership:           Conv Tool, Build, Demos & Tools, Dist Utils, Documentation, Doc Tools, Installation, Interp Core, Regular Expr, Tests, Unicode, Windows, Ctypes, Ext Modules, Idle, Library (Lib), Tkinter, XML
Performance:               Assigned To, [HAS CLOSED]
Communication Attributes:  Comm. Count, Bug Comm., Email Comm.
User Topics:               Topic, Email Topic, Bug Topic
Temporal Centrality:       Eigenvector, Clustering Coeff., Betweenness, Degree
Link Attributes:           Edge Count, Edge Topic, Email Count, Email Topic, Bug Count, Bug Topic

CORA Citation Network. The CORA database contains authorship and citation information about CS research papers extracted automatically from the web. The prediction tasks are to predict the topic of a machine learning paper (one of seven topics) and to predict AI papers, given the topics of a paper's references. In addition, these techniques are evaluated using the most prevalent topics the paper's authors are working on through collaborations with other authors.

B. Temporal Models

The space of temporal-relational models is evaluated using a representative sample of classifiers with varying temporal-relational weightings and granularities. For every timestep t, we learn a model on D_t (i.e., some set of timesteps) and apply the model to D_{t+1}; this evaluation protocol is sketched after the list of models below. The utility of the temporal-relational classifiers and representations is measured using the area under the ROC curve (AUC). Below, we briefly describe a few classes of models that were evaluated.

• TENC: The TENC models predict the temporal influence of both the links and attributes.
• TVRC: This model weights only the links, using all previous timesteps.
• Union Model: The union model uses all links and nodes up to and including t for learning.
• Window Model: The window model uses the data D_{t-1} for prediction on D_t (unless otherwise specified).

We also compare simpler models such as the RPT (relational information only) and the DT (non-relational), which ignore any temporal information. Additionally, we explore many other models, including the class of window models, various weighting functions (besides the exponential kernel), and models that vary the set of windows in TENC and TVRC.

Figure 5. We compare a primitive temporal model (TVRC) to competing relational (RPT) and non-relational (DT) models (bars: TVRC, RPT, Intrinsic, Int+time, Int+graph, Int+topics). The AUC is averaged across the timesteps.

VI. EXPERIMENTS

We first evaluate temporal-relational representations for improving classification models. These models are evaluated using different types of attributes (e.g., relational only vs. non-relational) and also by using different types of discovered attributes (e.g., temporal centrality, team attributes, communication). The results demonstrate the utility of the temporal-relational classifiers, their representation, and the discovered temporal attributes. We also identify the minimum temporal information (i.e., the simplest model) required to outperform classifiers that ignore the temporal dynamics. Furthermore, the proposed temporal ensemble methods (i.e., temporally sampling, randomizing, and transforming features) are evaluated, and the results demonstrate significant improvements over traditional and relational ensemble methods.

We then focus on models that vary the temporal granularity and apply them for mining temporal patterns and, more generally, for discovering the nature of the time-varying links and attributes. Finally, we apply temporal textual analysis, generate topic features, and annotate the links and nodes with their corresponding topics over time. The significance of the evolutionary topic patterns is evaluated using a classification task. The results indicate the effectiveness of the temporal textual analysis for discovering time-varying features and incorporating these patterns to increase the accuracy of a classification task. For brevity, we omit many plots and comparisons.


Figure 6. Exploring the space of temporal-relational models (TENC, TVRC, TVRC+Union, Window Model, Union Model; AUC for timesteps T=1 through T=4). We evaluate significantly different temporal-relational representations from the proposed framework. This experiment uses the PYCOMM network, but focuses on time-varying relational attributes.

A. Single Models

We provide examples of temporal-relational models from the proposed framework and show that in all cases the performance of classification improves when the temporal dynamics are appropriately modeled.

Temporal, Relational, and Non-relational Information. We first assess the utility of the temporal, relational, and non-relational information. In particular, we are interested in this information as it pertains to the construction of features and their selection and pruning from the model. For these experiments, we compare the most primitive models, such as TVRC (i.e., uses temporal-relational information), RPT (i.e., only relational information), and a decision tree (DT) that uses only non-relational information. Additionally, we learn these models using various types of attributes and explore the utility of each with respect to the temporal, relational, or non-relational information.

Figure 5 compares TVRC (i.e., a primitive temporal-relational classifier) with the RPT and DT models, which use more features but ignore the temporal dynamics of the data. We find the TVRC to be the simplest temporal-relational classifier that still outperforms the others. Interestingly, the discovered topic features are the only additional features that improve the performance of the DT model. This is significant as these attributes are discovered by dynamically modeling the topics, but are included in the DT model as simple non-relational features (i.e., with no temporal weighting or granularity). We also find that in some cases the selective learner chooses a suboptimal feature when additional features are included in the basic DT model (see Figure 5). More surprisingly, the base RPT model does not improve performance over the DT model, indicating the significance of moderating the relational information with the temporal dynamics.

Exploring Temporal-Relational Models. We focus on exploring a small but representative set of temporal-relational models from the proposed framework. To more appropriately evaluate their temporal representations, we chose to remove highly correlated attributes (i.e., those that are not necessarily temporal patterns, or motifs), such as assignedto in the PYCOMM prediction task. In Figure 6, we find that TENC outperforms the other models over all timesteps. This proposed class of models is significantly more complex than TVRC (and most other models) since the temporal influence of both links and attributes is learned.

We then explored learning the appropriate temporal granularity, but with respect to the TVRC model. Figure 6 shows the results from two models in the TVRC class, where we attempt to tease apart the source of TENC's superiority (i.e., the weighting or the granularity). However, the two models outperform one another on different timesteps, indicating the need for a more precise temporal representation that optimizes the temporal granularity by selecting the appropriate decay parameters for links and attributes (i.e., in contrast to a stricter representation of including the links or not). The window and union models perform significantly worse, but are significantly more efficient and scalable for billion-node temporal datasets while still including some temporal information based on the granularity of the links and attributes. Similar results were found using CORA and other base classifiers such as RBC.

We have also experimented with searching over many temporal weighting functions and found the exponential decay to be the most appropriate for both links and attributes in the proposed prediction tasks. The optimal temporal-relational representation depends on the temporal dynamics and nature of the network under consideration (e.g., social networks, biological networks, citation networks). Nevertheless, multiple temporal weightings and granularities are found to be useful for constructing robust temporal ensembles that significantly reduce error and variance (i.e., compared to single temporal-relational classifiers and, more importantly, relational and traditional ensembles).

The accuracy of classification generally increases as more temporal information is included in the representation. However, this may lead to overfitting or other biases. On the other hand, the more complex temporal-relational representations aid in the mining of temporal patterns. For instance, the evolutionary topic patterns improve classification by moderating both the links and attributes over time (see Section VI-D).

Selective Temporal Learning. We also explored “selective temporal learning,” which uses multiple temporal weighting functions (and temporal granularities) for the links and attributes. The motivation for such an approach is that the influence of each temporal component should be modeled independently, since any two attributes (or links) are likely to decay at different rates. However, the complexity and the utility of the learned temporal-relational representation


depend on the ability of the selective learner to select the best temporal features (derived from weighting or varying the temporal granularity of attributes and links) without overfitting or causing other problems. We found that selective temporal learning performs best for simpler prediction tasks; however, it still frequently outperforms classifiers that ignore the temporal information.

B. Temporal-Ensemble Models

Instead of directly learning the optimal temporal-relational representation to increase the accuracy of classification, we use temporal ensembles that vary the relational representation with respect to the temporal information. These ensemble models reduce error due to variance and allow us to assess which features are the most relevant to the domain with respect to the relational or temporal information.

Temporal, Relational, and Traditional Ensembles. We first resampled the instances (nodes, links, features) repeatedly and then learned TVRC, RPT, and DT models. Across almost all the timesteps, we find that the temporal ensemble that uses various temporal-relational representations outperforms the relational ensemble and the traditional ensemble (see Figure 7). The temporal ensemble outperforms the others even when the minimum amount of temporal information is used (e.g., time-varying links). More sophisticated temporal ensembles can be constructed to further increase accuracy. For instance, we have investigated ensembles that use significantly different temporal-relational representations (i.e., from a wider range of model classes) and ensembles that use various temporal weighting parameters. In all cases, these ensembles are more robust and increase the accuracy over more traditional ensemble techniques (and single classifiers).

Additionally, the average improvement of the temporal ensembles is significant at p < 0.05, with a 16% reduction in error, justifying the proposed temporal ensemble methodologies. From the individual trials, it is clear that the RPT has a lot of variance: despite the use of ensembles, which are aimed at reducing variance, the RPT performs significantly better in one trial (t = 3) and worse in another (t = 1). This provides further evidence that relational information, and the utility of such information, increases significantly when moderated by the temporal information.

Figure 7. Comparing Temporal, Relational, and Traditional Ensembles (TVRC, RPT, DT; AUC for timesteps T=1 through T=4 and the average).

Figure 8. Comparing the utility of the discovered attribute classes (Communication, Team, Centrality, Topics) and the influence of each with respect to the temporal (TVRC), relational (RPT), and traditional (DT) ensembles.

Attribute Classes: Temporal Patterns and Significance. We again use one of the most primitive classes of temporal-relational representations in order to more accurately tease apart the most significant attribute category (communication, team, centrality, topics). These primitive temporal representations also help identify the minimum amount of temporal information that we must consider to outperform relational classifiers. This is important, as the more temporal-relational information we exploit, the more complex and expensive it is to learn and search over this space.

In Figure 8, we find several striking temporal patterns. First, the team attributes are localized in time and are not changing frequently. For instance, it is unlikely that a developer changes their assigned teams, and therefore modeling the temporal dynamics increases the accuracy by only a relatively small percentage. However, the temporal ensemble still increases the accuracy over the other ensemble methods that ignore the temporal patterns. This indicates the robustness of the temporal-relational representations. Moreover, we also notice that a few developers change projects frequently, which could be responsible for the increase in accuracy when the temporal information is leveraged. More importantly, the other classes of attributes are evolving considerably, and this fact is captured in the significant improvement of the temporal ensemble models. Similar performance is also obtained by varying the temporal granularity. We provide a few examples in the next section.

Randomization. We use randomization to identify the significant attributes in the temporal ensemble models. Additionally, randomization provides a means to rank the features and identify redundant features (i.e., two features may share the same significant temporal pattern).


Figure 9. Identifying and ranking the most significant features in the ensemble models (DT, TVRC, RPT). The significant features used in the temporal ensemble are compared to the relational and traditional ensembles. We measure the drop in AUC due to the randomization of attribute values.

Randomization is performed on an attribute by randomly reordering its values, thereby preserving the distribution of values but destroying any association of the attribute with the class label. For every attribute, in every timestep, we randomize the given attribute, apply the ensemble method, and measure the drop in AUC due to that attribute. The resulting changes in AUC are used to assess and rank the attributes in terms of their impact on the temporal ensemble (and how it compares to more standard relational or traditional ensembles). The results are shown in Figure 9.
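This randomization procedure is essentially a permutation-based feature importance test; a hypothetical sketch (names are ours) is:

```python
import random
from typing import Callable, Dict, List

def randomization_importance(features: Dict[str, List[float]],
                             labels: List[int],
                             eval_auc: Callable[[Dict[str, List[float]], List[int]], float],
                             seed: int = 0) -> Dict[str, float]:
    """Rank attributes by the AUC drop observed when their values are randomly reordered."""
    rng = random.Random(seed)
    baseline = eval_auc(features, labels)
    drops: Dict[str, float] = {}
    for name, values in features.items():
        permuted = values[:]
        rng.shuffle(permuted)                 # preserves the value distribution,
        corrupted = dict(features)            # destroys the association with the class label
        corrupted[name] = permuted
        drops[name] = baseline - eval_auc(corrupted, labels)
    return dict(sorted(drops.items(), key=lambda kv: kv[1], reverse=True))
```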

We find that the basic traditional ensemble relies more heavily on assignedto (in the current timestep), while the temporal ensemble (and, even less so, the relational ensemble) relies on the previous assignedto attributes. This indicates that the relational information in the past is more useful than the intrinsic information in the present, which points to an interesting hypothesis: a colleague's behavior (and interactions) precedes their own behavior. Organizations might use this to predict future behavior with less information and proactively respond more quickly.

Additionally, we investigated the attribute classes used by each type of ensemble and found that topics are most useful for the temporal ensemble. This indicates that topics are useful as a way to understand the context and strength of interaction among the developers, but only when the temporal dynamics are modeled.

C. Discovering Temporal Patterns

We define three temporal mining techniques based on the temporal framework to construct models with varying temporal granularities. These techniques are combined with relational classifiers or used separately to discover the temporal nature and patterns of relational datasets.

Models of Temporal Granularity. If we do not consider temporally weighting the links, nodes, and attributes, then we restrict our focus to models based strictly on varying the temporal granularity. In this space, there is a range of interesting models that provide insights into the temporal patterns, structure, and nature of the dataset. We first define three classes of models based on varying the temporal granularity and then evaluate the utility of these models (a window-construction sketch follows the list). In addition to discovering temporal patterns, these models are applied to measure the temporal stability and variance of the classifiers over time.

• PAST-to-PRESENT. These models consider the linked nodes from the distant past and successively increase the size of the window to consider more recent links, attributes, and nodes.

• PRESENT-to-PAST. These models initially consider only the most recent links, nodes, and attributes and successively increase the size of the window to consider more of the past.

• TEMPORAL POINT. These models consider only the links, nodes, and attributes at timestep k.
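A small sketch of how the three classes could generate their timestep windows (illustrative; timesteps assumed to be indexed 0 to T-1):

```python
from typing import List

def past_to_present_windows(T: int) -> List[range]:
    """Start from the distant past and successively include more recent timesteps."""
    return [range(0, end + 1) for end in range(T)]

def present_to_past_windows(T: int) -> List[range]:
    """Start from the most recent timestep and successively include more of the past."""
    return [range(start, T) for start in range(T - 1, -1, -1)]

def temporal_point_windows(T: int) -> List[range]:
    """Each model uses only a single timestep k."""
    return [range(k, k + 1) for k in range(T)]

# Example with T = 4 timesteps:
# past_to_present_windows(4)  -> [0], [0..1], [0..2], [0..3]
# present_to_past_windows(4)  -> [3], [2..3], [1..3], [0..3]
# temporal_point_windows(4)   -> [0], [1], [2], [3]
```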

Mining Temporal-Relational Patterns. Intuitively, Figure 10 shows that if we consider only the past and successively include more recent information, then the AUC increases as a function of the more recent attributes and links (i.e., the PAST-to-PRESENT model).

Figure 10. A variety of temporal granularity models (uniform weighting): Time Point, Present-to-Past, and Past-to-Present. Average accuracy (AUC) over 1988-1998 using RPT and RBC classifiers for the ML and AI prediction tasks; panels: (a) AI (RBC), (b) ML (RBC).

Conversely, if we consider only the most recent temporal information and successively include more of the past, then the AUC initially increases to a local maximum and then dips before increasing again as additional past information is modeled. This drop in accuracy indicates a type of temporal transition in the link structure and attributes. However, we might also expect the values to decay more quickly, since papers published in the distant past are generally less similar to recent papers, as shown previously. Overfitting may explain the slight improvement in AUC as noisy past information is added: the noise reduces bias in training and consequently increases the model's ability to generalize when predicting instances in the future. However, this might not always be the case. We expect the noise to be minor in this domain, as papers are unlikely to cite unrelated papers from the past.

More interestingly, the class of TEMPORAL-POINT models allows us to more accurately determine whether past actions at some previous timestep are predictive of the future, and how these behaviors transition over time. These patterns are shown in Figure 10.

Temporal Anomalies. The temporal granularity models capture many temporal anomalies. One striking anomaly is seen in Figure 10(a), where the accuracy of the TEMPORAL-POINT model decreases significantly in 1990, but by 1991 the accuracy has increased back to its previous level.

Temporal Stability of Relational Classifiers. We use the temporal granularity models to more accurately compare the stability of the modified temporal RBC and RPT classifiers, which leads us to identify a few striking differences between the two classifiers when modeling temporal networks.

In Figure 11(b), the RBC is shown to be stable over time, whereas the variance and stability of the RPT are significantly worse. This led us to analyze the internals of the modified RPT, and we found that for the ML prediction task the structure of the trees at each timestep differs significantly from one timestep to the next and is consequently unstable. However, we found the structure of the trees to evolve more gradually in the AI prediction task, making the RPT relatively more stable over time.

In addition, we found the RBC to perform extremely well even with small amounts of temporal information (low support for any hypothesis). The RPT and RBC are shown to have complementary advantages and disadvantages, especially for predicting temporal attributes. This provides further justification for the proposed temporal ensemble method that uses both RPT and RBC with each selected temporal-relational representation.

Temporal Relational Statistics. The temporal granularity models can be used to compute intuitive yet informative measures that provide insights into the temporal nature of a network. The GLOBAL LINK RECENCY measures the probability of citing a paper from time t or t − 1, for the years 1993-1998, in both the AI and ML prediction tasks, as shown in Figure 12(a).

Figure 11. Average temporal stability (AUC over 1988-1998) of the RPT and RBC classifiers: (a) Temporal Stability (AI), (b) Temporal Stability (ML).

For instance, the link recency measure (AI) between 1993 and 1995 is approximately 60%, indicating that the majority of cited papers are published in the same year t or the previous year t − 1. Interestingly, papers published in the most recent years (e.g., 1998) cite fewer papers from the same or previous year and more papers from the past (i.e., indicating a temporal transition that could be due to papers becoming more available to researchers, perhaps through digital archives or other factors). Furthermore, the temporal relational autocorrelation measure shows that, in general, recent papers are more influential than papers from the past (correlation plots omitted for brevity).

The temporal link probabilities for the AI and ML prediction tasks are shown in Figure 12(b). For the papers in each time period, we compute the probability of citing a paper given the time lag ℓ. Interestingly, the link probabilities at ℓ = 3 approximately begin to converge for each prediction time, indicating a global pattern with respect to past links that is independent of the core nodes' initial time period. However, the time lags 0 ≤ ℓ ≤ 3 capture local patterns with respect to the core nodes' prediction time. Hence, the more recent behavior of the core nodes is significantly different from their past behavior.
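Both statistics are straightforward to compute from timestamped citation links; the sketch below is hypothetical (our names), assuming each link carries the publication years of the citing and cited papers:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Each citation link is (year_of_citing_paper, year_of_cited_paper).
Citation = Tuple[int, int]

def global_link_recency(citations: Iterable[Citation], t: int) -> float:
    """Fraction of citations made by papers in year t that cite papers from year t or t-1."""
    made_at_t: List[Citation] = [c for c in citations if c[0] == t]
    recent = [c for c in made_at_t if c[1] in (t, t - 1)]
    return len(recent) / len(made_at_t) if made_at_t else 0.0

def temporal_link_probability(citations: Iterable[Citation], t: int,
                              max_lag: int = 10) -> Dict[int, float]:
    """Probability of citing a paper with time lag l = t - cited_year, for l = 0..max_lag."""
    made_at_t = [c for c in citations if c[0] == t]
    lag_counts = Counter(t - cited for _, cited in made_at_t if 0 <= t - cited <= max_lag)
    total = sum(lag_counts.values())
    return {l: lag_counts.get(l, 0) / total for l in range(max_lag + 1)} if total else {}
```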

(a) GLOBAL LINK RECENCY: the measure of link recency over time (1993-1998) for the ML and AI tasks. (b) TEMPORAL LINK PROBABILITY (ML): the probability of citing a paper as a function of the time lag (0-10) for prediction years 1993-1998.


Table III
A SET OF DISCOVERED TOPICS AND THE MOST SIGNIFICANT WORDS

TOPIC 1    TOPIC 2    TOPIC 3      TOPIC 4      TOPIC 5
dev        logged     gt           code         test
wrote      patch      file         object       lib
guido      issue      lt           class        view
import     bugs       line         case         svn
code       bug        os           method       trunk
pep        problem    import       type         rev
mail       fix        print        list         modules
release    fixed      call         set          build
tests      days       read         objects      amp
work       created    socket       change       error
people     time       path         imple        usr
make       docu       data         functions    include
pm         module     error        argument     home
ve         docs       open         dict         file
support    added      windows      add          run
module     check      problem      def          main
things     doc        traceback    methods      local
good       doesnt     mailto       exception    src
van        report     recent       ms           directory

D. Dynamic Textual Analysis: Interpreting Links and Nodes

In this task, we use only the communications to generate a network and then automatically annotate the links and nodes by discovering the latent topics of these communications. There are many motivations for such an approach; however, we are most interested in automatically learning evolutionary patterns between the topics and the corresponding developers to increase the accuracy of temporal-relational representations and classifiers.

We first removed a standard list of stopwords and then used a version of Latent Dirichlet Allocation (LDA [12]) to model the topics over time. We use EM to estimate the parameters and Gibbs sampling for inference. After extracting the latent topics, inference is used to label each link with its most likely latent topic and each node with its most frequent topic. Instead of this simple representation, we could have used the link probability distributions over time, but found that the potential performance gain did not justify the significant increase in complexity.
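As a rough illustration of this annotation step (not the authors' pipeline; sketched here with scikit-learn's LDA rather than the EM/Gibbs implementation described above), each communication link is labeled with its most likely topic and each node with its most frequent topic:

```python
from collections import Counter, defaultdict
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def annotate_topics(messages, n_topics=5):
    """messages: list of (sender, receiver, text). Returns per-link and per-node topic labels."""
    texts = [text for _, _, text in messages]
    counts = CountVectorizer(stop_words="english").fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)
    link_topics = lda.transform(counts).argmax(axis=1)        # most likely topic per message

    node_topic_counts = defaultdict(Counter)
    link_labels = {}
    for (sender, receiver, _), topic in zip(messages, link_topics):
        link_labels[(sender, receiver)] = int(topic)
        node_topic_counts[sender][int(topic)] += 1
        node_topic_counts[receiver][int(topic)] += 1
    node_labels = {n: c.most_common(1)[0][0] for n, c in node_topic_counts.items()}
    return link_labels, node_labels
```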

The latent topics are modeled in three communication networks (email, bug, and both). From these annotated temporal networks, we investigate the effects of modeling the latent topics of the communications and their evolution over time. We use the discovered evolutionary patterns as features to explore the temporal-relational representations, classifiers, and ensembles, and we evaluate and compare each of the models.

Table III lists a few topics and the most significant words for each. We find words with both positive and negative connotations, such as 'good' or 'doesnt' (i.e., related to sentiment analysis), and also words referring to the domain, such as 'bugs' or 'exception'. Additionally, we find that the topics correspond to different development and social aspects. Interestingly, the word 'guido' appears significant, since Guido van Rossum is the author of the Python programming language.

Figure 12. Evaluation of temporal-relational classifiers (TENC, TVRC, Window Model, Union Model) using only the latent topics of the communications to predict effectiveness. LDA is used to automatically discover the latent topics and to annotate the communication links and individuals with their appropriate topic in the temporal networks.

E. Modeling the Evolutionary Patterns of Topics

We evaluate these dynamic topic features using various temporal-relational representations for improving classification models. Figure 12 indicates the necessity of a temporal-relational representation that models the temporal influence of links and attributes. More interestingly, we see that models that consider only simple temporal-relational representations perform significantly worse, indicating that the dynamic topics are only meaningful if appropriately modeled. Additionally, we also learned more complex models from the class of window models to exploit additional temporal granularities, but removed the plots for brevity. In all the experiments, we find that the temporal-relational representations that leverage more of the temporal information outperform models that use only some of the temporal information.

Evolutionary patterns between the topics, developers, and their effectiveness are clearly present in the annotated networks. These results indicate that productive developers usually communicate about similar topics or aspects of development. Additionally, we find that effective communications have a specific structure that consequently enables others to become more effective. Moreover, these topics and the corresponding communications over time are temporally correlated with a developer's effectiveness.

VII. CONCLUSION

We proposed a framework for temporal-relational classifiers, ensembles, and, more generally, representations for mining temporal data. We evaluated each and provided insights using real-world networks with different attributes and informational constraints. The results demonstrated the effectiveness, scalability, and flexibility of the temporal-relational representations for classification, ensembles, and mining temporal networks.

ACKNOWLEDGMENTS

This research is supported by DARPA and NSF under contract numbers NBCH1080005 and SES-0823313. This research was also made with Government support under and awarded by DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA, NSF, or the U.S. Government.

REFERENCES

[1] S. Chakrabarti, B. Dom, and P. Indyk, "Enhanced hypertext categorization using hyperlinks," in SIGMOD, 1998, pp. 307-318.

[2] P. Domingos and M. Richardson, "Mining the network value of customers," in SIGKDD, 2001, pp. 57-66.

[3] J. Neville, O. Simsek, D. Jensen, J. Komoroske, K. Palmer, and H. Goldberg, "Using relational knowledge discovery to prevent securities fraud," in SIGKDD, 2005, pp. 449-458.

[4] U. Sharan and J. Neville, "Temporal-relational classifiers for prediction in evolving domains," in ICML, 2008.

[5] Q. Mei and C. Zhai, "Discovering evolutionary theme patterns from text: an exploration of temporal text mining," in SIGKDD, 2005, pp. 198-207.

[6] J. Tang, M. Musolesi, C. Mascolo, V. Latora, and V. Nicosia, "Analysing Information Flows and Key Mediators through Temporal Centrality Metrics," 2010.

[7] C. Cortes, D. Pregibon, and C. Volinsky, "Communities of interest," in IDA, 2001, pp. 105-114.

[8] J. Neville, D. Jensen, and B. Gallagher, "Simple estimators for relational Bayesian classifiers," in ICML, 2003, pp. 609-612.

[9] J. Neville, D. Jensen, L. Friedland, and M. Hay, "Learning relational probability trees," in SIGKDD, 2003, pp. 625-630.

[10] P. Domingos and M. Pazzani, "On the optimality of the simple Bayesian classifier under zero-one loss," Machine Learning, vol. 29, pp. 103-130, 1997.

[11] T. Dietterich, "Ensemble methods in machine learning," Multiple Classifier Systems, pp. 1-15, 2000.

[12] D. Blei, A. Ng, and M. Jordan, "Latent Dirichlet allocation," JMLR, vol. 3, pp. 993-1022, 2003.