TRANSCRIPT
Karsten Borgwardt, Christoph Lippert and Nino Shervashidze: Biological Network Analysis, Page 1
Link Prediction
Karsten Borgwardt, Christoph Lippert and Nino Shervashidze
Interdepartmental Bioinformatics Group, MPI for Biological Cybernetics, MPI for Developmental Biology
Link prediction
Definition: Given two nodes x and x′, should they be connected by an edge?
Unsupervised versus supervised
Supervised: We are given a training set of edges.
Unsupervised: No such training set is available.
Similarity-score versus cluster-based
Similarity-based: Nodes are connected if they are similar.
Cluster-based: Nodes from the same cluster show similar connectivity patterns.
Section 1:Similarity-score based link prediction
Similarity-based link prediction
Unsupervised link prediction: Direct method; Unsupervised link prediction using kernel methods
Supervised link prediction: Basic scheme; Protein interaction prediction
Unsupervised link prediction
Introduction to unsupervised network inference
Direct approach
Statistical interpretation
Network inference by kernel-based dependence maximization
NETHSIC
Experiments
Social network analysis
Conclusions
Unsupervised network inference
Given set of objects described by their attributes xi ∈ X
Find a set E of m edges e(i, j) that correspond to interactions
Example: social network
Objects are people
Attribute is the occupation
Target network:
Who is friends with whom?
A direct approach:
Measure the pairwise distances d(xi, xj)
Iteratively connect the least distant pair by an edge
Direct approach
Measure the pairwise distances d(xi, xj) induced by a kernel k̃(xi, xj) on the centered attributes
Iteratively connect the least distant pair by an edge:

argmin_{e′} Σ_{(i,j)∈E∪{e′}} d(xi, xj)
= argmin_{e′} Σ_{(i,j)∈E∪{e′}} k̃(i, i) + k̃(j, j) − 2 k̃(i, j)
= argmin_{e′} Σ_{i,j} K̃ .∗ (D − A_{E∪{e′}})
= argmin_{e′} tr(K̃ L_{E∪{e′}})
= argmin_{e′} tr(HKH L_{E∪{e′}})
= argmax_{e′} tr(HKH (aI − L_{E∪{e′}})¹)
= argmax_{e′} 1/(n−1)² · tr(HKH L^{1-step}_{E∪{e′}})
= argmax_{e′} HSIC(K, L^{1-step}_{E∪{e′}})

A: adjacency matrix
D: diagonal matrix holding the degree of each node; D(i, i) = Σ_j A(i, j)
L: graph Laplacian
L^{p-step}: p-step random walk kernel
So the direct approach iteratively maximizes the Hilbert-Schmidt independence criterion between a kernel on the attributes and a 1-step random walk kernel on the nodes in the network.
HSIC
Hilbert-Schmidt independence criterion (Gretton et al., 2005)
Let F and G be RKHSs on X and Y with mappings φ : X → F and ψ : Y → G
HSIC is a measure of dependence between F and G:

HSIC(F, G, Pr_xy) := ‖C_xy‖²_HS

For pairs of finite samples X, Y an empirical estimate of HSIC can be computed in terms of kernels:

HSIC(K, L) := 1/(n−1)² · tr(HKHL)

where K_ij = 〈φ(xi), φ(xj)〉, L_ij = 〈ψ(yi), ψ(yj)〉, H_ij = δ_ij − n⁻¹
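The empirical estimate HSIC(K, L) = tr(HKHL)/(n−1)² can be computed directly; the following is a minimal sketch in plain Python (helper names are our own, not from the slides), using nested lists for the kernel matrices.

```python
def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def hsic(K, L):
    """Empirical HSIC between two n x n kernel matrices K and L."""
    n = len(K)
    # Centering matrix H with H_ij = delta_ij - 1/n
    H = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    HKHL = matmul(matmul(matmul(H, K), H), L)
    trace = sum(HKHL[i][i] for i in range(n))
    return trace / (n - 1) ** 2
```

Note that for a constant kernel matrix L the estimate vanishes, since the centering matrix H annihilates constant matrices.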
⇒ The direct approach maximizes the dependence between representations of the objects in the spaces induced by a kernel on the attributes and a 1-step random walk kernel on the network.
NETHSIC
NETHSIC exploits the fact that HSIC can be estimated using kernels.
K(i, j) = k(xi, xj) attribute kernel
LE(i, j) = lE(xi, xj) node kernel
argmax_{E ⊂ (V×V) ∧ |E| = m} 1/(n−1)² · tr(HKH L_E)
O(n^m) possible choices ⇒ use greedy selection of m edges
real-world networks are often sparse ⇒ do greedy forward selection of edges
Input: the set of nodes V, the number of edges m, the attribute kernel k and the node kernel l
Output: a subset E of V × V of size m

E ← ∅
repeat
  e ← argmax_{e′ ∈ V×V} tr(HKH L_{E∪{e′}})
  E ← E ∪ {e}
until |E| = m

Algorithm 1: NETHSIC forward selection
NETHSIC – node kernels LE
Given set of objects described by their attributes xi ∈ X
Find a set E of m edges e(i, j) that correspond to interactions
Who is friends with whom?
What happens if we look at a different kind of relation between the objects?
Who has a trade relation with whom?
NETHSIC is kernel-based
Network topology is defined by the node kernel L_E; here the 1-step random walk does not fit
Define a node kernel L_E expressing prior knowledge about the network structure
NETHSIC – node kernels LE
The choice of node kernel L_E defines the topology of the network:

argmax_{E ⊂ (V×V) ∧ |E| = m} 1/(n−1)² · tr(HKH L_E)

1-step: (aI − L)¹ : similar xi are connected
Laplacian: L = D − A : dissimilar xi are connected
degree: 〈δ(i), δ(j)〉 : similar xi have similar degrees
A²: A² : similar xi share many neighbors
closeness: 〈C_C(i), C_C(j)〉 : similar xi have similar closeness centrality
betweenness: 〈C_B(i), C_B(j)〉 : similar xi have similar betweenness centrality

C_C(i) = (n−1)⁻¹ Σ_{t∈V\{i}} d_G(i, t): the average shortest path length d_G between i and all other nodes in G
C_B(i) = Σ_{s≠i≠t∈V, s≠t} σ_st(i)/σ_st: the fraction of shortest paths σ_st between s and t that pass through i
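The two centrality measures above can be computed by breadth-first search on an unweighted graph. The sketch below is our own illustrative implementation following the slide's definitions (the betweenness sum runs over ordered pairs s ≠ t, so each unordered pair is counted twice).

```python
from collections import deque

def bfs(adj, s):
    """Return (dist, sigma): shortest-path lengths and path counts from s."""
    n = len(adj)
    dist = [None] * n
    sigma = [0] * n
    dist[s], sigma[s] = 0, 1
    q = deque([s])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if dist[w] is None:
                dist[w] = dist[v] + 1
                q.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]   # every shortest path to v extends to w
    return dist, sigma

def closeness(adj, i):
    """Average shortest path length from i, as defined on the slide."""
    dist, _ = bfs(adj, i)
    return sum(d for d in dist if d) / (len(adj) - 1)

def betweenness(adj, i):
    """Sum over ordered pairs s != t of the fraction of shortest s-t paths through i."""
    n = len(adj)
    dists, sigmas = zip(*(bfs(adj, s) for s in range(n)))
    total = 0.0
    for s in range(n):
        for t in range(n):
            if s == t or s == i or t == i:
                continue
            if (dists[s][t] is not None and dists[s][i] is not None
                    and dists[i][t] is not None
                    and dists[s][i] + dists[i][t] == dists[s][t]):
                total += sigmas[s][i] * sigmas[i][t] / sigmas[s][t]
    return total
```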
Experiments
Countries trade data (Wassermann et al., 1994)
24 countries
3 attributes: population size, GNP per capita, energy usage
reference network: trade relations of basic manufactured goods
experimental setup:
linear kernel on each attribute
set m to the maximum number of edges
rank edges by order of insertion
compute the area under the ROC curve using the reference network
[Bar chart: area under the ROC curve (0% to 90%) per attribute (population size, GNP per capita, energy consumption) for the 1-step random walk, Laplacian, degree, and mutual information kernels, with < 5% and > 95% quantile markers.]
The degree kernel often shows the best results
Some results are below the 5% quantile
Often it is not desirable to connect the most similar nodes
Conclusions
Kernel method for unsupervised network inference (NETHSIC)
Statistically motivated
High flexibility through the choice of node kernel L_E, which can define complex network topologies
Allows for a statistical interpretation of direct approaches
In real-world networks it is not always desirable to connect the most similar objects
Future work: use NETHSIC for network completion
argmax_{E ⊂ (V×V) ∧ |E| = m} 1/(n−1)² · tr(HKH L_E)
Supervised approaches
Setting:
We are now given a training set of edges E_training
We try to infer a rule, a classifier, from this set E_training that allows us to predict edges on the test set E_test.

Ingredients:
a similarity measure or metric for two pairs of nodes
a set of negative examples of non-interacting nodes
a classifier that turns these similarity scores into predictions
Pairwise similarity measures
Tensor pairwise kernel (Ben-Hur and Noble, ISMB 2005)
Given two pairs of nodes (a, b) and (c, d):

k_tensor((a, b), (c, d)) = k_nodes(a, c) k_nodes(b, d) + k_nodes(a, d) k_nodes(b, c)    (1)

This kernel quantifies the similarity of the source and target nodes in both edges, for both directions.
k_nodes is a kernel that measures the similarity of two nodes, just like the ones that are used for unsupervised link prediction.
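Equation (1) translates directly into code. In this sketch k_nodes is an illustrative linear kernel on attribute vectors, an assumption of ours, not the kernel used by Ben-Hur and Noble.

```python
def k_nodes(x, y):
    """Linear node kernel on attribute vectors (illustrative assumption)."""
    return sum(a * b for a, b in zip(x, y))

def k_tensor(pair1, pair2):
    """Tensor pairwise kernel of Eq. (1): source/target similarity in both directions."""
    (a, b), (c, d) = pair1, pair2
    return k_nodes(a, c) * k_nodes(b, d) + k_nodes(a, d) * k_nodes(b, c)
```

By construction k_tensor is invariant to swapping the two nodes within a pair, which is what the "both directions" remark above refers to.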
Pairwise similarity measures
Method 1: Direct similarity-based prediction
Motivation: “connect similar genes”
Connect a and b if d(a, b) is below a threshold.
This is an unsupervised approach (no use of the known subnetwork).
J.-P. Vert (Ecole des Mines) Supervised network inference 5 / 19
Pairwise similarity measures
Method 2: metric learning
Metric learning
Motivation: use the known subnetwork to refine the distance measure before applying the similarity-based method
Based on kernel CCA (Yamanishi et al., 2004) or kernel metric learning (V. and Yamanishi, 2005).
J.-P. Vert (Ecole des Mines) Supervised network inference 7 / 19
Pairwise similarity measures
Metric learning pairwise kernel (Vert et al., 2007)
Given two pairs of nodes (a, b) and (c, d):

k_ml((a, b), (c, d)) = (k_nodes(a, c) − k_nodes(a, d) − k_nodes(b, c) + k_nodes(b, d))²
                     = [(φ(a) − φ(b))ᵀ(φ(c) − φ(d))]²    (2)

k_nodes is a kernel that measures the similarity of two nodes, just like the ones that are used for unsupervised link prediction.
A pair (a, b) is similar to a pair (c, d)
if a − b is similar to c − d, or
if a − b is similar to d − c.
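Equation (2) can be sketched the same way. The linear k_nodes is again our own illustrative assumption; under it, k_ml((a, b), (c, d)) equals 〈a − b, c − d〉².

```python
def k_nodes(x, y):
    """Linear node kernel on attribute vectors (illustrative assumption)."""
    return sum(u * v for u, v in zip(x, y))

def k_ml(pair1, pair2):
    """Metric learning pairwise kernel of Eq. (2)."""
    (a, b), (c, d) = pair1, pair2
    return (k_nodes(a, c) - k_nodes(a, d) - k_nodes(b, c) + k_nodes(b, d)) ** 2
```

Because of the square, swapping c and d flips the sign inside but leaves the kernel value unchanged, matching the "a − b similar to c − d, or to d − c" remark above.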
Protein interaction prediction
Setting:
Protein-protein interactions (PPI) from yeast two-hybrid screens and mass spectrometry measurements provide only a partial view of the interactome
The goal of protein interaction prediction is to complete the interactome by link prediction

Sequence-based PPI prediction:
domain- or motif-based interaction prediction (Sprinzak and Margalit, 2001; Deng et al., 2002; Gomez et al., 2003; Wang et al., 2004)
3-mer sequence kernel (Martin et al., 2005)
phylogenetic trees (Ramani and Marcotte, 2003), correlated mutations (Pazos and Valencia, 2002) derived from sequence
Protein interaction prediction
Negative examples:
Jansen et al., 2003: pairs of proteins from different cellular locations
Ben-Hur & Noble, 2005: select random pairs of non-interacting proteins

Ben-Hur & Noble, 2005:
Used a 3-mer kernel, a kernel based on sequence and domain motifs, a kernel based on GO annotation, interactions in other species, and common neighbours
PPI prediction on the BIND physical interaction dataset via SVM: AUC of 0.97, ROC50 of 0.58
Section 2:Cluster-based link prediction
Cluster-based link prediction
Approach:
Similar nodes form a cluster
Nodes from the same cluster exhibit a similar connectivity pattern

Problems to be solved:
How to find clusters on a graph? → graph-based clustering
How to define a connectivity pattern of a cluster?
Graph-based clustering I
Data representation:
the dataset D is given in terms of a graph G = (V, E)
a data object vi is a node in G; an edge e(i, j) from node vi to node vj has weight w(i, j)

Graph-based clustering:
Define a threshold θ
Remove all edges e(i, j) from G with weight w(i, j) > θ
Each connected component of the graph now corresponds to one cluster
Two nodes are in the same connected component if there is a path between them
Graph components can be found by depth-first search in a graph (O(|V| + |E|))
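The thresholding-plus-components procedure above can be sketched with a depth-first search; the function name and the edge-list format are our own choices.

```python
def graph_cluster(n, weighted_edges, theta):
    """Drop edges heavier than theta (weights read as distances, as on the
    slide), then label connected components by iterative depth-first search."""
    adj = [[] for _ in range(n)]
    for i, j, w in weighted_edges:
        if w <= theta:                 # keep only edges with weight <= theta
            adj[i].append(j)
            adj[j].append(i)
    cluster = [None] * n
    cid = 0
    for start in range(n):
        if cluster[start] is None:     # unvisited node: start a new component
            stack = [start]
            while stack:
                v = stack.pop()
                if cluster[v] is None:
                    cluster[v] = cid
                    stack.extend(adj[v])
            cid += 1
    return cluster
```

Each node and each surviving edge is touched a constant number of times, which gives the O(|V| + |E|) bound quoted above.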
Graph-based clustering II
Original graph
Graph-based clustering III
Thresholded graph (θ = 0.5)
Graph-based clustering IV
But how do we get the graph in the first place?
Think of the weights as a similarity measure.
If two nodes are not connected, then their similarity is 0.
Graph-based clustering creates clusters of similar objects:
for any object vi in a cluster, there is a second object vj such that similarity(vi, vj) is larger than θ.
DBScan I
Noise-robust graph-based clustering:
Graph-based clustering can suffer from the fact that one noisy edge connects two clusters
DBScan (Ester et al., 1996) is a noise-robust extension of graph-based clustering
DBScan is short for Density-Based Spatial Clustering of Applications with Noise

Core object:
Two objects vi and vj with distance d(vi, vj) < ε belong to the same cluster if either vi or vj is a core object.
vi is a core object iff there are MinPoints points within a distance of ε from vi.
A cluster is defined by iteratively checking this core object property.
DBScan II
DBSCAN(SetOfPoints, Eps, MinPts)
// SetOfPoints is UNCLASSIFIED
ClusterId := nextId(NOISE);
for i from 1 to SetOfPoints.size do
   Point := SetOfPoints.get(i);
   if Point.ClId = UNCLASSIFIED then
      if ExpandCluster(SetOfPoints, Point, ClusterId, Eps, MinPts) then
         ClusterId := nextId(ClusterId)
      end if
   end if
end for
DBScan III
Code: ExpandCluster

ExpandCluster(SetOfPoints, Point, ClId, Eps, MinPts) : Boolean;
seeds := SetOfPoints.regionQuery(Point, Eps);
if seeds.size < MinPts then
   SetOfPoints.changeClId(Point, NOISE);
   RETURN False;
else
   SetOfPoints.changeClIds(seeds, ClId);
   seeds.delete(Point);
   while seeds <> Empty do
      currentP := seeds.first();
      result := SetOfPoints.regionQuery(currentP, Eps);
      if result.size >= MinPts then
         for i from 1 to result.size do
            resultP := result.get(i);
            if resultP.ClId in (UNCLASSIFIED, NOISE) then
               if resultP.ClId = UNCLASSIFIED then
                  seeds.append(resultP);
               end if
               SetOfPoints.changeClId(resultP, ClId);
            end if // UNCLASSIFIED or NOISE
         end for;
      end if; // result.size >= MinPts
      seeds.delete(currentP);
   end while; // seeds <> Empty
   RETURN True;
end if
end // ExpandCluster
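The pseudocode above translates into the following runnable sketch. The label encoding, the Euclidean regionQuery, and the list-based seed queue are our own choices (None stands for UNCLASSIFIED, -1 for NOISE); regionQuery includes the query point itself, as in Ester et al.

```python
import math

def region_query(points, p, eps):
    """Indices of all points within distance eps of point p (p included)."""
    return [q for q in range(len(points))
            if math.dist(points[p], points[q]) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)          # None: UNCLASSIFIED, -1: NOISE
    cid = 0
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        seeds = region_query(points, p, eps)
        if len(seeds) < min_pts:           # p is not a core object
            labels[p] = -1
            continue
        for q in seeds:                    # start a new cluster from core p
            labels[q] = cid
        seeds = [q for q in seeds if q != p]
        while seeds:
            current = seeds.pop(0)
            result = region_query(points, current, eps)
            if len(result) >= min_pts:     # current is a core object too
                for r in result:
                    if labels[r] is None or labels[r] == -1:
                        if labels[r] is None:
                            seeds.append(r)
                        labels[r] = cid
        cid += 1
    return labels
```

Points first marked as noise can later be absorbed into a cluster as border points, exactly as in ExpandCluster.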
DBScan V
Original graph
DBScan VI
DBScan-clustered graph (MinPts = 2, Eps = 0.5)
DBScan VII
Original graph
DBScan VIII
DBScan-clustered graph (MinPts = 3, Eps = 0.5)
DBScan IX
Properties:
Cluster assignment of border points is order-dependent
Unlike k-means, one does not have to specify the number of clusters a priori
But one has to set MinPts and Eps
Ester et al. report that for 2D examples MinPts = 4 is sufficient for good results
They determine Eps by visual inspection of a k-distance plot
Transfer question: How to kernelise DBScan?
Relational learning
Properties:
Represents the graph as a probability distribution, in terms of a graphical model.
A graphical model is a probabilistic model for which a graph denotes the conditional independence structure between the nodes, that is, the random variables.
A link r is a random variable in this model, typically a binary variable:
r = 1: the link exists
r = 0: the link does not exist
Link prediction based on node attributes
Link prediction based on cluster membership
Variants of cluster-based relational learning:

Links between all members of the same cluster, no links between members of different clusters:
P(r = 1 | z_a, z_b) = 1 if z_a = z_b
P(r = 1 | z_a, z_b) = 0 if z_a ≠ z_b

Links between all members of the same cluster, a fixed link probability between members of different clusters:
P(r = 1 | z_a, z_b) = 1 if z_a = z_b
P(r = 1 | z_a, z_b) = c if z_a ≠ z_b, with 0 ≤ c ≤ 1

A link probability η(a, b) between members of clusters a and b:
P(r = 1 | z_a, z_b) = η(a, b), with η(a, b) ∼ Beta(β, β)
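The three variants can be written down directly; this is an illustrative encoding with our own function names, using Python's random.betavariate for the Beta prior.

```python
import random

def link_prob_hard(za, zb):
    """Variant 1: links only within a cluster."""
    return 1.0 if za == zb else 0.0

def link_prob_fixed(za, zb, c=0.1):
    """Variant 2: fixed cross-cluster link probability c, 0 <= c <= 1."""
    return 1.0 if za == zb else c

def link_prob_learned(za, zb, eta):
    """Variant 3: per-cluster-pair probability eta(a, b) ~ Beta(beta, beta)."""
    return eta[(za, zb)]

# Draw a table of cluster-pair link probabilities from the Beta prior.
beta = 1.0
clusters = [0, 1]
eta = {(a, b): random.betavariate(beta, beta) for a in clusters for b in clusters}
```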
Infinite (Hidden) Relational Model (IRM, IHRM)
developed independently by Kemp et al. and Xu et al. in 2006
Cluster nodes via a Chinese restaurant process
Link probability η(a, b) between members of clusters a and b
Chinese restaurant process in a nutshell
P(z_i = a | z_1, …, z_{i−1}) = n_a / (i − 1 + γ) if n_a > 0, and γ / (i − 1 + γ) if a is a new cluster,
where z_1, …, z_{i−1} are the cluster assignments of objects 1, …, i − 1, n_a is the number of objects assigned to cluster a, and γ is a parameter.
The more objects there are in a cluster, the more likely it is that a new data point is also assigned to this cluster.
The creation of a new cluster is also possible.
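The Chinese restaurant process above can be simulated directly; this minimal sketch (our own implementation) assigns object i to an existing cluster a with probability n_a/(i−1+γ) and opens a new cluster with probability γ/(i−1+γ).

```python
import random

def crp(n_objects, gamma, seed=0):
    """Sample cluster assignments z_1..z_n from a Chinese restaurant process."""
    rng = random.Random(seed)
    assignments = []
    counts = []                            # counts[a] = n_a
    for i in range(1, n_objects + 1):
        # Total unnormalized mass is (i - 1) + gamma.
        r = rng.uniform(0, i - 1 + gamma)
        acc = 0.0
        for a, na in enumerate(counts):
            acc += na
            if r < acc:                    # join existing cluster a
                assignments.append(a)
                counts[a] += 1
                break
        else:                              # r fell in the gamma-sized slot
            assignments.append(len(counts))
            counts.append(1)
    return assignments
```

The rich-get-richer behaviour is visible in samples: large clusters attract new objects, while new clusters keep appearing at a rate controlled by γ.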