inferring network structure from indirect … › ~gsp16 › rabat.pdfinferring network structure...
Post on 27-Jun-2020
3 Views
Preview:
TRANSCRIPT
Inferring Network Structure from IndirectObservations
Mike Rabbat
McGill University, Montreal, Canada
GSP Workshop, Philadelphia, May 2016
1 / 25
Motivation
• Need “the graph(s)” to do graph signal processing
• Fundamental problem: Given signals, recover the graph
2 / 25
The Agenda
• Historical overview
• Case study: Inferring network structure from cascades
• Open directions
3 / 25
Combinatorial vs. Statistical Approaches
Combinatorial Graph Reconstruction
• Let G = (V,E) be an graph
• A card Gi = V \ vi is a vertex-deleted subgraph
• The deck is the set of all cards
Deck(G) = {Gi = V \ vi : vi ∈ V }
• Does Deck(G) uniquely determine G?
• How many cards are necessary?
4 / 25
Statistical Approaches – Historical Overview
• Covariance selection (Dempster ’72)
• Estimate Σ = E[xx>
]from x1,x2, . . . ,xm
i.i.d.∼ N (0,Σ)• Explore sparsity of Σ−1
• Σ−1i,j = 0 encodes conditional independence relationships
• Theoretical and computational advances• Meinshausen and Buhlmann (2006)• Ravikumar, Wainwright, and Lafferty (2010)• Friedman, Hastie, and Tibshirani (2007)• Banerjee, El Ghaoui, and d’Aspremont (2008)• Marjanovic and Hero (2014)• Main tools: Convex optimization + sparsity/high-d statistics
5 / 25
Statistical Approaches – Historical Overview
• Learning probabilistic graphical models• Beyond Gaussian/Ising• Survey by Heckerman (1995)• Main tools: Search over discrete space of graphs
6 / 25
Statistical Approaches – Historical Overview
Learning causal networks
• Edge u→ v iff ¬v =⇒ ¬u• Interventions: Pearl (2000)
• Predictive: Granger (1969)
Other network representations
• Partial correlations
• Structural equation models
• ...
7 / 25
Application-Specific Approaches
Network tomography
• Castro, Coates, Liang, Nowak, and Yu, Statistical Science,2004
S
R1 R2 R3
S
R1 R2 R3
S
R1 R2 R3
8 / 25
The Graph Signal Processing View
Session: Thursday P.M.
9 / 25
Relevant Questions
Identifiability: Do the observations (asymptotically) uniquelyidentify G?
Consistency: Does G (asymptotically) converge to G?
Statistical complexity: How many samples are required forG = G w.h.p.?
Computational complexity: To compute G from observations?
10 / 25
Case Study
M. Gomez-Rodriguez, L. Song, H. Daneshmand, and B. Scholkopf,“Estimating diffusion network stuctures”, ICML 2014.
11 / 25
Observations of Cascades
a
c
b
d
e
f
a
c
b
d
e
f
tb=0 te=2.1
1.8
tf=3.2
a
c
b
d
e
f
tb=0 te=2.1
1.8
tf=3.2tc=∞
ta=∞
G cascade observation
Observe multiple cascades t1, t2, . . . , tm where tcv ∈ [0, T c] ∪ {∞}
Each cascade is a graph signal (of exposure times)
12 / 25
Continuous-time diffusion modelGomez-Rodriguez et al. (2011)
1. Cascade begins at seed node si.i.d.∼ P(s)
2. Subsequent transmission over edge j → i governed by density
f(ti|tj ;αji) = f(ti − tj ;αji)
13 / 25
Examples
• Exponential
f(τ ;αji) = αjie−αjiτ1 {τ ≥ 0}
• Power-law (δ > 0)
f(τ ;αji) =αjiδ
(τδ
)−1−αji
1 {τ ≥ δ}
• Rayleighf(τ ;αji) = αjiτe
−αjiτ2/21 {τ ≥ 0}
14 / 25
Continuous-time diffusion modelGomez-Rodriguez et al. (2011)
1. Cascade begins at seed node si.i.d.∼ P(s)
2. Subsequent transmission over edge j → i governed by density
f(ti|tj ;αji) = f(ti − tj ;αji)
Also define survival function
s(ti|tj ;αji) = 1−∫ ti
tj
f(t− ti;αji)dt
and hazard function
h(ti|tj ;αji) =f(ti|tj ;αji)s(ti|tj ;αji)
15 / 25
Likelihood of infected nodes
Likelihood that a particular node j is the parent of i,
f(ti|tj ;αji)∏
k 6=j:tk<ti
s(ti|tk;αki)
Marginalizing over all nodes infected before i,∑j:tj<ti
f(ti|tj ;αji)∏
k 6=j:tk<ti
s(ti|tk;αki)
=∏
k:tk<ti
s(ti|tk;αki)∑j:tj<ti
h(ti|tj ;αji)
16 / 25
Likelihood of a Cascade
Network structure is encoded in matrix A = [αji]For a cascade t = (t1, . . . , tN ) with T = max{ti : ti <∞}
f(t;A) =∏i:ti≤T
∏u:tu>T
s(T |ti;αiu)
×∏
k:tk<ti
s(ti|tk;αki)∑j:tj<ti
h(ti|tj ;αji)
Maximum likelihood: Given t1, . . . , tm, find A to
minimize −∑m
c=1 log f(tc;A)subject to αji ≥ 0, i, j = 1, . . . , N, i 6= j
Note: Problem decouples into independent per-node sub-problems
17 / 25
Per-Node Problems
For node i optimize over αi = {αji : j = 1, . . . , N, j 6= i}.
minimize `i(αi)def= −
∑mc=1 gi(t
c;αi)subject to αji ≥ 0, j = 1, . . . , N
where (e.g., for exponential delays)
gi(t;αi) =
log(∑
j:tj<tiαji
)−∑
j:tj<tiαji(ti − tj) if ti ≤ T
−∑
j:tj≤T αji(T − tj) if ti > T
Note: Problem is convex in αi for all models mentioned before
18 / 25
Identifiability and Consistency
Are per-node problems identifiable? (`i(α) = `i(α?) iff α = α?)
The Hessian Q(αi) = ∇2`i(αi) has the form
Q(αi) = X(αi)[X(αi)]>
where X(αi) ∈ RN×m with entries
[X(αi)]jc =
(∑
k:tck<tciαki
)−1if tcj < tci
0 if tcj > tci .
Suppose P(s) > 0 for all s = 1, . . . , N .Then, as m→∞, X(αi) is full row-rank.=⇒ Q(αi) is positive definite=⇒ If α 6= α? then `i(α) > `i(α
?).Consistency also follows since the problem is strictly convex.
19 / 25
Finite Sample Recovery
When can we recover G (w.h.p.) from a finite number cascades?
• Are some networks more difficult to recover than others?
• What kind of cascades are needed for recovery?
Examine “population” log-likelihood E [`i(α)] = E [gi(t;α)].
20 / 25
Let Q?def= ∇2E [gi(t;α)] |α=α? .
Let P denote the true set of parents.
Dependency Condition: There exist constants Cmin, Cmax > 0such that
Cmin < λmin(Q?PP ) < λmax(Q?PP ) < Cmax.
Incoherence Condition: There exists a constant ε ∈ (0, 1] suchthat
‖Q?P cP (Q?PP )−1‖∞ ≤ 1− ε.
21 / 25
Conditions depend on A = [αji] and P(s)
1
0
3
2
5
4
7
6
2
1
3
0
0
1
2
3
4
n
22 / 25
Finite-Sample Recovery Results
Choose αi to solve the regularized problem
minimize `i(αi) + λ‖αi‖1subject to αji ≥ 0, j = 1, . . . , N
Theorem (Gomez-Rodriguez et al. (2016)): Suppose
λ ≥ C12− εε
√logN
N.
If m > C2|P |3 logN then with probability at least1− 2 exp(C3λ
2m),
1. αi is unique, and
2. αi = α?i .
23 / 25
Summary
Identifiability,Consistency,Statistical complexity, andComputational complexity
Open questions for this particular model:
• Noise/uncertainty in observations
• Allow for / model unobserved incubation periods
• Track time-varying αji’s
24 / 25
Directions For Topology Identification in GSP
• Inferring topologies such that observed signals are smooth• Generative models (diffusions, Gaussian)• Discriminative
• Inference with confidences (or p(G|X))
• Understand how errors in G impact subsequent GSPoperations (filtering, sampling)
• Infer topology characteristics
michael.rabbat@mcgill.ca
25 / 25
top related