inferring network structure from indirect … › ~gsp16 › rabat.pdfinferring network structure...

Post on 27-Jun-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Inferring Network Structure from IndirectObservations

Mike Rabbat

McGill University, Montreal, Canada

GSP Workshop, Philadelphia, May 2016

1 / 25

Motivation

• Need “the graph(s)” to do graph signal processing

• Fundamental problem: Given signals, recover the graph

2 / 25

The Agenda

• Historical overview

• Case study: Inferring network structure from cascades

• Open directions

3 / 25

Combinatorial vs. Statistical Approaches

Combinatorial Graph Reconstruction

• Let G = (V,E) be an graph

• A card Gi = V \ vi is a vertex-deleted subgraph

• The deck is the set of all cards

Deck(G) = {Gi = V \ vi : vi ∈ V }

• Does Deck(G) uniquely determine G?

• How many cards are necessary?

4 / 25

Statistical Approaches – Historical Overview

• Covariance selection (Dempster ’72)

• Estimate Σ = E[xx>

]from x1,x2, . . . ,xm

i.i.d.∼ N (0,Σ)• Explore sparsity of Σ−1

• Σ−1i,j = 0 encodes conditional independence relationships

• Theoretical and computational advances• Meinshausen and Buhlmann (2006)• Ravikumar, Wainwright, and Lafferty (2010)• Friedman, Hastie, and Tibshirani (2007)• Banerjee, El Ghaoui, and d’Aspremont (2008)• Marjanovic and Hero (2014)• Main tools: Convex optimization + sparsity/high-d statistics

5 / 25

Statistical Approaches – Historical Overview

• Learning probabilistic graphical models• Beyond Gaussian/Ising• Survey by Heckerman (1995)• Main tools: Search over discrete space of graphs

6 / 25

Statistical Approaches – Historical Overview

Learning causal networks

• Edge u→ v iff ¬v =⇒ ¬u• Interventions: Pearl (2000)

• Predictive: Granger (1969)

Other network representations

• Partial correlations

• Structural equation models

• ...

7 / 25

Application-Specific Approaches

Network tomography

• Castro, Coates, Liang, Nowak, and Yu, Statistical Science,2004

S

R1 R2 R3

S

R1 R2 R3

S

R1 R2 R3

8 / 25

The Graph Signal Processing View

Session: Thursday P.M.

9 / 25

Relevant Questions

Identifiability: Do the observations (asymptotically) uniquelyidentify G?

Consistency: Does G (asymptotically) converge to G?

Statistical complexity: How many samples are required forG = G w.h.p.?

Computational complexity: To compute G from observations?

10 / 25

Case Study

M. Gomez-Rodriguez, L. Song, H. Daneshmand, and B. Scholkopf,“Estimating diffusion network stuctures”, ICML 2014.

11 / 25

Observations of Cascades

a

c

b

d

e

f

a

c

b

d

e

f

tb=0 te=2.1

1.8

tf=3.2

a

c

b

d

e

f

tb=0 te=2.1

1.8

tf=3.2tc=∞

ta=∞

G cascade observation

Observe multiple cascades t1, t2, . . . , tm where tcv ∈ [0, T c] ∪ {∞}

Each cascade is a graph signal (of exposure times)

12 / 25

Continuous-time diffusion modelGomez-Rodriguez et al. (2011)

1. Cascade begins at seed node si.i.d.∼ P(s)

2. Subsequent transmission over edge j → i governed by density

f(ti|tj ;αji) = f(ti − tj ;αji)

13 / 25

Examples

• Exponential

f(τ ;αji) = αjie−αjiτ1 {τ ≥ 0}

• Power-law (δ > 0)

f(τ ;αji) =αjiδ

(τδ

)−1−αji

1 {τ ≥ δ}

• Rayleighf(τ ;αji) = αjiτe

−αjiτ2/21 {τ ≥ 0}

14 / 25

Continuous-time diffusion modelGomez-Rodriguez et al. (2011)

1. Cascade begins at seed node si.i.d.∼ P(s)

2. Subsequent transmission over edge j → i governed by density

f(ti|tj ;αji) = f(ti − tj ;αji)

Also define survival function

s(ti|tj ;αji) = 1−∫ ti

tj

f(t− ti;αji)dt

and hazard function

h(ti|tj ;αji) =f(ti|tj ;αji)s(ti|tj ;αji)

15 / 25

Likelihood of infected nodes

Likelihood that a particular node j is the parent of i,

f(ti|tj ;αji)∏

k 6=j:tk<ti

s(ti|tk;αki)

Marginalizing over all nodes infected before i,∑j:tj<ti

f(ti|tj ;αji)∏

k 6=j:tk<ti

s(ti|tk;αki)

=∏

k:tk<ti

s(ti|tk;αki)∑j:tj<ti

h(ti|tj ;αji)

16 / 25

Likelihood of a Cascade

Network structure is encoded in matrix A = [αji]For a cascade t = (t1, . . . , tN ) with T = max{ti : ti <∞}

f(t;A) =∏i:ti≤T

∏u:tu>T

s(T |ti;αiu)

×∏

k:tk<ti

s(ti|tk;αki)∑j:tj<ti

h(ti|tj ;αji)

Maximum likelihood: Given t1, . . . , tm, find A to

minimize −∑m

c=1 log f(tc;A)subject to αji ≥ 0, i, j = 1, . . . , N, i 6= j

Note: Problem decouples into independent per-node sub-problems

17 / 25

Per-Node Problems

For node i optimize over αi = {αji : j = 1, . . . , N, j 6= i}.

minimize `i(αi)def= −

∑mc=1 gi(t

c;αi)subject to αji ≥ 0, j = 1, . . . , N

where (e.g., for exponential delays)

gi(t;αi) =

log(∑

j:tj<tiαji

)−∑

j:tj<tiαji(ti − tj) if ti ≤ T

−∑

j:tj≤T αji(T − tj) if ti > T

Note: Problem is convex in αi for all models mentioned before

18 / 25

Identifiability and Consistency

Are per-node problems identifiable? (`i(α) = `i(α?) iff α = α?)

The Hessian Q(αi) = ∇2`i(αi) has the form

Q(αi) = X(αi)[X(αi)]>

where X(αi) ∈ RN×m with entries

[X(αi)]jc =

(∑

k:tck<tciαki

)−1if tcj < tci

0 if tcj > tci .

Suppose P(s) > 0 for all s = 1, . . . , N .Then, as m→∞, X(αi) is full row-rank.=⇒ Q(αi) is positive definite=⇒ If α 6= α? then `i(α) > `i(α

?).Consistency also follows since the problem is strictly convex.

19 / 25

Finite Sample Recovery

When can we recover G (w.h.p.) from a finite number cascades?

• Are some networks more difficult to recover than others?

• What kind of cascades are needed for recovery?

Examine “population” log-likelihood E [`i(α)] = E [gi(t;α)].

20 / 25

Let Q?def= ∇2E [gi(t;α)] |α=α? .

Let P denote the true set of parents.

Dependency Condition: There exist constants Cmin, Cmax > 0such that

Cmin < λmin(Q?PP ) < λmax(Q?PP ) < Cmax.

Incoherence Condition: There exists a constant ε ∈ (0, 1] suchthat

‖Q?P cP (Q?PP )−1‖∞ ≤ 1− ε.

21 / 25

Conditions depend on A = [αji] and P(s)

1

0

3

2

5

4

7

6

2

1

3

0

0

1

2

3

4

n

22 / 25

Finite-Sample Recovery Results

Choose αi to solve the regularized problem

minimize `i(αi) + λ‖αi‖1subject to αji ≥ 0, j = 1, . . . , N

Theorem (Gomez-Rodriguez et al. (2016)): Suppose

λ ≥ C12− εε

√logN

N.

If m > C2|P |3 logN then with probability at least1− 2 exp(C3λ

2m),

1. αi is unique, and

2. αi = α?i .

23 / 25

Summary

Identifiability,Consistency,Statistical complexity, andComputational complexity

Open questions for this particular model:

• Noise/uncertainty in observations

• Allow for / model unobserved incubation periods

• Track time-varying αji’s

24 / 25

Directions For Topology Identification in GSP

• Inferring topologies such that observed signals are smooth• Generative models (diffusions, Gaussian)• Discriminative

• Inference with confidences (or p(G|X))

• Understand how errors in G impact subsequent GSPoperations (filtering, sampling)

• Infer topology characteristics

michael.rabbat@mcgill.ca

25 / 25

top related