Estimating Diffusion Network Structures

A. Proof of Lemma 9

Lemma 9 Given log-concave survival functions and concave hazard functions in the parameter(s) of the pairwise transmission likelihoods, a sufficient condition for the Hessian matrix $Q_n$ to be positive definite is that the hazard matrix $X_n(\alpha)$ is non-singular.

Proof Using Eq. 5, the Hessian matrix can be expressed as a sum of two matrices, $D_n(\alpha)$ and $X_n(\alpha)X_n(\alpha)^\top$. The matrix $D_n(\alpha)$ is trivially positive semidefinite by log-concavity of the survival functions and concavity of the hazard functions. The matrix $X_n(\alpha)X_n(\alpha)^\top$ is positive definite, since $X_n(\alpha)$ is full rank by assumption. Then, the Hessian matrix is positive definite, since it is the sum of a positive semidefinite matrix and a positive definite matrix.
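As a quick numerical illustration of this argument (not part of the original proof), the sketch below builds a synthetic positive semidefinite $D$ and a full-row-rank $X$ and checks that $D + XX^\top$ is positive definite; all sizes and values are arbitrary choices.

```python
# Toy numerical check of the sum structure in Lemma 9 (cf. Eq. 5):
# if D is PSD and X has full row rank, then D + X X^T is positive definite.
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 20

D = np.diag(rng.uniform(0.0, 1.0, size=p))   # PSD: nonnegative diagonal
X = rng.normal(size=(p, n))                  # full row rank with probability 1

H = D + X @ X.T                              # Hessian-like sum of the two matrices
print("rank(X) =", np.linalg.matrix_rank(X))                        # p
print("min eigenvalue of D + X X^T:", np.linalg.eigvalsh(H).min())  # > 0
```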

B. Proof of Lemma 10

Lemma 10 If the source probability $P(s)$ is strictly positive for all $s \in \mathcal{R}$, then, for an arbitrarily large number of cascades $n \to \infty$, there exists an ordering of the nodes and cascades within the cascade set such that the hazard matrix $X_n(\alpha)$ is non-singular.

Proof In this proof, we find a labeling of the nodes (row indices in $X_n(\alpha)$) and an ordering of the cascades (column indices in $X_n(\alpha)$) such that, for an arbitrarily large number of cascades, we can express the matrix $X_n(\alpha)$ as $[T\;B]$, where $T \in \mathbb{R}^{p \times p}$ is upper triangular with nonzero diagonal elements and $B \in \mathbb{R}^{p \times (n-p)}$. Therefore, $X_n(\alpha)$ has full rank (rank $p$). We proceed first by sorting nodes in $\mathcal{R}$ and then continue by sorting nodes in $\mathcal{U}$:

• Nodes in $\mathcal{R}$: For each node $u \in \mathcal{R}$, consider the set of cascades $C_u$ in which $u$ was a source and $i$ got infected. Then, rank each node $u$ according to the earliest position in which node $i$ got infected across all cascades in $C_u$, in decreasing order, breaking ties at random. For example, if a node $u$ was, at least once, the source of a cascade in which node $i$ got infected just after the source, while node $v$ was never the source of a cascade in which node $i$ got infected second, then node $u$ will have a lower index than node $v$. Then, assign row $k$ in the matrix $X_n(\alpha)$ to the node in position $k$ and assign the first $d$ columns to the corresponding cascades in which node $i$ got infected earliest. In this ordering, $X_n(\alpha)_{mk} = 0$ for all $m < k$ and $X_n(\alpha)_{kk} \neq 0$.

• Nodes in $\mathcal{U}$: Proceed as in the first step, and assign them the rows $d+1$ to $p$. Moreover, we assign the columns $d+1$ to $p$ to the corresponding cascades in which node $i$ got infected earliest. Again, this ordering satisfies $X_n(\alpha)_{mk} = 0$ for all $m < k$ and $X_n(\alpha)_{kk} \neq 0$. Finally, the remaining $n - p$ columns can be assigned to the remaining cascades at random.

This ordering leads to the desired structure $[T\;B]$, and thus $X_n(\alpha)$ is non-singular.
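The ordering argument reduces to a standard linear-algebra fact: any matrix of the form $[T\;B]$ with $T$ upper triangular and nonzero on the diagonal has rank $p$, whatever $B$ is. A small sketch with synthetic stand-ins for $X_n(\alpha)$:

```python
# Illustrative check for Lemma 10: [T B] has full row rank p when T is
# upper triangular with nonzero diagonal entries. Matrices are synthetic.
import numpy as np

rng = np.random.default_rng(1)
p, n = 6, 15

T = np.triu(rng.uniform(0.5, 1.5, size=(p, p)))  # nonzero diagonal by construction
B = rng.uniform(size=(p, n - p))                 # remaining cascade columns, arbitrary

X = np.hstack([T, B])                            # reordered hazard matrix [T B]
print("rank of [T B]:", np.linalg.matrix_rank(X))  # prints p
```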

C. Proof of Eq. 7

If the hazard vector $X(t^c; \alpha)$ is Lipschitz continuous in the domain $\{\alpha : \alpha_S \geq \alpha^*_{\min}/2\}$,

$$\|X(t^c; \beta) - X(t^c; \alpha)\|_2 \leq k_1 \|\beta - \alpha\|_2,$$

where $k_1$ is some positive constant, then we can bound the spectral norm of the difference $\frac{1}{\sqrt{n}}(X_n(\beta) - X_n(\alpha))$ in the domain $\{\alpha : \alpha_S \geq \alpha^*_{\min}/2\}$ as follows:

$$\Big\|\tfrac{1}{\sqrt{n}}\big(X_n(\beta) - X_n(\alpha)\big)\Big\|_2 = \max_{\|u\|_2=1} \tfrac{1}{\sqrt{n}}\big\|u^\top\big(X_n(\beta) - X_n(\alpha)\big)\big\|_2 = \max_{\|u\|_2=1} \tfrac{1}{\sqrt{n}}\sqrt{\sum_{c=1}^{n} \langle u,\, X(t^c;\beta) - X(t^c;\alpha)\rangle^2} \leq \tfrac{1}{\sqrt{n}}\sqrt{k_1^2\, n\, \|u\|_2^2\, \|\beta - \alpha\|_2^2} \leq k_1 \|\beta - \alpha\|_2.$$
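The inequality chain can be checked numerically for any coordinate-wise Lipschitz hazard map. The sketch below uses an assumed toy hazard $X(t^c;\alpha) = k_1 \sin(\alpha + t^c)$, which is $k_1$-Lipschitz; it is a stand-in for illustration, not the transmission model of the paper.

```python
# Numerical sanity check of the spectral-norm bound in Section C under an
# assumed k1-Lipschitz hazard vector (synthetic stand-in for X(t_c; alpha)).
import numpy as np

rng = np.random.default_rng(2)
p, n, k1 = 4, 50, 0.7
t = rng.uniform(0, 10, size=n)            # cascade observation times
alpha = rng.uniform(0.1, 1.0, size=p)
beta = rng.uniform(0.1, 1.0, size=p)

def hazard_matrix(a):
    # column c is X(t_c; a); each column is k1-Lipschitz in a
    return k1 * np.sin(a[:, None] + t[None, :])

M = (hazard_matrix(beta) - hazard_matrix(alpha)) / np.sqrt(n)
lhs = np.linalg.norm(M, ord=2)            # spectral norm of the scaled difference
rhs = k1 * np.linalg.norm(beta - alpha)
print(f"{lhs:.4f} <= {rhs:.4f}")          # the bound of Section C holds
```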

D. Proof of Lemma 3

By Lagrangian duality, the regularized network inference problem defined in Eq. 4 is equivalent to the following constrained optimization problem:

$$\min_{\alpha_i}\; \ell^n(\alpha_i) \quad \text{subject to} \quad \alpha_{ji} \geq 0,\; j = 1,\ldots,N,\; i \neq j, \qquad \|\alpha_i\|_1 \leq C(\lambda_n), \qquad (20)$$

where $C(\lambda_n) < \infty$ is a positive constant. In this alternative formulation, $\lambda_n$ is the Lagrange multiplier for the second constraint. Since $\lambda_n$ is strictly positive, the constraint is active at any optimal solution, and thus $\|\alpha_i\|_1$ is constant across all optimal solutions.

Using that $\ell^n(\alpha_i)$ is a differentiable convex function by assumption and that $\{\alpha : \alpha_{ji} \geq 0,\; \|\alpha_i\|_1 \leq C(\lambda_n)\}$ is a convex set, we have that $\nabla \ell^n(\alpha_i)$ is constant across optimal primal solutions (Mangasarian, 1988). Moreover, any optimal primal-dual solution of the original problem must satisfy the KKT conditions of the alternative formulation defined by Eq. 20; in particular,

$$\nabla \ell^n(\alpha_i) = -\lambda_n z + \mu,$$

where $\mu \geq 0$ are the Lagrange multipliers associated with the non-negativity constraints and $z$ denotes the subgradient of the $\ell_1$-norm.

Consider the solution $\hat{\alpha}$ such that $\|z_{S^c}\|_\infty < 1$, and thus $\nabla_{\alpha_{S^c}} \ell^n(\hat{\alpha}_i) = -\lambda_n z_{S^c} + \mu_{S^c}$. Now, assume there is an optimal primal solution $\tilde{\alpha}$ such that $\tilde{\alpha}_{ji} > 0$ for some $j \in S^c$. Then, using that the gradient must be constant across optimal solutions, it should hold that $-\lambda_n z_j + \mu_j = -\lambda_n$, where $\mu_{ji} = 0$ by complementary slackness, which implies $\mu_j = -\lambda_n(1 - z_j) < 0$. Since $\mu_j \geq 0$ by assumption, this leads to a contradiction. Then, any primal solution $\alpha$ must satisfy $\alpha_{S^c} = 0$ for the gradient to be constant across optimal solutions.

Finally, since $\alpha_{S^c} = 0$ for all optimal solutions, we can consider the restricted optimization problem defined in Eq. 17. If the Hessian sub-matrix $[\nabla^2 L(\alpha)]_{SS}$ is strictly positive definite, then this restricted optimization problem is strictly convex and the optimal solution must be unique.
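As a minimal sketch of the penalized problem behind this equivalence (assuming a synthetic quadratic loss in place of $\ell^n$), the code below solves a nonnegative $\ell_1$-regularized least-squares problem by projected proximal gradient; the value $\|\hat{a}\|_1$ at the optimum plays the role of the constraint level $C(\lambda_n)$ in Eq. 20. All data and constants are illustrative assumptions.

```python
# Sketch: nonnegative l1-regularized regression, min 0.5||Aa-b||^2 + lam*sum(a)
# over a >= 0, solved by projected (proximal) gradient. Data are synthetic.
import numpy as np

rng = np.random.default_rng(3)
m, p, lam = 100, 10, 0.5
A = rng.normal(size=(m, p))
a_true = np.zeros(p); a_true[:3] = rng.uniform(0.5, 1.0, 3)  # sparse ground truth
b = A @ a_true + 0.01 * rng.normal(size=m)

a = np.zeros(p)
step = 1.0 / np.linalg.norm(A, ord=2) ** 2        # 1/L for the smooth part
for _ in range(5000):
    grad = A.T @ (A @ a - b)
    a = np.maximum(0.0, a - step * (grad + lam))  # on a >= 0, ||a||_1 = sum(a)

print("support of a_hat:", np.nonzero(a > 1e-6)[0])
print("C(lambda_n) = ||a_hat||_1 =", a.sum())     # active l1 level at the optimum
```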

E. Proof of Lemma 4

To prove this lemma, we will first construct a function

$$G(u_S) := \ell^n(\alpha^*_S + u_S) - \ell^n(\alpha^*_S) + \lambda_n\big(\|\alpha^*_S + u_S\|_1 - \|\alpha^*_S\|_1\big),$$

whose domain is restricted to the convex set $\mathcal{U} = \{u_S : \alpha^*_S + u_S \geq 0\}$. By construction, $G(u_S)$ has the following properties:

1. It is convex with respect to $u_S$.
2. Its minimum is attained at $\hat{u}_S := \hat{\alpha}_S - \alpha^*_S$; that is, $G(\hat{u}_S) \leq G(u_S)$ for all $u_S \neq \hat{u}_S$.
3. $G(\hat{u}_S) \leq G(0) = 0$.

Based on properties 1 and 3, we deduce that any point on the segment $L := \{u_S : u_S = t\hat{u}_S + (1-t)0,\; t \in [0,1]\}$ connecting $\hat{u}_S$ and $0$ satisfies $G(u_S) \leq 0$. That is,

$$G(u_S) = G(t\hat{u}_S + (1-t)0) \leq tG(\hat{u}_S) + (1-t)G(0) \leq 0.$$

Next, we will find a sphere centered at $0$ with strictly positive radius $B$, $S(B) := \{u_S : \|u_S\|_2 = B\}$, such that the function $G(u_S) > 0$ (strictly positive) on $S(B)$. We note that this sphere $S(B)$ cannot intersect the segment $L$, since the two sets have strictly different function values. Furthermore, the only possible configuration is that the segment is contained entirely inside the sphere, leading us to conclude that the endpoint $\hat{u}_S = \hat{\alpha}_S - \alpha^*_S$ is also within the sphere; that is, $\|\hat{\alpha}_S - \alpha^*_S\|_2 \leq B$.

In the following, we provide details on finding such a suitable $B$, which will be a function of the regularization parameter $\lambda_n$ and the neighborhood size $d$. More specifically, we start by applying a Taylor series expansion and the mean value theorem,

$$G(u_S) = \nabla_S \ell^n(\alpha^*_S)^\top u_S + u_S^\top \nabla^2_{SS} \ell^n(\alpha^*_S + b u_S)\, u_S + \lambda_n\big(\|\alpha^*_S + u_S\|_1 - \|\alpha^*_S\|_1\big), \qquad (21)$$

where $b \in [0,1]$. We will show that $G(u_S) > 0$ by bounding each term of the above equation separately.

We bound the absolute value of the first term using the assumption on the gradient $\nabla_S \ell(\cdot)$:

$$|\nabla_S \ell^n(\alpha^*_S)^\top u_S| \leq \|\nabla_S \ell\|_\infty \|u_S\|_1 \leq \|\nabla_S \ell\|_\infty \sqrt{d}\, \|u_S\|_2 \leq 4^{-1}\lambda_n B\sqrt{d}. \qquad (22)$$

We bound the absolute value of the last term using the reverse triangle inequality:

$$\lambda_n\big|\|\alpha^*_S + u_S\|_1 - \|\alpha^*_S\|_1\big| \leq \lambda_n\|u_S\|_1 \leq \lambda_n\sqrt{d}\,\|u_S\|_2. \qquad (23)$$

Bounding the remaining middle term, which we denote by $q$, is more challenging. We start by rewriting the Hessian as a sum of two matrices, using Eq. 5:

$$q = \min_{u_S}\; u_S^\top D^n_{SS}(\alpha^*_S + b u_S)\, u_S + n^{-1} u_S^\top X^n_S(\alpha^*_S + b u_S) X^n_S(\alpha^*_S + b u_S)^\top u_S = \min_{u_S}\; u_S^\top D^n_{SS}(\alpha^*_S + b u_S)\, u_S + n^{-1}\|u_S^\top X^n_S(\alpha^*_S + b u_S)\|_2^2.$$

Now, we introduce two additional quantities,

$$\Delta D^n_{SS} = D^n_{SS}(\alpha^*_S + b u_S) - D^n_{SS}(\alpha^*_S), \qquad \Delta X^n_S = X^n_S(\alpha^*_S + b u_S) - X^n_S(\alpha^*_S),$$

and rewrite $q$ as

$$q = \min_{u_S}\Big[u_S^\top D^n_{SS}(\alpha^*_S)\, u_S + n^{-1}\|u_S^\top X^n_S(\alpha^*_S)\|_2^2 + n^{-1}\|u_S^\top \Delta X^n_S\|_2^2 + u_S^\top \Delta D^n_{SS}\, u_S + 2 n^{-1}\langle u_S^\top X^n_S(\alpha^*_S),\, u_S^\top \Delta X^n_S\rangle\Big].$$

Next, we use the dependency condition,

$$q \geq C_{\min} B^2 - \max_{u_S}\underbrace{\big|u_S^\top \Delta D^n_{SS}\, u_S\big|}_{T_1} - 2\max_{u_S}\underbrace{\big|n^{-1}\langle u_S^\top X^n_S(\alpha^*_S),\, u_S^\top \Delta X^n_S\rangle\big|}_{T_2},$$

and proceed to bound $T_1$ and $T_2$ separately. First, we bound $T_1$ using the Lipschitz condition:

$$|T_1| = \Big|\sum_{k \in S} u_k^2\big[D^n_k(\alpha^*_S + b u_S) - D^n_k(\alpha^*_S)\big]\Big| \leq \sum_{k \in S} u_k^2\, k_2\|b u_S\|_2 \leq k_2 B^3.$$

Then, we use the dependency condition, the Lipschitz condition and the Cauchy-Schwarz inequality to bound $T_2$:

$$T_2 \leq \tfrac{1}{\sqrt{n}}\|u_S^\top X^n_S(\alpha^*_S)\|_2\, \tfrac{1}{\sqrt{n}}\|u_S^\top \Delta X^n_S\|_2 \leq \sqrt{C_{\max}}\, B\, \tfrac{1}{\sqrt{n}}\|u_S^\top \Delta X^n_S\|_2 \leq \sqrt{C_{\max}}\, B\, \|u_S\|_2\, \tfrac{1}{\sqrt{n}}\|\Delta X^n_S\|_2 \leq \sqrt{C_{\max}}\, B^2 k_1 \|b u_S\|_2 \leq k_1\sqrt{C_{\max}}\, B^3,$$

where we note that applying the Lipschitz condition implies assuming $B < \alpha^*_{\min}/2$. Next, we incorporate the bounds on $T_1$ and $T_2$ to lower-bound $q$:

$$q \geq C_{\min} B^2 - (k_2 + 2k_1\sqrt{C_{\max}})B^3. \qquad (24)$$

Now, we set $B = K\lambda_n\sqrt{d}$, where $K$ is a constant that we will set later in the proof, and select the regularization parameter $\lambda_n$ to satisfy $\lambda_n\sqrt{d} \leq 0.5\, C_{\min}/\big(K(k_2 + 2k_1\sqrt{C_{\max}})\big)$. Then,

$$G(u_S) \geq -4^{-1}\lambda_n\sqrt{d}\,B + 0.5\,C_{\min}B^2 - \lambda_n\sqrt{d}\,B \geq B\big(0.5\,C_{\min}B - 1.25\,\lambda_n\sqrt{d}\big) \geq B\big(0.5\,C_{\min}K\lambda_n\sqrt{d} - 1.25\,\lambda_n\sqrt{d}\big).$$

In the last step, we set the constant $K = 3C_{\min}^{-1}$, and we have

$$G(u_S) \geq 0.25\,\lambda_n\sqrt{d}\,B > 0,$$

as long as

$$\sqrt{d}\,\lambda_n \leq \frac{C_{\min}^2}{6(k_2 + 2k_1\sqrt{C_{\max}})}, \qquad \alpha^*_{\min} \geq \frac{6\lambda_n\sqrt{d}}{C_{\min}}.$$

Finally, convexity of $G(u_S)$ yields

$$\|\hat{\alpha}_S - \alpha^*_S\|_2 \leq 3\lambda_n\sqrt{d}/C_{\min} \leq \frac{\alpha^*_{\min}}{2}.$$
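The geometric step above (positivity of a convex $G$ on a sphere forces the minimizer inside it) is easy to visualize numerically. The following sketch uses a synthetic convex quadratic as a stand-in for $G$; the matrix, gradient term, and radius are assumptions chosen only for illustration.

```python
# Toy illustration for Lemma 4: a convex G with G(0) = 0 that is strictly
# positive on the sphere ||u||_2 = B must attain its minimum inside the sphere.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
d, B = 3, 1.0
H = 2.0 * np.eye(d)                    # positive definite quadratic part
g = np.array([0.3, -0.2, 0.1])         # small linear (gradient) term

def G(u):
    return g @ u + 0.5 * u @ H @ u     # convex, G(0) = 0

U = rng.normal(size=(2000, d))
U = B * U / np.linalg.norm(U, axis=1, keepdims=True)   # samples on the sphere
print("min of G on sphere:", min(G(u) for u in U))     # strictly positive

u_hat = minimize(G, x0=np.zeros(d)).x
print("||u_hat||_2 =", np.linalg.norm(u_hat), "<= B =", B)
```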

F. Proof of Lemma 5

Define $z^c_j = [\nabla g(t^c; \alpha^*)]_j$ and $z_j = \frac{1}{n}\sum_c z^c_j$. Now, using the KKT conditions and condition 4 (Boundedness), we have that $\mu^*_j = \mathbb{E}_c\{z^c_j\}$ and $|z^c_j| \leq k_3$, respectively. Thus, Hoeffding's inequality yields

$$P\Big(|z_j - \mu^*_j| > \frac{\lambda_n\varepsilon}{4(2-\varepsilon)}\Big) \leq 2\exp\Big(-\frac{n\lambda_n^2\varepsilon^2}{32 k_3^2(2-\varepsilon)^2}\Big),$$

and then,

$$P\Big(\|z - \mu^*\|_\infty > \frac{\lambda_n\varepsilon}{4(2-\varepsilon)}\Big) \leq 2\exp\Big(-\frac{n\lambda_n^2\varepsilon^2}{32 k_3^2(2-\varepsilon)^2} + \log p\Big).$$
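A quick Monte Carlo check of this Hoeffding step, with an arbitrary bound $k_3$ and threshold (both assumptions, chosen only for illustration):

```python
# Empirical deviation probability of a mean of bounded variables vs. the
# Hoeffding bound 2*exp(-n t^2 / (2 k3^2)) for variables in [-k3, k3].
import numpy as np

rng = np.random.default_rng(5)
n, k3, t, trials = 200, 1.0, 0.15, 20000

z = rng.uniform(-k3, k3, size=(trials, n))        # bounded, mean zero
empirical = (np.abs(z.mean(axis=1)) > t).mean()
hoeffding = 2 * np.exp(-n * t**2 / (2 * k3**2))   # range 2*k3 in the exponent
print(f"empirical {empirical:.5f} <= bound {hoeffding:.5f}")
```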

G. Proof of Lemma 6

We start by factorizing the Hessian matrix, using Eq. 5,

$$R^n_j = \big[\nabla^2\ell^n(\bar{\alpha}^j) - \nabla^2\ell^n(\alpha^*)\big]_j^\top(\hat{\alpha} - \alpha^*) = \omega^n_j + \delta^n_j,$$

where

$$\omega^n_j = \big[D^n(\bar{\alpha}^j) - D^n(\alpha^*)\big]_j^\top(\hat{\alpha} - \alpha^*), \qquad \delta^n_j = \frac{1}{n}V^n_j(\hat{\alpha} - \alpha^*), \qquad V^n_j = [X^n(\bar{\alpha}^j)]_j X^n(\bar{\alpha}^j)^\top - [X^n(\alpha^*)]_j X^n(\alpha^*)^\top.$$

Next, we proceed to bound each term separately. Since $[\bar{\alpha}^j]_S = \theta_j\hat{\alpha}_S + (1-\theta_j)\alpha^*_S$, where $\theta_j \in [0,1]$, and $\|\hat{\alpha}_S - \alpha^*_S\|_\infty \leq \alpha^*_{\min}/2$ (Lemma 4), it holds that $[\bar{\alpha}^j]_S \geq \alpha^*_{\min}/2$. Then, we can use condition 3 (Lipschitz continuity) to bound $\omega^n_j$:

$$|\omega^n_j| \leq k_1\|\bar{\alpha}^j - \alpha^*\|_2\|\hat{\alpha} - \alpha^*\|_2 \leq k_1\theta_j\|\hat{\alpha} - \alpha^*\|_2^2 \leq k_1\|\hat{\alpha} - \alpha^*\|_2^2. \qquad (25)$$

However, bounding the term $\delta^n_j$ is more difficult. Let us start by rewriting $\delta^n_j$ as

$$\delta^n_j = \frac{1}{n}(\Lambda_1 + \Lambda_2 + \Lambda_3)(\hat{\alpha} - \alpha^*),$$

where, so that $V^n_j = \Lambda_1 + \Lambda_2 + \Lambda_3$,

$$\Lambda_1 = [X^n(\alpha^*)]_j\big(X^n(\bar{\alpha}^j)^\top - X^n(\alpha^*)^\top\big),$$
$$\Lambda_2 = \big\{[X^n(\bar{\alpha}^j)]_j - [X^n(\alpha^*)]_j\big\}\big(X^n(\bar{\alpha}^j)^\top - X^n(\alpha^*)^\top\big),$$
$$\Lambda_3 = \big([X^n(\bar{\alpha}^j)]_j - [X^n(\alpha^*)]_j\big)X^n(\alpha^*)^\top.$$

Next, we bound each term separately. For the first term, we first apply the Cauchy-Schwarz inequality,

$$|\Lambda_1(\hat{\alpha} - \alpha^*)| \leq \|[X^n(\alpha^*)]_j\|_2 \times \|X^n(\bar{\alpha}^j)^\top - X^n(\alpha^*)^\top\|_2\,\|\hat{\alpha} - \alpha^*\|_2,$$

and then use condition 3 (Lipschitz continuity) and condition 4 (Boundedness),

$$|\Lambda_1(\hat{\alpha} - \alpha^*)| \leq n k_4 k_1\|\bar{\alpha}^j - \alpha^*\|_2\|\hat{\alpha} - \alpha^*\|_2 \leq n k_4 k_1\|\hat{\alpha} - \alpha^*\|_2^2.$$

[Figure 5. Success probability vs. # of cascades, for different super-neighborhood sizes $p_i$: (a) Chain ($d_i = 1$); (b) Stars with different # of leaves ($d_i = 1$); (c) Tree ($d_i = 3$).]

For the second term, we also start by applying the Cauchy-Schwarz inequality,

$$|\Lambda_2(\hat{\alpha} - \alpha^*)| \leq \|[X^n(\bar{\alpha}^j)]_j - [X^n(\alpha^*)]_j\|_2 \times \|X^n(\bar{\alpha}^j)^\top - X^n(\alpha^*)^\top\|_2\,\|\hat{\alpha} - \alpha^*\|_2,$$

and then use condition 3 (Lipschitz continuity),

$$|\Lambda_2(\hat{\alpha} - \alpha^*)| \leq n k_1^2\|\hat{\alpha} - \alpha^*\|_2^2.$$

Last, for the third term, once more we start by applying the Cauchy-Schwarz inequality,

$$|\Lambda_3(\hat{\alpha} - \alpha^*)| \leq \|[X^n(\bar{\alpha}^j)]_j - [X^n(\alpha^*)]_j\|_2 \times \|X^n(\alpha^*)^\top\|_2\,\|\hat{\alpha} - \alpha^*\|_2,$$

and then apply condition 1 (Dependency condition) and condition 3 (Lipschitz continuity),

$$|\Lambda_3(\hat{\alpha} - \alpha^*)| \leq n k_1\sqrt{C_{\max}}\,\|\hat{\alpha} - \alpha^*\|_2^2.$$

Now, we combine the bounds,

$$\|R^n\|_\infty \leq K\|\hat{\alpha} - \alpha^*\|_2^2,$$

where $K = k_1 + k_4 k_1 + k_1^2 + k_1\sqrt{C_{\max}}$.

Finally, using Lemma 4 and selecting the regularization parameter $\lambda_n$ to satisfy $\lambda_n d \leq \frac{C_{\min}^2\varepsilon}{36K(2-\varepsilon)}$ yields:

$$\|R^n\|_\infty/\lambda_n \leq 3K\lambda_n d/C_{\min}^2 \leq \frac{\varepsilon}{4(2-\varepsilon)}.$$
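The decomposition $V^n_j = \Lambda_1 + \Lambda_2 + \Lambda_3$ used above is a purely algebraic identity, which the following sketch verifies with random stand-ins for $X^n(\bar{\alpha}^j)$ and $X^n(\alpha^*)$:

```python
# Check of the identity V_j = Lambda_1 + Lambda_2 + Lambda_3 from Lemma 6,
# with synthetic matrices; a and b stand for the j-th rows of the two matrices.
import numpy as np

rng = np.random.default_rng(6)
p, n, j = 5, 12, 2
Xa = rng.normal(size=(p, n))      # stand-in for X^n(bar alpha^j)
Xs = rng.normal(size=(p, n))      # stand-in for X^n(alpha*)
a, b = Xa[j], Xs[j]               # j-th rows

V  = a @ Xa.T - b @ Xs.T          # V^n_j (up to the 1/n factor)
L1 = b @ (Xa - Xs).T
L2 = (a - b) @ (Xa - Xs).T
L3 = (a - b) @ Xs.T
print("max |V - (L1+L2+L3)| =", np.abs(V - (L1 + L2 + L3)).max())  # ~ 1e-15
```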

H. Proof of Lemma 7

We will first bound, in spectral norm, the difference between the population Fisher information matrix $Q^*_{SS}$ and the sample mean cascade log-likelihood Hessian $Q^n_{SS}$. Define $z^c_{jk} = [\nabla^2 g(t^c;\alpha^*) - Q^*(\alpha^*)]_{jk}$ and $z_{jk} = \frac{1}{n}\sum_{c=1}^{n} z^c_{jk}$, so that $z_{jk} = [Q^n_{SS}(\alpha^*) - Q^*_{SS}(\alpha^*)]_{jk}$ for $j,k \in S$. Then, we can express the difference between the two matrices as:

$$\|Q^n_{SS}(\alpha^*) - Q^*_{SS}(\alpha^*)\|_2 \leq \|Q^n_{SS}(\alpha^*) - Q^*_{SS}(\alpha^*)\|_F = \sqrt{\sum_{j=1}^{d}\sum_{k=1}^{d}(z_{jk})^2}.$$

Since $|z^c_{jk}| \leq 2k_5$ by condition 4, we can apply Hoeffding's inequality to each $z_{jk}$,

$$P(|z_{jk}| \geq \delta) \leq 2\exp\Big(-\frac{\delta^2 n}{8 k_5^2}\Big), \qquad (26)$$

and further,

$$P\big(\|Q^n_{SS}(\alpha^*) - Q^*_{SS}(\alpha^*)\|_2 \geq \delta\big) \leq 2\exp\Big(-\frac{K\delta^2 n}{d^2} + 2\log d\Big), \qquad (27)$$

which follows by applying Eq. 26 with threshold $\delta/d$ to each of the $d^2$ entries. Now, we bound the maximum eigenvalue of $Q^n_{SS}$ as follows:

$$\Lambda_{\max}(Q^n_{SS}) = \max_{\|x\|_2=1} x^\top Q^n_{SS}\, x = \max_{\|x\|_2=1}\big\{x^\top Q^*_{SS}\, x + x^\top(Q^n_{SS} - Q^*_{SS})x\big\} \leq y^\top Q^*_{SS}\, y + y^\top(Q^n_{SS} - Q^*_{SS})y,$$

where $y$ is the unit-norm maximal eigenvector of $Q^n_{SS}$. Therefore,

$$\Lambda_{\max}(Q^n_{SS}) \leq \Lambda_{\max}(Q^*_{SS}) + \|Q^n_{SS} - Q^*_{SS}\|_2,$$

and thus,

$$P\big(\Lambda_{\max}(Q^n_{SS}) \geq C_{\max} + \delta\big) \leq \exp\Big(-\frac{K\delta^2 n}{d^2} + 2\log d\Big).$$

[Figure 6. Success probability vs. # of cascades, for different in-degrees $d_i$ ($d = 1, 2, 3$).]

Reasoning in a similar way, we bound the minimum eigenvalue of $Q^n_{SS}$:

$$P\big(\Lambda_{\min}(Q^n_{SS}) \leq C_{\min} - \delta\big) \leq \exp\Big(-\frac{K\delta^2 n}{d^2} + 2\log d\Big).$$
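The eigenvalue perturbation step used twice above ($\Lambda_{\max}$ shifted up and $\Lambda_{\min}$ shifted down by at most the spectral norm of the difference) can be sanity-checked numerically with synthetic matrices:

```python
# Weyl-type perturbation check for Lemma 7: eigenvalues of Q^n move by at
# most ||Q^n - Q*||_2 relative to those of Q*. All matrices are synthetic.
import numpy as np

rng = np.random.default_rng(7)
d = 6
M = rng.normal(size=(d, d))
Q_star = M @ M.T + np.eye(d)                      # PD "population" matrix
E = 0.05 * rng.normal(size=(d, d)); E = (E + E.T) / 2
Q_n = Q_star + E                                  # perturbed "sample" matrix

spec = np.linalg.norm(E, ord=2)
print(np.linalg.eigvalsh(Q_n)[-1] <= np.linalg.eigvalsh(Q_star)[-1] + spec)  # True
print(np.linalg.eigvalsh(Q_n)[0] >= np.linalg.eigvalsh(Q_star)[0] - spec)    # True
```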

I. Proof of Lemma 8

We start by decomposing $Q^n_{S^cS}(\alpha^*)\big(Q^n_{SS}(\alpha^*)\big)^{-1}$ as follows:

$$Q^n_{S^cS}(\alpha^*)\big(Q^n_{SS}(\alpha^*)\big)^{-1} = A_1 + A_2 + A_3 + A_4,$$

where

$$A_1 = Q^*_{S^cS}\big[(Q^n_{SS})^{-1} - (Q^*_{SS})^{-1}\big],$$
$$A_2 = \big[Q^n_{S^cS} - Q^*_{S^cS}\big]\big[(Q^n_{SS})^{-1} - (Q^*_{SS})^{-1}\big],$$
$$A_3 = \big[Q^n_{S^cS} - Q^*_{S^cS}\big](Q^*_{SS})^{-1},$$
$$A_4 = Q^*_{S^cS}(Q^*_{SS})^{-1},$$

with $Q^* = Q^*(\alpha^*)$ and $Q^n = Q^n(\alpha^*)$. Now, we bound each term separately. The fourth term, $A_4$, is the easiest to bound, using simply the incoherence condition:

$$\|A_4\|_\infty \leq 1 - \varepsilon.$$

To bound the other terms, we need the following lemma:

Lemma 11 For any $\delta \geq 0$ and constants $K$ and $K'$, the following bounds hold:

$$P\big[\|Q^n_{S^cS} - Q^*_{S^cS}\|_\infty \geq \delta\big] \leq 2\exp\Big(-\frac{Kn\delta^2}{d^2} + \log d + \log(p-d)\Big) \qquad (28)$$

$$P\big[\|Q^n_{SS} - Q^*_{SS}\|_\infty \geq \delta\big] \leq 2\exp\Big(-\frac{Kn\delta^2}{d^2} + 2\log d\Big) \qquad (29)$$

$$P\big[\|(Q^n_{SS})^{-1} - (Q^*_{SS})^{-1}\|_\infty \geq \delta\big] \leq 4\exp\Big(-\frac{Kn\delta^2}{d^3} + K'\log d\Big) \qquad (30)$$

Proof We start by proving the first confidence interval. By the definition of the infinity norm of a matrix, we have:

$$P\big[\|Q^n_{S^cS} - Q^*_{S^cS}\|_\infty \geq \delta\big] = P\Big[\max_{j \in S^c}\sum_{k \in S}|z_{jk}| \geq \delta\Big] \leq (p-d)\,P\Big[\sum_{k \in S}|z_{jk}| \geq \delta\Big],$$

where $z_{jk} = [Q^n - Q^*]_{jk}$ and, for the last inequality, we used the union bound and the fact that $|S^c| \leq p - d$. Furthermore,

$$P\Big[\sum_{k \in S}|z_{jk}| \geq \delta\Big] \leq P\big[\exists k \in S : |z_{jk}| \geq \delta/d\big] \leq d\,P\big[|z_{jk}| \geq \delta/d\big].$$

Thus,

$$P\big[\|Q^n_{S^cS} - Q^*_{S^cS}\|_\infty \geq \delta\big] \leq (p-d)\,d\,P\big[|z_{jk}| \geq \delta/d\big].$$

At this point, we can obtain the first confidence bound by using Eq. 26 with $\delta := \delta/d$ in the above equation. The proof of the second confidence bound is very similar, and we omit it for brevity. To prove the last confidence bound, we proceed as follows:

$$\|(Q^n_{SS})^{-1} - (Q^*_{SS})^{-1}\|_\infty = \|(Q^n_{SS})^{-1}[Q^n_{SS} - Q^*_{SS}](Q^*_{SS})^{-1}\|_\infty \leq \sqrt{d}\,\|(Q^n_{SS})^{-1}[Q^n_{SS} - Q^*_{SS}](Q^*_{SS})^{-1}\|_2 \leq \sqrt{d}\,\|(Q^n_{SS})^{-1}\|_2\,\|Q^n_{SS} - Q^*_{SS}\|_2\,\|(Q^*_{SS})^{-1}\|_2 \leq \frac{\sqrt{d}}{C_{\min}}\,\|Q^n_{SS} - Q^*_{SS}\|_2\,\|(Q^n_{SS})^{-1}\|_2.$$

Next, we bound each term of the final expression in the above equation separately. The first term can be bounded using Eq. 27:

$$P\big[\|Q^n_{SS} - Q^*_{SS}\|_2 \geq C_{\min}^2\delta/(2\sqrt{d})\big] \leq 2\exp\Big(-\frac{Kn\delta^2}{d^3} + 2\log d\Big).$$

The second term can be bounded using Lemma 7:

$$P\Big[\|(Q^n_{SS})^{-1}\|_2 \geq \frac{2}{C_{\min}}\Big] = P\Big[\Lambda_{\min}(Q^n_{SS}) \leq \frac{C_{\min}}{2}\Big] \leq \exp\Big(-\frac{Kn}{d^2} + K'\log d\Big).$$

Then, the third confidence bound follows.
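The third bound rests on the matrix identity $(Q^n_{SS})^{-1} - (Q^*_{SS})^{-1} = (Q^n_{SS})^{-1}[Q^*_{SS} - Q^n_{SS}](Q^*_{SS})^{-1}$ (the sign inside the bracket is immaterial for the norm bound). A quick check with synthetic positive definite matrices:

```python
# Verify A^{-1} - B^{-1} = A^{-1} (B - A) B^{-1} with A = Q^n, B = Q*.
import numpy as np

rng = np.random.default_rng(8)
d = 5
M = rng.normal(size=(d, d))
Q_star = M @ M.T + np.eye(d)          # synthetic PD stand-in for Q*_SS
Q_n = Q_star + 0.1 * np.eye(d)        # synthetic stand-in for Q^n_SS

lhs = np.linalg.inv(Q_n) - np.linalg.inv(Q_star)
rhs = np.linalg.inv(Q_n) @ (Q_star - Q_n) @ np.linalg.inv(Q_star)
print("max abs difference:", np.abs(lhs - rhs).max())   # ~ 1e-16
```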

[Figure 7. F1-score vs. # of cascades: (a) Kronecker hierarchical, EXP; (b) Kronecker hierarchical, RAY; (c) Forest Fire, POW; (d) Forest Fire, RAY.]

Control of $A_1$. We start by rewriting the term $A_1$ as

$$A_1 = Q^*_{S^cS}(Q^*_{SS})^{-1}\big[(Q^*_{SS}) - (Q^n_{SS})\big](Q^n_{SS})^{-1},$$

and further,

$$\|A_1\|_\infty \leq \|Q^*_{S^cS}(Q^*_{SS})^{-1}\|_\infty \times \|(Q^*_{SS}) - (Q^n_{SS})\|_\infty\,\|(Q^n_{SS})^{-1}\|_\infty.$$

Next, using the incoherence condition easily yields:

$$\|A_1\|_\infty \leq (1-\varepsilon)\,\|(Q^*_{SS}) - (Q^n_{SS})\|_\infty \times \sqrt{d}\,\|(Q^n_{SS})^{-1}\|_2.$$

Now, we apply Lemma 7 with $\delta = C_{\min}/2$ to obtain $\|(Q^n_{SS})^{-1}\|_2 \leq \frac{2}{C_{\min}}$ with probability greater than $1 - \exp(-Kn/d^2 + K'\log d)$, and then use Eq. 29 with $\delta = \frac{\varepsilon C_{\min}}{12\sqrt{d}}$ to conclude that

$$P\Big[\|A_1\|_\infty \geq \frac{\varepsilon}{6}\Big] \leq 2\exp\Big(-\frac{Kn}{d^3} + K'\log d\Big).$$

Control of $A_2$. We rewrite the term $A_2$ as

$$\|A_2\|_\infty \leq \|Q^n_{S^cS} - Q^*_{S^cS}\|_\infty\,\|(Q^n_{SS})^{-1} - (Q^*_{SS})^{-1}\|_\infty,$$

and then use Eqs. 28 and 30 with $\delta = \sqrt{\varepsilon/6}$ to conclude that

$$P\Big[\|A_2\|_\infty \geq \frac{\varepsilon}{6}\Big] \leq 4\exp\Big(-\frac{Kn}{d^3} + \log(p-d) + K'\log p\Big).$$

Control of $A_3$. We rewrite the term $A_3$ as

$$\|A_3\|_\infty \leq \sqrt{d}\,\|(Q^*_{SS})^{-1}\|_2\,\|Q^n_{S^cS} - Q^*_{S^cS}\|_\infty \leq \frac{\sqrt{d}}{C_{\min}}\,\|Q^n_{S^cS} - Q^*_{S^cS}\|_\infty.$$

We then apply Eq. 28 with $\delta = \frac{\varepsilon C_{\min}}{6\sqrt{d}}$ to conclude that

$$P\Big[\|A_3\|_\infty \geq \frac{\varepsilon}{6}\Big] \leq \exp\Big(-\frac{Kn}{d^3} + \log(p-d)\Big),$$

and thus,

$$P\Big[\|Q^n_{S^cS}(Q^n_{SS})^{-1}\|_\infty \geq 1 - \frac{\varepsilon}{2}\Big] = O\Big(\exp\Big(-\frac{Kn}{d^3} + \log p\Big)\Big).$$

J. Additional experiments

Parameters (n, p, d). Figure 5 shows the success probability at inferring the incoming links of nodes on the same type of canonical networks as depicted in Fig. 2. We choose nodes with the same in-degree but different super-neighborhood set sizes $p_i$, and experiment with different scalings $\beta$ of the number of cascades, $n = 10\beta d\log p$. We set the regularization parameter $\lambda_n$ to a constant factor of $\sqrt{\log(p)/n}$, as suggested by Theorem 2, and, for each node, we used cascades which contained at least one node in the super-neighborhood of the node under study. We used an exponential transmission model and time window $T = 10$. As predicted by Theorem 2, very different $p$ values lead to curves that line up with each other quite well.

Figure 6 shows the success probability at inferring the incoming links of nodes of a hierarchical Kronecker network with equal super-neighborhood size ($p_i = 70$) but different in-degree ($d_i$) under different scalings $\beta$ of the number of cascades, $n = 10\beta d\log p$, choosing the regularization parameter $\lambda_n$ as a constant factor of $\sqrt{\log(p)/n}$, as suggested by Theorem 2. We used an exponential transmission model and time window $T = 5$. As predicted by Theorem 2, in this case, different $d$ values lead to noticeably different curves.

Comparison with NETRATE and First-Edge. Figure 7 compares the accuracy of our algorithm, NETRATE, and First-Edge against the number of cascades for different types of networks and transmission models. Our method typically outperforms both competing methods. We find the competitive advantage with respect to First-Edge especially striking; however, this may be explained by comparing the sample complexity results for both methods: First-Edge needs $O(Nd\log N)$ cascades to achieve a probability of success approaching 1 at a rate polynomial in the number of cascades, while our method needs $O(d^3\log N)$ cascades to achieve a probability of success approaching 1 at a rate exponential in the number of cascades.
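For concreteness, a small helper mirroring the experimental scalings described above; the constant multiplier $c$ in front of $\sqrt{\log(p)/n}$ is an assumption (the text only states "a constant factor"):

```python
# Hypothetical helper reproducing the scalings n = 10*beta*d*log(p) and
# lambda_n = c * sqrt(log(p)/n) used in the experiments; c is assumed.
import numpy as np

def experiment_scales(beta, d, p, c=1.0):
    n = int(round(10 * beta * d * np.log(p)))
    lam = c * np.sqrt(np.log(p) / n)
    return n, lam

for beta in (0.1, 0.5, 1.0):
    n, lam = experiment_scales(beta, d=3, p=128)
    print(f"beta={beta}: n={n} cascades, lambda_n={lam:.4f}")
```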