david daniels, eric liu, charles zhang cme 323rezab/classes/cme323/s15/...Β Β· key results from...

22
David Daniels, Eric Liu, Charles Zhang CME 323

Upload: others

Post on 26-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

David Daniels, Eric Liu, Charles Zhang

CME 323

Page 2: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Not all edges are created equal

Example: AngelList ($2.9+ billion) β—¦ Investors can follow a start-up (social link)

β—¦ Investors can invest in a start-up (economic link)

β—¦ What is the relative importance of a social link versus an economic link?

Page 3: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

This paper attempts to recover edge weights that are revealed via the propagation of actual influence along a network β—¦ β€œWhat is the edge-type weight vector that best

describes a graph, assuming influence operates as if characterized by Edge-Type Weighted PageRank?”

Page 4: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Edge-Type Weighted Graphs

Weighted PageRank

Structural Estimation

Algorithm β—¦ (Inner) PageRank Iterations (for given weight vector)

β—¦ (Outer) Search Strategy (for optimal weight vector)

Experiments

Page 5: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

𝐺 = 𝑉, 𝐸(𝑒, 𝑑), πœ” ∈ 𝒒𝑀

𝑉 is the set of vertices with 𝑉 = 𝑛

𝐸 is the set of edges with 𝐸 = π‘š

Each edge 𝑒 having a type attribute 𝑑 ∈ 𝒯 ={1, 2, … , 𝑇}

πœ” = (πœ”1, … , πœ”π‘‡) ∈ ℝ+T is the weight vector

Page 6: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Weighted stochastic adjacency matrix A∈ ℝ𝑛×𝑛

𝐴𝑗𝑖 =𝑀𝑖𝑗

𝑀𝑖𝑗𝑗 where 𝑀𝑖𝑗 = πœ”π‘‘π‘–π‘— is the weight for edge 𝑒𝑖𝑗,

if 𝑒𝑖𝑗 ∈ 𝐸 (i.e. if 𝑣𝑖 β†’ 𝑣𝑗), and 𝑀𝑖𝑗 = 0 if 𝑒𝑖𝑗 βˆ‰ 𝐸

Alternatively, we can write A as

𝐴 = πœ”1𝐡1 +β‹―+πœ”π‘‡π΅

𝑇 𝐢 where 𝐢𝑖𝑖 = πœ”π‘‘π‘‘ 𝐡 𝑑

𝑖𝑗𝑗

Page 7: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Find π‘Ÿ ∈ ℝn such that π‘Ÿ = (1 βˆ’ 𝛿)/𝑛 + π›Ώπ΄π‘Ÿ

i.e -> Find the eigenvector corresponding to the eigenvalue πœ† = 1 for the matrix 𝑀 = 𝛿𝐴 +1βˆ’π›Ώ

π‘›πŸ™

(Perron–Frobenius) on positive stochastic matrices

πœ† = 1 is the unique largest eigenvalue of 𝑀

=> Power Iterations

Page 8: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Ο‰opt = argminπœ”

β„Ž π‘œπ‘Ÿπ‘‘π‘’π‘Ÿ 𝑃𝑅 𝐺 𝑉, 𝐸, πœ” , π‘βˆ—

π‘βˆ— = π‘œπ‘Ÿπ‘‘π‘’π‘Ÿ(𝑃𝑅 𝐺 𝑉, 𝐸,πœ”βˆ— +𝒩 0, σϡ

2𝐼 )

β„Ž 𝑝1, 𝑝2 = 𝑝1 βˆ’ 𝑝2 2

Page 9: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood
Page 10: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood
Page 11: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Key results from simulations

The optimal weight vector πœ”π‘œπ‘π‘‘ that attains minimum of 𝑓 lies in close neighbourhood of true weight vector, πœ”βˆ—.

Albeit intractability to find closed-form of derivative, the function 𝑓 is convex and smooth (and at least piecewise continuous/convex for higher dimensions).

Weighted PageRanks perform better than unweighted PageRanks especially on graphs with high in-degree / low edge weights, low in-degree / high edge weights (see paper).

Page 12: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

πœ”βˆ— = (0.5714, 0.2857, 0.1429) 95% CI for πœ”1

βˆ— : [0.5677, 0.5831] 95% CI for πœ”2

βˆ— : [0.2797, 0.2899] 95% CI for πœ”3

βˆ— : [0.1325, 0.1471]

Page 13: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Local Machine

Power iteration on 𝑀 2𝑛 βˆ’ 1 𝑛 ~ π’ͺ 𝑛2 per multiplication

Number of iterations:

π’ͺlog 1/πœ– βˆ’ 𝑐𝑐2/𝑐1

log 1/πœ†2 ~π’ͺ

log 1/πœ–

log 1/𝛿

because π‘€π‘˜π‘£ = 𝑐1(π‘Ÿ +𝑐2

𝑐1πœ†2

π‘˜π‘ž2 +β‹―+𝑐𝑛

𝑐1πœ†π‘›

π‘˜π‘žπ‘›

and by Haveliwala (2003)’s bound on πœ†2 ≀ 𝛿

Smart update: 𝑣(π‘˜) = (1 βˆ’ 𝛿)/𝑛 + 𝛿𝐴𝑣(π‘˜βˆ’1) 2π‘š βˆ’ 𝑛 ~ π’ͺ π‘š per update

Page 14: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Distributed using Pregel Framework

Page 15: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Distributed using Pregel Framework, 𝐡 machines, with combiners

Communication cost: π’ͺ min π‘š, 𝑛𝐡

Reduce size for each key:

β—¦ π’ͺ min max indegrees, 𝐡 β—¦ Max in-degrees could be as bad as π’ͺ π‘š

β—¦ On average, it is π’ͺ π‘š/𝑛

Number of supersteps: π’ͺ log 1/πœ–

Page 16: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Grid search π’ͺ 1/𝛼 π‘‡π‘š log 1/πœ–

Numerical gradient descent πœ”(𝑙+1) = πœ”(𝑙) βˆ’ 𝛾 𝛻𝑓 πœ” 𝑙

πœ•π‘“

πœ•πœ”π‘‘πœ” 𝑙 =

𝑓 πœ” 𝑙 + 𝛼𝑒𝑑 βˆ’ 𝛼𝑒𝑇 βˆ’ 𝑓 πœ” 𝑙

𝛼

π’ͺ 𝑆𝑇 π‘š log 1/πœ– + 𝑛 log 𝑛 𝐿

Page 17: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Accuracy?

N.B.: Experiments use a slightly different minimization objective – squared difference in PageRank scores rather than squared difference in PageRank order – because our laptop Spark setup (4 cores, 4 GB memory) was able to process the former metric, but not the latter, in a reasonable time for graphs of a size worth distributing.

Page 18: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Accuracy?

Nodes Edges # Types Recovered Weights True Weights

600 20% 2 .332, .668 .33, .67

600 20% 4 .098, .199, .3, .4 .1, .2, .3, .4

600 20% 2 .331, .669 .33+Ο΅,.67+Ο΅,

Ο΅ ~𝒩 0,.1

𝑀𝑖

2

2000 500 3 .165, .332, .503 .166, .333, .5

Page 19: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Runtime?

Page 20: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Erdos-Renyi random graphs, 600 Nodes

Page 21: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood

Can recover edge-type weights accurately

Next steps β—¦ Dynamically changing PageRank tolerance

β—¦ Look for direction in unit ball with small πœ†2

Page 22: David Daniels, Eric Liu, Charles Zhang CME 323rezab/classes/cme323/S15/...Β Β· Key results from simulations The optimal weight vector 𝑑 that attains minimum of lies in close neighbourhood