Challenges and Opportunities Posed by Power Laws in Network Analysis
Bruno Ribeiro, UMass Amherst. MURI Review Meeting, Berkeley, 26th Oct 2011.
![Page 1: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/1.jpg)
Challenges and Opportunities Posed by Power Laws in Network Analysis
Bruno Ribeiro, UMass Amherst
MURI Review Meeting, Berkeley, 26th Oct 2011
![Page 2: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/2.jpg)
2
Power Laws in Networks
→ Network topology:
– Power law distribution of node degrees
• AS topology, social networks (Facebook, etc.)
→ Network traffic:
– Flow: subset of packets
– Power law distribution of flow sizes
[Figure: router observing a packet stream]
[Plot: Flickr dataset, P[deg > d] versus vertex degree d]
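The quantity on the plot's vertical axis, P[deg > d], is the empirical complementary CDF of the degree sequence. A minimal sketch of how it could be computed is given below; the function name `degree_ccdf` and the toy degree list are illustrative assumptions, not the Flickr data or the author's code.

```python
from collections import Counter

def degree_ccdf(degrees):
    """Return {d: P[deg > d]} for each distinct degree d in the list."""
    n = len(degrees)
    counts = Counter(degrees)
    ccdf = {}
    remaining = n  # vertices with degree >= the current d
    for d in sorted(counts):
        remaining -= counts[d]  # now: vertices with degree > d
        ccdf[d] = remaining / n
    return ccdf

# Hypothetical toy degree sequence (not the Flickr dataset).
toy_degrees = [1, 1, 1, 1, 2, 2, 3, 8]
ccdf = degree_ccdf(toy_degrees)
for d, prob in ccdf.items():
    print(d, prob)
```

On a log-log scale, a power law degree distribution would show up as a roughly straight line in this CCDF.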
![Page 3: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/3.jpg)
3
Characterizing Networks from Incomplete Data
This talk
→ Estimate distributions (of degrees, of flow sizes, …) from incomplete data (sampled edges, sampled packets, …)
→ Uncover central nodes in the network
![Page 4: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/4.jpg)
4
Outline
→ Challenge: Estimating subset size distributions from incomplete data
– Incomplete data:
• randomly sampled edges, randomly sampled packets, …
– Impact of power laws on estimation accuracy
– Impact of other distributions on estimation accuracy
→ Opportunity: Uncovering central nodes in power law networks
![Page 5: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/5.jpg)
5
ESTIMATING SUBSET SIZE DISTRIBUTIONS FROM INCOMPLETE DATA
Part 1: Challenge
![Page 6: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/6.jpg)
6
Subset size distributions
[Figure: a set of fish grouped by type; each type of fish is a subset, and the number of fish of that type is the subset size]
[Plot: fraction of subsets (types of fish) with size x, versus x, the subset size (number of fish)]
![Page 7: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/7.jpg)
7
Estimating subset size distributions
[Figure: N fish are sampled uniformly at random from the set of fish; the sampled fish are used to produce an unbiased estimate of the distribution]
[Plot: fraction of subsets (types of fish) with size x, versus x, the subset size (number of fish)]
![Page 8: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/8.jpg)
8
Questions
How many fish do we need to catch to obtain accurate distribution estimates?
What is the impact of the distribution shape on estimation accuracy?
![Page 9: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/9.jpg)
10
Incomplete Data Estimation
[Figure: a set of IP packets, grouped into flows, is randomly sampled; the IP flow size distribution is estimated from the sampled packets]
![Page 10: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/10.jpg)
11
Network-related subset size distributions (web graph)
→ Distribution of # incoming links to a webpage
– Q: do we need to crawl most of the web graph?
→ Incoming links observed as outgoing links from other webpages
– set = set of links
– subset = incoming links to a webpage
– sampling: link sampling
[Figure: in-degree of a webpage (# of links to the webpage), observed through the outgoing links of other webpages]
![Page 11: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/11.jpg)
12
Network-related subset size distributions (IP traffic)
→ Distribution of the number of packets in a TCP flow
– Set = IP packets
– Subset = an IP flow
– Sampling: packet sampling
[Figure: router observing a packet stream]
![Page 12: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/12.jpg)
13
Incomplete Data, Edge Sampling Example
[Figure: original graph, sampled in-degrees, and the 3x estimator, compared against the original in-degree distribution]
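The edge sampling example can be sketched as follows. This is only an illustration under stated assumptions: the toy edge list is hypothetical, and the "3x estimator" is read here as scaling each sampled in-degree by 1/p with p = 1/3, which the transcript does not confirm.

```python
import random
from collections import Counter

def sample_edges(edges, p, rng):
    """Keep each directed edge independently with probability p."""
    return [e for e in edges if rng.random() < p]

def in_degrees(edges):
    """Count incoming edges per target node."""
    return Counter(dst for _, dst in edges)

# Hypothetical toy graph: directed edges (source, target).
edges = [(u, 0) for u in range(1, 10)] + [(1, 2), (3, 2), (4, 2), (5, 6)]
p = 1 / 3  # assumed sampling probability, so the scaling factor 1/p is 3
rng = random.Random(0)

sampled = sample_edges(edges, p, rng)
sampled_deg = in_degrees(sampled)
# Assumed reading of the "3x estimator": scale sampled in-degrees by 1/p.
scaled_deg = {v: k / p for v, k in sampled_deg.items()}

print("true in-degrees:   ", dict(in_degrees(edges)))
print("sampled in-degrees:", dict(sampled_deg))
print("scaled in-degrees: ", scaled_deg)
```

Nodes whose incoming edges are all dropped vanish from the sample entirely, and the scaled values can only be multiples of 1/p, so the scaled in-degree distribution generally does not match the original one even though each scaled in-degree is unbiased for its node.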
![Page 13: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/13.jpg)
14
Incomplete data model
→ Set elements sampled with probability $p$
– without replacement
– independently
→ Model
– $b_{ij}$: probability that $j$ out of $i$ subset elements are sampled
– $\theta_i$: fraction of subsets with $i$ elements
• e.g.: fraction of nodes with degree $i$, fraction of flows with $i$ packets
![Page 14: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/14.jpg)
15
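Model (cont.)
→ $b_{ij}$: binomial, $b_{ij} = \binom{i}{j} p^j (1-p)^{i-j}$
→ $\theta_i$: fraction of subsets with $i$ elements
→ $W$: maximum subset size
→ $d_j = \sum_{i=j}^{W} b_{ij}\,\theta_i$: fraction of subsets with $j$ sampled elements
– $d_0$ is not observable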
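The model above can be sketched numerically. The function names and the example distribution `theta` below are illustrative assumptions; the final renormalization over j ≥ 1 is one plausible way to account for the unobservable mass at j = 0, not necessarily the one used in the talk.

```python
from math import comb

def b(i, j, p):
    """Probability that j of the i elements of a subset are sampled."""
    return comb(i, j) * p**j * (1 - p)**(i - j)

def sampled_size_distribution(theta, p):
    """theta[i-1] = fraction of subsets with i elements, i = 1..W.
    Returns d[j] for j = 0..W, where d[j] = sum_i b(i, j, p) * theta_i."""
    W = len(theta)
    return [sum(b(i, j, p) * theta[i - 1] for i in range(max(j, 1), W + 1))
            for j in range(W + 1)]

theta = [0.5, 0.3, 0.2]  # hypothetical distribution with W = 3
d = sampled_size_distribution(theta, p=0.5)
print(d)  # d[0] is the mass of subsets that are never observed

# Assumed normalization: distribution over subsets with at least one sample.
observed = [dj / (1 - d[0]) for dj in d[1:]]
print(observed)
```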
![Page 15: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/15.jpg)
16
Mean Squared Error Question
→ $\hat{\theta}_i$: unbiased estimate of $\theta_i$
→ $p$: sampling probability
→ $N$: sampled subsets (e.g. $N$ sampled flows)
Does an unbiased estimator exist that has a small mean squared error $\mathrm{MSE}(\hat{\theta}_i)$?
Try the Maximum Likelihood Estimator (MLE)?
![Page 16: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/16.jpg)
17
Maximum Likelihood Estimation
→ Simulation: edge sampling
→ Flickr network (photo-sharing), 1.5M nodes
[Plot over in-degree $i$]
![Page 17: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/17.jpg)
18
Cramér-Rao Lower Bound (CRLB)
→ Let $B = [b_{ij}]$, $d = [d_j]$, $\theta = [\theta_i]$
– Then $d = B\theta$
→ $D$: diagonal matrix with $D_{jj} = 1/d_j$
→ $\hat{\theta}_i$: unbiased estimate of $\theta_i$
→ $J$: Fisher information matrix of $N$ subsets
– $J = B^T D B$
– lower bound on the mean squared error of $\hat{\theta}_i$: $\mathrm{MSE}(\hat{\theta}_i) \ge (J^{-1})_{ii}/N$
Need to find $J^{-1}$
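For small $W$ the bound can be evaluated numerically. The sketch below uses the standard Fisher information of a categorical observation, $J = B^T \mathrm{diag}(1/d)\, B$ with $d = B\theta$, and makes two simplifying assumptions that are not from the talk: subsets with zero sampled elements are treated as if they were counted (j = 0..W), and the constraint that $\theta$ sums to one is ignored. The distribution `theta`, the value of `N`, and all function names are hypothetical.

```python
from math import comb

def binom_matrix(W, p):
    """B[j][i-1] = P(j of i elements sampled), j = 0..W, i = 1..W."""
    return [[comb(i, j) * p**j * (1 - p)**(i - j) for i in range(1, W + 1)]
            for j in range(W + 1)]

def fisher_information(theta, p):
    """Per-subset Fisher information J = B^T diag(1/d) B with d = B theta."""
    W = len(theta)
    B = binom_matrix(W, p)
    d = [sum(B[j][i] * theta[i] for i in range(W)) for j in range(W + 1)]
    # Rows with d[j] == 0 carry no observations and are skipped.
    return [[sum(B[j][a] * B[j][c] / d[j] for j in range(W + 1) if d[j] > 0)
             for c in range(W)] for a in range(W)]

def invert(M):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices)."""
    n = len(M)
    A = [row[:] + [float(r == c) for c in range(n)] for r, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

theta = [0.5, 0.3, 0.2]  # hypothetical distribution, W = 3
N = 10_000               # hypothetical number of sampled subsets
for p in (0.9, 0.5, 0.1):
    J_inv = invert(fisher_information(theta, p))
    bounds = [J_inv[i][i] / N for i in range(len(theta))]
    print(p, bounds)
```

Explicit inversion like this is only practical for small $W$; it is meant to make the quantities concrete rather than to reproduce the analysis in the talk.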
![Page 18: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/18.jpg)
19
Recap
→ Interested in the inverse of the Fisher information matrix because $\mathrm{MSE}(\hat{\theta}_i) \ge (J^{-1})_{ii}/N$
→ $N$: # of subsets sampled (# of nodes, # of TCP flows)
→ $\hat{\theta}$: subset size distribution estimate (what we seek)
→ $p$: sampling probability (edges, packets)
→ $W$: maximum subset size
![Page 19: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/19.jpg)
20
Results
![Page 20: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/20.jpg)
21
Heavier than exponential subset size distribution tail
→ Theorem 1: Suppose that $\theta_W$ decreases more slowly than exponentially in $W$. More precisely, assume $-\log \theta_W = o(W)$. Then the estimation error grows with the subset size $W$.
![Page 21: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/21.jpg)
22
Exponential subset size distribution tail
→ Theorem 2: Suppose that $\theta_W$ decreases exponentially in $W$. More precisely, assume $\log \theta_W = W \log a + o(W)$ as $W \to \infty$ for some $0 < a < 1$.
![Page 22: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/22.jpg)
23
Lighter than exponential subset size distribution tail
→ Theorem 3: Suppose that $\theta_W$ decreases faster than exponentially in $W$. More precisely, assume $-\log \theta_W = \omega(W)$. Then it follows that […], $0 < p \le 1$.
![Page 23: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/23.jpg)
24
Infinite support & power laws
→ If $\theta$ is a power law with infinite support ($W \to \infty$)
– if $p < 1/2$, any unbiased estimator has "infinite" MSE
• might as well output random estimates
– if $p > 1/2$, estimates can be accurate if enough samples are collected
![Page 24: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/24.jpg)
25
Estimating Subset Size Average
→ $I$: randomly chosen subset size
→ Average subset size $E[I]$:
– if $E[I] < \infty$ and $E[I^2] = \infty$, then the estimation error is unbounded
• Reason: inspection paradox
• Sampling is biased towards very large subsets
– Average size of sampled subsets: $E[I^2]/2E[I]$
– otherwise, the error is bounded
![Page 25: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/25.jpg)
26
IMPACT OF POWER LAWS ON SAMPLING CENTRAL NETWORK NODES
Part 2: Opportunity
![Page 26: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/26.jpg)
27
Central Nodes
→ Central nodes are important in networks
– Communication bottlenecks, trend setters, information aggregators
→ Notions of centrality
– betweenness, closeness, PageRank, degree
Challenge: identify the top k central nodes while exploring a small fraction of the network
[Figure: network with central nodes highlighted]
![Page 27: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/27.jpg)
28
Degree as a proxy for centrality
→ Betweenness centrality: a node is central if it belongs to many shortest paths
→ Closeness centrality: a node is central if it has short paths to all other nodes
→ Rank correlation measures the degree of similarity between two rankings
→ Low rank correlation in planar graphs (e.g. power grid)

| Set | Type of network | # of nodes | # of edges | Description |
|---|---|---|---|---|
| AS-Snapshot | Computer | 22,963 | 48,436 | Snapshot of Internet at level of AS |
| ca-CondMat | Collaboration | 23,133 | 186,936 | arXiv Condensed Matter |
| ca-HepPh | Collaboration | 12,008 | 237,010 | arXiv High Energy Physics |
| email-Enron | Social | 36,692 | 367,662 | Email network from Enron |

[Plot: rank correlation with degree]
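To make "rank correlation with degree" concrete, the sketch below compares a degree ranking with a closeness ranking on a small graph. The slides do not say which rank correlation coefficient was used; Kendall's tau-a is assumed here purely for illustration, and the toy graph and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def closeness(adj, src):
    """Closeness of src: (n - 1) / sum of BFS distances (connected graph)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj) - 1) / sum(dist.values())

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)  # tied pairs contribute 0
    return score / len(pairs)

# Hypothetical toy undirected graph as an adjacency list.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 5], 4: [0], 5: [3]}
nodes = sorted(adj)
degree = [len(adj[v]) for v in nodes]
close = [closeness(adj, v) for v in nodes]
print(kendall_tau(degree, close))
```

A value near 1 means that ranking nodes by degree orders them almost the same way as ranking them by closeness, which is the sense in which degree can stand in for a costlier centrality measure.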
![Page 28: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/28.jpg)
29
Looking for high degree nodes
A random walk in steady state visits a node with probability proportional to the node's degree.
In power law graphs, this bias towards high degree nodes is strong.
We observe that RWs are more efficient than more evolved techniques (AXS, RXS).
[Plot: horizontal axis is % of network sampled]
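The degree-proportional steady state of a simple random walk on an undirected graph can be checked directly: the distribution $\pi_v = \deg(v) / 2|E|$ is unchanged by one step of the walk. The sketch below verifies this with exact arithmetic on a hypothetical toy graph; it illustrates the property only and is not the sampling procedure evaluated in the talk.

```python
from fractions import Fraction

# Hypothetical toy undirected graph (adjacency list), connected.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}

two_m = sum(len(nbrs) for nbrs in adj.values())      # 2 * number of edges
pi = {v: Fraction(len(adj[v]), two_m) for v in adj}  # candidate: deg(v) / 2|E|

def step(dist):
    """One random-walk step: each node splits its mass equally among neighbors."""
    nxt = {v: Fraction(0) for v in adj}
    for u, mass in dist.items():
        for v in adj[u]:
            nxt[v] += mass / len(adj[u])
    return nxt

print(step(pi) == pi)  # stationarity check with exact fractions
```

Because the visit probability scales with degree, a walk on a graph with a heavy-tailed degree distribution spends a disproportionate share of its steps on the few very high degree nodes.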
![Page 29: Challenges and Opportunities Posed by Power Laws in Network Analysis](https://reader035.vdocument.in/reader035/viewer/2022062501/56816252550346895dd29b0a/html5/thumbnails/29.jpg)
30
Thank you