sampling large graphs for anticipatory analysis€¦ · anticipatory analysis lauren edwards*, luke...

45
Sampling Large Graphs for Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance Extreme Computing Conference September 16, 2015 This work is sponsored by the Intelligence Advanced Research Projects Activity (IARPA) under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Upload: others

Post on 07-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Sampling Large Graphs for Anticipatory Analysis

Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller

IEEE High Performance Extreme Computing Conference

September 16, 2015

This work is sponsored by the Intelligence Advanced Research Projects Activity (IARPA) under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Page 2: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Outline

•  Introduction

•  Sampling Techniques

•  Link Prediction

•  Experimental Setup

•  Results

•  Summary

Page 3: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Common Big Data Challenge

Commanders Operators Analysts

Users

Maritime Ground Space C2 Cyber OSINT

<html> Data

Air HUMINT Weather

Data

Users

Gap

2000 2005 2010 2015 & Beyond

Rapidly increasing - Data volume - Data velocity - Data variety - Data veracity (security)

Page 4: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

What is a Graph?

Graphs are mathematical representations of physical or logical relationships between sets of objects

A graph is a pair of sets: •  A set of vertices/

nodes (entities) •  A set of edges/links

(connections, transactions, and relationships)

Page 5: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Applications of Graph Analytics

Cyber

• Graphs represent communication patterns of computers on a network

• ~1M – 1B network events

• GOAL: Detect attack or malicious software

• Graphs represent entities and relationships detected through multiple data

• ~1K – 1M entities and connections

• GOAL: Identify anomalous patterns

ISR Social

• Graphs represent relationships between individuals or documents

• ~10K – 10M individuals and interactions

• GOAL: Identify hidden social networks

Bio

• Graphs represent connectivity between brain regions

• ~1B – 1T regions and connections

• GOAL: Detect regions affected by a neurological condition

NSA-RD-2013-056001v1

Brain scale. . .

100 billion vertices, 100 trillion edges

2.08 mNA · bytes2 (molar bytes) adjacency matrix

2.84 PB adjacency list

2.84 PB edge list

Human connectome.Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011

2NA = 6.022⇥ 1023mol�1

Paul Burkhardt, Chris Waring An NSA Big Graph experiment

Page 6: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

1

8

64

512

4,096

32,768

262,144

2,097,152

1 16 256 4,096 65,536 1,048,576

Big Data Challenge M

emor

y re

quire

d fo

r sto

ring

Spar

se M

atrix

(GB

)

Graph Vertices (millions)

ISR

Cyber

Web

How can these massive graphs be leveraged to provide accurate anticipatory intelligence?

Petabyte

Terabyte

NSA-RD-2013-056001v1

Web scale. . .

50 billion vertices, 1 trillion edges

271 EB adjacency matrix

29.5 TB adjacency list

29.1 TB edge list

Internet graph from the Opte Project(http://www.opte.org/maps)

Web graph from the SNAP database(http://snap.stanford.edu/data)

Paul Burkhardt, Chris Waring An NSA Big Graph experiment

NSA-RD-2013-056001v1

Social scale. . .

1 billion vertices, 100 billion edges

111 PB adjacency matrix

2.92 TB adjacency list

2.92 TB edge list

Twitter graph from Gephi dataset(http://www.gephi.org)

Paul Burkhardt, Chris Waring An NSA Big Graph experiment

Social

NSA-RD-2013-056001v1

Brain scale. . .

100 billion vertices, 100 trillion edges

2.08 mNA · bytes2 (molar bytes) adjacency matrix

2.84 PB adjacency list

2.84 PB edge list

Human connectome.Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011

2NA = 6.022⇥ 1023mol�1

Paul Burkhardt, Chris Waring An NSA Big Graph experiment

Brain

P. Burkhardt and C. Waring, “An NSA Big Graph experiment,” National Security Agency, Tech. Rep. NSA-RD-2013-056002v1, 2013.

NSA Big Graph

Experiment

Page 7: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Outline

•  Introduction

•  Sampling Techniques

•  Link Prediction

•  Experimental Setup

•  Results

•  Summary

Page 8: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Methods for Sampling Graphs

Edge sampling methods Consider each edges individually, keep it in the sample based on some criterion

“Snowball” sampling methods

Start with a set of “seed” vertices, grow the sample from those

Vertex sampling methods Sample individual vertices or local substructures

Random walk methods Walk from vertex to vertex by some criterion, keep edges along the path

Page 9: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Edge Sampling –  Each edge in the graph has

equal probability of being included in the sample

•  Popularity-Based Sampling –  Probability of sampling a

given edge is proportional to a decreasing function of the vertex degrees

•  Wedge Sampling –  Choose a vertex at random,

and randomly select two of its adjacent edges to be included in the sample

Methods Used

Page 10: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Edge Sampling –  Each edge in the graph has

equal probability of being included in the sample

•  Popularity-Based Sampling –  Probability of sampling a

given edge is proportional to a decreasing function of the vertex degrees

•  Wedge Sampling –  Choose a vertex at random,

and randomly select two of its adjacent edges to be included in the sample

Methods Used

Page 11: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Edge Sampling –  Each edge in the graph has

equal probability of being included in the sample

•  Popularity-Based Sampling –  Probability of sampling a

given edge is proportional to a decreasing function of the vertex degrees

•  Wedge Sampling –  Choose a vertex at random,

and randomly select two of its adjacent edges to be included in the sample

Methods Used

Page 12: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Edge Sampling –  Each edge in the graph has

equal probability of being included in the sample

•  Popularity-Based Sampling –  Probability of sampling a

given edge is proportional to a decreasing function of the vertex degrees

•  Wedge Sampling –  Choose a vertex at random,

and randomly select two of its adjacent edges to be included in the sample

Methods Used

Page 13: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Edge Sampling –  Each edge in the graph has

equal probability of being included in the sample

•  Popularity-Based Sampling –  Probability of sampling a

given edge is proportional to a decreasing function of the vertex degrees

•  Wedge Sampling –  Choose a vertex at random,

and randomly select two of its adjacent edges to be included in the sample

Methods Used

Page 14: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Edge Sampling –  Each edge in the graph has

equal probability of being included in the sample

•  Popularity-Based Sampling –  Probability of sampling a

given edge is proportional to a decreasing function of the vertex degrees

•  Wedge Sampling –  Choose a vertex at random,

and randomly select two of its adjacent edges to be included in the sample

Methods Used

Page 15: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Edge Sampling –  Each edge in the graph has

equal probability of being included in the sample

•  Popularity-Based Sampling –  Probability of sampling a

given edge is proportional to a decreasing function of the vertex degrees

•  Wedge Sampling –  Choose a vertex at random,

and randomly select two of its adjacent edges to be included in the sample

Methods Used

Page 16: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Edge Sampling –  Each edge in the graph has

equal probability of being included in the sample

•  Popularity-Based Sampling –  Probability of sampling a

given edge is proportional to a decreasing function of the vertex degrees

•  Wedge Sampling –  Choose a vertex at random,

and randomly select two of its adjacent edges to be included in the sample

Methods Used

Page 17: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 18: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 19: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 20: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 21: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 22: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 23: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 24: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 25: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 26: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 27: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 28: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Page 29: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Random Walk Sampling –  Move along edges from

vertex to vertex, adding edges to the sample along the way

–  At each step, with probability 0.15, jump to any vertex in the graph

•  Random Area Sampling –  Choose a set of “seed”

vertices, and add all edges adjacent to those vertices to the sample

Methods Used, Cont’d

Selected methods comprise a broad array of network sampling techniques

Page 30: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Outline

•  Introduction

•  Sampling Techniques

•  Link Prediction

•  Experimental Setup

•  Results

•  Summary

Page 31: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Scenario: Two vertices currently do not share an edge

•  Question: How likely are these vertices to form a connection in the near future?

•  Examples: –  How likely is a connection to be

formed between two people in a social network?

–  How likely are two political entities to engage in diplomatic or military conflict?

Link Prediction

Link prediction: Anticipation of new connections that currently do not exist

?

?

Page 32: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Common neighbors –  Vertices with more neighbors

in common are more likely to connect

•  Jaccard’s coefficient –  Normalize common

neighbors by total number of neighbors

•  Low-rank approximation –  Use a low-rank approximation

for the square of the adjacency matrix

•  Tensor decomposition –  Break data into vertex-vertex-

time factors, extrapolate along temporal axis

Link Prediction Statistics

time

time

vert

ex

vert

ex

2 ≈

U Σ2 UT

Page 33: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  If edge sampling probability is p, probability of maintaining a common neighbor is p2

•  This “smears” the distribution of common neighbors, inhibiting detection

•  For eigenvector-based techniques, removing edges causes the distribution of eigenvalues to contract

•  The outlier eigenvalues, which provide the most information, contract more quickly

How Does Sampling Hurt?

eigenvalue

Sample 1/8 of Edges

Page 34: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Outline

•  Introduction

•  Sampling Techniques

•  Link Prediction

•  Experimental Setup

•  Results

•  Summary

Page 35: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Experiments

Simulated graph Method •  Random edge •  Popularity-based •  Random walk •  Random area •  Wedge

Amount •  Factor by which

data size should be reduced

Link Prediction •  Common

neighbors •  Jaccard’s

coefficient •  Low-rank

approximation •  Tensor

decomposition

input graph evaluation

•  Prediction performance

•  Running time

aggregate

sampling prediction statistics sampling prediction statistics sampling prediction statistics sampling prediction statistics sampling prediction statistics

Page 36: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  BTER Model: Simulate graphs evolving over several time steps

•  First time step: Create a graph from a generative model •  Future time steps:

–  Compute common neighbors   Normalize using cosine similarity   Randomly select based on scores

–  Create new random edges from generative model   Simulates “chance” encounters

–  Randomly remove edges

Simulation Setup Graph at time t Normalized common neighbors Graph at time t+1 Random changes

Page 37: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Approximately 10,000 vertices, average degree of 17, 5% increase in edge count per time step

•  Two degree distributions: one lighter tailed and one heavier tailed

Simulated Graphs

Variations designed to determine most robust sampling methods

Page 38: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Area under the ROC curve (AUC) –  Measure area under the

receiver operating characteristic (ROC) curve, which maps false alarm rate to detection rate

•  Running time –  Time required (in Matlab

implementation) to compute prediction statistics

•  Memory Footprint –  Storage required for prediction

statistics

Performance Metrics

Fundamental metrics for anticipatory network analytics: •  Predictive performance •  Latency •  Computing resource use

ROC shaded area =

0.144 AUC = 0.856

Probability of False Alarm

Prob

abili

ty o

f Det

ectio

n

Page 39: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Outline

•  Introduction

•  Sampling Techniques

•  Link Prediction

•  Experimental Setup

•  Results

•  Summary

Page 40: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Link Prediction in Baseline Simulation

Common Neighbors Jaccard’s Coefficient

Low-Rank Approximation Tensor Decomposition

•  Common neighbors and Jaccard’s coefficient are virtually identical

•  Random area sampling has the most significant initial performance drop in all cases

•  Random walk sampling improves AUC at a higher sampling factor with tensor factorization

Page 41: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Time for Link Prediction

•  Common neighbors and Jaccard: time is substantially reduced as sampling becomes more aggressive

•  Best initial decrease is with random area sampling –  However, this comes with the biggest decrease in predictive

performance

Jaccard’s Coefficient Common Neighbors

Page 42: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Time for Link Prediction

•  Low Rank: time increases with higher sampling rates –  Changes in the distribution of eigenvalues alters convergence

rate •  Tensor: High sampling rates significantly reduces runtime •  Random area sampling produces high number of isolated

vertices, reduces the dimensionality of the adjacency matrix

Jaccard’s Coefficient Common Neighbors Low-Rank Approximation Tensor Decomposition

Page 43: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Outline

•  Introduction

•  Sampling Techniques

•  Link Prediction

•  Experimental Setup

•  Results

•  Summary

Page 44: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

•  Network analytics are of tremendous importance for anticipatory analytics

–  Forecast future connections •  Networks of interest are often extremely large, and data

sampling enables computation on a dataset of a reasonable size

•  Study considered the effect of sampling on anticipatory network analytics

–  Random edge sampling is typically the most stable –  Random area sampling loses performance the fastest

•  Promising areas for future work: test on larger simulated and real graph data, investigate emergent community detection, aggregate samples for higher predictive capabilities, integration with high-performance graph analytics hardware

Summary

Page 45: Sampling Large Graphs for Anticipatory Analysis€¦ · Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance

Backup