scalable dynamic graph summarization

Scalable Dynamic Graph Summarization Ioanna Tsalouchidou 1 Gianmarco De Francisci Morales 2 Francesco Bonchi 3 Ricardo Baeza-Yates 1 1 Web Research Group, DTIC Pompeu Fabra University, Spain 2 Qatar Computing Research Institute 3 Algorithmic Data Analytics Lab ISI Foundation, Turin, Italy IEEE International Conference on Big Data, 2016

Upload: ioanna-tsalouchidou

Post on 13-Apr-2017

TRANSCRIPT

Page 1: Scalable Dynamic Graph Summarization

Scalable Dynamic Graph Summarization

Ioanna Tsalouchidou 1 Gianmarco De Francisci Morales 2

Francesco Bonchi 3 Ricardo Baeza-Yates 1

1 Web Research Group, DTIC, Pompeu Fabra University, Spain

2 Qatar Computing Research Institute

3 Algorithmic Data Analytics Lab, ISI Foundation, Turin, Italy

IEEE International Conference on Big Data, 2016

Page 2: Scalable Dynamic Graph Summarization

Table of Contents

1. Introduction
– Motivation
– Related Work
– Our approach

2. Methodology
– Baseline algorithm
– MicroClustering algorithm

3. Experiments

4. Conclusions

Page 3: Scalable Dynamic Graph Summarization

Introduction to Big Graphs

Big Data in social, communication, and biological networks, etc.

They are represented by Big Graphs

These graphs encode relationship and communication patterns between people, news, trends, proteins, etc.

Page 4: Scalable Dynamic Graph Summarization

Characteristics

These graphs have some common characteristics:

Dynamic: structural and interaction evolution

Massive: with hundreds of millions of vertices and billions of edges

[Figure, built up over several frames: a small example graph with vertices A and B; vertex C is added; the labels 0.3 and 0.7 appear, change to 0.5 and 0.9, and then disappear.]

Page 9: Scalable Dynamic Graph Summarization

Problem - Solution

Problem:

Store and process big graphs

Their evolution in time in main memory

Applying algorithms is computationally expensive

Solution:

Aggregate vertices and edges to reduce the size

Supernode: a set of vertices of the original graph

Superedge: an edge between two supernodes
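The supernode and superedge definitions above can be illustrated with a minimal sketch. It assumes the vertex-to-supernode assignment is already given; the function name `summarize` and the toy data are hypothetical, and counting the original edges per superedge is one possible choice of weight, not necessarily the one used in the talk.

```python
from collections import defaultdict

def summarize(edges, assignment):
    """Aggregate an undirected edge list into supernodes and counted superedges.

    edges: iterable of (u, v) pairs from the original graph.
    assignment: dict mapping each vertex to a supernode id (assumed given).
    """
    supernodes = defaultdict(set)
    for vertex, sid in assignment.items():
        supernodes[sid].add(vertex)

    superedges = defaultdict(int)
    for u, v in edges:
        a, b = sorted((assignment[u], assignment[v]))
        superedges[(a, b)] += 1  # original edges folded into this superedge
    return dict(supernodes), dict(superedges)

# Toy graph (hypothetical): four vertices grouped into two supernodes.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
assignment = {"A": 0, "B": 0, "C": 1, "D": 1}
supernodes, superedges = summarize(edges, assignment)
print(supernodes)
print(superedges)
```

In this sketch, edges that fall inside a single supernode show up as a superedge from that supernode to itself.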

Page 11: Scalable Dynamic Graph Summarization

Related Work

Graph Summarization:

GraSS: Graph structure summarization [LeFevre and Terzi, ’10]

Graph summarization with quality guarantees [Riondato et al., ’14]

Data stream clustering:

A framework for clustering evolving data streams [Aggarwal et al., ’03]

Page 12: Scalable Dynamic Graph Summarization

Background: Static graph summarization

Represent graphs as adjacency matrices

Minimize the reconstruction error

Quality guarantees: geometric clustering of the nodes

[Figure: a static graph and its adjacency matrix are mapped to a summary graph and its summary adjacency matrix.]
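To make "reconstruction error" concrete, here is a small sketch under stated assumptions: each summary entry is the average adjacency value between two supernodes, the reconstructed matrix repeats that average for every pair of member nodes, and the error is the L1 distance to the original matrix. The slides do not state the norm or the treatment of the diagonal, so both are illustrative choices, and the function names are hypothetical.

```python
def summary_density(adj, clusters):
    """k x k summary: mean adjacency value between each pair of supernodes."""
    k = len(clusters)
    summary = [[0.0] * k for _ in range(k)]
    for a, rows in enumerate(clusters):
        for b, cols in enumerate(clusters):
            total = sum(adj[i][j] for i in rows for j in cols)
            summary[a][b] = total / (len(rows) * len(cols))
    return summary

def reconstruction_error(adj, clusters):
    """L1 distance between adj and the matrix rebuilt from the summary."""
    summary = summary_density(adj, clusters)
    label = {i: a for a, members in enumerate(clusters) for i in members}
    n = len(adj)
    return sum(
        abs(adj[i][j] - summary[label[i]][label[j]])
        for i in range(n)
        for j in range(n)
    )

# Toy path graph 0-1-2-3 (hypothetical data), summarized into two supernodes.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clusters = [[0, 1], [2, 3]]
print(summary_density(adj, clusters))
print(reconstruction_error(adj, clusters))
```

With one supernode per node the error is zero. Grouping nodes trades a smaller matrix for a larger error, which is why the choice of clustering matters.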

Page 14: Scalable Dynamic Graph Summarization

Problem formulation: Tensor summarization

Time series of w static graphs

The graph time series is represented by an adjacency tensor

Summary represented by an adjacency matrix

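[Figure: an N × N × w adjacency tensor (Node 1 … Node N, over w time-stamps) is summarized into a k × k summary adjacency matrix over supernodes (Super Node 1 …).]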
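As a rough illustration of the adjacency tensor, the sketch below stacks w edge lists into w slices of size N × N and reads off the values that describe a single node across the window. Nested Python lists stand in for a real tensor type, the graphs are assumed undirected and unweighted, and the function names and toy data are hypothetical.

```python
def adjacency_tensor(snapshots, n):
    """Stack w edge lists into w slices of size n x n, one per time-stamp."""
    tensor = []
    for edges in snapshots:
        matrix = [[0] * n for _ in range(n)]
        for u, v in edges:
            matrix[u][v] = 1
            matrix[v][u] = 1  # undirected graph assumed
        tensor.append(matrix)
    return tensor

def node_vector(tensor, i):
    """Concatenate row i of every slice: the w * n values that describe node i."""
    return [value for matrix in tensor for value in matrix[i]]

# Toy time series (hypothetical): w = 3 snapshots over N = 3 nodes.
snapshots = [[(0, 1)], [(0, 1), (1, 2)], [(1, 2)]]
tensor = adjacency_tensor(snapshots, n=3)
print(node_vector(tensor, 1))
```

Each node is thus a point with w · N coordinates, which is the representation a clustering step can work on.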

Page 15: Scalable Dynamic Graph Summarization

Problem formulation: Dynamic graph summarization via tensor streaming

Dynamic graph: infinite stream of static graphs

Tensor with one dimension increasing in time

Define a sliding tensor window

Summarize the tensor within the tensor window

[Figure, animated over time-stamps t0 to t5: the tensor grows along the time dimension as new graphs arrive; a sliding window of width w is marked on it and summarized into a k × k matrix over supernodes (Super Node 1 …).]

Page 22: Scalable Dynamic Graph Summarization

Problem formulation: Dynamic graph summarization via tensor streaming

At each time-stamp:

Input: the most recent adjacency matrix

Update the sliding window

Cluster nodes into supernodes

Output: one summary at every time-stamp
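The per-time-stamp loop above might be sketched as follows. A bounded `collections.deque` plays the role of the sliding tensor window, and the `summarize` callback is a placeholder: `edge_counts` is a hypothetical stand-in for the clustering step, used only to make the window contents visible.

```python
from collections import deque

def stream_summaries(matrices, w, summarize):
    """Yield one summary per time-stamp from a window of the last w matrices."""
    window = deque(maxlen=w)  # the oldest slice is dropped automatically
    for matrix in matrices:
        window.append(matrix)          # input: most recent adjacency matrix
        yield summarize(list(window))  # output: summary of the current window

def edge_counts(window):
    """Hypothetical placeholder summarizer: nonzero entries per slice."""
    return [sum(1 for row in m for x in row if x) for m in window]

# Toy stream (hypothetical): four 2 x 2 adjacency matrices, window width w = 2.
matrices = [
    [[0, 1], [1, 0]],
    [[0, 0], [0, 0]],
    [[0, 1], [1, 0]],
    [[1, 1], [1, 1]],
]
results = list(stream_summaries(matrices, w=2, summarize=edge_counts))
print(results)
```

Until w matrices have arrived the window is only partially filled; afterwards each step drops the oldest slice and adds the newest one.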

Page 23: Scalable Dynamic Graph Summarization

Contributions

Introduce the problem of lossy dynamic graph summarization

Two online algorithms for summarizing dynamic, large-scale graphs

Distributed, scalable algorithms, implemented in Apache Spark

Page 24: Scalable Dynamic Graph Summarization

Baseline algorithm: kC

[Figure: the N nodes of the tensor (Node 1 … Node N), each a data point with wN values, are clustered into supernodes S_0 … S_{C-1}.]

Cluster each node of the tensor to the supernodes

Each node has wN values

Clustering N points at every time-stamp

Problem: (w − 1)N² values remain unchanged between consecutive time-stamps, yet all of them are clustered again
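A minimal sketch of the baseline idea follows, assuming the clustering step is k-means (the slide only names the algorithm kC, so this is an assumption). Each node contributes one point with w · N values, all N points are re-clustered at every time-stamp, and `unchanged_values` spells out the count from the last bullet. The centroid initialization, function names, and toy data are illustrative simplifications.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for idx, p in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def unchanged_values(n, w):
    """Tensor entries shared by two consecutive windows of width w."""
    return (w - 1) * n * n

# Toy window (hypothetical data): w = 2 slices over N = 4 nodes.
window = [
    [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]],
    [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]],
]
n = 4
# One data point per node: its row from every slice, w * N values in total.
points = [[x for m in window for x in m[i]] for i in range(n)]
labels = kmeans(points, k=2)
print(labels)
print(unchanged_values(n, w=len(window)))
```

Here 16 of the 32 tensor entries carry over to the next window but are still part of the points that get clustered again, which is the inefficiency the slide points out.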

Page 25: Scalable Dynamic Graph Summarization

MicroClustering algorithm: µC

[Figure, three columns: Data Points (A^t_0 … A^t_{N-1}) are assigned to Micro-Clusters (µC_0 … µC_{mC-1}), which are grouped into Super-nodes (S_0 … S_{C-1}).]

Two-level clustering

Step 1: adjacency matrix to micro-clusters

Step 2: keep statistics in the micro-clusters

Step 3: run maintenance algorithm

Step 4: micro-clusters to supernodes
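The "statistics in the micro-clusters" could look like the additive cluster features used in the data stream clustering framework cited under Related Work: a count, a per-dimension sum, and a per-dimension sum of squares. Whether µC keeps exactly these statistics is an assumption, and the class, the `nearest` helper, and the use of `merge` as a maintenance operation are hypothetical illustrations.

```python
class MicroCluster:
    """Additive summary of a group of points: count, sums, sums of squares."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.square_sum = [0.0] * dim

    def absorb(self, point):
        """Add one data point without storing it."""
        self.n += 1
        for d, x in enumerate(point):
            self.linear_sum[d] += x
            self.square_sum[d] += x * x

    def merge(self, other):
        """Fold another micro-cluster into this one (a possible maintenance step)."""
        self.n += other.n
        for d in range(len(self.linear_sum)):
            self.linear_sum[d] += other.linear_sum[d]
            self.square_sum[d] += other.square_sum[d]

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

def nearest(point, clusters):
    """Index of the micro-cluster whose centroid is closest to point."""
    def dist(c):
        return sum((x - y) ** 2 for x, y in zip(point, c.centroid()))
    return min(range(len(clusters)), key=lambda i: dist(clusters[i]))

# Toy adjacency matrix (hypothetical): a 4-cycle, one row per node.
rows = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
micro = [MicroCluster(4), MicroCluster(4)]
micro[0].absorb(rows[0])  # seed each micro-cluster with one row
micro[1].absorb(rows[1])
for row in rows[2:]:
    micro[nearest(row, micro)].absorb(row)
print([m.n for m in micro], micro[0].centroid())
```

Because the statistics are additive, a new adjacency matrix only requires updating a few sums, and the second clustering level can work on the micro-cluster centroids instead of on all N data points.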

Page 30: Scalable Dynamic Graph Summarization

Datasets and Experimental Setup

Datasets:

Twitter hashtag co-occurrences

Yahoo! Network Flow

Synthetic Dataset

Environment:

Cluster of 400 cores distributed in 30 machines.

Each machine: 24 cores Intel(R) Xeon(R) CPU E5-2430 0 @2.20 GHz.

Memory: driver program 12GB, executor process 3GB.

Page 31: Scalable Dynamic Graph Summarization

Scalability

Page 32: Scalable Dynamic Graph Summarization

Reconstruction Error

Page 33: Scalable Dynamic Graph Summarization

Conclusions

Problem: Large, evolving graphs are difficult to store and process

Solution: Graph summarization reduces the size and captures the evolution of the input graph

Evaluation: Scalable, distributed solution with small error
