
Post on 10-Jul-2020

LLAMA: Efficient graph analytics using Large Multiversioned Arrays

Presenters: Bryan Wilder, Ethan Kripke, Zach Dietz

Background

Mickens, James. “The Saddest Moment.” ;login: logout (May 2013).

Big-Data Graph Analytics Applications

FriendBook

Goggle: Don’t Not Not Be Not Evil™

123

Design Considerations

Compressed Sparse Row

Adjacency Lists

Compressed Sparse Bitmaps❓

Design Considerations

Compressed Sparse Row

Mutability

Multi-versioning

In-Memory vs. Out of Memory

Overhead

Compressed Sparse Row (CSR)

Matrix (4 rows, 7 columns):

0 4 0 0 0 0 0
0 0 2 8 0 0 1
0 0 0 0 0 0 0
0 0 0 1 0 3 0

Nonzero values by row:

Row 0: [ 4 ]
Row 1: [ 2, 8, 1 ]
Row 2: [ ]
Row 3: [ 1, 3 ]

CSR arrays (implicit: row):

Value: [ 4, 2, 8, 1, 1, 3 ]
Column: [ 1, 2, 3, 6, 3, 5 ]
Cum. NNZ: [ 0, 1, 4, 4, 6 ]
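The three CSR arrays on this slide can be derived mechanically from the dense matrix. The following is a minimal sketch in Python, not LLAMA's own code; the function names `dense_to_csr` and `row_entries` are made up for illustration.

```python
def dense_to_csr(matrix):
    """Build (values, columns, cum_nnz) from a dense row-major matrix."""
    values, columns, cum_nnz = [], [], [0]
    for row in matrix:
        for col, entry in enumerate(row):
            if entry != 0:
                values.append(entry)
                columns.append(col)
        # After each row, record how many nonzeros have been seen so far.
        cum_nnz.append(len(values))
    return values, columns, cum_nnz


def row_entries(values, columns, cum_nnz, row):
    """Return the (column, value) pairs of one row; the row index is implicit."""
    start, end = cum_nnz[row], cum_nnz[row + 1]
    return list(zip(columns[start:end], values[start:end]))


# The 4 x 7 matrix from the slide.
matrix = [
    [0, 4, 0, 0, 0, 0, 0],
    [0, 0, 2, 8, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 3, 0],
]
values, columns, cum_nnz = dense_to_csr(matrix)
print(values)   # [4, 2, 8, 1, 1, 3]
print(columns)  # [1, 2, 3, 6, 3, 5]
print(cum_nnz)  # [0, 1, 4, 4, 6]
```

Row 2 is empty, which is why the cumulative count repeats (4, 4): an empty row costs one integer and nothing else.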

Compressed Sparse Row (CSR)

Graph with vertices 0, 1, 2 and its adjacency matrix (one row per source vertex):

0 1 1
0 0 1
1 0 1

Col (Edge Table): [ 1, 2, 2, 0, 2 ]
NNZ (Vertex Table): [ 0, 2, 3, 5 ]
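For a graph, the same layout is usually built from an edge list rather than a dense matrix. The sketch below assumes directed edges given as `(source, destination)` pairs; `build_tables` is a hypothetical helper name, not part of LLAMA.

```python
def build_tables(num_vertices, edges):
    """Build the CSR vertex table (offsets) and edge table (destinations)."""
    # Count the out-degree of every vertex, shifted by one slot.
    vertex_table = [0] * (num_vertices + 1)
    for src, _ in edges:
        vertex_table[src + 1] += 1
    # A prefix sum turns the counts into start offsets.
    for v in range(num_vertices):
        vertex_table[v + 1] += vertex_table[v]
    # Place each destination into its source vertex's slice.
    edge_table = [0] * len(edges)
    cursor = vertex_table[:-1]  # slicing copies, so vertex_table is untouched
    for src, dst in sorted(edges):
        edge_table[cursor[src]] = dst
        cursor[src] += 1
    return vertex_table, edge_table


# Edges of the 3-vertex graph on the slide.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 2)]
vertex_table, edge_table = build_tables(3, edges)
print(vertex_table)  # [0, 2, 3, 5]
print(edge_table)    # [1, 2, 2, 0, 2]
```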

Implementation

Snapshot 0

Vertices: 0 1 2 3

Edge Table: [ 2, 3, 0, 2 ]

Vertex Table:

Vertex 0: Snapshot 0, Offset: 0, Length: 2
Vertex 1: Snapshot 0, Offset: 2, Length: 1
Vertex 2: Snapshot 0, Offset: 3, Length: 0
Vertex 3: Snapshot 0, Offset: 3, Length: 1

Vertex Table

Vertex Property: Tall, Short, Skinny, Short

Edge Table: [ 2, 3, 0, 2 ]

Edge Property: Blue, Red, Green, Blue

Deletions: 0, 0, 0, 0

Indirection Array (pointers to pages in VT): vertices 0-1: Page 0; vertices 2-3: Page 1

Page size: < L3 cache; a multiple of the file system & virtual memory page size
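A lookup through the indirection array can be sketched as follows. This is an illustrative Python model using the values from the slides, not LLAMA's implementation; the two-vertex page size mirrors the slide (real pages are far larger), and the function names are assumptions.

```python
PAGE_SIZE = 2  # vertices per vertex-table page, as drawn on the slide

# Vertex-table pages; each record is (snapshot, offset, length).
page0 = [(0, 0, 2), (0, 2, 1)]   # vertices 0 and 1
page1 = [(0, 3, 0), (0, 3, 1)]   # vertices 2 and 3
indirection = [page0, page1]     # indirection array: one pointer per page

edge_table = [2, 3, 0, 2]
edge_property = ["Blue", "Red", "Green", "Blue"]  # parallel to edge_table


def vertex_record(vertex):
    """Find a vertex's record: pick the page, then the slot within it."""
    page = indirection[vertex // PAGE_SIZE]
    return page[vertex % PAGE_SIZE]


def out_edges(vertex):
    """Return (destination, property) pairs for one vertex."""
    _, offset, length = vertex_record(vertex)
    end = offset + length
    return list(zip(edge_table[offset:end], edge_property[offset:end]))


print(out_edges(0))  # [(2, 'Blue'), (3, 'Red')]
print(out_edges(2))  # []
```

Because properties live in arrays parallel to the edge table, an edge's offset is enough to find its property without any extra pointer.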

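Snapshot 0

Vertices: 0 1 2 3

Edge Table: [ 2, 3, 0, 2 ]

Indirection Table: vertices 0-1: Page 0; vertices 2-3: Page 1

Snapshot 1

Vertices: 0 1 (Page 0’)

Edge Table entry: 3, Continuation: NONE

Edge Table entry: 3, Continuation: Snapshot 0, Offset 2

Indirection Table: vertices 0-1: Page 0’; vertices 2-3: Page 1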

Reading? Merging?

Rule for merging?
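One way to read a vertex's neighbors across snapshots is to follow continuation records from the newest fragment back to older ones. The sketch below makes several assumptions that the slides do not state: the new edge in Snapshot 1 belongs to vertex 1 (inferred because Offset 2 is where vertex 1's list starts in Snapshot 0), the length of an older fragment is taken from the older snapshot's vertex table, and deletions are ignored. All names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

PAGE_SIZE = 2  # vertices per vertex-table page, as drawn on the slide


class Record(NamedTuple):
    snapshot: int   # snapshot whose edge table holds this fragment
    offset: int     # start of the fragment in that edge table
    length: int     # number of edges in the fragment
    continuation: Optional[Tuple[int, int]] = None  # (snapshot, offset) of older fragment


edge_tables = {0: [2, 3, 0, 2], 1: [3]}

page0 = [Record(0, 0, 2), Record(0, 2, 1)]
page1 = [Record(0, 3, 0), Record(0, 3, 1)]
# Copy-on-write: Snapshot 1 copies only the page it changes (Page 0').
page0_prime = [page0[0], Record(1, 0, 1, continuation=(0, 2))]

# One indirection table per snapshot; Page 1 is shared, not copied.
indirection = {0: [page0, page1], 1: [page0_prime, page1]}


def vertex_record(snapshot, vertex):
    return indirection[snapshot][vertex // PAGE_SIZE][vertex % PAGE_SIZE]


def neighbors(snapshot, vertex):
    """Collect edges from the newest fragment, then follow continuations."""
    result = []
    record = vertex_record(snapshot, vertex)
    while True:
        table = edge_tables[record.snapshot]
        result.extend(table[record.offset:record.offset + record.length])
        if record.continuation is None:
            return result
        older_snapshot, older_offset = record.continuation
        record = vertex_record(older_snapshot, vertex)
        assert record.offset == older_offset  # sanity check on the sketch's data


print(neighbors(0, 1))  # [0]
print(neighbors(1, 1))  # [3, 0]
```

Readers of Snapshot 0 are unaffected by the write, since its indirection table still points at the original Page 0.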

Using LLAMA

Exposes three operations to user:

Iterate over vertices

Select a specific vertex

Iterate over neighbors of a vertex

Contrast: does not enforce sequential access pattern

(unlike GraphChi or X-Stream)
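The three operations can be pictured as a small interface over the vertex and edge tables. This is a sketch only: the class `CsrGraph` and its method names are invented for illustration and are not LLAMA's actual API.

```python
class CsrGraph:
    """Read-only graph over a CSR vertex table and edge table."""

    def __init__(self, vertex_table, edge_table):
        self.vertex_table = vertex_table
        self.edge_table = edge_table

    def vertices(self):
        """Iterate over vertices."""
        return range(len(self.vertex_table) - 1)

    def vertex(self, v):
        """Select a specific vertex; here the handle is (id, out-degree)."""
        return v, self.vertex_table[v + 1] - self.vertex_table[v]

    def neighbors(self, v):
        """Iterate over the out-neighbors of a vertex."""
        for i in range(self.vertex_table[v], self.vertex_table[v + 1]):
            yield self.edge_table[i]


graph = CsrGraph([0, 2, 3, 5], [1, 2, 2, 0, 2])
# Vertices can be visited in any order; nothing forces a sequential scan.
for v in (2, 0, 1):
    print(v, list(graph.neighbors(v)))
```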

Experimental Evaluation

Tasks

PageRank computation

Breadth-first search

Triangle counting
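To make one of these tasks concrete, here is a minimal breadth-first search over CSR tables. It is a plain single-threaded sketch, not the benchmark code used in the evaluation, and `bfs_levels` is a hypothetical name; the example graph is the four-vertex Snapshot 0 graph from the implementation slides.

```python
from collections import deque


def bfs_levels(vertex_table, edge_table, source):
    """Return the BFS depth of every vertex from source (-1 if unreachable)."""
    levels = [-1] * (len(vertex_table) - 1)
    levels[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        # The neighbors of v are one contiguous slice of the edge table.
        for dst in edge_table[vertex_table[v]:vertex_table[v + 1]]:
            if levels[dst] == -1:
                levels[dst] = levels[v] + 1
                queue.append(dst)
    return levels


# Edges 0->2, 0->3, 1->0, 3->2 in CSR form.
vertex_table = [0, 2, 3, 3, 4]
edge_table = [2, 3, 0, 2]
print(bfs_levels(vertex_table, edge_table, 1))  # [1, 0, 2, 2]
```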

Datasets

Large synthetic graphs (R-MAT)

Large, publicly available social graphs (Twitter, LiveJournal)

In memory vs out of memory

PageRank, in-memory

In memory Out of memory

PageRank, out-of-memory

Runtime breakdown

Enormous variation!

LLAMA minimizes overhead

How is this possible? What features of LLAMA’s design essentially eliminate overhead like buffer management?

Overhead of snapshots

LLAMA

GraphChi (for reference)

PageRank

Time to merge snapshots

Merges are IO-bound

Scalability in number of cores (PageRank)

Experimental evaluation

Compare to in-memory and out-of-memory systems

(Green-Marl and GraphLab; GraphChi and X-Stream)

Evaluate multi-version support

How much overhead from snapshots?

Evaluate scalability on different datasets

Experimental takeaways

LLAMA performs well both in and out of memory

Significantly reduces existing overhead

Introduces manageable additional overhead

Thank you!

How can we store variable length properties?

Can we create a hybrid between a row- and column-store with the storage of properties?
