LLAMA: Efficient graph analytics using Large Multiversioned Arrays

Presenters: Bryan Wilder, Ethan Kripke, Zach Dietz

Background

Mickens, James. “The Saddest Moment.” ;login: logout (May 2013).

Big-Data Graph Analytics Applications

FriendBook

• • •

Goggle: Don’t Not Not Be Not Evil™

Design Considerations

Compressed Sparse Row

Adjacency Lists

Compressed Sparse Bitmaps❓

Design Considerations

Compressed Sparse Row

Mutability

Multi-versioning

In-Memory vs. Out of Memory

Overhead

Compressed Sparse Row (CSR)

Example matrix (4 rows, 7 columns):

0 4 0 0 0 0 0
0 0 2 8 0 0 1
0 0 0 0 0 0 0
0 0 0 1 0 3 0

CSR arrays:

Value: [ 4, 2, 8, 1, 1, 3 ]

Column: [ 1, 2, 3, 6, 3, 5 ]

Cum. NNZ: [ 0, 1, 4, 4, 6 ] (implicit: row)

The same layout for a graph with rows (vertices) 0 1 2:

Col: [ 1, 2, 2, 0, 2 ] (Edge Table)

NNZ: [ 0, 2, 3, 5 ] (Vertex Table)
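The CSR arrays on this slide can be derived mechanically from the dense matrix. The following is a minimal sketch, not LLAMA code; the function names `dense_to_csr` and `row_entries` are made up for illustration.

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix into the three CSR arrays."""
    values, columns, cum_nnz = [], [], [0]
    for row in matrix:
        for col, entry in enumerate(row):
            if entry != 0:
                values.append(entry)
                columns.append(col)
        # After each row, record how many nonzeros have been seen so far.
        cum_nnz.append(len(values))
    return values, columns, cum_nnz


def row_entries(values, columns, cum_nnz, row):
    """Return the (column, value) pairs stored for one row."""
    start, end = cum_nnz[row], cum_nnz[row + 1]
    return list(zip(columns[start:end], values[start:end]))


# The 4 x 7 example matrix from the slide.
matrix = [
    [0, 4, 0, 0, 0, 0, 0],
    [0, 0, 2, 8, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 3, 0],
]
values, columns, cum_nnz = dense_to_csr(matrix)
row_1 = row_entries(values, columns, cum_nnz, 1)
```

The row index never appears in the stored data: it is implied by the position in the cumulative count array, which is why an empty row (row 2 here) costs only one repeated entry.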

• • •

Implementation

Snapshot 0: vertices 0 1 2 3

Vertex Table

Vertex 0: Snapshot 0, Offset: 0, Length: 2

Vertex 1: Snapshot 0, Offset: 2, Length: 1

Vertex 2: Snapshot 0, Offset: 3, Length: 0

Vertex 3: Snapshot 0, Offset: 3, Length: 1

Edge Table: [ 2, 3, 0, 2 ]
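Reading an adjacency list from the Snapshot 0 tables amounts to slicing the edge table by offset and length. A minimal sketch using the slide's values; the `VertexEntry` type and `neighbors` helper are hypothetical names, not part of LLAMA's API.

```python
from typing import NamedTuple


class VertexEntry(NamedTuple):
    """One vertex table record, with the three fields shown on the slide."""
    snapshot: int
    offset: int
    length: int


# Vertex table and edge table for Snapshot 0, copied from the slide.
vertex_table = [
    VertexEntry(snapshot=0, offset=0, length=2),  # vertex 0
    VertexEntry(snapshot=0, offset=2, length=1),  # vertex 1
    VertexEntry(snapshot=0, offset=3, length=0),  # vertex 2
    VertexEntry(snapshot=0, offset=3, length=1),  # vertex 3
]
edge_table = [2, 3, 0, 2]


def neighbors(vertex):
    """Return the destination vertices stored for `vertex`."""
    entry = vertex_table[vertex]
    return edge_table[entry.offset:entry.offset + entry.length]


adjacency = {v: neighbors(v) for v in range(len(vertex_table))}
```

With these tables, vertex 2 has no outgoing edges, so its offset simply repeats the next vertex's offset and its length is zero.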

Vertex Table

Edge Table: [ 2, 3, 0, 2 ]

Edge Property: [ Blue, Red, Green, Blue ]

Vertex Property: [ Tall, Short, Skinny, Short ]

Deletions: [ 0, 0, 0, 0 ]

Indirection Array (pointers to pages in VT): [ 0-1 ] [ 2-3 ]

Page 1

Size annotations: < L3 cache; multiple of file system & virtual memory page size
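Looking up a vertex through the indirection array takes two steps: pick the page, then pick the slot inside it. The sketch below assumes two vertices per page to match the 0-1 / 2-3 split on the slide, and stores (offset, length) pairs taken from the Snapshot 0 vertex table; `PAGE_SIZE` and `lookup` are illustrative names only.

```python
PAGE_SIZE = 2  # vertices per page; tiny on purpose, to mirror the slide

# Each page holds (offset, length) records for a contiguous vertex range.
page_0 = [(0, 2), (2, 1)]  # vertices 0-1
page_1 = [(3, 0), (3, 1)]  # vertices 2-3

# The indirection array holds one reference per page of the vertex table.
indirection = [page_0, page_1]


def lookup(indirection, vertex):
    """Find the vertex table record for `vertex` via the indirection array."""
    page = indirection[vertex // PAGE_SIZE]  # step 1: which page
    return page[vertex % PAGE_SIZE]          # step 2: which slot in the page


record_for_vertex_3 = lookup(indirection, 3)
```

Because readers only ever reach pages through the indirection array, a page can be replaced for a new version without touching the other pages.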

Snapshot 0: vertices 0 1 2 3

Vertex Table

Edge Table: [ 2, 3, 0, 2 ]

Snapshot 1: vertices 0 1

Edge Table entry: 3, Continuation: NONE

Indirection Table: [ 0-1 ] [ 2-3 ]

Pages: Page 1, Page 0’

Edge Table entry: 3, Continuation: Snapshot 0, Offset 2
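A continuation record lets a newer snapshot store only the added edges and point back to the older fragment. The sketch below assumes the new edge (to vertex 3) belongs to vertex 1, since offset 2 in Snapshot 0 is where vertex 1's adjacency list starts. The `Fragment` type and its explicit `length` field are assumptions made for illustration; the slide shows only the snapshot and offset of the continuation.

```python
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    """A piece of one adjacency list stored in one snapshot's edge table."""
    snapshot: int
    offset: int
    length: int                          # assumed field, not on the slide
    continuation: Optional["Fragment"]   # older fragment, or None


# One edge table per snapshot, using the slide's values.
edge_tables = {
    0: [2, 3, 0, 2],
    1: [3],
}

# Vertex 1 in Snapshot 0: one edge at offset 2, nothing older.
vertex_1_snapshot_0 = Fragment(snapshot=0, offset=2, length=1, continuation=None)
# Vertex 1 in Snapshot 1: the new edge, continuing at Snapshot 0, Offset 2.
vertex_1_snapshot_1 = Fragment(snapshot=1, offset=0, length=1,
                               continuation=vertex_1_snapshot_0)


def read_neighbors(fragment):
    """Follow continuations from newest to oldest, collecting neighbors."""
    result = []
    while fragment is not None:
        table = edge_tables[fragment.snapshot]
        result.extend(table[fragment.offset:fragment.offset + fragment.length])
        fragment = fragment.continuation
    return result


latest = read_neighbors(vertex_1_snapshot_1)
original = read_neighbors(vertex_1_snapshot_0)
```

Reading through the older entry still returns the Snapshot 0 view, which is what makes the structure multiversioned: both versions stay readable while sharing the old edge data.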

Reading? Merging?

Rule for merging?

Using LLAMA

Exposes three operations to user:

Iterate over vertices

Select a specific vertex

Iterate over neighbors of a vertex

Contrast: does not enforce sequential access pattern

(unlike GraphChi or X-Stream)
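The three operations can be pictured as a small read-only wrapper around the vertex and edge tables. This is a hedged sketch only: the class `GraphView` and its method names are hypothetical and do not reflect LLAMA's actual interface. The data is the three-vertex graph from the CSR slide.

```python
class GraphView:
    """Hypothetical read-only view over CSR-style vertex and edge tables."""

    def __init__(self, vertex_table, edge_table):
        self.vertex_table = vertex_table  # cumulative offsets, one extra entry
        self.edge_table = edge_table      # destination vertex IDs

    def vertices(self):
        """Operation 1: iterate over all vertex IDs."""
        return range(len(self.vertex_table) - 1)

    def out_degree(self, v):
        """Operation 2: select one specific vertex directly, in any order."""
        return self.vertex_table[v + 1] - self.vertex_table[v]

    def neighbors(self, v):
        """Operation 3: iterate over the neighbors of one vertex."""
        start, end = self.vertex_table[v], self.vertex_table[v + 1]
        return iter(self.edge_table[start:end])


graph = GraphView(vertex_table=[0, 2, 3, 5], edge_table=[1, 2, 2, 0, 2])
degrees = {v: graph.out_degree(v) for v in graph.vertices()}
# Random access: jump straight to vertex 2 without scanning vertices 0 and 1.
neighbors_of_2 = list(graph.neighbors(2))
```

Nothing in this interface forces the caller to visit vertices or edges in storage order, which is the contrast the slide draws with systems that require a sequential pass.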

Experimental Evaluation

Tasks

PageRank computation

Breadth-first search

Triangle counting
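To give a sense of how one of these tasks runs over a CSR-style layout, here is a minimal breadth-first search sketch. It is not the benchmark code used in the evaluation; `bfs_levels` is an illustrative name, and the input is the three-vertex graph from the CSR slide.

```python
from collections import deque


def bfs_levels(vertex_table, edge_table, source):
    """Return the BFS level of every vertex (-1 if unreachable)."""
    num_vertices = len(vertex_table) - 1
    level = [-1] * num_vertices
    level[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        # The neighbors of v are one contiguous slice of the edge table.
        for u in edge_table[vertex_table[v]:vertex_table[v + 1]]:
            if level[u] == -1:
                level[u] = level[v] + 1
                queue.append(u)
    return level


levels_from_1 = bfs_levels([0, 2, 3, 5], [1, 2, 2, 0, 2], source=1)
```

BFS visits vertices in an order driven by the graph rather than by storage layout, so it relies on the random vertex access described earlier.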

Datasets

Large synthetic graphs (R-MAT)

Large, publicly available social graphs (Twitter, LiveJournal)

In memory vs out of memory

PageRank, in-memory

In memory Out of memory

PageRank, out-of-memory

Runtime breakdown

Enormous variation!

LLAMA minimizes overhead

How is this possible? What features of LLAMA’s design essentially eliminate overhead like buffer management?

Overhead of snapshots

GraphChi (for reference)

PageRank

Time to merge snapshots

Merges are IO-bound

Scalability in number of cores (PageRank)

Experimental evaluation

Compare to in-memory and out-of-memory systems

(Green-Marl and GraphLab, GraphChi and X-Stream)

Evaluate multi-version support

How much overhead from snapshots?

Evaluate scalability on different datasets

Experimental takeaways

LLAMA performs well both in and out of memory

Significantly reduces existing overhead

Introduces manageable additional overhead

Thank you!

How can we store variable length properties?

Can we create a hybrid between a row store and a column store for storing properties?
