Graph Computation Naveen Molleti, Sigmoid

Upload: sigmoid

Post on 16-Apr-2017

TRANSCRIPT

Page 1: Graph computation

Graph Computation

Naveen Molleti, Sigmoid

Page 2: Graph computation

Graph of the Internet

Source: INRIA (http://raweb.inria.fr/rapportsactivite/RA2009/gravite/uid59.html)

Page 3: Graph computation

Red Hat family tree rendered along with an axis

Source: Wikimedia Commons (https://commons.wikimedia.org/wiki/File:Redhat_family_tree_11-06.png)

Page 4: Graph computation

Tabular structure (rows, fields, values) -> ? -> Graph structure (vertices, edges, labels, properties)

? = Graph computation

Page 5: Graph computation

Table data

Customer ID | Customer Name | Bill ID | Item Name
391         | Naveen        | 137     | Pizza
391         | Naveen        | 137     | Coke
391         | Naveen        | 139     | Garlic Bread
393         | Rahul         | 154     | Garlic Bread
393         | Rahul         | 154     | Coke
391         | Naveen        | 193     | Coke

Page 6: Graph computation

Compute configuration

Specify type of edges to be created:

(Customer ID: Customer Name) => Bill ID

Bill ID => Item Name
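A minimal sketch of how the two edge types in this configuration could be applied to the table data from the previous slide. The names `Row`, `EdgeSpec`, `rules` and `edges` are hypothetical and are not taken from the talk; the slides do not show how the configuration is actually encoded.

```scala
object ConfigEdgesSketch {
  // One row of the table data (hypothetical representation).
  final case class Row(customerId: String, customerName: String, billId: String, itemName: String)

  // A directed edge between two vertex keys (hypothetical representation).
  final case class EdgeSpec(from: String, to: String)

  // Assumed encoding of the configuration: each rule turns a row into one edge.
  val rules: Seq[Row => EdgeSpec] = Seq(
    r => EdgeSpec(s"${r.customerId}:${r.customerName}", r.billId), // (Customer ID: Customer Name) => Bill ID
    r => EdgeSpec(r.billId, r.itemName)                            // Bill ID => Item Name
  )

  // Apply every rule to every row; repeated edges collapse via distinct.
  def edges(rows: Seq[Row]): Seq[EdgeSpec] =
    rows.flatMap(r => rules.map(rule => rule(r))).distinct

  // The six rows from the "Table data" slide.
  val rows: Seq[Row] = Seq(
    Row("391", "Naveen", "137", "Pizza"),
    Row("391", "Naveen", "137", "Coke"),
    Row("391", "Naveen", "139", "Garlic Bread"),
    Row("393", "Rahul", "154", "Garlic Bread"),
    Row("393", "Rahul", "154", "Coke"),
    Row("391", "Naveen", "193", "Coke")
  )
}
```

With these rows, `ConfigEdgesSketch.edges(ConfigEdgesSketch.rows)` yields four customer-to-bill edges and six bill-to-item edges.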

Page 7: Graph computation
Page 8: Graph computation

Raw data -> Ingest data -> Compute -> Insert graph

Configuration

Persistence

Page 9: Graph computation

Raw data -> Ingest data -> Compute -> Insert graph

Configuration

Persistence

HDFS, Spark

HDFS

Titan, TinkerPop

Cassandra

Page 10: Graph computation

Graph data structures

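import scala.collection.immutable

// A directed, labelled edge between two vertices.
trait Edge {
  def out: Vertex
  def in: Vertex
  def props: Map[String, AnyRef]
  def label: String
}

// A vertex with an identifier, a display name and free-form properties.
trait Vertex {
  def name: String
  def id: String
  def props: Map[String, AnyRef]
}

// A graph stored as an adjacency list: each vertex maps to its edges.
trait Graph {
  def adjList: immutable.Map[Vertex, Seq[Edge]]
}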
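As an illustration, the traits above could be implemented with case classes. This is only a sketch: `SimpleVertex`, `SimpleEdge`, `SimpleGraph`, `fromEdges` and the edge labels are hypothetical names, and it assumes that `out` is the source of an edge and `in` its target, which the slide does not state.

```scala
object GraphTraitsSketch {
  // The traits from the slide, repeated so the example is self-contained.
  trait Vertex { def name: String; def id: String; def props: Map[String, AnyRef] }
  trait Edge { def out: Vertex; def in: Vertex; def props: Map[String, AnyRef]; def label: String }
  trait Graph { def adjList: Map[Vertex, Seq[Edge]] }

  // Hypothetical concrete implementations.
  final case class SimpleVertex(id: String, name: String,
                                props: Map[String, AnyRef] = Map.empty) extends Vertex
  final case class SimpleEdge(out: Vertex, in: Vertex, label: String,
                              props: Map[String, AnyRef] = Map.empty) extends Edge
  final case class SimpleGraph(adjList: Map[Vertex, Seq[Edge]]) extends Graph

  // Build an adjacency list keyed by source vertex (assumption: out = source).
  // Vertices without outgoing edges get an empty list.
  def fromEdges(edges: Seq[Edge]): Graph = {
    val vertices = edges.flatMap(e => Seq(e.out, e.in)).distinct
    val bySource = edges.groupBy(_.out)
    SimpleGraph(vertices.map(v => v -> bySource.getOrElse(v, Seq.empty[Edge])).toMap)
  }

  // A tiny example taken from the first table row.
  val naveen: Vertex = SimpleVertex("391", "Naveen")
  val bill137: Vertex = SimpleVertex("137", "Bill 137")
  val pizza: Vertex = SimpleVertex("Pizza", "Pizza")

  val example: Graph = fromEdges(Seq(
    SimpleEdge(naveen, bill137, "has_bill"),
    SimpleEdge(bill137, pizza, "has_item")
  ))
}
```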

Page 11: Graph computation

Compute

data -> tokens + relations -> vertices + edges

Page 12: Graph computation

Compute - simple map reduce approach

0) Split data into partitions

1) For each partition, compute tokens and relations

2) Create vertices and edges, and adjacency lists (local subgraphs)

3) Merge adjacency lists using groupBy vertices

4) Merge duplicate edges within adjacency list

5) Result is final graph
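The steps above can be sketched with plain Scala collections standing in for Spark partitions. This is an illustration under assumptions, not the talk's implementation: vertices are represented by string keys, rows are assumed to have the four columns of the table data slide, and `relations`, `localSubgraph`, `merge` and `build` are hypothetical names. Vertices without outgoing edges (the items) do not appear as keys in this simplified adjacency list.

```scala
object MapReduceSketch {
  type Relation = (String, String)         // (source token, target token)
  type AdjList = Map[String, Seq[String]]  // vertex -> targets of its outgoing edges

  // Step 1: tokens and relations for one row
  // (Customer => Bill and Bill => Item, as in the compute configuration).
  def relations(row: Seq[String]): Seq[Relation] = row match {
    case Seq(custId, custName, billId, item) =>
      Seq((s"$custId:$custName", billId), (billId, item))
    case _ => Seq.empty
  }

  // Step 2 (map): build the local subgraph of one partition.
  def localSubgraph(partition: Seq[Seq[String]]): AdjList =
    partition.flatMap(relations).groupBy(_._1).map { case (v, rs) => v -> rs.map(_._2) }

  // Steps 3 and 4 (reduce): merge adjacency lists by vertex, drop duplicate edges.
  def merge(a: AdjList, b: AdjList): AdjList =
    (a.keySet ++ b.keySet).map { v =>
      v -> (a.getOrElse(v, Seq.empty[String]) ++ b.getOrElse(v, Seq.empty[String])).distinct
    }.toMap

  // Steps 0 and 5: split into partitions, then fold the subgraphs into the final graph.
  def build(rows: Seq[Seq[String]], partitionSize: Int): AdjList =
    rows.grouped(partitionSize).map(localSubgraph).foldLeft(Map.empty[String, Seq[String]])(merge)

  val rows: Seq[Seq[String]] = Seq(
    Seq("391", "Naveen", "137", "Pizza"),
    Seq("391", "Naveen", "137", "Coke"),
    Seq("391", "Naveen", "139", "Garlic Bread"),
    Seq("393", "Rahul", "154", "Garlic Bread"),
    Seq("393", "Rahul", "154", "Coke"),
    Seq("391", "Naveen", "193", "Coke")
  )
}
```

Because `merge` removes duplicates per vertex, the result does not depend on how the rows are split: `build(rows, 2)` and `build(rows, 6)` give the same adjacency list.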

Page 13: Graph computation

DATA -> chunks -> tokens + relations -> vertices + edges -> subgraphs -> GRAPH

Steps: map step, reduce step, transformation step

Page 14: Graph computation

Tweaking for memory

- Maintaining vertex and edge objects consumes a lot of memory, both on the application server and on the Spark master/workers.

- Moving objects around the network is costly too.

Solution: Compute on ‘aliases’. Create the objects corresponding to each alias only just before returning the result.

- Merging duplicate objects has an after-effect: GC! (which opens another box of problems)

Solution: Avoid duplicate objects as far as possible.
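One way to read the 'alias' idea, sketched under assumptions: the talk does not define an alias, so here it is taken to be a lightweight string key that stands in for a vertex during the computation. `Alias`, `AliasEdge`, `dedupe` and `materialize` are hypothetical names.

```scala
object AliasSketch {
  final case class Vertex(id: String, name: String)
  final case class Edge(out: Vertex, in: Vertex, label: String)

  // Assumption: an alias is a plain string key; an edge is a tuple of aliases.
  type Alias = String
  type AliasEdge = (Alias, Alias, String) // (source alias, target alias, label)

  // Compute on aliases: duplicates collapse in a Set of small tuples,
  // so no duplicate Vertex or Edge objects are ever created.
  def dedupe(edges: Seq[AliasEdge]): Set[AliasEdge] = edges.toSet

  // Materialize objects only at the end: one Vertex per distinct alias,
  // shared by every edge that refers to it.
  def materialize(edges: Set[AliasEdge], names: Map[Alias, String]): Seq[Edge] = {
    val vertices: Map[Alias, Vertex] =
      edges.flatMap { case (o, i, _) => Seq(o, i) }
        .map(a => a -> Vertex(a, names.getOrElse(a, a)))
        .toMap
    edges.toSeq.map { case (o, i, l) => Edge(vertices(o), vertices(i), l) }
  }
}
```

In this sketch the heavy objects exist only in the final result, and each vertex is allocated once regardless of how many edges touch it.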

Page 15: Graph computation

DATA -> chunks -> tokens + relations -> subcomputes -> compute result -> GRAPH

Steps: map step, reduce step, transformation step

Page 16: Graph computation

http://aa.bb.cc.dd:8000/graph/zzgraph/search?name=mr%20vijay&depth=2&limit=10
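The slides show only this URL, not what the endpoint does. The query parameters suggest a lookup by name followed by a traversal bounded by `depth` and `limit`. A minimal sketch under that assumption, using a breadth-first walk over outgoing edges of a string-keyed adjacency list (`search` and `AdjList` are hypothetical names):

```scala
object SearchSketch {
  type AdjList = Map[String, Seq[String]] // vertex name -> names of its neighbours

  // Breadth-first traversal from `start`, following outgoing edges for at most
  // `depth` hops and returning at most `limit` vertices (start included).
  def search(graph: AdjList, start: String, depth: Int, limit: Int): Seq[String] = {
    @scala.annotation.tailrec
    def loop(frontier: Seq[String], seen: Vector[String], hops: Int): Vector[String] =
      if (frontier.isEmpty || hops == depth || seen.size >= limit) seen
      else {
        // Neighbours of the current frontier that have not been visited yet.
        val next = frontier
          .flatMap(v => graph.getOrElse(v, Seq.empty[String]))
          .distinct
          .filterNot(v => seen.contains(v))
        loop(next, seen ++ next, hops + 1)
      }
    loop(Seq(start), Vector(start), 0).take(limit)
  }
}
```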

Page 17: Graph computation

Not enough memory?

- Xmx values on a forked JVM launched via SBT (fork := true)

- Set the javaOptions key (e.g. javaOptions += "-Xmx16G")

- Underestimated size of the Spark compute result

- Set spark.driver.maxResultSize

- Get the most out of your machine. Don’t let the OS kill the process under memory pressure.

- Set vm.panic_on_oom (echo 1 | sudo tee /proc/sys/vm/panic_on_oom), so that the kernel panics on out-of-memory instead of killing the process
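The SBT-related settings on this slide could be collected in a build.sbt fragment along these lines. It is a sketch: the `4g` value is illustrative and not from the talk, and passing `spark.driver.maxResultSize` as a system property assumes the application creates its SparkConf with default loading of `spark.*` system properties.

```scala
// build.sbt (fragment)

// Run the application in a separate JVM; javaOptions only apply to forked JVMs.
fork := true

// Heap size for the forked JVM. 16G is the value from the slide.
javaOptions += "-Xmx16G"

// Assumption: SparkConf reads spark.* system properties by default,
// so the cap on results collected to the driver can be raised here.
// "4g" is an illustrative value.
javaOptions += "-Dspark.driver.maxResultSize=4g"
```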

Page 18: Graph computation

Graph Database?

Page 19: Graph computation

References

Titan: http://thinkaurelius.github.io/titan/

TinkerPop: http://tinkerpop.apache.org/

Cassandra: http://cassandra.apache.org/