lighthouse: large-scale graph pattern matching on giraph

Post on 20-Jul-2015


Lighthouse: Large-scale graph pattern matching on Giraph


Timeline

• Inspired by Google Pregel (2010)

• Donated to ASF by Yahoo! in 2011

• Top-level project in 2012

• 1.0 release in January 2013

• 1.1 release in November 2014

• Used at Facebook, LinkedIn, Yahoo!


Vertex-centric API

[Figure: vertex values at iteration i and at iteration i+1]


BSP/Pregel implementation

[Figure: processing units PU 1 to PU 5 at iteration i and at iteration i+1]
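The vertex-centric, bulk-synchronous model named on these slides can be illustrated with a minimal single-process sketch. It is not Giraph code: the function `run_supersteps` and the choice of maximum-value propagation as the per-vertex computation are assumptions made for illustration. It only shows the pattern of computing per vertex, sending messages, and delivering them at the next iteration.

```python
from collections import defaultdict


def run_supersteps(edges, values, max_supersteps=50):
    """Simulate Pregel-style supersteps on an undirected graph.

    Hypothetical helper: every vertex adopts the largest value it hears about.
    edges: iterable of (u, v) pairs; values: dict vertex -> initial value.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    values = dict(values)

    # Superstep 0: every vertex is active and announces its own value.
    inbox = defaultdict(list)
    for v, val in values.items():
        for n in neighbors[v]:
            inbox[n].append(val)

    supersteps = 1
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        # Per-vertex compute: messages sent now are only visible next superstep.
        for v, messages in inbox.items():
            best = max(messages)
            if best > values[v]:
                values[v] = best
                for n in neighbors[v]:
                    outbox[n].append(best)
            # A vertex that learns nothing new sends nothing (votes to halt).
        inbox = outbox  # synchronization barrier between iterations
        supersteps += 1
    return values, supersteps
```

The run ends when no messages are in flight, which mirrors the Pregel halting condition of all vertices being inactive with empty inboxes.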


Architecture

[Figure: architecture diagram. Labels: Worker 1, Worker 2, ..., Worker N and Master, each with a Netty endpoint; Master Coordinator; Zookeeper; Hadoop File System (HDFS). Inside a worker: compute threads, vertices, message inbox, message outbox.]


Lighthouse

Giraph execution algebra

Binding Table. Matching and potential graph patterns are stored in a table that is distributed across the messages sent around by vertices.

• Scan: starts traversals from certain vertices.
• Select: prunes traversals based on expressions.
• Project: adds data to the binding table.
• Hash Join: joins paths generated from different traversals.
• Step Join: performs a further hop in the traversal.
• Move: continues a traversal from different vertices.
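To make the operators above concrete, here is a minimal single-process sketch in which each binding-table row is a Python dict mapping a query variable to a vertex id or value. Everything in it is an assumption for illustration: the toy `VERTICES` and `EDGES` data, the function names, and their signatures are not Lighthouse's API. Move is omitted because, without distribution, there is no "current vertex" to move a row away from.

```python
from collections import defaultdict

# Hypothetical toy graph: vertex id -> properties, source id -> outgoing edges.
VERTICES = {
    1: {"label": "Person", "firstName": "Antonio", "browser": "Chrome"},
    2: {"label": "Person", "firstName": "Maria", "browser": "Firefox"},
    10: {"label": "Company", "name": "Acme"},
    20: {"label": "Country", "name": "Italy"},
}
EDGES = {
    1: [("WORK_AT", 10)],
    2: [("WORK_AT", 10)],
    10: [("IS_LOCATED_IN", 20)],
}


def scan(var, label):
    """Scan: start one traversal (one row) at every vertex with the label."""
    return [{var: vid} for vid, props in VERTICES.items() if props["label"] == label]


def select(rows, predicate):
    """Select: prune the rows for which the expression is false."""
    return [row for row in rows if predicate(row)]


def project(rows, var, prop, column):
    """Project: add a property of the vertex bound to var as a new column."""
    return [{**row, column: VERTICES[row[var]].get(prop)} for row in rows]


def step_join(rows, src_var, edge_label, dst_var):
    """Step Join: extend every row by one hop along edges with the given label."""
    out = []
    for row in rows:
        for label, dst in EDGES.get(row[src_var], []):
            if label == edge_label:
                out.append({**row, dst_var: dst})
    return out


def hash_join(left, right, key):
    """Hash Join: combine rows from two traversals that agree on key."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]
```

A person-to-company-to-country traversal would then chain `scan`, `select`, and two `step_join` calls, with `project` adding the returned properties as extra columns.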


Distributed Binding Table

[Figure: vertex values at iteration i and at iteration i+1, shown with example binding table rows]

V1   John   …   VN
…    …      …   …
V4   Paul   …   VJ
V7   Mark   …   VL


MATCH (person:Person {firstName:"Antonio"}) -[:WORK_AT]-> (company),
      (company) -[:IS_LOCATED_IN]-> (country)
WHERE person.browser = "Chrome"
RETURN person.id, person.lastName, company.id, country.id


MATCH (person:Person) -[:WORK_AT]-> (company)
RETURN person.id, person.birthDate, company.id


Scan

Project

StepJoin


Cypher path-queries

Desired functionality:
• weighted shortest paths
• multiple sources and destinations
• top N shortest paths for each pair
• provide both paths and their costs
• restrict search to a subset of the graph

Restrictions:
• Monotonic cost function
• Path-independent local vertex/edge restrictions


Proposal

MATCH p = (a:Start) -[e* | not(endNode(e)).danger ]-> (b:Finish)
CHEAPEST 3 SUM e.distance * e.maxSpeed AS length
RETURN a, b, path, length

Features:
• Selector applied before WHERE condition (optional)
• Number of paths for each pair (e.g. 3) (optional)
• User-defined cost function (required)
• AS keyword to bind distance to variable (optional)
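One way to read the proposed semantics is sketched below: an edge selector prunes the search, a user-defined per-edge cost is summed along the path, and the k cheapest paths between one pair are returned with their costs. This is a plain best-first enumeration of simple paths, not the Giraph implementation, and it assumes non-negative edge costs. The function name `cheapest_paths` and the edge representation (dicts with a `"to"` key) are hypothetical.

```python
import heapq
from itertools import count


def cheapest_paths(edges, source, target, k, cost, allowed=lambda e: True):
    """Return up to k cheapest simple paths as (total_cost, [vertices]).

    edges: dict vertex -> list of edge dicts with at least a "to" key.
    cost: maps an edge to a non-negative number; costs are summed (SUM).
    allowed: selector applied to every edge before it is traversed.
    """
    tie = count()  # tie-breaker so the heap never has to compare paths
    frontier = [(0, next(tie), [source])]
    found = []
    while frontier and len(found) < k:
        total, _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == target:
            # Partial paths are popped in non-decreasing cost order, so
            # complete paths are reported cheapest first.
            found.append((total, path))
            continue
        for edge in edges.get(node, []):
            nxt = edge["to"]
            if nxt in path or not allowed(edge):
                continue  # keep paths simple and honour the selector
            heapq.heappush(frontier, (total + cost(edge), next(tie), path + [nxt]))
    return found
```

With a set of dangerous vertices `danger`, the selector from the proposal would correspond to `allowed=lambda e: e["to"] not in danger`, and the cost expression to `cost=lambda e: e["distance"] * e["maxSpeed"]`. The enumeration can be exponential in the worst case, which is acceptable for a sketch but not for large graphs.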


Giraph implementation

Two phases:

• First phase: we compute the routes of the top K shortest paths. Each vertex discovers and registers its preceding vertex on the shortest paths (similar to Pregel BFS).

• Second phase: starting from the “leaves”, we traverse the structure backwards, building the paths.
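The two phases can be sketched in a simplified, single-process form. The sketch below makes several assumptions that the slides do not state: it handles only K = 1 and a single source, edge weights are non-negative, and the backward traversal starts from one chosen target rather than from all leaves. The names `phase_one` and `phase_two` are hypothetical.

```python
from collections import defaultdict


def phase_one(edges, source):
    """Supersteps in which each vertex registers its cheapest predecessor.

    edges: dict vertex -> list of (neighbor, weight), with weight >= 0.
    Returns (dist, pred): best known cost and preceding vertex per vertex.
    """
    dist = {source: 0}
    pred = {source: None}
    inbox = defaultdict(list)
    for n, w in edges.get(source, []):
        inbox[n].append((w, source))

    while inbox:  # one loop iteration corresponds to one superstep
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            cost, sender = min(messages, key=lambda m: m[0])
            if cost < dist.get(v, float("inf")):
                dist[v] = cost
                pred[v] = sender  # register the preceding vertex
                for n, w in edges.get(v, []):
                    outbox[n].append((cost + w, v))
        inbox = outbox
    return dist, pred


def phase_two(pred, target):
    """Walk the registered predecessors backwards to rebuild the path."""
    if target not in pred:
        return None  # target was never reached
    path = []
    v = target
    while v is not None:
        path.append(v)
        v = pred[v]
    return path[::-1]
```

Extending this to the top K paths would require each vertex to keep up to K predecessor entries instead of one, which the sketch deliberately leaves out.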


Preliminary results


Thanks.
