titan @ gitpro conference 2014

AURELIUS THINKAURELIUS.COM TITAN Scalable Graph Database Matthias Broecheler @mbroecheler April 12th, MMXIV

Upload: matthias-broecheler

Post on 15-Jan-2015


DESCRIPTION

Presents Titan and Faunus at the Gitpro conference held on April 12, 2014.

TRANSCRIPT

Page 1: Titan @ Gitpro Conference 2014

AURELIUS THINKAURELIUS.COM

TITAN Scalable Graph Database

Matthias Broecheler @mbroecheler April 12th, MMXIV

Page 2: Titan @ Gitpro Conference 2014

Database

real time

high throughput

transactional

Page 3: Titan @ Gitpro Conference 2014

Graph Database

Page 4: Titan @ Gitpro Conference 2014

Graph Database

scalable

integrated

open source

Page 5: Titan @ Gitpro Conference 2014

name: Newton type: user

name: Hercules type: user

title: “How to deal with Father issues” type: book

title: “Muscle building for beginners” type: book

title: “Dancing with the Stars” type: DVD

title: “Friends forever bracelet” type: Accessory

Page 6: Titan @ Gitpro Conference 2014

name: Newton type: user

name: Hercules type: user

bought

bought

bought

viewed

in-Cart

title: “How to deal with Father issues” type: book

title: “Muscle building for beginners” type: book

title: “Dancing with the Stars” type: DVD

title: “Friends forever bracelet” type: Accessory

Page 7: Titan @ Gitpro Conference 2014

name: Newton type: user

name: Hercules type: user

bought

time:24

bought

bought

time:22

time:20

viewed

in-Cart

time:05

time:09

title: “How to deal with Father issues” type: book

title: “Muscle building for beginners” type: book

title: “Dancing with the Stars” type: DVD

title: “Friends forever bracelet” type: Accessory

Page 8: Titan @ Gitpro Conference 2014

name: Newton type: user

name: Hercules type: user

bought

time:24

bought

bought

time:22

time:20

viewed

in-Cart

time:05

time:09

title: “How to deal with Father issues” type: book

title: “Muscle building for beginners” type: book

title: “Dancing with the Stars” type: DVD

title: “Friends forever bracelet” type: Accessory

Page 9: Titan @ Gitpro Conference 2014

1.  Home-grown solution

2.  Relational Database

3.  Graph Database

Page 10: Titan @ Gitpro Conference 2014

Home-grown Solution

- Start with your favorite NoSQL database
  - Cassandra, MongoDB, HBase, etc.

1.  Error-prone

2.  Data model moves into application code

3.  Maintainability hazard

4.  No query language support

5.  No performance optimization

Page 11: Titan @ Gitpro Conference 2014

Relational Database

- Relationship tables, SQL and joins

1.  Join processing is expensive

2.  Join processing on large tables does not scale

3.  Cumbersome query language

4.  Inflexible data model

Page 12: Titan @ Gitpro Conference 2014

SELECT P.title
FROM User U1
  JOIN Purchase P1 ON P1.buyerid = U1.userid
  JOIN Purchase P2 ON P1.productid = P2.productid
  JOIN Purchase P3 ON P2.buyerid = P3.buyerid
  JOIN Product P ON P3.productid = P.productid
WHERE U1.name = 'xyz' AND P1.time > T1 AND P2.time > T1

Page 13: Titan @ Gitpro Conference 2014

Relational Database

- Relationship tables, joins, and SQL

1.  Join processing is expensive

2.  Join processing on large tables does not scale

3.  Cumbersome query language

4.  Inflexible data model

Page 14: Titan @ Gitpro Conference 2014

name: Newton type: user

name: Hercules type: user

bought

friends

time:24

bought

bought

time:22

time:20

viewed

in-Cart

time:05 duration: 60

time:09

name: Saturn type: author

author

author

title: “How to deal with Father issues” type: book

title: “Muscle building for beginners” type: book

title: “Dancing with the Stars” type: DVD

title: “Friends forever bracelet” type: Accessory

Page 15: Titan @ Gitpro Conference 2014

1.  Home-grown solution

2.  Relational Database

3.  Graph Database

Page 16: Titan @ Gitpro Conference 2014

UML

Entity Relationship Model

Page 17: Titan @ Gitpro Conference 2014

name: Hercules type: user

bought

time:24

Vertex

Edge Label

Edge

Property (key, value)

title: “Muscle building for beginners” type: book
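The property graph model on this slide (vertices and labeled edges, each carrying key/value properties) can be sketched in a few lines of Python. This is only an illustration of the data model, not Titan's API; the class names `Vertex` and `Edge` and the integer ids are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A vertex with an id and arbitrary key/value properties."""
    id: int  # hypothetical id, chosen for this example
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled edge that carries its own properties."""
    label: str
    out_v: Vertex  # vertex the edge leaves
    in_v: Vertex   # vertex the edge points to
    properties: dict = field(default_factory=dict)


# The edge shown on the slide: Hercules bought a book at time 24.
hercules = Vertex(1, {"name": "Hercules", "type": "user"})
book = Vertex(2, {"title": "Muscle building for beginners", "type": "book"})
bought = Edge("bought", hercules, book, {"time": 24})
```

Keeping properties on the edge itself is what lets later slides filter purchases by `time` without a separate relationship table.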

Page 18: Titan @ Gitpro Conference 2014

name: Newton type: user

name: Hercules type: user

bought

friends

time:24

bought

bought

time:22

time:20

viewed

in-Cart

time:05 duration: 60

time:09

name: Saturn type: author

author

author

title: “How to deal with Father issues” type: book

title: “Muscle building for beginners” type: book

title: “Dancing with the Stars” type: DVD

title: “Friends forever bracelet” type: Accessory

Page 19: Titan @ Gitpro Conference 2014

g.V.has('name','xyz')
   .outE('bought').has('time',gt,T1).inV
   .inE('bought').has('time',gt,T1).outV
   .out('bought').title

http://gremlindocs.com/
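To make the three hops of the traversal concrete, here is a rough Python equivalent over an in-memory list of purchase edges. The sample purchases, product ids, and titles are hypothetical; only the shape of the query (recent purchases of one user, other recent buyers of those products, everything those buyers bought) follows the slide.

```python
# Hypothetical sample data: (buyer, product, time) "bought" edges.
purchases = [
    ("xyz", "book-a", 22),
    ("xyz", "book-b", 5),
    ("abc", "book-a", 24),
    ("abc", "dvd-c", 9),
    ("def", "book-b", 30),
    ("def", "toy-d", 31),
]
titles = {"book-a": "Book A", "book-b": "Book B",
          "dvd-c": "DVD C", "toy-d": "Toy D"}


def co_purchase_titles(name, t1):
    """Mirror the Gremlin traversal step by step (no de-duplication)."""
    # outE('bought').has('time', gt, T1).inV
    recent = [p for (b, p, t) in purchases if b == name and t > t1]
    # inE('bought').has('time', gt, T1).outV
    buyers = [b for prod in recent
              for (b, p, t) in purchases if p == prod and t > t1]
    # out('bought').title
    return [titles[p] for buyer in buyers
            for (b, p, t) in purchases if b == buyer]


result = co_purchase_titles("xyz", 20)
print(result)
```

Like the Gremlin version, this walks back through the starting user as well, so that user's own purchases appear in the result.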

Page 20: Titan @ Gitpro Conference 2014

Architecture Analogy

MyISAM

Page 21: Titan @ Gitpro Conference 2014

Flexible Persistence

Partitionability

Availability Consistency

Page 22: Titan @ Gitpro Conference 2014

Vertex-Centric Indices

- Sort and index edges per vertex by sort key
  - Sort key can be composite

- Enables efficient focused traversals
  - Only retrieve edges that matter

- Uses push-down predicates for quick, index-driven retrieval
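The idea behind a vertex-centric index can be sketched as follows: keep each vertex's edges sorted by the sort key, then answer a range predicate with a binary search instead of scanning every edge. This is a toy in-memory illustration under that assumption, not Titan's storage layout; the class `VertexCentricIndex` and the item names `item-x` and `item-y` are made up for the example.

```python
import bisect
from collections import defaultdict


class VertexCentricIndex:
    """Per-(vertex, label) edge lists kept sorted by a sort key."""

    def __init__(self):
        self._keys = defaultdict(list)       # sorted sort keys
        self._neighbors = defaultdict(list)  # neighbors, same order as keys

    def add_edge(self, vertex, label, sort_key, neighbor):
        keys = self._keys[(vertex, label)]
        pos = bisect.bisect_right(keys, sort_key)
        keys.insert(pos, sort_key)
        self._neighbors[(vertex, label)].insert(pos, neighbor)

    def edges_after(self, vertex, label, min_key):
        """Return (sort_key, neighbor) pairs with sort_key > min_key."""
        keys = self._keys.get((vertex, label), [])
        neighbors = self._neighbors.get((vertex, label), [])
        start = bisect.bisect_right(keys, min_key)  # binary search
        return list(zip(keys[start:], neighbors[start:]))


idx = VertexCentricIndex()
idx.add_edge("Hercules", "bought", 24, "Muscle building for beginners")
idx.add_edge("Hercules", "bought", 5, "item-x")   # hypothetical edge
idx.add_edge("Hercules", "bought", 22, "item-y")  # hypothetical edge
print(idx.edges_after("Hercules", "bought", 20))
```

A predicate such as `has('time', gt, T1)` then touches only the tail of the sorted list, which is the "only retrieve edges that matter" point on the slide.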

Page 23: Titan @ Gitpro Conference 2014

Token Ring

Graph Partitioning

assigns ids to map vertices into “optimal” token range

Lots of interesting questions for future work

uses BOP

Page 24: Titan @ Gitpro Conference 2014
Page 25: Titan @ Gitpro Conference 2014

Educating the Planet

Page 26: Titan @ Gitpro Conference 2014

Person

Person Student Teacher

Course

Institution

Concept

Discussion

Comment

Share

enrolledIn

teaches

relatesTo

hasCourse

belongsTo

follows

author

references

hasComment relatesTo

author

partOf

relatesTo

Page 27: Titan @ Gitpro Conference 2014

121 Billion Edges 6.2 Billion Vertices

1 Million Universities

3.5 Billion Students

Page 28: Titan @ Gitpro Conference 2014

Placement Group

hi1.4xl

Setup

Page 29: Titan @ Gitpro Conference 2014

1.1 million edges / sec

using batch mode

Data Ingestion

Page 30: Titan @ Gitpro Conference 2014

80 m1.medium

Page 31: Titan @ Gitpro Conference 2014

10,200 transactions / sec

16 randomly chosen complex traversal templates

Throughput

Page 32: Titan @ Gitpro Conference 2014

Transaction Description | Avg (ms) | Stdev (ms)
Student retrieves all content for a single course in their course list | 279.32 | 81.83
Student follows another student | 193.72 | 22.77
Student is recommended people to follow | 241.33 | 256.48
Student reads their stream and shares an item with followers | 284.07 | 68.20
Student retrieves their profile | 53.740 | 22.61
Student reads the most recent comments for their courses | 211.07 | 45.56

Page 33: Titan @ Gitpro Conference 2014

x = [] as Set; m = [:]
m = user.out('follows').aggregate(x)[0..(num*2)]
        .out('follows').except(x)[0..limit]
        .groupCount(m);
m.sort{-it.value}[0..num]._()
        .transform{ [userid: it.key.id,
                     points: it.value]};

Follow Recommendation
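The recommendation logic in this traversal is a friend-of-friend count: collect who the user already follows, walk two `follows` hops, drop users already followed, and rank the rest by how many paths reach them. A minimal Python sketch of that idea is below. The function name, the dict-based `follows` structure, and the sample data are assumptions; the range limits from the Gremlin version are left out, and excluding the user themselves is an extra choice not present in the slide's code.

```python
from collections import Counter


def recommend_follows(follows, user, num):
    """Rank users two 'follows' hops away by number of connecting paths.

    follows: dict mapping a user to the list of users they follow.
    """
    already = set(follows.get(user, []))  # plays the role of aggregate(x)
    counts = Counter()                    # plays the role of groupCount(m)
    for friend in follows.get(user, []):
        for candidate in follows.get(friend, []):
            # except(x); skipping the user themselves is an added assumption
            if candidate not in already and candidate != user:
                counts[candidate] += 1
    return [{"userid": uid, "points": pts}
            for uid, pts in counts.most_common(num)]


# Hypothetical follow graph.
follows = {"a": ["b", "c"], "b": ["c", "d", "e"], "c": ["d", "a"]}
print(recommend_follows(follows, "a", 2))
```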

Page 34: Titan @ Gitpro Conference 2014

AURELIUS THINKAURELIUS.COM

Faunus Batch Graph Analytics

Page 35: Titan @ Gitpro Conference 2014

- Hadoop-based Graph Computing Framework

- Graph Analytics

- Breadth-first Traversals

- Global Graph Computations

- Batch Big Graph Data

Faunus Features

Page 36: Titan @ Gitpro Conference 2014

Faunus Architecture

g._()

Page 37: Titan @ Gitpro Conference 2014

Faunus Work Flow

hdfs://user/ubuntu/

output/job-0/

output/job-1/

output/job-2/ { graph*

sideeffect*

g.V.out.out.count()

Compressed HDFS Graphs
- stored in sequence files
- variable length encoding
- prefix compression
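As a rough illustration of what "variable length encoding" buys, the sketch below implements a common 7-bits-per-byte integer encoding (the LEB128-style varint). It is a generic technique shown under the assumption that small ids dominate; it is not a description of Faunus's actual on-disk format, and the function names are made up for the example.

```python
def encode_varint(n):
    """Encode a non-negative int using 7 data bits per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Decode a single varint from the start of a bytes object."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


print(len(encode_varint(1)), len(encode_varint(300)))
```

Small values fit in one byte instead of a fixed eight, which matters when a file holds billions of vertex and edge ids.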

Page 38: Titan @ Gitpro Conference 2014

Degree Distribution

GitHub Network

g.V.sideEffect{ it.degree = it.out('follows').count() }.degree.groupCount
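The same computation on a small in-memory graph looks like this: count each vertex's outgoing `follows` edges, then count how many vertices share each degree. The function name and the four-vertex sample graph are hypothetical; the Faunus job does the same thing as a batch job over the full graph.

```python
from collections import Counter


def degree_distribution(vertices, follows_edges):
    """Map each out-degree to the number of vertices that have it."""
    out_degree = {v: 0 for v in vertices}  # vertices with no edges count as 0
    for src, _dst in follows_edges:
        out_degree[src] += 1
    return Counter(out_degree.values())    # the groupCount step


# Hypothetical sample graph.
vertices = ["a", "b", "c", "d"]
follows_edges = [("a", "b"), ("a", "c"), ("b", "c")]
print(degree_distribution(vertices, follows_edges))
```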

Page 39: Titan @ Gitpro Conference 2014

Degree Distribution

P(k) ~ k^(-γ)

γ = 2.2

Page 40: Titan @ Gitpro Conference 2014

Global Recommendations

gremlin> g.E.has('label','pushed','to').keep.
           V.out('pushed').out('to').
           in('to').in('pushed').
           sideEffect('{it.score = it.pathCounter}').
           score.order(F.decr,'name')

# Top 5:
Jippi            60892182927
garbear          30095282886
FakeHeal         30038040349
brianchandotcom  24684133382
nyarla           15230275746

Page 41: Titan @ Gitpro Conference 2014

Aurelius Graph Cluster

OLTP OLAP

Hadoop MapReduce

Analysis results back into Titan

Apache 2

g.V.label.groupCount g.v(101).out

titan.thinkaurelius.com faunus.thinkaurelius.com


Page 42: Titan @ Gitpro Conference 2014

AURELIUS THINKAURELIUS.COM

@AURELIUSGRAPHS