A Journey From Relational to Graph: Trials and Tribulations on the Path to Graph

Upload: nakul-jeirath

Post on 12-Apr-2017

TRANSCRIPT

Page 1: A Journey from Relational to Graph

A Journey From Relational to Graph

Trials and Tribulations on the Path to Graph

Page 2: A Journey from Relational to Graph

Introduction

● Nakul Jeirath

● Senior security engineer at WellAware (wellaware.us)

● WellAware: Oil & gas startup building a SaaS monitoring & analytics platform

Page 3: A Journey from Relational to Graph

Wikipedia List of Graph DBs

https://en.wikipedia.org/wiki/Graph_database

Page 4: A Journey from Relational to Graph

Wikipedia List of Graph DBs

We use Titan+Cassandra

Page 5: A Journey from Relational to Graph

Transitioned ~2 years ago

Page 6: A Journey from Relational to Graph

Why Switch?

Graph model allowed modeling of well pads and derived calculations

Page 7: A Journey from Relational to Graph

Why Switch?

Graph model allowed modeling of well pads and derived calculations

Visualization built with http://js.cytoscape.org/

Page 8: A Journey from Relational to Graph

Overview

● Quick graph overview + toy example
● Our journey
○ Episode I: Development
○ Episode II: Migration
○ Episode III: Operation

Page 9: A Journey from Relational to Graph

Property Graph

Vertex (label: employee, name: Nakul)

Edge (label: works for, hired: 9/13)

Vertex (label: company, name: WellAware)
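The property graph on this slide can be sketched in a few lines of Python. This is only an illustration of the data model, not Titan's API: the `Vertex` and `Edge` classes are hypothetical, and the edge direction (employee to company) is an assumption read off the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A labeled vertex carrying arbitrary key/value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A labeled, directed edge that can carry its own properties."""
    label: str
    out_vertex: Vertex  # where the edge starts
    in_vertex: Vertex   # where the edge ends
    properties: dict = field(default_factory=dict)


# The example from the slide (direction employee -> company is assumed).
nakul = Vertex("employee", {"name": "Nakul"})
wellaware = Vertex("company", {"name": "WellAware"})
works_for = Edge("works for", nakul, wellaware, {"hired": "9/13"})

print(
    works_for.out_vertex.properties["name"],
    f"-[{works_for.label}]->",
    works_for.in_vertex.properties["name"],
)
```

The point of the model is that the relationship itself is a first-class object with properties (here `hired`), which in a relational schema would typically live in a join table.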

Page 10: A Journey from Relational to Graph

A Toy Example

http://coachesbythenumbers.com/sportsource-college-football-data-packages/

2005 College Football Data

● Team names & conferences
● Game record with dates and scores
● Interesting questions:
○ Records for all teams in conference X
○ Top 25 ranking using record + strength of opponents
○ Three team loop (A beat B beat C beat A)
● Source code: https://github.com/njeirath/titan-perf-tester
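The "three team loop" question (A beat B beat C beat A) is a cycle search of length three over the "beat" relation. A minimal Python sketch, assuming games are given as (winner, loser) pairs; the team names and results below are placeholders, not taken from the 2005 data set, and `three_team_loops` is a hypothetical helper.

```python
def three_team_loops(beat):
    """Return each loop A beat B beat C beat A exactly once.

    `beat` is an iterable of (winner, loser) pairs.
    """
    wins = {}
    for winner, loser in beat:
        wins.setdefault(winner, set()).add(loser)

    loops = set()
    for a, beaten_by_a in wins.items():
        for b in beaten_by_a:
            for c in wins.get(b, ()):
                if a in wins.get(c, ()):
                    # The same loop is found from each of its three teams,
                    # so rotate it to start at the smallest name.
                    cycle = (a, b, c)
                    k = cycle.index(min(cycle))
                    loops.add(cycle[k:] + cycle[:k])
    return sorted(loops)


# Placeholder results: A beat B, B beat C, C beat A, and A beat D.
games = {("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")}
print(three_team_loops(games))
```

In a graph database this is a three-hop traversal that returns to its starting vertex; in SQL it would be a three-way self-join on the Beat table.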

Page 11: A Journey from Relational to Graph

Toy Models

Graph

Vertex (label: team, name: Purdue, conf: Big 10)

Edge (label: beat, date: 11/19/05, score: 41-14)

Vertex (label: team, name: IU, conf: Big 10)

SQL

Teams table: team_id, conference, name

Beat table: winner, loser, win_score, lose_score

Page 12: A Journey from Relational to Graph

Episode I: Development

SQL vs Gremlin

Developer Opinion

Page 13: A Journey from Relational to Graph

Example: Get Big 10 Records

SQL

SELECT win_record.NAME, win_record.wins, Count(l)
FROM (
    SELECT teams.team_id, teams.NAME AS NAME, Count(w) AS wins
    FROM teams
    JOIN beat AS w ON teams.team_id = w.winner
    WHERE conference = 'Big Ten Conference'
    GROUP BY teams.NAME, teams.team_id
) AS win_record
JOIN beat AS l ON team_id = l.loser
GROUP BY win_record.NAME, win_record.wins
ORDER BY win_record.wins DESC;

Gremlin

g.V().order().by(__.outE().count(), decr).
    has('conference', 'Big Ten Conference').
    as('team', 'wins', 'losses').
    select('team', 'wins', 'losses').
        by('name').
        by(__.outE().count()).
        by(__.inE().count())
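What the Gremlin traversal computes is simple once spelled out: a team's wins are its outgoing "beat" edges and its losses are its incoming ones. The Python sketch below mirrors that logic on plain data. Only the Purdue over Indiana game comes from the slides; the third team and the other two games are made up for illustration, and `conference_records` is a hypothetical helper.

```python
from collections import Counter

# team_id -> properties. "Example U" is a made-up team.
teams = {
    559: {"name": "Purdue", "conference": "Big 10"},
    306: {"name": "Indiana", "conference": "Big 10"},
    1: {"name": "Example U", "conference": "Other"},
}

# (winner, loser) pairs. Only (559, 306) appears in the slides.
beat = [(559, 306), (306, 1), (559, 1)]


def conference_records(teams, beat, conference):
    """Return (name, wins, losses) per team in a conference, most wins first."""
    wins = Counter(winner for winner, _ in beat)   # out-degree
    losses = Counter(loser for _, loser in beat)   # in-degree
    rows = [
        (team["name"], wins[team_id], losses[team_id])
        for team_id, team in teams.items()
        if team["conference"] == conference
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for row in conference_records(teams, beat, "Big 10"):
    print(row)
```

Because the counters default to zero, a team with no wins or no losses still appears in the result. The SQL version on the slide uses inner joins on both the win and loss side, so such a team would drop out of its result set.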

Page 14: A Journey from Relational to Graph

Example: Top 25 Ranking

SQL

SELECT teams.name, ranks.rank
FROM (
    SELECT beat.winner, Sum(rec.wins) AS rank
    FROM (
        SELECT teams.team_id, Count(w) AS wins
        FROM teams
        JOIN beat AS w ON w.winner = teams.team_id
        GROUP BY teams.team_id
    ) AS rec
    JOIN beat ON beat.loser = rec.team_id
    GROUP BY beat.winner
    ORDER BY rank DESC
    LIMIT 25
) AS ranks
JOIN teams ON teams.team_id = ranks.winner
ORDER BY ranks.rank DESC;

Gremlin

g.V().order().by(__.out().out().count(), decr).
    as('team', 'score', 'wins', 'losses').
    select('team', 'score', 'wins', 'losses').
        by('name').
        by(__.out().out().count()).
        by(__.outE().count()).
        by(__.inE().count()).
    limit(25)
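Both queries use the same scoring idea: a team's score is the number of two-hop "beat" paths leaving it, which equals the sum of the win counts of every team it beat. A small Python sketch of that score, using placeholder team names and results (none of this is from the 2005 data); `rank` is a hypothetical helper.

```python
def rank(beat, limit=25):
    """Score each team by the total wins of the teams it beat.

    `beat` is a list of (winner, loser) pairs. Returns (team, score)
    pairs sorted by score, highest first, truncated to `limit`.
    """
    beaten = {}
    for winner, loser in beat:
        beaten.setdefault(winner, []).append(loser)
        beaten.setdefault(loser, [])  # teams with no wins still get a score

    scores = {
        team: sum(len(beaten[opponent]) for opponent in opponents)
        for team, opponents in beaten.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:limit]


# Placeholder results: A beat B and C, B beat C and D, C beat D.
games = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
print(rank(games, limit=2))
```

Here A scores 3 because B has two wins and C has one. Beating a strong opponent counts for more than beating a weak one, which is the "record + strength of opponents" ranking described in the toy example.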

Page 15: A Journey from Relational to Graph

/r/mildlyinteresting/

1. Texas
2. USC
3. Penn State
4. Ohio State
5. Virginia Tech
6. TCU
7. West Virginia
8. Louisiana State
9. Alabama
10. Oregon
11. Louisville
12. Georgia
13. UCLA
14. Miami (FL)

1. Texas
2. USC
3. Penn State
4. Virginia Tech
5. LSU
6. Ohio State
7. Georgia
8. TCU
9. West Virginia
10. Alabama
11. Boston College
12. Oklahoma
13. Florida
14. UCLA

http://www.collegefootballpoll.com/2005_archive_computer_rankings.html

2005 End of Season Computer Rankings

Our Query Results

Page 16: A Journey from Relational to Graph

Developer Opinion

● ORMs
○ Moving to graph meant losing the Django ORM
○ The ORM/OGM option at the time was Totorom
● Query Language
○ Gremlin seems more intuitive

Page 17: A Journey from Relational to Graph

Episode II: Migration

Essentially an ETL operation:

1. Export tables (table name --> vertex label, columns --> vertex properties)
2. Export FK/join tables (FK/join table name --> edge label)

team_id  conference  name
559      Big 10      Purdue
306      Big 10      Indiana
...

winner  loser  win_score  lose_score
559     306    41         14
...

Challenges:

● Dealing with indices
● Migrating a production DB
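The two export steps above can be sketched in Python using the rows shown on the slide. This is a rough illustration of the mapping only, not the migration code that was actually used: the helper names and the dict-based vertex/edge shapes are hypothetical, and storing the non-key columns of the join table as edge properties is an assumption based on the toy model.

```python
team_rows = [
    {"team_id": 559, "conference": "Big 10", "name": "Purdue"},
    {"team_id": 306, "conference": "Big 10", "name": "Indiana"},
]
beat_rows = [
    {"winner": 559, "loser": 306, "win_score": 41, "lose_score": 14},
]


def table_to_vertices(label, rows, pk):
    """Step 1: table name -> vertex label, columns -> vertex properties."""
    return {
        (label, row[pk]): {"label": label, "properties": dict(row)}
        for row in rows
    }


def join_table_to_edges(label, rows, out_fk, in_fk, vertex_label):
    """Step 2: join table name -> edge label, FK columns -> endpoints."""
    edges = []
    for row in rows:
        # Assumption: the remaining columns become edge properties.
        props = {k: v for k, v in row.items() if k not in (out_fk, in_fk)}
        edges.append({
            "label": label,
            "out": (vertex_label, row[out_fk]),
            "in": (vertex_label, row[in_fk]),
            "properties": props,
        })
    return edges


vertices = table_to_vertices("team", team_rows, "team_id")
edges = join_table_to_edges("beat", beat_rows, "winner", "loser", "team")

# Every edge endpoint should resolve to an exported vertex.
dangling = [e for e in edges if e["out"] not in vertices or e["in"] not in vertices]
print(len(vertices), "vertices,", len(edges), "edges,", len(dangling), "dangling")
```

Checking for dangling endpoints before loading is a cheap way to catch foreign keys that point at rows missing from the export.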

Page 18: A Journey from Relational to Graph

Challenges with Indices

Relational DB indices are local per table, graph IDs are global

ID  Name   Teacher
1   Kyle   1
2   Stan   1
3   Kenny  1
...

ID  Teacher
1   Garrison
...

student vertex: pg_id: 1

teacher vertex: pg_id: 1

Unique key is vertex label + pg_id
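A minimal Python sketch of the "vertex label + pg_id" key during migration. The `IdMap` class is hypothetical, and the counter only stands in for whatever global ID the graph database assigns when a vertex is created.

```python
import itertools


class IdMap:
    """Maps (vertex label, relational row ID) to a global graph ID."""

    def __init__(self):
        self._next_id = itertools.count(1)  # stand-in for DB-assigned IDs
        self._ids = {}

    def graph_id(self, label, pg_id):
        """Return the graph ID for a row, allocating one on first use."""
        key = (label, pg_id)
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]


ids = IdMap()
student_1 = ids.graph_id("student", 1)
teacher_1 = ids.graph_id("teacher", 1)

# Same pg_id, different tables: the label keeps the two rows apart.
print(student_1, teacher_1, ids.graph_id("student", 1))
```

When foreign keys are exported as edges, a lookup like this is what lets a row such as (Kyle, Teacher = 1) resolve to the teacher vertex rather than to the student whose ID is also 1.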

Page 19: A Journey from Relational to Graph

Migrating a Production DB

Potentially large amounts of data - batch loading optimizations

Static

Time series

Step 1: Move static

Step 2: Reroute requests and data

Step 3: Move old TS

Page 20: A Journey from Relational to Graph

Episode III: Operating Graph

Usual benefits of NoSQL

● Designed for scalability - built in sharding, redundancy, etc.
○ Ex: Titan pluggable with Cassandra/HBase
● Usually allows on the fly schema changes
○ Flexible migrations avoid DB downtime

Underlying DB technology requires expertise, tuning, monitoring, etc.

Page 21: A Journey from Relational to Graph

Performance

If not considered early, OLTP performance can potentially be an issue

Consider Titan architecture:

Server: Titan JVM (Gremlin evaluated here) <--> Storage Backend

g.V().has('name', 'Purdue').out('beat').values('name')

1. Index retrieval: has('name', 'Purdue')
2. Edge traversal: out('beat')
3. Vertex property retrieval: values('name')

Page 22: A Journey from Relational to Graph

Dealing with Performance

● Understand storage structures
● Understand Cassandra characteristics
○ Ex: Generally deletes are bad
● Talks on Titan+Cassandra tuning:
○ Ted Wilmes - Cassandra Summit 2015:
■ Slides: http://www.slideshare.net/twilmes/modeling-the-iot-with-titandb-and-cassandra
■ Video: https://vimeopro.com/user35188327/cassandra-summit-2015/video/143695770
○ Nakul Jeirath - Graph Day TX:

http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html

Page 23: A Journey from Relational to Graph

Our Approach

Lots of real-time data, tiny bit of relatively static data

● Static: some optimization, mostly caching of static data (code optimization + caching)
● Time series: heavily optimized real-time (model changes + code optimization)

Page 24: A Journey from Relational to Graph

Maturity of Graph

● Query languages
○ SQL allows relatively easy switching between relational DB vendors
○ TinkerPop for graph, but not universally supported today
● Version upgrades
○ Currently on Titan 0.4.4
○ 0.4.4 --> 0.5.*: not storage compatible (requires ETL to upgrade)
○ 0.4.4 --> 1.*: not storage compatible, requires query code rewrite

Page 25: A Journey from Relational to Graph

Summary

● Development
○ Gremlin easier to work with than SQL (opinion)
○ Tools for SQL more mature and varied but graph is catching up
● Migration
○ Relational --> Graph generally requires ETL
● Operation
○ NoSQL benefits of distributed, scalable, schemaless DBs
○ Performance can be an issue if not considered early
○ Graph vendor/version coupling but will improve with maturity

Page 26: A Journey from Relational to Graph

Thanks For Watching

Questions

Nakul Jeirath
@njeirath
Senior Security Engineer - WellAware