Page 1: Wanderu – Lessons from Building a Travel Site with Neo4j - Eddy Wong @ GraphConnect NY 2013

Wanderu: Lessons Learned

Lessons Learned and Unlearned from Building a Travel Site with Graphs and Neo4j

Eddy Wong, CTO, Wanderu.com

@eddywongch

Page 2:

About Wanderu.com

Search Engine for (Intercity) Buses and Trains

Page 3:

Demo

Page 4:

From pt A to pt B

Nomenclature: Stations, Trips

A: NYC, B: DC

Philly

BOLT, $13, 11/07/2013

MEG, $9, 11/07/2013

MEG, $4, 11/07/2013

A Shortest Path Problem as a function of depart, arrive, price, duration, date times
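
A minimal sketch of the shortest-path framing, using price as the only weight. The assignment of each fare to a segment (BOLT on NYC-DC, MEG on NYC-PHL and PHL-DC) is an assumption about the slide's diagram, and TRIPS and cheapest_path are hypothetical names. This is not Wanderu's actual search code.

```python
import heapq

# Hypothetical trip data based on the slide's example; which fare belongs
# to which segment is an assumption.
TRIPS = {
    "NYC": [("DC", 13, "BOLT"), ("PHL", 9, "MEG")],
    "PHL": [("DC", 4, "MEG")],
    "DC": [],
}


def cheapest_path(trips, start, goal):
    """Dijkstra's algorithm with price as the edge weight."""
    queue = [(0, start, [start])]  # (total price so far, station, path taken)
    best = {}                      # cheapest known price per settled station
    while queue:
        cost, station, path = heapq.heappop(queue)
        if station == goal:
            return cost, path
        if station in best and best[station] <= cost:
            continue  # already settled at an equal or lower price
        best[station] = cost
        for nxt, price, _carrier in trips.get(station, []):
            heapq.heappush(queue, (cost + price, nxt, path + [nxt]))
    return None  # goal not reachable


cost, path = cheapest_path(TRIPS, "NYC", "DC")
print(cost, path)
```

A real search would weigh departure, arrival, duration and date as well, for example by replacing the single price with a combined score.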

Page 5:

Lessons

Learned, Unlearned

Idea

• Architectural

• Modeling

• Geo

Page 6:

Our Story

• 2-year-old startup; tech work started a little over 1 year ago

• Beta in Mar 2013, Launch in Aug 2013

• Knew nothing about Neo4j when we started (Jun 2012)

• Did not like the relational model: wanted schema-less and no self-joins

• Wanted a graph model

Page 7:

Workflow

[Diagram: Bus Websites (Non-uniform Data); Scraping; JSON (Uniform Data); Store; Server]

Page 8:

Architectural Lessons

Art: MC Escher

Page 9:

Our Situation

• Data is written only in one direction

• Users search for paths, then segments

• Searches are done by date

• Needed online capability

• Trip info (price/avail) could change on some

Page 10:

Solution

[Diagram: Bus Websites (Non-uniform Data); Scraping; JSON (Uniform Data); MongoDB; MongoConn; Neo4j; Nodes & Edges; Replica Mechanism]

Page 11:

MongoConnector

• MongoDB Labs project, open source, unsupported

• Uses Replica Mechanism: Oplog

• Eventually Consistent (not real time)

• Written in Python

• Main methods: upsert and delete; each is passed the doc

• Implement DocMgr->Neo4jDocMgr->py2neo

• We can add new properties easily on the fly
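
A rough sketch of the doc manager idea described above: the connector replays oplog changes as upserts and deletes, and a doc manager applies them to the graph. Everything here is hypothetical. InMemoryGraph stands in for the py2neo client, and the method signatures are simplified and do not match the real MongoConnector interface.

```python
class InMemoryGraph:
    """Stand-in for a graph client (the talk used py2neo); hypothetical."""

    def __init__(self):
        self.nodes = {}

    def merge_node(self, key, props):
        # Create the node if missing, otherwise update its properties.
        self.nodes.setdefault(key, {}).update(props)

    def delete_node(self, key):
        # Deleting an unknown key is a no-op.
        self.nodes.pop(key, None)


class GraphDocManager:
    """Hypothetical doc manager: applies replayed upserts and deletes."""

    def __init__(self, graph):
        self.graph = graph

    def upsert(self, doc):
        # Every field except the id becomes a node property.
        props = {k: v for k, v in doc.items() if k != "_id"}
        self.graph.merge_node(doc["_id"], props)

    def remove(self, doc):
        self.graph.delete_node(doc["_id"])


graph = InMemoryGraph()
mgr = GraphDocManager(graph)
mgr.upsert({"_id": "BOS", "name": "Boston"})
mgr.upsert({"_id": "BOS", "name": "Boston", "state": "MA"})  # new property on the fly
mgr.remove({"_id": "NYC"})  # never inserted, so nothing happens
print(graph.nodes)
```

Because upserts simply merge properties, a new field in the source document shows up on the node without any schema change.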

Page 12:

Polyglot Arch

[Diagram: Bus Websites (Non-uniform Data); Scraping; JSON; MongoDB; MongoConn; Neo4j; Nodes & Edges; Replica Mechanism; REST Server]

BOS, NYC

BOS, PHL

NYC, DC

NYC, PHL

Page 13:

Modeling Lessons

Art: MC Escher

Page 14:

Our Story

• We tried to “dump” all data into Neo4j

• Edges had dates -> too many Edges -> “Super Node Problem”

• Query performance was terrible (1+ minutes) and got worse as the number of edges increased

• Tried Gremlin -> No improvements

• Needed range queries on Edges

Page 15:

“Dehydrate”

• Don’t store everything in Neo4j, only metadata

• Use Neo4j as a “connection index”

• Don’t store entities in Nodes, only keys

• Don’t store heavy properties in Edges
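
One way to picture the points above, as a sketch only: each scraped trip is split into a light edge for the graph (station keys plus a lookup key) and a full document for the document store. The function name dehydrate, the field names, and the sample trip are all hypothetical.

```python
def dehydrate(trip):
    """Split a trip into a light graph edge and a heavy document."""
    key = f"{trip['depart']}-{trip['arrive']}"
    # The edge carries only keys: no dates, prices or other heavy properties.
    edge = {"from": trip["depart"], "to": trip["arrive"], "key": key}
    # The document keeps every field and is tagged with the same key.
    document = dict(trip, segment=key)
    return edge, document


# Hypothetical sample record.
trip = {
    "depart": "BOS",
    "arrive": "NYC",
    "carrier": "MEG",
    "price": 9,
    "date": "2013-11-07",
}
edge, document = dehydrate(trip)
print(edge)
print(document)
```

Many dated trips between the same two stations then collapse into one edge, which keeps the edge count per node small.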

Page 16:

Neo4j Model

source: Wes Freeman, Tobias Lindaaker

Page 17:

Our Solution

• Serve paths from Neo4j

• Segments from MongoDB (with date constraints)

• Back to “Joins”

• “Join” across Neo4j + MongoDB: database-generated ids do not match, e.g. Neo4j node id 1 != MongoDB ObjectId 525d9031e6c9236072114387

Page 18:

Joins across DBs

MongoDB: Stations    Neo4j: Nodes
BOS                  BOS
NYC                  NYC
DC                   DC
...                  ...

MongoDB: Trips       Neo4j: Edges
BOS-NYC              BOS-NYC
BOS-DC               BOS-DC
NYC-DC               NYC-DC
...                  ...

• Forget the sequential ids generated by the dbs

• Use a human-created “UUID” string for id

• Convert pair into id: depart-arrive

• For example: BOS-NYC
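
The cross-database “join” can be sketched as follows, assuming the graph returns paths as lists of station keys and the document store holds trips tagged with a depart-arrive id. PATHS and SEGMENTS are hypothetical in-memory stand-ins for Neo4j results and a MongoDB collection, and the prices are made up.

```python
# Hypothetical stand-ins for the two databases.
PATHS = [["BOS", "NYC", "DC"], ["BOS", "DC"]]  # as served by the graph
SEGMENTS = [                                   # as stored in the document db
    {"segment": "BOS-NYC", "date": "2013-11-07", "price": 15},
    {"segment": "NYC-DC", "date": "2013-11-07", "price": 13},
    {"segment": "NYC-DC", "date": "2013-11-08", "price": 20},
    {"segment": "BOS-DC", "date": "2013-11-07", "price": 30},
]


def segment_ids(path):
    """Turn each consecutive station pair into a depart-arrive id."""
    return [f"{a}-{b}" for a, b in zip(path, path[1:])]


def trips_for_path(path, date, segments):
    """Look up the trips for every segment of a path on the given date."""
    result = []
    for seg_id in segment_ids(path):
        matches = [
            s for s in segments
            if s["segment"] == seg_id and s["date"] == date
        ]
        result.append((seg_id, matches))
    return result


for path in PATHS:
    print(path, trips_for_path(path, "2013-11-07", SEGMENTS))
```

Since both sides use the same human-readable key, neither database needs to know the other's internal ids.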

Page 19:

Geo Lessons

Art: MC Escher

Page 20:

Hybrid Solution

• Google Autocomplete

• Google Maps

• MongoDB station geo lookup
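
To illustrate what a station geo lookup does, here is a plain-Python sketch that finds the station nearest to a point by great-circle distance. The talk did this lookup in MongoDB; this stand-in does not use MongoDB at all. The coordinates are approximate city-centre values for illustration only, and all names are hypothetical.

```python
import math

# Approximate city-centre coordinates (latitude, longitude); illustrative only.
STATIONS = {
    "BOS": (42.36, -71.06),
    "NYC": (40.71, -74.01),
    "DC": (38.91, -77.04),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def nearest_station(point, stations):
    """Return the code of the station closest to the given point."""
    return min(stations, key=lambda code: haversine_km(point, stations[code]))


# A point near Philadelphia, which is not in the sample station list.
print(nearest_station((39.95, -75.17), STATIONS))
```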

Page 21:

Lessons of Lessons

• Really understand the Neo4j Runtime Model

• Pick universal, human-generated ids

• Joining across dbs works better than in an RDBMS: tens of paths x hundreds of segments vs. 500k x 500k

• Glad to have picked Neo4j: doing content generation and more geo features now

Page 22:

Useful Links

• Neo4j Internals

slideshare.net/thobe/an-overview-of-neo4j-internals

• Aseem’s Lessons Learned with Neo4j

http://aseemk.com/talks/neo4j-lessons-learned#/14

• Wes Freeman, Neo4j Internals

http://wes.skeweredrook.com/graphdb-meetup-may-2013.pdf

• MongoConnector

blog.mongodb.org/post/29127828146/introducing-mongo-connector

