TAO: Facebook’s Distributed Data Store for the Social Graph
Presented by Zongheng Yang
CS294 Big Data
Nov 9, 2015
Graph stores in the wild
LinkedIn’s GraphDB
Key diff. from Graph Processing: user-facing!
Very active space, both in industry & academia
Huge variance in scale and approach
Problem
User-facing serving of a billion-node, trillion-edge social graph
• FB full graph in O(petabyte), not gonna fit in my laptop
Previous approach: lookaside memcache + MySQL:
1. KV pair is inefficient
2. expensive read-after-write consistency
Extremely high read load, due to freshness & privacy filtering
• sustained > one billion queries per second
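The look-aside pattern named above can be sketched roughly as follows. This is a toy illustration, not Facebook's code: `db` and `cache` are plain dicts standing in for MySQL and memcache, and the function and key names are hypothetical. It hints at why a plain key-value pair is a poor fit for edge lists: adding one edge rewrites and re-caches the whole value.

```python
# Hypothetical stand-ins: one dict for the database, one for the cache.
db = {}
cache = {}

def read(key):
    """Look-aside read: the client checks the cache, then falls back to the DB."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value  # the client, not the cache, fills the entry
    return value

def write(key, value):
    """Look-aside write: update the DB, then delete the cached copy."""
    db[key] = value
    cache.pop(key, None)  # the next read has to go back to the DB

write("edge_list:42", [1, 2])
print(read("edge_list:42"))       # cache miss, filled from db
write("edge_list:42", [1, 2, 3])  # adding one edge rewrites the whole list
print(read("edge_list:42"))       # miss again after the delete
```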
Data Model
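[Figure: example graph with nodes 0, 1, 2, 3. Edge labels (data, atype, time): (“Hi”, atype 0, time 10), (“Martin”, atype 1, time 10), (“Winter”, atype 2, time 4), (“Coming”, atype 1, time 7), (“George”, atype 0, time 99), (atype 2, time 6), (“MyFav!”, atype 0, time 2). Legend: assoc. type 0, 1, 2]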
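One way to picture this data model in code is sketched below. The class and field names are assumptions made for illustration, and the edge endpoints are invented, since the transcript keeps only the edge labels and not which nodes each edge connects.

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    """A node: an id plus arbitrary data (field names are illustrative)."""
    id: int
    data: dict = field(default_factory=dict)

@dataclass(frozen=True)
class Assoc:
    """A typed, timestamped, directed edge with optional data."""
    src: int
    atype: int
    dst: int
    time: int
    data: str = ""

# Endpoints below are invented; only (data, atype, time) come from the slide.
edges = [
    Assoc(src=0, atype=0, dst=1, time=10, data="Hi"),
    Assoc(src=0, atype=1, dst=2, time=10, data="Martin"),
    Assoc(src=2, atype=2, dst=3, time=4, data="Winter"),
]

# In this sketch an association is looked up by (src, atype, dst).
by_key = {(e.src, e.atype, e.dst): e for e in edges}
print(by_key[(0, 0, 1)].data)
```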
API
assoc_range(src, atype, off, len)
obj_get(nodeId)
assoc_get(src, atype, dstIdSet, tLow, tHigh)
assoc_count(src, atype)
assoc_time_range(src, atype, tLow, tHigh, len)
Example association types: CHECKED_IN, LIKED
Example association lists: [ (id 123, time 11/8/2015 9:30am), … ], [ (id 123, time 11/8/2015 11am), … ]
Example queries: “50 most recent check-ins to Golden Gate Bridge”, “10 most recent check-ins within last 24hr”
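The association queries listed above can be mimicked with a toy in-memory store. This is only a sketch under stated assumptions: the `AssocStore` class, the `assoc_add` helper, the newest-first ordering, and the integer timestamps are all made up for illustration, and `len` and `dstIdSet` are renamed `length` and `dst_ids` to follow Python conventions.

```python
from collections import defaultdict

class AssocStore:
    """Toy store: each (src, atype) maps to a list of (time, dst), newest first."""

    def __init__(self):
        self.lists = defaultdict(list)

    def assoc_add(self, src, atype, dst, time):
        lst = self.lists[(src, atype)]
        lst.append((time, dst))
        lst.sort(reverse=True)  # keep newest first

    def assoc_range(self, src, atype, off, length):
        return self.lists.get((src, atype), [])[off:off + length]

    def assoc_count(self, src, atype):
        return len(self.lists.get((src, atype), []))

    def assoc_time_range(self, src, atype, t_low, t_high, length):
        hits = [(t, d) for t, d in self.lists.get((src, atype), [])
                if t_low <= t <= t_high]
        return hits[:length]

    def assoc_get(self, src, atype, dst_ids, t_low, t_high):
        return [(t, d) for t, d in self.lists.get((src, atype), [])
                if d in dst_ids and t_low <= t <= t_high]

store = AssocStore()
BRIDGE = 1  # hypothetical node id for a location
for user, t in [(123, 930), (456, 1100), (789, 800)]:
    store.assoc_add(BRIDGE, "CHECKED_IN", user, t)

# "50 most recent check-ins" style query
print(store.assoc_range(BRIDGE, "CHECKED_IN", 0, 50))
# "10 most recent check-ins within a time window" style query
print(store.assoc_time_range(BRIDGE, "CHECKED_IN", 900, 1200, 10))
```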
Architecture
Adapted from Bronson et al., ATC 13
[Figure: web servers, cache, and database layers. Labels: “tier”; objects, assoc lists, counts; sharded by nodeID]
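Sharding by node ID can be pictured with a short sketch. The modulo mapping, the shard count, and the server names below are assumptions chosen for illustration; the slides do not say how a node ID is turned into a shard.

```python
NUM_SHARDS = 8  # hypothetical; the slides give no shard count

def shard_of(node_id: int) -> int:
    """Map a node ID to a shard (modulo is an assumption, not TAO's scheme)."""
    return node_id % NUM_SHARDS

def server_for(node_id: int, tier: list) -> str:
    """Within one tier, every shard is owned by exactly one server."""
    return tier[shard_of(node_id) % len(tier)]

cache_tier = ["cache-a", "cache-b", "cache-c", "cache-d"]  # made-up names

# Any query keyed by the same node ID lands on the same server of the tier.
print(shard_of(123), server_for(123, cache_tier))
```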
Challenge: read load is too high
• Add more servers to the caching layer
Challenge: graph grows larger
• Add more database shards to the storage layer
Challenge: a large tier of cache servers doesn’t scale well
• Two-layer hierarchical caching
Two-layer caching
Adapted from Bronson et al., ATC 13
[Figure: web servers, follower cache, leader cache, and database layers]
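A rough sketch of the read path through the two cache layers follows. The `Leader` and `Follower` classes and the dict-backed database are hypothetical; the point is only that several followers missing on the same key cause a single database read, because their misses are served by the leader.

```python
class Leader:
    """Leader cache: the only layer that reads the database in the normal case."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.db_reads = 0

    def get(self, key):
        if key not in self.cache:
            self.db_reads += 1
            self.cache[key] = self.db[key]
        return self.cache[key]

class Follower:
    """Follower cache: serves web servers, sends misses to the leader."""

    def __init__(self, leader):
        self.leader = leader
        self.cache = {}

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.leader.get(key)
        return self.cache[key]

db = {"obj:1": "Alice"}  # stand-in for the storage layer
leader = Leader(db)
followers = [Follower(leader) for _ in range(3)]
values = [f.get("obj:1") for f in followers]
print(values, leader.db_reads)
```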
Availability
• Key idea: a “tier” covers all ID space, can answer any query
• Follower failure: failover to another follower tier
• Leader failure: follower talks directly to database
  • 0.15% of follower cache misses
• Database failure:
  • If DB in master “region” down, promote a slave
    • 0.25% of a 90-day sample
  • If slave DB down: route to master
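The failover rules above can be written down as three small routing functions. This is one hypothetical encoding of the bullets, with invented function names and string labels; it says nothing about how TAO detects failures or performs the promotion.

```python
def pick_follower_tier(tiers):
    """tiers: list of (name, healthy) in preference order, local tier first."""
    for name, healthy in tiers:
        if healthy:
            return name
    raise RuntimeError("no healthy follower tier")

def miss_target(leader_up):
    """Where a follower sends a cache miss."""
    return "leader" if leader_up else "database"

def db_target(local_role, local_up):
    """local_role is 'master' or 'slave' for the database in this region."""
    if local_up:
        return "local " + local_role
    return "promoted slave" if local_role == "master" else "master region"

print(pick_follower_tier([("tier-1", False), ("tier-2", True)]))
print(miss_target(leader_up=False))
print(db_target("slave", local_up=False))
```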
Write Path
• On write to node:
  • leader sends invalidate message to other followers
• On write to edge:
  • leader sends refill message (why?)
• More complicated when inter-region replication is involved (see Figure)
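The difference between the two messages can be sketched as below. The classes are hypothetical and ignore regions, failures, and ordering. On the "(why?)": one reading of the TAO paper is that a follower may cache only part of an association list, so dropping the entry would discard useful data, whereas a refill lets a follower that holds the list fetch the updated version from the leader.

```python
class Follower:
    def __init__(self):
        self.objects = {}      # node id -> data
        self.assoc_lists = {}  # (src, atype) -> cached list of dst ids

    def invalidate(self, node_id):
        """Drop the cached node; the next read refetches it."""
        self.objects.pop(node_id, None)

    def refill(self, key, leader):
        """Only followers that already cached the list fetch the new version."""
        if key in self.assoc_lists:
            self.assoc_lists[key] = list(leader.assoc_lists[key])

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.objects = {}
        self.assoc_lists = {}

    def write_node(self, node_id, data, origin):
        self.objects[node_id] = data
        origin.objects[node_id] = data  # assumption: the writing follower is updated directly
        for f in self.followers:
            if f is not origin:
                f.invalidate(node_id)

    def write_edge(self, src, atype, dst):
        key = (src, atype)
        self.assoc_lists.setdefault(key, []).insert(0, dst)
        for f in self.followers:
            f.refill(key, self)

a, b = Follower(), Follower()
leader = Leader([a, b])
b.objects[7] = "old"
b.assoc_lists[(7, "LIKED")] = []
leader.write_node(7, "new", origin=a)
leader.write_edge(7, "LIKED", 99)
print(b.objects.get(7), b.assoc_lists[(7, "LIKED")])
```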
Consistency
• As a whole, TAO is eventually consistent
• Within a tier, read-after-write consistency
• Trick: route critical queries to master region for strong consistency
But, with failures, if client writes N things…
Can end up with 2^N states!
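The 2^N count follows if each of the N writes can independently succeed or fail, with nothing tying them into one atomic unit. That independence is an assumption read into the slide, and the write descriptions below are invented. A small enumeration makes the count concrete:

```python
from itertools import product

def reachable_states(writes):
    """Every subset of the writes may be the one that actually got applied."""
    return [tuple(w for w, applied in zip(writes, mask) if applied)
            for mask in product([False, True], repeat=len(writes))]

writes = ["add edge A->B", "add edge B->A", "update A"]  # hypothetical writes
states = reachable_states(writes)
print(len(states))  # 2**3 combinations
for s in states:
    print(s)
```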
Eval. Takeaway: API Frequency
Reads (99.8%)
40.9% assoc_range(src, atype, off, len)
28.9% obj_get(nodeId)
15.7% assoc_get(src, atype, dstIdSet, tLow, tHigh)
11.7% assoc_count(src, atype)
2.8% assoc_time_range(src, atype, tLow, tHigh, len)
Writes (0.2%)
52.5% assoc_add
20.7% obj_update
16.5% obj_add
8.3% assoc_del
2.0% obj_del
0.9% assoc_change_type
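To see what these percentages mean as a share of all requests, each one can be scaled by the read or write fraction. The snippet below only redoes that arithmetic on the numbers from the slide. Note that the write percentages as transcribed add up to 100.9, so the overall shares will not sum to exactly 1.

```python
READ_FRACTION, WRITE_FRACTION = 0.998, 0.002

# Percentages within reads and within writes, as listed on the slide.
reads = {"assoc_range": 40.9, "obj_get": 28.9, "assoc_get": 15.7,
         "assoc_count": 11.7, "assoc_time_range": 2.8}
writes = {"assoc_add": 52.5, "obj_update": 20.7, "obj_add": 16.5,
          "assoc_del": 8.3, "obj_del": 2.0, "assoc_change_type": 0.9}

# Share of each call among all requests.
overall = {op: READ_FRACTION * pct / 100 for op, pct in reads.items()}
overall.update({op: WRITE_FRACTION * pct / 100 for op, pct in writes.items()})

for op, share in sorted(overall.items(), key=lambda kv: -kv[1]):
    print(f"{op:20s} {share:.4%}")
```

Under this scaling, the most common write (assoc_add) is still rarer overall than the least common read (assoc_time_range).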
Eval. Takeaway: Degree
Takeaways:
• 1% supernodes
• long tail
Discussion
• TAO uses a relational storage backend, citing operational confidence
  • Is a mature, full-fledged, performant, geographically distributed native graph store possible / preferable over TAO’s architecture?
  • Is there something fundamentally difficult/different about the higher-level data model that prevents this (vs. relational)?
• Is it possible to combine batch processing with online serving in a single graph system?
• Limitation: is stronger consistency worth the tradeoff in online graph serving?