Megastore: Providing Scalable, Highly Available Storage for Interactive Services
J. Baker, C. Bond, J.C. Corbett, JJ Furman, A. Khorlin, J. Larson, J-M Léon, Y. Li, A. Lloyd, V. Yushprakh
Google Inc.
Originally presented at CIDR 2011 by James Larson
Presented by George Lee
Monday, April 4, 2011
With Great Scale Comes Great Responsibility
• A billion Internet users
• Small fraction is still huge
• Must please users
• Bad press is expensive: never lose data
• Support is expensive: minimize confusion
• No unplanned downtime
• No planned downtime
• Low latency
• Must also please developers, admins
Megastore
• Started in 2006 for app development at Google
• Service layered on:
• Bigtable (NoSQL scalable data store per datacenter)
• Chubby (config data, config locks)
• Turnkey scaling (apps, users)
• Developer-friendly features
• Wide-area synchronous replication
• Partitioned by "Entity Group"
Entity Group Examples

| Application | Entity Groups | Cross-EG Ops |
|---|---|---|
| Email | User accounts | None |
| Blogs | Users, Blogs | Access control, notifications, global indexes |
| Mapping | Local patches | Patch-spanning ops |
| Social | Users, Groups | Messages, relationships, notifications |
| Resources | Sites | Shipments |
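The table separates work that stays inside one entity group from work that crosses groups. A minimal Python sketch of that split, assuming a hypothetical key scheme in which every entity key is a tuple whose first two parts name its entity group root (for example `("user", 42)` for Email's per-user groups). A transaction may write only one group; a cross-group operation such as a Social notification is enqueued with the sender's transaction and applied later in a separate transaction on the receiving group. `Store`, `transact`, `drain_queue`, and `entity_group` are illustrative names, not Megastore's API.

```python
from collections import defaultdict, deque
from typing import Iterable


def entity_group(key: tuple) -> tuple:
    """Hypothetical key scheme: the first two parts, e.g. ("user", 42), name the root."""
    return key[:2]


class Store:
    """Toy in-memory store with one data map and one inbound queue per entity group."""

    def __init__(self) -> None:
        self.groups = defaultdict(dict)   # group root -> {key: value}
        self.queues = defaultdict(deque)  # group root -> pending cross-group messages

    def transact(self, writes: dict, outgoing: Iterable = ()) -> None:
        """Apply writes to ONE entity group and enqueue outgoing messages.

        Validation happens before any mutation, so a rejected transaction
        changes nothing.
        """
        roots = {entity_group(k) for k in writes}
        if len(roots) != 1:
            raise ValueError(f"transaction spans {len(roots)} entity groups")
        (root,) = roots
        self.groups[root].update(writes)
        for target_root, message in outgoing:
            self.queues[target_root].append(message)

    def drain_queue(self, root: tuple) -> int:
        """Apply pending messages for `root`, each in its own single-group transaction."""
        applied = 0
        while self.queues[root]:
            key_suffix, value = self.queues[root].popleft()
            self.transact({root + key_suffix: value})
            applied += 1
        return applied
```

In this sketch delivery is asynchronous: the receiving group sees the message only when its queue is drained. Two-phase commit would instead make both groups commit together, at the cost of a blocking, multi-round protocol.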
Achieving Technical Goals
• Scale
• Bigtable within datacenters
• Easy to add Entity Groups (storage, throughput)
• ACID Transactions
• Write-ahead log per Entity Group
• 2PC or Queues between Entity Groups
• Wide-Area Replication
• Paxos
• Tweaks for optimal latency
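A per-entity-group write-ahead log lends itself to optimistic concurrency: a transaction reads the last committed log position, buffers its mutations, and commits by claiming the next position; if another writer claimed that position first, the commit fails and the transaction retries. The Python sketch below assumes a single in-memory replica with no Paxos and no Bigtable, and derives the current state by replaying the log. `EntityGroupLog`, `ConflictError`, and `increment` are hypothetical names.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Another writer already committed the log position this transaction wanted."""


@dataclass
class EntityGroupLog:
    """Hypothetical write-ahead log for one entity group (single replica, in memory)."""
    entries: list = field(default_factory=list)  # entries[i] = mutations at position i

    def read(self) -> tuple:
        """Return (last committed position, state rebuilt by replaying the log)."""
        state = {}
        for mutations in self.entries:
            state.update(mutations)
        return len(self.entries) - 1, state

    def commit(self, read_position: int, mutations: dict) -> int:
        """Append at read_position + 1; fail if that position is already taken."""
        target = read_position + 1
        if target != len(self.entries):
            raise ConflictError(f"position {target} already committed")
        self.entries.append(dict(mutations))
        return target


def increment(log: EntityGroupLog, key: str, retries: int = 3) -> int:
    """Read-modify-write transaction that retries on a lost race."""
    for _ in range(retries):
        pos, state = log.read()
        new_value = state.get(key, 0) + 1
        try:
            log.commit(pos, {key: new_value})
            return new_value
        except ConflictError:
            continue  # someone else won this position; re-read and try again
    raise ConflictError("gave up after retries")
```

Serializing all writes of a group through one log is also why per-group write throughput is limited: concurrent writers to the same group compete for the same next position.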
Two Phase Commit
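• Commit-request (voting) phase
• Coordinator sends a query-to-commit message to every cohort
• Cohorts prepare the transaction and vote Yes or No
• Commit (completion) phase
• Success (all vote Yes): cohorts commit and acknowledge
• Failure (any No or timeout): cohorts roll back and acknowledge
• Disadvantage: blocking protocol; if the coordinator fails after cohorts vote Yes, they must wait for it to recover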
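The two phases can be sketched in a few lines of Python. This is a minimal in-process sketch under stated assumptions: cohorts are plain objects, and durable logging, timeouts, and the acknowledgement messages are omitted. `Cohort`, `Vote`, and `two_phase_commit` are hypothetical names.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Cohort:
    """Hypothetical participant that either can or cannot commit its local change."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> Vote:
        """Phase 1: stage the change (durable logging omitted) and vote."""
        if self.can_commit:
            self.state = "prepared"  # from here on, the cohort must wait for the decision
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(cohorts: list) -> bool:
    """Coordinator: commit only if every cohort votes YES, otherwise roll back all."""
    votes = [c.prepare() for c in cohorts]  # phase 1: voting
    decision = all(v is Vote.YES for v in votes)
    # If the coordinator crashed here, "prepared" cohorts could not decide on
    # their own; this is the blocking problem noted on the slide.
    for c in cohorts:  # phase 2: completion
        if decision:
            c.commit()
        else:
            c.rollback()
    return decision
```

A single No vote aborts the whole transaction, which is what makes 2PC atomic across participants.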
Basic Paxos
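• Prepare and Promise
• Proposer selects a proposal number N and sends Prepare(N) to the acceptors
• Each acceptor promises to ignore proposals numbered below N (reporting any value it has already accepted), or rejects N if it has already promised a higher number
• Accept! and Accepted
• Proposer sends Accept!(N, V), where V is the value of the highest-numbered proposal reported in the promises, or its own value if none was reported
• Acceptors accept unless they have since promised a higher number, and notify the proposer and learners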
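The two rounds above can be sketched as single-decree Paxos in Python. This is textbook basic Paxos, not Megastore's variant, and it assumes in-process acceptors, no message loss, and non-durable acceptor state. The key rule is that a proposer must adopt the highest-numbered value reported in the promises, which is what keeps an already chosen value from changing. `Acceptor` and `propose` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Acceptor:
    promised: int = -1                # highest proposal number promised so far
    accepted_n: int = -1              # number of the accepted proposal, if any
    accepted_v: Optional[str] = None  # value of the accepted proposal, if any

    def prepare(self, n: int) -> tuple:
        """Promise to ignore proposals below n, reporting any accepted value."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n: int, v: str) -> bool:
        """Accept unless a higher-numbered promise has been made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors: list, n: int, value: str) -> Optional[str]:
    """Run one proposer round; return the chosen value, or None on failure."""
    quorum = len(acceptors) // 2 + 1
    promises = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in promises if ok]
    if len(granted) < quorum:
        return None
    # Adopt the value of the highest-numbered proposal any promiser accepted.
    prior_n, prior_v = max(granted, key=lambda p: p[0])
    if prior_v is not None:
        value = prior_v
    accepts = sum(a.accept(n, value) for a in acceptors)
    return value if accepts >= quorum else None
```

For example, after `propose(accs, 1, "A")` succeeds on three acceptors, a later `propose(accs, 2, "B")` still returns `"A"`, while a stale `propose(accs, 1, "C")` gets no promises and returns `None`.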
Message Flow: Basic Paxos
    Client   Proposer      Acceptor     Learner
       |         |          |  |  |       |  |
       X-------->|          |  |  |       |  |  Request
       |         X--------->|->|->|       |  |  Prepare(N)
       |         |<---------X--X--X       |  |  Promise(N,{Va,Vb,Vc})
       |         X--------->|->|->|       |  |  Accept!(N,Vn)
       |         |<---------X--X--X------>|->|  Accepted(N,Vn)
       |<---------------------------------X--X  Response
       |         |          |  |  |       |  |
Paxos: Quorum-based Consensus
"While some consensus algorithms, such as Paxos, have started to find their way into [large-scale distributed storage systems built over failure-prone commodity components], their uses are limited mostly to the maintenance of the global configuration information in the system, not for the actual data replication."
-- Lamport, Malkhi, and Zhou, May 2009
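"Quorum-based" here means majority quorums: any two majorities of the same replica set share at least one replica, so a value accepted by one majority is visible to every later majority. A small Python check of that arithmetic, assuming simple majority quorums over n equal replicas; the helper names are ours.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest quorum size such that any two quorums of n replicas overlap."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Replicas that may be down while a majority quorum is still reachable."""
    return n - majority(n)


def quorums_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority quorums shares a replica."""
    q = majority(n)
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(a & b for a in quorums for b in quorums)
```

With five replicas, any three form a quorum and two may be unavailable; a fourth replica added to three does not raise the number of tolerated failures.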
Paxos and Megastore
• In practice, basic Paxos is not used
• Master-based approach?
• Megastore’s tweaks
• Coordinators
• Local reads
• Read/write from any replica
• Replicate log entries on each write
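The coordinator and local-read tweaks fit together: each replica's coordinator tracks the entity groups for which that replica has seen every committed write, reads for those groups are served locally without a wide-area round trip, and a write that misses a replica must invalidate that replica's coordinator entry before the commit is acknowledged. The Python toy below illustrates only that bookkeeping. It assumes in-memory logs, a majority check in place of real Paxos rounds, and a simple "copy the longest log" catch-up; `Replica` and `write` are hypothetical names.

```python
class Replica:
    """Hypothetical replica: per-group logs plus a coordinator-style set of
    groups for which this replica has observed every committed write."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.logs = {}           # group -> list of committed entries
        self.up_to_date = set()  # the coordinator's state

    def catch_up(self, group: str, peers: list) -> None:
        """Copy the longest log any peer holds for `group`, then mark it current."""
        longest = max((p.logs.get(group, []) for p in peers), key=len, default=[])
        if len(longest) > len(self.logs.get(group, [])):
            self.logs[group] = list(longest)
        self.up_to_date.add(group)

    def local_read(self, group: str, peers: list) -> list:
        """Serve locally if the coordinator says current; otherwise catch up first."""
        if group not in self.up_to_date:
            self.catch_up(group, peers)
        return list(self.logs.get(group, []))


def write(group: str, entry: str, replicas: list, reachable: set) -> bool:
    """Commit `entry` if a majority is reachable; invalidate replicas that missed it."""
    acked = [r for r in replicas if r.name in reachable]
    if len(acked) < len(replicas) // 2 + 1:
        return False  # no majority: not committed
    # Quorum intersection: some acked replica saw the previous commit.
    base = max((r.logs.get(group, []) for r in acked), key=len)
    for r in acked:
        r.logs[group] = list(base) + [entry]
        r.up_to_date.add(group)
    for r in replicas:
        if r not in acked:
            r.up_to_date.discard(group)  # must happen before the commit is reported
    return True
```

A replica that missed a write keeps its stale log, but because its coordinator entry was removed, its next read for that group catches up instead of returning old data.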
Omissions
• These were noted in the talk:
• No current query language
• Apps must implement query plans
• Apps have fine-grained control of physical placement
• Limited per-Entity Group update rate
Is Everybody Happy?
• Admins
• linear scaling, transparent rebalancing (Bigtable)
• instant transparent failover
• symmetric deployment
• Developers
• ACID transactions (read-modify-write)
• many features (indexes, backup, encryption, scaling)
• single-system image makes code simple
• little need to handle failures
• End Users
• fast up-to-date reads, acceptable write latency
• consistency
Take-Aways
• Constraints acceptable to most apps
• Entity Group partitioning
• High write latency
• Limited per-EG throughput
• In production use for over 4 years
• Available on Google App Engine as HRD (High Replication Datastore)
Questions?
• Are the rate limits on writes insignificant?
• Why not lots of RDBMS?
• Why not NoSQL with “mini-databases”?