PNUTS: Yahoo!’s Hosted Data Serving Platform
Presented by Mina Farid, University of Waterloo, CS 848 Presentation, 8 February 2010.
Paper by Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni (Yahoo! Research).

Outline
- Motivation
- Data and Query Model
- Consistency
- System Architecture
- Applications
- Experiments

Motivation
- Scalability
- Response time (SLAs)
- High availability and fault tolerance
- Relaxed consistency guarantees
  - Serializable transactions
  - Eventual consistency: update any replica; all updates are propagated to all replicas, but potentially in different orders

Data and Query Model
- Simplified relational data model (tables, records, attributes)
- Flexible schemas
- Query: selection and projection from a single table
  - Specific applications
  - Scans a few records
  - No ad-hoc queries
- Support for hashed and ordered tables
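
As a rough illustration of this query model, the sketch below keeps records as Python dictionaries, so each record may carry different attributes, and runs a selection followed by a projection over one table. The `Table` class and its `put`/`select` methods are hypothetical names chosen for this example; they are not the PNUTS API.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Record = Dict[str, Any]


class Table:
    """A single table keyed by primary key; records may have different attributes."""

    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}

    def put(self, key: str, record: Record) -> None:
        self._rows[key] = dict(record)

    def select(
        self,
        predicate: Callable[[Record], bool],
        columns: Optional[Iterable[str]] = None,
    ) -> List[Record]:
        """Selection (predicate) followed by projection (columns) on this one table."""
        cols = list(columns) if columns is not None else None
        result: List[Record] = []
        for key in sorted(self._rows):
            row = self._rows[key]
            if not predicate(row):
                continue
            if cols is None:
                result.append(dict(row))
            else:
                # Attributes a record does not have are simply left out.
                result.append({c: row[c] for c in cols if c in row})
        return result


users = Table()
users.put("alice", {"name": "Alice", "city": "Waterloo"})
users.put("bob", {"name": "Bob", "city": "Toronto", "age": 30})  # extra attribute
names_in_waterloo = users.select(lambda r: r.get("city") == "Waterloo", ["name"])
```

There is deliberately no join or aggregation here, matching the single-table restriction on the slide.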

Consistency
- PNUTS sits in between general serializability and eventual consistency
- One-record updates
- Per-record timeline consistency: all replicas of a record apply updates in the same order
- For a given version, all replicas contain the same information
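
One way to picture per-record timeline consistency is a replica that applies updates strictly in version order, whatever order they arrive in. The sketch below is a toy model under that assumption: `RecordReplica` and the buffering of early arrivals are illustrative choices for this example, not a description of how PNUTS delivers updates.

```python
from typing import Any, Dict, Optional


class RecordReplica:
    """One replica of a single record; applies updates strictly in version order."""

    def __init__(self) -> None:
        self.version = 0  # version of the last applied update
        self.value: Optional[Any] = None
        self._pending: Dict[int, Any] = {}  # updates that arrived early

    def receive(self, version: int, value: Any) -> None:
        if version <= self.version:
            return  # already applied; ignore duplicates
        self._pending[version] = value
        # Apply every update whose predecessor has already been applied.
        while self.version + 1 in self._pending:
            self.version += 1
            self.value = self._pending.pop(self.version)


updates = [(1, "a"), (2, "b"), (3, "c")]
east, west = RecordReplica(), RecordReplica()
for v, val in updates:            # arrives in order
    east.receive(v, val)
for v, val in reversed(updates):  # arrives out of order
    west.receive(v, val)
```

Both replicas move through the same sequence of versions, so two replicas at the same version always hold the same value.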

Consistency (cont’d)
- Master replica for each record
- Updates are forwarded to this master replica
- Master record carries the version info
- API calls (consistency):
  - Read-any
  - Read-critical(required_version)
  - Read-latest
  - Write
  - Test-and-set-write(required_version)
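
The five calls can be sketched against a toy model with one master copy and one possibly stale replica of a single record. Everything here is an assumption made for illustration: the `RecordStore` class, the Python method names, and the `propagate` helper that stands in for asynchronous replication are not the real PNUTS interface.

```python
from typing import Any, Tuple

Versioned = Tuple[int, Any]  # (version, value)


class VersionMismatch(Exception):
    """Raised when test-and-set-write finds a different current version."""


class RecordStore:
    """Toy model of one record with a master copy and one possibly stale replica."""

    def __init__(self, value: Any) -> None:
        self.master: Versioned = (1, value)
        self.replica: Versioned = (1, value)

    def propagate(self) -> None:
        """Stand-in for asynchronous replication catching up."""
        self.replica = self.master

    def read_any(self) -> Versioned:
        return self.replica  # may be stale

    def read_critical(self, required_version: int) -> Versioned:
        # Serve locally if the replica is new enough, otherwise ask the master.
        if self.replica[0] >= required_version:
            return self.replica
        return self.master

    def read_latest(self) -> Versioned:
        return self.master

    def write(self, value: Any) -> int:
        self.master = (self.master[0] + 1, value)
        return self.master[0]

    def test_and_set_write(self, required_version: int, value: Any) -> int:
        # Write only if the record is still at required_version.
        if self.master[0] != required_version:
            raise VersionMismatch(f"record is at version {self.master[0]}")
        return self.write(value)


store = RecordStore("v1")
new_version = store.write("v2")  # the replica has not caught up yet
```

A client that has just written can pass `new_version` to `read_critical` to be sure it sees its own write, while `read_any` may still return the older value until replication catches up.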

System Architecture
[Figure: one region containing a tablet controller, routers, a message broker, and storage units 1 to N. Tablet-to-storage-unit mapping shown: T1 on SU1, T2 on SU2, T3 on SU3, T4 on SU1.]

System Architecture – Data Storage and Retrieval
- Each region contains a full complement of system components and data
- Tables are partitioned into tablets
  - A tablet is just a group of records of a certain table
- Tablets are stored on storage unit servers
- Storage units respond to:
  - get()
  - scan()
  - set()
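
A storage unit answering these three requests can be sketched as a dictionary of tablets. The slide names only the operations, so the argument lists below (a tablet name, a key, and a half-open key range for `scan`) are assumptions made for this example.

```python
from typing import Any, Dict, List, Optional, Tuple


class StorageUnit:
    """Holds tablets; each tablet is a dict of key -> record for one table."""

    def __init__(self) -> None:
        self._tablets: Dict[str, Dict[str, Any]] = {}

    def set(self, tablet: str, key: str, record: Any) -> None:
        self._tablets.setdefault(tablet, {})[key] = record

    def get(self, tablet: str, key: str) -> Optional[Any]:
        return self._tablets.get(tablet, {}).get(key)

    def scan(self, tablet: str, start: str, end: str) -> List[Tuple[str, Any]]:
        """Return records with start <= key < end, in key order."""
        rows = self._tablets.get(tablet, {})
        return [(k, rows[k]) for k in sorted(rows) if start <= k < end]


su1 = StorageUnit()
su1.set("T1", "apple", {"price": 1})
su1.set("T1", "avocado", {"price": 3})
```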

Routers’ Mapping – Ordered Table
- Routers decide:
  - Which tablets contain which records
  - Which SU holds which tablets
- Interval mapping (key range to tablet):
  - MIN_STRING to Banana: Tablet 1 (T1)
  - Banana to Grape: Tablet 2 (T2)
  - Grape to Lemon: Tablet 3 (T3)
  - Lemon to MAX_STRING: Tablet 4 (T4)
- Tablet to storage unit: T1 on SU1, T2 on SU2, T3 on SU3, T4 on SU1
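
With the boundaries kept in sorted order, finding the tablet for a key is a binary search. The sketch below uses the mapping from the slide; treating each interval as half-open (a boundary key such as "Banana" belongs to the tablet that starts there) and using the empty string for MIN_STRING are assumptions of this example, as is the `route` function name.

```python
import bisect
from typing import Tuple

# Lower bound of each tablet's key interval, in sorted order.
# The empty string plays the role of MIN_STRING; the last tablet
# has no explicit upper bound, which stands in for MAX_STRING.
BOUNDARIES = ["", "Banana", "Grape", "Lemon"]
TABLETS = ["T1", "T2", "T3", "T4"]
TABLET_TO_SU = {"T1": "SU1", "T2": "SU2", "T3": "SU3", "T4": "SU1"}


def route(key: str) -> Tuple[str, str]:
    """Return (tablet, storage_unit) for a primary key via binary search."""
    # bisect_right finds the first boundary greater than key; the tablet
    # whose interval contains key is the one just before it.
    i = bisect.bisect_right(BOUNDARIES, key) - 1
    tablet = TABLETS[i]
    return tablet, TABLET_TO_SU[tablet]
```

For example, `route("Kiwi")` falls between "Grape" and "Lemon" and so resolves to tablet T3 on SU3.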

System Architecture
[Figure: the single-region diagram again (tablet controller, routers, message broker, storage units 1 to N), now annotated with two copies of the interval mapping (MIN: T1, Banana: T2, Grape: T3, Lemon: T4) and of the tablet-to-storage-unit mapping (T1 on SU1, T2 on SU2, T3 on SU3, T4 on SU1).]

System Architecture
[Figure: two regions, Region 1 and Region 2, each with its own tablet controller, routers, message broker, and storage units. The interval mapping (MIN: T1, Banana: T2, Grape: T3, Lemon: T4) and the tablet-to-storage-unit mapping (T1 on SU1, T2 on SU2, T3 on SU3, T4 on SU1) are repeated in each region.]

System Architecture – Replication and Consistency
1. Yahoo! Message Broker (YMB)
- Reliable topic-based publish/subscribe
- Updates are asynchronously propagated to all replicas
- Provides ‘partial ordering’:
  - Messages published to a particular YMB cluster will be delivered to all subscribers in the same order
  - Messages published to different YMB clusters may be delivered in any order
- Solution: per-record mastership
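
The ordering guarantee within one broker can be illustrated with a very small publish/subscribe model. This `Broker` class is hypothetical and delivers synchronously for simplicity, whereas the slide describes asynchronous propagation; it only shows that every subscriber of one broker sees the same sequence.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class Broker:
    """Toy topic-based pub/sub broker that delivers in publish order."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        # Every subscriber of this broker sees messages in the same order.
        for callback in self._subscribers[topic]:
            callback(message)


broker = Broker()
replica_a: List[str] = []
replica_b: List[str] = []
broker.subscribe("tablet-1", replica_a.append)
broker.subscribe("tablet-1", replica_b.append)
for update in ["set x=1", "set x=2", "set x=3"]:
    broker.publish("tablet-1", update)
```

Nothing in this model orders messages across two separate `Broker` instances, which is the gap that per-record mastership is meant to close.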

System Architecture – Replication and Consistency
2. Consistency and Record Mastership
- One copy of a record is designated as the master
- Updates are forwarded to that master copy
  - Publish update (commit)
- Different records in the same table can be mastered in different clusters
- Which copy is the master, and how is it selected?
  - Each record carries metadata identifying its master copy (changeable)
  - The copy receiving the most updates becomes the master
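
The forwarding rule can be sketched with two regions that each hold a copy of every record together with the name of its master region. The `Region` and `Cluster` classes are hypothetical, and the update is copied to all regions synchronously here, which simplifies the asynchronous publish step named on the slide.

```python
from typing import Dict, List, Tuple


class Region:
    def __init__(self, name: str) -> None:
        self.name = name
        # key -> (value, version, master_region_name)
        self.records: Dict[str, Tuple[str, int, str]] = {}


class Cluster:
    """Toy multi-region deployment with per-record mastership."""

    def __init__(self, region_names: List[str]) -> None:
        self.regions = {n: Region(n) for n in region_names}

    def insert(self, key: str, value: str, master: str) -> None:
        for region in self.regions.values():
            region.records[key] = (value, 1, master)

    def update(self, via: str, key: str, value: str) -> str:
        """Apply an update received in region `via`; return the region that committed it."""
        # The receiving region reads the master identity from the record metadata.
        _, _, master = self.regions[via].records[key]
        # The update is forwarded to the master copy, which assigns the next version.
        _, version, _ = self.regions[master].records[key]
        committed = (value, version + 1, master)
        # Commit by publishing; in this sketch every copy is updated in turn.
        for region in self.regions.values():
            region.records[key] = committed
        return master


cluster = Cluster(["west", "east"])
cluster.insert("alice", "online", master="west")
cluster.insert("bob", "away", master="east")
committed_in = cluster.update(via="east", key="alice", value="offline")
```

Because only the master copy assigns versions, all updates to one record pass through a single point and so have a single order, even though different records in the same table are mastered in different regions.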

Query Processing
- Multi-record querying
- Scatter-gather engine (router)
  - Splits a multi-record request into multiple single-record requests
  - Initiates the queries in parallel
  - Assembles and evaluates the results, and sends them back to the client
  - Handles range and scan queries (also supports top-k)
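
The scatter-gather steps can be sketched with a thread pool: one single-record request per key, run in parallel, then assembled into one response. The storage contents, the key-to-storage-unit map, and the function names below are made up for illustration, and the `top_k` helper is only one possible reading of "evaluate results".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

# Hypothetical storage units, each holding some of the records.
STORAGE_UNITS: Dict[str, Dict[str, int]] = {
    "SU1": {"apple": 3, "mango": 7},
    "SU2": {"cherry": 5},
    "SU3": {"grape": 9},
}
KEY_TO_SU = {"apple": "SU1", "mango": "SU1", "cherry": "SU2", "grape": "SU3"}


def get_one(key: str) -> Optional[int]:
    """A single-record request against the storage unit that owns the key."""
    su = KEY_TO_SU.get(key)
    return STORAGE_UNITS[su].get(key) if su is not None else None


def multi_get(keys: List[str]) -> Dict[str, Optional[int]]:
    """Scatter one request per key, run them in parallel, gather the results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        values = list(pool.map(get_one, keys))  # map preserves input order
    return dict(zip(keys, values))


def top_k(keys: List[str], k: int) -> List[str]:
    """Evaluate gathered results: the k keys with the largest values."""
    found = {key: v for key, v in multi_get(keys).items() if v is not None}
    return sorted(found, key=lambda key: found[key], reverse=True)[:k]
```

The client issues a single call such as `multi_get(["apple", "grape"])` and never sees the individual per-record requests.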

Applications
- User databases: millions of records, frequent updates, important data, relaxed consistency
- Social applications: flexible schemas, large number of small updates, no real-time requirements (relaxed consistency)
- Content metadata: manage structured metadata; scalable, consistent
- Session data: scalable storage to manage state, but low consistency required

Experiments
- Main criterion: average request latency (response time)
- Experiment setup: 3 regions (2 West, 1 East)
- Experiments:
  1. Inserting data
  2. Varying load
  3. Varying number of storage units

Future Enhancements
Planned features include:
- Indexing, materialized views
- Bundled updates (atomic, non-isolated updates for multiple records)
Conclusion

Thank You! Questions?

Google BigTable
- Record-oriented access to very large tables
- Does not support:
  - Geographic replication
  - Secondary indexes
  - Materialized views
  - Hash-organized tables

Dynamo
- Focuses on availability
- Provides geographic replication via a ‘gossip’ mechanism
- Eventual consistency model does not suit all applications
  - “Updates are committed in different orders at different replicas”, then replicas are eventually reconciled (updates may roll back)
- Does not support:
  - Ordered tables

Boxwood
- Provides a B-tree implementation
- The design favors consistency over scalability (tens of machines)