![Page 1: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/1.jpg)
A Distributed Storage System for Structured Data
Bigtable
Presenters: Yunming Zhang, Conglong Li
Saturday, September 21, 13
![Page 2: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/2.jpg)
References
- SOCC 2010 keynote slides, Jeff Dean, Google
- Introduction to Distributed Computing, Winter 2008, University of Washington
![Page 3: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/3.jpg)
Motivation
- Lots of (semi-)structured data at Google
  - URLs: contents, crawl metadata, links
  - Per-user data: user preference settings, search results
- Scale is large
  - Billions of URLs, hundreds of millions of users
- Existing commercial databases don't meet the requirements
![Page 4: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/4.jpg)
Goals
- Store and manage all the state reliably and efficiently
- Allow asynchronous processes to update different pieces of data continuously
- Very high read/write rates
- Efficient scans over all or interesting subsets of data
- Often want to examine data changes over time
![Page 5: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/5.jpg)
BigTable vs. GFS
- GFS provides raw data storage
- We need:
  - More sophisticated storage: key-value mapping
  - Flexible enough to be useful: store semi-structured data
  - Reliable, scalable, etc.
![Page 6: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/6.jpg)
BigTable
Bigtable is a distributed storage system for managing large-scale structured data
- Wide applicability
- Scalability
- High performance
- High availability
![Page 7: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/7.jpg)
Overview
- Data Model
- API
- Implementation Structures
- Optimizations
- Performance Evaluation
- Applications
- Conclusions
![Page 8: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/8.jpg)
Data Model
- Sparse
- Sorted
- Multidimensional
![Page 9: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/9.jpg)
Cell
- Contains multiple versions of the data
- A value is located by row key, column key, and timestamp
- Treats data as an uninterpreted array of bytes, which lets clients serialize various forms of structured and semi-structured data
- Supports automatic garbage collection per column family for managing versioned data
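The cell model above can be sketched as a small in-memory map. This is only an illustration, not Bigtable's implementation: the class name `VersionedCells`, the `max_versions` setting, and the example row and column keys are all assumptions made for the sketch.

```python
from collections import defaultdict


class VersionedCells:
    """Toy map: (row key, column key, timestamp) -> uninterpreted bytes."""

    def __init__(self, max_versions=3):
        # Hypothetical setting: one global limit here, whereas the slide
        # describes garbage collection as configured per column family.
        self.max_versions = max_versions
        self.cells = defaultdict(dict)  # (row, column) -> {timestamp: bytes}

    def put(self, row, column, timestamp, value: bytes):
        versions = self.cells[(row, column)]
        versions[timestamp] = value
        # Garbage-collect: keep only the newest max_versions timestamps.
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, row, column, timestamp=None):
        """Newest value at or before timestamp (newest overall if None)."""
        versions = self.cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None


store = VersionedCells(max_versions=2)
for ts, body in [(1, b"v1"), (2, b"v2"), (3, b"v3")]:
    store.put("com.example.www", "contents:", ts, body)
```

With `max_versions=2`, the write at timestamp 3 evicts the version written at timestamp 1, so a read "as of" timestamp 1 finds nothing.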
![Page 10: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/10.jpg)
Goals
- Store and manage all the state reliably and efficiently
- Allow asynchronous processes to update different pieces of data continuously
- Very high read/write rates
- Efficient scans over all or interesting subsets of data
- Often want to examine data changes over time
![Page 11: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/11.jpg)
Row
- Row key is an arbitrary string
  - Access to column data in a row is atomic
  - Row creation is implicit upon storing data
- Rows ordered lexicographically
  - Rows close together lexicographically usually reside on one or a small number of machines
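A minimal sketch of why lexicographic ordering matters: related rows become one contiguous range that can be read with a single scan. The row keys below are hypothetical, and reversing hostname components is just one way a client might choose keys for locality.

```python
import bisect

# Hypothetical row keys, kept in lexicographic order as a table would keep them.
row_keys = sorted([
    "com.example.mail/inbox",
    "com.example.www/index.html",
    "org.sample.www/home",
    "com.example.www/about.html",
    "net.demo.www/",
])


def scan_prefix(keys, prefix):
    """Return every key that starts with prefix from a sorted list of keys."""
    start = bisect.bisect_left(keys, prefix)  # first key >= prefix
    end = start
    while end < len(keys) and keys[end].startswith(prefix):
        end += 1
    return keys[start:end]


example_rows = scan_prefix(row_keys, "com.example.")
```

All three `com.example.` rows come back as one adjacent slice, which is the property that lets nearby rows live on one or a few machines.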
![Page 12: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/12.jpg)
Columns
- Columns are grouped into column families
  - Column key syntax: family:optional_qualifier
- Column family
  - Has associated type information
  - Data in a column family is usually of the same type
![Page 13: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/13.jpg)
Overview
- Data Model
- API
- Implementation Structures
- Optimizations
- Performance Evaluation
- Applications
- Conclusions
![Page 14: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/14.jpg)
API
- Metadata operations
  - Create/delete tables and column families, change metadata, modify access control lists
- Writes (atomic)
  - Set(), DeleteCells(), DeleteRow()
- Reads
  - Scanner: read arbitrary cells in a Bigtable
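The write and read operations listed above can be imitated with a toy in-memory table. This is a sketch only: `ToyTable` and its method signatures are invented for illustration and are not Bigtable's client API; a single lock stands in for per-row atomicity.

```python
import threading


class ToyTable:
    """In-memory stand-in; method names only loosely mirror the slide."""

    def __init__(self):
        self._rows = {}                # row key -> {column key: value}
        self._lock = threading.Lock()  # one lock for simplicity

    def apply(self, row, sets=None, delete_cells=()):
        """Apply all sets and cell deletions for one row atomically."""
        with self._lock:
            cells = self._rows.setdefault(row, {})  # row creation is implicit
            for column in delete_cells:
                cells.pop(column, None)
            cells.update(sets or {})
            if not cells:
                del self._rows[row]

    def delete_row(self, row):
        with self._lock:
            self._rows.pop(row, None)

    def scan(self, start_row="", end_row=None, family=None):
        """Yield (row, column, value) in key order for rows in [start_row, end_row)."""
        with self._lock:
            snapshot = {r: dict(c) for r, c in self._rows.items()}
        for row in sorted(snapshot):
            if row < start_row or (end_row is not None and row >= end_row):
                continue
            for column in sorted(snapshot[row]):
                if family is None or column.split(":", 1)[0] == family:
                    yield row, column, snapshot[row][column]


table = ToyTable()
table.apply("com.example.www", sets={"anchor:a.org": "Example", "contents:": "<html>"})
table.apply("com.example.www", sets={"anchor:b.org": "Ex"}, delete_cells=["anchor:a.org"])
table.apply("org.sample.www", sets={"contents:": "<p>"})
anchors = list(table.scan(family="anchor"))
```

The second `apply` call deletes one anchor and adds another in a single step, so a scanner never observes the row with both or neither.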
![Page 15: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/15.jpg)
Overview
- Data Model
- API
- Implementation Structures
- Optimizations
- Performance Evaluation
- Applications
- Conclusions
![Page 16: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/16.jpg)
Tablets
- Large tables broken into tablets at row boundaries
  - Tablet holds a contiguous range of rows
  - Clients can often choose row keys for locality
  - Aim for ~100 MB to 200 MB of data per tablet
- Serving machine responsible for ~100 tablets
  - Fast recovery: 100 machines each pick up 1 tablet from the failed machine
  - Fine-grained load balancing: migrate tablets away from overloaded machines
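The fast-recovery point can be illustrated with a small reassignment routine. This is a sketch under assumptions: the server and tablet names are hypothetical, and "give each tablet to the currently least-loaded live server" is one simple policy, not necessarily the one Bigtable's master uses.

```python
def reassign_tablets(assignment, failed, live_servers):
    """Spread a failed server's tablets across live servers, least-loaded first.

    assignment maps server name -> list of tablet ids.
    """
    new_assignment = {s: list(t) for s, t in assignment.items() if s != failed}
    for server in live_servers:
        new_assignment.setdefault(server, [])
    for tablet in assignment.get(failed, []):
        target = min(live_servers, key=lambda s: len(new_assignment[s]))
        new_assignment[target].append(tablet)
    return new_assignment


# 101 servers with 100 tablets each; ts0 fails.
servers = [f"ts{i}" for i in range(101)]
before = {s: [f"{s}-tablet{j}" for j in range(100)] for s in servers}
after = reassign_tablets(before, failed="ts0", live_servers=servers[1:])
```

Because the failed machine held about as many tablets as there are surviving machines, each survivor picks up a single tablet and recovery work is spread evenly.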
![Page 17: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/17.jpg)
Tablets and Splitting
![Page 18: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/18.jpg)
System Structure
- Master
  - Metadata operations
  - Load balancing
  - Keep track of live tablet servers
  - Master failure
- Tablet server
  - Accept reads and writes to data
![Page 19: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/19.jpg)
System Structure
![Page 20: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/20.jpg)
System Structure
read/write
![Page 21: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/21.jpg)
System Structure
Metadata operations
![Page 22: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/22.jpg)
Locating Tablets
- 3-level hierarchical lookup scheme for tablets
- A tablet's location is the IP address and port of its server, stored in the META tables
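A hedged sketch of the hierarchical lookup: a root index points to META tablets, and each META tablet maps an end row key to the `ip:port` of the server holding that user tablet. The layout, addresses, and the `"\xff"` end-of-table sentinel are assumptions for illustration; row keys are assumed to sort below the sentinel.

```python
import bisect


def find(entries, row_key):
    """entries: sorted list of (end_row, target); pick the first end_row >= row_key."""
    end_rows = [end for end, _ in entries]
    return entries[bisect.bisect_left(end_rows, row_key)][1]


# Hypothetical META tablets: end row of each user tablet -> server location.
meta_tablet_a = [("g", "10.0.0.1:6000"), ("m", "10.0.0.2:6000")]
meta_tablet_b = [("t", "10.0.0.3:6000"), ("\xff", "10.0.0.4:6000")]

# Hypothetical root tablet: end row covered by each META tablet -> that META tablet.
root_tablet = [("m", meta_tablet_a), ("\xff", meta_tablet_b)]


def locate(row_key):
    meta_tablet = find(root_tablet, row_key)  # level 1: root tablet
    return find(meta_tablet, row_key)         # level 2: META tablet gives the
                                              # level 3 user tablet's location
```

For example, `locate("pear")` first selects the second META tablet (since `"pear"` sorts after `"m"`) and then the user tablet ending at `"t"`.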
![Page 23: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/23.jpg)
Tablet Representation and Serving
- Append-only tablet log
- SSTable on GFS
  - A sorted map of string to string
  - If you want to find a row's data, all the data are contiguous
- Memtable write buffer
  - When a read comes in, you have to merge SSTable data with the recent values still held in the memtable
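The read path described above can be sketched as a lookup that consults the memtable first and then the SSTables from newest to oldest. Plain dicts stand in for sorted structures here, and the `TOMBSTONE` deletion marker is an assumption of this sketch.

```python
TOMBSTONE = object()  # marks a deleted key in this sketch


def read(key, memtable, sstables):
    """Return the newest value for key; sstables must be ordered newest first."""
    for layer in [memtable, *sstables]:
        if key in layer:
            value = layer[key]
            return None if value is TOMBSTONE else value
    return None


# The memtable holds the most recent writes, including a deletion of row3.
memtable = {"row2": "new", "row3": TOMBSTONE}
sstables = [{"row1": "b", "row2": "old"}, {"row1": "a", "row3": "c"}]
```

A newer layer always shadows an older one, so `row2` resolves to the memtable value and `row3` reads as deleted even though an old SSTable still contains it.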
![Page 24: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/24.jpg)
Tablet Representation and Serving
![Page 25: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/25.jpg)
Tablet Representation and Serving
![Page 26: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/26.jpg)
Compaction
- Tablet state represented as a set of immutable compacted SSTable files, plus the tail of the log
- Minor compaction
  - When the in-memory buffer fills up, freeze it and create a new SSTable
- Major compaction
  - Periodically compact all SSTables for a tablet into a new base SSTable on GFS
  - Storage reclaimed from deletions at this point
  - Produce new tables
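A minimal sketch of the two compaction kinds, assuming a tiny entry-count threshold in place of a real memory limit. `ToyTablet`, `memtable_limit`, and the use of `None` as a deletion marker are all hypothetical choices for the example.

```python
TOMBSTONE = None  # deletion marker in this sketch


class ToyTablet:
    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical threshold, in entries
        self.memtable = {}
        self.sstables = []  # newest first; each is an immutable sorted tuple

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def delete(self, key):
        self.write(key, TOMBSTONE)

    def minor_compaction(self):
        """Freeze the full memtable into a new SSTable and start an empty one."""
        self.sstables.insert(0, tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def major_compaction(self):
        """Merge all SSTables into one base SSTable, dropping deleted entries."""
        merged = {}
        for sstable in reversed(self.sstables):  # oldest first, newer overwrite
            merged.update(sstable)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [tuple(sorted(live.items()))]


tablet = ToyTablet()
tablet.write("a", "1")
tablet.write("b", "2")   # memtable full: first minor compaction
tablet.write("a", "3")
tablet.delete("b")       # memtable full: second minor compaction
tablet.major_compaction()
```

Only the major compaction physically removes `b`; until then the deletion exists as a marker in a newer SSTable that shadows the older value.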
![Page 27: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/27.jpg)
Overview
- Data Model
- API
- Implementation Structures
- Optimizations
- Performance Evaluation
- Applications
- Conclusions
![Page 28: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/28.jpg)
Goals
- Reliable system for storing and managing all the state
- Allow asynchronous processes to update different pieces of data continuously
- Very high read/write rates
- Efficient scans over all or interesting subsets of data
- Often want to examine data changes over time
![Page 29: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/29.jpg)
Locality Groups
- Clients can group multiple column families together into a locality group
- A separate SSTable is generated for each locality group
  - Enables more efficient reads
  - Can be declared to be in-memory
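A small sketch of how cells might be partitioned by locality group so that each group can be written to its own SSTable. The schema mapping and the family names (`contents`, `language`, `checksum`) are hypothetical examples.

```python
# Hypothetical schema: column family -> locality group.
LOCALITY_GROUPS = {
    "contents": "page_data",
    "language": "metadata",
    "checksum": "metadata",
}


def split_by_locality_group(cells):
    """cells: {(row, 'family:qualifier'): value} -> {group: sorted list of cells}."""
    groups = {}
    for (row, column), value in cells.items():
        family = column.split(":", 1)[0]
        groups.setdefault(LOCALITY_GROUPS[family], []).append((row, column, value))
    return {group: sorted(entries) for group, entries in groups.items()}


cells = {
    ("r1", "contents:"): "<html>...",
    ("r1", "language:"): "en",
    ("r2", "checksum:"): "9f2c",
    ("r2", "contents:"): "<p>...",
}
per_group = split_by_locality_group(cells)
```

A reader that only needs the small `metadata` group never has to touch the bulky page contents, which is the efficiency gain the slide refers to.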
![Page 30: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/30.jpg)
Compression
- Many opportunities for compression
  - Similar values in columns and cells
- Within each SSTable for a locality group, encode compressed blocks
  - Keep blocks small for random access
  - Exploit the fact that many values are very similar
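Block-wise compression can be sketched with Python's standard `zlib` module. Note that `zlib` is only a stand-in for whatever codec Bigtable uses, and `BLOCK_SIZE`, the tab-separated payload format, and the sample rows are assumptions of this example.

```python
import zlib

BLOCK_SIZE = 4  # entries per block; an arbitrary small value for the sketch


def build_blocks(sorted_items):
    """Compress a sorted list of (key, value) strings block by block."""
    blocks = []
    for i in range(0, len(sorted_items), BLOCK_SIZE):
        chunk = sorted_items[i:i + BLOCK_SIZE]
        payload = "\n".join(f"{k}\t{v}" for k, v in chunk).encode()
        blocks.append((chunk[0][0], zlib.compress(payload)))  # (first key, bytes)
    return blocks


def lookup(blocks, key):
    """Decompress only the one block whose key range can contain key."""
    candidates = [data for first_key, data in blocks if first_key <= key]
    if not candidates:
        return None
    for line in zlib.decompress(candidates[-1]).decode().split("\n"):
        k, v = line.split("\t")
        if k == key:
            return v
    return None


items = [(f"row{i:03d}", "<html><body>boilerplate</body></html>") for i in range(10)]
blocks = build_blocks(items)
```

Small blocks mean a random read decompresses only a few entries, while similar neighbouring values within a block still compress well together.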
![Page 31: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/31.jpg)
Goals
- Reliable system for storing and managing all the state
- Allow asynchronous processes to update different pieces of data continuously
- Very high read/write rates
- Efficient scans over all or interesting subsets of data
- Often want to examine data changes over time
![Page 32: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/32.jpg)
Commit log and recovery
- Single commit log file per tablet server
  - Reduces the number of concurrent file writes to GFS
- Tablet recovery
  - Redo points in the log
  - Perform the same set of operations from the last persistent state
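Recovery from a shared commit log can be sketched as a filtered replay. The log format (sequence number, tablet id, key, value), the tablet names, and the way the redo point is expressed are assumptions made for this illustration.

```python
# One shared log for every tablet on a (hypothetical) tablet server.
# Entries are (sequence number, tablet id, key, value).
commit_log = [
    (1, "tablet-A", "k1", "v1"),
    (2, "tablet-B", "k9", "x"),
    (3, "tablet-A", "k2", "v2"),
    (4, "tablet-A", "k1", "v1b"),
]


def recover(tablet_id, persisted_state, redo_point, log):
    """Rebuild a tablet's state by replaying its log entries after redo_point."""
    state = dict(persisted_state)
    for seq, tid, key, value in log:
        # Skip other tablets' entries and anything already persisted.
        if tid == tablet_id and seq > redo_point:
            state[key] = value
    return state


# Assume entries up to sequence number 1 are already reflected in SSTables.
recovered = recover("tablet-A", {"k1": "v1"}, redo_point=1, log=commit_log)
```

The trade-off of a single log shows up here: recovering one tablet means reading past entries that belong to other tablets on the same server.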
![Page 33: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/33.jpg)
Overview
- Data Model
- API
- Implementation Structures
- Optimizations
- Performance Evaluation
- Applications
- Conclusions
![Page 34: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/34.jpg)
Performance evaluation
- Test environment
  - Based on a GFS with 1876 machines
  - 400 GB IDE hard drives in each machine
  - Two-level tree-shaped switched network
- Performance tests
  - Random read/write
  - Sequential read/write
![Page 35: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/35.jpg)
Single tablet-server performance
- Random reads are the slowest
  - Each read transfers a 64 KB SSTable block over GFS to use 1000 bytes
- Random and sequential writes perform better
  - Each server appends writes to a single commit log
  - Group commit
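The cost of a random read can be made concrete with a little arithmetic on the two numbers from the slide. The only assumption here is that 64 KB means 64 × 1024 bytes.

```python
BLOCK_BYTES = 64 * 1024  # SSTable block fetched per random read (assumes KB = 1024 bytes)
VALUE_BYTES = 1000       # bytes the read actually uses from that block

amplification = BLOCK_BYTES / VALUE_BYTES
useful_fraction = VALUE_BYTES / BLOCK_BYTES

print(f"{amplification:.1f}x bytes transferred per byte used")
print(f"{useful_fraction:.1%} of each transferred block is useful")
```

Roughly 65 bytes cross the network for every byte the client wanted, which is why random reads sit at the bottom of the benchmark.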
![Page 36: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/36.jpg)
Performance Scaling
- Performance didn't scale linearly
  - Load imbalance in multiple-server configurations
  - Larger data transfer overhead
![Page 37: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/37.jpg)
Overview
- Data Model
- API
- Implementation Structures
- Optimizations
- Performance Evaluation
- Applications
- Conclusions
![Page 38: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/38.jpg)
Google Analytics
- A service that analyzes traffic patterns at web sites
- Raw Click Table
  - Row for each end-user session
  - Row key is (website name, time)
- Summary Table
  - Extracts recent session data using MapReduce jobs
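One way to picture the (website name, time) row key is as a composed string whose sort order groups sessions by site and orders them chronologically. The `#` separator, the zero-padded epoch-seconds encoding, and the site names are assumptions for the sketch, not the actual key format.

```python
def session_row_key(website, created_at):
    """Compose a row key from website name and session creation time.

    created_at is seconds since the epoch; zero-padding keeps string order
    equal to numeric order. The '#' separator is an arbitrary choice.
    """
    return f"{website}#{created_at:012d}"


sessions = [
    ("shop.example", 1700000500),
    ("blog.example", 1700000100),
    ("shop.example", 1700000020),
    ("blog.example", 1700000900),
]
ordered_keys = sorted(session_row_key(site, ts) for site, ts in sessions)
```

After sorting, each site's sessions are contiguous and in time order, so a job that extracts recent sessions for one site can read a single row range.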
![Page 39: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/39.jpg)
Google Earth
- Use one table for preprocessing and one for serving
  - Different latency requirements (disk vs. memory)
- Each row in the imagery table represents a single geographic segment
- Column family to store data sources
  - One column for each raw image
  - Very sparse
![Page 40: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/40.jpg)
Personalized Search
- Row key is a unique userid
- A column family for each type of user action
- Replicated across Bigtable clusters to increase availability and reduce latency
![Page 41: Big table presentation-final](https://reader038.vdocument.in/reader038/viewer/2022110115/54c7f5f24a79594f2e8b456f/html5/thumbnails/41.jpg)
Conclusions
- Bigtable provides highly scalable, high-performance, highly available, and flexible storage for structured data.
- It provides a low-level read/write interface that other frameworks can build on.
- It has enabled Google to deal with large-scale data efficiently.