big table
SOA Server: RESTFul Web Services
Presented by Manuel Correa
Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
Google, Inc.
Problems in RDBMS & NoSQL Databases
What is BigTable?
Data Model
Implementation
HBASE Hadoop project
Quick Demo
Performance and Evaluations
Real Applications
Questions
Agenda
RDBMS do not scale to very large data sets (petabytes)
RDBMS do not scale horizontally: replication and clustering are bolted on, because RDBMS were not designed to be distributed
Schemas are rigid
Joins do not scale well
Problems in RDBMS
Non-Relational data storage
No joins, one simple schema that accommodates large datasets
NoSQL databases are designed to be distributed
NoSQL is not a replacement for RDBMS!!
Examples:
Column-oriented DB: HBase (Hadoop)
Document DB: MongoDB
Graph DB: Neo4j
NoSQL Databases: Not Only SQL
Who's using NoSQL?
BigTable is a distributed storage system for managing structured data
Designed at Google, Inc.; the paper describing it was published in 2006
BigTable was designed to scale to petabytes of data across thousands of machines
BigTable has provided a flexible, widely applicable, scalable, highly available, and high-performance solution for many Google products
What is BigTable?
BigTable is a sparse, distributed, persistent, multidimensional sorted map
The map is indexed by row key, column key, and timestamp. Each value is an uninterpreted array of bytes
BigTable Data Model
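A minimal sketch of the map described above, assuming an in-memory Python dict stands in for the distributed store. The names `put` and `get_latest` are hypothetical and are not part of any BigTable API.

```python
from typing import Dict, Optional, Tuple

# (row key, column key, timestamp) -> uninterpreted bytes.
# The map is sparse: cells that were never written take no space.
Key = Tuple[str, str, int]
table: Dict[Key, bytes] = {}


def put(row: str, column: str, timestamp: int, value: bytes) -> None:
    """Store one version of a cell."""
    table[(row, column, timestamp)] = value


def get_latest(row: str, column: str) -> Optional[bytes]:
    """Return the value with the highest timestamp for (row, column)."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


put("com.cnn.www", "contents:", 3, b"<html>v1</html>")
put("com.cnn.www", "contents:", 5, b"<html>v2</html>")
print(get_latest("com.cnn.www", "contents:"))
```

The linear scan in `get_latest` is only for illustration; a real store keeps the keys sorted so that a lookup does not touch every cell.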
Row keys are arbitrary strings (up to 64 KB)
Every read or write of data under a single row key is atomic
BigTable data is ordered lexicographically by row key
A table is dynamically partitioned into row ranges called tablets
A tablet is the basic unit of distribution and load balancing
BigTable Model - Rows
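The row-range partitioning can be sketched as a binary search over sorted split keys. This is an illustration only: `find_tablet` and the split keys are hypothetical, and a real system moves the split points dynamically as tablets grow.

```python
import bisect
from typing import List


def find_tablet(split_keys: List[str], row_key: str) -> int:
    """Return the index of the tablet whose row range contains row_key.

    split_keys holds the sorted, inclusive end key of every tablet except
    the last one, so N split keys describe N + 1 tablets.
    """
    return bisect.bisect_left(split_keys, row_key)


# Rows are kept in lexicographic order, so neighbouring keys share a tablet.
row_keys = sorted(["com.cnn.www", "com.cnn.money",
                   "org.apache.hbase", "com.google.maps"])
split_keys = ["com.cnn.www", "com.google.maps"]  # assumed split points

for key in row_keys:
    print(key, "-> tablet", find_tablet(split_keys, key))
```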
Column keys are grouped into sets called column families
A column family is the basic unit of access control
All data stored in a column family is usually of the same type
A column family can contain many columns
A column key is named using the syntax family:qualifier
BigTable Model Columns
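A small sketch of the family:qualifier naming and of access control applied per column family. The ACL table, the user names, and both function names are assumptions made up for this example.

```python
from typing import Dict, Set, Tuple


def parse_column_key(column_key: str) -> Tuple[str, str]:
    """Split 'family:qualifier' into its two parts (qualifier may be empty)."""
    family, _, qualifier = column_key.partition(":")
    return family, qualifier


# Hypothetical ACL: column family -> users allowed to read it.
read_acl: Dict[str, Set[str]] = {
    "contents": {"crawler", "indexer"},
    "anchor": {"indexer"},
}


def can_read(user: str, column_key: str) -> bool:
    """Access is decided by the family, never by the individual column."""
    family, _ = parse_column_key(column_key)
    return user in read_acl.get(family, set())


print(parse_column_key("anchor:cnnsi.com"))
print(can_read("crawler", "anchor:cnnsi.com"))
```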
Each cell in BigTable can contain multiple versions of the same data
Each version is indexed by a timestamp
The timestamp is a 64-bit integer
Different versions of a cell are stored in decreasing timestamp order, so that the most recent version can be read first
BigTable garbage-collects old versions automatically (the client can specify a policy for each column family, such as keeping only the last n versions or only sufficiently recent versions)
BigTable Model Timestamps
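The versioning rules can be sketched as follows, assuming a "keep only the last n versions" policy. The `Cell` class is hypothetical, and trimming eagerly on every write is a simplification chosen for the example.

```python
from typing import List, Tuple


class Cell:
    """One cell holding several timestamped versions, newest first."""

    def __init__(self, max_versions: int) -> None:
        self.max_versions = max_versions  # assumed per-family policy
        self.versions: List[Tuple[int, bytes]] = []

    def write(self, timestamp: int, value: bytes) -> None:
        self.versions.append((timestamp, value))
        # Keep versions in decreasing timestamp order.
        self.versions.sort(key=lambda version: version[0], reverse=True)
        # Garbage-collect everything beyond the newest max_versions entries.
        del self.versions[self.max_versions:]

    def read_latest(self) -> bytes:
        return self.versions[0][1]


cell = Cell(max_versions=2)
for ts, value in [(5, b"v5"), (3, b"v3"), (6, b"v6")]:
    cell.write(ts, value)
print(cell.versions)
```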
Each cell in BigTable is indexed by timestamp
A cell maintains different versions of the same data
The most recent version comes first: versions are ordered by decreasing timestamp
The system implements garbage collection, which takes care of unused versions
Example: WebTable
The contents column family of a Web page has different versions
BigTable Model Timestamps
BigTable Model Example: WebTable
The row key is the URL with its hostname components reversed (e.g. com.cnn.www)
Pages in the same domain are grouped together in contiguous rows, so they tend to fall in the same tablet
anchor is a column family with two columns
Different versions (t3, t5, t6, t8, t9) are kept in each cell
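A sketch of how a reversed-hostname row key could be built, to show why pages of one domain sort next to each other. `row_key_for` is a hypothetical helper; it ignores ports and query strings.

```python
from urllib.parse import urlsplit


def row_key_for(url: str) -> str:
    """Reverse the hostname components and append the path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    reversed_host = ".".join(reversed(host.split(".")))
    return reversed_host + parts.path


urls = [
    "http://www.cnn.com/",
    "http://maps.google.com/index.html",
    "http://money.cnn.com/",
]
# After sorting, both cnn.com pages are adjacent.
for key in sorted(row_key_for(u) for u in urls):
    print(key)
```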
BigTable Implementation
BigTable uses the distributed Google File System (GFS) to store log and data files
MapReduce inputs and outputs can be stored in BigTable
BigTable uses the Google SSTable file format to store data internally
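SSTable: provides a persistent, ordered, immutable map from keys to values
Internally, an SSTable contains a sequence of blocks (typically 64 KB each)
A block index is stored at the end of the SSTable and is loaded into memory when the SSTable is opened
A lookup can be performed with a single disk seek: binary-search the in-memory index, then read the matching block from disk
An SSTable can also be loaded entirely into memory, so that lookups and scans do not touch the disk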
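The lookup path can be sketched with an in-memory stand-in. Everything here is an assumption made for illustration: the `SSTable` class is not Google's file format, blocks are Python lists rather than byte ranges on disk, and `block_size` counts entries instead of bytes.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SSTable:
    """Immutable sorted map split into blocks, with an index of block end keys."""

    def __init__(self, items: Dict[str, bytes], block_size: int = 2) -> None:
        pairs = sorted(items.items())
        self.blocks: List[List[Tuple[str, bytes]]] = [
            pairs[i:i + block_size] for i in range(0, len(pairs), block_size)
        ]
        # In-memory index: the last key stored in each block.
        self.index: List[str] = [block[-1][0] for block in self.blocks]
        self.block_reads = 0  # stands in for the number of disk seeks

    def get(self, key: str) -> Optional[bytes]:
        # Binary search the index to find the only block that can hold key.
        i = bisect.bisect_left(self.index, key)
        if i == len(self.blocks):
            return None  # key is larger than every stored key
        self.block_reads += 1  # one block read per lookup
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None


sstable = SSTable({"a": b"1", "b": b"2", "c": b"3", "d": b"4", "e": b"5"})
print(sstable.get("c"), sstable.block_reads)
```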
BigTable Implementation
BigTable relies on a highly available, persistent distributed lock service called Chubby
A Chubby service consists of five active replicas, one of which is elected master. The Paxos algorithm is used to keep the replicas consistent
Chubby provides a namespace that consists of directories and small files. Each directory or file can be used as a lock, and reads and writes to a file are atomic
BigTable uses Chubby to:
Ensure that there is at most one active master at any time
Store the bootstrap location of BigTable data
Discover tablet servers and finalize tablet server deaths
Store BigTable schema information
Three major components:
Master server
Assigns tablets to tablet servers
Handles schema changes: table and column family creation
Tablet servers
Each manages a set of tablets (between 10 and 1000 tablets)
Handle read/write requests to the tablets they have loaded
A library linked into every client
Clients communicate directly with tablet servers
The client library caches tablet locations. On a cache miss or a stale entry, the client walks the tablet location hierarchy (it does not go through the master) and refreshes its local cache
BigTable Implementation
A three-level hierarchy stores tablet location information (analogous to a B+ tree)
A Chubby file contains the location of the root tablet
The root tablet contains the locations of all tablets of a special METADATA table; each METADATA tablet contains the locations of a set of user tablets
BigTable Implementation
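The three-level lookup can be sketched like this for a single table. The tablet names, server names, and end keys are invented for the example, and `"\uffff"` is an assumed sentinel for "largest possible key" (row keys above it are not handled).

```python
import bisect
from typing import Dict, List, Tuple

# Level 1: a Chubby file names the root tablet.
chubby_file = {"root_tablet": "root"}

# Level 2: the root tablet maps end row keys to METADATA tablets.
# Level 3: each METADATA tablet maps end row keys to the tablet server
#          that holds the corresponding user tablet.
Tablet = List[Tuple[str, str]]  # sorted (inclusive end key, location)
tablets: Dict[str, Tablet] = {
    "root": [("m", "meta-1"), ("\uffff", "meta-2")],
    "meta-1": [("com.cnn.www", "server-a"), ("m", "server-b")],
    "meta-2": [("org.apache", "server-c"), ("\uffff", "server-d")],
}


def lookup(tablet: Tablet, row_key: str) -> str:
    """Find the first entry whose end key is >= row_key."""
    end_keys = [end for end, _ in tablet]
    return tablet[bisect.bisect_left(end_keys, row_key)][1]


def locate(row_key: str) -> str:
    """Walk Chubby file -> root tablet -> METADATA tablet -> tablet server."""
    root = tablets[chubby_file["root_tablet"]]
    metadata_tablet = tablets[lookup(root, row_key)]
    return lookup(metadata_tablet, row_key)


print(locate("com.google.maps"))
```

A client would cache the result of `locate` and repeat the walk only when the cached location turns out to be wrong.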
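Updates are committed to a commit log that stores redo records. Authorization is checked against a list of permitted writers read from a Chubby file
Recently committed updates are stored in memory in a sorted buffer called the memtable; older updates are stored in a sequence of SSTables
To recover a tablet, a tablet server reads its metadata from the METADATA table, then rebuilds the memtable by applying the updates committed since the redo points in the commit log
Read operations are executed on a merged view of the memtable and the SSTables. Authorization is checked against a Chubby file in the same way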
BigTable Implementation Tablet Serving
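A simplified sketch of the write, read, and recovery paths above. `TabletSketch` is hypothetical: authorization is omitted, SSTables are plain dicts, and the commit log is an in-memory list that is truncated on flush instead of tracking redo points.

```python
from typing import Dict, List, Optional, Tuple


class TabletSketch:
    def __init__(self) -> None:
        self.commit_log: List[Tuple[str, bytes]] = []  # redo records
        self.memtable: Dict[str, bytes] = {}           # recent writes
        self.sstables: List[Dict[str, bytes]] = []     # oldest first

    def write(self, key: str, value: bytes) -> None:
        self.commit_log.append((key, value))  # log first, then apply
        self.memtable[key] = value

    def flush(self) -> None:
        """Freeze the memtable into a new SSTable and start a fresh one."""
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()  # flushed updates no longer need redo

    def read(self, key: str) -> Optional[bytes]:
        """Merged view: memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def recover(self) -> None:
        """Rebuild the memtable by replaying the redo records."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


tablet = TabletSketch()
tablet.write("a", b"1")
tablet.flush()
tablet.write("a", b"2")
tablet.memtable = {}   # simulate losing the in-memory state in a crash
tablet.recover()
print(tablet.read("a"))
```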
This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware
HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's BigTable
HBase includes:
Convenient base classes for backing Hadoop MapReduce jobs with HBase tables, including Cascading, Hive, and Pig source and sink modules
Query predicate push-down via server-side scan and get filters
Optimizations for real-time queries
A Thrift gateway and a RESTful Web service that supports XML, Protobuf, and binary data encoding options
Extensible JRuby-based (JIRB) shell
HBase The Hadoop DB
DEMO
HBase The Hadoop DB
BigTable: Performance/Evaluations
BigTable: Real Applications
Questions?
BigTable