big table

SOA Server: RESTFul Web Services

Presented by Manuel Correa

BigTable: A Distributed Storage System for Structured Data

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Google, Inc.

Problems in RDBMS & NoSQL Databases

What is BigTable?

Data Model

Implementation

HBASE Hadoop project

Quick Demo

Performance and Evaluations

Real Applications

Questions

Agenda

RDBMS do not scale to very large data sets (petabytes)

RDBMS do not scale horizontally; even with replication and clustering, they were not designed to be distributed

Schemas are rigid

Joins do not scale well

Problems in RDBMS

Non-Relational data storage

No joins, one simple schema that accommodates large datasets

NoSQL databases are designed to be distributed

NoSQL is not a replacement for RDBMS!!

Examples:

Column-oriented DB: HBase (Hadoop)

Document DB: MongoDB

Graph DB: Neo4j

NoSQL Databases - Not Only SQL

Who's using NoSQL?

BigTable is a distributed storage system for managing structured data

Designed by Google, Inc. and described in a paper published in 2006

BigTable was designed to scale to petabytes of data across thousands of machines

BigTable has provided a flexible, widely applicable, scalable, highly available, and high-performance solution for many Google products

What is BigTable?

BigTable is a sparse, distributed, persistent, multidimensional sorted map

The map is indexed by row key, column key, and timestamp. Each value is an uninterpreted array of bytes

BigTable Data Model
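The map described above can be pictured with a plain Python dictionary. This is only a toy sketch: the names `table`, `put`, and `get` are made up for illustration and are not part of any BigTable API, and the sample keys are simply example values.

```python
# Toy model: a map from (row key, column key, timestamp) to bytes.
# All names here are illustrative, not a real BigTable interface.
table = {}

def put(row, column, timestamp, value):
    table[(row, column, timestamp)] = value

def get(row, column, timestamp):
    # Returns None for cells that were never written (the map is sparse).
    return table.get((row, column, timestamp))

put("com.cnn.www", "contents:", 6, b"<html>...</html>")
put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")

# "Sorted" means the entries can be walked in key order.
for key in sorted(table):
    print(key, table[key])
```

Only the cells that were written take up space, which is what "sparse" means here.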

Row keys are arbitrary strings (up to 64 KB)

Every read or write of data under a single row key is atomic

BigTable data is ordered lexicographically by row key

A table is dynamically partitioned into row ranges called tablets

A Tablet is the basic unit of distribution and load balancing

BigTable Model - Rows
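Because rows are kept in lexicographic order and each tablet covers a contiguous row range, finding the tablet for a row is a binary search over the split points. The sketch below assumes hypothetical split points and tablet names; real tablets are split and merged dynamically.

```python
import bisect

# Hypothetical split points: each tablet covers row keys up to and
# including its end key; the last tablet is open-ended.
tablet_end_keys = ["com.cnn.www", "com.google.maps", "org.apache.hbase"]
tablet_names = ["tablet-0", "tablet-1", "tablet-2", "tablet-3"]

def find_tablet(row_key):
    # bisect_left finds the first tablet whose end key is >= row_key.
    return tablet_names[bisect.bisect_left(tablet_end_keys, row_key)]

print(find_tablet("com.google.mail"))
```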

Column keys are grouped into sets called column families

A column family is the basic unit of access control

All data stored in a column family is usually of the same type

A column family contains several column keys (qualifiers)

A column key is written as family:qualifier

BigTable Model - Columns
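A small sketch of the family:qualifier naming convention. The helper names `split_column_key` and `group_by_family` are hypothetical, chosen only to illustrate how column keys group under their families.

```python
def split_column_key(column_key):
    # The family is everything before the first ':'; the qualifier may be empty.
    family, _, qualifier = column_key.partition(":")
    return family, qualifier

def group_by_family(column_keys):
    # Collect the qualifiers seen under each column family.
    families = {}
    for key in column_keys:
        family, qualifier = split_column_key(key)
        families.setdefault(family, []).append(qualifier)
    return families

print(group_by_family(["anchor:cnnsi.com", "anchor:my.look.ca", "contents:"]))
```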

Each cell in BigTable can contain multiple versions of the same data

Each version is indexed by a timestamp

The timestamp is a 64-bit integer

Different versions of a cell are stored in decreasing timestamp order, so that the most recent version can be read first

BigTable garbage-collects old versions automatically: the client can specify, per column family, that only the last n versions or only recent-enough versions are kept

BigTable Model - Timestamps
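The versioning rules can be sketched as a cell that keeps its versions newest-first and trims them on request. The `Cell` class and its method names are hypothetical, and the "keep the last n versions" policy shown is just one of the possible expiration settings.

```python
class Cell:
    def __init__(self):
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        # Keep versions in decreasing timestamp order.
        self.versions.sort(key=lambda v: v[0], reverse=True)

    def latest(self):
        # The most recent version is always at the front.
        return self.versions[0] if self.versions else None

    def collect_garbage(self, keep_last_n):
        # Assumed policy: keep only the newest keep_last_n versions.
        del self.versions[keep_last_n:]

cell = Cell()
for ts in (5, 3, 6):
    cell.put(ts, b"page at t%d" % ts)
cell.collect_garbage(keep_last_n=2)
print(cell.versions)
```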

Each cell in BigTable is indexed by timestamp

A cell maintains different versions of the same data

Versions are kept in decreasing timestamp order, so the most recent version comes first

The system garbage-collects versions that are no longer needed

Example: WebTable

The contents column family of a web page holds several versions of the page

BigTable Model - Timestamps

BigTable Model Example: WebTable

The row key is the URL with its hostname components reversed

Pages in the same domain are grouped together in contiguous rows, and therefore in the same tablet(s)

anchor is a column family with two columns

Different versions (t3, t5, t6, t8, t9) are kept in the cells of each column
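To see why reversed hostnames keep a domain's pages together, here is a minimal sketch. The function name `row_key_for` is hypothetical, and it assumes URLs of the form host/path with no scheme.

```python
def row_key_for(url):
    # Assumes a URL of the form "host/path" with no scheme (an assumption
    # made for this sketch only).
    host, sep, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + sep + path

urls = ["maps.google.com/index.html", "www.cnn.com", "mail.google.com"]
for key in sorted(row_key_for(u) for u in urls):
    print(key)
```

After sorting, the two google.com pages sit next to each other, so a scan over one domain touches a contiguous row range.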

BigTable Implementation

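BigTable uses the distributed Google File System (GFS) to store its log and data files

BigTable can serve as both an input source and an output target for MapReduce jobs

BigTable uses the Google SSTable file format to internally store the data

An SSTable provides a persistent, ordered, immutable map from keys to values

Internally, an SSTable contains a sequence of blocks (typically 64 KB each)

A block index is stored at the end of the SSTable and is loaded into memory when the SSTable is opened

A lookup can be performed with a single disk seek: find the block in the in-memory index, then read that block from disk

An SSTable can optionally be mapped completely into memory, so that lookups and scans happen without touching disk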
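The block-plus-index layout can be sketched in a few lines. This toy `SSTable` class is an in-memory illustration only: the class name, the tiny block size, and the choice of indexing each block by its last key are assumptions made for the example, and reading a block from the list stands in for the single disk read.

```python
import bisect

class SSTable:
    def __init__(self, sorted_items, block_size=2):
        # Split the sorted (key, value) pairs into fixed-size blocks.
        self.blocks = [sorted_items[i:i + block_size]
                       for i in range(0, len(sorted_items), block_size)]
        # The index holds the last key of each block.
        self.index = [block[-1][0] for block in self.blocks]

    def get(self, key):
        # Binary search the in-memory index to pick the one candidate block.
        i = bisect.bisect_left(self.index, key)
        if i == len(self.blocks):
            return None
        for k, v in self.blocks[i]:  # stands in for the single disk read
            if k == key:
                return v
        return None

sst = SSTable([("a", 1), ("c", 2), ("e", 3), ("g", 4), ("i", 5)])
print(sst.get("e"))
```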


BigTable relies on a highly available, persistent distributed lock service called Chubby

A Chubby service consists of five active replicas, one of which is elected master. The Paxos algorithm is used to keep the replicas consistent

Chubby provides a namespace that consists of directories and small files. Each directory or file can be used as a lock, and each read or write to a file is atomic

BigTable uses Chubby to:

Ensure that there is at most one active master at any time

Store the bootstrap location of BigTable data

Discover tablet servers and finalize tablet server deaths

Store BigTable schema information
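The "at most one active master" guarantee comes down to holding an exclusive lock on a well-known file. The sketch below is a single-process toy, not Chubby: the `LockService` class, its methods, and the path `/bigtable/master` are all hypothetical, and it ignores replication, sessions, and failures.

```python
class LockService:
    """Toy stand-in for a Chubby-like lock namespace (single process only)."""

    def __init__(self):
        self.holders = {}  # file path -> current lock holder

    def try_acquire(self, path, owner):
        # setdefault stores owner only if the path is not already locked.
        return self.holders.setdefault(path, owner) == owner

    def release(self, path, owner):
        # Only the current holder can release the lock.
        if self.holders.get(path) == owner:
            del self.holders[path]

locks = LockService()
print(locks.try_acquire("/bigtable/master", "master-1"))
print(locks.try_acquire("/bigtable/master", "master-2"))
```

A second would-be master fails to acquire the lock and stays passive until the first one releases it.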

Three major components:

Master server:

Assigns tablets to tablet servers

Handles schema changes such as table and column family creation

Tablet servers:

Manage a set of tablets (typically between 10 and 1,000 tablets each)

Handle read and write requests to the tablets they have loaded

A library linked into every client:

Clients communicate directly with tablet servers

Clients cache tablet locations. On a cache miss or a stale entry, the client walks the tablet location hierarchy (not the master) to find the tablet and refresh its local cache


A three-level hierarchy stores tablet location information (analogous to a B+ tree)

A Chubby file contains the location of the root tablet

The root tablet contains the locations of all tablets of a special METADATA table, and each METADATA tablet contains the locations of a set of user tablets


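Updates are committed to a commit log that stores redo records. Authorization is checked against a Chubby file

Recently committed updates are stored in memory in a sorted buffer called the memtable; older updates are stored in SSTables

To recover a tablet, a tablet server reads the tablet's metadata from the METADATA table and then reconstructs the memtable by applying the updates recorded in the commit log

Read operations are served from a merged view of the memtable and the SSTables. Authorization is checked against a Chubby file

BigTable Implementation - Tablet Serving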
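The write path, read path, and recovery described above can be sketched with a small class. This is a simplified illustration under stated assumptions: the `Tablet` class is hypothetical, SSTables are modeled as plain dictionaries, the commit log is a Python list, and authorization checks and versions are left out.

```python
class Tablet:
    def __init__(self, sstables=None, commit_log=None):
        self.sstables = sstables or []  # immutable dicts, oldest first
        self.commit_log = commit_log if commit_log is not None else []
        self.memtable = {}
        # Recovery: rebuild the memtable by replaying the redo records.
        for key, value in self.commit_log:
            self.memtable[key] = value

    def write(self, key, value):
        self.commit_log.append((key, value))  # log the update first
        self.memtable[key] = value            # then apply it in memory

    def read(self, key):
        # Merged view: the memtable holds the newest data, then newer SSTables.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

tablet = Tablet(sstables=[{"a": b"old", "b": b"1"}])
tablet.write("a", b"new")
print(tablet.read("a"), tablet.read("b"))
```

Creating a second `Tablet` from the same SSTables and a copy of the commit log models a restart: the memtable is rebuilt and reads return the same results.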

This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware

HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's BigTable

HBase includes:

Convenient base classes for backing Hadoop MapReduce jobs with HBase tables, including Cascading, Hive, and Pig source and sink modules

Query predicate push-down via server-side scan and get filters

Optimizations for real-time queries

A Thrift gateway and a RESTful Web service that supports XML, Protobuf, and binary data encoding options

Extensible JRuby-based (JIRB) shell

HBase The Hadoop DB

DEMO

HBase The Hadoop DB


BigTable: Performance/Evaluations


BigTable: Real Applications

Questions ?

BigTable
