HugeTable: Application-Oriented Structure Data Storage System

HugeTable: Application-Oriented Structure Data Storage System. China Mobile Research Institute, HugeTable Project Team, Qian Ling


Post on 06-Apr-2017


Page 1: HugeTable:Application-Oriented Structure Data Storage System

HugeTable:Application-Oriented Structure Data Storage System

China Mobile Research Institute

HugeTable Project Team

Qian Ling

Page 2: HugeTable:Application-Oriented Structure Data Storage System

Agenda

Motivations

Hadoop, Hive & HBase

HT Design & Development

HT Applications

Further Plans

Page 3: HugeTable:Application-Oriented Structure Data Storage System

Motivations

• Huge Data Volumes
  • Total data volumes: several PB per system
  • Daily data volumes: several TB per system
  • Longer retention period: several months
  • Big potential: 200% increase in some areas

• Multiple Application Areas
  • BOSS, BI, NMS, Internet ...
  • Data Integration

• Traditional Application Model
  • SQL support
  • Fast Index Query
  • Multiple Application support
  • Sensitive data
  • CRUD support
  • Statistic & Reporting

Data Warehouse
• Scalable
• Highly Available
• Reliable

+ App Solution

… Affordable

Page 4: HugeTable:Application-Oriented Structure Data Storage System

Hadoop: Raw Techniques

HDFS: distributed file system with fault tolerance

MapReduce: parallel programming environments over HDFS

• Similar to the situation of POSIX API + Local FS

High Level Toolkits are initiated

Yahoo: PIG/Latin

Business.com: Cloudbase/Hadoop+JDBC

China Mobile: BC-PDM

Facebook: Hive/SQL

Page 5: HugeTable:Application-Oriented Structure Data Storage System

Hive: A Petabytes Scale Data Warehouse

Source: ICDE 2010/Facebook

Features:
• Schema support
• Pluggable Storage Engine I/F
• SQL-to-MR translation
• xDBC Driver
• Tools: HQL Console
• Admin: HWI

Usage Scenarios
• Reporting
• Ad hoc Analysis
• Machine Learning
• Others
  • Log analysis
  • Trend detection

• Facebook has huge clusters: >1000 nodes

Page 6: HugeTable:Application-Oriented Structure Data Storage System

HBase: structured storage of sparse data for Hadoop

Source: ApacheCon2009/ HBase

Features
• ColumnFamilies
• ACID
• Optimized R/W
• BigTable I/F + BU
• Tools: HBase Shell
• Admin: Jetty Based

Usage Scenarios
• Social Service
• MapReduce Analysis
• Content Repository
• Wiki, RSS
• Near Realtime Reporting & analytics
• Store web pages

… Replacing SQL Systems

Page 7: HugeTable:Application-Oriented Structure Data Storage System

HugeTable: Application-Oriented Structure Data Storage System

Address the missing blocks
• Index store & Query Optimizer
• Access Control List
• Insert, Update and Delete
• Web-based Administration

Build Solutions for Telco Applications
• Network Management System (NMS)
• Value-added System (VAS)
• Business Intelligence (BI)
• Other areas

[Diagram: HugeTable components: Tools, Clients, HFile w/ CF, Index Store, Admin (data, config, FM, log, perf), I/F]

Page 8: HugeTable:Application-Oriented Structure Data Storage System

A Brief History of HugeTable

2008 2009 2010

HT-p1 HT-p2 HT-p3

HT-p1
1. HBase-based
2. Partial xDBC/SQL support
3. Integrate HBase with ZK before official release
4. Secondary Index
5. Support Schema
6. ACL support
7. SQL console

HT-p2
1. Connect Hive with HBase
2. Global Indexing
3. Support HFile, CF in Hive
4. Secondary Index
5. Multiple DB support
6. ACL support
7. MR & Scan I/F
8. Loader Tools, HT-Client
9. Admin Portal
10. JDBC remote console

HT-p3
1. Move to higher versions of Hive, Hadoop and HBase
2. New Storage Engine
3. Rich external I/F
4. Many other improvements
5. Application Solution

Page 9: HugeTable:Application-Oriented Structure Data Storage System

HugeTable Building Blocks

[Diagram: HugeTable is built on Hadoop Core (Storage, Computing), Hadoop HBase (KVStore), Hadoop Hive (SQL-MR) and Hadoop Zookeeper (Lock). Applications on top include Cloud Master, NMS, ...]

Page 10: HugeTable:Application-Oriented Structure Data Storage System

HBase as HugeTable Index Store

[Diagram: HBase stores Index Meta Data and Index Data. The Query Engine and the Load Service (HT Loader) work against it. Labeled operations: Create Index, Drop Index, Find Index, Write Index, Read Index, Check Index. Labeled queries: "Select … using index xxx" and "Select … where idxcol".]

Page 11: HugeTable:Application-Oriented Structure Data Storage System

Index Store Implementation

20 Nodes, 1TB/Node

Metric | Hive | HT-p1 | HT-p2
Index Query | N/A | <10 sec | <10 sec
Load Speed | 20MB/s·Node (No Index) | 2.5MB/s·Node (Primary Index) | >5MB/s·Node (Primary Index)
Memory Consumption | No additional cost | 8GB/Node*TB | 2GB/Node*TB

Primary Index: index into data file

Secondary Index: index into primary index

Exact match and Range scan

Integrated with Hive ql and other modules
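The two index levels described above can be illustrated with a small in-memory sketch. It is not HugeTable's implementation: the class name `TwoLevelIndexSketch`, its methods, and the use of `TreeMap` as a stand-in for the HBase-backed index store are all assumptions made for illustration. The primary index maps a row key to an offset in a data file, and the secondary index maps an indexed column value to row keys in the primary index, which is enough to show exact match and range scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Toy two-level index. Hypothetical names, not HugeTable classes. */
class TwoLevelIndexSketch {
    // Primary index: row key -> byte offset of the record in a data file.
    private final TreeMap<String, Long> primary = new TreeMap<>();
    // Secondary index: indexed column value -> row keys held in the primary index.
    private final TreeMap<String, List<String>> secondary = new TreeMap<>();

    void put(String rowKey, long fileOffset, String indexedValue) {
        primary.put(rowKey, fileOffset);
        secondary.computeIfAbsent(indexedValue, v -> new ArrayList<>()).add(rowKey);
    }

    /** Exact match: secondary index -> primary index -> file offsets. */
    List<Long> exactMatch(String indexedValue) {
        List<String> rowKeys = secondary.getOrDefault(indexedValue, Collections.emptyList());
        return resolve(rowKeys);
    }

    /** Range scan over indexed values in [from, to], both ends inclusive. */
    List<Long> rangeScan(String from, String to) {
        List<Long> offsets = new ArrayList<>();
        for (List<String> rowKeys : secondary.subMap(from, true, to, true).values()) {
            offsets.addAll(resolve(rowKeys));
        }
        return offsets;
    }

    // Translate row keys into data-file offsets through the primary index.
    private List<Long> resolve(List<String> rowKeys) {
        List<Long> offsets = new ArrayList<>();
        for (String rowKey : rowKeys) {
            offsets.add(primary.get(rowKey));
        }
        return offsets;
    }

    public static void main(String[] args) {
        TwoLevelIndexSketch idx = new TwoLevelIndexSketch();
        idx.put("r1", 0L, "A100");   // made-up sample rows
        idx.put("r2", 128L, "A200");
        idx.put("r3", 256L, "A100");
        System.out.println(idx.exactMatch("A100"));
        System.out.println(idx.rangeScan("A100", "A200"));
    }
}
```

A sorted map is used here because it supports both lookups named on the slide: `get` for exact match and `subMap` for range scan.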

Page 12: HugeTable:Application-Oriented Structure Data Storage System

HugeTable IUD Support

Goal: Support Insert, Update and Delete on application data.

[Diagram: IUD Statements and Selects go through the Query Engine (Find IUD table, Write IUD Data, Read IUD Data). HBase holds Meta Data and the IUD Table. HDFS holds HT Data. An Offline Merger is also shown.]
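One common way to support changes on top of immutable bulk data is a delta overlay with a periodic merge, and the components named on this slide (IUD table, HT data, offline merger) are consistent with that pattern. The sketch below shows the pattern only. The class `IudOverlaySketch` and its methods are hypothetical, and plain in-memory maps stand in for HBase and HDFS; how HugeTable actually implements IUD is not stated in the slides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Delta-overlay sketch. Hypothetical names, not HugeTable classes. */
class IudOverlaySketch {
    // Stand-in for bulk HT data on HDFS: row key -> record.
    private Map<String, String> baseData = new TreeMap<>();
    // Stand-in for the IUD table: row key -> new value, or empty for a delete.
    private final Map<String, Optional<String>> iudTable = new HashMap<>();

    IudOverlaySketch(Map<String, String> initial) {
        baseData.putAll(initial);
    }

    /** Insert and update are both recorded as a new value in the IUD table. */
    void upsert(String key, String value) {
        iudTable.put(key, Optional.of(value));
    }

    /** Delete is recorded as an empty marker; the base data is not touched. */
    void delete(String key) {
        iudTable.put(key, Optional.empty());
    }

    /** Select: a pending IUD entry, if present, overrides the base record. */
    Optional<String> select(String key) {
        Optional<String> delta = iudTable.get(key);
        if (delta != null) {
            return delta;
        }
        return Optional.ofNullable(baseData.get(key));
    }

    /** Offline merge: fold pending changes into a new base, then clear the IUD table. */
    void offlineMerge() {
        Map<String, String> merged = new TreeMap<>(baseData);
        iudTable.forEach((key, delta) -> {
            if (delta.isPresent()) {
                merged.put(key, delta.get());
            } else {
                merged.remove(key);
            }
        });
        baseData = merged;
        iudTable.clear();
    }

    int pendingChanges() {
        return iudTable.size();
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("k1", "v1");
        initial.put("k2", "v2");
        IudOverlaySketch table = new IudOverlaySketch(initial);
        table.upsert("k1", "v1-updated");
        table.delete("k2");
        System.out.println(table.select("k1") + " " + table.select("k2"));
        table.offlineMerge();
        System.out.println(table.pendingChanges());
    }
}
```

In this sketch, reads return the same results before and after the merge. The merge only moves pending changes out of the small mutable store and into the bulk store.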

Page 13: HugeTable:Application-Oriented Structure Data Storage System

HugeTable Access Control

Goal: Support Multiple Users from Multiple Applications , w/o mutual trust

Database privileges:

1. Meta Data: Index, Create, Drop

2. User Data: IUD

User Access Level:

1. System Administrator

2. User Manager

3. User

[Diagram: DDL/DML requests go to the ACL Module, which checks privileges against Meta Data. The Loader/Portal issues Grant/Revoke.]
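The privilege model above (metadata privileges Index, Create, Drop; user-data privileges Insert, Update, Delete; three access levels) can be sketched as a small grant/revoke/check table. Everything here is illustrative: `AclSketch` and its methods are hypothetical names, and the rule that only System Administrators and User Managers may grant or revoke is an assumption, since the slides do not say what each level is allowed to do.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Privilege-check sketch. Hypothetical names, not HugeTable classes. */
class AclSketch {
    // Metadata privileges (INDEX, CREATE, DROP) and user-data privileges (IUD).
    enum Privilege { INDEX, CREATE, DROP, INSERT, UPDATE, DELETE }

    enum Level { SYSTEM_ADMINISTRATOR, USER_MANAGER, USER }

    private final Map<String, Level> levels = new HashMap<>();
    // Grants keyed by "user@database".
    private final Map<String, Set<Privilege>> grants = new HashMap<>();

    void addUser(String user, Level level) {
        levels.put(user, level);
    }

    // Assumption for this sketch: plain users cannot manage privileges.
    private boolean canManage(String user) {
        Level level = levels.get(user);
        return level != null && level != Level.USER;
    }

    /** Returns false and changes nothing if the grantor is not allowed to grant. */
    boolean grant(String grantor, String user, String database, Privilege privilege) {
        if (!canManage(grantor)) {
            return false;
        }
        grants.computeIfAbsent(user + "@" + database, k -> EnumSet.noneOf(Privilege.class))
              .add(privilege);
        return true;
    }

    /** Returns false and changes nothing if the revoker is not allowed to revoke. */
    boolean revoke(String revoker, String user, String database, Privilege privilege) {
        if (!canManage(revoker)) {
            return false;
        }
        Set<Privilege> granted = grants.get(user + "@" + database);
        if (granted != null) {
            granted.remove(privilege);
        }
        return true;
    }

    /** The check a DDL/DML statement would go through before it runs. */
    boolean check(String user, String database, Privilege privilege) {
        return grants.getOrDefault(user + "@" + database, Collections.emptySet())
                     .contains(privilege);
    }

    public static void main(String[] args) {
        AclSketch acl = new AclSketch();
        acl.addUser("root", Level.SYSTEM_ADMINISTRATOR);
        acl.addUser("alice", Level.USER);
        acl.grant("root", "alice", "nms", Privilege.INSERT);
        System.out.println(acl.check("alice", "nms", Privilege.INSERT));
        System.out.println(acl.check("alice", "bi", Privilege.INSERT));
    }
}
```

Scoping grants per user and per database keeps users of one application from touching another application's data, which matches the stated goal of supporting multiple applications without mutual trust.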

Page 14: HugeTable:Application-Oriented Structure Data Storage System

Administration Portal

Goal: Unified HugeTable management point, decrease management effort

• Data Management: DB/TBL/IDX
• User Management: Add/Delete/Modify
• Monitor & FM: Log/Alert/Service
• Configuration: Deploy/Setup

Page 15: HugeTable:Application-Oriented Structure Data Storage System

HugeTable Application API

JDBC/SQL API
• Migration of traditional database applications
• For SQL developer
• Batch processing & interactive

MapReduce API
• Compatible with Hadoop MR API
• For data analysis, e.g. data mining
• Work with HT records format
• Access control

BigTable API
• BigTable/HBase style API
• For NoSQL application, on HFile2
• Range scan, Key-value access
• Access Control

Various kinds of Applications

    // MapReduce API: map and reduce signatures that work on HugeTable records.
    public void map(LongWritable key, HugeRecord value,
                    OutputCollector<HugeRecordRowKey, HugeRecord> output,
                    Reporter reporter);

    public void reduce(HugeRecordRowKey key,
                       Iterator<HugeRecord> values,
                       OutputCollector<HugeRecordRowKey, HugeRecord> output,
                       Reporter reporter);

    // BigTable API: print the "default" family of the first 10 scanned records.
    Table table = new Table("gdr", "admin", "admin");
    String[] families = new String[] {"default"};
    String[] partitions = new String[] {"dt=20100317"};
    int limit = 10;

    TableScannerInterface tsi = table.getScanner(
        new byte[0], new byte[] {Byte.MAX_VALUE}, families, partitions);

    for (int i = 0; i < limit; ++i) {
        GroupValue gv = tsi.next();
        for (String family : families) {
            System.out.println(family + " = " + Bytes.toString(gv.getByteValue(family)));
        }
    }

Page 16: HugeTable:Application-Oriented Structure Data Storage System

HugeTable based Telco Application Solutions

Heavy Requirements, e.g.

Batch processing

Complex data analysis

Interactive query on CDR

Statistic and reporting

[Diagram: Data Sources feed a Data Aggregator. A HugeTable Cluster + Data Mining Tool kits is shown together with a Database and a Data warehouse serving the Telco App. Workloads labeled: Interactive Simple Query, Interactive Complex Query, Complex Analyze, Reporting, Mass Data Store, Batch processing, Statistic.]

Page 17: HugeTable:Application-Oriented Structure Data Storage System

Future works

Column Storage Engine

File Format

Compression

Local Index

Global Index

Query Optimization

Join Optimization: index

Load Optimization

Parallel Load

Application Solution

Page 18: HugeTable:Application-Oriented Structure Data Storage System

Thanks for your time!

China Mobile Research Institute