JackHare: a framework for SQL to NoSQL translation using MapReduce


DESCRIPTION

Paper presentation, 2013-10-22

TRANSCRIPT

Page 1: JackHare- a framework for SQL to NoSQL translation using MapReduce

JackHare

a framework for SQL to NoSQL translation using MapReduce

Presented by 康志強, 2013.10.22

Received: 15 December 2012 / Accepted: 6 September 2013
© Springer Science+Business Media New York 2013

1

Wu-Chun Chung·Hung-Pin Lin·Shih-Chang Chen·Mon-Fong Jiang·Yeh-Ching Chung

Page 2: JackHare- a framework for SQL to NoSQL translation using MapReduce

• Introduction
• Related work
• The JackHare framework architecture
• Unstructured data processing in HBase
• Experimental results
• Conclusions

Outline

2

Page 3: JackHare- a framework for SQL to NoSQL translation using MapReduce

• Big data problems (massive data)
– Data access speed
– Data merging: the timeliness and correctness of data under parallel processing
• Hadoop MapReduce
– processes massive data in parallel
• Hadoop Distributed File System
– makes frequent data updates difficult

Introduction

3

Page 4: JackHare- a framework for SQL to NoSQL translation using MapReduce

• HBase
– places the data on a scale-out storage system
– lets changing data be manipulated transparently
– the HBase interface is not user-friendly
• JackHare
– an API that conforms to the ANSI-SQL and JDBC 4.0 specifications, used to operate Apache HBase
– uses the MapReduce framework to process the unstructured data in HBase

Introduction

4

Page 5: JackHare- a framework for SQL to NoSQL translation using MapReduce

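• Data access speed
– 1990: a hard disk held 1,370 MB, with a transfer speed of 4.4 MB/s
– Now: 1 TB, with a transfer speed of 100 MB/s
– Reading and writing data in parallel speeds up access
• Hadoop Distributed File System
– frequent data updates are difficult in such a file system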

Introduction

5
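The figures on this slide can be turned into full-scan times with a little arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and, for the parallel case, a hypothetical array of 100 disks that each hold an equal share of the data; neither assumption comes from the slides.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units


def scan_seconds(capacity_mb, rate_mb_per_s, drives=1):
    """Time to read the whole data set when it is split evenly over `drives` disks."""
    return capacity_mb / drives / rate_mb_per_s


t_1990 = scan_seconds(1_370, 4.4)                      # single 1990 drive
t_now = scan_seconds(1 * MB_PER_TB, 100)               # single 1 TB drive
t_parallel = scan_seconds(1 * MB_PER_TB, 100, drives=100)  # hypothetical 100 disks

print(f"1990 drive      : {t_1990:8.0f} s (about {t_1990 / 60:.0f} min)")
print(f"1 TB drive      : {t_now:8.0f} s (about {t_now / 3600:.1f} h)")
print(f"1 TB, 100 disks : {t_parallel:8.0f} s")
```

Capacity grew far faster than transfer speed, so a full scan went from about five minutes to almost three hours; splitting the data over many disks and reading them in parallel brings the scan back under two minutes.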

Page 6: JackHare- a framework for SQL to NoSQL translation using MapReduce

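• The data-merging problem
– Correctness
• MapReduce
– A distributed programming framework
– Map splits one job across multiple nodes
– Reduce combines the results from each node into the final result
– Data locality
– High-level query languages can be used (Pig, Hive)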

Introduction

6
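The map and reduce roles described on this slide can be illustrated with a small single-process simulation. This is only a sketch of the programming model, not Hadoop code: the input chunks, the word-count task, and the function names are made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: runs on one node over its local chunk, emitting (key, value) pairs."""
    return [(word, 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: merge the per-key values into the final result."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input, already split into one chunk per node.
chunks = [
    ["hbase stores rows", "hive queries rows"],
    ["pig scripts", "hbase scales out"],
]
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Each chunk is mapped independently, which is what lets the real framework run map tasks on the nodes that already hold the data.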

Page 7: JackHare- a framework for SQL to NoSQL translation using MapReduce

• MapReduce

Introduction

7

Page 8: JackHare- a framework for SQL to NoSQL translation using MapReduce

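• HBase
– A distributed database built on top of HDFS
– Data values are accessed using the row and the column as indexes
– Every value carries a timestamp, so the same cell can hold multiple values written at different times (versions)
– An HBase table consists of many rows and several column families
– Can serve as a data source or a storage medium for MapReduce programs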

Introduction

8
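The data model on this slide (row, column family, timestamped versions) can be mimicked with nested dictionaries. The class below is a toy in-memory stand-in written for illustration; `ToyTable` and its methods are hypothetical names and do not correspond to the HBase client API.

```python
from collections import defaultdict


class ToyTable:
    """Toy model: row -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        # A column must belong to one of the table's declared column families.
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][column][timestamp] = value

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cell = self.rows.get(row, {}).get(column, {})
        return sorted(cell.items(), reverse=True)[:versions]


table = ToyTable(families=["info"])
table.put("wafer-001", "info:status", "queued", timestamp=1)
table.put("wafer-001", "info:status", "done", timestamp=2)

latest = table.get("wafer-001", "info:status")
history = table.get("wafer-001", "info:status", versions=2)
print(latest, history)
```

Writing to the same row and column twice does not overwrite the old value; it adds a newer version, and a read returns the newest one unless more versions are requested.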

Page 9: JackHare- a framework for SQL to NoSQL translation using MapReduce

• HBase

Introduction

9

Page 10: JackHare- a framework for SQL to NoSQL translation using MapReduce

• NoSQL databases
• http://www.ithome.com.tw/itadm/article.php?c=63360&s=5

Introduction

10

Page 11: JackHare- a framework for SQL to NoSQL translation using MapReduce

• JackHare
– allows users to manipulate large-scale data with ANSI-SQL queries
– an API that conforms to the ANSI-SQL and JDBC 4.0 specifications, used to operate Apache HBase
– uses the MapReduce framework to process the unstructured data in HBase

Introduction

11

Page 12: JackHare- a framework for SQL to NoSQL translation using MapReduce

• Pig
– Runs in an HDFS and MapReduce cluster environment
– Pig Latin - a simpler procedural language
– http://pig.apache.org/docs/r0.12.0/basic.html#nestedblock
• Hive
– Provides an SQL-like query language (HiveQL) for querying data
– Can manage data stored in HDFS
– https://cwiki.apache.org/confluence/display/Hive/Tutorial

Related work

12

Page 13: JackHare- a framework for SQL to NoSQL translation using MapReduce

• YSmart
– An SQL-to-MapReduce Translator
– http://ysmart.cse.ohio-state.edu/
• S2MART
– Smart Sql to Map-Reduce Translators

Related work

13

Page 14: JackHare- a framework for SQL to NoSQL translation using MapReduce

• HadoopDB
– An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical
– HadoopDB provides SQL queries via a translation layer called SQL-MR-SQL (SMS), based on Hive.
– http://db.cs.yale.edu/hadoopdb/hadoopdb.html
• Clydesdale
– structured data processing on MapReduce
– focuses on processing data that fits a star schema

Related work

14

Page 15: JackHare- a framework for SQL to NoSQL translation using MapReduce

• Translating SQL queries into MapReduce
• HBase
– Supports frequent data updates
– Maintains the scalability and reliability of a NoSQL database

Related work

15

Page 16: JackHare- a framework for SQL to NoSQL translation using MapReduce

The JackHare framework architecture

16

Page 17: JackHare- a framework for SQL to NoSQL translation using MapReduce

• The user submits an ANSI-SQL query through an SQL client application.
• The compiler scans and parses the ANSI-SQL query.
• It looks up the related table name, column families and column qualifiers in HBase.
• It generates MapReduce code according to the query commands and metadata.

The JackHare framework architecture

17

Page 18: JackHare- a framework for SQL to NoSQL translation using MapReduce

• The framework accesses HBase and executes the MapReduce job.
• The results are wrapped and returned from the back end.
• The returned results are shown in the SQL client application according to the RDB schema.

The JackHare framework architecture

18
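The workflow on the two slides above (parse the query, look up HBase metadata, generate a job, run it, return relational rows) can be sketched end to end on a toy scale. Everything here is an assumption made for illustration: the regular-expression "parser" accepts only one query shape, `METADATA`, `compile_query` and `run` are hypothetical names, and the HBase table is a plain dictionary. It is not JackHare's compiler.

```python
import operator
import re

# Hypothetical metadata: SQL table -> HBase table name and column family.
METADATA = {"wafer": {"hbase_table": "WAFER", "family": "cf"}}
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}
QUERY_RE = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"\s+WHERE\s+(?P<col>\w+)\s*(?P<op>[=<>])\s*(?P<val>\d+)\s*$",
    re.IGNORECASE,
)


def compile_query(sql):
    """Parse the query, look up the metadata, and build a map function."""
    m = QUERY_RE.match(sql.strip())
    if m is None:
        raise ValueError("unsupported query")
    meta = METADATA[m["table"].lower()]
    family = meta["family"]
    cols = [c.strip() for c in m["cols"].split(",")]
    test, bound = OPS[m["op"]], int(m["val"])
    where_col = f"{family}:{m['col']}"

    def mapper(row_key, cells):
        # Emit the projected columns only for rows that pass the WHERE test.
        if where_col in cells and test(int(cells[where_col]), bound):
            yield row_key, {c: cells[f"{family}:{c}"] for c in cols}

    return meta["hbase_table"], mapper


def run(sql, storage):
    """Scan the table, run the map-only job, and return relational-style rows."""
    table, mapper = compile_query(sql)
    return [out for key, cells in storage[table].items() for _, out in mapper(key, cells)]


storage = {"WAFER": {
    "w1": {"cf:id": "w1", "cf:yield_pct": "91"},
    "w2": {"cf:id": "w2", "cf:yield_pct": "78"},
}}
rows = run("SELECT id, yield_pct FROM wafer WHERE yield_pct > 80", storage)
print(rows)
```

A plain SELECT ... WHERE needs no reduce step in this sketch, because filtering and projection can be done row by row in the mapper.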

Page 19: JackHare- a framework for SQL to NoSQL translation using MapReduce

The JackHare framework architecture

SQuirreL

19

Page 20: JackHare- a framework for SQL to NoSQL translation using MapReduce

• Remap the data in a relational database to HBase

Unstructured data processing in HBase

20

Page 21: JackHare- a framework for SQL to NoSQL translation using MapReduce

• Remap the data in a relational database to HBase

Unstructured data processing in HBase

21
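The figures for these two slides are not part of the transcript, so the exact remapping used by the framework is not shown here. The sketch below illustrates one plausible mapping under stated assumptions: the primary key becomes the row key, every other column becomes one cell in a single column family named `cf`, and NULL values are not stored. The function names and the sample rows are hypothetical.

```python
def remap_row(row, primary_key, family="cf"):
    """Turn one relational row (a dict) into (row_key, {'family:qualifier': value}).

    Assumed mapping: primary key -> row key, every other column -> one cell.
    """
    row_key = str(row[primary_key])
    cells = {
        f"{family}:{column}": str(value)
        for column, value in row.items()
        if column != primary_key and value is not None  # NULLs are not stored
    }
    return row_key, cells


def remap_table(rows, primary_key, family="cf"):
    """Remap every row of a relational table into a dict keyed by row key."""
    return dict(remap_row(row, primary_key, family) for row in rows)


# Hypothetical rows for a LOT table.
lot_rows = [
    {"lot_id": "L01", "product": "A7", "owner": None},
    {"lot_id": "L02", "product": "B2", "owner": "fab2"},
]
hbase_lot = remap_table(lot_rows, "lot_id")
print(hbase_lot)
```

Because a missing cell costs nothing in this layout, sparse relational rows map naturally onto the column-family model.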

Page 22: JackHare- a framework for SQL to NoSQL translation using MapReduce

• Analysis of SQL clauses
– SELECT, FROM and WHERE clauses
– Extended clauses
  • GROUP BY
  • HAVING
  • ORDER BY
  • JOIN
  • AGGREGATE FUNCTIONs

Unstructured data processing in HBase

22
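The extended clauses listed on this slide fit the map and reduce phases in a fairly direct way. The sketch below simulates, in a single process, how a query such as `SELECT lot_id, COUNT(*), AVG(yield_pct) FROM wafer GROUP BY lot_id HAVING COUNT(*) >= 2 ORDER BY lot_id` could be evaluated. The query, the table contents, and the column names are invented for the example, and this is not the code the framework generates.

```python
from collections import defaultdict


def mapper(row_key, cells):
    """GROUP BY: emit the grouping column as the key."""
    yield cells["cf:lot_id"], float(cells["cf:yield_pct"])


def reducer(lot_id, values):
    """Aggregate functions run per key; HAVING filters the aggregated group."""
    count, avg = len(values), sum(values) / len(values)
    if count >= 2:  # HAVING COUNT(*) >= 2
        yield lot_id, count, avg


def run_job(table):
    groups = defaultdict(list)
    for row_key, cells in table.items():
        for key, value in mapper(row_key, cells):
            groups[key].append(value)  # shuffle: group map output by key
    results = [out for key, values in groups.items() for out in reducer(key, values)]
    return sorted(results)  # ORDER BY lot_id


# Hypothetical WAFER table stored as row key -> cells.
wafers = {
    "w1": {"cf:lot_id": "L01", "cf:yield_pct": "90"},
    "w2": {"cf:lot_id": "L01", "cf:yield_pct": "80"},
    "w3": {"cf:lot_id": "L02", "cf:yield_pct": "70"},
}
result = run_job(wafers)
print(result)
```

The grouping column becomes the map output key, so the shuffle step does the grouping and the reducer only has to aggregate and filter. A JOIN could follow the same pattern by emitting the join column as the key from both tables, which is the usual reduce-side join.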

Page 23: JackHare- a framework for SQL to NoSQL translation using MapReduce

• Experimental environment
– two Intel Xeon L5640 CPUs, 24 GB RAM and 3 TB HD
– 16-node virtual machine cluster on four physical machines
– Hadoop 0.20.203 (15 October, 2013: release 2.2.0 available)
– HBase 0.92.0 (2013-09-20 | Version: 0.97.0-SNAPSHOT)
– Hive 0.9.0
– Java 1.6.0, maximum heap size is 512 MB

Experimental results

23

Page 24: JackHare- a framework for SQL to NoSQL translation using MapReduce

• Experimental environment
– Node: two cores at 2 GHz with 4 GB RAM and 400 GB storage space
– MySQL: two cores at 2 GHz, 4 GB RAM and 800 GB hard disk
– 3 tables: LOT, WAFER and DIE

Experimental results

24

Page 25: JackHare- a framework for SQL to NoSQL translation using MapReduce

• Results

Experimental results

25

Page 26: JackHare- a framework for SQL to NoSQL translation using MapReduce

Experimental results

26

Page 27: JackHare- a framework for SQL to NoSQL translation using MapReduce

Experimental results

27

Page 28: JackHare- a framework for SQL to NoSQL translation using MapReduce

Experimental results

28

Page 29: JackHare- a framework for SQL to NoSQL translation using MapReduce

Conclusions

29

Page 30: JackHare- a framework for SQL to NoSQL translation using MapReduce

• End of presentation.

30