Introduction of HBase
Reporter: Hu Yi
2009-3-11
Overview
HBase is an Apache open source project whose goal is to provide storage for the Hadoop Distributed Computing Environment.
Data is logically organized into tables, rows and columns.
Outline
Data Model
Architecture and Implementation
Examples & Tests
Conceptual View
A data row has a sortable row key and an arbitrary number of columns.
A timestamp is assigned automatically if the client does not supply one explicitly.
Column names take the form <family>:<label>.

| Row key | Timestamp | Column “contents:” | Column “anchor:” |
|---|---|---|---|
| “com.apache.www” | t12 | “<html>…” | |
| | t11 | “<html>…” | |
| | t10 | | “anchor:apache.com” “APACHE” |
| “com.cnn.www” | t15 | | “anchor:cnnsi.com” “CNN” |
| | t13 | | “anchor:my.look.ca” “CNN.com” |
| | t6 | “<html>…” | |
| | t5 | “<html>…” | |
| | t3 | “<html>…” | |
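The conceptual view can be sketched as nested sorted maps. This is a minimal in-memory illustration, not the HBase API: the class `ConceptualTable` and its methods are hypothetical names, and values are plain strings for brevity.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory model of the conceptual view; this is not the HBase API. */
final class ConceptualTable {
    // row key (ascending) -> "family:label" (ascending) -> timestamp (descending) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one cell version under an explicit timestamp. */
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Stores one cell version, using the current time when no timestamp is given. */
    void put(String rowKey, String column, String value) {
        put(rowKey, column, System.currentTimeMillis(), value);
    }

    /** Returns the newest version of a cell, or null when the cell is empty. */
    String get(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ConceptualTable table = new ConceptualTable();
        table.put("com.cnn.www", "contents:", 5L, "<html>v5");
        table.put("com.cnn.www", "contents:", 6L, "<html>v6");
        // The version with the highest timestamp is returned.
        System.out.println(table.get("com.cnn.www", "contents:"));
    }
}
```

Cells that were never written simply have no map entry, which mirrors the sparse layout of the table.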
Physical Storage View
Physically, tables are stored on a per-column family basis.
Because the storage format is column-oriented, empty cells are not stored.
Each column family is managed by an HStore.
| Row key | Timestamp | Column “contents:” |
|---|---|---|
| “com.apache.www” | t12 | “<html>…” |
| | t11 | “<html>…” |
| “com.cnn.www” | t6 | “<html>…” |
| | t5 | “<html>…” |
| | t3 | “<html>…” |

| Row key | Timestamp | Column “anchor:” |
|---|---|---|
| “com.apache.www” | t10 | “anchor:apache.com” “APACHE” |
| “com.cnn.www” | t9 | “anchor:cnnsi.com” “CNN” |
| | t8 | “anchor:my.look.ca” “CNN.com” |
[Figure: an HStore. Labels: Memcache, Data MapFile (Key/Value), Index MapFile (Index key).]
Row Ranges: Regions
Cells are sorted by row key and column in ascending order, and by timestamp in descending order.
Physically, tables are broken into row ranges called regions; each region contains the rows from a start key to an end key.
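| Row key | Timestamp | Column “contents:” | Column “anchor:” |
|---|---|---|---|
| aaaa | t15 | | anchor:cc value |
| | t13 | ba | |
| | t12 | bb | |
| | t11 | | anchor:cd value |
| | t10 | bc | |
| aaab | t14 | | |
| aaac | | | anchor:be value |
| aaad | | | anchor:ad value |
| aaae | t5 | ae | |
| | t3 | af | |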
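Because regions are contiguous row ranges, finding the region for a row key amounts to finding the greatest region start key that is not larger than the row key. The following is a minimal sketch under that assumption; `RegionLocator` and the region names are hypothetical and are not part of HBase.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy lookup of the region that covers a row key; not the HBase API. */
final class RegionLocator {
    // region start key -> region name; the empty start key marks the first region
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    /** Returns the region whose start key is the greatest one that is <= rowKey. */
    String locate(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        locator.addRegion("", "region-1");      // rows before "aaac"
        locator.addRegion("aaac", "region-2");  // rows from "aaac" onward
        System.out.println(locator.locate("aaab"));
        System.out.println(locator.locate("aaad"));
    }
}
```

In this sketch a region's end key is implied by the next region's start key, so only start keys are stored.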
Three major components
The HBaseMaster
The HRegionServer
The HBase client
HBaseMaster
Assign regions to HRegionServers.
1. ROOT region locates all the META regions.
2. META region maps a number of user regions.
3. Assign user regions to the HRegionServers.
Enable or disable tables and change table schemas.
Monitor the health of each HRegionServer.
[Figure: one ROOT region points to two META regions, which point to three USER regions.]
ROOT/META Table
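Each row in the ROOT and META tables is approximately 1 KB in size. At the default region size of 256 MB, one catalog region therefore holds about $2^{28} / 2^{10} = 2^{18}$ rows:

ROOT table: $2^{18}$ META regions
META regions: $2^{18} \times 2^{18} = 2^{36}$ USER regions
USER regions: $2^{36} \times 2^{28}\ \text{bytes} = 2^{54}\ \text{KB} = 2^{64}\ \text{bytes} = 2^{24}\ \text{TB}$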
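The capacity estimate on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes the slide's round figures (1 KB per catalog row, 256 MB per region, a single ROOT region); the class and method names are hypothetical.

```java
import java.math.BigInteger;

/** Reproduces the ROOT/META capacity estimate from the slide's round figures. */
final class CatalogCapacity {
    static final long REGION_SIZE_BYTES = 256L * 1024 * 1024; // 2^28, default region size
    static final long CATALOG_ROW_BYTES = 1024L;              // 2^10, approximate row size

    /** Rows (region entries) that fit into one ROOT or META region: 2^18. */
    static long rowsPerCatalogRegion() {
        return REGION_SIZE_BYTES / CATALOG_ROW_BYTES;
    }

    /** One ROOT region maps 2^18 META regions, each mapping 2^18 user regions: 2^36. */
    static long maxUserRegions() {
        return rowsPerCatalogRegion() * rowsPerCatalogRegion();
    }

    /** Total addressable user data in bytes: 2^36 regions of 2^28 bytes each. */
    static BigInteger maxUserBytes() {
        return BigInteger.valueOf(maxUserRegions())
                .multiply(BigInteger.valueOf(REGION_SIZE_BYTES));
    }

    public static void main(String[] args) {
        System.out.println("META regions: " + rowsPerCatalogRegion());
        System.out.println("USER regions: " + maxUserRegions());
        System.out.println("User bytes:   " + maxUserBytes());
        System.out.println("User TB:      " + maxUserBytes().shiftRight(40));
    }
}
```

`BigInteger` is used for the final product because $2^{64}$ does not fit into a signed 64-bit `long`.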
HRegionServer
The HRegionServer handles write requests, read requests, cache flushes, compactions, and region splits.
Write Requests
[Figure: write path for the example table. Labels: HLog; Hstore1 with Memcache1, Mapfile1.1 and Mapfile1.2; Hstore2 with Memcache2.]
HRegionServer
Read Requests
[Figure: read path for the example table. Labels: Hstore1, Memcache1, Mapfile1.1, Mapfile1.2.]
HRegionServer
Cache Flushes
[Figure: cache flush for the example table. Labels: Hstore1, Memcache1, HLog; the MapFile list grows from Mapfile1.1, Mapfile1.2 to Mapfile1.1, Mapfile1.2, Mapfile1.3.]
HRegionServer
Compactions
[Figure: compaction for the example table. Labels: Hstore1, Memcache1; Mapfile1.1 and Mapfile1.2 are combined into Mapfile1.]
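The write, read, cache flush, and compaction steps on the HRegionServer slides can be sketched with a toy store. This is a simplified model under stated assumptions, not HBase code: `ToyHStore` is a hypothetical class, it keeps one version per key, an in-memory list stands in for the HLog, and sorted maps stand in for MapFiles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of one HStore: a write buffer plus a list of immutable sorted files. */
final class ToyHStore {
    private final List<String> log = new ArrayList<>();                        // stands in for the HLog
    private TreeMap<String, String> memcache = new TreeMap<>();                // in-memory write buffer
    private final List<TreeMap<String, String>> mapFiles = new ArrayList<>();  // oldest file first

    /** Write: record the edit in the log first, then buffer it in the memcache. */
    void write(String key, String value) {
        log.add(key + "=" + value);
        memcache.put(key, value);
    }

    /** Read: check the memcache first, then the map files from newest to oldest. */
    String read(String key) {
        String value = memcache.get(key);
        if (value != null) return value;
        for (int i = mapFiles.size() - 1; i >= 0; i--) {
            value = mapFiles.get(i).get(key);
            if (value != null) return value;
        }
        return null;
    }

    /** Cache flush: the memcache becomes a new map file and a fresh buffer is started. */
    void flush() {
        if (memcache.isEmpty()) return;
        mapFiles.add(memcache);
        memcache = new TreeMap<>();
    }

    /** Compaction: merge all map files into one; entries from newer files win. */
    void compact() {
        if (mapFiles.size() < 2) return;
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : mapFiles) {
            merged.putAll(file); // iterating oldest first lets newer values overwrite older ones
        }
        mapFiles.clear();
        mapFiles.add(merged);
    }

    int mapFileCount() {
        return mapFiles.size();
    }

    public static void main(String[] args) {
        ToyHStore store = new ToyHStore();
        store.write("com.cnn.www", "v1");
        store.flush();
        store.write("com.cnn.www", "v2");
        store.flush();
        store.compact();
        System.out.println(store.read("com.cnn.www") + " in " + store.mapFileCount() + " map file(s)");
    }
}
```

Reading newest-first and merging oldest-first both serve the same purpose: the most recent write for a key is the one that is returned.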
HRegionServer
Region Splits
[Figure: region split for the example table. Labels: Hstore1, Memcache1, Mapfile1.]
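A region split divides one row range into two adjacent ranges. The sketch below illustrates only that idea, assuming the split point is the middle row key of an in-memory sorted map; how HBase chooses the split point and when it triggers a split is not shown on the slide, and `RegionSplitter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy region split: cut a sorted row range in two at its middle row key. */
final class RegionSplitter {
    /** Returns two maps: rows before the middle key, and rows from the middle key onward. */
    static List<NavigableMap<String, String>> split(NavigableMap<String, String> region) {
        if (region.size() < 2) {
            throw new IllegalArgumentException("need at least two rows to split");
        }
        List<String> keys = new ArrayList<>(region.keySet());
        String midKey = keys.get(keys.size() / 2);
        List<NavigableMap<String, String>> halves = new ArrayList<>();
        halves.add(new TreeMap<String, String>(region.headMap(midKey, false))); // [start, midKey)
        halves.add(new TreeMap<String, String>(region.tailMap(midKey, true)));  // [midKey, end)
        return halves;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> region = new TreeMap<>();
        for (String key : new String[] {"aaaa", "aaab", "aaac", "aaad", "aaae"}) {
            region.put(key, "row " + key);
        }
        List<NavigableMap<String, String>> halves = split(region);
        System.out.println(halves.get(0).keySet() + " | " + halves.get(1).keySet());
    }
}
```

The middle key becomes the start key of the second half, so every row still belongs to exactly one range.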
HBase Client
[Figure sequence: the HBase client contacts, in turn:]
1. the ROOT region
2. a META region
3. the user region (the location information is cached)
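The client-side lookup through ROOT, META, and user regions, with caching of the result, can be sketched as follows. This is an illustrative model only: `ToyRegionLookup`, `Region`, and the server names are hypothetical, the catalog is held in local maps instead of being fetched over the network, and an empty end key is assumed to mean "no upper bound".

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy three-level region lookup with a client-side location cache. */
final class ToyRegionLookup {
    static final class Region {
        final String startKey, endKey, server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        /** True when rowKey lies in [startKey, endKey); an empty endKey means unbounded. */
        boolean contains(String rowKey) {
            return rowKey.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || rowKey.compareTo(endKey) < 0);
        }
    }

    // ROOT: META start key -> META region; META region: user start key -> user region
    private final TreeMap<String, TreeMap<String, Region>> root = new TreeMap<>();
    private final TreeMap<String, Region> cache = new TreeMap<>();
    private int catalogLookups = 0;

    void addUserRegion(String metaStartKey, Region region) {
        root.computeIfAbsent(metaStartKey, k -> new TreeMap<String, Region>())
            .put(region.startKey, region);
    }

    Region locate(String rowKey) {
        // Serve from the cache when a cached region already covers the row key.
        Map.Entry<String, Region> hit = cache.floorEntry(rowKey);
        if (hit != null && hit.getValue().contains(rowKey)) return hit.getValue();

        catalogLookups++;
        // Step 1: the ROOT region names the META region for this row key.
        Map.Entry<String, TreeMap<String, Region>> meta = root.floorEntry(rowKey);
        if (meta == null) throw new IllegalStateException("no META region for " + rowKey);
        // Step 2: the META region names the user region.
        Map.Entry<String, Region> user = meta.getValue().floorEntry(rowKey);
        if (user == null || !user.getValue().contains(rowKey)) {
            throw new IllegalStateException("no user region for " + rowKey);
        }
        // Step 3: remember the location so later requests skip the catalog.
        cache.put(user.getKey(), user.getValue());
        return user.getValue();
    }

    int catalogLookups() {
        return catalogLookups;
    }

    public static void main(String[] args) {
        ToyRegionLookup client = new ToyRegionLookup();
        client.addUserRegion("", new Region("", "m", "server-1"));
        client.addUserRegion("", new Region("m", "", "server-2"));
        System.out.println(client.locate("apple").server);
        System.out.println(client.locate("banana").server + ", catalog lookups: " + client.catalogLookups());
    }
}
```

Caching whole regions instead of single row keys means that any later row in the same range is resolved without touching the catalog.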
Create MyTable
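import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// config is not defined on the slide; a default HBaseConfiguration is assumed here.
HBaseConfiguration config = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(config);
// One descriptor per column family.
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");
// Describe the table, attach both families, and create it.
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);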
| Row Key | Timestamp | columnFamily1: | columnFamily2: |
|---|---|---|---|
Insert Values
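// table is not defined on the slide; an HTable opened on "MyTable" is assumed here.
HTable table = new HTable(config, "MyTable");
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
table.commit(batchUpdate);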
| Row Key | Timestamp | columnFamily1: (label) | columnFamily1: (value) |
|---|---|---|---|
| myRow | ts1 | labela | labela value |
| | ts2 | labelb | labelb value |
Insert
[Figure: HBase insert test. The y-axis runs from 0 to 160000 (unit not given); the x-axis pairs the values 100000, 10000, 1000, 100, 10, 1 with 1, 10, 100, 1000, 10000, 100000.]
Insert
[Figure: insert time in ms (logarithmic axis, 1 to 1000000) for HBase and MySQL, with x-axis values from 10 to 100000; axis label: “Row*10 Column=1”.]
Search
| Row key | Timestamp | Column “anchor:” |
|---|---|---|
| “com.apache.www” | t12 | |
| | t11 | |
| | t10 | “anchor:apache.com” “APACHE” |
| “com.cnn.www” | t9 | “anchor:cnnsi.com” “CNN” |
| | t8 | “anchor:my.look.ca” “CNN.com” |
| | t6 | |
| | t5 | |
| | t3 | |
Select value from table where key=‘com.apache.www’ AND label=‘anchor:apache.com’
Search
Scanner
Select value from table where anchor=‘cnnsi.com’

| Row key | Timestamp | Column “anchor:” |
|---|---|---|
| “com.apache.www” | t12 | |
| | t11 | |
| | t10 | “anchor:apache.com” “APACHE” |
| “com.cnn.www” | t9 | “anchor:cnnsi.com” “CNN” |
| | t8 | “anchor:my.look.ca” “CNN.com” |
| | t6 | |
| | t5 | |
| | t3 | |
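The two search styles differ in how much data they touch: a lookup by row key and label goes straight to one cell, while a search by label alone has to scan every row. The sketch below models both on an in-memory sorted map; it is not the HBase `get` or scanner API, `AnchorSearch` is a hypothetical class, and only the newest version of each cell is kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy comparison of a point lookup and a full scan over the "anchor:" family. */
final class AnchorSearch {
    /** Builds the example table: row key -> column label -> value. */
    static TreeMap<String, TreeMap<String, String>> sampleTable() {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        table.computeIfAbsent("com.apache.www", k -> new TreeMap<String, String>())
             .put("anchor:apache.com", "APACHE");
        table.computeIfAbsent("com.cnn.www", k -> new TreeMap<String, String>())
             .put("anchor:cnnsi.com", "CNN");
        table.computeIfAbsent("com.cnn.www", k -> new TreeMap<String, String>())
             .put("anchor:my.look.ca", "CNN.com");
        return table;
    }

    /** Point lookup: the row key and the column label are both known. */
    static String get(Map<String, TreeMap<String, String>> table, String rowKey, String column) {
        TreeMap<String, String> row = table.get(rowKey);
        return row == null ? null : row.get(column);
    }

    /** Scan: the row key is unknown, so every row is visited in key order. */
    static List<String> scan(Map<String, TreeMap<String, String>> table, String column) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) hits.add(row.getKey() + " -> " + value);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, String>> table = sampleTable();
        System.out.println(get(table, "com.apache.www", "anchor:apache.com"));
        System.out.println(scan(table, "anchor:cnnsi.com"));
    }
}
```

The cost of `scan` grows with the number of rows, while `get` depends only on the map lookups for a single row.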
Summary
The column-oriented design makes modification more flexible.
Performance is higher for accesses clustered by row key.
Future work
More testing
Search optimization
Thank you