HBase: A column-centered database

Upload: caryn-weber

Post on 31-Dec-2015




Overview

• An Apache project
• Influenced by Google’s BigTable
• Built on Hadoop
  ▫ A distributed file system
  ▫ Supports Map-Reduce
• Goals
  ▫ Scalability
  ▫ Versions
  ▫ Compression
  ▫ In-memory tables


Architectural issues

• A cluster of nodes is the general architecture
• Standalone mode for a single machine
• There is a Java API, which can also be accessed with JRuby
• There is a JRuby shell


Modeling constructs

• Table
  ▫ Has a row key
  ▫ A series of column families
      Each column in a family has a column name and a value
• Operations
  ▫ Create table
  ▫ Insert a row with “Put” command
      Only one column at a time
  ▫ Query a table with a “Get” command
      (uses a table name and a row key)
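The table, "Put", and "Get" constructs above can be pictured with a small in-memory model. This is only a sketch in Python, not the HBase API: the class name `ToyTable`, the `family:qualifier` column spelling, and the sample table `wiki` are assumptions made for illustration.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a table: row key -> {"family:qualifier": value}.

    Hypothetical illustration only; this is not the HBase client API.
    """

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)   # column families are fixed at creation
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Write one column value for one row (one column at a time)."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][column] = value

    def get(self, row_key):
        """Return a copy of all columns stored under a row key."""
        return dict(self.rows.get(row_key, {}))


# "Create table" with two column families, then Put and Get.
wiki = ToyTable("wiki", ["text", "revision"])
wiki.put("Home", "text:body", "Welcome")
wiki.put("Home", "revision:author", "alice")
home = wiki.get("Home")
```

A row is built up by repeated puts, and a get needs only the table and the row key, which mirrors the two operations listed on the slide.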


Filters

• Scan
  ▫ Can get a series of rows lying between two key values (a start row and a stop row)
  ▫ Can provide a filter for such things as column families and timestamps
  ▫ Filters can be pushed to the server
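A scan over a key range with a filter can be sketched as follows. This is a toy Python model, not the HBase API: the function `scan`, the `column_filter` parameter, and the sample rows are assumptions, and the stop row is treated as exclusive. Here the filter runs locally; the point of pushing a filter to the server is that rejected data never has to travel to the client.

```python
from bisect import bisect_left


def scan(rows, start_row, stop_row, column_filter=None):
    """Yield (row_key, columns) for start_row <= row_key < stop_row.

    rows maps row key -> {"family:qualifier": value}; column_filter is an
    optional predicate on the column name (a hypothetical stand-in for a filter).
    """
    keys = sorted(rows)                      # rows are visited in key order
    lo = bisect_left(keys, start_row)
    hi = bisect_left(keys, stop_row)
    for key in keys[lo:hi]:
        columns = rows[key]
        if column_filter is not None:
            columns = {c: v for c, v in columns.items() if column_filter(c)}
        if columns:                          # skip rows with nothing left
            yield key, columns


rows = {
    "apple": {"text:body": "a fruit", "revision:author": "alice"},
    "banana": {"text:body": "also a fruit"},
    "cherry": {"revision:author": "bob"},
    "date": {"text:body": "a fruit or a day"},
}


def only_text(column):
    """Keep only columns in the 'text' family."""
    return column.startswith("text:")


result = list(scan(rows, "apple", "date", column_filter=only_text))
```

In this sketch `result` holds the rows `apple` and `banana`: `cherry` has no `text` columns and `date` is the excluded stop row.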


Updating

• When a column value is written to the db, old values are kept and organized by timestamp
  ▫ Each such value is a cell
• You can assign timestamps explicitly
  ▫ Otherwise, the current timestamp is used on insert
  ▫ A get returns the most recent version
• Operations that alter column family structures are expensive
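The versioning behaviour above can be illustrated with a toy model of a single column. This is a Python sketch under stated assumptions, not HBase code: the class `VersionedColumn`, the millisecond timestamps, and the `versions` parameter are illustrative names.

```python
import time


class VersionedColumn:
    """Toy model of one column of one row: every write is kept as a cell."""

    def __init__(self):
        self.cells = {}  # timestamp -> value

    def put(self, value, timestamp=None):
        """Store a value; use the current time if no timestamp is given."""
        if timestamp is None:
            timestamp = int(time.time() * 1000)  # assumed unit: milliseconds
        self.cells[timestamp] = value

    def get(self, versions=1):
        """Return up to `versions` cells as (timestamp, value), newest first."""
        newest_first = sorted(self.cells.items(), reverse=True)
        return newest_first[:versions]


col = VersionedColumn()
col.put("first draft", timestamp=100)    # explicit timestamps
col.put("second draft", timestamp=200)
latest = col.get()                       # most recent version only
history = col.get(versions=2)            # older cells are still available
```

A plain `get()` returns only the newest cell, while asking for more versions exposes the older values that were kept rather than overwritten.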


Other characteristics

• Text compression
• Rows are stored in order by key value
• A region is a contiguous range of rows
  ▫ Each region is stored in a single region server
  ▫ Regions can be automatically merged and split
• Uses write-ahead logging to prevent loss of data with node failures
  ▫ This is called journaling in Unix file systems
• Supports a master/slave multi-cluster strategy
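The idea behind write-ahead logging can be shown in a few lines: record each change durably in a log before applying it in memory, then replay the log after a failure. The sketch below is a generic Python illustration, not HBase's log format or code; `ToyStore`, the JSON-lines file, and the file name `toy.wal` are assumptions.

```python
import json
import os
import tempfile


class ToyStore:
    """Key-value store that logs every write before applying it in memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memory = {}
        self._replay()  # rebuild in-memory state from the log, if one exists

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.memory[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then update the in-memory table.
        self.memory[key] = value


log_path = os.path.join(tempfile.mkdtemp(), "toy.wal")
store = ToyStore(log_path)
store.put("row1", "v1")
store.put("row1", "v2")

# Simulate a crash: the in-memory state is gone, but the log survives.
recovered = ToyStore(log_path)
```

After the simulated restart, `recovered.memory` again maps `row1` to `v2`, because the log was replayed in write order.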


An HBase cluster
(figure taken from: http://www.packtpub.com/article/hbase-basic-performance-tuning)


Tasks of components

• The ZooKeeper cluster is a coordination service for the HBase cluster
  ▫ Finds the correct server
  ▫ Selects the master
• Master allocates regions and handles load balancing
• Region servers hold the regions
• Hadoop supports Map-Reduce
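Because rows are kept in key order and each region covers a range of keys, finding the server for a row amounts to a lookup in a sorted list of region start keys. The sketch below shows only that lookup idea in Python; it is not how HBase or ZooKeeper implement it, and the start keys and server names are made up.

```python
from bisect import bisect_right

# Hypothetical layout: each region covers [start_key, next start_key).
# The empty string stands for "from the very first row".
region_starts = ["", "g", "p"]
region_servers = ["server-a", "server-b", "server-a"]


def locate(row_key):
    """Return (region start key, server name) for the region holding row_key."""
    index = bisect_right(region_starts, row_key) - 1
    return region_starts[index], region_servers[index]


first = locate("apple")    # falls in the region starting at ""
middle = locate("grape")   # falls in the region starting at "g"
last = locate("zebra")     # falls in the region starting at "p"
```

One server may hold several regions, as `server-a` does here; reassigning a region to another server only changes the second list, which is the kind of bookkeeping the slide attributes to the master.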


Some key concepts

• De-normalization
• Fast random retrieval by row key
• Use of a multi-component architecture to leverage existing software tools
• Control over which data is kept in memory