1
HBase: A column-centered database
2
Overview
• An Apache project
• Influenced by Google’s BigTable
• Built on Hadoop
  ▫ A distributed file system
  ▫ Supports Map-Reduce
• Goals
  ▫ Scalability
  ▫ Versions
  ▫ Compression
  ▫ In-memory tables
3
Architectural issues
• A cluster of nodes is the general architecture
• Standalone mode for a single machine
• There is a Java API, which can also be accessed from JRuby
• There is a JRuby shell
4
Modeling constructs
• Table
  ▫ Has a row key
  ▫ A series of column families; each entry has a column name and a value
• Operations
  ▫ Create table
  ▫ Insert a row with the “Put” command (only one column at a time)
  ▫ Query a table with the “Get” command (uses a table name and a row key)
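
The table, Put, and Get ideas on this slide can be pictured with a small in-memory model. This is only a sketch in plain Python, not the HBase client API: the class `ToyTable`, the table name `users`, and the `family:qualifier` column strings are all hypothetical names chosen for illustration.

```python
# Toy in-memory model of an HBase-style table (NOT the real HBase API).
class ToyTable:
    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # column families are fixed when the table is created
        self.rows = {}                 # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        """Write one column of one row, mirroring the one-column-per-Put rule."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        """Return a copy of all columns stored under a row key (empty if absent)."""
        return dict(self.rows.get(row_key, {}))


users = ToyTable("users", ["info", "stats"])
users.put("row1", "info:name", "Ada")     # one Put per column
users.put("row1", "stats:logins", "3")
print(users.get("row1"))
```

A row is assembled from several single-column writes, and a read needs only the table and the row key.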
5
Filters
• Scan
  ▫ Can get a series of rows based on two key values (a start key and a stop key)
  ▫ Can provide a filter for such things as column families and timestamps
  ▫ Filters can be pushed to the server
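
A scan over a key range with a filter can be sketched as follows. This is a plain-Python illustration, not the HBase `Scan` API: the function `scan`, the sample `rows`, and the filter `only_info` are hypothetical, and the sketch assumes an inclusive start key and an exclusive stop key.

```python
import bisect

def scan(rows, start_key, stop_key, column_filter=None):
    """Yield (key, columns) for start_key <= key < stop_key, in key order."""
    keys = sorted(rows)                          # rows are kept in key order
    lo = bisect.bisect_left(keys, start_key)     # first key >= start_key
    hi = bisect.bisect_left(keys, stop_key)      # first key >= stop_key (excluded)
    for key in keys[lo:hi]:
        columns = rows[key]
        if column_filter is not None:
            columns = {c: v for c, v in columns.items() if column_filter(c, v)}
        if columns:                              # skip rows with nothing left after filtering
            yield key, columns

rows = {
    "user01": {"info:name": "Ada", "stats:logins": "3"},
    "user02": {"info:name": "Bob"},
    "user03": {"stats:logins": "7"},
    "user04": {"info:name": "Cy"},
}

def only_info(column, value):
    """Keep only columns in the 'info' column family."""
    return column.startswith("info:")

result = list(scan(rows, "user01", "user04", only_info))
print(result)
```

In a real cluster the filter would run on the server side, so rows that do not match never cross the network; here everything runs in one process.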
6
Updating
• When a column value is written to the db, old values are kept and organized by timestamp
  ▫ Each such value is a cell
• You can assign timestamps explicitly
  ▫ Otherwise, the current timestamp is used on insert
  ▫ A get returns the most recent version
• Operations that alter column family structures are expensive
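
The versioning behaviour on this slide can be sketched with a toy column that keeps every write. This is an illustration only, not HBase code: the class `VersionedColumn`, the `max_versions` parameter, and the millisecond clock are assumptions made for the example.

```python
import time

class VersionedColumn:
    """Keeps every written value of one column, keyed by timestamp."""

    def __init__(self):
        self.cells = {}  # timestamp -> value; each entry plays the role of a cell

    def put(self, value, timestamp=None):
        # Use the caller's timestamp if given, otherwise the current time in ms.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self.cells[timestamp] = value

    def get(self, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        newest_first = sorted(self.cells.items(), reverse=True)
        return newest_first[:max_versions]

col = VersionedColumn()
col.put("v1", timestamp=100)
col.put("v2", timestamp=200)
col.put("v3", timestamp=150)   # written last, but not the newest timestamp

latest = col.get()                  # only the most recent version
history = col.get(max_versions=3)   # older values are still available
print(latest, history)
```

The value returned by a plain get depends on the timestamps, not on the order in which the writes arrived.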
7
Other characteristics
• Text compression
• Rows are stored in order by key value
• A region is a contiguous set of rows
  ▫ Each is stored in a single region server
  ▫ Regions can be automatically merged and split
• Uses write-ahead logging to prevent loss of data with node failures
  ▫ This is called journaling in Unix file systems
• Supports a master/slave multi-cluster strategy
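
Write-ahead logging can be sketched in a few lines: every edit is appended to a log before it is applied in memory, so the in-memory state can be rebuilt after a crash. This is a toy model, not HBase internals; the class `ToyRegionServer`, the attribute name `memstore`, and the Python list standing in for a durable log file are all assumptions.

```python
class ToyRegionServer:
    def __init__(self, log):
        self.log = log        # stands in for a durable, append-only log file
        self.memstore = {}    # in-memory state, lost if the node fails

    def put(self, key, value):
        self.log.append((key, value))  # 1. record the edit in the log first
        self.memstore[key] = value     # 2. only then apply it in memory

    @classmethod
    def recover(cls, log):
        """Rebuild the in-memory state by replaying the log in order."""
        server = cls(log)
        for key, value in log:
            server.memstore[key] = value
        return server

log = []
server = ToyRegionServer(log)
server.put("row1", "a")
server.put("row2", "b")
server.put("row1", "c")

del server                                   # simulate a node failure
recovered = ToyRegionServer.recover(log)     # replay the surviving log
print(recovered.memstore)
```

Because the log is written first, an edit that reached memory is always present in the log, which is what makes the replay complete.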
8
An HBase cluster (taken from: http://www.packtpub.com/article/hbase-basic-performance-tuning)
9
Tasks of components
• The ZooKeeper cluster is a coordination service for the HBase cluster
  ▫ Finds the correct server
  ▫ Selects the master
• Master allocates regions and handles load balancing
• Region servers hold the regions
• Hadoop supports Map-Reduce
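
Finding the server responsible for a row key can be pictured as a lookup in a sorted list of region start keys. This is a simplified sketch rather than how HBase actually resolves regions: the `REGIONS` table, the server names, and the function `locate` are hypothetical.

```python
import bisect

# Hypothetical region map: each region begins at its start key and
# runs up to (but not including) the next region's start key.
REGIONS = [
    ("",  "server-a"),   # keys before "g"
    ("g", "server-b"),   # keys from "g" up to "p"
    ("p", "server-c"),   # keys from "p" onward
]
START_KEYS = [start for start, _ in REGIONS]

def locate(row_key):
    """Return the server holding the region whose key range contains row_key."""
    index = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[index][1]

for key in ["apple", "g", "mango", "zebra"]:
    print(key, "->", locate(key))
```

Since rows are stored in key order, a binary search over the start keys is enough to pick the region, and the region identifies the server.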
10
Some key concepts
• De-normalization
• Fast random retrieval by row key
• Use of a multi-component architecture to leverage existing software tools
• Controllable in-memory selection