HBase: A column-centered database

Upload: oscar-glenn

Post on 29-Dec-2015


TRANSCRIPT

Page 1: HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce

HBase: A column-centered database


Overview

• An Apache project
• Influenced by Google’s BigTable
• Built on Hadoop
  ▫ A distributed file system
  ▫ Supports Map-Reduce
• Goals
  ▫ Scalability
  ▫ Versions
  ▫ Compression
  ▫ In-memory tables


Architectural issues

•Cluster of nodes is the general architecture

• Standalone mode for a single machine
• There is a Java API, which can also be accessed from JRuby
• There is a JRuby shell


Modeling constructs

• Table
  ▫ Has a row key
  ▫ A series of column families
    Each column has a column name and a value
• Operations
  ▫ Create table
  ▫ Insert a row with “Put” command
    Only one column at a time (in the shell)
  ▫ Query a table with a “Get” command (uses a table name and a row key)
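
The Put and Get operations above can be pictured with a small in-memory model. This is only a sketch of the data model, not the HBase client API: `ToyTable`, its methods, and the `family:qualifier` column naming used here are illustrative assumptions.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical stand-in for a table: row key -> family -> column -> value."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value):
        # A column is addressed as "family:qualifier"; one column per put.
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key):
        # Return a flat {"family:qualifier": value} view of a single row.
        row = self.rows.get(row_key, {})
        return {
            f"{family}:{qualifier}": value
            for family, columns in row.items()
            for qualifier, value in columns.items()
        }


wiki = ToyTable("wiki", families=["text", "revision"])
wiki.put("Home", "text:body", "Welcome")
wiki.put("Home", "revision:author", "alice")
print(wiki.get("Home"))
```

Column families are fixed when the table is created, while column names inside a family are free-form, which is why `put` checks only the family.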


Filters

• Scan
  ▫ Can get a series of rows based on two key values
  ▫ Can provide a filter for such things as column families, timestamps
  ▫ Filters can be pushed to the server
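
A scan between two key values, with an optional filter, might look roughly like the sketch below. It runs over a plain dictionary rather than a real cluster; the `scan` function, the sample rows, and the half-open range (start included, stop excluded) are assumptions made for illustration. In HBase the filter would be evaluated on the server, whereas here everything is local.

```python
from bisect import bisect_left


def scan(rows, start_key, stop_key, row_filter=None):
    """Yield (key, columns) for start_key <= key < stop_key, in key order."""
    keys = sorted(rows)
    lo = bisect_left(keys, start_key)
    hi = bisect_left(keys, stop_key)
    for key in keys[lo:hi]:
        columns = rows[key]
        # Rows rejected by the filter are never returned to the caller.
        if row_filter is None or row_filter(key, columns):
            yield key, columns


# Hypothetical sample data: row key -> {"family:qualifier": value}
rows = {
    "user#001": {"info:name": "Ann", "info:city": "Oslo"},
    "user#002": {"info:name": "Bob"},
    "user#003": {"info:name": "Cy", "info:city": "Rome"},
    "video#001": {"meta:title": "Intro"},
}


def has_city(key, columns):
    return "info:city" in columns


matches = [key for key, _ in scan(rows, "user#", "user#~", has_city)]
print(matches)
```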


Updating

• When a column value is written to the db, old values are kept and organized by timestamp
  ▫ Each such value is a cell
• You can assign timestamps explicitly
  ▫ Otherwise, the current timestamp is used on insert
  ▫ A get returns the most recent version by default
• Operations that alter column family structures are expensive
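
The versioning behavior above can be sketched for a single column as follows. `VersionedCell` and its `max_versions` parameter are hypothetical names chosen for this example; millisecond timestamps are also an assumption of the sketch.

```python
import time


class VersionedCell:
    """Keeps every written value for one column, keyed by timestamp."""

    def __init__(self):
        self.versions = {}  # timestamp (ms) -> value

    def put(self, value, timestamp=None):
        # Without an explicit timestamp, fall back to the current time.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self.versions[timestamp] = value

    def get(self, max_versions=1):
        # Newest first; the default returns only the most recent version.
        newest_first = sorted(self.versions.items(), reverse=True)
        return newest_first[:max_versions]


cell = VersionedCell()
cell.put("draft", timestamp=100)
cell.put("final", timestamp=200)
print(cell.get())                # most recent version only
print(cell.get(max_versions=2))  # both versions, newest first
```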


Other characteristics

• Text compression
• Rows are stored in order by key value
• A region is some set of rows
  ▫ Each is stored in a single region server
  ▫ Regions can be automatically merged and split
• Uses write-ahead logging to prevent loss of data with node failures
  ▫ This is called journaling in Unix file systems
• Supports a master/slave multi-cluster strategy
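
The write-ahead logging idea above can be illustrated with a toy store that appends each write to a log file before applying it in memory, then replays the log on startup. This is a generic sketch of the technique, not HBase's own log format; `LoggedStore`, the JSON line format, and the temporary file path are all assumptions.

```python
import json
import os
import tempfile


class LoggedStore:
    """Toy key-value store that logs each write before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memory = {}
        self._replay()

    def _replay(self):
        # On startup, rebuild the in-memory state from the log, if any.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                key, value = json.loads(line)
                self.memory[key] = value

    def put(self, key, value):
        # 1. Append to the log and force it to disk.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then update the in-memory copy.
        self.memory[key] = value


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = LoggedStore(log_path)
store.put("row1", "a")
store.put("row2", "b")

# Simulate a restart after a failure: memory is gone, the log is not.
recovered = LoggedStore(log_path)
print(recovered.memory)
```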


An HBase cluster

Taken from: http://www.packtpub.com/article/hbase-basic-performance-tuning


Terminology

• A region is a subset of the rows of a table
  ▫ These are automatically sharded
• A master coordinates the slaves
  ▫ Assigns regions
  ▫ Detects region failures
  ▫ Administrative functions
• A client reads and writes rows directly to the region servers
• A client finds region server addresses in ZooKeeper
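
Because rows are kept in key order and each region holds a contiguous range, finding the server for a row key amounts to a lookup in a sorted list of region start keys. The sketch below assumes made-up start keys and server names, and `locate` and `split` are hypothetical helpers, not part of any HBase API.

```python
from bisect import bisect_right

# Each region covers [start_key, next start_key); "" marks the table start.
region_starts = ["", "g", "p"]
region_servers = ["rs1.example", "rs2.example", "rs3.example"]


def locate(row_key):
    """Return the server holding the region whose range contains row_key."""
    index = bisect_right(region_starts, row_key) - 1
    return region_servers[index]


def split(index, split_key, new_server):
    """Split region `index` at split_key; the upper half moves to new_server."""
    region_starts.insert(index + 1, split_key)
    region_servers.insert(index + 1, new_server)


print(locate("apple"), locate("hat"), locate("zebra"))
split(1, "k", "rs4.example")  # the region starting at "g" grew too large
print(locate("hat"), locate("melon"))
```

After the split, keys from "g" up to "k" stay on the original server and keys from "k" up to "p" are served by the new one, while clients keep using the same lookup.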


Tasks of components

• The ZooKeeper cluster is a coordination service for the HBase cluster
  ▫ Finds the correct server
  ▫ Selects the master
• The master allocates regions and balances load
• Region servers hold the regions
• Hadoop supports Map-Reduce


Features

• Consistency over availability
• Efficient MapReduce
• Range partition queries, not based on hashing or other random access
• Automatically shards
• Very sparse column storage


Some key concepts

• De-normalization
• Fast random, key-row retrieval
• Use of a multi-component architecture to leverage existing software tools
• Controllable in-memory selection


Important HBase Properties

•Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore.

•Automatic sharding: HBase tables are distributed on the cluster via regions, and regions are automatically split and re-distributed as your data grows.

•MapReduce: HBase supports massively parallelized processing via MapReduce.


When to use or not use HBase

•Java Client API: HBase supports an easy to use Java API for programmatic access.

•If you have hundreds of millions or billions of rows, then HBase is a good candidate.

•If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice due to the fact that all of your data might wind up on a single node (or two) and the rest of the cluster may be sitting idle.


More on using or not using HBase

•Make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.)

•Make sure you have enough hardware. Even HDFS doesn't do well with anything less than 5 DataNodes

•HBase can run quite well stand-alone on a laptop - but this should be considered a development configuration only.


The HBase API

• Get a row
• Put a row, with a column/value pair
• Scan, with a key range and a filter
• MapReduce via Hive


High level map reduce diagram


A more detailed diagram


Map reduce example steps

1. The system takes input from a file system and splits it up across separate Map nodes
2. The Map function or code is run and generates an output for each Map node; in the word count function, every word is listed and grouped by word per node
3. This output represents a set of intermediate key-value pairs that are moved to Reduce nodes as input
4. The Reduce function or code is run and generates an output for each Reduce node; in the word count example, the reduce function sums the number of times a group of words or keys occurs
5. The system takes the outputs from each node to aggregate a final view
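
The five steps above can be traced in a single-process word count. This is a sketch of the flow only: the "nodes" are plain Python lists, and the function names, the round-robin input split, and the CRC-based partitioning of words across reducers are assumptions of this example rather than how Hadoop implements them.

```python
import zlib
from collections import defaultdict


def split_input(text, num_mappers):
    # Step 1: split the input lines across the map "nodes".
    lines = text.splitlines()
    return [lines[i::num_mappers] for i in range(num_mappers)]


def map_words(lines):
    # Step 2: each map node lists its words, grouped by word.
    counts = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return list(counts.items())


def shuffle(mapped_outputs, num_reducers):
    # Step 3: move intermediate (word, count) pairs to reduce "nodes",
    # so that every pair for a given word lands on the same reducer.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapped_outputs:
        for word, count in output:
            target = zlib.crc32(word.encode("utf-8")) % num_reducers
            partitions[target][word].append(count)
    return partitions


def reduce_counts(partition):
    # Step 4: each reduce node sums the counts for its words.
    return {word: sum(counts) for word, counts in partition.items()}


def word_count(text, num_mappers=2, num_reducers=2):
    mapped = [map_words(chunk) for chunk in split_input(text, num_mappers)]
    reduced = [reduce_counts(p) for p in shuffle(mapped, num_reducers)]
    # Step 5: aggregate the reducer outputs into a final view.
    final = {}
    for part in reduced:
        final.update(part)
    return final


print(word_count("the cat\nthe dog\nthe end"))
```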


Diagram of map reduce