Introduction to Apache HBase, MapR Tables, and Security


Upload: mapr-data-technologies

Post on 21-Jan-2018


Category:

Technology



TRANSCRIPT

Page 1: Introduction to Apache HBase, MapR Tables and Security

© MapR Technologies

© MapR Technologies, confidential

Introduction to Apache HBase,

MapR Tables, and Security

Page 2: Introduction to Apache HBase, MapR Tables and Security

Agenda

HBase Overview

HBase APIs

MapR Tables

Example

Securing tables

Page 3: Introduction to Apache HBase, MapR Tables and Security

What's HBase?

A NoSQL database

– Synonym for ‘non-traditional’ database

A distributed columnar data store

– Storage layout implies performance characteristics

The “Hadoop” database

A semi-structured database

– No rigid requirements to define columns or even data types in advance

– It’s all bytes to HBase

A persistent sorted Map of Maps

– Programmer's view
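The "sorted Map of Maps" view can be sketched with nested `TreeMap`s. This is only a toy model under stated assumptions: real HBase stores `byte[]` throughout, whereas the sketch uses `String` keys and values for readability, and the class and method names (`SortedMapOfMaps`, `getLatest`) are invented for illustration.

```java
import java.util.TreeMap;

// Toy model only: real HBase stores byte[] everywhere; Strings keep the sketch readable.
class SortedMapOfMaps {
    // rowKey -> columnFamily -> qualifier -> version timestamp -> value
    final TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> rows =
            new TreeMap<>();

    void put(String row, String family, String qualifier, long version, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .computeIfAbsent(qualifier, k -> new TreeMap<>())
            .put(version, value);
    }

    // Returns the newest version of a cell, or null when the cell was never written.
    String getLatest(String row, String family, String qualifier) {
        TreeMap<String, TreeMap<String, TreeMap<Long, String>>> families = rows.get(row);
        if (families == null) return null;
        TreeMap<String, TreeMap<Long, String>> columns = families.get(family);
        if (columns == null) return null;
        TreeMap<Long, String> versions = columns.get(qualifier);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        SortedMapOfMaps table = new SortedMapOfMaps();
        table.put("smithj", "data", "street", 1L, "Main street");
        table.put("smithj", "data", "street", 2L, "Oak street");
        table.put("adamsk", "data", "city", 1L, "Springfield");
        System.out.println(table.rows.firstKey());                        // adamsk
        System.out.println(table.getLatest("smithj", "data", "street"));  // Oak street
    }
}
```

Rows come back in key order regardless of insertion order, and a cell that was never written simply has no map entry, which is why sparse rows cost nothing.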


Page 4: Introduction to Apache HBase, MapR Tables and Security

Column Oriented

[Diagram: column family CF1 with columns colA, colB, colC; only some cells hold values]

Row is indexed by a key

– Data stored sorted by key

Data is stored by columns grouped into column families

– Each family is a file of column values laid out in sorted order by row key

– Contrast this to a traditional row oriented database where rows are stored together with fixed space allocated for each row

[Diagram: column family CF2 with columns colA, colB, colC; rows keyed by RowKey (axxx, gxxx). Labels: customer id, customer address data, customer order data]

Page 5: Introduction to Apache HBase, MapR Tables and Security

HBase Data Model - Row Keys

Row Keys: identify the rows in an HBase table.

[Table: rows axxx, gxxx (R1), hxxx, jxxx (R2), kxxx, rxxx (R3), sxxx, … against column families CF1 (colA, colB, colC) and CF2 (colA, colB, colC, colD); each row populates only some of the columns]

Page 6: Introduction to Apache HBase, MapR Tables and Security

Rows are Stored in Sorted Order

Sorting of row key is based upon binary values

– Sort is lexicographic at byte level

– Comparison is “left to right”

Example:

– Sort order for String 1, 2, 3, …, 99, 100: 1, 10, 100, 11, 12, …, 2, 20, 21, …, 9, 91, 92, …, 98, 99

– Sort order for String 001, 002, 003, …, 099, 100: 001, 002, 003, …, 099, 100

– What if the RowKeys were numbers converted to fixed-size binary?
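A minimal sketch of these orderings, using only the Java standard library. The comparator below is a hand-written stand-in for byte-level lexicographic comparison, and `fixedKey` assumes an 8-byte big-endian encoding; the class and method names are illustrative, not part of the HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RowKeyOrder {
    // Unsigned, left-to-right byte comparison; on a shared prefix the shorter key sorts first.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    // Fixed-size key: 8 bytes, big-endian.
    static byte[] fixedKey(long n) {
        return ByteBuffer.allocate(Long.BYTES).putLong(n).array();
    }

    // Sorts string keys by their UTF-8 bytes and returns them as strings again.
    static List<String> sortAsStrings(String... keys) {
        List<byte[]> raw = new ArrayList<>();
        for (String k : keys) raw.add(k.getBytes(StandardCharsets.UTF_8));
        raw.sort(RowKeyOrder::compareKeys);
        List<String> out = new ArrayList<>();
        for (byte[] r : raw) out.add(new String(r, StandardCharsets.UTF_8));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortAsStrings("1", "2", "10", "100", "11"));        // [1, 10, 100, 11, 2]
        System.out.println(sortAsStrings("001", "002", "010", "100", "011"));  // [001, 002, 010, 011, 100]
        // Non-negative numbers in fixed-size binary sort numerically.
        System.out.println(compareKeys(fixedKey(2), fixedKey(10)) < 0);        // true
    }
}
```

So fixed-size binary keys sort numerically for non-negative values. Negative values are the catch: their sign bit makes the first byte large, so under unsigned comparison they sort after every positive number.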

Page 7: Introduction to Apache HBase, MapR Tables and Security

Tables are split into Regions = contiguous keys

Source: Diagram from Lars George's HBase: The Definitive Guide.

[Diagram: Region 1, key range axxx to gxxx]

Tables are partitioned into key ranges (regions)

Region = contiguous keys, served by nodes (RegionServers)

Regions are spread across cluster: S1, S2…

[Diagram: Region 2, key range Lxxx to zxxx; each region holds its rows (e.g. axxx, gxxx) with column families CF1 and CF2 (colA, colB, colC); one Region Server serves Regions 2 and 3]
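Locating the region for a key can be sketched as a floor lookup over region start keys. This is an illustration only, not how an HBase client is implemented: the `RegionLookup` class, its methods, and the start keys are assumptions, and the server names S1 and S2 are borrowed from the slide.

```java
import java.util.Map;
import java.util.TreeMap;

class RegionLookup {
    // start key of each region -> server hosting it
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // The region holding a key is the one with the greatest start key <= that key.
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        if (e == null) throw new IllegalArgumentException("no region covers " + rowKey);
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLookup table = new RegionLookup();
        table.addRegion("", "S1");   // the first region starts at the empty key
        table.addRegion("h", "S2");
        table.addRegion("k", "S2");  // one server can host several regions
        System.out.println(table.serverFor("axxx")); // S1
        System.out.println(table.serverFor("jxxx")); // S2
    }
}
```

Because regions are contiguous and non-overlapping, a single sorted-map lookup is enough to route any key.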

Page 8: Introduction to Apache HBase, MapR Tables and Security

HBase Data Model - Cells

Value for each cell is specified by complete coordinates:

– RowKey Column Family Column Version: Value

– Key:CF:Col:Version:Value

RowKey CF:Qualifier version value

smithj Data:street 12734567800 Main street

Column Key

Page 9: Introduction to Apache HBase, MapR Tables and Security

Sparsely-Populated Data

Missing values: Cells remain empty and consume no storage

[Table: rows axxx, gxxx (Region1), hxxx, jxxx (Region2), kxxx, rxxx (R3), sxxx, … against column families CF1 (colA, colB, colC) and CF2 (colA, colB, colC, colD); most rows populate only some of the columns, and jxxx holds a single value]

Page 10: Introduction to Apache HBase, MapR Tables and Security

HBase Data Model Summary

Efficient/Flexible

– Storage allocated for columns only as needed on a given row

• Great for sparse data

• Great for data of widely varying size

– Adding columns can be done at any time without impact

– Compression and versioning are usually built-in and take advantage of column family storage (like data together)

Highly Scalable

– Data is sharded amongst regions based upon key

• Regions are distributed in cluster

– Grouping by key = related data stored together

Finding data

– Key implies region and server, column family implies file

– Efficiently get to any data by key

Page 11: Introduction to Apache HBase, MapR Tables and Security

Agenda

HBase Overview

HBase APIs

MapR Tables

Example

Securing tables

Page 12: Introduction to Apache HBase, MapR Tables and Security

Basic Table Operations

Create Table, define Column Families before data is imported

– But not the row keys or number/names of columns

Basic data access operations (CRUD):

put Inserts data into rows (both add and update)

get Accesses data from one row

scan Accesses data from a range of rows

delete Delete a row or a range of rows or columns
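The semantics of the four operations can be sketched against an in-memory sorted map. This is a toy stand-in, not the HBase client API: `ToyTable` and its methods are invented for illustration, values are strings rather than bytes, and the scan treats the start row as inclusive and the stop row as exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ToyTable {
    // rowKey -> ("family:qualifier" -> value); rows are kept in sorted key order
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // put: inserts the value, or overwrites it if the cell already exists
    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    // get: reads from exactly one row; null when the cell is absent
    String get(String row, String column) {
        TreeMap<String, String> cols = rows.get(row);
        return cols == null ? null : cols.get(column);
    }

    // scan: returns the row keys in [startRow, stopRow), in sorted order
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // delete: removes a whole row
    void delete(String row) {
        rows.remove(row);
    }

    public static void main(String[] args) {
        ToyTable inventory = new ToyTable();
        inventory.put("pens", "stock:quantity", "24");
        inventory.put("pencils", "stock:quantity", "7");
        inventory.put("erasers", "stock:quantity", "3");
        System.out.println(inventory.get("pens", "stock:quantity")); // 24
        System.out.println(inventory.scan("p", "q"));                // [pencils, pens]
        inventory.delete("pens");
        System.out.println(inventory.get("pens", "stock:quantity")); // null
    }
}
```

The scan is cheap only because rows are stored sorted by key; that is the same property that makes row key design matter.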

Page 13: Introduction to Apache HBase, MapR Tables and Security

CRUD Operations Follow A Pattern (mostly)

Most common pattern

– Instantiate object for an operation: Put put = new Put(key)

– Add or Set attributes to specify what you need: put.add(…)

– Execute the operation against the table: myTable.put(put)

// Insert value1 into rowKey in columnFamily:columnName1

Put put = new Put(rowKey);

put.add(columnFamily, columnName1, value1);

myTable.put(put);

// Retrieve values from rowKey in columnFamily:columnName1

Get get = new Get(rowKey);

get.addColumn(columnFamily, columnName1);

Result result = myTable.get(get);

Page 14: Introduction to Apache HBase, MapR Tables and Security

Put Example

byte [] invTable = Bytes.toBytes("/path/Inventory");

byte [] stockCF = Bytes.toBytes("stock");

byte [] quantityCol = Bytes.toBytes("quantity");

long amt = 24L;

HTableInterface table = new HTable(hbaseConfig, invTable);

Put put = new Put(Bytes.toBytes("pens"));

put.add(stockCF, quantityCol, Bytes.toBytes(amt));

table.put(put);

[Resulting table Inventory, column family "stock": row pens, column quantity = 24]

Page 15: Introduction to Apache HBase, MapR Tables and Security

Put Operation – Add method

Once a Put instance is created you call an add method on it

Typically you add a value for a specific column in a column family

– ("column name" and "qualifier" mean the same thing)

Optionally you can set a timestamp for a cell

Put add(byte[] family, byte[] qualifier, long ts, byte[] value)

Put add(byte[] family, byte[] qualifier, byte[] value)

Page 16: Introduction to Apache HBase, MapR Tables and Security

Put Operation – Single Put Example Adding Multiple Column Values to a Row

byte [] tableName = Bytes.toBytes("/path/Shopping");

byte [] itemsCF = Bytes.toBytes("items");

byte [] penCol = Bytes.toBytes("pens");

byte [] noteCol = Bytes.toBytes("notes");

byte [] eraserCol = Bytes.toBytes("erasers");

HTableInterface table = new HTable(hbaseConfig, tableName);

Put put = new Put(Bytes.toBytes("mike"));

put.add(itemsCF, penCol, Bytes.toBytes(5L));

put.add(itemsCF, noteCol, Bytes.toBytes(5L));

put.add(itemsCF, eraserCol, Bytes.toBytes(2L));

table.put(put);

Page 17: Introduction to Apache HBase, MapR Tables and Security

Bytes class

http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/util/Bytes.html

org.apache.hadoop.hbase.util.Bytes

Provides methods to convert Java types to and from byte[] arrays

Support for

String, boolean, short, int, long, double, and float

Example:

byte [] bytesTablePath = Bytes.toBytes("/path/Shopping");

String myTable = Bytes.toString(bytesTablePath);

byte [] amountBytes = Bytes.toBytes(1000L);

long amount = Bytes.toLong(amountBytes);
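Since it is all bytes to HBase, the writer and the reader must agree on the encoding. The sketch below uses standard-library stand-ins rather than the `Bytes` class itself, assuming an 8-byte big-endian layout for longs; the `ByteEncoding` class and its method names are hypothetical. It shows that the long 24 and the string "24" are different byte arrays.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ByteEncoding {
    // Encodes a long as 8 big-endian bytes.
    static byte[] fromLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Decodes 8 bytes back into a long; rejects arrays of any other length.
    static long toLong(byte[] b) {
        if (b.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + b.length);
        }
        return ByteBuffer.wrap(b).getLong();
    }

    static byte[] fromString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String asString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(fromLong(24L).length);              // 8
        System.out.println(fromString("24").length);           // 2
        System.out.println(toLong(fromLong(24L)));             // 24
        System.out.println(asString(fromString("/path/Shopping"))); // /path/Shopping
    }
}
```

A value written as a string (for example from a shell `put`) cannot be read back as a long, and vice versa, so pick one encoding per column and keep to it.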

Page 18: Introduction to Apache HBase, MapR Tables and Security

Get Operation – Single Get Example

byte [] tableName = Bytes.toBytes("/path/Shopping");

byte [] itemsCF = Bytes.toBytes("items");

byte [] penCol = Bytes.toBytes("pens");

HTableInterface table = new HTable(hbaseConfig, tableName);

Get get = new Get(Bytes.toBytes("mike"));

get.addColumn(itemsCF, penCol);

Result result = table.get(get);

byte[] val = result.getValue(itemsCF, penCol);

System.out.println("Value: " + Bytes.toLong(val));

Page 19: Introduction to Apache HBase, MapR Tables and Security

Get Operation – Add And Set methods

Using just a get object will return everything for a row.

To narrow down results call add

– addFamily: get all columns for a specific family

– addColumn: get a specific column

To further narrow down results, specify more details via one or more set calls then call add

– setTimeRange: retrieve columns within a specific range of version timestamps

– setTimeStamp: retrieve columns with a specific timestamp

– setMaxVersions: set the number of versions of each column to be returned

– setFilter: add a filter

get.addColumn(columnFamilyName, columnName1);

Page 20: Introduction to Apache HBase, MapR Tables and Security

Result – Retrieve A Value From A Result

public static final byte[] ITEMS_CF= Bytes.toBytes("items");

public static final byte[] PENS_COL = Bytes.toBytes("pens");

Get g = new Get(Bytes.toBytes("Adam"));

g.addColumn(ITEMS_CF , PENS_COL);

Result result = table.get(g);

byte[] b = result.getValue(ITEMS_CF, PENS_COL);

long valueInColumn = Bytes.toLong(b);

http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/Result.html

Items:pens Items:notepads Items:erasers

Adam 18 7 10

Page 21: Introduction to Apache HBase, MapR Tables and Security

Other APIs

Not covering append, delete, and scan

Not covering administrative APIs


Page 22: Introduction to Apache HBase, MapR Tables and Security

Agenda

HBase Overview

HBase APIs

MapR Tables

Example

Securing tables

Page 23: Introduction to Apache HBase, MapR Tables and Security

Tables and Files in a Unified Storage Layer

[Diagram: three storage stacks, layers listed top to bottom]

Apache HBase on Hadoop: HBase, JVM, HDFS, JVM, ext3 FS, Disks

Apache HBase on MapR Filesystem: HBase, JVM, MapR-FS (HDFS API), Disks

M7 Tables Integrated into Filesystem: MapR-FS (HBase API and HDFS API), Disks

MapR Filesystem is an integrated system

– Tables and Files in a unified filesystem, based on MapR’s enterprise-grade storage layer.

Page 24: Introduction to Apache HBase, MapR Tables and Security

Portability

MapR tables use the HBase data model and API

Apache HBase applications work as-is on MapR tables

– No need to recompile

– No vendor lock-in

[Diagram: MapR-FS on disks, exposing both the HBase API and the HDFS API]

Page 25: Introduction to Apache HBase, MapR Tables and Security

MapR M7 Table Storage

Table regions live inside a MapR container

– Served by MapR fileserver service running on nodes

– HBase RegionServer and HBase Master services are not required

[Diagram: client nodes and two MapR containers; each container holds two regions, and each region holds rows with columns Key, colB, colC]

Page 26: Introduction to Apache HBase, MapR Tables and Security

MapR Tables vs. HBase

Apache HBase

• Compaction delays

• Manual administration

• Poor reliability

• Lengthy disaster recovery

MapR Tables

• No compaction delays

• Easy administration

• Strong consistency

• Rapid recovery

• 2x Cassandra performance

• 3x HBase performance

Page 27: Introduction to Apache HBase, MapR Tables and Security

MapR M7 vs. CDH – Mixed Load (50-50)

Page 28: Introduction to Apache HBase, MapR Tables and Security

Agenda

HBase Overview

HBase APIs

MapR Tables

Example

Securing tables

Page 29: Introduction to Apache HBase, MapR Tables and Security

Example: Employee Database

Column Family: base

– lastName

– firstName

– address

– SSN

Column Family: salary

– ‘dynamic’ columns

– year:salary

Row key

– lastName:firstName? Not unique

– Unique id? Can’t search easily

– lastName:firstName:id? Can’t search by id
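The trade-off of the composite key can be sketched with a sorted map standing in for the table. This is an illustration under assumptions: the helper names (`rowKey`, `stopKeyFor`, `scanPrefix`) and the sample employees are invented, and Java string ordering is used in place of byte ordering, which agrees for ASCII keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class CompositeKeyScan {
    static String rowKey(String lastName, String firstName, String id) {
        return lastName + ":" + firstName + ":" + id;
    }

    // Smallest key that sorts after every key starting with the prefix
    // (assumes a non-empty prefix whose last character is not the maximum char value).
    static String stopKeyFor(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Range scan over [prefix, stopKeyFor(prefix)): all keys that begin with the prefix.
    static List<String> scanPrefix(TreeMap<String, String> table, String prefix) {
        return new ArrayList<>(table.subMap(prefix, true, stopKeyFor(prefix), false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> employees = new TreeMap<>();
        employees.put(rowKey("William", "John", "k1"), "...");
        employees.put(rowKey("William", "Mary", "k2"), "...");
        employees.put(rowKey("Williams", "Ann", "k3"), "...");
        // Searching by last name is a cheap range scan over a key prefix.
        System.out.println(scanPrefix(employees, "William:")); // [William:John:k1, William:Mary:k2]
    }
}
```

The id sits at the end of the key, so finding an employee by id alone has no usable prefix and would need a full scan or a second table acting as an index.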


Page 30: Introduction to Apache HBase, MapR Tables and Security

Source: “employee class”

public class Employee {

String key;

String lastName, firstName, address;

String ssn;

Map<Integer, Integer> salary;

}


Page 31: Introduction to Apache HBase, MapR Tables and Security

Source: “schema”

byte[] BASE_CF = Bytes.toBytes("base");

byte[] SALARY_CF = Bytes.toBytes("salary");

byte[] FIRST_COL = Bytes.toBytes("firstName");

byte[] LAST_COL = Bytes.toBytes("lastName");

byte[] ADDRESS_COL = Bytes.toBytes("address");

byte[] SSN_COL = Bytes.toBytes("ssn");

String tableName = userdirectory + "/" + shortName;

byte[] TABLE_NAME = Bytes.toBytes(tableName);


Page 32: Introduction to Apache HBase, MapR Tables and Security

Source: “get table”

HTablePool pool = new HTablePool();

HTableInterface table = pool.getTable(TABLE_NAME);

return table;


Page 33: Introduction to Apache HBase, MapR Tables and Security

Source: “get row”

Whole row

Get g = new Get(Bytes.toBytes(key));

Result result = getTable().get(g);

Just base column family

Get g = new Get(Bytes.toBytes(key));

g.addFamily(BASE_CF);

Result result = getTable().get(g);


Page 34: Introduction to Apache HBase, MapR Tables and Security

Source: “parse row”

Employee e = new Employee();

e.setKey(Bytes.toString(r.getRow()));

e.setLastName(getString(r, BASE_CF, LAST_COL));

e.setFirstName(getString(r,BASE_CF, FIRST_COL));

e.setAddress(getString(r,BASE_CF, ADDRESS_COL));

e.setSsn(getString(r,BASE_CF, SSN_COL));

String getString(Result r, byte[] cf, byte[] col) {

byte[] b = r.getValue(cf, col);

if (b != null)

return Bytes.toString(b);

else return "";

}


Page 35: Introduction to Apache HBase, MapR Tables and Security

Source: “parse row”

//get salary information

Map<byte[], byte[]> m = r.getFamilyMap(SALARY_CF);

Iterator<Map.Entry<byte[], byte[]>> i =

m.entrySet().iterator();

while (i.hasNext()) {

Map.Entry<byte[], byte[]> entry = i.next();

Integer year =

Integer.parseInt(Bytes.toString(entry.getKey()));

Integer amt = Integer.parseInt(Bytes.toString(

entry.getValue()));

e.getSalary().put(year, amt);

}


Page 36: Introduction to Apache HBase, MapR Tables and Security

Demo

Create a table using MCS

Create a table and column families using maprcli


$ maprcli table create -path /user/keys/employees

$ maprcli table cf create -path /user/keys/employees -cfname base

$ maprcli table cf create -path /user/keys/employees -cfname salary

Page 37: Introduction to Apache HBase, MapR Tables and Security

Demo

Populate with sample data using hbase shell


hbase> put '/user/keys/employees', 'k1', 'base:lastName', 'William'

> put '/user/keys/employees', 'k1', 'base:firstName', 'John'

> put '/user/keys/employees', 'k1', 'base:address', '123 first street, springfield, VA'

> put '/user/keys/employees', 'k1', 'base:ssn', '999-99-9999'

> put '/user/keys/employees', 'k1', 'salary:2010', '90000'

> put '/user/keys/employees', 'k1', 'salary:2011', '91000'

> put '/user/keys/employees', 'k1', 'salary:2012', '92000'

> put '/user/keys/employees', 'k1', 'salary:2013', '93000'

….….

Page 38: Introduction to Apache HBase, MapR Tables and Security

Demo

Fetch record using java program


$ ./run employees get k1

Use command get against table /user/keys/employees

Employee record:

Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={2010=90000, 2011=91000, 2012=92000, 2013=93000}]

Page 39: Introduction to Apache HBase, MapR Tables and Security

Demo – run script


#!/bin/bash

export LD_LIBRARY_PATH=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Linux-amd64-64

java -cp `hbase classpath`:/home/kbotzum/development/exercises/target/exercises.jar person.botzum.hbase.Demo $*

Page 40: Introduction to Apache HBase, MapR Tables and Security

What Didn't I Consider?


Page 41: Introduction to Apache HBase, MapR Tables and Security

Row Key

Secondary ways of searching

– Other tables as indexes?

Long term data evolution

– Avro?

– Protobufs?

Security

– SSN is sensitive

– Salary looks kind of sensitive

What Didn't I Consider?


Page 42: Introduction to Apache HBase, MapR Tables and Security

Agenda

HBase Overview

HBase APIs

MapR Tables

Example

Securing tables

Page 43: Introduction to Apache HBase, MapR Tables and Security

MapR Tables Security

Access Control Expressions (ACEs)

– Boolean logic to control access at table, column family, and column level


Page 44: Introduction to Apache HBase, MapR Tables and Security

ACE Highlights

Creator of table has all rights by default

– Others have none

Can grant admin rights without granting read/write rights

Defaults for column families set at table level

Access to data depends on column family and column access controls

Boolean logic
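The boolean logic behind ACEs can be sketched with composable predicates. This models the idea only and is not MapR's implementation: the `AceSketch` class, the `Caller` type, the group names, and the specific rules are all hypothetical, and the ACE-style spelling shown in the comment (`u:` for a user, `g:` for a group, combined with `|`, `&`, `!`) is an assumption to check against the MapR documentation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

class AceSketch {
    // Hypothetical caller identity: a user name plus group memberships.
    static final class Caller {
        final String user;
        final Set<String> groups;

        Caller(String user, String... groups) {
            this.user = user;
            this.groups = new HashSet<>(Arrays.asList(groups));
        }
    }

    static Predicate<Caller> user(String name)  { return c -> c.user.equals(name); }
    static Predicate<Caller> group(String name) { return c -> c.groups.contains(name); }

    // Table-level default, used by column families that set no expression of their own.
    static final Predicate<Caller> DEFAULT_READ = user("keys").or(group("staff"));

    // Stricter expression for the salary family, roughly "u:keys | (g:hr & !u:fred)".
    static final Predicate<Caller> SALARY_READ =
            user("keys").or(group("hr").and(user("fred").negate()));

    static boolean canRead(String family, Caller c) {
        return ("salary".equals(family) ? SALARY_READ : DEFAULT_READ).test(c);
    }

    public static void main(String[] args) {
        Caller fred = new Caller("fred", "staff", "hr");
        System.out.println(canRead("base", fred));                 // true
        System.out.println(canRead("salary", fred));               // false
        System.out.println(canRead("salary", new Caller("keys"))); // true
    }
}
```

Here fred can read the base family through the table-level default but is excluded from salary by the family-specific expression.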


Page 45: Introduction to Apache HBase, MapR Tables and Security

MapR Tables Security

Leverages MapR security when enabled

– Wire level authentication

– Wire level encryption

– Trivial to configure

• Most reasonable settings by default

• No Kerberos required!

– Portable

• No MapR specific APIs


Page 46: Introduction to Apache HBase, MapR Tables and Security

Demo

Enable cluster security

Yes, that’s it!

– Now all Web UI and CLI access requires authentication

– Traffic is now authenticated using encrypted credentials

– Most traffic is encrypted and bulk data transfer traffic can be encrypted


# configure.sh -C hostname -Z hostname -secure -genkeys

Page 47: Introduction to Apache HBase, MapR Tables and Security

Demo

Fetch record using java program when not authenticated


$ ./run employees get k1

Use command get against table /user/keys/employees

14/03/14 18:42:39 ERROR fs.MapRFileSystem: Exception while trying to get currentUser

java.io.IOException: failure to login: Unable to obtain MapR credentials

Page 48: Introduction to Apache HBase, MapR Tables and Security

Demo

Fetch record using java program


$ maprlogin password

[Password for user 'keys' at cluster 'my.cluster.com': ]

MapR credentials of user 'keys' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1000'

$ ./run employees get k1

Use command get against table /user/keys/employees

Employee record:

Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={2010=90000, 2011=91000, 2012=92000, 2013=93000}]

Page 49: Introduction to Apache HBase, MapR Tables and Security

Demo

Fetch record using java program as someone not authorized to table


$ maprlogin password

[Password for user 'fred' at cluster 'my.cluster.com': ]

MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_2001'

$ ./run /user/keys/employees get k1

Use command get against table /user/keys/employees

2014-03-14 18:49:20,2787 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:7318 Thread: 139674989631232 Error in DBGetRPC for table /user/keys/employees, error: Permission denied(13)

Exception in thread "main" java.io.IOException: Error: Permission denied(13)

Page 50: Introduction to Apache HBase, MapR Tables and Security

Demo

Set ACEs to allow read to base information but not salary

Fetch whole record using java program


$ ./run /user/keys/employees get k1

Use command get against table /user/keys/employees

2014-03-14 18:53:15,0806 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:7318 Thread: 139715048077056 Error in DBGetRPC for table /user/keys/employees, error: Permission denied(13)

Exception in thread "main" java.io.IOException: Error: Permission denied(13)

Page 51: Introduction to Apache HBase, MapR Tables and Security

Demo

Set ACEs to allow read to base information but not salary

Fetch just base record using java program


$ ./run employees getbase k1

Use command get against table /user/keys/employees

Employee record:

Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={}]

Page 52: Introduction to Apache HBase, MapR Tables and Security

What Else Didn't I Consider?


Page 54: Introduction to Apache HBase, MapR Tables and Security


Questions?


Page 55: Introduction to Apache HBase, MapR Tables and Security


HBase Architecture

Page 56: Introduction to Apache HBase, MapR Tables and Security

What is HBase? (Cluster View)

ZooKeeper (ZK)

HMaster (HM)

Region Servers (RS)

For MapR, there is less delineation between Control and Data Nodes.

[Diagram: master servers (labeled A, B, C, D) run the ZooKeeper, NameNode, and HMaster processes; each slave server runs a Region Server alongside a Data Node]

Page 57: Introduction to Apache HBase, MapR Tables and Security

What is a Region?

The basic partitioning/sharding unit of HBase.

Each region is assigned a range of keys it is responsible for.

Region servers serve data for reads and writes

[Diagram: a client, the HMaster, and ZooKeeper nodes alongside two Region Servers; each Region Server hosts two regions, and each region holds rows with columns Key, colB, colC]