© MapR Technologies, confidential
Introduction to Apache HBase,
MapR Tables, and Security
Agenda
HBase Overview
HBase APIs
MapR Tables
Example
Securing tables
What's HBase?
A NoSQL database
– Synonym for ‘non-traditional’ database
A distributed columnar data store
– Storage layout implies performance characteristics
The “Hadoop” database
A semi-structured database
– No rigid requirements to define columns or even data types in advance
– It’s all bytes to HBase
A persistent sorted Map of Maps
– Programmer's view
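The "sorted Map of Maps" view can be sketched with plain Java collections. This is only a toy model for intuition, not HBase code: the class name ToyTable and its methods are made up for illustration, and Strings stand in for the byte arrays HBase actually stores.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model (hypothetical names): row key -> column family -> qualifier -> value.
// Every level is a sorted map, which mirrors the "sorted Map of Maps" description.
class ToyTable {
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    // Insert or overwrite one value; missing intermediate maps are created on demand.
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Return the stored value, or null when the row, family, or column is absent.
    String get(String rowKey, String family, String qualifier) {
        Map<String, TreeMap<String, String>> families = rows.get(rowKey);
        if (families == null) return null;
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    // Row keys always come back in sorted order, regardless of insertion order.
    Set<String> rowKeys() {
        return rows.keySet();
    }

    public static void main(String[] args) {
        ToyTable t = new ToyTable();
        t.put("gxxx", "CF1", "colA", "val");
        t.put("axxx", "CF2", "colC", "val");
        t.put("axxx", "CF1", "colB", "val");
        System.out.println(t.rowKeys());
        System.out.println(t.get("axxx", "CF1", "colB"));
    }
}
```

Only the cells that were written exist in the maps, which is also why sparse rows cost nothing for their missing columns.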
Column Oriented
Row is indexed by a key
– Data stored sorted by key
Data is stored by columns grouped into column families
– Each family is a file of column values laid out in sorted order by row key
– Contrast this to a traditional row-oriented database where rows are stored together with fixed space allocated for each row
[Diagram: rows keyed by customer id (RowKey axxx … gxxx); column family CF1 holds customer address data and CF2 holds customer order data, each with columns colA, colB, colC; only some cells hold values]
HBase Data Model - Row Keys
Row Keys: identify the rows in an HBase table.
[Table: rows sorted by RowKey and grouped into ranges R1 (axxx … gxxx), R2 (hxxx … jxxx), and R3 (kxxx … rxxx, sxxx); columns colA, colB, colC in CF1 and colA, colB, colC, colD in CF2; each row holds values in only some of the columns]
Rows are Stored in Sorted Order
Sorting of row key is based upon binary values
– Sort is lexicographic at byte level
– Comparison is “left to right”
Example:
– Sort order for String 1, 2, 3, …, 99, 100: 1, 10, 100, 11, 12, …, 2, 20, 21, …, 9, 91, 92, …, 98, 99
– Sort order for String 001, 002, 003, …, 099, 100: 001, 002, 003, …, 099, 100
– What if the RowKeys were numbers converted to fixed sized binary?
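The closing question can be explored with a small experiment. The sketch below is plain Java, not HBase code; the class and method names are made up for illustration. It compares keys byte by byte, left to right, treating bytes as unsigned, and sorts the same numbers once as decimal strings and once as fixed-size 4-byte big-endian values. For non-negative numbers the fixed-size binary keys come out in numeric order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RowKeySortOrder {
    // Lexicographic comparison at byte level: left to right, bytes taken as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length; // a key sorts before any longer key it prefixes
    }

    // Sort 1..max using their decimal string form as the key bytes.
    static List<String> sortAsStrings(int max) {
        List<byte[]> keys = new ArrayList<>();
        for (int i = max; i >= 1; i--) {
            keys.add(Integer.toString(i).getBytes(StandardCharsets.US_ASCII));
        }
        keys.sort(RowKeySortOrder::compareBytes);
        List<String> out = new ArrayList<>();
        for (byte[] k : keys) out.add(new String(k, StandardCharsets.US_ASCII));
        return out;
    }

    // Sort 1..max using a fixed-size 4-byte big-endian encoding as the key bytes.
    static List<Integer> sortAsBinary(int max) {
        List<byte[]> keys = new ArrayList<>();
        for (int i = max; i >= 1; i--) {
            keys.add(ByteBuffer.allocate(4).putInt(i).array());
        }
        keys.sort(RowKeySortOrder::compareBytes);
        List<Integer> out = new ArrayList<>();
        for (byte[] k : keys) out.add(ByteBuffer.wrap(k).getInt());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortAsStrings(12)); // [1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(sortAsBinary(12));  // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    }
}
```

Negative numbers would need extra care, because their sign bit makes them sort after positive values under unsigned byte comparison.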
Tables are split into Regions = contiguous keys
Source: Diagram from Lars George's HBase: The Definitive Guide.
Tables are partitioned into key ranges (regions)
Region = contiguous keys, served by nodes (RegionServers)
Regions are spread across cluster: S1, S2…
[Diagram: Region 1 covers key range axxx … gxxx and Region 2 covers key range Lxxx … zxxx; each region holds the rows of its key range for column families CF1 and CF2 (columns colA, colB, colC); one Region Server serves Regions 2 and 3]
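Because each region owns a contiguous key range, finding the server for a row key amounts to finding the region with the greatest start key that is not larger than the row key. The sketch below models that idea with a TreeMap; it is not how HBase or MapR implement lookup, and the class name, start keys, and server names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class RegionLookup {
    // Region start key -> server hosting that region (hypothetical server names).
    // String order equals byte order here because the keys are plain ASCII.
    private final TreeMap<String, String> regionStarts = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionStarts.put(startKey, server);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = regionStarts.floorEntry(rowKey);
        return region == null ? null : region.getValue();
    }

    public static void main(String[] args) {
        RegionLookup lookup = new RegionLookup();
        lookup.addRegion("", "S1");  // first region starts at the empty key
        lookup.addRegion("h", "S2");
        lookup.addRegion("k", "S2"); // one server may host several regions
        System.out.println(lookup.serverFor("gxxx"));
        System.out.println(lookup.serverFor("rxxx"));
    }
}
```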
HBase Data Model - Cells
Value for each cell is specified by complete coordinates:
– RowKey, Column Family, Column, Version: Value
– Key:CF:Col:Version:Value
RowKey   CF:Qualifier   Version       Value
smithj   Data:street    12734567800   Main street
(CF:Qualifier together form the column key)
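The coordinate scheme can be illustrated with a small in-memory model in which each column key maps to its versions, newest first. This is a sketch for intuition only; the class VersionedCells and its methods are hypothetical and do not correspond to any HBase API.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class VersionedCells {
    // "row/family:qualifier" -> (version -> value), with versions sorted newest first.
    private final Map<String, TreeMap<Long, String>> cells = new TreeMap<>();

    private static String coordinates(String row, String family, String qualifier) {
        return row + "/" + family + ":" + qualifier;
    }

    // Writing with a new version adds a cell version instead of overwriting the old one.
    void put(String row, String family, String qualifier, long version, String value) {
        cells.computeIfAbsent(coordinates(row, family, qualifier),
                k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
             .put(version, value);
    }

    // Value with the highest version, or null if the cell was never written.
    String getLatest(String row, String family, String qualifier) {
        TreeMap<Long, String> versions = cells.get(coordinates(row, family, qualifier));
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Value stored at exactly the given version, or null if there is none.
    String getAt(String row, String family, String qualifier, long version) {
        TreeMap<Long, String> versions = cells.get(coordinates(row, family, qualifier));
        return versions == null ? null : versions.get(version);
    }

    public static void main(String[] args) {
        VersionedCells table = new VersionedCells();
        table.put("smithj", "Data", "street", 12734567800L, "Main street");
        table.put("smithj", "Data", "street", 12734567900L, "Oak street");
        System.out.println(table.getLatest("smithj", "Data", "street"));
        System.out.println(table.getAt("smithj", "Data", "street", 12734567800L));
    }
}
```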
Sparsely-Populated Data
Missing values: Cells remain empty and consume no storage
[Table: rows grouped into Region1 (axxx … gxxx), Region2 (hxxx … jxxx), and R3 (kxxx … rxxx, sxxx); columns colA, colB, colC in CF1 and colA, colB, colC, colD in CF2; rows populate different subsets of the columns, e.g. row jxxx holds a single value]
HBase Data Model Summary
Efficient/Flexible
– Storage allocated for columns only as needed on a given row
• Great for sparse data
• Great for data of widely varying size
– Adding columns can be done at any time without impact
– Compression and versioning are usually built-in and take advantage of column family storage (like data together)
Highly Scalable
– Data is sharded amongst regions based upon key
• Regions are distributed in cluster
– Grouping by key = related data stored together
Finding data
– Key implies region and server, column family implies file
– Efficiently get to any data by key
Agenda
HBase Overview
HBase APIs
MapR Tables
Example
Securing tables
Basic Table Operations
Create Table, define Column Families before data is imported
– But not the row keys or the number/names of columns
Basic data access operations (CRUD):
– put: inserts data into rows (both add and update)
– get: accesses data from one row
– scan: accesses data from a range of rows
– delete: deletes a row, a range of rows, or columns
CRUD Operations Follow A Pattern (mostly)
Most common pattern
– Instantiate object for an operation: Put put = new Put(key)
– Add or Set attributes to specify what you need: put.add(…)
– Execute the operation against the table: myTable.put(put)
// Insert value1 into rowKey in columnFamily:columnName1
Put put = new Put(rowKey);
put.add(columnFamily, columnName1, value1);
myTable.put(put);
// Retrieve values from rowKey in columnFamily:columnName1
Get get = new Get(rowKey);
get.addColumn(columnFamily, columnName1);
Result result = myTable.get(get);
Put Example
byte[] invTable = Bytes.toBytes("/path/Inventory");
byte[] stockCF = Bytes.toBytes("stock");
byte[] quantityCol = Bytes.toBytes("quantity");
long amt = 24L;
HTableInterface table = new HTable(hbaseConfig, invTable);
// Row key "pens", column stock:quantity, value 24 encoded as a long
Put put = new Put(Bytes.toBytes("pens"));
put.add(stockCF, quantityCol, Bytes.toBytes(amt));
table.put(put);
[Result: table Inventory, CF "stock": row pens, column quantity = 24]
Put Operation – Add method
Once a Put instance is created you call an add method on it
Typically you add a value for a specific column in a column family
– ("column name" and "qualifier" mean the same thing)
Optionally you can set a timestamp for a cell
Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
Put add(byte[] family, byte[] qualifier, byte[] value)
Put Operation – Single Put Example: adding multiple column values to a row
byte[] tableName = Bytes.toBytes("/path/Shopping");
byte[] itemsCF = Bytes.toBytes("items");
byte[] penCol = Bytes.toBytes("pens");
byte[] noteCol = Bytes.toBytes("notes");
byte[] eraserCol = Bytes.toBytes("erasers");
HTableInterface table = new HTable(hbaseConfig, tableName);
// The row key must be a byte[]; Put has no String constructor
Put put = new Put(Bytes.toBytes("mike"));
put.add(itemsCF, penCol, Bytes.toBytes(5L));
put.add(itemsCF, noteCol, Bytes.toBytes(5L));
put.add(itemsCF, eraserCol, Bytes.toBytes(2L));
table.put(put);
Bytes class
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/util/Bytes.html
org.apache.hadoop.hbase.util.Bytes
Provides methods to convert Java types to and from byte[] arrays
Support for String, boolean, short, int, long, double, and float
Example:
byte[] bytesTablePath = Bytes.toBytes("/path/Shopping");
String myTable = Bytes.toString(bytesTablePath);
byte[] amountBytes = Bytes.toBytes(1000L);
long amount = Bytes.toLong(amountBytes);
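To see what such conversions produce, the round trip can be imitated with the Java standard library alone. The sketch below assumes the encodings used by the HBase Bytes class for these two types (a long as 8 big-endian bytes, a String as UTF-8); BytesSketch is a hypothetical stand-in written for illustration, not the HBase class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Stand-in for illustration only; real code would use org.apache.hadoop.hbase.util.Bytes.
class BytesSketch {
    // A long becomes exactly 8 bytes, most significant byte first (assumed encoding).
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // A String becomes its UTF-8 bytes (assumed encoding).
    static byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String toString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] amountBytes = toBytes(1000L);
        System.out.println(amountBytes.length);                       // 8
        System.out.println(toLong(amountBytes));                      // 1000
        System.out.println(toString(toBytes("/path/Shopping")));      // /path/Shopping
    }
}
```

Since HBase only sees bytes, a reader must decode with the same type the writer encoded with: decoding the 8 bytes of a long as a String yields unreadable text rather than "1000".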
Get Operation – Single Get Example
byte[] tableName = Bytes.toBytes("/path/Shopping");
byte[] itemsCF = Bytes.toBytes("items");
byte[] penCol = Bytes.toBytes("pens");
HTableInterface table = new HTable(hbaseConfig, tableName);
// Row keys are case-sensitive bytes: use the same key that was written
Get get = new Get(Bytes.toBytes("mike"));
get.addColumn(itemsCF, penCol);
Result result = table.get(get);
byte[] val = result.getValue(itemsCF, penCol);
System.out.println("Value: " + Bytes.toLong(val));
Get Operation – Add And Set methods
Using just a get object will return everything for a row.
To narrow down results call add
– addFamily: get all columns for a specific family
– addColumn: get a specific column
To further narrow down results, specify more details via one or more set calls then call add
– setTimeRange: retrieve columns within a specific range of version timestamps
– setTimeStamp: retrieve columns with a specific timestamp
– setMaxVersions: set the number of versions of each column to be returned
– setFilter: add a filter
get.addColumn(columnFamilyName, columnName1);
Result – Retrieve A Value From A Result
public static final byte[] ITEMS_CF = Bytes.toBytes("items");
public static final byte[] PENS_COL = Bytes.toBytes("pens");
Get g = new Get(Bytes.toBytes("Adam"));
g.addColumn(ITEMS_CF, PENS_COL);
Result result = table.get(g);
byte[] b = result.getValue(ITEMS_CF, PENS_COL);
long valueInColumn = Bytes.toLong(b);
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/Result.html
RowKey   Items:pens   Items:notepads   Items:erasers
Adam     18           7                10
Other APIs
Not covering append, delete, and scan
Not covering administrative APIs
Agenda
HBase Overview
HBase APIs
MapR Tables
Example
Securing tables
Tables and Files in a Unified Storage Layer
[Diagram: three storage stacks compared, top to bottom]
– Apache HBase on Hadoop: HBase / JVM / HDFS / JVM / ext3 FS / Disks
– Apache HBase on MapR Filesystem: HBase / JVM / MapR-FS (HDFS API) / Disks
– M7 Tables integrated into filesystem: MapR-FS (HBase API and HDFS API) / Disks
MapR Filesystem is an integrated system
– Tables and Files in a unified filesystem, based on MapR’s enterprise-grade storage layer.
Portability
MapR tables use the HBase data model and API
Apache HBase applications work as-is on MapR tables
– No need to recompile
– No vendor lock-in
[Diagram: MapR-FS on disks, exposing both the HBase API and the HDFS API]
MapR M7 Table Storage
Table regions live inside a MapR container
– Served by MapR fileserver service running on nodes
– HBase RegionServer and HBase Master services are not required
[Diagram: client nodes access two containers; each container holds two regions, and each region holds rows laid out as Key, colB, colC]
MapR Tables vs. HBase
Apache HBase
• Compaction delays
• Manual administration
• Poor reliability
• Lengthy disaster recovery
MapR Tables
• No compaction delays
• Easy administration
• Strong consistency
• Rapid recovery
• 2x Cassandra performance
• 3x HBase performance
MapR M7 vs. CDH – Mixed Load (50-50)
Agenda
HBase Overview
HBase APIs
MapR Tables
Example
Securing tables
Example: Employee Database
Column Family: base
– lastName
– firstName
– address
– SSN
Column Family: salary
– ‘dynamic’ columns
– year:salary
Row key
– lastName:firstName? Not unique
– Unique id? Can’t search easily
– lastName:firstName:id? Can’t search by id
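The trade-off behind these row key options follows from rows being sorted by key: a lookup is cheap only when the known part is a key prefix. The sketch below uses a TreeMap as a stand-in for a sorted table and is not HBase code; the class name, the sample names, and the ids are hypothetical. With lastName:firstName:id keys, searching by name is a range scan, while searching by id alone has to visit every key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class CompositeRowKeys {
    // Sorted row key -> record; a toy stand-in for a table sorted by row key.
    private final TreeMap<String, String> rows = new TreeMap<>();

    static String rowKey(String lastName, String firstName, String id) {
        return lastName + ":" + firstName + ":" + id;
    }

    void put(String lastName, String firstName, String id, String record) {
        rows.put(rowKey(lastName, firstName, id), record);
    }

    // Prefix scan: touches only keys in [lastName:firstName:, lastName:firstName;).
    // ';' is the character immediately after ':', so it bounds the prefix range.
    List<String> scanByName(String lastName, String firstName) {
        String start = lastName + ":" + firstName + ":";
        String stop = lastName + ":" + firstName + ";";
        return new ArrayList<>(rows.subMap(start, true, stop, false).keySet());
    }

    // The id is not a key prefix, so this must walk the whole table.
    List<String> findById(String id) {
        List<String> matches = new ArrayList<>();
        for (String key : rows.keySet()) {
            if (key.endsWith(":" + id)) matches.add(key);
        }
        return matches;
    }

    public static void main(String[] args) {
        CompositeRowKeys table = new CompositeRowKeys();
        table.put("smith", "john", "17", "record A");
        table.put("smith", "john", "42", "record B");
        table.put("smithson", "john", "9", "record C");
        System.out.println(table.scanByName("smith", "john"));
        System.out.println(table.findById("42"));
    }
}
```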
Source: “employee class”
public class Employee {
String key;
String lastName, firstName, address;
String ssn;
Map<Integer, Integer> salary;
…
}
Source: “schema”
byte[] BASE_CF = Bytes.toBytes("base");
byte[] SALARY_CF = Bytes.toBytes("salary");
byte[] FIRST_COL = Bytes.toBytes("firstName");
byte[] LAST_COL = Bytes.toBytes("lastName");
byte[] ADDRESS_COL = Bytes.toBytes("address");
byte[] SSN_COL = Bytes.toBytes("ssn");
String tableName = userdirectory + "/" + shortName;
byte[] TABLE_NAME = Bytes.toBytes(tableName);
Source: “get table”
HTablePool pool = new HTablePool();
table = pool.getTable(TABLE_NAME);
return table;
Source: “get row”
// Whole row
Get g = new Get(Bytes.toBytes(key));
Result result = getTable().get(g);
// Just the base column family
Get g = new Get(Bytes.toBytes(key));
g.addFamily(BASE_CF);
Result result = getTable().get(g);
Source: “parse row”
Employee e = new Employee();
e.setKey(Bytes.toString(r.getRow()));
e.setLastName(getString(r, BASE_CF, LAST_COL));
e.setFirstName(getString(r,BASE_CF, FIRST_COL));
e.setAddress(getString(r,BASE_CF, ADDRESS_COL));
e.setSsn(getString(r,BASE_CF, SSN_COL));
// Returns the column value as a String, or "" when the cell is absent
String getString(Result r, byte[] cf, byte[] col) {
    byte[] b = r.getValue(cf, col);
    if (b != null)
        return Bytes.toString(b);
    else
        return "";
}
Source: “parse row”
// Get salary information: each column qualifier is a year, each value a salary
Map<byte[], byte[]> m = r.getFamilyMap(SALARY_CF);
for (Map.Entry<byte[], byte[]> entry : m.entrySet()) {
    Integer year = Integer.parseInt(Bytes.toString(entry.getKey()));
    Integer amt = Integer.parseInt(Bytes.toString(entry.getValue()));
    e.getSalary().put(year, amt);
}
Demo
Create a table using MCS
Create a table and column families using maprcli
$ maprcli table create -path /user/keys/employees
$ maprcli table cf create -path /user/keys/employees -cfname base
$ maprcli table cf create -path /user/keys/employees -cfname salary
Demo
Populate with sample data using hbase shell
hbase> put '/user/keys/employees', 'k1', 'base:lastName', 'William'
hbase> put '/user/keys/employees', 'k1', 'base:firstName', 'John'
hbase> put '/user/keys/employees', 'k1', 'base:address', '123 first street, springfield, VA'
hbase> put '/user/keys/employees', 'k1', 'base:ssn', '999-99-9999'
hbase> put '/user/keys/employees', 'k1', 'salary:2010', '90000'
hbase> put '/user/keys/employees', 'k1', 'salary:2011', '91000'
hbase> put '/user/keys/employees', 'k1', 'salary:2012', '92000'
hbase> put '/user/keys/employees', 'k1', 'salary:2013', '93000'
…
Demo
Fetch record using java program
$ ./run employees get k1
Use command get against table /user/keys/employees
Employee record:
Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={2010=90000, 2011=91000, 2012=92000, 2013=93000}]
Demo – run script
#!/bin/bash
export LD_LIBRARY_PATH=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Linux-amd64-64
java -cp `hbase classpath`:/home/kbotzum/development/exercises/target/exercises.jar person.botzum.hbase.Demo $*
What Didn't I Consider?
Row Key
Secondary ways of searching
– Other tables as indexes?
Long term data evolution
– Avro?
– Protobufs?
Security
– SSN is sensitive
– Salary looks kind of sensitive
Agenda
HBase Overview
HBase APIs
MapR Tables
Example
Securing tables
MapR Tables Security
Access Control Expressions (ACEs)
– Boolean logic to control access at table, column family, and column level
ACE Highlights
Creator of table has all rights by default
– Others have none
Can grant admin rights without granting read/write rights
Defaults for column families set at table level
Access to data depends on column family and column access controls
Boolean logic
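The idea of combining users and groups with boolean logic can be modelled with composable predicates. The sketch below does not use MapR's ACE syntax or any MapR API; the class, the principals, the group names, and the two example rules are all hypothetical and only illustrate how a read on one column family can be allowed while another is denied.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

class AceSketch {
    // A caller identity: one user name plus the groups the user belongs to.
    static final class Principal {
        final String user;
        final Set<String> groups;

        Principal(String user, String... groups) {
            this.user = user;
            this.groups = new HashSet<>(Arrays.asList(groups));
        }
    }

    // Atomic expressions: "is this user" and "is in this group".
    static Predicate<Principal> user(String name) {
        return p -> p.user.equals(name);
    }

    static Predicate<Principal> group(String name) {
        return p -> p.groups.contains(name);
    }

    // Hypothetical rule for the base family: user keys OR anyone in group staff.
    static final Predicate<Principal> BASE_READ = user("keys").or(group("staff"));

    // Hypothetical rule for the salary family: user keys OR (group hr AND NOT user fred).
    static final Predicate<Principal> SALARY_READ =
            user("keys").or(group("hr").and(user("fred").negate()));

    public static void main(String[] args) {
        Principal fred = new Principal("fred", "staff");
        System.out.println(BASE_READ.test(fred));   // true
        System.out.println(SALARY_READ.test(fred)); // false
    }
}
```

Under such rules, reading a whole row fails for fred because the salary family denies him, while a read restricted to the base family succeeds.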
MapR Tables Security
Leverages MapR security when enabled
– Wire level authentication
– Wire level encryption
– Trivial to configure
• Most reasonable settings by default
• No Kerberos required!
– Portable
• No MapR specific APIs
Demo
Enable cluster security
# configure.sh -C hostname -Z hostname -secure -genkeys
Yes, that’s it!
– Now all Web UI and CLI access requires authentication
– Traffic is now authenticated using encrypted credentials
– Most traffic is encrypted and bulk data transfer traffic can be encrypted
Demo
Fetch record using java program when not authenticated
$ ./run employees get k1
Use command get against table /user/keys/employees
14/03/14 18:42:39 ERROR fs.MapRFileSystem: Exception while trying to get currentUser
java.io.IOException: failure to login: Unable to obtain MapR credentials
Demo
Fetch record using java program
$ maprlogin password
[Password for user 'keys' at cluster 'my.cluster.com': ]
MapR credentials of user 'keys' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1000'
$ ./run employees get k1
Use command get against table /user/keys/employees
Employee record:
Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={2010=90000, 2011=91000, 2012=92000, 2013=93000}]
Demo
Fetch record using java program as someone not authorized to table
$ maprlogin password
[Password for user 'fred' at cluster 'my.cluster.com': ]
MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_2001'
$ ./run /user/keys/employees get k1
Use command get against table /user/keys/employees
2014-03-14 18:49:20,2787 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:7318 Thread: 139674989631232 Error in DBGetRPC for table /user/keys/employees, error: Permission denied(13)
Exception in thread "main" java.io.IOException: Error: Permission denied(13)
Demo
Set ACEs to allow read to base information but not salary
Fetch whole record using java program
$ ./run /user/keys/employees get k1
Use command get against table /user/keys/employees
2014-03-14 18:53:15,0806 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:7318 Thread: 139715048077056 Error in DBGetRPC for table /user/keys/employees, error: Permission denied(13)
Exception in thread "main" java.io.IOException: Error: Permission denied(13)
Demo
Set ACEs to allow read to base information but not salary
Fetch just base record using java program
$ ./run employees getbase k1
Use command get against table /user/keys/employees
Employee record:
Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={}]
What Else Didn't I Consider?
References
http://www.mapr.com/blog/getting-started-mapr-security-0
http://www.mapr.com/
http://hadoop.apache.org/
http://hbase.apache.org/
http://tech.flurry.com/2012/06/12/137492485/
http://en.wikipedia.org/wiki/Lexicographical_order
HBase in Action, Nick Dimiduk, Amandeep Khurana
HBase: The Definitive Guide, Lars George
Note: this presentation includes materials from the MapR HBase training classes
Questions?
HBase Architecture
What is HBase? (Cluster View)
ZooKeeper (ZK)
HMaster (HM)
Region Servers (RS)
For MapR, there is less delineation between Control and Data Nodes.
[Diagram: master servers run three ZooKeeper instances, the NameNode, and two HMasters; slave servers each run a Region Server alongside a Data Node]
What is a Region?
The basic partitioning/sharding unit of HBase.
Each region is assigned a range of keys it is responsible for.
Region servers serve data for reads and writes
[Diagram: a client, the HMaster, and three ZooKeeper instances; two Region Servers, each hosting two regions whose rows are laid out as Key, colB, colC]