Security Features in Apache HBase: An Operator’s Guide
Anoop Sam John, Andrew Purtell, Ramkrishna S. Vasudevan
Committers and PMC Members, Apache HBase, Apache Software Foundation
Big Data US Research And Development, Intel
v5
Outline
• New Security Features in Apache HBase 0.98
• Controlling Access To Data
– Role-Based Access Control Using Groups and ACLs
– Role-Based Access Control Using Labels
– Attribute-Based Access Control Using Labels
• Preventing Data Leaks
– Transparent Encryption
• Performance Considerations
New Security Features in Apache HBase 0.98
Cell Tags
• All values written to HBase are stored in cells
• Cells can now also carry an arbitrary number of tags
– Metadata, considered distinct from the key and the value
– Compressed when persisted to HFiles
– Server side only
• Clients cannot get or send cells with tags directly
• Tags will be correctly replicated if cross-cluster replication is enabled
Cell ACLs (HBASE-7662)
• Extends the existing HBase ACL model with support for persisting
and checking per-cell ACL data in tags
– (R)ead, (W)rite, E(X)ecute, (A)dmin, (C)reate
– Namespace → Table → Column Family → Cell
• Backwards compatible with existing installs and code
• Uses existing facilities (operation attributes) to carry cell ACLs to
supporting servers
Cell ACLs (HBASE-7662)
• Cell ACLs are scoped to the same point in time as the cell itself
– Simple and straightforward evolution of security policy over time without
expensive updates
• We require that mutations have covering permission
– The union of the user’s table permissions, CF permissions, and the
permissions in the most recent visible[1] version, if the value already
exists, must allow the pending mutation in order for it to be applied
– For Deletes, in addition, all visible prior versions covered by the Delete
must allow the Delete
– Delete semantics are being refined
• Complex Deletes may be rejected; just resubmit as simpler ops
• Improved in 0.98.2, likely fully resolved in 0.98.3
1. Visible is defined here as not covered already by a committed delete marker
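The covering permission rule can be sketched in a few lines. This is only an illustration of the rule as stated above, not the AccessController implementation: the `CellVersion` class, the dictionary-based grant tables, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CellVersion:
    """One stored version of a cell (hypothetical model, not an HBase class)."""
    acl: dict = field(default_factory=dict)  # user -> set of actions, e.g. {"alice": {"R", "W"}}
    deleted: bool = False                    # True if covered by a committed delete marker


def effective_perms(user, table_perms, cf_perms, cell):
    """Union of the user's table, CF, and (if a cell is given) cell permissions."""
    perms = set(table_perms.get(user, ())) | set(cf_perms.get(user, ()))
    if cell is not None:
        perms |= set(cell.acl.get(user, ()))
    return perms


def visible(versions):
    """Drop versions already covered by a committed delete marker."""
    return [v for v in versions if not v.deleted]


def put_allowed(user, table_perms, cf_perms, versions):
    """versions is ordered newest first; only the newest visible one is consulted."""
    vis = visible(versions)
    newest = vis[0] if vis else None
    return "W" in effective_perms(user, table_perms, cf_perms, newest)


def delete_allowed(user, table_perms, cf_perms, covered_versions):
    """A Delete additionally needs W on every visible version it covers."""
    vis = visible(covered_versions)
    if not vis:
        return "W" in effective_perms(user, table_perms, cf_perms, None)
    return all("W" in effective_perms(user, table_perms, cf_perms, v) for v in vis)
```

In this model a user with W only on the newest version can overwrite the cell but cannot delete across older versions that do not grant W, unless a table or CF grant covers them.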
Cell Labels (HBASE-7663)
• Visibility expression support via a new security coprocessor
– Labels: arbitrary strings
– Expressions: Labels joined in boolean expressions
– Operators: &, |, !, ( )
secret
secret | topsecret
( secret | topsecret ) & !probationary
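To make the expression semantics concrete, here is a minimal sketch of evaluating a visibility expression against a set of authorizations. It is not the VisibilityController's parser: the function names are hypothetical, and the precedence used here (! binds tighter than &, which binds tighter than |) is an assumption.

```python
import re

# A token is either a single operator/parenthesis or a run of label characters.
TOKEN = re.compile(r"\s*([&|!()]|[^&|!()\s]+)")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError("cannot tokenize expression at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_visible(expr, auths):
    """Return True if the set of auths satisfies the visibility expression."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        if peek() == "(":
            take()
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        tok = take()
        if tok is None or tok in ("&", "|", ")"):
            raise ValueError("expected a label")
        return tok in auths

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result
```

For example, `is_visible("( secret | topsecret ) & !probationary", {"secret"})` holds, while adding `probationary` to the auth set makes the same cell invisible.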
Cell Labels (HBASE-7663)
• New admin APIs and new shell commands for label management
• The universe of labels and the maximal set of labels for a user are
defined up front
• Users label cells using visibility expressions
• Other users ask for authorizations on Gets and Scans
• We build a user’s effective set of authorizations per request in a
pluggable way on the server
• Scan results are filtered according to the user’s effective
authorizations
• VisibilityController and AccessController can be used together
Transparent Encryption (HBASE-7544)
• Transparent encryption of HBase on disk data
– HFile blocks are encrypted as written and decrypted as read
– Write ahead log (WAL) serialization is pluggable; we provide new
secure writers and readers that encrypt and decrypt edits
• Built on a new extensible cryptographic codec and key management
framework in HBase
• Simple key management
– Default provider integrates with the Java Keystore
• Per column family configuration
– Supports schema design that places sensitive information in only a
subset of column families
Endpoint EXEC Grants (HBASE-6104)
• HBase ACLs grant a familiar set of privileges to users and groups:
– (R)ead, (W)rite, E(X)ecute, (C)reate, (A)dmin
• Versions prior to 0.98.0 ignored X
• Now access to coprocessor Endpoint invocations can be controlled
on a global, per-table, or per-column family basis
Controlling Access To Data
Our Example Schema
• A simple user information table
Row Key    Column Family: i    Column Family: pii
uid        i:fullname          pii:address
           i:nick              pii:phone
                               pii:cc
                               pii:cvv2
                               pii:expdate
> create 'user', \
{ NAME => 'i', COMPRESSION => 'snappy', VERSIONS => 10 }, \
{ NAME => 'pii', COMPRESSION => 'snappy', VERSIONS => 10 }
Our Example Security Policy
• Column family: i
Our Example Security Policy
• Column family: pii
Getting Started
• Enable HFile V3
– hfile.format.version=3
• Enable SASL+Kerberos authentication
– RPC: Follow the steps in section 8.1 of the online manual:
https://hbase.apache.org/book/security.html
– ZooKeeper: Follow the steps in section 17.2 of the online manual:
https://hbase.apache.org/book/zk.sasl.auth.html
• Install security coprocessors
– hbase.coprocessor.region.classes=
org.apache.hadoop.hbase.security.access.AccessController,
org.apache.hadoop.hbase.security.visibility.VisibilityController,
org.apache.hadoop.hbase.security.token.TokenProvider
Getting Started
– hbase.coprocessor.master.classes=
org.apache.hadoop.hbase.security.access.AccessController,
org.apache.hadoop.hbase.security.visibility.VisibilityController
– hbase.coprocessor.regionserver.classes=
org.apache.hadoop.hbase.security.access.AccessController
• Enable Endpoint exec permission checks
– hbase.security.exec.permission.checks=true
• [Optional] Enable transport security
– hbase.rpc.protection=privacy (corresponds to the SASL "auth-conf" quality of protection)
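The required settings above end up as `<property>` elements in hbase-site.xml. As a rough sketch, the following helper renders them from a dictionary; the helper name and the dictionary are illustrative only, the authentication settings from the online manual are not included, and `ET.indent` assumes Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Property names and values are taken from the bullets above.
SECURITY_PROPERTIES = {
    "hfile.format.version": "3",
    "hbase.coprocessor.region.classes": ",".join([
        "org.apache.hadoop.hbase.security.access.AccessController",
        "org.apache.hadoop.hbase.security.visibility.VisibilityController",
        "org.apache.hadoop.hbase.security.token.TokenProvider",
    ]),
    "hbase.coprocessor.master.classes": ",".join([
        "org.apache.hadoop.hbase.security.access.AccessController",
        "org.apache.hadoop.hbase.security.visibility.VisibilityController",
    ]),
    "hbase.coprocessor.regionserver.classes":
        "org.apache.hadoop.hbase.security.access.AccessController",
    "hbase.security.exec.permission.checks": "true",
}


def render_site_xml(properties):
    """Render a name -> value mapping as Hadoop-style <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(SECURITY_PROPERTIES))
```

The output is meant to be merged into an existing site file by hand rather than written over it.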
Role-Based Access Control
Using the Hadoop Group Mapping Service and ACLs
• Map each role in the organization to an LDAP entity
– Employee ->
• cn=user, member: ou=users,dc=groups,dc=example,dc=org
– Developer ->
• cn=developer, member: ou=developers,dc=groups,dc=example,dc=org
– Test User Account ->
• cn=testuser, member: ou=users,dc=example,dc=org
– Service Account ->
• cn=service, member: ou=services,dc=example,dc=org
– Admin ->
• cn=manager,dc=example,dc=org
Role-Based Access Control
Using the Hadoop Group Mapping Service and ACLs
• Set up the Hadoop group mapper (core-site.xml)
– hadoop.security.group.mapping=
org.apache.hadoop.security.LdapGroupsMapping
– hadoop.security.group.mapping.ldap.url=…
– hadoop.security.group.mapping.ldap.bind.user=…
– hadoop.security.group.mapping.ldap.search.filter.user=
(&(|(objectclass=person)(objectclass=applicationProcess))(cn={0}))
– hadoop.security.group.mapping.ldap.search.filter.group=
(objectclass=groupofnames)
– hadoop.security.group.mapping.ldap.search.attr.member=member
– hadoop.security.group.mapping.ldap.search.attr.group.name=cn
Role-Based Access Control
Using the Hadoop Group Mapping Service and ACLs
• Confirm the configuration is working correctly
hbase> whoami
service (auth:KERBEROS)
groups: services
Role-Based Access Control
Using the Hadoop Group Mapping Service and ACLs
• Grant permissions to groups and service and test accounts
hbase> grant '@admins', 'RWXCA'
hbase> grant 'service', 'RWXCA', 'user'
hbase> grant '@developers', 'RW', 'user', 'i'
hbase> grant 'testuser', 'RW', 'user', 'i'
hbase> grant 'user', \
{ '@developers' => 'RW', 'testuser' => 'R' }, \
{ COLUMNS => 'pii', FILTER => "(PrefixFilter ('test'))" }
Note: Cell grants done by the shell apply to existing cells only. This is useful for testing. In practice applications must add the desired cell ACL to the operation when submitting writes.
Role-Based Access Control
Using Labels
• Define labels corresponding to roles in the security policy
admin service test developer
Role-Based Access Control
Using Labels
• Express access rules as visibility expressions
admin | service
admin | service | test
admin | service | developer
admin | service | developer | test
• Define labels
hbase> add_labels [ 'admin', 'service', 'developer', 'test' ]
Role-Based Access Control
Using Labels
• Assign one or more roles to each user by associating their principal
with a label set
hbase> set_auths 'service', [ 'service' ]
hbase> set_auths 'testuser', [ 'test' ]
hbase> set_auths 'manager', [ 'admin' ]
hbase> set_auths 'dev', [ 'developer' ]
hbase> set_auths 'qa', [ 'test', 'developer' ]
hbase> …
Role-Based Access Control
Using Labels
• Apply appropriate visibility expressions to cells
hbase> set_visibility 'user', 'admin|service|developer', \
{ COLUMNS => 'i' }
hbase> set_visibility 'user', 'admin|service', \
{ COLUMNS => 'pii' }
hbase> set_visibility 'user', 'admin|service|developer|test',\
{ COLUMNS => [ 'i', 'pii' ], \
FILTER => "(PrefixFilter ('test'))" }
Note: Visibility expressions added to cells by the shell apply to existing cells only. This is useful for testing. In practice applications must add the desired visibility expression to the operation when submitting writes.
Attribute-Based Access Control
• We can construct the effective authorization set for a user in a
pluggable and stackable way
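[Figure: a stack of authorization providers. Layers, in the order shown: retrieves the principal for the user; maps the principal to group names; imports auths from the request; enforces minimum auths; maps identity attributes to auths. External inputs shown: the auths table and a directory.]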
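One way to picture a pluggable, stackable provider chain is a list of small functions that each refine a shared context. The sketch below is hypothetical throughout: the lookup tables, the layer names, and in particular the reading of "enforces minimum auths" as "keep only requested auths that the auths table grants to the principal or its groups" are assumptions, not the server's actual behavior.

```python
# Hypothetical stand-ins for the group mapper, the auths table, and the directory.
GROUPS = {"dev": {"developers"}}
AUTHS_TABLE = {"dev": {"developer"}, "@developers": {"test"}}
DIRECTORY_AUTHS = {"dev": {"probationary"}}


def retrieve_principal(ctx):
    ctx["principal"] = ctx["request"]["user"]


def map_groups(ctx):
    ctx["groups"] = GROUPS.get(ctx["principal"], set())


def import_request_auths(ctx):
    ctx["auths"] = set(ctx["request"].get("auths", ()))


def enforce_granted_auths(ctx):
    # Assumption: requested auths are limited to what the auths table grants.
    granted = set(AUTHS_TABLE.get(ctx["principal"], set()))
    for group in ctx["groups"]:
        granted |= AUTHS_TABLE.get("@" + group, set())
    ctx["auths"] &= granted


def mix_in_directory_auths(ctx):
    # Assumption: auths derived from identity attributes are added unconditionally.
    ctx["auths"] |= DIRECTORY_AUTHS.get(ctx["principal"], set())


DEFAULT_STACK = [
    retrieve_principal,
    map_groups,
    import_request_auths,
    enforce_granted_auths,
    mix_in_directory_auths,
]


def effective_auths(request, stack=DEFAULT_STACK):
    """Run each layer in order and return the resulting authorization set."""
    ctx = {"request": request}
    for layer in stack:
        layer(ctx)
    return ctx["auths"]
```

Because each layer only reads and writes the context, a site could drop, reorder, or replace layers without touching the others.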
Attribute-Based Access Control
• LDAP plugin can mix in auths corresponding to attributes of the
subject’s identity
– Expected soon in 0.98 (maybe 0.98.4)
Query
(&(objectClass=person)
(userPrincipalName={0}))
Attribute Mapping
<attribute>: <regex> → <auth>
memberOf: .+ -> $1
division: .+ -> $1
department: .+ -> $1
employeeID: P[0-9]+ -> probationary
Directory
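The attribute mapping can be read as a list of (attribute, regex, auth) rules applied to the directory entry returned by the query. The sketch below assumes that `$1` stands for the matched text, written here with an explicit capture group; the rule table and function name are hypothetical, since the plugin had not shipped at the time of writing.

```python
import re

# (attribute, regex, auth template); "$1" is replaced by the first capture group.
RULES = [
    ("memberOf", r"(.+)", "$1"),
    ("division", r"(.+)", "$1"),
    ("department", r"(.+)", "$1"),
    ("employeeID", r"P[0-9]+", "probationary"),
]


def auths_from_attributes(attributes, rules=RULES):
    """Map a directory entry (attribute -> list of values) to a set of auths."""
    auths = set()
    for attr, pattern, template in rules:
        for value in attributes.get(attr, []):
            m = re.fullmatch(pattern, value)
            if m is None:
                continue  # value does not match the rule; contributes nothing
            auth = template
            if "$1" in template:
                auth = template.replace("$1", m.group(1))
            auths.add(auth)
    return auths
```

With this reading, an entry whose employeeID starts with "P" followed by digits picks up the `probationary` auth, which the visibility expressions can then exclude with `!probationary`.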
Attribute-Based Access Control
Using Labels
• Apply appropriate visibility expressions to cells
hbase> set_visibility 'user', \
'admin|service|(developer&(!probationary))', \
{ COLUMNS => 'i' }
hbase> set_visibility 'user', 'admin|service', \
{ COLUMNS => 'pii' }
hbase> set_visibility 'user', \
'admin|service|((developer|test)&(!probationary))', \
{ COLUMNS => [ 'i', 'pii' ], \
FILTER => "(PrefixFilter ('test'))" }
Attribute-Based Access Control
Using ACLs
• An area of future work
– We could consider an HBase-provided replacement for the Hadoop
Group Mapper that also supports mapping object attributes to strings
– For the VisibilityController, the mapped strings would be interpreted as
auths (see slide #27)
– For the AccessController, the mapped strings could be interpreted as
group names
– See HBASE-10919[1] or raise a discussion on the HBase developer mailing list
1. https://issues.apache.org/jira/browse/HBASE-10919
Preventing Data Leaks
Protecting Data At Rest
• HBase is deployed into a layered system
• Incorrect handling of permissions or storage volumes at the HDFS
layer or below could expose sensitive information
[Figure: layered deployment. Apache HBase (Master, standby Master, RegionServers), Apache ZooKeeper (ZooKeeper ensemble), and the Apache Hadoop Distributed File System (HDFS DataNodes).]
Getting Started
• Create the cluster master key in a KeyStore file
$ keytool -keystore hbase.jks -storetype jceks -genseckey \
-keyalg AES -keysize 128 -storepass secret \
-alias hbase-master-default
• Deploy the KeyStore file to all site configuration directories and
restrict local access to it
$ chown hbase:hbase hbase.jks
$ chmod 0600 hbase.jks    # resulting mode: -rw-------
• Enable HFile V3
– hfile.format.version=3
Getting Started
• Set up key provider configuration for KeyStore files
– hbase.crypto.keyprovider=
org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider
– hbase.crypto.keyprovider.parameters=
jceks:///path/to/hbase/conf/hbase.jks?password=secret
– hbase.crypto.master.key.name=hbase-master-default
• Restrict local access to the site file
$ chown hbase:hbase hbase-site.xml
$ chmod 0600 hbase-site.xml    # resulting mode: -rw-------
• The KeyStore password need not be embedded in the site file
– Use ?passwordFile=/path/to/password/file and protect that instead
Getting Started
• Enable WAL encryption
– hbase.crypto.wal.key.name=hbase-master-default
– hbase.regionserver.hlog.reader.impl=
org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader
– hbase.regionserver.hlog.writer.impl=
org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter
– hbase.regionserver.wal.encryption=true
WAL encryption is configured separately from HFile encryption to enable
storage management with tiered sensitivity
• (JRE 8+) Enable AES-NI acceleration features
– Add to hbase-env.sh: -XX:+UseAES -XX:+UseAESIntrinsics
Transparent Encryption
• Segregate sensitive information into one or a few column families
with HFile encryption enabled
– We are storing sensitive personally identifiable customer information in
the “pii” family
– Enable encryption on “pii” only to mitigate performance impact
– After changing schema, run a major compaction to ensure all files are
(eventually) transformed
hbase> disable 'user'
hbase> alter 'user', { NAME => 'pii',\
COMPRESSION => 'snappy', \
ENCRYPTION => 'aes' }
hbase> enable 'user'
hbase> major_compact 'user'
Row Key    Column Family: i    Column Family: pii
uid        i:fullname          pii:address
           i:nick              pii:phone
                               pii:cc
                               pii:cvv2
                               pii:expdate
Transparent Encryption
• Data key management
– RegionServers retrieve and unwrap CF keys from descriptors as
needed to encrypt HFiles
– The data key for a CF can be modified at any time by the admin
• Or, encryption can be enabled and disabled entirely
• CF encryption is completely reversible!
– HFiles contain the data key used for encryption, wrapped (encrypted) by
the master key
• Supports incremental rekeying without expensive IO or downtime
– Simply trigger major compaction to normalize encryption and data
keying state over the entire CF
• Can be done on a region by region basis with an HBase shell script
Transparent Encryption
• Master key rotation
– Should be an infrequent operation; an attacker able to observe even all
schema and HFiles gains very little information about the master key over time
– Store a copy of the current master key with an alternate alias e.g.
“hbase-master-alt”
– Replace the master key with a new one
– Update site file
• hbase.crypto.master.alternate.key.name=hbase-master-alt
– Do a rolling restart of all HBase server processes
– Trigger a major compaction and wait for completion
– Remove the old master key from the KMS and remove the alternate alias from the site file
– Do another rolling restart of all HBase server processes
Key Providers
• Any Key Management System with a Java KeyStore provider can be
supported by the KeyStoreKeyProvider
• Or natively, via custom HBase KeyProviders
• Update site configuration: hbase.crypto.keyprovider,
hbase.crypto.keyprovider.parameters
[Figure: two integration paths. HBase → KeyStoreKeyProvider → JDK KeyStore provider framework → Thales, Luna, CloudHSM, . . . ; or HBase → YourKeyProvider.]
Cipher Providers
• We support alternate or accelerated ciphers with either:
1. A Java Cryptography Extension (JCE) algorithm provider
• Install a signed JCE provider (supporting “AES/CTR/NoPadding”
mode with 128 bit keys)
• Add it with highest preference to the JCE site configuration file
$JAVA_HOME/lib/security/java.security
• Update site configuration: hbase.crypto.algorithm.aes.provider,
hbase.crypto.algorithm.rng.provider
2. A custom HBase Cipher implementation
• Start at org.apache.hadoop.hbase.io.crypto.CipherProvider
• Make it available on the server classpath
• Update site configuration: hbase.crypto.cipherprovider
Performance Considerations
WAL Encryption
• Performance implications of WAL encryption
– As measured by HLogPerformanceEvaluation microbenchmark
– Relative differences are what is interesting
– WAL throughput ceiling ~9% lower with 7u45 (48,046 vs. 52,658 ops/sec)
– ~8% lower with 8u20 (50,660 vs. 54,874 ops/sec)
• Future mitigation: When HDFS storage tiering capability is in
production, configure separate storage tiers for WAL and HFile data
Test                                            Throughput (ops/sec)   Total cycles     Insns per cycle
Oracle Java 1.7.0_45-b18 - None                 52658.302              8878179986750    0.47
Oracle Java 1.7.0_45-b18 - AES WAL encryption   48045.834              9911748458387    0.57
OpenJDK 1.8.0_20-b09 - None                     54874.125              8662634367005    0.46
OpenJDK 1.8.0_20-b09 - AES WAL encryption       50659.507              9668111259270    0.61
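Since the relative differences are the interesting part, they can be recomputed directly from the throughput column. The snippet below uses only the ops/sec figures from the table; the dictionary layout and function name are illustrative.

```python
# Throughput (ops/sec) copied from the table above.
RESULTS = {
    "Oracle Java 1.7.0_45-b18": {"none": 52658.302, "aes_wal": 48045.834},
    "OpenJDK 1.8.0_20-b09": {"none": 54874.125, "aes_wal": 50659.507},
}


def relative_drop(baseline, encrypted):
    """Percentage by which throughput falls when WAL encryption is enabled."""
    return (baseline - encrypted) / baseline * 100.0


if __name__ == "__main__":
    for jvm, r in RESULTS.items():
        drop = relative_drop(r["none"], r["aes_wal"])
        print(f"{jvm}: {drop:.1f}% lower with AES WAL encryption")
```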
Promoting Common ACLs
• When designing security policy for a table, consider that table and
column family level grants are inexpensive compared to cell level
grants
– Table and CF level grants are cached in memory
– Cell level grants require region scanning
• We consider permissions as the union of grants at all levels; a table
or CF grant lets the check exit early, before any cell-level lookup
• If a user will always be granted permissions at the cell level,
promote their access to a column family or table level grant
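The cost argument can be illustrated with a small sketch in which the table and CF grants are in-memory dictionaries and the cell-level lookup is an expensive callable. All names here are hypothetical; the point is only the ordering of the checks.

```python
def has_permission(user, action, table_grants, cf_grants, scan_cell_grants):
    """Check cached table and CF grants first; scan for cell grants only if needed.

    table_grants / cf_grants: dicts of user -> set of actions (cached in memory).
    scan_cell_grants: callable(user) -> set of actions; stands in for the
    region scan needed to read cell-level ACLs.
    """
    if action in table_grants.get(user, ()):
        return True  # early out: no scan required
    if action in cf_grants.get(user, ()):
        return True  # early out: no scan required
    return action in scan_cell_grants(user)


if __name__ == "__main__":
    scans = []

    def scan(user):
        scans.append(user)  # record each expensive lookup
        return {"R"}

    print(has_permission("alice", "R", {"alice": {"R"}}, {}, scan), len(scans))
    print(has_permission("bob", "R", {}, {}, scan), len(scans))
```

A user who is always granted access at the cell level pays for the scan on every request, which is why promoting such grants to the CF or table level is cheaper.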
End
Questions?