+Building Secure Applications With HBase / Accumulo
Sujee Maniyam [email protected]
NoSQL Now! 2014 Conference
Aug 2014, San Jose, CA
+About This Talk…
n Some practical tips & design patterns on building secure applications using HBase and Accumulo
n A quick demo (fingers crossed!)
n Audience : technical
+Who Invited This Guy?
n Hi, I am Sujee Maniyam
n Founder / Principal @ Elephant Scale Consulting & Training in Big Data, NoSQL
n Co-Author of open source Hadoop book: http://hadoopilluminated.com
n Founder / Organizer of ‘Big Data Guru’ meetup http://www.meetup.com/BigDataGurus/
n Open source : http://github.com/sujee
n http://sujee.net | http://www.linkedin.com/in/sujeemaniyam
+NoSQL eco-system (too many!)
+HBase : Quick Intro
n Modeled after Google Bigtable
n Distributed NoSQL store built on Hadoop / HDFS
n Apache project
n http://hbase.apache.org/
[Diagram: HBase runs on top of HDFS]
+Accumulo : Quick Intro
n Developed by the National Security Agency (NSA) !
n Google Bigtable implementation
n NoSQL store on top of HDFS
n Security is a first-class concept
[Diagram: Accumulo runs on top of HDFS]
+HBase & Accumulo
n Both are Bigtable implementations
n Based on HDFS
n Written in Java
n Apache open source projects
[Diagram: HBase and Accumulo both run on top of HDFS]
+Approach to Security in Hadoop Until Recently…
+But Security Picture Has Improved Rapidly…
n A lot of work is going on in the ecosystem
n Hadoop vendors (Cloudera, Hortonworks, …) have been actively working on security features
n The core features are in
n Ease of use improving as well
+Next : Building Secure Applications
+What Does It Mean to be ‘Secure’?
n 1) Control who can get in
n 2) Verify the person’s identity
n 3) Safeguard communications with the user
n 4) Decide what this user is allowed to do
n 5) And finally… protect data at rest
+1) Who can get in
n Control which machines can connect to the NoSQL cluster
n Don’t expose the cluster to the public
  n Too many open ports
  n Too vulnerable
n Solutions:
  n Run the cluster behind a firewall
  n Restrict which machines can connect to the cluster
n Linux / network-level security
n Handled outside the NoSQL store itself
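To make "restrict which machines can connect" concrete, here is a minimal sketch of an allowlist check. In practice this rule lives in a firewall or in network configuration rather than in application code; the subnets and the `may_connect` helper are hypothetical and not from the talk.

```python
import ipaddress

# Hypothetical allowlist: one application subnet and one admin host.
# These addresses are illustrative only.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.1.0/24"),   # application servers
    ipaddress.ip_network("10.0.9.15/32"),  # single admin workstation
]


def may_connect(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)
```

Anything not explicitly listed is rejected, which matches the "don't expose the cluster to the public" advice above.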
+Trusted Environment
+2) User Authentication
n Wolf: Knock… Knock…
n Pig : Who is there?
n Wolf : It is me… little pig
n How can we verify the user?
  n Username / password (as with Gmail)
  n Or use a trusted third party (a referee)
    n Kerberos
Source : http://1.bp.blogspot.com/
+Kerberos : Quick Primer
n Kerberos is an authentication protocol for networked machines
n Validates client to server and vice-versa
n Strong crypto algorithms (AES, 3DES…)
+Kerberos Protocol for Getting a Beer at a Carnival / Fair :)
+Kerberos Protocol Explained : Getting Beer @ Fair / Party
n Prove your age (identity) to the wristband issuer
  n Ticket Granting Ticket
n Get a wristband → qualifies you to get beer
  n Service Ticket
n Go to the bartender and ask for beer using your wristband
  n Service Request
n Get beer! :)
n For a technically correct explanation, see: http://www.roguelynn.com/words/explain-like-im-5-kerberos/
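The three steps of the analogy can be sketched as a toy simulation. This is not real Kerberos: real tickets are encrypted, carry timestamps and session keys, and expire. Here an HMAC tag stands in for the "seal" on each ticket, and all class names, keys, and passwords are made up for illustration.

```python
import hashlib
import hmac


def sign(key: bytes, message: str) -> str:
    """Toy 'seal': an HMAC tag that only holders of key can produce."""
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


class ToyKDC:
    """Stand-in for the wristband issuer (the Kerberos KDC)."""

    def __init__(self, passwords, service_keys):
        self.passwords = passwords          # user -> password
        self.service_keys = service_keys    # service -> key shared with the KDC
        self.tgt_key = b"kdc-only-secret"   # never leaves the KDC

    def login(self, user, password):
        """Step 1: prove identity, receive a Ticket Granting Ticket."""
        if self.passwords.get(user) != password:
            raise PermissionError("unknown user or wrong password")
        return (user, sign(self.tgt_key, "tgt:" + user))

    def service_ticket(self, tgt, service):
        """Step 2: trade the TGT for a ticket that one service will accept."""
        user, tag = tgt
        if not hmac.compare_digest(tag, sign(self.tgt_key, "tgt:" + user)):
            raise PermissionError("invalid TGT")
        key = self.service_keys[service]
        return (user, service, sign(key, "st:" + user + ":" + service))


class ToyService:
    """Stand-in for the bartender (for example, the database server)."""

    def __init__(self, name, key):
        self.name = name
        self.key = key

    def handle(self, ticket):
        """Step 3: serve the request only if the ticket checks out."""
        user, service, tag = ticket
        expected = sign(self.key, "st:" + user + ":" + service)
        if service != self.name or not hmac.compare_digest(tag, expected):
            raise PermissionError("invalid service ticket")
        return "beer for " + user


kdc = ToyKDC({"joe": "s3cret"}, {"bar": b"bar-key"})
bar = ToyService("bar", b"bar-key")
tgt = kdc.login("joe", "s3cret")          # prove identity once
ticket = kdc.service_ticket(tgt, "bar")   # get the wristband
print(bar.handle(ticket))                 # ask the bartender
```

Note that the bartender never sees the password: it trusts the ticket only because the tag was made with the key it shares with the issuer.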
+Kerberos Integration
Feature | HBase | Accumulo
Kerberos integration | Yes | Yes (simple authentication is also built in)
+3) Secure Client Communication
n Guard client / server communication (‘on the wire’)
n Done by using SASL or SSL (certificates)
n Prevents snooping by third parties
Feature | HBase | Accumulo
Secure client communications | Yes | Yes
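As a general illustration of what "secure on the wire" asks of a client, here is a minimal sketch using Python's standard `ssl` module. It assumes TLS as the transport (the summary slide mentions SSL) and is not HBase or Accumulo configuration; `make_client_context` and its `ca_file` parameter are hypothetical names.

```python
import ssl


def make_client_context(ca_file=None):
    """Build a TLS client context that verifies the server certificate.

    ca_file is an optional path to a CA bundle; when omitted, the
    system's default trust store is used.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The default context already requires a valid certificate and checks the host name, which is what prevents a third party from snooping or impersonating the server.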
+4) What Is Allowed For This User?
n In an unsecured environment, users can read / write any table
  n → not very secure!
n Control which data users can see
+Quick Primer on HBase Storage
n Tables have many rows
n Each row has multiple columns (or qualifiers)
n Columns are grouped into column families
n Each cell also has a timestamp (not shown here)
[Figure: one table row keyed by Customer_id, with two column families]
Family | Columns
info (Family1) | name, email, phone, last 4 of social
secure (Family2) | full SSN
+HBase Allows Access Control At Family Level
Family | Columns | Who can access
info | name, email, phone, last 4 of social | First-level CSRs can access only this family
secure | full SSN | Only supervisors can access this family
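Family-level access control can be sketched as a filter applied when a row is read. This is an in-memory illustration, not the HBase API; the role names, the grants table, and the assumption that supervisors can also read the info family are mine.

```python
# Hypothetical role -> column-family grants mirroring the slide.
FAMILY_GRANTS = {
    "csr": {"info"},
    "supervisor": {"info", "secure"},  # assumption: supervisors also see info
}

# One illustrative row, laid out as family -> column -> value.
ROW = {
    "info": {"name": "Joe", "last4_social": "6789"},
    "secure": {"full_ssn": "123-45-6789"},
}


def read_row(role, row):
    """Return only the column families the given role has been granted."""
    allowed = FAMILY_GRANTS.get(role, set())
    return {family: dict(cols) for family, cols in row.items() if family in allowed}
```

The grant is all-or-nothing per family: a role either sees every column in a family or none of them, which is the limitation the next slide addresses.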
+Need More Fine Grained Access
n We would like to provide ‘cell level’ access controls
n Greater flexibility in application development
n More fine grained access controls
n Meet Accumulo’s Data Model
+Accumulo Data Model
Family: info
Column | Visibility token
name | Level 1
email | Level 1
Last 4 SSN | Level 1
SSN | Level 2 OR Top clearance
Gmail password | Top clearance
n Everything in the HBase data model
n Plus each cell has a ‘Visibility Token’
+Users Are Assigned ‘Visibility Tokens’
User ID | Visibility levels
User 1 | Level 1
User 2 | Level 1 + Level 2
Edward Snowden | Level 1 + Level 2 + Top clearance
+Accumulo only returns cells visible to user
Row: person1
Column | Value | Visibility token
name | Joe | Level 1
email | [email protected] | Level 1
Last 4 SSN | 6789 | Level 1
Full SSN | 123-45-6789 | Level 2 OR Top clearance
Gmail password | JoeSuperMan! | Top clearance
+What Users Can See…
User | Visibility privilege | Visible cells
User 1 | Level 1 | Name, Email, Last 4 SSN
User 2 | Level 1 + Level 2 | Name, Email, Last 4 SSN, Full SSN
Edward Snowden | Level 1 + Level 2 + Top clearance | Name, Email, Last 4 SSN, Full SSN, Gmail password
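The table above can be reproduced with a small evaluator for visibility expressions. This is a simplified sketch, not Accumulo's implementation: it handles labels made of letters, digits, and underscores, the operators `&` and `|`, and parentheses, and it rejects mixing `&` and `|` without parentheses. The label names (`level1`, `level2`, `top`) and column names are illustrative.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a visibility expression into labels, operators, and parentheses."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError("bad character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if a user holding the labels in auths satisfies expr."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # an empty expression is visible to everyone
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError("unexpected token " + tok)
        return tok in auths  # a bare label: does the user hold it?

    def expression():
        nonlocal pos
        value = term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("unexpected token " + tokens[pos])
    return result


# Cell visibilities and user labels from the slides, with made-up label names.
CELL_VISIBILITY = {
    "name": "level1",
    "email": "level1",
    "last4_ssn": "level1",
    "full_ssn": "level2|top",
    "gmail_password": "top",
}
USER_AUTHS = {
    "user1": {"level1"},
    "user2": {"level1", "level2"},
    "esnowden": {"level1", "level2", "top"},
}


def visible_columns(user):
    """List the columns whose visibility expression the user satisfies."""
    auths = USER_AUTHS[user]
    return [col for col, expr in CELL_VISIBILITY.items() if is_visible(expr, auths)]
```

The key design point is that filtering happens per cell at read time: a user with insufficient labels simply never receives the cell, rather than receiving an error.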
+Good News For HBase
n With release 0.98, HBase also supports cell-level access controls
n Implemented using cell ‘tags’
n Requires upgrading to the HFile v3 (version 3) format
+Visibility / Access Controls
n Both HBase and Accumulo allow access control for the data
Feature | HBase | Accumulo
Cell-level visibility | Yes (starting with v0.98) | Yes
+5) Final Step : Encrypt Data At Rest
n Eventually data ends up on disk
n We need to protect the ‘raw data’ on disk
n To prevent
  n Users going to the disk directly
  n Theft of hardware
+Solution : Encrypt Data Transparently
n Encryption is done via keys
  n Uses Java Cryptography Extension (JCE)
n Data is encrypted before writing to HDFS
  n Does not rely on HDFS or Linux-level encryption
n Per-family encryption is supported
Feature | HBase | Accumulo
Encryption at rest | Yes | Yes
+HBase & Accumulo : Transparent Encryption
+Encryption : Key Management
n The keys have to be managed carefully…
  n Don’t lose them!
  n Don’t compromise them!!
n Possible storage mechanisms
  n Database
  n Remote file server
  n Key management server
  n Local file system
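For the simplest option in the list, a key on the local file system, here is a minimal sketch of the two rules above: don't lose the key (never overwrite an existing one) and don't compromise it (keep the file private). It assumes a POSIX system, and `create_key_file` and `load_key_file` are hypothetical helper names, not part of HBase or Accumulo.

```python
import os
import secrets
import stat


def create_key_file(path, num_bytes=32):
    """Generate a random key and store it in a new owner-only file."""
    key = secrets.token_bytes(num_bytes)
    # O_EXCL makes creation fail if the file exists, so an existing key
    # is never silently overwritten. Mode 0o600 means owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
    return key


def load_key_file(path):
    """Read the key, refusing files that group or others can access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError("key file %s is accessible to group/others" % path)
    with open(path, "rb") as f:
        return f.read()
```

A local file is the easiest option to set up and the weakest against hardware theft, since the key sits next to the data; that is one reason the list also includes a key management server.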
+Summary
Feature | HBase | Accumulo
Runs in a trusted environment | Yes (outside configuration) | Yes (outside configuration)
User authentication | Kerberos | Kerberos + built-in
Secure client communications (via SSL) | Yes | Yes
Visibility at cell level | Yes (starting from v0.98) | Yes
Encrypt data at rest | Yes | Yes
+Useful Resources
n Accumulo
  n http://www.slideshare.net/DonaldMiner/accumulo-oct2013bofpresentation
n HBase
  n http://hbase.apache.org/book/hbase.encryption.server.html
+DEMO
+Demo Explained
Demonstrate the cell-level visibility feature of Accumulo. Here is what the data looks like:
Row: Person1
Column | Value | Visibility level
Name | Joe Smith | Level 1
email | [email protected] | Level 1
ssn | 123-45-6789 | Level 2
Gmail_password | ‘JoeDaMan!’ | Top
+Demo : Accumulo Users + Visibility
Accumulo user | Table1 access | Access level | Visible columns
root | yes | all | all
user1 | yes | Level 1 | Name, email
user2 | yes | Level 1 + Level 2 | Name, email + SSN
esnowden | yes | Level 1 + Level 2 + Top | Name, email + SSN + Gmail password :)
user3 | no | N/A | N/A
+Thanks & Questions!
http://ElephantScale.com
Expert consulting & training in Big Data (Hadoop, NoSQL, Spark)
Free, online Hadoop book ‘Hadoop illuminated’