Curb Your Insecurity with HDP: Tips for a Secure Cluster (with Spark too)
Ancil McBarnett, Senior Solutions Engineer – Security & Governance
Future of Data Meetup – New York, June 2nd, 2016
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
• Introduction to Hadoop Security
  – The 4 Steps to Hadoop Security
• Authentication with Kerberos
  – Integration with LDAP
• Authorization with Apache Ranger
  – Hive, HDFS, YARN
• REST API Security with Apache Knox
  – WebHDFS
  – Hive
• Encrypt the Data / Data Protection
  – Transparent Data Encryption and KMS
Comprehensive Approach to Security
In order to protect any data system you must implement the following:
• Administration: Central management and consistent security
  – How do I set policy across the entire cluster?
• Authentication: Authenticate users and systems
  – Who am I/prove it?
• Authorization: Provision access to data
  – What can I do?
• Audit: Maintain a record of data access
  – What did I do?
• Data Protection: Protect data at rest and in motion
  – How can I encrypt at rest and over the wire?
HDP Security: Comprehensive, Complete, Extensible
Security in HDP is the most comprehensive, complete and extensible for Hadoop
• Administration: Central management and consistent security
  – Single administrative console to set policy across the entire cluster: Apache Ranger
• Authentication: Authenticate users and systems
  – Authentication for perimeter and cluster; integrates with existing Active Directory and LDAP solutions: Kerberos | Apache Knox
• Authorization: Provision access to data
  – Consistent authorization controls across all Apache components within HDP: Apache Ranger
• Audit: Maintain a record of data access
  – Record of data access events across all components that is consistent and accessible: Apache Ranger
• Data Protection: Protect data at rest and in motion
  – Encrypts data in motion and data at rest; refer to partner encryption solutions for broader needs: HDFS TDE with Ranger KMS
Security: Rings of Defense
• Perimeter Level Security
  – Network Security (i.e. Firewalls)
  – Apache Knox (i.e. Gateways)
• Authentication
  – Kerberos
• OS Security
• Authorization
  – MR ACLs
  – HDFS Permissions
  – HDFS ACLs
  – Hive ATZ-NG
  – HBase ACLs
  – Accumulo Label Security
Centralized Security with Ranger
Centralized Platform
• Centralized platform to define, administer and manage security policies consistently
• Define security policy once and apply it to all the applicable components across the stack
Fine-Grained Security Definition
• Administer security for:
  – Database
  – Table
  – Column
  – LDAP Groups
  – Specific Users
Deep Visibility
• Administrators have complete visibility into the security administration process
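To make the database/table/column and group/user granularity concrete, here is a minimal sketch of a policy document of the kind Ranger accepts over its REST interface. The field names follow my understanding of Ranger's public policy model and should be checked against your Ranger version; the service name, database, table, columns, group and user are all hypothetical.

```python
import json


def build_hive_policy(service, name, database, table, columns,
                      groups, users, access_types):
    """Build a Ranger-style policy dict for one Hive database/table/column set.

    Assumption: the keys mirror Ranger's public policy JSON; verify them
    against the Ranger release you run before posting this anywhere.
    """
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "isAuditEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {
                "groups": list(groups),
                "users": list(users),
                "accesses": [
                    {"type": access, "isAllowed": True}
                    for access in access_types
                ],
            }
        ],
    }


# Hypothetical names throughout: none of these come from the slides.
policy = build_hive_policy(
    service="cluster1_hive",
    name="analysts-read-orders",
    database="sales",
    table="orders",
    columns=["order_id", "amount"],
    groups=["analysts"],       # e.g. a group synced from LDAP
    users=["alice"],           # a specific user
    access_types=["select"],
)
print(json.dumps(policy, indent=2))
```

Defining the policy as data is what lets it be written once centrally and then enforced by each component's plugin.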
Authorization and Audit
Authorization: Fine grain access control
• HDFS – Folder, File
• Hive – Database, Table, Column
• HBase – Table, Column Family, Column
• Storm, Knox and more
• Control access into system
• Flexibility in defining policies
Audit: Extensive user access auditing in HDFS, Hive and HBase
• IP Address
• Resource type/resource
• Timestamp
• Access granted or denied
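The audit fields listed above can be pictured as one record per access decision. The following is a toy sketch only: the class, the function and the in-memory allow list are hypothetical illustrations, not Ranger's audit schema or its policy engine.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One access event, carrying the fields named on the slide."""
    user: str
    ip_address: str
    resource_type: str
    resource: str
    timestamp: datetime
    access_granted: bool


def check_access(allowed, user, ip_address, resource_type, resource, now=None):
    """Decide access from a set of (user, resource_type, resource) tuples
    and return the audit record for that decision, granted or denied."""
    granted = (user, resource_type, resource) in allowed
    when = now or datetime.now(timezone.utc)
    return AuditRecord(user, ip_address, resource_type, resource, when, granted)


# Hypothetical allow list and requests.
allowed = {("alice", "hive", "sales.orders")}
ok = check_access(allowed, "alice", "10.0.0.4", "hive", "sales.orders")
denied = check_access(allowed, "bob", "10.0.0.5", "hive", "sales.orders")
print(ok.access_granted, denied.access_granted)
```

Note that the denied request is recorded as well; an audit trail answers "what did I do?" only if refusals are kept alongside grants.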
Authentication: API Security with Knox
Incubated and led by Hortonworks, Apache Knox extends the reach of the Hadoop REST APIs without Kerberos complexities
Single, simple point of access for a cluster
• Kerberos Encapsulation
• Single Hadoop access point
• REST API hierarchy
• Consolidated API calls
• Multi-cluster support
Central controls ensure consistency across one or more clusters
• Eliminates SSH “edge node”
• Central API management
• Central audit control
• Service level authorization
Integrated with existing systems to simplify identity maintenance
• SSO Integration: SiteMinder and OAM
• LDAP and AD integration
Extend Hadoop API reach with Knox
[Diagram. Labels: Application Tier (App A, App B, App C, App N); Data Ingest, ETL; Business User, Data Operator, Hadoop Admin; Admin/Operators; Load Balancer; Knox (REST/HTTP, JDBC/ODBC); Bastion Node (SSH); RPC Call (Falcon, Oozie, Sqoop, Flume); Hadoop Cluster]
Hadoop REST APIs
• Useful for connecting to Hadoop from outside the cluster
• When more client language flexibility is required
  – i.e. Java binding not an option
• Challenges
  – Client must have knowledge of cluster topology
  – Ports must be opened to clients outside the cluster (in some cases, on every host)

Service | API
WebHDFS | Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
WebHCat | Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
Hive | Hive REST API operations
HBase | HBase REST API operations
Oozie | Job submission and management, and Oozie administration.
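As an illustration of what a WebHDFS call looks like from a non-Java client, here is a small sketch that only builds request URLs; it does not contact a cluster. The host name, port 50070 and the example paths are assumptions for illustration, and a secured cluster would additionally require Kerberos (SPNEGO) or a gateway such as Knox in front.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, scheme="http", **params):
    """Build a WebHDFS REST URL of the form
    <scheme>://<host>:<port>/webhdfs/v1/<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"{scheme}://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Listing a directory (sent as an HTTP GET).
list_url = webhdfs_url("namenode-host", "/tmp/data", "LISTSTATUS")
# Making a directory with explicit permissions (sent as an HTTP PUT).
mkdir_url = webhdfs_url("namenode-host", "/tmp/new dir", "MKDIRS", permission="755")

print(list_url)   # http://namenode-host:50070/webhdfs/v1/tmp/data?op=LISTSTATUS
print(mkdir_url)  # http://namenode-host:50070/webhdfs/v1/tmp/new%20dir?op=MKDIRS&permission=755
```

The client has to know which host runs the NameNode and which port it listens on, which is the "knowledge of cluster topology" challenge noted on the slide.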
Hadoop REST API with Knox – Representative Examples
Service | Direct URL | Knox URL
WebHDFS | http://namenode-host:50070/webhdfs | https://knox-host:8443/webhdfs
WebHCat | http://webhcat-host:50111/templeton | https://knox-host:8443/templeton
Oozie | http://ooziehost:11000/oozie | https://knox-host:8443/oozie
HBase/Stargate | http://hbasehost:60080 | https://knox-host:8443/hbase
Hive | http://hivehost:10001/cliservice | https://knox-host:8443/hive
YARN | http://yarn-host:yarn-port/ws | https://knox-host:8443/resourcemanager
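The mapping in the table can be sketched as a small URL rewrite. This is an illustration of the idea, not Knox's own rewrite engine, and the function and dictionary names are hypothetical. The table's Knox URLs are abbreviated; deployed gateways normally include a gateway path and topology name (for example `/gateway/default`) before the service path, which the optional `gateway_prefix` parameter stands in for. HBase is left out because its direct URL in the table has no path prefix to match on.

```python
from urllib.parse import urlsplit

# Direct-service path prefix -> Knox path prefix, as listed in the table.
KNOX_PREFIX = {
    "/webhdfs": "/webhdfs",
    "/templeton": "/templeton",
    "/oozie": "/oozie",
    "/cliservice": "/hive",
    "/ws": "/resourcemanager",
}


def to_knox_url(direct_url, knox_host="knox-host", knox_port=8443,
                gateway_prefix=""):
    """Rewrite a direct service URL into the equivalent gateway URL."""
    parts = urlsplit(direct_url)
    for direct_prefix, knox_path in KNOX_PREFIX.items():
        if parts.path == direct_prefix or parts.path.startswith(direct_prefix + "/"):
            rest = parts.path[len(direct_prefix):]
            url = f"https://{knox_host}:{knox_port}{gateway_prefix}{knox_path}{rest}"
            return url + (f"?{parts.query}" if parts.query else "")
    raise ValueError(f"no Knox mapping for {direct_url}")


direct = "http://namenode-host:50070/webhdfs/v1/tmp?op=LISTSTATUS"
print(to_knox_url(direct))
print(to_knox_url(direct, gateway_prefix="/gateway/default"))
```

Every service ends up behind one host and one HTTPS port, so clients no longer need to know where the NameNode, Oozie or HiveServer2 actually run.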
Security in Hadoop with HDP
HDP 2.4: Centralized Security Administration with Ranger
• Authentication: Who am I/prove it?
  – Kerberos
  – API security with Apache Knox
• Authorization: What can I do?
  – Fine-grain access control with Apache Ranger
• Audit: What did I do?
  – Centralized audit reporting with Apache Ranger
• Data Protection: Can data be encrypted at rest and over the wire?
  – Wire encryption in Hadoop
  – HDFS Encryption with Ranger KMS
Data Protection
HDP allows you to apply data protection policy at different layers across the Hadoop stack
Layer | What? | How?
Storage and Access | Encrypt data while it is at rest | HDFS Transparent Data Encryption, Partners, HBase encryption, OS level encryption
Transmission | Encrypt data as it moves | SSL, SASL; supported from HDP 2.1
Points of Communication
[Diagram: numbered points of communication (1 to 4) between a Client and the Nodes of a Hadoop Cluster. Labels: WebHDFS, DataTransferProtocol, RPC, JDBC/ODBC, M/R Shuffle]
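Each of these channels is secured by its own setting. The sketch below lists the standard Hadoop and Hive properties that, to my knowledge, switch on encryption for each channel, and prints them as configuration fragments. Treat it as a checklist rather than a complete setup: keystores, truststores and SASL settings are also required, and the exact values should be verified against the documentation for your HDP release.

```python
from xml.sax.saxutils import escape

# (channel, config file, property, value) for each point of communication.
WIRE_ENCRYPTION = [
    ("RPC", "core-site.xml", "hadoop.rpc.protection", "privacy"),
    ("DataTransferProtocol", "hdfs-site.xml", "dfs.encrypt.data.transfer", "true"),
    ("WebHDFS (HTTP)", "hdfs-site.xml", "dfs.http.policy", "HTTPS_ONLY"),
    ("M/R Shuffle", "mapred-site.xml", "mapreduce.shuffle.ssl.enabled", "true"),
    ("JDBC/ODBC (HiveServer2)", "hive-site.xml", "hive.server2.use.SSL", "true"),
]


def property_xml(name, value):
    """Render one Hadoop-style <property> element."""
    return (f"<property><name>{escape(name)}</name>"
            f"<value>{escape(value)}</value></property>")


for channel, conf_file, name, value in WIRE_ENCRYPTION:
    print(f"{channel:26}{conf_file:17}{property_xml(name, value)}")
```

Listing the channels side by side makes it easier to spot one that was left in the clear, since each is configured in a different file.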
Data Protection - HDFS Encryption
[Diagram: HDFS Clients and the Name Node work with Encrypted Files inside an HDFS Encryption Zone; keys come through the KeyProvider API (partner integration point) from the Key Management System (KMS) or from Stateless Key Management. Stack labels: Data Access, Data Management, Security, Partners, YARN, HDFS]
• Hortonworks is collaborating with partners to deliver enterprise-scale key management and more choices to customers
• Open source key management: KMS with Ranger
• Or partner with Voltage KMS
  – Partner joint engineering resources
  – Voltage Stateless Key Management integrated with KeyProvider API
Only HDP offers open source and commercial choices for key management
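For the open source path, setting up an encryption zone comes down to a few command-line steps: create a key in the KMS, create an empty directory, and turn that directory into a zone. The sketch below only assembles and prints those commands; the key name and path are hypothetical, and actually running them requires a configured KMS and appropriate HDFS privileges.

```python
import shlex


def encryption_zone_commands(key_name, zone_path):
    """Return the argv lists for creating a key and an HDFS encryption zone."""
    return [
        # Create the zone key in the configured key provider (KMS).
        ["hadoop", "key", "create", key_name],
        # The zone directory must exist and be empty before conversion.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # Turn the directory into an encryption zone backed by that key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # Confirm the zone is registered.
        ["hdfs", "crypto", "-listZones"],
    ]


commands = encryption_zone_commands("demo_key", "/secure/demo")
for argv in commands:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Once the zone exists, files written under it are encrypted and decrypted transparently by the HDFS client, so applications need no code changes.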
Security in Spark?
• Spark supports running in a Kerberized cluster
  – Only Spark on YARN supports security (Kerberos support)
  – From the command line, run kinit before submitting Spark jobs
• Spark reads data from HDFS & ORC
  – HDFS file permissions (& Ranger integration) applicable to Spark jobs
• Spark submits job to YARN queue
  – YARN queue ACL (& Ranger integration) applicable to Spark jobs
• Wire Encryption
  – Spark has some coverage; not all channels are covered
• LDAP Authentication
  – No authentication in Spark UI out of the box; supports a filter for hooking in LDAP
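The kinit-then-submit sequence can be sketched as follows. The code only builds the two command lines; the principal, keytab path, queue name and application file are hypothetical placeholders, and the exact `--master` value may differ between Spark versions.

```python
import shlex


def kerberized_submit_commands(principal, keytab, queue, app, app_args=()):
    """Return [kinit argv, spark-submit argv] for a Kerberized YARN cluster."""
    # Obtain a Kerberos ticket from a keytab before talking to the cluster.
    kinit = ["kinit", "-kt", keytab, principal]
    # Submit to YARN; the queue's ACLs decide whether this user may use it.
    submit = ["spark-submit", "--master", "yarn", "--queue", queue, app, *app_args]
    return [kinit, submit]


# Hypothetical values for illustration only.
kinit_cmd, submit_cmd = kerberized_submit_commands(
    principal="alice@EXAMPLE.COM",
    keytab="/etc/security/keytabs/alice.keytab",
    queue="analytics",
    app="job.py",
    app_args=["/data/input"],
)
for argv in (kinit_cmd, submit_cmd):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Because the job runs as the authenticated user, the HDFS permissions and YARN queue ACLs mentioned above apply to it without any Spark-specific policy.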
What makes Hadoop Summit Different?
– Deep technical sessions chosen by the community
– Business Track based on real-world implementations
– Keynotes from Progressive Insurance, Ford, Macy’s, MD Anderson, GE, Capital One, …
– Free Hands-on labs
– Networking events and 10 Year Celebration!
– 20% Off Code: 16SJext20x
Apache Hadoop, Spark, IoT, Streaming, Data Science
EVERYTHING DATA!