Securing Your Hadoop Cluster With Apache Ranger, Atlas and Knox

Attila Kanto & Zsombor Gegesy

June 13th 2017 – Budapest Data Forum

Page 1: Securing Your Hadoop Cluster With Apache Ranger, Atlas and ...biconsulting.hu/letoltes/2017budapestdata/kanto_attila_gegesy_zsombor... · Securing Your Hadoop Cluster With Apache

2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Disclaimer

This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately never be developed.

Product capabilities are based on information that is publicly available within the Apache Software Foundation websites (“Apache”). Progress of the project capabilities can be tracked from inception to release through Apache, however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.

This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.

Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Since this document may contain an outline of general product development plans, customers should not rely upon it when making purchasing decisions.

Agenda

Security concepts overview

Apache Knox

Apache Ranger

Apache Atlas

Q&A

Five pillars of enterprise security

HDP Security: Comprehensive, Complete, Extensible

Perimeter Level Security

• Network Security (e.g. firewalls)

• Apache Knox (i.e. Gateways)

Authentication

• LDAP / AD

• Kerberos

Authorization

• Consistent authorization control across all HDP components with Apache Ranger

Data protection

• Encrypt data in motion and data at rest (Apache Ranger KMS for data at rest)

OS Security

• Process isolation

• Namespaces

Apache Knox

What is Apache Knox?

REST API and Application Gateway for the Apache Hadoop Ecosystem

Why Apache Knox?

Extensible reverse proxy framework

Simplifies access

• Kerberos encapsulation

• Single access point for all REST and HTTP interactions

• Multi-cluster support

Enhanced security

• Eliminates the SSH edge node (securely exposes REST APIs and HTTP-based services at the perimeter)

• Protects the details of the cluster deployment

• Provides SSL for non-SSL services

• Central auditing

Enterprise integration

• LDAP/AD integration

• SSO integration

What Apache Knox isn’t

Not an alternative to firewalls

Not an alternative to Kerberos

Not a channel for high volume data ingest or export

Apache Knox Overview

Proxying Services

The primary goal of the Apache Knox project is to provide access to Apache Hadoop by proxying HTTP resources.

Authentication Services

Authentication for REST API access as well as a WebSSO flow for UIs. LDAP/AD, header-based pre-authentication, Kerberos, SAML and OAuth are all available options.

Client DSL/SDK Services

Client development can be done by scripting with the DSL or by using the Knox Shell classes directly as an SDK.

[Diagram: Knox architecture. Authentication and federation providers: KnoxSSO/Token (WebSSO), LDAP/AD, SAML, OAuth, SPNEGO, header-based. HTTP proxying services for UIs, REST APIs and WebSockets: Hive, HBase, WebHDFS, WebHCat, YARN RM, Oozie, Ambari, Ranger, Zeppelin, Phoenix, Gremlin, SQL/DB, Hadoop UIs. Client DSL/SDK services: Groovy-based DSL, Knox Shell SDK classes, token sessions, REST API.]
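From the client's side, every proxied service is reached under a single gateway address. Below is a minimal sketch of composing such a URL. It assumes the commonly used layout https://<gateway-host>:<port>/gateway/<topology>/<service path> and Knox's default port 8443. The host name knox.example.com and the topology name default are hypothetical placeholders. knox_url is an illustrative helper and is not part of the Knox Shell SDK.

```python
from urllib.parse import quote, urlencode


def knox_url(gateway_host, topology, service_path, params=None, port=8443):
    """Compose a client-facing URL for a service proxied by the gateway.

    Assumed layout: https://<host>:<port>/gateway/<topology>/<service path>.
    """
    path = "/gateway/{}/{}".format(quote(topology), service_path.lstrip("/"))
    url = "https://{}:{}{}".format(gateway_host, port, path)
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical host and topology: list /tmp through WebHDFS behind the gateway.
print(knox_url("knox.example.com", "default", "webhdfs/v1/tmp", {"op": "LISTSTATUS"}))
# https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS
```

The client only needs this one address and its credentials. Which NameNode actually serves the request stays hidden behind the topology.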

Cluster access through Edge Node

[Diagram: User → SSH/SCP → Edge Node with Hadoop CLIs (in the DMZ) → Hadoop Services]

CLIs are hard to install on desktops

Limited auditing

CLIs must be aware of the cluster topology

Cluster access through Gateway

[Diagram: User → REST API → Gateway (in the DMZ) → REST API → Hadoop Services]

User connects through a REST API

All activity is audited consistently

Cluster topology is not exposed to the client
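The gateway side of this picture can be modelled as a lookup from the public path to an internal service URL, plus one audit record per call. This is a toy sketch and not Knox's dispatch code. The TOPOLOGY table, the internal host names and ports, and the audit fields are all hypothetical. Only a single topology is modelled, so the topology segment of the path is ignored.

```python
from datetime import datetime, timezone

# Hypothetical topology: public service name -> internal URL, known only to the gateway.
TOPOLOGY = {
    "webhdfs": "http://namenode.internal:50070/webhdfs",
    "hive": "http://hiveserver.internal:10001/cliservice",
}

AUDIT_LOG = []  # every request is recorded here, whatever its outcome


def dispatch(user, external_path):
    """Map /gateway/<topology>/<service>/<rest> onto an internal URL and audit the call.

    Only one topology is modelled, so the <topology> segment is not used for routing.
    """
    parts = external_path.strip("/").split("/", 3)
    if len(parts) < 3 or parts[0] != "gateway":
        raise ValueError("not a gateway path: " + external_path)
    service = parts[2]
    rest = parts[3] if len(parts) == 4 else ""
    backend = TOPOLOGY.get(service)
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "path": external_path,
        "service": service,
        "outcome": "dispatched" if backend else "unknown service",
    })
    if backend is None:
        raise LookupError("unknown service: " + service)
    return backend + ("/" + rest if rest else "")


print(dispatch("alice", "/gateway/default/webhdfs/v1/tmp"))
# http://namenode.internal:50070/webhdfs/v1/tmp
```

The client never learns the internal host names. Because every request passes through dispatch, every request is audited in one place.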

Authentication and Identity Propagation

[Diagram: User ↔ Gateway ↔ Hadoop Services]

0. Configure Knox as a trusted proxy

1. REST API request

2. Authentication challenge

3. Authenticate as user:secret

4. Authenticate as Knox via SPNEGO (i.e. Kerberos)

5. REST API request doAs user

The client is not aware that the cluster is secured with Kerberos.
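The trust check behind steps 0 and 5 can be sketched in plain Python. Hadoop's impersonation settings really are named hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups. Everything else below is a hypothetical stand-in for LDAP/AD and core-site.xml: the dictionaries, host names, credentials and helper functions. The Kerberos login in step 4 is assumed rather than performed. Host matching (which in Hadoop also accepts IPs and ranges) is simplified to exact names or *.

```python
# Hypothetical directory standing in for LDAP/AD: user -> (password, groups).
DIRECTORY = {"alice": ("secret", {"analysts"})}

# Step 0: same key shape as Hadoop's proxy-user settings; the values are hypothetical.
PROXYUSER_CONF = {
    "hadoop.proxyuser.knox.hosts": "knox-host.internal",
    "hadoop.proxyuser.knox.groups": "analysts",
}


def gateway_authenticate(user, password):
    """Steps 2-3: the gateway checks user:secret against the directory."""
    entry = DIRECTORY.get(user)
    if entry is None or entry[0] != password:
        raise PermissionError("authentication failed")
    return user


def backend_accepts(proxy_user, proxy_host, end_user):
    """Step 5: the service accepts 'doAs end_user' only from a trusted proxy."""
    def allowed(key, value):
        setting = PROXYUSER_CONF.get(key, "")
        return setting == "*" or value in setting.split(",")

    groups = DIRECTORY.get(end_user, ("", set()))[1]
    return (allowed("hadoop.proxyuser.%s.hosts" % proxy_user, proxy_host)
            and any(allowed("hadoop.proxyuser.%s.groups" % proxy_user, g) for g in groups))


def handle_request(user, password, proxy_host="knox-host.internal"):
    end_user = gateway_authenticate(user, password)
    # Step 4 (the gateway's own SPNEGO/Kerberos login as "knox") is assumed to succeed.
    if not backend_accepts("knox", proxy_host, end_user):
        raise PermissionError("knox may not impersonate " + end_user)
    return {"authenticated_as": "knox", "doAs": end_user}
```

The end user's password never reaches the Hadoop service. The service only sees the gateway's Kerberos identity together with the name of the user it acts for.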

Scalability and Fault Tolerance

[Diagram: User → REST API → Load Balancer → REST API → several Gateway instances → REST API → Hadoop Services]
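Because each gateway instance only proxies requests, a load balancer can spread traffic across several of them and route around a failed one. Below is a minimal round-robin-with-failover sketch. It assumes that any instance can serve any request. GatewayPool and the instance names are hypothetical. A real deployment would use an actual load balancer with its own health checks.

```python
import itertools


class GatewayPool:
    """Round-robin over interchangeable gateway instances, skipping ones marked down."""

    def __init__(self, gateways):
        self.gateways = list(gateways)
        self.down = set()
        self._cycle = itertools.cycle(self.gateways)

    def mark_down(self, gateway):
        self.down.add(gateway)

    def mark_up(self, gateway):
        self.down.discard(gateway)

    def pick(self):
        # Try each instance at most once per call before giving up.
        for _ in range(len(self.gateways)):
            gateway = next(self._cycle)
            if gateway not in self.down:
                return gateway
        raise RuntimeError("no healthy gateway instance")


pool = GatewayPool(["knox1", "knox2", "knox3"])
pool.mark_down("knox2")
print([pool.pick() for _ in range(4)])  # ['knox1', 'knox3', 'knox1', 'knox3']
```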

Multi-cluster / multi-tenant support

[Diagram: User → Gateway → several separate Hadoop Services clusters]

Extensibility: Providers and Services

Providers – Features of the gateway that can be used by services

Services – Actual Hadoop services like WebHDFS, Hive, RM, etc.

– Definitions of the gateway endpoints that expose a specific service

– Including the configuration they need (e.g. rewrite rules)

Topologies – Assembly of providers and services
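A topology ties providers and services together. It lists the providers (authentication, identity assertion, ...) and the services with their internal URLs. The sketch below parses an illustrative topology using only the standard library. The XML follows the general shape of a Knox topology file, but the provider entries and URLs are placeholders rather than a working configuration. Deploying one such topology per cluster is one way a single gateway can front several clusters.

```python
import xml.etree.ElementTree as ET

# Illustrative topology; provider names and service URLs are placeholders.
TOPOLOGY_XML = """
<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>ShiroProvider</name>
      <enabled>true</enabled>
    </provider>
    <provider>
      <role>identity-assertion</role>
      <name>Default</name>
      <enabled>true</enabled>
    </provider>
  </gateway>
  <service>
    <role>WEBHDFS</role>
    <url>http://namenode.internal:50070/webhdfs</url>
  </service>
  <service>
    <role>HIVE</role>
    <url>http://hiveserver.internal:10001/cliservice</url>
  </service>
</topology>
"""


def summarize(xml_text):
    """Return (enabled providers as (role, name) pairs, {service role: internal URL})."""
    root = ET.fromstring(xml_text.strip())
    providers = [
        (p.findtext("role"), p.findtext("name"))
        for p in root.findall("./gateway/provider")
        if p.findtext("enabled", "false").strip() == "true"
    ]
    services = {s.findtext("role"): s.findtext("url") for s in root.findall("./service")}
    return providers, services


print(summarize(TOPOLOGY_XML))
```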

Apache Ranger

Why Apache Ranger?

Centralized platform

• Define and administer security policies consistently

• Define a security policy once, and apply it across the component stack

Fine-grained security on

• Database

• Table

• Column

• Queue (be it Kafka or YARN)

• Any resource

Deep visibility

• Detailed audit trail

Apache Ranger

All the Hadoop components have some kind of user management, but it is not integrated and not very sophisticated:

– HDFS – POSIX-like user/group access policy

– Hive/HBase – database/table-level restrictions for users/groups

– Etc.

It would be nice to have:

– Restrictions driven from LDAP/Active Directory

– Restrictions by client IP address

– Restrictions by time of access

– Data masking – ‘support’ could only see the last 4 digits of a credit card number

– Data filtering – ‘sales’ could only see data from their own region
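The masking and filtering wishes above can be illustrated on in-memory rows. This is a plain-Python sketch of the idea, not how Ranger enforces its policies. The group names sales and support, the region codes and the sample rows are hypothetical.

```python
# Hypothetical sample rows.
ROWS = [
    {"customer": "Anna", "region": "HU", "card": "4111111111111111"},
    {"customer": "Ben", "region": "DE", "card": "5500000000000004"},
]


def mask_show_last_4(value):
    """Replace every character except the last four with 'x'."""
    return "x" * max(len(value) - 4, 0) + value[-4:]


def query(rows, user_groups, user_region):
    """Apply a row filter for 'sales' and a column mask for 'support'."""
    result = []
    for row in rows:
        # Row filter: members of 'sales' only see rows from their own region.
        if "sales" in user_groups and row["region"] != user_region:
            continue
        out = dict(row)  # copy, so the stored data is never modified
        # Column mask: members of 'support' only see the last 4 digits of the card.
        if "support" in user_groups:
            out["card"] = mask_show_last_4(row["card"])
        result.append(out)
    return result


print(query(ROWS, {"support"}, "HU")[0]["card"])  # xxxxxxxxxxxx1111
```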

Apache Ranger

An authorization solution with flexible policies and a plugin architecture

Supports:

– HDFS

– Hive

– HBase

– Kafka

– Knox

– Storm

– YARN

– NiFi

– Atlas

Contributed community plugins:

– Apache HAWQ, Druid, Gaian DB

Apache Ranger

One policy to grant:

– HDFS /home/sales/{USERNAME} – to all the users in the ‘sales’ group

Hive database, table and column-level access:

– Row-level filter – ‘location = ”HU”’

– Column masking – hashing / hiding / etc.

HBase – table, column-family and column-level filtering

YARN – limit access to processing queues

Knox – limit access to topologies / services

Etc.
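A single HDFS policy of this kind can be read as a template that is expanded for each user. Below is a sketch of that check. It assumes recursive matching below the expanded directory and uses the {USERNAME} placeholder exactly as the slide writes it. POLICY and is_allowed are hypothetical. The path is normalised first so that .. segments cannot escape the user's own directory.

```python
from posixpath import normpath

# Hypothetical policy mirroring the slide: /home/sales/{USERNAME} for group 'sales'.
POLICY = {
    "path_template": "/home/sales/{USERNAME}",
    "groups": {"sales"},
    "accesses": {"read", "write", "execute"},
}


def is_allowed(user, groups, path, access, policy=POLICY):
    """Grant access only inside the user's own expanded directory (recursively)."""
    if access not in policy["accesses"] or not (set(groups) & policy["groups"]):
        return False
    allowed_root = policy["path_template"].replace("{USERNAME}", user)
    path = normpath(path)  # collapse '..' and duplicate slashes before matching
    return path == allowed_root or path.startswith(allowed_root + "/")


print(is_allowed("alice", {"sales"}, "/home/sales/alice/q1.csv", "read"))  # True
print(is_allowed("alice", {"sales"}, "/home/sales/bob/q1.csv", "read"))    # False
```

One policy therefore covers every member of the group, instead of one policy per user directory.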

Apache Ranger

Audit log of every access and decision:

– Stored in HDFS and/or Solr

– Solr makes it easy to search/filter audit events

– Can be sent to Kafka for integration with other services

Ranger KMS:

– Secure key management for HDFS (“data at rest”)

– Access control policies

– Audit
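Once audit events are stored as structured records, searching and filtering them is straightforward. Below is a minimal sketch with made-up events. The field names are illustrative and are not meant to match Ranger's actual audit schema. Serialising one JSON document per event is one convenient shape for publishing each event as a separate message, for example to Kafka.

```python
import json
from collections import Counter

# Made-up audit events; field names are illustrative only.
EVENTS = [
    {"user": "alice", "service": "hdfs", "resource": "/home/sales/alice", "access": "read", "allowed": True},
    {"user": "bob", "service": "hive", "resource": "sales.customers", "access": "select", "allowed": False},
    {"user": "bob", "service": "hive", "resource": "sales.cards", "access": "select", "allowed": False},
]


def denied_by_user(events):
    """Count denied requests per user, e.g. to spot probing."""
    return Counter(e["user"] for e in events if not e["allowed"])


def to_json_lines(events):
    """One JSON document per line, one line per event."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in events)


print(denied_by_user(EVENTS))  # Counter({'bob': 2})
```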

Can we have more flexibility?

If table/column/row-specific restrictions are not enough:

– Configuring hundreds of columns independently and manually is error-prone

Tag-based access decisions:

– Every column tagged as ‘Personal Information’ should be hidden from ‘X’

– Every table tagged with ’visibleBefore=2017-10-01’ should be hidden from that date on
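Tag-based decisions move the rule from individual columns to the tags attached to them. Below is a sketch of the two examples above. It assumes the tags are already available as a lookup table and that visibleBefore means the resource is hidden from that date on. The resource names and tag attributes are placeholders, and the group X is taken from the slide. This is not Ranger's tag policy engine.

```python
from datetime import date

# Hypothetical tag assignments, the kind of metadata a catalog could supply.
TAGS = {
    "crm.customers.email": [{"name": "PII"}],
    "crm.customers.name": [{"name": "PII"}],
    "crm.customers.city": [],
    "crm.campaign_2017": [{"name": "visibleBefore", "date": date(2017, 10, 1)}],
}


def visible(resource, user_groups, today, hidden_from=frozenset({"X"})):
    """Decide visibility from the resource's tags instead of per-column rules."""
    for tag in TAGS.get(resource, []):
        if tag["name"] == "PII" and set(user_groups) & hidden_from:
            return False
        if tag["name"] == "visibleBefore" and today >= tag["date"]:
            return False
    return True


print(visible("crm.customers.email", {"X"}, date(2017, 6, 13)))  # False
```

Adding a new personal-data column then only requires tagging it. No policy has to change.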

But how to get the tags?

Metadata Truth in Hadoop

[Diagram: structured and unstructured data from traditional RDBMS, MPP appliances and several projects flowing into a data lake, with metadata kept in one central place]

Data Management: along the entire data lifecycle, with integrated provenance and lineage capability

Modeling with Metadata: cross-component dataset lineage; a centralized location for all metadata inside Hadoop

Interoperable Solutions: a single interface point for metadata exchange with platforms outside of Hadoop

Apache Atlas

A graph of the metadata – the ability to collect and link various information automatically

As a graph, it is highly extensible – define new nodes, new edges between them, and even new node types

Dynamic query language

REST API – for external systems

External connectors with messaging frameworks
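Lineage is naturally a graph walk. Below is a minimal sketch with hypothetical dataset and process names. Each edge links an input dataset to an output dataset through a process, roughly the way Atlas connects process entities to their inputs and outputs. A breadth-first search then collects everything upstream of a dataset. The sketch does not use Atlas's type system, query language or REST API.

```python
from collections import deque

# Hypothetical lineage edges: (input dataset, process, output dataset).
EDGES = [
    ("mysql.crm.customers", "sqoop_import", "hive.raw.customers"),
    ("hive.raw.customers", "hive_ctas_clean", "hive.clean.customers"),
    ("kafka.topic.orders", "storm_enrich", "hive.clean.orders"),
    ("hive.clean.customers", "hive_join", "hive.mart.customer_orders"),
    ("hive.clean.orders", "hive_join", "hive.mart.customer_orders"),
]


def upstream(dataset, edges=EDGES):
    """Return every dataset that feeds `dataset`, directly or transitively."""
    parents = {}
    for src, _process, dst in edges:
        parents.setdefault(dst, set()).add(src)
    seen, queue = set(), deque([dataset])
    while queue:
        for src in parents.get(queue.popleft(), ()):
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


print(sorted(upstream("hive.mart.customer_orders")))
```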

Apache Atlas

For Hive:

– Which column contains what kind of data

– Who created / who consumes the data (lineage)

– Lineage whether the data was created by Sqoop, Storm, Kafka or Falcon, or by Hive SQL

Tags – for marking something as personal info, etc.

Apache Atlas + Ranger

More fine-grained access decisions…

Apache Atlas and Ranger Integration

Basic tag policy – Access and entitlements can be based on attributes. – Personally Identifiable Information (PII) is a tag that can be leveraged to protect sensitive personal data.

Geo-based policy – Access policy based on location. – A user might be able to access data in North America but be restricted from accessing data in EMEA due to privacy compliance.

Time-based policy – Access policy based on time windows. – A user might be able to access data only between 8AM and 5PM (common in SOX regulations).

Prohibitions – Restrictions on combining two data sets that might each be compliant on their own but not when combined. – E.g. names and health care records
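The geo, time and prohibition examples can be combined into one toy decision function. The regions, the tag names and the decision strings are hypothetical. The 8AM-5PM window is treated as half-open, so 17:00 already falls outside it. The sketch is not the Ranger/Atlas policy engine.

```python
from datetime import time

# Hypothetical policy inputs modelled on the slide's examples.
ALLOWED_REGIONS = {"alice": {"NA"}}                    # alice may read North American data only
WORK_HOURS = (time(8, 0), time(17, 0))                 # 8AM-5PM, end exclusive
PROHIBITED_COMBINATIONS = [{"NAME", "HEALTH_RECORD"}]  # tags that must not meet in one query


def decide(user, data_region, tags_in_query, at):
    """Return 'allow' or the first rule that denies the request."""
    if data_region not in ALLOWED_REGIONS.get(user, set()):
        return "deny: geo"
    start, end = WORK_HOURS
    if not (start <= at < end):
        return "deny: time"
    for combo in PROHIBITED_COMBINATIONS:
        if combo <= set(tags_in_query):
            return "deny: prohibited combination"
    return "allow"


print(decide("alice", "NA", {"NAME"}, time(9, 30)))                   # allow
print(decide("alice", "NA", {"NAME", "HEALTH_RECORD"}, time(10, 0)))  # deny: prohibited combination
```

The prohibition rule looks at the combination of tags touched by one request, not at any single data set. This is why it needs the tag metadata rather than per-table permissions.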

Q&A