BigDataTech 2016

How to manage authorization rules on Hadoop cluster with Apache Ranger

Krzysztof Adamski

Posted on 17-Feb-2017

We deliver innovative IT services for the ING Group all over the world.

ING Services Polska

Social, Harmonisation, Digitalisation, Customer Call Centres, Webservices, In the Cloud, Virtual Bank, Software as a Service, Infrastructure as a Service, Seamless, Concept of ONE, No geographical boundaries, Exception Handling, APIs, My identity, Straight through processing, Customer experience, Personalisation, Automation, Standardisation, Agile, Self Service, Mobile First, Real Time, Security, 24/7, ‘Outside in and Inside out’, Omnichannel, Zero Touch, Customer journeys, Analytics, Big Data, Digitalised branches, Building standard for new generation digital bank, Cloud Platform as a service, Data Centre

People matters

Average age at ISP: 33.26

Age group    20-30   31-40   41-50   50-70
Headcount      197     289      58      10

Total: 554 (16.43%, 91 / 83.57%, 463)

How secure is your cluster?

Ownership and permissions look fine…

How secure is your cluster?

That must have been a sophisticated hack…

3 x A or 4 as you wish

Hadoop authentication methods

Simple
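With simple authentication, Hadoop takes the caller's identity at face value: the client just states a user name. Over WebHDFS that name travels in the `user.name` query parameter, and the command-line tools honour the `HADOOP_USER_NAME` environment variable. A minimal sketch, assuming a hypothetical NameNode host `namenode.example.com` on the HDP 2.x default HTTP port 50070, showing how little it takes to claim to be the `hdfs` superuser:

```python
from urllib.parse import urlencode

# Hypothetical NameNode address; 50070 is the HDP 2.x default NameNode HTTP port.
NAMENODE = "http://namenode.example.com:50070"


def webhdfs_url(path: str, op: str, user: str) -> str:
    """Build a WebHDFS URL for a cluster using simple (pseudo) authentication.

    Nothing here proves who the caller is: the NameNode trusts whatever
    value is sent in the user.name query parameter.
    """
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute")
    query = urlencode({"op": op, "user.name": user})
    return f"{NAMENODE}/webhdfs/v1{path}?{query}"


# Claiming to be the HDFS superuser takes one query parameter.
print(webhdfs_url("/apps/hive/warehouse", "LISTSTATUS", "hdfs"))
```

No password, ticket or certificate is involved, which is why "ownership and permissions look fine" offers no protection on a non-Kerberized cluster.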

Hadoop authentication methods

Kerberos
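With Kerberos, every user and service is identified by a principal of the form `primary/instance@REALM` (for example `hive/node1.example.com@EXAMPLE.COM`), and Hadoop maps it to a short, OS-style name through the `hadoop.security.auth_to_local` rules. A simplified sketch of what the `DEFAULT` rule does, assuming a hypothetical default realm `EXAMPLE.COM`: a principal in the default realm maps to its first component, anything else has no mapping. Real deployments usually add explicit `RULE:` entries, which this sketch does not implement.

```python
import re

DEFAULT_REALM = "EXAMPLE.COM"  # hypothetical default realm (from krb5.conf)

# primary[/instance]@REALM
PRINCIPAL_RE = re.compile(
    r"^(?P<primary>[^/@]+)(?:/(?P<instance>[^/@]+))?@(?P<realm>[^/@]+)$"
)


def short_name(principal: str, default_realm: str = DEFAULT_REALM) -> str:
    """Simplified model of the auth_to_local DEFAULT rule."""
    m = PRINCIPAL_RE.match(principal)
    if m is None:
        raise ValueError(f"not a valid principal: {principal!r}")
    if m.group("realm") != default_realm:
        # Without a matching RULE: entry, principals from other realms are not mapped.
        raise ValueError(f"no rule maps principal {principal!r}")
    return m.group("primary")


print(short_name("alice@EXAMPLE.COM"))
print(short_name("hive/node1.example.com@EXAMPLE.COM"))
```

Principals from a trusted AD realm (for example `jsmith@AD.EXAMPLE.COM`) need their own `RULE:` entries, and that is exactly where short-name collisions between realms can creep in.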

Hortonworks Ring of Defense Architecture (hortonworks.com)

Components: Client, Apache Knox, Active Directory, KDC, HiveServer 2, HDFS, Ranger

1. Original request with user id and password (Client to Apache Knox)
2. Knox gets service ticket (ST) for Hive
3. Knox runs as proxy user using Hive ST
4. Use Hive ST, submit query
5. Hive gets NameNode (NN) service ticket
6. Hive creates MapReduce job using NN ST
7. Client gets query result

What is IPA?

redhat.com

AD Account mapping

redhat.com

SSSD integration

redhat.com

IPA for central UAM
• This works great for OS
• Can this be used by Hadoop?
• Can this be used by Ranger?

Installation through Ambari

hortonworks.com

Installation through Ambari

hortonworks.com

HDP 2.3.4

Watch the ranger.usersync.source.impl.class property
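`ranger.usersync.source.impl.class` (in `ranger-ugsync-site`) decides where usersync reads users and groups from, so it is worth confirming that it really points at the source you picked in Ambari. A small sketch that parses a Hadoop-style configuration file and reports the configured source; the three class names are, to our knowledge, the Unix, LDAP/AD and file sources shipped with Ranger usersync, but check them against your Ranger version:

```python
import xml.etree.ElementTree as ET

# Usersync source classes (verify against your Ranger version).
SYNC_SOURCES = {
    "org.apache.ranger.unixusersync.process.UnixUserGroupBuilder": "Unix",
    "org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder": "LDAP/AD",
    "org.apache.ranger.unixusersync.process.FileSourceUserGroupBuilder": "File",
}


def read_hadoop_conf(xml_text: str) -> dict:
    """Parse a Hadoop-style <configuration> document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def usersync_source(conf: dict) -> str:
    """Human-readable name of the configured usersync source."""
    cls = conf.get("ranger.usersync.source.impl.class", "")
    return SYNC_SOURCES.get(cls, f"unknown ({cls or 'not set'})")


SAMPLE = """<configuration>
  <property>
    <name>ranger.usersync.source.impl.class</name>
    <value>org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder</value>
  </property>
</configuration>"""

print(usersync_source(read_hadoop_conf(SAMPLE)))
```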

Enable Ranger for HDFS

hortonworks.com
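Once the Ranger HDFS plugin is enabled, the NameNode consults Ranger policies first and, if no policy grants the access, can fall back to native HDFS permissions and ACLs (controlled, as far as we know, by `xasecure.add-hadoop-authorization` in `ranger-hdfs-security`). Ranger 0.5 policies are allow-only. The sketch below is a simplified model of that decision, not Ranger's code; the `HdfsPolicy` class and its fields are hypothetical stand-ins for a Ranger HDFS policy (resource path, recursive flag, users, groups, permissions):

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class HdfsPolicy:
    """Hypothetical stand-in for a Ranger HDFS allow policy."""
    path: str
    recursive: bool
    users: FrozenSet[str]
    groups: FrozenSet[str]
    accesses: FrozenSet[str]  # "read", "write", "execute"

    def covers(self, path: str) -> bool:
        if path == self.path:
            return True
        prefix = self.path.rstrip("/") + "/"
        return self.recursive and path.startswith(prefix)


def is_allowed(policies: Iterable[HdfsPolicy], path: str, user: str,
               groups: FrozenSet[str], access: str,
               hdfs_permission_ok: bool, fallback_to_hdfs: bool = True) -> bool:
    # 1. Any matching Ranger allow policy grants the access.
    for p in policies:
        if (p.covers(path) and access in p.accesses
                and (user in p.users or not groups.isdisjoint(p.groups))):
            return True
    # 2. Otherwise fall back to native HDFS permissions/ACLs, if enabled.
    return fallback_to_hdfs and hdfs_permission_ok


warehouse = HdfsPolicy("/apps/hive/warehouse", True, frozenset(),
                       frozenset({"analysts"}), frozenset({"read", "execute"}))
print(is_allowed([warehouse], "/apps/hive/warehouse/sales", "alice",
                 frozenset({"analysts"}), "read", hdfs_permission_ok=False))
```

With the fallback on, loose HDFS permissions still grant access that no Ranger policy mentions, so a restrictive umask and a protected warehouse path keep mattering after Ranger is in place.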

Ranger audit

• It is recommended that you store audits in Solr and HDFS, and disable Audit to DB.
• Otherwise, expect performance issues:
  • Audit is stored in a single table
  • No partitions
  • No data retention
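In Ranger 0.5-era plugins the audit destinations are switched on and off per service in `ranger-<service>-audit` through boolean properties; to our knowledge they are `xasecure.audit.destination.solr`, `xasecure.audit.destination.hdfs` and `xasecure.audit.destination.db` (verify the names for your version). A sketch that renders the recommended combination, Solr and HDFS on and DB off, as a Hadoop-style XML fragment:

```python
import xml.etree.ElementTree as ET


def audit_destinations_xml(solr: bool = True, hdfs: bool = True, db: bool = False) -> str:
    """Render the audit destination switches as a Hadoop-style XML fragment."""
    root = ET.Element("configuration")
    for name, enabled in [
        ("xasecure.audit.destination.solr", solr),  # searchable recent audit
        ("xasecure.audit.destination.hdfs", hdfs),  # long-term audit archive
        ("xasecure.audit.destination.db", db),      # single unpartitioned table
    ]:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = "true" if enabled else "false"
    return ET.tostring(root, encoding="unicode")


print(audit_destinations_xml())
```

Solr and HDFS destinations also need their own connection settings (Solr URLs, HDFS directory), which are left out here.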

IPA as a central UAM
• This works great for OS
• Can this be used by Hadoop? Works great for PA in IPA
• Can this be used by Ranger? Not yet. You still need to bind to LDAP.
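Since Ranger cannot use IPA directly yet, usersync has to bind to the IPA LDAP server like to any other directory. A sketch that assembles the relevant `ranger-ugsync-site` properties for an IPA domain. The property names are the Ranger 0.5-era usersync names as we know them and should be checked against your version; the host `ipa.example.com`, the bind account `uid=ranger` and the group `hadoop_users` are hypothetical. The `cn=users,cn=accounts` and `cn=groups,cn=accounts` subtrees follow the default FreeIPA layout, and the user search filter limits the sync to members of a single group:

```python
def base_dn(domain: str) -> str:
    """Turn a DNS domain into an LDAP base DN: example.com -> dc=example,dc=com."""
    return ",".join(f"dc={part}" for part in domain.split("."))


def ipa_usersync_props(domain: str, ipa_host: str, sync_group: str) -> dict:
    """Hypothetical helper: usersync settings for binding to a FreeIPA directory."""
    base = base_dn(domain)
    groups_base = f"cn=groups,cn=accounts,{base}"
    return {
        "ranger.usersync.source.impl.class":
            "org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder",
        # Consider ldaps://...:636 so the bind password is not sent in clear text.
        "ranger.usersync.ldap.url": f"ldap://{ipa_host}:389",
        # Hypothetical system account under the FreeIPA sysaccounts container.
        "ranger.usersync.ldap.binddn": f"uid=ranger,cn=sysaccounts,cn=etc,{base}",
        "ranger.usersync.ldap.user.searchbase": f"cn=users,cn=accounts,{base}",
        "ranger.usersync.ldap.user.objectclass": "person",
        "ranger.usersync.ldap.user.nameattribute": "uid",
        # Sync only members of one group instead of the whole directory.
        "ranger.usersync.ldap.user.searchfilter": f"(memberOf=cn={sync_group},{groups_base})",
        "ranger.usersync.group.searchenabled": "true",
        "ranger.usersync.group.searchbase": groups_base,
        "ranger.usersync.group.objectclass": "groupofnames",
        "ranger.usersync.group.nameattribute": "cn",
        "ranger.usersync.group.memberattributename": "member",
    }


for key, value in ipa_usersync_props("example.com", "ipa.example.com", "hadoop_users").items():
    print(f"{key}={value}")
```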

Ranger KMS

One big advantage of encryption in HDFS is that even privileged users, such as the “hdfs” superuser, can be blocked from viewing encrypted data.
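Transparent encryption in HDFS works per encryption zone: a directory tied to a key held in Ranger KMS. Each file gets its own data key, stored in encrypted form by the NameNode, and only clients that KMS allows to decrypt it can read the plaintext. That is why the `hdfs` superuser can be locked out: it can still reach the stored bytes but cannot decrypt them, provided it is not granted decrypt access (Hadoop KMS supports blacklisting it via `hadoop.kms.blacklist.DECRYPT_EEK`). A sketch that prints, and optionally runs, the standard `hadoop key` and `hdfs crypto` commands; the key name `finance_key` and the path `/data/finance` are hypothetical:

```python
import shlex
import subprocess
from typing import List


def encryption_zone_commands(key_name: str, zone_path: str) -> List[List[str]]:
    """CLI calls that create a KMS key and turn an empty HDFS directory into an encryption zone."""
    return [
        # Run as a user allowed to create keys in Ranger KMS.
        ["hadoop", "key", "create", key_name],
        # Run as the HDFS superuser; zone_path must already exist and be empty.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # Confirm the zone was registered.
        ["hdfs", "crypto", "-listZones"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run(encryption_zone_commands("finance_key", "/data/finance"))
```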

Caveats
• Ranger (the same goes for Sentry) feels like slapped-on security
• User synchronization can be very slow with many users, due to architecture issues
• Doesn't manage HDFS ACLs and requires access for the hive user… defeating end-to-end security
• Vulnerability scans just kill Ranger ;)

Caveats

mysql> select count(*) from x_user;
+----------+
| count(*) |
+----------+
|       99 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from x_group;
+----------+
| count(*) |
+----------+
|       45 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from x_group_users;
+----------+
| count(*) |
+----------+
|   645697 |
+----------+
1 row in set (0.13 sec)

mysql> select sum(user_id) from (select count(distinct user_id) user_id from x_group_users group by p_group_id) temp;
+--------------+
| sum(user_id) |
+--------------+
|          603 |
+--------------+
1 row in set (1.21 sec)

-- keep only the lowest id of each (p_group_id, user_id) membership
mysql> delete from x_group_users where id not in (
    select minid from (select min(id) as minid from x_group_users
    group by p_group_id, user_id) as temp);
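The numbers on the slides: 645,697 rows in `x_group_users` for only 603 distinct group memberships, so 645,094 rows are duplicates (about 1,070 rows per membership). A small sketch that replays the two queries against an in-memory SQLite table with made-up toy data (table and column names follow the slides) shows that the delete keeps exactly one row, the lowest `id`, per `(p_group_id, user_id)` pair. Back up the Ranger database before running such a delete on a real system.

```python
import sqlite3


def dedup_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table x_group_users (id integer primary key, p_group_id integer, user_id integer)"
    )
    # Toy data: 3 real memberships stored as 6 rows.
    conn.executemany(
        "insert into x_group_users values (?, ?, ?)",
        [(1, 10, 100), (2, 10, 100), (3, 10, 101), (4, 10, 100), (5, 11, 100), (6, 10, 101)],
    )
    total = conn.execute("select count(*) from x_group_users").fetchone()[0]
    # The slide's query (derived-table alias renamed to t): distinct users per group, summed.
    distinct = conn.execute(
        "select sum(user_id) from (select count(distinct user_id) user_id "
        "from x_group_users group by p_group_id) t"
    ).fetchone()[0]
    # The slide's delete: keep the lowest id of every (group, user) pair.
    conn.execute(
        "delete from x_group_users where id not in ("
        "select minid from (select min(id) as minid from x_group_users "
        "group by p_group_id, user_id) as t)"
    )
    kept = [row[0] for row in conn.execute("select id from x_group_users order by id")]
    conn.close()
    return total, distinct, kept


print(dedup_demo())  # (6, 3, [1, 3, 5])
```

The extra derived table is needed in MySQL, which refuses to select from the table being deleted from in a direct subquery.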

Make it better
• https://issues.apache.org/jira/browse/RANGER-827 usersync SSSD integration (sync explicitly specified group)
• https://issues.apache.org/jira/browse/HADOOP-12751 allow users with domain suffix (avoid naming collision)
• https://issues.apache.org/jira/browse/HIVE-12981 the same for Hive
• https://issues.apache.org/jira/browse/RANGER-842 PAM integrated authentication for Ranger
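The naming collision behind HADOOP-12751: when AD users are resolved through SSSD or an IPA trust they carry a domain suffix (for example `jsmith@ad.example.com`), and if the suffix is simply dropped, an AD user and a local or IPA user with the same login collapse into one Hadoop identity. A hypothetical helper that detects such clashes in a list of user names (all names are made up):

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def strip_domain(name: str) -> str:
    """jsmith@ad.example.com -> jsmith, i.e. what is left once the suffix is dropped."""
    return name.split("@", 1)[0]


def find_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group fully qualified names by their short form and keep only the clashes."""
    by_short = defaultdict(set)
    for name in names:
        by_short[strip_domain(name)].add(name)
    return {short: sorted(full) for short, full in by_short.items() if len(full) > 1}


users = ["jsmith", "jsmith@ad.example.com", "akowalski@ad.example.com", "bnowak"]
print(find_collisions(users))  # {'jsmith': ['jsmith', 'jsmith@ad.example.com']}
```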

Other upcoming features (Ranger 0.6)
• Tag-based policies
• Geolocation-based policies
• Deny and exclude policies
• Hive Metastore plugin
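As we understand the Ranger 0.6 design, a policy gains deny items next to allow items, each with its own exceptions (excludes), and deny is evaluated before allow; if neither applies, the policy makes no decision. A simplified model of a single policy, matching on users only (groups, conditions and ordering across several policies are left out; `Item`, `PolicyItems` and `evaluate` are hypothetical names, not Ranger's API):

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Item:
    users: FrozenSet[str]
    accesses: FrozenSet[str]

    def matches(self, user: str, access: str) -> bool:
        return user in self.users and access in self.accesses


@dataclass(frozen=True)
class PolicyItems:
    allow: Tuple[Item, ...] = ()
    allow_exceptions: Tuple[Item, ...] = ()
    deny: Tuple[Item, ...] = ()
    deny_exceptions: Tuple[Item, ...] = ()


def evaluate(p: PolicyItems, user: str, access: str) -> Optional[bool]:
    def any_match(items: Tuple[Item, ...]) -> bool:
        return any(i.matches(user, access) for i in items)

    # Deny is checked first, unless the request falls under a deny exception.
    if any_match(p.deny) and not any_match(p.deny_exceptions):
        return False
    # Then allow, unless the request falls under an allow exception (exclude).
    if any_match(p.allow) and not any_match(p.allow_exceptions):
        return True
    return None  # no decision; other policies or the fallback decide


policy = PolicyItems(
    allow=(Item(frozenset({"alice", "bob", "mallory"}), frozenset({"select"})),),
    allow_exceptions=(Item(frozenset({"mallory"}), frozenset({"select"})),),
    deny=(Item(frozenset({"bob"}), frozenset({"update"})),),
)
print(evaluate(policy, "alice", "select"), evaluate(policy, "mallory", "select"),
      evaluate(policy, "bob", "update"))
```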

Some take-away tips
• Install updates on a regular basis
• Isolate your cluster from the rest of the network
• Kerberize your cluster
• Secure the user interfaces
• Enable HDFS ACLs (dfs.namenode.acls.enabled)
• Set a restrictive fs.permissions.umask-mode
• Watch for superusers (hadoop.proxyuser settings)
• Change the OS default umask (watch for upgrades and config file permissions)
• Make sure the Hive warehouse HDFS path is protected
• Implement Ranger
• Just don't sync your whole AD with it ;)
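Several of these tips can be checked mechanically from `core-site.xml` and `hdfs-site.xml`. A sketch that flags HDFS ACLs being disabled (the Hadoop 2.x default), an umask that leaves new files readable by others (the default `022` does), and `hadoop.proxyuser.*` entries set to `*`. It assumes an octal umask value; symbolic values are only reported for manual review, and the thresholds are our own choice:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Parse a Hadoop-style <configuration> document into a name -> value dict."""
    conf = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def check_conf(conf: Dict[str, str]) -> List[str]:
    """Return findings for a merged core-site + hdfs-site configuration."""
    findings = []
    if conf.get("dfs.namenode.acls.enabled", "false").lower() != "true":
        findings.append("dfs.namenode.acls.enabled is not true")
    umask = conf.get("fs.permissions.umask-mode", "022")
    try:
        if (int(umask, 8) & 0o007) != 0o007:
            findings.append(f"fs.permissions.umask-mode={umask} leaves new files open to others")
    except ValueError:
        findings.append(f"fs.permissions.umask-mode={umask} is symbolic; review it manually")
    for name, value in sorted(conf.items()):
        if name.startswith("hadoop.proxyuser.") and value == "*":
            findings.append(f"{name}=* places no restriction on impersonation")
    return findings


CORE_SITE = """<configuration>
  <property><name>fs.permissions.umask-mode</name><value>022</value></property>
  <property><name>hadoop.proxyuser.hive.groups</name><value>*</value></property>
</configuration>"""

for finding in check_conf(read_hadoop_conf(CORE_SITE)):
    print(finding)
```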

[email protected]

@adamskikrzysiek

http://pl.linkedin.com/in/adamskikrzysztof

And yes, we are hiring.