Apache Hive authorization models


Upload: thejas-nair

Post on 27-Jan-2015


DESCRIPTION

Apache Hive supports several authorization models, and which one to use depends on your use case. This talk also covers how to set up and configure Hive to use the appropriate model.

TRANSCRIPT

Page 1: Apache Hive authorization models

© Hortonworks Inc. 2011

Hive Authorization Models

Thejas Nair

[email protected]

@thejasn


Page 2: Apache Hive authorization models

Authentication vs Authorization

• Authentication
  – Verifying your identity
  – Enabled in Hadoop using Kerberos

• Authorization
  – Verifying that you have permission to perform an action

Architecting the Future of Big Data

Pic1 - http://www.flickr.com/photos/matsuyuki/2906448025/
Pic2 - http://www.flickr.com/photos/86818962@N00/3209747460

Page 3: Apache Hive authorization models

Hive architecture

[Diagram: Hive client, Metastore server, RDBMS, HDFS, MapReduce]

What are we trying to protect here?

• Data
• Metadata

Page 4: Apache Hive authorization models

Actions controlled by authorization

• Metadata operations: access and changes to the RDBMS storing the metadata

• Storage operations: create, write, read
  – Storage (HDFS) comes with its own authorization; the challenge is protecting the metadata.

Page 5: Apache Hive authorization models

Existing models of authorization

1. Traditional RDBMS style authorization
  – Use case: Hive is like an RDBMS, managing its own data

2. Storage based authorization
  – Use case: Hadoop provides shared storage, and Hive is one of the tools that use it
  – HCatalog world view

3. No authorization
  – Makes sense in a prototype or single user case
  – Metadata is not protected

Page 6: Apache Hive authorization models

Traditional RDBMS style authorization

• Use grant, revoke statements to manage permissions
• Store permissions in Metastore RDBMS
• But HDFS authorization is separate
  – Two sources of truth!
  – HDFS permissions can still grant access

• Problems sharing the stored data with other tools
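The "two sources of truth" problem can be illustrated with a toy model. The sketch below is not Hive code: `HiveGrants`, `hdfs_can_read`, and the sample users and table are hypothetical names used only for illustration, and the HDFS check is deliberately simplified to the owner and "other" read bits. It shows a user whose Hive privilege was revoked but who can still read the table's files because the file permissions allow it.

```python
# Toy model of "two sources of truth"; none of these names are Hive APIs.

class HiveGrants:
    """Stand-in for privileges stored in the metastore RDBMS."""

    def __init__(self):
        self._grants = set()  # (user, table, privilege) triples

    def grant(self, user, table, privilege):
        self._grants.add((user, table, privilege))

    def revoke(self, user, table, privilege):
        self._grants.discard((user, table, privilege))

    def allows(self, user, table, privilege):
        return (user, table, privilege) in self._grants


def hdfs_can_read(user, mode, owner):
    """Simplified file check: owner read bit, otherwise the 'other' read bit."""
    if user == owner:
        return bool(mode & 0o400)
    return bool(mode & 0o004)


grants = HiveGrants()
grants.grant("alice", "sales", "SELECT")
grants.revoke("alice", "sales", "SELECT")

# Hive's own records now deny alice...
hive_allows = grants.allows("alice", "sales", "SELECT")
# ...but the table files are world-readable (755), so a direct read still works.
hdfs_allows = hdfs_can_read("alice", mode=0o755, owner="hive")
print(hive_allows, hdfs_allows)
```

The two checks consult different stores, so revoking in one place does nothing to the other.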

Page 7: Apache Hive authorization models

Traditional RDBMS style authorization

• "Hive is the only tool" use case
  – Disable all other tools, set 777 permissions on HDFS files?
  – Easy to bypass Hive authorization
  – Hive allows arbitrary code in UDFs or Hive streaming code
  – You still need to manage HDFS file permissions

• Permission model is incomplete
  – HIVE-3720 has a new proposal

• Does not protect against malicious users

Page 8: Apache Hive authorization models

Storage based authorization model

• Use HDFS/storage permissions as the only source of truth
  – Works well if you have other systems accessing the data

• E.g., table directory permissions determine table permissions
  – To alter table metadata you need write permissions on the table directory

• Problem: Hive concepts such as columns and views don't map to files
  – Coarse vs fine grained authorization
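The directory-to-table mapping can be sketched as follows. This is a simplified illustration, not the logic of Hive's storage based provider: the function name `table_privileges`, the privilege labels, and the POSIX-style owner/group/other check are assumptions made for the example.

```python
import stat


def table_privileges(mode, owner, group, user, user_groups):
    """Derive coarse table privileges from the table directory's permission bits.

    Hypothetical helper: picks the owner, group, or other permission class
    for the user, then maps read -> "read" and write -> "alter".
    """
    if user == owner:
        can_read, can_write = mode & stat.S_IRUSR, mode & stat.S_IWUSR
    elif group in user_groups:
        can_read, can_write = mode & stat.S_IRGRP, mode & stat.S_IWGRP
    else:
        can_read, can_write = mode & stat.S_IROTH, mode & stat.S_IWOTH

    privileges = set()
    if can_read:
        privileges.add("read")
    if can_write:
        privileges.add("alter")  # altering table metadata needs directory write
    return privileges


# Table directory with mode 750, owned by user "etl" and group "analysts".
print(table_privileges(0o750, "etl", "analysts", "etl", {"etl"}))
print(table_privileges(0o750, "etl", "analysts", "bob", {"analysts"}))
print(table_privileges(0o750, "etl", "analysts", "eve", {"guests"}))
```

Nothing in this mapping can express a per-column or per-view rule, which is the coarse grained limitation the slide points out.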

Page 9: Apache Hive authorization models

Potential solution

• Combine the two models?
  – Add HDFS permission verification/management to a traditional RDBMS style authorization
  – Use grant/revoke on file system users and groups
  – Tables populated by external tools can be marked as 'external'
    – Hive does not manage index, statistics
    – (personal opinion; a detailed proposal is still needed)

Page 10: Apache Hive authorization models

Hive secure setup - Metastore

• Don’t trust end clients

• Standalone metastore server to protect access to metastore RDBMS
  – Set hive.metastore.uris in client

• Have metastore do actions as user
  – hive.metastore.execute.setugi=true in client and server
  – Creates files as the user

• Enable verification on metastore (Hive 0.10) (HIVE-3705)
  – hive.metastore.pre.event.listeners=org.apache.hadoop.hive.ql.security.authorization
  – hive.security.metastore.authenticator.manager=org.apache.hadoop.hive.ql.security.HiveMetastoreAuthenticationProvider
  – hive.security.metastore.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider
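As a minimal sketch, the client-side settings on this slide could be written out in `hive-site.xml` form. The script below only generates the XML text. The metastore address `thrift://metastore.example.com:9083` is a placeholder and the function name `render_hive_site` is hypothetical. The listener and manager properties are left out because the slide's listener value appears truncated and the class names should be checked against your Hive version.

```python
import xml.etree.ElementTree as ET


def render_hive_site(properties):
    """Render a dict of Hive properties as hive-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


client_settings = {
    # Placeholder address; point this at your standalone metastore server.
    "hive.metastore.uris": "thrift://metastore.example.com:9083",
    # Have the metastore perform file system actions as the end user.
    "hive.metastore.execute.setugi": "true",
}

xml_text = render_hive_site(client_settings)
print(xml_text)
```

Per the slide, `hive.metastore.execute.setugi` needs the same value in the metastore server's configuration as well.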

Page 11: Apache Hive authorization models

Hive secure setup – auth setup

• Turn on authorization!

• hive.security.authorization.enabled=true

Page 12: Apache Hive authorization models

Setting RDBMS style authorization

• This is the default model

• Set hive.security.authorization.createtable.owner.grants=ALL

Page 13: Apache Hive authorization models

Setting storage based authorization

• Use custom authorization manager StorageBasedAuthorizationProvider
  – hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider

• Available in Hive since 0.10

• Available in HCatalog earlier
  – export HIVE_AUX_JARS_PATH=<hcatalog jar location>
  – hive.security.authorization.manager=org.apache.hcatalog.security.HdfsAuthorizationProvider
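To tie the setup slides together, the sketch below reads properties in `hive-site.xml` form and reports which model they select. The function name `configured_model` and the returned labels are hypothetical; the property names and the provider class come from the slides. It assumes authorization is off when `hive.security.authorization.enabled` is absent, and it treats any manager other than the storage based provider as the default model, which is a simplification.

```python
import xml.etree.ElementTree as ET

STORAGE_BASED = (
    "org.apache.hadoop.hive.ql.security.authorization."
    "StorageBasedAuthorizationProvider"
)


def configured_model(hive_site_xml):
    """Return a label for the authorization model selected by the given XML."""
    root = ET.fromstring(hive_site_xml)
    props = {
        p.findtext("name"): (p.findtext("value") or "").strip()
        for p in root.findall("property")
    }
    if props.get("hive.security.authorization.enabled", "false").lower() != "true":
        return "no authorization"
    if props.get("hive.security.authorization.manager") == STORAGE_BASED:
        return "storage based"
    # Simplification: anything else is reported as the default RDBMS style model.
    return "RDBMS style (default)"


sample = """
<configuration>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
  </property>
</configuration>
"""
print(configured_model(sample))
```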

Page 14: Apache Hive authorization models

Other possibilities

• AccessServer proposal based on HiveServer2
  – Clients use JDBC to talk to a server that can serve queries from Hive, Pig or other tools
  – Server restricts what can be run
  – Use improved version of traditional RDBMS style auth
  – Would require UDFs, serdes to be blessed by a Hive DBA
  – Disallow arbitrary streaming commands?

Page 15: Apache Hive authorization models

Further reading

• https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization
• https://cwiki.apache.org/confluence/display/HCATALOG/Storage+Based+Authorization
• https://cwiki.apache.org/confluence/display/Hive/AccessServer+Design+Proposal

• HIVE-3705 - Adding authorization capability to the metastore
• HIVE-3720 - Expand and standardize authorization in Hive