Atlas and Ranger EPAM Meetup
TRANSCRIPT
Next Generation of Hadoop Security & Governance
Apache Atlas + Ranger
Alex Zeltov – Solutions Engineer
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hortonworks Data Platform Architecture
Apache Ranger + Atlas Overview
Centralized Platform
• Centralized platform to define, administer and manage security policies consistently
• Define security policy once and apply it to all the applicable components across the stack
Fine-Grained Security Definition
• Administer security for:
– Database
– Table
– Column
– LDAP Groups
– Specific Users
Deep Visibility
• Administrators have complete visibility into the security administration process
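As a rough illustration of "define once, apply everywhere", the sketch below models a single policy that covers a database, table and columns and grants access to an LDAP group and a specific user. The field names loosely follow the JSON shape of Ranger policies (resources, policyItems, accesses), but the service name, the values and the `is_allowed` evaluator are hypothetical and far simpler than Ranger's real policy engine.

```python
from fnmatch import fnmatchcase

# Hypothetical policy; field names loosely mirror Ranger's policy JSON.
POLICY = {
    "service": "cluster1_hive",          # assumed service name
    "name": "finance-salary-read",
    "resources": {
        "database": {"values": ["finance"]},
        "table": {"values": ["employees"]},
        "column": {"values": ["name", "salary"]},
    },
    "policyItems": [
        {
            "groups": ["hr-analysts"],   # LDAP group
            "users": ["alice"],          # specific user
            "accesses": [{"type": "select", "isAllowed": True}],
        }
    ],
}


def resource_matches(policy, resource):
    """True if every resource level named in the policy matches the request."""
    for level, spec in policy["resources"].items():
        requested = resource.get(level)
        if requested is None:
            return False
        if not any(fnmatchcase(requested, pattern) for pattern in spec["values"]):
            return False
    return True


def is_allowed(policy, user, groups, access_type, resource):
    """Toy check: allow only if resource, principal and access type all match."""
    if not resource_matches(policy, resource):
        return False
    for item in policy["policyItems"]:
        principal_ok = user in item["users"] or bool(set(groups) & set(item["groups"]))
        access_ok = any(
            a["type"] == access_type and a["isAllowed"] for a in item["accesses"]
        )
        if principal_ok and access_ok:
            return True
    return False


request = {"database": "finance", "table": "employees", "column": "salary"}
print(is_allowed(POLICY, "bob", ["hr-analysts"], "select", request))
```

The same policy object answers requests for any user or group, which is the point of defining it once in a central place rather than per component.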
Atlas Data Governance
Organizations need data governance to understand their information and answer questions such as:
• What do we know about our information?
• Where did this data come from and how is it being used?
• Does this data adhere to company policies and rules?
Background: DGI Community becomes Apache Atlas
• Dec 2014: DGI group kickoff
• Feb 2015: Prototype built
• May 2015: Apache Atlas incubation
• July 2015: HDP 2.3 Foundation GA release
First kickoff to GA in 7 months
Global Financial Company
* DGI: Data Governance Initiative
Key Benefits:
• Co-Dev = Built for real customer use cases
• Faster & Safer = Customers know business + HWX knows Hadoop
Apache Atlas
REST API
• Modern, flexible access to Atlas services, HDP components, UI & external tools
Search
• SQL-like DSL (Domain Specific Language)
• Support for keyword, faceted and full-text searches
Data Lineage
• Only product that captures lineage across Hadoop components at platform level
Exchange
• Leverage existing metadata / models by importing them from current tools
• Export metadata to downstream systems
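To make the REST API and SQL-like DSL more concrete, here is a minimal sketch that only builds the URL for a DSL search and sends nothing. The host and port are placeholders, and the path is assumed to be the v2 DSL search endpoint of later Atlas releases; older releases may expose a different path, so check it against your version.

```python
from urllib.parse import urlencode

# Assumptions: placeholder host/port, and the v2 DSL search path.
ATLAS_BASE = "http://atlas.example.com:21000"
DSL_SEARCH_PATH = "/api/atlas/v2/search/dsl"


def build_dsl_search_url(query, limit=10, offset=0, base=ATLAS_BASE):
    """Return the GET URL for a DSL query; no request is sent."""
    params = urlencode({"query": query, "limit": limit, "offset": offset})
    return f"{base}{DSL_SEARCH_PATH}?{params}"


# SQL-like DSL: find Hive tables by name.
url = build_dsl_search_url('hive_table where name = "customers"')
print(url)
```

Any HTTP client can then issue the GET with suitable credentials; the response format depends on the Atlas version.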
What is Metadata?
Technical Metadata
• Database Name
• Table Name
• Column Name
• Data Type
Business Metadata
• Business Name
• Business Definition
• Business Classification
• Sensitivity Tags
Operational Metadata
• Who (security access)
• What (job information)
• When (logs / audit trails)
• Where (location)
Dynamic Access Policy: Apache Ranger + Atlas Integration
Use cases drive design – high reliability
Atlas
• Metastore: Tags, Assets, Entities
• Notification Framework: Kafka Topics
Ranger
• Atlas Client: subscribes to topic, gets metadata updates
• PDP: Resource Cache
Notification: metadata updates
• Message durability
• Optimized for speed
• Event-driven updates
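The event-driven flow on this slide can be sketched roughly as follows. An in-memory `queue.Queue` stands in for a Kafka topic, and the message format, the function `publish_tag_update` and the class `ResourceTagCache` are all hypothetical; this is not the actual Atlas or Ranger code.

```python
import json
import queue

# Stand-in for a durable Kafka topic; a real deployment would use a Kafka client.
topic = queue.Queue()


def publish_tag_update(entity, tags):
    """Metadata-store side: emit a notification when an entity's tags change."""
    topic.put(json.dumps({"entity": entity, "tags": sorted(tags)}))


class ResourceTagCache:
    """Policy-engine side: local cache so decisions need no remote lookup."""

    def __init__(self):
        self._tags = {}

    def drain(self, source):
        """Apply every pending notification; return how many were applied."""
        applied = 0
        while True:
            try:
                message = source.get_nowait()
            except queue.Empty:
                return applied
            event = json.loads(message)
            # Later events overwrite earlier ones for the same entity.
            self._tags[event["entity"]] = set(event["tags"])
            applied += 1

    def tags_for(self, entity):
        return self._tags.get(entity, set())


publish_tag_update("finance.employees.ssn", {"PII"})
publish_tag_update("finance.employees.ssn", {"PII", "EXPIRES_ON"})
cache = ResourceTagCache()
cache.drain(topic)
print(sorted(cache.tags_for("finance.employees.ssn")))
```

Keeping the tags in a local cache is what makes the decision path fast, while the topic gives the subscriber a way to catch up on updates it has not yet seen.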
Tag-based Access Policy Requirements
• Basic tag policy – PII example. Access and entitlements must be tag-based (ABAC) and scalable in implementation.
• Geo-based policy – Policy based on IP address; proxy IP substitution may be required. Rule enforcement must be geo-aware.
• Time-based policy – Timer for data access, decoupled from deletion of data.
• Prohibitions – Prevent combinations of Hive tables/columns that may pose a risk together.
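The four requirements above could be expressed along these lines. This is a toy model under stated assumptions: `TagPolicy`, `Request`, `can_access` and `violates_prohibition` are hypothetical names, the caller's country is assumed to have already been derived from the IP address, and none of this reflects Ranger's actual policy model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class TagPolicy:
    """Hypothetical tag policy; not Ranger's actual policy model."""
    tag: str
    allowed_groups: set = field(default_factory=set)
    allowed_countries: Optional[set] = None   # None means no geo restriction
    expires_on: Optional[date] = None         # None means no time limit


@dataclass
class Request:
    groups: set
    country: str      # assumed to be resolved from the client IP upstream
    today: date


def allowed_by(policy, request):
    """Check group membership, then geo, then the access time limit."""
    if not (request.groups & policy.allowed_groups):
        return False
    if policy.allowed_countries is not None and request.country not in policy.allowed_countries:
        return False
    if policy.expires_on is not None and request.today > policy.expires_on:
        return False
    return True


def can_access(resource_tags, policies, request):
    """Every tag on the resource needs a policy that allows the request.

    An untagged resource passes here; other policy types would decide it.
    """
    by_tag = {p.tag: p for p in policies}
    return all(
        tag in by_tag and allowed_by(by_tag[tag], request) for tag in resource_tags
    )


def violates_prohibition(columns_tags, prohibited_pairs):
    """True if the combined tags of the selected columns hold a prohibited pair."""
    combined = set().union(*columns_tags) if columns_tags else set()
    return any(a in combined and b in combined for a, b in prohibited_pairs)


pii = TagPolicy("PII", {"compliance"}, allowed_countries={"US"}, expires_on=date(2016, 12, 31))
req = Request(groups={"compliance"}, country="US", today=date(2016, 6, 1))
print(can_access({"PII"}, [pii], req))
print(violates_prohibition([{"NAME"}, {"DIAGNOSIS"}], [("NAME", "DIAGNOSIS")]))
```

Because the policy is attached to the tag rather than to a table or column, newly tagged data is covered without writing a new policy, and the expiry date limits access without deleting the data.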
Expanded Native Connectors: Dataset Lineage
• Sqoop
• Teradata Connector
• Apache Kafka
• RDBMS
• Custom Activity Reporter
• Metadata Repository
UX prototype: Taxonomy Navigation
Breadcrumbs for taxonomy context path
Contents at taxonomy context
DEMO