What the #$* Is a Business Catalog and Why You Need It!

Apache Atlas
June 28, 2016
© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Disclaimer

This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.

Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.

This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.

Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.

The Problem

• Low confidence in Data - Fragmentation of metadata across the enterprise

• Duplicate or MIA – Incorrect or missing classification

• Rigid Governance – Traditional MDM tools are not agile, cannot keep up with rate of data change

Atlas Solution

• Cross-component lineage: Dynamically capture dataset lineage

• Single source: Combine and centralize information about your data

• Dynamic Access Control: Integration with Ranger

• Taxonomy (Business Catalog!): Common Business Language. Hierarchically organized – No dupes!
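
One way to picture cross-component lineage is as a directed graph in which each edge records that some process read one dataset and wrote another. The sketch below is a toy in-memory model only; it is not the Atlas API, and the class name, method names and dataset identifiers are all made up for illustration.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage store (hypothetical, not Atlas): edges run source -> target."""

    def __init__(self):
        self._downstream = defaultdict(set)  # dataset -> {(derived dataset, process)}
        self._upstream = defaultdict(set)    # dataset -> {(source dataset, process)}

    def record(self, source, target, process):
        """Record that `process` read `source` and wrote `target`."""
        self._downstream[source].add((target, process))
        self._upstream[target].add((source, process))

    def _walk(self, start, edges):
        # Breadth-first traversal; `seen` guards against cycles.
        seen, queue, order = {start}, deque([start]), []
        while queue:
            node = queue.popleft()
            for neighbor, _process in sorted(edges[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    order.append(neighbor)
                    queue.append(neighbor)
        return order

    def origins(self, dataset):
        """Everything `dataset` was derived from, nearest first."""
        return self._walk(dataset, self._upstream)

    def impacted(self, dataset):
        """Everything derived from `dataset`, nearest first (impact analysis)."""
        return self._walk(dataset, self._downstream)


# Illustrative pipeline; the dataset and process names are invented.
graph = LineageGraph()
graph.record("kafka:orders", "hdfs:/raw/orders", "storm_ingest")
graph.record("hdfs:/raw/orders", "hive:sales.orders", "hive_load")
graph.record("hive:sales.orders", "hive:sales.daily_revenue", "hive_ctas")

print(graph.origins("hive:sales.daily_revenue"))
# ['hive:sales.orders', 'hdfs:/raw/orders', 'kafka:orders']
print(graph.impacted("hdfs:/raw/orders"))
# ['hive:sales.orders', 'hive:sales.daily_revenue']
```

Walking the graph upstream traces an error back to its source; walking it downstream shows which assets a change would affect.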

What is the Atlas Business Catalog?

Organize data assets along business terms

• Authoritative: Hierarchical Taxonomy Creation

• Agile modeling: Model Conceptual, Logical, Physical assets

• Definition and assignment of tags like PII (Personally Identifiable Information)

Comprehensive features for compliance

• Multiple user profiles including Data Steward and Business Analysts

• Object auditing to track “Who did it?”

• Metadata Versioning to track “What did they do?”

Key Benefits:

Organize data assets along business terms

Impact analysis, Compliance, Acceptable use

Faster Insight

Taxonomies (catalog) enable:

• Search / Discovery – Business catalog of conceptual, logical and physical assets

• Security – Dynamic metadata-based access control
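
To make "dynamic metadata-based access control" concrete, here is a minimal sketch in which the access decision depends on the tags an asset carries rather than on the asset's name. It is a simplified illustration, not Ranger's policy model or API: `TagPolicy`, `Asset`, `can_read` and the group names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy: only the listed groups may read assets carrying `tag`."""
    tag: str
    allowed_groups: frozenset


@dataclass
class Asset:
    name: str
    tags: set = field(default_factory=set)


def can_read(user_groups, asset, policies):
    """Allow access unless a policy for one of the asset's tags excludes the user."""
    for policy in policies:
        if policy.tag in asset.tags and not (policy.allowed_groups & set(user_groups)):
            return False
    return True


policies = [TagPolicy("PII", frozenset({"hr_admins", "compliance"}))]
employees = Asset("hive:hr.employees", {"PII"})
weblogs = Asset("hdfs:/raw/weblogs")
analyst = {"analysts"}

print(can_read(analyst, employees, policies))  # False
print(can_read(analyst, weblogs, policies))    # True

# Tagging an asset changes the decision without editing any policy.
weblogs.tags.add("PII")
print(can_read(analyst, weblogs, policies))    # False
```

The policy is written once against the tag; whichever assets a steward classifies as PII later are covered automatically, which is the "dynamic" part.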

Research: Focused on Hadoop

We conduct open-ended user interviews so that we can learn more about who our users are and what their needs are. This helps us validate whether we’re solving the right problem.

Usability Testing

We test our prototype in InVision, a click-through prototyping tool that allows users to interact with static mockups.

Principal Roles & Activities

• Data Steward – Curator, responsible for catalog veracity

• Data Scientist – Analyst, primary consumer of Business Catalog

• Administrator – Role management only

• Data Engineer – Data ingress and egress, semantic data quality

• 50% - 80%+ of time spent looking for data

• Profit Center

• Primary User of Atlas

• Enables Scientist

Goal: < 25% of time spent on finding data = Empowering scientists to spend their time uncovering insights, faster

Key Concepts

Business Taxonomy (Catalog)
The practice and science of classification of things or concepts, including the principles that underlie such classification. The business organization model is hierarchical, making it authoritative with no duplication.

Data Lineage (Provenance)
Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes. It helps provide visibility into the analytics pipeline and simplifies tracing errors back to their sources.

Tags: Traits vs. Labels vs. Business Taxonomy
Atlas has Tags that are authoritative and prevent duplication. A tag can span different parts of the business taxonomy. For example, a PII tag can be used in HR as well as Finance or Sales.

Benefits:

A view of data assets organized by business language

Impact analysis, Compliance, Acceptable use

Common tags throughout Hadoop components
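
The difference between the hierarchical taxonomy and tags can be sketched in a few lines. This is a toy model under stated assumptions, not the Atlas data model or API: the `Taxonomy` class, the dotted term paths and the asset names are hypothetical, and "no duplication" is read here as "a term path can be created only once".

```python
class Taxonomy:
    """Toy hierarchical catalog (hypothetical): each term has one parent and a unique path."""

    def __init__(self, name):
        self.name = name
        self._children = {name: set()}  # term path -> set of child term paths
        self._assets = {}               # term path -> set of asset names

    def add_term(self, parent_path, term):
        if parent_path not in self._children:
            raise KeyError(f"unknown parent term: {parent_path}")
        path = f"{parent_path}.{term}"
        if path in self._children:
            raise ValueError(f"duplicate term: {path}")
        self._children[path] = set()
        self._children[parent_path].add(path)
        return path

    def classify(self, path, asset):
        if path not in self._children:
            raise KeyError(f"unknown term: {path}")
        self._assets.setdefault(path, set()).add(asset)

    def assets_under(self, path):
        """Assets classified with `path` or with any term beneath it."""
        found = set(self._assets.get(path, ()))
        for child in self._children[path]:
            found |= self.assets_under(child)
        return found


catalog = Taxonomy("Catalog")
hr = catalog.add_term("Catalog", "HR")
finance = catalog.add_term("Catalog", "Finance")
payroll = catalog.add_term(finance, "Payroll")

catalog.classify(hr, "hive:hr.employees")
catalog.classify(payroll, "hive:fin.salaries")

# A tag is flat and cuts across the hierarchy: one PII tag covers HR and Finance assets.
tags = {"PII": {"hive:hr.employees", "hive:fin.salaries"}}

print(sorted(catalog.assets_under(finance)))    # ['hive:fin.salaries']
print(sorted(catalog.assets_under("Catalog")))  # ['hive:fin.salaries', 'hive:hr.employees']
print(sorted(tags["PII"]))                      # ['hive:fin.salaries', 'hive:hr.employees']
```

Browsing by term answers "what belongs to Finance?", while looking up a tag answers "what is PII, wherever it lives?". Calling `catalog.add_term("Catalog", "HR")` a second time raises `ValueError`, which is how this sketch models "no dupes".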

Walk Through

• User Setup Atlas via Ranger

• Create & Browse Taxonomy of Business Terms

• Create & Browse Tags

• Search for Assets

• Classify Assets with Business Terms

• Associate Assets with Tags

Summer GA

Atlas Value

• Designed for Hadoop at the platform level, not the application level

• High Confidence data in Hadoop for regulated verticals

• Compliance and business objectives aligned to data organization

• Faster discovery for analysts – reduce time to value

• Agile and adaptable – ensures information is current via native connectors

• Dynamic protection with Ranger in simple audited policies

In Flight: Feature patches being reviewed & committed

• Object Versioning UX – Current state of object: active or deleted

• Comment Tab – User can add comments for collaboration

• DQ / Profile Notes Tab – Populated by 3rd parties or by Steward via UI

Additional Atlas Sessions

• Top 3 Big Data Governance Issues:

Tuesday 4:10PM @ Room 212

• Extend Governance in Hadoop with the Atlas Ecosystem: integrations with partners Waterline, Trifacta and Attivio:

Thursday 4:10PM @ Room 210A

Learn More:

• Hortonworks links: http://hortonworks.com/solutions/security-and-governance/

• Tutorials: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview