Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivio & Trifacta
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
June 30, 2016
Apache Atlas
Disclaimer
This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
This document's description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
Vision: Enterprise Data Governance Across Platforms
[Figure: STRUCTURED and UNSTRUCTURED data across TRADITIONAL RDBMS, MPP APPLIANCES and the DATA LAKE, with METADATA and Projects 1 to 6]
Atlas: Metadata Truth in Hadoop
• Data Management along the entire data lifecycle with integrated provenance and lineage capability
• Modeling with Metadata enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities
• Interoperable Solutions across the Hadoop ecosystem, through a common metadata store
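Lineage questions such as "which sources feed this table?" reduce to walking a graph of process inputs and outputs. The following is a minimal Python sketch, assuming lineage is stored as hypothetical (input dataset, process, output dataset) triples; the dataset and job names are invented, and this is a generic breadth-first traversal, not the Atlas API.

```python
from collections import deque

# Hypothetical lineage records: (input dataset, process, output dataset).
LINEAGE = [
    ("raw.transactions", "clean_txn_job", "shared.transactions"),
    ("raw.customers", "clean_cust_job", "shared.customers"),
    ("shared.transactions", "join_job", "discovery.customer_spend"),
    ("shared.customers", "join_job", "discovery.customer_spend"),
]


def upstream(dataset: str, edges=LINEAGE) -> list[str]:
    """Return every dataset that feeds `dataset`, directly or indirectly, in breadth-first order."""
    # Index each output dataset by the inputs that produced it.
    parents: dict[str, list[str]] = {}
    for src, _process, dst in edges:
        parents.setdefault(dst, []).append(src)

    seen: set[str] = set()
    order: list[str] = []
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for src in parents.get(current, []):
            if src not in seen:  # visit each ancestor once, even if reachable twice
                seen.add(src)
                order.append(src)
                queue.append(src)
    return order


print(upstream("discovery.customer_spend"))
# ['shared.transactions', 'shared.customers', 'raw.transactions', 'raw.customers']
```

The same index run in the other direction (inputs to outputs) would answer the impact question "what breaks downstream if this source changes?".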
Governance Ready Certification Program
Discovery / Tagging
Prep / Cleanse
ETL
Governance / BPM
Self-Service
Visualization
Choice: Customers choose the features they want to deploy, à la carte rather than vendor lock-in
Curated & Fast: Selected group of vendor partners to provide rich, complementary and complete features ready to deploy
Agile: Low switching costs, faster deployment and innovation
Centralized: Common SLA & common open metadata store
Flexibility: Interoperability of products through Atlas metadata
Safe: HDP at core to provide stability and interoperability
The Apache open source community is committed to collaboration, which is critical for proper data governance. Partners have adopted this commitment and are extending governance capabilities by integrating their products with Atlas, creating a rich, innovative community around a common metadata store. This session showcases three vendors:
– Waterline
– Attivio
– Trifacta
Additional Atlas Sessions
• BOF: Apache Knox and Apache Ranger provide Hadoop security while Atlas provides a Hadoop metadata store and enterprise compliance. Come learn and discuss security & governance innovations and future directions.
Thursday 5-7 PM @ Room 210A
Learn More:
• Hortonworks links: http://hortonworks.com/solutions/security-and-governance/
• Tutorials: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview
Unlock The Value Of The Data Lake With Waterline Data’s Smart Data Catalog
Time To Value
AUTOMATICALLY catalog data assets across ALL the data AND enable SELF-SERVICE access
Tribal Knowledge Sharing
AUGMENT semantic discovery by CROWDSOURCING tribal data knowledge
Trust
Enable AGILE GOVERNANCE with automated tagging, data stewardship, and SECURE SELF-SERVICE access to data based on role and policy
Shopping Metaphor For “Managed” Self-Service: Amazon.com
Catalog → Find, Understand And Collaborate → Provision
Workflow Of Enabling Self-Service Analytics With Hortonworks
[Figure: workflow components Hortonworks Atlas And Ranger and Data Prep, Analytics & Visualization; connections labeled "Metadata, Tags, Data Lineage", "Metadata, Tags, Roles & Access Control" and "Roles & Access Control"]
• Smart Data Discovery: Profiling, Sensitive Data & Data Lineage Discovery, Automated Tagging
• Data Stewardship: Curate Tags
• Self-Service Data Catalog: Find, Collaborate And Take Action
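The Smart Data Discovery step pairs profiling with automated tagging. The slides do not describe Waterline's actual method, so the following is only a minimal Python sketch of rule-based tagging, assuming a column is profiled from a sample of string values; the tag names, regular expressions and 80% threshold are hypothetical.

```python
import re

# Hypothetical tagging rules: tag name -> pattern a whole value must match.
TAG_RULES = {
    "PII.Email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PII.US_SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "PII.CreditCard": re.compile(r"\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}"),
}


def suggest_tags(values, threshold=0.8):
    """Suggest tags for a column when at least `threshold` of its non-empty sample values match a rule."""
    # Profile only non-empty values; blanks and None say nothing about content.
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return []
    tags = []
    for tag, pattern in TAG_RULES.items():
        hits = sum(1 for v in sample if pattern.fullmatch(v))
        if hits / len(sample) >= threshold:
            tags.append(tag)
    return tags


print(suggest_tags(["alice@example.com", "bob@example.org", "", "carol@example.net"]))
# ['PII.Email']
```

Suggestions like these are candidates rather than final labels; in the workflow above they would pass to Data Stewardship, where stewards curate the tags before they are shared through the catalog.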
UNIFY YOUR DATA ACROSS SILOS
Joe Lichtman, Vice President
1 © 2016 ATTIVIO | PROPRIETARY AND CONFIDENTIAL
WHO IS ATTIVIO?
Attivio unifies your data across silos to provide a 360° view of your business
LEADER IN SEARCH, DATA DISCOVERY AND TEXT ANALYTICS
Gartner Magic Quadrant For Enterprise Search, Q3 2015
Forrester Wave: Big Data Search and Knowledge Discovery Solutions, Q3 2015
Forrester Wave: Big Data Text Analytics Platforms, Q2 2016
SEMANTIC DATA CATALOG
Attivio radically reduces time spent finding and understanding data sources to speed time-to-analytics.
• Catalogs all your enterprise information
• Identifies what's most relevant
• Unifies all structured and semi-structured sources in a visual model
• Provisions leading BI and predictive analytics tools such as Qlik, R, RapidMiner, Spotfire, and Tableau
58%: of the effort for BI initiatives is wasted on data exploration and integration
33%: of businesses cite big data discovery as a challenge they are facing
50%: Businesses use less than half of their available data for BI
CATALOG ALL YOUR ENTERPRISE INFORMATION
• Spiders and extracts metadata for all information types
• Automatically catalogs data and content with semantic meaning
• Applies human expertise to fine-tune tagging and align with business rules
IDENTIFY THE RIGHT INFORMATION
• Delivers natural language and keyword search
• Provides an eCommerce-like shopping cart for data
• Recommends the most relevant data for your context
UNIFY THE INFORMATION FOR YOUR ANALYTIC CONTEXT
• Automatically generates data models
• Correlates all structured data and unstructured content
• Simplifies provisioning to BI and advanced analytic tools
PROVISION & OPERATIONALIZE DATA AS A STRATEGIC ASSET
• Provision directly to agile BI and analytics tools
OR
• Rationalize the data warehouse for greater simplicity and lower cost
• Power domain-specific apps
DATA WRANGLING
What is Data Wrangling?
[Figure: DATA SOURCES (Transactional Data: banking, credit cards, lending, wealth, mortgages, ledgers, trades, payments; Interaction Data: social, web, chat), INGESTION and ACCESS; the wrangling steps DISCOVER → STRUCTURE → CLEANSE → ENRICH → VALIDATE → PUBLISH; the QUESTION → ANALYZE → INSIGHT cycle; and BUSINESS OPERATIONS (Analytics, Reporting, Data Product Models)]
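The wrangling steps named on the slide (discover, structure, cleanse, enrich, validate, publish) can be illustrated with a minimal Python sketch over a tiny CSV extract. It assumes plain-Python records; the sample data, column names and exchange rates are hypothetical, and this is not Trifacta's implementation.

```python
import csv
import io

# Hypothetical raw extract: stray whitespace, a missing amount, mixed-case codes.
RAW = """account_id,amount,currency
 A1 , 100.50 ,usd
A2,,USD
A3,42,eur
"""

# Hypothetical exchange rates used by the enrich step.
USD_RATES = {"USD": 1.0, "EUR": 1.1}


def discover(text):
    """Peek at the raw text: header fields and number of non-blank data lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    return {"columns": lines[0].split(","), "data_lines": len(lines) - 1}


def structure(text):
    """Parse delimited text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Trim whitespace, upper-case currency codes and drop rows with no amount."""
    cleaned = []
    for row in rows:
        row = {k.strip(): (v or "").strip() for k, v in row.items()}
        if row["amount"]:
            row["currency"] = row["currency"].upper()
            row["amount"] = float(row["amount"])
            cleaned.append(row)
    return cleaned


def enrich(rows):
    """Add a derived USD amount to each row."""
    return [{**row, "amount_usd": round(row["amount"] * USD_RATES[row["currency"]], 2)} for row in rows]


def validate(rows):
    """Fail fast if any row breaks a simple business rule."""
    for row in rows:
        if row["amount_usd"] < 0:
            raise ValueError(f"negative amount for {row['account_id']}")
    return rows


def publish(rows):
    """Serialise validated rows back to CSV text for downstream consumers."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(discover(RAW))
    print(publish(validate(enrich(cleanse(structure(RAW))))))
```

Keeping each step a separate function mirrors the slide's sequence and makes it easy to inspect intermediate results, which is the point of interactive wrangling; row A2 is dropped by `cleanse` because its amount is missing.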
Data Wrangling within the Hortonworks Data Lake
[Figure: data lake zones: Discovery Zone, Shared Zone, Raw Data Zone]