Manage traceability with Apache Atlas, a flexible metadata repository

Posted on 06-Jan-2017

Copyright Synaltic 2015

Charly Clairmont, Synaltic

@egwada

cclairmont@synaltic.fr

http://synaltic.fr

More than ten years of experience in IT, mainly in BI

Co-founder of Altic, now Synaltic

Co-founder of the Hadoop User Group France

Believes in open source as a way to help enterprises create value

Helps open source projects become known through meetups and conferences

Charly Clairmont

An integrator company mainly focused on data management

Founded in 2004, Synaltic is the merger of two companies, Synotis and Altic

25 specialists in data management

A Swiss subsidiary, based in Lausanne

Our values:
● Commitment
● Expertise
● Loyalty

Synaltic

R&D

Training

Support

Project

Expertise

Data Intelligence

Data Platform

Data Governance

Data Exchange

SYNALTIC

What about your data?

Do you know where your data is?

Do you know who is responsible for this specific dataset?

Do you know which application or task modified this entity last Friday?

Enterprise Data Governance

Provide a common approach to data governance across all systems and data within the organization

– Transparent

– Reproducible

– Auditable

– Consistent

Enterprise Data Governance, in Hadoop

No standard way to address this requirement

– Each project proposes its own way to handle data governance

– No integration with existing enterprise data governance frameworks

Apache Atlas

Data classification

Metadata Exchange

Centralized Auditing

Search & Lineage

Security & Policy engine

Apache Atlas, Overview

Data Classification
● Business-oriented taxonomy annotations
● Relationships between data sets and underlying elements, including source, target, and derivation processes
● Export of metadata to third-party systems

Centralized Auditing
● Security access information for every application and process
● Operational information for execution, steps, and activities

Search & Lineage (Browse)
● Navigation paths to explore the data classification and audit information
● Text-based search to locate what is relevant
● Visualization of data set lineage

Security & Policy Engine
● Compliance policy at runtime based on data classification schemes
● Advanced definition of policies for preventing data derivation

Apache Atlas, Knowledge Store

Knowledge store categorized with an appropriate business-oriented taxonomy
● Data sets & objects
● Tables / columns
● Logical context
● Source, destination

Supports the exchange of metadata between foundation components and third-party applications/governance tools

Tech: Titan with Apache HBase
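The following minimal in-memory sketch makes the idea of a taxonomy-categorized knowledge store concrete. It does not reproduce Atlas's type system or its Titan/HBase storage. The `Entity` and `KnowledgeStore` classes, the `hive_table` type name and the `PII` term are all hypothetical. They only illustrate entities that carry business classifications and can be found through them.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    guid: str
    type_name: str  # hypothetical type name, e.g. "hive_table"
    name: str
    classifications: set[str] = field(default_factory=set)


class KnowledgeStore:
    """In-memory stand-in for a metadata repository (illustration only)."""

    def __init__(self) -> None:
        self._entities: dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        self._entities[entity.guid] = entity

    def classify(self, guid: str, term: str) -> None:
        # Attach a business-oriented taxonomy term to an existing entity
        self._entities[guid].classifications.add(term)

    def find_by_classification(self, term: str) -> list[str]:
        # Names of all entities tagged with the term, sorted for stable output
        return sorted(
            e.name for e in self._entities.values() if term in e.classifications
        )


if __name__ == "__main__":
    store = KnowledgeStore()
    store.add(Entity("t1", "hive_table", "tweets"))
    store.add(Entity("t2", "hive_table", "users"))
    store.classify("t2", "PII")
    print(store.find_by_classification("PII"))  # ['users']
```

Classifications are kept as a set on each entity, so tagging the same entity twice with the same term has no effect.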

Apache Atlas, Data Lifecycle Management

Provenance

Multi-cluster replication

Data set retention/eviction

Late data handling

Automation

Tech:

● Apache Falcon
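Data set retention/eviction comes down to comparing each partition's date against a retention window. The sketch below assumes daily partitions and a retention period expressed in days. `partitions_to_evict` is a hypothetical helper and is not part of Apache Falcon, which declares retention in its own feed definitions.

```python
from datetime import date, timedelta


def partitions_to_evict(
    partitions: list[date], today: date, retention_days: int
) -> list[date]:
    """Return the daily partitions that fall outside the retention window.

    Hypothetical helper: a partition dated strictly before
    today - retention_days is considered expired.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(p for p in partitions if p < cutoff)


if __name__ == "__main__":
    parts = [date(2015, 6, 1), date(2015, 6, 20), date(2015, 6, 28)]
    print(partitions_to_evict(parts, today=date(2015, 6, 30), retention_days=14))
    # [datetime.date(2015, 6, 1)]
```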

Apache Atlas, Audit Store

Historical repository for all governance events
● Security: access grant & deny
● Operational: data provenance & metrics
● Indexed and searchable

Tech:
● YARN ATS, Apache HBase, Apache Hive, Solr, Elasticsearch (pluggable)
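An audit store indexed by entity is what lets you answer a question such as "which application or task modified this entity last Friday?". The minimal sketch below assumes a simple event shape (timestamp, entity, actor, action) and action labels such as `WRITE`. The classes are hypothetical and do not reflect any actual Atlas audit schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    entity: str  # e.g. a Hive table name
    actor: str   # application, task or user behind the event
    action: str  # assumed labels: "READ", "WRITE", "ACCESS_DENIED"


class AuditStore:
    """Append-only governance event log, indexed by entity (illustration only)."""

    def __init__(self) -> None:
        self._by_entity: dict[str, list[AuditEvent]] = defaultdict(list)

    def record(self, event: AuditEvent) -> None:
        self._by_entity[event.entity].append(event)

    def modified_by(self, entity: str, day: date) -> list[str]:
        # Which applications or tasks wrote to this entity on the given day?
        return sorted(
            {
                e.actor
                for e in self._by_entity.get(entity, [])
                if e.action == "WRITE" and e.timestamp.date() == day
            }
        )


if __name__ == "__main__":
    log = AuditStore()
    log.record(AuditEvent(datetime(2015, 6, 26, 10, 0), "tweets", "import_job", "WRITE"))
    log.record(AuditEvent(datetime(2015, 6, 26, 11, 0), "tweets", "analyst", "READ"))
    log.record(AuditEvent(datetime(2015, 6, 25, 9, 0), "tweets", "cleanup_task", "WRITE"))
    print(log.modified_by("tweets", date(2015, 6, 26)))  # ['import_job']
```

Indexing by entity keeps this lookup proportional to the events of one entity rather than the whole log. A real audit store would also index by time and actor.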

Apache Atlas, Security

Establish global security policies based on data classification.

Apache Atlas, Policy Engine

Runtime rationalization of policy rules with respect to data asset combinations and time. Fully extensible.
● Metadata-based rules
● Geo-based rules
● Time-based rules
● Column/attribute prohibitions
● Preview: Hive row and column masking

Tech:
● Apache Ranger
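A classification-based policy works in two steps: look up the tags of each column, then decide per tag whether to reveal or mask the value. The toy evaluator below illustrates column masking and a time-based rule. The tag names `PII` and `BUSINESS_HOURS_ONLY`, the `privacy_officer` group and the 08:00 to 18:00 window are assumptions, and Ranger's policy engine does not work this way internally.

```python
from datetime import time

MASK = "****"
BUSINESS_HOURS = (time(8, 0), time(18, 0))  # assumed window for the time-based rule


def apply_policies(
    table: str,
    row: dict[str, str],
    column_tags: dict[str, set[str]],
    user_groups: set[str],
    now: time,
) -> dict[str, str]:
    """Return a copy of `row` with values masked according to column classifications."""
    start, end = BUSINESS_HOURS
    result: dict[str, str] = {}
    for column, value in row.items():
        tags = column_tags.get(f"{table}.{column}", set())
        if "PII" in tags and "privacy_officer" not in user_groups:
            result[column] = MASK  # classification-based column masking
        elif "BUSINESS_HOURS_ONLY" in tags and not (start <= now <= end):
            result[column] = MASK  # time-based rule
        else:
            result[column] = value
    return result


if __name__ == "__main__":
    tags = {"users.name": {"PII"}, "users.city": {"BUSINESS_HOURS_ONLY"}}
    row = {"name": "alice", "city": "Lyon", "followers": "42"}
    print(apply_policies("users", row, tags, {"analyst"}, time(9, 0)))
    # {'name': '****', 'city': 'Lyon', 'followers': '42'}
```

Because the rules key on classifications rather than on table or column names, tagging a new column `PII` in the metadata repository is enough to have it masked.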

Apache Atlas, RESTful Interface

Extensible enterprise classification of data assets, relationships and policies, organized in a meaningful way and aligned with the business organization.

Supports exploration via a user interface

Supports extensibility via an exposed API and CLI
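The sketch below illustrates the REST exposure by building, but not sending, requests for two endpoints of the Atlas v2 REST API: basic search and lineage. These endpoints come from later Atlas releases than the one presented here, which exposed an earlier API. The host is a placeholder for your own deployment, 21000 is Atlas's default port, and `admin`/`admin` are placeholder credentials.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request

ATLAS_URL = "http://localhost:21000"  # placeholder host; 21000 is Atlas's default port


def _auth_header(user: str, password: str) -> str:
    # HTTP basic authentication header value
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def _get(path: str, params: dict, user: str, password: str) -> Request:
    # Build a GET request with query parameters, auth and a JSON Accept header
    req = Request(f"{ATLAS_URL}{path}?{urlencode(params)}")
    req.add_header("Authorization", _auth_header(user, password))
    req.add_header("Accept", "application/json")
    return req


def basic_search_request(type_name: str, text: str, user: str, password: str) -> Request:
    """Full-text search restricted to one entity type, e.g. "hive_table"."""
    return _get("/api/atlas/v2/search/basic",
                {"typeName": type_name, "query": text}, user, password)


def lineage_request(guid: str, user: str, password: str, depth: int = 3) -> Request:
    """Upstream and downstream lineage of the entity identified by `guid`."""
    return _get(f"/api/atlas/v2/lineage/{quote(guid)}",
                {"direction": "BOTH", "depth": depth}, user, password)


if __name__ == "__main__":
    req = basic_search_request("hive_table", "tweets", "admin", "admin")
    print(req.full_url)
    # http://localhost:21000/api/atlas/v2/search/basic?typeName=hive_table&query=tweets
```

To execute a request against a running server, pass it to `urllib.request.urlopen` and decode the JSON body with `json.load`.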

A use case

Our process (diagram):
● Data sources: Twitter, reference data (Référentiel)
● Import: collect from Twitter
● Data platform: HDFS raw data; Hive tables url, hash tags, users and tweets
● Analyse: build the social network, stored in the Hive table "Social network"
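In this use case, lineage is a graph of datasets linked by processes. Answering "where does the social network table come from?" is therefore an upstream traversal of that graph. The edges and process names below are an assumption inferred from the diagram's labels, in particular which Hive tables feed the social network. In practice Atlas would capture this graph itself rather than reading it from a hand-written dictionary.

```python
from collections import deque

# Assumed edges, inferred from the use-case diagram: process -> (inputs, outputs)
PROCESSES: dict[str, tuple[list[str], list[str]]] = {
    "collect_from_twitter": (["twitter"], ["hdfs:raw_data"]),
    "load_hive_tables": (  # hypothetical process name
        ["hdfs:raw_data", "reference_data"],
        ["hive:url", "hive:hash_tags", "hive:users", "hive:tweets"],
    ),
    "build_social_network": (["hive:users", "hive:tweets"], ["hive:social_network"]),
}


def upstream(dataset: str) -> set[str]:
    """Every dataset that `dataset` is transitively derived from (breadth-first)."""
    # Index each output dataset to the inputs of the process(es) that produced it
    produced_from: dict[str, list[str]] = {}
    for inputs, outputs in PROCESSES.values():
        for out in outputs:
            produced_from.setdefault(out, []).extend(inputs)

    seen: set[str] = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for parent in produced_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(upstream("hive:social_network")))
    # ['hdfs:raw_data', 'hive:tweets', 'hive:users', 'reference_data', 'twitter']
```

The `seen` set guarantees each dataset is visited once, even when several processes share inputs, as `hdfs:raw_data` does here.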

A use case

Search based on tables

A use case

Search based on services

A use case

Table Metadata

A use case

Lineage

Thank you!
