Big Data Governance in Hadoop Environments with Cloudera Navigator (Feb 2017 meetup)
Post on 13-Apr-2017
Agenda
● Introduction
● What is data governance and why should you care about it?
● What is Cloudera Navigator and how does it fit in?
● Cloudera Navigator Demonstration
● What’s new in the latest release of Cloudera 5.10?
Where are You with Hadoop? 1/2
Your relationship with Hadoop ...
● Still learning
● Evaluating distributions
● Testing / Development / Prototyping
● In production
Where are You with Hadoop? 2/2
You are using / planning to use Hadoop in ...
● Banking
● Telecom
● Healthcare
● Media / entertainment
● Internet services ...
Data Governance...
“... refers to the overall management of the availability, usability, integrity, auditability, and security of the data employed in an enterprise.
A sound data governance program includes a governing body, a defined set of procedures and policies, and a plan to execute them.
Data governance is used by organizations to exercise control over processes and methods used by their data stewards in order to improve data quality.”
Cloudera Big Data Maturity Survey 2016
https://goo.gl/d3A0ps
Data Governance & Challenges
● Compliance Officers: how to track, understand, and protect access to sensitive data?
○ Am I prepared for an audit?
○ Who’s accessing what data?
○ What are they doing with the data?
○ Is sensitive data governed and protected?
● Data Stewards and Curators: how to manage and organize data assets at Hadoop scale?
○ How to efficiently manage the data lifecycle from ingest to purge?
○ How to classify data efficiently?
○ How to make data available to end users efficiently?
● Data Scientists and BI Users: how to effortlessly find and trust the data that matter the most?
○ How can I explore data on my own?
○ Can I trust what I find?
○ How to find related data sets?
● Hadoop Administrators and DBAs: how to boost user productivity and cluster performance?
○ How is data being used today?
○ How can I optimize for future workloads?
Your Hadoop data management concern is...
● Compliance, e.g. EU General Data Protection Regulation (GDPR)
● Stewardship (lifecycle management)
● Curation (metadata tagging)
● Enabling end-user self-service
● Administration (optimization)
● Other
Cloudera Navigator Governance Foundation
● Unified Auditing
● Comprehensive Lineage
● Unified Metadata
● Universal Policies
Cloudera Navigator
● Trusted for production: deployed at 100s of customers in various industries, running in production for 4 years
● Compliance-ready: Cloudera is the first Hadoop distribution that passed an independent PCI audit
● Integrates well with industry-leading partner solutions
Integration with Others 2/2
https://github.com/cloudera/navigator-sdk
What’s new in Cloudera 5.10 (1/3)
● Comprehensive Governance for the Cloud
○ Cataloging, metadata management, and comprehensive lineage for data on Amazon S3
○ The only big data governance solution for data stored on-premise as well as in the cloud
What’s new in Cloudera 5.10 (3/3)
● Policy-based business metadata assignment and validation
● Major performance optimizations
● Refreshed look-and-feel for increased data stewardship productivity
● Solr indexing has been optimized to improve search speed and reduce memory requirements.