Ovum Fireside Chat: Governing the Data Lake - Understanding What's in There
TRANSCRIPT
Fireside Chat with Tony Baer, Ovum Research: Developing a Strategy for Data Lake Governance
Wednesday, May 18, 2016 1:00 pm EST
Meet today’s speakers
Tony Baer, Principal Analyst, Information Management, Ovum. Tony Baer leads Ovum’s Big Data research area. His coverage focuses on how Big Data must become a first-class citizen in the data center, the IT organization, and the business. He has a multi-disciplinary background touching the different tiers of enterprise software. He is an author and sought-after speaker.
Scott Gidley, Vice President of Product, Zaloni. Scott is a nearly 20-year veteran of the data management software and services market. Prior to joining Zaloni, Scott served as senior director of product management at SAS and was previously CTO and cofounder of DataFlux Corporation. Scott received his BS in Computer Science from the University of Pittsburgh.
• Award-winning provider of enterprise data lake management solutions: integrated data lake management platform; self-service data preparation
• Data Lake Design and Implementation Services: POC, Pilot, Production, Operations, Training
• Data Science Professional Services
Delivering on the business of big data
Funded by top-tier technology investors:
Key Findings
• Data lakes must be managed
• Data lakes must have the capability to ingest all data & related metadata
• Data lakes will only succeed if they become shared resources
• Business users must be prepared to take responsibility for curating data
• Maturity & readiness of tools, technologies & best practices are works in progress
• Mgmt. & governance of data lakes should be a phased process
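The finding about ingesting data together with its related metadata can be illustrated with a minimal sketch. Everything named here (CatalogEntry, ingest, the field names, the in-memory dict used as a catalog) is a hypothetical stand-in chosen for illustration, not part of the Ovum report or of any Zaloni product.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """Hypothetical metadata record captured for each ingested object."""
    name: str
    source_system: str
    size_bytes: int
    sha256: str
    ingested_at: str
    tags: list = field(default_factory=list)


def ingest(name, payload, source_system, catalog, tags=None):
    """Record metadata for an incoming payload in an in-memory catalog.

    Assumption: a real data lake would also write the payload to storage;
    this sketch only shows the metadata side of the ingest step.
    """
    entry = CatalogEntry(
        name=name,
        source_system=source_system,
        size_bytes=len(payload),
        sha256=hashlib.sha256(payload).hexdigest(),
        ingested_at=datetime.now(timezone.utc).isoformat(),
        tags=list(tags or []),
    )
    catalog[name] = entry
    return entry


catalog = {}
entry = ingest("orders.csv", b"id,amount\n1,9.99\n", "OLTP", catalog, tags=["pii:none"])
print(entry.name, entry.size_bytes, entry.source_system)
```

Capturing the checksum, source, and timestamp at ingest time is one way to keep the lake's contents discoverable later; which attributes matter in practice depends on the governance policy.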
Ovum Big Data Report: Developing a Strategy for Data Lake Governance
Data lake use case maturity model
Data lake is later stage of Hadoop adoption
[Figure labels]
• Group, Multi-department, Enterprise
• IT, Data Scientists, Business
• Log analytics, Sentiment analysis, DW offload
• Data Lake: Exploratory analytics, Line of business analytic applications, Operational analytics
• Bulk storage of raw data
• Migrate I/O-intensive operations (e.g., ELT)
• “Deep” analytics (e.g., segmentation, predictive, prescriptive modeling)
Ovum’s data lake reference architecture
[Figure labels]
• Availability/Reliability (FT, HA, Backup DR)
• Monitoring & troubleshooting
• Perimeter security
• Data platform (Hadoop)
• Query/Analytics tools, programs
• Cost optimization & integration
• Data inventory
• Data curation
• Data-level security
• Self-service tier
[Figure legend: Data lake building block; Hadoop platform management; End user tool]
Data lake challenges and complications
• Ingestion
• Lack of Visibility
• Privacy and Compliance
• Quality Issues
• Reliance on IT
• Reusability
• Rate of Change
• Skills Gap
• Complexity
Building: Managing: Delivering:
Zaloni Confidential and Proprietary 8
Engage the business
• Discover
• Enrich
• Provision
Govern the data in the lake
• Cleanse
• Secure
• Operationalize
Enable the data lake
• Ingest
• Organize
• Catalog
Collaboration key to modern data management
Data Curation: Build your library of information
Physical Inventory: Know/manage what data is in the data lake
Business & Analytics teams; Technology team
• Data profiling, data preparation, collaborative data enrichment, catalog, match data, derive master data, record data lineage
• Manage data access, track data lineage, tag for security, data retention
• Manage data access, tag for security, data retention, lifecycle & workflow, track data lineage
Data lake reference architecture
[Figure labels]
• Source System: OLTP or ODS, Enterprise Data Warehouse, Logs (or other unstructured data), Cloud Services
• File Data, DB Data, ETL Extracts, Streaming
• Data Lake
• Transient Loading Zone
• Raw Data: Original unaltered data attributes, Tokenized Data
• Refined Data: Integrate to common format, Data Validation, Data Cleansing, Aggregations
• Trusted Data: Reference Data, Master Data
• Discovery Sandbox: Data Wrangling, Data Discovery, Exploratory Analytics
• APIs
• Consumption Zone: Business Analysts, Researchers, Data Scientists
• Metadata, Data Quality, Data Catalog, Security
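The zone labels in this reference architecture (transient loading zone, raw data with tokenized attributes, refined data after validation and cleansing) suggest a promotion flow that can be sketched roughly as follows. This is an illustrative assumption about how records might move between zones, not a description of Zaloni's implementation. The function names, the record fields, and the use of a truncated SHA-256 hash as a stand-in for tokenization are all hypothetical.

```python
import hashlib


def tokenize(value):
    """Stand-in for tokenization: a stable, truncated SHA-256 digest.

    Assumption: real tokenization services usually keep a secure mapping;
    a plain hash is used here only to keep the sketch self-contained.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def to_raw(record, sensitive_fields):
    """Raw zone: keep original attributes, tokenizing the sensitive ones."""
    return {k: tokenize(v) if k in sensitive_fields else v for k, v in record.items()}


def to_refined(record):
    """Refined zone: validate and cleanse into a common format."""
    amount = float(record["amount"])  # raises ValueError on malformed input
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return {
        "customer": record["customer"],
        "country": record["country"].strip().upper(),
        "amount": amount,
    }


def promote(records, sensitive_fields):
    """Move records from the transient zone through raw to refined.

    Records that fail validation stay behind in a rejected list.
    """
    refined, rejected = [], []
    for rec in records:
        raw = to_raw(rec, sensitive_fields)
        try:
            refined.append(to_refined(raw))
        except (KeyError, ValueError):
            rejected.append(raw)
    return refined, rejected


# Hypothetical records sitting in the transient loading zone.
landing = [
    {"customer": "alice@example.com", "country": " us ", "amount": "10.50"},
    {"customer": "bob@example.com", "country": "DE", "amount": "oops"},
]
refined, rejected = promote(landing, {"customer"})
print(len(refined), "refined;", len(rejected), "rejected")
```

Keeping rejected records in their tokenized raw form, rather than dropping them, is one possible design choice: it leaves the data available for later correction without exposing the sensitive attribute.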