TRANSCRIPT
@isanuage
Achieving Agility and Scale for Your Data Lake
Isabelle Nuage, Product Marketing
Cyril Sonnefraud, Product Management
©2017 Talend Inc #TalendConnect
Poll
• Who’s using Talend Big Data today?
• Who has a data lake in production?
• Who is deploying or planning a data lake project within 12 months?
• Who is implementing a data lake in the Cloud?
<Digital Transformation Stats>
By end 2017, > 70% of G500
By 2020, 50% of the G2000
Digital Transformation is No Longer an Option. Are You Prepared?
But only 26% of Organizations
Sources: Accenture and Forrester, "Digital Transformation in the Age of the Customer" study; IDC FutureScape
The Data Lake is the New Digital Backbone
• Break down data silos
• Structured and unstructured
• Granular data
• Machine learning
Why Create Data Lakes?
[Chart axis: Business Value]
Reduce costs
• Offload EDW
• Cheaper storage
• Access to archived data
Why Create Data Lakes?
[Chart axis: Business Value]
Generating new opportunities
• Customer acquisition, retention..
• Real-time engagement
• Pricing optimization
• Demand forecasting
• Risk and fraud
• Predictive maintenance
• Smart products…
Reduce costs
• Offload EDW
• Cheaper storage
• Access to archived data
Challenges
Complex Technology
Limited Access
Data Swamps
How to achieve Agility & Scale?
DATA LAKES
People Doing it the OLD Way…
2017 Lenovo Internal. All rights reserved.
Change is the Only Constant
[Chart: Business Value (vertical axis) over Time (horizontal axis)]
Stages: Reporting, Measurement, Business Insights, Optimization, Predictive Analytics, Automation, Prescriptive Analytics, Cognitive Analytics
Time periods: Pre FY-07, FY-07/10, FY-11/12, FY-13/14, FY-15/17, FY-17/18
• Any innovation
• Any platform
• Any use case
• Any speed
• Any user
The Agile Data Lake
The Path to Agility
• Ingestion + basic visualization
• Data Quality
• Self Service
• Data Governance
• Real-time
• Machine Learning
Examples
• Smart Data Quality
• Smart Data Pipelines
Demo flow
Data Lake: Incoming Lead Data (Raw)
Amazon EMR Cluster
Data Lake: Output Lead Data (Processed) With Segmentation
1 Ingestion with Smart Data Quality
2 Smart Data Pipeline with Machine Learning
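The two demo steps can be pictured with a small, hedged sketch. This is plain Python, not Talend jobs and not code from the demo: the `Lead` fields, the validity rule, and the choice of k-means for segmentation are all assumptions made for illustration, since the slides do not name the schema or the algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lead:
    # Hypothetical fields; the deck does not describe the lead schema.
    email: str
    company_size: Optional[int]
    web_visits: Optional[int]


def is_clean(lead: Lead) -> bool:
    """Step 1 (data quality): reject malformed emails and missing values."""
    return (
        "@" in lead.email
        and lead.company_size is not None
        and lead.web_visits is not None
    )


def squared_distance(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: List[Tuple[float, float]], k: int = 2,
                  iterations: int = 10) -> List[int]:
    """Step 2 (machine learning): minimal k-means, one segment id per point.

    Features are left unscaled to keep the sketch short.
    """
    if not points:
        return []
    k = min(k, len(points))
    centroids = list(points[:k])  # deterministic start: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels


def process_leads(raw: List[Lead]) -> List[Tuple[Lead, int]]:
    """Raw leads in, cleaned leads with a segment id out."""
    clean = [lead for lead in raw if is_clean(lead)]
    points = [(float(l.company_size), float(l.web_visits)) for l in clean]
    return list(zip(clean, kmeans_labels(points)))


# Made-up sample data, for illustration only.
raw_leads = [
    Lead("a@x.com", 10, 2),
    Lead("b@x.com", 12, 3),
    Lead("bad-email", 500, 40),   # dropped: malformed email
    Lead("c@y.com", 500, 40),
    Lead("d@y.com", 480, 35),
    Lead("e@z.com", None, 5),     # dropped: missing company size
]
```

Calling `process_leads(raw_leads)` drops the two invalid records and pairs each remaining lead with a segment id. In a real data lake the same two stages would run as distributed jobs on the cluster rather than in a single process.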
Architecture Guidelines
On-premise Data Lakes
On-Premise Data Sources
Cloud Data Sources
Ingest, Prepare, Process, Access, Consume
Governance
On-prem Data Lake: Processing, Storage
Hybrid Data Lakes
On-Premise Data Sources
Cloud Data Sources
Ingest, Prepare, Process, Access, Consume
Governance
On-prem Data Lake: Processing, Storage
Cloud Data Lake: Cloud Processing, Cloud Storage
Distribute
Cloud Data Lakes – A Concrete Example
On-Premise Data Sources
Cloud Data Sources
Ingest, Prepare, Process, Access, Consume
Governance
Cloud Processing: EMR, Cloud Dataflow, HDInsight
Cloud Storage: S3, Cloud Storage, Azure DL Store
The Path to Agility
• Ingestion + basic visualization
• Data Quality
• Self Service
• Data Governance
• Real-time
• Machine Learning
Deliver Value Along The Way
Start with quick wins & business outcome in mind
Get a cadence of constantly delivering value
Focus on game changer value drivers
Get the company onboard
Be Eligible to Win Prizes at the End of the Show!