Post on 03-Jul-2020
TRANSCRIPT
@isanuage
Achieving Agility and Scale for Your Data Lake
Isabelle Nuage, Product Marketing Cyril Sonnefraud, Product Management
©2017 Talend Inc #TalendConnect
Poll
• Who’s using Talend Big Data today?
• Who has a data lake in production?
• Who is deploying or planning a data lake project within 12 months?
• Who is implementing a data lake in the Cloud?
By end 2017, > 70% of G500
By 2020, 50% of the G2000
Digital Transformation is No Longer an Option. Are You Prepared?
But only 26% of Organizations
Sources: Accenture and Forrester, Digital Transformation in the Age of the Customer study; IDC FutureScape
The Data Lake is the New Digital Backbone
• Break down data silos
• Structured and unstructured
• Granular data
• Machine learning
Business Value
• Offload EDW
• Cheaper storage
• Access to archived data
Why Create Data Lakes?
Reduce costs
Business Value
Generating new opportunities
• Offload EDW
• Cheaper storage
• Access to archived data
• Customer acquisition, retention…
• Real-time engagement
• Pricing optimization
• Demand forecasting
• Risk and fraud
• Predictive maintenance
• Smart products…
Why Create Data Lakes?
Reduce costs
Challenges
Complex Technology
Limited Access
Data Swamps
How to achieve Agility & Scale?
DATA LAKES
People Doing it the OLD Way…
2017 Lenovo Internal. All rights reserved.
Change is the Only Constant
Business Value
Reporting Measurement Business Insights
Optimization Predictive Analytics
Automation Prescriptive Analytics
Cognitive Analytics
Time: Pre FY-07, FY-07/10, FY-11/12, FY-13/14, FY-15/17, FY-17/18
• Any innovation
• Any platform
• Any use case
• Any speed
• Any user
The Agile Data Lake
The Path to Agility
• Ingestion + basic visualization
• Data Quality
• Self Service
• Data Governance
• Real-time
• Machine Learning
Examples
• Smart Data Quality
• Smart Data Pipelines
Demo flow
Data Lake
Incoming Lead Data (Raw)
Amazon EMR Cluster Data Lake
Output Lead Data (Processed) With Segmentation
1. Ingestion with Smart Data Quality
2. Smart Data Pipeline with Machine Learning
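The two demo steps can be sketched in plain Python to show the shape of the flow: raw leads are checked for quality on ingestion, then the clean records are enriched with a segment. This is only an illustration under stated assumptions. The field names (email, company_size), the email rule, and the size thresholds are hypothetical, and the threshold rule merely stands in for the machine learning model. The slides do not show how the actual demo, which ran with Talend on an Amazon EMR cluster, was implemented.

```python
import re

# Hypothetical quality rule: a loose "looks like an email" check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def ingest_with_quality(raw_leads):
    """Step 1: split raw leads into clean and rejected records."""
    clean, rejected = [], []
    for lead in raw_leads:
        # Normalize before validating so trivial formatting noise is tolerated.
        email = (lead.get("email") or "").strip().lower()
        if EMAIL_RE.match(email):
            clean.append({**lead, "email": email})
        else:
            rejected.append(lead)
    return clean, rejected


def segment(lead):
    """Step 2: assign a segment.

    A fixed threshold rule on a hypothetical company_size field;
    a trained model would replace this function in a real pipeline.
    """
    size = lead.get("company_size", 0)
    if size >= 1000:
        return "enterprise"
    if size >= 100:
        return "mid-market"
    return "small-business"


def run_pipeline(raw_leads):
    """Raw lead data in, processed lead data with segmentation out."""
    clean, rejected = ingest_with_quality(raw_leads)
    processed = [{**lead, "segment": segment(lead)} for lead in clean]
    return processed, rejected
```

Keeping the rejected records instead of dropping them silently is one way to avoid the "data swamp" problem named earlier in the deck: bad input stays visible and can be repaired.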
Architecture Guidelines
On-premise Data Lakes
On-Premise
Data Sources
Ingest Prepare Process Access Consume
Cloud
Data Sources
Governance
Processing
Storage
On-prem Datalake
Hybrid Data Lakes
On-Premise
Data Sources
Ingest Prepare Process Access Consume
Cloud
Data Sources
Governance
Cloud Processing
Processing
Cloud Storage
Storage
On-prem Datalake
Cloud Datalake
Distribute
Cloud Data Lakes – A Concrete Example
Ingest Prepare Process Access Consume
Governance
Cloud Processing
Cloud Storage
On-Premise
Data Sources
Cloud
Data Sources
S3
EMR
Cloud Storage
Cloud Dataflow
Azure DL Store
HDInsight
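The concrete example above pairs a storage service with a processing service for each cloud. A minimal sketch of that mapping as a lookup table follows. The provider names and the storage/processing pairings are inferred from the slide layout and from the services' vendors, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# The five pipeline stages shown across the top of the architecture slides.
STAGES = ("Ingest", "Prepare", "Process", "Access", "Consume")


@dataclass(frozen=True)
class CloudDataLake:
    provider: str    # assumption: not named on the slide
    storage: str     # the "Cloud Storage" layer
    processing: str  # the "Cloud Processing" layer


# Pairings inferred from the slide: S3/EMR, Cloud Storage/Cloud Dataflow,
# Azure DL Store/HDInsight.
CLOUD_DATA_LAKES = {
    "aws": CloudDataLake("AWS", "S3", "EMR"),
    "gcp": CloudDataLake("Google Cloud", "Cloud Storage", "Cloud Dataflow"),
    "azure": CloudDataLake("Microsoft Azure", "Azure DL Store", "HDInsight"),
}


def describe(key):
    """Return a one-line summary of the stack for one cloud."""
    lake = CLOUD_DATA_LAKES[key]
    flow = " -> ".join(STAGES)
    return (
        f"{lake.provider}: {flow} "
        f"(storage: {lake.storage}, processing: {lake.processing})"
    )
```

Calling describe("aws") summarizes the AWS variant. The stages stay the same across providers, and only the storage and processing layers change.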
The Path to Agility
• Ingestion + basic visualization
• Data Quality
• Self Service
• Data Governance
• Real-time
• Machine Learning
Deliver Value Along The Way
Start with quick wins & business outcome in mind
Get a cadence of constantly delivering value
Focus on game changer value drivers
Get the company onboard
Be Eligible to Win Prizes at the End of the Show!