concevoir un data lake ouvert pour · build a data lake inventory •manage change with version...
TRANSCRIPT
![Page 1: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/1.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
26 Juin 2017
Concevoir un Data Lake ouvert pour l’entreprise
© 2017, Amazon Web Services, Inc. or its affiliates. All rights reserved.
![Page 2: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/2.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
PrésentateurAbass Safouatou, Solutions Architect, Amazon Web Services
Seif Eddine Abbassi, Solution Engineer, Talend
![Page 3: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/3.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Agenda• Les services data management et data lake d’AWS
• Solutions Talend pour la data gouvernance dans un contexte data lake
• Les challenges rencontrés par Beachbody
• Retour d’expérience Beachbody avec AWS & Talend
• Q&A
![Page 4: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/4.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Learning Objectives:1. How to migrate a variety of structured and unstructured data sources to a
data lake
2. How to shorten development and testing cycles
3. How to mitigate complex deployment challenges common to real-time data
4. How to take advantage of Spark and Hadoop by generating native code
![Page 5: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/5.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The Data Lake and AWS
Drive business value with any type of data
![Page 6: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/6.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Legacy Data Warehouses & RDBMS
• Complex to setup and manage
• Do not scale
• Takes months to add new data sources
• Queries take too long
• Cost $MM upfront
![Page 7: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/7.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Should I Build a Data Lake?
Starting by amassing "all your data" and dumping into a large repository for the data gurus to start finding "insights" is like trying to win the lottery by buying all the tickets
![Page 8: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/8.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Building a Data Lake on AWS
![Page 9: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/9.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rethink How to Become a Data-driven Business
• Business outcomes - start with the insights and actions you want to drive, then work backwards to a streamlined design
• Experimentation - start small, test many ideas, keep the good ones and scale those up, paying only for what you consume
• Agile and timely - deploy data processing infrastructure in minutes, not months. take advantage of a rich platform of services to respond quickly to changing business needs
![Page 10: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/10.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Business Case Determines Platform Design
Ingest/
Collect
Consume/
visualizeStore
Process/
analyze
Data
1 40 9
5
Answers &
Insights
START HEREWITH A BUSINESS CASE
![Page 11: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/11.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Experiment and Scale Based on Your Business Needs
MATCHAVAILABLE DATA
Metrics and
Monitoring
Workflow
Logs
ERP
Transactions
Ingest/
Collect
Consume/
visualizeStore
Process/
analyze
Data
1 40 9
5
Answers &
Insights
![Page 12: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/12.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Business Outcomes on a Modern Data Architecture
Outcome 1 : Modernize and consolidate
• Insights to enhance business applications and create new digital services
Outcome 2 : Innovate for new revenues
• Personalization, demand forecasting, risk analysis
Outcome 3 : Real-time engagement
• Interactive customer experience, event-driven automation, fraud detection
Outcome 4 : Automate for expansive reach
• Automation of business processes and physical infrastructure
![Page 13: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/13.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Use an Optimal Combination of Highly Interoperable Services
![Page 14: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/14.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Why Amazon S3 for Modern Data Architecture?
Designed for 11 9s
of durability
Designed for
99.99% availability
Durable Available High performance▪ Multiple upload
▪ Range GET
▪ Store as much as you need
▪ Scale storage and compute
independently
▪ No minimum usage commitments
Scalable
▪ Amazon EMR
▪ Amazon Redshift
▪ Amazon DynamoDB
▪ Amazon Athena
IntegratedEasy to use
▪ Simple REST API
▪ AWS SDKs
▪ Read-after-create consistency
▪ Event notification
▪ Lifecycle policies
![Page 15: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/15.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Decouple Storage and Compute
• Legacy design was large databases or data
warehouses with integrated hardware
• Big Data architectures often benefit from
decoupling storage and compute
![Page 16: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/16.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Improving Data Agility with Talend
![Page 17: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/17.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Stakes are Ever Higher with Big Data
Companies that plan on
increasing spending on
analytics and making data
discovery a more significant
part of the architecture
Revenue from Big data and
analytics applications, tools
and services
Big data projects that will fail to
deliver against expectations
1/273%50%
![Page 18: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/18.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Lakes: The First Phase and the Future
First phase: Capture and store raw data of many
different types at scale
Next phase: Augment enterprise data warehousing
strategies
![Page 19: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/19.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Why Data Lake Projects Fail
No DevOps
Practices for
Scalability &
Testing
Lack of Expertise Siloed
Operating Model
Poor Data
Governance
Poor Architectural
Design &
Integration
![Page 20: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/20.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Foundational Elements of a Data Lake
Data Preparation
Self-service Data IngestMetadata Management
Data Classification
Data Lake
Data GovernanceData Lineage
Security Data Profiling
![Page 21: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/21.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Streamline DevOps Process for Big Data
Custom Cluster Configuration
• Retrieve Hadoop configuration
data from job server
• Upload configuration files to
different clusters based on role:
dev/test/prod
• Enforce uniform security
standards
• Available for Spark and Spark
Streaming jobs
Portable integration jobs across your environment
![Page 22: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/22.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Big Data Matching and Machine Learning on Spark
• New Data Stewardship interface simplifies matching process
• Improved performance through continuous matching speeding time to insight
Harmonize data at scale by learning from your key experts
![Page 23: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/23.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Faster and Better Real-time Insights with Spark
Enterprise – class robustness and Intelligent Integration
• New Spark Support
• Production-ready with Spark 2.2
• Toggle between Spark 1.X and 2.X
• Easily upgrade to Spark 2.X
• Natural Language Processing with Spark
• Data Preparation for Spark Streaming
• Talend Data Mapper runs with Spark Streaming
• Spark Streaming support for Kerberized Kafka 0.10
![Page 24: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/24.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Big Data Governance
Complete End-to-end Data Lineage
• Understand more about your unstructured data with new cloud and big data metadata bridges
• Save time by automatically harvesting data structures to build a data lake inventory
• Manage change with version control and notifications
Metadata bridges
S3, Hadoop HDFS, Hive, MongoDB,
Couchbase, Cassandra, Apache Atlas
Files systems
Amazon S3, Hadoop HDFS, Unix, Windows, Linux
File formats
CSV, Excel, JSON, Avro, Parquet
Know Your Data for Increased Data Protection, Accessibility and Compliance
![Page 25: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/25.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Beachbody – Fitness goes Big Data
Driving innovation with Talend on AWS
![Page 26: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/26.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
About Beachbody
• A leading provider of fitness, nutrition, and weight-loss programs
• Creator of P90X® Series, INSANITY®, FOCUS T25®, 21 Day Fix®, Body Beast®, PiYo®, and Hip Hop Abs®
• Empowers over 23 Million customers
• Supports 350K+ independent “Coach” distributors
• Operates with 800+ employees
• Sees 5 million+ monthly unique visits across digital platforms
• Reached $1 billion in gross sales in 2015
![Page 27: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/27.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The Challenge - Do More Better, Faster, Cheaper
![Page 28: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/28.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Project Vision – Open Enterprise Data Lake
• Build an OPEN Enterprise Data Platform
• Open Source Technology: Bring Your Own Tool
• Decentralized Data Ownership: Many teams can publish
• Centralized people, processes, and tools available
• Capture All Data as real-time as possible
• Access to All raw + processed data by Authorized Users
• HIPAA/PII encrypted or masked to for compliance
• Shift Time from Collecting Data -> Analyzing Data
![Page 29: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/29.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The Technology
![Page 30: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/30.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Lake Architecture
Amazon S3 Data Lake folder structure
AWS
![Page 31: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/31.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Architectural Design
![Page 32: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/32.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Lake Component - Storage
![Page 33: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/33.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Lake Component – Data Pipeline
![Page 34: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/34.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Lake Component – RDBMS
![Page 35: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/35.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Lake Component – Compute
![Page 36: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/36.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Lake Component – Analytics
![Page 37: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/37.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Business Benefits
• Reduced Data Acquisition Time by 5x
• Improved Marketing Campaigns
• Reduced Site Tagging Costs
• Improved Employee Retention and Satisfaction
• Automated Customer Self-Service Order Status
• Identified Web Funnel Conversion Opportunities (testing now)
![Page 38: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/38.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Next steps and further information
• Data Lake solution on AWS:https://aws.amazon.com/big-data/data-lake-on-aws/
• Take a Free 30-Day Trial of Talend Integration Cloud:https://iam.integrationcloud.talend.com/idp/federation/up/login
• Try AWS for free:https://aws.amazon.com/
![Page 39: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/39.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Q & A
![Page 40: Concevoir un Data Lake ouvert pour · build a data lake inventory •Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase,](https://reader033.vdocument.in/reader033/viewer/2022042219/5ec55f22333851662b469c76/html5/thumbnails/40.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!