© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Abderrahmane Belarfaoui
CDO @ Euronext
Paul de Monchy
Senior Consultant @ AWS Professional Services
BDM Replacement
Automation and task orchestration
for Big Data at Euronext
Euronext at a glance
• Euronext is the leading pan-European exchange in the Euro zone with
over 1,300 issuers worth c.€3.7* trillion in market capitalisation.
• Euronext operates regulated and transparent equity and derivatives
markets, offering market participants a comprehensive range of
services to meet their needs.
• Since June 2014, Euronext has been an independent, listed company with
a market capitalisation of c.€4 billion. Since its IPO, Euronext's stock
price has tripled to c.€57**.
• Following its successful carve-out from ICE and its IPO, Euronext
entered a new phase and in May 2016 announced to the market its
strategic plan ‘Agility for Growth’ for the period 2016-19.
*as of end March 2018
**as of 11 June 2018
All roads lead to Data Governance
Data Strategy: delivering Agility for Growth will require a step change in our data analytics capabilities
Internal Demand, External Expectations
• Comply with regulatory requirements (MiFID II, GDPR)
• Research: businesses need to carry out in-depth analysis so that every decision is based on relevant data
• Optiq will radically transform the data production flows in the coming months and will require a new data architecture to fully leverage its capabilities, e.g. real time
• Meet clients' market data needs by enhancing data services and introducing new data products
• Through acquisitions and mergers, the data set is growing as fast as the turnover
Data products on top of the data lake, combined with the roll-out of data governance and the reinforcement of internal skills (e.g. on AI)
• Data foundation: Euronext data warehouse on cloud
• Core Euronext data: orders, trades, referential, technical data, post-trade data, 3rd party data, …
• In 2018, with Optiq and cloud: from day +1 to near real time, opening a new field of opportunities
• Data shop
• New service enabled by the data lake: pay per use, Euronext and external data
• Data sandboxes with AI capabilities
• Innovation sandboxes to quickly validate new data services
• Users: clients, regulators, businesses, finance, data scientists, …
• Future use cases, e.g. surveillance, advanced analytic products, etc.
State of the art
On-Premise
• Netezza DBs: SygmaX, Cash, Derivative, Saturn
• FSIA (Enx files from source application)
• Genio
• Talend
• 3- Files sent to regulator and partners
• 4 IBM Netezza Databases
• 74TB of compressed data
(including 50TB of archives)
• 3.5TB/month for Batch
• 150 Genio (IBM) ETL jobs to migrate,
scheduled with Cisco Tidal
• Cash, Derivative and Saturn to be fed by
streaming from 6 June,
at up to 1.7M messages/s
• 150 direct BI customers on Netezza
Make Euronext a Data-centric company
with AWS services
DATALAKE
• Central Storage: secure, cost-effective storage in S3 (S3)
• Catalog & Search: access and search metadata (DynamoDB, Amazon ES)
• Access & User Interface: give your users easy and secure access (API Gateway, IAM, Cognito)
• Protect & Secure: use entitlements to ensure data is secure and users' identities are verified (Security Token Service, CloudWatch, CloudTrail, KMS)
• Processing & Analytics: use predictive and prescriptive analytics to gain better understanding (Athena, QuickSight, EMR, Redshift)
• Data Ingestion: get your data into S3 quickly and securely (Firehose, Direct Connect, Snowball, DMS)
• Data Transformation: get your data suitable for analytics (Glue, AWS Batch, AWS Step Functions)
Automation and task orchestration
for ETL jobs
[Architecture diagram: On-Premise and the Euronext account (Enx-vpc, without public subnet), linked by a Direct Connect connection]
Components shown: FSIA (Enx files from source application), Kafka messages, Talend Administration Center, GitLab Server, Nexus Server, AWS Batch executing Spark jobs, EMR Cluster (ephemeral), Redshift cluster
Data zones: raw, processed, refined
• 1- A script triggered on file upload sends the data to S3
• 1bis- Streaming process (Kinesis or Kafka)
• 2- A Step Function is triggered and launches Batch jobs or ephemeral EMR clusters
• 3- Processed files are sent back to S3 and data to Redshift
• 4- Some processed and refined files are sent back to FSIO for regulators and partners, with ECS tasks
Automatic Triggering of Application Run
• 1/ When data is put on S3, a Step Functions execution is automatically triggered through CloudTrail and CloudWatch Events
• 2/ Step Functions starts a Lambda function that builds the job command and submits it directly to the AWS Batch job queue
• 3/ The AWS Batch compute environment provisions the necessary Spot instances in an ECS cluster to execute the jobs
• 4/ AWS Batch starts the jobs as ‘tasks’ on the ECS cluster and monitors them, pulling the Docker image from ECR and starting a command that retrieves the Talend job zip file from S3, unzips it and executes it
• 5/ Step Functions launches a Lambda function to check whether the job ended successfully. If the job is still running, the check is repeated a few minutes later
• 6/ On success, Step Functions may launch new jobs or end
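Step 1 relies on a CloudWatch Events rule that matches the S3 API calls recorded by CloudTrail. The deck does not show the rule itself, so the following is only a minimal Python sketch of what such an event pattern could look like. The function name and the bucket name `enx-datalake-raw` are hypothetical.

```python
import json


def s3_upload_event_pattern(bucket_name):
    """Build a CloudWatch Events pattern matching object uploads to one bucket.

    CloudTrail must log S3 data events for the bucket, otherwise no
    "AWS API Call via CloudTrail" event is emitted for uploads.
    """
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            # Large files arrive through multipart uploads, so match both calls.
            "eventName": ["PutObject", "CompleteMultipartUpload"],
            "requestParameters": {"bucketName": [bucket_name]},
        },
    }


# Hypothetical bucket name, used only for illustration.
pattern = s3_upload_event_pattern("enx-datalake-raw")
print(json.dumps(pattern, indent=2))
```

The rule's target would then be the Step Functions state machine, so that each matching upload starts one execution with the CloudTrail event as input.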
[Diagram: a transformed CSV lands in the S3 data lake; AWS CloudTrail and a CloudWatch Event trigger AWS Step Functions, which adds the job to AWS Batch and checks the job status. AWS Batch (job queue, Batch compute environment) uses the image from the ECR registry and the bucket with Talend job zip files, and sends execution logs to CloudWatch Logs (CWL)]
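Step 2 of the flow above can be sketched as a small Lambda handler. This is an illustration under stated assumptions, not the authors' SubmitJob code: the environment variable names (`JOB_QUEUE`, `JOB_DEFINITION`, `INPUT_BUCKET`, `INPUT_KEY`) and the `talend-` job-name prefix are hypothetical, and the optional `batch_client` argument exists only so the function can be exercised without AWS access.

```python
import os
import re


def build_submit_job_request(event, job_queue, job_definition):
    """Translate a CloudTrail S3 upload event into AWS Batch submit_job arguments."""
    params = event["detail"]["requestParameters"]
    bucket, key = params["bucketName"], params["key"]
    # Batch job names allow letters, digits, hyphens and underscores (128 max).
    job_name = ("talend-" + re.sub(r"[^A-Za-z0-9_-]", "-", key))[:128]
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Hypothetical variable names read by the container's start script.
            "environment": [
                {"name": "INPUT_BUCKET", "value": bucket},
                {"name": "INPUT_KEY", "value": key},
            ]
        },
    }


def handler(event, context, batch_client=None):
    """Lambda entry point: submit the job and return its id to Step Functions."""
    if batch_client is None:
        import boto3  # imported lazily so the sketch can run without AWS access

        batch_client = boto3.client("batch")
    request = build_submit_job_request(
        event, os.environ["JOB_QUEUE"], os.environ["JOB_DEFINITION"]
    )
    response = batch_client.submit_job(**request)
    return {"jobId": response["jobId"], "jobName": request["jobName"]}
```

Returning the `jobId` lets a later state in the workflow poll the job's status.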
Job Status Poller
Using Step Functions to
launch Talend’s jobs on
Batch and/or EMR
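A job status poller of this kind is usually a submit, wait, check, choose loop. The sketch below builds such a state machine definition (Amazon States Language) as a Python dictionary, together with a status-check Lambda for AWS Batch. State names, the 60-second default wait and the ARNs passed in are assumptions; the deck does not show the actual definition, and the EMR variant is not covered here.

```python
import json


def check_job_status(event, context, batch_client=None):
    """Lambda entry point: return the current AWS Batch status of one job."""
    if batch_client is None:
        import boto3  # imported lazily so the sketch can run without AWS access

        batch_client = boto3.client("batch")
    response = batch_client.describe_jobs(jobs=[event["job"]["jobId"]])
    return response["jobs"][0]["status"]


def build_poller_definition(submit_arn, check_arn, wait_seconds=60):
    """Build a submit / wait / check / choose loop as a state machine definition."""
    return {
        "Comment": "Submit a job, then poll its status until it ends",
        "StartAt": "SubmitJob",
        "States": {
            "SubmitJob": {"Type": "Task", "Resource": submit_arn,
                          "ResultPath": "$.job", "Next": "WaitBeforeCheck"},
            "WaitBeforeCheck": {"Type": "Wait", "Seconds": wait_seconds,
                                "Next": "CheckJobStatus"},
            "CheckJobStatus": {"Type": "Task", "Resource": check_arn,
                               "ResultPath": "$.status", "Next": "IsJobFinished"},
            "IsJobFinished": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "SUCCEEDED",
                     "Next": "JobSucceeded"},
                    {"Variable": "$.status", "StringEquals": "FAILED",
                     "Next": "JobFailed"},
                ],
                # Any other status means the job is still in progress.
                "Default": "WaitBeforeCheck",
            },
            "JobSucceeded": {"Type": "Succeed"},
            "JobFailed": {"Type": "Fail", "Error": "BatchJobFailed",
                          "Cause": "The AWS Batch job ended with status FAILED"},
        },
    }


# Placeholder ARNs, used only for illustration.
definition = build_poller_definition("arn:submit-lambda", "arn:check-lambda")
print(json.dumps(definition, indent=2))
```

In this sketch the workflow simply ends on success; chaining further jobs would mean replacing `JobSucceeded` with another submit state.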
CI/CD pipeline – 1st iteration
• 1/ The pipeline is triggered by Git commits on CodeCommit
• 2/ The program is built with CodeBuild
  • Build the Talend job with Talend CI
  • Build the Java 8 Docker image that will execute the jobs with AWS Batch
  • Publish the Docker image to a private repository on ECR
  • Package the CloudFormation templates and the Lambda function code to S3
• 3/ The Git-versioned architecture is deployed with CloudFormation
CI/CD pipeline – 2nd iteration
• 1/ The pipeline is triggered by Git commits on GitLab
• 2/ The program is built with Jenkins
  • Build the Talend job with Talend CI
  • Build the Java 8 Docker image that will execute the jobs with AWS Batch
  • Publish the Docker image to a private repository on ECR
  • Package the CloudFormation templates and the Lambda function code to S3
• 3/ The CloudFormation deployment of the application is launched from Jenkins
[Diagram: GitLab, then Jenkins Build, then CloudFormation]
Deep dive
Focus on Git Repository content
• /TALEND_PROJECT/…
• /AWS_CICD/DockerFile
• /AWS_CICD/template.yml
• /AWS_CICD/lambda/…
• /AWS_CICD/buildspec.yml
• /AWS_CICD/config.json
• readme.txt
/TALEND_PROJECT/…: content of the Talend job project, to be compiled with Talend CI
Focus on Git Repository content: /AWS_CICD/DockerFile
• FROM amazonlinux:2
• Install Oracle JDK 8
• Add custom command-line scripts that retrieve the compiled and packaged Talend zip files from S3
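The custom script baked into the image has to fetch the job archive, unpack it and run it. The deck does not show that script, so the following Python sketch only illustrates the idea. It assumes the archive contains a launcher named `<job>_run.sh`; the function name and the `runner` parameter (there so the launch step can be stubbed) are hypothetical.

```python
import glob
import os
import subprocess
import zipfile


def run_talend_job(s3_client, bucket, key, workdir, runner=subprocess.run):
    """Download a built Talend job archive from S3, unpack it and launch it."""
    archive_path = os.path.join(workdir, os.path.basename(key))
    s3_client.download_file(bucket, key, archive_path)

    extract_dir = os.path.join(workdir, "job")
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(extract_dir)

    # Assumption: the archive ships a shell launcher named <job>_run.sh.
    pattern = os.path.join(extract_dir, "**", "*_run.sh")
    launchers = sorted(glob.glob(pattern, recursive=True))
    if not launchers:
        raise FileNotFoundError("no *_run.sh launcher found in " + key)

    # check=True makes a failing job raise, so the container exits non-zero
    # and AWS Batch reports the job as FAILED.
    runner(["sh", launchers[0]], check=True)
    return launchers[0]
```

Inside the container this could be called with `boto3.client("s3")` and the bucket and key passed in by the job submission, for example through environment variables.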
Focus on Git Repository content: /AWS_CICD/template.yml
• Contains the following resource definitions for CloudFormation:
  • AWS Batch
    • Compute Environment
    • JobQueue
    • JobDefinition
  • Lambda Functions
    • SubmitJob{Batch/EMR}
    • CheckJobStatus{Batch/EMR}
    • WaitForS3Files
  • CloudWatchEvent Rules
    • Trigger StepFunctions on file uploads
  • StepFunctions
  • ECR registry
  • SSM Parameter Store
  • IAM Roles
Focus on Git Repository content: /AWS_CICD/lambda/…
• Source code for:
  • SubmitJob{Batch/EMR}.py
  • CheckJobStatus{Batch/EMR}.py
  • WaitForS3Files.py
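The deck names WaitForS3Files.py without showing it. One plausible reading is a function that reports whether every expected input file has arrived, so the workflow can wait before submitting a job. The sketch below follows that reading; the event fields `bucket` and `expectedKeys` and the returned structure are assumptions.

```python
def find_missing_keys(s3_client, bucket, expected_keys):
    """Return the expected object keys that are not yet present in the bucket."""
    missing = []
    for key in expected_keys:
        # Keys are listed in lexicographic order, so an exact match, if it
        # exists, is the first result for a prefix equal to the key itself.
        response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
        found = any(obj["Key"] == key for obj in response.get("Contents", []))
        if not found:
            missing.append(key)
    return missing


def handler(event, context, s3_client=None):
    """Lambda entry point: tell the workflow whether all input files arrived."""
    if s3_client is None:
        import boto3  # imported lazily so the sketch can run without AWS access

        s3_client = boto3.client("s3")
    missing = find_missing_keys(s3_client, event["bucket"], event["expectedKeys"])
    return {"allPresent": not missing, "missing": missing}
```

A Choice state could then loop back to a Wait state while `allPresent` is false.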
Focus on Git Repository content: /AWS_CICD/buildspec.yml
• Recipe for CodeBuild execution
1. Install Oracle JDK
2. Install Talend CI
3. Connect to ECR
4. Build the Docker image with the Dockerfile
5. Build the jobs with Talend CI and push them to S3
6. Update and consolidate the CloudFormation template (template.yml file)
7. Push the Docker image to ECR
8. Push the transformed template.yml and the Lambda function code to S3
Focus on Git Repository content: /AWS_CICD/config.json
• Contains all parameter values for deploying the
CloudFormation template
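The layout of config.json is not shown in the deck. Assuming it is a flat JSON object of parameter names and values, the following Python sketch converts it into the parameter list that CloudFormation stack operations expect. The parameter names in the example are invented for illustration.

```python
import json


def to_cloudformation_parameters(config_text):
    """Convert a flat JSON object into CloudFormation stack parameters."""
    config = json.loads(config_text)
    parameters = []
    for name in sorted(config):
        value = config[name]
        # CloudFormation parameter values are passed as strings.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parameters.append({"ParameterKey": name, "ParameterValue": str(value)})
    return parameters


# Hypothetical parameter names, used only for illustration.
example = '{"JobQueueName": "talend-queue", "MaxvCpus": 64, "UseSpot": true}'
parameters = to_cloudformation_parameters(example)
print(parameters)
```

The resulting list has the shape accepted by the `Parameters` argument of the boto3 CloudFormation `create_stack` and `update_stack` calls.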
Focus on Git Repository content: readme.txt
• Readme.md of the project
Demo
Outcomes
Agility
• CI-CD pipeline
• Fully serverless and/or ephemeral resources
• Euronext can now experiment and innovate quickly and
more frequently
Cost saving
• Redshift vs Netezza
• Kinesis vs Kafka
• Use of AWS Batch with Spot instances
• TCO: same budget with 10x more data usage
(streaming and storage)
Elasticity
• Serverless orchestration with StepFunctions
• AWS Batch and EMR
• S3 Storage
• Use of Redshift Spectrum
• Kinesis Stream, Firehose and Data Analytics
Breadth of functionality
• Every single identified need for this datalake has its
corresponding service on AWS
Thank you!
https://www.euronext.com
https://aws.amazon.com
Sample code @ https://github.com/lePaulo
Appendix