© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Abderrahmane Belarfaoui

CDO @ Euronext

Paul de Monchy

Senior Consultant @ AWS Professional Services

BDM Replacement

Automation and task orchestration

for Big Data at Euronext

Page 2

Euronext at a glance

• Euronext is the leading pan-European exchange in the Euro zone, with over 1,300 issuers worth c.€3.7* trillion in market capitalisation.

• Euronext operates regulated and transparent equity and derivatives markets, offering market participants a comprehensive range of services to meet their needs.

• Since June 2014, Euronext has been an independent, listed company with a market capitalisation of c.€4 billion. Since its IPO, Euronext's stock price has multiplied by 3 to c.€57**.

• Following its successful carve-out from ICE and recent IPO, Euronext entered a new phase and announced to the market in May 2016 its strategic plan ‘Agility for Growth’ for the period 2016-19.

*as of end March 2018

**as of 11 June 2018

Page 3

All the roads lead to Data Governance

Data Strategy: Delivering Agility for Growth will require a step change in our data analytics capabilities

Data Governance (Internal Demand, External Expectations):

• Comply with regulatory requirements (MiFID II, GDPR)

• Research: businesses need to carry out in-depth analysis of their activity and approach every decision by analysing relevant data

• Optiq will radically transform the data production flows in the coming month and will require a new data architecture to fully leverage its capabilities, e.g. real time

• Meet clients' market data needs by enhancing data services and introducing new data products

• Through acquisitions and mergers, the volume of data is growing as fast as the turnover

Page 4

Data Products on top of the datalake, combined with the roll-out of data governance and the reinforcement of internal skills (e.g. on AI)

Diagram labels:

• Data foundation: core Euronext data (orders, trades, referential, technical data, post trade data, 3rd party data, …)

• Euronext data warehouse on cloud: in 2018, with Optiq and cloud, from day +1 to near real time, opening a new field of opportunities

• Data shop: new service enabled by the datalake, pay per use, Euronext and external data

• Data sandboxes with AI capabilities: innovation sandboxes to quickly validate new data services

• Future use cases: e.g. surveillance, advanced analytic products, etc.

• Users: clients, regulators, businesses, finance, data scientists

Page 5

State of the art

Page 6

Diagram labels (On-Premise):

• Netezza DBs: SygmaX, Cash, Derivative, Saturn

• FSIA (Enx files from source application)

• Genio, Talend

• 3- Files sent to regulator and partners

Key figures:

• 4 IBM Netezza databases

• 74TB of compressed data (including 50TB of archives)

• 3.5TB/month for batch

• 150 ETL Genio (IBM) jobs to migrate, scheduled with Cisco Tidal

• Cash, Derivative and Saturn to be fed in streaming starting the 6th of June, up to 1.7M messages/s

• 150 direct BI customers on Netezza

Page 7

Make Euronext a Data-centric company

with AWS services

Page 8

DATALAKE

• Central Storage: secure, cost-effective storage in S3 (S3)

• Catalog & Search: access & search metadata (DynamoDB, Amazon ES)

• Access & User Interface: give your users easy & secure access (API Gateway, IAM, Cognito)

• Protect & Secure: use entitlements to ensure data is secure and users' identities are verified (Security Token Service, CloudWatch, CloudTrail, KMS)

• Processing & Analytics: use predictive and prescriptive analytics to gain better understanding (Athena, QuickSight, EMR, Redshift)

• Data Ingestion: get your data into S3 quickly and securely (Firehose, Direct Connect, Snowball, DMS)

• Data Transformation: get your data suitable for analytics (Glue, AWS Batch, AWS Step Functions)

Page 9

Automation and task orchestration

for ETL jobs

Page 10

Diagram labels: On-Premise; Euronext account; Enx-vpc (without public subnet); FSIA (Enx files from source application); Kafka messages; raw, processed, refined; Talend Administration Center; Redshift cluster; (Kinesis or Kafka); AWS Batch executing Spark jobs; EMR cluster (ephemeral); ECS tasks; Direct Connect connection; GitLab Server; Nexus Server

• 1- A script triggered on file upload sends the data to S3

• 1bis- Streaming process

• 2- A Step Function is triggered and launches Batch jobs or ephemeral EMR clusters

• 3- Processed files are sent back to S3 and data to Redshift

• 4- Some processed and refined files are sent back to FSIO for regulators and partners, with ECS tasks

Page 11

Automatic Triggering of Application Run

• 1/ When data is put on S3, a Step Functions state machine is automatically triggered through CloudTrail and CloudWatch Events

• 2/ Step Functions starts a Lambda function that builds the job command and submits it directly to the AWS Batch job queue

• 3/ The AWS Batch compute environment provisions the necessary Spot instances in an ECS cluster for the job executions

• 4/ AWS Batch starts the jobs as ‘tasks’ on the ECS cluster and monitors them, pulling the Docker image from ECR and starting a command that retrieves the Talend job zip file from S3, unzips it and executes it

• 5/ Step Functions launches a Lambda function to check whether the job ended successfully. If the status is still running, it repeats the check a few minutes later

• 6/ On success, Step Functions may launch new jobs or end
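Step 2 above can be sketched as a small Lambda function. This is only an illustration under stated assumptions, not the code presented in the talk: the event fields (`bucket`, `key`, `jobQueue`, `jobDefinition`), the helper name `build_submit_job_request` and the container script `run_talend_job.sh` are all hypothetical. The only AWS call used is the Batch `submit_job` operation of boto3.

```python
import re


def build_submit_job_request(bucket, key, job_queue, job_definition):
    """Build the keyword arguments for the AWS Batch SubmitJob call."""
    # Batch job names accept letters, digits, hyphens and underscores only,
    # must start with an alphanumeric character and are capped at 128 chars.
    job_name = ("job-" + re.sub(r"[^A-Za-z0-9_-]", "-", key))[:128]
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Hypothetical script name; the real image defines its own command.
            "command": ["run_talend_job.sh", f"s3://{bucket}/{key}"],
        },
    }


def handler(event, context):
    """Lambda entry point: `event` is the input passed by the state machine."""
    import boto3  # imported lazily so the builder above can be tested offline

    request = build_submit_job_request(
        event["bucket"], event["key"], event["jobQueue"], event["jobDefinition"]
    )
    response = boto3.client("batch").submit_job(**request)
    # Return the job id so that a later state can poll the job status.
    return response["jobId"]
```

Keeping the request builder separate from the handler makes the command construction testable without AWS credentials.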

Diagram labels: Data Lake S3 (transformed CSV); AWS CloudTrail; CloudWatch Event; AWS StepFunctions (Add job to AWS Batch, Check JobStatus); AWS Batch (Job queue, Batch Compute Environment); ECR registry; Bucket with Talend jobs zip files; CloudWatch Logs (send execution logs to CWL)

Page 12

Job Status Poller

Using Step Functions to launch Talend jobs on Batch and/or EMR
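A job status poller of this kind is commonly written as a submit, wait, check, choose loop in Amazon States Language. The sketch below builds such a definition as a Python dictionary. The state names, the `wait_seconds` default and the two Lambda ARNs are assumptions for illustration, and it assumes the submit Lambda returns a job id and the check Lambda returns the job status string.

```python
import json


def build_poller_definition(submit_arn, check_arn, wait_seconds=60):
    """Return an Amazon States Language definition for a job status poller."""
    return {
        "Comment": "Submit a job, then poll its status until it finishes",
        "StartAt": "SubmitJob",
        "States": {
            "SubmitJob": {"Type": "Task", "Resource": submit_arn,
                          "ResultPath": "$.jobId", "Next": "Wait"},
            "Wait": {"Type": "Wait", "Seconds": wait_seconds,
                     "Next": "CheckJobStatus"},
            "CheckJobStatus": {"Type": "Task", "Resource": check_arn,
                               "ResultPath": "$.status", "Next": "JobDone?"},
            "JobDone?": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "SUCCEEDED",
                     "Next": "Succeeded"},
                    {"Variable": "$.status", "StringEquals": "FAILED",
                     "Next": "Failed"},
                ],
                # Any other status means the job is still in progress.
                "Default": "Wait",
            },
            "Succeeded": {"Type": "Succeed"},
            "Failed": {"Type": "Fail", "Error": "JobFailed",
                       "Cause": "The job reported status FAILED"},
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs; real ones come from the deployed Lambda functions.
    print(json.dumps(build_poller_definition("arn:submit", "arn:check"), indent=2))
```

The "Succeeded" state is where a real workflow could chain further jobs instead of ending.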

Page 13

CI/CD pipeline – 1st iteration

• 1/ Triggering of the pipeline by Git commit actions on CodeCommit

• 2/ Build of the program with CodeBuild

   - Build the Talend job with Talend CI

   - Build the Java 8 Docker image that will execute the jobs with AWS Batch

   - Publish the Docker image to a private repository on ECR

   - Package the CloudFormation templates and the Lambda function code to S3

• 3/ Deployment of the Git-versioned architecture with CloudFormation

Page 14

CI/CD pipeline – 2nd iteration

• 1/ Triggering of the pipeline by Git commit actions on GitLab

• 2/ Build of the program with Jenkins

   - Build the Talend job with Talend CI

   - Build the Java 8 Docker image that will execute the jobs with AWS Batch

   - Publish the Docker image to a private repository on ECR

   - Package the CloudFormation templates and the Lambda function code to S3

• 3/ Launch of the CloudFormation deployment of the application from Jenkins

Diagram labels: GitLab; Jenkins Build; CloudFormation

Page 15

Deep dive

Page 16

Focus on Git Repository content

• /TALEND_PROJECT/…

• /AWS_CICD/DockerFile

• /AWS_CICD/template.yml

• /AWS_CICD/lambda/…

• /AWS_CICD/buildspec.yml

• /AWS_CICD/config.json

• readme.txt

/TALEND_PROJECT/…: content of the Talend job project, to be compiled with Talend CI

Page 17

Focus on Git Repository content: /AWS_CICD/DockerFile

• FROM amazonlinux:2

• Install Oracle JDK 8

• Add custom scripts for the command-line execution that retrieve the compiled and prepared Talend zip files from S3
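The custom scripts are not shown in the slides. As a rough sketch of what such a container entry point could do (download the job archive, unzip it, run it), here is a Python version. It assumes the archive contains a launcher script matching `*_run.sh`, and the function names and the default `work_dir` are hypothetical.

```python
import pathlib
import subprocess
import zipfile


def unpack_job(zip_path, dest_dir):
    """Extract a built job archive and return the path of its launcher script."""
    dest = pathlib.Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Assumption: the archive ships a shell launcher named <job>_run.sh.
    launchers = sorted(dest.rglob("*_run.sh"))
    if not launchers:
        raise FileNotFoundError(f"no *_run.sh launcher found in {zip_path}")
    return launchers[0]


def run_job(s3_bucket, s3_key, work_dir="/tmp/job"):
    """Download the job zip from S3, unpack it and execute its launcher."""
    import boto3  # imported lazily so unpack_job can be tested offline

    work = pathlib.Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    zip_path = work / "job.zip"
    boto3.client("s3").download_file(s3_bucket, s3_key, str(zip_path))
    launcher = unpack_job(zip_path, work)
    # Run through sh: zip extraction does not preserve the executable bit.
    return subprocess.run(["sh", str(launcher)], check=True).returncode
```

With `check=True`, a non-zero exit code raises an exception, so the container exits with an error and the job can be reported as failed.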

Page 18

Focus on Git Repository content: /AWS_CICD/template.yml

Contains the following resource definitions for CloudFormation:

• AWS Batch: Compute Environment, JobQueue, JobDefinition

• Lambda Functions: SubmitJob{Batch/EMR}, CheckJobStatus{Batch/EMR}, WaitForS3Files

• CloudWatchEvent Rules: trigger StepFunctions on file uploads

• StepFunctions

• ECR registry

• SSM Parameters Store

• IAM Roles
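The rule that triggers Step Functions on file uploads relies on S3 object-level API calls recorded by CloudTrail. A minimal sketch of such an event pattern, and of reading the uploaded object's location from a matching event, could look like this. The function names and the choice of `PutObject` and `CompleteMultipartUpload` as the matched API calls are assumptions, not taken from the template.

```python
import json


def s3_upload_event_pattern(bucket_name):
    """Event pattern matching S3 uploads recorded by CloudTrail."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["PutObject", "CompleteMultipartUpload"],
            "requestParameters": {"bucketName": [bucket_name]},
        },
    }


def extract_s3_location(event):
    """Return (bucket, key) of the uploaded object from a matching event."""
    params = event["detail"]["requestParameters"]
    return params["bucketName"], params["key"]


if __name__ == "__main__":
    # "my-datalake-raw" is a placeholder bucket name.
    print(json.dumps(s3_upload_event_pattern("my-datalake-raw"), indent=2))
```

For this to work, the trail must log data events for the bucket, since object-level calls are not recorded by default.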

Page 19

Focus on Git Repository content: /AWS_CICD/lambda/…

Source code for:

• SubmitJob{Batch/EMR}.py

• CheckJobStatus{Batch/EMR}.py

• WaitForS3Files.py
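The actual sources are not reproduced in the slides. As an illustration only, a status check for the Batch case might look like the sketch below. It assumes the state machine passes the job id in `event["jobId"]`, and the helper name `status_from_response` is hypothetical. `describe_jobs` is the boto3 Batch operation that returns job details, including a `status` field.

```python
def status_from_response(response, job_id):
    """Pick the status of `job_id` out of a Batch DescribeJobs response."""
    for job in response.get("jobs", []):
        if job["jobId"] == job_id:
            # One of SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING,
            # SUCCEEDED or FAILED.
            return job["status"]
    raise LookupError(f"job {job_id} not found")


def handler(event, context):
    """Lambda entry point: return the current status of the submitted job."""
    import boto3  # imported lazily so the helper above can be tested offline

    job_id = event["jobId"]
    response = boto3.client("batch").describe_jobs(jobs=[job_id])
    return status_from_response(response, job_id)
```

The returned string can then drive a Choice state: anything other than SUCCEEDED or FAILED means the poller should wait and check again.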

Page 20

Focus on Git Repository content: /AWS_CICD/buildspec.yml

Recipe for CodeBuild execution:

1. Install Oracle JDK

2. Install Talend CI

3. Connect to ECR

4. Build Docker image with Dockerfile

5. Build jobs with Talend CI and push them to S3

6. Update and consolidate CF template (template.yml file)

7. Push Docker image to ECR

8. Push transformed template.yml and Lambda function code to S3

Page 21

Focus on Git Repository content: /AWS_CICD/config.json

• Contains all the parameter values for the deployment of the CloudFormation template
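The layout of config.json is not shown in the slides. Assuming it is a flat JSON object mapping parameter names to values (an assumption, as are the parameter names in the usage note), it could be turned into the `Parameters` list that the CloudFormation API expects as follows.

```python
import json


def load_stack_parameters(config_text):
    """Convert a flat JSON object into CloudFormation stack parameters."""
    config = json.loads(config_text)
    # CloudFormation parameter values are passed as strings.
    return [
        {"ParameterKey": name, "ParameterValue": str(value)}
        for name, value in sorted(config.items())
    ]
```

For example, `load_stack_parameters('{"JobQueueName": "talend", "MaxvCpus": 16}')` yields two entries, with `16` converted to the string `"16"`. The resulting list can be passed as the `Parameters` argument of boto3's CloudFormation `create_stack` or `update_stack`.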

Page 22

Focus on Git Repository content: readme.txt

• Readme.md of the project

Page 23

Demo

Page 24

Outcomes

Page 25

Agility

• CI/CD pipeline

• Fully serverless and/or ephemeral resources

• Euronext can now experiment and innovate quickly and more frequently

Page 26

Cost saving

• Redshift vs Netezza

• Kinesis vs Kafka

• Use of AWS Batch with Spot instances

• TCO: same budget with 10x more data usage (stream and storage)

Page 27

Elasticity

• Serverless orchestration with StepFunctions

• AWS Batch and EMR

• S3 Storage

• Use of Redshift Spectrum

• Kinesis Streams, Firehose and Data Analytics

Page 28

Breadth of functionality

• Every single identified need for this datalake has its corresponding service on AWS

Page 29

Thank you!

https://www.euronext.com

https://aws.amazon.com

Sample code @ https://github.com/lePaulo

Page 30

Appendix