Containers #101: Optimize CI/CD for Big Data Solutions

Oct 2016

Uploaded by codefresh on 12-Jan-2017. Category: Technology.



Shimon Tolts, General Manager, Data Solutions

Atom Data Pipeline: Processing 200B Events with Node.js and Docker on AWS


About ironSource: Hypergrowth

800M people reached each month

4,200 apps installed every minute with the ironSource platform

200B registered & analyzed data events every month

[Chart: registered & analyzed data events per month, Jun 2015 to May 2016; y-axis 0 to 200B]


Our Business Challenge

We needed a way to manage this data:

Collect, Process, Store


Collection

● Multi-region layer with latency-based routing
● Low latency from client to Atom servers
● High availability: AWS regions do fail!
● Storing raw data + headers upon receipt
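A minimal sketch of the routing and ingest behavior these bullets describe, assuming each region exposes a measured client latency and a health flag. `Region`, `pick_region`, and `receive` are hypothetical names for illustration, not Atom's code; on AWS the routing decision is usually delegated to DNS (for example Route 53 latency-based routing) rather than made in application code.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    latency_ms: float       # measured client-to-region latency (assumed input)
    healthy: bool = True    # result of a health check (assumed input)


def pick_region(regions: list[Region]) -> str:
    """Return the lowest-latency healthy region: latency-based routing with failover."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.latency_ms).name


def receive(raw_body: bytes, headers: dict[str, str], store: list) -> None:
    """Persist the untouched payload plus its headers before any processing happens."""
    store.append({"body": raw_body, "headers": dict(headers)})
```

If the closest region fails its health check, `pick_region` falls through to the next-closest one, which is the "AWS regions do fail!" case.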


Data Enrichment

● Enrich data before storing it in your Data Lake and/or Warehouse
○ IP to country
○ Currency conversion
○ Decrypt data
○ User Agent parsing: OS, browser, device...
● Any custom logic you would like: fully extensible!
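The enrichment step can be sketched as a chain of small functions, each taking an event dict and returning an enriched copy, so "fully extensible" just means appending another function. The lookup tables and the EUR rate below are hypothetical toy data standing in for real GeoIP, exchange-rate, and user-agent-parsing sources; decryption is omitted.

```python
from typing import Callable

Event = dict
Enricher = Callable[[Event], Event]

# Hypothetical toy data standing in for real GeoIP and FX-rate sources.
IP_PREFIX_TO_COUNTRY = {"81.2.69.": "GB", "8.8.8.": "US"}
USD_RATES = {"USD": 1.0, "EUR": 1.1}


def ip_to_country(event: Event) -> Event:
    ip = event.get("ip", "")
    country = next((c for p, c in IP_PREFIX_TO_COUNTRY.items() if ip.startswith(p)), "unknown")
    return {**event, "country": country}


def to_usd(event: Event) -> Event:
    # Only convert when both an amount and a known currency are present.
    if "amount" in event and event.get("currency") in USD_RATES:
        return {**event, "amount_usd": round(event["amount"] * USD_RATES[event["currency"]], 2)}
    return event


def parse_user_agent(event: Event) -> Event:
    # Deliberately naive OS detection; a real pipeline would use a UA-parsing library.
    ua = event.get("user_agent", "")
    if "Android" in ua:
        os_name = "Android"
    elif "iPhone" in ua or "iPad" in ua:
        os_name = "iOS"
    else:
        os_name = "other"
    return {**event, "os": os_name}


def enrich(event: Event, enrichers: list[Enricher]) -> Event:
    """Apply each enricher in order; custom logic is just another function in the list."""
    for step in enrichers:
        event = step(event)
    return event


PIPELINE: list[Enricher] = [ip_to_country, to_usd, parse_user_agent]
```

Because every step returns a new dict, the raw event stored at collection time is never mutated.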


Data Targets

● Near real-time data insertion: 1 minute!
● Stream data to Google Storage and/or AWS S3
● Smart insertion of data into AWS Redshift
○ Set the number of parallel COPY operations
○ Configure priority on tables
● BigQuery: streaming data using batch file imports (saves 20% in cost)
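One way to picture "1 minute" loading combined with "parallel copies" and "table priority": bucket events into per-table, per-minute batches, then release Redshift loads in waves capped at a parallelism limit, most urgent tables first. This is a hypothetical scheduling sketch (the priority convention, `max_parallel`, and the function names are assumptions); it only plans the loads and does not issue real `COPY` statements.

```python
import heapq
from collections import defaultdict

WINDOW_SECONDS = 60  # "near real-time data insertion: 1 minute"


def bucket_by_minute(events):
    """Group (table, unix_ts, payload) tuples into {(table, minute): [payloads]}."""
    batches = defaultdict(list)
    for table, ts, payload in events:
        batches[(table, int(ts) // WINDOW_SECONDS)].append(payload)
    return dict(batches)


def schedule_copies(batches, priority, max_parallel):
    """Plan load waves of at most max_parallel batches each.

    priority maps table -> int, lower = more urgent (hypothetical convention);
    tables without an entry get a default of 100. Within equal priority,
    older minutes load first.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be >= 1")
    heap = [(priority.get(table, 100), minute, table) for (table, minute) in batches]
    heapq.heapify(heap)
    waves = []
    while heap:
        wave = [heapq.heappop(heap) for _ in range(min(max_parallel, len(heap)))]
        waves.append([(table, minute) for _, minute, table in wave])
    return waves
```

Capping each wave bounds concurrent load on the warehouse, while the priority queue keeps high-value tables fresh when the backlog grows.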


Micro-Services Architecture

● Everything is a service
● Decoupling
● Distributed systems
● Separate lifecycle per service
● Communication using RESTful APIs / queues / streams
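To illustrate the decoupling, the two services below communicate only through queues and never call each other directly. Python's in-process `queue.Queue` stands in for a real broker or stream (an assumption; the deck does not name one), and the `collector`/`enricher` services are hypothetical.

```python
import queue
import threading


def collector(out_q: queue.Queue, raw_events: list) -> None:
    """Hypothetical collection service: publishes events, knows nothing about consumers."""
    for event in raw_events:
        out_q.put(event)
    out_q.put(None)  # sentinel: no more events


def enricher(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Hypothetical enrichment service: consumes, transforms, and republishes."""
    while True:
        event = in_q.get()
        if event is None:
            out_q.put(None)  # propagate shutdown downstream
            break
        out_q.put({**event, "enriched": True})


def run_pipeline(raw_events: list) -> list:
    q1, q2 = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=collector, args=(q1, raw_events)),
        threading.Thread(target=enricher, args=(q1, q2)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    results = []
    while (item := q2.get()) is not None:
        results.append(item)
    return results
```

Either service can be redeployed or scaled independently as long as the message format on the queue stays the same, which is what "separate lifecycle" buys.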


Docker

● Linux containers
● Saves provisioning time
● Infrastructure as code
● Dev, test, and production run an identical container
● Ship easily


Cloud infrastructure

● Pay as you go (and grow)
● SaaS services
● Auto Scaling groups
● DynamoDB
● RDS *SQL
● Redshift data warehouse


Continuous Integration

● From commit to production
● Jenkins commit hook
● Git branching model
● AWS dynamic slaves
● Unit tests
● Docker builds
● Updating live environment
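The deck does not spell out its branching model, so the sketch below assumes a common one: `master` deploys to production, `develop` to staging, and other branches only test and build. It plans which stages a commit hook would trigger; the branch names, stage names, and `plan_pipeline` are hypothetical.

```python
# Hypothetical branching model: the deck mentions one but does not describe it.
BRANCH_TO_ENV = {"master": "production", "develop": "staging"}


def plan_pipeline(branch: str, tests_passed: bool) -> list[str]:
    """Return the ordered CI/CD stages a commit on `branch` would run."""
    stages = ["checkout", "unit_tests"]
    if not tests_passed:
        return stages + ["abort"]  # never build or ship a commit with failing tests
    stages.append("docker_build")
    env = BRANCH_TO_ENV.get(branch)
    if env is not None:  # only long-lived branches reach a live environment
        stages += ["docker_push", f"deploy:{env}"]
    return stages
```

Feature branches still produce a Docker build, so image breakage is caught before merge without touching a live environment.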


Diagram


Starting Point

Pre-baked images (AMIs)

Supervisor

Nginx reverse proxy

Node.js * cpu-count

Provisioning time * instances

Bash provisioning scripts
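"Node.js * cpu-count" means running one Node.js process per CPU core, with Nginx load-balancing across them and Supervisor keeping them alive. A minimal sketch that renders the matching Nginx `upstream` block and Supervisor `[program:node_<port>]` sections, assuming each worker listens on its own port read from a `PORT` environment variable; the base port, `app.js`, and the upstream name are hypothetical.

```python
import os

BASE_PORT = 3000  # hypothetical first worker port


def worker_ports(cpu_count=None) -> list[int]:
    """One port per CPU core, falling back to a single worker if the count is unknown."""
    n = cpu_count or os.cpu_count() or 1
    return [BASE_PORT + i for i in range(n)]


def nginx_upstream(ports: list[int], name: str = "node_app") -> str:
    """Nginx upstream block that round-robins across the local workers."""
    servers = "\n".join(f"    server 127.0.0.1:{p};" for p in ports)
    return f"upstream {name} {{\n{servers}\n}}\n"


def supervisor_programs(ports: list[int], script: str = "app.js") -> str:
    """One Supervisor program per worker; autorestart revives crashed processes."""
    sections = []
    for p in ports:
        sections.append(
            f"[program:node_{p}]\n"
            f"command=node {script}\n"
            f'environment=PORT="{p}"\n'
            "autorestart=true\n"
        )
    return "\n".join(sections)
```

With pre-baked AMIs this wiring is repeated per instance at provisioning time; the later slides move the same Nginx + Supervisor + Node.js layout into a Docker image instead.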


Minimum Viable Product

Infrastructure as code

Nginx

Node.js * cpu-count

Supervisor

Docker Hub

No Bash scripts!

No provisioning time * instances


https://github.com/ironSource/docker-config/blob/bb6be85b97132cbdd10084305ee1ee2f414b0b50/Dockerfile


Interactive Cycle

Nginx

Supervisor

Infrastructure as code

Node.js * cpu-count

Docker Hub

No Bash scripts!

No provisioning time * instances


https://github.com/ironSource/docker-config/blob/c4bbad11a323fd6e36ff31505c43e7c8dc51b1eb/Dockerfile-iojs-cluster


User Data


https://github.com/ironSource/docker-config/blob/2f4ccc7c277850de928cc432f47b2fc58fb8732a/Dockerfile-nodejs-cluster


Docker Compose Example #1 (using 'extends'):

docker-common.yml

docker-compose.yml

https://stash.ironsrc.com/projects/INFRA-IB/repos/ironbeastcompserter/browse/docker-compose.yml
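With `extends`, a service in `docker-compose.yml` can inherit a shared definition from `docker-common.yml` and override parts of it. The function below is a simplified approximation of Compose's documented merge rules: single-value options are replaced, `ports`, `expose`, `dns`, `dns_search`, `external_links`, and `tmpfs` are concatenated, and `environment` and `labels` are merged with local values winning. It assumes dict-form `environment`, skips the `volumes`/`devices` rules, and is an illustration, not Compose's implementation.

```python
import copy

# Multi-value options that Compose concatenates when extending.
CONCAT_KEYS = {"ports", "expose", "dns", "dns_search", "external_links", "tmpfs"}
# Mapping options merged key by key, locally defined values taking precedence.
MERGE_KEYS = {"environment", "labels"}


def extend_service(base: dict, local: dict) -> dict:
    """Merge a local service definition on top of the service it extends."""
    merged = copy.deepcopy(base)  # never mutate the shared base definition
    for key, value in local.items():
        if key in CONCAT_KEYS:
            merged[key] = merged.get(key, []) + list(value)
        elif key in MERGE_KEYS:
            merged[key] = {**merged.get(key, {}), **value}
        else:
            merged[key] = copy.deepcopy(value)  # single-value option: local wins
    return merged
```

Keeping image, logging, and common environment in `docker-common.yml` means each concrete service only declares what differs, such as its command or extra ports.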


User Data


Docker Compose Example #2 (using 'links'):
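`links` also imply a start order: Compose starts linked services before the service that links to them. A small sketch that derives that order with a topological sort, assuming links are written as `service` or `service:alias`; `start_order` is a hypothetical helper, and a circular link raises `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter


def start_order(services: dict) -> list[str]:
    """Order services so every linked service starts before the one linking to it."""
    graph = {}
    for name, cfg in services.items():
        # "api:backend" links to service "api" under the hostname alias "backend".
        graph[name] = {link.split(":")[0] for link in cfg.get("links", [])}
    return list(TopologicalSorter(graph).static_order())
```

For example, if `web` links to `api:backend` and `api` links to `redis`, the order is `redis`, `api`, `web`, and inside the `web` container the API is reachable at the hostname `backend`.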


10 Million Free Monthly Events

Thank you!

ironsrc.com/atom

@shimontolts