Real Time Bidding System design and maintenance rg-dev#3 - Rzeszów (24.11.2016)

~ # whoami
Konrad Kaplita

CTO at Appliscale
Software Engineer (Erlang)
Operations Engineer (AWS)

https://twitter.com/konradkaplita
https://pl.linkedin.com/in/konradkaplita
https://github.com/konradkaplita

Hire us! www.appliscale.io/contact
Join us! www.appliscale.io/career

Domain: AdTech

RTB: Real Time Bidding

Every ad placeholder can be a fierce battlefield where different advertisers try to outbid each other

A bit of a landscape...

https://adexchanger.com/venture-capital/ecosystem-map-luma-partners-kawaja/

http://www.adopsinsider.com/ad-serving/diagramming-the-ssp-dsp-and-rtb-redirect-path/

Scale

HTTP layer
● Thousands of concurrent HTTP requests per host
● More than 50 pools of connections to DSP partners
● Up to 50 connections per single HTTP pool
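As a rough illustration of the per-pool limit above, the sketch below caps in-flight requests to one partner at 50 using a semaphore. It is not the system described in the talk (which is written in Erlang); `PartnerPool`, `fake_send`, and `demo` are hypothetical names, and the sleep only stands in for a real HTTP round trip.

```python
import asyncio

MAX_CONNECTIONS_PER_POOL = 50  # from the slide: up to 50 connections per pool


class PartnerPool:
    """Hypothetical pool that caps in-flight requests to one DSP partner."""

    def __init__(self, name, limit=MAX_CONNECTIONS_PER_POOL):
        self.name = name
        self._slots = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous requests seen

    async def request(self, send):
        # Wait for a free slot, then run the caller-supplied coroutine function.
        async with self._slots:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await send()
            finally:
                self.in_flight -= 1


async def demo(n_requests=200, limit=MAX_CONNECTIONS_PER_POOL):
    pool = PartnerPool("dsp-example", limit)

    async def fake_send():
        await asyncio.sleep(0.001)  # stand-in for an HTTP round trip
        return 204

    results = await asyncio.gather(
        *(pool.request(fake_send) for _ in range(n_requests))
    )
    return pool.peak, results
```

Running `asyncio.run(demo())` issues 200 requests, but the recorded peak never exceeds the pool limit. With more than 50 such pools per host, the upper bound on outbound connections is the number of pools times 50.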

True learning happens on production
Chris Maxwell
https://twitter.com/wrathofchris

Issues with network infrastructure

● Undersea fiber maintenance
● Rogue fiber shooters
● Classical fiber cuts
● ISP datacenter floods
● BGP issues - “scenic packet routing” included free of charge

Issues with external partners

● Datacenter outages
● AWS misconfigurations
● HTTP protocol abuse
  ○ 204 No Content with content
  ○ 204 Pas de Contenu instead of No Content
● Death by Time-Wait
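One defensive way to cope with the HTTP protocol abuse listed above is to decide on the numeric status code alone and never on the reason phrase or on whether a body happens to be present. The sketch below assumes a convention where 200 with a body means "bid" and 204 means "no bid"; the function name `extract_bid_body` is hypothetical and not taken from the talk.

```python
from typing import Optional


def extract_bid_body(status_code: int, reason: str, body: bytes) -> Optional[bytes]:
    """Return the bid payload, or None when the partner did not bid.

    The reason phrase is accepted but deliberately ignored: it is free text,
    so "No Content" and "Pas de Contenu" must be treated the same way.
    """
    if status_code == 204:
        # No bid. Any body sent along with a 204 is discarded.
        return None
    if status_code == 200 and body:
        return body
    # Anything else (errors, empty 200) is treated as "no bid" in this sketch.
    return None
```

For example, `extract_bid_body(204, "Pas de Contenu", b"oops")` returns `None`, while `extract_bid_body(200, "OK", b'{"price": 1.5}')` returns the payload unchanged.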

Challenges
● Soft real time - users hate waiting for ads to display (and eventually install adblock)
● Concurrency - sending bid requests to multiple partners at the same time
● Fault tolerance - we cannot crash the entire service because of a simple bug in business logic
● Fast feature delivery - simple, frequent and safe deployments
● System awareness - with such a big and complex system it is really hard to know the status of all its parts
● Failure detection - when something breaks, identify what it was
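The first three challenges above (soft real time, concurrency, fault tolerance) can be sketched together: send bid requests to all partners at once, stop waiting at a deadline, and make sure one failing partner does not take down the whole auction. This is a minimal Python sketch under stated assumptions, not the Erlang implementation from the talk; `collect_bids`, the partner names, and the prices are made up for illustration.

```python
import asyncio


async def collect_bids(partners, deadline_s):
    """Fan out to all partners and return (winner, price), or None if nobody bid.

    partners: dict mapping a partner name to a coroutine function that
    returns a bid price (float) or None.
    """
    if not partners:
        return None
    tasks = {asyncio.ensure_future(fn()): name for name, fn in partners.items()}
    # Soft real time: wait for the deadline at most, then move on.
    done, pending = await asyncio.wait(list(tasks), timeout=deadline_s)
    for task in pending:
        task.cancel()  # too slow, the ad slot cannot wait
    await asyncio.gather(*pending, return_exceptions=True)

    bids = {}
    for task in done:
        if task.exception() is not None:
            continue  # fault tolerance: a broken partner is simply skipped
        price = task.result()
        if price is not None:
            bids[tasks[task]] = price
    if not bids:
        return None
    winner = max(bids, key=bids.get)
    return winner, bids[winner]


async def demo():
    async def fast():
        await asyncio.sleep(0.01)
        return 1.50

    async def higher():
        await asyncio.sleep(0.02)
        return 2.25

    async def slow():
        await asyncio.sleep(1.0)  # misses the deadline
        return 9.99

    async def broken():
        raise ValueError("bug in partner adapter")

    partners = {"fast": fast, "higher": higher, "slow": slow, "broken": broken}
    return await collect_bids(partners, deadline_s=0.3)
```

In the demo, the slow partner would have offered the highest price, but it misses the deadline, so the auction is won by `higher`; the exception raised by `broken` is contained and does not affect the result.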

Erlang
● Functional language
● Supports actor model concurrency
● Developed in 1986, open source since 1998
● Soft real time
● Distribution
● Hot code reload
● Garbage collection
● Pattern matching
● Runs on Erlang VM (BEAM)

http://learnyousomeerlang.com/

Concurrency: Actor Model
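For readers unfamiliar with the actor model, the idea is that each actor owns private state and a mailbox, and handles one message at a time, so no locks are needed around that state. The sketch below imitates this in Python with a thread and a queue; it only illustrates the concept and says nothing about how Erlang processes are implemented. `CounterActor` and its message names are hypothetical.

```python
import queue
import threading


class CounterActor:
    """An actor: private state, a mailbox, and one thread handling messages in order."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message, reply_to=None):
        # Asynchronous send: the caller never touches the actor's state directly.
        self._mailbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)
            elif message == "stop":
                break


def demo():
    actor = CounterActor()
    for _ in range(1000):
        actor.send("increment")
    reply = queue.Queue()
    actor.send("get", reply)  # queued behind all the increments
    value = reply.get(timeout=1)
    actor.send("stop")
    return value
```

Because the mailbox is processed in FIFO order, the `get` message is handled only after all 1000 increments, so `demo()` returns 1000.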

System is definitely not slacking off

DevOps

It’s all about culture

What can possibly go wrong?

Monitoring

You can roll out your own solution

Until it stops scaling

Datadog

● Simple client-server architecture
● Very stable agent
● Lightweight on resources
● Tons of out-of-the-box integrations (MySQL, SNMP)
● If you have a big infrastructure, it will cost you
● You have to decide if it’s worth it

https://www.datadoghq.com

Deployments

Let’s build some components first

Now let’s do some deployments

This can get a bit complicated after a while...

Now let’s try to modify a few jobs and keep things consistent

Jenkins job-dsl

● Keep your Jenkins jobs defined as Groovy scripts
● Configuration as code
● All job changes are tracked and versioned
● PR review process for job changes (important on production!)
● Simplify jobs by extracting common parts
● Bootstrapping of Jenkins infrastructure is a breeze

https://github.com/jenkinsci/job-dsl-plugin

Ansible

● One of many automation frameworks out there
● Simple, declarative syntax (YAML format)
● No need for agent modules
● Only dependencies are SSH and Python
● Can run deployments on hundreds of machines simultaneously

https://www.ansible.com/

Questions?