business track: how criteo scaled and supported massive growth with mongodb

15
How Criteo Scaled and Supported Massive Growth with MongoDB Julien SIMON Vice President, Engineering j.simon@ criteo.com @julsimon

Upload: mongodb

Post on 07-May-2015

2.922 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

How Criteo Scaled and Supported Massive Growth with MongoDB

Julien SIMONVice President, [email protected] @julsimon

Page 2: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

CRITEO

2

• R&D EFFORT• RETARGETING• CPC

PHASE 1 : 2005-2008CRITEO CREATION

• MORE THAN 3000 CLIENTS• 35 COUNTRIES, 15 OFFICES• R&D: MORE THAN 300 PEOPLE

PHASE 2 : 2008-2012GLOBAL LEADER : + 700 EMPLOYEES!

2007

15EMPLOYEES 2009

84EMPLOYEES

6EMPLOYEES

2005

2010

203EMPLOYEES

2012

+700EMPLOYEES SO FAR

2006

2011

395EMPLOYEES

2008

33EMPLOYEES

Page 3: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

3

GLOBAL PRESENCE

SYDNEY

PARIS

LONDON

BARCELONA

MILAN

MUNICH

BOSTON

NEW YORK

SAO PAULO

PALO ALTOTOKYO

SEOUL

STOCKHOLM

AMSTERDAM

15 OFFICES, 30+ COUNTRIES

CHICAGO

Page 4: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

GO GOGO

Powered by

PERFORMANCE DISPLAY

Copyright © 2013 Criteo. Confidential

A user sees products

on your website…

… and sees

personalized banners

After clicking on the banner, the user goes back to the product

page.

...then browses the internet1

2

3

4

4

Page 5: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

5

REAL-TIME PERSONALIZATION

Copyright © 2013 Criteo. Confidential.

Boutons

all original #represent

SHOP NOW

CouleursFond Disposition

WARM MEETS LIGHT

SWEET NOTHING

ADDIDAS IS ALL IN

ALL ORIGINALS #REPRESENT

Slogans

JOIN NOW

SEE MORE

CLICK HERE

“Call to action”

Lien opt-out

SEE MORE

JOIN NOW

SEE MORE

CLICK HERE

SHOP NOWSHOP NOW

JOIN NOW JOIN NOW

Page 6: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

PREDICTION & RECOMMENDATION

2 CORE TECHNOLOGIES

choose the right product to display

choose the right users / advertiser / publisher to display

RECOMMENDATION ENGINE CTR + CR

increase

PREDICTION ENGINE

Page 7: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

7

INFRASTRUCTURE

Copyright © 2013 Criteo. Confidential.

DAILY TRAFFIC

- HTTP REQUESTS: 30+ BILLION

- BANNERS SERVED: 1+ BILLION

PEAK TRAFFIC (PER SECOND)

- HTTP REQUESTS: 500,000+

- BANNERS: 25,000+

7 DATA CENTERS

SET UP AND MANAGED IN-HOUSE

AVAILABILITY > 99.95%

Page 8: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

8Copyright © 2013 Criteo. Confidential.

HIGH PERFORMANCE COMPUTING

FETCH, STORE, CRUNCH, QUERY 20 additional TB EVERY DAY ?

…SUBTITLED « HOW I LEARNED TO STOP WORRYING AND LOVE HPC »

Storm Kafka

Page 9: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

9

PRODUCT CATALOGUES

• Catalogue = product feed provided by advertisers (product id, description, category, price, URL, etc)

• 3000+ catalogues, ranging from a few MB to several tens of GB• About 50% of products change every day

• Imported at least once a day by an in-house application• Data replicated within a geographical zone• Accessed through a cache layer by web servers• Microsoft SQL Server used from day 1• Running fine in Europe, but…

– Number of databases (1 per advertiser)… and servers– Size of databases – SQL Server issues hard to debug and understand

• Running kind of fine in the US, until dead end in Q1 2011 – transactional replication over high latency links

Copyright © 2010 Criteo. Confidential.

Page 10: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

Copyright © 2010 Criteo. Confidential.

10

REQUIREMENTS FOR A NEW DB

• Scale-out architecture running on commodity hardware(aka « Intel CPUs in metal boxes »)

• No transactions needed, eventual consistency OK • High availability• Distributed clusters, with replication over high latency links• Requestable (key-value not enough)• Open source

… with active user community

… backed by a stable organization with long-term commitment (not one guy in a garage)

… no licence fees for production use

… commercial support available at reasonable cost

• Easy to learn, (re)deploy, monitor and upgrade• « Low maintenance » (don’t need a 10-people team just to run it)• Multi-language support • Ability to export everything to Hadoop multiple times per day

Page 11: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

Copyright © 2010 Criteo. Confidential.

11

FROM SQL SERVER TO MONGODB

• Ah, database migrations… everyone loves them

• 1st step: solve replication issue– Import and replicate catalogues in MongoDB– Push content to SQL Server, still queried by web servers

• 2nd step: prove that MongoDB can survive our web traffic– Modify web applications to query MongoDB– C-a-r-e-f-u-l-l-y switch web queries to MongoDB for a small set of catalogues– Observe, measure, A/B test… and generally make sure that the system still works

• 3rd step: scale !– Migrate thousands of catalogues away from SQL Server– Monitor and tweak the MongoDB clusters– Add more MongoDB servers… and more shards– Update ops processes (monitoring, backups, etc)

Page 12: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

12

OUR MONGODB DEPLOYMENT

• Europe– 18 3-server shards (1+1+1)– 800M products, 1TB– 1B requests/day (peak at 40K/s)– 350M updates/day (peak at 11K/s)

• US– 14 4-server shards (2+2)– 400M products, 650GB

• APAC– 12 3-server shards (2+1)– 300M products, 500GB

• 146 servers total : 2.0 (+ Criteo patches) 2.2 2.4.3

Copyright © 2010 Criteo. Confidential.

Page 13: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

13

MONGODB, 2+ YEARS LATER

• Stable (2.4.3 much better)• Easy to (re)install and administer• Great for small datasets (i.e. smaller than server RAM)• Good performance if read/write ratio is high• Failover and inter-DC replication work (but shard early!)

• Performance suffers when :– dataset much larger than RAM– read/write ratio is low– Multiple applications coexist on the same cluster

• Some scalability issues remain (master-slave, connections)• Criteo is very interested in the 10gen roadmap

Copyright © 2010 Criteo. Confidential.

Page 14: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

14

THANKS A LOT FOR YOUR ATTENTION!

Copyright © 2013 Criteo. Confidential.

www.criteo.comengineering.criteo.com

Page 15: Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB