TRANSCRIPT
Criteo Infrastructure (Platform) Meetup
22nd February 2017
Diarmuid Gill, VP R&D - Platforms
Introduction & welcome note
1. About Criteo
3 | Copyright © 2017 Criteo
Our mission
TARGET THE RIGHT USER
AT THE RIGHT TIME
WITH THE RIGHT MESSAGE
Key Figures
• 18 000 publishers
• 90% retention rate (2)
• 130+ countries
• Listed on the NASDAQ since October 2013
• R&D represents 21% of the workforce
• 2 500 employees
• $21 billion (3)
• 14 000 advertisers
• $1,799 million (1)
• 31 offices
1: Revenue in 2016
2: Annual rate, 2015
3: Turnover generated for our clients (worldwide post-click turnover, January to December 2015)
2. How does it work?
GENERAL CONCEPT
1. Users visit an advertiser’s website
2. Criteo identifies the users (via cookies)
3. Users leave the advertiser’s website and browse publisher sites on the Internet
4. Criteo identifies users on these pages (via cookies)
5. Criteo displays an advertising banner, personalized for each user
6. Users click through directly to the advertiser’s page
Retargeting principles
3. Underlying infrastructure
Scale @ Criteo
• 3.2B catalog items ingested/day, 6B items stored
• 3.6B cookies/device IDs seen per month
• 3.9B personalized banners/day
• 49 RTBs @ 120B bid requests/day
• 3M QPS at peak
• 90 Gbps bandwidth
• 20K servers
• 27PB of data stored
• 3.6PB of data read daily
• 500B log lines processed/day
• 363TB of RAM in memcached, 37M req/s
• 300K Hadoop jobs/day
Hadoop @ Criteo
Batch processing:
• Hadoop as a Service:
  • 2 clusters – main + backup one for degraded mode
  • Cloudera CDH5
  • 2300 servers total (1300 + 1000), 76K vcores
  • 50PiB storage capacity
• Own job scheduler for improved data flow and coordination
  • 300k jobs per day
Infrastructure Key Figures
Hosting Global Partners:
• Sunnyvale: 2 PoPs, 500 kVA, 2 006 servers
• New York: 2 PoPs, 930 kVA, 2 793 servers
• Ashburn: 2 PoPs, 1.1 MVA, 1 170 servers
• Paris: 3 PoPs, 1 800 kVA, 5 003 servers
• Amsterdam: 2 PoPs, +2 500 kVA, 3 874 servers
• Hong Kong: 2 PoPs, 472 kVA, 2 185 servers
• Tokyo: 2 PoPs, 455 kVA, 2 564 servers
• Shanghai: 1 PoP, 200 kVA, 907 servers
• Worldwide: 16 PoPs, ~8 MVA contracted, 20 526 servers, up to 90 Gbps, 3M QPS
Some of the many technologies used at Criteo
4. What does “Platforms” mean at Criteo?
Top Level Applications
Platforms
Infrastructure
SRE
Platforms: Advertiser, Publisher, WebScale, Prediction, Dynamic Creative, Recommendation, Engine, Systems
• Advertiser: Catalog, User Events, Campaigns, Reporting
• Publisher: RTB, Direct, Campaigns, Reporting
Analytics Platforms
Advertiser Publisher
Analytics
AX/BI
Reporting / Billing Reporting / Payments
5. Tonight’s programme
Tonight’s menu
Bill of Fare
***
1st talk: FastTrack: scaling customer integration - Nicolas Laveau, Léo-Paul Goffic & Camille Coueslant
2nd talk: Evolution of data structures in Yandex.Metrica - Alexey Milovidov
3rd talk: Don't take your software for granted - Cedrick Montout
4th talk: Evolution of analytics at Criteo - Justin Coffey
***
21:05 - 22:00 Networking
Thank you!
FastTrack: Scaling customer integration
Camille Coueslant, Léo-Paul Goffic, Nicolas Laveau
2017/02/22
What do we do in Criteo?
Deliver the right message to the right user at the right time
Integration: Creatives settings
• Banners need branding
  • Logo
  • Font
  • Color palette
• Banners come in many formats
Integration: Tags
• Banners are based on user intent
• Tags on customer store
• Different types of intent
  • Home page view
  • Product view
  • Listing view
  • Basket
  • Sales
• Intent at product level
Home:
<script type="text/javascript" src="//static.criteo.net/js/ld/ld.js" async="true"></script>
<script type="text/javascript">
window.criteo_q = window.criteo_q || [];
window.criteo_q.push(
  { event: "setAccount", account: 666 },
  { event: "setEmail", email: "[email protected]" },
  { event: "setSiteType", type: "g" },
  { event: "viewHome" }
);
</script>

Sales:
<script type="text/javascript" src="//static.criteo.net/js/ld/ld.js" async="true"></script>
<script type="text/javascript">
window.criteo_q = window.criteo_q || [];
window.criteo_q.push(
  { event: "setAccount", account: 666 },
  { event: "setEmail", email: "[email protected]" },
  { event: "setSiteType", type: "g" },
  { event: "trackTransaction", id: "tr-56182-2123", item: [
    { id: "patronus", price: 12.54, quantity: 3 },
    { id: "avada-kedavra", price: 1099.99, quantity: 1 }
    /* add a line for each item in the user's basket */
  ]}
);
</script>
Integration: Product Feed
• Banners contain products
• Characteristics of products are used for recommendation
• Name, description, image, price for display

XML:
<item>
  <g:id>0</g:id>
  <title>Abracadabra</title>
  <g:image_link>http://www.magic.com/assets/spells/abracadabra.png</g:image_link>
  <link>http://www.magic.com/spells/abracadabra</link>
  <description>Multi-purpose spell. Your companion for every occasion!</description>
  <g:price>625.99</g:price>
  <g:google_product_category>35</g:google_product_category>
</item>

CSV:
id;title;image_link;link;description;price;google_product_category
0;Abracadabra;http://www.magic.com/assets/spells/abracadabra.png;http://www.magic.com/spells/abracadabra;Multi-purpose spell. Your companion for every occasion!;625.99;Arts & Entertainment > Hobbies & Creative Arts > Magic & Novelties
Back in 2014
When the customer saw what they had to implement
Back in 2014
When technical support saw the first implementation
Back in 2014
When the customer tried to debug their implementation
Criteo grows… fast!
This does not scale!
« Performance is everything » BUT we need to onboard first
Clients
TS
All is not lost!
Technology & UX to the rescue!
Tags, Part 1: Tag Validation Dashboard
Goal
• Show near real-time metrics on tracker format issues
• Detect mismatches between the trackers and the product feed
• Provide fine-grained data (max 24 hours)
• Available for each of our clients (= worldwide)
How
Initial trackers architecture
How
1. Audit the tracker events
2. Send this audit event to Kafka
3. Consume it from Druid
Why Druid
• Druid is an open-source column-oriented distributed data store
• Advantages:
  • Fast aggregation queries on huge amounts of metrics
  • Real-time streaming ingestion
  • Scalable
  • Highly available
How
1. Audit the tracker events
2. Send this audit event to Kafka
3. Consume it from Druid
4. Query Druid from Integrate
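A minimal sketch of what the audit step (step 1) might look like, assuming an audit is a small record listing format errors and feed mismatches for one tracker event. The function names, error codes and record shape are hypothetical and chosen for illustration; the slides do not show the actual audit format, and the Kafka and Druid steps are left out on purpose.

```javascript
// Hypothetical audit step. Error codes and record fields are illustrative
// assumptions, not the format actually used at Criteo.

// Audit one tracker event against the set of product ids known from the feed.
function auditEvent(event, feedProductIds, timestamp) {
  const errors = [];
  if (event.event === "trackTransaction") {
    if (!Array.isArray(event.item) || event.item.length === 0) {
      errors.push("MISSING_ITEMS"); // format issue: a sale without items
    } else {
      for (const item of event.item) {
        if (typeof item.price !== "number" || item.price < 0) {
          errors.push("BAD_PRICE"); // format issue: price is not a valid number
        }
        if (!feedProductIds.has(String(item.id))) {
          errors.push("UNKNOWN_PRODUCT_ID"); // mismatch between tracker and feed
        }
      }
    }
  }
  return { eventType: event.event, timestamp, errors };
}

// Aggregate audits into per-error counts, the kind of metric a dashboard shows.
function countErrors(audits) {
  const counts = {};
  for (const audit of audits) {
    for (const code of audit.errors) {
      counts[code] = (counts[code] || 0) + 1;
    }
  }
  return counts;
}
```

In a pipeline like the one on the slide, each audit record would be the message sent to Kafka, and the counting would be done by the store that consumes it rather than in application code.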
Result
Tags, Part 2: Tag Debug Mode
Tag Debug Mode
How do I make sure I send Criteo the right information from my website?
Fig 1: Criteo Hotline
Tag Debug Mode
How do I make sure I send Criteo the right information from my website?
Fig 2: Happy customer
How tags work
https://www.mvmtwatches.com/
ld.js
GET /event?a=%5B30072%…
200 OK
Tag Debug Mode
https://www.mvmtwatches.com/#enable-tag-debug-mode
ld.js:
  if (document.location.hash == debugHash) loadLdDebug();
ld-debug.js:
  addDebugIframe();
GET /event?a=%5B30072%…&debugMode=1
200 OK
Content-Type: application/javascript
sendDebugInformationToIframe({
  audit: {
    product: { image: '…' },
    errors: […]
  }
});
Tag Debug Mode
• Gives you fine-grained insights on the quality of information sent
• Requires no technical knowledge
• Mirrors exactly what will be processed down the line
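The debug switch shown on the slides can be sketched as two small pure functions: one that checks the URL hash, and one that adds `debugMode=1` to the event request. This is only an illustration. The hash value, the `/event` path and the `debugMode` parameter come from the slides, while the function names and the JSON encoding of the `a` parameter are assumptions (the slide truncates that parameter).

```javascript
// Illustrative sketch of the debug-mode switch; not the actual ld.js code.
const DEBUG_HASH = "#enable-tag-debug-mode"; // hash shown on the slides

// True when the page was opened with the debug hash,
// e.g. isDebugMode(document.location.hash) in a browser.
function isDebugMode(hash) {
  return hash === DEBUG_HASH;
}

// Build the event URL. The payload encoding (JSON in the "a" parameter)
// is an assumption made for this sketch.
function buildEventUrl(baseUrl, payload, debug) {
  const params = new URLSearchParams({ a: JSON.stringify(payload) });
  if (debug) {
    params.set("debugMode", "1"); // asks the server for an audit response
  }
  return `${baseUrl}?${params.toString()}`;
}
```

Because the same endpoint serves both modes, the audit returned in debug mode can mirror what the normal pipeline would process, which is the point made in the last bullet above.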
Feed
Goal
• Provide feedback ASAP on a subset of products
• Provide feedback on the whole feed
• Automatic format detection (Google specs)
• User can validate the structure of the feed
• User can review some products
• As close as possible to the daily feed import
Full import
Daily import architecture
Full import
Update the feed-processing Hadoop job to compute error and attribute statistics
Full import
Launch full import from Integrate, retrieve and display statistics
Test import
Create a Marathon application that:
- Streams the incoming feed
- Detects the format
- Reuses part of the feed-processing Hadoop job's Java code
- Saves the import & statistics in DB
- Provides an API to fetch statistics
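As a rough illustration of the "detect format" and "statistics" steps, here is a sketch that guesses XML versus CSV from the first characters of a feed and counts missing required fields. Everything in it (function names, the candidate delimiters, the result shape) is an assumption for illustration; the actual application reuses the Java code of the Hadoop job, which is not shown in the slides.

```javascript
// Hypothetical format detection on the first chunk of a streamed feed.
function detectFeedFormat(head) {
  const text = head.replace(/^\uFEFF/, "").trimStart(); // drop BOM and leading blanks
  if (text.startsWith("<")) {
    return { format: "xml" };
  }
  // For CSV, pick the candidate delimiter that occurs most often in the header line.
  const headerLine = text.split(/\r?\n/, 1)[0];
  let best = null;
  for (const delimiter of [";", ",", "\t", "|"]) {
    const count = headerLine.split(delimiter).length - 1;
    if (count > 0 && (best === null || count > best.count)) {
      best = { delimiter, count };
    }
  }
  return best ? { format: "csv", delimiter: best.delimiter } : { format: "unknown" };
}

// Count, per required field, how many parsed products lack a value.
function missingFieldCounts(products, requiredFields) {
  const counts = Object.fromEntries(requiredFields.map((field) => [field, 0]));
  for (const product of products) {
    for (const field of requiredFields) {
      if (!product[field]) {
        counts[field] += 1;
      }
    }
  }
  return counts;
}
```

Working on the head of the stream is what makes feedback on a subset of products possible before the whole feed has been read.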
Result
Result
Creatives
How banners work at Criteo
• Actual humans pick predefined layouts, colors, CTAs
• Then those are combined with product information and optimized on-the-fly
Example CTAs: “Je découvre !” (“I discover!”), “J’achète !” (“I buy!”)
How banners work at Criteo
“Can I have drop shadows on my products?”
“I’m not sure about the pink”
“Could it autoplay loud music?”
As a result, clients worry
“What will my banners look like?”
How banners work at Criteo
There is stuff we can’t do, and stuff we don’t necessarily want to do
Creatives to the rescue
And it takes back and forth.
Our goal:
• Give advertisers a preview of what it’ll look like
• Give advertisers customization options
• Give feedback on the performance impact
• 80% of advertisers validate their Creatives in < 2 minutes
• 80% of advertisers don’t ask for a change
Creatives
Bring on UX, R&D, Product, Sales, Creatives & Technical Support
Creatives
1. Education
2. Preview
3. Performance
4. Customization
Going further! And mostly faster
eCommerce Platforms
Lots of our clients run on ready-to-use platforms that have APIs
As a result, we can completely automate the integration workflow for them!
Shopify integration
Only 2 clicks needed!
Reduced integration time from 14 days to 20 minutes
Integration today
How customers / technical support / we feel
Integrate achievements
“I’m in love with the Tag Debug Mode” (Nassim Aissat, Global TS)
• 14d: Median integration time
  • 43 days in 2014
• 75%: Tags without help
  • Only 25% in 2014
• 66% complete Feed in < 1h
• 92%: Validate Creatives < 2 mn
  • 95% accept “as-is”
  • 4% accept with performance downgrade
  • Only 1% ask for modification
• 20mn: Integration w/ Shopify App
• 2014: 600 integrations/quarter
• 2016: 1800 integrations/quarter
• 50% handled through Integrate
Questions?
What does Black Friday mean at Criteo?
Getting ready for Black Friday
Release freeze: trying to guarantee the stability of the platform...
... with nasty side-effects
Monitoring the datacenter
How to evaluate the health of the datacenter at a glance?
Enter Grafana
With specific filters, deviant machines can be spotted easily
Drilling down...
Until finding a likely culprit
And switching to micro analysis to find the root cause:
• Process Explorer
• Profiling
• WinDbg
• ClrMD
Load Balancing
HAProxy
Basics of Client-Side Load Balancing
Mixed technical specifications
Gen8 Load test
Gen8 vs Gen9 servers
Observable result
2/3
1/3
Conclusion
Do not take your software for granted
• Internal infrastructure will change
• External workload will change
… be prepared
The Analytics Stack at Criteo
Yesterday, Today and Tomorrow with an assist from Bill Murray
Justin Coffey, Team Lead
The Ghost of Christmas Present
What do we have now?
Criteo: Scale of Data
• 4 Billion ads served each day
• 200+ Billion events logged each day
• 50TBs of data ingested each day
• 10 trillion records processed each day
Criteo: Scale of the Analytics Stack
• 50+ TB ingested / day
• 2000+ jobs / day
• 7+ PB under management
• 200+ Analysts
• 400+ Engineers
• 1000+ Sales and Ops
Criteo: Scaling Analysts
[Chart: Analysts Hired since 2010; x-axis from Sep 2010 to Mar 2016, y-axis from 0 to 180]
Criteo: Scaling Data
[Chart: Growth of a Single Dataset Since July 2014; x-axis from 7/13/14 to 9/18/16, y-axis from 0 to 140,000,000,000]
Criteo: The Analytics Stack Today
Ad-Hoc Analysis
Hadoop for primary storage and point of ingestion
Data Transformation on top of Hadoop
Hive (7PB) and Vertica (100+ TB) Data Warehouses
Ad-Hoc SQL on Hive and Vertica, Reporting on Tableau and Vertica
Orchestration via Langoustine
Our Stack is Simple
• Few moving parts
• Purposefully built with Shiny Thing blinders on
• It's okay to not have the "latest and greatest" tech
• Good enough is, actually, always good enough
On Shiny Things: the universe is vast
so be selective, and master what you select
The Ghost of Christmas Past
Before we continue, a quick history lesson of how we got here is in order...
Everything starts somewhere
and it's not always pretty.
In early 2013, you could use SQL Server…
AdServer_Db
Publisher_Db
LogStatus_Db
BlogWidgetStat_Db
BlogWidgetAdStat_db
Traffic_custom_db
Extranet_Db
Traffic_custom_db
CATEGORY_DB
Mail_MonitorDB
Inventory_Db
AdServerBo_Db
AdServerStat_Db
DashBoard_DB
Dashboard_Security_DB
WebServerStat_db
ABTesting_DB
AdvertiserFatigueStats_db
ADVERTISING_DB
StatPrediction_DB
CAST_DB
CriteoRefdb
ImportDB
RISK_DB
GalacticaStats_DB
MaxCpc_DB
UserProfilingDB
WorkflowPersistency_db
CAST_DB_HOURLY
StatEngine_Db
Crawler_Db
BICustom_DB
Lookalike_DB
Widget_db
AOC_DB
AOC_DB
Build_Deploy_Fake_db
publisher_stats_db
TestFwk_Db
LogMonitorDb
ADMINLOGS_DB
SqoopExport_db
FraudDetection_db
HPClink_DB
DW_DB
tsuissesbenl_stat_db Heyokr_Stat_db kiabiit_stat_db Ultaus_Stat_db Crutchfieldus_Stat_db Forzierijp_Stat_db Retailchoiceuk_Stat_db Ryanairhotelses_Stat_db Speakyplanetfr_Stat_db Autowayjp_Stat_db Sicilianobr_Stat_db Jukenhousingjp_Stat_db Cosyforyoufr_Stat_db Tripadvisorru_Stat_db Linasmatkassese_Stat_db Ellepassionsfr_Stat_db Skyde_Stat_db Swimdoctormallkr_Stat_db Sitescoutbr_Stat_db Travelzoousnewusers_Stat_db Platekompanietno_Stat_db Testaoc110413frcom_Stat_db Megapoolnl_Stat_db Elektrototaalmarktnl_Stat_db Intersportuk_Stat_db Usineadesignfr_Stat_db Lekmerno_Stat_db Vuelingit_Stat_db
Valuedopinions_Stat_db Forzierino_Stat_db Artisantiuk_Stat_db Idbusit_Stat_db Cocostorykr_Stat_db Artnaturejp_Stat_db Byggmaxse_Stat_db Corporatecriteopmit_Stat_db Aramisauto_Stat_db Migoaes_Stat_db Degrotespeelgoedwinkelnl_Stat_db Diorcouturit_Stat_db Kaufuniquede_Stat_db Codigallerykr_Stat_db Mandarinaduckfr_Stat_db Comarketingorangenokiafr_Stat_db Sinbiangkr_Stat_db Cheapflightsuk_Stat_db Undergirlkr_Stat_db Agradinl_Stat_db Kofferprofide_Stat_db Domodipl_Stat_db Mandarinaduckat_Stat_db Mobilegermany_Stat_db Chlit_Stat_db Spreadshirtuk_Stat_db Casalrunningfr_Stat_db Bloomfm_Stat_db
Hotelsbe_Stat_db Strumentimusicaliit_Stat_db Bathroomworlduk_Stat_db Verivoxde_Stat_db Mcmkr_Stat_db Viaggiedreamsit_Stat_db Brille24de_Stat_db Yjgakuseikaikan_Stat_db Stylepitnl_Stat_db Cvlibraryrecruiter_Stat_db Preis24de_Stat_db Tigershedsuk_Stat_db Duvetandpillowuk_Stat_db Noths_Stat_db Wizwidkr_Stat_db Ticketonlinede_Stat_db Lifestyleeuropeuk_Stat_db Shopeccose_Stat_db Swanhellenicuk_Stat_db Deguisementdiscountfr_Stat_db Freshcottonnl_Stat_db Tikamoonfr_Stat_db Testfp1_Stat_db warehouse_stat_db Hisjeans_Stat_db Mountfieldlawnmowers_Stat_db Sitescoutnl_Stat_db Lancomeus_Stat_db
Brandelijp_Stat_db Mesdessousfr_Stat_db Beautyplanningjp_Stat_db Lgcobrandingpriceminister_Stat_db Stockngous_Stat_db Kickzde_Stat_db Rockymountaindecorus_Stat_db Cellbesse_Stat_db Yvesrocheres_Stat_db Toshibadirectjp_Stat_db Seneukr_Stat_db Waterfeaturesuk_Stat_db Cottagesforyouuk_Stat_db Camif_Stat_db Lojaskdbr_Stat_db Hipmunkhotels_Stat_db Sorteonline_Stat_db Ediets_Stat_db Bonsportru_Stat_db Jobjsenjp_Stat_db Redcoonit_Stat_db Hmuk_Stat_db Srtestcetelem2_Stat_db Iamprettykr_Stat_db Lebunnybleushopkr_Stat_db Condenastit_Stat_db Hotusaes_Stat_db Chilitvit_Stat_db
Hellinefr_Stat_db Cobrasonfr_Stat_db madeindesign_stat_db Megagadgetsnl_Stat_db Todaofertabr_Stat_db bulbus_Stat_db Calcioshopit_Stat_db Edenlyes_Stat_db Recruiterucajp_Stat_db Engelhornde_Stat_db Spreadshirtno_Stat_db Dusparstde_Stat_db Tabletbr_Stat_db Ventesecretfr_Stat_db Venteunique_Stat_db Dellchde_Stat_db Dressforlessnl_Stat_db Multipopkr_Stat_db allheartus_Stat_db Trovitdejobs_Stat_db lesjeudisfr_stat_db Expediaukcrosssell_Stat_db Furniturebrituk_Stat_db Yooxbe_Stat_db Skyscannerno_Stat_db Bluetomatoat_Stat_db Mechakaitaijp_Stat_db Destinationlightingus_Stat_db
and 10K+ more
SQL Server was Production Infrastructure
• Analyst access to data was an afterthought
• Production databases were not designed for analytics
• Reports and queries were tightly coupled to production
• UX was poor and Analysts occasionally broke production systems!
Hive also made an early appearance…
2013-04-22 11:28:59,942 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:01,010 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:02,071 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:03,134 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:04,876 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:05,112 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:06,047 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:06,984 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
ZZZZ…
But Hive was also an afterthought
• Raw production data batch loaded with no transformations
• Query tools were nonexistent
• Queries were slow and only expert analysts could run them
• UX and productivity were extremely low
This just wasn't working!
We needed a new approach
First things first
We need a database!
Requirements for an Analytic Database
• It must be extremely fast
• It must be able to store our most actionable data sets
  • Dozens (at the time!) of TBs, now hundreds
• It must be queryable with proper SQL
• It must be deployable on hardware we specify
Defining a Proof of Concept Evaluation
• Work with Analysts to identify key data sets
• Analyze query patterns
• Define benchmark queries
• Work with vendors to test closed source solutions
• Test OSS in-house
The results
• Vertica struck the right balance between cost, performance and deployment options
• PoC evaluation took ~3 months
• Initial deployment took another ~3 months
• Operations ramped up over the following ~6 months
Working with Analysts during deployment
• Analysts in the team helped define and document the data model
• They also created training materials
• Training was done in concert with engineers
But was it a success?
• Within a year of the rollout we were able to decommission SQL Server for analytics
• Today Vertica has over 100 unique ad-hoc users connected each day
• It executes hundreds of thousands of queries each day
• It is the most important piece of analytics infrastructure at Criteo
A fresh deployment to mature infrastructure
• Vertica at Criteo has scaled from ~12TB to ~120TB (going PB soon)
• Ad-hoc users have grown from ~40 to ~200
• Reporting users have grown from ~300 to ~1500
• The number of tables has grown from ~50 to >500
Wait, 500 tables in 3 years?
That's a lot of data modelling!
Analysts contribute to the data model
• Engineers know how the DB works and know how to optimize a data model, but they don't always know what to put in it
• With good tools, Analysts contribute to the evolution of the data model, including schema additions and modifications
• Engineers in the team can help guide them in the finer details
• Rinse and repeat
Side bar: We also had dashboards with SSRS
But we were told it was ugly and complicated.
We traded ugly for slow, btw, and it's still complicated
From SSRS to Tableau and SQL Server to Vertica
• Actually, "slow" is just our current perception—we had SSRS dashboards with timeouts on the order of hours.
• SSRS served as our de facto ETL between those 10K+ SQL Server DBs
• Those SQL Server DBs were also production databases.
So to Summarize the Past
• Analysts had to query across thousands of DBs
• Dashboards were slow and complicated
• Analytics work was strongly coupled to production
Life was great back then, wasn't it?
We're done then?
Not quite. Things can go awry!
The Ghost of Christmas Future
...here's hoping it's a near future...
Criteo is World Wide
We have hundreds of analysts spread across dozens of countries!
Criteo has a Rich Product Offering
• Banner Ads, Mobile, In-App, Email, Search
• 10's of Thousands of Advertisers and Publishers
• Some of them very big and very demanding
And (reminder!) our Scale Never Seems to Stop Growing
[Chart: Growth of a Single Dataset Since July 2014; x-axis from 7/13/14 to 9/18/16, y-axis from 0 to 140,000,000,000]
(reminder #2) Number of analysts hired since 2010
[Chart: x-axis from Sep 2010 to Mar 2016, y-axis from 0 to 180]
What could go wrong?
New Challenges
• With so many hungry analysts to feed and with so much volume and variety of data, Vertica's query planner is working overtime
• We need to instrument and monitor more
• We need to level-up analysts' SQL skills
• And yes, finally, we do need some data governance*
*oh how I've resisted this day!
2 Analysts and 3 Engineers ain't gonna cut it
• We have scaled up our PM team
• We are moving from a proto-CoE team to an official CoE team
• We are scaling engineering operations
What's on the TODO list?
• Documentation, and automating it as much as possible
• Non-invasive, but very intimate query monitoring
• Workload isolation
• Query suggestions and preëmptive query blocking
More about query inspection
• No matter how wonderful a database may be, its performance comes down to how much IO it has and how much contention there is for it
• The difference between a poorly optimized query and a well optimized one for the IO subsystem can be orders of magnitude
• Better queries mean more concurrent, happier users
• Vertica offers lots of ways to find out what is going on behind the scenes, but one of the best ways is to EXPLAIN your users' queries and identify those who need to be trained!
Recalling our Current Challenges
• Tableau Workbooks are Slow
• Vertica is Overloaded
• Reporting Data is Frequently Late
Patches and the Arc of History
• Each of our current challenges can be addressed in the short term
• But we need long term solutions to avoid regressions
Tableau Relief Program (TaRP)
Short Term:
• Double the cores on production server
• Isolate critical workbooks
Medium Term:
• Require all production workbooks to go through gerrit/git review
• Score workbook complexity pre-release
• Monitor released workbooks for QoS
Not So Long Term:
• Work with Product and Central Ops to create Tableau Center of Excellence and level up BI
TaRP: reporting alchemy
Push to production
Productive Analyst
Angry Sales Person
No SLA dataset
Productive Analyst
Happy Sales Person
SLA dataset
Push to review
Automated deploy
Knowledgeable Analyst
Compliance checks passed
Peer-reviewed
Why impose a dev cycle on report building?
not to be trite, but, well:
that's good money!
More seriously
• Tableau workbooks consume data
• Data comes in all sorts of volumes and velocities (sorry)
• Data query complexity is linked to workbook complexity and features
• If you don't know what you're doing, your workbooks will be:
  • slow, because of internal workbook complexity
  • slow, because of complex database queries
  • not up to date, if they don't query the proper data sources
Tableau workbook developers are developers, full stop. Treat them like they are.
Vertica Roadmap
[Diagram components: Consul, RTIngester, HDFSIngester, HLL, JDBC, VProxy, Admin, VIcO, JVMIngester, DataDisco]
Vertica as a Service
Short Term:
• Scale out as fast as reasonable
• Split reporting and ad hoc workloads
• Better hardware configuration
• More monitoring
Not So Long Term:
• Better monitoring
• Control Input: Trickle and Bulk Loading, Consistently, Durably and Efficiently
• Control Output: Query inspection/prioritization, Workload management
Fixing Your Latent Data Problem
Short Term:
• Migrate critical data workflows to Langoustine
• Optimize DAG and long running queries
Medium Term:
• Migrate long-tail datasets to Langoustine
• Better metrics, capacity planning
Not So Long Term:
• Refactor data model to cull useless data sets
• Better complexity analysis of workflow modifications pre-release
We're going to need better instrumentation
• Better Workflow Insights in Langoustine
• Better Hadoop Job Performance Metrics
Let's spend less time making data workflows
Langoustine IDE makes building Hive workflows trivial
Langoustine IDE promotes best practices
Workflows are source controlled:
Reviews are built-in:
We'll need better dev tools (e.g. dev-cluster)
Build an AWS Hadoop cluster:
Connect to it via a local Docker container:
And load it with data saved in S3:
SLAB: SLA Boards That Say A Lot
Wait, what about Opera and Vizatra?
Didn't you guys do a lot of work on that?
A Quick Opera Recap
Opera is the internal replacement for CPOP, built in two parts:
• A scalding-langoustine data pipeline
• And a vizatra-OLAP web app
We learned a lot from building Opera
• How to use SQL to describe a dashboard
• How to master SQL queries executed from an OLAP app
• How to build big, fast databases
• How to build optimal (or so we think) data processing pipelines
• How to make a decent UI with decent UX
Let's focus on the SQL stuff
Using SQL for dashboard meta-data
SELECT
  time_id as hour,
  country_code as country,
  network_id as network,
  SUM(clicks) as clicks,
  SUM(displays) as displays,
  SUM(clicks) / SUM(displays) as ctr
FROM facts
WHERE time_id BETWEEN ?start AND ?end
GROUP BY time_id, country_code, network_id

• Time dimension: time_id
• Dimensions: country_code, network_id
• Metrics: clicks, displays, ctr
• Parameters: ?start, ?end
Big-O(lap)
SELECT
  time_id as hour,
  country_code as country,
  network_id as network,
  SUM(clicks) as clicks,
  SUM(displays) as displays,
  SUM(clicks) / SUM(displays) as ctr
FROM facts
WHERE time_id BETWEEN ?start AND ?end
GROUP BY time_id, country_code, network_id

PROJECTION: Revenue by country
SELECTION: Last 7 days in EUR
Big-O(lap)
SELECT
  country_code as country,
  SUM(clicks) as clicks,
  SUM(displays) as displays
FROM facts
WHERE time_id BETWEEN '2016-03-01' AND '2016-03-07'
GROUP BY country_code

PROJECTION: Revenue by country
SELECTION: Last 7 days in EUR
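The step from the full dashboard query to the narrowed one can be sketched as follows, assuming the dashboard definition has already been parsed into named dimensions and metrics. The `dashboard` object and `buildQuery` function are hypothetical; only the table, column and alias names are taken from the slides, and how Vizatra actually derives its queries is not shown there.

```javascript
// Hypothetical, pre-parsed form of the dashboard SQL shown on the slides.
const dashboard = {
  table: "facts",
  timeColumn: "time_id",
  dimensions: { hour: "time_id", country: "country_code", network: "network_id" },
  metrics: {
    clicks: "SUM(clicks)",
    displays: "SUM(displays)",
    ctr: "SUM(clicks) / SUM(displays)",
  },
};

// Keep only the requested dimensions and metrics (the projection) and bind
// the time range (the selection) as query parameters.
function buildQuery(definition, projection, selection) {
  const pick = (names, available, kind) =>
    names.map((name) => {
      if (!(name in available)) {
        throw new Error(`unknown ${kind}: ${name}`);
      }
      return { name, expr: available[name] };
    });
  const dims = pick(projection.dimensions, definition.dimensions, "dimension");
  const mets = pick(projection.metrics, definition.metrics, "metric");
  const select = [...dims, ...mets].map((c) => `${c.expr} as ${c.name}`).join(", ");
  let sql = `SELECT ${select} FROM ${definition.table}` +
    ` WHERE ${definition.timeColumn} BETWEEN ? AND ?`;
  if (dims.length > 0) {
    sql += ` GROUP BY ${dims.map((d) => d.expr).join(", ")}`;
  }
  return { sql, params: [selection.start, selection.end] };
}
```

Asking for the `country` dimension with the `clicks` and `displays` metrics over 2016-03-01 to 2016-03-07 yields the same shape as the narrowed query above, with the dates passed as bound parameters instead of inline literals.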
Now that we've gotten intimate with SQL...
Let's see what else we can build...
Vizatra Client: One DB Client to Rule Them All
• Parse every query and analyze complexity before executing it
• Enforce best practices (e.g. predicates on partitions)
• Degrade gracefully (e.g. don't submit queries to an overloaded DB)
• Score users and queries, share with other users
• Provide basic visualizations to increase analytic productivity
• Support non-SQL datasources
• And your feature?
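To make "enforce best practices (e.g. predicates on partitions)" concrete, here is a deliberately naive sketch that rejects a query when its WHERE clause does not constrain a given partition column. It is an illustration only: the function name and result shape are hypothetical, and a real client would parse the SQL properly, since a regular expression like this one can be fooled by subqueries, comments or string literals.

```javascript
// Naive, regex-based check; a stand-in for real SQL parsing.
function checkPartitionPredicate(sql, partitionColumn) {
  if (!/^\w+$/.test(partitionColumn)) {
    throw new Error("partitionColumn must be a plain identifier");
  }
  // Capture the text between WHERE and the next clause (or the end of the query).
  const where = /\bwhere\b([\s\S]*?)(\bgroup\s+by\b|\border\s+by\b|\blimit\b|$)/i.exec(sql);
  if (!where) {
    return { ok: false, reason: "no WHERE clause" };
  }
  // Look for the partition column followed by a comparison operator.
  const predicate = new RegExp(`\\b${partitionColumn}\\b\\s*(=|<|>|between\\b|in\\b)`, "i");
  if (!predicate.test(where[1])) {
    return { ok: false, reason: `no predicate on ${partitionColumn}` };
  }
  return { ok: true };
}
```

A client built this way could refuse the query, or return a suggestion to the user, before anything is submitted to the database.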
The End.
Thanks for listening. If any of this sounds fun, we're hiring!