big data in the advertising industry (by michael dewhirst) - big data tech hangout - 2013.10.26
DESCRIPTION
On Saturday, 26 October, the second external meeting of the Tech Hangout Community took place in Creative Space 12, a cultural and educational center based in Kiev. The event was held under the motto «Discover the value of Big Data!»
Tech Hangout is an event organized by developers for developers to share knowledge and experience. The format is a 30-minute talk on a previously defined topic, followed by a roundtable discussion of the same length. The initiative proved so popular that Tech Hangout soon had its own logo, blog and Facebook group where attendees can discuss what they heard.
Join to discuss - https://www.facebook.com/groups/techhangout/
Read us - http://hangout.innovecs.com/
TRANSCRIPT
![Page 1: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/1.jpg)
Big Data in the advertising industry
Michael Dewhirst
Captify CTO; StrikeAd, DevZeroG co-founder
freediver, rock climber, photographer
![Page 2: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/2.jpg)
Who am I?
Born in Moscow, Russia
UK from 1991
Working in Kiev (from London) since 1999
In IT/Software (professionally) since 1994
Ex Java, HTML/JS, ABAP/SAP, .NET (shhh..), Notes, etc developer
Working with Big Data since 2010
Freediving and rockclimbing when not working
![Page 3: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/3.jpg)
Companies
StrikeAd (2010-2013: CTO, Co-founder)
Mobile advertising media DSP / trading platform
Processing 10’s of BN requests/month
Several “Big Data” solutions in place
Launched in 2010 (co-founded)
Captify (2013-now: CTO)
Search re-targeting company
Processing 10’s of BN requests/month
Complex “dual” traffic and data workflow
Launched R&D dpt 2 months ago
![Page 4: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/4.jpg)
Why is Big Data so key?
Pretty much everything in a business revolves around data and understanding it and there is exponentially more data every day to understand
![Page 5: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/5.jpg)
What is Big Data
What is big data and what solutions can be classed as such?
![Page 6: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/6.jpg)
What is Big Data
“Internet scale” / Billions of transactions a month
2000-5000+ QPS (queries per second)
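As a rough check on how the two figures relate, the sketch below converts a monthly request volume into an average QPS. It assumes a 30-day month and perfectly even traffic, which is a simplification; the function name and the sample volumes are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(requests_per_month: float, days_in_month: int = 30) -> float:
    """Average queries per second for a given monthly request volume.

    Assumes traffic is spread evenly over the month (real traffic is not).
    """
    return requests_per_month / (days_in_month * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Sample volumes chosen for illustration, not taken from the slides.
    for billions in (5, 10, 13):
        qps = average_qps(billions * 1e9)
        print(f"{billions} bn requests/month is about {qps:.0f} QPS on average")
```

Because this is an average, peak-hour QPS will be higher than the number it prints.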
![Page 7: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/7.jpg)
What is Big Data
Processing time of under a second per transaction
Usually sub-100ms
![Page 8: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/8.jpg)
What is Big Data
Ability to aggregate, report and analyse processed data
in real time or near real time
![Page 9: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/9.jpg)
What data?
Ad slots
Impressions
Clicks
Actions/conversions
Tracking pixels
Data feeds / databases
User ID
IP address
GPS lat long
Site category
Site URL
Age
Gender
Income
Connection type (mobile / wifi)
etc
![Page 10: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/10.jpg)
The Challenge
![Page 11: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/11.jpg)
The Challenge
A lot of volume
which needs retrospective access quickly
(s)
![Page 12: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/12.jpg)
Architecture, Design, Solutions
![Page 13: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/13.jpg)
Typical architecture
Modules/components:
1. Load Balancing
2. Actual processing
distributed identical workers
3. Logging
4. ETL (Extract Transform Load)
Processing logs, summarising/aggregating by keys
5. Aggregated data
6. “Big DataBase” (sometimes x2)
7. Machine learning
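The ETL step (module 4) can be sketched as below: raw log records are read, reduced to an aggregation key, and summed into counters. The record fields (`site`, `hour`, `event`) and the sample data are assumptions made for illustration, not the format used by any platform mentioned in this talk.

```python
from collections import defaultdict

# Hypothetical raw log records; the field names are illustrative assumptions.
RAW_LOGS = [
    {"site": "news.example", "hour": 12, "event": "impression"},
    {"site": "news.example", "hour": 12, "event": "impression"},
    {"site": "news.example", "hour": 12, "event": "click"},
    {"site": "blog.example", "hour": 13, "event": "impression"},
]


def aggregate(records):
    """Summarise raw events into per-(site, hour) event counters."""
    totals = defaultdict(lambda: defaultdict(int))
    for record in records:                      # extract: read each raw record
        key = (record["site"], record["hour"])  # transform: pick the aggregation key
        totals[key][record["event"]] += 1       # load: accumulate into the summary
    # Convert nested defaultdicts to plain dicts for easier consumption.
    return {key: dict(counts) for key, counts in totals.items()}


if __name__ == "__main__":
    for key, counts in aggregate(RAW_LOGS).items():
        print(key, counts)
```

The aggregated output is far smaller than the raw logs, which is what makes reporting on it in near real time practical.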
![Page 14: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/14.jpg)
Big Data specific features
Load balancing
By geo - routing requests to nearest data centre
By load - usually round robin evenly distributing traffic between available nodes
DNS or software based (or both)
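A minimal software-based sketch of the two rules above: pick the data centre by region first, then rotate round robin through that data centre's nodes. The region codes and node names are hypothetical, and a real setup would resolve the client's region from a geo-IP database or DNS rather than take it as an argument.

```python
from itertools import cycle

# Hypothetical data centres and node names, for illustration only.
DATA_CENTRES = {
    "eu": ["eu-node-1", "eu-node-2", "eu-node-3"],
    "us": ["us-node-1", "us-node-2"],
}


class GeoRoundRobinBalancer:
    """Route by region first, then rotate through that region's nodes."""

    def __init__(self, data_centres, default_region):
        if default_region not in data_centres:
            raise ValueError("default_region must be a known data centre")
        if any(not nodes for nodes in data_centres.values()):
            raise ValueError("every data centre needs at least one node")
        # One independent round-robin rotation per data centre.
        self._rotations = {
            region: cycle(list(nodes)) for region, nodes in data_centres.items()
        }
        self._default_region = default_region

    def route(self, client_region):
        """Return the node that should handle the next request from this region."""
        # Unknown regions fall back to the default data centre.
        if client_region not in self._rotations:
            client_region = self._default_region
        return next(self._rotations[client_region])


if __name__ == "__main__":
    balancer = GeoRoundRobinBalancer(DATA_CENTRES, default_region="eu")
    for region in ("eu", "eu", "us", "eu", "asia"):
        print(region, "->", balancer.route(region))
```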
![Page 15: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/15.jpg)
Big Data specific features
Storage RW/RO
In-mem only for real time data (sub 100ms access)
On disk for near-line, non-“realtime” access
![Page 16: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/16.jpg)
Big Data specific features
Storage - in-mem (fast) - Sharding
Splitting data across several nodes (e.g. “A-C” - node1; “D-F” - node2, etc) - the whole DB does not fit in one server's memory
Hashing request data to determine storage node
2 tier architecture:
1) Load balancing tier evenly distributing traffic between available nodes - each LB is identical
2) Data storage tier, only processing relevant requests, each node only stores its chunk/shard of entire “spread out” DB
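The hashing approach can be sketched as follows. Each shard is modelled as an in-process dictionary standing in for a separate storage node, and the key is hashed with SHA-256 and reduced modulo the shard count; both choices are assumptions for illustration, and the class and method names are hypothetical.

```python
import hashlib


class ShardedStore:
    """Toy in-memory store that spreads keys across a fixed number of shards."""

    def __init__(self, shard_count: int):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each dict stands in for one storage node holding its own shard.
        self._shards = [{} for _ in range(shard_count)]

    def shard_for(self, key: str) -> int:
        """Hash the key to decide which shard owns it (stable across runs)."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, key: str, value) -> None:
        self._shards[self.shard_for(key)][key] = value

    def get(self, key: str, default=None):
        return self._shards[self.shard_for(key)].get(key, default)

    def shard_sizes(self):
        """Number of keys held by each shard."""
        return [len(shard) for shard in self._shards]


if __name__ == "__main__":
    store = ShardedStore(shard_count=4)
    for i in range(1000):
        store.put(f"user-{i}", {"segment": i % 5})
    print(store.get("user-42"))
    print(store.shard_sizes())
```

Any load balancer can compute `shard_for` on its own, which is why the load balancing tier can stay identical and stateless. One known trade-off of plain modulo hashing is that changing the shard count remaps most keys; consistent hashing is a common alternative when nodes are added or removed often.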
![Page 17: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/17.jpg)
Sharding architecture
![Page 18: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/18.jpg)
Dynamic scaling
Cloud based hosting charges are usually time based
Local continental data centres are needed
Traffic usually fluctuates significantly during the day, week, month and year
Cloud based hosting allows quick server/instance commissioning / decommissioning
Instances can be added as traffic trends grow and removed as they drop to save cost
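A minimal sketch of the sizing decision behind adding and removing instances: given the current traffic and the capacity of one instance, work out how many instances to run. The headroom percentage, the minimum instance count and the sample figures are assumptions, not values from the talk.

```python
import math


def instances_needed(current_qps: int, qps_per_instance: int,
                     headroom_percent: int = 20, minimum: int = 2) -> int:
    """Instances required to serve current_qps with spare headroom.

    `minimum` keeps more than one node running so there is no single
    point of failure even when traffic is low.
    """
    if qps_per_instance <= 0:
        raise ValueError("qps_per_instance must be positive")
    required = math.ceil(
        current_qps * (100 + headroom_percent) / (100 * qps_per_instance)
    )
    return max(minimum, required)


if __name__ == "__main__":
    # Hypothetical traffic over a day, in QPS.
    for hour, qps in [(3, 100), (12, 2000), (20, 5000)]:
        print(f"{hour:02d}:00  {qps} QPS  ->  {instances_needed(qps, 500)} instances")
```

Since hosting is charged by time, running the smaller count during quiet hours is where the saving comes from.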
![Page 19: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/19.jpg)
Other areas
Automatic node updating (there can be 100’s to manage)
Monitoring and alerting (load, space, errors, etc)
Burn in - testing new code on a small cluster before upgrading whole network
Good security - firewalls, local user/file access, etc
Avoid having single points of failure
Old log near-line storage (e.g. Amazon Glacier)
![Page 20: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/20.jpg)
Architecture, design, solutions
Any other “modules”?
![Page 21: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/21.jpg)
Machine learning
![Page 22: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/22.jpg)
What is machine learning?
Automated, algorithmic statistical data analysis and pattern detection
![Page 23: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/23.jpg)
What?!
![Page 24: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/24.jpg)
Used in advertising?
To help find repeatable actions with lowered risk and high expected outcome certainty
![Page 25: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/25.jpg)
Meaning...
Finding links between ad properties to buy more clicks or actions, e.g.
ad shown on site a, during lunch time, ad size 320x600, user from London, etc - CPC likelihood of 10%
user with iPhone, in Central Kiev, having been to dance club sites - 30% likelihood of conversion to taxi advertising
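One very simple way to surface links like these is to count, for each combination of properties, how often an ad was shown and how often it converted. The sketch below does only that; it is plain frequency counting on made-up data, not the method used by any platform named in this talk, and production systems typically use statistical models rather than raw counts.

```python
from collections import Counter

# Made-up events for illustration; the field names are assumptions.
EVENTS = (
    [{"device": "iPhone", "city": "Kiev", "converted": i < 3} for i in range(10)]
    + [{"device": "Android", "city": "London", "converted": i < 1} for i in range(10)]
)


def conversion_rates(events, features):
    """Observed conversion rate for every combination of the given features."""
    shown = Counter()
    converted = Counter()
    for event in events:
        key = tuple(event[name] for name in features)
        shown[key] += 1
        if event["converted"]:
            converted[key] += 1
    # Every key in `shown` has a count of at least 1, so no division by zero.
    return {key: converted[key] / shown[key] for key in shown}


if __name__ == "__main__":
    for combo, rate in conversion_rates(EVENTS, ("device", "city")).items():
        print(combo, f"{rate:.0%}")
```

With only a handful of observations per combination these rates are unreliable, which is one reason large traffic volumes matter for this kind of analysis.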
![Page 26: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/26.jpg)
Vendors and solutions
![Page 27: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/27.jpg)
Vendors and solutions
Apache Hadoop
Nginx
Erlang, OTP, etc
Aerospike
MongoDB
Amazon Redshift
Google BigQuery
Dynamo
PostgreSQL
Memcache
Xtremedata
![Page 28: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/28.jpg)
Vendors and solutions
DynDns
Neustar DNS
Neustar Quova Geo DB
Amazon Route53
Amazon Load Balancing
![Page 29: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/29.jpg)
Real world examples
• Companies who have big data at their core
Google AdX / DoubleClick
Online and mobile Advertising Exchange
Ad serving
Criteo
![Page 30: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/30.jpg)
Conclusions
A complex, specialised industry and software development sub-category
Technically challenging by an order of magnitude
NOT only for “special” people - anybody can get in - I did
Genuinely interesting to work in
![Page 31: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/31.jpg)
Questions?
![Page 32: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26](https://reader033.vdocument.in/reader033/viewer/2022061218/54b7022a4a79595c3f8b4583/html5/thumbnails/32.jpg)
The end
Thank you!