OWF 2014 - Take back control of your Web tracking - Dataiku
DESCRIPTION
Why you should probably do your own web tracking, and what the challenges are. Concludes with a presentation of the WT1 open source web tracker.
TRANSCRIPT
![Page 1: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/1.jpg)
www.dataiku.com
Take back control of your Web Tracking
@ClementStenac
CTO, Dataiku
![Page 2: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/2.jpg)
Give me dashboards!
![Page 3: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/3.jpg)
Choose one
Raw data: do what you want
Your money: access to raw data is a premium feature
![Page 4: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/4.jpg)
Who cares about raw data?
• SaaS analytics are full-featured
• Custom variables to link with your backend data
• Did you really join all data for your future needs?
• Do you have access to, and want to push to the JS, all necessary data?
• What kinds of analysis will you do later on?
![Page 5: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/5.jpg)
A real example
Segmentation and tracking user satisfaction
Raw tracking data
User-level stats
User base segmentation
Metrics per segment
Tracking over time
TB
GB
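The first step of this pipeline, going from raw tracking events to user-level stats, can be sketched as below. This is a minimal illustration, not the speaker's implementation: the event fields (`visitor_id`, `session_id`, `type`) and the chosen statistics are assumptions made for the example.

```python
from collections import defaultdict


def user_level_stats(events):
    """Collapse raw tracking events into one row of stats per visitor."""
    sessions = defaultdict(set)
    counts = defaultdict(lambda: defaultdict(int))
    for event in events:
        visitor = event["visitor_id"]
        sessions[visitor].add(event["session_id"])
        counts[visitor][event["type"]] += 1
    return {
        visitor: {
            "sessions": len(sessions[visitor]),
            "page_views": counts[visitor]["page"],
            "comments": counts[visitor]["comment"],
        }
        for visitor in sessions
    }


# Hypothetical raw events, one dict per tracked event.
raw_events = [
    {"visitor_id": "v1", "session_id": "s1", "type": "page"},
    {"visitor_id": "v1", "session_id": "s1", "type": "comment"},
    {"visitor_id": "v1", "session_id": "s2", "type": "page"},
    {"visitor_id": "v2", "session_id": "s3", "type": "page"},
]
stats = user_level_stats(raw_events)
```

The per-visitor rows are the kind of compact table that clustering and per-segment metrics can then be computed on.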
![Page 6: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/6.jpg)
User-level data
![Page 7: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/7.jpg)
Clustering
![Page 8: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/8.jpg)
Labeling
Search for a specific Topic
Newcomer from Google News
Foreigner Discovering The Site
Fan who loves to comment
Home Page Wanderer
Dark Bot (Competitor?)
Here you need your business intelligence
![Page 9: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/9.jpg)
Compute metrics per segment
Search for a specific Topic
Newcomer from Google News
Foreigner Discovering The Site
Fan that loves to comment
Home Page Wanderer
Dark Bot (Competitor?)
0.3€ per session, 0.23€ acquisition costs
13k sessions, 1.3€ per session, 0.23€ acquisition costs
938k sessions
938k sessions, 0.3€ per session, 0.23€ acquisition costs
738k sessions, 0.83€ per session, 0.73€ acquisition costs
68k sessions, 0.3€ per session, 1.23€ acquisition costs
1k sessions, 0€ per session, 0€ acquisition costs
Here you need to cross with your CRM
![Page 10: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/10.jpg)
Track metrics over time
Search for a specific Topic
Newcomer from Google News
Foreigner Discovering The Site
Fan that loves to comment
Home Page Wanderer
Dark Bot (Competitor?)
Using your already-computed segments
Damn, our latest release has diverging effects on segments
![Page 11: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/11.jpg)
A few other examples
• Churn prediction and explanation
• Customer lifetime value prediction
![Page 12: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/12.jpg)
OK
I WANT TO DO IT
![Page 13: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/13.jpg)
So, I have these Apache logs
• First level of web tracking
• "Nothing required"
![Page 14: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/14.jpg)
Are backend logs a solution?
Challenge 1: Identify a visitor
• IP?
  – NAT / Proxy
  – Not everyone has a public IP address
• IP + user-agent?
  – Big companies! (many users behind one IP with identical browsers)
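Why IP + user-agent breaks down can be shown with a small sketch. The hashing scheme, the addresses and the user-agent strings are assumptions for illustration only; the point is that two distinct people can produce the same key.

```python
import hashlib


def visitor_key(ip, user_agent):
    """Derive a visitor key from what a backend log line offers: IP and user-agent."""
    raw = f"{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:12]


# Two different employees behind the same corporate proxy,
# using the same company-standard browser build.
ua = "Mozilla/5.0 (Windows NT 6.1) Firefox/31.0"
alice = visitor_key("203.0.113.7", ua)
bob = visitor_key("203.0.113.7", ua)
collision = alice == bob  # the logs cannot tell them apart
```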
![Page 15: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/15.jpg)
Are backend logs a solution?
Challenge 2: Re-create sessions
• Using expiration times
• Advanced SQL / Hive / … makes this easier
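Re-creating sessions with an expiration time can be sketched as follows. The 30-minute timeout is a common convention, not a value given in the slides, and the `(visitor_key, timestamp)` input shape is an assumption.

```python
from datetime import datetime, timedelta


def sessionize(hits, timeout=timedelta(minutes=30)):
    """Assign a per-visitor session number to each hit.

    A new session starts when the gap since the visitor's previous hit
    exceeds `timeout`. Returns (visitor_key, session_number, timestamp) tuples.
    """
    last_seen = {}
    session_no = {}
    result = []
    for visitor, ts in sorted(hits, key=lambda hit: (hit[0], hit[1])):
        previous = last_seen.get(visitor)
        if previous is None or ts - previous > timeout:
            session_no[visitor] = session_no.get(visitor, 0) + 1
        last_seen[visitor] = ts
        result.append((visitor, session_no[visitor], ts))
    return result


t0 = datetime(2014, 10, 30, 9, 0)
hits = [
    ("a", t0),
    ("a", t0 + timedelta(minutes=10)),  # same session
    ("a", t0 + timedelta(hours=2)),     # gap > 30 min: new session
    ("b", t0 + timedelta(minutes=5)),
]
sessions = sessionize(hits)
```

In SQL or Hive the same idea is usually expressed with a window function that compares each hit to the previous one for the same visitor.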
![Page 16: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/16.jpg)
Are backend logs a solution?
Challenge 3: Single-page webapps
• Track behaviour within each page
• Track events, not pages
Also: getting logs from IT is sometimes another challenge
![Page 17: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/17.jpg)
Client-side tracking
• visitor_id and session_id handled with cookies
• Tracking page loads and various events
• Historically, "tracking" = fetching a 1x1 image
• AJAX
www.website.com
Browser
tracker.com
JS tracking code
Tracking calls
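The historical "1x1 image" technique can be sketched from the tracker's side: the browser requests a tiny image, and the query string carries the event. The path `/track.gif` and the parameter names are assumptions for the example; the GIF bytes are a widely used transparent 1x1 pixel.

```python
import base64
from urllib.parse import parse_qs, urlsplit

# A widely used transparent 1x1 GIF, base64-encoded.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")


def handle_pixel_request(url):
    """Turn one tracking call into (event, response headers, response body)."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per parameter; keep the first value of each.
    event = {name: values[0] for name, values in query.items()}
    headers = [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL))),
        # Without this, cached pixels would hide repeat views from the tracker.
        ("Cache-Control", "no-store"),
    ]
    return event, headers, PIXEL


event, headers, body = handle_pixel_request(
    "http://tracker.com/track.gif?visitor_id=d37ecba&session_id=s1&type=page&page=%2Fhome"
)
```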
![Page 18: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/18.jpg)
Are cookies good for your (web) health?
• Each cookie belongs to a domain (and its subdomains)
• Who can write a cookie?
  – The HTTP server, which becomes the owner (via the Set-Cookie HTTP header)
  – JS code running on the "owner" domain
• Who can read a cookie?
  – The owner HTTP server (sent by the browser)
  – JS code running on the "owner" domain
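The "belongs to a domain and its subdomains" rule can be sketched as a simple host check. This is a simplified illustration of domain matching; real browsers apply further rules (public suffixes, paths, the Secure flag) that are left out here.

```python
def cookie_applies_to(host, cookie_domain):
    """Return True if a cookie owned by `cookie_domain` is visible to `host`."""
    host = host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    # Exact match, or host is a subdomain (the dot prevents "evilwebsite.com" matching).
    return host == domain or host.endswith("." + domain)


same_site = cookie_applies_to("www.website.com", "website.com")
other_site = cookie_applies_to("tracker.com", "website.com")
lookalike = cookie_applies_to("evilwebsite.com", "website.com")
```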
![Page 19: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/19.jpg)
First-party cookies
• Set by the originating server (HTTP) or JS code
• Belong to the originating domain
• Sent by HTTP to the originating domain only
• Readable by JS code
www.website.com
Browser
Cookies for www.website.com: None
tracker.com
GET / (Cookies: none)
Fetch tracking script
Tracking JS code: read cookies for www.website.com
Tracking JS code: create visitor id and set cookie
Contents
![Page 20: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/20.jpg)
First-party cookies
www.website.com
Browser
tracker.com
GET /track?visitor_id=d37ecba (Cookies: None)
JS code: send AJAX request to tracker.com with visitor_id
Cookies for www.website.com: visitor_id=d37ecba
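The first-party flow can be modelled in a few lines. In a real deployment this logic runs as JavaScript in the page; the Python below only mimics it, with a plain dict standing in for the browser's cookies for www.website.com. The function name and the id format are assumptions.

```python
import uuid
from urllib.parse import urlencode


def first_party_tracking_url(site_cookies, tracker_base="http://tracker.com/track"):
    """Mimic the tracking JS: reuse or create the visitor id, then build the tracker call."""
    if "visitor_id" not in site_cookies:
        # First visit: create an id and store it as a cookie of the visited site.
        site_cookies["visitor_id"] = uuid.uuid4().hex[:7]
    # The id travels in the URL, because the site's cookie is never sent to tracker.com.
    query = urlencode({"visitor_id": site_cookies["visitor_id"]})
    return f"{tracker_base}?{query}"


cookies_for_website = {}
first_call = first_party_tracking_url(cookies_for_website)
second_call = first_party_tracking_url(cookies_for_website)
```

Both calls carry the same visitor id, while the request to tracker.com itself carries no cookie at all.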
![Page 21: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/21.jpg)
Third-party cookies
• Set (in HTTP) by the tracker's domain
  – Belong to the tracker's domain
• Not sent by HTTP to the originating domain (does not belong)
• NOT readable by JS code running on the originating site (does not belong)
www.website.com
Browser
tracker.com
GET / (Cookies: none)
Fetch tracking script
Contents
Cookies for www.website.com: None
Cookies for tracker.com: None
![Page 22: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/22.jpg)
www.website.com
Browser
Cookies for www.website.com: None
tracker.com
Cookies for tracker.com: None
GET /track (Cookies: None)
200 OK (Set-Cookie: visitor_id=33d7)
Tracker code: assign visitor_id
Third-party cookies
![Page 23: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/23.jpg)
Third-party cookies
www.website.com
Browser
tracker.com
Cookies for tracker.com: visitor_id=33d7
GET /track (Cookies: visitor_id=33d7)
200 OK
Tracker code: read visitor_id
Cookies for www.website.com: None
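The tracker side of the third-party flow (assign a visitor_id on the first call, read it back on later calls) can be sketched with Python's standard `http.cookies` module. The function name, the id format and the one-year lifetime are assumptions, not taken from the slides.

```python
import uuid
from http.cookies import SimpleCookie


def handle_track(cookie_header):
    """Handle GET /track on the tracker's domain.

    Returns (visitor_id, response_headers). A Set-Cookie header is only
    emitted when the browser did not send a visitor_id cookie.
    """
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    if "visitor_id" in jar:
        # Returning visitor: the browser sent the tracker's own cookie back.
        return jar["visitor_id"].value, []
    # New visitor: assign an id and ask the browser to store it for tracker.com.
    visitor_id = uuid.uuid4().hex[:8]
    out = SimpleCookie()
    out["visitor_id"] = visitor_id
    out["visitor_id"]["path"] = "/"
    out["visitor_id"]["max-age"] = 365 * 24 * 3600
    return visitor_id, [("Set-Cookie", out["visitor_id"].OutputString())]


first_id, first_headers = handle_track(None)                       # first call
second_id, second_headers = handle_track(f"visitor_id={first_id}")  # later call
```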
![Page 24: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/24.jpg)
Why each?
First-party cookie
• Tracks on a single website
• Requires JS code for tracking
• Reduced privacy impact: no exchange of information between sites
• Usage: track your users' behaviour
• Rarely blocked (used for logins)
Third-party cookie
• Tracks across all websites using the same tracker
• More frowned upon
• Usage: generally ads, but also multi-website tracking
• Blocked by up to 40% of visitors
![Page 25: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/25.jpg)
What are your obligations?
With ALL cookies
• You should ask the user whether they want cookies
• Even non-tracking-related cookies
• Yes, even login-related ones
![Page 26: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/26.jpg)
What are your obligations?
With third party cookies
• Obey the Do-Not-Track header
www.website.com
Browser
tracker.com
GET /track (Cookies: None, DNT: 1)
200 OK
Tracker code: DO NOTHING
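Honouring the header amounts to one check before any tracking logic runs. A minimal sketch, assuming request headers arrive as a plain dict (the function name is hypothetical):

```python
def should_track(request_headers):
    """Return False when the browser sent DNT: 1 (header names are case-insensitive)."""
    normalized = {name.lower(): value.strip() for name, value in request_headers.items()}
    return normalized.get("dnt") != "1"


# With DNT: 1 the tracker still answers 200 OK, but assigns no id and records nothing.
dnt_request = should_track({"DNT": "1"})
plain_request = should_track({"User-Agent": "Mozilla/5.0"})
```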
![Page 27: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/27.jpg)
What are your obligations?
With third party cookies
• Provide an opt-out URL
• Allows the user to /optin, /optout or /status
See it in action: www.youronlinechoices.com
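One way to back these three URLs is a marker cookie on the tracker's domain. The sketch below is an assumption about how such an endpoint could behave, not a description of any particular tracker: the `optout` cookie name and the status strings are hypothetical, and a dict stands in for the visitor's cookies.

```python
def handle_choice(path, tracker_cookies):
    """Apply /optout, /optin or /status to the visitor's cookies for the tracker domain."""
    if path == "/optout":
        tracker_cookies.pop("visitor_id", None)  # forget the visitor
        tracker_cookies["optout"] = "1"          # remember the refusal
    elif path == "/optin":
        tracker_cookies.pop("optout", None)
    elif path != "/status":
        raise ValueError(f"unknown path: {path}")
    return "opted-out" if tracker_cookies.get("optout") == "1" else "opted-in"


cookies = {"visitor_id": "33d7"}
before = handle_choice("/status", cookies)
after_optout = handle_choice("/optout", cookies)
after_optin = handle_choice("/optin", cookies)
```

Note that opting out itself requires a cookie, so a visitor who clears cookies also clears the opt-out.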
![Page 28: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/28.jpg)
What are your obligations?
With third party cookies
• Provide a P3P policy
• Else, older IE blocks you
"What are you doing with my data?"
Looks like this: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
![Page 29: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/29.jpg)
Tracking in mobile apps
• Preserve battery
  – Each network call is costly
  – Do not track everything synchronously
• Network access is intermittent
  – Queue events and wait for network access
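Both points lead to the same design: record events locally and send them in batches when the network allows. A minimal sketch, where the class name, the batch size and the `send_batch` callback are assumptions made for the example:

```python
from collections import deque
from itertools import islice


class EventQueue:
    """Buffer events in memory and deliver them in batches when possible."""

    def __init__(self, send_batch, batch_size=20):
        self._send_batch = send_batch  # callable that raises OSError when offline
        self._batch_size = batch_size
        self._pending = deque()

    def track(self, event):
        # Recording an event never touches the network.
        self._pending.append(event)

    def flush(self):
        """Try to send everything; keep unsent events for the next attempt."""
        sent = 0
        while self._pending:
            batch = list(islice(self._pending, self._batch_size))
            try:
                self._send_batch(batch)
            except OSError:
                break  # offline: stop and retry on a later flush
            for _ in batch:
                self._pending.popleft()
            sent += len(batch)
        return sent


network = {"up": False}
sent_batches = []


def send_batch(batch):
    if not network["up"]:
        raise OSError("no network")
    sent_batches.append(batch)


queue = EventQueue(send_batch, batch_size=2)
for n in range(3):
    queue.track({"n": n})
offline_sent = queue.flush()  # nothing leaves the device
network["up"] = True
online_sent = queue.flush()   # three events sent in two calls
```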
![Page 30: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/30.jpg)
So, what are my choices?
• You might really want to be your own web tracker
• Most used open source web tracker: Piwik
• Provides both raw data and nice dashboards
  – MySQL backend
  – Raw data via API
  – Slightly less suited for analytics
![Page 31: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/31.jpg)
WT1
YOUR OWN TRACKER IN MINUTES
![Page 32: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/32.jpg)
WT1
An open source (Apache License) server to build your own web tracking
https://github.com/dataiku/wt1
• Designed to provide you with raw data, directly usable for analytics
• Very high performance and scalability
![Page 33: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/33.jpg)
Features
• 1st or 3rd party cookies
  – Handling of DNT and opt-out
  – Helps with handling P3P
• Track events or pages with key-value data
• Visitor-scope and session-scope variables
• "Live view" debugging console
![Page 34: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/34.jpg)
Features
• Dashboards: None
• Events processing and storage
  – Filesystem, S3
  – Event queues: Flume
  – Custom processors
• JSON API for custom tracking
• iOS library
![Page 35: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/35.jpg)
Architecture
Client-side JS tracker
• 1st or 3rd party cookies
• Event-level tracking
iOS library
• Automatic batching
• Queuing to deal with network interruptions
WT1 Server
• Java
• > 20K events / second
• Handles DNT, P3P, opt-out, …
Raw storage
• Filesystem
• S3
Event processors
• Real-time aggregations
• Custom code
Event queues
• Flume
• Kafka, RabbitMQ, …
JSON POST
![Page 36: OWF 2014 - Take back control of your Web tracking - Dataiku](https://reader034.vdocument.in/reader034/viewer/2022052506/557d5f2dd8b42aba3d8b4f00/html5/thumbnails/36.jpg)
Future work
• Android library
• More event queues supported OOTB
  – Kafka
  – RabbitMQ
• Avro storage