Frontend at Scale - The Tumblr Story
DESCRIPTION
Growing to become one of the largest sites on the Internet comes with a unique set of problems. Learning how to adapt, and doing so without losing sight of content creators' voices, proves tricky. This talk details some of the frontend tools we've built and the approaches we've taken to serve our millions of users at scale.
TRANSCRIPT
Frontend at Scale
the Tumblr story
What is Tumblr?
→ Platform for you to express yourself
→ ~200 million blogs
→ 83+ billion posts
→ HQ in NYC
→ Founded in 2007
→ 100+ engineers
What is Tumblr?
→ Three ways to surface content:
→ The dashboard
→ Search
→ Blog network
(Example: http://16-bitch.tumblr.com/)
Who am I?
→ Chris Miller
→ Product Engineering Manager
→ Content Consumption (a.k.a., The Dashboard)
Our stack
→ Frontend
→ Backbone (+ lodash, underscore, etc.)
→ jQuery (+ some plugins)
→ SASS (+ Bourbon)
→ a bit of VelocityJS
→ Gulp for build
Our stack
→ Backend
→ PHP application layer
→ Some specialized services (Scala, C, etc.)
→ Data: MySQL, Redis, memcache, HDFS
How does it work?
→ Thousands of servers
→ Deploy dozens of times per day
→ Monitor and measure everything
→ Hadoop
→ OpenTSDB (backed by HBase)
Our process
→ Teams are small
→ Iterate quickly
→ Release early and often, usually to a percentage of users
→ 2 code review “ok’s” required for all Pull Requests
Feature Flagging
What is it?
→ Segregate your users to certain features
→ Control who sees what (and when)
Feature Flagging
Implementation
→ Server-side feature flagging
→ Client-side feature flagging
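One way the server-side and client-side halves might fit together: the server decides which flags are on for the current user and embeds that map in the page, and the client only reads it. The sketch below is a minimal illustration under that assumption. The function `createFeatureFlags`, the flag names, and the shape of the bootstrap object are all hypothetical and are not taken from the talk.

```javascript
// Hypothetical client-side flag reader. The bootstrap object stands in for
// whatever the server embeds in the page; its shape is an assumption.
function createFeatureFlags(bootstrap) {
  // Keep only values that are exactly `true`, so a missing or malformed
  // payload fails closed (feature off).
  const enabled = new Set();
  for (const [name, value] of Object.entries(bootstrap || {})) {
    if (value === true) enabled.add(name);
  }
  return {
    isEnabled(name) {
      return enabled.has(name);
    },
  };
}

// Example payload with made-up flag names.
const flags = createFeatureFlags({ new_post_form: true, infinite_scroll: false });

// Call sites branch on the flag, so old and new code can ship together.
function renderPostForm() {
  return flags.isEnabled("new_post_form") ? "new form" : "old form";
}
```

Failing closed means an unknown or missing flag behaves like "off", which keeps unreleased code hidden if the payload is ever incomplete.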
Feature Flagging
Usage
→ Provides
→ A/B testing
→ Run beta code alongside production code
→ Kill switch
Feature Flagging
A/B Testing
→ Injected recommendations
→ A/B(/*) testing of positioning
→ Which position is the best? Why?
Feature Flagging
A/B Test Results
[Chart: A/B test results by position, Position 2 through Position 9]
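The positioning test above could be sketched roughly as follows: each user is deterministically assigned one of the candidate positions, and the recommendation is spliced into the feed at that slot. This is an illustrative sketch only. The hash function, the experiment name, and the meaning of "position" (1-based index in the feed) are assumptions; the talk does not describe the actual assignment logic.

```javascript
// Candidate positions, matching the Position 2..9 labels on the results slide.
const POSITIONS = [2, 3, 4, 5, 6, 7, 8, 9];

// FNV-1a-style 32-bit string hash. Any stable hash would do here.
function hashString(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// The same user always lands in the same variant for a given experiment.
function assignPosition(userId, experimentName) {
  const index = hashString(experimentName + ":" + userId) % POSITIONS.length;
  return POSITIONS[index];
}

// Assumption: position N means the recommendation becomes the Nth item.
// Short feeds get the recommendation appended at the end.
function injectRecommendation(posts, recommendation, position) {
  const index = Math.min(position - 1, posts.length);
  return [...posts.slice(0, index), recommendation, ...posts.slice(index)];
}
```

Logging the assigned position alongside each engagement event is what would make it possible to compare variants afterwards.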
Feature Flagging
Ramping & Kill Switch
→ Ramping new features
→ Deploy to only “admin” (staff)
→ …then 1% of users… then 5%… 10%… 25%…
→ Kill switch
→ Completely turn off a feature that’s breaking the site… poof
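A ramp and a kill switch can share one check. The sketch below assumes a hypothetical per-feature config with `killed`, `staffOnly`, and `percent` fields, and buckets users by hashing their ID. None of these names come from the talk; they are placeholders to show the idea.

```javascript
// FNV-1a-style 32-bit string hash. Any stable hash would do here.
function hashString(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// config (hypothetical shape): { killed: boolean, staffOnly: boolean, percent: 0..100 }
// user (hypothetical shape):   { id: string, isStaff: boolean }
function isFeatureOn(featureName, config, user) {
  if (config.killed) return false;   // kill switch overrides everything
  if (user.isStaff) return true;     // staff ("admin") see it first
  if (config.staffOnly) return false;
  // Hashing feature name + user ID gives each feature its own slice of
  // users, and a user's bucket never changes as the percentage grows.
  const bucket = hashString(featureName + ":" + user.id) % 100;
  return bucket < config.percent;
}
```

Because the bucket is stable, anyone who saw the feature at 1% still sees it at 5%, 10%, and 25%, which avoids users flipping in and out during a ramp.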
Feature Flagging
Use Carefully
→ Feature flagging certain functionality can give a mixed experience
→ Can cause user confusion:
→ “Why does my mom see this and I don’t?” — Confused teenager
→ Easy to build complex dependencies — don’t
Error Logging
Launching Features
→ New features usually have bugs
→ (Well, not my code)
→ (just kidding)
Error Logging
→ New features usually have bugs
→ Server-side errors, easy to find
Error Logging
→ New features usually have bugs
→ Client-side errors, also easy to find…
→ …on my browser
Error Logging
→ New features usually have bugs
→ Client-side errors, not easy to find on your browser
→ …until recently
Error Logging
Capture Errors
→ We built: exceptions.js
→ Really, it’s just: window.onerror
Error Logging
Capture Errors
→ Build dependency-free
→ Build to be defensive
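The contents of exceptions.js are not shown in the talk, so the following is only a guess at what a dependency-free, defensive `window.onerror` hook might look like. The function name `installErrorLogger`, the payload fields, and the `/log` endpoint in the comment are assumptions. The handler arguments (`message`, `source`, `lineno`, `colno`, `error`) are the standard `window.onerror` signature.

```javascript
// Installs an error hook on `target` (normally `window`) and forwards a
// small payload to `send`. Taking both as parameters keeps it dependency-free.
function installErrorLogger(target, send) {
  const previous = target.onerror;
  target.onerror = function (message, source, lineno, colno, error) {
    try {
      send({
        message: String(message),
        source: source || "",
        line: lineno || 0,
        column: colno || 0,
        stack: error && error.stack ? String(error.stack) : "",
      });
    } catch (e) {
      // Defensive: the error logger must never raise errors of its own.
    }
    // Chain to any handler that was installed before this one.
    if (typeof previous === "function") {
      try {
        return previous.apply(this, arguments);
      } catch (e) {
        // Ignore failures in the previous handler as well.
      }
    }
    return false; // false lets the browser's default error handling run
  };
}

// In a browser this might be wired up as (hypothetical endpoint):
// installErrorLogger(window, (p) => navigator.sendBeacon("/log", JSON.stringify(p)));
```

Wrapping `send` in a `try`/`catch` matters because an exception thrown inside the handler would itself be an uncaught error.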
Error Logging
Capture Errors
→ Where you send the logs doesn’t matter; it’s how you use them
→ We log errors to Scribe…
→ …throw them into Hadoop
→ …and count frequency with OpenTSDB
Error Logging
Error Data
→ With Hive, we can query Hadoop:
→ With this, I can see we log around 1.4 million errors per day
Error Logging
Error Data
→ With OpenTSDB we can plot the frequency of logs
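To plot frequency, error events first have to be turned into counts per time bucket. The sketch below groups error timestamps by minute and formats each count as a line in OpenTSDB's text `put` format (`put <metric> <timestamp> <value> <tag=value>`). How Tumblr actually feeds OpenTSDB is not described in the talk; the metric name `js.errors` and the `source` tag are made up for illustration.

```javascript
// Group error timestamps (milliseconds) into per-minute counts.
// Keys are the start of each minute in Unix seconds.
function countPerMinute(timestampsMs) {
  const counts = new Map();
  for (const ms of timestampsMs) {
    const minute = Math.floor(ms / 60000) * 60;
    counts.set(minute, (counts.get(minute) || 0) + 1);
  }
  return counts;
}

// Format counts as OpenTSDB "put" lines, oldest first.
// OpenTSDB requires at least one tag per data point.
function toPutLines(metric, counts, tags) {
  const tagText = Object.entries(tags)
    .map(([key, value]) => `${key}=${value}`)
    .join(" ");
  return [...counts.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([timestamp, count]) => `put ${metric} ${timestamp} ${count} ${tagText}`);
}

// Three errors: two in one minute, one in the next.
const counts = countPerMinute([1356998400000, 1356998401000, 1356998461000]);
const lines = toPutLines("js.errors", counts, { source: "web" });
// lines[0] is "put js.errors 1356998400 2 source=web"
```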
Error Logging
We Love Graphs
→ We make pretty graphs with OpenTSDB, and we graph everything
Getting it Right
→ Sometimes we find errors before our users do.
→ Sometimes.
→ And it makes us feel good.
Getting it Right
→ So we dance.
Thank You
Email - [email protected]
Website - ee99ee.com