How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRocket MongoDB Platform

Scaling 40x on the ObjectRocket MongoDB Platform Jon Hyman & Kenny Gorman MongoDB World, June 25, 2014 NYC @appboy @objectrocket @jon_hyman @kennygorman

TRANSCRIPT


Page 2: A LITTLE BIT ABOUT JON & APPBOY

Jon Hyman, CIO :: @jon_hyman

Appboy is a marketing automation platform for apps

Harvard, Bridgewater

Page 3: A LITTLE BIT ABOUT KENNY & OBJECTROCKET

Kenny Gorman, Co-Founder & Chief Architect :: @kennygorman

ObjectRocket is a highly available, sharded, unbelievably fast MongoDB as a service

ObjectRocket, eBay, Shutterfly

Page 4: Agenda

• Evolution of Appboy’s MongoDB installation as we grew to handle billions of data points per month

• Operational MongoDB issues we worked through

Page 5: MongoDB Evolution: March, 2013

[Timeline figure: monthly axis running from Mar through the following Mar]

Page 7: What did Appboy look like in March, 2013?

• ~2.5 million events per day tracking 8 million users

• Event storage: every data point as a new document

• Single, unsharded replica set on AWS (m2.xlarge)

• Mostly long-tail customers; biggest app had 2M users

Growing a lot on disk. :-(

Started running into locking issues (30-40%). :-(

Page 8: MongoDB Evolution: April, 2013

[Timeline figure: monthly axis running from Mar through the following Mar. Milestone: Scaled vertically]

Page 10: What happened in April, 2013?

• First enterprise client signs

• More than 50 million users

• They estimated sending us over 1 billion data points per month

“Btw, we’re going live next month”

Page 11: MongoDB Evolution

April, 2013: holy crap!

Page 12: ObjectRocket: Getting Started

• The landscape of a simple configuration

• It’s all about choosing shard keys

• Locks - you know you love them

[Chart: 20% / 80% split]

Page 13: What are we going to do?

• Contain growth from data points:

    • Shifted to Amazon Redshift for “raw data”

    • Moved MongoDB to storing pre-aggregated analytics for time series data

• Figure out sharding ASAP:

    • Moved to ObjectRocket, worked on shard key selection

• Sharding was hard:

    • Tough to figure out the right shard key and make tradeoffs

    • Had to rewrite a lot of application code to include shard keys in queries and inserts, and to adjust to life without unique indexes

Page 15: Shard key selections

• Users

    • Had multiple ways to identify a user

    • Device identifier, “external user id”, BSON ID

    • Often performed large scans of user bases

{_id: "hashed"}

• Cache secondary identifiers to BSON ID to reduce scatter-gather queries

• Doing scatter gathers goes against conventional wisdom
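The identifier cache described on this slide might look roughly like the sketch below. It is an illustration only, not Appboy's code: the `IdCache` class, the key layout, and the in-memory dict standing in for both the cache and the users collection are assumptions.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical key layout: (app_id, identifier kind, identifier value).
SecondaryKey = Tuple[str, str, str]


class IdCache:
    """Maps secondary identifiers to the user's _id, which is the shard key."""

    def __init__(self, lookup_by_secondary: Callable[[SecondaryKey], Optional[str]]):
        # lookup_by_secondary stands in for the expensive scatter-gather query.
        self._lookup = lookup_by_secondary
        self._cache: Dict[SecondaryKey, str] = {}
        self.scatter_gathers = 0  # how many times we had to ask every shard

    def resolve(self, key: SecondaryKey) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]  # cache hit: later queries can target one shard
        self.scatter_gathers += 1
        user_id = self._lookup(key)
        if user_id is not None:
            self._cache[key] = user_id
        return user_id


# Toy backing store in place of the sharded users collection.
users = {("app1", "device", "abc-123"): "53a9f0c2e4b0a1b2c3d4e5f6"}
cache = IdCache(users.get)

first = cache.resolve(("app1", "device", "abc-123"))
second = cache.resolve(("app1", "device", "abc-123"))
```

Only the first lookup pays for the scatter-gather; once the `_id` is known, subsequent reads and writes can include the shard key and go to a single shard.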

Page 16: Shard key selections

• Pre-aggregated analytics

    • Always query history for a single app

    • 1 document per day per app per metric

{app_id: 1}
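A sketch of what one write against a "1 document per day per app per metric" collection could look like. The field names (`metric`, `day`, `total`, `hourly`) and the hourly sub-buckets are assumptions for illustration; the slides only state the document granularity and the `{app_id: 1}` shard key. `$inc` is MongoDB's standard increment operator.

```python
from datetime import date
from typing import Dict, Tuple


def build_increment(app_id: str, metric: str, day: date, hour: int,
                    count: int = 1) -> Tuple[Dict, Dict]:
    """Return (filter, update) for one pre-aggregated counter document."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    # The filter includes app_id, the shard key, so the write targets one shard.
    doc_filter = {"app_id": app_id, "metric": metric, "day": day.isoformat()}
    # Hypothetical layout: a daily total plus one counter per hour of the day.
    update = {"$inc": {"total": count, f"hourly.{hour}": count}}
    return doc_filter, update


flt, upd = build_increment("app1", "sessions", date(2013, 11, 2), hour=19, count=3)
```

With a driver, the pair would typically be sent as an upsert so the day's document is created on first write. Note that every event for a given app, metric, and day lands on the same document.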

Page 17: MongoDB Evolution: May - October, 2013

[Timeline figure: monthly axis running from Mar through the following Mar. Milestones: Scaled vertically; Start sharding; Everything sharded]

Page 18: What did Appboy look like in May - October, 2013?

• textPlus goes live, as do other customers

• > 1 billion events per month, doing great!

• Four 100GB shards on ObjectRocket

Page 19: MongoDB Evolution: November, 2013

[Timeline figure: monthly axis running from Mar through the following Mar. Milestones: Scaled vertically; Start sharding; Everything sharded; Various customer launches]

Page 22: What happened in November, 2013?

• One of the largest European soccer apps

• Soccer games crushed us: 15 million data points per hour just from this app!

• Lock percentage ran high, a single shard was pegged

• Real-time analytics processing got severely delayed, adding more servers did not help (in fact, it made things worse)

Why a single shard?

Page 23: Shard key selections

• Pre-aggregated analytics

    • Always query history for a single app

    • 1 document per day per app per metric

{app_id: 1}
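To see why one busy app pegs one shard under this key, here is a toy model. It is a deliberate simplification: real MongoDB places ranged chunks rather than hashing modulo the shard count, and `shard_for` and the app names are made up. The property it illustrates is the real one, though: every document with the same shard key value lives on the same shard.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # the deployment at the time had four shards


def shard_for(shard_key_value: str) -> int:
    """Toy placement: the same shard key value always maps to the same shard."""
    digest = hashlib.md5(shard_key_value.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# 10,000 writes spread across 10,000 small apps.
long_tail = Counter(shard_for(f"app_{i}") for i in range(10_000))

# 10,000 writes that all belong to one large app.
one_big_app = Counter(shard_for("soccer_app") for _ in range(10_000))

print("long tail:", dict(long_tail))
print("one big app:", dict(one_big_app))
```

The long-tail writes spread over the shards, while the single app's writes all land on one of them, which matches the "single shard was pegged" symptom.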

Page 25: ObjectRocket: Capacity, Growth

• Concurrency

• Did I mention locks?

• Cache management

• Compaction

• The shell game

• Indexing at scale

Page 27: How to fix this?

• Fundamentally, all updates are going to a single document

• Can’t shard out a single document

• Asked ObjectRocket for their suggestions

Introduce write buffering

Page 28: Write buffering

• Buffer writes to something that can be sharded out, then flush to MongoDB

• Need something transactional, so MongoDB was out for this

• Decided on multiple Redis instances:

    • Redis has a native hash data structure with atomic hash increments, which works nicely with MongoDB in this use case
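A minimal sketch of the buffering idea, using an in-memory dict in place of Redis so it runs on its own. In a Redis-backed version, `incr` would correspond to a single `HINCRBY` on a hash named after the document, and the drain step would need to read and clear that hash atomically. The class name, key format, and field names are assumptions, not Appboy's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def _new_store() -> Dict[str, Dict[str, int]]:
    return defaultdict(lambda: defaultdict(int))


class WriteBuffer:
    """In-memory stand-in for one Redis hash per counter document."""

    def __init__(self) -> None:
        # doc_key -> {field -> pending increment}
        self._pending = _new_store()

    def incr(self, doc_key: str, field: str, amount: int = 1) -> None:
        # With Redis this would be an atomic HINCRBY on the hash doc_key.
        self._pending[doc_key][field] += amount

    def flush(self) -> List[Tuple[str, Dict]]:
        """Drain the buffer into one $inc update per document."""
        drained, self._pending = self._pending, _new_store()
        return [(key, {"$inc": dict(fields)}) for key, fields in drained.items()]


buf = WriteBuffer()
for _ in range(1000):
    buf.incr("soccer_app:sessions:2013-11-02", "total")
    buf.incr("soccer_app:sessions:2013-11-02", "hourly.19")

updates = buf.flush()
print(updates)
```

Two thousand increments collapse into a single `$inc` update for the one hot document, which is the point of buffering: the number of MongoDB writes depends on how many documents were touched, not on how many events arrived.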

Page 29: Write buffering

[Diagram: incoming data, then flush to MongoDB]

Page 30: Write buffering

• Built write buffering over a weekend; buffered writes are flushed to MongoDB every 3 seconds

Pre-aggregated analytics bottleneck was solved!
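A back-of-the-envelope check on what a 3-second flush buys, using the 15 million data points per hour quoted for the soccer app. It assumes events arrive evenly across the hour, which real game traffic would not.

```python
POINTS_PER_HOUR = 15_000_000   # peak from the soccer app, per the slides
FLUSH_INTERVAL_SECONDS = 3     # flush period, per the slides

# Integer arithmetic: data points that accumulate during one flush interval.
points_per_flush = POINTS_PER_HOUR * FLUSH_INTERVAL_SECONDS // 3600

print(points_per_flush)  # 12500
```

Under that assumption, about 12,500 increments that would each have contended for the same document lock are replaced by one update per touched document every 3 seconds.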

Page 31: MongoDB Evolution: January, 2014

[Timeline figure: monthly axis running from Mar through the following Mar. Milestones: Scaled vertically; Start sharding; Everything sharded; Various customer launches; Bad shard key hit upper limit; Added write buffering]

Page 32: What did Appboy look like in January, 2014?

• > 3 billion events per month

• Four 100GB shards on ObjectRocket

• Performance became badly bursty: at times the user experience slowed to a level we considered unacceptable for our customers

Page 34: Why was performance getting worse?

• Appboy customers send millions of messages in a single campaign; most send hundreds of thousands to millions of messages each week

• Campaign times tend to cluster together across all Appboy customers: evenings, Saturday/Sunday afternoons, etc.

A lot of enormous read activity

Reads and writes and more reads start conflicting :-(

• Users visiting our dashboard during simultaneous large campaign sends would have sporadic poor performance

Page 35: ObjectRocket: Splits

• Split out collections to different MongoDB clusters

[Diagram: before and after the split]

Page 37: What did Appboy look like in February, 2014?

• Splits helped

• > 4 billion events per month

• We needed more: isolation

Page 38: ObjectRocket: Isolation

• Isolate large enterprise customers on their own MongoDB databases/clusters

• Appboy built this in March, 2014

[Diagram labels: Enterprise customer; Long-tail customer]
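One way the isolation routing could be expressed in application code is sketched below. The URIs, company IDs, and function name are hypothetical placeholders; the slides do not describe how Appboy implemented the lookup.

```python
from typing import Dict

# Placeholder connection strings; not real hosts.
SHARED_CLUSTER = "mongodb://shared.example.invalid/appboy"

# Hypothetical mapping from enterprise company IDs to dedicated clusters.
DEDICATED_CLUSTERS: Dict[str, str] = {
    "enterprise_co": "mongodb://enterprise-co.example.invalid/appboy",
}


def cluster_for(company_id: str) -> str:
    """Enterprise customers get their own cluster; everyone else shares one."""
    return DEDICATED_CLUSTERS.get(company_id, SHARED_CLUSTER)


print(cluster_for("enterprise_co"))
print(cluster_for("small_indie_app"))
```

Keeping the mapping in one place means a customer can be moved onto a dedicated cluster by changing a single entry, and a large campaign from one enterprise customer no longer competes for locks with long-tail traffic.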

Page 39: Summary

[Timeline figure: monthly axis running from Mar through the following Mar. Milestones: Scaled vertically; Start sharding; Everything sharded; Various customer launches; Bad shard key hit upper limit; Added write buffering; Start splitting DBs; Isolation]

Page 40: What’s next?

• Figure out capacity planning

• Continue down isolation path

[Chart: vertical axis from 0 to 60,000,000 in steps of 15,000,000]

Page 41:

[email protected] [email protected]

@appboy @objectrocket @jon_hyman @kennygorman