how we survived hurricane sandy

How we survived Hurricane Sandy: a look at what not to do. Dan White (@airplanedan); Mike Zaic, VP, Engineering, CafeMom (@focustrate). Photo: TenSafeFrogs (flickr)

Upload: focustrate

Post on 02-Jul-2015


Category: Technology



DESCRIPTION

In the startup world, the most pressing issue usually isn't, "how do we prepare for a once-in-a-lifetime storm?" Risk management is always stressed at Velocity, but businesses don't always buy into dedicating resources to low-probability events. Let our comedy of errors during Hurricane Sandy help convince your bosses that infrastructure and redundancy are actually worthwhile investments.

TRANSCRIPT

Page 1: How we survived Hurricane Sandy

How we survived Hurricane Sandy

a look at what not to do

Dan White, @airplanedan

Mike Zaic, VP, Engineering, CafeMom, @focustrate

Photo: TenSafeFrogs (flickr)

Page 2: How we survived Hurricane Sandy

Calm before the Storm

Into the Storm

Lessons Learned

Introduction

Page 3: How we survived Hurricane Sandy

CafeMom

• Founded: 2006
• Social Network for moms
• > 3 million members
• > 10 million monthly UVs
• ~ 130 employees
  – Tech: 13 people
  – Sales/Community: bulk of people

Page 4: How we survived Hurricane Sandy

Calm before the Storm

Into the Storm

Lessons Learned

Introduction

Page 5: How we survived Hurricane Sandy

Our Application(s)

CafeMom

groups

games

advertising

profiles

photos

video

The Stir

answers

Que Mas

MamasLatinas

language switching

stats

admin tools

sponsorships

email

advice

Page 6: How we survived Hurricane Sandy

DevOps Focus and Pace

• Sales Driven!
• Limited resources
• Product shifting
• Priorities – refactoring?

Photo: árticotropical (flickr)

Page 7: How we survived Hurricane Sandy

Architecture

• Hosting
  – Single physical datacenter
  – Redundancy? pffff
• Cloud Presence
  – Limited to specific parts of the app
  – No database required

Photo: Arthur40A (flickr)

Page 8: How we survived Hurricane Sandy

Infrastructure Growth

Timeline, 2006 to 2013:

• minimal hardware
• rapid growth = infrastructure expands
• slower growth = stagnant infrastructure
• stagnant infrastructure + more functionality = tightly coupled bird's nest
• continued growth = optimize for scale

Page 9: How we survived Hurricane Sandy

“If NYC has serious issues we’ll all have more important things to worry about…”

Page 10: How we survived Hurricane Sandy

Calm before the Storm

Into the Storm

Lessons Learned

Introduction

Page 11: How we survived Hurricane Sandy

Day 1: Bring on Sandy!

Page 12: How we survived Hurricane Sandy

Day 2: waiting and hoping

Page 14: How we survived Hurricane Sandy

Day 4: code with no data

Photo: mandiberg (flickr)

Page 15: How we survived Hurricane Sandy

Day 5: generator anticipation

Page 16: How we survived Hurricane Sandy

Day 6: generator #fail

Page 17: How we survived Hurricane Sandy

Day 7: everything… works?

Page 18: How we survived Hurricane Sandy

Calm before the Storm

Into the Storm

Lessons Learned

Introduction

Page 19: How we survived Hurricane Sandy
Page 20: How we survived Hurricane Sandy

Technical Takeaways: what we did

• Redundancies
• Code without data
• DNS propagation
• Cloud replication
  – Codebase
  – Database

Photo: Robert S. Donovan (flickr)
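The DNS propagation item is easier to reason about with numbers. The following is a minimal sketch, assuming a hypothetical cutover time and hypothetical TTL values (none of these come from the talk): a well-behaved resolver that cached the old record just before the change may keep serving it until the TTL expires, so the TTL bounds how long a repointed site can still resolve to the dead datacenter. The function name `worst_case_cutover` is illustrative only.

```python
from datetime import datetime, timedelta


def worst_case_cutover(change_time: datetime, ttl_seconds: int) -> datetime:
    """Latest time a TTL-respecting resolver may still serve the old record."""
    return change_time + timedelta(seconds=ttl_seconds)


# Hypothetical cutover time and TTLs, for illustration only.
change = datetime(2020, 1, 1, 12, 0, 0)
for ttl in (300, 3600, 86400):
    print(f"TTL {ttl:>6}s -> old record may persist until "
          f"{worst_case_cutover(change, ttl).isoformat()}")
```

Resolvers that ignore TTLs can hold stale records longer than this bound, so a low TTL set well before an emergency is a mitigation rather than a guarantee.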

Page 21: How we survived Hurricane Sandy

Technical Takeaways: next steps

• Architecting for degraded service?
• Simplify application?
• Second physical site?
• Geographic redundancy?
• Undoing "fixes" of fail-overs?

Photo: paul bica (flickr)
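One way to picture "architecting for degraded service" is a page renderer that falls back to static content when the data store is unreachable. This is a minimal sketch under stated assumptions, not the approach CafeMom used: `DataUnavailable`, `render_page`, `fetch_live`, and `static_fallback` are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


class DataUnavailable(Exception):
    """Hypothetical error raised when the primary data store is unreachable."""


def render_page(page_id: str,
                fetch_live: Callable[[str], str],
                static_fallback: Dict[str, str]) -> Dict[str, object]:
    """Serve live content if possible, otherwise a static copy flagged as degraded."""
    try:
        return {"body": fetch_live(page_id), "degraded": False}
    except DataUnavailable:
        # No database: fall back to a pre-rendered copy or a generic notice.
        body = static_fallback.get(
            page_id, "We are experiencing an outage. Please check back soon.")
        return {"body": body, "degraded": True}
```

The `degraded` flag lets the caller hide features that need writes (posting, uploads) while still keeping read-only pages online.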

Page 22: How we survived Hurricane Sandy

Overarching Problems

• Don’t trust vendor/provider assurances!
• Managing expectations during an outage
• Feelings of helplessness
• Scrambling doesn't necessarily get things up faster
• Must know downtime tolerance
• Cost of DR planning vs Lost opportunity of outage
• DR Planning is insurance
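The "cost of DR planning vs lost opportunity of outage" comparison can be roughed out as an expected-loss calculation. This is a back-of-the-envelope sketch, assuming hypothetical inputs (the probability, outage length, revenue rate, and DR budget below are invented for illustration and are not figures from the talk); it also ignores harder-to-price effects such as reputation and staff burnout.

```python
def expected_annual_outage_loss(annual_probability: float,
                                outage_hours: float,
                                revenue_per_hour: float) -> float:
    """Expected yearly revenue lost to an outage: probability x duration x hourly revenue."""
    return annual_probability * outage_hours * revenue_per_hour


def dr_is_cheaper(annual_dr_cost: float, expected_loss: float) -> bool:
    """True when the yearly DR spend is below the expected yearly outage loss."""
    return annual_dr_cost < expected_loss


# Hypothetical inputs: 5% yearly chance of a week-long outage at 1,000 per hour.
loss = expected_annual_outage_loss(0.05, 7 * 24, 1000.0)
print(f"expected annual loss: {loss:.0f}")
print("DR budget of 5000 is cheaper than the expected loss:", dr_is_cheaper(5000.0, loss))
```

Treating DR as insurance means the comparison is against the expected loss, not the worst case; a business with low downtime tolerance may reasonably pay more than this figure.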

Page 23: How we survived Hurricane Sandy

Know when to say when

• When is it appropriate to plan for DR?
• When is it appropriate to build for DR?
• When is it appropriate to retrofit DR for existing app?
• When is it appropriate to dictate product requirements based on ease of DR?
• During an outage - when is it appropriate to do nothing (and just sleep)?

Page 24: How we survived Hurricane Sandy

When is it not a great time to think about DR?

During a disaster.

(unless you’re thinking about the next one)

Page 25: How we survived Hurricane Sandy

Moving forward – inertia sucks!

• Change is tough
• Mature infrastructure/app resists drastic change
• Pace of development: inertia against stepping back and revisiting
• Pace of business: inertia against non-revenue projects
• No recent disasters: inertia against DR necessity

Page 26: How we survived Hurricane Sandy

baby steps

Page 27: How we survived Hurricane Sandy

thank you

Photo: Lisa Bettany {Mostly Lisa} (flickr)
