spotify lessons- learning to let go of machines · spotify is hiring! spotifyjobs.com io tribe....

68
Spotify Lessons: Learning to Let Go of Machines IO Tribe James Wen, Site Reliability Engineer at Spotify ALF Squad, Infrastructure & Operations Tribe

Upload: others

Post on 28-May-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Spotify Lessons:Learning to Let Go of Machines

IO Tribe

James Wen, Site Reliability Engineer at SpotifyALF Squad, Infrastructure & Operations Tribe

Page 2: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20
Page 3: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Let’s control how feature developers think about what their code is actually running on.

Page 4: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Takeaways• Feature developers = happiest with

feature work • Find out developer machine

concerns and mitigate • Migrating to cloud or hybrid? Start

embracing ephemeral service design and infrastructure

Page 5: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Agenda• Why? • Journey • Hybrid Cloud • Ops in Squads

• Future • Learnings

Page 6: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Why?

Why don’t we want feature devs to care too much about infrastructure and machines?

Page 7: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Why?

Time taken on infrastructure tasks = time taken away from feature work

Feature devs = focused on features

Page 8: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Spotify Scale Stats- 140 Million+ Monthly Active Users

- 50 Million+ Subscribers - 30 Million+ Songs - 2 Billion+ Playlists

- Available in 60 markets

Page 9: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Spotify Dev Scale Stats

~900 Devs ~100 Tech Teams ~2000 Services

Page 10: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Spotify Machine Scale Stats

~10,000 Bare Metal Hosts~13,000 Hosts on GCP46 Hardware/VM Types

Page 11: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Example: Capacity Planning

Avg # devs on a team Capacity Planning

Page 12: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Scale doesn’t really matter-Smaller companies/teams = developer time is more valuable

-Larger companies/teams = wasted infra time scales as well

Page 13: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Other Infrastructure Tasks- Machine provisioning- Failure planning - Security updates - Machine maintenance

Page 14: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Dedicated Ops?

Page 15: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Dedicated Ops?~2000 Services74 Infrastructure and Operations Engineers

If all IO engineers → dedicated ops27:1 service:engineer ratio

Page 16: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Ops In SquadsFeature teams handle their own ops and

provisioning

Using the services and tooling the Infrastructure and Operations tribe has

written

Page 17: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

We control the level of context feature teams need to operate their services.

Page 18: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

- Developer Happiness- Developer effectiveness and context

Page 19: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Journey

Page 20: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

- Ops in Squads

- Hybrid Cloud (Ephemerality)

Page 21: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Starting Out

Page 22: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

StockholmSan Jose

Rack 2Rack 1

Historical: Feature Developer’s Context for Service’s Capacity

lon-1-dlon-1-b

lon-1-clon-1-akeys

updated

Rack 2

lon-1-f

lon-1-e

updated

Page 23: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Machine Context- Packages - Hostname - Machine specs (CPU, RAM, disk, etc.)

- Uptime and service duration - Location - Local state (files on disk, info in memory)

Unboundv1.6.3

ash2-metadata-a.ash2.spotify.net

Openssl v1.0.0f

2 Cores 8 GB RAM

Tarred LogsIn Virginia

3 Years

Page 24: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How to get?

How many?Specs?

How long?

How to talk to it?

Where? Up to date?

How to track?

What tools on it?Maintenance? What to put

on it?

Available?

Service + Business

Page 25: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20
Page 26: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How to get?

How many?Specs?

How long?

How to talk to it?

Where? Up to date?

How to track?

What tools on it?Maintenance? What to put

on it?

Available?

Service + Business

Page 27: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

ServerDB

Page 28: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How to get?

How long?

How to talk to it?

What tools on it?Maintenance? What to put

on it?

Available?

Service + BusinessHow to track?

Where? Up to date?

How many? Specs?

Page 29: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

ProvGun/ProvCannon

Page 30: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

What tools on it?Maintenance? What to put

on it?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

How many? Specs?

Page 31: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

DNS

Page 32: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

What tools on it?Maintenance? What to put

on it?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

How many? Specs?

Page 33: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Nameless

Page 34: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

What tools on it?Maintenance? What to put

on it?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

How many? Specs?

Page 35: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Cortana

Page 36: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Cortana

Page 37: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

What tools on it?Maintenance? What to put

on it?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

How many? Specs?

Page 38: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Helios and Containers

Page 39: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

What tools on it?Maintenance? What to put

on it?

Available?

Service + Business

Where? Up to date?

How many? Specs?

How to track? How to get?

Page 40: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Google Compute Platform

Page 41: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

ash2-cortana-a1.ash2Zone Service Group Sequential #

gew1-cortana-a-l33t.gew1Zone Service Pool Random 4 Chars

Page 42: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Cortana Pool Manager

Page 43: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

What tools on it?Maintenance? What to put

on it?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

How many? Specs?

Page 44: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Regional Managed Instance Groups

Page 45: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

What tools on it?Maintenance? What to put

on it?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

How many? Specs?

Page 46: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

MBMI: Minimal Base Machine Image

Page 47: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How to talk to it?

What tools on it?Maintenance? What to put

on it?

Available?

Service + BusinessHow to track? How to get?

Where? How long?Up to date?

How many? Specs?

Page 48: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Phoenix

Page 49: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

Maintenance?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

What tools on it?

What to put on it?

How many? Specs?

Page 50: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Current: Feature Developer’s Context for Service’s Capacity

GCP - europe-west-1

Pool:2 instances x (n1-standard-32)

Stockholm

Pool:4 instances x (High Mem)

Page 51: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

Maintenance?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

What tools on it?

What to put on it?

How many? Specs?

Page 52: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Future

Page 53: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Gordon (Cloud DNS)

Page 54: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

Maintenance?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

What tools on it?

What to put on it?

How many? Specs?

Page 55: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Autoscaling

Page 56: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

Maintenance?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

What tools on it?

What to put on it?

How many? Specs?

Page 57: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Right Sizing

Page 58: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

Maintenance?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

What tools on it?

What to put on it?

How many? Specs?

Page 59: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Future Feature Developer’s Context for Service’s Capacity

GCP - asia-east-1

Service Pool

GCP - europe-west-1

Service Pool

GCP - us-central-1

Service Pool

Page 60: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Feature Developer Concerns

How long?

How to talk to it?

Maintenance?

Available?

Service + BusinessHow to track? How to get?

Where? Up to date?

What tools on it?

What to put on it?

How many? Specs?

Page 61: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Learnings

Page 62: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Why Pets to Cattle was Difficult:- Manual/tedious setup- Wait times for machine becoming ready (packages, DNS)- Non-automatic security updates - A fixed, reliable hostname - SSH Access - Always up/present unless team tears down

Page 63: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

- Monitoring- Logging- Service Design - Incidents

Ephemerality Learnings

Page 64: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

- Replicate bare metal functionality, then iterate - When in doubt, devs provision up and many - Migration = great time to influence dev paradigms - Don’t need to DIY

Hybrid Learnings

Page 65: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

- Feature devs need carrots, sledgehammers, and/or limos to change - Edge Cases: REST API + CLI = provide enough for feature teams to handle the edge cases

DevEx Learnings

Page 66: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

- Decrease necessary infrastructure context - Increase reliability - Save $$$ - Increase dev happiness and productivity

Recap

Page 67: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

Let’s strategically control and limit how feature developers think about infrastructure.

Page 68: Spotify Lessons- Learning to Let Go of Machines · Spotify is hiring! spotifyjobs.com IO Tribe. Title: Spotify Lessons- Learning to Let Go of Machines Created Date: 6/27/2017 8:00:20

James WenEmail: [email protected]/Github: @rochesterinnyc LinkedIn: jamesrwenSpotify is hiring! spotifyjobs.com

IO Tribe