Building a Culture of Observability at Stripe Maaaaaaaybe?

Upload: cory-watson

Post on 25-Jan-2017


Page 1: Building A Culture of Observability At Stripe

Building a Culture of Observability at Stripe

Maaaaaaaybe?

Page 2

Cory “gphat” Watson

• Joined Stripe in August, 2015

• Previously at Keen IO and Twitter

• Generalist

Page 3

Starting Point

• Stripe had some visibility, but not enough.

• No clear ownership, broken windows.

• Lack of confidence, vision for future.

• Very reactive.

Page 4

This isn’t about a specific technology. This is about people.

Page 5

Did it work?

Page 6

See my resume at: onemogin.com/resume

(jk)

Page 7

You’re here because you know this is important.

Page 8

How can we get others to agree and work toward it?

Page 9

Stripe Org Facts

• ~450 employees, 100% growth in last year

• ~2 dozen teams

• ~200 services

• Thousands of hosts (AWS)

• Ruby, JVM, lots of OSS stuff

• Team: 3 + intern (starting Q2)

Page 10

Where to begin?

Page 11

Start Over, Kinda

• Spend time with the tools

• Improve if possible

• Replace if not

• Leverage past knowledge

Page 12

Empathy and Respect

• People are not generally evil, but they are busy!

• Stressed, doing best with what they have

• Being a hater is lazy

• Help people be great at their jobs

Page 13

Replaced Existing System

• Maybe a bad call, technically better

• Overcoming momentum is hard, adds work

• Declaring bankruptcy

• Saved us ops headaches

• Still going

Page 14

Tip: Nemawashi

• Start small, you’re a great guinea pig

• Quietly lay a foundation and gather feedback

• Ask how you can improve, follow up!

• Engage discontent! Usually fine. Sometimes you need whisky.

Page 15

Identify Power Users

• Find interested parties

• Talk to them, give them what they need

• Empower them to help others

• Watch them grow!

Page 16

Value

• What are you improving?

• How can you measure it?

• Is this the best way?

Page 17

What is Observability?

Why do we want it?

Page 18

In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.

Page 19

Systems output work.

If the internal state goes bad, the work goes bad.

We need to add sensors!
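
As a minimal sketch of what "adding sensors" can look like in code, the Python below wraps a unit of work so that every call records a success or failure count and a duration. The `Sensor` class, the `charge` function, the metric names, and the in-memory storage are all assumptions made for illustration; none of them come from the slides or from Stripe's tooling.

```python
import time
from collections import defaultdict
from functools import wraps


class Sensor:
    """Hypothetical in-memory sensor: records counts and durations per metric name."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.durations = defaultdict(list)

    def observe(self, name):
        """Decorator that records successes, failures, and wall-clock time for a function."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                except Exception:
                    # The work went bad: count it, then let the error propagate.
                    self.counts[name + ".failure"] += 1
                    raise
                else:
                    self.counts[name + ".success"] += 1
                    return result
                finally:
                    # Duration is recorded for both outcomes.
                    self.durations[name].append(time.perf_counter() - start)
            return wrapper
        return decorator


sensor = Sensor()


@sensor.observe("charge")
def charge(amount_cents):
    """Stand-in for real work (hypothetical); rejects non-positive amounts."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    return amount_cents
```

After a few calls, `sensor.counts` and `sensor.durations` hold the external outputs from which the internal state of `charge` can be inferred.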

Page 20

Make This Great

[Diagram labels: Programmer, Reference, System, Sensor(s), Work]

Page 21

Flat Org Work Ethic

• Probably the biggest challenge, getting started

• So, ya know, get started

• Be willing to do the work, shave the preposterous line of yaks

• Stigmergy

• Strike when good opportunities arise (incidents, etc)

Page 22

Advertise

• Don’t be afraid!

• Promote team accomplishments.

• More so, promote the accomplishments of others.

• Humbly ask to help, then learn.

• We send monthly “State of” addresses…

Page 23

Make It Easy & Good

• Harder than it sounds (email!)

• Make it easy/automatic to do things right and hard to do wrong.

• Quality is important.

Page 24

Automated Monitors

• Baseline monitoring

• Common problems, common solutions

• Users have no state, are surprised

• People care when you show them failure and how to fix it.
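
One way to read "baseline monitoring" is that every service gets the same small set of checks without anyone asking for them, and each alert carries its own fix. The sketch below assumes that reading. The `Monitor` type, the two checks in `BASELINE`, their thresholds, and the runbook text are hypothetical examples, not the monitors the team actually shipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monitor:
    service: str
    metric: str
    threshold: float
    runbook: str  # tells the owner how to fix the failure, not just that it happened


# Hypothetical baseline: common problems paired with common solutions.
BASELINE = [
    ("error_rate", 0.05, "Check recent deploys and roll back if errors started after one."),
    ("disk_used_fraction", 0.90, "Clear old logs or grow the volume."),
]


def baseline_monitors(services):
    """Return one Monitor per (service, baseline check) pair."""
    return [
        Monitor(service, metric, threshold, runbook)
        for service in services
        for metric, threshold, runbook in BASELINE
    ]


def evaluate(monitor, value):
    """Return an alert message with the fix attached, or None when healthy."""
    if value > monitor.threshold:
        return (
            f"{monitor.service}: {monitor.metric}={value} "
            f"exceeds {monitor.threshold}. {monitor.runbook}"
        )
    return None
```

Calling `baseline_monitors(["api", "billing"])` yields four monitors, so a new service is covered as soon as it appears in the list.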

Page 25

Automatic Ticket Creation

And Resolution!
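
The slide does not say how tickets are created or resolved, so the following is only a sketch of the general idea: open one ticket when a monitor starts failing, avoid duplicates while it keeps failing, and close the ticket when the monitor recovers. `TicketTracker` and its in-memory storage are hypothetical; a real setup would call a ticketing system instead.

```python
class TicketTracker:
    """Hypothetical in-memory tracker: at most one open ticket per failing monitor."""

    def __init__(self):
        self.open_tickets = {}  # monitor name -> ticket id
        self.resolved = []      # ticket ids closed automatically
        self._next_id = 1

    def report(self, monitor_name, healthy):
        """Open a ticket on the first failure; resolve it when the monitor recovers."""
        ticket = self.open_tickets.get(monitor_name)
        if not healthy and ticket is None:
            ticket = self._next_id
            self._next_id += 1
            self.open_tickets[monitor_name] = ticket
            return ("opened", ticket)
        if healthy and ticket is not None:
            del self.open_tickets[monitor_name]
            self.resolved.append(ticket)
            return ("resolved", ticket)
        # Still failing with a ticket already open, or still healthy: nothing to do.
        return ("unchanged", ticket)
```

Automatic resolution matters as much as creation here: without it, recovered monitors leave stale tickets behind for someone to close by hand.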

Page 26

Investigation Dashboard

Such Helpful!

Page 27

Getting Feedback

How we improve.

Page 28

Teach the Basics

• Company curriculum: Teach ‘em early!

• Measuring work metrics

• Metrics types

• Schemas (dotted, tags, etc)

• Rates, histograms

• Visualizations
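
The curriculum topics above can be illustrated with a short Python sketch. It contrasts a dotted schema with a tagged one, then computes a rate and a histogram from raw numbers. The metric names and bucket bounds are made up, and the `|#key:value` tag suffix follows the DogStatsD datagram convention, which is an assumption about the wire format rather than something the slides state.

```python
def statsd_line(name, value, metric_type, tags=None):
    """Render a StatsD-style line; tags (if any) use the DogStatsD '|#k:v,...' suffix."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def rate_per_second(count_start, count_end, seconds):
    """Rate: change in a counter divided by the elapsed time."""
    return (count_end - count_start) / seconds


def histogram(values, bucket_bounds):
    """Place each value in the first bucket whose upper bound it does not exceed.

    bucket_bounds must be sorted ascending; the final slot counts overflow.
    """
    counts = [0] * (len(bucket_bounds) + 1)
    for v in values:
        for i, bound in enumerate(bucket_bounds):
            if v <= bound:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts


# Dotted schema: every dimension is baked into the metric name.
dotted = statsd_line("api.requests.get.200", 1, "c")

# Tagged schema: one metric name, dimensions carried as tags.
tagged = statsd_line("api.requests", 1, "c", {"method": "get", "status": 200})
```

With the dotted schema each combination of method and status becomes its own metric, while the tagged schema keeps one metric that can be sliced by either dimension. That difference is also why counts of distinct metrics are hard to compare across the two styles.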

Page 29

Ownership

• Poor story for this

• Org was ready for this, management was on board.

• Evolving, tools are lacking.

Page 30

Did it work?

Page 31

Yes, but not done.

• Some teams? Hell yes. Strong champions, huge improvement.

• Some other teams, kinda the same.

• Some other other teams, what is Observability and why do I care? Rare!

Page 32

Usage?

• 200+ dashboards created, 339 in old (over 2 years)

• 200+ monitors created, dozens in old (nobody trusted, was unreliable!)

• ~3000 distinct metrics (can’t compare, tags now!)

• All positive feedback from automation. (Avg 4.5, 2.5% response)

Page 33

Tools?

• Dozens of OSS PRs, OSS *StatsD library (Scala), internal libraries (we own)

• Vast improvement over old pipeline, no loss

• New styles, better naming, more consistency

• Being tied to a commercial product cuts both ways

Page 34

Adjustments?

• Embracing other tools (log analysis, error catching)

• Beginning to work on strategic things (global timers, histograms and sets)

• Need to improve metrics on our own work (we got by easy for a while)

• Monitoring is hard, need to fix.

Page 35

Summary

• Start small

• Seek feedback

• Think on your value

• Measure effectiveness

• Enjoy!

Page 36

Thanks

Team @antifuchs and @shu, all of Stripe

onemogin.com

@gphat

github.com/gphat


Page 37

Questions?

@gphat

Info

Slides

Feedback

Talk

Help me improve.