The Two Sides of Google Infrastructure for Everyone Else (without introducing more risk)

Gareth Rushgrove, Puppet

Uploaded by gareth-rushgrove on 10 January 2017


TRANSCRIPT

@garethr

Introduction

A strange format for a talk

This is a debate

I’ll be debating both sides

Taking opposing viewpoints on the same issue as a way of exploring it in depth

The talk is split into two parts: a For part and an Against part

I’d like to explore:
- Technical practice evolution
- How we adopt software
- The organisational context

This house believes…

Successful companies will look like Google in the future, so we should adopt Google-like software and practices today

Important disclaimer: I’ve never worked for Google

For

You’re probably:
1. Struggling with distributed systems
2. Missing out on machine learning
3. Wondering how to scale operations

Google have a 10+ year head start

They publish research that influences our industry

- MapReduce
- Chubby
- Borg

Google release (and inspire) software we use

Go

- GFS = HDFS
- BigTable = HBase
- Protocol Buffers = Thrift or Avro (serialization)
- Stubby = Thrift or Avro (RPC)
- ColumnIO = Parquet
- Dremel = Impala
- Omega = Mesos
- Blaze = Pants or Buck
- FlumeJava = Crunch
- Logsaver = Scribe or Flume
- Millwheel = Storm or Samza?
- Borgmon/Monarch = Graphite
- Dapper = Zipkin

Source: 2014, from @avibryant, @joshwills, @skamille, @marius, @wickman

We have a term for this: #GIFEE

Google Infrastructure for Everyone Else

Distributed systems are hard

Building your own in-house framework is likely a waste of time

From Adrian Colyer, Accel, https://speakerdeck.com/acolyer/making-sense-of-it-all

Kubernetes is the third generation of Google’s cluster management software

The Kubernetes API provides primitives that make doing the right thing easier:
- Orchestration
- Logging
- Configuration
- Self-healing
- Storage
- Load balancing
- Service discovery
- Scaling
- Batch workloads
- Lots more

Exposed via a modern API
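The self-healing and scaling primitives listed above rest on one idea: you declare the state you want, and a controller keeps nudging the actual state towards it. The Python sketch below illustrates that desired-state control loop in isolation. `Cluster`, `reconcile` and the `web-N` replica names are hypothetical stand-ins; this is not the Kubernetes API or any real controller's code, which works against the API server rather than an in-memory set.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for a cluster running copies of one app."""

    running: Set[str] = field(default_factory=set)
    next_id: int = 0

    def start_replica(self) -> str:
        name = f"web-{self.next_id}"
        self.next_id += 1
        self.running.add(name)
        return name

    def stop_replica(self, name: str) -> None:
        # Also used below to simulate a replica crashing
        self.running.discard(name)


def reconcile(cluster: Cluster, desired_replicas: int) -> None:
    """One pass of a desired-state control loop: converge actual towards desired."""
    actual = len(cluster.running)
    if actual < desired_replicas:
        # Self-healing / scale-up: start whatever is missing
        for _ in range(desired_replicas - actual):
            cluster.start_replica()
    elif actual > desired_replicas:
        # Scale-down: stop the surplus (sorted only to make the choice deterministic)
        for name in sorted(cluster.running)[desired_replicas:]:
            cluster.stop_replica(name)


if __name__ == "__main__":
    cluster = Cluster()
    reconcile(cluster, desired_replicas=3)
    cluster.stop_replica("web-1")  # simulate a failure
    reconcile(cluster, desired_replicas=3)  # the loop replaces the lost replica
    print(sorted(cluster.running))
```

The operator only ever states "three replicas"; recovering from the simulated failure is the loop's job, which is the sense in which the primitive makes doing the right thing easier.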

Machine learning is going to be massive

“Soon We Won’t Program Computers. We’ll Train Them Like Dogs”

TensorFlow is an open source software library for numerical computation

- Nearest neighbour
- Linear regression
- Recurrent neural networks
- Multilayer perceptron
- Lots more

Introductory ML docs

“How do I do devops?”

Everyone, ever

Google explain how they work too

“SRE: Have software engineers do operations”

Dan Luu, ex-Google

http://danluu.com/google-sre-book/

[Diagram: Dev, SRE, Ops]

From http://web.devopstopologies.com/ by Matthew Skelton

The familiar:
- Capacity planning
- Performance
- Change management
- Monitoring

The unfamiliar:
- Error budget
- Strong software engineering skills
- 50% operations work cap
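An error budget turns an availability target into an allowance of unreliability: with a 99.9% SLO, the remaining 0.1% of a window is budget that product and SRE teams can spend on shipping changes. The sketch below assumes a time-based SLO measured in minutes over a fixed window, an assumed policy that releases pause once the budget is exhausted, and operational work tracked in hours for the 50% cap. The names `ErrorBudget`, `can_release` and `over_ops_cap` are hypothetical, not a Google API, and real SLOs are often request-based rather than time-based.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Time-based error budget (hypothetical model, not a Google API)."""

    slo: float           # availability target, e.g. 0.999 for "three nines"
    window_minutes: int  # length of the measurement window

    def allowed_downtime_minutes(self) -> float:
        # The budget is everything the SLO does not promise: (1 - SLO) * window
        return (1.0 - self.slo) * self.window_minutes

    def remaining_minutes(self, observed_downtime_minutes: float) -> float:
        # Negative values mean the budget has been overspent
        return self.allowed_downtime_minutes() - observed_downtime_minutes

    def can_release(self, observed_downtime_minutes: float) -> bool:
        # Assumed policy: keep shipping features while budget remains
        return self.remaining_minutes(observed_downtime_minutes) > 0


def over_ops_cap(ops_hours: float, total_hours: float, cap: float = 0.5) -> bool:
    """True if operational work exceeds the cap (50% by default) of an engineer's time."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return ops_hours / total_hours > cap


if __name__ == "__main__":
    budget = ErrorBudget(slo=0.999, window_minutes=30 * 24 * 60)  # 30-day window
    print(f"Allowed downtime: {budget.allowed_downtime_minutes():.1f} minutes")
    print("Can release after 10 minutes down:", budget.can_release(10))
    print("Can release after 50 minutes down:", budget.can_release(50))
    print("Over ops cap at 25 of 40 hours:", over_ops_cap(25, 40))
```

With a 99.9% SLO over 30 days (43,200 minutes) the budget works out to 0.001 × 43,200 = 43.2 minutes, so an outage of 50 minutes overspends it and, under this assumed policy, halts feature releases.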

A growing ecosystem

Friendly vendors

More friendly vendors

Even more nice vendors

Summing up

For

“Infrastructure” is shifting to a higher level of abstraction

It’s fine to just be a consumer

You should be standing on the shoulders of giants

You should be standing on the shoulders of

Against

Your organisation doesn’t look like Google

YOUR ORGANISATION DOESN’T LOOK LIKE GOOGLE

Could your organisation look like Google?

How many employees do you have? Google have about 60,000.

What proportion of your organisation are software engineers or operations?

50 percent? (Based on the Google annual report, December 2014)

How much do you pay software engineers?

Data from Glassdoor, June 2016, based on 14k salaries

The $3 million engineer?

Build your own chips?

Could your organisation really look like Google?

“So much of the information in the SRE book makes PERFECT sense if you’re Google”

John Vincent, Ops Hero

The reality outside Google

<1% of US workers are software engineers or programmers

US Bureau of Labor Statistics, 2002: 1,069,000 jobs in a working-age population of 185 million

Strategic vendor relationships

Different application constraints as well as different organisational constraints

“Goal of SRE team isn’t zero outages – SRE and product devs are incentive aligned to spend the error budget to get maximum feature velocity”

Dan Luu, ex-Google

http://danluu.com/google-sre-book/

What if you’re operating an air traffic control system or a nuclear power station? Your goal is probably closer to zero outages

John Vincent, SRE review

“bringing a software engineering perspective to a problem isn’t always the best or right solution”

John Vincent, Ops Hero

Many of Google’s conclusions to operations problems are not unique

“Innovation happens elsewhere” applies as much to Google as to other organisations

Summing up

Against

“If a human operator needs to touch your system during normal operations, you have a bug. The definition of normal changes as your systems grow”

Carla Geisser, Google SRE

What is normal for Google may not be suitable for your organisation

“Your startup with a single-purpose application does not have the luxury of having your operations team say ‘I’m sorry, you’re over your error budget’”

John Vincent, Ops Hero

Conclusions

If all you take away is…

Who votes…

For

Who votes…

Against

Who thinks it’s the wrong question?

Context is king

“The overwhelming power of context”

Charity Majors, Ops Person Extraordinaire

The technology we run, and how we run it, are interlinked

The field of Sociotechnical Systems suggests that all human systems include both a technical system and a social system

https://en.wikipedia.org/wiki/Coevolution#Technological_coevolution

Better outcomes are usually obtained by a reciprocal process of joint optimization, through which both the technical system and the social system change

https://en.wikipedia.org/wiki/Coevolution#Technological_coevolution

“Containers will not fix your broken culture”

Bridget Kromhout, World’s nicest Ops Person

“Awesome culture will not fix your broken containers”

Me, paraphrasing Bridget

We are all collectively evolving the practice of operations

Keep sharing, because it’s a pretty amazing ride

Questions

And thanks for listening