just enough web ops for web developers

Just Enough WebOps for Developers Alexis Lê-Quôc @alq http://www.datadoghq.com


Post on 02-Jul-2015


DESCRIPTION

Datadog is monitoring that does not suck. It's metrics-friendly, people-friendly and developer-friendly monitoring. Learn more at https://www.datadoghq.com/

TRANSCRIPT

Page 1: Just enough web ops for web developers

Just Enough WebOps for Developers

Alexis Lê-Quôc @alq http://www.datadoghq.com

Page 2: Just enough web ops for web developers

@alq

Page 3: Just enough web ops for web developers

@alq

Co-founder DATADOG

Page 4: Just enough web ops for web developers

Datadog is Monitoring that does not suck... as a Service

Page 5: Just enough web ops for web developers

Datadog is Monitoring that does not suck... as a Service

“Metrics made social”

Page 6: Just enough web ops for web developers

People-friendly Monitoring

Page 7: Just enough web ops for web developers

Developer-friendly Monitoring

Page 8: Just enough web ops for web developers

Dev: 930,000
Ops: 350,000

2010 US figures from BLS

Page 9: Just enough web ops for web developers

The New Development

Equation

Page 10: Just enough web ops for web developers

Code + + AWS =

The New Development Equation

Page 11: Just enough web ops for web developers

Code + + AWS = 3 months

The New Development Equation

Page 12: Just enough web ops for web developers

Code + + AWS = 3 months 5 minutes

The New Development Equation

Page 13: Just enough web ops for web developers
Page 14: Just enough web ops for web developers

Web Operations?

Page 15: Just enough web ops for web developers

Code + + AWS = 3 months 5 minutes

The New Development Equation

Page 16: Just enough web ops for web developers

Code + + AWS = 3 months 5 minutes

Web Operations?

The New Development Equation

Page 17: Just enough web ops for web developers
Page 18: Just enough web ops for web developers

Cargo cult Operations

Page 19: Just enough web ops for web developers
Page 20: Just enough web ops for web developers

Common vocabulary between Dev & WebOps?

Page 21: Just enough web ops for web developers

Users

SysAdmin

Page 22: Just enough web ops for web developers

“Come and get it”

“We want root!”

Page 23: Just enough web ops for web developers

Dev

WebOps

Page 24: Just enough web ops for web developers

WebOps

and this is what I do

Page 25: Just enough web ops for web developers

But first an important digression

Page 26: Just enough web ops for web developers

Product Service

Page 27: Just enough web ops for web developers

Service = Code + Infrastructure

Page 28: Just enough web ops for web developers

Service = Product + Access

Page 29: Just enough web ops for web developers
Page 30: Just enough web ops for web developers

Provide access

Page 31: Just enough web ops for web developers

Provide access

Page 32: Just enough web ops for web developers

Provide access

reliable, fast, cheap

Page 33: Just enough web ops for web developers

Provide access

reliable, fast, cheap

Page 34: Just enough web ops for web developers

Provide access

reliable, fast, cheap

24x7 without going crazy

Page 35: Just enough web ops for web developers

24x7 && !crazy

Page 36: Just enough web ops for web developers

Development Models

Page 37: Just enough web ops for web developers
Page 38: Just enough web ops for web developers

Delivery historically not the focus

Page 39: Just enough web ops for web developers

Agile Cycle Delivery

Page 40: Just enough web ops for web developers

Agile Cycle Delivery

Page 41: Just enough web ops for web developers

Agile Cycle Delivery WebOps Cycle

Page 42: Just enough web ops for web developers

WebOps

and this is what I do

Page 43: Just enough web ops for web developers

Dev Release Measure & Log

Monitor

Alert Investigate

Change

Fix || Escalate

WebOps Cycle

Page 44: Just enough web ops for web developers

(Release)

Page 45: Just enough web ops for web developers

Dev Release

Monitor

Alert Investigate

Change

Fix || Escalate

Measure & Log

Page 46: Just enough web ops for web developers

Measure

Purpose: Collect quantitative metrics

Process:
Instrument servers
Instrument code
Instrument SaaS deps
Automate collection

Risks:
Imprecise metric definition
Manual collection
“What does it mean?”

Tools:
System (ganglia, collectd, munin, nagios, etc.)
Code (metrics, statsd)
SaaS (Datadog et al.)
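
To make "Instrument code" on the Measure slide concrete, here is a minimal sketch of emitting a metric in the statsd plain-text line format over UDP. The metric names, the helper names `format_metric` and `send_metric`, and the default host and port are assumptions chosen for illustration, not something the slides prescribe.

```python
import socket

# Type suffixes used by the statsd plain-text line protocol.
_SUFFIX = {"counter": "c", "gauge": "g", "timer": "ms"}


def format_metric(name: str, value, kind: str = "counter") -> str:
    """Build one statsd-style line, e.g. 'web.requests:1|c'."""
    return f"{name}:{value}|{_SUFFIX[kind]}"


def send_metric(name: str, value, kind: str = "counter",
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; host and port are assumed defaults."""
    payload = format_metric(name, value, kind).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Naming the metric and its unit once, in code, is one way to reduce the "imprecise metric definition" risk; sending over UDP keeps collection automatic and cheap for the application.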

Page 47: Just enough web ops for web developers

Dev Release

Monitor

Alert Investigate

Change

Fix || Escalate

Measure & Log

Page 48: Just enough web ops for web developers

Log

Purpose: Collect meaningful, timestamped events

Process:
All the time
In one place
Access for everyone
Discipline

Risks:
TiB of garbage
Non-uniform timestamps
Non-uniform formats

Tools:
log4j et al.
syslog et al.
logstash, splunk
+ Logging-as-a-Service
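
The two format risks on the Log slide (non-uniform timestamps, non-uniform formats) can be addressed in the application itself. The sketch below uses Python's standard `logging` module to emit every record as one JSON object with a UTC ISO-8601 timestamp. The class name `JsonUtcFormatter`, the helper `build_logger`, and the field names are assumptions for illustration.

```python
import json
import logging
from datetime import datetime, timezone


class JsonUtcFormatter(logging.Formatter):
    """Render every record as one JSON object with a UTC timestamp."""

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        return json.dumps({
            "ts": ts.isoformat(timespec="milliseconds"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger whose only handler writes the uniform format.

    With stream=None the handler writes to stderr.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonUtcFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A single formatter shared by every service is one way to get "in one place" aggregation to work later, since the collector never has to guess a timestamp format.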

Page 49: Just enough web ops for web developers

Dev Release Measure & Log

Alert Investigate

Change

Fix || Escalate

Monitor

Page 50: Just enough web ops for web developers

Monitor

Purpose: Watch actionable events & metrics

Process:
Health of the app?
Which metrics for health?
Compute metrics
Metric domain
Access for everyone
Pretty graphs

Risks: Non-actionable metrics

Tools:
graphite, cubism et al.
+ services
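
As an illustration of "Which metrics for health?" and "Compute metrics" on the Monitor slide, the sketch below derives an error rate from two raw counters and maps it onto a health state. The choice of error rate as the health metric, the names `WindowCounts`, `error_rate` and `health`, and the 1% and 5% thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Raw counters collected over one time window."""
    requests: int
    errors: int


def error_rate(w: WindowCounts) -> float:
    """Derived metric: fraction of requests that failed (0.0 if idle)."""
    return w.errors / w.requests if w.requests else 0.0


def health(w: WindowCounts, warn: float = 0.01, crit: float = 0.05) -> str:
    """Map the derived metric onto an actionable state.

    The thresholds are illustrative; real values depend on the service.
    """
    rate = error_rate(w)
    if rate >= crit:
        return "critical"
    if rate >= warn:
        return "warning"
    return "ok"
```

A derived ratio like this tends to be more actionable than either raw counter, because it stays meaningful as traffic grows or shrinks.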

Page 51: Just enough web ops for web developers

Dev Release Measure & Log

Monitor

Investigate

Change

Fix || Escalate

Alert

Page 52: Just enough web ops for web developers

Alert

Purpose: Bring human in the loop when automated fix does not work

Process:
Alert on vital monitors
Add new alerts with new monitors
Compute metrics from alerts
Ruthlessly edit

Risks:
Too many alerts
Become desensitized
Ignore alerts
App crashes for real
Pendulum swings back

Tools:
nagios
+ services
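
One common way to limit the "too many alerts" risk on the Alert slide is to fire only when a threshold has been breached for several consecutive samples, and only once per episode. The slides do not name this technique; the sketch below is an assumption about how "ruthlessly edit" might look in practice, and `sustained_breaches` and its parameters are hypothetical names.

```python
from typing import Iterable, List


def sustained_breaches(values: Iterable[float], threshold: float,
                       min_consecutive: int = 3) -> List[int]:
    """Return the sample indexes at which an alert would fire.

    An alert fires once per episode, at the sample where the value has
    exceeded `threshold` for `min_consecutive` samples in a row.
    """
    fired = []
    streak = 0
    for i, value in enumerate(values):
        streak = streak + 1 if value > threshold else 0
        if streak == min_consecutive:
            fired.append(i)
    return fired
```

Counting the entries of the returned list per day or per week is a simple instance of "compute metrics from alerts".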

Page 53: Just enough web ops for web developers

Dev Release Measure & Log

Monitor

Alert Investigate

Change

Fix || Escalate

Page 54: Just enough web ops for web developers

Fix || Escalate

Purpose: Fix issue or find someone who can

Process:
(fix) capture actions as soon as possible (while or shortly after)
(fix) runbooks
(fix) automate fixes
(escalation) on-call rotation
(escalation) agree on rules

Risks: Burn out

Tools:
PagerDuty
Bug tracker
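
The two escalation items on the Fix || Escalate slide (on-call rotation, agreed rules) can be written down as small, testable functions. The sketch below assumes a fixed-length round-robin rotation and a rule that escalates one level for every 15 minutes an alert stays unacknowledged. Both rules, and the names `on_call` and `escalation_target`, are illustrative assumptions, not how PagerDuty or any other tool works.

```python
from datetime import date
from typing import Sequence


def on_call(rotation: Sequence[str], day: date, start: date,
            shift_days: int = 7) -> str:
    """Who is on call on `day`, for a round-robin rotation that began
    on `start` (assumes day >= start)."""
    shifts_elapsed = (day - start).days // shift_days
    return rotation[shifts_elapsed % len(rotation)]


def escalation_target(chain: Sequence[str], minutes_unacked: int,
                      step_minutes: int = 15) -> str:
    """Move one level up the chain per `step_minutes` without an ack,
    stopping at the last level."""
    level = min(minutes_unacked // step_minutes, len(chain) - 1)
    return chain[level]
```

Keeping the rotation explicit and evenly shared is one modest guard against the burn-out risk the slide names.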

Page 55: Just enough web ops for web developers

Dev Release Measure & Log

Monitor

Alert

Change

Fix || Escalate

Investigate

Page 56: Just enough web ops for web developers

Investigate

Purpose:
Collect evidence
Reconstruct what happened

Process:
Start where/when problem 1st detected
Work your way from there
Capture relevant graphs/logs

Risks:
Missing the starting point
Lagging events/metrics
Low-level events/metrics
Blame game

Tools: Post-mortems
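
"Start where/when problem 1st detected, work your way from there" on the Investigate slide amounts to building one timeline out of several event sources. A minimal sketch, assuming each source is a list of `(timestamp, text)` pairs already using comparable timestamps; the function name `timeline` and the window sizes are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Event = Tuple[datetime, str]


def timeline(sources: Dict[str, List[Event]], detected_at: datetime,
             before: timedelta = timedelta(minutes=30),
             after: timedelta = timedelta(minutes=10),
             ) -> List[Tuple[datetime, str, str]]:
    """Merge events from all sources into one chronological list,
    keeping only those inside a window around the detection time."""
    lo, hi = detected_at - before, detected_at + after
    merged = [
        (ts, source, text)
        for source, events in sources.items()
        for ts, text in events
        if lo <= ts <= hi
    ]
    return sorted(merged)
```

The merged list is the kind of evidence a post-mortem can quote directly, which helps keep the discussion on what happened rather than on who is to blame.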

Page 57: Just enough web ops for web developers

Dev Release Measure & Log

Monitor

Alert Investigate

Fix || Escalate

Change

Page 58: Just enough web ops for web developers

Change

Purpose:
Fewer alerts
Better service

Process:
Change infrastructure, code
Infrastructure == code
Add/Edit monitors & alerts

Risks: ad-hoc changes

Tools: ...
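
One reading of "Infrastructure == code" on the Change slide is that monitor and alert definitions live in version control as data, and changes are computed from them instead of being made ad hoc. The sketch below shows that idea under stated assumptions: the dictionary layout and the name `plan_changes` are invented for the example, and no particular tool is implied.

```python
from typing import Dict, List


def plan_changes(desired: Dict[str, dict],
                 current: Dict[str, dict]) -> Dict[str, List[str]]:
    """Compare the monitor definitions kept in version control
    (`desired`) with what is deployed (`current`) and list what to do."""
    shared = desired.keys() & current.keys()
    return {
        "add": sorted(desired.keys() - current.keys()),
        "remove": sorted(current.keys() - desired.keys()),
        "update": sorted(k for k in shared if desired[k] != current[k]),
    }
```

Reviewing the resulting plan before applying it gives every change to monitors and alerts the same history and review trail as a code change.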

Page 59: Just enough web ops for web developers

WebOps

and this is what I do

Page 60: Just enough web ops for web developers

Dev Release Measure & Log

Monitor

Alert Investigate

Change

Fix || Escalate

Questions? Comments?

@alq