Practical Methods for Adopting DevOps - Michael Stahnke
Practical Methods for Adopting DevOps
Michael Stahnke, Director of Engineering Services
@stahnma
• Introduction
• DevOps Overview
• When It’s Bad
• Making It Better – 5 Methods
• Questions
Next Week & PuppetConf Overview
Introduction
• Server architecture
• Infrastructure automation
• Release management
It’s impossible to exceed expectations.
Nobody ever got a high-five for a server staying up.
DevOps Culture
• Lots of press, talks, hype
• Echo chamber
• Does it apply to all of us?
• 10 deploys/day
• Rockstar ninja pirates
• Write your own tools
• Awesome app developers
• Startups
Setting the Stage
• Large company size with competing priorities
• IT teams that don’t sit in the same building
• Average time in role: 2-3 years
• No dev … “application owners”
• Everybody really “nice”
Setting the Stage
• Call the vendor for problems
• Customize for us, our needs are unique
• Success means everything stays the same
• Just looking for approvals
• No time to debate
Setting the Stage
• Enterprise-y: Tivoli, BMC, CA, IBM
• Apps: Oracle RAC, Oracle EBS, WebSphere AS, MQSeries, WebLogic, legacy stuff
• HP-UX, Solaris, AIX, RHEL, VMware
Tools
Infrastructure Team
• Small team – 4 of you quit; that’s 50% of the team
• Information is tribal – What happens when the tribe leaves?
• 16-hour days for some admins
• Server team blamed for EVERYTHING
This Silo? What Silo?
Infrastructure Team
• No transfers
• Windows team doesn’t work with Unix team, who doesn’t talk to Storage team
• Exiled to different buildings
This Silo? What Silo?
Is it all bad?
• Industry experience
• Be the best sysadmin
• Open Source tooling
• LDAP
Motivations
Databases Rule the World
• Variance
• 25 db servers
• 24 unique configurations
• At least 6 cluster pairs
Reduce the standard deviation value. Then raise the mean.
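One way to put a number on this kind of variance is to fingerprint each server's configuration and count how many distinct fingerprints exist. The sketch below is illustrative only: the host names, configuration keys, and the `variance_report` helper are hypothetical and not taken from the talk, and a real inventory would come from whatever tooling you already have.

```python
import hashlib
import json
from collections import Counter


def config_fingerprint(config: dict) -> str:
    """Return a stable hash of a config dict, independent of key order."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def variance_report(fleet: dict) -> dict:
    """Summarize how many distinct configurations exist across a fleet."""
    counts = Counter(config_fingerprint(cfg) for cfg in fleet.values())
    most_common = counts.most_common(1)[0][1] if counts else 0
    return {
        "servers": len(fleet),
        "unique_configs": len(counts),
        "on_most_common": most_common,
    }


# Hypothetical inventory: db01 and db02 match, the other two differ.
fleet = {
    "db01": {"os": "RHEL", "shmmax": 68719476736, "fs": "ext3"},
    "db02": {"fs": "ext3", "os": "RHEL", "shmmax": 68719476736},
    "db03": {"os": "RHEL", "shmmax": 4294967296, "fs": "ext3"},
    "db04": {"os": "Solaris", "shmmax": 4294967296, "fs": "ufs"},
}

report = variance_report(fleet)
print(report)
```

Tracking `unique_configs` over time gives a simple signal of whether standardization work is actually reducing deviation.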
Method 1
Reduce variability.
Method 1
Standardize & automate what’s variable or inconsistent to reduce deviation.
Do you want systems to have planned or unplanned downtime? Because you’re going to get one or the other.
Results
• Failing in known ways
• Better uptime
• Fewer configurations
• Lots of tickets on disk management
Collaboration?
Collaboration
• Sudo for disk mgmt
• Shared responsibility for outages
• Proper classification (production, test, dev)
• If I’m awake, you’re awake
Collaboration
• Share the pain
• People who carry the pager will make better design decisions when not on-call
Method 2
Stop. Collaborate. Listen. (Break down silos)
Method 2
Integrate your Ops engineers early in the ALC and collaborate on interfaces & handoffs
Root Cause Analysis Meetings
• What’s the root cause of the outage?
• What’s the exact impact?
• What do we do differently next time?
Method 3
Shout your failures. (Honesty builds credibility)
Method 3
Listen and learn about failures so they aren’t repeated.
Root Cause Analysis Meetings
• Shout out your failure
• Hand out a written report & timeline of what happened
• Remediation plan
• Meeting lasts 10 minutes
Setbacks
• Everything isn’t better overnight
• Time passes, new issues
• Lots of failures
• Config Mgmt messes
Don’t expect perfection
Experiments
• What can you try while you’re fixing process?
• Scrum teams
• SDLC
• Inventory service
• Puppet
Method 4
Experimentation matters. (Failures are still valid data)
Method 4
This means Ops engineers need dev/test environments for monitoring & tooling
Periodic Review
• Find biggest pain points – optimize
• Improving anything that isn’t the biggest bottleneck in a process is a wasted effort!
• Drive value up the stack
• Find metrics people understand
• (Use Puppet)
The cost of retiring any technology is much higher than introducing it.
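The point above about the biggest bottleneck can be made concrete with a small sketch: measure how long each stage of a process takes, then find the stage that dominates total lead time. The stage names and hours below are made-up illustrations, not figures from the talk, and `biggest_bottleneck` is a hypothetical helper.

```python
def biggest_bottleneck(stage_hours: dict) -> tuple:
    """Return (stage name, share of total lead time) for the slowest stage."""
    if not stage_hours:
        raise ValueError("stage_hours must not be empty")
    stage = max(stage_hours, key=stage_hours.get)
    total = sum(stage_hours.values())
    return stage, stage_hours[stage] / total


# Hypothetical measurements for a server request, in hours per stage.
stage_hours = {
    "request approval": 32.0,
    "provision server": 16.0,
    "install database": 12.0,
    "hand off to app owner": 4.0,
}

stage, share = biggest_bottleneck(stage_hours)
print(f"Biggest bottleneck: {stage} ({share:.0%} of lead time)")
```

In this made-up example, automating the database install would save at most 12 of 64 hours, while approvals account for half the wait, which is why the review starts with the largest stage.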
Learning
• Python, Ruby, Puppet, git, REST, APIs, Web Services, Go, Containers
• Retrospectives
• Daily standups
• Lead by example
Are we satisfied with how this works, or are we trying to make it better?
Improvements
• Influence other teams
• Automation help
• Monitoring
• Design
• Requests to transfer to your team
Method 5
Solve causes not symptoms. (Continuous improvement)
Method 5
Focus Ops Engineers on proactive fixes to root causes, not fighting the symptoms
Recap
• Reduce variability.
• Stop. Collaborate. Listen.
• Shout your failures.
• Experimentation matters.
• Solve causes not symptoms.