The Leader in
Configuration Audit & Control
Visible Ops and Foundational Controls:
An Eight Year Study Of High Performing IT Organizations
Gene Kim, CISA, CTO, Tripwire, Inc.
10/18/2007
Introductory Questions
■ What about your job causes you to feel uncomfortable?
■ In your interactions with your business peers and management, what situations don’t feel right to you?
Agenda
■ Background on Visible Ops and high performing IT organizations
■ Review 2006 IT Controls Performance Study
• Two key surprises on performance and controls
• 21 foundational controls that had the majority of the impact on IT performance
• Expanded benchmark to 350 IT organizations in 2007
■ Present findings from 2007 IT Controls Performance Study
■ What have we learned about metrics that actually matter?
■ How to improve IT performance in 90 days
The Highest Performing IT Organizations Get Results
[Figure: Operations Metrics Benchmarks, Best in Class Server/SysAdmin Ratios. Axes: # Servers (Size of Operation) and Server/SysAdmin Ratio (Efficiency of Operation). Callout: Best in Class, Ops and Security.]
• Highest ratio of staff for pre-production processes
• Lowest amount of unplanned work
• Highest change success rate
• Best posture of compliance
• Lowest cost of compliance
Source: www.itpi.org
Common Traits of the Highest Performers
Culture of…
Change management
■ Integration of IT operations/security via problem/change management
■ Processes that serve both organizational needs and business objectives
■ Highest rate of effective change
Causality
■ Highest service levels (MTTR, MTBF)
■ Highest first fix rate (unneeded rework)
Compliance and continual reduction of operational variance
■ Production configurations
■ Highest level of pre-production staffing
■ Effective pre-production controls
■ Effective pairing of preventive and detective controls
Source: IT Process Institute
Seven Habits of Highly Effective IT Organizations
They…
1. Have a culture that embraces change management.
2. Monitor, audit, and document all changes to the infrastructure.
3. Have zero tolerance for unauthorized changes.
4. Have specific, defined consequences for unauthorized changes.
5. Test all changes in a preproduction environment before implementing into production.
6. Ensure preproduction environment matches production environment.
7. Track and analyze change successes and failures to make future change decisions.
■ All high performers have created cultures of…
• Change Management
• Causality
• Planned Work
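Habit 7 calls for tracking change successes and failures. As a minimal sketch of what that tracking might look like, the following computes a change success rate and counts unauthorized changes. The `ChangeRecord` type and its field names are assumptions made for illustration; they do not come from the ITPI study or from any particular change management tool.

```python
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    """One implemented change (hypothetical record layout)."""
    change_id: str
    authorized: bool  # True if the change maps to an approved work order
    succeeded: bool   # True if the change worked without causing an incident


def change_success_rate(records):
    """Return the fraction of changes that succeeded, or None if there are none."""
    if not records:
        return None
    return sum(r.succeeded for r in records) / len(records)


def unauthorized_count(records):
    """Return how many changes had no approved work order."""
    return sum(not r.authorized for r in records)
```

Reviewing these two numbers per period is one simple way to feed past outcomes into future change decisions.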
Visible Ops: Playbook of High Performers
■ The IT Process Institute has been studying high-performing organizations since 1999
• What is common to all the high performers?
• What is different between them and average and low performers?
• How did they become great?
■ Answers have been codified in the Visible Ops Methodology
■ The “Visible Ops Handbook” is now available from the ITPI: www.ITPI.org
Visible Ops: Four Steps To Build An Effective Change Management Process
Phase 1: Electrify Fence, Modify First Response
Phase 2: Catch and Release, Find Fragile Artifacts
Phase 3: Establish Repeatable Build Library
Phase 4: Continually improve
• Tripwire enforces the change process.
• Tripwire rules out change as early as possible in the repair cycle.
• Tripwire protects fragile artifacts.
• Tripwire enforces change freeze and prevents configuration drift.
• Tripwire captures known good state in preproduction.
• Tripwire captures production changes that need to be baked into the build.
• Tripwire detects change, which all process areas hinge upon.
[Figure: ITIL / BS 15000 process areas]
Service Design & Management: Security Management; Availability & Contingency Management; Service Level Management; Service Reporting; Capacity Management; Financial Management
Control Processes: Asset & Configuration Management; Change Management
Release Processes: Release Management
Resolution Processes: Incident Management; Problem Management
Supplier Processes: Customer Relationship Management; Supplier Management
Automation
Source: ITPI Visible Ops
Source: IT Infrastructure Library (ITIL) / BS 15000
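Each phase depends on detecting change against a known good state. The sketch below illustrates that general idea by hashing files and comparing two snapshots. It is a generic illustration, not a description of how Tripwire is implemented; the function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(baseline, current):
    """Report files added, removed, or modified relative to the baseline."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(
            path
            for path in baseline.keys() & current.keys()
            if baseline[path] != current[path]
        ),
    }
```

A baseline taken in preproduction and a later snapshot from production could be compared this way to list the changes that need to be explained or baked into the build.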
ITPI Survey: Demographics
          IT Employees   IT Budget
Average   483            $114 million
Min       3              $5 million
Max       7,000          $1,050 million
Surprise #1: How Good The High Performers Are
■ High performers contribute more to the business
• 8 times more projects and IT services
• 6 times more applications
■ When high performers implement changes…
• 14 times more changes
• One-half the change failure rate
• One-quarter the first fix failure rate
■ When high performers manage IT resources…
• One-third the amount of unplanned work
• 5 times higher server/sysadmin ratios
■ When high performers are audited…
• Fewest number of repeat audit findings
High performers also have 3x higher budgets, as measured by IT operating expense as a function of revenue.
Source: IT Process Institute, May 2006
And Security High Performance, Too
■ When top performers have a security breach…
• Loss events are 29% less likely than in medium performers, and 84% less likely than in low performers
• Failure of an automated control to detect the security breach is 60% less likely than in medium performers, and 79% less likely than in low performers
• Time to detect is minutes for top performers, hours for medium performers, and days for low performers
■ Top performers also allocate 3x more budget to security, as a function of IT operational expense
Surprise #2: What The High Performers Do Differently
Top Two Differentiators between Good and Great
1. Systems are monitored for unauthorized changes
2. Consequences are defined for intentional unauthorized changes
[Figure panels: Foundational Controls, High vs Medium; Foundational Controls, Medium vs Low]
Source: IT Process Institute, May 2006
Design Survey: Pick IT Controls
[Figure: ITIL / BS 15000 process areas, repeated from the Visible Ops four-steps slide]
1. We selected the 6 leading BS 15000 areas within ITIL that are conjectured to be “where to start.” These were Access, Change, Resolution, Configuration, Release, and Service Levels.
2. We then selected 63 COBIT control objectives within these areas.
Source: IT Infrastructure Library (ITIL) / BS 15000
Source: COBIT, IT Governance Institute/ISACA
The 63 IT Controls
The resulting controls that we selected were in the following control categories:
• Access Controls: 17 controls
• Change Controls: 13 controls
• Configuration Controls: 7 controls
• Release Controls: 6 controls
• Service Level Controls: 8 controls
• Resolution Controls: 12 controls
Access
• Do you have a formal process for requesting, establishing, and issuing user accounts?
• Do you have an automated means of mapping user accounts to an authorized user?
• For each employee/resource, do you record a list of system access rights?
• Do you audit user accounts to ensure that they map to an authorized employee?
• Do you have procedures to keep authentication and access mechanisms effective?
• Do you have a formal process for suspending and closing user accounts?
• Do you have processes for granting and revoking emergency access to relevant staff?
• Do IT personnel have well-defined roles and responsibilities?
• Do you have an automated process for defining and enforcing user account roles?
• Do user accounts ever allow actions that exceed their specified role?
• Do you monitor accounts to detect when they exceed their specified role?
• Do you rigorously enforce separation of duties between …
Change
• Do you have a formal IT change management process?
• Do you use tools to automate the request, approval, tracking, and review of changes?
• Do you track your change success rate?
• Do you track the number of authorized changes implemented in a given period?
• Do you track how many changes are denied the first time they are considered by the change authority?
• Do you monitor systems for unauthorized changes?
• Are there defined consequences for intentional unauthorized changes?
• Do you have a change advisory board or committee?
• Do you have a change emergency committee?
• Do you use change success rate information to avert potentially risky changes?
• Do you distribute a forward schedule of changes to relevant personnel?
• Do you conduct regular audits of successful, unsuccessful, and unauthorized changes?
• Are changes thoroughly tested …
Configuration
• Do you have a formal process for IT configuration management?
• Do you have an automated process for configuration management?
• Do you have a configuration management database (CMDB)?
• Does the CMDB describe relationships and dependencies between the configuration items (infrastructure components)?
• Does your configuration management database specify which business service each configuration item supports?
• Are you able to provide relevant personnel with correct and accurate information on the present IT infrastructure configurations, including their physical and functional specifications?
• Do you monitor and record the time it takes to correct configuration variance?
Release
• Do you have a standardized process for building software releases?
• Do you use tools to automate the build of new releases of software applications?
• Do you use automated software-distribution tools?
• Do you test all releases before rollout to a live environment?
• For release testing purposes, do you maintain an identical testing environment to your production environment?
• Do you have a definitive software library (DSL)?
Service Level
• Do you have someone (a service level manager) who is responsible for monitoring and reporting on the achievement of the specified service performance criteria?
• Do you have a service catalog?
• Do you regularly review your service catalog?
• Do you regularly review service level agreements?
• Do you have a service improvement programme?
• Do you ever renegotiate the defined consequences in the service level agreement?
• Do you have a formal process to define service levels?
• Does your service level agreement cover ALL of the following aspects: availability, reliability, performance, growth capacity, levels of user support, continuity planning, security, and minimum level of system functionality?
Resolution
• Do you have a defined process for managing incidents?
• Do you have an automated process for managing incidents?
• Do you track the percentage of incidents that are fixed on the first attempt (first fix rate)?
• Do you use a knowledge database of known errors and problems to resolve incidents?
• During an incident, do you ever rebuild rather than repair?
• Do you have a defined process for managing problems?
• Do you have an automated process for managing problems?
• Do you follow a structured method for analyzing and diagnosing problems?
• Do you have a defined process for managing known errors?
• Do you proactively identify problems and known errors before incidents occur?
• Is there integration between your problem management and change management processes?
• Is there integration between your problem management and configuration management …
The 21 Foundational Controls
Access
■ Do you have a formal process for requesting, establishing, and issuing user accounts?
■ Do you have an automated means of mapping user accounts to an authorized user?
■ Do IT personnel have well-defined roles and responsibilities?
■ Do you regularly review logs of violation and security activity to identify and resolve incidents of unauthorized access?
Change
■ Do you track your change success rate?
■ Do you monitor systems for unauthorized changes?
■ Are there defined consequences for intentional unauthorized changes?
■ Do you use change success rate information to avert potentially risky changes?
Config
■ Do you have a formal process for IT configuration management?
■ Do you have an automated process for configuration management?
■ Are you able to provide relevant personnel with correct and accurate information on the present IT infrastructure configurations, including their physical and functional specifications?
Release
■ Do you have a standardized process for building software releases?
■ For release testing purposes, do you maintain an identical testing environment to your production environment?
■ Do you have a definitive software library (DSL)?
Service Levels
■ Do you regularly review your service catalog?
■ Do you have a service improvement program?
■ Do you have a formal process to define service levels?
Resolution
■ Do you track the percentage of incidents that are fixed on the first attempt (first fix rate)?
■ Do you use a knowledge database of known errors and problems to resolve incidents?
■ During an incident, do you ever rebuild rather than repair?
■ Do you have a defined process for managing known errors?
High, Medium and Low Performing Clusters
The ITPI identified 21 “foundational controls” and used cluster analysis techniques to identify the relationship between the use of foundational controls and the performance indicators of the companies studied.
1. Three clusters emerged.
2. Each wedge in the pie represents one of the foundational controls. Each bar represents the percentage of the cluster members that responded ‘yes’ to that control.
3. Almost all of the members of the high performing cluster had all of the foundational controls.
4. Almost all of the members of the low performing cluster had no controls, except for access and resolution.
[Figure: one chart per cluster: Low Performer, Medium Performer, High Performer]
Source: IT Process Institute, May 2006
2007: Larger Repeat Benchmark With Even More Fascinating Results
■ In 2007, the ITPI and the Institute of Internal Auditors repeated the benchmark to answer the following questions:
• Are the results still valid for a larger sample?
• Can the set of foundational controls be reduced even further?
■ 350 organizations were benchmarked
■ There were two even bigger surprises in the study
N = 350
          IT Employees   IT Budget
Average   587            $236 million
Min       2              $1 million
Max       3,500          $15 billion
Source: IT Process Institute/Institute of Internal Auditors (May 2007)
Surprise #1: Type 1 Organizations: 3 Foundational Controls
■ Three essential foundational controls explain 60% of performance
• Defined consequences for intentional, unauthorized changes
• A defined process to detect unauthorized access
• A defined process for managing known errors
[Figure: chart comparing Group1 through Group5]
These controls seem familiar…
The controls indicate a culture of change management and a culture of causality!
High Performers Can Bound Maximum MTTR
But look at the huge differences for large outages!
Source: IT Process Institute, May 2006
(Large outages required 25-50 people to fix!)
High Performers Have Fewer Repeat Audit Findings
■ High performers not only have fewer repeat audit findings, but also spend less time on audit and compliance activities
Percentage of Emergency IT Changes
Source: IT Process Institute/Institute of Internal Auditors (May 2007)
Visible Ops Phase 1: Ungoverned Change
Our prediction is that as failed changes and unauthorized changes increase, unplanned work increases at a growing rate, to the point where overtime or additional staff are required. Note that the total number of changes does not have to increase for this to occur.
[Figure: curves over time for Unplanned work (Unplanned work > 100%), Change rate, and Failed changes or number of unauthorized changes]
Source: ITPI Visible Ops
How Do You Electrify Fence?
■ Must have a report that shows management that all production changes are authorized
• What changes map to authorized and approved work orders?
• What changes do not match expected changes?
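The two questions above amount to reconciling detected changes against approved work orders. A minimal sketch of such a report follows, assuming a hypothetical `DetectedChange` record and a dictionary of approved work orders keyed by ID. These names and the data layout are illustrative assumptions, not part of Visible Ops or of any specific tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DetectedChange:
    """A change observed in production (hypothetical record layout)."""
    host: str
    path: str
    work_order: Optional[str]  # work-order ID cited by the implementer, if any


def reconcile(detected, approved_orders):
    """Split detected changes into (authorized, unauthorized) lists.

    approved_orders maps a work-order ID to the set of (host, path)
    targets that the order was approved to change.
    """
    authorized, unauthorized = [], []
    for change in detected:
        targets = approved_orders.get(change.work_order, set())
        if (change.host, change.path) in targets:
            authorized.append(change)
        else:
            unauthorized.append(change)
    return authorized, unauthorized
```

In this sketch a change counts as unauthorized when it cites no work order, cites an unknown one, or touches a target the cited order did not cover. The management report is then the pair of lists, with the goal that the second one stays empty.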
What Happens When You Touch The Fence?
■ High-performing IT organizations had some common processes for handling unauthorized change
• Making the engineering team own the controls: “We just detected an unauthorized change – you have four hours to retroactively document your cowboy change, otherwise we mobilize security.”
• Deterrent and cultural controls: e.g., wall of shame, “two strikes and you’re out”
■ Auditors love it when management owns the controls
• Preventive policies
• Detective controls showing policies are being enforced
• Documentation of corrective actions, showing deterrent controls
Visible Ops Phase 1: Stabilized Patient
The better alternative is to reduce or eliminate all unauthorized or failed change. One purpose of the survey is to find which controls are best at achieving this objective. If you can reduce unauthorized change, we predict unplanned work will start to decrease.
[Figure: curves over time for Unplanned work, Change rate, and Failed changes or number of unauthorized changes]
Source: ITPI Visible Ops
Visible Ops Phase 1: Increasing Auditability
We predict that improved control over change will also reduce compliance costs and effort, as well as increasing auditors’ perception of effective IT controls.
[Figure: curves over time for Control over change, Auditors’ perception of assurance, % of time spent on compliance activities, and Time spent on audit prep and liaising]
Source: ITPI Visible Ops
Visible Ops Phase 1: Operational Excellence And Strategic Excellence
[Figure: curves over time for Business satisfaction with IT, Ability to fund IT projects, Unplanned work, and Completion of planned work]
Source: ITPI Visible Ops
Visible Ops Phase 2: Drifting Configurations
Once change management is under control, we then predict that as IT configurations diverge from their desired state, unplanned work will increase at a growing rate because changes will not be consistently successful.
[Figure: curves over time for Change success rate, Unplanned work, # of unique configurations, and Mastery of each configuration]
Source: ITPI Visible Ops
Visible Ops Phase 2: Find Fragile Artifacts
If you can reduce the configuration variance, unplanned work will decrease as the change success rate increases. This VEESC study also investigates which controls are most effective at achieving this objective.
[Figure: curves over time for Mastery of each configuration, Change success rate, # of unique configurations, and Unplanned work]
Source: ITPI Visible Ops
Does Research Validate The Theory?
(% high / % medium: percentage of high / medium performers with the specified control)
Control   % high   % medium   Difference   Foundational Control
C23       93       21         72           Do you monitor systems for unauthorized changes?
C24       93       32         61           Are there defined consequences for intentional unauthorized changes?
C31       100      42         58           Do you have a formal process for IT configuration management?
C32       79       21         58           Do you have an automated process for configuration management?
C20       86       32         54           Do you track your change success rate?
C36       100      47         53           Are you able to provide relevant personnel with correct and accurate information on the present IT infrastructure configurations?
Note that virtually every top performer monitors their systems for unauthorized changes (Visible Ops: Electrify the Fence)…
…and has defined consequences for unauthorized changes (Visible Ops: Create Consequences for Touching the Fence)!
Organizations that have these controls are almost always great.
Source: IT Process Institute, May 2006
Top 5 Mistakes IT Management Makes
Not locking down change
“We can’t – we won’t be able to get anything done.”
Not electrifying the fence
“We don’t need to – we trust our own people.”
The continual desire for a technical solution
Technology is easier to justify and implement than people and process improvements
Reward personal heroics instead of repeatable discipline
“If one person can save the entire boat, one person can probably sink it, too.”
The biggest failure is accountability while the biggest obstacle is a commitment to the process
The only acceptable number of unauthorized changes is “zero”
Source: The Visible Ops Handbook, © IT Process Institute
Resources
■ ITPI Visible Ops Handbook
• Kevin Behr, CTO, IP Services, Inc.
• Gene Kim, CTO, Tripwire, Inc.
• George Spafford, Spafford Global Consulting
■ ITPI IT Controls Performance Study
• Gene Kim, CTO, Tripwire, Inc.
• Kurt Milne, ITPI
• Dr. Dan Phelps, Florida State University
• Dr. Grant Castner, University of Oregon
■ Get your copy of VisOps: tripwire.com/visibleops
■ More Info: Email: [email protected]
Visible Ops: Achieving Breakthrough Results In 30+90 Days
Gene Kim, CISA, CTO, Tripwire, Inc.
Our Current Reality When Core Conflict Is Not Resolved: Does This Feel Familiar?
[Diagram: numbered nodes as shown on the slide]
90. * Business is dissatisfied with IT
25. * Too many projects, some of dubious ROI
403. Too much "bad multitasking" causes planned work to go slower
39. * IT is perceived as not having sufficient sense of urgency
35. * IT fails to deliver on project commitments
22. * IT staff quality of life and retention issues
46. * IT does not have capacity to innovate and help the business win
45. * Infrastructure improvement projects always backburnered (capex, opex)
32. * Tools are not fully implemented (benefits never realized)
73. Fragile/mysterious infrastructure never replaced
47. * IT has difficulty creating business justification for needed projects
46. * IT scarce resources cannot complete planned work
31. * Projects never finish (always 80% complete)
24. * Projects are chronically late
30. * Changes to mysterious infrastructure require long and protracted resolution
30a. * We don’t know which changes will cause outages or adverse service impact
30b. Changes are not adequately tested
42. * Too many changes cause outages/incidents/failures (despite being "managed")
44. * Proposed fixes often don’t work
43. * Outages take too long to repair
21. * IT has to do urgent and unplanned work at random times
67. Business significantly impaired because of outage
37. * Process to request change takes longer than time to do work
67. There is a backlog of changes that need to be implemented
69. Doing things right (testing, planning) takes too long
46. * IT does not have insight into the true costs of doing a change, service request, incident, etc.
81. IT cannot effectively say "no" to the business
There is pressure to complete work/changes quickly
There is pressure to complete work/changes more carefully
Tension between Dev, Operations, and Security, Project Mgmt
41. * Business will take path of least resistance to get things done (bypassing planning, prioritizing, coordination) (control)
38. Poor planning or research cause urgent work (e.g., firewall rule change)
53. * Auditors find IT general control deficiencies
Source: Tripwire, Inc.
We Have Identified A Chronic Core Conflict That Virtually Every IT Organization Faces
[Diagram: core conflict]
Ensure that IT contributes to the business goals
• Respond to urgent business needs: Complete work/changes quickly
• Provide a stable and predictable IT production environment: Complete work/changes slowly to ensure successful outcome
Source: Tripwire, Inc.
Success Stories: From David Allen, ELCA Board of Pensions
“We need to improve our ability to deliver changes to infrastructure in a reliable and rapid manner.
The techniques which Gene uses in his TOC/Visible Ops program are general-purpose thinking tools that are extraordinarily powerful.
“After just three sessions, our team had a sense of excitement when they realized that we might actually be able to significantly improve our change process and make our work environment a better place.”
Success Stories: From Dave Colburn, Coldwater Creek
“Our time with Gene Kim at Tripwire using their methodology around TOC/Visible Ops has been invaluable!
“This approach impresses me as the missing link: it is the first approach I have seen that allows fast moving organizations to gain immediate momentum with an ITIL initiative, while focusing on the critical prerequisites to demonstrating quick wins and producing supportive metrics to drive continued ITIL implementation, to support the final implementation of the Service Delivery processes.”
High Performers Leverage Configuration Audit
ITPI High Performers have…
Monitoring controls to find undocumented and unauthorized change
Zero tolerance for unauthorized change & consequences when they are detected
and they enjoy…
Availability
■ 14 times more changes
■ One-half the failure rate
Compliance & Security
■ Fewest number of audit findings
■ Fewest number of security loss events
Effectiveness/Efficiency
■ 8 times more projects
■ 5 times higher server / system admin ratios
SOURCE: IT Process Institute; IT Controls Performance Study, May 2006