Problem Management Practitioners Forum Thursday January 19, 2012 Jon Dowell Jorge A. Wong


Page 1:

Problem Management Practitioners Forum

Thursday, January 19, 2012

Jon Dowell
Jorge A. Wong

Page 2:

Agenda

o Housekeeping & Introductions
o Define a successful investigation
o Makeup of a successful Problem Manager
o Proactive monitoring of automated alerts for trends/patterns
o Impact of Change Management on PbM
o Feedback & next steps

Page 3:

Housekeeping & Introductions

Fire & Washrooms
Name, Company, & Experience

Jon Dowell

Page 4:

Jon Dowell
• Senior Consultant with KSLD Consulting.
• 15 years of experience solving I.T. mysteries.
• Facilitation and critical thinking during:
  o Major Incidents
  o Problem investigations
  o Project quality assessments prior to go-live
  o Project warranty periods
• Training and mentoring
  o Critical thinking
  o Root cause analysis
  o Impact assessments
  o Potential risks associated with requests for change

KSLD Consulting specializes in I.T. Problem Management and problem solving for today’s busy world.

Page 5:

Jorge Wong
• Over 13 years in IT with Enmax and Accenture
  o Senior Systems Analyst
  o Applications Support Team Lead
  o Contact Center Technology Team Lead
  o Service Delivery Lead
  o Relationship Manager
  o Problem Manager
• ITIL background
• Focuses on reactive and proactive problem management
• Facilitates and conducts problem investigations with the cause mapping analysis method to capture the complete investigation to:
  o Assess impact and cost
  o Identify root cause(s)
  o Find the best solution(s) to prevent recurrence
• Reviews and analyzes data from incident management and pinpoints the problems that will give the best results once resolved.

Page 6:

Define a successful investigation

Jorge A. Wong

Page 7:

• Successful Problem Investigations
• Must first understand:
  • Why do we have problem investigations?
    • An investigation should be conducted to diagnose the root cause of the problem.
  • How long should it take?
    • The speed and nature of the investigation will vary depending upon the impact, severity, and urgency of the problem.
  • What resources are required?
    • The appropriate level of resources and expertise should be applied to finding a resolution corresponding to the priority and service levels targeted.
• Then, use your problem investigation toolkit.
  • There are many problem analysis, diagnosis, and solving techniques available, and much research has been done in this area.

Page 8:

• Successful Problem Investigations
• Some of the most useful and frequently used techniques include:
  • Chronological analysis
    • Timeline of events
  • Pain Value Analysis
    • What level of pain has been caused to the organization/business by these problems
  • Kepner and Tregoe
    • Deeper rooted problems
  • Cause Mapping
    • Deeper rooted problems
  • 5 Whys
    • Cause and effect
  • Brainstorming
    • Gather together the relevant people and brainstorm the problem
  • Ishikawa Diagrams
    • Document causes and effects, which can be useful in helping identify where something may be going wrong or could be improved
  • Pareto Analysis
    • Separate important potential causes from more trivial issues
• Use what is appropriate and what you feel comfortable with.
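
As a rough illustration of the Pareto Analysis technique listed above, the sketch below ranks incident cause categories by count and keeps the smallest leading set that accounts for a chosen share of all incidents. The function name, the category names, the sample counts, and the 80% cutoff are assumptions made for illustration; none of them come from the forum material.

```python
from collections import Counter


def pareto_vital_few(causes, cutoff_percent=80):
    """Return the most frequent causes that together reach cutoff_percent of all incidents."""
    counts = Counter(causes)
    total = sum(counts.values())
    vital, running = [], 0
    for cause, count in counts.most_common():
        vital.append((cause, count))
        running += count
        # Integer comparison avoids floating-point rounding at the cutoff.
        if running * 100 >= cutoff_percent * total:
            break
    return vital


# Hypothetical incident log: one cause category per closed incident.
incidents = ["change"] * 10 + ["capacity"] * 6 + ["hardware"] * 2 + ["unknown"] * 2

for cause, count in pareto_vital_few(incidents):
    print(f"{cause}: {count}")
```

In this made-up log, two of the four categories cover 16 of the 20 incidents, so those two would be the first candidates for a problem investigation.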

Page 9:

• Successful Problem Investigations
  o End results
    o Expected and desired outcome realized
    o Root cause(s) identified and/or validated
    o Corrective measure(s) identified and/or implemented
    o Effective use of resources throughout the investigation
  o Which means
    o Increased benefits to the business and the IT organization of:
      o Decreased downtime
      o Increased business satisfaction
      o Decreased amount of IT resources spent on incident management
  o Other benefits
    o Influences future cost avoidance
    o CMDB
    o Improved IT service quality
    o Incident volume reduction
    o Permanent solutions
    o Improved organizational learning
    o Better first time fix rate at the Service Desk
    o Improves existing processes and procedures
    o Happy Staff, including Problem Manager!

Page 10:

What makes a successful Problem Manager?

Jon Dowell

Page 11:

Root Cause

Event

Page 12:

Root Cause

Event → Why?

Page 13:

Root Cause

Event → Why? → Technical Failure

Page 14:

Root Cause

Event → Why? → Technical Failure → Why?

Page 15:

Root Cause

Event → Why? → Technical Failure → Why? → People Failure

Page 16:

Root Cause

Event → Why? → Technical Failure → Why? → People Failure → Why?

Page 17:

Root Cause

Event → Why? → Technical Failure → Why? → People Failure → Why? → Process Failure

Page 18:

Root Cause

Event → Why? → Technical Failure → Why? → People Failure → Why? → Process Failure → Why?

Page 19:

Root Cause

Event → Why? → Technical Failure → Why? → People Failure → Why? → Process Failure → Why? → Root Cause?
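
The slides above add one "Why?" at a time, moving from the event through technical, people, and process failures. One possible way to record such a chain is sketched below. The `Cause` class and `walk_to_root` function are hypothetical names introduced for illustration, not part of any tool mentioned in the forum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cause:
    description: str
    why: Optional["Cause"] = None  # the answer to "Why?" for this item, if known


def walk_to_root(event):
    """Follow the 'Why?' links from the event down to the deepest recorded cause."""
    chain = [event.description]
    node = event
    while node.why is not None:
        node = node.why
        chain.append(node.description)
    return chain


# The chain from the slides, built from the deepest cause outward.
process = Cause("Process Failure")
people = Cause("People Failure", why=process)
technical = Cause("Technical Failure", why=people)
event = Cause("Event", why=technical)

chain = walk_to_root(event)
print(" -> ".join(chain))
```

The last entry in the chain is only the deepest cause recorded so far. As the final slide's "Root Cause?" suggests, whether it is the real root cause still has to be validated.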

Page 20:

Kepner Tregoe has a process called Incident Mapping that performs a similar process.

Page 21:

ThinkReliability also has a process called "Cause Mapping".

Page 22:

Brainstorm traits for a good Problem Manager…

Page 23:

Are these good Problem Managers?

Page 24:
Page 25:

What about these individuals…?

Page 26:
Page 27:

What are the traits of a Problem Manager?
• Listening
  o Ability to listen
  o Attention to detail… while listening
• Questioning
  o Open questions… to allow the story to flow
  o Closed questions… to confirm facts/details
  o Ability to ask tough questions and not be sidetracked by misdirection.
• Leadership
  • Ability to lead teams, resolve conflict, and drive resolution.
  • Prioritization with a focus on business, not technical, impact.
  • Strong organization & time management abilities.
• Business writing skills

And…
• Understanding of business terminology and concepts.
• Understanding of basic technical concepts, architecture, and methodologies.

Page 28:

Helpful educational opportunities?
• Dale Carnegie
• Kepner Tregoe
  • Problem Solving & Decision Making
  • Incident Mapping
• ThinkReliability
  • Cause Mapping
• FranklinCovey
  • Focus
• General Business Writing

Page 29:

Proactive monitoring of automated alerts for trends/patterns

Jorge Wong

Page 30:

Alerts and monitoring, why?
• Identify future problems.
• Prevent problems from happening.
• Manage technology infrastructure based on business.
• Anticipate and meet the needs of the business.
• Effectively manage an increasingly intricate and complex infrastructure.
• Predict and solve problems before they affect business.
• According to industry analyst reports, IT still discovers about 70% of problems through the service desk.

Page 31:

• Alerts and monitoring, why?
  o Reactive to Proactive
  o End-user experience
  o Application performance and availability
  o Service level commitments
  o Outages
  o Cost avoidance
  o Resources
  o Productivity
  o Efficiency
  o Capacity
  o Predictive analytics
  o MTTR
  o MTBF
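
MTTR and MTBF in the list above can be computed from a simple outage log. The sketch below uses one common pair of definitions, which is an assumption since the slides do not define the terms: MTTR as total downtime divided by the number of outages, and MTBF as total uptime divided by the number of outages. The function name and the sample outage times are hypothetical.

```python
from datetime import datetime, timedelta


def mttr_and_mtbf(outages, period_start, period_end):
    """Return (MTTR, MTBF) for non-overlapping (down_at, restored_at) pairs inside the period."""
    if not outages:
        raise ValueError("no outages recorded in the period")
    downtime = sum((restored - down for down, restored in outages), timedelta())
    uptime = (period_end - period_start) - downtime
    count = len(outages)
    return downtime / count, uptime / count


# Hypothetical 30-day reporting period with two outages (2 hours and 1 hour).
outages = [
    (datetime(2012, 1, 5, 10, 0), datetime(2012, 1, 5, 12, 0)),
    (datetime(2012, 1, 20, 8, 0), datetime(2012, 1, 20, 9, 0)),
]
mttr, mtbf = mttr_and_mtbf(outages, datetime(2012, 1, 1), datetime(2012, 1, 31))
print("MTTR:", mttr)
print("MTBF:", mtbf)
```

Tracked per service over successive periods, a falling MTTR and a rising MTBF are one way to show whether problem investigations are having an effect.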

Page 32:

• Alerts and monitoring, what?
  o Demand
  o Capacity
  o Availability
  o KPIs
  o Logs
  o Services
  o Network
  o Servers
  o User Defined Monitoring and Instant Alerts
    • Monitor the Windows Event log
    • Alert on hardware and software changes
    • Alert on specific file changes and protection violations
    • Know if disk space is running low on computers
    • Monitor computer online/offline status
    • Know if a server goes down
    • Know when traveling users with notebooks connect
    • Alert message and recipient configuration
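
Alerts collected from sources like those listed above can also be scanned for recurring patterns, which is the kind of trend a proactive Problem Manager looks for. The sketch below counts alerts per host and alert type inside a recent window and reports the combinations that repeat. The tuple layout, the function name, the seven-day window, and the minimum count of three are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def recurring_alerts(alerts, now, window=timedelta(days=7), min_count=3):
    """Return sorted (host, alert_type) pairs seen at least min_count times within the window."""
    cutoff = now - window
    counts = Counter((host, kind) for ts, host, kind in alerts if ts >= cutoff)
    return sorted(key for key, seen in counts.items() if seen >= min_count)


# Hypothetical alert history as (timestamp, host, alert_type) tuples.
alerts = [
    (datetime(2012, 1, 2, 3, 0), "app01", "disk_low"),    # outside the window
    (datetime(2012, 1, 13, 3, 0), "app01", "disk_low"),
    (datetime(2012, 1, 15, 3, 0), "app01", "disk_low"),
    (datetime(2012, 1, 17, 14, 0), "db01", "offline"),
    (datetime(2012, 1, 18, 3, 0), "app01", "disk_low"),
]
repeats = recurring_alerts(alerts, now=datetime(2012, 1, 19, 9, 0))
print(repeats)
```

A combination that keeps reappearing is a candidate for a problem record even if each individual alert was cleared without an incident being raised.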

Page 33:

• Alerts and monitoring, what?
  o Pro-active approach
    • Server's utilization exceeds predefined percentage of total capacity available... raise alert!
    • Server CPU breaches 90% utilization, or disk becomes 80% full.
  o Food For Thought
    • What happens when a server goes down?
    • Alarms, alerts, and notifications are triggered all over the place.
    • The application, database, and operating system may appear to be down.
    • However, this problem behavior may be due to a single point of failure elsewhere in the network.
    • What is the problem?
    • What is the impact?
    • What is or are the root causes?
    • What is or are the workarounds and resolutions?
    • Or... should we even be worried about it?
  o Problem Management Categories
    • Re-active
    • Pro-active
    • Predictive Intelligence?
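
The pro-active approach described on this slide amounts to comparing each reading with a predefined limit. A minimal sketch follows, using the 90% CPU and 80% disk figures from the slide. The function name, the host names, and the sample readings are hypothetical, and a real monitoring tool would supply the readings.

```python
CPU_LIMIT_PERCENT = 90   # from the slide: "Server CPU breaches 90% utilization"
DISK_LIMIT_PERCENT = 80  # from the slide: "disk becomes 80% full"


def check_server(name, cpu_percent, disk_percent):
    """Return alert messages for any reading at or beyond its predefined limit."""
    alerts = []
    if cpu_percent > CPU_LIMIT_PERCENT:
        alerts.append(f"{name}: CPU at {cpu_percent}% exceeds {CPU_LIMIT_PERCENT}%")
    if disk_percent >= DISK_LIMIT_PERCENT:
        alerts.append(f"{name}: disk at {disk_percent}% has reached {DISK_LIMIT_PERCENT}%")
    return alerts


# Hypothetical readings for one server.
for message in check_server("app01", cpu_percent=95, disk_percent=72):
    print(message)
```

A check like this only raises the alert. Deciding whether the alert points to a single failing component or to a failure elsewhere in the network remains the investigation work the "Food For Thought" questions describe.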

Page 34:

Feedback & next steps

Jorge A. Wong

Page 35:

• Next Steps
  o Future sessions
    • Problem Management Practitioner Forums 2012
      • January 19 (9am - Noon)
      • March 15 (9am - Noon)
      • June 7 (9am - Noon)
      • Followed by casual lunch
    • Change Management Practitioner Forum 2012
      • April 12 (9am - Noon) <Tentative>
    • Business Analyst World Conference 2012
      • May 7, 8, & 9
    • Practitioner Forums 2012
      • Looking for subject ideas
        • Configuration Management
        • Service Level Management
      • Looking for thought leaders and interested participants

Page 36:

15 Minute Break

Page 37:

Thank you!

Page 38:

Appendix

Page 39:

Problem Management: What it is? Is not?

Jorge A. Wong

Page 40:

IT Problem Management

What is a Problem?
A cause of one or more Incidents. The cause is not usually known at the time a Problem Record is created.

What is Problem Management?
The objective of Problem Management is to resolve the root cause of Incidents, and to prevent the recurrence of Incidents related to these errors.

What does a Problem Manager do?
The Problem Manager is responsible for managing the lifecycle of all Problems. He undertakes research for the root-causes of Incidents and thus ensures the enduring elimination of interruptions. His primary objectives are to prevent Incidents from happening, and to minimize the impact of Incidents that cannot be prevented.

Page 41:

What is Root Cause Analysis?

• A standard process of:
  o Identifying a problem
    • What happened?
  o Containing and analyzing the problem
    • What were the root causes of the problem?
  o Defining the root cause
    • What internal options are available to deal with the problem?
  o Defining and implementing the actions required to eliminate the root cause
    • What is the cost of acting upon the available options?
  o Validating that the corrective action prevented recurrence of problem
    • Which decision options will provide the most cost-effective solution?

Diagram: Identify Problem → Identify Team → Immediate Action → Root Cause → Action Plan → Complete Plan → Follow Up Plan → Validate

Page 42:

At a high level, problem investigation looks at:
• What were we doing? (Before Major Incident, Incident)
• What was the problem?
• Why did it happen?
• What should be done?
• What will we be doing now? (After Problem Investigation)