it – emergency change process - university of new...
ITSM Change Management
IT – Emergency Change Process
Yvette Fournier, Change Manager
505-321-3287 (pager: 505-951-0950)
[email protected] 22, 2009
Different Types of Changes:
1. Logging and Notification
2. Peer Review and Approval
3. Manager Review and Approval
4. High Risk/Outage – TAT/CAB Review and Approval
5. Emergency Changes – CAB/EC Review and Approval; TAT/CAB post-implementation review
Change Characteristic \ Type of Request for Change
RFC types (columns): RFC – IT – Logging & Notification; RFC – IT – Peer Review; RFC – IT – Manager Review; RFC – IT – High Risk/Outage (TAT/CAB Review); RFC – IT – Emergency (CAB/EC Review)
1. Regularly applied changes that have well documented and tested procedures for applying and are low risk and low impact. X
2. Regularly applied changes that have well documented and tested procedures for applying and are not high risk but have a medium impact and/or where the change process has changed. X
3. Changes that are rarely or have never been applied but are not high risk or high impact and do not create outages. X
4. Technical support required from other groups. X
5. Change is likely to affect the work of other groups and/or users. X
6. High risk, high impact or outage occurs; uses pre-determined maintenance window; minimum one week notification. X
7. High risk, high impact or outage; OUTSIDE of pre-determined maintenance window; two weeks notification. X
8. Same as #7 but less than one week notification can occur. X
9. Same as #8 but less than two weeks notification can occur. X
10. Outage exists due to an incident and change must be applied in order for service to become available.
Example of a Group’s “Types of Change” Inventory
Unix-and-Storage-Change-types.xlsx
Emergency Changes create the most Pain!
Current Procedure
1. Change Initiator (CI) realizes that a change must be applied ASAP.
2. CI discusses change with Supervisor and/or Manager, if available, and they agree that change must be applied.
3. CI notifies Customer and begins to apply the change as agreed upon.
4. Outage occurs and Support Center is “Slammed with Calls!”
5. Gil, Moira and/or other directors get calls from Customers.
6. Change Initiator/Manager/Supervisor needs to explain why.
7. Notice (apology!!) has to be sent out and posted on White Board.
8. Major Stress for all involved.
Why? Because Change Initiators have questions without answers.
1. What is the Emergency Change Process?
2. Does an Emergency Change Process even exist?
3. If one exists, how do I know that my change must follow the Emergency Change Process?
4. Do I need to follow the Emergency Change Process if I am applying the change due to an Incident?
5. Who approves the Change?
6. Whom do I communicate with and how do I know that the date and/or time selected is acceptable?
7. Who notifies users of the outage due to the application of an emergency change?
Emergency Change Process
1. Change Initiator (CI) realizes that a change must be applied ASAP.
2. CI discusses change with Supervisor and/or Manager, if available, and they agree that change must be applied.
3. CI contacts Change Manager.
4. Change Manager coordinates communication process and approval process.
5. If approved, CI applies the change & notifies the CM of outcome.
6. Post Implementation review occurs at TAT/CAB meeting.
RFC cheat_sheet v1.0.pdf
Step 1 – Identifying an Emergency Change
Production Services Maintenance Window
A period of time designated in advance by the technical staff of a high-availability service during which preventive maintenance or upgrades that could cause disruption of service may be performed.
The purpose of stating a time period in advance is to allow clients of the service to prepare for possible disruption or prepare for any major changes to the functioning of the service.
This type of disclosure is typically guaranteed as part of a service level agreement.
Production Services Maintenance Window (continued)
The ITSM office is requesting that all services have a pre-defined maintenance window that is documented and posted on the IT web site.
The current IT maintenance windows can be found on the following web site:
http://it.unm.edu/availability/
Emergency Change Criteria
A Change planned, scheduled and implemented at very short notice in order to protect a service from an unacceptable risk of failure or degradation, lack or loss of functionality.
Question: It is understood that maintenance windows, by their very nature, may involve outages. Therefore, if a change is applied within a preapproved maintenance window, should that change be classified as an emergency change?
Emergency Change Criteria (continued)
Answer: IT DEPENDS!
The answer is based on the phrase “very short notice”.
IT’s current standard for notifications is as follows:
1. A high risk, high impact change or one that creates an outage outside of a maintenance window requires 2 weeks' notification to our users.
2. The same kind of change applied within the maintenance window requires 1 week's notification.
NOTE: A change required to resolve an incident or a problem responsible for creating a MAJOR disruption of service follows the Incident Management Protocol.
Emergency Change Criteria (continued)
If you answer No to any of these questions, follow the Emergency Change Process:
When using a predefined Maintenance Window:
Can the change wait until the predefined Maintenance Window?
Can Notification of the Pending Change be sent out at a minimum of 1 week in advance?
Emergency Change Criteria (continued)
If you answer No to any of these questions, follow the Emergency Change Process:
If outside of a predefined Maintenance Window:
Can Notification of the Pending Change be sent out at a minimum of 2 weeks in advance?
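The two checklists above can be read as one decision rule. The following is a minimal Python sketch under stated assumptions: the function name, the parameter names, and the use of whole days are illustrative only and are not part of the IT process; the 7-day and 14-day thresholds stand for the 1-week and 2-week notification standards.

```python
# Hypothetical helper; names and day-based units are assumptions for illustration.
WINDOW_NOTICE_DAYS = 7     # 1 week of notice inside a predefined maintenance window
OUTSIDE_NOTICE_DAYS = 14   # 2 weeks of notice outside a maintenance window


def needs_emergency_process(uses_maintenance_window: bool,
                            notice_days: int,
                            can_wait_for_window: bool = True) -> bool:
    """Return True if any checklist question is answered "No"."""
    if uses_maintenance_window:
        # Both questions must be answered "Yes" to stay on the normal RFC path.
        return not (can_wait_for_window and notice_days >= WINDOW_NOTICE_DAYS)
    # Outside a window, only the 2-week notification question applies.
    return notice_days < OUTSIDE_NOTICE_DAYS


print(needs_emergency_process(True, notice_days=10))   # enough notice, in window
print(needs_emergency_process(True, notice_days=3))    # too little notice
print(needs_emergency_process(False, notice_days=10))  # outside window, under 2 weeks
```

Under these assumptions the three calls print False, True and True: only the first change can follow the normal RFC process.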
Emergency Change Criteria (continued)
Important Note
(Nothing to do with an emergency change.)
On all major changes make sure you give yourself enough time to set up the announcement of the impending outage, high impact or high risk change. If it’s a major outage, communication needs to be coordinated with the Support Center, which may involve Planning and PR/Marketing.
Emergency Change Criteria (continued)
What does NOT have to follow the Emergency Change Protocol:
A change that must be applied to resolve/workaround an INCIDENT or a PROBLEM that has created a Major disruption of service follows the Incident Management Protocol.
Caveats: 1. ONLY if the ENTIRE service is affected and 2. NO other services are affected.
The Incident Management Protocol needs to be followed for changes required due to Major Incidents or Problems. Incident Management Protocol includes coordinating notifications through the Manager on Duty (MOD).
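The caveats above amount to a three-part test. As a rough sketch (the function and parameter names are hypothetical, chosen only for this illustration), a change made to resolve an incident or problem follows the Incident Management Protocol only when all three conditions hold:

```python
def follows_incident_protocol(major_disruption: bool,
                              entire_service_affected: bool,
                              other_services_affected: bool) -> bool:
    """Hypothetical check of the caveats for incident-driven changes.

    True  -> handle the change under the Incident Management Protocol.
    False -> the caveats are not met, so the emergency change criteria
             still have to be considered.
    """
    return (major_disruption
            and entire_service_affected
            and not other_services_affected)


# The whole service is already down and fixing it touches nothing else.
print(follows_incident_protocol(True, True, False))
# Only part of the service is down, so the caveats are not met.
print(follows_incident_protocol(True, False, False))
```

The first call prints True and the second prints False.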
Emergency Change Criteria (continued)
Examples – are these Emergency Changes?
1. GroupWise Post Office #1 is down; however, all users in Post Office #2 are functioning without any problems. Resolving the incident requires bringing down the server, which will disrupt all functioning GroupWise users instead of just the users in Post Office #1.
Emergency Change Criteria (continued)
Examples – are these Emergency Changes?
1. Yes. Bringing down the entire system causes a 100% degradation of service instead of a 50% degradation of service.
Emergency Change Criteria (continued)
Examples – are these Emergency Changes?
2. One of the applications which shares the use of a specific server is experiencing slow response time. Some Users are able to use the application while others are unable to logon. The Support Center is receiving multiple complaints about the application. Applications Support, after contacting the application vendor, has received an application configuration change relating to this issue and the Server needs to be rebooted after the configuration change is made.
Emergency Change Criteria (continued)
Examples – are these Emergency Changes?
2. Yes. The application resides on a server shared with other applications. Rebooting the server will create an outage for the users of the other applications.
Emergency Change Criteria (continued)
Examples – are these Emergency Changes?
3. A known problem exists within the GroupWise application. The existing workaround requires a reboot of the email server twice a week in order to prevent a major disruption of service until the Root Cause can be identified and resolved. The scheduled reboot will occur outside of the regular maintenance window.
Emergency Change Criteria (continued)
Examples – are these Emergency Changes?
3. Yes, and then No. As soon as the initial workaround has been identified, the emergency change process should be followed for the first scheduled outage. No additional approvals will be required for subsequent outages relating to this incident since rebooting on a planned schedule prevents a major disruption with the potential for loss of data. However, some notification is still required.
Emergency Change Criteria (continued)
Examples – are these Emergency Changes?
4. Servers receive system maintenance patches from their vendors on a regular basis. The technical support staff apply the patches on a weekly basis using their maintenance window. These patches will require a reboot of their servers. The reboot process takes no more than 30 minutes.
Emergency Change Criteria (continued)
Examples – are these Emergency Changes?
4. No. Since this is occurring on a regular basis in their maintenance window, TAT/CAB needs to approve the first occurrence in order to approve the process. Subsequent outages do not need approval but still require notification.
Step 2 – Enter a Change Request in Peregrine
Step 3 – Contact the Change Manager
• Pager: 505-951-0950
• Email to Pager: [email protected]
Reminder: Our pagers do not accept voice mail. You must key in, not state, your contact cell or telephone number when prompted.
Good Idea! Enter the pager # in your cell phone; enter the pager's email address in your address book.
Step 3 – Contact the Change Manager (continued)
When contacting the Change Manager, be prepared to answer the following questions:
1. Why the change cannot follow the normal RFC process,
2. Reason for change,
3. Date and time the change initiator would like to implement the change,
4. The risk associated with the change or with not applying the change,
5. If an outage will occur, how long it will last,
6. What services will be affected,
7. Back out plan, if change is unsuccessful, & how long the back out process will take.
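The seven questions above work well as a pre-call checklist. The sketch below is one possible way to track them in Python; the class name and field names are hypothetical and simply mirror the questions, and free-text strings are assumed for every answer.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EmergencyChangeBriefing:
    """Answers a Change Initiator should have ready (field names are assumptions)."""
    why_not_normal_rfc: Optional[str] = None
    reason_for_change: Optional[str] = None
    requested_datetime: Optional[str] = None
    risk: Optional[str] = None
    outage_duration: Optional[str] = None
    services_affected: Optional[str] = None
    back_out_plan: Optional[str] = None

    def missing_answers(self) -> List[str]:
        """Return the names of questions that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


briefing = EmergencyChangeBriefing(
    reason_for_change="Vendor configuration fix",
    services_affected="Email",
)
print(briefing.missing_answers())
```

An empty list from `missing_answers()` would indicate that every question has an answer before the Change Manager is paged.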
Step 4. TAT/CAB Notification and CAB/EC approvers
The Change Manager, working with information from the Change Initiator and the information available on the Service Catalog IT Internal Web site:
1. Selects the members of the TAT/CAB that will approve the change at the CAB/EC meeting.
2. Identifies additional staff deemed necessary for the particular change being considered.
3. Schedules a conference room and a Web-Conference or GWIM chat.
4. Prepares and sends the text message and email notifying the TAT/CAB and identified staff of the need for a CAB/EC meeting.
CAB/EC is a subset of the TAT/CAB(+):
At a minimum, the CAB/EC approvers are:
• the Change Manager,
• the Manager On Duty,
• the Change Initiator’s Director,
• the affected Service Owner(s),
• IT Security
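The minimum approver list depends on who initiated the change and which services it touches. A small sketch of assembling it, assuming roles and people are represented as plain strings (the function name and its parameters are hypothetical):

```python
from typing import List


def minimum_cab_ec_approvers(initiator_director: str,
                             service_owners: List[str]) -> List[str]:
    """Build the minimum CAB/EC approver list for one emergency change."""
    approvers = ["Change Manager", "Manager On Duty", initiator_director]
    approvers.extend(service_owners)
    approvers.append("IT Security")
    # Drop duplicates (e.g. a director who is also a service owner), keeping order.
    return list(dict.fromkeys(approvers))


print(minimum_cab_ec_approvers("Director A", ["Owner X", "Director A"]))
```

Additional staff identified by the Change Manager for a particular change would be added on top of this minimum.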
Step 4. TAT/CAB Notification and CAB/EC approvers (continued)
Notification will include:
• The TAT/CAB members plus necessary staff required for the emergency change approval CAB/EC,
• Description of the emergency change,
• Services affected,
• Where and when the CAB/EC meeting will occur: GWIM, Audio-conference, conference room or someone’s office.
Change Advisory Board (TAT/CAB)
Change Advisory Board/Emergency change (CAB/EC)
Step 4. TAT/CAB Notification and CAB/EC approvers (continued)
If the CAB/EC cannot convene, the Change Manager decides.
• If it is impossible to convene the CAB/EC, the Change Manager will make an informed decision as to whether or not the change may be applied.
• The Change Manager will make the decision after discussing the reason for and the implications of not applying the change with the MOD and the change initiator.
Step 4. TAT/CAB Notification and CAB/EC approvers (continued)
The TAT/CAB members receive notification via email and/or a text message noting the emergency change request and the need for a CAB/EC meeting. The text message will also include when and where the meeting will occur plus the names of the individuals necessary for approval.
STEP 5. Approval or Rejection
CAB/EC Agenda
1. Roll Call
2. Change Manager introduces Change Initiator
3. Change Initiator explains:
• Why the change cannot follow the normal RFC process,
• Reason for change,
• If approved, when it will be implemented,
• The risk associated with the change,
• How long the outage will last, if one is to occur,
• What services will be affected,
• Back out plan if change is unsuccessful & how long the back out process will take.
4. Change is approved or rejected by CAB/EC
Step 6 – Notification Coordination
After a decision is made, the Change Manager:
1. Coordinates communication with the Support Center/Customer Care and the Service Owner(s), if the change is approved. Depending on the nature of the change, notification will also be sent to all IT management.
2. Updates the Emergency RFC entry in Peregrine to reflect its status;
3. Notifies the TAT/CAB of the decision made by the CAB/EC.
Step 6 – Notification Coordination - approved (continued)
• The Service Owner (SO), if available, contacts the primary users of the service(s) and informs them of the upcoming emergency change.
• If the SO is unavailable, the Change Manager will contact the primary users identified in the Service Catalog.
Step 6 – Notification Coordination - approved (continued)
Service Owner & Primary User Contact information can be found in the IT Service Catalog by using one of the following links:
Service Owner by Service Category
http://itinternal.unm.edu/servicecatalog/service_own_cat.php/#content
Services by Owner
http://itinternal.unm.edu/servicecatalog/service_by_owner.php
Step 7 – RFC approved
Change Initiator follows through with the change:
1. Implements the change;
2. Reports, to the Change Manager, on the success or failure of the change;
3. Attends the next TAT/CAB meeting for a post-implementation review of the Emergency RFC and its outcome.
Step 7 – RFC Not Approved
1. Change Initiator’s RFC will follow the appropriate RFC request process;
2. Change Initiator will then request approval via the next TAT/CAB meeting.
Next Steps
1. Set up the Audio-Conference for control and use by the ITSM office.
2. Create the IT CAB/EC GWIM Group.
3. Create the IT CAB/EC texting Group.
4. Present to TAT.
5. Schedule presentations with all IT Groups.
• After receiving this overview, you are to begin following the Emergency Change Process.
6. Modify the existing RFC-IT form to include an Emergency Change Identifier and to automatically page the Change Manager.
7. Create the IT Emergency RFC form in Peregrine.
8. Post docs in fastinfo and create links in Peregrine.
Happy, Happy, Joy, Joy!
Whether the Emergency Change Request is approved or not:
1. Decreases the pain points associated with Emergency Changes;
2. Process and Roles are well defined;
3. IT Management, End Users and Customers will be kept informed re. availability of IT services.