escalation objective ver 1
TRANSCRIPT
-
8/3/2019 Escalation Objective Ver 1
CRISIS MANAGEMENT PLAN
IKE S. GABRIEL
-
CRISIS MANAGEMENT OBJECTIVE
To have as little impact as possible on traffic if a catastrophic outage occurs.
To shorten the outage duration by containing the crisis as early as possible.
To inform higher management immediately after the outbreak of a catastrophic event in the network, so that all necessary support and resources can be deployed.
KEY NOC OBJECTIVE
100% OF NETWORK OUTAGES MUST BE DETECTED AND ESCALATED WITHIN 15 MINUTES
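The Python sketch below gives a rough idea of how the 15-minute objective could be tracked. It checks a list of outage records against the objective. The record layout (onset, detection and escalation timestamps) and the function names are hypothetical.

The sketch assumes two separate 15-minute clocks: one from onset to detection, and one from detection to escalation. The second clock matches rule 8 of the 10 Rules of Network Safety. If the objective is meant as a single 15-minute budget from onset, the escalation comparison should use the onset time instead.

```python
from datetime import datetime, timedelta

# Assumption: two separate 15-minute clocks, onset -> detection and
# detection -> escalation (the latter as in rule 8 of the 10 Rules of Network Safety).
KPI_LIMIT = timedelta(minutes=15)


def meets_kpi(onset: datetime, detected: datetime, escalated: datetime) -> bool:
    """True if the outage was detected, then escalated, within the 15-minute limits."""
    return (detected - onset) <= KPI_LIMIT and (escalated - detected) <= KPI_LIMIT


def compliance_rate(outages: list[tuple[datetime, datetime, datetime]]) -> float:
    """Share of outages meeting the KPI; the NOC objective is 1.0 (100%)."""
    if not outages:
        return 1.0  # no outages, so nothing was missed
    met = sum(1 for onset, detected, escalated in outages
              if meets_kpi(onset, detected, escalated))
    return met / len(outages)
```

Under this reading, an outage detected at minute 10 and escalated at minute 20 passes. An outage detected at minute 20 fails, however quickly it is then escalated.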
-
B. KEYS TO EFFECTIVE MANAGEMENT OF CATASTROPHIC EVENTS
STRICT ADHERENCE TO THE ESCALATION PLAN
INFORM HIGHER MANAGEMENT AT THE OUTSET OF A CATASTROPHIC EVENT
ENGAGE THE RESOLVING UNIT AS EARLY AS POSSIBLE
ASSIGN A MANAGEMENT STAFF MEMBER TO ACT AS THE CENTRAL POINT OF COORDINATION AND TO CONTROL THE FLOW OF INFORMATION
-
C. ESCALATION PLANNING
Guide the Crisis Management team on whom to call for help when facing a catastrophic outage/event, with the aim of containing the crisis.
Provide instructions on what initial action to take and after how much time the next level of help must be contacted.
ESCALATION TIME FRAMES
The Crisis Management team must:
Focus on the right problems at the right time.
Bring in the right competence when it is required.
Use the correct channels, knowing that each channel has a single point of interface, i.e., the interface towards the support group/vendor.
-
Escalation Call Flow
OCCURRENCE OF CATASTROPHIC OUTAGE AND EVENT: alarm monitored and detected within 15 minutes
NOC: immediate assignment of the fault to the resolving unit; coordinate faults/outage with the resolving units (RAFO, NTBN, DNS, CORE, etc.)
SIC TO INFORM IMMEDIATE HEAD (MANAGER OF PIONEER NMC OR REGIONAL CEBU NMC): must be contacted within 15 minutes
CONTACT VP FOR NOC: if not contacted, escalate to the next-level management officer
CONTACT HEAD OF NOAT: if not contacted, escalate to the next-level management officer
Require second level of support? CONTACT VENDOR LOCAL SUPPORT
Require third level of support? COLLABORATE WITH VENDOR LOCAL SUPPORT AND GLOBAL TAC (R&D)
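The call flow can be sketched as an ordered contact list with two yes/no decision points. The `Incident` flags, the step strings and `first_reachable` are hypothetical names used for illustration. The sketch also assumes that third-level support is only considered after vendor local support has been engaged, as the order of the diagram suggests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    # Hypothetical flags answering the two decision points in the call flow.
    needs_second_level: bool = False
    needs_third_level: bool = False


def escalation_steps(incident: Incident) -> list[str]:
    """Ordered contacts for a catastrophic outage, following the call flow."""
    steps = [
        "NOC: assign fault to resolving unit (RAFO, NTBN, DNS, CORE, etc.)",
        "SIC: inform immediate head (Manager of Pioneer NMC or Regional Cebu NMC)",
        "Contact VP for NOC",
        "Contact Head of NOAT",
    ]
    if incident.needs_second_level:
        steps.append("Contact vendor local support")
        # Assumption: third level is only considered once second level is engaged.
        if incident.needs_third_level:
            steps.append("Collaborate with vendor local support and Global TAC (R&D)")
    return steps


def first_reachable(chain: list[str], reachable: set[str]) -> Optional[str]:
    """'If not contacted, escalate to next-level management officer':
    walk the chain in order and return the first officer who can be reached."""
    for officer in chain:
        if officer in reachable:
            return officer
    return None
```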
-
CATASTROPHIC OUTAGE MANAGEMENT ESCALATION MATRIX
Elapsed time in minutes: 0~15 | 16~60 | 61~120 | 121~240
DUTY NMC ENGINEER
0~15: Detects the outage and starts isolating the problem. Notifies the Head of NMC.
16~60: Coordinates with the Resolving Team. Monitors clearing of alarms and traffic normalization.
61~120: Continues coordination with the Resolving Team and sends updates. Monitors clearing of alarms and traffic normalization.
121~240: Continues coordination with the Resolving Team and sends updates. Monitors clearing of alarms and traffic normalization.
DMPI O&M RESOLVING TEAM
0~15: O&M Support investigates the problem and attempts to neutralize it. Seeks support from vendor experts if required.
16~60: Collaborates with Vendor Local Support to contain the crisis. Correlates events and KPIs with outages.
61~120: Collaborates with Vendor Support to contain the crisis.
121~240: Investigates a work-around and triggers the contingency plan.
VENDOR LOCAL EXPERTS (3rd LEVEL)
Collaborate with Global TAC/R&D to neutralize the fault. Work with DMPI/DTPI O&M support to resolve the problem.
Vendor Local TAC takes over responsibility to neutralize the fault and assigns a product expert to solve the problem.
Vendor O&M team provides remote support while the Technical Expert is on the way to the NOC/affected sites.
VENDOR GLOBAL RESPONSE CENTER OR R&D
Vendor Global TAC takes over responsibility to neutralize the fault and works with local support to resolve the problem.
HEAD OF NMC
0~15: Notifies the Heads of NOC and NOAT. Engages the support team to determine the extent of the problem and find a work-around solution.
16~60: Collaborates with the Heads of the O&M Resolving Team and decides on further action to be taken.
61~120 and 121~240: Collaborates with the Heads of the O&M Resolving Team, gets updates from NMC and briefs the Head of NOC/NOAT, and waits for instructions from the Head of NOC/NOAT on what further action to take.
Deploy the Quick Reaction Team to the location of the crisis. Contact the Vendor Support Team for assistance if necessary.
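A small helper can map the elapsed time to the matrix column that currently applies. This Python sketch makes two assumptions:

- The clock starts when the outage occurs.
- Each column's upper bound is inclusive. So 15 minutes still falls in 0~15, and 15.5 minutes falls in 16~60.

`matrix_window` and `DUTY_NMC_ENGINEER` are hypothetical names. Only the Duty NMC Engineer row is encoded.

```python
# Inclusive upper bound (minutes) of each column of the escalation matrix.
WINDOWS = [(15, "0~15"), (60, "16~60"), (120, "61~120"), (240, "121~240")]

# Duty NMC Engineer row of the matrix, paraphrased.
DUTY_NMC_ENGINEER = {
    "0~15": "Detect outage, start isolating the problem, notify Head of NMC",
    "16~60": "Coordinate with Resolving Team; monitor alarm clearing and traffic",
    "61~120": "Continue coordination, send update; monitor alarm clearing and traffic",
    "121~240": "Continue coordination, send update; monitor alarm clearing and traffic",
}


def matrix_window(elapsed_minutes: float) -> str:
    """Column label for the time elapsed since the outage occurred (assumed clock start)."""
    if elapsed_minutes < 0:
        raise ValueError("elapsed time cannot be negative")
    for upper, label in WINDOWS:
        if elapsed_minutes <= upper:
            return label
    raise ValueError("beyond the 240-minute horizon of the matrix")
```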
-
E. Officer-in-Charge, Crisis Management (OIC - Crisis Management Team)
The OIC is responsible for managing catastrophic outages during his/her assigned week. He/she is responsible for escalating to higher management to dispatch the required support, and for making the decisions needed to contain a crisis. He/she may be required to be at the CG2 NOC during catastrophic situations and must be reachable on two (2) phones.
Catastrophic Event/Outage Duty Officer - DUTY WEEK
Duty Officer | 1st Duty | 2nd Duty | 3rd Duty | 4th Duty
Alex Galzote | Week 1 | Week 14 | Week 27 | Week 40
Arnold Melgarejo | Week 2 | Week 15 | Week 28 | Week 41
Richard Cadungog | Week 3 | Week 16 | Week 29 | Week 42
Tante Valdez | Week 4 | Week 17 | Week 30 | Week 43
Arnold Pedro | Week 5 | Week 18 | Week 31 | Week 44
PJ Capiral | Week 6 | Week 19 | Week 32 | Week 45
Melai Sabidong | Week 7 | Week 20 | Week 33 | Week 46
Alex Galzote | Week 8 | Week 21 | Week 34 | Week 47
Arnold Melgarejo | Week 9 | Week 22 | Week 35 | Week 48
Richard Cadungog | Week 10 | Week 23 | Week 36 | Week 49
Tante Valdez | Week 11 | Week 24 | Week 37 | Week 50
Arnold Pedro | Week 12 | Week 25 | Week 38 | Week 51
PJ Capiral | Week 13 | Week 26 | Week 39 | Week 52
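The roster is a 13-row list repeated four times a year. The officer for any week is therefore the row at (week - 1) mod 13. The sketch below assumes the roster's week numbers are ISO week numbers. ISO weeks can reach 53 in some years, a case the roster does not cover. The function names are hypothetical.

```python
from datetime import date

# Rows of the duty roster in order; the 13-row list repeats four times per year.
ROSTER = [
    "Alex Galzote", "Arnold Melgarejo", "Richard Cadungog", "Tante Valdez",
    "Arnold Pedro", "PJ Capiral", "Melai Sabidong", "Alex Galzote",
    "Arnold Melgarejo", "Richard Cadungog", "Tante Valdez", "Arnold Pedro",
    "PJ Capiral",
]


def duty_officer(week: int) -> str:
    """Duty officer for roster week 1-52 (weeks 1, 14, 27 and 40 share a row, etc.)."""
    if not 1 <= week <= 52:
        raise ValueError("the roster covers weeks 1-52 only")
    return ROSTER[(week - 1) % len(ROSTER)]


def duty_officer_on(day: date) -> str:
    # Assumption: roster weeks are ISO week numbers; ISO week 53 raises ValueError.
    return duty_officer(day.isocalendar()[1])
```

For example, 3 August 2019 falls in ISO week 31, which maps to Arnold Pedro's row.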
-
INSTRUCTIONS TO OIC-DUTY OFFICER
Must be accessible 24x7 for the entire duration of his/her duty. If his/her present location has no coverage, he/she must ask the NOC Duty Engineer to forward calls to an ALTERNATE PHONE.
Must have the necessary cash advance (EMERGENCY CASH FUND) and know whom to call for support.
If the Duty Officer will not be available, he/she must designate an alternate OIC as his/her replacement. May call the NOC from time to time to check the status of the network.
At the outbreak of a catastrophic outage/event, the Officer-in-Charge will immediately call the Head of NOC and NOAT to request the deployment of support teams from other departments.
-
HUAWEI Support Service
-
HUAWEI Local Support Organization
Router/Switch (Datacom): Zhou Zhi Hao
Maintenance Manager: Nelson Villoria
CS Core LTAC: Wang Guodong
2G/3G RAN LTAC: Zhang Chong
Customer Support Head: Gary Cai
Service Delivery Manager: Peter Zhang
Technical Director: Xue Shi Jun
IN-VAS: Shu Peng
China R&D
Transmission: Fang Yong Liang
PS Core LTAC: Xue Shi Jun
Crisis Management Team: Weekly Assigned Officer/O&M Heads
Philippine TSD Head: Jack Ruan
Overall Chair of Crisis Management: R. Frey/R. Zawila
Country Head: C. Li (Deputy: Maxin)
APAC TSD Head: Ma Xiao Bo
HQ Business Unit Head
-
Escalation Procedure Name List
Position | Huawei Name | Tel./Mobile Nos. | Email | DMPI Name | Tel./Mobile Nos. | Email
HOTLINE
HW-TAC | PH-TAC | 02 8190532; 0922 | [email protected] | | |
HW TOP MANAGER GROUP
Country President | liwei | 0922 8016301; 0917 8306301 | | | |
Account Director | maxin | 0922 8006189 | [email protected] | | |
Account Manager | xujiaxiang | 0922 8990212 | [email protected] | | |
TSD Director | ruanjiahai | 0922 8990968; 0917 8679888 | | | |
Service Delivery Manager | zhangligang | 0922 8341001; 0917 5954485 | [email protected] | | |
Customer Support Manager | caigaoyang | 0922 3613941; 0916 4188028 | | | |
Technical Director | huang zhan | 0922 8850125 | [email protected] | | |
HW MAINTENANCE TEAM
Maintenance Manager | Panda Huang | 0922 9508457; 0917 9017797 | | | |
Digitel Network CTO | xue shi jun | 0922 8991619; 0917 8513210 | | | |
Core Network Team Leader | Wang Guodong | 0922 3861982 | [email protected] | | |
Wireless Team Leader | Joel Sabidong | 0922 8115635 | [email protected] | | |
Data Com Team Leader | Liu Dongbo | 0922 5306045 | [email protected] | | |
A&S Team Leader | Shu Peng | 0933 9471514 | [email protected] | | |
Optical Network Team Leader | Fang Yongliang | 0908 1577115 | [email protected] | | |
-
Maintenance Service Level Agreement
Service availability language: English
Hotline service availability period: 7 days x 24 hours
On-site support service availability period: 7 days x 24 hours
Problem | 1st Level (Catastrophic) | 2nd Level (Critical) | 3rd Level (Major) | 4th Level (Minor)
Hotline response time
-
DMPI-Huawei Escalation Interface
DMPI levels: DMPI NOC Engineer; DMPI Maintenance Leader; DMPI Maintenance Manager; DMPI Top Manager Group
Huawei levels: HW Engineer; HW Maintenance Leader; HW Maintenance Manager; HW HQ Expert Group; HW Top Manager Group
HW Manager members: Service Delivery Manager; Customer Support Manager; TSD Director; Account Manager; Account Director; Country President
Interface timings: 5 mins; info sharing; 5 mins; 10 mins; 5 mins
-
Huawei DMPI Joint Maintenance Team
Jack Ruan (Ruan Jiahai): Philippines TSD Head
Peter Zhang: Service Delivery Manager
Gary Cai: Customer Support Dept. Manager
Xue Shi Jun: DMPI Network CTO
Huang Xiarong: Maintenance Delivery Manager
Shu Peng: IN & VAS Product Maintenance Leader
Wang Guodong: Core Product Maintenance Leader
Joel V. Sabidong: Wireless Product Maintenance Leader
Liu Dongbo: Data & Access Product Maintenance Leader
Fang Yongliang: Optical & MW Product Maintenance Leader
Sam Xu (Xu Jiaxiang): Digitel Account Manager
Core Product Engineer
Yi Zhan
Audi/Rene
Optical & MW Engineer
Mark
Rey
Wireless Engineer
Huang Guodong
Eric/Jay
Data & Access Engineer
Richard
Liu Fumin/Cleo
IN & VAS Engineer
Yuzhenhua/Jo are/Shi wei/Errol/Ryan/Lloyed
Dolly Esplana
Julius Rodriguez
NOC Core Team
DMPI
NOC Wireless Team
Mike C
NOC Datacom Team
Jenny
Rhoda Campos
NOC IN & VAS Team
DMPI
NOC Optical Team
IKE: NOC Manager
Rudi: Sponsor
R C
-
Responsibilities - Network Management Center Functions
Under the direction of the Head of Crisis Management, the NMC's role is to counter short-term disturbances and congestion in the network caused by:
o Failure of the core network
o Failure of the transmission backbone system
o Accidental cutting of fiber cables or interconnection facilities
o Earthquake, flooding, fire, etc.
o Special events, exhibitions and sporting events
When these situations occur, the NMC can deal with them in the following ways:
o Redirecting traffic through unaffected parts of the network
o Reducing the demand on the network by blocking lower-priority users
o Implementing traffic controls such as call gapping, SS7 link distribution and BSC/RNC blocking, and activating the MSC congestion reduction mechanism
In a disaster, for instance, priority might be given to the police and other emergency services. In a national emergency (war), the government and military would be given priority.
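Call gapping is one of the traffic controls listed above. It limits call attempts toward a congested destination by admitting at most one attempt per gap interval and rejecting the rest.

The Python sketch below is a generic, minimal model of that idea, not the MSC vendor's implementation. The `CallGap` class, the gap length and the `priority` flag are all hypothetical. The flag stands in for the police and emergency-service priority described above.

```python
class CallGap:
    """Minimal call-gapping model: after one attempt toward a gapped destination
    is admitted, further attempts are rejected until gap_s seconds have passed."""

    def __init__(self, gap_s: float) -> None:
        self.gap_s = gap_s
        self._last_admit: dict[str, float] = {}  # destination -> last admitted time (s)

    def admit(self, destination: str, now: float, priority: bool = False) -> bool:
        if priority:
            return True  # e.g. police/emergency services bypass the gap entirely
        last = self._last_admit.get(destination)
        if last is None or now - last >= self.gap_s:
            self._last_admit[destination] = now  # start a new gap interval
            return True
        return False  # inside the gap: reject this attempt
```

Priority attempts are admitted without resetting the gap timer. This way, emergency traffic does not use up the single ordinary attempt allowed per interval.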
-
A1. CONTENTS OF EMERGENCY BINDER
General
This chapter would contain general information, the prerequisites for using the emergency binder, and guidelines on how to analyze the fault situation.
Emergency Telephone Numbers
This chapter would contain useful telephone/cell phone numbers to call during a crisis:
Operation and Maintenance Center for BSS and Transmission
Operation and Maintenance Center for NSS and GPRS
Head, Network Management Center
Head, Network Operations Center
Regional Field Maintenance Center
Managers of the different Regional Field Maintenance Centers
Head of the ACCESS Field Operations Center
Power and air-conditioning maintenance personnel
Spare parts and logistics
HUAWEI Local Support Center
ERICSSON Local Support Center
ACISION support personnel
Police
Fire Brigade
MERALCO and other provincial electric companies
Complete list of IN-VAS personnel
Complete list of telephone numbers for all MSC locations
-
A2. CONTENTS OF EMERGENCY BINDER
EMERGENCY CATEGORIES
The following are some examples of emergency categories. The complete list, with detailed descriptions and alert code assignments, is in the appendix. Each category has corresponding operational instructions, also attached in the appendices. The emergency situations are defined as follows:
Cyclic Restarts - The NSS heading and restart heading are repeatedly output at the workstation. The restarts do not result in reloads even though they are less than 10 minutes apart.
Cyclic Reload - One or more automatic system reloads of the MSC from the hard disk have failed to recover the NSS.
System Stop - Failure of the MSC/HLR/VLR/MGW, GPRS, IP switches/routers or IN-VAS to carry out a recovery attempt.
Charging Failure (CDR Collector/Charging Gateway) - The charging record function fails and all CDR devices are seized.
System Overload/Reduced Traffic Handling - Substantial reduction of the MSC's traffic handling without a restart, e.g., excessive call setup delay, SS7 instability, VLR instability, SS7 device lockout, abnormal processor load, route congestion or external indicators.
Power Failure - Loss of power at the MSC, i.e., how to handle the situation when the batteries are reaching a critically low voltage.
Loss of Interconnection Links - Calls to/from GLOBE/SMART/PLDT are impossible.
Fire - In case of fire, special instructions are required to handle the emergency.
Backbone Transmission Failure - A situation that isolates an entire MSC or BSC.
Flooding and Earthquake - Special instructions are required for personnel.
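The Cyclic Restarts definition can be turned into a simple check over restart timestamps. In this sketch, the 10-minute spacing comes from the definition. `MIN_RESTARTS` (how many back-to-back restarts count as "repeatedly") and the function name are hypothetical choices.

```python
from datetime import datetime, timedelta

RESTART_SPACING = timedelta(minutes=10)  # "less than 10 minutes apart"
MIN_RESTARTS = 3  # hypothetical threshold for "repeatedly"


def is_cyclic_restart(restart_times: list[datetime], reloaded: bool) -> bool:
    """True if MIN_RESTARTS or more restarts occur back to back, each less than
    10 minutes after the previous one, without any reload being triggered."""
    if reloaded:
        return False  # the definition requires that the restarts did not lead to a reload
    times = sorted(restart_times)
    run = 1  # length of the current run of closely spaced restarts
    for prev, cur in zip(times, times[1:]):
        run = run + 1 if cur - prev < RESTART_SPACING else 1
        if run >= MIN_RESTARTS:
            return True
    return False
```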
-
CRISIS CLASSIFICATION
Description of Emergency Scenario | Class | Type | Escalation Up To | Alert Code
Total Breakdown of Interconnection Links with Other Operators' Networks such as SMART, GLOBE & PLDT, IGF | Interconnection | Quality of Service | | Yellow
Total Breakdown of BSC/RNC Due To Hardware Fault | BSS/RNC | Hardware | | Orange
Breakdown of the Media Gateway Hardware Affecting Multiple BSC/RNC | BSS/RNC | Hardware | | Orange
Breakdown of the SGSN/GGSN Hardware | BSS/RNC | Hardware | | Orange
Corrupted CDR data - too many erroneous CDRs (50% of the total CDRs) | CDR Collector | Application | | Orange
Corrupt data - too many duplicated CDRs (50% of the total CDRs) on counter reports | CDR Collector | Application | | Red
Total failure of all processes in the CDR Collector | CDR Collector | Application | | Orange
Total breakdown of Clustered Server or Charging Gateway | Charging Gateway | Hardware | | Orange
Total Breakdown of MSC Rectifier | ENVR_NSS | Hardware | | Red
Total Breakdown of MSC Inverter | ENVR_NSS | Hardware | | Red
Total Breakdown of BSS/RNC DC Power Distribution | ENVR_NSS | Hardware | | Yellow
Total Breakdown of Transmission Backbone DC Power System/Distribution | ENVR_NSS | Hardware | | Yellow
Failure of UPS Emergency Power | ENVR_NSS | Hardware | | Yellow
Building Fire Alarm | ENVR_NSS | Hardware | | Red
Breakdown of Fire Suppression System | ENVR_NSS | Hardware | | Yellow
MSC High Room Temperature | ENVR_NSS | Hardware | | Yellow
MSC Door Intrusion Alert | ENVR_NSS | Hardware | | Yellow
Breakdown of More Than 2 MSC Air-conditioning Units | ENVR_NSS | Hardware | | Orange
Breakdown of Genset at the MSC | ENVR_NSS | Hardware | | Orange
Day Tank Fuel Critical Level at MSC | ENVR_NSS | Hardware | | Yellow
AC Main Failure to MSC Rectifier | ENVR_NSS | Hardware | | Yellow
Charging function stop from any GSN | 3G | Application | | Yellow
No transfer of CDRs to CDR Collector | 3G | Application | | Yellow
Breakdown of one CDR Collector server | 3G | Application | | Orange
Breakdown of one CG LAN router/switch | 3G | Application | | Orange
Total breakdown of Ethernet switch on Gi interface | 3G | Hardware | | Orange
Total breakdown of DNS/DHCP | 3G | Hardware | | Yellow
Total breakdown of Border GW | 3G | Hardware | | Yellow
Data transfer between HLR and SGSN not possible | 3G | Hardware | | Yellow
Breakdown of 3G IP backbone LAN switch | 3G | Hardware | | Yellow
Total failure of Iub traffic to RNC | 3G | Transmission | | Orange
Breakdown of SS7 signaling links to HLR | 3G | Transmission | | Yellow
-
Corruption of Vital Service Profile Data | IN-PPS | Application | | Yellow
Unable to perform deduction/refund via PMC | IN-PPS | Application | | Orange
Periodic fee is not functioning | IN-PPS | Application | | Orange
No generation of call detail records in the IN | IN-PPS | Application | | Orange
Failure of Oracle database | IN-PPS | Database | | Orange
Retrieval from backup not possible | IN-PPS | Database | | Orange
Total breakdown of IN platform | IN-PPS | Hardware | | Red
IN platform shared hard drive failure | IN-PPS | Hardware | | Red
Loss of a non-redundant element | IN-PPS | Hardware | | Yellow
Repetitive changeover to standby platform | IN-PPS | Hardware | | Orange
Detection of multiple changeovers to standby platform | IN-PPS | Hardware | | Yellow
100% of the bearer connection is down | IN-PPS | Network | | Red
100% signaling link failure on PCM/trunk | IN-PPS | Network | | Red
Layer 3 switch routing/link failure | IN-PPS | Network | | Red
Total loss of call processing | IN-PPS | Services | | Red
Total failure of recharging functions for all prepaid accounts | IN-PPS | Services | | Red
Inability to carry out a basic operation function | IN-PPS | Services | | Orange
Unable to make outgoing calls | IN-PPS | Services | | Orange
Incorrect generation or loss of call records | Core Network | Application | | Orange
Loss of connection to CDR Collector | Core Network | Application | | Orange
Two consecutive switchovers for HLR | Core Network | Hardware | | Red
Data transfer between HLR and MSC/VLR not possible | Core Network | Hardware | | Red
Total breakdown of HLR | Core Network | Hardware | | Red
Total breakdown of MSC or MGW | Core Network | Hardware | | Red
Two consecutive switchovers for MSC | Core Network | Hardware | | Red
SG M3UA overload for multiple MSCs | Core Network | Hardware | | Red
Total loss of MSC call handling functions | Core Network | Transmission | | Red
No location update for more than 70% of booked subscribers in VLR | Core Network | Transmission | | Red
Total loss of the signaling links (SS7) MSC/HLR to SGW | Core Network | Transmission | | Red
Breakdown of transmission to DIGITEL LEC (PSTN/IGF) | Core Network | Transmission | | Orange
Breakdown of transmission to other PLMN (GLOBE & SMART) | Core Network | Transmission | | Red
More than 50% of calls are rejected | Core Network | Transmission | | Yellow
Total loss of connection between HLR and CDR Collector | Core Network | Transmission | | Yellow
-
Total loss of network supervision to all MSC/MGW/SGW | OMC | Application | | Yellow
Total loss of connection to more than 50% of the total number of Transmission Backbone nodes | OMC | Application | | Yellow
Total loss of connection to GGSN or SGSN | OMC | Application | | Yellow
Total loss of connection to OMC-R by more than one BSC | OMC | Application | | Yellow
No network supervision for an entire BSS network element | OMC | Application | | Yellow
Total breakdown of one or more OMC-x or INMS servers | OMC | Hardware | | Yellow
Total loss of an application critical to monitoring | OMC | Hardware | | Yellow
IP backbone down | OMC | Hardware | | Yellow
Database corruption | OMC | Hardware | | Yellow
More than 10% of the total BTS isolated due to transmission failure | Transmission | Network | | Yellow
One (1) BSC isolated due to transmission failure | Transmission | Network | | Yellow
One (1) RNC isolated due to transmission failure | Transmission | Network | | Orange
One (1) MSC isolated due to transmission failure | Transmission | Network | | Red
Database crash | VAS-SMSC | Database | | Red
Loss of billing data | VAS-SMSC | Database | | Red
Breakdown of 2 or more SMSC front-ends | VAS-SMSC | Hardware | | Red
Total loss of SMSC interconnections to the SG | VAS-SMSC | Interface | | Yellow
Total loss of SMSC interconnections to the Layer 3 switch | VAS-SMSC | Interface | | Orange
Total failure of SMSC to handle SMS processing | VAS-SMSC | Services | | Red
-
CRISIS CODE LEVEL
Crisis Alert Level | Color Code | Response Time
1 | Yellow | The mean response time of the Crisis Management Team to a Level 1 emergency shall be within 4 hours from the time the crisis is escalated.
2 | Orange | The mean response time of the Crisis Management Team to a Level 2 emergency shall be within 2 hours from the time the crisis is escalated.
3 | Red | The mean response time of the Crisis Management Team to a Level 3 emergency shall be within 1 hour from the time the crisis is escalated.
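The response times in the table can be turned into a per-incident deadline or into a check of the mean. The table states a mean time, so the per-incident `response_deadline` is a stricter, assumed reading. Both function names are hypothetical.

```python
from datetime import datetime, timedelta

# Target response time per alert color, from the crisis code level table.
RESPONSE_TIME = {
    "Yellow": timedelta(hours=4),  # Level 1
    "Orange": timedelta(hours=2),  # Level 2
    "Red": timedelta(hours=1),     # Level 3
}


def response_deadline(alert_code: str, escalated_at: datetime) -> datetime:
    """Latest response time for one incident (stricter than the table's 'mean' wording)."""
    return escalated_at + RESPONSE_TIME[alert_code.capitalize()]


def mean_within_target(alert_code: str, responses: list[timedelta]) -> bool:
    """True if the mean response time over past incidents meets the level's target."""
    if not responses:
        raise ValueError("need at least one response time")
    mean = sum(responses, timedelta()) / len(responses)
    return mean <= RESPONSE_TIME[alert_code.capitalize()]
```

`capitalize()` lets the lookup accept the mixed-case color codes used in the classification tables (e.g. "orange", "Red").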
-
F. EMERGENCY HANDLING ORIENTATION
Regular emergency-handling workshops are an important means of preparing the O&M personnel and the Crisis Management team. This preparation can take the form of regular training and tests of the emergency binder's content.
The Head of the Crisis Management Team, who is responsible for emergency planning, should prepare an emergency training plan for all members of the Quick Reaction Team and the O&M personnel. The plan should be followed, and its execution tracked.
Additionally, the Head of the Crisis Management team can schedule a CRISIS DRILL to determine the responsiveness and readiness of the team and to validate the effectiveness of the crisis management plan.
-
10 Rules of Network Safety
1. Only qualified personnel are allowed to conduct network activities, at all times: The implementing party who will perform the activity on the LIVE network must be trained personnel, to eliminate the chance of network outages or abnormalities caused by incompetence, lack of skills or inexperience.
2. Always secure the necessary clearances and approvals: All network activities to be conducted in a LIVE network environment must be covered by an approved WORAP. The MOP shall be strictly followed, carrying out only the specific network activities it covers.
3. Follow the Standard Operating Procedures recommended by vendors for both hardware and software at all times.
4. Ensure that activities are implemented within the prescribed maintenance window: As a general rule, network activities in a LIVE network environment will be implemented during off-peak traffic hours, from 1:00 am to 5:00 am. Any necessary exception must have the written permission of the Head of NOC and NOAT.
5. Confirm the operating state of the network before starting the activity: Before entering the site or accessing a network element remotely to perform an approved activity, the implementing personnel must advise the NOC team and wait for confirmation before proceeding. He/she must also verify that no relevant alarms exist before starting the activity.
6. Verify that the network equipment is operating normally after completing the activity: After successfully performing the approved network activity on-site or remotely, the implementing personnel must advise the NOC team that the activity has concluded. He/she must also get confirmation that no alarms or other abnormalities have arisen from the activities performed.
7. Only implement activities that are covered by an approved WORAP: If certain operations need to be implemented but are not included in the WORAP/MOP, they must not be executed without the written permission of the Head of NOC or NOAT.
8. In case of the outbreak of a catastrophic outage, immediately contact the head of the crisis management team on duty that day: The NOC duty engineer must notify the crisis management team within 15 minutes of detection.
9. Never connect unauthorized devices before, during or after the activity: Personal portable devices or storage media, such as erasable compact discs, USB drives and portable hard disks, must not be connected to any network equipment unless the WORAP requires it.
10. Playing network games or logging on to unauthorized websites from any NMS/OMC/maintenance terminal is STRICTLY prohibited.
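Rule 4 can be expressed as a pre-check on a planned activity. This sketch assumes the activity must both start and finish inside the 1:00 am to 5:00 am window on the same calendar day. It models the written permission of the Head of NOC and NOAT as a simple boolean. The names are hypothetical.

```python
from datetime import datetime, time, timedelta

# Off-peak maintenance window from rule 4.
WINDOW_START = time(1, 0)
WINDOW_END = time(5, 0)


def fits_maintenance_window(start: datetime, duration: timedelta,
                            written_exception: bool = False) -> bool:
    """True if the activity starts and ends within 1:00-5:00 on the same day,
    or if a written exception (Head of NOC and NOAT) has been granted."""
    if written_exception:
        return True
    window_open = datetime.combine(start.date(), WINDOW_START)
    window_close = datetime.combine(start.date(), WINDOW_END)
    return window_open <= start and start + duration <= window_close
```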
-
END OF PRESENTATION