TRANSCRIPT
PJM©2013 PJM Confidential
745894
NERC EMS Conference Event Response Strategies
September 18 – 19, 2013
Denver, CO
John Baranowski, PJM
Event Response
• Philosophy
• System Design
• Failure Modes
• “Work Arounds”
• Extreme events
Philosophy
• Rapid recovery is a better investment than 100% uptime
• Levels of redundancy and failover are important
• Resilience of systems is important, as even the best systems will fail
– It is not “if?” but “when?”
• Explore failure modes in tabletop exercises and drills
• Investigate actual events and ‘near misses’ for
continuous improvement
PJM Drills: Assuring Reliability
• Annual Communications Check with Generators in Restoration Plan
• Weekly EMS Failover Drills
• Business Continuity
• Monthly Reactive Reserve Drills
• Weekly Satellite Phone Drills
• Shift Supervisor-Led Semi-Annual Emergency Procedure Drills
• Shift Supervisor-Led Semi-Annual System Restoration Drills
• Monthly IROL Load Shed Drills
System Design
• Post-2003 implementation
• Dual-staffed control rooms
• Dual live EMS and support systems
– Redundancy within each system
• Monitoring in depth at transmission owner
control centers / EMS systems
• Weekly / monthly / annual unannounced drills
• 24x7 IT Operations Center (ITOC)
Figure: PJM’s redundant communication between its dual primary control centers, Valley Forge (AC1) and Milford (AC2), and continuous communication with, and operational awareness of, PJM Member Companies.
Failure Modes
• Drafted during system development
– In parallel with “system fix”
– Includes ways to assess and identify failures
• Working document – add as encountered
• Includes troubleshooting and “work-arounds”
• Part of our EOP-008 plan
“Work Arounds”
• Emergency Scheduler
• Manual Economic Dispatch
• Study Mode / Off-line ‘save’ cases
• Manual IROL worksheet
• Coordination between control rooms
• Coordination with support staff
• Work locations in control room for support staff
Extreme events
• Virtual Backup Control Center (vBUCC) – on-line
copy of yesterday’s database (DB)
• “Rapid Recovery” of yesterday’s DB
• Golden Image – ‘warm’ copy of older DB
Diagram: Primary 1, Primary 2, vBUCC, Golden Image, and Rapid Recovery DB.
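The recovery options on this slide can be read as a chain of fallbacks. The following is a minimal sketch of that idea, not PJM's actual procedure: the ordering (primaries first, then the options with progressively older data), the `RecoveryOption` type, and the `first_available` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecoveryOption:
    """Hypothetical record for one recovery option named on the slide."""
    name: str
    data_age: str      # how current the option's database is, per the slide
    available: bool    # whether the option can be used right now


def first_available(options: Sequence[RecoveryOption]) -> Optional[RecoveryOption]:
    """Return the first usable option in the given order, or None if all are down."""
    for option in options:
        if option.available:
            return option
    return None


# Assumed order: both primaries, then fallbacks with progressively older data.
options = [
    RecoveryOption("Primary 1", "live", available=False),
    RecoveryOption("Primary 2", "live", available=False),
    RecoveryOption("vBUCC", "on-line copy of yesterday's DB", available=True),
    RecoveryOption("Rapid Recovery", "yesterday's DB", available=True),
    RecoveryOption("Golden Image", "warm copy of older DB", available=True),
]

chosen = first_available(options)
print(chosen.name if chosen else "manual protocols only")
```

In this example both primaries are marked unavailable, so the sketch selects the vBUCC. If every option were unavailable, the function would return `None`, which corresponds to falling back on manual protocols.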
Past EMS Event
• ~15:56: Event started with firewall change
• 16:30: On Manual Economic Dispatch
• 16:40: Operations went on Emergency Scheduler
• ~16:30 to ~19:00: Main period of impact (about 2.5 hours)
• 19:00: Operator access was restored and Emergency Scheduler was turned off
• 20:00: Off Manual Economic Dispatch
• ~21:00: EMS was back to a fully redundant state at both sites
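The durations implied by these timestamps can be checked with a short calculation. This is a sketch only: it assumes all times fall on the same day, treats the approximate times on the slide as exact, and the `minutes_between` helper is a name introduced here for illustration.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Minutes from start to end, both given as same-day HH:MM strings."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Times as reported on the slide (several are marked approximate there).
main_impact = minutes_between("16:30", "19:00")          # main period of impact
manual_dispatch = minutes_between("16:30", "20:00")      # on Manual Economic Dispatch
emergency_scheduler = minutes_between("16:40", "19:00")  # on Emergency Scheduler
start_to_redundant = minutes_between("15:56", "21:00")   # event start to full redundancy

print(f"Main impact: {main_impact / 60:.1f} h")
print(f"Manual Economic Dispatch: {manual_dispatch / 60:.1f} h")
print(f"Emergency Scheduler: {emergency_scheduler} min")
print(f"Start to fully redundant: {start_to_redundant} min")
```

The first figure reproduces the "about 2.5 hours" stated on the slide; the others follow from the same timestamps and are not stated in the original.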
Event Response
• EMS (and TNA) was running during the event
• User interfaces on operator consoles lost connectivity
gradually throughout the event
• Energy schedules were manually updated in the EMS
starting with the 16:45 change, through 19:00
• Operations support teams provided visibility and
workarounds to the control room operators during the
event
• Situational awareness was also maintained by leveraging our
Transmission Owners’ EMS systems
Takeaways & Summary
• PJM was successful in maintaining situational
awareness during this event because:
1. We trained and drilled extensively on many
scenarios
2. Such training breeds creativity and enhances
organizational as well as system resilience
3. Even with multiple levels of redundancy, we
needed to be prepared with at least manual
protocols to fall back on
4. Future vBUCC and “Golden Image” capabilities will
provide PJM with additional options in future scenarios