Situational Awareness and Risk Management
Presentation at Lehigh University April 7, 2014 and April 8, 2014 at IEEE Local Chapter Meeting
By Jim Robinson - Relion Associates LLC
Part 1, 2 and 3 Summary
Part 1 – case study US/CA 2003 Blackout
Part 2 – Scientific industry (STEM) renewed focus on Situational Awareness & Risk Management
Part 3 – case study Pacific SW 2011 event

Situational Awareness and Risk Management
Case study - August 14, 2003 Blackout
By Jim Robinson - Relion Associates LLC
8/14/2003 Event – NERC Sequence of Events Team Leader

Part 1 Overview
Overview of system and entities
Root Cause Analysis – Key Steps
Sequence of Events prior to uncontrollable cascade failure
Determination of Root Causes – 3 “T”s
Uncontrollable wide area cascade
Joint Task Force Recommendations

Initial Conditions – 8/14/2003 Not Unusual
(thousands of key conditions captured)

Root Cause Analysis – Key Steps
Early Capture of ALL Recordings and Logs
  Digital & analog data, communications, email, audio, video, etc.
Investigation Team Formation
  Key subject matter specialists
Document System Initial Conditions
  Key entity data several hours prior to event
Determine Precise Sequence of Events
Ask endless Questions – Root of each Step in Sequence
  Confirm answer to key Questions
  Avoid hidden agendas, off Sequence tangents, etc.

Investigation Team Overview
[Organization chart; boxes as labelled on the slide:]
Steering Group
U.S – Canada Task Force
Investigation Team Leaders
MAAC/ECAR/NPCC Coordinating Group
MAAC, ECAR, NPCC
MEN Study Group
Project Planning and Support
Sequence of Events (Jim Robinson – Team Leader)
Data Requests and Management
System Modeling and Simulation Analysis
NERC & Regional Standards/Procedures & Compliance
Transmission System Performance, Protection, Control, Maintenance & Damage
Generator Performance, Protection, Controls, Maintenance & Damage
Operations - Tools, SCADA/EMS, Communications Op Planning
System Planning, Design, & Studies
Frequency/ACE
Vegetation/ROW Management
Root Cause Analysis
Restoration
Investigation Process Review

Sequence Start Determination
East Lake Generator #5 Trip: 1:31:34 PM
[Map: transmission lines by voltage (765 kV, 500 kV, 345 kV, 230 kV)]

“Tools”
Control Area (CA) Entity - Computer Failures
2:14 PM - Alarm logger fails and operators are not aware
  S. Canton - Star 345 kV line trip and reclose at 2:27 PM (both ends)
2:41 PM - server hosting alarm processor and other functions fails to backup
2:54 PM - backup server fails (2 of 4 servers down)
  2 servers continue to function but with long refresh (~59 second)
  unidentified Harding - Chamberlin 345kV line outage at 3:05 PM
3:08 PM - IT warm reboot of servers appears to work but alarm process not tested and still in failed condition

Root Cause Analysis Phases
[Timeline figure:]
Pre-Existing Conditions (e.g. voltages, wide-area transfers, line and generator outages, etc.)
15:05 Chamberlin - Harding
Hanna – Juniper
Star – South Canton
16:06 Sammis – Star
16:15 BLACKOUT
Initial Focus

Power System High Level Sequence
Premature failure of three 345kV lines
  starting at 3:05 PM, three permanent outages within 36 minutes
  due to tree branches too close underneath heated conductors
Northern Ohio 138kV cascade began
  started 3:39 PM - caused by above 345kV premature failures
  138kV conductors severely overheated beyond design limits
Northern Ohio 345kV high speed cascade
  overloaded line trips 4:05:57 - 4:09:07 PM
  accelerated by Zone 3 directional distance relays
Eastern Interconnection Separates by 4:11PM

[Map slides: initial 345 kV line trips]
Chamberlin-Harding (3:05:41)
Hanna-Juniper (3:32:03)
Star - S. Canton (3:41:35)
Situation after Initial Trips 3:05:41 – 3:41:35

Hanna - Juniper Tree Contact
Inadequate tree clearance = Premature failure

Root Cause #1 “Trees”
Transmission Owner Vegetation Management
Transmission Owner (TO) did not adequately manage tree branch clearance under hot conductors. Conductors could not reach maximum rated temperature.
NERC requires line rating be calculated based on most limiting element.
Including conductor temperature limit due to non-standard clearances (i.e. buildings, telephone wires, etc.) underneath the conductor; but prior to 2007 did not require adequate clearance to trees.
TO did not comply with its own Utility Vegetation Management program (UVM Report page 50, #2)

G = Max Sag Ground Clearance – Determined from Line Plan & Profile
WSZ = Wire Security Zone (Table 1)
A = Additional Clearance to allow for Growth of Vegetation

Northern Ohio 138 kV Cascade contributes to Sammis-Star 345 kV overload
[Chart: loading as % of Normal Ratings (0 to 140) at 15:05:41, 15:32:03, 15:41:35, 15:51:41 and 16:05:55 EDT, with Sammis-Star highlighted. Outages shown: Harding-Chamberlin, Hanna-Juniper, Star-S.Canton, Canton Central Transformer, Babb-W.Akron 138 kV, Cloverdale-Torrey 138 kV, E.Lima-New Liberty 138 kV, W.Akron-Pleasant Valley 138 kV, E.Lima-N.Finlay 138 kV, Chamberlin-W.Akron 138 kV, W.Akron 138 kV Breaker, Dale-W.Canton 138 kV]

Root Causes #2 & #3 “Tools & Training”
Situational Awareness & Human Communication Training,
Command & Control Tools (TOP alarms fail 2:14PM)
[Chart: loading as % of Normal Ratings (0 to 140) from 15:05:41 to 16:05:55 EDT, with Sammis-Star highlighted]

Sammis-Star (4:05:57.5)

Directional Distance Relays

Sammis-Star Zone 3 Relay Operates on Steady State Overload

Northern Ohio 345kV High Speed Cascade begins
Loss of Sammis - Star 345 kV Line 4:05:57 PM
Major Path to Cleveland Blocked

Root Causes – 3 “T”s
“T”rees
  Failure to manage tree growth in ROWs − to achieve maximum conductor design temperature
“T”ools
  Inadequate Situational Awareness
  Failure of Reliability Coordinator’s diagnostics
  Lack of load shedding capability
“T”raining
  Lack of effective situational communication
  Lack of decision & orders to shed line load

Reliability Standard Violations
Failure to return system to safe operating state within 30 minutes
Failure to notify others of impending emergency
Adequate contingency analysis not performed
Inadequate training
Reliability Coordinator did not notify others
Reliability Coordinator did not have adequate monitoring capability

Uncontrollable Cascade Begins at 4:06PM
Premature failure of three 345kV lines
  starting at 3:05 PM, three permanent outages within 36 minutes
  due to tree branches too close underneath heated conductors
Northern Ohio 138kV cascade began
  started 3:39 PM - caused by above 345kV premature failures
  138kV conductors severely overheated beyond design limits
Northern Ohio 345kV high speed cascade of overloaded lines 4:05:57 - 4:09:07 PM
  accelerated by Zone 3 directional distance relay loadability
Eastern Interconnection Separates by 4:11PM

[Map slides: cascade sequence]
345 kV Overloaded Lines Trip – Zone 3 Relays
  4:08:59 - 4:09:07 PM
Generation Trips 4:09:08 – 4:10:27 PM
345 kV Zone 3 Relays Accelerate Cascade
  Argenta-Battle Creek lines trip 4:10:36 – 4:10:37 PM
Three 345 kV Lines trip from 4:10:37.5 - 4:10:38.6 PM
  New Phase Begins - “Transient Instability”
Power Transfers Shift at 4:10:38.6 PM
Eastern Michigan (Detroit) Transient Instability
  MW & Voltage Swings (& N-S Pole Slipping)
Northeast Island Completes Separation from Eastern Interconnection 4:10:43 – 4:10:45 PM

End of the Cascade
Area affected by blackout – pockets of generation and load remain on
[Map legend: Area affected by blackout; Service maintained in isolated pockets]

Outage Summary
By Generation type:
Conventional steam units: 67 plants (39 coal)
Combustion turbines: 66 plants (36 combined cycle)
Nuclear: 10 plants (7 US and 3 Canadian, totaling 19 units)
Hydro 101
Other 19

Operation Summary
Line and Transformer operations from 3:05 PM to 4:13 PM
  Line outages - 404
  Transformer outages - 20
50 million people
8 states and 2 provinces
60-65,000 MW of load initially interrupted
  11% of Eastern Interconnection

Joint Task Force Report 4/5/04 - Chapter 10
Forty-six Recommendations
Group I - Institutional Issues Related to Reliability
  Recommendations 1-14
Group II - Support and Strengthen NERC’s Actions of February 10, 2004
  Recommendations 15-31
Group III - Physical and Cyber Security of North American Bulk Power Systems
  Recommendations 32-46
Group IV - Canadian Nuclear Power Sector
  Recommendations 45-46

Joint Task Force Report 4/5/04 - Chapter 10
Group I - Institutional Issues Related to Reliability
Recommendations (1-14)
  Make reliability standards mandatory and enforceable, with penalties for noncompliance (1)
  Strengthen institutional framework (3)

Group II - Support NERC 2/10/04 Actions
Strategic Initiatives:
  Vegetation Management standards & performance (16)
  Strengthen Compliance Enforcement Program (17)
  “Control Area” & “Reliability Coordinator” audits (18)

Part 1 Overview
Overview of system and entities
Root Cause Analysis – Key Steps
Sequence of Events prior to uncontrollable cascade failure
Determination of Root Causes – 3 “T”s
Uncontrollable wide area cascade
Joint Task Force Recommendations

Situational Awareness & Risk Management
Why did the Control Area and Reliability Coordinator lose “Situational Awareness”?
Why did the first three 345kV lines trip prior to reaching maximum rated conductor temperature?
Why did operators fail to jointly take heroic action?

Part 2 Overview during ‘Chat & Chew’ time
Keynote speaker – Dave Nevius
Risk Management – Jim Robinson
  1. Failure Modes (identification & likelihood)
  2. Consequences (of each Failure Mode)
  3. Costs (to improve step 1 or 2, or both)
Professional Scientists (STEM) are key in performing Risk Management evaluations

Part 2 Overview during ‘Chat & Chew’ time
Risk Management (3 above steps) may be quick & easy OR require formal study.
DO NOT miss any of the 3 steps
During the next blackout or failure mode resulting in major consequences, will you be included?

Slides to remember . . . “Trees”
G = Max Sag Ground Clearance – Determined from Line Plan & Profile
WSZ = Wire Security Zone (Table 1)
A = Additional Clearance to allow for Growth of Vegetation

Slides to remember . . . “Tools & Training”
Situational Awareness & Human Communication Training,
Command & Control Tools (TOP alarms fail 2:14PM)
[Chart: loading as % of Normal Ratings (0 to 140) from 15:05:41 to 16:05:55 EDT, with Sammis-Star highlighted]

Slides to remember - Directional Distance Relays

Slides to remember . . .
Keynote speaker – Dave Nevius
Risk Management – Jim Robinson
  1. Failure Modes (identification & likelihood)
  2. Consequences (of each Failure Mode)
  3. Costs (to improve step 1 or 2, or both)
Professional Scientists (STEM) are key in performing Risk Management evaluations

Any Questions ??