situational awareness and risk management - lehigh …inesei/images/pdf/ieeepresentation_1_2.pdf ·...

25
1 Situational Awareness and Risk Management Presentation at Lehigh University April 7, 2014 and April 8, 2014 at IEEE Local Chapter Meeting By Jim Robinson - Relion Associates LLC Part 1, 2 and 3 Summary Part 1 – case study US/CA 2003 Blackout Part 2 – Scientific industry (STEM) renewed focus on Situational Awareness & Risk Management Part 3 case study Pacific SW 2011 event 2 Part 3 case study Pacific SW 2011 event

Upload: lydung

Post on 06-Mar-2018

220 views

Category:

Documents


6 download

TRANSCRIPT

Page 1: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

1

Situational Awareness and Risk Management

Presentation at Lehigh University April 7, 2014 and April 8, 2014 at IEEE Local Chapter Meeting

By Jim Robinson - Relion Associates LLC

Part 1, 2 and 3 Summary

Part 1 – case study US/CA 2003 Blackout

Part 2 – Scientific industry (STEM) renewed focus on Situational Awareness & Risk Management

Part 3 – case study Pacific SW 2011 event

2

Part 3 case study Pacific SW 2011 event

Page 2: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

2

Situational AwarenessSituational Awareness and Risk Management

Case study - August 14, 2003 Blackout

By Jim Robinson - Relion Associates LLC8/14/2003 Event – NERC Sequence of Events Team Leader

Part 1 Overview

Overview of system and entities

Root Cause Analysis Key StepsRoot Cause Analysis – Key Steps

Sequence of Events prior to uncontrollable cascade failure

Determination of Root Causes – 3 “T”s

4

Uncontrollable wide area cascade

Joint Task Force Recommendations

Page 3: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

3

Initial Conditions – 8/14/2003 Not Unusual(thousands of key conditions captured)

5

Root Cause Analysis – Key Steps

Early Capture of ALL Recordings and LogsDigital & analog data, communications, email, audio, video, etc.

I i i T F iInvestigation Team FormationKey subject matter specialists

Document System Initial ConditionsKey entity data several hours prior to event

Determine Precise Sequence of Events

6

q

Ask endless Questions – Root of each Step in SequenceConfirm answer to key Questions

Avoid hidden agendas, off Sequence tangents, etc.

Page 4: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

4

Investigation Team OverviewSteering Group

Investigation Team Leaders

U.S – CanadaTask Force

MAAC/ECAR/NPCC Coordinating Group

MAAC

Project Planning and Support

Sequence of EventsJim Robinson –

Team Leader

NERC & Regional Standards/Procedures

& Compliance

Transmission System Performance,

Protection, ControlMaintenance & Damage

Operations - Tools, SCADA/EMS

Communications Op

Root Cause Analysis

Generator Performance, Protection, Controls

Maintenance & Damage

Vegetation/ROW Management

Restoration

Investigation Process Review

7

ECAR

NPCC

MEN Study Group

Data Requests and Management

System Modeling and Simulation Analysis

Communications Op Planning

System Planning, Design, & Studies

Maintenance & Damage

Frequency/ACE

ONTARIO

Sequence Start DeterminationEast Lake Generator #5 Trip: 1:31:34 PM

ONTARIOONTARIO

Transmission Lines

765 kV500 kV345 kV230 kV

Transmission Lines

765 kV500 kV345 kV230 kV

8

Page 5: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

5

“Tools”Control Area (CA) Entity - Computer Failures

2:14 PM - Alarm logger fails and operators are not awareS. Canton - Star 345 kV line trip and reclose at 2:27 PM (both ends)

2:41 PM - server hosting alarm processor and other functions fails to backup

2:54 PM - backup server fails (2 of 4 servers down)2 servers continue to function but with long refresh (~59 second)unidentified Harding - Chamberlin 345kV line outage at 3:05 PM

9

3:08 PM - IT warm reboot of servers appears to work but alarm process not tested and still in failed condition

Root Cause Analysis Phases

BLACKOUT16:15

Sammis – StarStar – South Canton

Hanna – Juniper

16:06

Initial Focus

10

Pre-Existing ConditionsE.g. voltages, wide- area transfers,

line and generator outages, etc.

pChamberlin - Harding

15:05

Page 6: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

6

Power System High Level Sequence

Premature failure of three 345kV linesstarting at 3:05 PM, three permanent outages within 36 minutes due to tree branches to close underneath heated conductorsdue to tree branches to close underneath heated conductors

Northern Ohio 138kV cascade beganstarted 3:39 PM - caused by above 345kV premature failures138kV conductors severely overheated beyond design limits

Northern Ohio 345kV high speed cascade

11

Northern Ohio 345kV high speed cascadeoverloaded line trips 4:05:57 - 4:09:07 PMaccelerated by Zone 3 directional distance relays

Eastern Interconnection Separates by 4:11PM

Chamberlin-Harding (3:05:41)

12

Page 7: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

7

(3:05:41) Hanna-Juniper(3:32:03)

13

(3:05:41)(3:32:03)

Star- S Canton (3:41:35)

14

Star- S. Canton (3:41:35)

Page 8: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

8

Situation after Initial Trips 3:05:41 – 3:41:35

15

Hanna - Juniper Tree ContactInadequate tree clearance = Premature failure

16

Page 9: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

9

Root Cause #1 “Trees”Transmission Owner Vegetation Management

Transmission Owner (TO) did not adequately manage tree branch clearance under hot conductors. Conductors could not reach maximum rated temperaturenot reach maximum rated temperature.

NERC requires line rating be calculated based on most limiting element.

Including conductor temperature limit due to non- standard clearances (i.e. buildings, telephone wires, etc.) underneath the conductor; but prior to 2007 did not require adequate clearance to

17

conductor; but prior to 2007 did not require adequate clearance to trees.

TO did not comply with its own Utility Vegetation Management program (UVM Report page 50, #2)

18

G = Max Sag Ground Clearance – Determined from Line Plan & Profile

WSZ = Wire Security Zone (Table 1)

A = Additional Clearance to allow for Growth of Vegetation

Page 10: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

10

120

140

C

Sammis-Star

Northern Ohio 138 kV Cascade contributes to Sammis-Star 345 kV overload

40

60

80

100

% o

f Nor

mal

Rat

ings

Canton C

entral Transform

Babb-W

.Akron 138 kV

Ch

Star-

Cloverdale-Torrey 138

E.Lima-N

ew Liberty 13 8

W.A

kron-Pleasant Valley

E.Lima-N

.Finlay 138 kV

Cham

berlin-W.A

kron 138 kV

W.A

kron 138 kV Breaker

Dale-W

.Canton 138 kV

19

15:05:41 EDT

15:32:03 EDT

15:41:35 EDT

15:51:41 EDT

16:05:55 EDT

0

20

mer

Harding-

hamberlin

Hanna-

Juniper

-S.Canton

kV

8 kV

y 138 kV

V

120

140

C

Sammis-Star

Root Causes #2 & #3 “Tools & Training”Situational Awareness & Human Communication Training,

Command & Control Tools (TOP alarms fail 2:14PM)

40

60

80

100

% o

f Nor

mal

Rat

ings

Canton C

entral Transform

Babb-W

.Akron 138 kV

Ch

Star-

Cloverdale-Torrey 138

E.Lima-N

ew Liberty 13 8

W.A

kron-Pleasant Valley

E.Lima-N

.Finlay 138 kV

Cham

berlin-W.A

kron 138 kV

W.A

kron 138 kV Breaker

Dale-W

.Canton 138 kV

20

15:05:41 EDT

15:32:03 EDT

15:41:35 EDT

15:51:41 EDT

16:05:55 EDT

0

20

mer

Harding-

hamberlin

Hanna-

Juniper

-S.Canton

kV

8 kV

y 138 kV

V

Page 11: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

11

Sammis-Star(4:05:57.5)

21

Directional Distance Relays

22

Page 12: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

12

Sammis-Star Zone 3 Relay Operateson Steady State Overload

23

Northern Ohio 345kV High Speed Cascade beginsLoss of Sammis - Star 345 kV Line 4:05:57 PM

Major Path to Cleveland Blocked

24

Page 13: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

13

Root Causes – 3 “T”s“T”rees

Failure to manage tree growth in ROWs − to achieve maximum conductor design temperature

“T”oolsInadequate Situational AwarenessFailure of Reliability Coordinator’s diagnosticsLack of load shedding capability

25

Lack of load shedding capability

“T”rainingLack of effective situational communicationLack of decision & orders to shed line load

Reliability Standard Violations

Failure to return system to safe operating state within 30 minutesFailure to notify others of impending emergencyAdequate contingency analysis not performedInadequate training Reliability Coordinator did not notify othersReliability Coordinator did not have adequate

26

Reliability Coordinator did not have adequate monitoring capability

Page 14: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

14

Uncontrollable Cascade Begins at 4:06PM

Premature failure of three 345kV linesstarting at 3:05 PM, three permanent outages within 36 minutes due to tree branches to close underneath heated conductorsdue to tree branches to close underneath heated conductors

Northern Ohio 138kV cascade beganstarted 3:39 PM - caused by above 345kV premature failures138kV conductors severely overheated beyond design limits

Northern Ohio 345kV high speed cascade of

27

Northern Ohio 345kV high speed cascade of overloaded lines 4:05:57 - 4:09:07 PM

accelerated by Zone 3 directional distance relay loadability

Eastern Interconnection Separates by 4:11PM

345 kV Overloaded Lines Trip – Zone 3 Relays4:08:59 - 4:09:07 PM

ONTARIO

28

Page 15: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

15

Generation Trips 4:09:08 – 4:10:27 PM

ONTARIO

29

345 kV Zone 3 Relays Accelerate CascadeArgenta-Battle Creek lines trip 4:10:36 – 4:10:37PM

30

Page 16: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

16

Three 345 kv Lines trip from 4:10:37.5 - 4:10:38.6 PM New Phase Begins - “Transient Instability”

31

Power Transfers Shift at 4:10:38.6 PM

32

Page 17: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

17

Eastern Michigan (Detroit) Transient InstabilityMW & Voltage Swings (& N-S Pole Slipping)

33

Northeast Island Completes Separation from Eastern Interconnection 4:10:43 – 4:10:45 PM

North of LakeSuperior

34

Page 18: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

18

End of the CascadeArea affected by blackout – pockets of generation and load remain on

Area affected by blackoutService maintained in

isolated pockets

35

Outage Summary

By Generation type:

C ti l t it 67 l t (39 l)Conventional steam units: 67 plants (39 coal)

Combustion turbines: 66 plants (36 combined cycle)

Nuclear: 10 plants (7 US and 3 Canadian, totaling 19 units)

Hydro 101

36

Hydro 101

Other 19

Page 19: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

19

Operation Summary

Line and Transformer operationsfrom 3:05 PM to 4:13 PMfrom 3:05 PM to 4:13 PM

Line outages - 404Transformer outages - 20

50 million people8 states and 2 provinces

37

8 states and 2 provinces

60-65,000 MW of load initially interrupted 11% of Eastern Interconnection

Joint Task Force Report 4/5/04 - Chapter 10

Forty-six Recommendations

Group I - Institutional Issues Related to Reliabilityd iRecommendations 1-14

Group II - Support and Strengthen NERC’s Actions of February 10, 2004

Recommendations 15-31

Group III - Physical and Cyber Security of North lk

38

American Bulk Power SystemsRecommendations 32-46

Group IV - Canadian Nuclear Power SectorRecommendations 45-46

Page 20: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

20

Joint Task Force Report 4/5/04 - Chapter 10

Group I - Institutional Issues Related to ReliabilityRecommendations (1-14)

Make reliability standards mandatory and enforceable, with penalties for noncompliance (1)

Strengthen institutional framework (3)

39

Group II - Support NERC 2/10/04 Actions

Strategic Initiatives:Strategic Initiatives:

Vegetation Management standards & performance (16)

Strengthen Compliance Enforcement Program (17)

“Control Area” & “Reliability Coordinator” audits (18)

40

Page 21: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

21

Part 1 Overview

Overview of system and entitiesRoot Cause Analysis – Key StepsRoot Cause Analysis Key StepsSequence of Events prior to uncontrollable cascade failureDetermination of Root Causes – 3 “T”sUncontrollable wide area cascade

41

U co t o ab e de a ea cascadeJoint Task Force Recommendations

Situational Awareness & Risk Management

Why did the Control Area and Reliability Coordinator lose “Situational Awareness”?

Why did the first three 345kV lines trip prior to reaching maximum rated conductor temperature?

Wh did f il j i l k

42

Why did operators fail to jointly take heroic action?

Page 22: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

22

Part 2 Overview during ‘Chat & Chew’ time

Keynote speaker – Dave NeviusRi k M t Ji R biRisk Management – Jim Robinson1. Failure Modes (identification & likelihood)2. Consequences (of each Failure Mode)3. Costs (to improve step 1 or 2, or both)

Professional Scientist (STEM) are key in pe fo ming Risk Management

43

performing Risk Management evaluations

44

Page 23: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

23

Part 2 Overview during ‘Chat & Chew’ time

Risk Management (3 above steps) may be quick & easy OR require formal study.

DO NOT miss any of the 3 steps

During the next blackout or failure mode resulting in major consequences, will you be included?

45

Slides to remember . . . “Trees”

46

G = Max Sag Ground Clearance – Determined from Line Plan & Profile

WSZ = Wire Security Zone (Table 1)

A = Additional Clearance to allow for Growth of Vegetation

Page 24: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

24

120

140

C

Sammis-Star

Slides to remember . . . “Tools & Training”Situational Awareness & Human Communication Training,

Command & Control Tools (TOP alarms fail 2:14PM)

40

60

80

100

% o

f Nor

mal

Rat

ings

Canton C

entral Transform

Babb-W

.Akron 138 kV

Ch

Star-

Cloverdale-Torrey 138

E.Lima-N

ew Liberty 13 8

W.A

kron-Pleasant Valley

E.Lima-N

.Finlay 138 kV

Cham

berlin-W.A

kron 138 kV

W.A

kron 138 kV Breaker

Dale-W

.Canton 138 kV

47

15:05:41 EDT

15:32:03 EDT

15:41:35 EDT

15:51:41 EDT

16:05:55 EDT

0

20

mer

Harding-

hamberlin

Hanna-

Juniper

-S.Canton

kV

8 kV

y 138 kV

V

Slides to remember - Directional Distance Relays

48

Page 25: Situational Awareness and Risk Management - Lehigh …inesei/images/pdf/IEEEpresentation_1_2.pdf · 1 Situational Awareness and Risk Management Presentation at Lehigh University April

25

Keynote speaker – Dave NeviusRi k M t Ji R bi

Slides to remember . . .

Risk Management – Jim Robinson1. Failure Modes (identification & likelihood)2. Consequences (of each Failure Mode)3. Costs (to improve step 1 or 2, or both)

Professional Scientist (STEM) are key in pe fo ming Risk Management

49

performing Risk Management evaluations

Any Questions ??