
Data Mining Techniques to Classify Inter-Area Oscillations

Adamantios Marinakis, ABB Corporate Research CH

London, 29/11/2013


Presentation outline

Problem statement

Data mining

Support Vector Machines

Evolution Strategies

Random Forests

Solution – Results

Conclusion


© ABB Group April 10, 2023 | Slide 4

Wide-Area Monitoring System (WAMS)

[Figure: WAMS architecture. Labels: GPS satellite, time stamps, voltage and current phasors (V, I, t), communication network, System Protection Center]

• Visualization of power system dynamics
• Stability monitoring
• Stability control and blackout prevention

Power Damping Monitoring (PDM): Principle

Sliding window of 10-15 minutes length

Estimate MIMO state-space model

Carry out modal analysis: damping and frequency of critical modes …
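A minimal sketch of the modal-analysis step, assuming the eigenvalues of the identified model are already available in continuous-time form λ = σ + jω. The mode frequency is then |ω|/2π and the damping ratio is −σ/|λ|. The function name and the example eigenvalue are hypothetical, not taken from the PDM application.

```python
import math

def mode_from_eigenvalue(eig):
    """Return (frequency in Hz, damping ratio in %) of a continuous-time eigenvalue."""
    sigma, omega = eig.real, eig.imag
    freq_hz = abs(omega) / (2.0 * math.pi)
    damping_pct = 100.0 * (-sigma) / abs(eig)
    return freq_hz, damping_pct

# Hypothetical inter-area mode: sigma = -0.1 1/s, oscillating at 0.25 Hz
eig = complex(-0.1, 2.0 * math.pi * 0.25)
freq, damping = mode_from_eigenvalue(eig)
print(f"f = {freq:.3f} Hz, damping = {damping:.2f} %")
```

A model identified in discrete time would first need its eigenvalues mapped to continuous time (logarithm divided by the sampling period) before this formula applies.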

[Figure: plot with axes labeled "Damping (%)" and "Mode Frequency (Hz)"]

Swissgrid WAMS: collects measurements from PMUs around Europe

And then? Do something more than observing…

What we have:

At any moment, an operator can know which oscillation modes are present in its system

The operator can know its system's security status in real time

Insecure if damping < some value

What would be nice to have:

Given a candidate operating point, predict its expected oscillatory status.

Given an observed poorly damped operating point:
• say what the reason for this is
• modify the operating point such that it becomes well damped
  o Insecure → secure

[Diagram: operating point → model → security status]

What is an “operating point”? At least, how we define it here

Overview of the approach: linking WAMS with SCADA data…

WAMS
• PMU measurements
• time-stamped oscillations
• damping ratios

SCADA system (time-stamped data)
• generation, load dispatch
• line power flows
• FACTS devices status
• (PSS status)
• …

Database (input variables, output labels) → Train classifier

Need to time-synchronize them
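The time-synchronization step can be sketched as a nearest-timestamp join. This is only an illustration under stated assumptions: the data layout, the 60 s matching tolerance and the 3 % damping threshold are hypothetical placeholders, not values from the talk.

```python
from bisect import bisect_left

def nearest_index(sorted_times, t):
    """Index of the entry in sorted_times that is closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    return i if sorted_times[i] - t < t - sorted_times[i - 1] else i - 1

def build_database(scada, wams, max_gap_s=60.0, min_damping_pct=3.0):
    """Pair each SCADA snapshot with the nearest-in-time WAMS damping estimate.

    scada: list of (timestamp_s, feature_vector), sorted by time
    wams:  list of (timestamp_s, damping_pct), sorted by time
    Returns a list of (feature_vector, label) rows for classifier training.
    """
    wams_times = [t for t, _ in wams]
    rows = []
    for t, features in scada:
        j = nearest_index(wams_times, t)
        if abs(wams_times[j] - t) > max_gap_s:
            continue  # no WAMS estimate close enough in time: drop the snapshot
        label = "secure" if wams[j][1] >= min_damping_pct else "insecure"
        rows.append((features, label))
    return rows

# Hypothetical records: (time, [intertie flow in MW, PSS on/off]) and (time, damping in %)
scada = [(0.0, [410.0, 1]), (900.0, [520.0, 0]), (5000.0, [480.0, 1])]
wams = [(10.0, 6.2), (880.0, 1.9), (1800.0, 4.0)]
database = build_database(scada, wams)
```

The third SCADA snapshot is dropped because no damping estimate lies within the tolerance, which keeps mismatched input/output pairs out of the training set.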


What is data mining? Apart from a fancy term

An interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

It is about analyzing the data


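Support Vector Machines: a powerful classification technique

Main idea: find the optimal separating hyperplane
• maximum margin, i.e. maximize the distance to the closest point from either class
• minimizes generalization error
• found by solving a QP:

\[ \min_{\mathbf{w}, b} \; \frac{1}{2} \|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_i (\mathbf{w}^T \mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, N \]

Non-separable classes

\[ \min_{\mathbf{w}, b} \; \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i (\mathbf{w}^T \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \forall i \]

$C$: regularization parameter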
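To make the role of the slack variables concrete, the sketch below evaluates the soft-margin objective for a given hyperplane. For fixed (w, b) the smallest feasible slack is ξ_i = max(0, 1 − y_i(wᵀx_i + b)). The toy data and the chosen (w, b) are assumptions for illustration; no QP is solved here.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return (objective value, slacks) of the soft-margin SVM for a given (w, b)."""
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))  # smallest xi_i meeting both constraints
    objective = 0.5 * sum(w_k * w_k for w_k in w) + C * sum(slacks)
    return objective, slacks

# Hypothetical 2-D points; the third one lies on the wrong side of the hyperplane
X = [[2.0, 0.0], [0.0, 0.0], [0.5, 0.0]]
y = [1, -1, 1]
obj, xi = soft_margin_objective(w=[1.0, 0.0], b=-1.0, X=X, y=y, C=10.0)
print(obj, xi)
```

Only the misclassified point gets a nonzero slack, and its contribution to the objective scales with C.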

And what about nonlinear patterns in the data?

Map into a higher-dimensional feature space

Is there any problem?

YES! Number of features may blow up!

Computing the mapping can be inefficient

Using the mapped representation can be inefficient

Is there any solution?

YES!


The “kernel trick”

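The QP is solved by resorting to its dual problem, which … finally gives a classifier expressed through inner products with the training points.

Note: we only need the inner products $\phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)$, never just $\phi(\mathbf{x})$ on its own.

Hence the kernel function: $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)$

It should correspond to a dot product in the space defined by $\phi(\cdot)$.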
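For reference, a sketch of the soft-margin dual and the resulting classifier in standard textbook notation. The symbols α_i (Lagrange multipliers), φ (feature map) and K (kernel) are assumptions chosen to match the primal problem; the slide's own notation may differ.

```latex
\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j y_i y_j \, \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0

f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i=1}^{N} \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big),
\qquad K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)
```

The training points enter both expressions only through pairwise inner products, which is why the mapping φ never has to be computed explicitly.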


Most used kernels

Polynomial:

Linear: special case of polynomial

Gaussian:

The kernel's own parameters (e.g. the polynomial degree or the Gaussian width) are called “kernel hyperparameters”

They have to be chosen by the user
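The three kernels can be sketched as plain functions. The parameter names `degree`, `gamma` and `coef0` follow a common convention and are assumptions, since the slide's own symbols are not recoverable.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <u, v> + coef0) ** degree."""
    return (gamma * dot(u, v) + coef0) ** degree

def linear_kernel(u, v):
    """Special case of the polynomial kernel: degree 1, no offset."""
    return polynomial_kernel(u, v, degree=1, gamma=1.0, coef0=0.0)

def gaussian_kernel(u, v, gamma=0.5):
    """exp(-gamma * ||u - v||^2), also called the radial basis function kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

u, v = [1.0, 2.0], [3.0, 0.5]
print(linear_kernel(u, v), polynomial_kernel(u, v), gaussian_kernel(u, v))
```

Each function returns a similarity between two input vectors without ever forming the higher-dimensional feature vectors.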

Phew, now it seems that quite some tuning is required …

The user should choose:
1. the regularization parameter $C$
2. the kernel function
3. the kernel function hyperparameters

Role of the regularization parameter $C$: even more pronounced in an enlarged feature space, where perfect separation can typically be achieved.
• An overly large value of $C$ leads to an overfitted, “too curvy” boundary.
• An overly small value of $C$ leads to an overly smooth boundary, with large training error.

Large kernel hyperparameter values (e.g. a high polynomial degree): the kernel function becomes “too flexible”, and a very nonlinear boundary can be achieved.

Proper tuning is essential for good SVM performance

Automatic tuning of the SVM hyperparameters: a nonlinear, non-analytical optimization problem

Choose: $C$, kernel, kernel hyperparameters ( … )

Such that: SVM accuracy is maximized

Kernel choice:
o binary → continuous

SVM accuracy:
o 10-fold cross-validation
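A minimal sketch of how a k-fold cross-validation accuracy could serve as the quantity to maximize. The learner is passed in as two callables; the threshold classifier used here is a hypothetical stand-in for the SVM, and the data are synthetic.

```python
import random

def k_fold_accuracy(train, predict, X, y, k=10, seed=0):
    """Mean accuracy over k folds; train(X, y) -> model, predict(model, x) -> label."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / k

# Placeholder learner: threshold halfway between the two class means of feature 0.
def train_threshold(X, y):
    pos = [x[0] for x, label in zip(X, y) if label == 1]
    neg = [x[0] for x, label in zip(X, y) if label == -1]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

def predict_threshold(threshold, x):
    return 1 if x[0] > threshold else -1

X = [[float(v)] for v in range(40)]
y = [1 if v >= 20 else -1 for v in range(40)]
accuracy = k_fold_accuracy(train_threshold, predict_threshold, X, y, k=10)
print(accuracy)
```

In a tuning loop, `train` would close over one candidate set of hyperparameters, and the returned accuracy would be that candidate's fitness.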


The basic cycle of the ES algorithm

[Figure: scatter of × and ∘ markers illustrating the alternating "Explore" and "Exploit" steps, annotated with fitness values f = …]

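Mutation: create an offspring out of one parent

The offspring $\tilde{\mathbf{x}}$ is created by mutating the parent $\mathbf{x}$:

\[ \tilde{\mathbf{x}} = \mathbf{x} + \mathbf{z}, \quad \text{with} \quad \mathbf{z} \sim \sigma \, \mathcal{N}(\mathbf{0}, \mathbf{I}) \]

$\sigma$ is called the mutation strength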
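A minimal sketch of the mutation operator, assuming the usual isotropic Gaussian form in which every coordinate of the parent is perturbed by an independent normal sample scaled by σ. The function name and example values are illustrative.

```python
import random

def mutate(parent, sigma, rng=random):
    """Offspring = parent + sigma * N(0, I): isotropic Gaussian mutation."""
    return [x_k + sigma * rng.gauss(0.0, 1.0) for x_k in parent]

rng = random.Random(42)
parent = [1.0, -2.0, 0.5]
offspring = mutate(parent, sigma=0.1, rng=rng)
print(offspring)
```

A small σ keeps the offspring close to the parent, while a large σ spreads the search more widely.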

Create $\lambda$ offspring out of one parent

[Figure: scatter of × and ∘ markers]

Self-adaptation of mutation strength $\sigma$

• Each variable has its own mutation strength
• Mutation strengths are also mutated, with … sampled from …
• Each individual carries its mutation strengths' values

Idea: individuals with more suitable mutation strength values will survive

Before mutating the individual's object parameters, the strategy parameters are mutated first
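The order of operations described above can be sketched as follows. The log-normal update rule and the learning rates 1/√(2n) and 1/√(2√n) are a common textbook choice and are assumed here, not taken from the slides.

```python
import math
import random

def self_adaptive_mutate(x, sigmas, rng=random):
    """Mutate the strategy parameters first, then the object parameters.

    Assumed rule: sigma_k' = sigma_k * exp(tau_global * N + tau_local * N_k),
    followed by x_k' = x_k + sigma_k' * N_k.
    """
    n = len(x)
    tau_global = 1.0 / math.sqrt(2.0 * n)
    tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    shared = rng.gauss(0.0, 1.0)  # one draw shared by all coordinates
    new_sigmas = [s * math.exp(tau_global * shared + tau_local * rng.gauss(0.0, 1.0))
                  for s in sigmas]
    new_x = [x_k + s_k * rng.gauss(0.0, 1.0) for x_k, s_k in zip(x, new_sigmas)]
    return new_x, new_sigmas

rng = random.Random(7)
child_x, child_sigmas = self_adaptive_mutate([0.0, 0.0, 0.0], [1.0, 0.5, 2.0], rng)
print(child_x, child_sigmas)
```

Because the offspring is generated with its own freshly mutated strengths, selecting good offspring implicitly selects good strengths.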

Population $\mu > 1$

[Figure: scatter of × and ∘ markers]

Another variation operator: Recombination

Create an offspring out of $\varrho$ parents, e.g. …

1. Do $\lambda$ times recombination
2. Then apply mutation on those offspring

[Figure: × and ∘ markers illustrating recombination]

Parents are selected uniformly at random (their fitness is NEVER taken into account)

Notation: $(\mu/\varrho \; \overset{+}{,} \; \lambda)$-ES
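Putting the operators together, a (μ/ϱ, λ)-ES with comma selection can be sketched as below. Everything specific is an assumption for illustration: the sphere function stands in for the real fitness, recombination is intermediate (averaging), and the population sizes and learning rate are arbitrary choices.

```python
import math
import random

def sphere(x):
    """Toy objective to minimize (a stand-in for the real fitness)."""
    return sum(v * v for v in x)

def recombine(parents):
    """Intermediate recombination: average the rho parents coordinate-wise."""
    rho = len(parents)
    n = len(parents[0][0])
    x = [sum(p[0][k] for p in parents) / rho for k in range(n)]
    s = [sum(p[1][k] for p in parents) / rho for k in range(n)]
    return x, s

def evolution_strategy(f, n=5, mu=5, rho=2, lam=35, generations=100, seed=1):
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(2.0 * n)
    # Each individual is a pair (object parameters, mutation strengths).
    population = [([rng.uniform(-5, 5) for _ in range(n)], [1.0] * n)
                  for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            # Parents are drawn uniformly at random; fitness plays no role here.
            x, s = recombine(rng.sample(population, rho))
            s = [s_k * math.exp(tau * rng.gauss(0, 1)) for s_k in s]  # strategy first
            x = [x_k + s_k * rng.gauss(0, 1) for x_k, s_k in zip(x, s)]
            offspring.append((x, s))
        # Comma selection: the next parents are the best mu offspring only.
        offspring.sort(key=lambda ind: f(ind[0]))
        population = offspring[:mu]
    return population[0]

best_x, best_sigmas = evolution_strategy(sphere)
print(sphere(best_x))
```

Switching to plus selection would mean sorting `population + offspring` instead of `offspring` alone.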

Guidelines for successful self-adaptation

• $(\mu, \lambda)$ selection preferred over $(\mu + \lambda)$ selection
  o better at leaving local optima
  o better at following moving optima
  o with the + strategy, bad $\sigma$ values can survive too long
• $\mu > 1$, to carry different strategies
• high selective pressure (usually …) to generate offspring surplus
• mix strategy parameters (i.e. mutation strengths) by recombining them

ES-tuned SVM classifier: coming up with the oscillation damping classifier

Random Forests: a promising alternative

A collection of decision trees

Basic idea of a decision tree (DT):
• Greedy algorithm to progressively select the cut-attributes
• Splitting decided according to some node impurity measure, typically the Gini index
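As an illustration of the impurity measure, the Gini index of a node with class proportions p_k is 1 − Σ p_k², and a candidate split is scored by the size-weighted impurity of its two children. The labels and counts below are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of the two children produced by a split."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

parent = ["secure"] * 6 + ["insecure"] * 4
left, right = ["secure"] * 5, ["secure"] + ["insecure"] * 4
print(gini(parent), split_impurity(left, right))
```

The greedy tree builder picks, at each node, the attribute and cut point with the lowest weighted impurity.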

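Ensemble classifiers

[Figure: general idea of an ensemble classifier]

Why do they work?
• Assume 25 classifiers
• Each with error rate $\varepsilon$
• Assume independence among classifiers
• Error rate of the ensemble classifier: the probability that a majority (at least 13 of the 25) are wrong at the same time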
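Under the independence assumption, the majority-vote error is a binomial tail, Σ_{k=13}^{25} C(25, k) ε^k (1 − ε)^{25−k}. The sketch below evaluates it; the individual error rate 0.35 is an assumed value for illustration, since the slide's own figure is not recoverable.

```python
from math import comb

def ensemble_error(n_classifiers, eps):
    """P(majority of n independent classifiers err), each erring with probability eps."""
    majority = n_classifiers // 2 + 1
    return sum(comb(n_classifiers, k) * eps ** k * (1.0 - eps) ** (n_classifiers - k)
               for k in range(majority, n_classifiers + 1))

error_25 = ensemble_error(25, 0.35)  # 0.35 is an assumed individual error rate
print(f"{error_25:.3f}")
```

The ensemble beats its members only when each member is better than chance: at an individual error rate of 0.5 the ensemble error is also 0.5.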

Random Forests – The algorithm

Given training dataset $D$ of $N$ samples. For $b = 1$ to $B$:

1. Draw a bootstrap sample $D_b$ of size $N$ from $D$ (i.e. sample $N$ times with replacement)
2. Grow a tree classifier $T_b$ on $D_b$, where each split is computed as follows:
   a) Select $m$ variables at random (from the $p$ variables)
   b) Pick the best variable/split-point among the $m$
   c) Split the current node into two

Output: the ensemble of trees $\{T_b\}_{b=1}^{B}$

Variance of the average of $B$ trees, each with variance $\sigma^2$ and pairwise correlation $\varrho$:

\[ \varrho \sigma^2 + \frac{1 - \varrho}{B} \sigma^2 \]

• Feature importance insight
• Massive parallelization potential
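The algorithm above can be sketched in pure Python. This is a simplified illustration, not a production implementation: trees are depth-capped, splits use the Gini index, class labels must not be tuples (tuples mark internal nodes), and the data set is synthetic.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Best (impurity, feature, threshold) among the candidate features, or None."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X})[:-1]:  # keeps both children non-empty
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def grow_tree(X, y, m, rng, depth=0, max_depth=8):
    split = None
    if len(set(y)) > 1 and depth < max_depth:
        split = best_split(X, y, rng.sample(range(len(X[0])), m))  # m random variables
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], m, rng, depth + 1, max_depth),
            grow_tree([X[i] for i in right], [y[i] for i in right], m, rng, depth + 1, max_depth))

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def random_forest(X, y, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, x):
    """Majority vote over the trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]

# Synthetic data: feature 0 decides the class, feature 1 is uninformative.
X = [[float(i), float((i * 7) % 10)] for i in range(20)]
y = ["insecure" if i < 10 else "secure" for i in range(20)]
forest = random_forest(X, y, n_trees=25, m=1, seed=0)
print(predict_forest(forest, [3.0, 1.0]), predict_forest(forest, [16.0, 2.0]))
```

Since each tree depends only on its own bootstrap sample, the loop in `random_forest` could be distributed across workers without changing the result's structure.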


Solution overview: linking WAMS with SCADA data…

WAMS
• PMU measurements
• time-stamped oscillations
• damping ratios

SCADA system (time-stamped data)
• generation, load dispatch
• line power flows
• FACTS devices status
• (PSS status)
• …

Database (input variables, output labels) → Train classifier

Need for proper feature selection

Test system: Modified Nordic32

12978 samples, produced by simulations

Generators mostly participating in the 0.4-0.5 Hz mode (based on participation factors from linear model)

Correspond to different PSS being off

Damping vs. Intertie Cut: correlated, but …

[Figure: scatter plot with one axis from 28 to 44 and the other from -8.00% to 8.00%; annotated groups of 4851, 1643 (out of 12978), 1271 and 3580 samples]

ES-SVM classifier: 10-fold cross-validation accuracy

• 1% - 3% improvement compared to initial guess
• mixed kernel slightly better
• More features → better performance (even if redundant)

Accuracy (%) by input features and kernel:

Input features                                            mixed   radial basis   polynomial
Only intertie flow                                         92.7       92.7          92.0
Intertie flow & PSS status                                 93.4       94.0          92.8
Dispatch                                                   95.6       95.6          95.6
Intertie flow, PSS status & synthetic features             98.3       97.8          98.2
Dispatch & PSS status                                      98.6       97.8          98.3
Dispatch, power flows, PSS status & synthetic features     99.2       98.6          99.1

Random Forest classifier: out-of-bag accuracy

Input features                                            Accuracy (%)
Dispatch, power flows, PSS status & synthetic features       97.79
PSS, Intertie, Line 18, Line 32                              98.54
PSS, Intertie, Gen63, Line 16, Line 32                       98.53
PSS, Intertie, Gen63 & 6 line flows                          98.59

[Figure labels: 16, 18, 32, Gen63]

• very efficient feature selection
• less accurate than SVM



Conclusion … and challenges

• WAMS-SCADA link turned out to be an interesting idea
  o At least for the inter-area oscillations case
• SVM achieved higher accuracy
  o proper SVM tuning pays off
• RFs are not much worse, while allowing for very efficient feature selection
• Challenges…
  o Check on real data
  o Computational intensity
  o Close the loop: correct the operating point based on the model

Acknowledgment

The author gratefully acknowledges the financial support from the Marie Curie FP7-IAPP Project "Using real-time measurements for monitoring and management of power transmission dynamics for the smart grid - REAL-SMART", Contract No. PIAP-GA 2009-251304

Thank you for your attention!

Adamantios Marinakis
ABB Corporate Research Switzerland

Phone: +41 585867307
Mobile: +41 798766227
