Data Mining Techniques to classify inter-area oscillations
Adamantios Marinakis
ABB Corporate Research CH
London, 29/11/2013
2
Presentation outline
Problem statement
Data mining
Support Vector Machines
Evolution Strategies
Random Forests
Solution – Results
Conclusion
4
Time stamps
GPS Satellite
Voltage and current phasors
[Figure: each measurement point reports time-stamped phasors (V, I, t)]
Communication network
Wide-Area Monitoring System (WAMS)
System Protection Center
• Visualization of power system dynamics
• Stability monitoring
• Stability control and blackout prevention
5
Power Damping Monitoring – PDM
Principle
Sliding window of 10-15 minutes length
Estimate MIMO state-space model
Carry out modal analysis: damping and frequency of critical modes …
[Figure: mode frequency (Hz, 0.12 to 0.3) vs. damping (%, 20 to 80)]
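The last step of the principle above can be sketched as follows. This is a minimal illustration, assuming the eigenvalues of a continuous-time state matrix are already available; the function name `mode_from_eigenvalue` and the numerical values are hypothetical and not taken from the PDM tool. For a discrete-time model with sampling period Ts, an eigenvalue z would first be converted with ln(z)/Ts.

```python
import math

def mode_from_eigenvalue(lam):
    """Map a continuous-time eigenvalue sigma + j*omega to (frequency in Hz, damping in %)."""
    sigma, omega = lam.real, lam.imag
    frequency_hz = abs(omega) / (2.0 * math.pi)
    # Damping ratio zeta = -sigma / |lambda|, expressed in percent.
    damping_pct = -100.0 * sigma / abs(lam)
    return frequency_hz, damping_pct

# Hypothetical mode: 0.4 Hz oscillation with a 5 % damping ratio.
zeta, f_hz = 0.05, 0.4
omega_d = 2.0 * math.pi * f_hz                   # damped angular frequency
omega_n = omega_d / math.sqrt(1.0 - zeta ** 2)   # natural angular frequency
lam = complex(-zeta * omega_n, omega_d)

frequency_hz, damping_pct = mode_from_eigenvalue(lam)
```

A mode would then be flagged as critical when `damping_pct` falls below whatever threshold the operator uses.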
6
Swissgrid WAMS
Collects measurements from PMUs around Europe
7
And then?
Do something more than observing…
What we have:
At any moment, an operator can know the oscillation modes of its system
The operator can know its system's security status in real time
Insecure if damping < some value
What would be nice to have:
Given a candidate operating point, predict its expected oscillatory status.
Given an observed poorly damped operating point,
o say what is the reason for this
o modify the operating point such that it becomes well damped (insecure → secure)
[Diagram: operating point → model → security status]
8
What is an “operating point”?
At least, how we define it here
9
Overview of the approach
Linking WAMS with SCADA data…
WAMS
PMU measurements
time-stamped oscillations
damping ratios
SCADA system (time-stamped data)
generation, load dispatch
line power flows
FACTS devices status
(PSS status)
…
Train classifier
Database
input variables
output labels
Need to time-synchronize them
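One way the time-synchronization step could look is sketched below. It is only an illustration: the record layout, the field names (`t`, `damping_pct`, `intertie_flow`, `pss_on`), the 3 % damping threshold, the 60 s tolerance and all numbers are made up for the example and are not taken from the actual database.

```python
from bisect import bisect_left

DAMPING_THRESHOLD_PCT = 3.0  # hypothetical "insecure if damping < some value"
MAX_TIME_GAP_S = 60.0        # hypothetical tolerance between the two time stamps

def nearest_index(sorted_times, t):
    """Index of the entry in sorted_times that is closest to t."""
    pos = bisect_left(sorted_times, t)
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sorted_times)]
    return min(candidates, key=lambda i: abs(sorted_times[i] - t))

def build_database(wams, scada):
    """Pair each WAMS damping estimate with the nearest SCADA snapshot."""
    scada = sorted(scada, key=lambda rec: rec["t"])
    scada_times = [rec["t"] for rec in scada]
    rows = []
    for obs in wams:
        snap = scada[nearest_index(scada_times, obs["t"])]
        if abs(snap["t"] - obs["t"]) > MAX_TIME_GAP_S:
            continue  # no SCADA snapshot close enough in time
        features = {k: v for k, v in snap.items() if k != "t"}   # input variables
        secure = obs["damping_pct"] >= DAMPING_THRESHOLD_PCT
        rows.append((features, "secure" if secure else "insecure"))  # output label
    return rows

# Toy records (time in seconds), for illustration only.
wams = [{"t": 2.0, "damping_pct": 5.2},
        {"t": 901.0, "damping_pct": 2.1},
        {"t": 5000.0, "damping_pct": 4.0}]
scada = [{"t": 0.0, "intertie_flow": 310.0, "pss_on": 1},
         {"t": 900.0, "intertie_flow": 420.0, "pss_on": 0},
         {"t": 1800.0, "intertie_flow": 350.0, "pss_on": 1}]
database = build_database(wams, scada)
```

The third WAMS record is dropped because no SCADA snapshot lies within the tolerance.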
11
What is data mining?
Apart from a fancy term
An interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
It is about analyzing the data
13
Support Vector Machines
A powerful classification technique
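Main idea: find the optimal separating hyperplane
Maximum margin, i.e. maximize the distance to the closest point from either class
Minimizes generalization error
Found by solving a QP:
$$\min_{\mathbf{w}, b} \; \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, N$$
14
Non-separable classes
$$\min_{\mathbf{w}, b} \; \tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \;\; \forall i$$
$C$: regularization parameter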
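To make the role of the slack variables concrete, here is a small sketch that evaluates the soft-margin objective for a given hyperplane. For fixed w and b, the smallest feasible slack of sample i is max(0, 1 − y_i(wᵀx_i + b)). The function name and the toy data are hypothetical.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) of the soft-margin primal at a given (w, b)."""
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(wj * xj for wj, xj in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))  # smallest feasible xi_i
    objective = 0.5 * sum(wj * wj for wj in w) + C * sum(slacks)
    return objective, slacks

# Toy data: the third point lies on the wrong side of the hyperplane x1 = 0.
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
y = [+1, -1, -1]
objective, slacks = soft_margin_objective(w=(1.0, 0.0), b=0.0, X=X, y=y, C=10.0)
```

Only the misclassified point gets a nonzero slack, and a larger C makes that violation more expensive.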
15
And what about nonlinear patterns in the data?
Map into a higher-dimensional feature space
Is there any problem?
YES! Number of features may blow up!
Computing the mapping can be inefficient
Using the mapped representation can be inefficient
Is there any solution?
YES!
16
The “kernel trick”
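QP solved by resorting to its dual problem:
$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N}\alpha_i - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j\,\phi(\mathbf{x}_i)^T\phi(\mathbf{x}_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{N}\alpha_i y_i = 0$$
which … finally gives:
$$f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{N}\alpha_i y_i\,\phi(\mathbf{x}_i)^T\phi(\mathbf{x}) + b\Big)$$
Note: we only need the dot products $\phi(\mathbf{x}_i)^T\phi(\mathbf{x}_j)$, never $\phi(\mathbf{x})$ by itself
Hence: kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^T\phi(\mathbf{x}_j)$
It should correspond to a dot product in the space defined by $\phi(\cdot)$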
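A tiny numerical check of the kernel trick, as a sketch: for 2-D inputs, the degree-2 polynomial kernel (x·z)² equals the dot product of the explicit feature maps (x1², √2·x1·x2, x2²). The function names and the two sample points are illustrative only.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """Same quantity, computed without ever building phi(x)."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))   # dot product in the mapped space
implicit = poly2_kernel(x, z)    # kernel evaluated in the input space
```

Both routes give the same value, but the kernel never constructs the higher-dimensional vectors.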
17
Most used kernels
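Polynomial: $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T\mathbf{z} + c)^d$
Linear: special case of polynomial ($d = 1$, $c = 0$)
Gaussian: $K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma\|\mathbf{x} - \mathbf{z}\|^2)$
$c$, $d$, $\gamma$, etc. are called “kernel hyperparameters”
They have to be chosen by the user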
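The kernels named on this slide can be written down in a few lines. This is a sketch in one common parametrization; the parameter names `degree`, `coef0` and `gamma` are assumptions of this example and may differ from the symbols used on the original slide.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """(x.z + coef0) ** degree"""
    return (dot(x, z) + coef0) ** degree

def linear_kernel(x, z):
    """Plain dot product: the polynomial kernel with degree=1, coef0=0."""
    return dot(x, z)

def gaussian_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2), also known as the radial basis function kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = (1.0, 2.0), (0.5, -1.0)
k_lin = linear_kernel(x, z)
k_poly1 = polynomial_kernel(x, z, degree=1, coef0=0.0)
k_self = gaussian_kernel(x, x)
```

Whatever the parametrization, the hyperparameters stay free choices of the user, which is what motivates the tuning discussed next.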
18
Phew, now it seems that quite some tuning is required …
The user should choose …
1. regularization parameter $C$
2. kernel function
3. kernel function hyperparameters
Role of regularization parameter $C$: even more pronounced in an enlarged feature space, where perfect separation can typically be achieved
o Overly large value of $C$ will lead to an overfit, “too curvy” boundary
o Overly small $C$ will lead to an overly smooth boundary, with big training error
Large kernel hyperparameters (e.g. a high polynomial degree): kernel function “too flexible”, very nonlinear boundary can be achieved
Proper tuning is essential for good SVM performance
19
Automatic tuning of the SVM hyperparameters
A nonlinear, non-analytical optimization problem
Choose: $C$, kernel, kernel hyperparameters (…)
Such that: SVM accuracy is maximized
Kernel choice:
o binary → continuous
SVM accuracy:
o 10-fold cross-validation
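The quantity being maximized can be sketched as a plain k-fold cross-validation loop. This is an illustration only: `train_and_predict` is a hypothetical callable standing in for "train an SVM with the candidate hyperparameters, then predict", and the majority-class stand-in and toy labels merely keep the example self-contained.

```python
def k_fold_accuracy(X, y, train_and_predict, k=10):
    """Fraction of samples predicted correctly when each fold is held out once."""
    n = len(y)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        predictions = train_and_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct += sum(p == y[i] for p, i in zip(predictions, test_idx))
    return correct / n

def majority_class(X_train, y_train, X_test):
    """Stand-in classifier (NOT an SVM): always predicts the most frequent label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_test)

# Toy data: 20 samples, every fourth one belongs to class 1.
X = [[float(i)] for i in range(20)]
y = [1 if i % 4 == 3 else 0 for i in range(20)]
accuracy = k_fold_accuracy(X, y, majority_class, k=10)
```

An optimizer would call such a function once per candidate hyperparameter set, which is why the tuning is computationally intensive.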
21
The basic cycle of the ES algorithm
Explore
Exploit
[Figure: × and ∘ points in the search space, annotated with fitness values f = …]
22
Mutation: create an offspring out of one parent
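Offspring $\tilde{\mathbf{y}}$ is created by mutating parent $\mathbf{y}$:
$$\tilde{\mathbf{y}} = \mathbf{y} + \sigma\,\mathbf{z}$$
with $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
$\sigma$ is called the mutation strength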
23
Create $\lambda$ offspring out of one parent
[Figure: parents (×) and their offspring (∘)]
24
Self-adaptation of mutation strength $\sigma$
Each variable has its own mutation strength $\sigma_i$
Mutation strengths are also mutated: $\tilde{\sigma}_i = \sigma_i\,e^{\tau\xi_i}$, with $\xi_i$ sampled from $\mathcal{N}(0, 1)$
Each individual carries its mutation strengths’ values
Idea: individuals with more suitable mutation strength values will survive
Before mutating the individual's object parameters, the strategy parameters are first mutated
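A self-adaptive mutation step might look like the sketch below. It assumes the widely used log-normal update of the mutation strengths and the heuristic learning rate tau = 1/sqrt(2n); both are assumptions of this example, as are the function name and the starting values.

```python
import math
import random

def mutate(x, sigmas, tau, rng):
    """Mutate the strategy parameters first, then the object parameters."""
    # 1) Each mutation strength is scaled by a log-normally distributed factor.
    new_sigmas = [s * math.exp(tau * rng.gauss(0.0, 1.0)) for s in sigmas]
    # 2) Each variable is perturbed using its own, freshly mutated strength.
    new_x = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, new_sigmas)]
    return new_x, new_sigmas

rng = random.Random(0)
parent_x = [1.0, -2.0, 0.5]
parent_sigmas = [0.3, 0.3, 0.3]
tau = 1.0 / math.sqrt(2.0 * len(parent_x))  # assumed heuristic, not from the slides

child_x, child_sigmas = mutate(parent_x, parent_sigmas, tau, rng)
```

The child carries its own `child_sigmas`, so strengths that produce good offspring are inherited together with them.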
25
Population $\mu$ > 1
[Figure: several parents (×) and their offspring (∘)]
26
Another variation operator: Recombination
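Create one offspring out of $\rho$ parents, e.g. by averaging them (intermediate recombination)
1. Do $\lambda$ times recombination
2. Then apply mutation on those $\lambda$ offspring
[Figure: parents (×) and recombined offspring (∘)]
Parents are selected by uniform random distribution
(their fitness is NEVER taken into account)
$(\mu/\rho \overset{+}{,} \lambda)$-ES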
27
Guidelines for successful self-adaptation
$(\mu, \lambda)$ selection preferred over $(\mu + \lambda)$ selection
o better at leaving local optima
o better at following moving optima
o with the + strategy, bad $\sigma$ can survive too long
$\mu > 1$ to carry different strategies
high selective pressure ($\lambda > \mu$) to generate offspring surplus
mix strategy parameters (i.e. mutation strengths) by recombining them
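Putting the pieces together, a comma-selection ES with recombination and self-adaptation could be sketched as follows. All settings (mu=3, rho=2, lam=21, 60 generations, the learning rate tau) are illustrative assumptions, and the toy `sphere` function stands in for the real objective, which would be the negative cross-validation accuracy of the SVM.

```python
import math
import random

def es_minimize(f, x0, mu=3, rho=2, lam=21, generations=60, seed=0):
    """Sketch of a (mu/rho, lambda)-ES with one mutation strength per variable."""
    rng = random.Random(seed)
    n = len(x0)
    tau = 1.0 / math.sqrt(2.0 * n)  # assumed learning rate
    # An individual is a tuple (fitness, x, sigmas).
    parents = []
    for _ in range(mu):
        x = [xi + rng.gauss(0.0, 1.0) for xi in x0]
        parents.append((f(x), x, [1.0] * n))
    best = min(parents, key=lambda ind: ind[0])
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            # Recombination: rho parents drawn uniformly, fitness ignored.
            chosen = rng.sample(parents, rho)
            x = [sum(p[1][j] for p in chosen) / rho for j in range(n)]
            sigmas = [sum(p[2][j] for p in chosen) / rho for j in range(n)]
            # Mutation: strategy parameters first, then object parameters.
            sigmas = [s * math.exp(tau * rng.gauss(0.0, 1.0)) for s in sigmas]
            x = [xj + sj * rng.gauss(0.0, 1.0) for xj, sj in zip(x, sigmas)]
            offspring.append((f(x), x, sigmas))
        # Comma selection: the next parents are the mu best offspring only.
        offspring.sort(key=lambda ind: ind[0])
        parents = offspring[:mu]
        if parents[0][0] < best[0]:
            best = parents[0]
    return best[1], best[0]

def sphere(x):
    """Toy objective standing in for the (negative) classifier accuracy."""
    return sum(xi * xi for xi in x)

start = [3.0, -2.0]
best_x, best_f = es_minimize(sphere, start)
```

With lam = 7 × mu the offspring surplus keeps the selective pressure high, and averaging the parents' sigmas mixes the strategy parameters as the guidelines suggest.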
28
ES-tuned SVM classifier
Coming up with the oscillation damping classifier
30
Random Forests
A promising alternative
A collection of decision trees
Basic Idea of DT:
Greedy algorithm to progressively select the cut-attributes
Splitting decided according to some node impurity measure
typically the Gini index
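As a sketch of the impurity measure, the Gini index of a node is 1 minus the sum of squared class proportions, and a candidate split is scored by the size-weighted impurity of its two children. The labels below are toy values.

```python
from collections import Counter

def gini(labels):
    """Gini index of a node: 1 - sum over classes of p_k ** 2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini index of the two child nodes of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

labels = ["secure", "secure", "insecure", "insecure"]
before = gini(labels)  # perfectly mixed two-class node
after = split_impurity(["secure", "secure"], ["insecure", "insecure"])  # pure children
```

The greedy tree builder picks, at each node, the cut that lowers this weighted impurity the most.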
31
Ensemble classifiers
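Why do they work?
Assume 25 classifiers
Each with error rate $\varepsilon$
Assume independence among classifiers
Error rate of the ensemble classifier (majority vote, wrong when at least 13 of the 25 are wrong):
$$\sum_{i=13}^{25}\binom{25}{i}\varepsilon^i(1-\varepsilon)^{25-i}$$
[Figure: General idea]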
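The argument can be checked numerically. The sketch below assumes majority voting among independent classifiers; the individual error rate of 0.35 is a hypothetical value chosen for illustration, not a number recovered from the slide.

```python
from math import comb

def ensemble_error(n_classifiers, error_rate):
    """P(majority of n independent classifiers is wrong), each wrong with error_rate."""
    majority = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, i) * error_rate ** i * (1.0 - error_rate) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )

eps = 0.35  # hypothetical individual error rate
err = ensemble_error(25, eps)
```

The ensemble beats its members only when each member is better than chance; at an error rate of 0.5 the majority vote stays at 0.5.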
32
Random Forests – The algorithm
Given training dataset $D$ of $N$ samples, for $b = 1$ to $B$:
1. Draw a bootstrap sample $D_b$ of size $N$ from $D$ (i.e. sample $N$ times with replacement)
2. Grow a tree classifier $T_b$ on $D_b$, where each split is computed as follows:
a) Select $m$ variables at random (from the $p$ variables)
b) Pick the best variable/split-point among the $m$
c) Split the current node into two
Output: the ensemble of trees $\{T_b\}_{b=1}^{B}$
Variance of the average of $B$ trees, each with variance $\sigma^2$ and pairwise correlation $\rho$:
$$\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$$
Feature importance insight
Massive parallelization potential
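The randomization steps of the algorithm can be sketched without growing actual trees. The helper names and the sizes (100 samples, 9 variables, 3 candidates per split, 50 trees) are hypothetical; a full implementation would redraw the candidate variables at every split, whereas this sketch draws them only once per tree, for the root split.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Sample n_samples indices with replacement; the unused ones are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag

def candidate_features(n_features, m, rng):
    """Select m distinct variables at random as candidates for one split."""
    return rng.sample(range(n_features), m)

def ensemble_variance(rho, sigma2, n_trees):
    """Variance of the average of n_trees correlated trees."""
    return rho * sigma2 + (1.0 - rho) / n_trees * sigma2

rng = random.Random(0)
n_samples, n_features, m, n_trees = 100, 9, 3, 50
plan = []
for _ in range(n_trees):
    in_bag, oob = bootstrap_indices(n_samples, rng)
    root_candidates = candidate_features(n_features, m, rng)  # root split only
    plan.append((in_bag, oob, root_candidates))
```

The out-of-bag samples of each tree serve as a built-in test set, and since the trees are independent of each other, the loop parallelizes trivially. Adding trees shrinks only the second variance term, so the pairwise correlation sets the floor.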
34
Solution Overview
Linking WAMS with SCADA data…
WAMS
PMU measurements
time-stamped oscillations
damping ratios
SCADA system (time-stamped data)
generation, load dispatch
line power flows
FACTS devices status
(PSS status)
…
Train classifier
Database
input variables
output labels
Need for proper feature selection
35
Test system - Modified Nordic32
12978 samples, produced by simulations
Generators mostly participating in the 0.4-0.5 Hz mode (based on participation factors from linear model)
Correspond to different PSS being off
Damping vs. Intertie Cut
Correlated, but …
[Figure: damping (-8% to 8%) plotted against intertie cut (28 to 44); annotated groups: 4851 samples, 1643 samples (out of 12978), 1271 samples, 3580 samples]
37
ES-SVM classifier
10-fold cross-validation accuracy
1% - 3% improvement compared to initial guess
mixed kernel slightly better
More features → better performance (even if redundant)
Accuracy (%) per kernel:
Input features                                            mixed   radial basis   polynomial
Only intertie flow                                        92.7    92.7           92.0
Intertie flow & PSS status                                93.4    94.0           92.8
Dispatch                                                  95.6    95.6           95.6
Intertie flow, PSS status & synthetic features            98.3    97.8           98.2
Dispatch & PSS status                                     98.6    97.8           98.3
Dispatch, power flows, PSS status & synthetic features    99.2    98.6           99.1
[Figure: damping (-8% to 8%) vs. intertie cut (28 to 44)]
38
Random Forest classifier
Out-of-bag accuracy
Input features                                            Accuracy (%)
Dispatch, power flows, PSS status & synthetic features    97.79
PSS, Intertie, Line 18, Line 32                           98.54
PSS, Intertie, Gen63, Line 16, Line 32                    98.53
PSS, Intertie, Gen63 & 6 line flows                       98.59
[Figure labels: 16, 18, 32, Gen63]
very efficient feature selection
less accurate than SVM
40
Conclusion … and challenges
WAMS-SCADA link turned out to be an interesting idea
o At least for the inter-area oscillations case
SVM achieved higher accuracy
o proper SVM tuning pays off
RFs are not much worse, while allowing for very efficient feature selection
Challenges…
o Check on real data
o Computational intensiveness
o Close the loop: correct the operating point based on the model
41
Acknowledgment
The author gratefully acknowledges the financial support from the Marie Curie FP7-IAPP project “Using real-time measurements for monitoring and management of power transmission dynamics for the smart grid - REAL-SMART”, Contract No. PIAP-GA 2009-251304
Thank you for your attention!
Adamantios Marinakis
ABB Corporate Research Switzerland
Phone: +41 585867307
Mobile: +41 798766227