Predicting Electricity Distribution Feeder Failures using Machine Learning
Marta Arias 1, Hila Becker 1,2
1 Center for Computational Learning Systems, 2 Computer Science, Columbia University
LEARNING ‘06
Overview of the Talk
Introduction to the Electricity Distribution Network of New York City
What are we doing and why?
Early solution using MartiRank, a boosting-like algorithm for ranking
Current solution using online learning
Related projects
The Electrical System
1. Generation
2. Transmission
3. Primary Distribution
4. Secondary Distribution
Electricity Distribution: Feeders
Problem
Distribution feeder failures result in automatic feeder shutdown called “Open Autos” or O/As
O/As stress networks, control centers, and field crews
O/As are expensive ($ millions annually)
Proactive replacement is much cheaper and safer than reactive repair
Our Solution: Machine Learning
Leverage Con Edison’s domain knowledge and resources
Learn to rank feeders based on susceptibility to failure
How?
Assemble data
Train model based on past data
Re-rank frequently using model on current data
New York City
Some facts about feeders and failures
About 950 feeders:
568 in Manhattan
164 in Brooklyn
115 in Queens
94 in the Bronx
Some facts about feeders and failures
About 60% of feeders failed at least once
On average, feeders failed 4.4 times (between June 2005 and August 2006)
Some facts about feeders and failures
Mostly 0-5 failures per day
More in the summer: strong seasonality effects
Feeder data
Static data
  Compositional/structural
  Electrical
Dynamic data
  Outage history (updated daily)
  Load measurements (updated every 5 minutes)
Roughly 200 attributes for each feeder; new ones are still being added
Feeder Ranking Application
Goal: rank feeders according to their likelihood of failure (high-risk feeders are placed near the top)
Application needs to integrate all types of data
Application needs to react and adapt to incoming dynamic data
Hence, update the feeder ranking every 15 minutes
Application Structure
[Diagram components: data sources (Static data, Outage data, Xfmr Stress data, Feeder Load data); SQL Server DB; ML Engine; ML Models; Rankings; Decision Support App (Decision Support GUI, Action Driver, Action Tracker)]
Goal: rank feeders according to their likelihood of failure
Overview of the Talk
Introduction to the Electricity Distribution Network of New York City
What are we doing and why?
Early solution using MartiRank, a boosting-like algorithm for ranking
  Pseudo ROC and pseudo AUC
  MartiRank
  Performance metric
  Early results
Current solution using online learning
Related projects
(pseudo) ROC
[Figure: feeders sorted by score, each annotated with its number of outages (values shown: 0, 0, 0, 1, 2, 1, 3)]
(pseudo) ROC
[Figure: cumulative number of outages (y-axis, 210 outages) plotted against number of feeders (x-axis, 941 feeders)]
(pseudo) ROC
[Figure: fraction of outages (y-axis, 0 to 1) against fraction of feeders (x-axis, 0 to 1), with the area under the ROC curve marked]
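Some observations about the (p)ROC
Adapted to positive labels (not just 0/1)
Best pAUC is not always 1 (actually, it almost never is)
E.g., pAUC = 11/15 = 0.73 for the ranking below; the "best" pAUC with this data is 14/15 = 0.93, corresponding to the ranking with outage counts 2, 1, 0, 0, 0

| Ranking | Outages |
|---|---|
| 1 | 1 |
| 2 | 0 |
| 3 | 2 |
| 4 | 0 |
| 5 | 0 |

[Figure: pseudo ROC curve for this ranking, ranking positions 1 to 5 on the x-axis and cumulative outages 1 to 3 on the y-axis]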
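The pseudo AUC in this example can be computed by summing the cumulative outage count after each ranked feeder and normalizing by #feeders · #outages. A minimal sketch, assuming the ranking is given as a list of per-feeder outage counts in ranked order (the function name `pseudo_auc` is hypothetical):

```python
from typing import Sequence


def pseudo_auc(outages: Sequence[float]) -> float:
    """Area under the (pseudo) ROC curve for feeders listed in ranked order.

    outages[k] is the number of outages of the feeder at rank k + 1. The
    curve's height after rank k is the cumulative outage count, so the area
    is the sum of those heights, normalized by (#feeders * #outages).
    """
    n_feeders, total = len(outages), sum(outages)
    if n_feeders == 0 or total == 0:
        raise ValueError("need at least one feeder and one outage")
    area, cumulative = 0.0, 0.0
    for count in outages:
        cumulative += count
        area += cumulative
    return area / (n_feeders * total)


# The ranking from the example, and the best ordering of the same outage counts.
print(pseudo_auc([1, 0, 2, 0, 0]))                        # 11/15 ≈ 0.733
print(pseudo_auc(sorted([1, 0, 2, 0, 0], reverse=True)))  # 14/15 ≈ 0.933
```

The best achievable value comes from sorting the outage counts in decreasing order; it reaches 1 only when a single feeder accounts for all outages, which is why the best pAUC is almost never 1.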
MartiRank
Boosting-like algorithm by [Long & Servedio, 2005]
Greedy, maximizes pAUC at each round
Adapted to ranking
Weak learners are sorting rules
Each attribute is a sorting rule
Attributes are numerical only
If categorical, then convert to an indicator vector of 0/1
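Since every attribute has to act as a numerical sorting rule, a categorical attribute can be expanded into one 0/1 indicator column per category, each of which can then be sorted on like any other attribute. A small sketch; the helper name and the category values are hypothetical:

```python
from typing import Dict, List, Sequence


def to_indicators(values: Sequence[str]) -> Dict[str, List[int]]:
    """Expand one categorical column into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


column = ["A", "B", "A", "C"]  # hypothetical categorical attribute for 4 feeders
print(to_indicators(column))
# {'A': [1, 0, 1, 0], 'B': [0, 1, 0, 0], 'C': [0, 0, 0, 1]}
```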
MartiRank
Feeder list begins in random order
Sort list by "best" variable
Divide list in two: split outages evenly
Choose separate "best" variables for each part, sort
Divide list in three: split outages evenly
Choose separate "best" variables for each part, sort
Continue…
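The round structure above can be sketched as follows. This is a simplified reading of the slide, not the authors' implementation: it assumes round r splits the current list into r contiguous parts with roughly equal outage counts, and that each part is re-sorted by the single attribute and direction that maximize that part's pAUC. All function names are hypothetical, and applying the learned rules to new data (which would also require storing the split points) is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

Rule = Optional[Tuple[int, bool]]  # (attribute index, sort descending?) or None


def pseudo_auc(outages: Sequence[float]) -> float:
    """Normalized area under the cumulative-outage curve (0.0 if no outages)."""
    total = sum(outages)
    if len(outages) == 0 or total == 0:
        return 0.0
    area, cumulative = 0.0, 0.0
    for count in outages:
        cumulative += count
        area += cumulative
    return area / (len(outages) * total)


def best_sort(rows: Sequence[Sequence[float]], outages: Sequence[float]) -> Tuple[List[int], Rule]:
    """Try each attribute in both directions; keep the order with the highest pAUC."""
    best_score, best_order, best_rule = -1.0, list(range(len(rows))), None
    n_attrs = len(rows[0]) if rows else 0
    for attr in range(n_attrs):
        for descending in (True, False):
            order = sorted(range(len(rows)), key=lambda i: rows[i][attr], reverse=descending)
            score = pseudo_auc([outages[i] for i in order])
            if score > best_score:
                best_score, best_order, best_rule = score, order, (attr, descending)
    return best_order, best_rule


def split_evenly(outages: Sequence[float], parts: int) -> List[int]:
    """Boundaries of up to `parts` contiguous segments with roughly equal outages."""
    total, bounds, cumulative = sum(outages), [0], 0.0
    for i, count in enumerate(outages[:-1]):  # never cut after the last feeder
        cumulative += count
        if len(bounds) < parts and cumulative >= total * len(bounds) / parts:
            bounds.append(i + 1)
    bounds.append(len(outages))
    return bounds


def martirank(rows, outages, n_rounds: int, seed: int = 0):
    """Train on one dataset; return the final order and the rules picked per round."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # the feeder list begins in random order
    model: List[List[Rule]] = []
    for n_parts in range(1, n_rounds + 1):  # round r splits the list into r parts
        bounds = split_evenly([outages[i] for i in order], n_parts)
        new_order: List[int] = []
        rules: List[Rule] = []
        for lo, hi in zip(bounds, bounds[1:]):
            segment = order[lo:hi]
            seg_order, rule = best_sort([rows[i] for i in segment],
                                        [outages[i] for i in segment])
            new_order.extend(segment[j] for j in seg_order)
            rules.append(rule)
        order = new_order
        model.append(rules)
    return order, model


rows = [[0.1, 3.0], [0.9, 1.0], [0.5, 2.0], [0.0, 5.0]]  # hypothetical attributes
outages = [0, 2, 1, 0]
order, model = martirank(rows, outages, n_rounds=2)
print(order)  # [1, 2, 0, 3]
```

The only tuning parameter is `n_rounds`, matching the single "number of rounds" parameter discussed on the next slide.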
MartiRank
Advantages:
Fast, easy to implement
Interpretable
Only 1 tuning parameter: "number of rounds"
Disadvantages:
1 tuning parameter: "number of rounds"
Was set to 4 manually
Using MartiRank for real-time ranking of feeders
MartiRank is a "batch" algorithm, hence it must deal with the changing system by:
Continually generating new datasets with the latest data
Using data within a window, aggregating dynamic data within that period in various ways (quantiles, counts, sums, averages, etc.)
Re-training a new model and throwing out the old model
Seasonality effects are not taken into account
Using the newest model to generate the ranking
Must implement "training strategies"
Re-train daily, or weekly, or every 2 weeks, or monthly, or…
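Aggregating dynamic data within a training window might look like the sketch below. The record layout (feeder id, timestamp, value), the half-open window, and the particular statistics are assumptions for illustration; the slide only names quantiles, counts, sums and averages as examples.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, quantiles
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, datetime, float]  # hypothetical (feeder_id, timestamp, value)


def window_features(readings: Iterable[Reading], end: datetime,
                    window: timedelta) -> Dict[str, Dict[str, float]]:
    """Summarize each feeder's readings that fall inside [end - window, end)."""
    per_feeder: Dict[str, List[float]] = defaultdict(list)
    for feeder, ts, value in readings:
        if end - window <= ts < end:
            per_feeder[feeder].append(value)
    features = {}
    for feeder, values in per_feeder.items():
        feats = {"count": float(len(values)), "sum": sum(values), "mean": mean(values)}
        if len(values) >= 2:  # older Python versions need 2+ points for quantiles
            q1, q2, q3 = quantiles(values, n=4)  # quartile cut points
            feats.update(q25=q1, median=q2, q75=q3)
        features[feeder] = feats
    return features


t0 = datetime(2006, 7, 1)
data = [("F1", t0 + timedelta(hours=h), float(h)) for h in range(24)]  # hypothetical load readings
feats = window_features(data, end=t0 + timedelta(days=1), window=timedelta(days=1))
print(feats["F1"]["count"], feats["F1"]["sum"], feats["F1"]["mean"])  # 24.0 276.0 11.5
```

Re-training on a schedule then amounts to calling this with a new `end` (and a window of the chosen length) before fitting each new model.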
Performance Metric
$$1 - \frac{\sum_{i} \operatorname{rank}(\text{failure}_i)}{\#\text{failures} \cdot \#\text{feeders}}$$
Normalized average rank of failed feeders
Closely related to the (pseudo) area under the ROC curve: when labels are 0/1, $\text{avgRank} = \text{pAUC} - 1/\#\text{examples}$
Essentially, the difference comes from comparing a 0-based pAUC with 1-based ranks
Performance Metric Example
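$$1 - \frac{2 + 3 + 5}{3 \cdot 8} = 0.5833$$

| Ranking | Outages |
|---|---|
| 1 | 0 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |
| 5 | 1 |
| 6 | 0 |
| 7 | 0 |
| 8 | 0 |

[Figure: pseudo ROC curve for this ranking, ranking positions 1 to 8 on the x-axis and cumulative outages 1 to 3 on the y-axis]
pAUC = 17/24 ≈ 0.71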
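A direct transcription of the metric, assuming 1-based ranks and one entry per failure (function names are hypothetical). It reproduces the worked example, 1 − 10/24 ≈ 0.5833; on the same data the pAUC of 17/24 is larger by exactly 3/24 = 1/8, i.e. 1/#feeders.

```python
from typing import List, Sequence


def normalized_average_rank(failure_ranks: Sequence[int], n_feeders: int) -> float:
    """1 - sum(rank_i) / (#failures * #feeders), where rank 1 is the top of the list.

    Values close to 1 mean the failures happened on feeders ranked near the top.
    """
    if not failure_ranks:
        raise ValueError("no failures to score")
    if any(not 1 <= r <= n_feeders for r in failure_ranks):
        raise ValueError("ranks must lie in 1..n_feeders")
    return 1.0 - sum(failure_ranks) / (len(failure_ranks) * n_feeders)


def ranks_of_failures(outages: Sequence[int]) -> List[int]:
    """Expand per-position outage counts into one 1-based rank per failure."""
    return [pos for pos, count in enumerate(outages, start=1) for _ in range(count)]


outages = [0, 1, 1, 0, 1, 0, 0, 0]  # the 8-feeder example above
ranks = ranks_of_failures(outages)  # [2, 3, 5]
print(normalized_average_rank(ranks, len(outages)))  # 1 - 10/24 ≈ 0.5833
```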
How to measure performance over time
Every ~15 minutes, generate a new ranking based on the current model and the latest data
Whenever there is a failure, look up its rank in the latest ranking before the failure
After a whole day, compute the normalized average rank
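This evaluation loop can be sketched with time-sorted ranking snapshots and a binary search for the latest snapshot strictly before each failure. The data layout is an assumption: a snapshot is modeled as a dict from a (hypothetical) feeder id to its 1-based rank, and a "day" is the calendar date of the failure timestamp. Each failure contributes rank / #feeders, so with a fixed feeder count this equals the metric defined earlier.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Tuple

Snapshot = Tuple[datetime, Dict[str, int]]  # (time generated, feeder_id -> rank)


def daily_scores(snapshots: List[Snapshot],
                 failures: List[Tuple[datetime, str]]) -> Dict[date, float]:
    """Normalized average rank per day, using the last ranking before each failure."""
    snapshots = sorted(snapshots, key=lambda s: s[0])
    times = [t for t, _ in snapshots]
    per_day: Dict[date, List[float]] = defaultdict(list)
    for when, feeder in failures:
        idx = bisect_left(times, when) - 1  # last snapshot strictly before the failure
        if idx < 0:
            continue  # no ranking existed yet
        ranking = snapshots[idx][1]
        per_day[when.date()].append(ranking[feeder] / len(ranking))
    return {day: 1.0 - sum(vals) / len(vals) for day, vals in per_day.items()}


t = datetime(2006, 7, 1, 12, 0)
snaps = [(t, {"A": 1, "B": 2, "C": 3, "D": 4}),
         (t.replace(minute=15), {"A": 2, "B": 1, "C": 4, "D": 3})]
fails = [(t.replace(minute=10), "A"), (t.replace(minute=20), "C")]
print(daily_scores(snaps, fails))  # {datetime.date(2006, 7, 1): 0.375}
```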
MartiRank Comparison: training every 2 weeks
Using MartiRank for real-time ranking of feeders
MartiRank seems to work well, but…
The user decides when to re-train
The user decides how much data to use for re-training
… and other things like setting parameters, selecting algorithms, etc.
Want to make the system 100% automatic!
Idea: still use MartiRank, since it works well with this data, but keep/re-use all models
Overview of the Talk
Introduction to the Electricity Distribution Network of New York City
What are we doing and why?
Early solution using MartiRank, a boosting-like algorithm for ranking
Current solution using online learning
  Overview of learning from expert advice and the Weighted Majority Algorithm
  New challenges in our setting and our solution
  Results
Related projects
Learning from expert advice
Consider each model as an expert
Each expert has an associated weight (or score)
Reward/penalize experts with good/bad predictions
Weight is a measure of confidence in the expert’s prediction
Predict using a weighted average of the top-scoring experts
Learning from expert advice
Advantages
Fully automatic: no human intervention needed
Adaptive: changes in the system are learned as it runs
Can use many types of underlying learning algorithms
Good performance guarantees from learning theory: performance is never too far off from the best expert in hindsight
Disadvantages
Computational cost: need to track many models "in parallel"
Models are harder to interpret
Weighted Majority Algorithm [Littlestone & Warmuth ‘88]
Introduced for binary classification
Experts make predictions in [0,1]
Obtain losses in [0,1]
Pseudocode:
Learning rate as the main parameter, β in (0,1]
There are N "experts"; initially the weight is 1 for all
For t = 1, 2, 3, …
  Predict using the weighted average of the experts’ predictions
  Obtain the "true" label; each expert i incurs loss $l_i$
  Update the experts’ weights using $w_{i,t+1} = w_{i,t} \cdot \beta^{l_i}$
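A runnable version of this pseudocode for experts whose predictions and losses lie in [0,1]. The class name is hypothetical, and the absolute-difference loss in the usage example is an assumption (the slide only requires losses in [0,1]).

```python
from typing import List, Sequence


class WeightedMajority:
    """Weighted Majority with multiplicative updates w_i <- w_i * beta ** loss_i."""

    def __init__(self, n_experts: int, beta: float):
        if not 0.0 < beta <= 1.0:
            raise ValueError("beta must be in (0, 1]")
        self.beta = beta
        self.weights: List[float] = [1.0] * n_experts  # initially 1 for all

    def predict(self, predictions: Sequence[float]) -> float:
        """Weighted average of the experts' predictions."""
        total = sum(self.weights)
        return sum(w * p for w, p in zip(self.weights, predictions)) / total

    def update(self, losses: Sequence[float]) -> None:
        """Shrink each expert's weight according to its loss in [0, 1]."""
        self.weights = [w * self.beta ** loss for w, loss in zip(self.weights, losses)]


wm = WeightedMajority(n_experts=2, beta=0.5)
for label in [1, 1, 0]:
    preds = [1.0, 0.0]  # expert 0 always predicts 1, expert 1 always predicts 0
    guess = wm.predict(preds)  # prediction made before the label is revealed
    wm.update([abs(p - label) for p in preds])  # assumed loss: |prediction - label|
print(wm.weights)  # [0.5, 0.25]
```

With β = 1 the weights never change; smaller values of β punish mistakes more aggressively.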
In our case, we can’t use WM directly
We rank, as opposed to doing binary classification
More importantly, we do not have a fixed set of experts
Dealing with ranking vs. binary classification
Ranking loss is the normalized average rank of failures, as seen before; the loss is in [0,1]
To combine rankings, use a weighted average of the feeders’ ranks
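Combining the experts' rankings through a weighted average of each feeder's rank could look like this. Restricting the average to the top-k experts by weight, breaking ties by feeder id, and the function and feeder names are assumptions for illustration.

```python
from typing import Dict, List, Optional, Sequence


def combine_rankings(rankings: Sequence[Dict[str, int]], weights: Sequence[float],
                     top_k: Optional[int] = None) -> List[str]:
    """Order feeders by the weighted average of their ranks (lower = riskier).

    Assumes every ranking covers the same set of feeders.
    """
    experts = sorted(range(len(rankings)), key=lambda e: weights[e], reverse=True)
    if top_k is not None:
        experts = experts[:top_k]  # keep only the highest-weighted experts
    total = sum(weights[e] for e in experts)
    feeders = rankings[experts[0]].keys()
    avg_rank = {f: sum(weights[e] * rankings[e][f] for e in experts) / total
                for f in feeders}
    return sorted(avg_rank, key=lambda f: (avg_rank[f], f))


r1 = {"A": 1, "B": 2, "C": 3}
r2 = {"A": 3, "B": 1, "C": 2}
print(combine_rankings([r1, r2], weights=[0.75, 0.25]))  # ['A', 'B', 'C']
print(combine_rankings([r1, r2], weights=[0.25, 0.75]))  # ['B', 'C', 'A']
```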
Dealing with a moving set of experts
Introduce new parameters:
B: "budget" (max number of models), set to 100
p: new models’ weight percentile, in [0,100]
Age penalty, in (0,1]
When training new models, add them to the set of models with a weight corresponding to the pth percentile (among current weights)
If there are too many models (more than B), drop the models with the poorest q-score, where $q_i = w_i \cdot (\text{age penalty})^{\text{age}_i}$
I.e., the age penalty is the rate of exponential decay
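The bookkeeping for a moving set of experts might be sketched as below. The nearest-rank percentile, measuring age in update steps, the default values, and the class, field and parameter names (`ExpertPool`, `age_penalty`, …) are assumptions; the slide fixes only the budget B, the percentile p, the age penalty and the q-score.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Expert:
    name: str
    weight: float
    age: int = 0  # assumed: number of update steps since the model was added


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile, p in [0, 100]."""
    ordered = sorted(values)
    idx = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[idx]


class ExpertPool:
    def __init__(self, budget: int = 100, p: float = 50.0, age_penalty: float = 0.9):
        self.budget, self.p, self.age_penalty = budget, p, age_penalty
        self.experts: List[Expert] = []

    def add(self, name: str) -> None:
        """A new model enters with the p-th percentile of the current weights."""
        w = percentile([e.weight for e in self.experts], self.p) if self.experts else 1.0
        self.experts.append(Expert(name, w))
        while len(self.experts) > self.budget:
            # Drop the model with the poorest q-score q_i = w_i * age_penalty ** age_i.
            q = [e.weight * self.age_penalty ** e.age for e in self.experts]
            del self.experts[q.index(min(q))]

    def update(self, losses: List[float], beta: float) -> None:
        """Weighted Majority update, after which every model grows one step older."""
        for e, loss in zip(self.experts, losses):
            e.weight *= beta ** loss
            e.age += 1


pool = ExpertPool(budget=2, p=50, age_penalty=0.5)
pool.add("m1")
pool.add("m2")
pool.update([0.0, 1.0], beta=0.5)  # m1 keeps weight 1.0, m2 drops to 0.5
pool.add("m3")                     # enters at the median weight, 0.5; m2 is dropped
print([(e.name, e.weight) for e in pool.experts])  # [('m1', 1.0), ('m3', 0.5)]
```

With an age penalty below 1, older models need proportionally larger weights to survive, which leaves room for newly trained ones.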
Other parameters
How often do we train and add new models?
Hand-tuned over the course of the summer: every 7 days
This seems to achieve a balance between generating new models to adapt to changing conditions and not overflowing the system
Alternatively, one could train when the observed performance drops (not used yet)
How much data do we use to train models?
Based on observed performance and early experiments: 1 week’s worth of data and 2 weeks’ worth of data
Performance
Failures’ rank distribution
Daily average rank of failures
Other things that I have not talked about but that took a significant amount of time
DATA
Data is spread over many repositories
Difficult to identify useful data
Difficult to arrange access to data
Volume of data: gigabytes of data accumulated on a daily basis
Required an optimized database layout and the addition of a preprocessing stage
Had to gain an understanding of the data semantics
Software engineering (this is a deployed application)
Current Status
Summer 2006: the system has been debugged, fine-tuned, tested and deployed
Now fully operational
Ready to be used next summer (in test mode)
After this summer, we’re going to do systematic studies of:
Parameter sensitivity
Comparisons to other approaches
Related work-in-progress
Online learning:
Fancier weight updates with better guaranteed performance in "changing environments"
Explore "direct" online ranking strategies (e.g. the ranking perceptron)
Data mining project:
Aims to exploit seasonality
Learn a "mapping" from environmental conditions to the characteristics of well-performing experts
When the same conditions arise in the future, increase the weights of experts that have those characteristics
Hope to learn it as the system runs, continually updating the mappings
MartiRank:
In the presence of repeated/missing values, sorting is non-deterministic and pAUC takes different values depending on the permutation of the data
Use statistics of the pAUC to improve the basic learning algorithm
Instead of taking the number of rounds as input, stop when the AUC increase is not significant
Use better estimators of pAUC that are not sensitive to permutations of the data
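One way to get a pAUC estimate that does not depend on how ties are broken is to average it over all orderings within each group of tied scores. This is an illustration of the idea, not necessarily the estimator the authors have in mind. Because pAUC = Σ over failures of (N − r + 1) / (N · #failures), with N feeders and r the failure's 1-based rank, it is linear in the ranks, and that average equals the pAUC computed with each tied group's mid-rank. A sketch, assuming higher scores are ranked first; a brute-force average over tie-breaks is included to check the shortcut on small inputs. Function names are hypothetical.

```python
from itertools import groupby, permutations
from typing import Sequence


def pauc_in_order(outages: Sequence[float]) -> float:
    """pAUC of a fixed ordering (sum of cumulative outages / (N * #outages))."""
    cumulative = area = 0.0
    for count in outages:
        cumulative += count
        area += cumulative
    return area / (len(outages) * sum(outages))


def expected_pauc(scores: Sequence[float], outages: Sequence[float]) -> float:
    """pAUC averaged over uniformly random tie-breaks, via mid-ranks of tied groups."""
    n, total = len(scores), sum(outages)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    acc, start = 0.0, 1  # start = 1-based rank of the first member of the group
    for _, group in groupby(order, key=lambda i: scores[i]):
        members = list(group)
        mid_rank = start + (len(members) - 1) / 2
        acc += sum(outages[i] for i in members) * (n - mid_rank + 1)
        start += len(members)
    return acc / (n * total)


def brute_force_pauc(scores: Sequence[float], outages: Sequence[float]) -> float:
    """Average pAUC over every ordering consistent with the scores (small inputs only)."""
    n = len(scores)
    orders = [p for p in permutations(range(n))
              if all(scores[p[k]] >= scores[p[k + 1]] for k in range(n - 1))]
    return sum(pauc_in_order([outages[i] for i in p]) for p in orders) / len(orders)


scores, outages = [0.9, 0.5, 0.5, 0.1], [0, 1, 0, 2]
print(expected_pauc(scores, outages), brute_force_pauc(scores, outages))  # 0.375 0.375
```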
Other related projects within collaboration with Con Edison
Finer-grained component analysis
Ranking of transformers
Ranking of cable sections
Ranking of cable joints
Merging of all systems into one
Mixing ML and survival analysis
Acknowledgments
Con Edison: Matthew Koenig, Mark Mastrocinque, William Fairechio, John A. Johnson, Serena Lee, Charles Lawson, Frank Doherty, Arthur Kressner, Matt Sniffen, Elie Chebli, George Murray, Bill McGarrigle, Van Nest team
Columbia:
CCLS: Wei Chu, Martin Jansche, Ansaf Salleb, Albert Boulanger, David Waltz, Philip M. Long (now at Google), Roger Anderson
Computer Science: Philip Gross, Rocco Servedio, Gail Kaiser, Samit Jain, John Ioannidis, Sergey Sigelman, Luis Alonso, Joey Fortuna, Chris Murphy
Stats: Samantha Cook