Sparse Representations in Power Systems

Jack Cooper

University of Bremen

August 6, 2009

Outline

Background

Power Signals
Clemson Work

Theory

Algorithms

Operators/Dictionaries

Application

Conclusions

Signal Properties

Characteristics of a Power Signal (Xu)

Low frequency harmonics - caused by power electronic devices, arc furnaces, transformers, rotational machines, and aggregate loads.
Swells and sags of the RMS voltage.
Transients - caused by lightning, equipment faults, switching operations, and more.

Transients

Transient Disturbances

Superimposed on the fundamental frequency.
Modeled by damped sinusoids that appear as short-lived impulses.
Capacitor switching transients:

damped sinusoids with frequency around 1000-2000 Hz
damping factor 450-500
phase angle −π/2 to π/2
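To make the model concrete, here is a minimal sketch (not part of the original slides; assumes NumPy) that synthesizes a test signal of this kind: a 60 Hz fundamental plus one capacitor-switching transient with parameters drawn from the ranges above. The onset time and amplitude are illustrative choices.

```python
import numpy as np

# 1024 samples on [0, 0.1] s, matching the sampling used for the test signals.
t = np.linspace(0.0, 0.1, 1024, endpoint=False)

# 60 Hz fundamental (unit amplitude).
fundamental = np.sin(2 * np.pi * 60 * t)

# Capacitor-switching transient: damped sinusoid with frequency in 1000-2000 Hz,
# damping factor in 450-500, phase in (-pi/2, pi/2). Onset t0 and amplitude are
# illustrative assumptions, not values from the talk.
f_tr, damping, phase, t0, amp = 1500.0, 475.0, 0.3, 0.03, 0.5
transient = np.where(
    t >= t0,
    amp * np.exp(-damping * (t - t0)) * np.sin(2 * np.pi * f_tr * (t - t0) + phase),
    0.0,
)

signal = fundamental + transient                  # noiseless test signal
noisy = signal + 0.05 * np.random.randn(t.size)   # optional additive noise
```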

Example Signal with 60Hz Component

Figure: The test signal with the 60 Hz component, without noise, sampled at 1024 points from 0 to 0.1 seconds.

Sparse Framework

Feature extraction - detect and identify transient disturbances

Finding sparse solutions allows for a crisper identification and is more robust in noise

\min_\alpha \|\alpha\|_0 \quad \text{subject to} \quad x = \Phi\alpha \qquad (1)

This is NP-hard; instead solve

\min_\alpha \|\alpha\|_1 \quad \text{subject to} \quad x = \Phi\alpha \qquad (2)

or, in noise,

\min_\alpha \; \lambda\|\alpha\|_1 + \frac{1}{2}\|x - \Phi\alpha\|_2^2 \qquad (3)
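As an illustration of the l1 relaxation (2) (an added example, not from the slides), the equality-constrained problem can be recast as a linear program by splitting α into positive and negative parts. A minimal sketch, assuming SciPy is available and using a small random overcomplete dictionary Φ:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 64, 128                      # overcomplete: more atoms than samples
Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)  # unit-norm atoms

# Build a 5-sparse ground truth and its measurement x = Phi @ alpha0.
alpha0 = np.zeros(m)
alpha0[rng.choice(m, 5, replace=False)] = rng.standard_normal(5)
x = Phi @ alpha0

# Basis pursuit: min ||alpha||_1 subject to Phi alpha = x.
# Split alpha = u - v with u, v >= 0 and minimize sum(u) + sum(v).
c = np.ones(2 * m)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * m))
alpha_hat = res.x[:m] - res.x[m:]

print("recovery error:", np.linalg.norm(alpha_hat - alpha0))
```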

Clemson Research

Bruckstein, Donoho, Elad - From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images

Chen, Donoho, Saunders - Atomic Decomposition by Basis Pursuit

Donoho, Elad - On the Stability of Basis Pursuit in the Presence of Noise

Donoho, Elad, Temlyakov - Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise

Tropp - Greed is Good: Algorithmic Results for Sparse Approximation

Clemson Research

Two different (overcomplete) dictionaries - Wavelet Packet and Damped Sinusoid

Two different algorithms - Matching Pursuit and Interior Point Method

Mutual Coherence analysis (a small computational sketch follows below)

\mu(\Phi) = \max_{1 \le k, j \le m,\; k \ne j} \frac{\left|\phi_k^T \phi_j\right|}{\|\phi_k\|_2\, \|\phi_j\|_2}

Used as a criterion for finding the sparsest possible solution without noise, stability in noise, etc. Often insufficient.

Time-Frequency Plane Comparison
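A minimal sketch of the mutual coherence computation (an added example, not from the slides; assumes NumPy and a dictionary Phi whose atoms are columns):

```python
import numpy as np

def mutual_coherence(Phi: np.ndarray) -> float:
    """Largest absolute normalized inner product between distinct columns of Phi."""
    # Normalize each atom (column) to unit l2 norm.
    G = Phi / np.linalg.norm(Phi, axis=0, keepdims=True)
    gram = np.abs(G.T @ G)        # |<phi_k, phi_j>| / (||phi_k|| ||phi_j||)
    np.fill_diagonal(gram, 0.0)   # exclude k == j
    return float(gram.max())

# Example: coherence of a random 64 x 128 dictionary.
rng = np.random.default_rng(0)
print(mutual_coherence(rng.standard_normal((64, 128))))
```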

Lingering Questions

What about the infinite dimensional setting?

What other dictionaries might be better?

What other algorithms are there?

Infinite Dimensional Setting

Daubechies, Defrise, De Mol - An Iterative Thresholding Algorithm for Linear Inverse Problems with a Sparsity Constraint

Using an operator K instead of a dictionary Φ

Considering the problem as ill-posed: an unbounded or badly conditioned generalized inverse in need of regularization

Frequent restriction to injective K instead of mutual coherence; this excludes overcomplete bases.

Choose a sparse solution from an overcomplete basis versus use a sparsity constraint as regularization of an ill-posed problem with injective K

Difference in how the l1 penalty function is brought in: regularization compared to recasting

\min_\alpha \|\alpha\|_1 \quad \text{subject to} \quad \|S - D\alpha\|_2 \le \delta

as

\min_\alpha \; \lambda\|\alpha\|_1 + \frac{1}{2}\|x - \Phi\alpha\|_2^2 \qquad (4)

Regularization

To what extent is minimizing the functional

\Phi(f) = \frac{1}{2}\|Kf - g\|_2^2 + \alpha\|f\|_1 \qquad (5)

a successful regularization?

Daubechies

Theorem (Daubechies, Defrise, De Mol)

For injective K, a bounded linear operator with \|K\| < 1, and \alpha > 0, let f^* be the minimizer of \Phi. Then for any observed image g, if \alpha(\varepsilon) satisfies

\lim_{\varepsilon \to 0} \alpha(\varepsilon) = 0 \quad \text{and} \quad \lim_{\varepsilon \to 0} \frac{\varepsilon^2}{\alpha(\varepsilon)} = 0,

then

\lim_{\varepsilon \to 0} \left[\, \sup_{\|g - Kf_0\| \le \varepsilon} \|f^* - f_0\| \,\right] = 0,

where f_0 is the "true" solution.

This is presented in the literature with a range for p; p = 1 is used in this context.

Donoho, Elad

Theorem (Donoho, Elad)

With noise level ε, a dictionary D of size N × L, and mutual coherence \mu(D), if the true solution \alpha_0 is sparse enough, satisfying

\|\alpha_0\|_0 < \frac{1 + \mu(D)}{2\mu(D) + \sqrt{N}(\varepsilon + \delta)/T},

then the solution \hat\alpha of

\min_\alpha \|\alpha\|_1 \quad \text{subject to} \quad \|S - D\alpha\|_2 \le \delta

with \delta \ge \varepsilon exhibits stability:

\|\hat\alpha - \alpha_0\|_1 < T.

The dictionary can be overcomplete.

Donoho, Elad, Temlyakov

Theorem (Donoho, Elad, Temlyakov)

With noise level ε, a dictionary D of size N × L, and mutual coherence \mu(D), if the true solution \alpha_0 exists with

M = \|\alpha_0\|_0 < \frac{1}{4}\left(\frac{1}{\mu(D)} + 1\right),

then the solution \hat\alpha of

\min_\alpha \|\alpha\|_1 \quad \text{subject to} \quad \|S - D\alpha\|_2 \le \delta

with \delta \ge \varepsilon exhibits stability:

\|\hat\alpha - \alpha_0\|_2^2 \le \frac{(\varepsilon + \delta)^2}{1 - \mu(D)(4M - 1)}.

Other conditions from Tropp

Iterative Soft Thresholding - Daubechies, Defrise, De Mol

Paper set in the infinite-dimensional setting

One parameter choice is to minimize the l1 penalty functional

\Phi_\alpha(f) = \|Kf - g\|_2^2 + \alpha\|f\|_1

Minimize this by minimizing a surrogate functional

\Phi_\alpha^{SUR}(f; a) = \|Kf - g\|^2 + \alpha\|f\|_1 - \|Kf - Ka\|^2 + \|f - a\|^2

Iterate with f^n = \arg\min_f \Phi_\alpha^{SUR}(f; f^{n-1})

The minimizer satisfies f = S_\alpha\!\left(a + K^*(g - Ka)\right), where

S_\alpha(x) = \begin{cases} x - \frac{\omega}{2} & \text{if } x \ge \frac{\omega}{2} \\ 0 & \text{if } |x| < \frac{\omega}{2} \\ x + \frac{\omega}{2} & \text{if } x \le -\frac{\omega}{2} \end{cases}

Iterate using f^{n-1} for a.
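A minimal finite-dimensional sketch of this iteration (an added example, not from the slides; assumes NumPy, K rescaled so that ||K|| < 1, and the functional ||Kf − g||² + α||f||₁, so the threshold is α/2):

```python
import numpy as np

def soft_threshold(x, thresh):
    # Componentwise soft thresholding with threshold `thresh`.
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def iterative_soft_thresholding(K, g, alpha, n_iter=500):
    """Minimize ||K f - g||_2^2 + alpha*||f||_1 via the surrogate iteration
    f_n = S_{alpha/2}(f_{n-1} + K^T (g - K f_{n-1})), assuming ||K|| < 1."""
    f = np.zeros(K.shape[1])
    for _ in range(n_iter):
        f = soft_threshold(f + K.T @ (g - K @ f), alpha / 2.0)
    return f

# Example with a random operator rescaled to spectral norm < 1.
rng = np.random.default_rng(0)
K = rng.standard_normal((64, 128))
K /= (np.linalg.norm(K, 2) + 1e-3)
f_true = np.zeros(128); f_true[[3, 40, 99]] = [1.0, -2.0, 0.5]
g = K @ f_true + 0.01 * rng.standard_normal(64)
print(np.count_nonzero(np.abs(iterative_soft_thresholding(K, g, 0.05)) > 1e-8))
```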

Iterative Soft Thresholding

When these f^n converge to the surrogate minimizer, the limit also matches the minimizer of \Phi!

If \|K\| < 1 and K is injective, then these iterations converge strongly to the minimizer of \Phi, regardless of the initial choice.

Iterative Hard Thresholding - Bredies, Lorenz

Similar to soft thresholding, but with a function

H(x) = \begin{cases} 0 & \text{for } |x| \le 1 \\ \frac{\|f\|_2}{2\alpha} & \text{for } |x| > 1 \end{cases}

Another function used in the algorithm:

\varphi(x) = \begin{cases} |x| & \text{for } |x| \le \frac{\|f\|_2}{2\alpha} \\ \frac{\alpha}{\|f\|_2}\left(x^2 + \left(\frac{\|f\|_2}{2\alpha}\right)^2\right) & \text{for } |x| > \frac{\|f\|_2}{2\alpha} \end{cases}

Iterative Hard Thresholding - Bredies, Lorenz

Initialize u^0 = 0

Direction determination:

v^n = H\!\left(\frac{-K^*(Ku^n - f)}{\alpha}\right)

Step size:

s^n = \min\left\{1,\; \frac{\alpha\left(\varphi(u^n) - \varphi(v^n)\right) + \left(K^*(Ku^n - f)\right)(u^n - v^n)}{\|K(v^n - u^n)\|^2}\right\}

where the expression makes sense, and s^n = 1 otherwise.

Iterate: u^{n+1} = u^n + s^n(v^n - u^n)

Converges like n^{-1/2} for the l1 case

Semismooth Newton Method - Griesse, Lorenz

Solve the l1 penalty minimization by finding the solution to

u - S_\alpha\!\left(u - \gamma K^*(Ku - f)\right) = 0, \quad \text{where} \quad S_\alpha(u_k) = \max\{0, |u_k| - \alpha\}\,\operatorname{sgn}(u_k),

for some \gamma > 0.

Active Set Algorithm
1. Initialize the active sets, set the sign vector
2. Set u^n_{I_n} = 0 and calculate u^n_{A_n} by solving
   (K^*K)_{A_n,A_n}\, u^n_{A_n} = \left(K^*f + \operatorname{sgn}^n \alpha\right)_{A_n}
3. Update the active set:
   A_{n+1} = \left\{k \in \mathbb{N} : \left|u^n - \gamma K^*(Ku^n - f)\right|_k > \gamma\alpha\right\}
4. Update the sign vector
5. End if the active set does not change; iterate if it does
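A minimal finite-dimensional sketch of this active-set iteration (an added example, not from the slides; assumes NumPy, the functional ½||Ku − f||² + α||u||₁, and a common sign convention that may differ from the slide's):

```python
import numpy as np

def ssn_active_set(K, f, alpha, gamma=1.0, max_iter=50):
    """Active-set / semismooth Newton style iteration (a sketch) for
    min_u 0.5*||K u - f||_2^2 + alpha*||u||_1."""
    n = K.shape[1]
    u = np.zeros(n)
    active = np.zeros(n, dtype=bool)
    for it in range(max_iter):
        # Quantity entering the fixed point u = S_{gamma*alpha}(u - gamma*K^T(Ku - f)).
        w = u - gamma * K.T @ (K @ u - f)
        new_active = np.abs(w) > gamma * alpha   # step 3: update active set
        signs = np.sign(w)                       # step 4: update sign vector
        if it > 0 and np.array_equal(new_active, active):
            break                                # step 5: active set unchanged
        active = new_active
        u = np.zeros(n)                          # step 2: inactive components are zero
        if active.any():
            A = K[:, active]
            # Reduced optimality system (K^T K)_{A,A} u_A = (K^T f)_A - alpha*sgn_A.
            rhs = A.T @ f - alpha * signs[active]
            u[active] = np.linalg.lstsq(A.T @ A, rhs, rcond=None)[0]
    return u

# Small example: 2-sparse signal, unit-norm random atoms.
rng = np.random.default_rng(0)
K = rng.standard_normal((40, 80))
K /= np.linalg.norm(K, axis=0)
u_true = np.zeros(80); u_true[[5, 30]] = [2.0, -1.5]
f = K @ u_true
u_hat = ssn_active_set(K, f, alpha=0.5)
print(np.nonzero(np.abs(u_hat) > 1e-8)[0])   # recovered support (ideally {5, 30})
```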

Semismooth Newton Method

Converges locally superlinearly

If it converges, then it converges to the global solution

Feature Sign Search - Lee, Battle, Raina, Ng

Paper written in the finite-dimensional setting

Active Set algorithm

1. Initialize the active set, sign vector
2. Find i = \arg\max_i \left|\frac{\partial \|y - Ax\|^2}{\partial x_i}\right|
3. Activate x_i if it improves the objective, i.e. if
   \left|\frac{\partial \|y - Ax\|^2}{\partial x_i}\right| > \alpha,
   then add i to the active set and adjust the sign vector accordingly
4. Feature Sign Step: compute the solution to the unconstrained QP over just the active set A,
   x_{\text{new}} = (A^TA)^{-1}\left(A^Ty - \frac{\alpha\,\operatorname{sgn}(x)}{2}\right)
   Using a line search from x to x_{\text{new}}, choose the x that minimizes the objective. Update the active set and the sign vector.

Feature Sign Search

Check optimality

For nonzero coefficients: \frac{\partial \|y - Ax\|^2}{\partial x_i} + \alpha\,\operatorname{sgn}(x_i) = 0. If not, go back to Step 4 (no new activation).

For zero coefficients: \left|\frac{\partial \|y - Ax\|^2}{\partial x_i}\right| \le \alpha. If not, go back to Step 3.

Finds the Global Minimizer

Considers overcomplete dictionaries
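A minimal sketch of just this optimality check (an added example, not from the slides; assumes NumPy and the objective ||y − Ax||² + α||x||₁):

```python
import numpy as np

def fss_optimality(A, y, x, alpha, tol=1e-8):
    """Check the Feature Sign Search optimality conditions for
    the objective ||y - A x||_2^2 + alpha * ||x||_1."""
    grad = 2.0 * A.T @ (A @ x - y)      # d/dx_i of ||y - A x||^2
    nonzero = np.abs(x) > tol
    # Nonzero coefficients: gradient + alpha * sign(x_i) must vanish.
    cond_nonzero = np.all(np.abs(grad[nonzero] + alpha * np.sign(x[nonzero])) < tol)
    # Zero coefficients: |gradient| must not exceed alpha.
    cond_zero = np.all(np.abs(grad[~nonzero]) <= alpha + tol)
    return bool(cond_nonzero), bool(cond_zero)
```

If the first condition fails, the search returns to the feature-sign step; if the second fails, a new coefficient is activated, as described above.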

Elastic Net

Extend the l1 penalty to also include an l2 regularization

\Phi_\alpha(f) = \|Kf - g\|_2^2 + \alpha\|f\|_1 + \frac{\beta}{2}\|f\|_2^2

Same algorithms, different functional

Helps with ill-conditioning in active set algorithms

Has a unique minimizer even for noninjective K
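To see why the same thresholding algorithms carry over with only a small change, here is a short worked step (an added illustration, not from the slides): the componentwise proximal operator of the elastic-net penalty is just soft thresholding followed by an extra shrinkage.

\operatorname{prox}_{\alpha|\cdot| + \frac{\beta}{2}(\cdot)^2}(x)
= \arg\min_{z}\; \tfrac{1}{2}(z - x)^2 + \alpha|z| + \tfrac{\beta}{2}z^2
= \frac{S_\alpha(x)}{1 + \beta},
\qquad S_\alpha(x) = \operatorname{sgn}(x)\,\max\{|x| - \alpha,\, 0\}.

In an iterative soft-thresholding scheme this amounts to one extra multiplicative shrinkage after each thresholding step (the exact factor depends on how the data term is scaled).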

Basis Learning - Lee, Battle, Raina, Ng

Optimize basis vectors for sparse reconstruction by fixing the observed signals and the inputs.

Use a sparse coding algorithm to represent the training signals in the basis, then optimize using the Lagrange dual:

\operatorname{trace}\!\left(X^TX - XS^T(SS^T + \Lambda)^{-1}(XS^T)^T - c\Lambda\right)

Newton search to maximize this dual

The optimal basis is of the form

B^T = (SS^T + \Lambda)^{-1}(XS^T)^T

Iterate this, shrinking B based on zero columns.

Test Signal 1

Figure: Test Signal 1 sampled at 1024 points from 0 to .1

Test Signal 2

Figure: Test Signal 2 sampled at 1024 points from 0 to .1

How to choose alpha?

Figure: Alpha values versus the residual of the reconstruction against the noiseless signal. The minimizer is chosen.

Wavelet Dictionary

Mother wavelet chosen to be the Daubechies 4 wavelet

Fits short impulses well

Level 4 depth

Previously, all waveforms of the wavelet packet were enumerated explicitly, which is prohibitively big; now a function handle can be used instead (see the sketch below)
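A minimal sketch of the function-handle idea (an added example, not from the slides): instead of storing the dictionary as an explicit matrix, the operator and its adjoint are provided as callables. Here a plain Daubechies 4 wavelet transform stands in for the wavelet-packet dictionary, assuming PyWavelets is available; for this orthogonal transform the reconstruction also serves as the adjoint.

```python
import numpy as np
import pywt

WAVELET, LEVEL, MODE = "db4", 4, "periodization"

def make_wavelet_operator(n):
    """Return (synthesis, analysis) callables for a length-n signal.
    synthesis maps coefficients -> signal, analysis maps signal -> coefficients."""
    # Fix the coefficient layout once so vectors can be flattened consistently.
    template = pywt.wavedec(np.zeros(n), WAVELET, level=LEVEL, mode=MODE)
    _, slices = pywt.coeffs_to_array(template)

    def analysis(x):
        coeffs = pywt.wavedec(x, WAVELET, level=LEVEL, mode=MODE)
        return pywt.coeffs_to_array(coeffs)[0]

    def synthesis(c):
        coeffs = pywt.array_to_coeffs(c, slices, output_format="wavedec")
        return pywt.waverec(coeffs, WAVELET, mode=MODE)

    return synthesis, analysis

# Usage: apply the operator without ever forming the dictionary matrix.
K, K_adj = make_wavelet_operator(1024)
x = np.random.randn(1024)
print(np.allclose(K(K_adj(x)), x))   # perfect reconstruction for this orthogonal transform
```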

Wavelet Dictionary Results

The optimal alpha value is selected at .1467

The number of nonzeros is 113.

Figure: The reconstruction compared to the noiseless signal

Wavelet Dictionary Results

Figure: The coefficient vector

Wavelet Dictionary Results

Algorithm           Runtime (s)
Soft Thresholding   0.022067
Hard Thresholding   max iter
SSN                 ∞
FSS                 0.690612
Elastic Net SSN     0.370986
Elastic Net FSS     0.729112
GPSR                0.099766
Interior Point      15.021680

Table: Algorithm running times on the Wavelet Operator, Test Signal 1

Wavelet Dictionary Results Signal 2

The optimal alpha value is selected at .0747

The number of nonzeros is 291

Figure: The reconstruction compared to the noiseless signal

Wavelet Dictionary Results Signal 2

Figure: The coefficient vector

Wavelet Dictionary Results Signal 2

Algorithm           Runtime (s)
Soft Thresholding   0.021567
Hard Thresholding   max Iter
SSN                 ∞
FSS                 2.305259
Elastic Net SSN     0.392921
Elastic Net FSS     2.449922
GPSR                0.162870
Interior Point      14.193927

Table: Algorithm running times on Wavelet Operator, Test Signal 2

Learned Dictionary

Trained on 1600 signals, each a single damped sinusoid

Frequencies and damping factors span the range associated with capacitor switching

Starting points range over 0 to 0.1

Size pared down to 354 atoms by the learning step, which makes it much easier to work with (a sketch of the training-set construction follows below)
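The slide fixes the number of training signals (1600) and the start-time range (0 to 0.1), but gives no concrete frequency or damping values, only that they cover the capacitor-switching range. The sketch below generates such a training set with placeholder ranges and sampling setup; the actual learning used Klaus Steinhorst's basis-learning code credited at the end, for which something like scikit-learn's DictionaryLearning could stand in.

# Sketch of the training set described on the slide: 1600 single damped
# sinusoids with start times in [0, 0.1].  Sampling rate, frequency range
# and damping range are placeholders.
import numpy as np

rng = np.random.default_rng(0)
fs, T = 10_000.0, 0.2                    # placeholder sampling setup
t = np.arange(0.0, T, 1.0 / fs)

def damped_sinusoid(f_hz, damping, t0):
    s = np.zeros_like(t)
    m = t >= t0
    s[m] = np.exp(-damping * (t[m] - t0)) * np.sin(2 * np.pi * f_hz * (t[m] - t0))
    return s

training = np.stack([
    damped_sinusoid(f_hz=rng.uniform(300.0, 1000.0),   # placeholder range
                    damping=rng.uniform(50.0, 500.0),  # placeholder range
                    t0=rng.uniform(0.0, 0.1))
    for _ in range(1600)
])
# 'training' (1600 x len(t)) is what a dictionary-learning routine would
# consume; in the talk the learning step pared the dictionary down to
# 354 atoms.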

Learned Dictionary Results

The optimal alpha is 0.0495.

The number of nonzeros is 13.

Figure: The reconstruction compared to the noiseless signal

Learned Dictionary Results

Figure: The coefficient vector

Learned Dictionary Results

Algorithm           Runtime (s)
Soft Thresholding   0.633864
Hard Thresholding   max Iter
SSN                 ∞
FSS                 0.078537
Elastic Net SSN     0.089418
Elastic Net FSS     0.078365
GPSR                0.099766
Interior Point      7.790935

Table: Algorithm running times on Learned Basis Dictionary, Test Signal 1

Learned Dictionary Results Signal 2

The optimal alpha is 5.6000e-06.

The number of nonzeros is 249.

Figure: The reconstruction compared to the noiseless signal

Learned Dictionary Results Signal 2

Figure: The coefficient vector

Learned Dictionary Results Signal 2

Algorithm           Runtime (s)
Soft Thresholding   max Iter
Hard Thresholding   max Iter
SSN                 ∞
FSS                 7.975966
Elastic Net SSN     0.348048
Elastic Net FSS     5.441613
GPSR                max Iter
Interior Point      73.504971

Table: Algorithm running times on Learned Basis Dictionary, Test Signal 2

This time the elastic net beta parameter had an effect: it yielded a solution with 322 nonzeros but an improved running time. Beta was taken as 1e-6 (the corresponding proximal step is sketched below).
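For reference, the standard closed form of the elastic-net proximal step (not code from the talk) shows why such a tiny beta can still help: the l1 soft threshold is followed by a 1/(1 + beta) shrink, so beta = 1e-6 barely moves the iterates, but the added strong convexity is what lets SSN and FSS terminate here.

# Standard closed form of the elastic-net proximal step with unit step size:
# prox of alpha*||u||_1 + (beta/2)*||u||^2 is a soft threshold followed by
# a 1/(1 + beta) shrink.
import numpy as np

def elastic_net_prox(x, alpha, beta):
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0) / (1.0 + beta)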

Conclusions - Questions - Thoughts

Sparsity constraint research is approached from many angles. It is hard to say that any setting is better than another, or that any one algorithm is the best. The answer is always: it depends.

Overcomplete bases are largely left out of the theoretical framework by its injective-operator assumptions, so one has to rely on empirical evidence.

How do we find the best alpha value? How do we find the best alpha and beta in the Elastic Net?

The Elastic Net helps SSN converge to a solution, particularly in the messy last case.

The learned dictionary gives very sparse representations when the signal is close to the training signals, but much less so otherwise.

Conclusions - Questions - Thoughts

The wavelet dictionary is more adaptable to different signals, but does not achieve as high an underlying sparsity.

When there are only a small number of nonzero elements, FSS is appropriate.

GPSR and Soft Thresholding perform very well with the wavelet operator.

Most algorithms are faster than the one I was using at Clemson.

How would OMP perform in this case? (A textbook sketch follows below.)

How to classify detected transients?
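OMP is raised here only as an open question; it was not one of the presented experiments. A textbook Orthogonal Matching Pursuit sketch that could be timed against the solvers above is given below. It assumes an explicit dictionary matrix D with unit-norm columns and a target sparsity k; the names are illustrative.

# Textbook OMP sketch (not one of the presented experiments): greedily pick
# the atom most correlated with the residual, refit by least squares on the
# current support, and stop at sparsity k or a small residual.
import numpy as np

def omp(D, y, k, tol=1e-6):
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    u = np.zeros(D.shape[1])
    u[support] = coef
    return u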

Solver Developers

Iterative Thresholding Methods - Dirk Lorenz

FSS - Stefan Schiffler

SSN - Stefan Schiffler

Basis Learning - Klaus Steinhorst

GPSR - Mario Figueiredo, Robert Nowak, Stephen Wright

Interior Point Method - SparseLab

Wavelet Operator - Sparco

Thanks

Thanks to those who wrote the code I plundered, to Stefan for helping, and to the audience for listening.