TRANSCRIPT (retrieved 7/29/2019)
The Power of Penalties
in Signal Processing
Paul Eilers and Johan de Rooi
Erasmus Medical Center, Rotterdam, The Netherlands
LASIR, April 2012
Signals in real life
You can't always get what you want (Mick Jagger)
We can measure a lot
But there always are problems, small or large
Noise and artifacts
Drifting baselines
Convoluted signals
Usually a combination
I'll show some examples, and previews of solutions
A noisy signal
[Figure: XRD peak, lambda = 10000; left panel counts (linear) vs. angle (130-134 degrees), right panel log10(counts) vs. angle.]
A drifting baseline
[Figure, two panels: data and fitted baseline; fitted baseline subtracted.]
A baseline in time-resolved spectroscopy
[Figure, four panels: data; estimated baseline; artefact; "background" weights.]
Segmented smoothing
[Figure, two panels: log2(CNV signal) vs. position on chromosome (Mbase), arrays GBM 139.CEL and GBM 2032.CEL.]
Deconvolution of spikes
[Figure, three panels: data and fit (penalty parameters 0.02 and 0.0001); estimated pulse coefficients; pulse shapes, initial and final estimate.]
Smoothing
Signals, raw and smooth
We have a signal, y, and compute another signal, z
We want z to be close to y
We have ways to measure that
Sum of squares: sum_i (y_i - z_i)^2 = ||y - z||^2
Sum of absolute values: sum_i |y_i - z_i| = |y - z|_1
Or more complicated objective functions to minimize
Like in regression or convolution: ||y - Cz||^2
Or using (adaptive) weights: sum_i w_i (y_i - z_i)^2 = (y - z)' W (y - z)
Desired properties and penalties
We want z to have desirable properties
Smooth everywhere
Or piece-wise constant
Or consisting of a few spikes
Invent a penalty, another objective function, working on z
It has to be small if z has the desired property
Combine the objective functions for fit and penalty
Minimize that combination
A simple smoother: Whittaker
Whittaker (1923) proposed graduation: minimize
S_2 = sum_i (y_i - z_i)^2 + lambda * sum_i (Delta^d z_i)^2
Given a noisy data series y, it finds a smoother series z
The operator Delta^d forms differences of order d: Delta z_i = z_i - z_{i-1}
Today we call this penalized least squares
Explicit solution, with matrix D such that Delta^d z = Dz:
(I + lambda D'D) z = y
The Whittaker smoother (simulated data)
[Figure: the Whittaker smoother on simulated data, with lambda = 10, 1000, and 1e4.]
Sparseness
Many equations (one per observation), but a banded system
Computation time is linear in data length
R package spam is great (sparse matrices, Matlab-style)
System with 4000 equations solved in 20 milliseconds
# Whittaker smoother
m = length(y)
E = diag.spam(m)
D = diff(E, diff = 2)
P = lambda * t(D) %*% D
z = solve(E + P, y)
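The same smoother translates directly to Python; a minimal sketch, assuming scipy's sparse module (the function name `whittaker` is ours):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def whittaker(y, lam=1e4, d=2):
    """Whittaker smoother: solve the banded system (I + lambda * D'D) z = y."""
    m = len(y)
    D = sparse.eye(m, format="csr")
    for _ in range(d):                # order-d difference matrix, shape (m - d, m)
        D = D[1:] - D[:-1]
    P = lam * (D.T @ D)
    return spsolve(sparse.eye(m, format="csc") + P, y)
```

The system stays banded, so the solve remains linear in the data length, just as with spam in R.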
Whittaker for Poisson counts
[Figure: XRD peak, lambda = 10000; counts (linear) and log10(counts) vs. angle.]
Segmented smoothing
Copy number variations in DNA
Normal DNA comes in pairs of chromosomes
But not so in tumors
Some parts may be missing, others doubled, tripled, or more
Different changes in each chromosome
These are called copy number variations (CNV)
We can use SNP microarrays to measure them in very many places
But we get a noisy signal
We expect constant segments
The Whittaker smoother for segments?
[Figure, two panels: log2(CNV signal) vs. position on chromosome (Mbase), arrays GBM 139.CEL and GBM 2032.CEL.]
The L1 penalty in action
[Figure, two panels: log2(CNV signal) vs. position on chromosome (Mbase), arrays GBM 139.CEL and GBM 2032.CEL.]
Computation for the L1 penalty
We could try quadratic programming techniques
But there is an easier solution
For any x and an approximation x~ we have |x| = x^2/|x| ~= x^2/|x~|
Use a weighted L2 penalty, with v_i = 1/|Delta z~_i|:
S_1 = sum_i (y_i - z_i)^2 + lambda * sum_i v_i (Delta z_i)^2
Iteratively update v and z
Solve (I + lambda D'VD) z = y repeatedly, with V = diag(v)
Some smoothing near 0: use v_i = 1/sqrt((Delta z~_i)^2 + beta^2), with beta small
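The iteration above can be sketched in Python (our own helper, assuming scipy; the lambda and beta values are illustrative):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def l1_smooth(y, lam=1.0, beta=1e-4, n_iter=30):
    """Piecewise-constant smoothing by iterative reweighting:
    solve (I + lambda * D'VD) z = y with v_i = 1/sqrt((dz_i)^2 + beta^2)."""
    m = len(y)
    E = sparse.eye(m, format="csr")
    D = (E[1:] - E[:-1]).tocsc()          # first-order differences
    z = y.copy()
    for _ in range(n_iter):
        v = 1.0 / np.sqrt(np.diff(z) ** 2 + beta ** 2)
        P = lam * (D.T @ sparse.diags(v) @ D)
        z = spsolve(sparse.eye(m, format="csc") + P, y)
    return z
```

On a noisy step signal the fit collapses to nearly flat segments with a sharp jump, which is what the L1 penalty buys over the L2 version.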
The L0 penalty in action
[Figure, two panels: log2(CNV signal) vs. position on chromosome (Mbase), arrays GBM 139.CEL and GBM 2032.CEL.]
Baseline estimation
A drifting chromatograph signal
[Figure: drifting chromatogram, signal vs. time (s).]
A picture of rice grains
[Figure: rice grains on background (image), with cross-sections along the red and blue lines.]
Modelling strategy
Smooth background curve (surface)
B-splines with penalty to tune smoothness (P-splines)
Special mixture of distributions
Normal distribution for noise
Unknown one-sided distribution for signal
Simulated data in 1D
[Figure, two panels: simulated constant background; simulated variable background.]
A statistical model for constant background
Observed: y
Mixture model for the distribution of y:
f(y) = p g(y | mu, sigma) + (1 - p) h(y - mu)
g normal, with mu (background level) and sigma unknown
h unspecified, supported only on the positive axis
Mixing ratio p unknown
Illustrating the mixture idea
[Figure: two-component mixture for baseline and peaks; normal density g(y | mu, sigma) for the background, one-sided density h(.) for the peaks.]
EM estimation
Suppose we knew the distribution parameters approximately
Take one y_i, compute (Bayes)
w_i1 = p g(y_i) / (p g(y_i) + (1 - p) h(y_i))
Then w_i1 is the probability of y_i coming from g (background)
And similarly w_i2 = 1 - w_i1 for y_i coming from h (signal)
Use y with weights w_1 to improve mu and sigma
Use w_2 to improve the nonparametric estimate of h
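The E-step can be sketched as follows (our own helper; we plug in an exponential density as a stand-in for h, since the real h is estimated non-parametrically in the slides):

```python
import numpy as np
from scipy import stats

def background_weights(y, mu, sigma, p, scale):
    """E-step of the mixture: w_i1 = p g(y_i) / (p g(y_i) + (1 - p) h(y_i)).
    g is normal(mu, sigma); h is exponential on y > mu (a stand-in only)."""
    g = stats.norm.pdf(y, loc=mu, scale=sigma)
    h = np.where(y > mu, stats.expon.pdf(y - mu, scale=scale), 0.0)
    return p * g / (p * g + (1 - p) * h)
```

Points near the background level get weight near 1, points far into the one-sided signal tail get weight near 0; those weights then feed the M-step updates of mu and sigma.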
Showing the weights
[Figure, three panels: simulated data with constant background and estimate; estimated weights; background and signal distributions (green: unsmoothed).]
Non-parametric density estimation
Variation on Whittaker smoother
Construct a histogram (100 bins) of y
Sum w_2 in the bins to get pseudo-counts t
Smooth t, with E(t_j) = exp(z_j) and a Poisson-type likelihood
Difference penalty on z
Deals well with the left discontinuity
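This variation can be sketched with Newton iterations in Python (our own code, assuming scipy; the penalized Poisson update solves (W + lambda D'D) z_new = W z + (t - mu) with W = diag(mu)):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def smooth_counts(t, lam=10.0, n_iter=50):
    """Penalized Poisson smoothing of histogram counts t:
    E(t_j) = exp(z_j), second-order difference penalty on z."""
    m = len(t)
    E = sparse.eye(m, format="csr")
    D = (E[2:] - 2 * E[1:-1] + E[:-2]).tocsc()   # second differences
    P = lam * (D.T @ D)
    z = np.log(t + 1.0)                          # starting values
    for _ in range(n_iter):
        mu = np.exp(z)
        W = sparse.diags(mu)
        z_new = spsolve(W + P, W @ z + (t - mu))
        if np.max(np.abs(z_new - z)) < 1e-8:
            return np.exp(z_new)
        z = z_new
    return np.exp(z)
```

Because the constant vector lies in the null space of the difference penalty, the fitted expected counts sum to the observed counts at convergence.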
Background with trend
Model: y(x_i) = v(x_i) + u_i
Smooth trend v
Mixture model for the distribution of u:
f(u) = p g(u | 0, sigma) + (1 - p) h(u)
g normal, with sigma unknown
h unspecified, supported only on the positive axis
Mixing ratio p unknown
Estimating a varying background
Model trend with B-splines:
v_i = v(x_i) = sum_j B_j(x_i) alpha_j
Add a difference penalty on the coefficients alpha_j
This is the P-spline approach
Use EM procedure as before (split and fit)
Weights follow from residuals
Model is re-estimated with these weights
P-splines illustrated
[Figure, two panels: light penalty; heavier penalty.]
Fitting a background trend
[Figure, two panels: simulated data with varying background and P-splines estimate; background and signal distributions (green: unsmoothed).]
Chromatogram background
[Figure, two panels: data and fitted baseline; fitted baseline subtracted.]
Chromatogram background (detail)
[Figure, two panels (detail, 300-1000): data and fitted baseline; fitted baseline subtracted.]
Two-dimensional smoothing with P-splines
Tensor products of B-splines:
B_jk(x, y) = B_j(x) B_k(y)
Equally spaced knots on a 2D grid
Matrix of coefficients A = [alpha_jk]:
z_i = sum_j sum_k B_j(x_i) B_k(y_i) alpha_jk
Penalties on rows and columns of A
Tensor product basis for 2-D baseline
[Figure: tensor product B-spline basis over wavelength (nm) and temperature (C).]
Peaks as a nuisance: femtosecond spectroscopy
[Figure, four panels: data; estimated baseline; artefact; "background" weights.]
Spike deconvolution
Pulse-like signals
Some signals are series of pulses: spike trains
We encounter them in many places
In chemical instruments
DNA sequencers
chromatographs
In nature
pulsatory hormone release
neuron signalling
In technical systems like radar or ultrasound
DNA sequencing, four traces
[Figure, four panels: ABI traces 1-4 from a DNA sequencer.]
The sum of pulses model
The model:
y(t) = sum_j a_j s(t - tau_j) + e(t)
Assumptions:
each pulse has identical shape s(.)
locations tau_j unknown
heights a_j unknown
linear superposition holds (sum of pulses)
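As a concrete illustration of the model (the Gaussian pulse shape is our choice; the model itself only says all pulses share one shape s):

```python
import numpy as np

def pulse_train(m, locations, heights, width):
    """y(t) = sum_j a_j s(t - tau_j): identical pulses with unknown
    locations tau_j and heights a_j, combined by linear superposition."""
    t = np.arange(m)
    y = np.zeros(m)
    for tau, a in zip(locations, heights):
        y += a * np.exp(-0.5 * ((t - tau) / width) ** 2)
    return y

y = pulse_train(100, locations=[20, 60], heights=[1.0, 0.5], width=3.0)
```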
The convolution model
Observations are always in discrete time
Assume a discrete pulse shape s_k
Discrete input series x
Non-zero elements of x give pulse heights and positions
y_i = sum_j s_{i-j} x_j + e_i
Or y = Cx + e, with c_ij = s_{i-j}
Columns of C identical, but shifted
This is called convolution
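Building C in Python with scipy's toeplitz helper (the pulse shape here is illustrative):

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(s, m):
    """Convolution matrix with c_ij = s_{i-j}: every column is the
    pulse shape s, shifted down one position per column."""
    col = np.zeros(m)
    col[:len(s)] = s
    return toeplitz(col, np.zeros(m))

s = np.array([0.25, 0.5, 1.0, 0.5, 0.25])   # illustrative pulse shape
C = conv_matrix(s, 8)
```

`C @ x` then reproduces `np.convolve(x, s)` truncated to the first m samples.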
Convolution matrix
[Figure: the convolution matrix, shown as a surface and as an image.]
Deconvolution of pulse trains
Output y given, estimate input x
Convolution matrix (pulse shape) assumed to be known
Model: y = Cx + e
This looks like a regression problem, and it is
Least squares solution: x^ = (C'C)^{-1} C'y
Results are disastrous; the problem is ill-conditioned
Results of (penalized) linear regression
[Figure, three panels: data, components, and fit; deconvolution without penalty (note the 1e6 vertical scale); deconvolution with L2 penalty, penalty parameter 0.01.]
Penalties come to the rescue
Least squares goal is S = ||y - Cx||^2
Extend it with a ridge (L2) penalty:
S = ||y - Cx||^2 + lambda ||x||^2
Or with a LASSO (L1) penalty:
S = ||y - Cx||^2 + lambda * sum_j |x_j|
Ridge penalty not useful: no sign of impulses
LASSO is not too bad
L0 penalty works best
LASSO (L1) and L0 results
[Figure, three panels: data, components, and fit; deconvolution with L1 penalty, penalty parameter 0.01; deconvolution with L0 penalty, penalty parameter 0.003.]
Implementation of LASSO and L0 penalty
Write it as a weighted square: |x_j| = x_j^2 / |x_j|
Avoid division by near-zero: |x_j| ~= x_j^2 / sqrt(x_j^2 + beta), with beta a small number
Do this iteratively, for LASSO: |x_j| ~= x_j^2 / sqrt(x~_j^2 + beta)
Or, for the L0 penalty: |x_j|^0 ~= x_j^2 / (x~_j^2 + beta)
The approximation x~_j comes from the previous iteration
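Putting the L0 variant into code (a sketch; lambda, beta, and the Gaussian pulse in the usage below are our choices, not values from the slides):

```python
import numpy as np

def deconvolve_l0(C, y, lam=1e-3, beta=1e-6, n_iter=100):
    """Penalized deconvolution with the reweighted L0 penalty:
    minimize ||y - Cx||^2 + lam * sum_j x_j^2 / (x~_j^2 + beta),
    where x~ is the previous iterate."""
    n = C.shape[1]
    x = np.full(n, 0.1)                      # neutral starting values
    CtC = C.T @ C
    Cty = C.T @ y
    for _ in range(n_iter):
        w = 1.0 / (x ** 2 + beta)            # weight grows as x_j shrinks
        x = np.linalg.solve(CtC + lam * np.diag(w), Cty)
    return x
```

Components pushed toward zero get an ever larger weight and are pinned there, while large components feel almost no penalty; that is exactly the "count the non-zeros" behaviour of L0.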
Interpretation of the L0 penalty
Consider the penalty after convergence:
sum_j x_j^2 / (beta + x_j^2)
where beta is a small number
When x_j = 0, there is no contribution to the penalty
When x_j != 0, the contribution is (close to) 1
Hence we penalize the number of non-zero elements
Deconvolution of hormone concentrations
[Figure, three panels: data and fit (penalty parameters 1.2 and 0.001); estimated pulse coefficients; individual spikes.]
Blind deconvolution
If we know the input, we can estimate the pulse shape
This suggests an iterative procedure
Make a good guess at the pulse shape
Do the penalized deconvolution
Estimate the pulse shape (a well-conditioned regression)
Repeat last two steps
This works, with some care
Blind deconvolution of DNA data
[Figure, three panels: data and fit (penalty parameters 0.02 and 0.0001); estimated pulse coefficients; pulse shapes, initial and final estimate.]
More deconvolution
Illustrating convolution with step input
[Figure, two panels: input; output.]
Deconvolution with step input
[Figure, two panels: input and output; output and estimated input.]
Deconvolution of spikes in two dimensions
The same principles: spikes are smeared out
Image in matrix Y, spike (input) matrix X
But now in two directions
Convolution kernel assumed to be known (Gaussian)
Model: y = Cx + e
with y = vec(Y) and x = vec(X)
Matrix C computed in a special way
L0 penalty on the elements of x
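The slides do not spell out the "special way"; one common construction for a separable Gaussian kernel uses a Kronecker product (a sketch under that assumption, with toy sizes):

```python
import numpy as np

def gauss_conv(m, width):
    """One-dimensional Gaussian convolution matrix (columns = shifted kernels)."""
    t = np.arange(m)
    return np.exp(-0.5 * ((t[:, None] - t[None, :]) / width) ** 2)

# Separable 2-D blur: Y = C1 X C2'. With row-major vec (numpy's ravel),
# vec(Y) = kron(C1, C2) vec(X), which gives the big matrix C in y = Cx + e.
m = 6
C1 = gauss_conv(m, 1.0)
C2 = gauss_conv(m, 1.5)
X = np.zeros((m, m)); X[2, 3] = 1.0      # a single spike
Y = C1 @ X @ C2.T
C = np.kron(C1, C2)
```

The Kronecker form is mainly conceptual: in practice one applies C1 and C2 to the rows and columns of X and never forms the m^2-by-m^2 matrix C explicitly.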
2-D spike deconvolution (simulated data)
[Figure, four panels: simulated image and 2-D spike deconvolution results.]
Convergence history
[Figure: convergence history, twelve 40-by-40 panels of successive iterations.]
Computational aspects
The system is too large for comfort
We now use 40 by 40 sub-pictures (1600 unknowns)
There are ways to improve things
We know that most elements of X are zero
We are working on an adaptive strategy
Super-resolution
We can use a finer grid for X
Say 2 by 2 sub-pixels for each Y pixel
This works in principle
But the computational aspects are harder
At the moment only an illustration is available
Working with a coarsened Y
[Figure: super-resolution illustration, four panels on 40-by-40 and coarsened 20-by-20 grids.]
Summary
Penalties are very useful
For smoothness (reduce noise, estimate baselines)
For sparseness (spike deconvolution, 1-D and 2-D)
There are more applications
Shape constraints, like monotone or unimodal
Fit can be likelihood-based (counts, binary data)
Penalties are connected to prior opinions (Bayes)