kevin bleakley and jean-philippe vert - mines...
TRANSCRIPT
Multiple change-points detection in multiple signal
Kevin Bleakley and Jean-Philippe Vert
Mines ParisTech / Curie Institute / Inserm
Mathematical Statistics and Applications Workshop, Fréjus, France,Sep 2,2010.
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 1 / 21
Multiple change-points detection in 1 signal
0 100 200 300 400 500 600 700 800 900 1000−0.2
0
0.2
0.4
0.6
0.8
1
1.2
1.4
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 2 / 21
Multiple change-points detection in 1 signal
0 100 200 300 400 500 600 700 800 900 1000−0.2
0
0.2
0.4
0.6
0.8
1
1.2
1.4
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 2 / 21
Multiple change-points detection in many signals
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 3 / 21
Multiple change-points detection in many signals
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 3 / 21
Why we care?
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
Joint segmentation should increase the statistical powerApplications:
multi-dimensional signals (multimedia, sensors...)genomic profiles
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 4 / 21
Chromosomic aberrations in cancer
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 5 / 21
Comparative Genomic Hybridization (CGH)
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 6 / 21
A collection of bladder tumours
0 200 400 600 800 1000 1200 1400 1600 1800 2000−1
−0.5
0
0.5
1
0 200 400 600 800 1000 1200 1400 1600 1800 2000−1
−0.5
0
0.5
1
0 200 400 600 800 1000 1200 1400 1600 1800 2000−1
−0.5
0
0.5
1
0 200 400 600 800 1000 1200 1400 1600 1800 2000−1
−0.5
0
0.5
1
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 7 / 21
Typical applications
Find frequent breakpoints in a collection of tumours (fusiongenes...)Low-dimensional summary and visualization of the set of profiles
0 200 400 600 800 1000 1200 1400 1600 1800 2000!1
!0.5
0
0.5
1
Probe
Log
-rat
io(a )
0 200 400 600 800 1000 1200 1400 1600 1800 2000!1
!0.5
0
0.5
1
Probe
Sco
re
( b)
0 200 400 600 800 1000 1200 1400 1600 1800 2000!1
!0.5
0
0.5
1
Probe
Log
-rat
io
(c)
0 200 400 600 800 1000 1200 1400 1600 1800 2000!1
!0.5
0
0.5
1
Probe
Log
-rat
io
( d)
Detection of frequently altered regions0 200 400 600 800 1000 1200 1400 1600 1800 2000
!1
!0.5
0
0.5
1
Probe
Log
-rat
io
(a )
0 200 400 600 800 1000 1200 1400 1600 1800 2000!1
!0.5
0
0.5
1
Probe
Sco
re
( b)
0 200 400 600 800 1000 1200 1400 1600 1800 2000!1
!0.5
0
0.5
1
Probe
Log
-rat
io
(c)
0 200 400 600 800 1000 1200 1400 1600 1800 2000!1
!0.5
0
0.5
1
Probe
Log
-rat
io
( d)
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 8 / 21
What we want
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
0 100 200 300 400 500 600 700 800 900 1000−0.5
0
0.5
1
1.5
1 An algorithm that scales in time and memory toProfiles length: n = 106 ∼ 109
Number of profiles (dimension): p = 102 ∼ 103
Number of change-points: k = 102 ∼ 103
2 A method with good statistical properties when p increases for nfixed (opposite to most existing litterature).
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 9 / 21
Segmentation by dynamic programming
Y ∈ Rn×p the signalsDefine a piecewise constant approximation U ∈ Rn×p of Y with kchange-points as the solution of
minU∈Rn×p
‖Y − U ‖2 such thatn−1∑i=1
1(Ui+1,• 6= Ui,•
)≤ k
DP finds the solution in O(n2kp) in time and O(n2) in memoryDoes not scale to n = 106 ∼ 109...
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 10 / 21
TV approximator for a single signal (p = 1)
Replace
minU∈Rn
‖Y − U ‖2 such thatn−1∑i=1
1 (Ui+1 6= Ui) ≤ k
by
minU∈Rn
‖Y − U ‖2 such thatn−1∑i=1
|Ui+1 − Ui | ≤ µ
An instance of total variation penalty (Rudin et al., 1992)Convex problem, fast implementations in O(nK ) or O(n log n)(Friedman et al., 2007; Harchaoui and Levy-Leduc, 2008;Hoefling, 2009)
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 11 / 21
TV approximator for many signals
Replace
minU∈Rn×p
‖Y − U ‖2 such thatn−1∑i=1
1(Ui+1,• 6= Ui,•
)≤ k
by
minU∈Rn×p
‖Y − U ‖2 such thatn−1∑i=1
wi‖Ui+1,• − Ui,•‖ ≤ µ
QuestionsPractice: can we solve it efficiently?
Theory: does it benefit from increasing p (for n fixed)?
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 12 / 21
TV approximator as a group Lasso problem
Make the change of variables:
γ = U1,• ,
βi,• = wi(Ui+1,• − Ui,•
)for i = 1, . . . ,n − 1 .
TV approximator is then equivalent to the following group Lassoproblem (Yuan and Lin, 2006):
minβ∈R(n−1)×p
‖ Y − Xβ ‖2 + λ
n−1∑i=1
‖βi,• ‖ ,
where Y is the centered signal matrix and X is a particular(n − 1)× (n − 1) design matrix.
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 13 / 21
TV approximator implementation
minβ∈R(n−1)×p
‖ Y − Xβ ‖2 + λ
n−1∑i=1
‖βi,• ‖ ,
TheoremThe TV approximator can be solved efficiently:
approximately with the group LARS in O(npk) in time and O(np)in memoryexactly with a block coordinate descent + active set method inO(np) in memory
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 14 / 21
Proof: computational tricks...
Although X is (n − 1)× (n − 1):For any R ∈ Rn×p, we can compute C = X>R in O(np) operationsand memoryFor any two subset of indices A =
(a1, . . . ,a|A|
)and
B =(b1, . . . ,b|B|
)in [1,n − 1], we can compute X>•,AX•,B in
O (|A||B|) in time and memoryFor any A =
(a1, . . . ,a|A|
), set of distinct indices with
1 ≤ a1 < . . . < a|A| ≤ n − 1, and for any |A| × p matrix R, we can
compute C =(
X>•,AX•,A)−1
R in O(|A|p) in time and memory
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 15 / 21
Consistency for a single change-point
Suppose a single change-point:at position u = αnwith increments (βi)i=1,...,p s.t. β2 = limk→∞
1p∑k
i=1 β2i
corrupted by i.i.d. Gaussian noise of variance σ2
0 100 200 300 400 500 600 700 800 900 1000−2
0
2
4
0 100 200 300 400 500 600 700 800 900 1000−2
0
2
4
0 100 200 300 400 500 600 700 800 900 1000−2
0
2
4
Does the TV approximator correctly estimate the first change-point asp increases?K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 16 / 21
Consistency of the unweighted TV approximator
minU∈Rn×p
‖Y − U ‖2 such thatn−1∑i=1
‖Ui+1,• − Ui,•‖ ≤ µ
TheoremThe unweighted TV approximator finds the correct change-point withprobability tending to 1 (resp. 0) as p → +∞ if σ2 < σ2
α (resp.σ2 > σ2
α), where
σ2α = nβ2 (1− α)2(α− 1
2n )
α− 12 −
12n
.
correct estimation on [nε,n(1− ε)] with ε =√
σ2
2nβ2 + o(n−1/2) .
wrong estimation near the boundaries
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 17 / 21
Consistency of the weighted TV approximator
minU∈Rn×p
‖Y − U ‖2 such thatn−1∑i=1
wi‖Ui+1,• − Ui,•‖ ≤ µ
Theorem
The weighted TV approximator with weights
∀i ∈ [1,n − 1] , wi =
√i(n − i)
n
correctly finds the first change-point with probability tending to 1 asp → +∞.
we see the benefit of increasing pwe see the benefit of adding weights to the TV penalty
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 18 / 21
Proof sketch
The first change-point i found by TV approximator maximizesFi = ‖ ci,• ‖2, where
c = X>Y = X>Xβ∗ + X>W .
c is Gaussian, and Fi is follows a non-central χ2 distribution with
Gi =EFi
p=
i(n − i)nw2
iσ2 +
β2
w2i w2
u n2×
{i2 (n − u)2 if i ≤ u ,u2 (n − i)2 otherwise.
We then just check when Gu = maxi Gi
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 19 / 21
Consistent estimation of more change-points?
0 100 200 300 4000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
p
Accu
racy
0 100 200 300 4000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
p
Accu
racy
0 100 200 300 4000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
p
Accu
racy
U LARSW LARSU LassoW Lasso
U LARSW LARSU LassoW Lasso
U LARSW LARSU LassoW Lasso
n = 100, k = 10, β2 = 1, σ2 ∈ {0.05; 0.2; 1}K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 20 / 21
Conclusion
A new convex formulation for multiple change-point detection inmultiple signalsBetter estimation with more signalsImportance of weightsEfficient approximate (gLARS) and exact (gLASSO)implementations; GLASSO more expensive but more accurateConsistency for the first K > 1 change-points observedexperimentally but technically tricky to prove.
K. Bleakley and J.P Vert (ParisTech) Multiple change-points in multiple signals StatMathAppli 2010 21 / 21