On the multiple breakpoint problem and the number of significant breaks in homogenisation of climate records
Separation of true from spurious breaks
Ralf Lindau & Victor Venema, University of Bonn
12th EMS Annual Meeting, Lodz, Poland – 13. September 2012
Internal and External Variance
Consider the differences of one station compared to a neighbour or a reference.
Breaks are defined by abrupt changes in the station-reference time series.
Internal variance: within the subperiods
External variance: between the means of different subperiods
Criterion: Maximum external variance attained by a minimum number of breaks
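The criterion can be tried out on a small series. The following is a minimal sketch, assuming a dynamic-programming search over break positions; the slides name only the criterion, not the optimisation method, and `best_breaks` and `sse` are hypothetical names chosen for this illustration.

```python
def best_breaks(x, k):
    """Split x into k+1 segments so that the external variance is maximal.

    Returns (external_variance, break_positions); a break position b means
    that a new segment starts at index b. Assumed method: dynamic programming
    on the internal sum of squares (minimising it maximises the external part).
    """
    n = len(x)
    # Prefix sums give the internal sum of squares of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for value in x:
        s1.append(s1[-1] + value)
        s2.append(s2[-1] + value * value)

    def sse(a, b):
        """Internal sum of squares of the segment x[a:b]."""
        s = s1[b] - s1[a]
        return (s2[b] - s2[a]) - s * s / (b - a)

    inf = float("inf")
    # cost[j][b]: minimal internal sum of squares of x[:b] using j breaks.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    prev = [[0] * (n + 1) for _ in range(k + 1)]
    for b in range(1, n + 1):
        cost[0][b] = sse(0, b)
    for j in range(1, k + 1):
        for b in range(j + 1, n + 1):
            for a in range(j, b):  # a = start index of the last segment
                c = cost[j - 1][a] + sse(a, b)
                if c < cost[j][b]:
                    cost[j][b] = c
                    prev[j][b] = a

    # Walk back through the stored split points.
    breaks, b = [], n
    for j in range(k, 0, -1):
        b = prev[j][b]
        breaks.append(b)
    breaks.reverse()

    external = (sse(0, n) - cost[k][n]) / n
    return external, breaks


# Illustrative series with two obvious jumps (made-up values).
series = [0, 0, 0, 0, 2, 2, 2, 2, 1, 1, 1, 1]
ext_var, positions = best_breaks(series, 2)
```

The triple loop costs on the order of k·n² operations, which is unproblematic for series of about a hundred years.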
Decomposition of Variance
n: total number of years
N: number of subperiods
n_i: years within a subperiod
The sum of external and internal variance is constant.
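A short numerical check of this statement, as a sketch: it assumes the usual definitions, external variance (1/n) Σ n_i (x̄_i − x̄)² and internal variance (1/n) Σ_i Σ_j (x_ij − x̄_i)², since the slide's own formula is not part of the transcript. The function name and the sample values are hypothetical.

```python
def variance_decomposition(x, breaks):
    """Return (external, internal, total) variance for a given segmentation.

    breaks lists the indices at which a new subperiod starts.
    """
    n = len(x)
    mean = sum(x) / n
    edges = [0] + list(breaks) + [n]
    external = 0.0
    internal = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        segment = x[a:b]
        n_i = len(segment)
        segment_mean = sum(segment) / n_i
        # Weighted squared deviation of the subperiod mean from the total mean.
        external += n_i * (segment_mean - mean) ** 2
        # Squared deviations of the members from their own subperiod mean.
        internal += sum((value - segment_mean) ** 2 for value in segment)
    total = sum((value - mean) ** 2 for value in x)
    return external / n, internal / n, total / n


# Made-up data; two different segmentations of the same series.
data = [1.0, 2.0, 4.0, 3.0, 7.0, 8.0, 6.0, 9.0]
ext_a, int_a, tot_a = variance_decomposition(data, [4])
ext_b, int_b, tot_b = variance_decomposition(data, [2, 6])
```

Whatever segmentation is chosen, `ext + int` reproduces the same total variance, so maximising the external variance is equivalent to minimising the internal one.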
First Question
How do random data behave?
Needed as stop criterion for the number of significant breaks.
Random Time Series
with stddev = 1
Segment averages x̄_i scatter randomly
mean: 0
stddev: 1/√n_i
Because any deviation from zero can be seen as inaccuracy due to the limited number of members.
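The 1/√n_i scatter of the segment averages can be illustrated with a small Monte Carlo experiment. This is only a sketch using the Python standard library; the function name, the number of trials and the seed are arbitrary choices, not taken from the slides.

```python
import random
import statistics


def segment_mean_spread(n_i, trials=20000, seed=1):
    """Simulate means of n_i standard-normal values.

    Returns (mean, standard deviation) of the simulated segment averages.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_i))
        for _ in range(trials)
    ]
    return statistics.fmean(means), statistics.pstdev(means)


# For segments of 4 years the averages should scatter with about 1/sqrt(4).
avg, spread = segment_mean_spread(4)
```

For n_i = 4 the simulated spread should come out close to 0.5, and the average of the segment means close to 0.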
χ²-distribution
The external variance is equal to the mean square sum of a random standard-normally distributed variable.
Weighted measure for the variability of the subperiods' means
From χ² to β-distribution
As the total variance is normalized to 1, a kind of normalized χ²-distribution is expected:
$$p(v) = \frac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-k-1}{2}-1}}{B\!\left(\frac{k}{2},\,\frac{n-k-1}{2}\right)}$$
This is the β-distribution.
[Figure: data for n = 21 years, k = 7 breaks]
The exceeding probability P gives the best (maximum) solution for v
Incomplete Beta Function
$$P(v) = 1 - \int_{0}^{v} p(v')\,dv'$$
7 breaks in 21 years
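To make the two quantities concrete, here is a minimal sketch that evaluates the β-density of the external variance v, with parameters k/2 and (n − k − 1)/2, and its exceeding probability P(v) = 1 − ∫₀ᵛ p dv' by plain midpoint integration. The function names and the step count are assumptions of this sketch; a library routine for the regularised incomplete beta function would serve equally well.

```python
import math


def beta_pdf(v, n, k):
    """Density of the external variance v for k breaks in n years."""
    if v <= 0.0 or v >= 1.0:
        return 0.0
    a = k / 2.0
    b = (n - 1 - k) / 2.0
    # log of the beta function B(a, b) via log-gamma for numerical stability
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(v) + (b - 1.0) * math.log(1.0 - v) - log_beta)


def exceedance(v, n, k, steps=20000):
    """P(v) = 1 - integral of the density from 0 to v (midpoint rule)."""
    if v <= 0.0:
        return 1.0
    h = v / steps
    integral = h * sum(beta_pdf((j + 0.5) * h, n, k) for j in range(steps))
    return 1.0 - integral


# The case shown on the slide: 7 breaks in 21 years.
p_low = exceedance(0.2, 21, 7)
p_high = exceedance(0.5, 21, 7)
```

The mean of this distribution is k/(n − 1), which is 0.35 for 7 breaks in 21 years, so values of v well above 0.35 have a small exceeding probability.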
Added variance per break
Incomplete β-function:
$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^{l}\,(1-v)^{m-l}, \qquad m = \frac{n-3}{2}, \quad i = \frac{k}{2}$$
Transformation to dv/dk:
[Equation: 1/(1 − v) · dv/dk* as a function of k*]
[Figure: curves for the mean, 90% and 95%]
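For integer m = (n − 3)/2 and i = k/2 the exceeding probability reduces to the finite sum P(v) = Σ_{l=0}^{i−1} C(m, l) v^l (1 − v)^(m−l). A small sketch of that sum follows; the function name is hypothetical, and the restriction to odd n and even k is imposed here only so that m and i are integers.

```python
from math import comb


def exceedance_binomial(v, n, k):
    """Exceeding probability P(v) as a finite binomial sum.

    Valid only when m = (n - 3) / 2 and i = k / 2 are integers.
    """
    if (n - 3) % 2 != 0 or k % 2 != 0:
        raise ValueError("closed form needs odd n and even k")
    m = (n - 3) // 2
    i = k // 2
    return sum(comb(m, l) * v**l * (1.0 - v) ** (m - l) for l in range(i))


# With k = 2 only the l = 0 term remains, so P(v) = (1 - v)**m.
p_two_breaks = exceedance_binomial(0.3, 21, 2)
```

With n = 21 and k = 2 this gives (1 − 0.3)⁹, which is a convenient hand check of the formula.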
The existing algorithm Prodige
Original formulation of Caussinus and Mestre for the penalty term in Prodige:
$$C_k(Y) = \ln\!\left(1 - \frac{\sum_{j=1}^{k+1} n_j\,(\bar{Y}_j - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\right) + \frac{2l}{n-1}\,\ln(n) \;\rightarrow\; \min$$
Translation into terms used by us (with l = k):
$$\ln(1-v) + \frac{2k}{n-1}\,\ln n \;\rightarrow\; \min$$
Normalisation by k* = k / (n − 1):
$$\ln(1-v) + 2k^{*}\ln n \;\rightarrow\; \min$$
Derivation to get the minimum:
$$-\frac{1}{1-v}\,\frac{dv}{dk^{*}} + 2\ln n = 0$$
$$\frac{1}{1-v}\,\frac{dv}{dk^{*}} = 2\ln n$$
In Prodige it is postulated that the relative gain of external variance is a constant for given n.
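The penalised criterion in the translated form ln(1 − v) + 2k/(n − 1) · ln n can be evaluated directly. The sketch below is not the Prodige code; it only illustrates the formula with l = k, and the function names and the external-variance values are made up for the example.

```python
import math


def prodige_criterion(v, k, n):
    """ln(1 - v) + 2k/(n - 1) * ln(n) for external variance v and k breaks."""
    return math.log(1.0 - v) + 2.0 * k / (n - 1) * math.log(n)


def required_relative_gain(n):
    """Constant value of 1/(1 - v) * dv/dk* at the minimum: 2 ln n."""
    return 2.0 * math.log(n)


def select_break_number(ext_var_by_k, n):
    """Index k with the smallest criterion; ext_var_by_k[k] is v for k breaks."""
    scores = [prodige_criterion(v, k, n) for k, v in enumerate(ext_var_by_k)]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical external variances for k = 0..3 breaks in a 21-year series.
chosen_k = select_break_number([0.0, 0.6, 0.65, 0.68], 21)
gain_21 = required_relative_gain(21)
gain_101 = required_relative_gain(101)
```

In this made-up case the first break explains most of the variance, and the small gains from further breaks do not outweigh the penalty, so k = 1 is selected. The required gain 2 ln n is about 6.1 for n = 21 and about 9.2 for n = 101.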
Shorter length, less certainty
[Figure panels: n = 21 years; n = 101 years]
Exceeding probability: 1/128, 1/64, 1/32, 1/16, 1/8, 1/4
Second Question
How do true breaks behave?
Identical Behaviour
True breaks behave identically to random data.
But the abscissa scale is now:
k / n_k instead of k / n (n_k: number of true breaks).
Compared to random time series, the external variance grows faster by the factor
n / n_k
[Figure: data and theory for n_k = 19 true breaks within a time series of n = 100 years; abscissa: assumed / true break number k / n_k]
Break vs Scatter Regime
Simulated data with 19 breaks, disturbed by scatter
The internal variance decreases as a function of break number.
In the break regime the variance decreases faster by the factor:
Time series length / Number of true breaks
15 breaks are detectable, depending on the signal-to-noise ratio.
Conclusions
• The analysis of random data shows that the external variance is β-distributed, which leads to a new formulation for the penalty term.
• True breaks are also β-distributed. Their external variance increases faster by a factor of n/n_k compared to random scatter.