TRANSCRIPT
INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

Bell & Howell Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
Multiple Changepoints with an
Application to Financial Modeling
A thesis submitted to
the Faculty of Graduate Studies
in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy
School of
Mathematics and Statistics
Carleton University
Ottawa, Ontario, Canada
April 1999
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
"Cogito ergo sum."
- René Descartes
Abstract

This dissertation deals with asymptotic methods in probability and statistics which may, for example, be used in business and finance. In particular, we study U-statistics based processes which may be used to detect (multiple or structural) changes in the distribution of independent observations.

In Chapter 1 we give an overview of well-known results which will be used throughout this thesis. Chapters 2 - 5 deal with change-point analysis.

Chapter 2 summarizes some basic properties of U-statistics and their use in the context of change-point analysis.

In Chapter 3 we review and study U-statistics based processes under the null hypothesis of no change as well as under the alternative hypothesis of at most one change in the distribution.

Chapter 4 deals with U-statistics based processes when the alternative allows at most two changes in their distribution. As special cases, epidemic alternatives are investigated as well.

Using similar arguments as in Chapter 3 and Chapter 4, we extend our theory of at most one and two changes to multiple change-points in Chapter 5.

Besides the mentioned general theory, in Chapters 3 - 5 we study in particular changes in the mean and the variance, respectively. In addition to that, we study multiple changes in the mean and/or variance and give an application in Chapter 6, which deals with changes of volatility in the financial stock market, using the Black-Scholes set-up.

The many repetitions in describing the hypotheses considered are done on purpose for the convenience of the reader. We prefer to give as many details as possible rather than to send the reader back to previous chapters.

In Chapter 1, Chapter 2 and Chapter 3 we summarize well-known results. In addition, we discuss in Chapter 3 some results which are believed to be new. Results in Chapters 4 and 5, as well as their application in Chapter 6, are also believed to be new. Theorems, corollaries and lemmas due to other authors will be accompanied by their names in brackets.
Acknowledgements

Before thanking all those people who guided and helped me over the past years and who were a very important part of my life, I would like to quote a part of a story by Charles Krauthammer¹ about the extraordinary mathematician Paul Erdős:

A few years ago, Ron Graham (a friend and benefactor of Erdős) tells me, Erdős heard of a promising young mathematician who wanted to go to Harvard but was short the money needed. Erdős arranged to see him and lent him $1,000. (The sum total of the money Erdős carried around at any one time was about $30.) He told the young man he could pay it back when he was able to. Recently, the young man called Graham to say that he had gone through Harvard and was now teaching at Michigan and could finally pay the money back. What should he do?

Graham consulted Erdős. Erdős said, "Tell him to do with $1,000 what I did."

The reason why I tell this story is that a similar thing happened to me. If it were not for all the help of my Professors Miklós Csörgő (Carleton University) and Pál Révész (Technical University of Vienna) it would not have been possible for me to study in Canada. I would like to thank them for the opportunity they have given me and I hope one day I will be able to do the same for someone else.

I would like to thank Professor Miklós Csörgő for all the discussions and meetings we had, for all his suggestions, his patience, perseverance and financial support. I have learned a lot from him. For that I am honored to have been his student. What makes Professor Csörgő truly exceptional is that he continues to teach his students, even though he is officially retired.

¹Washington Post Writers Group, A life that added up to something. This article may be found on the following webpage: http://cecm.sfu.ca/personal/jborwein/erdos.html
As well, I am thankful to the School of Mathematics and Statistics for all their financial support. In addition, my thanks also go to: the Pacific Institute for the Mathematical Sciences for inviting me to participate in the Industrial Problem Solving Workshop in Calgary last June; the Paul Erdős Center of Mathematics in Budapest for inviting me to the Workshop on Random Walks last July to give a talk on my research; Professor Barbara Szyszkowicz (Carleton University) for funding to attend the Meeting of the Canadian Mathematical Society in Kingston last December; and the Fields Institute for inviting me to attend the Workshop on Probability in Finance last January.
I would also like to thank Dr. Ričardas Zitikis of the Laboratory for Research in Statistics and Probability, Carleton University, for all the discussions and meetings we had, his encouragement and everything he taught me. As well, many thanks to Professor Pál Révész, who was my 'Diplomarbeit' (thesis) supervisor in Vienna, Austria, for teaching me all the basics I needed and for offering a helping hand the last couple of years.
My experience in Canada would not have been worthwhile if it were not for my friends, who were there for me whenever I needed them and who made my stay in Ottawa so enjoyable and unforgettable. In particular, I would like to thank Kulwinder Saini, Paul Wear and Nick Xenos for their proofreading on several occasions. I would also like to express my gratitude to Mrs. Gillian Murray and Mrs. Adrienne Richter for the many questions they answered for me with so much patience.

To my best friend and fiancée, Silvia: her love, support and understanding sustained me over the past years. Finally, I would like to thank those without whose help, love, understanding and support I would not have come this far: my family. Thanks for everything!
Dedicated to my parents, Melitta and Albert Orasch
Contents

Acceptance sheet . . . ii
Abstract . . . iii
Acknowledgements . . . v
Dedication . . . vii
Table of Contents . . . viii
List of Figures . . . xi
List of Symbols . . . xiii

1 Preliminaries 1
1.1 Basic Definitions . . . 1
1.2 Stochastic Processes . . . 3

2 U-Statistics in Change-point Analysis 8
2.1 Introduction . . . 8
2.2 Motivation . . . 11
2.3 Definition of U-Statistics . . . 12
2.4 Generalized U-Statistics . . . 14
2.5 Variance of U-Statistics . . . 16
2.6 Some Convergence Results for U-Statistics . . . 17

3 At Most One Change-point 21
3.1 Introduction . . . 21
3.1.1 Notations under the Null Hypothesis H_0 . . . 22
3.1.2 Notations under the Alternative H_A . . . 26
3.2 Antisymmetric Kernels . . . 28
3.2.1 Asymptotic Results under H_0 . . . 28
3.2.2 Asymptotic Results under H_A . . . 33
3.2.3 Estimating the Time of Change . . . 33
3.3 Symmetric Kernels . . . 35
3.3.1 Asymptotic Results under H_0 . . . 35
3.3.2 Asymptotic Results under H_A . . . 40
3.3.3 Estimating the Time of Change . . . 45
3.4 Change in the Mean . . . 53
3.5 Change in the Variance . . . 61

4 At Most Two Change-points 70
4.1 Introduction . . . 70
4.1.1 Notations under the Null Hypothesis H_0 . . . 72
4.1.2 Notations under the Alternative H_A^(2) . . . 74
4.2 Antisymmetric Kernels . . . 76
4.2.1 Asymptotic Results under H_0 . . . 76
4.2.2 Asymptotic Results under H_A^(2) . . . 85
4.3 Symmetric Kernels . . . 92
4.3.1 Asymptotic Results under H_0 . . . 92
4.3.2 Asymptotic Results under H_A^(2) . . . 100
4.4 Changes in the Mean . . . 109
4.5 Changes in the Variance . . . 115
4.6 Epidemic Alternatives . . . 119

5 Multiple Change-points 131
5.1 Introduction . . . 131
5.1.1 Notations under the Null Hypothesis H_0 . . . 133
5.1.2 Notations under the Alternative H_A^(s) . . . 136
5.2 Antisymmetric Kernels . . . 138
5.2.1 Asymptotic Results under H_0 . . . 138
5.2.2 Asymptotic Results under H_A^(s) . . . 148
5.3 Symmetric Kernels . . . 154
5.3.1 Asymptotic Results under H_0 . . . 154
5.3.2 Asymptotic Results under H_A^(s) . . . 163
5.4 Multiple Changes in the Mean . . . 172
5.5 Multiple Changes in the Variance . . . 174
5.6 Multiple Changes in the Mean and/or Variance . . . 175
5.7 Estimating the Number of Change-points . . . 177

6 Applying Change-point Theory to the Financial Market 186
6.1 Introduction . . . 186
6.2 Derivative Securities . . . 187
6.2.1 Forward Contracts . . . 187
6.2.2 Futures Contracts . . . 188
6.2.3 Options . . . 188
6.3 Modeling the Behavior of Stock Prices . . . 190
6.4 The Black-Scholes Formula . . . 194
6.5 Changes in the Volatility . . . 198

Bibliography 203
List of Figures

3.2.1 The limiting function u_λ(t) with θ_{1,2} = 10 takes its maximum value of 2.2 at t = λ = 1/2. Note: the x-axis denotes t and the y-axis denotes u_λ(t). . . . 34
3.3.2 Summation Area . . . 42
3.3.3 The limiting function u_λ(t) with θ_1 = 1, θ_2 = 2 and θ_{1,2} = 3/10 takes its maximum value of 0.55 at t = 0.475. . . . 46
3.3.4 The limiting function u_λ(t) with θ_1 = 3, θ_2 = 2 and θ_{1,2} = 1 takes its maximum value of 5 at t = λ. . . . 47
3.3.5 The limiting function v_λ(t) with θ_1 = 1, θ_2 = 2 and θ_{1,2} = 3/10 takes its maximum value of 0.63 at t = λ = 7/10. . . . 50
3.4.6 The data X_1, ..., X_100 are i.i.d. N(1,1)-distributed and X_101, ..., X_1000 are i.i.d. N(4,1)-distributed. . . . 54
3.4.7 A geometrical interpretation of E{nS(k) - kS(n)} = 0 under H_0. . . . 56
3.5.8 The data X_1, ..., X_700 are i.i.d. N(0,1)-distributed and X_701, ..., X_1000 are i.i.d. N(0,4)-distributed. . . . 62
4.4.1 The data X_1, ..., X_300 are i.i.d. N(0,1)-distributed, X_301, ..., X_700 are i.i.d. N(3,1)-distributed and X_701, ..., X_1000 are i.i.d. N(2,1)-distributed. . . . 110
4.4.2 A geometrical interpretation of E{k_2 S(k_1) + (n - k_1) S(k_2) - k_2 S(n)} = 0 under H_0. . . . 111
4.5.3 The data X_1, ..., X_100 are i.i.d. N(0,2)-distributed, X_101, ..., X_300 are i.i.d. N(0,1)-distributed and X_301, ..., X_1000 are i.i.d. N(0,3)-distributed. . . . 116
4.6.4 The data X_1, ..., X_300 and X_701, ..., X_1000 are i.i.d. N(0,1)-distributed and X_301, ..., X_700 are i.i.d. N(3,1)-distributed. . . . 120
4.6.5 Summation Area . . . 124
4.6.6 The limiting function u_{1/3,2/3}(t_1, t_2) with θ_{1,2} = 10 takes its maximum value of 2.2 at the point (1/3, 2/3). . . . 125
4.6.7 The limiting function u_{1/10,2/10}(t_1, t_2) with θ_{1,2} = 10 takes its maximum value of 0.9 at the point (1/10, 2/10). . . . 126
4.6.8 The limiting function u_{1/10,9/10}(t_1, t_2) with θ_{1,2} = 10 takes its maximum value of 1.6 at the point (1/10, 9/10). . . . 127
4.6.9 The limiting function u_{λ1,λ2}(t_1, t_2) with θ_{1,2} = 10, θ_{1,3} = 20 and θ_{2,3} = 12 takes its maximum value of 3.05. . . . 130
6.3.1 Daily stock prices with different volatilities σ_1 = 0.2 and σ_2 = 0.4. . . . 193
6.5.2 Price of a European call option depending on σ for K = $15 (above) and K = $25 (below). . . . 199
List of Symbols

statement holds almost surely
number of elements in the set A
Brownian bridge
expectation of the random variable X
null hypothesis
alternative hypothesis of s changes
kernel function with (x, y) ∈ R × R
indicator function: I(X ≤ x) = 1 if X ≤ x, otherwise 0
independent identically distributed random variables
natural logarithm
set of all positive integers: {1, 2, 3, ...}
probability space
probability that the random variable X is less than or equal to x
set of all real numbers: {x : -∞ < x < ∞}
sum of the squared random variables X_1, ..., X_n
number of change-points
price process of a stock, 0 ≤ t ≤ T
sum of the random variables X_1, ..., X_n
Wiener process
X is a normal random variable with mean μ and variance σ²
largest integer less than or equal to x
average of the random variables X_1, ..., X_n
lim_{c→∞} sup_n P{|X_n| > c} = 0, i.e., X_n is bounded in probability
X_n converges to zero in probability
X_n converges almost surely to zero
variance of the random variable X
equality in distribution
convergence of real numbers
convergence almost surely
convergence in probability
convergence in distribution
equal by definition, where the new expression is on the dotted side
indicates the end of a proof
"A mathematician is a machine for
turning coffee into theorems."
- Pál Erdős
Chapter 1
Preliminaries
In this chapter we give some important definitions used throughout this thesis,
which focuses on probability and statistics as well as on stochastic processes.
1.1 Basic Definitions
In this section some fundamental concepts of probability theory are given. The basic mathematical tool for this purpose is measure theory. We give a brief survey of some concepts that will be required in the following chapters. For further references we refer, for example, to Billingsley (1986).
Definition 1.1.1 Ω, an arbitrary space or set of points ω, stands for the set of all possible outcomes ω of an experiment. A class F of subsets¹ of an arbitrary nonempty space Ω is called an algebra or field if the following are satisfied:

Ω ∈ F,

A ∈ F implies Aᶜ ∈ F, where Aᶜ denotes the complement of A,

A, B ∈ F implies A ∪ B ∈ F.

A class F of subsets of Ω is called a σ-field if, in addition to being a field, the field also satisfies

A_1, A_2, ... ∈ F implies A_1 ∪ A_2 ∪ ... ∈ F.

¹Many authors also use the symbol A instead of F.
Definition 1.1.2 A set function is a real-valued function defined on some class of subsets of Ω. A set function μ on a field F of subsets of Ω is a countably additive measure if the following is satisfied:

μ(A) ∈ [0, ∞] for A ∈ F,

μ(∅) = 0,

if ∪_{k=1}^∞ A_k ∈ F for a disjoint sequence of F-sets A_1, A_2, ..., then μ(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ μ(A_k),

the so-called countable additivity.

Moreover, the triple (Ω, F, μ) is a measure space if μ is a measure on a σ-field F of subsets of Ω. The pair (Ω, F) is a measurable space if F is a σ-field of subsets of Ω.
Definition 1.1.3 Consider two measurable spaces (Ω, F) and (Ω′, F′) and the mapping f : Ω → Ω′. Then f is measurable if f⁻¹(A′) = {ω ∈ Ω : f(ω) ∈ A′} ∈ F for every A′ ∈ F′. In a probability context, a real measurable function is called a random variable.
Definition 1.1.4 We call a set function P on a field F a probability measure if the following is satisfied:

0 ≤ P(A) ≤ 1 for all A ∈ F,

P(∅) = 0 and P(Ω) = 1,

if ∪_{k=1}^∞ A_k ∈ F for a disjoint sequence of F-sets A_1, A_2, ..., then P(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ P(A_k).

We call the triple (Ω, F, P) a probability (measure) space if F is a σ-field of subsets of Ω and P is a probability measure on F.
1.2 Stochastic Processes
In this section the general concept of a stochastic process and some of the most important properties of such processes will be introduced. We will also introduce the notion of a Gaussian process, which plays a fundamental role when talking about Wiener processes, Brownian bridges and Ornstein-Uhlenbeck processes. After introducing these stochastic processes we state Donsker's theorem, which is of crucial importance when dealing with weak convergence of partial sums. For further references on concepts related to stochastic processes we refer, for example, to Cramér and Leadbetter (1967).
Definition 1.2.1 Let (Ω, F, P) be a probability (measure) space and T be a given parameter set. Then a finite and real-valued function X = {X(t) = X(t, ω); t ∈ T} which is a measurable function of ω ∈ Ω for every fixed t is called a stochastic process.

The index set T is said to be discrete if it contains at most countably many points; otherwise it is said to be continuous.

A one-parameter stochastic process is a stochastic process with a one-dimensional index set T, while an s-parameter stochastic process has an s-dimensional index set.

If we fix t ∈ T, then X(t, ω), ω ∈ Ω, is a random variable, and if we fix ω ∈ Ω, then X(t, ω), t ∈ T, is the so-called sample path or trajectory function of X.
Next we define a special stochastic process which plays a crucial role in the area of stochastics.
Definition 1.2.2 Let X = {X(t) : t ∈ T} be a stochastic process on (Ω, F, P). If all its finite-dimensional distributions are normal, or in other words, if every finite-dimensional vector (X(t_1), ..., X(t_k)), t_i ∈ T, i = 1, ..., k, k ∈ N, of the process X has a multivariate normal distribution, then X is called a Gaussian process.

A Gaussian process is uniquely determined by its mean and its covariance function, since these two specify uniquely any multivariate normal distribution. One of
the most important Gaussian processes is the Wiener process. Throughout the rest of this section we use the notations from Csörgő and Révész (1981, Chapter 1).
Definition 1.2.3 A stochastic process {W(t; ω) = W(t); 0 ≤ t < ∞}, where ω ∈ Ω and (Ω, F, P) is a probability space, is called a (standard) Wiener process or Brownian motion if the following is satisfied:

W(t) - W(s) =_D N(0, t - s) for all 0 ≤ s < t < ∞, and W(0) = 0,

W(t) is an independent increment process, that is, W(t_2) - W(t_1), W(t_4) - W(t_3), ..., W(t_{2i}) - W(t_{2i-1}) are independent r.v.'s for all 0 ≤ t_1 < t_2 ≤ t_3 < t_4 ≤ ... ≤ t_{2i-1} < t_{2i} < ∞ (i = 2, 3, ...),

the sample path function W(t; ω) is continuous in t with probability 1.

The property that W(0) = 0 is merely a normalization and is a convenience rather than a basic requirement. If a process {W̃(t); 0 ≤ t < ∞} satisfies the other assumptions but not W̃(0) = 0, then the process {W̃(t) - W̃(0); 0 ≤ t < ∞} would be a standard Wiener process. The property that the sample path is continuous in t with probability 1 follows from the first two properties in the definition of a Wiener process in the sense that, given a process W with the first two properties, there always exists a version which satisfies the third property (cf. Doob (1953) and Resnick (1992)).
Clearly, the first two properties imply that the covariance function of a Wiener process is

EW(s)W(t) = EW(s ∧ t)W(s ∨ t)
= E(W(s ∧ t)W(s ∨ t) - W²(s ∧ t) + W²(s ∧ t))
= EW(s ∧ t)(W(s ∨ t) - W(s ∧ t)) + EW²(s ∧ t)
= EW(s ∧ t) E(W(s ∨ t) - W(s ∧ t)) + EW²(s ∧ t)
= s ∧ t,

using the independence of increments and EW(u) = 0, EW²(u) = u.
A constructive proof for the existence of this process is, for example, given by Csörgő and Révész (1981). For a historical summary and development of a Wiener process, as well as for further results, we refer to Csörgő (1979).
We now define another Gaussian process of crucial importance, the so-called Brownian bridge.

Definition 1.2.4 A stochastic process {B(t); 0 ≤ t ≤ 1} is called a Brownian bridge or tied down Brownian motion if the following is satisfied:

the joint distribution of B(t_1), ..., B(t_n) (0 ≤ t_1 < t_2 < ... < t_n ≤ 1; n = 1, 2, ...) is Gaussian, with EB(t) = 0,

the covariance function of B(t) is EB(s)B(t) = s ∧ t - st,

the sample path function of B(t; ω) is continuous in t with probability 1.

From the second property it follows that EB²(t) = t(1 - t), 0 ≤ t ≤ 1.

The existence of such a Gaussian process is an easy consequence of the following:
Lemma 1.2.1 Let {W(t); 0 ≤ t < ∞} be a Wiener process. Then

B(t) = W(t) - tW(1), 0 ≤ t ≤ 1,

is a Brownian bridge.
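Lemma 1.2.1 is easy to check by simulation. The sketch below is an illustration added here (not part of the thesis): it approximates W(t) on a grid by scaled partial sums of Gaussian increments, applies the standard transformation B(t) = W(t) - tW(1), and estimates Var B(t), which should be close to t(1 - t).

```python
import random
import math

def wiener_path(n, rng):
    """Approximate W(t) on the grid t = k/n by cumulative Gaussian increments."""
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, 1.0) / math.sqrt(n))
    return w  # w[k] approximates W(k/n)

def bridge_from_wiener(w):
    """B(t) = W(t) - t*W(1) on the same grid."""
    n = len(w) - 1
    return [w[k] - (k / n) * w[-1] for k in range(n + 1)]

rng = random.Random(0)
n, reps, t_idx = 100, 10000, 30            # evaluate at t = 0.3
vals = []
for _ in range(reps):
    b = bridge_from_wiener(wiener_path(n, rng))
    vals.append(b[t_idx])

var_hat = sum(v * v for v in vals) / reps  # EB(t) = 0, so this estimates Var B(t)
print(round(var_hat, 2))                   # should be close to 0.3 * 0.7 = 0.21
```

On the grid, Var B(0.3) equals 0.3 + 0.09 - 2(0.3)(0.3) = 0.21 exactly, so the Monte Carlo estimate matches the lemma's covariance s ∧ t - st.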
When considering a Wiener process, we can see that the increments of the process over non-overlapping time intervals are stationary and independent, but the process itself is not stationary. That is to say, we do not have the property: given a stochastic process {X(t); t ∈ T}, then for any positive integer k and any points t_1, ..., t_k ∈ T the joint distribution of X(t_1), ..., X(t_k) is the same as the joint distribution of X(t_1 + h), ..., X(t_k + h) for all h ∈ T, i.e., (X(t_1), ..., X(t_k)) =_D (X(t_1 + h), ..., X(t_k + h)). This means that a stationary process is a stochastic process whose finite-dimensional distributions remain unchanged through shifts in time. One can show that the covariance function EX(s)X(t) of such a process is
a function of |s - t|, the length of the interval. Hence, due to their respective covariance functions, it is obvious that Wiener and Brownian bridge processes are not stationary.

But we can define a stochastic process by modifying a Wiener process, such that the new process is stationary. Consider the Gaussian process {U(t) = W(t)/√t; 0 < t < ∞}. By the definition of a Wiener process it follows that EU(t) = 0, EU²(t) = 1 and EU(s)U(t) = √(s/t) for 0 < s ≤ t < ∞.
Definition 1.2.5 We define the stationary Gaussian process V(t), the so-called Ornstein-Uhlenbeck process, via

V(t) = e^{-t} W(e^{2t}), 0 ≤ t < ∞.

We have that EV(t) = 0, EV²(t) = 1, and EV(s)V(t) = e^{-|t-s|}.
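The covariance EV(s)V(t) = e^{-|t-s|} can also be checked empirically. The sketch below (an illustration added here, not from the thesis) samples the pair (V(s), V(t)), where V(u) = e^{-u} W(e^{2u}) is the standard time-changed-Wiener representation consistent with the covariance stated above, using the independence of Wiener increments:

```python
import random
import math

def ou_pair(s, t, rng):
    """Sample (V(s), V(t)) for V(u) = exp(-u) * W(exp(2u)), assuming s < t.

    W is needed only at a = exp(2s) and b = exp(2t), so we use
    W(a) ~ N(0, a) and W(b) - W(a) ~ N(0, b - a), independent.
    """
    a, b = math.exp(2 * s), math.exp(2 * t)
    wa = rng.gauss(0.0, math.sqrt(a))
    wb = wa + rng.gauss(0.0, math.sqrt(b - a))
    return math.exp(-s) * wa, math.exp(-t) * wb

rng = random.Random(1)
s, t, reps = 0.3, 0.8, 100000
acc = 0.0
for _ in range(reps):
    vs, vt = ou_pair(s, t, rng)
    acc += vs * vt
cov_hat = acc / reps
print(round(cov_hat, 2), round(math.exp(-abs(t - s)), 2))
```

The two printed values should nearly agree, reflecting that the covariance depends only on |t - s|, i.e., the process is stationary.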
Next, for further use, we quote the celebrated Donsker theorem. Let X_1, X_2, ... be i.i.d.r.v.'s with EX_1 = 0, EX_1² = 1 and distribution function F. Further, we put S_0 = 0 and S_n = S(n) = Σ_{i=1}^n X_i. We construct a sequence of stochastic processes {S_n(t); 0 ≤ t ≤ 1} on C(0, 1), the space of all continuous functions on the interval (0, 1), from the partial sums S_0, S_1, S_2, ..., S_n as follows:

S_n(t) = (S([nt]) + (nt - [nt]) X_{[nt]+1}) / √n, 0 ≤ t ≤ 1.

We quote Donsker's theorem (1951) as follows:

Theorem 1.2.1 (Donsker, 1951) We have, as n → ∞,

h(S_n(t)) →_D h(W(t))
for every continuous functional h : C(0, 1) → R, with respect to the sup-norm topology.

Before applying Donsker's theorem to some special functionals, we quote

Theorem 1.2.2 (Slutsky, 1925) Let X_n →_D X and Y_n →_P c, where c is a finite constant. Then, as n → ∞, X_n + Y_n →_D X + c, X_n Y_n →_D cX, and X_n / Y_n →_D X/c if c ≠ 0.
Donsker's theorem immediately implies that, as n → ∞,

S_n(t) →_D W(t), for t fixed in (0, 1].

Combined with the fact that (nt - [nt]) X_{[nt]+1} / √n →_P 0, Slutsky's theorem implies, as n → ∞,

S([nt]) / √n →_D W(t), for t fixed in (0, 1].
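As a quick numerical illustration (added here, not from the thesis), the convergence S([nt])/√n →_D W(t), i.e. to an N(0, t) variable, can be observed even for summands far from normal. Below, Rademacher (±1) variables (EX = 0, EX² = 1) are used, and the empirical mean and variance of S([nt])/√n are compared with 0 and t:

```python
import random

def scaled_partial_sum(n, t, rng):
    """S([nt]) / sqrt(n) for i.i.d. Rademacher summands (EX = 0, EX^2 = 1)."""
    k = int(n * t)
    s = sum(1 if rng.random() < 0.5 else -1 for _ in range(k))
    return s / n ** 0.5

rng = random.Random(2)
n, t, reps = 200, 0.5, 20000
draws = [scaled_partial_sum(n, t, rng) for _ in range(reps)]
mean_hat = sum(draws) / reps
var_hat = sum(d * d for d in draws) / reps
print(round(mean_hat, 2), round(var_hat, 2))  # approximately 0 and t = 0.5
```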
Moreover, we also have, as n → ∞,
"If I feel unhappy, I do mathematics to become happy.
If I am happy, I do mathematics to keep happy."
- Alfréd Rényi
Chapter 2

U-Statistics in Change-point Analysis

2.1 Introduction
The problem of abrupt parameter changes arises in many situations of daily life, as well as in a variety of experimental and mathematical sciences. For instance, in medicine one may be interested in testing whether treating HIV patients with a new drug stabilized the condition of the patients, and if not, in estimating the time(s) of change(s) in order to change the treatment. Another example is variance analysis in daily stock prices, where the use of change detection methods allows an investor to reduce his/her risk. In Chapter 6 a particular financial model is described and a special detection method is suggested. Detection of possible change-points is, for instance, also of interest in archaeology, in econometrics, in epidemiology, in nuclear physics and in quality control.

In practice, usually a (large) set of data is observed, for instance the daily stock prices over the past year. Then a statistical test should determine whether there was a change in the data or not. This set of data may be modeled by saying that we observe independent random variables over a special period of time, hence via a random process. Then we wish to detect whether a change could have occurred in the distribution that governs this random process as time goes by.

We wish to study such phenomena in terms of special stochastic processes based on U-statistics. It is needless to say that there are many other ways to study such phenomena. The construction of these processes is such that statistical tests can be
based on them for detecting possible changes in their distribution.

Change-point problems have originally arisen in the context of quality control, where one typically observes the output of a production line and would wish to signal deviations from an acceptable output level while observing the data. When one observes such a random process sequentially and stops observing at a random time of detecting change, then one speaks of a sequential procedure (e.g., stop a production line if a specified percentage of the output is not good). Otherwise, one usually observes a change in a chronologically ordered finite sequence for the sake of determining possible change(s) during the data collection (e.g., check whether a production line produced reasonable output or not). Most such fixed sample size non-sequential procedures are described in terms of asymptotic results ('infinite' sample size, i.e., n → ∞).

Depending on whether the distribution of the data is assumed to be known, we use either parametric or nonparametric models. For parametric models we refer to the survey of Csörgő and Horváth (1997) at the end of the first chapter of their book. For nonparametric cases we refer, for example, to Brodsky and Darkhovsky (1993), Csörgő and Horváth (1988a, 1988b, 1997), Ferger and Stute (1992), Szyszkowicz (1992, 1996, 1998), as well as to their bibliographies.
We now state the problem of testing for a change in the data in a more mathematical way. Suppose we would like to test the null hypothesis

H_0 : X_1, ..., X_n are independent identically distributed random variables

against the alternative that there is at most one change-point in the sequence X_1, ..., X_n, namely that we have

H_A : X_1, ..., X_n are independent random variables and there is an integer τ, 1 ≤ τ < n, such that P{X_1 ≤ t} = ... = P{X_τ ≤ t}, P{X_{τ+1} ≤ t} = ... = P{X_n ≤ t} for all t, and P{X_τ ≤ t_0} ≠ P{X_{τ+1} ≤ t_0} for some t_0.

This means that we are testing for having n independent random variables belonging to the same distribution, versus the first τ ones belonging to the same distribution
and the last n - τ ones to a different one. Therefore we will compare the first k observations to the last n - k ones by using a bivariate function h(x, y) that is often called the kernel function in the literature on U-statistics (see Section 2.3).
Instead of the alternative H_A of at most one change, we will also consider a more general one that allows at most s change-points¹ in the sequence X_1, ..., X_n, namely that we have

H_A^(s) : X_1, ..., X_n are independent random variables and there exist s, 1 ≤ s < n, integers τ_1 = τ_1(n), τ_2 = τ_2(n), ..., τ_s = τ_s(n), 1 ≤ τ_1 ≤ τ_2 ≤ ... ≤ τ_s < n, such that P{X_1 ≤ t} = ... = P{X_{τ_1} ≤ t}, P{X_{τ_1 + 1} ≤ t} = ... = P{X_{τ_2} ≤ t}, ..., P{X_{τ_s + 1} ≤ t} = ... = P{X_n ≤ t} for all t, and P{X_{τ_i} ≤ t_0} ≠ P{X_{τ_i + 1} ≤ t_0} for some t_0 and for all 1 ≤ i ≤ s.
Similarly as before, we are now testing for at most s changes in the distribution. Hence, we will have to use a statistic that will somehow 'feel' the possibility of s changes. We will split the given sample of size n into s + 1, 1 ≤ s < n, blocks, compare each of them, and combine the corresponding kernel functions h(x, y) appropriately.
In view of Chapters 3 - 6, here we give an overview of the so-called U-statistics which, for example, will be used to detect changes in the mean or the variance. We give some basic results, definitions, and examples (cf. Serfling (1980), Chapter 5). This class of statistics was introduced in a fundamental paper by Hoeffding (1948). The members of this class have good consistency properties and we only assume that our n observations are independent and identically distributed. An appealing feature of a U-statistic (cf. Serfling (1980), Section 5.3) is its simple structure as a sum of identically, but not necessarily independently, distributed random variables. However, by the special device of 'projection', a U-statistic may be approximated by a sum of i.i.d. r.v.'s and then classical limit theory for sums does carry over to U-statistics. For proofs we again refer to Serfling (1980).
¹In this as well as in the following parts of this work s will always denote the number of changes. Hence, whenever we are talking about s changes, we think of s as being an integer, i.e., s ∈ ℕ.
2.2 Motivation
As mentioned earlier, large parts of this thesis focus on constructing stochastic processes based on U-statistics. The construction of these processes is such that statistical tests can be based on them for detecting possible changes in their distribution.
Tests for at most one change-point which are based on processes of U-statistics were first studied by Csörgő and Horváth (1986, 1988b, 1997). They investigate the asymptotic properties (as n → ∞) of the U-statistics based process

Z_k := Σ_{1≤i≤k} Σ_{k<j≤n} h(X_i, X_j), 1 ≤ k < n, (2.2.1)

where the kernel h(x, y) is either symmetric, i.e.,

h(x, y) = h(y, x) for all x, y ∈ ℝ, (2.2.2)

or antisymmetric, i.e.,

h(x, y) = −h(y, x) for all x, y ∈ ℝ. (2.2.3)

Typical choices for (2.2.2) are xy, ½(x − y)² (sample variance), |x − y| (Gini's mean difference), or sign(x + y) (Wilcoxon's one-sample statistic).

Typical choices for (2.2.3) are (x − y) or sign(x − y), since sign(x − y) = −sign(y − x) and sign(0) = 0. Later on we will use ½(x − y)² to detect changes in the variance and x − y to detect changes in the mean.
Csörgő and Horváth (1988b, 1997) give various asymptotic distributions of the U-statistics based process {Z_k, 1 ≤ k < n} under the null hypothesis H_0 and the alternative H_A for symmetric and antisymmetric kernels. They also give tests that can be used to reject H_0 vs. H_A.
2.3 Definition of U-Statistics
Throughout this section we refer to Serfling (1980) and Casella and Berger (1990).
Definition 2.3.1 A parameter θ is said to be estimable of degree r for a family of distributions 𝓕, if r is the smallest sample size for which there exists a function h*(x_1, ..., x_r) such that

E_F h*(X_1, ..., X_r) = θ (2.3.1)

for all F ∈ 𝓕, where X_1, ..., X_r are independent observations on a distribution F. h*(x_1, ..., x_r) is called the kernel of θ and does not depend on F.

If h*(x_1, ..., x_r) = h*(x_{α_1}, ..., x_{α_r}) for all permutations (α_1, ..., α_r) of the integers (1, ..., r), then h*(x_1, ..., x_r) is called a symmetric kernel.
Let

h(x_1, ..., x_r) := (1/r!) Σ_p h*(x_{α_1}, ..., x_{α_r}), (2.3.2)

where Σ_p denotes the summation over all r! permutations (α_1, ..., α_r) of (1, ..., r). Then h of (2.3.2) is always a symmetric kernel.
Definition 2.3.2 We define the mean square error of an estimator V_n of a parameter θ to be the function of θ defined by

MSE_θ(V_n) := E_F(V_n − θ)².

Moreover, (E_F V_n − θ) is called the bias of a point estimator V_n of a parameter θ. V_n depends only on the random sample X_1, ..., X_n, i.e., V_n = V(X_1, ..., X_n). If the bias is identically (in θ) equal to 0, then the estimator V_n is called unbiased and satisfies

E_F V_n = θ for all θ.
Definition 2.3.3 A U-statistic of an estimable parameter θ of degree r is created with the symmetric kernel h(·) by forming

U_n := (n choose r)^{−1} Σ_c h(X_{α_1}, ..., X_{α_r}), (2.3.3)

where Σ_c denotes the summation over all (n choose r) combinations of r (n ≥ r) distinct elements (α_1, ..., α_r) from (1, ..., n).

U_n is an unbiased estimator, since E_F U_n = θ.
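Definition 2.3.3 can be transcribed directly into code; the following is an illustrative sketch (not from the thesis; the data values are arbitrary):

```python
import itertools
import math

def u_statistic(x, h, r):
    """U_n of Definition 2.3.3: the average of the symmetric kernel h
    over all (n choose r) subsets of r distinct observations."""
    n = len(x)
    total = sum(h(*combo) for combo in itertools.combinations(x, r))
    return total / math.comb(n, r)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean_u = u_statistic(data, lambda a: a, 1)                     # Example 2.3.1
var_u = u_statistic(data, lambda a, b: 0.5 * (a - b) ** 2, 2)  # Example 2.3.2
print(mean_u, var_u)   # the sample mean and the sample variance
```

Both outputs are the familiar unbiased estimators: mean_u is the sample mean, and var_u agrees with the sample variance computed with divisor n − 1.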
Example 2.3.1 The sample mean. If θ(F) = mean of F = μ(F) = ∫ x dF(x) and h(x) = x, then the corresponding U-statistic is

U_n = n^{−1} Σ_{i=1}^n X_i = X̄_n, the sample mean.
Example 2.3.2 The sample variance. If θ(F) = variance of F = σ²(F) = ∫ (x − μ)² dF(x) and h(x_1, x_2) = ½(x_1 − x_2)², then the corresponding U-statistic is

U_n = (n choose 2)^{−1} Σ_{1≤i<j≤n} ½(X_i − X_j)² = (n − 1)^{−1} Σ_{i=1}^n (X_i − X̄_n)², the sample variance.
When dealing with one-sample U-statistics in this work, we will always have r = 2. Therefore our kernel functions h will depend on two (real-valued) arguments only.
Next we will define special functions which will be used throughout this work for r = 2, e.g., change in the mean (see Section 3.4) or variance (see Section 3.5). We consider the symmetric kernel h(x_1, ..., x_r), where, for 1 ≤ c ≤ r − 1,

h_c(x_1, ..., x_c) := E_F h(x_1, ..., x_c, X_{c+1}, ..., X_r),

and put h_r := h. We center at expectation by defining

h̃_c(x_1, ..., x_c) := h_c(x_1, ..., x_c) − θ, 1 ≤ c ≤ r − 1,

and put h̃_r := h̃. Since E_F h̃ = 0,

E_F h̃_c(X_1, ..., X_c) = 0, 1 ≤ c ≤ r.

Furthermore, we define ζ_0 := 0 and, for 1 ≤ c ≤ r,

ζ_c := Var_F h_c(X_1, ..., X_c) = E_F h̃_c²(X_1, ..., X_c).

The functions h_c and h̃_c depend on F for c ≤ r − 1 and the role of these functions is technical only. E.g., they are used to calculate the variance of U-statistics (see Lemma 2.5.1). An application is given in Section 3.1.1, where we use the same methodology for r = 2.
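As an illustration of the functions h_c (a sketch that is not part of the thesis; F = N(0, 1) is an assumption made so that a closed form exists): for r = 2 and the variance kernel h(x_1, x_2) = ½(x_1 − x_2)², the projection h_1(x) = E_F h(x, X_2) equals ½(x² + 1). A Monte Carlo check:

```python
import numpy as np

rng = np.random.default_rng(1)
x_grid = np.array([-1.0, 0.0, 2.0])
samples = rng.normal(0.0, 1.0, 200_000)     # X_2 ~ F = N(0, 1), an assumption

# h_1(x) = E h(x, X_2), estimated by averaging the kernel over X_2
h1_mc = np.array([np.mean(0.5 * (x - samples) ** 2) for x in x_grid])
h1_exact = 0.5 * (x_grid ** 2 + 1.0)        # closed form in the normal case
print(h1_mc, h1_exact)   # the Monte Carlo values approximate the closed form
```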
2.4 Generalized U-Statistics
Throughout this section we refer to Serfling (1980).
In the case where we have one or several changes in the data, say we have s changes, s ∈ ℕ, we consider s + 1 independent collections of independent observations X_1^(1), ..., X_{n_1}^(1); X_1^(2), ..., X_{n_2}^(2); ...; X_1^(s+1), ..., X_{n_{s+1}}^(s+1), taken from the distribution functions F^(1), ..., F^(s+1), respectively. Then for h being symmetric within each of its (s + 1) blocks of arguments we have

E h(X_1^(1), ..., X_{a_1}^(1); ...; X_1^(s+1), ..., X_{a_{s+1}}^(s+1)) = θ(F^(1), ..., F^(s+1)), (2.4.1)

where θ denotes a parametric function for which there is an unbiased estimator.
Definition 2.4.1 The U-statistic for estimating θ is defined as

U_n^(s+1) := [Π_{j=1}^{s+1} (n_j choose a_j)]^{−1} Σ_c h(X_{i_{11}}^(1), ..., X_{i_{1a_1}}^(1); ...; X_{i_{(s+1)1}}^(s+1), ..., X_{i_{(s+1)a_{s+1}}}^(s+1)), (2.4.2)

where {i_{j1}, ..., i_{ja_j}} denotes a set of a_j distinct elements of the set {1, 2, ..., n_j}, 1 ≤ j ≤ (s + 1), and Σ_c denotes summation over all such combinations.
In our specific setup right now we will deal with one single change-point (s = 1) at time τ = τ(n) := [nλ], 0 < λ < 1, in the data set X_1, X_2, ..., X_n, and therefore we will have the two independent collections X_1, X_2, ..., X_τ and X_{τ+1}, X_{τ+2}, ..., X_n of independent observations taken from distributions F^(1) and F^(2), respectively. Hence, n_1 and n_2 in (2.4.2) are equal to τ and n − τ, respectively. Using the previous notations with a_1 = a_2 = 1 and s = 1, we get

U_n^(2) = (τ(n − τ))^{−1} Σ_{i=1}^τ Σ_{j=τ+1}^n h(X_i, X_j),

and furthermore,

E U_n^(2) = θ(F^(1), F^(2)).

We mention again that τ depends on n.
Example 2.4.1 The Wilcoxon 2-sample statistic. Consider two random samples as above and, in addition, we also assume that F^(1) and F^(2) are continuous. Then the U-statistic

U_n^(2) = (τ(n − τ))^{−1} Σ_{i=1}^τ Σ_{j=τ+1}^n 1{X_i ≤ X_j}

is an unbiased estimator for

θ(F^(1), F^(2)) = P{X_1 ≤ X_{τ+1}}.
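A numerical sketch of Example 2.4.1 (not from the thesis; the sample sizes and the normal distributions are illustrative assumptions), with kernel h(x, y) = 1{x ≤ y}:

```python
import numpy as np

def wilcoxon_u(x, y):
    """Two-sample U-statistic with kernel h(a, b) = 1{a <= b}:
    an unbiased estimator of P{X <= Y}."""
    x, y = np.asarray(x), np.asarray(y)
    return (x[:, None] <= y[None, :]).mean()

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 5000)   # sample from F(1)
y = rng.normal(1.0, 1.0, 5000)   # sample from F(2)
u = wilcoxon_u(x, y)
print(u)   # here P{X <= Y} = Phi(1/sqrt(2)), about 0.76, so u is near that
```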
2.5 Variance of U-Statistics
In Section 2.3 we have seen that U-statistics are unbiased estimators. In addition, we will see that this class of estimators is the best among all unbiased estimators (cf. Casella and Berger (1990) and Serfling (1980)).
Definition 2.5.1 An estimator V_n* is called best unbiased estimator or uniform minimum variance unbiased estimator of θ, if E_F V_n* = θ, for all θ, and for any other estimator V_n with E_F V_n = θ, we have Var_F V_n* ≤ Var_F V_n, for all θ.
We now state two lemmas due to Hoeffding (1948) which give an explicit formula for the calculation of the variance as well as upper and lower bounds and asymptotic properties.
Lemma 2.5.1 (Hoeffding, 1948) The variance of U_n as in (2.3.3) is given by

Var_F U_n = (n choose r)^{−1} Σ_{c=1}^r (r choose c)(n − r choose r − c) ζ_c,

where ζ_c is defined as in Section 2.3.

The calculation of the variance can be very difficult and therefore the next lemma is very useful. It gives us upper and lower bounds and the asymptotic behavior of the variance.
Lemma 2.5.2 (Hoeffding, 1948) The variance of U_n as in (2.3.3) satisfies

(r²/n) ζ_1 ≤ Var_F U_n ≤ (r/n) ζ_r, and n Var_F U_n → r² ζ_1 as n → ∞.
That U-statistics are the best in the class of unbiased estimators of θ(F) is stated in the following theorem:

Theorem 2.5.1 (Serfling, 1980) If S_n = S(X_1, ..., X_n) is an unbiased estimator of θ(F) based on the sample X_1, ..., X_n from the distribution F, then the corresponding U-statistic is also unbiased and Var_F(U_n) ≤ Var_F(S_n). Equality holds if and only if S_n = U_n.
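Theorem 2.5.1 is easy to observe in a small simulation (a sketch, not from the thesis; S_n below is a deliberately wasteful unbiased estimator of σ², chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20, 20_000
samples = rng.normal(0.0, 1.0, (reps, n))     # true variance = 1

# S_n: unbiased for sigma^2, but ignores all but the first two observations
s_n = 0.5 * (samples[:, 0] - samples[:, 1]) ** 2
# U_n: the corresponding U-statistic, the sample variance (Example 2.3.2)
u_n = samples.var(axis=1, ddof=1)

print(s_n.mean(), u_n.mean())   # both close to 1 (unbiasedness)
print(s_n.var(), u_n.var())     # Var(U_n) is far smaller than Var(S_n)
```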
2.6 Some Convergence Results for U-Statistics
The following a.s. behavior result for U-statistics is due to Hoeffding (1961) and follows from the Strong Law of Large Numbers. It is a very important result and will, for example, be used to prove Theorem 3.3.2.
Theorem 2.6.1 (Hoeffding, 1961) Let U_n be defined as in (2.3.3) and θ as in (2.3.1). If E_F|h| < ∞, then, as n → ∞,

U_n → θ a.s.
For generalized (s + 1)-sample U-statistics, Sen (1977, Theorem 1) obtains strong convergence of U_n^(s+1) under a stronger condition than that of Theorem 2.6.1.
Theorem 2.6.2 (Sen, 1977) Let U_n^(s+1) be defined as in (2.4.2), θ as in (2.4.1), and log⁺x = log(x ∨ 1), where x ∨ 1 denotes the maximum of x and 1. If E_{F^(1),...,F^(s+1)}{|h| log⁺|h|} < ∞, then, as n_j → ∞, 1 ≤ j ≤ (s + 1),

U_n^(s+1) → θ a.s.
We mention that in Sen's theorem the condition

E{|h| log⁺|h|} < ∞ (2.6.1)

could be replaced by the stronger condition

E h² < ∞, (2.6.2)

since (2.6.2) implies (2.6.1). We note that condition (2.6.1) is sufficient but not necessary (cf. Sen (1977, Section 3)).
In view of the next chapters, where we will establish convergence in probability results for a combination of U-statistics, we consider the two random samples X_1, X_2, ..., X_τ and X_{τ+1}, X_{τ+2}, ..., X_n taken from distributions F^(1) and F^(2), respectively, where τ = τ(n) := [nλ], 0 < λ < 1. If we assume that θ_1 := E_{F^(1)}{h(X_1, X_2)} is finite, then, as n → ∞, we have by Theorem 2.6.1 that

(τ(τ − 1)/2)^{−1} Σ_{1≤i<j≤τ} h(X_i, X_j) → θ_1 a.s. (2.6.3)

Assuming that θ_2 := E_{F^(2)}{h(X_{τ+1}, X_{τ+2})} is finite and defining Y_i := X_{τ+i}, 1 ≤ i ≤ n − τ, we consider

((n − τ)(n − τ − 1)/2)^{−1} Σ_{1≤i<j≤n−τ} h(Y_i, Y_j).

If n → ∞, then we have almost sure convergence of the last expression to θ_2. Hence we also have, as n → ∞,

((n − τ)(n − τ − 1)/2)^{−1} Σ_{τ<i<j≤n} h(X_i, X_j) → θ_2 a.s. (2.6.4)
When combining the two different random samples with n_1 = τ(n) := [nλ], n_2 = n − τ(n) = n − [nλ], 0 < λ < 1, and a_1 = a_2 = 1, we have to assume that E_{F^(1),F^(2)}{|h(X_i, X_j)| log⁺|h(X_i, X_j)|} < ∞, 1 ≤ i < j ≤ n. Then Theorem 2.6.2 applies and as n → ∞ we have that

(τ(n − τ))^{−1} Σ_{i=1}^τ Σ_{j=τ+1}^n h(X_i, X_j) → θ_{1,2} a.s., (2.6.5)

where θ_{1,2} := E_{F^(1),F^(2)}{h(X_τ, X_{τ+1})} and τ := [nλ], 0 < λ < 1.
If F = F(') = ~ ( ~ 1 , t hen for u (~ ) ( X , , . . . , X, ; X,,l, . . . , X, ) in (2.6 -5) we have,
where el := IEF{h(Xl, X 2 ) ) and T := [nX], O < X < 1. Therefore, in this case,
(2.6-5) holds tme assuming I E F { ~ ( X ~ , X ~ ) ) < oo only, just Like in (2.6.3). More-
over, (2.6.3), (2.6.4) and (2.6.5) converge in probability to the same finite value.
Example 2.6.1 Let us assume that F^(1) = N(μ_1, σ_1²) and F^(2) = N(μ_2, σ_2²), and define the kernel h(x, y) = ½(x − y)². This kernel was used in Example 2.3.2, where we considered the sample variance. Using the results from above and letting again τ(n) := [nλ], 0 < λ < 1, we have that, as n → ∞,

(τ(n − τ))^{−1} Σ_{i=1}^τ Σ_{j=τ+1}^n h(X_i, X_j) → θ_{1,2} = ½((μ_1 − μ_2)² + σ_1² + σ_2²) a.s.,

which reduces to θ_1 := E_F{h(X_1, X_2)} = σ² on account of μ_1 = μ_2 and σ_1² = σ_2² =: σ².
"If the facts don't fit the theory, change the facts."
- Albert Einstein
Chapter 3
At Most One Change-point
3.1 Introduction
We are to test the null hypothesis

H_0 : X_1, ..., X_n are independent identically distributed random variables

against the alternative that there is at most one (single) change-point in the sequence X_1, X_2, ..., X_n, namely that we have

H_A : X_1, ..., X_n are independent random variables and there is an integer τ, 1 ≤ τ < n, such that P{X_1 ≤ t} = ... = P{X_τ ≤ t}, P{X_{τ+1} ≤ t} = ... = P{X_n ≤ t} for all t, and P{X_τ ≤ t_0} ≠ P{X_{τ+1} ≤ t_0} for some t_0.
As mentioned in Section 2.2, tests for at most one change-point which are based on processes of U-statistics were studied by Csörgő and Horváth (1986, 1988b, 1997). Similarly, we will investigate the asymptotic properties (as n → ∞) of the U-statistics based process

Z_k := Σ_{1≤i≤k} Σ_{k<j≤n} h(X_i, X_j), 1 ≤ k < n, (3.1.1)

where the kernel h(x, y) is either symmetric or antisymmetric. We will state their basic results and, moreover, we will impose conditions such that we will have a
good estimator for the time of change not only in the antisymmetric case (cf. Section 3.2.3), but also in the symmetric one (cf. Section 3.3.3), which they have not investigated. Then we will apply the theoretical results to detect at most one (single) change in the mean (h antisymmetric) and in the variance (h symmetric). In particular, testing for at most one change in the mean is illustrated by using a geometrical argument (cf. Section 3.4). We mention that though in part we build on them, most of the results of Csörgő and Horváth (1986, 1988b, 1997) in this section become immediate consequences of the results in Chapter 5 which deal with multiple changes (put s = 1).
3.1.1 Notations under the Null Hypothesis H_0
We define¹

E h(X_i, X_j) =: θ, 1 ≤ i < j ≤ n,

and

E h²(X_i, X_j) =: γ, 1 ≤ i < j ≤ n.

We assume throughout the whole chapter that

γ < ∞, (3.1.4)

which of course implies

|θ| < ∞.

¹Note that in Section 2.3 and Section 2.6 we used the equivalent notation θ := E_F h(X_i, X_j), where F denotes the distribution function of the i.i.d. r.v.'s X_1, ..., X_n, instead.
Furthermore, the expected value of Z_k, 1 ≤ k < n, defined by

Z_k := Σ_{1≤i≤k} Σ_{k<j≤n} h(X_i, X_j),

where the kernel h(x, y) is either symmetric or antisymmetric, becomes

E Z_k = k(n − k)θ.

Let X and Y be independent identically distributed random variables and h be an antisymmetric kernel as defined in (2.2.3). Then we have that

E h(X, Y) = E h(Y, X) = −E h(X, Y),

which is possible only if

θ = E h(X, Y) = 0. (3.1.8)

Define

h̃(t) := E{h(t, X_2) − θ}. (3.1.9)

Then condition (3.1.4) implies that

E h̃²(X_1) < ∞.
We also assume that

0 < σ² := E h̃²(X_1), (3.1.10)

which is the so-called non-degenerate case when studying U-statistics. In this and the following chapters we will focus on this case, but we mention that the degenerate case in the context of change-point analysis was, for example, studied by Csörgő and Horváth (1997, Section 2.4).

We mention that we will use the symbol σ̂² in this chapter, as well as in the following ones, for the variance of the given data X_1, ..., X_n, and σ² for the expected value as in (3.1.10). The function in (3.1.9) induces the projection of U-statistics into sums of i.i.d. r.v.'s. We centralize Z_k by its mean, and put

U_k := Z_k − E Z_k = Z_k − k(n − k)θ, 1 ≤ k < n, (3.1.13)

where the kernel function h in Z_k is symmetric. Actually, we define the same for an antisymmetric kernel function, but since (3.1.8) holds,

E Z_k = 0, 1 ≤ k < n,

and in this case we define

Ū_k := Z_k, 1 ≤ k < n. (3.1.14)
Moreover, we may write Z_k as the sum of three U-statistics and accordingly, U_k, 1 ≤ k < n, equals

U_k = Z^(3) − (Z_k^(1) + Z_k^(2)) − k(n − k)θ,

where

Z_k^(1) := Σ_{1≤i<j≤k} h(X_i, X_j), Z_k^(2) := Σ_{k<i<j≤n} h(X_i, X_j), Z^(3) := Σ_{1≤i<j≤n} h(X_i, X_j).

Similarly, we define for an antisymmetric kernel

Ū_k = Z̄^(3) − (Z̄_k^(1) + Z̄_k^(2)),

where Z̄_k^(1), Z̄_k^(2) and Z̄^(3) are defined as above. For further use we also define

Ū_n(t) := Ū_{[(n+1)t]} / (σ n^{3/2}), 0 ≤ t < 1,

and

U_n(t) := U_{[(n+1)t]} / (σ n^{3/2}), 0 ≤ t < 1.
In this, and in the following chapters as well, U-statistic based processes that have a bar on top have an antisymmetric kernel and those without a bar a symmetric kernel. This will make it easier to distinguish between the antisymmetric and the symmetric cases.
3.1.2 Notations under the Alternative H_A
Let ~ ( ' ) ( t ) = lP {X , 5 t) and F(*) (t) = lP{X,+l 5 t ) be the respective distribution
functions of the observations before and after the postuiated change T, and put2
and
Assume throughout the whole chapter that Eh2 (Xi, Xj) is finite for all possible
choices of i and j, namely
which implies that
Due to (3.1.17) and (3.1.20), we can calculate the expected value of Zk7 1 5 k < n,
which is dehed by
where the kemel h(z, y) is either symmetric or antisymmetric. Namely, under HA,
* ~ o t e that in Section 2.4 and Section 2.6 we i w d the equident notations Bi := EFci1 h(Xi, X2), $2 := EFW h(X,+i, &+z) and 81.2 := IEF~*, ,F(2) h(X,, x,+~) instead.
3.1. Introduction 27
we have
- 7 % ~ + k ( ~ - k)&, i ~ k ~ . r (3.1.22)
~ ( n - k)ûl,2+ (n- k)(k-T)&, T 5 k < n-
Therefore, by using (3.1.8), we get that in the antisymmetric case

θ_1 = θ_2 = 0, (3.1.23)

and (3.1.22) becomes

E Z_k = k(n − τ)θ_{1,2}, 1 ≤ k ≤ τ,
E Z_k = τ(n − k)θ_{1,2}, τ ≤ k < n. (3.1.24)
When searching for the possible time of change τ, one has to investigate the behavior of the process Z_k, 1 ≤ k < n, under H_A, and when looking at the expected value of the process as a function in k, we see that it is increasing before the postulated change-point τ and decreasing after it, when assuming that h is antisymmetric and θ_{1,2} is positive. Moreover, it reaches its maximum at time τ, the change-point. Therefore, we have to find k, where the process Z_k, 1 ≤ k < n, reaches its maximum. Consequently, for θ_{1,2} positive, we define Ẑ_n to be the maximum (taken in k) over all Z_k's, 1 ≤ k < n, i.e.,

Ẑ_n = max_{1≤k<n} Σ_{1≤i≤k} Σ_{k<j≤n} h(X_i, X_j),

and as an estimate for the change-point τ we define

τ̂ := min{k : Z_k = max_{1≤m<n} Z_m}.
Since (3.1.24) is easier to handle than (3.1.22), more is known when we have an antisymmetric kernel than when we have a symmetric kernel: it is clear that we will need to impose special conditions on the expected value when using a symmetric kernel, such that the maximum again will be reached at time τ.
We mention that the moments under the null hypothesis of no change may be derived from the formulae under the alternative. Since under H_0

F^(1) = F^(2) = F

and hence

θ_1 = θ_2 = θ_{1,2} = θ,

the previous results reduce to the corresponding ones in Section 3.1.1.
3.2 Antisymmetric Kernels
We consider the processes Z_k, 1 ≤ k < n, n ≥ 2, where the kernel h is antisymmetric as in (2.2.3). By using the notations and assumptions from Sections 3.1.1 and 3.1.2, we give here some well known asymptotic results (cf. Csörgő and Horváth (1986, 1988b, 1997)) under H_0 (cf. Section 3.2.1) and H_A (cf. Section 3.2.2), as well as results on estimating the time of change τ (cf. Section 3.2.3).
3.2.1 Asymptotic Results under H_0
We wish to study the limiting behavior of the stochastic process {Ū_n(t), 0 ≤ t < 1}, as n → ∞, in the sup-norm under the null hypothesis of no change.

The asymptotic behavior of Ū_k will be derived from the following reduction principle (Lemma 3.2.1). It is a consequence of Janson and Wichura (1983, Theorem 2.1) and given explicitly by Huse (1988, Lemma 2.1.6).
Lemma 3.2.1 (Huse, 1988) Let h be an antisymmetric kernel and Ū_k as in (3.1.14). Then under H_0 the following statements hold true as n → ∞:

max_{1≤k≤n} |Z̄_k^(1) − Σ_{i=1}^k (k − 2i + 1)h̃(X_i)| = O_P(n),

max_{1≤k≤n} |Z̄_k^(2) − Σ_{i=k+1}^n (n + k − 2i + 1)h̃(X_i)| = O_P(n),

|Z̄^(3) − Σ_{i=1}^n (n − 2i + 1)h̃(X_i)| = O_P(n).
The basic idea for the proof is given by the above mentioned Janson and Wichura. Their results are stated in the context of stochastic area integrals. We will state their theorem (see Theorem 4.2.1) and will discuss how this lemma follows from their theorem in the context of at most two change-points in Chapter 4.

Janson and Wichura do not present a detailed proof. A detailed proof for this lemma is given by Huse (1988, Lemma 2.1.6). She proves that the sequence of random variables

Z_k^* := Z̄_k^(1) − Σ_{i=1}^k (k − 2i + 1)h̃(X_i), k = 1, 2, ...,

is a martingale. Using the notations of Billingsley (1986, Section 35), we quote:
Definition 3.2.1 Let X_1, X_2, ... be a sequence of random variables on a probability space (Ω, 𝓕, P) and let 𝓕_1, 𝓕_2, ... be a sequence of σ-fields in 𝓕. Then the sequence {(X_n, 𝓕_n) : n = 1, 2, ...} is a martingale if the following four conditions hold:

1. 𝓕_n ⊂ 𝓕_{n+1};
2. X_n is measurable with respect to 𝓕_n;
3. E|X_n| < ∞;
4. with probability 1, E[X_{n+1} | 𝓕_n] = X_n.

If instead of 4. we have

E[X_{n+1} | 𝓕_n] ≥ X_n with probability 1,

then the sequence {(X_n, 𝓕_n) : n = 1, 2, ...} is called a submartingale.
Having shown that Z_k^* is a martingale sequence, and using the Hájek-Rényi inequality (see Shorack and Wellner (1986, Appendix A)), Huse (1988) deduces that

max_{1≤k≤n} |Z_k^*| = O_P(n).

The other two statements of Lemma 3.2.1 are proved in a similar way. As a consequence, we have (cf. Corollary 2.1.7 of Huse (1988))
Corollary 3.2.1 (Huse, 1988) Under the same conditions as in Lemma 3.2.1 we have, as n → ∞,

max_{1≤k<n} |Ū_k − (n Σ_{i=1}^k h̃(X_i) − k Σ_{i=1}^n h̃(X_i))| = O_P(n).
Proof of Corollary 3.2.1. Since Ū_k = Z̄^(3) − (Z̄_k^(1) + Z̄_k^(2)), 1 ≤ k < n, we have that

max_{1≤k<n} |Ū_k − (n Σ_{i=1}^k h̃(X_i) − k Σ_{i=1}^n h̃(X_i))| = max_{1≤k<n} |Z̄^(3) − (Z̄_k^(1) + Z̄_k^(2)) − (n Σ_{i=1}^k h̃(X_i) − k Σ_{i=1}^n h̃(X_i))|.

We add and subtract special terms to apply the previous lemma. Accordingly, we use the previous lemma and get the desired result, namely

max_{1≤k<n} |Ū_k − (n Σ_{i=1}^k h̃(X_i) − k Σ_{i=1}^n h̃(X_i))| = O_P(n). □

We repeated here Huse's (1988) proof of Corollary 3.2.1 for the sake of demonstrating the usefulness of the reduction principle. Namely, Ū_k is approximated by sums of i.i.d. r.v.'s for which there are many limit theorems available.
Now we study the asymptotic behavior of Ū_k, 1 ≤ k < n, in the sup-norm. As mentioned before, the following results are due to Csörgő and Horváth (1988b, 1997).
Theorem 3.2.1 (Csörgő and Horváth, 1988b) Assume that H_0, (2.2.3), (3.1.4) and (3.1.10) hold. Then we can define a sequence of Brownian bridges {B_n(t), 0 ≤ t ≤ 1}_{n∈ℕ} such that, as n → ∞,

sup_{0<t<1} |Ū_n(t) − B_n(t)| = o_P(1),

where, for each n, {B_n(t), 0 ≤ t ≤ 1} =_D {B(t), 0 ≤ t ≤ 1}, with B being a standard Brownian bridge.
A proof is given by Csörgő and Horváth (1988b, Theorem 4.1). They show that n Σ_{i=1}^{[nt]} h̃(X_i) may be associated with nW([nt]) and [nt] Σ_{i=1}^n h̃(X_i) with [nt]W(n), where W(·) is a standard Wiener process. This implies the theorem by using Corollary 3.2.1 via multiplying both sides of its statement by 1/(σn^{3/2}). Theorem 3.2.1 implies that under H_0, as n → ∞,

sup_{0<t<1} |Ū_n(t)| →_D sup_{0<t<1} |B(t)|.

This means that the sup-functionals of |Ū_n(t)| converge in distribution to the sup-functional of a Brownian bridge. Consequently, we can use tables for the supremum of a Brownian bridge to accept or reject H_0. It is known that

P{sup_{0≤t≤1} |B(t)| ≤ x} = 1 − 2 Σ_{k=1}^∞ (−1)^{k−1} exp(−2k²x²), x > 0,

the well known limiting distribution of the two-sided Kolmogorov-Smirnov statistic (cf. Kolmogorov, 1933), which has been widely tabulated.
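The series above converges very fast, so critical values can be computed directly instead of being read from tables; a sketch (truncating at 100 terms is an arbitrary but more than sufficient choice):

```python
import math

def kolmogorov_cdf(x, terms=100):
    """P{ sup |B(t)| <= x } for a Brownian bridge B:
    K(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2)."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s

print(kolmogorov_cdf(1.358))   # about 0.95: 1.358 is the usual 5% critical value
```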
Tests based on sup_{0<t<1} |Ū_n(t)| are not sensitive in the tails. Hence, we note that a weighted version of Theorem 3.2.1 holds true as in Theorem 2.1 of Szyszkowicz (1991) if we replace (S([(n + 1)t]) − tS(n))/(σn^{1/2}) there by Ū_n(t). In particular, by using the weight function

q(t) = (t(1 − t) log log(1/(t(1 − t))))^{1/2},

we have, for example, that, as n → ∞,

sup_{0<t<1} |Ū_n(t)|/q(t) →_D sup_{0<t<1} |B(t)|/q(t),

and we can use tables for sup_{0<t<1} |B(t)|/q(t) (cf. Eastwood and Eastwood (1998)) and reject H_0 for large values of sup_{0<t<1} |Ū_n(t)|/q(t).
3.2.2 Asymptotic Results under H_A
Csörgő and Horváth (1988b, Theorem 3.1) studied the asymptotic behavior of Ū_k under the alternative H_A and proved the following theorem.

Theorem 3.2.2 (Csörgő and Horváth, 1988b) Assume that (2.2.3) and H_A hold and

E{|h(X_{[nλ]}, X_{[nλ]+1})| log⁺(|h(X_{[nλ]}, X_{[nλ]+1})|)} < ∞, (3.2.4)

where τ = τ(n) := [nλ], 0 < λ < 1, and log⁺x = log(x ∨ 1). Then, as n → ∞, for every t ∈ (0, 1),

n^{−2} Z_{[(n+1)t]} →_P ū_λ(t),

where ū_λ(t) := θ_{1,2} t(1 − λ) for t ∈ (0, λ] and ū_λ(t) := θ_{1,2} λ(1 − t) for t ∈ [λ, 1).

We note that Theorem 3.2.2 also holds if we replace condition (3.2.4) by condition (3.1.19). This theorem follows from Theorem 3.3.2, where h is assumed to be symmetric, by using the fact that θ_1 = θ_2 = 0.
Theorem 3.2.2 implies the consistency of tests based on {Ū_{[(n+1)t]}, 0 < t < 1}, where Ū_k = Z_k, 1 ≤ k < n. Assuming finite second moments as in (3.1.19), such that the results under H_0 hold, we can then reject H_0 vs. H_A when sup_{0<t<1} n^{−3/2} |Ū_{[(n+1)t]}| is large and θ_{1,2} ≠ 0.
3.2.3 Estimating the Time of Change
We have seen in Section 3.1 that an antisymmetric kernel depends only on θ_{1,2} but not on θ_1 or θ_2. Hence, if θ_{1,2} is positive we define

τ̂ := min{k : Z_k = max_{1≤m<n} Z_m},

since the point where Z_k, 1 ≤ k < n, reaches its unique maximum is at k = τ, as can be seen in Figure 3.2.1.

Figure 3.2.1: The limiting function ū_λ(t) with θ_{1,2} = 10 takes its maximum value of 2.5 at t = λ = 0.5. Note: The x-axis denotes t and the y-axis denotes ū_λ(t).

Otherwise, if θ_{1,2} is negative, we define

τ̃ := min{k : Z_k = min_{1≤m<n} Z_m},

since the point where Z_k, 1 ≤ k < n, reaches its unique minimum is at k = τ.
In practice θ_{1,2} is unknown, but since n^{−2} Z_{[(n+1)t]} = n^{−2} Ū_{[(n+1)t]} →_P ū_λ(t), the plot of n^{−2} Z_{[(n+1)t]} is likely to exhibit whether we have a ∩-type or a ∪-type function. E.g., if we have a ∩-type function then we take τ̂ to estimate τ. Ferger and Stute (1992) showed that τ̂ and τ̃, respectively, are strongly consistent estimators of τ, and Gombay (1998) gave the asymptotic distribution of max_{1≤k<n} Z_k and that of τ̂ − τ under the alternative H_A, as we will discuss in Section 3.3.3.
Now we are interested in the null distribution of τ̂, which may be described as the distribution of the argument of the maximum of Z_k, 1 ≤ k < n. From Theorem 3.2.1 we know that Ū_n(t) converges weakly to a Brownian bridge. Hence, τ̂/n converges in distribution to the time where a Brownian bridge reaches its maximum, and it follows from Birnbaum and Pyke (1958) (cf. Shorack and Wellner (1986, p. 385)) that the time where a Brownian bridge reaches its maximum is a Uniform(0,1)-distributed random variable. Hence, under H_0, τ̂ asymptotically takes every integer value from 1 to n with the same probability, as stated in the following theorem by Csörgő and Horváth (1997, Theorem 2.4.14):
Theorem 3.2.3 (Csörgő and Horváth, 1997) We assume that H_0, (2.2.3), (3.1.4) and (3.1.10) hold. Then

lim_{n→∞} P{τ̂/n ≤ t} = t, 0 ≤ t ≤ 1.
Of course, using similar arguments as before we get a similar result for the minimizer τ̃, which may be stated in a corollary as follows:

Corollary 3.2.2 We assume that H_0, (2.2.3), (3.1.4) and (3.1.10) hold. Then

lim_{n→∞} P{τ̃/n ≤ t} = t, 0 ≤ t ≤ 1.
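Theorem 3.2.3 can be illustrated by simulation (a sketch, not from the thesis; it uses the fact that for the mean-change kernel h(x, y) = x − y the double sum collapses to Z_k = nS_k − kS_n with S_k = X_1 + ... + X_k, which keeps the computation cheap):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 2000
taus = np.empty(reps)
for r in range(reps):
    x = rng.normal(0.0, 1.0, n)        # H0: i.i.d. observations, no change
    s = np.cumsum(x)
    k = np.arange(1, n)
    z = n * s[:-1] - k * s[-1]         # Z_k for h(x, y) = x - y
    taus[r] = (np.argmax(z) + 1) / n   # normalized argmax, tau_hat / n
print(taus.mean(), taus.var())         # near 0.5 and 1/12, as for Uniform(0,1)
```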
3.3 Symmetric Kernels
We consider the processes Z_k, 1 ≤ k < n, n ≥ 2, where the kernel h is symmetric as in (2.2.2). By using notations and assumptions from Sections 3.1.1 and 3.1.2 we give some well known results (cf. Csörgő and Horváth (1997)) under H_0 (cf. Section 3.3.1) and H_A (cf. Section 3.3.2). Furthermore, we will investigate under what assumptions we may define an estimator for the time of change (cf. Section 3.3.3), as in the antisymmetric case in Section 3.2.3.
3.3.1 Asymptotic Results under H_0
We wish to study the limiting behavior of the stochastic process {U_n(t), 0 ≤ t < 1}, as n → ∞, under the null hypothesis of no change in distribution. Similarly to the antisymmetric case, the asymptotic behavior of U_k will be derived from the following reduction principle (Lemma 3.3.1). It is a consequence of Theorem 1 of P. Hall (1979) and given explicitly by Huse (1988, Lemma 2.1.12).
Lemma 3.3.1 (Huse, 1988) Let h be a symmetric kernel and U_k as in (3.1.13). Then under H_0 the following statements hold true as n → ∞:

max_{1≤k≤n} |Z_k^(1) − (k(k − 1)/2)θ − (k − 1) Σ_{i=1}^k h̃(X_i)| = O_P(n),

max_{1≤k≤n} |Z_k^(2) − ((n − k)(n − k − 1)/2)θ − (n − k − 1) Σ_{i=k+1}^n h̃(X_i)| = O_P(n),

|Z^(3) − (n(n − 1)/2)θ − (n − 1) Σ_{i=1}^n h̃(X_i)| = O_P(n).

The proof is similar to that of Lemma 3.2.1. Again we can combine the three statements above and get the following corollary as an immediate consequence.

Corollary 3.3.1 (Huse, 1988) Under the same conditions as in Lemma 3.3.1 we have, as n → ∞,

max_{1≤k<n} |U_k − {(n − k) Σ_{i=1}^k h̃(X_i) + k (Σ_{i=1}^n h̃(X_i) − Σ_{i=1}^k h̃(X_i))}| = O_P(n).

We shall see that the limit of {U_k, 1 ≤ k < n}, as n → ∞, is a Gaussian process, which is identified in the next theorem. Let {Γ(t), 0 ≤ t ≤ 1} be a Gaussian process defined by

Γ(t) = (1 − t)W(t) + t(W(1) − W(t)), 0 ≤ t ≤ 1, (3.3.1)

where W is a standard Wiener process. Since

E W(t) = 0, 0 ≤ t ≤ 1,
we have that

E Γ(t) = (1 − t) E W(t) + t(E W(1) − E W(t)) = 0, 0 ≤ t ≤ 1.

By using the fact that

Var W(t) = t, 0 ≤ t ≤ 1,

the variance of this Gaussian process is

Var Γ(t) = Var((1 − t)W(t) + t(W(1) − W(t)))
         = (1 − t)² Var W(t) + t² (Var W(1) − Var W(t))
         = (1 − t)² t + t² (1 − t)
         = (1 − t)t, 0 ≤ t ≤ 1.
Let {B(t), 0 ≤ t ≤ 1} be a Brownian bridge. Then we have that

E B(t) = E Γ(t) = 0 and Var B(t) = Var Γ(t) = t(1 − t),

where 0 ≤ t ≤ 1. Although these expected values and variances are the same, the two stochastic processes are not the same. This can be seen by checking the covariance functions of these stochastic processes. Recall that a centered Gaussian process is uniquely determined by its covariance structure. By calculating the covariance function of the Brownian bridge {B(t), 0 ≤ t ≤ 1} and using the facts that

B(t) = W(t) − tW(1), 0 ≤ t ≤ 1,

and

E W(s)W(t) = s ∧ t, 0 ≤ s, t ≤ 1,

we get that
Cov[B(s), B(t)] = E B(s)B(t)
= E(W(s) − sW(1))(W(t) − tW(1))
= E(W(s)W(t) − sW(1)W(t) − tW(s)W(1) + stW²(1))
= E W(s)W(t) − s E W(1)W(t) − t E W(s)W(1) + st E W²(1)
= s ∧ t − s(1 ∧ t) − t(s ∧ 1) + st(1 ∧ 1)
= s ∧ t − st − ts + st
= s ∧ t − st, 0 ≤ s, t ≤ 1. (3.3.2)
Similarly, we get the covariance function of our Gaussian process {Γ(t), 0 ≤ t ≤ 1}, namely

Cov[Γ(s), Γ(t)] = E Γ(s)Γ(t)
= (1 − s)(1 − t) E W(s)W(t) + (1 − s)t E W(s)(W(1) − W(t))
  + s(1 − t) E (W(1) − W(s))W(t) + st E (W(1) − W(s))(W(1) − W(t))
= (1 − s)(1 − t)(s ∧ t) + (1 − s)t((s ∧ 1) − (s ∧ t))
  + s(1 − t)((1 ∧ t) − (s ∧ t))
  + st((1 ∧ 1) − (s ∧ 1) − (1 ∧ t) + (s ∧ t))
= (1 − s)(1 − t)(s ∧ t) + s(1 − t)(t − (s ∧ t))
  + (1 − s)t(s − (s ∧ t)) + st(1 − s − t + (s ∧ t))
= (s ∧ t)((1 − s)(1 − t) − s(1 − t) − (1 − s)t + st)
  + s(1 − t)t + (1 − s)ts + st(1 − s − t)
= (1 − 2s)(1 − 2t)(s ∧ t) + (3 − 2s − 2t)st,
0 ≤ s, t ≤ 1. (3.3.3)
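Both covariance formulas, (3.3.2) and (3.3.3), can be verified by simulating the Wiener process on a grid (a sketch, not from the thesis; grid size, replication count and the pair (s, t) are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
m, reps = 200, 10_000
t_grid = np.linspace(0.0, 1.0, m + 1)
dw = rng.normal(0.0, np.sqrt(1.0 / m), (reps, m))
w = np.concatenate([np.zeros((reps, 1)), np.cumsum(dw, axis=1)], axis=1)

gamma = (1.0 - t_grid) * w + t_grid * (w[:, -1:] - w)   # Gamma(t)
bridge = w - t_grid * w[:, -1:]                         # B(t) = W(t) - t W(1)

s, t = 0.3, 0.7
i, j = int(s * m), int(t * m)
cov_gamma = np.mean(gamma[:, i] * gamma[:, j])
cov_bridge = np.mean(bridge[:, i] * bridge[:, j])
# (3.3.3) gives 0.162 and (3.3.2) gives 0.09 at (s, t) = (0.3, 0.7)
print(cov_gamma, cov_bridge)
```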
Since the covariance functions of the Gaussian processes {Γ(t), 0 ≤ t ≤ 1} and {B(t), 0 ≤ t ≤ 1} are not the same, the two processes are also not the same. Hence, we are now in the position to state the next theorem, which is due to Csörgő and Horváth (1988b, 1997).
Theorem 3.3.1 (Csörgő and Horváth, 1988b) Assume that H_0, (2.2.2), (3.1.4) and (3.1.10) hold. Then we can define a sequence of Gaussian processes {Γ_n(t), 0 ≤ t ≤ 1}_{n∈ℕ} such that, as n → ∞,

sup_{0<t<1} |U_n(t) − Γ_n(t)| = o_P(1),

and, for each n, we have

{Γ_n(t), 0 ≤ t ≤ 1} =_D {Γ(t), 0 ≤ t ≤ 1}.

A proof is given by Csörgő and Horváth (1997, Theorem 2.4.1). They use Corollary 3.3.1 and the fact that we can define a Wiener process {W(t), 0 ≤ t < ∞} such that the partial sums Σ_{i=1}^k h̃(X_i) are suitably approximated by σW(k).
Theorem 3.3.1 implies that under H_0, as n → ∞,

sup_{0<t<1} |U_n(t)| →_D sup_{0<t<1} |Γ(t)|,

that is to say, the supremum of |U_n(t)| converges in distribution to the supremum of the absolute value of the Gaussian process Γ(·) as in (3.3.1). One can compute percentiles of its distribution function by producing tables for the latter random variable. Then we can use those tables to reject H_0 for large values of sup_{0<t<1} |U_n(t)|.
Tests based on sup_{0<t<1} |U_n(t)| are not sensitive in the tails. Hence, we note that a weighted version of Theorem 3.3.1 holds true as in Theorem 2.1 of Szyszkowicz (1991) if we replace (S([(n + 1)t]) − tS(n))/(σn^{1/2}) there by U_n(t) and the Brownian bridges by the Gaussian processes Γ_n(·). In particular, using the weight function

q(t) = (t(1 − t) log log(1/(t(1 − t))))^{1/2},

we have, for example, that, as n → ∞,

sup_{0<t<1} |U_n(t)|/q(t) →_D sup_{0<t<1} |Γ(t)|/q(t).

Producing tables for the distribution function of the random variable sup_{0<t<1} |Γ(t)|/q(t) is desirable for use in testing H_0 vs. H_A.
3.3.2 Asymptotic Results under H_A
Csörgő and Horváth (1988b, Theorem 3.1) studied the asymptotic behavior of our U-statistics based process under the alternative H_A and showed the following theorem.

Theorem 3.3.2 (Csörgő and Horváth, 1988b) Assume that (2.2.2) and H_A hold and

E{|h(X_{[nλ]}, X_{[nλ]+1})| log⁺(|h(X_{[nλ]}, X_{[nλ]+1})|)} < ∞, (3.3.7)

where τ = τ(n) := [nλ], 0 < λ < 1, and log⁺x = log(x ∨ 1). Then, with Z_k as in (3.1.1), we have, as n → ∞, for every t ∈ [0, 1],

n^{−2} Z_{[(n+1)t]} →_P u_λ(t),

where u_λ(t) := t(λ − t)θ_1 + t(1 − λ)θ_{1,2} for t ∈ [0, λ] and u_λ(t) := (1 − t)(t − λ)θ_2 + λ(1 − t)θ_{1,2} for t ∈ [λ, 1].
Again we mention that the theorem also holds if condition (3.3.7) is replaced by condition (3.1.19). Since the way of proving this theorem will be of interest when having an alternative of more than one change, we give a proof of the theorem. It is similar to the ones given by Csörgő and Horváth (1986, Theorem 3.1, and 1988b, Theorem 3.1) and Huse (1988, Theorem 2.3.7).
Proof of Theorem 3.3.2. We have τ = τ(n) = [nλ], 1 ≤ τ < n, the single change-point under the alternative H_A. We put m = [(n + 1)t] and first assume that 1 ≤ m ≤ τ.
In view of Theorem 2.6.1 by Hoeffding (1961) and Theorem 2.6.2 by Sen (1977), we have to change the summation of Z_m, 1 ≤ m ≤ τ, such that we can apply these theorems on U-statistics and generalized U-statistics. We have

Z_m = Σ_{1≤i≤m} Σ_{m<j≤τ} h(X_i, X_j) + Σ_{1≤i≤τ} Σ_{τ<j≤n} h(X_i, X_j) − Σ_{m<i≤τ} Σ_{τ<j≤n} h(X_i, X_j). (3.3.8)

We have just split Z_m into three parts, where the last two are now of the forms of generalized U-statistics. We also have to rewrite the first part for the sake of applying any of the previously mentioned theorems. To do this, we look at the summation areas of the sums

A_n := Σ_{1≤i≤m} Σ_{m<j≤τ} h(X_i, X_j), A_n^(1) := Σ_{1≤i<j≤τ} h(X_i, X_j), A_n^(2) := Σ_{1≤i<j≤m} h(X_i, X_j), A_n^(3) := Σ_{m<i<j≤τ} h(X_i, X_j),

where A_n is the summation area we want to change.

Figure 3.3.2: Summation Area

Figure 3.3.2 shows the areas A_n, A_n^(2) and A_n^(3) in comparison to that of A_n^(1) and we can see that

A_n = A_n^(1) − A_n^(2) − A_n^(3).
Note that now A_n^(1), A_n^(2) and A_n^(3) are U-statistics without their respective normalizing factors. In these U-statistics the underlying random variables are from the same distribution. Therefore, (3.3.8) becomes

Z_m = A_n^(1) − A_n^(2) − A_n^(3) + A_n^(4) − A_n^(5),

where A_n^(4) and A_n^(5) denote the last two summands in (3.3.8). We mention again that, except for their missing normalizing factors (see (2.3.3)), A_n^(1), A_n^(2) and A_n^(3) are (non-degenerate) one-sample U-statistics and A_n^(4) and A_n^(5) are generalized two-sample U-statistics. Hence, Hoeffding's Strong Law of Large Numbers (SLLN) applies and, as n → ∞, it yields

(τ(τ − 1)/2)^{−1} A_n^(1) → θ_1 a.s.
This together with τ = [nλ] implies

n^{−2} A_n^(1) → (λ²/2)θ_1 a.s.

Similarly, we obtain that

n^{−2} A_n^(2) → (t²/2)θ_1 a.s.

When looking at A_n^(3), we see that we cannot immediately apply Hoeffding's SLLN theorem, since the summation is taken over m + 1 ≤ i < j ≤ τ and we would need a 1 to start with, instead of m + 1. But since X_1, ..., X_τ are i.i.d. r.v.'s, we have

Σ_{m<i<j≤τ} h(X_i, X_j) =_D Σ_{1≤i<j≤τ−m} h(X_i, X_j), (3.3.13)

where τ = τ(n) := [nλ] and m = m(n) := [(n + 1)t]. Using now Hoeffding's SLLN theorem and taking (3.3.13) into consideration we get the following convergence in probability result:

n^{−2} A_n^(3) →_P ((λ − t)²/2)θ_1.
A_n^(4) and A_n^(5) are generalized two-sample U-statistics, except that the appropriate normalization factors are missing. Therefore Theorem 2.6.2 by Sen (1977) applies and we get

(τ(n − τ))^{−1} A_n^(4) → θ_{1,2} a.s.

Since

1/(τ(n − τ)) = 1/([nλ](n − [nλ])),

we obtain

n^{−2} A_n^(4) → λ(1 − λ)θ_{1,2} a.s.,

and similarly, we get

n^{−2} A_n^(5) →_P (λ − t)(1 − λ)θ_{1,2}.
Finally, as n → ∞, we arrive at

u_λ(t) = t(λ − t)θ_1 + t(1 − λ)θ_{1,2},   t ∈ [0, λ],   (3.3.18)
and this proves the theorem for 1 ≤ m ≤ τ. The proof for τ < m ≤ n is similar and hence omitted. □
Theorem 3.3.2 can be used to study the consistency of tests based on {U_{[(n+1)t]}, 0 ≤ t < 1}, where U_k = Z_k − k(n − k)θ, 1 ≤ k < n. Assuming finite second moments as in (3.1.19), such that the results under H_0 hold, we can consistently reject H_0 vs. H_A when sup_{0≤t<1} n^{−3/2} |U_{[(n+1)t]}| is large, except in the case where θ_1 = θ_2 = θ_{1,2} = 0.
3.3.3 Estimating the Time of Change
In Section 3.2.3 we considered an estimator for the time of change for an antisymmetric kernel. We either used the unique maximum or the unique minimum of Z_k, 1 ≤ k < n, to estimate the change-point τ = τ(n) := [nλ], 0 < λ < 1. In the symmetric case we will have similar results, but we will have to impose conditions³ on θ_1, θ_2, θ_{1,2} and λ, if we want u_λ(t) from Theorem 3.3.2 to have a unique maximum or minimum, respectively, at λ. To do this, we consider the function (cf. Theorem 3.3.2)

Differentiating with respect to t, we get the derivative of u_λ(t), namely

furthermore, we get the second derivative of u_λ(t), i.e.,

which tells us that u_λ(t) is a concave function if θ_1 and θ_2 in (3.1.17) are positive. Recall that θ_1 ≠ 0 and θ_2 ≠ 0, since we are dealing with symmetric kernels. Using the fact that a concave function has the form ∩, we have to search for a maximum. If both are negative, then u_λ(t) is a convex function, which has the form ∪, and we have to search for a minimum. So we define
³ Csörgő and Horváth (1998, Section 2.4) and Ferger and Stute (1992) suggest to impose conditions, but do not give them explicitly.
and
τ̃ = [nλ̃] := min{k : Z_k = min_{1≤m<n} Z_m}.
Let us first consider the case when θ_1 and θ_2 are positive. We know that

u_λ(0) = u_λ(1) = 0   and   u_λ(λ) = λ(1 − λ)θ_{1,2},

but we don't know whether the maximum is taken at t = λ, or at 0 < t_{max,1} < λ and/or at λ < t_{max,2} < 1. It could happen that the maximum is taken at t = t_{max,1} or t = t_{max,2}. Figure 3.3.3 shows the graph of the limiting function u_λ(t) where the maximum is taken at t = t_{max,2} and not at t = λ, and similarly, Figure 3.3.4 shows the graph of u_λ(t) when there are two local maxima at t = t_{max,1} and t = t_{max,2}.
Figure 3.3.3: The limiting function u_λ(t) with θ_1 = 1, θ_2 = 2, θ_{1,2} = 3 and λ = 1/10 takes its maximum value of 0.55 at t = 0.475.
Hence, we have to restrict ourselves to the case of a unique maximum at t = λ only. Similarly, if θ_1 and θ_2 are negative, we have to restrict ourselves to the case of a unique minimum at t = λ only.
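The possibility of a spurious maximum away from λ can be made concrete numerically. The sketch below assumes the piecewise form of the limit function: the branch on [0, λ] is (3.3.18), and the branch on [λ, 1] is taken, as an assumption, to be the mirror image of (3.3.18) under t ↦ 1 − t with θ_1 and θ_2 interchanged. With the parameters of Figure 3.3.3, this reproduces a global maximum away from λ.

```python
# Sketch: explore the limit function u_lambda(t) of Theorem 3.3.2 numerically.
# The first branch is (3.3.18); the branch on [lambda, 1] is assumed here by
# symmetry (t -> 1 - t, theta_1 <-> theta_2).

def u(t, lam, th1, th2, th12):
    """Limit function u_lambda(t) for a symmetric kernel (assumed piecewise form)."""
    if t <= lam:
        return t * (lam - t) * th1 + t * (1.0 - lam) * th12   # (3.3.18)
    return (1.0 - t) * (t - lam) * th2 + lam * (1.0 - t) * th12

# Parameters of Figure 3.3.3: theta_1 = 1, theta_2 = 2, theta_{1,2} = 3, lambda = 1/10.
grid = [i / 1000.0 for i in range(1001)]
vals = [u(t, 0.1, 1.0, 2.0, 3.0) for t in grid]
t_max = grid[vals.index(max(vals))]

# The global maximum is NOT at t = lambda = 0.1: it sits at t = 0.475
# (value 0.55125), so the argmax estimator would miss the change point here.
print(t_max, max(vals))
```

This is exactly the situation ruled out by the restriction (3.3.25) below.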
Figure 3.3.4: The limiting function u_λ(t) with θ_1 = 3, θ_2 = 2, θ_{1,2} = 1 and λ = 1/2 takes its maximum value of 1/3 at t = 1/3.
If θ_1 < 0 < θ_2 or θ_2 < 0 < θ_1, then we have a convex-concave or concave-convex function, respectively. Hence, we are looking for a minimum or maximum, respectively. Again, we have to restrict ourselves. We mention that in these cases there could also be a maximum or minimum, respectively.
In view of finding a unique maximum, we look at all possible extreme values of u_λ(t). We get that

Of course, we would like to have λ = t_{max,1} = t_{max,2}. Hence, we have to investigate whether there is a maximum before and/or after the change-point λ, namely whether
If now λ < t_{max,2} is satisfied, then we have to show that the function value at t_{max,2} is greater than the function value at λ. We calculate u_λ(t_{max,2}) by plugging the second part of (3.3.20) into (3.3.19). We consider
and show that the following inequality holds,

where we assume that θ_2 > 0. Otherwise the inequality is never satisfied, except when we are looking for the minimum. We have an '=' sign in (3.3.22) if
Similarly, we get that
and
where we assume that θ_1 > 0. We have an '=' sign in (3.3.23) if
By using the definitions of t_{max,1} and t_{max,2} in (3.3.20), (3.3.21) leads to
If (3.3.24) is not satisfied, then u_λ(λ) is the unique maximum. Therefore we have to restrict ourselves to the following choice of λ, namely

where 0 < θ_1 < ∞, 0 < θ_2 < ∞ and |θ_{1,2}| < ∞, to have a unique maximum at λ. If θ_1 < 0 < θ_2 or θ_2 < 0 < θ_1, then we may replace the right endpoint in (3.3.25) by 1 or the left endpoint by 0, respectively. Furthermore, (3.3.25) needs to hold true when 0 > θ_1 > −∞, 0 > θ_2 > −∞ and |θ_{1,2}| < ∞, to have a unique minimum at λ.
Using these conditions on θ_1, θ_2, θ_{1,2} and λ from (3.3.25), we are now in the position to state a theorem by Ferger and Stute (1992) for symmetric kernel functions that are also assumed to be bounded.
Theorem 3.3.3 (Ferger and Stute, 1992) Under suitable assumptions⁴ on the symmetric (or antisymmetric) bounded kernel function h(x, y), such that λ is the unique

⁴ Note that these assumptions, which were not given by Ferger and Stute (1992) explicitly, are the ones in (3.3.25) when 0 < θ_1 < ∞, 0 < θ_2 < ∞ and |θ_{1,2}| < ∞. As mentioned above, similar versions of (3.3.25) hold when θ_1, θ_2 and θ_{1,2} are chosen in a different way.
Figure 3.3.5: The limiting function u_λ(t) with θ_1 = 1, θ_2 = 2 and θ_{1,2} = 3 takes its maximum value of 0.63 at t = λ = 7/10.
maximizer (minimizer) of u_λ, and u_λ(t) ≠ 0, t ≠ λ, we have that, as n → ∞,

which implies that λ̂ is a strongly consistent estimator of λ. The same is true for λ̃.
Summarizing, we need to impose the special conditions from (3.3.25) on the bounded kernel h via θ_1, θ_2 and θ_{1,2}, and we will have a strongly consistent estimator for the time of change (cf. Theorem 3.3.3). In practice, we have to check whether the maximum or minimum, respectively, is taken in the latter interval. Moreover, since the indicated expected values of h are usually unknown, i.e., θ_1, θ_2 and θ_{1,2} are unknown, we have to estimate them from the data by using appropriate estimators.
If in (3.3.25) θ_{1,2} → ∞, i.e., when θ_{1,2} becomes very large (by our assumptions θ_1, θ_2 and θ_{1,2} are all finite), then the interval in (3.3.25) converges to [0, 1]. Then the estimators τ̂ and τ̃ work out fine. If on the other hand θ_1 → ∞ or θ_2 → ∞, then the interval in (3.3.25) converges to either [0, 0] or [1, 1], which means that we cannot use the estimators at all. Therefore, the optimal case is when we have |θ_1|, |θ_2| ≪ |θ_{1,2}| < ∞. If θ_{1,2} is between the others, then the interval will just cover a small area on the left or right half of the interval [0, 1].
Gombay (1998) defines
τ̂ = τ̂(n) := min{k : U_k = max_{1≤m<n} U_m}
as an estimator of τ and considered the distribution of τ̂ under H_A. She shows, for example, that for a symmetric non-degenerate kernel with finite second moment and under some technical assumptions, as n → ∞, under H_A we have

and

Moreover, she gives the distribution of τ̂ − τ, which depends on the underlying distribution function and on the change-point parameter λ. It behaves like the maximum of a two-sided random walk. In case of an antisymmetric kernel function h, Gombay (1998) shows that, under H_A, max_{1≤k<n} Û_k (cf. (3.1.12)) has the same limiting distribution as max_{1≤k<n} U_k.
Similarly to the antisymmetric case, we would like to find the distribution of τ̂ under H_0, the null hypothesis of no change. Since U_k is defined by

Z_k, 1 ≤ k < n, reaches its maximum when U_k + k(n − k)θ reaches its maximum.
We define
and get by using (3.3.27) that

Moreover, taking the sup on both sides,

But from Theorem 3.3.1 we know that under H_0, as n → ∞,

which implies that sup_{0≤t≤1} |U_n(t)| behaves like sup_{0≤t≤1} |Γ(t)|, where Γ(t) is the Gaussian process from (3.3.1). Of course, we may add and subtract a term in (3.3.29) and get that, as n → ∞,

The latter statement implies that for n large

Consequently, via (3.3.28) and the latter statement, we conclude that |Z_n(t)| in t reaches its maximum for large n where |Γ(t) + √n t(1 − t)θ| does. But the latter expression goes to ∞ as n → ∞, since √n t(1 − t)θ → ∞. Therefore, we do not know where the maximum of Z_n(t) is taken. We may also think about using a weighted version of Theorem 3.3.1. Indeed, using an appropriate weight function q in (3.3.28), we get that
and as an estimator of τ we now define

τ̂ := min{k : Z_k/q(k/n) = max_{1≤m<n} Z_m/q(m/n)}.

For example, using the weight function
we still have that

and therefore, as before, (3.3.30) goes to ∞ as n → ∞. Hence, in order to estimate τ, it seems that we need to centralize Z_k, 1 ≤ k < n, by its mean, i.e., we have to work with U_k as in (3.3.27). This, however, will not give us the desired result anymore, since

max_{1≤k<n} U_k ≠ max_{1≤k<n} Z_k.   (3.3.31)
In case of an antisymmetric kernel we had an '=' sign in (3.3.31), which made it possible to use Theorem 3.2.1 and get the distribution of τ̂ under H_0. But in case of a symmetric kernel, the fact that (3.3.31) holds does not allow us to use Theorem 3.3.1 to give the distribution of τ̂ under H_0. Hence, the distribution of τ̂ under H_0 is still unknown.
In principle, we may, however, compute the distribution of τ̂ in (3.3.26) under H_0, since (3.3.29) implies that |U_n(t)| reaches its maximum where |Γ(t)| does. Hence τ̂/n converges in distribution to the argument at which sup_{0<t<1} |Γ(t)| is attained.
3.4 Change in the Mean
We are to test the no-change in the mean null hypothesis

H_0: X_1, ..., X_n are independent identically distributed random variables with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,

against the at most one change in the mean alternative

H_A: X_1, ..., X_n are independent random variables and there is an integer τ, 1 ≤ τ < n, such that EX_1 = ... = EX_τ ≠ EX_{τ+1} = ... = EX_n and 0 < σ² = Var X_1 = ... = Var X_n < ∞.
Taking simulated values of the therein indicated (k, X_k), Figure 3.4.6 gives an example where a change in the mean occurs while the variance stays the same.

Figure 3.4.6: The data X_1, ..., X_{700} are i.i.d. N(1,1)-distributed and X_{701}, ..., X_{1000} are i.i.d. N(4,1)-distributed.
If we define the antisymmetric kernel

h(x, y) := x − y,

then the stochastic process Z_k from (3.1.1) may be written as

Z_k = Σ_{i=1}^{k} Σ_{j=k+1}^{n} (X_i − X_j) = k(n − k) ( (1/k) Σ_{i=1}^{k} X_i − (1/(n − k)) Σ_{j=k+1}^{n} X_j ),

which may be interpreted as comparing the mean before the unknown time k, 1 ≤ k < n, of a possible change in the mean to the mean after the change. Under H_0 this difference should be fluctuating near zero, while under the alternative of one change, the stochastic process Z_k, 1 ≤ k < n, will have a maximum or minimum at the time k = τ.
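As a small illustration (not from the thesis), the following sketch simulates data as in Figure 3.4.6 and evaluates Z_k through the scaled-difference-of-means representation above; the seed and the simple argmax rule are arbitrary choices.

```python
import random

# Sketch: the process Z_k for h(x, y) = x - y on data as in Figure 3.4.6.
# Z_k = k(n - k) * (mean of first k - mean of the rest), so all of Z_1, ...,
# Z_{n-1} can be computed in O(n) from prefix sums.

random.seed(42)
x = [random.gauss(1.0, 1.0) for _ in range(700)] + \
    [random.gauss(4.0, 1.0) for _ in range(300)]
n = len(x)

s, prefix = 0.0, [0.0]
for v in x:
    s += v
    prefix.append(s)          # prefix[k] = X_1 + ... + X_k
total = prefix[n]

z = [k * (n - k) * (prefix[k] / k - (total - prefix[k]) / (n - k))
     for k in range(1, n)]    # Z_1, ..., Z_{n-1}

tau_hat = 1 + max(range(n - 1), key=lambda i: abs(z[i]))
print(tau_hat)                # expected to sit near the true change point 700
```

Because the mean drops after k = 700 in this example, Z_k is ∪-shaped here (the first-block mean is smaller), and the extremum of |Z_k| locates the change.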
When testing for more change-points we will use a different argument, which in our present situation is similar to the following. Observe that under H_0

Then we will see that testing for a change in the mean can be illustrated by using a geometrical argument. Consider the linear function m(t) := t, t ∈ ℝ, which under H_0 joins all the points (k, (1/μ)E{S(k)}), k ∈ ℕ, if μ ≠ 0, and joins all the points (k, E{S(k)}), k ∈ ℕ, if μ = 0.
Without loss of generality let μ = 1. Then in Figure 3.4.7 we join all the points (k, E{S(k)}), k ∈ ℕ, via the straight line m(t) = t. We pick one k ∈ {1, ..., n − 1} and draw a horizontal line starting from B := (0, E{S(k)}), containing the point (k, E{S(k)}), and with terminus C := (n, E{S(k)}). We draw a vertical line from the terminus and intersect the t-axis. We denote this intersection by D := (n, E{S(0)}), where we define S(0) := 0. In this way we construct a rectangle,

Figure 3.4.7: A geometrical interpretation of E{nS(k) − kS(n)} = 0 under H_0.

denoted by ABCD (see Figure 3.4.7), where A := (0, E{S(0)}), with length n and height E{S(k)}.
Reflecting each point of the rectangle ABCD around the 45 degree line m(t) = t, we get a new rectangle AEFG, where A := (0, E{S(0)}) is the reflection point of itself, E := (0, E{S(n)}) is the reflection point of D := (n, E{S(0)}), F := (k, E{S(n)}) is the reflection point of C := (n, E{S(k)}), and G := (k, E{S(0)}) is the reflection point of B := (0, E{S(k)}), with length k and height E{S(n)}. Under H_0, both rectangles have the same area. Consequently, we have that
Thus, in principle, for each given k, 1 ≤ k < n, we constructed an unbiased estimator of zero, assuming that H_0 is true. We may also say that, viewed this way, testing for one change in the mean results in comparing the areas of two different rectangles with each other.
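The unbiasedness E{nS(k) − kS(n)} = 0 under H_0 can be checked with a quick Monte Carlo sketch (the values of n, k and μ below are arbitrary, not from the thesis):

```python
import random

# Monte Carlo check of the rectangle-area identity E{n S(k) - k S(n)} = 0
# under H_0, with S(k) = X_1 + ... + X_k. Under H_0, E S(k) = k*mu, so
# E{n S(k) - k S(n)} = n*k*mu - k*n*mu = 0 for every k.

random.seed(0)
n, k, mu, reps = 100, 30, 1.0, 5000
acc = 0.0
for _ in range(reps):
    x = [random.gauss(mu, 1.0) for _ in range(n)]
    s_k = sum(x[:k])
    s_n = s_k + sum(x[k:])
    acc += n * s_k - k * s_n      # area ABCD minus area AEFG (mu = 1)
mean_stat = acc / reps
print(mean_stat)                  # fluctuates around 0
```

One replication has standard deviation √(k(n − k)n) σ ≈ 458 here, so the Monte Carlo mean over 5000 replications should be within a few units of ten of zero.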
Instead of defining m(t) = t, we may also define m_μ(t) := μt, which joins all the points (k, E{S(k)}), k ∈ ℕ. In a similar vein as before we get the same results. Moreover, under H_A the slope of the function m_μ will change exactly at k = τ, where E{S(τ)} = μτ. Hence the two areas will not be the same. Moreover, the maximal difference between those two areas will occur when k = τ.
Continuing with using the kernel h(x, y) = x − y to test for at most one change in the mean, under H_0 and 1 ≤ i, j ≤ n,

Furthermore, assuming that

we have

i.e., we assume that the variance is finite. We define

h̃(t) := E{h(t, X_1)} = t − μ.

Since

E h̃²(X_1) = E{(X_1 − μ)²} = σ²,

it follows from (3.4.3) that
and, in addition to (3.4.3), we also assume that
Since h is an antisymmetric kernel, we know from Theorem 3.2.1 that under H_0, as n → ∞,

where for all 0 ≤ t ≤ 1

We recall also, for each n,

with B being a standard Brownian bridge.
Assuming the same conditions as in Theorem 3.2.2, we obtain that under H_A, as n → ∞,

sup_{0<t<1} |Ū_n(t)| → ∞ in probability.
Therefore we consistently reject H_0 if sup_{0<t<1} |Ū_n(t)| becomes too big, and we have to think about 'what is too big?'. On account of (3.4.4) it follows that under H_0, as n → ∞,
This in turn means that we can use tables for the supremum of the absolute value of a Brownian bridge to accept or reject H_0.
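Such tables can also be computed directly: P{sup_{0<t<1} |B(t)| ≤ x} is the Kolmogorov distribution 1 − 2 Σ_{k≥1} (−1)^{k−1} e^{−2k²x²}. A sketch that inverts it numerically for the 5% critical value:

```python
import math

# 'Tables for the supremum of the absolute value of a Brownian bridge' come
# from the Kolmogorov distribution
#   P{ sup_{0<t<1} |B(t)| <= x } = 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2).

def sup_bridge_cdf(x, terms=100):
    if x <= 0:
        return 0.0
    return 1.0 - 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                           for k in range(1, terms + 1))

def critical_value(alpha, lo=0.1, hi=3.0):
    """Bisection for the (1 - alpha)-quantile of sup|B(t)|."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sup_bridge_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(critical_value(0.05))   # approx. 1.358, the classical 5% Kolmogorov value
```

One would then reject H_0 at level 5% when the sup-statistic exceeds roughly 1.358.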
Since the latter test statistic is not sensitive on the tails, we may also use a weighted version of (3.4.4) by using, for example, the weight function

q(t) = ( t(1 − t) log log (1/(t(1 − t))) )^{1/2}.
Consequently, with this q, as n → ∞,

sup_{0<t<1} |Ū_n(t)|/q(t) →_D sup_{0<t<1} |B(t)|/q(t).
The variance σ² in (3.4.5) is usually unknown. Consequently, it has to be estimated on the basis of the same random sample. One possible way of estimating σ² is via the sample variance

where X̄_n = (X_1 + ... + X_n)/n is the sample mean. According to Csörgő and Horváth (1997, Section 2.1), the use of the so-called pooled variances

where

is preferable to that of (3.4.6).
By using the pooled variances, we take into account the possibility that there is a change-point in the data. Of course, this may also affect the variance. Therefore, we compute the variance before and after each time k, any of which could be a possible change, instead of the variance of all n data, since the latter does not take a possible change into consideration.
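The effect is easy to see numerically. The sketch below assumes one plausible normalization for (3.4.7), namely (1/n)(SS around the mean of the first k observations + SS around the mean of the rest); the exact normalization of (3.4.7) is not reproduced here, and the data sizes are those of Figure 3.4.6.

```python
import random

# Sketch of the pooled-variance idea: estimate sigma^2 at a candidate split k
# by pooling the variability around the two block means. The normalization
# (1/n) * (SS before k + SS after k) is an assumption standing in for (3.4.7).

def pooled_var(x, k):
    left, right = x[:k], x[k:]
    ml, mr = sum(left) / len(left), sum(right) / len(right)
    return (sum((v - ml) ** 2 for v in left) +
            sum((v - mr) ** 2 for v in right)) / len(x)

random.seed(1)
x = [random.gauss(1.0, 1.0) for _ in range(700)] + \
    [random.gauss(4.0, 1.0) for _ in range(300)]

m = sum(x) / len(x)
plain_var = sum((v - m) ** 2 for v in x) / (len(x) - 1)

# The global sample variance is inflated by the mean shift (roughly
# 1 + 0.7*0.3*3^2 = 2.89 here), while pooling at the true change point
# recovers a value near the common variance 1.
print(plain_var, pooled_var(x, 700))
```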
Csörgő and Horváth (1997, Section 2.1) show that, by using the Law of Large Numbers, weak uniform consistency of the sequence of estimators σ̂²_{[(n+1)t]} for estimating σ² can be established, namely, as n → ∞,

Since σ̂²_{[(n+1)t]} is a consistent estimator for σ², Ū_n(t) in (3.4.5) may be estimated by

and our previous result from (3.4.4) carries over. In particular, as n → ∞,
and, as n → ∞,

imply that we can still use tables for the distribution of sup_{0<t<1} |B(t)| and reject the null hypothesis of no change in the data if our test statistic sup_{0<t<1} |Û_n(t)| becomes too big.

Summarizing, the test statistic sup_{0<t<1} |Û_n(t)| may be consistently used to test H_0 against H_A.

Moreover, if H_0 is rejected, then the time of change may be estimated by
3.5 Change in the Variance
We are to test the no-change in the variance hypothesis

H_0: X_1, ..., X_n are independent identically distributed random variables with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,

against the at most one change in the variance alternative

H_A: X_1, ..., X_n are independent random variables and there is an integer τ, 1 ≤ τ < n, such that Var X_1 = ... = Var X_τ ≠ Var X_{τ+1} = ... = Var X_n, 0 < Var X_τ, Var X_{τ+1} < ∞, and EX_1 = ... = EX_n = μ.
Taking simulated values of the therein indicated (k, X_k), Figure 3.5.8 gives an example where a change in the variance occurs while the mean stays the same.

Figure 3.5.8: The data X_1, ..., X_{700} are i.i.d. N(0,1)-distributed and X_{701}, ..., X_{1000} are i.i.d. N(0,4)-distributed.
Let us assume throughout this section that the mean stays the same. If μ is known, then the problem seems to be very simple. Testing H_0 against H_A means that we are looking for the change in the mean of (X_i − μ)², 1 ≤ i ≤ n. So, similarly to Section 3.4, we look at the difference of

which under H_0 should be fluctuating near zero. Hence, we consider the stochastic process

If μ were known, then this process is like Z_k of Section 3.4 and, assuming appropriate moment conditions, a test statistic can be based on its sup-functional just like there. Indeed, assuming that 0 < σ*² = Var(X_1²) < ∞, the results of Section 3.4 apply in this context as well. Moreover, Gombay, Horváth and Hušková (1996) also show that, estimating μ if need be by its (under H_0) consistent estimator X̄_n, the just mentioned asymptotic results of Section 3.4 continue to hold true in this context as well, i.e., just as if μ were known. When μ is unknown, as an alternative approach, we may also use the symmetric kernel (cf. Example 2.3.2)

h(x, y) := (x − y)²/2,
where under H_0 and i ≠ j,

Therefore, this h is an unbiased estimator for the variance σ² of the X_i's. Furthermore, we have to assume that

and, on account of E(X_i − μ) = 0 and σ² = E(X_i − μ)², we have that under H_0 and i ≠ j,

= (1/4) [ 2 E(X_i − μ)⁴ + 6 (Var X_i)² ].
Therefore our assumption in (3.5.1) becomes
Since

h̃(t) := E{h(t, X_1)} − σ²,

it follows from (3.5.1) that
and, in addition to (3.5.1), we also assume that
Since h is a symmetric kernel, we know from Theorem 3.3.1 that under H_0, as n → ∞,

where we define U_n(t) as in (3.1.15) for 0 < t < 1 to be

U_n(t) = U_{[(n+1)t]} / (s n^{3/2}),

with

and

where

for each n.
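As a side check on the kernel itself (not part of the thesis argument): h(x, y) = (x − y)²/2 is the classical variance kernel, and its U-statistic over a sample coincides with the unbiased sample variance, since Σ_{i<j}(X_i − X_j)² = n Σ_i (X_i − X̄_n)².

```python
import random

# Quick check that the symmetric kernel h(x, y) = (x - y)^2 / 2 is the
# classical variance kernel: its U-statistic over a sample equals the sample
# variance (1/(n-1)) * sum (X_i - Xbar)^2, which is unbiased for sigma^2.

random.seed(7)
x = [random.gauss(0.0, 2.0) for _ in range(50)]
n = len(x)

u_stat = sum((x[i] - x[j]) ** 2 / 2.0
             for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2.0)

xbar = sum(x) / n
s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)

print(u_stat, s2)   # the two agree up to floating-point error
```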
Similarly to Section 3.4, where we investigated the change in the mean, we now have to estimate the usually unknown parameters σ² and s². Again, we can estimate σ² by the usual estimator of the variance from (3.4.6), or by the pooled variance σ̂²_{[(n+1)t]} from (3.4.7). The second one incorporates the possibility of a change in the data better than the usual estimator of the variance, (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄_n)²; Csörgő and Horváth (1997) therefore suggest the pooled variances.
We also have to estimate s². Since

and E(X_1 − μ)² = Var X_1, we estimate the second part in (3.5.4) again by σ̂²_n or by the pooled variances σ̂²_{[(n+1)t]}. To estimate E(X_1 − μ)⁴, we either use the estimator for the 4-th moments

where X̄_n = (X_1 + ... + X_n)/n, or we make use of the pooled 4-th moments, since they, similarly to the pooled variances, feel a possible change better. To do so, define
γ̂_{[(n+1)t]} = (1/n) ( Σ_{i=1}^{[(n+1)t]} (X_i − X̄_{[(n+1)t]})⁴ + Σ_{i=[(n+1)t]+1}^{n} (X_i − X̄*_{[(n+1)t]})⁴ ),  0 ≤ t < n/(n+1),

γ̂_{[(n+1)t]} = (1/n) Σ_{i=1}^{n} (X_i − X̄_n)⁴,  n/(n+1) ≤ t ≤ 1,
where X̄_{[(n+1)t]} and X̄*_{[(n+1)t]} are defined in (3.4.7). By using the Law of Large Numbers, weak uniform consistency of the sequence of estimators γ̂_{[(n+1)t]} for estimating E(X_1 − μ)⁴ can be established.
Hence, we estimate U_n(t) in (3.5.3) by

and the test statistic sup_{0<t<1} |Û_n(t)| may be used to test H_0 against H_A. This is due to the fact that, uniformly in t ∈ (0, 1), Û_n(t) is a consistent estimator of U_n(t), hence the results for U_n(t) carry over. For instance, assuming the same conditions as in Theorem 3.3.2, we know that under H_A, as n → ∞,

and from (3.5.2) it follows that under H_0, as n → ∞,

on account of sup_{0<t<1} |Û_n(t) − U_n(t)| = o_P(1). This means that the test statistic sup_{0<t<1} |Û_n(t)| converges in distribution to the supremum of the absolute value of the Gaussian process Γ(t), and we can hope to use tables for the distribution of the supremum of this Gaussian process to accept or reject H_0.
Moreover, if H_0 is rejected, then the time of change may be estimated by

τ̃ = [nλ̃] := min{k : Z_k = min_{1≤m<n} Z_m}, if Z_k, 1 ≤ k < n, is ∪-shaped.
Let us assume that θ_1, θ_2, θ_{1,2} > 0 (the other cases are similar). Then we have the unique maximum exactly at the change-point τ if the change-point lies in the following interval (cf. (3.3.25)):

where τ = τ(n) = [nλ], θ_1 = E(½(X_i − X_j)²), 1 ≤ i < j ≤ τ, θ_2 = E(½(X_i − X_j)²), τ < i < j ≤ n, and θ_{1,2} = E(½(X_i − X_j)²), 1 ≤ i ≤ τ < j ≤ n. These parameters depend on the change-point τ, hence we have to know them a priori to check whether (3.5.7) is satisfied or not.
Suppose we estimate θ_1, θ_2 and θ_{1,2} by using τ̂, the point where the maximum is reached (or τ̃, respectively). Hence, we estimate θ_1 and θ_2 by the usual estimator for the variance, the so-called second moment. To estimate the third parameter, we use the fact that θ_{1,2} = ½(Var X_i + (EX_i)² − 2 EX_i EX_j + Var X_j + (EX_j)²). Considering the ∩-shaped case, the respective estimators are
This is due to the fact that the estimators θ̂_1, θ̂_2 and θ̂_{1,2} may be very bad if τ̂ is far apart from τ. Therefore it is necessary to know the parameters θ_1, θ_2 and θ_{1,2} a priori. Otherwise the procedure of estimating the unknown change-point τ by τ̂, the point where the stochastic process Z_k, 1 ≤ k < n, takes its maximum, will not be reliable. Moreover, Theorem 3.3.3 of Ferger and Stute does not hold, since we do not have a unique maximum at k = τ. As mentioned before in Section 3.3.3, the above interval is very good if |θ_1|, |θ_2| ≪ |θ_{1,2}|.
In conclusion, we should also say that the 'change in the mean' approach of Gombay, Horváth and Hušková (1996) to testing for changes in the variance via the stochastic process (cf. also Section 2.8.7 in Csörgő and Horváth (1997))

appears to be preferable to that of (3.5.6) that we have just discussed.
"Courage is what it takes to stand up and speak.
Courage is also what it takes to sit down and listen."
- Winston Churchill
Chapter 4
At Most Two Change-points
4.1 Introduction
We are to test the null hypothesis
H_0: X_1, ..., X_n are independent identically distributed random variables,

against the alternative that there are at most two change-points in the sequence X_1, ..., X_n, namely that we have
We mention that the alternative H_A^{(2)} allows us to consider random variables X_1, X_2, ..., X_n with two changes in the distribution, but not necessarily involving three different distributions. For example, we could have a sample where the distribution before the first change is the same as the one after the second change, but in between them we have a different distribution. This is the so-called epidemic alternative, as we shall see in Section 4.6.
Suppose we were to test H_0 vs. H_A^{(2)} by using a properly normalized sup-functional of the stochastic process Z_{[(n+1)t]}, 0 < t < 1, from (3.1.1) as a test statistic. Borrowing the notations from (4.1.16) and (4.1.17), the in-probability limiting function of (1/n²) Z_{[(n+1)t]} under H_A^{(2)} will be seen (cf. (5.7.2)) to be

Moreover, if we put θ_1 = θ_2 = θ_3 = 0, θ_{1,2} = ((1 − λ_2)/λ_1) θ_{2,3} and θ_{1,3} = ((λ_1 − λ_2)/λ_1) θ_{2,3}, then u_{λ_1,λ_2}(t) is equal to zero for each t, 0 < t < 1, when testing for two changes. Consequently, sup_{0<t<1} |(1/n²) Z_{[(n+1)t]}| is not consistent in general when testing H_0 vs. H_A^{(2)}. This means that in some instances the alternative H_A^{(2)} will be rejected although it may be true. Therefore, we have to define a different stochastic process which kind of 'feels' the possibility of two changes. This will then allow us to study the behavior of its sup-functional and suggest a consistent test statistic for testing H_0 vs. H_A^{(2)}.
Our experience gained from the previous chapter suggests to use a stochastic process that depends on two time variables. To construct such a two-time parameter stochastic process, we split the given sample X_1, ..., X_n into three blocks and compare each of the blocks with the other two. This corresponds to the idea of comparing the mean before (the first change-point), between (the two change-points) and after (the two change-points). Hence, we are using a kernel h(x, y) of two variables x and y, and, since we will compare two blocks with each other at a time, we have three different possibilities to do so. Therefore, we define our new stochastic process in k_1 and k_2 as follows:

Z_{k_1,k_2} = Σ_{i=1}^{k_1} Σ_{j=k_1+1}^{k_2} h(X_i, X_j) + Σ_{i=1}^{k_1} Σ_{j=k_2+1}^{n} h(X_i, X_j) + Σ_{i=k_1+1}^{k_2} Σ_{j=k_2+1}^{n} h(X_i, X_j).

In this way we compare the three blocks (X_1, ..., X_{k_1}), (X_{k_1+1}, ..., X_{k_2}) and (X_{k_2+1}, ..., X_n) with each other, where k_1 and k_2 vary from 1 to n − 2 and from 2 to n − 1, respectively.
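A sketch of this two-time parameter scan for the concrete antisymmetric kernel h(x, y) = x − y (the kernel, data, and sample sizes here are illustrative assumptions, since the chapter keeps h generic): each block comparison Σ_{i∈I} Σ_{j∈J} (X_i − X_j) collapses to |J| S_I − |I| S_J with block sums S_I and S_J, so the whole (k_1, k_2) grid can be scanned cheaply. With two upward changes in the mean, the maximal |Z_{k_1,k_2}| should land near the true pair of change-points.

```python
import random

# Sketch: the two-time parameter process Z_{k1,k2} for h(x, y) = x - y.
# Each block comparison sum_{i in I} sum_{j in J} (X_i - X_j) equals
# |J|*S_I - |I|*S_J, so Z_{k1,k2} costs O(1) per pair after one prefix pass.

random.seed(3)
x = ([random.gauss(0.0, 1.0) for _ in range(100)] +     # before first change
     [random.gauss(2.0, 1.0) for _ in range(100)] +     # between the changes
     [random.gauss(4.0, 1.0) for _ in range(100)])      # after second change
n = len(x)

p = [0.0]
for v in x:
    p.append(p[-1] + v)                                  # p[k] = X_1 + ... + X_k

def z(k1, k2):
    s1, s2, s3 = p[k1], p[k2] - p[k1], p[n] - p[k2]
    c1, c2, c3 = k1, k2 - k1, n - k2
    return ((c2 * s1 - c1 * s2) +        # block 1 vs block 2
            (c3 * s1 - c1 * s3) +        # block 1 vs block 3
            (c3 * s2 - c2 * s3))         # block 2 vs block 3

best = max(((k1, k2) for k1 in range(1, n - 1) for k2 in range(k1 + 1, n)),
           key=lambda kk: abs(z(*kk)))
print(best)   # expected near the true change points (100, 200)
```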
We are to study the asymptotic properties (as n → ∞) of the process Z_{k_1,k_2}, 1 ≤ k_1 < k_2 ≤ n, which may be expressed in terms of U-statistics. To do so, we give some notations and definitions under the null hypothesis of no change and under the alternative of at most two changes.
4.1.1 Notations under the Null Hypothesis H_0
We define

and

E h²(X_i, X_j) =: V, 1 ≤ i < j ≤ n.

We assume throughout the whole chapter that

V < ∞,

which of course implies
Computing now the expected value of Z_{k_1,k_2}, we obtain

E Z_{k_1,k_2} = ((k_2 − k_1)k_1 + (n − k_2)k_2) θ,  1 ≤ k_1 < k_2 ≤ n.  (4.1.7)
We define, as in the previous chapter,

and again we assume that

We centralize Z_{k_1,k_2} by its mean, and consider the process

where the kernel function h in Z_{k_1,k_2} is symmetric. For an antisymmetric kernel we define

Moreover, we can write Z̄_{k_1,k_2} as the sum of four U-statistics, and thus we get

where
Similarly, for an antisymmetric kernel we define

where

For further use, via (4.1.10) and (4.1.13) respectively, we also define

and
4.1.2 Notations under the Alternative H_A^{(2)}
Let F^{(1)}(t) = P{X_{τ_1} ≤ t}, F^{(2)}(t) = P{X_{τ_1+1} ≤ t} and F^{(3)}(t) = P{X_{τ_2+1} ≤ t} be the respective distribution functions of the observations before the first change, between the first and second change, and after the second change, and put

Furthermore, we define

τ_1 := [nλ_1] and τ_2 := [nλ_2], 0 < λ_1 ≤ λ_2 < 1.  (4.1.17)
Similarly to (4.1.16), we also define the second moment of the kernel h by

and assume throughout the whole chapter that E h²(X_i, X_j) is finite for all possible choices of i and j, namely

Moreover, this implies that
We will also use a weaker assumption than that of the existence of a finite second moment of h, namely that for random variables from two different distributions we have

where log⁺ x = log(x ∨ 1).
4.2 Antisymmetric Kernels
We consider the processes Z_{k_1,k_2}, 1 ≤ k_1 < k_2 ≤ n, n ≥ 3, where the kernel h is antisymmetric as in (2.2.3). By using the notations and assumptions from Sections 4.1.1 and 4.1.2, we investigate the behavior of these processes under H_0 (cf. Section 4.2.1) and H_A^{(2)} (cf. Section 4.2.2).
4.2.1 Asymptotic Results under H_0
We wish to study the limiting behavior of

in the sup-norm under the null hypothesis of no change. For the indices k_1 and k_2, we write [(n + 1)t_1] and [(n + 1)t_2], respectively.

The asymptotic behavior of Ū_{[(n+1)t_1],[(n+1)t_2]}, 0 < t_1 < t_2 < 1, will be derived from the fact that we may write it in terms of U-statistics as in (4.1.13), which in turn may be replaced by sums of i.i.d. r.v.'s. Furthermore, we may then approximate these sums of i.i.d. r.v.'s by Wiener processes.
We state Theorem 2.1 of Janson and Wichura (1983) in the degenerate case which, via the proof of Theorem 4.1 of Csörgő and Horváth (1988b) (cf. Lemma 3.2.1 and Corollary 3.2.1 in this thesis), will be the basis for our approach as well.
Theorem 4.2.1 (Janson and Wichura, 1983) X_1, X_2, ... are i.i.d. r.v.'s with common distribution. If σ² = E h̃²(X_1) = 0 (degenerate case), then

where A_{[nt]} = Σ_{1≤i<j≤[nt]} h(X_i, X_j), h is an antisymmetric kernel, the A_r are independent stochastic area processes, and the sum on the right-hand side of (4.2.1) converges almost surely, uniformly in t. Moreover, a stochastic area process is defined to be the Itô integral A(t) = ∫_0^t (−V(s) dU(s) + U(s) dV(s)), with a two-dimensional Wiener process W(s) = (U(s), V(s)), s ≥ 0.
We mention that the U-statistics in (4.1.13) are assumed to have non-degenerate kernels h. We may, however, construct kernels h* that are degenerate in the same context. Recall that E h̃(X_1) = E h(X_1, X_2) = 0 and σ² = Var h̃(X_1) = E(h̃(X_1))² > 0, and let

h(x, y) = h*(x, y) + h̃(x) − h̃(y).
Since h is an antisymmetric kernel, it follows that

where for each x

Hence h* is an antisymmetric kernel, and it is degenerate (σ*² = E h̃*²(X_1) = 0) as well.
well. Moreover, for the U-statistic 2:) defined in (4.1.13) we get
which implies that
where now z:(~) is a U-statistic with an antiçymmetric and degenerate kernel h*,
and Theorem 4.2.1 can now be applied. We mention that the * in (4.2.2), as well
as in the following formulas, indicates that we consider the difference of one of the
U-statistics in (4.1.13) and a sum of i.i.d. r d s .
For convenience we let

and show that

Consequently, as to (4.2.4), we will approximate κ_n^{(1)}(t_1, t_2), which is a sum of i.i.d. r.v.'s, by appropriate Wiener processes, and by doing this we will get a Gaussian limiting distribution for a properly normed version of Ū_{[(n+1)t_1],[(n+1)t_2]}.

As to (4.2.4), we first let

and conclude

Hence, we may now use formula (4.1.13), and get

Based on (4.2.5), we arrive at the inequality

and proceed to prove the following reduction principle.
Lemma 4.2.1 Under H_0, as n → ∞, each of the maxima

max_{1≤[(n+1)t_1]≤[(n+1)t_2]≤n} |Z*^{(j)}|, j = 1, 2, 3, 4,

of the centered U-statistics from (4.2.2)–(4.2.5) is O_P(n).
Proof of Lemma 4.2.1 The first statement follows via (4.2.1) and the fact that, as n → ∞,

Z*^{(2)}_{[(n+1)t_1],[(n+1)t_2]} is a U-statistic (with an antisymmetric and degenerate kernel) from [(n + 1)t_1] + 1 to [(n + 1)t_2], minus a sum of i.i.d. r.v.'s. Hence, we have to shift the interval [[(n + 1)t_1 + 1], [(n + 1)t_2]] to the interval [1, [(n + 1)(t_2 − t_1)]] by using the fact that for i.i.d. r.v.'s X_1, X_2, ..., X_n
for each n. Moreover, by (4.2.1), as n → ∞,

and we may write sup_{0<t_2−t_1<1} = sup_{t_1<t_2<t_1+1}. Since (4.2.7) holds uniformly in t_1 and t_2 such that 0 < t_1 < t_2 < 1, we have sup_{t_1<t_2<t_1+1} = sup_{t_1<t_2<1}, and we arrive at, as n → ∞,

which proves the second statement in this lemma.
Similarly, Z*^{(3)}_{[(n+1)t_2],n} is a U-statistic (with an antisymmetric and degenerate kernel) from [(n + 1)t_2] + 1 to n, minus a sum of i.i.d. r.v.'s. Hence, we have to shift the interval [[(n + 1)t_2 + 1], n] to the interval [1, [n − (n + 1)t_2]]. Then

for each n, and, as n → ∞,

Restricting t_2 to the interval (t_1, 1) ⊂ (0, 1), where t_1 is between 0 and 1, the supremum over this interval will also be O_P(n), hence, as n → ∞,

which proves the third statement in this lemma.
By using (4.2.1) together with (4.2.2), we get, as n → ∞,

which proves the fourth and last statement in this lemma. □
As a consequence of Lemma 4.2.1 and (4.2.6), we also conclude

Corollary 4.2.1 Let h be an antisymmetric kernel and Ū_{[(n+1)t_1],[(n+1)t_2]} be defined as in (4.1.13). Under H_0, as n → ∞, we have

max_{1≤[(n+1)t_1]≤[(n+1)t_2]≤n} | Ū_{[(n+1)t_1],[(n+1)t_2]} − { [(n+1)t_2] Σ_{j=1}^{[(n+1)t_1]} h̃(X_j) + (n − [(n+1)t_1]) Σ_{j=1}^{[(n+1)t_2]} h̃(X_j) − [(n+1)t_2] Σ_{j=1}^{n} h̃(X_j) } | = O_P(n).
We emphasize that Ū_{[(n+1)t_1],[(n+1)t_2]} is now approximated by sums of i.i.d. r.v.'s, for which there are many limit theorems available. In particular, we are now in the position to study the asymptotic Gaussian behavior of Ū_{[(n+1)t_1],[(n+1)t_2]} in the sup-norm. We give the following theorem as well as a detailed proof.
Theorem 4.2.2 Assume that H_0, (2.2.3), (4.1.5) and (4.1.9) hold. Then we can define a sequence of Gaussian processes {Γ_n^a(t_1, t_2), 0 ≤ t_1 ≤ t_2 ≤ 1}_{n∈ℕ} such that, as n → ∞,

and, for each n,

where the Gaussian process Γ^a is defined via a linear combination of a standard Wiener process W as follows:

Γ^a(t_1, t_2) = t_2 W(t_1) + (1 − t_1) W(t_2) − t_2 W(1),  0 ≤ t_1 ≤ t_2 ≤ 1.
Proof of Theorem 4.2.2 We define for each n the two-time parameter Gaussian process

Indeed, we may write Γ_n^a in terms of a linear combination of independent increments of W, namely

The just mentioned independence follows from the definition of W.

Similarly, we conclude that Γ^a in (4.2.10) is a Gaussian process. Using the fact that Gaussian processes are uniquely determined by their covariance structure, and that Γ_n^a (for each n) and Γ^a have the same covariance, it follows that for each n
Having shown that the above defined Γ_n^a is a sequence of Gaussian processes, we
now prove (4.2.9) by using Corollary 4.2.1 and the fact that we can define a Wiener
process {W(t), 0 ≤ t < ∞} such that (cf. Csörgő and Révész (1981, Theorem S.2.2.1
by Major (1979) combined with (S.2.2.2))), as n → ∞,
Hence, bounding sup_{0<t1<t2<1} above by sup_{0<t1<1} or sup_{0<t2<1}, we also have, as
n → ∞,
(1/n^{1/2}) sup_{0<t1<t2<1} | Σ_{i=1}^{[(n+1)tj]} h̃(X_i) − W(n t_j) | = o_P(1), for j = 1, 2. (4.2.11)
Recall the definition of κ_n^{(1)}(t1, t2) as in (4.2.3), and define
ρ_n := sup_{0<t1<t2<1} | κ_n^{(1)}(t1, t2)/(θ n^{3/2}) − Γ_n^a(t1, t2) |.
Next we observe that, by using the definitions of κ_n^{(1)}(t1, t2) and Γ_n^a, we arrive, as n → ∞, at
ρ_n ≤ sup_{0<t1<t2<1} (1/n^{1/2}) | Σ_{j=1}^{[(n+1)t1]} h̃(X_j) − W(n t1) | + sup_{0<t1<t2<1} (1/n^{1/2}) | Σ_{j=1}^{[(n+1)t2]} h̃(X_j) − W(n t2) | + sup_{0<t1<t2<1} (1/n^{1/2}) | Σ_{j=1}^{n} h̃(X_j) − W(n) | = o_P(1),
where the o_P(1) statement follows from (4.2.11). Hence, it is proven that ρ_n = o_P(1).
To avoid confusion with the notation, we denoted the two-time-parameter Gaus-
sian process by Γ^a, where the upper index a indicates that this Gaussian process
corresponds to the case where we have an antisymmetric kernel. In the case of a sym-
metric kernel, we will use the upper index sym instead. Furthermore, we note that
for 0 ≤ t1 ≤ t2 ≤ 1
Var Γ^a(t1, t2) = t2 (1 − t1)(1 + t1 − t2),
and
where t0 = r0 := 0 and t3 = r3 := 1. Theorem 4.2.2 implies that under H0
Moreover, this allows us to produce tables for the unknown distribution of sup_{0<t1<t2<1}
|Γ^a(t1, t2)| and reject the null hypothesis of no change if sup_{0<t1<t2<1} |Ū_n(t1, t2)| be-
comes too large.
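Such tables can be approximated by Monte Carlo simulation. The sketch below assumes the representation Γ^a(t1, t2) = t2 W(t1) + (1 − t1) W(t2) − t2 W(1) (our reconstruction from the projection in Corollary 4.2.1; it is consistent with the variance formula Var Γ^a(t1, t2) = t2(1 − t1)(1 + t1 − t2) stated above). Grid size, replication count and seed are arbitrary choices:

```python
import numpy as np

def sup_gamma_a(rng, n_grid=100):
    """One realization of sup |t2*W(t1) + (1 - t1)*W(t2) - t2*W(1)|
    over 0 < t1 < t2 < 1, with W simulated on an equidistant grid."""
    t = np.arange(1, n_grid + 1) / n_grid
    # Brownian motion on the grid: cumulative sum of N(0, 1/n_grid) increments
    w = np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n_grid), n_grid))
    t1, t2 = t[:, None], t[None, :]
    field = t2 * w[:, None] + (1.0 - t1) * w[None, :] - t2 * w[-1]
    return np.abs(np.where(t1 < t2, field, 0.0)).max()

rng = np.random.default_rng(0)
samples = np.array([sup_gamma_a(rng) for _ in range(1000)])
q90, q95, q99 = np.quantile(samples, [0.90, 0.95, 0.99])
print(q90, q95, q99)  # simulated critical values for the sup-functional
```

The simulated upper quantiles play the role of the tabulated critical values: H0 is rejected when the observed sup-statistic exceeds them.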
4.2.2 Asymptotic Results under H_A^{(2)}
We wish to study the limiting behavior of Ũ_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1. We will show that (cf. Theorem 4.2.3) the convergence (4.2.13) holds,
where we will assume that h(x, y) is a nondegenerate antisymmetric kernel and that (4.1.21)
holds. Therefore (4.1.21) replaces the stronger assumption of a second finite mo-
ment as in (4.1.19), which was used to derive (cf. Section 4.2.1) the non-degenerate
convergence in distribution, as n → ∞,
where the stochastic process Ū_n(t1, t2) was defined as
and the limiting Gaussian process Γ^a(t1, t2) as
Although there will be a larger class of distributions (F^{(1)}, F^{(2)}, F^{(3)}) that satis-
fies (4.1.21) than (4.1.19), the non-degenerate results under H0 naturally require
that the class of distributions for which we are testing for two changes satisfies (4.1.19).
Nevertheless, we will use the weaker assumption (4.1.21) to derive the degenerate
asymptotic results under H_A^{(2)}.
The limiting function in (4.2.13) will depend on the location of the (at most)
two change-points [nλ1] and [nλ2], 0 < λ1 ≤ λ2 < 1. Moreover, we will see that
sup_{0<t1<t2<1} |Ū_n(t1, t2)| is consistent and goes to infinity in probability as n → ∞
under H_A^{(2)}.
The limiting function in (4.2.13) will be derived similarly to the results of Sec-
tion 4.3.2, which deal with symmetric kernels. Since an antisymmetric kernel may
be viewed as a symmetric kernel with θ1 = θ2 = θ3 = 0, we jump ahead
and use the results from Section 4.3.2 here. There it is shown that Z_{[(n+1)t1],[(n+1)t2]},
0 < t1 < t2 < 1, may be split into many double sums, where the summation is taken
over two blocks of r.v.'s that do not contain a change-point. Moreover, after proper
normalization each of these double sums converges in probability to the appropriate
limit (cf. (4.3.14)–(4.3.16)).
Since Z_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1, in (4.1.2) with k_i = [(n+1)t_i], 1 ≤ i ≤ 2, is the sum of the three double sums Σ_{i=1}^{[(n+1)t1]} Σ_{j=[(n+1)t1]+1}^{[(n+1)t2]} h(X_i, X_j),
Σ_{i=1}^{[(n+1)t1]} Σ_{j=[(n+1)t2]+1}^{n} h(X_i, X_j) and Σ_{i=[(n+1)t1]+1}^{[(n+1)t2]} Σ_{j=[(n+1)t2]+1}^{n} h(X_i, X_j), we define for technical purposes
where 0 ≤ x1 < x2 ≤ x3 ≤ x4 ≤ x5 < x6 ≤ x7 ≤ x8 < 1. Hence (4.2.14) is the sum of
exactly nine double sums, where the summation is taken over two blocks of r.v.'s that
do not contain a change-point. Here x2, x3, x6 and x7 play the role of dummy vari-
ables and are used to split blocks that contain change-points. At most two of them
will actually be used, but since the location of the change-points is unknown, we need
to consider two possible changes in each of the two blocks of r.v.'s being compared.
Hence, the convergence result from Section 4.3.2
may be applied to (4.2.14) and we get, as n → ∞,
where θ_{1,1} = θ_{2,2} = θ_{3,3} = θ = 0 and¹
Moreover, if the distributions before the first change and after the second change are
the same, then we also have θ_{1,3} = 0.
We are now in the position to go back to the definition of Z_{[(n+1)t1],[(n+1)t2]},
0 < t1 < t2 < 1, in (4.1.2) and write it in terms of double sums S(·) as in (4.2.14).
Thus we arrive at
¹We note that θ_{1,1} := θ1, θ_{2,2} := θ2 and θ_{3,3} := θ3, which were defined in (4.1.16).
where we define t0 := 0 and t3 := n/(n+1). Since we do not know the location of the
change-points [nλ1] and [nλ2], where 0 < λ1 ≤ λ2 < 1, we define the following
functions a_i = a_i(λ1, λ2, t1, t2), 1 ≤ i ≤ 6, which will be used to derive a formula
for (4.2.13) for all possible combinations of t1, t2, λ1 and λ2:
otherwise,  t1 < λ1 ≤ λ2 ≤ t2,
otherwise,  λ1 ≤ t2 < λ2 < 1,  otherwise,  t2 < λ1 ≤ λ2 < 1,
where 0 < a1 ≤ a2 ≤ … ≤ a6 < 1 and c_i ∈ [0, 1], 1 ≤ i ≤ 6. We need to define these a_i's, 1 ≤ i ≤ 6, since there are exactly six possibilities
to place two change-points in three blocks. Moreover, exactly two of the latter a_i's
will change to one of the values λ1 or λ2, while the other four a_i's will get the value
c_i, and they will drop out of the limiting function ū_{λ1,λ2}(t1, t2). Hence, the actual
values of the c_i's are not important. The c_i's are needed to give a general formula
for the limiting function.
We are now in the position to state the following theorem, which is an immediate
consequence of the previous arguments in this section.
Theorem 4.2.3 Assume that (2.2.3), (4.1.20), (4.1.21), and H_A^{(2)} hold. Define t0 :=
0 and t3 := n/(n+1).² If τ1 = τ1(n) := [nλ1] and τ2 = τ2(n) := [nλ2], 0 < λ1 ≤ λ2 < 1,
then, as n → ∞,
where
and ū_{λ1,λ2} and the a_i's, 1 ≤ i ≤ 6, are defined in (4.2.15) and (4.2.18), respectively.
We note that in Theorem 4.2.3 θ_{1,1} = θ_{2,2} = θ_{3,3} = θ = 0. Moreover, if the
distributions before the first change and after the second change are the same, then
we also have θ_{1,3} = 0.
The limiting function in (4.2.19) is defined for every possible combination of t1, t2, λ1 and λ2. If we write these cases out individually, then (4.2.19) may also be written
as³
²Note that t3 → 1 when n → ∞. In (4.3.17) t3 := n/(n+1), since n = [(n+1)t3] has to be satisfied, but in the limiting function (4.2.19) t3 = 1.
³We note that in the case of exactly one change-point, namely λ1 = λ2, the cases where 0 < t1 ≤
λ1 < t2 ≤ λ2 < 1, 0 < λ1 < t1 < t2 < λ2 < 1 and 0 < λ1 < t1 ≤ λ2 < t2 < 1 do not exist. We are left with the remaining three cases when λ1 = λ2.
Furthermore, if also θ_{1,3} = 0, then the distribution before the first and after the
second change are the same. Consequently, θ_{1,2} = −θ_{2,3}, and (4.2.19) may be
written as
(λ2 − λ1) t2 θ_{1,2},  0 < t1 < t2 ≤ λ1 ≤ λ2 < 1,
((t2 − λ1)(1 − λ2 + t1) + (λ2 − t2) t1) θ_{1,2},  0 < t1 ≤ λ1 < t2 ≤ λ2 < 1,
(λ2 − λ1)(1 − t2 + t1) θ_{1,2},  0 < t1 ≤ λ1 < λ2 < t2 < 1,
((λ2 − t1) λ1 + (1 − λ2)(t2 − λ1)) θ_{1,2},  0 < λ1 < t1 < t2 ≤ λ2 < 1,
((1 − t2)(t1 − λ1) + (λ2 − t1)(1 − t2 + λ1)) θ_{1,2},  0 < λ1 < t1 ≤ λ2 < t2 < 1,
(1 − t1)(λ2 − λ1) θ_{1,2},  0 < λ1 ≤ λ2 < t1 < t2 < 1. (4.2.21)
Assuming second finite moments Eh² instead of (4.1.21), Theorem 4.2.3 implies the
consistency of tests based on sup-functionals of {Ū_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1}.
This means that we can consistently reject H0 vs. H_A^{(2)} when
except in the case when θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = 0.
ū_{λ1,λ2}(t1, t2) is equal to 0 if and only if all θ_{i,j} involved are equal to 0. To show
this, we observe that each of the six parts of the limiting function in (4.2.20) may be
written as
where A_j, B_j, C_j and D_j depend on θ_{1,2}, θ_{1,3} and θ_{2,3}. Moreover, ū_{λ1,λ2}(t1, t2) = 0,
0 < t1 < t2 < 1, if and only if A_j = 0, B_j = 0, C_j = 0 and D_j = 0, 1 ≤ j ≤ 6. For
example, the linear system of the six independent equations D1 = 0, D2 = 0, …, D6 = 0 involves only three unknown parameters, which implies that there is only
one possible solution, namely θ_{1,2} = θ_{1,3} = θ_{2,3} = 0.
We observe that
θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = 0 implies
Moreover, it follows
then that under the null hypothesis H0 of no change
Thus, on assuming that θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = θ = 0, the sequence
is not consistent against any class of alternatives. On the other hand, if at least one
θ_{i,j} is not equal to 0 and we use T_n, then
P{H0 is rejected when using T_n | H_A^{(2)} is true} → 1 as n → ∞.
This implies that the limits of the sequence {T_n}_{n∈N} are different in probability
under H0 and H_A^{(2)}; namely, we have consistency of {T_n}_{n∈N}.
4.3 Symmetric Kernels
We consider the processes U_{k1,k2}, 1 ≤ k1 < k2 ≤ n, n ≥ 3, where the kernel h is
symmetric as in (2.2.2). By using the notations and assumptions from Sections 4.1.1
and 4.1.2, we investigate the behavior of these processes under H0 (cf. Section 4.3.1)
and H_A^{(2)} (cf. Section 4.3.2).
4.3.1 Asymptotic Results under H0
We wish to study the limiting behavior of
in the sup-norm under the null hypothesis of no change. For the indices, we write
[(n+1)t1] and [(n+1)t2], respectively.
The asymptotic behavior of U_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1, will be derived
from the following reduction principle (Corollary 4.3.1). It is a consequence of Theo-
rem 1 of Hall (1979), which may be reduced to the following corollary:
Corollary 4.3.1 (Hall, 1979) Let X1, X2, … be i.i.d. r.v.'s with common distribution.
If σ1² = Eh̃²(X1) = 0 (degenerate case), then n^{-1} Σ_{1≤i<j≤[nt]} h(X_i, X_j) converges weakly to
a stochastic process Y, where h is a symmetric kernel with mean 0 and finite second
moment.
We mention that the U-statistics in (4.1.12) are assumed to have a non-degenerate
kernel h, but we may, in our context, construct kernels g* that are degenerate. Re-
call that Eh̃(X1) = E(h(X1, X2) − θ) = 0 and σ² = Var h̃(X1) = E(h̃(X1))² > 0,
and let
Put
Since h is a symmetric kernel, it follows that
where for each x
Hence g* is a symmetric kernel with mean 0, and it is degenerate (g̃*^{(1)}(X1) =
0) as well. Finally, for the centralized U-statistic U_n*^{(4)} defined in (4.1.12) we get
which implies that
where U_n*^{(4)} is a centralized U-statistic with a symmetric and degenerate kernel g*,
and Corollary 4.3.1 can be applied. We note that U_n*^{(4)} corresponds to the U-statistic
Z_n*^{(4)} in (4.2.2) in the antisymmetric case, but they are different. Also, we have to
use different known results to get O_P(n) statements, namely Corollary 4.3.1 and
Theorem 4.2.1, respectively.
We mention that the * in (4.3.1), as well as in the following formulas, indicates
that we consider the difference of one of the centralized U-statistics in (4.1.12) and
its corresponding projection.
Similarly to the antisymmetric case (cf. (4.2.3)), let
and we will show that (cf. Corollary 4.3.2)
max_{1 ≤ [(n+1)t1] ≤ [(n+1)t2] ≤ n} | U_{[(n+1)t1],[(n+1)t2]} − κ_n^{(2)}(t1, t2) | = O_P(n). (4.3.2)
As a consequence of (4.3.2), we approximate κ_n^{(2)}(t1, t2), which is a sum of i.i.d. r.v.'s,
by appropriate Wiener processes, and by doing this we will get a Gaussian limiting
distribution for a properly normed version of U_{[(n+1)t1],[(n+1)t2]}. We first prove
Lemma 4.3.1 Let h be a symmetric kernel and the centralized U-statistics U*^{(i)}, including U_n*^{(4)}, be defined as in (4.1.12). Then under H0 the following statements hold true as n → ∞:
max_{1 ≤ [(n+1)t1] ≤ [(n+1)t2] ≤ n} | U*_{1,[(n+1)t1]} − [(n+1)t1] Σ_{i=1}^{[(n+1)t1]} h̃(X_i) | = O_P(n),
max_{1 ≤ [(n+1)t2] ≤ n} | U*_{[(n+1)t2]+1,n} − (n − [(n+1)t2]) Σ_{i=[(n+1)t2]+1}^{n} h̃(X_i) | = O_P(n).
Proof of Lemma 4.3.1 For technical purposes we define
Thus we obtain
= (n − 1) Σ_{j=1}^{n} h̃(X_j) − ([(n+1)t1] − 1) Σ_{j=1}^{[(n+1)t1]} h̃(X_j)
and we now use formula (4.1.12) to arrive at
Thus we have the inequality
By using Corollary 4.3.1 together with (4.3.1), as n → ∞, we get
Since h̃(X_i), 1 ≤ i ≤ n, are i.i.d. r.v.'s with Eh̃(X_i) = 0
and 0 < σ² = Var h̃(X_i) < ∞, the central limit theorem implies
Therefore (4.3.5) reduces to
which proves the fourth statement in this lemma.
The first statement in the lemma follows by combining (4.3.6) and the fact that,
as n → ∞,
U*^{(2)}_{[(n+1)t1],[(n+1)t2]} is a U-statistic (with a symmetric and degenerate kernel with
mean 0) from [(n+1)t1] + 1 to [(n+1)t2] minus a sum of i.i.d. r.v.'s. Hence, we have
to shift the interval [[(n+1)t1]+1, [(n+1)t2]] to the interval [1, [(n+1)(t2 − t1)]]
by using the fact that for i.i.d. r.v.'s X1, X2, …, Xn
for each n. Note that we also have equality in distribution when replacing h* by
g* := h* − θ. Moreover, by Corollary 4.3.1,
and we may write sup_{0<t2−t1<1} = sup_{t1<t2<1+t1}. Since (4.3.7) holds uniformly in t1 and t2 such that 0 < t1 < t2 < 1, we have sup_{t1<t2<1+t1} = sup_{0<t1<t2<1}, and we arrive
at
and similar arguments as in (4.3.6) yield
which proves the second statement in this lemma.
Similarly, we have to shift the interval [[(n+1)t2]+1, n] to the interval [1, n − [(n+1)t2]]. Then
for each n, and
and we may write sup_{0<1−t2<1} = sup_{0<t2<1}. If we restrict t2 to the
interval (t1, 1) ⊂ (0, 1), where t1 is between 0 and 1, then the supremum over this
interval will also be O_P(n); hence
Moreover, we have
which proves the third statement in this lemma. □
Moreover, using (4.3.4) we get
Corollary 4.3.2 Let h be a symmetric kernel and U_{[(n+1)t1],[(n+1)t2]} be defined as
in (4.1.10). Under H0, as n → ∞, we have
max_{1 ≤ [(n+1)t1] ≤ [(n+1)t2] ≤ n} | U_{[(n+1)t1],[(n+1)t2]} − { ([(n+1)t2] − 2[(n+1)t1]) Σ_{j=1}^{[(n+1)t1]} h̃(X_j) + (n + [(n+1)t1] − 2[(n+1)t2]) Σ_{j=1}^{[(n+1)t2]} h̃(X_j) + [(n+1)t2] Σ_{j=1}^{n} h̃(X_j) } | = O_P(n).
We emphasize that U_{[(n+1)t1],[(n+1)t2]} is now approximated by sums of i.i.d. r.v.'s. This enables us to study the Gaussian asymptotic behavior
of U_{[(n+1)t1],[(n+1)t2]} in the sup-norm.
Theorem 4.3.1 Assume that H0, (2.2.2), (4.1.5) and (4.1.9) hold. Then we can
define a sequence of Gaussian processes {Γ_n^{sym}(t1, t2), 0 ≤ t1 ≤ t2 ≤ 1}_{n∈N} such
that, as n → ∞,
and, for each n,
where the Gaussian process Γ^{sym} is defined via a linear combination of a standard
Wiener process W as follows:
Proof of Theorem 4.3.1 The proof goes along the lines of the proof of Theo-
rem 4.2.2. Namely, we combine Corollary 4.3.2 with (4.2.11) and the theorem follows
immediately. □
We note that for 0 ≤ t1 ≤ t2 ≤ 1
and
where t0 = r0 := 0 and t3 = r3 := 1. Theorem 4.3.1 implies that under H0
Moreover, this allows us to produce tables for the unknown distribution of sup_{0<t1<t2<1}
|Γ^{sym}(t1, t2)| and reject the null hypothesis of no change if sup_{0<t1<t2<1} |U_n(t1, t2)|
becomes too large.
4.3.2 Asymptotic Results under H_A^{(2)}
We wish to study the limiting behavior of Z_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1,
from (4.1.2). We will show that (cf. Theorem 4.3.2), as n → ∞,
where we will assume that h(x, y) is a nondegenerate symmetric kernel and (4.1.21)
holds. Thus (4.1.21) replaces the stronger assumption of a second finite moment as
in (4.1.19), which was used to derive the convergence in distribution result under H0
(cf. Section 4.3.1), as n → ∞,
where the stochastic process U_n(t1, t2) was defined as
and the limiting Gaussian process Γ^{sym}(t1, t2) as
The limiting function in (4.3.12) will depend on the location of the (at most)
two change-points [nλ1] and [nλ2], 0 < λ1 ≤ λ2 < 1. Moreover, we will see that
When looking at the definition of Z_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1, in (4.1.2)
with k_i = [(n+1)t_i], 1 ≤ i ≤ 2, we see that it may be expressed in terms of the sum
of the double sums Σ_{i=1}^{[(n+1)t1]} Σ_{j=[(n+1)t1]+1}^{[(n+1)t2]} h(X_i, X_j), Σ_{i=1}^{[(n+1)t1]} Σ_{j=[(n+1)t2]+1}^{n} h(X_i, X_j)
and Σ_{i=[(n+1)t1]+1}^{[(n+1)t2]} Σ_{j=[(n+1)t2]+1}^{n} h(X_i, X_j). Thus, each of these double sums is a sum of the form
where 0 ≤ a < b ≤ c < d ≤ n and a, b, c, d ∈ N. These double sums may be
associated with comparing the two blocks (X_{a+1}, …, X_b) and (X_{c+1}, …, X_d) with
each other. In the case of testing for (at most) one single change-point we compared the
block (X1, …, X_k) with the block (X_{k+1}, …, X_n). Now we have the three blocks
(X1, …, X_{k1}), (X_{k1+1}, …, X_{k2}) and (X_{k2+1}, …, X_n), and therefore three possibilities
to compare two different blocks with each other. Each block may contain at most
two change-points; hence we split each of the sums in (4.3.13) into three sums. Since
we do not know the location of the change-points, we write the latter sum as follows,
where 0 ≤ a < γ1 ≤ γ2 ≤ b ≤ c < γ3 ≤ γ4 ≤ d ≤ n. Consequently, we are comparing
blocks that do not have a change inside. Again we emphasize that there are at most
two changes in total, which implies that some of the new small blocks may have no
change inside or in between them. For example, if we were to have one change in
the first block and one change in the second block, then (4.3.14) reduces to
where the blocks involved, ending with (…, X_d), do not contain a change-point, and the two change-points are now postulated
to be at the positions γ1 and γ3, which themselves are also unknown. Since we do not
know in advance the location of the change-points, we consider sums as in (4.3.14).
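The splitting step in (4.3.14) is purely algebraic and can be checked numerically: cutting the first block at γ1 ≤ γ2 and the second at γ3 ≤ γ4 produces 3 × 3 = 9 sub-sums that add back up to the original double sum. A small sketch (the kernel and all cut points are arbitrary illustrations of ours):

```python
import numpy as np

def S(x, a, b, c, d, h):
    """Double sum over a < i <= b and c < j <= d of h(x_i, x_j) (1-based convention)."""
    return sum(h(x[i - 1], x[j - 1])
               for i in range(a + 1, b + 1) for j in range(c + 1, d + 1))

rng = np.random.default_rng(1)
x = rng.normal(size=30)
h = lambda u, v: (u - v) ** 2 / 2.0      # an arbitrary symmetric kernel
a, g1, g2, b = 0, 3, 7, 12               # cuts gamma_1 <= gamma_2 in the first block
c, g3, g4, d = 12, 15, 20, 30            # cuts gamma_3 <= gamma_4 in the second block

total = S(x, a, b, c, d, h)
pieces = sum(S(x, lo1, hi1, lo2, hi2, h)
             for lo1, hi1 in [(a, g1), (g1, g2), (g2, b)]
             for lo2, hi2 in [(c, g3), (g3, g4), (g4, d)])
print(abs(total - pieces))  # the nine sub-sums reproduce the full double sum
```

The identity holds for any kernel and any admissible cut points, which is exactly why the unknown change-point locations can be absorbed into the γ's.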
For technical purposes we define the following function
l(z) := 1, 0 < z ≤ λ1;  2, λ1 < z ≤ λ2;  3, λ2 < z < 1,
which will be used to remind ourselves of the location (either before, between or
after the change-points) of a block of r.v.'s which does not contain any change-point.
We split (4.3.14) into 3² = 9 different double sums, which are of the form
where 1 ≤ R1 = R1(n) := [nr1] < R2 = R2(n) := [nr2] ≤ R3 = R3(n) :=
[nr3] < R4 = R4(n) := [nr4] ≤ n are chosen properly according to the double
sums in (4.3.14). Furthermore, we will see that for each of these sums we have that,
as n → ∞,
This is an immediate consequence of Theorem 2.6.2 for generalized two-sample U-
statistics by Sen (1977) (compare the proof of Theorem 3.3.2), and Hoeffding's SLLN
(cf. Theorem 2.6.1) for two samples from the same distribution.
The double sum in (4.3.16) may be associated with comparing the two blocks
(X_{R1+1}, …, X_{R2}) and (X_{R3+1}, …, X_{R4}) with each other, where both do not have
any changes inside. If both belong to different distributions, then Theorem 2.6.2
applies, and (4.3.16) follows immediately. On the other hand, if both belong to the
same distribution, then we may consider the two blocks as one block. We do this
by deleting everything between those two blocks. We therefore consider the block
We may then use Hoeffding's SLLN, but first we have to write the double sum in (4.3.16)
in terms of U-statistics. Hence, we now write
and
Σ_{R2 < i < j ≤ R4 − (R3 − R2)} h(Y_i, Y_j) = Σ_{R1 < i < j ≤ R4 − (R3 − R2) − (R2 − R1)} h(Y_i, Y_j),
for each fixed n. Using now Hoeffding's SLLN, we get the following convergence
results for the U-statistics A_n^{(1)}, A_n^{(2)} and A_n^{(3)}, as n → ∞,
Hence, as n → ∞,
and, therefore, (4.3.16) also holds if two different blocks have the same distribution.
Note that the parameter θ_{l(·),l(·)} remains the same when using the block of i.i.d. r.v.'s Y.
Similarly to Section 4.2.2, we define
where 0 ≤ x1 < x2 ≤ x3 ≤ x4 ≤ x5 < x6 ≤ x7 ≤ x8 < 1. Moreover, since (4.3.18)
is a sum of nine double sums, (4.3.16) may be applied nine times, and the limiting
function of (4.3.18) can be written as (n → ∞)
We observe that for θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = θ
We are now in the position to go back to the definition of Z_{[(n+1)t1],[(n+1)t2]},
0 < t1 < t2 < 1, in (4.1.2), and we write it in terms of double sums S(·) as
in (4.3.18). Consequently,
" Since we do not know the Iocation of the where we define to := O and t3 := a- change-points [dl] and [nXz], where O < Al 5 & < l, m define the followbg
Al, XI I tl, al :=
cl, otherwise,
:= otherwise
where 0 < a1 ≤ a2 ≤ … ≤ a6 < 1 and c_i ∈ [0, 1], 1 ≤ i ≤ 6.
We need to define these a_i's, 1 ≤ i ≤ 6, since there are exactly six possibilities
to place two change-points in three blocks. Moreover, exactly two of the latter a_i's
will change to one of the values λ1 or λ2, while the other four a_i's will get the value
c_i and will drop out of the limiting function u_{λ1,λ2}(t1, t2). Hence, the actual
values of the c_i's are not important, but they are needed to give a general formula
for the limiting function.
We are now in the position to state the following theorem, which is an immediate
consequence of the previous arguments in this section.
Theorem 4.3.2 Assume that (2.2.2), (4.1.20), (4.1.21), and H_A^{(2)} hold. Define t0 :=
0 and t3 := n/(n+1).⁴ If τ1 = τ1(n) := [nλ1] and τ2 = τ2(n) := [nλ2], 0 < λ1 ≤ λ2 < 1, then, as n → ∞,
where
and u_{λ1,λ2} and the a_i's, 1 ≤ i ≤ 6, are defined in (4.3.19) and (4.3.21), respectively.
The limiting function in (4.3.22) is defined for every possible combination of t1,
t2, λ1 and λ2. If we spell out these cases individually, then (4.3.22) can also be
written as
⁴Note that t3 → 1 when n → ∞. In (4.3.20) t3 := n/(n+1), since n = [(n+1)t3] has to be satisfied, but in the limiting function (4.3.22) t3 = 1.
Assuming that Eh² is finite instead of (4.1.21), Theorem 4.3.2 implies the
consistency of tests based on sup-functionals of {U_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1}. This means that we can consistently reject H0 vs. H_A^{(2)} when
except in the case when θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = 0.
u_{λ1,λ2}(t1, t2) is equal to 0 if and only if all θ_{i,j} involved are equal to 0. To show
this, we observe that each of the six parts of the limiting function above may be
written as
where A_j, B_j, C_j and D_j depend on θ1, θ2, θ3, θ_{1,3} and θ_{2,3}, and 1 ≤ m1 ≤ m2 ≤ 3
have to be chosen properly. Moreover, u_{λ1,λ2}(t1, t2) = 0, 0 < t1 < t2 < 1, if and only
if θ1 = 0, θ2 = 0, θ_{2,3} = 0, A_j = 0, B_j = 0, C_j = 0 and D_j = 0, 1 ≤ j ≤ 6. But
combining θ_{1,2} = θ_{1,3} = θ_{2,3} = 0 with the arguments from the antisymmetric case,
we get consistency except in the trivial case.
We observe that if θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = θ, then
which is the limit (n → ∞) of the expected value of (1/n²) Z_{[(n+1)t1],[(n+1)t2]} under H0
(cf. (4.1.7)). Moreover, it follows then that under the null hypothesis H0 of no
change
Thus, on assuming that θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = θ = 0, the sequence
is not consistent against any class of alternatives. On the other hand, if at least one
θ_{i,j} is not equal to 0 and we use T_n, then
This implies that the limits of the sequence {T_n}_{n∈N} are different in probability
under H0 and H_A^{(2)}; namely, we have consistency of {T_n}_{n∈N}.
Example 4.3.1 We consider T_n and let c_n be such that
for some fixed 0 < α < 1 and n ∈ N large enough. We apply Theorem 4.3.2 and
get that
P{H0 is rejected when using T_n | H_A^{(2)} is true}
→ 1 as n → ∞,
where t1 and t2 are picked such that u_{λ1,λ2}(t1, t2) ≠ 0. Hence, T_n in this
example is, for n ∈ N large enough, unbiased at any level α and, as a sequence in n,
it is consistent against the class of all alternatives where at least one θ_{i,j} ≠ 0.
4.4 Changes in the Mean
We are to test the no-change in the mean null hypothesis
H0: X1, …, Xn are independent identically distributed random variables
with EXi = μ and 0 < σ² = Var Xi < ∞, 1 ≤ i ≤ n,
against the at most two changes in the mean alternative
H_A^{(2)}: X1, …, Xn are independent random variables and there are
two integers τ1 and τ2, 1 ≤ τ1 < τ2 < n, such that EX1 = … =
EX_{τ1} ≠ EX_{τ1+1} = … = EX_{τ2} ≠ EX_{τ2+1} =
… = EXn, and 0 < σ² = Var X1 = … = Var Xn < ∞.
Taking simulated values of the therein indicated (k, X_k), Figure 4.4.1 gives an ex-
ample where the mean changes two times while the variance stays the same.
Figure 4.4.1: The data X1, …, X500 are i.i.d. N(0,1)-distributed, X501, …, X700 are i.i.d. N(3,1)-distributed and X701, …, X1000 are i.i.d. N(2,1)-distributed.
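Data as in Figure 4.4.1 can be regenerated along the following lines (the block layout follows the figure caption; the seed is an arbitrary choice of ours):

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
x = np.concatenate([
    rng.normal(0.0, 1.0, 500),   # X_1,   ..., X_500  ~ N(0, 1)
    rng.normal(3.0, 1.0, 200),   # X_501, ..., X_700  ~ N(3, 1)
    rng.normal(2.0, 1.0, 300),   # X_701, ..., X_1000 ~ N(2, 1)
])
print(len(x), x[:500].mean(), x[500:700].mean(), x[700:].mean())
```

Plotting the pairs (k, X_k) of this sample reproduces the qualitative picture of the figure: two jumps in level with constant spread.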
Similarly as in the case of testing for at most one change-point, testing for at most
two change-points in the mean may be illustrated by using a geometrical argument.
Consider the linear function m(t) := t, t ∈ R, which under H0 joins all the points
(k, (1/μ)E(S(k))), k ∈ N, if μ ≠ 0, and joins all the points (k, E(S(k))), k ∈ N, if
μ = 0.
Figure 4.4.2: A geometrical interpretation of E{k2 S(k1) + (n − k1) S(k2) − k2 S(n)} = 0 under H0.
Without loss of generality let μ = 1. Then in Figure 4.4.2 we join all the
points (k, E{S(k)}), k ∈ N, via the straight line m(t) = t. We pick one k1 ∈
{1, …, n − 2} and then one k2 ∈ {k1 + 1, …, n − 1}.⁵ We draw a horizontal line
starting from B := (0, E{S(k1)}), containing the point (k1, E{S(k1)}), and with
terminus C := (k2, E{S(k1)}). We draw a vertical line from the terminus and
intersect the t-axis. We denote this intersection by D := (k2, E{S(0)}), where we
define S(0) := 0. In this way we construct a rectangle, denoted by ABCD (see
Figure 4.4.2), where A := (0, E{S(0)}), with length k2 and height E{S(k1)}. We
also draw a horizontal line starting from F := (k1, E{S(k2)}), containing the point
(k2, E{S(k2)}), and with terminus G := (n, E{S(k2)}). We then draw a vertical
line from the starting point and another one from the terminus, both intersecting
the t-axis at E := (k1, E{S(0)}) and H := (n, E{S(0)}), respectively. Similarly,
we construct another rectangle, denoted by EFGH (see Figure 4.4.2), with length
⁵In Figure 4.4.2 we may think of k1 and k2 as being defined by k1 := [n t1^{(k)}] and k2 := [n t2^{(k)}], respectively, 0 < t1^{(k)} < t2^{(k)} < 1.
(n - kl) and height E{S(k2)).
Reffecting each point of the rectangle EFGH around the 45 degree line m =
t, we get the nem rectangk BIJC, where B := (o,IE{s(~~))) is the refiection
point of E := (kl, E{s(o))) , I := (O, E{s(~))) iç the reflection point of H :=
(n, IE{S (O) )) , J := (k2, E{S (n) )) is the reflection point of G : = (n, E{S (k2) )) and C := (k271E{~(k1))) is the reflection point of F := (kl,E3{s(k2))). This new
rectangle has length k2 and height lE{S(n) - S(kl)).
Combining the two rectangles ABCD and BIJC with each other, we have con-
structed the rectangle AIJD, which has length kz and height E{S(n)). Moreover,
under a, the new rectangle RTJD has the same area as the sum of the two other
OIES, namely ABCD and EFGH. Consequently, we have that
Thus, in principle, for each given k1 and k2, 1 ≤ k1 < k2 < n, we have constructed an
unbiased estimator of zero, assuming that H0 is true. We may also say that, viewed
this way, testing for at most two changes in the mean results in comparing the areas of
three different rectangles with each other.
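That E{k2 S(k1) + (n − k1) S(k2) − k2 S(n)} = 0 under H0 also follows directly from E S(k) = kμ, since k2·k1·μ + (n − k1)·k2·μ − k2·n·μ = 0 for every μ. A quick numerical sketch of this unbiased estimator of zero (sample size and cut points are arbitrary choices of ours):

```python
import numpy as np

def area_statistic(x, k1, k2):
    """k2*S(k1) + (n - k1)*S(k2) - k2*S(n), with S(k) = x_1 + ... + x_k."""
    s = np.cumsum(x)
    n = len(x)
    return k2 * s[k1 - 1] + (n - k1) * s[k2 - 1] - k2 * s[-1]

n, k1, k2, mu = 200, 60, 140, 5.0
assert k2 * k1 + (n - k1) * k2 - k2 * n == 0   # the coefficient of mu vanishes exactly

rng = np.random.default_rng(7)
reps = np.array([area_statistic(rng.normal(mu, 1.0, n), k1, k2)
                 for _ in range(4000)])
print(reps.mean())  # close to 0 under H0, even though mu != 0
```

The exact cancellation of the coefficient of μ is the algebraic content of the rectangle comparison above.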
Continuing with the kernel h(x, y) = x − y to test for at most two changes
in the mean, and using the same arguments as in Section 3.4, we assume that under
H0
and
Since h is an antisymmetric kernel, we know from Theorem 4.2.2 that, as n → ∞, under H0
where for all 0 < t1 < t2 < 1
and
We note that
for each n.
Assuming the same conditions as in Theorem 4.2.3, we know that, as n → ∞, under H_A^{(2)}
Therefore we consistently reject H0 if sup_{0<t1<t2<1} |Ū_n(t1, t2)| becomes too large.
We note that (4.4.2) implies that under H0, as n → ∞,
and one would have to produce tables for the right-hand side random variable
in (4.4.5) for the sake of accepting or rejecting H0.
We note that by expressing (4.4.3) in terms of partial sums S([(n + 1)·]) we get
that
and now Donsker's theorem (see Theorem 1.2.1) yields (4.4.5) as well.
The variance σ² in (4.4.3) is usually unknown. Consequently, it has to be esti-
mated on the basis of the same random sample. One possible way of estimating σ²
is via the sample variance, namely
where X̄_n = (X1 + ⋯ + Xn)/n is the sample mean. Since σ̂_n² is a consistent estimator for σ²,
Ū_n(t1, t2) in (4.4.3) may be estimated by
Now our previous result from (4.4.2) carries over. In particular, as n → ∞, we
obtain
and hence also
Consequently, just as in (4.4.5), tables for the distribution of sup_{0<t1<t2<1} |Γ^a(t1, t2)| could again be used to consistently reject the null hypothesis of no change in the data, if our test statistic sup_{0<t1<t2<1} |Û_n(t1, t2)| becomes too large.
4.5 Changes in the Variance
We are to test the no-change in the variance hypothesis
H0: X1, …, Xn are independent identically distributed random variables
with EXi = μ and 0 < σ² = Var Xi < ∞, 1 ≤ i ≤ n,
against the at most two changes in the variance alternative
H_A^{(2)}: X1, …, Xn are independent random variables and there exist
two integers τ1 and τ2, 1 ≤ τ1 ≤ τ2 < n, such that Var X1 = … =
Var X_{τ1} ≠ Var X_{τ1+1} = … = Var X_{τ2}, Var X_{τ2} ≠ Var X_{τ2+1} = … = Var Xn, 0 < Var X1, Var X_{τ1+1}, Var X_{τ2+1} < ∞,
and EX1 = … = EXn = μ.
Taking simulated values of the therein indicated (k, X_k), Figure 4.5.3 gives an ex-
ample where the variance changes two times while the mean stays the same.
Similar arguments as in Section 3.5 suggest the use of the symmetric kernel
Using the same arguments as in Section 3.5, we assume that under H0
Figure 4.5.3: The data X1, …, X100 are i.i.d. N(0,2)-distributed, X101, …, X300 are i.i.d. N(0,1)-distributed and X301, …, X1000 are i.i.d. N(0,3)-distributed.
and
Since h is a symmetric kernel, we know from Theorem 4.3.1 that, as n → ∞, under
Ho
where for all 0 < t1 < t2 < 1
and
We note that
for each n.
Assuming the same conditions as in Theorem 4.3.2, we know that under the
alternative H_A^{(2)}, as n → ∞,
Similarly to previous sections, we have to estimate the usually unknown param-
eters σ² and ζ².
Again we estimate σ² by the usual estimator of the variance σ̂_n² as in (3.4.6). We
also have to estimate ζ, which is defined as
Hence, we estimate the second part of the latter equation by σ̂_n⁴ and the first part
via the estimator for the 4th centered sample moment
where X̄_n = (X1 + ⋯ + Xn)/n.
Hence, we estimate U_n(t1, t2) in (4.5.3) by
Now our previous result from (4.5.2) carries over. In particular, as n → ∞, we
obtain
and hence also, as n → ∞,
Thus, possible tables for the distribution of sup_{0<t1<t2<1} |Γ^{sym}(t1, t2)| could be used
to reject the null hypothesis of no change in the data, if our test statistic sup_{0<t1<t2<1} |Û_n(t1, t2)| becomes too large.
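For the variance kernel h(x, y) = (x − y)²/2 the projection is h̃(x) = ((x − μ)² − σ²)/2, so Var h̃(X1) = (μ4 − σ⁴)/4; this is what the two plug-in estimators above target. A sketch of the estimation step (the factor 1/4 reflects our normalization of the kernel and may differ from the thesis's exact definition of ζ):

```python
import numpy as np

def variance_change_scale(x):
    """Plug-in estimates: sigma^2 by the (centered) sample variance, and
    zeta^2 = (mu4 - sigma^4)/4 via the 4th centered sample moment."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    sigma2 = ((x - xbar) ** 2).mean()      # sigma_hat_n^2
    mu4 = ((x - xbar) ** 4).mean()         # 4th centered sample moment
    zeta2 = (mu4 - sigma2 ** 2) / 4.0
    return sigma2, zeta2

rng = np.random.default_rng(5)
sigma2, zeta2 = variance_change_scale(rng.normal(0.0, 2.0, 100_000))
print(sigma2, zeta2)
```

For N(0, σ) data one has μ4 = 3σ⁴, so with σ = 2 the targets are σ² = 4 and ζ² = (48 − 16)/4 = 8; both estimators are consistent, which is all that is needed for the studentized statistic.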
4.6 Epidemic Alternatives
In the past two sections we studied alternatives with at most two changes in the
mean and variance, respectively. Similarly, we will test for two changes and will also
require that the distributions before the first and after the second change are the
same. This kind of alternative, more or less as formulated by Levin and Kline (1985),
has been called the epidemic alternative, on postulating that an epidemic state runs
from time τ1 through τ2, after which the normal state is restored. Applications of
this model in an econometric context are studied by Broemeling and Tsurumi (1987).
We start with the case where we have an epidemic change in the mean, as
assumed in the alternative H_A^{(2)} below. In particular, we test the no-change in the mean hypothesis
H0: X1, …, Xn are independent identically distributed random variables
with EXi = μ and 0 < σ² = Var Xi < ∞, 1 ≤ i ≤ n,
against the epidemic change in the mean alternative
H_A^{(2)}: X1, …, Xn are independent random variables and there are two
integers τ1 and τ2, 1 < τ1 < τ2 < n, such that EX1 = … = EX_{τ1} =
EX_{τ2+1} = … = EXn, EX_{τ1+1} = … = EX_{τ2}, EX_{τ1} ≠ EX_{τ1+1} and
0 < σ² = Var X1 = … = Var Xn < ∞.
This alternative postulates that the mean changed at an unknown time τ1 and
that, at another unknown time τ2, it changed back to its original level. Figure 4.6.4
gives an example where the mean changes at some point τ1, and at point τ2 it
changes back to the original mean from before the first change at τ1.
Nonparametric tests for epidemic alternatives have been discussed in the literature
over the past two or so decades (cf. Csörgő and Horváth (1997, Section 2.8.4) and
Yao (1993) and their related references). Levin and Kline (1985) suggested the test
statistics
Figure 4.6.4: The data X_1, . . . , X_{τ1} and X_{τ2+1}, . . . , X_{1000} are i.i.d. N(0,1)-distributed and X_{τ1+1}, . . . , X_{τ2} are i.i.d. N(3,1)-distributed.
and
which may be interpreted as comparing the mean in the middle to the mean before
and after. Using Donsker's theorem (cf. Theorem 1.2.1) we have that under Ho
(1/(n^{1/2} σ)) T_n^{(2)} →_D sup_{0≤t≤1} B(t) − inf_{0≤t≤1} B(t), as n → ∞.
The latter convergence result uses the fact that
sup_{0≤t1≤t2≤1} |B(t2) − B(t1)| =_D sup_{0≤t≤1} B(t) − inf_{0≤t≤1} B(t).
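The identity above in fact holds pathwise: over ordered pairs t1 ≤ t2, the largest absolute increment of a path equals its range. The following sketch (the step count and seed are arbitrary choices) simulates a discretized Brownian bridge from a partial-sum process and checks the two quantities against each other.

```python
import random

def bridge_range_identity(n=400, seed=1):
    """Simulate a discretized Brownian bridge B(k/n) and return both
    sup B - inf B and sup over ordered pairs of |B(t2) - B(t1)|;
    the two agree pathwise."""
    rng = random.Random(seed)
    s, walk = 0.0, [0.0]
    for _ in range(n):
        s += rng.gauss(0.0, 1.0)
        walk.append(s)
    # Tie the walk down at both ends and rescale: a discretized bridge.
    bridge = [(walk[k] - (k / n) * walk[n]) / n ** 0.5 for k in range(n + 1)]
    range_stat = max(bridge) - min(bridge)
    pair_stat = max(abs(bridge[j] - bridge[i])
                    for i in range(n + 1) for j in range(i, n + 1))
    return range_stat, pair_stat

r_stat, pair_stat = bridge_range_identity()
```

Monte Carlo replications of `range_stat` give an approximation to the limiting null distribution quoted above.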
In a similar vein, Lombard (1987) proposed a quadratic version of the latter
statistic, namely
Again we may use Donsker's theorem to compute its limiting distribution under Ho,
namely
The likelihood ratio test statistic for testing Ho vs. H_A^(1), if the observations are
normal with constant variance, is calculated to be
T_n^{(4)} = max_{1≤k1<k2<n} {(k2 − k1)(1 − (k2 − k1)/n)} (···),
where the distribution of T_n^{(4)} is unknown. Instead of the latter expression, Yao
(1993) obtained large deviation approximations for
assuming that 0 < liminf_{n→∞} m1/n ≤ limsup_{n→∞} m2/n < 1. This statistic is similar to
T_n^{(4)}, but technical difficulties are avoided by trimming the endpoints. In practical
situations one may face problems when choosing proper m1 and m2.
Based on the recursive residuals w_k defined by
Yao (1993) introduced some new statistics. Let
and define
and
The distributions of these two statistics are unknown due to technical difficulties
arising from the (k2 − k1)^{1/2} in the denominator. To avoid these technical problems, Yao
(1993) considered a trimmed version of the latter statistic, namely
in the case of normal observations, and obtained large deviation approximations.
Csörgő and Horváth (1997) avoided the technical difficulties by multiplying these
statistics by their denominators. Namely, similarly to the statistics T_n^{(1)} and T_n^{(2)}, they
considered the statistics
and
as well as their quadratic analogue
Using the fact that, as n → ∞,
they obtained the following asymptotic distributions for the latter three statistics
and
which are the same as the ones for (1/(n^{1/2}σ)) T_n^{(1)}, (1/(n^{1/2}σ)) T_n^{(2)} and (1/(2n³σ²)) T_n^{(3)}, respectively.
Using the results from Section 4.4, we suggest the use of the test statistic T_n =
T_n(X_1, X_2, . . . , X_n) which is defined via
By (4.4.2), or by using Donsker's theorem, we have under Ho that
where the limiting Gaussian process Γ^a(t1, t2) is defined as
Moreover, under H_A^(1), T_n converges in probability to ∞, since the suitably normalized T_n converges
in probability to the supremum of the limiting function ū_{λ1,λ2}(t1, t2) defined as
ū_{λ1,λ2}(t1, t2) =
(λ2 − λ1) t2 θ_{1,2}, 0 < t1 < t2 ≤ λ1 < λ2 < 1,
((t2 − λ1)(1 − λ2 + t1) + (λ2 − t2)λ1) θ_{1,2}, 0 < t1 ≤ λ1 < t2 ≤ λ2 < 1,
(λ2 − λ1)(1 − t2 + t1) θ_{1,2}, 0 < t1 ≤ λ1 < λ2 < t2 < 1,
((λ2 − t1)λ1 + (1 − λ2)(t2 − λ1)) θ_{1,2}, 0 < λ1 < t1 < t2 ≤ λ2 < 1,
((1 − λ2)(t1 − λ1) + (λ2 − t1)(1 − t2 + λ1)) θ_{1,2}, 0 < λ1 < t1 ≤ λ2 < t2 < 1,
(1 − t1)(λ2 − λ1) θ_{1,2}, 0 < λ1 < λ2 < t1 < t2 < 1,
where τ1 = [nλ1], τ2 = [nλ2] and θ_{1,2} := E h(X_i, X_j) ≠ 0 for 1 ≤ i ≤ τ1 < j ≤ τ2 ≤ n and for 1 ≤ τ1 < i ≤ τ2 < j ≤ n.
The limiting function ū_{λ1,λ2}(t1, t2) depends on λ1, λ2, t1, t2 and θ_{1,2}. In Figure
4.6.5 we can see the six areas where t1 and t2 are defined. The areas A1, A4 and
A6 are triangles, while A2, A3 and A5 are rectangles. Each of them depends on the
location of at least one of the change-points λ1 and λ2.
Figure 4.6.5: Summation area of ū_{λ1,λ2}(t1, t2).
⁶If θ_{1,2} were equal to 0, then the test based on T_n would not be consistent.
A special point in Figure 4.6.5 is (λ1, λ2), which is the point where the areas
A2, A3, A4 and A5 intersect. Moreover, assuming that θ_{1,2} > 0, it turns out that
the point where ū_{λ1,λ2}(t1, t2) reaches its maximum is (λ1, λ2). We note that
ū_{λ1,λ2}(λ1, λ2) = (λ2 − λ1)(1 − λ2 + λ1) θ_{1,2}.
Figure 4.6.6: The limiting function ū_{1/3,2/3}(t1, t2) with θ_{1,2} = 10 takes its maximum value of 2.2 at the point (1/3, 2/3).
In Figure 4.6.6 the graph of the limiting function ū_{1/3,2/3}(t1, t2) is plotted. It can
be seen that ū_{1/3,2/3}(t1, t2) is defined in six different areas via six different functions.
Three of these surfaces have the shape of a triangle, and three that of a rectangle.
Moreover, the surfaces corresponding to the summation areas A1, A3, A4 and A6
are planes and the two others are not.
Figure 4.6.7 shows the limiting function ū_{1/10,2/10}(t1, t2). Thus, in this case, we
consider a situation where the two change-points are close to each other. This means that the
epidemic change from one mean to another and back was very short. Nevertheless,
the maximum is taken at the point (λ1, λ2) = (1/10, 2/10).
Figure 4.6.7: The limiting function ū_{1/10,2/10}(t1, t2) with θ_{1,2} = 10 takes its maximum value of 0.9 at the point (1/10, 2/10).
Figure 4.6.8 shows the limiting function ū_{1/10,9/10}(t1, t2), i.e., now the two change-points
are far apart from each other. This means that there was a long period
between the first and second change. Similarly to the previous two cases, the maximum
is taken at the point (λ1, λ2).
Figure 4.6.8: The limiting function ū_{1/10,9/10}(t1, t2) with θ_{1,2} = 10 takes its maximum value of 1.6 at the point (1/10, 9/10).
Therefore, in addition to the test statistic T_n, which tests Ho against H_A^(1), we
may also define an estimator for the times of change. Hence, we define as an
estimator for the change-points τ1 and τ2
assuming that θ_{1,2} > 0. If θ_{1,2} < 0, then we define as an estimator for the change-points
τ1 and τ2
⁷We use min{(k1, k2) : . . .} to denote the point (k1, k2) where the maximum (or minimum, respectively) is taken and which has the smallest distance (Euclidean norm) from the point (0, 0).
We note that τ̂ and τ̃ are two-dimensional vectors.
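A brute-force sketch of such an estimator is given below. The scan statistic used here, |S(k2) − S(k1) − ((k2 − k1)/n) S(n)| with S the partial-sum process, is an assumed stand-in for the exact definition of T_n in the text, and ties are broken, as in the footnote, by the admissible point closest to the origin.

```python
def estimate_epidemic_changepoints(x):
    """Estimate (tau1, tau2) as the argmax over 0 <= k1 < k2 <= n of the
    scan |S(k2) - S(k1) - ((k2 - k1)/n) * S(n)|, a sketch standing in for
    the estimator described in the text; among maximizers, the point
    closest to the origin is taken (cf. the footnote)."""
    n = len(x)
    s = [0.0]
    for v in x:
        s.append(s[-1] + v)
    best = None
    for k1 in range(n):
        for k2 in range(k1 + 1, n + 1):
            stat = abs(s[k2] - s[k1] - ((k2 - k1) / n) * s[n])
            # Larger statistic wins; exact ties go to the smaller norm.
            key = (-stat, k1 * k1 + k2 * k2)
            if best is None or key < best[0]:
                best = (key, (k1, k2))
    return best[1]
```

The double loop costs O(n²) evaluations, which is fine for moderate n; for long series one would trim or coarsen the grid.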
Similarly to an epidemic change in the mean, we may assume an epidemic change
in the variance. In particular, we now are to test the no-change in the variance null
hypothesis
Ho : X_1, . . . , X_n are independent identically distributed random variables
with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,
against the epidemic change in the variance alternative
H_A^(2) : X_1, . . . , X_n are independent random variables and there are two
integers τ1 and τ2, 1 ≤ τ1 < τ2 < n, such that Var X_1 = . . . = Var X_{τ1}
= Var X_{τ2+1} = . . . = Var X_n, Var X_{τ1+1} = . . . = Var X_{τ2}, Var X_{τ1} ≠ Var X_{τ1+1}, 0 < Var X_{τ1}, Var X_{τ1+1} < ∞, and EX_1 = . . . = EX_n = μ.
This alternative postulates that the variance changed at an unknown time τ1
and that, at another unknown time τ2, it changed back to its original level.
Using the results from Section 4.5, we suggest the use of the test statistic T_n =
T_n(X_1, X_2, . . . , X_n) which is defined via
where we replaced the unknown parameters by the estimators suggested in Section 4.5 (cf. (4.4.6)
and (4.5.4), respectively).
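A concrete sketch of this variance version: the mean is removed first, and the mean-change scan is then applied to the centered squares, since an epidemic change in the variance of X_i is an epidemic change in the mean of (X_i − μ)². The recipe below is an assumption standing in for the exact statistic built from the estimators in (4.4.6) and (4.5.4).

```python
def variance_epidemic_scan(x):
    """Two-parameter scan applied to the centered squares (X_i - Xbar)^2:
    an epidemic change in the variance of X shows up as an epidemic
    change in the mean of these squares."""
    n = len(x)
    xbar = sum(x) / n
    y = [(v - xbar) ** 2 for v in x]
    s = [0.0]
    for v in y:
        s.append(s[-1] + v)
    return max(abs(s[k2] - s[k1] - ((k2 - k1) / n) * s[n])
               for k1 in range(n) for k2 in range(k1 + 1, n + 1)) / n ** 0.5
```

Under a genuine variance burst in the middle of the sample, the scan value is driven by the drift of the squared observations and dominates its no-change counterpart.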
Assuming that the second moments of (X_i − X_j)², 1 ≤ i < j ≤ n, are all finite and σ² > 0,
then, under Ho, T_n converges in distribution to sup_{0<t1<t2<1} |Γ^{sw}(t1, t2)|, where the limiting Gaussian process Γ^{sw}(t1, t2) is defined as
Moreover, under H_A^(2), T_n converges in probability to ∞, since the suitably normalized T_n converges in
probability to the supremum of the limiting function ū_{λ1,λ2}(t1, t2) which is given by
We note that this limiting function corresponds to the one in (4.3.23), when we
put θ1 = θ_{1,3} = θ3 and θ_{1,2} = θ_{2,3}. The limiting function depends on λ1, λ2, t1, t2,
θ1, θ_{1,2} and θ2.
Similarly to the case of an epidemic change in the mean, the limiting function
ū_{λ1,λ2}(t1, t2) is defined via six different functions over six different areas. Unfortunately,
it is no longer true that the maximum or minimum, respectively, of the
limiting function is taken at (λ1, λ2). This is due to the fact that there are two
more parameters, namely θ1 and θ2, involved than before. Figure 4.6.9 gives an
example where neither the maximum nor the minimum is taken at (λ1, λ2). Hence,
an estimator for the times of change is only possible if both changes satisfy special
conditions. Such conditions can be found in a similar vein as in Section 3.3.3, where
we dealt with one change only.
Figure 4.6.9: The limiting function ū_{λ1,λ2}(t1, t2) with θ_{1,2} = 10, θ_{1,3} = 20 and θ_{2,3} = −12 takes its maximum value of 3.05 at a point different from (λ1, λ2).
Similar test statistics can also be found for different epidemic alternatives. The
main problem is to find an (anti-)symmetric kernel h(x, y) which will be appropriate
for the desired alternatives.
"Changing one thing for the better is worth more
than proving a thousand things are wrong."
- Anonyrnous
Chapter 5
Multiple Change-points
5.1 Introduction
In the previous two chapters we investigated test statistics which allowed us to
test for at most one or two change-points. In this chapter we generalize the main
ideas obtained in these chapters, especially the one where we tested for at most two
change-points, and study test statistics which allow us to test for at most s change-points,
1 ≤ s < n. Due to the increase in the number of possible change-points,
the results become more complex and so do the notations. We note that proofs are
done in a similar vein as in the previous chapters.
We are to test the null hypothesis
Ho : X_1, . . . , X_n are independent identically distributed random variables
against the alternative that there are at most s change-points in the sequence
X_1, . . . , X_n, namely that we have
H_A^(s) : X_1, . . . , X_n are independent random variables and there exist s,
1 ≤ s < n, integers τ1 = τ1(n), τ2 = τ2(n), . . . , τs = τs(n), 1 ≤
τ1 ≤ τ2 ≤ . . . ≤ τs < n, such that P{X_1 ≤ t} = . . . = P{X_{τ1} ≤ t},
P{X_{τ1+1} ≤ t} = . . . = P{X_{τ2} ≤ t}, . . . , P{X_{τs+1} ≤ t} = . . . =
P{X_n ≤ t} for all t and P{X_{τi} ≤ t0} ≠ P{X_{τi+1} ≤ t0} for some t0
and for all 1 ≤ i ≤ s.
We note that, just like in the case of epidemic alternatives (cf. Section 4.6), the
alternative H_A^(s) allows us to consider random variables X_1, X_2, . . . , X_n with s changes
in the distribution, which do not necessarily result in (s + 1) different distributions.
Since we are testing for at most s, 1 ≤ s < n, changes we need to define a
stochastic process which 'feels' the possibility of s changes. To do this we split the
given sample X_1, . . . , X_n into s + 1 blocks and compare each of the blocks with the
others. We continue using a kernel h(x, y) of two variables. Since, out of the s + 1
blocks, we always compare two blocks with each other, we have \binom{s+1}{2} different
possibilities to do so. Therefore, for the problem at hand, we define a sequence of
s-time parameter stochastic processes as follows:
where k0 := 0 and k_{s+1} := n. In this way we compare the (s+1) blocks (X_1, . . . , X_{k1}),
(X_{k1+1}, . . . , X_{k2}), . . . , (X_{ks+1}, . . . , X_n) with each other, where the k_i's, 1 ≤ i ≤ s,
vary from i to n + i − 1 − s.
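For small n and s the block-comparison process just described can be evaluated by brute force, as in the following sketch; the quadratic kernel h(x, y) = (x − y)²/2 is used purely as an example.

```python
from itertools import combinations

def block_comparison_stat(x, cuts, h):
    """Evaluate the s-time parameter process at one point (k1, ..., ks):
    split the sample at the given cut points into s+1 blocks and sum the
    kernel h over every pair of observations lying in different blocks,
    i.e. over all C(s+1, 2) block pairs."""
    bounds = [0] + list(cuts) + [len(x)]
    blocks = [x[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]
    total = 0.0
    for a, b in combinations(range(len(blocks)), 2):
        for xi in blocks[a]:
            for xj in blocks[b]:
                total += h(xi, xj)
    return total
```

Scanning this quantity over all admissible cut vectors (k1, . . . , ks) gives the brute-force analogue of the sup-statistic studied below; the cost grows quickly with s, which is why the asymptotic theory matters.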
When testing for at most one change, this process reduces to
the one we have already seen in Chapter 3. When testing for at most two changes,
the test statistic reduces to the one we studied in Chapter 4, namely to
Here we study the asymptotic properties (as n → ∞) of the sup-norm of the
process Z_{k1,k2,...,ks}, 1 ≤ k1 < k2 < · · · < ks < n, which will be seen to be based on a
combination of U-statistics. First we give some notations and definitions under the
null hypothesis of no change and the alternative of at most s changes.
5.1.1 Notations under the Null Hypothesis Ho
We define
and
Eh²(X_i, X_j) =: γ, 1 ≤ i < j ≤ n.
We assume throughout the whole chapter that
γ < ∞,
which of course implies
Furthermore, we have that
where 0 ≤ a < b < c ≤ n and a, b, c ∈ N. Hence,
= (n − k1)k1 θ + (n − k2)(k2 − k1) θ + (n − k3)(k3 − k2) θ + · · ·
= (n − k1)k1 θ + (n − k2)k2 θ − (n − k2)k1 θ + (n − k3)k3 θ − · · ·
where k_{s+1} := n. We define, as in the previous chapters,
and again we assume that
We centralize Z_{k1,k2,...,ks} by its mean, and consider the process
where the kernel function h in Z_{k1,k2,...,ks} is symmetric. For an antisymmetric kernel
we define
Moreover, we can write Z_{k1,k2,...,ks} as the sum of (s + 2) U-statistics, and thus we get
that
where
Similarly, for an antisymmetric kernel we define
where
For further use we also define the sequence of s-time parameter stochastic processes
and
5.1.2 Notations under the Alternative H_A^(s)
Let F^(1)(t) = P{X_{τ1} ≤ t}, F^(2)(t) = P{X_{τ1+1} ≤ t}, F^(3)(t) = P{X_{τ2+1} ≤ t}, . . . ,
F^(s+1)(t) = P{X_{τs+1} ≤ t} be the respective distribution functions of the observations
before the first, between the first and second, between the second and third,
. . . , and after the s-th change, respectively, and put
Furthermore, we will put
Similarly to (5.1.15), we also define the second moment of the kernel h by
where τ_q < i ≤ τ_{q+1} and τ_r < j ≤ τ_{r+1} for all 0 ≤ q ≤ r ≤ s. We assume throughout
the whole chapter that Eh²(X_i, X_j) is finite for all possible choices of i and j, namely
that
which implies that
For the sake of strong laws, we will also use a weaker assumption than a finite
second moment of h, namely that for random variables from different distributions
the following holds:
where log⁺ x = log(x ∨ 1).
5.2 Antisymmetric Kernels
We consider the s-time parameter processes Z_{k1,k2,...,ks}, 1 ≤ k1 < k2 < · · · < ks < n,
n ≥ s + 1, where the kernel h is antisymmetric as in (2.2.3). By using the notations
and assumptions from Sections 5.1.1 and 5.1.2, we now investigate the behavior
of these processes under Ho (cf. Section 5.2.1) and H_A^(s) (cf. Section 5.2.2),
as n → ∞.
5.2.1 Asymptotic Results under Ho
We wish to study the limiting behavior of
in the sup-norm under the null hypothesis of no change. For the indices k1, . . . , ks,
we write [(n + 1)t1], . . . , [(n + 1)ts], 0 < t1 < t2 < · · · < ts < 1, respectively.
The asymptotic behavior of Z_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]}, 0 < t1 < t2 < · · · < ts < 1,
will be derived from the following reduction principle (Lemma 5.2.1), which follows
from Theorem 4.2.1 by Janson and Wichura (1983).
Similarly, as in the case of testing for at most two change-points (cf. Section 4.2.1),
we define the antisymmetric and degenerate (σ*² = E h̃*²(X_1) = 0) kernel h* with
mean zero by
Hence, the U-statistic U_n^{(s+2)} defined in (5.1.12) can be written as
and, moreover,
where U_n^{*(s+2)} is a U-statistic with an antisymmetric and degenerate kernel h*, and
Theorem 4.2.1 can be applied. We mention that the * in (5.2.1), as well as in the
following formulas, indicates that we consider the difference of one of the U-statistics
in (5.1.12) and its corresponding projection.
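The degeneracy of h* can be checked numerically. In the sketch below, the projection h̃(x) = E h(x, X) and the projected kernel h*(x, y) = h(x, y) − h̃(x) + h̃(y) are computed exactly for a uniform law on a finite set; this explicit form of h* is an assumption, following the usual projection construction for antisymmetric kernels.

```python
def max_projection_residual(values, h):
    """Exact check, for a uniform law on a finite set, that the projected
    kernel hstar(x, y) = h(x, y) - htilde(x) + htilde(y) is degenerate,
    i.e. E hstar(x, X) = 0 for every fixed x."""
    m = len(values)
    # htilde(x) = E h(x, X), computed exactly over the finite support.
    htilde = {v: sum(h(v, w) for w in values) / m for v in values}

    def hstar(x, y):
        return h(x, y) - htilde[x] + htilde[y]

    return max(abs(sum(hstar(x, y) for y in values) / m) for x in values)

# Wilcoxon-type antisymmetric kernel, used purely as an example.
sign_kernel = lambda x, y: (y > x) - (y < x)
residual = max_projection_residual([0, 1, 2, 3], sign_kernel)
```

Since h is antisymmetric and the two arguments share one distribution, E h̃(X) = 0, so the residual vanishes exactly; with a non-degenerate kernel fed in directly, the same check would fail.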
We now proceed to prove the following reduction principle.
Lemma 5.2.1 Let h be an antisymmetric kernel and U^{*(1)}_{[(n+1)t1]}, U^{*(2)}_{[(n+1)t1],[(n+1)t2]}, . . .
be defined as in (5.1.12). Then under Ho the following statements hold true, as n → ∞:
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} |U^{*(1)}_{[(n+1)t1]}|
Proof of Lemma 5.2.1 The first statement follows from (4.2.1) and the fact that,
as n → ∞,
U^{*(i)}_{[(n+1)t_{i-1}],[(n+1)t_i]}, 2 ≤ i ≤ s, is a U-statistic (with an antisymmetric and
degenerate kernel) from [(n + 1)t_{i-1}] + 1 to [(n + 1)t_i] minus a sum of i.i.d. r.v.'s.
Hence, we have to shift the interval [[(n + 1)t_{i-1}] + 1, [(n + 1)t_i]] to the interval
[1, [(n + 1)(t_i − t_{i-1})]]. Moreover, using the fact that for i.i.d. r.v.'s X_1, X_2, . . . , X_n
for each n, we get, as n → ∞,
which proves statements 2 to s in this lemma.
Similarly, U^{*(s+1)}_{[(n+1)ts],n} is a U-statistic (with an antisymmetric and degenerate kernel)
from [(n + 1)ts] + 1 to n minus a sum of i.i.d. r.v.'s. Hence, we have to shift the
interval [[(n + 1)ts] + 1, n] to the interval [1, [n − (n + 1)ts]]. Using now the fact
that
{ Σ_{[(n+1)ts]+1 ≤ i < j ≤ n} h*(X_i, X_j), 0 < ts < 1 }
=_D { Σ_{1 ≤ i < j ≤ [n−(n+1)ts]} h*(X_i, X_j), 0 < ts < 1 },
for each n, we arrive, as n → ∞, at
which proves statement (s + 1) in this lemma.
Combining (4.2.1) together with (5.2.1) we get
which proves the last statement in this lemma. □
Now we are in the position to show that the maximum of Z_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]}
minus a sum of projected i.i.d. r.v.'s is also of order n. We state the following corollary
and give a detailed proof. The idea of the proof is similar to the one in Corollary
3.2.1; however, here we give a more general version. This corollary clearly
implies Corollary 3.2.1 and Corollary 4.2.1 when we put s = 1 and s = 2, respectively.
Corollary 5.2.1 Let h be an antisymmetric kernel and Z̄_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]} be
defined as in (5.1.12). Under Ho, as n → ∞, we have
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} | Z̄_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]} − Σ_{i=1}^s ([(n + 1)t_{i+1}] − [(n + 1)t_{i-1}]) Σ_{j=1}^{[(n+1)t_i]} h̃(X_j) | = O_P(n),
where [(n + 1)t0] := 0 and [(n + 1)t_{s+1}] := n.
Define t0 := 0 and t_{s+1} := n/(n + 1) and let
K_n^{(1)}(t1, t2, . . . , ts) := Σ_{i=1}^s ([(n + 1)t_{i+1}] − [(n + 1)t_{i-1}]) Σ_{j=1}^{[(n+1)t_i]} h̃(X_j)
and
Since t_{s+1} → 1 as n → ∞, for convenience we write H(1) instead of H(n/(n + 1)). We
rearrange the summation of K_n^{(1)}(t1, t2, . . . , ts) and we get the following:
K_n^{(1)}(t1, t2, . . . , ts) =
= Σ_{i=1}^s ([(n + 1)t_{i+1}] + [(n + 1)t_i] − [(n + 1)t_i] − [(n + 1)t_{i-1}]) H(t_i)
We are now in the position to split these sums in K_n^{(1)}(t1, t2, . . . , ts) and apply the
previous lemma. Since
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} | (U_n^{(s+2)} − Σ_{i=1}^n (n − 2i + 1) h̃(X_i))
− (U^{(1)}_{[(n+1)t1]} − Σ_{i=1}^{[(n+1)t1]} ([(n + 1)t1] − 2i + 1) h̃(X_i)) − · · · |
we have that
Theorem 5.2.1 Assume that Ho, (2.2.3), (5.1.4) and (5.1.8) hold. Then we can
define a sequence of Gaussian processes {Γ_n^a(t1, t2, . . . , ts), 0 ≤ t1 ≤ t2 ≤ · · · ≤ ts ≤
1}_{n∈N} such that, as n → ∞,
and, for each n,
where the Gaussian process Γ^a is defined via a linear combination of a standard
Wiener process W as follows:
where t0 := 0 and t_{s+1} := 1.
Proof of Theorem 5.2.1 The proof is similar to that of Theorem 4.2.2. Again
we use the fact that we can define a Wiener process {W(t), 0 ≤ t < ∞} such that
(cf. Csörgő and Révész (1981, Theorem S.2.2.1 by Major (1979) combined with
(S.2.2.2))), as n → ∞,
Combining this fact with Corollary 5.2.1 we get the desired result. □
Similarly, as in Chapter 4, the upper index a symbolizes that this Gaussian
process corresponds to the case where we have an antisymmetric kernel. Furthermore,
we note that for 0 ≤ t1 ≤ t2 ≤ · · · ≤ ts ≤ 1
and
where t0 = r0 := 0 and t_{s+1} = r_{s+1} := 1. The computation of the covariance is
straightforward. We note that
Theorem 5.2.1 implies that under Ho
Moreover, this allows us to produce tables for the unknown distribution of
sup_{0≤t1≤t2≤···≤ts≤1} |Γ^a(t1, t2, . . . , ts)| and reject the null hypothesis of no change, if
sup_{0≤t1≤t2≤···≤ts≤1} |Γ_n^a(t1, t2, . . . , ts)| becomes large.
We mention that in the case of at most one change, Theorem 5.2.1 reduces to
Theorem 3.2.1 which is due to Csörgő and Horváth (1988b, 1997). This is easy to see,
since with s = 1 we have
where t_{s+1} = t2 := 1, t0 := 0 and B is a Brownian bridge as in Theorem 3.2.1.
In the case of at most two changes, namely s = 2, Γ^a reduces to
where t_{s+1} = t3 := 1, t0 := 0 and Γ^a(t1, t2) is the Gaussian process from (4.2.10) as
in Theorem 4.2.2.
We are now in the position to combine test statistics for a different number
of change-points. To do this, we define the i-dimensional vectors (t1, t2, . . . , ti) ∈
(0, 1)^i ⊂ R^i and x_i ∈ R, 1 ≤ i ≤ s, and the Sup-Euclidean norm
Theorem 5.2.2 Assume that Ho, (2.2.3), (5.1.4) and (5.1.8) hold. Then we can
define sequences of Gaussian processes {Γ_n^a(t1), 0 ≤ t1 ≤ 1}, {Γ_n^a(t1, t2), 0 ≤ t1 <
t2 ≤ 1}, . . . , {Γ_n^a(t1, t2, . . . , ts), 0 ≤ t1 < t2 < · · · < ts ≤ 1, 1 ≤ s < n} such that
with the Sup-Euclidean norm we have componentwise convergence in distribution,
namely, as n → ∞,
where for each n and 1 ≤ i ≤ s
Proof of Theorem 5.2.2
This implies that we have componentwise convergence in distribution of the Γ_n^a's to
the appropriate Γ^a's using the appropriate norm. □
5.2.2 Asymptotic Results under H_A^(s)
where we will assume that h(x, y) is a nondegenerate antisymmetric kernel and (5.1.21)
holds. Therefore, in this context, (5.1.21) replaces the stronger assumption of a
finite second moment as in (5.1.19) which was used to derive (cf. Section 5.2.1) the
convergence in distribution result, as n → ∞,
where the stochastic process ū_n(t1, t2, . . . , ts) was defined as
and the limiting Gaussian process Γ^a(t1, t2, . . . , ts) as
with t0 := 0 and t_{s+1} := 1.
The limiting non-random function in (5.2.9) will depend on the location of the
change-points [nλ1], [nλ2], . . . , [nλs], 0 < λ1 ≤ λ2 ≤ · · · ≤ λs < 1. Moreover, we
will see that sup_{0<t1<t2<···<ts<1} |ū_n(t1, t2, . . . , ts)| is consistent and goes to infinity in
probability, as n → ∞, under H_A^(s).
In a similar vein as in Section 5.3.2, we define
where 0 ≤ x1 ≤ x2 ≤ x3 ≤ · · · ≤ x_{s+2} ≤ x_{s+3} ≤ x_{s+4} ≤ x_{s+5} ≤ · · · ≤ x_{2s+4} ≤ 1.
Here x2, . . . , x_{s+1}, x_{s+4}, . . . , x_{2s+2} and x_{2s+3} play the role of dummy variables and
are used to split blocks that may contain change-points. At most s of them will
actually be used, but since the location of the change-points is unknown, we need to
consider s possible changes in each of the blocks (X_{[(n+1)x1]+1}, . . . , X_{[(n+1)x_{s+2}]}) and
(X_{[(n+1)x_{s+3}]+1}, . . . , X_{[(n+1)x_{2s+4}]}). Hence, the convergence result from the following
Section 5.3.2 can be applied to (5.2.10) and we get
where θ_{1,1} = · · · = θ_{s+1,s+1} = θ = 0 and¹
¹We note that θ_{1,1} := θ_1, . . . , θ_{s+1,s+1} := θ_{s+1}, which were defined in (5.1.15).
Moreover, if the distribution between two different changes is the same then, by
the property of antisymmetric kernels, the corresponding θ_{i,j} is also equal to 0. For
example, if the distribution between the second and third and between the sixth
and seventh change-point are the same, then θ_{3,7} = 0.
We are now in the position to go back to the definition of Z_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]},
0 < t1 < t2 < · · · < ts < 1, in (5.1.1), and write it in terms of double sums S(·) as
in (5.2.10). Thus we arrive at
where we define t0 := 0 and t_{s+1} := n/(n + 1). Since we do not know the location of the
change-points [nλ1], [nλ2], . . . , [nλs], where 0 < λ1 ≤ λ2 ≤ · · · ≤ λs < 1, we define
the following functions a_i = a_i(λ1, λ2, . . . , λs, t1, t2, . . . , ts), 1 ≤ i ≤ (s + 1)s, which
will be used to derive a formula for (5.2.9) that will allow us to handle all possible
combinations of λ1, . . . , λs and t1, . . . , ts:
where 0 < a1 ≤ a2 ≤ · · · ≤ a_{(s+1)s} < 1 and q ∈ [0, 1], 1 ≤ i ≤ (s + 1)s.
We need to define these a_i's, 1 ≤ i ≤ (s + 1)s, since there are exactly (s + 1)s
possibilities to place s change-points in (s + 1) blocks. Moreover, exactly s of the latter
a_i's will change to one of the values λ_j, 1 ≤ j ≤ s, while the other s² a_i's will get
the value q and they will drop out of the limiting function ū_{λ1,λ2,...,λs}(t1, t2, . . . , ts).
We are now in the position to state the following theorem which is an immediate
consequence of the above arguments in this section.
Theorem 5.2.3 Assume that (2.2.3), (5.1.20), (5.1.21), and H_A^(s) hold. Define t0 :=
0 and t_{s+1} := n/(n + 1). If τ_i = τ_i(n) := [nλ_i], i = 1, 2, . . . , s, 0 < λ1 ≤ λ2 ≤ · · · ≤ λs < 1,
then, as n → ∞,
where q and the a_i, 1 ≤ i ≤ s(s + 1), are defined in (5.2.11) and (5.2.14), respectively.
We note that in Theorem 5.2.3 θ_{1,1} = θ_{2,2} = · · · = θ_{s+1,s+1} = θ = 0. Moreover,
if the distributions between the (i − 1)-th and i-th and between the (j − 1)-th and
j-th change are the same, then we also have θ_{i,j} = 0.
The limiting function in (5.2.15) is defined for every possible combination of
λ1, . . . , λs and t1, . . . , ts. As we will discuss in Section 5.2.2, the limiting function
ū_{λ1,λ2,...,λs}(t1, t2, . . . , ts) involves a number of different cases. Furthermore, exactly
\binom{s+2}{2} different θ_{i,j}'s, 1 ≤ i ≤ j ≤ s + 1, appear. Removing the ones which are 0,
namely θ_{1,1}, θ_{2,2}, . . . , θ_{s,s} and θ_{s+1,s+1}, then exactly \binom{s+1}{2} different θ_{i,j}'s,
1 ≤ i < j ≤ s + 1, appear.
Assuming that Eh² is finite instead of (5.1.21), then Theorem 5.2.3 implies the
consistency of tests based on sup-functionals of {ū_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]}, 0 < t1 <
t2 < · · · < ts < 1}. This means that we can consistently reject Ho vs. H_A^(s),
except in the case when θ_{i,j} = 0 for all 1 ≤ i ≤ j ≤ s + 1; indeed,
ū_{λ1,λ2,...,λs}(t1, t2, . . . , ts) is equal to 0 if and only if all θ_{i,j}'s involved are equal to 0.
Otherwise, by using the fact that λ1, λ2, . . . , λs are fixed a priori between
0 and 1 and do not depend on t_i, 1 ≤ i ≤ s, one can show that there is at least
one combination of λ1, . . . , λs and t1, . . . , ts such that ū_{λ1,λ2,...,λs}(t1, t2, . . . , ts) is not
equal to 0.
We observe that if θ_{i,j} = θ for all possible choices of 1 ≤ i ≤ j ≤ s + 1, then
since θ = 0. Moreover, it follows that in this case under the null hypothesis Ho of
no change
Thus, on assuming that θ_{i,j} = θ = 0 for all possible choices of 1 ≤ i ≤ j ≤ s + 1, the
sequence
is not consistent against any class of alternatives. On the other hand, if at least one
θ_{i,j} is not equal to 0 and we use T_n, then
P{Ho is rejected when using T_n | H_A^(s) is true} → 1, as n → ∞.
This implies that the limits of the sequence {T_n}_{n∈N} in this case are different in
probability under Ho and H_A^(s), and hence we have consistency of {T_n}_{n∈N}.
5.3 Symmetric Kernels
We consider the s-time parameter processes Z_{k1,k2,...,ks}, 1 ≤ k1 < k2 < · · · < ks < n,
n ≥ s + 1, where the kernel h is symmetric as in (2.2.2). By using the notations
and assumptions from Sections 5.1.1 and 5.1.2, we investigate the behavior of these
processes under Ho (cf. Section 5.3.1) and H_A^(s) (cf. Section 5.3.2), as n → ∞.
5.3.1 Asymptotic Results under Ho
We wish to study the limiting behavior of
in the sup-norm under the null hypothesis of no change. Again we will write
[(n + 1)t1], . . . , [(n + 1)ts], 0 < t1 < t2 < · · · < ts < 1, instead of the indices k1, . . . , ks.
Similarly, as in the case of testing for at most two change-points (cf. Section 4.3.1),
we define the symmetric degenerate (σ̃² = E g̃²(X_1) = 0) kernel g* with mean 0 as
Hence the centralized U-statistic U_n^{(s+2)} defined in (5.1.11) can be written as
which implies that
where U_n^{*(s+2)} is a centralized U-statistic with a symmetric and degenerate kernel
g*, and Corollary 4.3.1 can be applied.
We now proceed to prove the following reduction principle.
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} |U^{*(1)}_{[(n+1)t1]}|
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} |U^{*(s+1)}_{[(n+1)ts],n}|
Proof of Lemma 5.3.1 The first statement follows from Corollary 4.3.1 and the
fact that, as n → ∞,
and combined with similar arguments as in (4.3.6) we have, as n → ∞,
U^{*(i)}_{[(n+1)t_{i-1}],[(n+1)t_i]}, 2 ≤ i ≤ s, is a centralized U-statistic (with a symmetric and
degenerate kernel) from [(n + 1)t_{i-1}] + 1 to [(n + 1)t_i] minus a sum of i.i.d. r.v.'s.
Hence, we have to shift the interval [[(n + 1)t_{i-1}] + 1, [(n + 1)t_i]] to the interval
[1, [(n + 1)(t_i − t_{i-1})]]. Moreover, using the fact that for i.i.d. r.v.'s X_1, X_2, . . . , X_n
for each n, we get,
Since (5.3.2) holds,
similar arguments as in (4.3.6) yield, as n → ∞,
which proves statements 2 to s in this lemma.
Similarly, U^{*(s+1)}_{[(n+1)ts],n} is a centralized U-statistic (with a symmetric and degenerate
kernel) from [(n + 1)ts] + 1 to n minus a sum of i.i.d. r.v.'s. Hence, we have to shift
the interval [[(n + 1)ts] + 1, n] to the interval [1, [n − (n + 1)ts]]. Using now the
fact that
for each n, we arrive, as n → ∞, at
Moreover, we have
which proves statement (s + 1) in this lemma.
Corollary 4.3.1 combined with (5.3.1) leads to
Using the definition of U_n^{*(s+2)} we get
and by using (4.3.6) this reduces to
which proves the last statement in this lemma. □
Now we are in the position to show that the maximum of Z_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]}
minus a sum of projected i.i.d. r.v.'s is also of order n. We state the following corollary
and give a detailed proof. The idea of the proof is similar to the one in Corollary
3.3.1; however, here we give a more general version. This corollary clearly
implies Corollary 3.3.1 and Corollary 4.3.2 when we put s = 1 and s = 2, respectively.
Corollary 5.3.1 Let h be a symmetric kernel and U_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]} be
defined as in (5.1.11). Under Ho, as n → ∞, we have
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} | U_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]} − { Σ_{i=1}^s ([(n + 1)t_{i+1}]
+ [(n + 1)t_{i-1}] − 2[(n + 1)t_i]) Σ_{j=1}^{[(n+1)t_i]} h̃(X_j) + [(n + 1)ts] Σ_{i=1}^n h̃(X_i) } | = O_P(n),
where [(n + 1)t0] := 0 and [(n + 1)t_{s+1}] := n.
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} | U^{(1)}_{[(n+1)t1]} + U^{(2)}_{[(n+1)t1],[(n+1)t2]} + · · · |
Recall that t0 := 0 and t_{s+1} := n/(n + 1), and let
and
H(t_i) := Σ_{j=1}^{[(n+1)t_i]} h̃(X_j), 0 ≤ t_i < 1, with H(t0) := 0,
where again we will write H(1) instead of H(n/(n + 1)) on account of being interested in
large n. We rearrange the summation of K_n^{(2)}(t1, t2, . . . , ts) and we get the following:
We are now in the position to split these sums in K_n^{(2)}(t1, t2, . . . , ts) and apply the
previous lemma. Since
we have that
n
< - I U ( ~ + ~ ) n - n ~ h ( ~ i ) l + max i= 1 ll[(n+l)~r]<[(n+i)t2]~...~[(n+l)ts]<n I"E+l)tl]
+ max I + ) i<[(n+i)t,]<[(n+i)t2 j ~. . .<[ (n+ï ) t S ] s n [(n+l)taldnl
We now have U_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]} approximated by sums of i.i.d. r.v.'s, and
thus we can easily study the asymptotic behavior of the weak convergence of the process.
Theorem 5.3.1 Assume that Ho, (2.2.2), (5.1.4) and (5.1.8) hold. Then we can
define a sequence of Gaussian processes {Γ_n^{sw}(t1, t2, . . . , ts), 0 ≤ t1 ≤ t2 ≤ · · · ≤ ts ≤
1}_{n∈N} such that, as n → ∞,
and, for each n,
where the Gaussian process Γ^{sw} is defined via a linear combination of a standard
Wiener process W as follows:
where t0 := 0 and t_{s+1} := 1.
Proof of Theorem 5.3.1 Combining Corollary 5.3.1 with (5.2.6) gives the desired
result. □
We note that for 0 ≤ t1 ≤ t2 ≤ · · · ≤ ts ≤ 1
E Γ^{sw}(t1, t2, . . . , ts) = 0,
and
where t0 = r0 := 0 and t_{s+1} = r_{s+1} := 1. To compute the covariance, we use the
fact that
Theorem 5.3.1 implies that under Ho
We mention that in the case of at most one change, Theorem 5.3.1 reduces to
Theorem 3.3.1 which is due to Csörgő and Horváth (1988b, 1997). This is easy to see,
since with s = 1 we have
where t_{s+1} = t2 := 1, t0 := 0 and Γ is the Gaussian process from (3.3.1) as in
Theorem 3.3.1.
In the case of at most two changes, namely s = 2, Γ^{sw} reduces to
where t_{s+1} = t3 := 1, t0 := 0 and Γ^{sw}(t1, t2) is the Gaussian process from (4.3.10)
as in Theorem 4.3.1.
Furthermore, we mention that the Gaussian processes Γ^a from (5.2.5) and Γ^{sw}
from (5.3.6) are different and their relationship is as follows:
We are now also in the position to combine test statistics for a different number
of change-points and get the following theorem:
Theorem 5.3.2 Assume that Ho, (2.2.2), (5.1.4) and (5.1.8) hold. Then we can
define sequences of Gaussian processes {Γ_n^{sw}(t1), 0 ≤ t1 ≤ 1}, {Γ_n^{sw}(t1, t2), 0 ≤
t1 < t2 ≤ 1}, . . . , {Γ_n^{sw}(t1, t2, . . . , ts), 0 ≤ t1 < t2 < · · · < ts ≤ 1, 1 ≤ s < n} such
that with the Sup-Euclidean norm as in (5.2.8) we have, as n → ∞,
where for each n and 1 ≤ i ≤ s
The proof of this theorem goes along the lines of the proof of Theorem 5.2.2.
5.3.2 Asymptotic Results under H_A^(s)
where we will assume that h(x, y) is a nondegenerate symmetric kernel and (5.1.21)
holds. Therefore, in this context, (5.1.21) replaces the stronger assumption of a
second h i t e moment as in (5.1.19) which was used to derive (cf. Section 5.3.1) the
convergence in distribution statement, as n + CQ,
with t,+l := $ and the limiting Gaussian process rsp(t l , t 2 , . . . , t s ) as
with to := O and ts+l := 1.
The limiting function in (5.3.9) will depend on the locations of the change-points
[nλ_1], [nλ_2], ..., [nλ_s], 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_s < 1. Moreover, we will see that
sup_{0<t_1<t_2<...<t_s<1} |U_n(t_1, t_2, ..., t_s)| is consistent and goes to infinity in probability,
as n → ∞, under H_A^(s).

The general limiting function in (5.3.9) will involve many variables, since we have
to handle all possible combinations of the t_i's and λ_i's. This is due to the fact that we
have to consider every possible combination of change-points.
When looking at the definition of Z_{[(n+1)t_1],[(n+1)t_2],...,[(n+1)t_s]}, 0 < t_1 < t_2 < ... < t_s < 1, in (5.1.1), we see that we may split it into many double sums, where each of these double sums is a sum of the form
where 0 ≤ a < b ≤ c < d ≤ n and a, b, c, d ∈ ℕ. These double sums may be
associated with comparing the two blocks (X_{a+1}, ..., X_b) and (X_{c+1}, ..., X_d) with
each other. In the case of testing for s changes, we have to compare each of the
blocks (X_1, ..., X_{k_1}), (X_{k_1+1}, ..., X_{k_2}), ..., (X_{k_s+1}, ..., X_n) with each other. Of
course, each of these s + 1 blocks may contain s change-points, and therefore we split
each of the sums in (5.3.10) into s + 1 sums. Since we do not know the locations of
the s change-points in advance, we write the latter sum as follows,
Consequently, we are comparing blocks with each other that do not have a change
inside. When using (5.3.11), the double sum in (5.3.10) is split into (s + 1)² double
sums. Again we emphasize that there are at most s changes in total, which implies
that some of the new small blocks may be viewed as bigger ones, since there is no
change inside the blocks, nor in between them. Consider the case where we have
(s − 2) changes in the block (X_{a+1}, ..., X_b) and 2 changes in the block (X_{c+1}, ..., X_d);
then (5.3.11) reduces to
Since we do not know in advance the locations of the change-points, we consider
sums as in (5.3.11). For technical purposes, we define the following function,
which will be used to remind ourselves of the location (either before the first, or
between the first and second, or between the second and third, or ... or after the last
change-point) of a block of r.v.'s which does not contain any change-point.
We split (5.3.11) into (s + 1)² different double sums, which are of the form

where 1 ≤ R_1 = R_1(n) := [nr_1] < R_2 = R_2(n) := [nr_2] ≤ R_3 = R_3(n) :=
[nr_3] < R_4 = R_4(n) := [nr_4] ≤ n are chosen properly according to the double
sums in (5.3.11). Furthermore, we know from Section 4.3.2, as an immediate consequence of Theorem 2.6.2 by Sen (1977) and Hoeffding's SLLN (cf. Theorem 2.6.1),
that for each of these sums we have, as n → ∞,
The proof of (5.3.13) was discussed in the case of at most two change-points (cf.
Section 4.3.2), and may also be applied here.

Similarly to (5.3.11), we now define

where 0 ≤ x_1 < x_2 ≤ x_3 ≤ ... ≤ x_{s+2} ≤ x_{s+3} < x_{s+4} ≤ x_{s+5} < ... ≤ x_{2s+4} ≤ 1.
Moreover, since (5.3.14) is a sum of double sums, (5.3.13) can be applied many
times and the limiting function of (5.3.14) can be written as (n → ∞)

We note that if θ_{l(x_r),l(x_{q+1})} = 0 for all possible choices of r and q, then
We are now in a position to go back to the definition of Z_{[(n+1)t_1],[(n+1)t_2],...,[(n+1)t_s]},
0 < t_1 < t_2 < ... < t_s < 1, in (5.1.1), and write it in terms of double sums S(·) as
in (5.3.11). Thus we obtain
" Since we do not know the location of the where we d e h e to := O and tscl := ,- change-points [dl], [nA2], . . . , [ n A & where O < XI S A2 I . . . I AS < 1, we define
the following functions = %(A1, A*, - - -, As, tl, t2, . . ., t,), 1 5 i 5 ( S + 1)s, which
will be used to derive a formula for (5.3.9) that rvill d o w us to handle dl possible
combinations of Al, . . . , A, and tl,-. . , t,:
(Display (5.3.15): the piecewise definitions of a_1, ..., a_{(s+1)s}. Each a_i equals one of the λ_j's under the corresponding ordering of the t_i's and λ_j's (for instance, a_1 := λ_1 if λ_1 ≤ t_1), and equals c_i otherwise.)
where 0 < a_1 ≤ a_2 ≤ ... ≤ a_{s(s+1)} < 1 and c_i ∈ [0, 1], 1 ≤ i ≤ (s+1)s.

We need to define these a_i's, 1 ≤ i ≤ (s+1)s, since there are exactly (s+1)s
possibilities to place s change-points in (s+1) blocks. Moreover, exactly s of the
latter a_i's will take one of the values λ_j, 1 ≤ j ≤ s; the other s² a_i's will get
the value c_i and will drop out of the limiting function U_{λ_1,λ_2,...,λ_s}(t_1, t_2, ..., t_s).
The following theorem is an immediate consequence of the above arguments in
this section.
Theorem 5.3.3 Assume that (2.2.2), (5.1.20), (5.1.21), and H_A^(s) hold. Define t_0 :=
0 and t_{s+1} := n/(n+1).³ If τ_i = τ_i(n) := [nλ_i], i = 1, 2, ..., s, 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_s <
1, then, as n → ∞,

³Note that t_{s+1} → 1 as n → ∞. In (5.3.16) t_{s+1} := n/(n+1), since n = [(n+1)t_{s+1}] has to be satisfied, but in the limiting function (5.3.18) t_{s+1} = 1.
where
and c_i and the a_i's, 1 ≤ i ≤ s(s+1), are defined in (5.3.15) and (5.3.17), respectively.
The limiting function in (5.3.18) is defined for every possible combination of
λ_1, ..., λ_s and t_1, ..., t_s. Moreover, there are (2s choose s) possibilities to choose t_1, t_2, ..., t_s
given the postulated change-points λ_1, λ_2, ..., λ_s; hence (5.3.18) has (2s choose s) different
cases. This is why the definition of the limiting function U_{λ_1,λ_2,...,λ_s}(t_1, t_2, ..., t_s)
involves so many variables. Otherwise we would have to state all the different cases
explicitly. For example, in the case of 1 change-point we have 2 different cases, in the case
of 2 we have 6, in the case of 3 we have 20, in the case of 4 we have 70, and so on.
We note that in the limiting function in (5.3.18) exactly (s+2 choose 2) different θ_ij's,
1 ≤ i ≤ j ≤ s + 1, appear.⁴ This follows from the fact that there are exactly (s+2 choose 2)
ways to compare 2 blocks (i.e., both before the first change, both between the first
and the second change, ..., one before the first and the other between the first and
the second change, ...), where none of these blocks contains any of the s changes.
The problem is the same as choosing 2 balls (the 2 blocks) out of an urn with s + 2
balls (2 blocks and s change-points).
Assuming that the second moment as in (5.1.19) is finite instead of (5.1.21), Theorem 5.3.3 implies the
consistency of tests based on sup-functionals of {U_{[(n+1)t_1],[(n+1)t_2],...,[(n+1)t_s]}, 0 < t_1 <
t_2 < ... < t_s < 1}. This means that we can consistently reject H_0 vs. H_A^(s) when

except in the case when θ_ij = 0, 1 ≤ i ≤ j ≤ s + 1.

⁴We note that some of these θ_ij's can have the same value.

The function U_{λ_1,λ_2,...,λ_s}(t_1, t_2, ..., t_s) is equal to 0 if and only if all the θ_ij's involved
are equal to 0. Otherwise, by using the fact that λ_1, λ_2, ..., λ_s are fixed
a priori for each n between 0 and 1 and do not depend on t_i, 1 ≤ i ≤ s, one can
show that there is at least one combination of λ_1, ..., λ_s and t_1, ..., t_s such that
We observe that if θ_ij = θ for all possible choices of 1 ≤ i ≤ j ≤ s + 1, then
is not consistent against any class of alternatives. On the other hand, if at least one
θ_ij is not equal to 0 and we use T_n, then similarly to Example 4.3.1 we can show
that

P(H_0 is rejected when using T_n | H_A^(s) is true) → 1, as n → ∞.

This implies that the limits of the sequence {T_n}_{n∈ℕ} are different in probability
under H_0 and H_A^(s), and hence we have consistency of {T_n}_{n∈ℕ}.
5.4 Multiple Changes in the Mean
We are to test the no-change-in-the-mean null hypothesis

H_0 : X_1, ..., X_n are independent identically distributed random variables
with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,

against the at-most-s-changes-in-the-mean alternative
Similarly to the previous sections, we note that testing for change-points
in the mean can be illustrated via a geometrical argument, namely by comparing
special areas with each other (cf. Section 3.4 and Section 4.4). Again we consider the
linear function m(t) := t, t ∈ ℝ, which under H_0 joins all the points (k, (1/μ)E{S(k)}),
k ∈ ℕ, if μ ≠ 0, and joins all the points (k, E{S(k)}), k ∈ ℕ, if μ = 0.

Without loss of generality let μ = 1. Then, similarly to Section 4.4, we can draw
the graph of the 45-degree line m(t) = t, which joins the points (k, E{S(k)}), k ∈ ℕ.
In a similar vein as in Figure 4.4.2, where we focused on at most two change-points,
we can draw a more generalized graph, focusing on at most s change-points.
For each given k_1, k_2, ..., k_s, 1 ≤ k_1 < k_2 < ... < k_s < n, there correspond the s
rectangles with endpoints (k_{i−1}, E{S(0)}), (k_{i−1}, E{S(k_i)}), (k_{i+1}, E{S(k_i)}) and
(k_{i+1}, E{S(0)}), 1 ≤ i ≤ s, respectively, where k_0 := 0 and k_{s+1} := n. Each of these
rectangles, with length (k_{i+1} − k_{i−1}) and height E{S(k_i)}, has area (k_{i+1} − k_{i−1}) ×
E{S(k_i)}, 1 ≤ i ≤ s.
By reflecting some parts of these rectangles⁵ around the 45-degree line m(t) =
t, we can construct a new rectangle with endpoints (0, E{S(0)}), (0, E{S(n)}),
(k_s, E{S(n)}) and (k_s, E{S(0)}). This rectangle, with length k_s, height E{S(n)}
and area k_s × E{S(n)}, has, under H_0, the same area as the sum of the previous
areas. Consequently, for each given combination of 1 < k_1 < k_2 < ... < k_s < n,
with k_0 := 0 and k_{s+1} := n,

is an unbiased estimator of zero under H_0. Moreover, T^(n)_{k_1,k_2,...,k_s} is equivalent to Z_{k_1,k_2,...,k_s} in (5.1.1) with
h(x, y) = x − y. Since T^(n)_{k_1,k_2,...,k_s} is a linear combination of partial sums, with t_i = k_i/n,
1 ≤ i ≤ s, Theorem 5.2.1 yields, as n → ∞,

⁵Note that these rectangles are overlapping.
with t_0 := 0 and t_{s+1} := 1, and where the limiting process is the same as Γ_s(t_1, t_2,
..., t_s) from (5.2.5).
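Since h(x, y) = x − y is antisymmetric, every between-block double sum of the form (5.3.10) collapses into a linear combination of partial sums, which is exactly why T^(n)_{k_1,...,k_s} above is a linear combination of partial sums. A minimal numerical check of this identity (the data and block endpoints are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)  # synthetic sample; the identity holds for any data

def block_sum(x, a, b, c, d):
    """Double sum of h(X_i, X_j) = X_i - X_j over i in (a, b], j in (c, d]."""
    return sum(x[i] - x[j] for i in range(a, b) for j in range(c, d))

# Partial sums S(0), S(1), ..., S(n) with S(k) = X_1 + ... + X_k
S = np.concatenate(([0.0], np.cumsum(x)))

a, b, c, d = 10, 30, 50, 90  # arbitrary block endpoints with a < b <= c < d
direct = block_sum(x, a, b, c, d)
via_partial_sums = (d - c) * (S[b] - S[a]) - (b - a) * (S[d] - S[c])
print(np.isclose(direct, via_partial_sums))  # prints True
```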
5.5 Multiple Changes in the Variance
We are to test the no-change-in-the-variance hypothesis

H_0 : X_1, ..., X_n are independent identically distributed random variables
with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,

against the at-most-s-changes-in-the-variance alternative

H_A^(s) : X_1, ..., X_n are independent random variables and there exist s
integers τ_1, τ_2, ..., τ_s, 1 ≤ τ_1 ≤ τ_2 ≤ ... ≤ τ_s < n, such that
Var X_1 = ... = Var X_{τ_1} ≠ Var X_{τ_1+1} = ... = Var X_{τ_2}, Var X_{τ_1+1} =
... = Var X_{τ_2} ≠ Var X_{τ_2+1} = ... = Var X_{τ_3}, ..., Var X_{τ_{s−1}+1} =
... = Var X_{τ_s} ≠ Var X_{τ_s+1} = ... = Var X_n, 0 < Var X_1, Var X_{τ_1+1},
Var X_{τ_2+1}, ..., Var X_{τ_s+1} < ∞, and EX_1 = ... = EX_n = μ.
Again, as in Sections 3.5 and 4.5, we suggest the use of the symmetric kernel

and, after some algebraic manipulations, Z_{k_1,k_2,...,k_s} in (5.1.1) can be written as

where k_0 := 0, k_{s+1} := n, S(k) := Σ_{i=1}^k X_i and R(k) := Σ_{i=1}^k X_i². By Theorem 5.3.1,
as n → ∞, we have that under H_0
with t_0 := 0 and t_{s+1} := 1, and where the limiting process is the same as Γ_s^sym(t_1,
t_2, ..., t_s) from (5.3.6).
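The kernel display above did not reproduce; assuming the classical variance kernel h(x, y) = (x − y)²/2, whose mean under H_0 is σ², every between-block double sum of h reduces to an expression in the partial sums S(k) and the partial sums of squares R(k), which is why Z_{k_1,...,k_s} can be rewritten via S and R. A numerical check of this reduction under that assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=80)  # synthetic sample; the identity holds for any data

def block_sum(x, a, b, c, d):
    """Double sum of h(X_i, X_j) = (X_i - X_j)^2 / 2 over i in (a, b], j in (c, d]."""
    return sum((x[i] - x[j]) ** 2 / 2.0 for i in range(a, b) for j in range(c, d))

# Partial sums S(k) and partial sums of squares R(k), with S(0) = R(0) = 0
S = np.concatenate(([0.0], np.cumsum(x)))
R = np.concatenate(([0.0], np.cumsum(x ** 2)))

a, b, c, d = 5, 25, 40, 70  # arbitrary block endpoints with a < b <= c < d
direct = block_sum(x, a, b, c, d)
via_S_and_R = ((d - c) * (R[b] - R[a]) + (b - a) * (R[d] - R[c])) / 2.0 \
              - (S[b] - S[a]) * (S[d] - S[c])
print(np.isclose(direct, via_S_and_R))  # prints True
```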
5.6 Multiple Changes in the Mean and/or Variance
In the previous chapters, as well as in the previous sections of this chapter, we
considered tests for changes in the mean or changes in the variance separately.
Frequently, it is of interest to be able to test for changes in both the mean and
the variance. It turns out that this is not an easy task in general. Based on
U-statistics-type processes, here we propose a test statistic that will test for changes
either in the mean or in the variance or in both. Unfortunately, in the following setup
we are not able to distinguish between changes in both and changes in only one of
them. Nevertheless, this test can be used when the two depend on each other, namely
when the mean changes if and only if the variance changes (cf. Section 6.5). In the case of
independent normal variables, Gombay and Horváth (1997) proposed an estimator
for testing for one single change in the mean and/or variance using the likelihood ratio
test.
We are to test the no-change-in-the-mean-and-variance hypothesis

H_0 : X_1, ..., X_n are independent identically distributed random variables
with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,

against the at-most-s-changes-in-the-mean-and/or-variance alternative

H_A^(s) : X_1, ..., X_n are independent random variables and there exist s
integers τ_1, τ_2, ..., τ_s, 1 ≤ τ_1 < τ_2 ≤ ... ≤ τ_s < n, such that
EX_1 = ... = EX_{τ_1} ≠ EX_{τ_1+1} = ... = EX_{τ_2} and/or Var X_1 =
... = Var X_{τ_1} ≠ Var X_{τ_1+1} = ... = Var X_{τ_2}, ..., EX_{τ_{s−1}+1} = ... =
EX_{τ_s} ≠ EX_{τ_s+1} = ... = EX_n and/or Var X_{τ_{s−1}+1} = ... = Var X_{τ_s} ≠
Var X_{τ_s+1} = ... = Var X_n, and 0 < Var X_1, Var X_{τ_1+1}, Var X_{τ_2+1}, ..., Var X_{τ_s+1} < ∞.
it is reasonable to consider symmetric kernels of the form

Consequently, under H_0, h(X_i, X_j) is an unbiased estimator of θ = μ² + σ². It is
obvious that changes in μ or σ², or in both, will change θ. By using this kernel
function, we cannot distinguish which parameter changed. Nevertheless, it may
be used to detect whether there were any changes at all in either one, or in both, of these
parameters.
To apply our theory on U-statistic-based processes, we assume that under H_0

and put

Furthermore, we also assume that under H_0

is positive and finite. Hence, after some algebraic manipulations, Z_{k_1,k_2,...,k_s} in (5.1.1)
with the kernel from above can be written as

where k_0 := 0, k_{s+1} := n and R(k) := Σ_{i=1}^k X_i². By Theorem 5.3.1 we have that
under H_0, as n → ∞,

with t_0 := 0 and t_{s+1} := 1, and where the limiting process is the same as Γ_s^sym(t_1, t_2, ..., t_s) from (5.3.6).
5.7 Estimating the Number of Change-points
When dealing with multiple change-points, one of the questions that occurs is 'how
many change-points are there?'. This turns out not to be an easy task, and there
are not many papers written on this topic. Yao (1988) suggested an estimator
for the number of changes in the mean for normal observations with
common variance via a maximum likelihood argument and provided a consistent way
of estimating the true number of change-points. Serbinowska (1996) showed the
consistency of the maximum likelihood estimator for the number of changes in the case
of binomial observations. Other related references about multiple change-points can
be found in Csörgő and Horváth (1997) and Lee (1996).
Lee (1996) obtains a nonparametric estimator for the number of change-points
and proves its consistency. Namely, we are to test the null hypothesis H_0 from
Section 5.1 against the alternative

H_A^(s_n) : X_1, ..., X_n are independent random variables and there exist s_n
integers τ_1 = τ_1(n), τ_2 = τ_2(n), ..., τ_{s_n} = τ_{s_n}(n), 1 < τ_1 < τ_2 < ... <
τ_{s_n} < n, such that P{X_1 ≤ t} = ... = P{X_{τ_1} ≤ t}, P{X_{τ_1+1} ≤
t} = ... = P{X_{τ_2} ≤ t}, ..., P{X_{τ_{s_n}+1} ≤ t} = ... = P{X_n ≤ t}
for all t, and P{X_{τ_i} ≤ t_0} ≠ P{X_{τ_i+1} ≤ t_0} for some t_0 and for all
1 ≤ i ≤ s_n.
Using the idea of a window of observations (with length A_n), which is due to Lombard
and Carroll (1992), Lee (1996) considered the difference D_j of two weighted empirical
measures at each possible change-point location j as follows:

where c_i, 1 ≤ i ≤ A_n, is a sequence of positive numbers, ‖c‖ = (c_1² + ... + c_{A_n}²)^{1/2},
and x ∈ ℝ. Assuming that the difference between two successive change-points is
larger than 2A_n, we can compute the expected value of D_j^(n)(x) for each j. Note
that for each j we can think of 'comparing' the block of random variables (X_j,
X_{j−1}, ..., X_{j−A_n+2}, X_{j−A_n+1}) to the block of random variables (X_{j+1}, X_{j+2},
..., X_{j+A_n−1}, X_{j+A_n}). Moreover, there is at most one change-point in the combined
block (X_{j−A_n+1}, ..., X_{j+A_n}), since by our assumption the difference between two
successive change-points is larger than 2A_n. Hence, if the compared random variables
are both before or both after the change, then they have the same distribution.
Therefore,
for some r ∈ {1, ..., s_n} and x ∈ ℝ.

Let

δ_n(j) := (1/‖c‖) Σ_{i=|τ_r−j|+1}^{A_n} c_i, if |τ_r − j| ≤ A_n − 1 for some τ_r, and δ_n(j) := 0 otherwise.
Then it is easy to see that δ_n(j) is increasing if τ_r − A_n < j ≤ τ_r, and decreasing
if τ_r ≤ j < τ_r + A_n. Moreover, it takes its local maximum at the change-point τ_r. The
same holds true for D_j^(n), if we assume that sup_{x∈ℝ} (P(X_{τ_r} ≤ x) − P(X_{τ_r+1} ≤ x))
is positive a.s., 1 ≤ r ≤ s_n. Therefore, as an estimator for the true number of
change-points, Lee (1996) suggests the estimator ŝ_n with
change-points, Lee (1996) suggests the estimator Sn with
rî, := # { j : SU^ D:) (z) 2 Cn and SU^ DP) (z) > sup DC) (z) for ZER XER zER
j - R , < m < j and sup~(i"'(z) 1 s u p ~ g ) ( z ) for j < m < j + % ) , zER ZER
*+O and +O. Xt is obvious that j, = O if sup,,, where Cn satisfies cn n-tm A n n+w
D?) (z) < Cn for al1 possible j'ç.
Under assurnptions (Al) - (A3) of Lee (1996, page %?), Lee (1996) establishes
that, as n -+ oo,
1 1% note that, for example, q, 1 5 i 5 A,, may be chosen as ci = 1 or q = a + a for a > O and b 2 9.
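A simplified numerical sketch of Lee's counting estimator, with the illustrative choices c_i ≡ 1, the sup over x approximated on the observed sample points, and a hand-picked threshold C (the precise rate conditions on A_n and C_n in Lee (1996) are not reproduced here, so these are assumptions):

```python
import numpy as np

def window_stat(x, j, A):
    """D_j with c_i = 1: normalized sup-difference between the empirical measures
    of the blocks (X_{j-A+1}, ..., X_j) and (X_{j+1}, ..., X_{j+A})."""
    left, right = x[j - A:j], x[j:j + A]
    grid = np.sort(x)  # approximate the sup over x by the sample points
    F_left = (left[None, :] <= grid[:, None]).sum(axis=1)
    F_right = (right[None, :] <= grid[:, None]).sum(axis=1)
    return np.max(np.abs(F_left - F_right)) / np.sqrt(A)

def estimate_num_changes(x, A, C):
    """Count the positions j where D_j exceeds C and is a local maximum within +/- A."""
    n = len(x)
    D = np.full(n, -np.inf)
    for j in range(A, n - A):
        D[j] = window_stat(x, j, A)
    count = 0
    for j in range(A, n - A):
        left_max = np.max(D[j - A + 1:j])
        right_max = np.max(D[j + 1:j + A])
        # strict '>' on one side breaks ties, so a flat peak is counted only once
        if D[j] >= C and D[j] > left_max and D[j] >= right_max:
            count += 1
    return count

rng = np.random.default_rng(2)
# two large distributional changes, at positions 100 and 200, separated by more than 2A
x = np.concatenate([rng.normal(0, 1, 100),
                    rng.normal(5, 1, 100),
                    rng.normal(0, 1, 100)])
print(estimate_num_changes(x, A=30, C=3.5))
```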
Vostrikova (1981) suggested a binary segmentation method and also proved its
consistency. This binary segmentation procedure detects the number of change-points
and their locations simultaneously. Namely, if we were to test H_0 vs. H_A^(s)
(cf. Section 5.1), we first test H_0, the null hypothesis of no change, against H_A
(cf. Section 3.1), the alternative of one single change in the distribution. If H_0 is
not rejected, then we stop. On the other hand, if H_0 is rejected, then we know
that at least one change-point is indicated, and we test the two subsequences before
and after the already located change-point for the possibility of having at most one
further change-point in them. The procedure stops when no further subsequences
have change-points. The number of change-points found estimates the true number
of change-points, s.
This procedure suggests that when looking for the true number of change-points,
we should first test for one single change-point. On the other hand, this test statistic
has to be consistent under the alternative of more than only one single change-point.
But we saw at the beginning of Section 4.1 that, depending on the test statistics
used, testing for more than one change can be inconsistent in special cases.
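A sketch of the binary segmentation idea in code. The single-change statistic below is a standardized CUSUM, a stand-in for sup_{0<t<1} |U_n(t)| of Section 3.3, and the threshold is an illustrative choice, not a derived critical value:

```python
import numpy as np

def cusum_stat(x):
    """Max absolute standardized CUSUM over a segment, and the split position."""
    n = len(x)
    if n < 4:
        return 0.0, 0
    s = np.cumsum(x)
    k = np.arange(1, n)
    # |S_k - (k/n) S_n| / (sigma_hat * sqrt(n)), k = 1, ..., n-1
    stats = np.abs(s[:-1] - k / n * s[-1]) / (np.std(x) * np.sqrt(n) + 1e-12)
    j = int(np.argmax(stats))
    return float(stats[j]), j + 1

def binary_segmentation(x, threshold, offset=0):
    """Split recursively while the single-change test rejects; return change-points."""
    stat, k = cusum_stat(x)
    if stat <= threshold:
        return []
    return (binary_segmentation(x[:k], threshold, offset)
            + [offset + k]
            + binary_segmentation(x[k:], threshold, offset + k))

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 120), rng.normal(3, 1, 120)])
print(binary_segmentation(x, threshold=2.0))  # e.g., [120] (split near the true change)
```

Note the caveat discussed below: the split point maximizing the statistic need not be an actual change-point, so the count can overshoot.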
Recall the notation from Section 5.1.2, especially

Suppose we were to test H_0 vs. H_A^(s) by using a properly normalized sup-functional
of the stochastic process Z_{[(n+1)t]}, 0 < t < 1, from (3.1.1) as a test statistic. After
some algebraic manipulations it turns out that the limiting function of the suitably
normalized Z_{[(n+1)t]} under H_A^(s) is of the form
where the A_i and B_j are functions of the λ_k, 1 ≤ k ≤ s + 1, and 1 ≤ p ≤ r ≤ s + 1.
We can see that U_{λ_1,λ_2,...,λ_s}(t) = 0 for all 0 < t < 1 if and only if θ_{j+1,j+1} = 0, A_j = 0
and B_j = 0, where 0 ≤ j ≤ s. Hence, any symmetric kernel will be consistent
when testing H_0 vs. H_A^(s), but antisymmetric kernels may fail to be. This suggests
combining the procedure suggested by Vostrikova (1981) with the statistics studied
in Section 3.3, namely sup_{0<t<1} |U_n(t)| as defined in (3.1.15). Thus we aim at
having a Vostrikova-type procedure in a non-parametric context. Define

and the following matrices, which depend on the number of change-points s and on the index î, which in turn depends on t and the locations of the change-points:
Obviously, the product of the first three matrices, and that of the last three matrices, is
in ℝ^{1×1}. The structure of the matrices is quite complicated. This is due to the
fact that for fixed s we still have to consider where t is located. This, of course,
changes the index î, which also changes the dimensions of the three matrices A, B
and C. With similar arguments as in Section 5.2.2, namely using Theorem 2.6.2
and Hoeffding's SLLN (cf. Theorem 2.6.1), respectively, we arrive at the following
theorem (cf. Theorem 5.3.3).
Theorem 5.7.1 Assume that (2.2.2), (5.1.20), (5.1.21) and H_A^(s) from Section 5.1
hold. Define t_0 := 0 and t_{s+1} := n/(n+1). If τ_i = τ_i(n) := [nλ_i], i = 1, 2, ..., s,
0 < λ_1 < λ_2 < ... < λ_s < 1, then, as n → ∞,

where

and the matrices A_{s,î}, B_{s,î}, C_î, D_s, E_s and F are defined above.
We observe that if θ_ij = θ for all possible choices of 1 ≤ i ≤ j ≤ s + 1, then, as
n → ∞,

which corresponds to the case where we do not have any changes, as in (3.1.7), when
we replace EZ_k by the suitably normalized EZ_{[(n+1)t]}. Moreover, it follows that under the null
hypothesis H_0 of no change, as n → ∞,
Assuming that there are two changes in the distribution, the latter theorem
implies the following. As n → ∞,

We note that this is the same limiting function as used in (4.1.1).
Theorem 3.3.2 implies that

is consistent against any class of alternatives if at least one θ_ij is not equal to 0.
Hence,

This implies that the limits of the sequence {T_n}_{n∈ℕ} are different in probability
under H_0 and H_A^(s), and hence we have consistency of {T_n}_{n∈ℕ}.
Now we continue along the lines of Vostrikova (1981), and search for the argument
at which the test statistic T_n takes its maximum. Unfortunately, the maximum
is not always taken at one of the change-points, as we saw in the case of one single
change-point in Section 3.3.3. Therefore we do not know whether the point at which we
split the intervals is a change-point or not. Hence it is possible that the number of
detected change-points can be too big. Of course, if the change-points are such
that the maximum is always taken at one of them, then the procedure works
out fine.
Summarizing, we showed that T_n, which was used in Section 3.3 in the context of
testing for one single change-point, is also consistent when testing for s, 1 ≤ s < n,
change-points. Due to the fact that the local maxima are not necessarily taken at
the change-points, one may, for example, prefer the use of the estimator ŝ_n in (5.7.1)
proposed by Lee (1996) to estimate the true number of change-points.
"'Imagination is more important
than knowledge."
- Albert Einstein
Chapter 6
Applying Change-point Theory to the
Financial Market
6.1 Introduction
In 1973, F. Black and M. Scholes derived a formula for option pricing that has
since been called the Black-Scholes formula. They worked closely together with
R. Merton, and in 1997 Merton and Scholes were awarded the Nobel Prize in
Economic Sciences. Black died in 1995 in his mid-fifties. Thousands of traders
and investors now use this formula every day to value stock options in markets
throughout the world. Black, Merton and Scholes thus contributed to the rapid
growth of markets for derivatives in the last 10 years.
The derivation of the Black-Scholes formula involves many areas of probability
theory, for example, martingale theory, Wiener processes, Itô processes, stochastic
integration and stochastic differentiation. In the next sections we give a glimpse of
some of the basic notions.
The Black-Scholes formula is used to compute the value of the so-called European
options and other derivative securities. In their model the so-called volatility
parameter σ is assumed to be a constant. It (the variance (volatility)) is frequently
estimated via historical data. Using the results from the previous sections, we will
propose a test procedure for testing for possible changes in the volatility in the
Black-Scholes setup.
In practice, changes of the variance (volatility) are very important to know about,
since these changes will affect the behavior of an investor. The higher the variance
(volatility), the higher the value of an option in the Black-Scholes model.

Using the methodologies from the previous chapters, we will aim at explaining
how to detect changes in the variance (volatility). Also, we will see that changes
in the mean of the stock price¹ do not affect the Black-Scholes formula, and hence the
value of an option.
For further reference and a more detailed description, we refer to J. Hull (1993),
especially Chapters 1, 9 and 10. We will closely follow his presentation of the matter
at hand. For a review of the Black-Scholes formula we also refer, for example, to
Duffie (1996), and to Csörgő (1999), who details the original derivation of Black and
Scholes (1973).
6.2 Derivative Securities
A derivative security is a variable depending on other, more basic underlying variables.
During the last few years, derivative securities have become more important in
the financial market. Different kinds of derivatives are, for example, forward
and futures contracts or options. We will give an overview of these derivatives.
6.2.1 Forward Contracts
A forward contract is an agreement, for example between two financial institutions
or between a financial institution and one of its clients, to buy or sell an asset at a given time
for a certain price, the so-called delivery price. It is not traded on an exchange and
the parties usually know each other. One of the parties agrees to buy the underlying
asset at a certain specified date and price, which is called the long position, and the
other party agrees to sell it, which is called the short position. The holder of the

¹We emphasize that throughout this chapter we will denote the stock price at time t by S(t). This is the notation widely used in the finance literature. On the other hand, in the mathematical literature S(n) usually denotes the sum of n random variables. Although the latter was already used in previous chapters, we will change our notation in this chapter.
short position delivers the asset to the holder of the long position in return for a
cash amount, the delivery price. A forward contract is worth zero when it is first
entered into. Depending on the movements in the price of the asset, it can have a
positive or negative value later on. The pay-off from a long position in a forward
contract on one unit of an asset is

where K is the delivery price and S(T) the spot price of the underlying asset at
maturity time T of the contract. Of course, the pay-off from a short position is
Both pay-offs can be positive or negative.
6.2.2 Futures Contracts
A futures contract is an agreement between two parties to buy or sell an asset at a
given time for a certain price. It is traded on an exchange and the parties usually
do not know each other. The underlying asset could be a commodity, e.g., sugar,
lumber, gold or cows, or a financial asset, e.g., currencies, treasury bills or bonds.
An exact delivery date is usually not specified, but there is a delivery month. The
exchange specifies the period during the month when delivery must be made.
6.2.3 Options
There are two different types of options. The call option gives the holder the right,
but not the obligation, to buy the underlying asset, e.g., stocks, foreign currencies,
commodities, or futures contracts, at a given time t = T, the maturity, for a certain
price K, the so-called exercise or strike price. Similarly, a put option gives the holder
the right to sell. There are different kinds of options.
Suppose we sign at time t = 0 a contract which gives us the right (option) to
buy one share of a stock at a specified price K at a specified time T. Then we will
exercise the option, that is, realize the right to buy at the exercise price K, if
the price of the stock S(T) is greater than K, the price we agreed to pay for the
stock. On the other hand, if K ≥ S(T), we do not exercise the option, since the
price of the stock S(T) is at most the specified price K. Of course, exactly the
opposite is true if we have the option to sell instead of the option to buy.
It is clear that an investor must pay a fixed amount of money to purchase an
option contract. The Black-Scholes formula allows us to compute how much somebody
should be willing to pay for an option contract. Moreover, there are two sides
to every option contract: the investor who has taken the long position (i.e., who has
bought the option) and the investor who has taken the short position (i.e., who has
sold or written the option). The investor with the short position receives cash up
front but may have liabilities later. The profits or losses of the two investors are the
reverse of each other.
For now, let us consider a call option in the long position for one single share of
a stock. We recall that K is the delivery price. Then the pay-off when buying at a
specified time T is

(S(T) − K)+ = max(S(T) − K, 0),

which is the so-called European call option (option exercised at maturity t = T)
mentioned before. As there are different options, by allowing one to buy (exercise
the option) at any time from now (t = 0) to maturity (t = T), the pay-off is

{(S(t) − K)+, 0 ≤ t ≤ T},

which is the so-called American call option (option exercised at any time from t = 0
to maturity t = T). If we allow one to buy when the price of the stock is at a maximum
over a specified period, then the pay-off is
which is the so-called call-on-maximum option, or lookback option. Similarly, we
may allow one to buy based on the average of the stock price over a specified
period of time. Then the pay-off is

which is the so-called Asian option, or call-on-average option.
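These pay-offs are straightforward to compute for a given price path. A small sketch (the path values and strike K are illustrative assumptions; the American option is omitted since its pay-off depends on an exercise strategy):

```python
import numpy as np

K = 22.0  # strike (exercise) price, illustrative
# S(t) sampled at a few time points; the last entry plays the role of S(T)
path = np.array([20.0, 21.5, 25.0, 23.0, 21.0])

def european_call(path, K):
    return max(path[-1] - K, 0.0)    # (S(T) - K)^+

def lookback_call(path, K):
    return max(path.max() - K, 0.0)  # pay-off based on the maximum price

def asian_call(path, K):
    return max(path.mean() - K, 0.0) # pay-off based on the average price

print(european_call(path, K))  # 0.0, since S(T) = 21 < K
print(lookback_call(path, K))  # 3.0, since max S(t) = 25
print(asian_call(path, K))     # about 0.1, since the average is 22.1
```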
Some of these options are riskier than others. Hence, each of them will have
a different price. For example, an American option does not have the same risk as
a European option, since we may exercise at any time from now to maturity. This
makes the American option more expensive than the European one. Of course, there
are many other options and there is a lot of freedom in the characteristics of an
option.
6.3 Modeling the Behavior of Stock Prices
Stock prices are usually assumed to follow a Markov process. This means that the
probability distribution of the price at any particular future time depends only on
the current stock price S(t). Hence, the present price of a stock S(t) impounds all
the information contained in a record of past prices S(t − t*), t* < t. If the Markov
property did not hold, then one could make large profits by comparing charts of
past stock prices. Hence, it is a reasonable assumption.
While a Wiener process has mean or drift rate 0 and variance rate 1, we can define the
so-called generalized Wiener process for a variable x, which has drift μ and variance
rate σ², namely

dx = μ dt + σ dW.  (6.3.1)

The μ dt term means that x has an expected drift of μ per unit time; hence in a
time interval of length T, x increases in expectation by an amount μT. The σ dW term is the
so-called noise added to the path followed by x.

Since the parameters μ and σ may also depend on the variable x and the time
t, we modify equation (6.3.1) and get a so-called Itô process

dx = μ(x, t) dt + σ(x, t) dW.
Now we are in a position to define a process for the stock price S(t). A first guess
would be that S(t) follows a generalized Wiener process with constant expected drift
and constant variance. However, this process fails, since the expected percentage
return required by investors from a stock is independent of S(t). For example, an
investor will require a 10% per annum expected return when S(t) is $10 as well as
when it is $20. Hence, we assume that the expected drift rate of S(t) is μS(t) for
some constant parameter μ. Therefore,
and, when compounded continuously
so that
When considering a stock, we also have to consider its volatility. We do this by
assuming that the variance of the percentage return in a short period of time Δt is the
same, independent of S(t). Hence, we express S(t) by an Itô process with instantaneous
expected drift rate μS(t) and instantaneous variance rate σ²S²(t), i.e., we assume

dS(t) = μS(t) dt + σS(t) dW(t).

We mention that μ ∈ ℝ is the constant expected rate of return and σ > 0 the so-called
stock price volatility. μ depends on the level of interest rates in the economy.
The lower the level of interest rates, the lower the expected return required on any
stock. Typical values for σ are in the range 0.2 to 0.4, i.e., 20% to 40%.
We may even simulate the movements in the stock price by using Monte Carlo
simulation. We give an example of one possible pattern of a stock price movement,
since different random variables will produce another pattern. But by repeating this
procedure many times, we may even estimate the distribution function of S(T) for a
given T. Since
(S(t + Δt) − S(t)) / S(t) = μΔt + σε√Δt

with ε ~ N(0, 1), we may compute the price of the stock the following day, S(t + Δt), as
follows:

S(t + Δt) = S(t) (1 + μΔt + σε√Δt),

where again ε ~ N(0, 1).
We now consider S(t) over a period of one year, T = 1, and we consider daily
changes in S(t), Δt = 1/365. We assume an initial stock price S(0) of $20 and
an expected return of 16% per annum, hence μ = 0.16. Moreover, we draw a graph
with a volatility of 20% per annum, σ₁ = 0.2, and 40% per annum, σ₂ = 0.4. For
both patterns we will use the same generated random variables ε₁, …, ε₃₆₅, which
are standard normally distributed. Again we mention that these are only two possible
patterns, and they depend on the generated random variables ε₁, …, ε₃₆₅, but we
can see the effect of different variances on the stock price.
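The simulation just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the thesis; the function name `simulate_path` and the use of Python's `random` module are our own choices.

```python
import random

def simulate_path(s0, mu, sigma, n_steps, eps):
    """Discretized stock path: S(t + dt) = S(t) * (1 + mu*dt + sigma*eps*sqrt(dt))."""
    dt = 1.0 / n_steps
    sqrt_dt = dt ** 0.5
    path = [s0]
    for e in eps:
        s = path[-1]
        path.append(s * (1.0 + mu * dt + sigma * e * sqrt_dt))
    return path

# One year of daily steps with S(0) = $20 and mu = 0.16, as in the text.
random.seed(1)
eps = [random.gauss(0.0, 1.0) for _ in range(365)]   # same draws for both paths
path_20 = simulate_path(20.0, 0.16, 0.2, 365, eps)   # sigma_1 = 0.2
path_40 = simulate_path(20.0, 0.16, 0.4, 365, eps)   # sigma_2 = 0.4
```

Since both paths reuse the same ε₁, …, ε₃₆₅, any difference between them is due purely to the volatility, mirroring the comparison in Figure 6.3.1.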
Figure 6.3.1: Daily stock prices with different volatilities σ₁ = 0.2 and σ₂ = 0.4.

In Figure 6.3.1 we show the patterns of the stock price for the different volatilities,
and we see that the stock with a volatility of 40% per annum fluctuates much more
around the expected drift than the one with 20%. Hence, the higher the volatility,
the higher the risk for an investor. This is one of the motivations where change-point
theory comes into the picture, for, as the volatility changes, the behavior of
an investor might also change. This will also affect how to price an option which
depends on the behavior of the stock S(t) and hence also on its volatility.
For an owner of a stock, the movements in the stock tend to offset each other.
But for an owner of an option this is different. For example, an owner of a call
option (see Section 6.2.3), who has the right to buy the underlying stock at a specified
time and price, benefits from a price increase, but has limited downside risk, namely
the premium paid, if the price decreases. Therefore the value of an option increases
as the volatility increases.
6.4 The Black-Scholes Formula
Using the notions from Csörgő (1999, Section 4) in this section, we assume, as
in (6.3.3), that the stock price process {S = S(t), t ≥ 0} is governed (driven) by
a standard Wiener process {W = W(t), 0 ≤ t < ∞} on some probability space
(Ω, F, P) via the Itô process

dS(t) = μS(t) dt + σS(t) dW(t),  0 ≤ t ≤ T.    (6.4.1)
Assume further that the value or price C of a European call option at any time
t ∈ [0, T] depends only on the underlying stock price S(t) and the time t, i.e., we
have

C = C(t, S(t)),    (6.4.2)

and, in addition, that the real-valued function C = C(t, S(t)) on [0, T] × (0, ∞) is
continuously differentiable in t and twice continuously differentiable in S, denoted
by C ∈ C^{1,2}([0, T] × (0, ∞)). We note that the processes in (6.4.1) and (6.4.2) are
driven by the same Wiener process. Itô showed that C is again an Itô process, and
by Itô's chain rule formula

dC = (∂C/∂t) dt + (∂C/∂S) dS + ½ (∂²C/∂S²) (dS)²,    (6.4.3)

where dS is given by (6.4.1). Hence we have

dC = (∂C/∂t + μS ∂C/∂S + ½ σ²S² ∂²C/∂S²) dt + σS (∂C/∂S) dW,    (6.4.4)
where, for computing (μS dt + σS dW)², we used the following "multiplication table"
for differentials (cf. Karatzas and Shreve (1988, p. 154)):

dt · dt = 0,  dt · dW = 0,  dW · dW = dt.
The geometric Brownian motion process

S(t) = S(0) exp((μ − σ²/2) t + σ W(t)),  0 ≤ t ≤ T,    (6.4.5)

is a solution of (6.4.1), starting from S(0) at time 0. Indeed, on letting f(t, W) :=
S(t), Itô's formula (cf. (6.4.3)) for the Itô process in (6.4.1) is

df = (∂f/∂t) dt + (∂f/∂W) dW + ½ (∂²f/∂W²) dt.    (6.4.6)
By (6.4.5) we arrive at the differentials

∂f/∂t = (μ − σ²/2) S(t),  ∂f/∂W = σ S(t),  ∂²f/∂W² = σ² S(t),

which, in turn, via (6.4.6) yield (6.4.1) as desired, namely

dS(t) = (μ − σ²/2) S(t) dt + σS(t) dW(t) + ½ σ²S(t) dt
      = σS(t) dW(t) + μS(t) dt,  0 ≤ t ≤ T.
The uniqueness of the solution S in (6.4.5) for (6.4.1) follows from a general result
of Itô, which states that a stochastic differential equation with Lipschitz continuous
coefficients has a unique solution.
By (6.4.5), we have that for each t ∈ [0, T],

log S(T) − log S(t) = log (S(T)/S(t)) = (μ − σ²/2)(T − t) + σ (W(T) − W(t)).
Following Black and Scholes (1973), we now derive the Black-Scholes formula.
Their assumptions are:

• The stock price S(t) follows a geometric Brownian motion with constant μ and
constant σ,

• No transaction costs, no taxes, and no dividends involved,

• Short selling of securities is permitted,

• The trading is continuous,

• No arbitrage opportunities, which means no riskless profit by simultaneously
entering into transactions in more than one market, e.g., to buy 1000 shares
of a stock in New York for $320 and sell them, after considering the exchange rate,
for $350 in Vienna,

• The risk-free interest rate r is constant and the same for all maturities.
Black and Scholes define a portfolio such that the holder of it is short one deriva-
tive security and long an amount of ∂C/∂S shares. Hence, the value of the portfolio,
say P, is

P = −C + (∂C/∂S) S.

Using (6.4.1) and (6.4.4) we get that

dP = −dC + (∂C/∂S) dS = (−∂C/∂t − ½ σ²S² ∂²C/∂S²) dt.

Using the no-arbitrage condition, it follows that we have

dP = r P dt,
which, in turn, implies the Black-Scholes partial differential equation

∂C/∂t + rS ∂C/∂S + ½ σ²S² ∂²C/∂S² = rC,    (6.4.9)
where C is a function of S and t. It is very interesting to see that the expected
return on the stock, μ, drops out of the equation and does not have any influence on
pricing the option.

Note that this equation does not involve any variables that are affected by the risk
preferences of investors. It would not be independent of risk preferences if it involved
the expected return μ. In fact, the higher the level of risk aversion by investors, the
higher μ will be for any given stock, hence μ depends on risk preferences (cf. Hull
(1993, Section 10.8)). Remarkably, under the Black-Scholes model, μ drops out of
the equation.
Subject to C(T, S) = (S − K)⁺, equation (6.4.9) can be solved, i.e., in case of
a European call option as in Section 6.2.3 (cf. Black and Scholes (1973)), the price
C of the option at any time t = t₀ ∈ [0, T] is given by the Black-Scholes formula

C = S(t₀) Φ(d) − K e^{−r(T−t₀)} Φ(d − σ√(T − t₀)),    (6.4.10)

where

d = (log(S(t₀)/K) + (r + σ²/2)(T − t₀)) / (σ√(T − t₀)).
The formula says that the option value C is higher the higher the price of the share
today, S(t₀), the higher the volatility of the share price, σ, the higher the risk-free
interest rate, r, the longer the time to maturity, T, the higher the probability that
the option will be exercised, and the lower the strike price, K. We note that

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−u²/2} du

denotes the standard normal distribution function.
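As a check on the formula, it can be implemented directly; the sketch below uses only the Python standard library (math.erf for Φ), and the function names std_normal_cdf and black_scholes_call are our own.

```python
from math import erf, exp, log, sqrt

def std_normal_cdf(x):
    """Phi(x), the standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(s0, k, tau, r, sigma):
    """Black-Scholes price of a European call with time to maturity tau = T - t0."""
    d = (log(s0 / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return s0 * std_normal_cdf(d) - k * exp(-r * tau) * std_normal_cdf(d - sigma * sqrt(tau))
```

One can verify numerically the monotonicity stated in the text: the price increases in σ (positive vega) and decreases in the strike price K.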
6.5 Changes in the Volatility
In the previous section we stated the Black-Scholes formula that allows us to com-
pute the value of a European call option assuming that S(t) is a geometric Brownian
motion with constants μ and σ². Since most of the factors in this formula are fixed
in advance, e.g., we know the price of the share today, S(t), the time of maturity,
T, the strike price, K, and, more or less, the risk-free interest rate, r, we propose to
test whether the volatility, σ, may have changed or not. Frequently, σ is estimated
from historical data, for instance, via the stock prices of the last n days.
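A standard historical estimate of σ is the annualized sample standard deviation of the daily log-returns; a minimal sketch (the function name historical_volatility is our own, and we annualize with 365 periods as in this chapter):

```python
import random
from math import exp, log, sqrt

def historical_volatility(prices, periods_per_year=365):
    """Annualized sample std. dev. of the log-returns log(S(t_i)/S(t_{i-1}))."""
    returns = [log(b / a) for a, b in zip(prices, prices[1:])]
    n = len(returns)
    mean = sum(returns) / n
    var = sum((x - mean) ** 2 for x in returns) / (n - 1)
    return sqrt(var * periods_per_year)

# Synthetic check: exact geometric Brownian motion increments with sigma = 0.2.
random.seed(7)
dt = 1.0 / 365
prices = [20.0]
for _ in range(5000):
    z = (0.10 - 0.5 * 0.2 ** 2) * dt + 0.2 * sqrt(dt) * random.gauss(0.0, 1.0)
    prices.append(prices[-1] * exp(z))
sigma_hat = historical_volatility(prices)
```

On simulated geometric Brownian motion data the estimator recovers the true σ up to sampling error, but, as discussed below, a change in σ during the estimation window distorts it.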
We will see that the problem of detecting changes in the volatility of stock prices
is a very difficult task due to its complicated behavior. Although under the assumptions
of the Black-Scholes model we have a consistent estimator to test for changes in
the volatility, this will no longer be the case if we change the model a bit. This will
be clear when we exchange the assumption of geometric Brownian motion (cf. (6.4.5))
for that of geometric fractional Brownian motion (cf. (6.5.2)).
In Figure 6.5.2 we see the dependency of the price of a European call option on
the volatility σ. As σ becomes large, the price of the option increases. Hence, when
estimating the volatility using historical data, a change in the volatility will affect its
estimator. We fixed the spot price S(0) = $20, the risk-free interest rate r = 0.04, and
T = 1/2, which is equivalent to a period of half a year, and then we can see graphs
for the strike prices K = $15 and K = $25, respectively. Of course, when the strike
price is below the spot price, the option will be more expensive, since we buy
at time t = 0 the right to buy in half a year a share for less money than the spot
price S(0). The price of the option goes down if the spot price is below the strike
price.
Let S(t₁), …, S(t_{n+1}) be the stock prices of the last n + 1 days, following a
geometric Brownian motion under the Black-Scholes model (cf. (6.4.5), where μ is
Figure 6.5.2: Price of a European call option depending on σ for K = $15 (above) and K = $25 (below).
replaced by r), and let Δt = t_i − t_{i−1}, 1 < i ≤ n + 1. Typically t_i = i/365. Thus,

Z̃_i := log (S(t_i)/S(t_{i−1})) = (r − σ²/2) Δt + σ (W(t_i) − W(t_{i−1})) ~ N((r − σ²/2) Δt, σ² Δt),

which is the continuously compounded return in the i-th interval, and S(t_i) =
S(t_{i−1}) e^{Z̃_i}. We note that the Z̃_i's are i.i.d. r.v.'s, normally distributed as in-
dicated. Since we are interested in testing for changes in the volatility σ, we define

Z_i := Z̃_{i+1}/√Δt ~ N((r − σ²/2) √Δt, σ²),  1 ≤ i ≤ n.
We note in passing that in this specific set-up the volatility σ in the Black-Scholes
formula (cf. (6.4.10)) can be estimated via the maximum likelihood estimator (MLE).
Using the fact that, in this case, T and Δt are both known, we get the unique positive
root of the likelihood equation,

σ̂² = (2/Δt) (√(1 + (Δt/n) Σ_{i=1}^{n} (Z_i − r√Δt)²) − 1),    (6.5.1)

which may replace the usually suggested sample variance for estimating σ in (6.4.10).
Suppose we are testing for two changes in the volatility (variance). From Sec-
tion 4.5 we know that we may use a sup-functional of a properly normalized U-
statistics based process with kernel function h(x, y) = ½(x − y)² to test for changes
in the variance.
Recall that for this test statistic we had to assume that under H₀ we have
i.i.d. r.v.'s with finite non-zero variance σ² and under H_A^{(2)} we have independent
random variables with at most two changes in the variance, but no changes in the
mean. Since r and Δt are constant, we observe that our random variables Z_i,
1 ≤ i ≤ n, satisfy the assumptions made under H₀, but violate the assumptions
under H_A^{(2)}, since EZ_i depends on Var Z_i = σ² if r ≠ σ²/2.

On the other hand, we saw in Section 5.6 that by using the kernel h(x, y) = (x² + y²)/2,
we can test for changes in the mean and/or variance. In general, the test does not
distinguish between changes in the mean and changes in the variance, but it will
reject when at least one of them changes. Fortunately, in our present model, T and
Δt are constant, which means that EZ_i will change if and only if σ² changes. Thus,
in this specific set-up, the statistics studied in Section 5.6 will reject H₀ when the
mean and the variance change simultaneously. Namely, on estimating σ² via (6.5.1)
and the mean via a similar procedure, an appropriate test statistic can now be based
on the normalized differences R(k_j) − ((k_j − k_{j−1})/n) R(n), with k₀ := 0, k₃ := n
and R(k) := Σ_{i=1}^{k} Z_i², to test for at most two change-points in the volatility σ.
Under H₀ such a test statistic converges in probability to the sup-functional of the
Gaussian process in (4.3.10), and under H_A^{(2)} it converges in probability to ∞.
Assuming that we detected at most two changes, i.e., H_A^{(2)} holds, it is not
clear how to estimate σ in the Black-Scholes formula (cf. (6.4.10)), since we don't
know in general where the change-points are. Unfortunately, the maximum of the
test statistic is not necessarily taken at the times of change.
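To illustrate the idea on data (a deliberately simplified sketch, not the exact statistic of this section), a CUSUM-type scan over the squared observations locates a single variance change; the function name variance_change_point and its crude normalization are our own:

```python
import random

def variance_change_point(xs):
    """Return (k_hat, stat): the index maximizing the CUSUM of squares,
    a crude estimator of a single change-point in the variance."""
    n = len(xs)
    sq = [x * x for x in xs]
    total = sum(sq)
    best_k, best_val = 1, 0.0
    partial = 0.0
    for k in range(1, n):
        partial += sq[k - 1]
        val = abs(partial - k * total / n)  # deviation from the no-change line
        if val > best_val:
            best_k, best_val = k, val
    return best_k, best_val / n ** 0.5

# A sequence whose standard deviation jumps from 1 to 3 at observation 150.
random.seed(42)
xs = [random.gauss(0.0, 1.0) for _ in range(150)] + \
     [random.gauss(0.0, 3.0) for _ in range(150)]
k_hat, stat = variance_change_point(xs)
```

As the text cautions, the argmax need not coincide exactly with the true change time, although with a jump this pronounced it should land near observation 150.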
In the previous setup, which was essential to test for changes in the variance, we
assumed that the stock price S(t) follows a geometric Brownian motion. That the
test statistic is not applicable in every model will be clear if we assume that the
stock price, say S_H(t), follows a geometric fractional Brownian motion instead of a
geometric Brownian motion. Hence, instead of (6.4.5), we start with a geometric
fractional Brownian motion of order 0 < H < 1 of the form (cf. Csörgő (1999))

S_H(t) = S_H(0) exp(μt − (σ²/2) t^{2H} + σ W_H(t)),  t ≥ 0,    (6.5.2)

which is driven by a centered Gaussian process {W_H(t); 0 ≤ t < ∞} with stationary incre-
ments and W_H(0) = 0, i.e.,

E W_H(t) = 0,  E(W_H(t) W_H(s)) = ½ (t^{2H} + s^{2H} − |t − s|^{2H}),  s, t ≥ 0.
We note that H is the so-called Hurst constant. For further references to the frac-
tional Brownian motion in the context of financial modeling we refer to Salopek
(1997, Chapter 1) as well as to her references. Obviously, for H = ½ we have a geo-
metric Brownian motion. Using now this geometric fractional Brownian motion, we
may modify the Black-Scholes formula and calculate the price C of a European
call option at t₀ = 0 as follows:

C = S_H(0) Φ(d_H) − K e^{−rT} Φ(d_H − σT^H),  d_H = (log(S_H(0)/K) + rT + (σ²/2) T^{2H}) / (σ T^H),
where

Z_i^H := (Δt)^{−H} log (S_H(t_i)/S_H(t_{i−1}))

has variance σ², but the Z_i^H's fail to be independent if H ≠ ½. Note that under the
null hypothesis of no change, EZ_i^H depends on t_i^{2H} − t_{i−1}^{2H}, and

EZ_i^H = (Δt)^{−H} (rΔt − (σ²/2)(t_i^{2H} − t_{i−1}^{2H})).

Therefore the joint distribution of Z_1^H, …, Z_n^H will be different from the product of
the corresponding marginal distributions, and our proposed test statistic fails, due
to the dependence of the observations.
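The failure of independence for H ≠ ½ can be seen directly from the covariance of W_H above: for unit-spaced increments of W_H (fractional Gaussian noise), the lag-k autocovariance is γ_H(k) = ½((k+1)^{2H} − 2k^{2H} + (k−1)^{2H}), which vanishes for all k ≥ 1 only when H = ½. A quick numeric check (the function name fgn_autocovariance is ours):

```python
def fgn_autocovariance(h, k):
    """Autocovariance at lag k >= 1 of unit-spaced increments of W_H,
    derived from Cov(W_H(t), W_H(s)) = (t^{2H} + s^{2H} - |t - s|^{2H}) / 2."""
    return 0.5 * ((k + 1) ** (2 * h) - 2 * k ** (2 * h) + (k - 1) ** (2 * h))

gamma_half = fgn_autocovariance(0.5, 1)  # Brownian case: independent increments
gamma_high = fgn_autocovariance(0.8, 1)  # H > 1/2: positively correlated increments
gamma_low = fgn_autocovariance(0.3, 1)   # H < 1/2: negatively correlated increments
```

Only in the Brownian case H = ½ does the lag-1 autocovariance vanish, which is exactly why the log-differences stop being i.i.d. under geometric fractional Brownian motion.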
Summarizing, we can test for changes in the volatility when the stock prices
are assumed to follow a geometric Brownian motion. This assumption allows us
to produce i.i.d. r.v.'s by looking at the log-differences of the stock prices S(t). We
suggested a test statistic that will test for at most two changes in the volatility.
Using the results from the previous sections, we may also test for more, or also fewer,
change-points.
Bibliography
[1] Billingsley, P. (1986). Probability and Measure. 2nd ed. Wiley, New York.
[2] Birnbaum, Z.W. and Pyke, R. (1958). On Some Distributions Related to the Statistic D_n^+. Ann. Math. Statist., 29, 179-187.
[3] Black, F. and Scholes, M. (1973). The Pricing of Options and Corporate Liabilities. Journal of Political Economy, 81, 637-654.
[4] Brodsky, B.E. and Darkhovsky, B.S. (1993). Nonparametric Methods in Change-Point Problems. Kluwer, Dordrecht.
[5] Broemeling, L.D. and Tsurumi, H. (1987). Econometrics and Structural Change. Marcel Dekker, New York.
[6] Casella, G. and Berger, R. (1990). Statistical Inference. Duxbury Press, Belmont.
[7] Cramér, H. and Leadbetter, M.R. (1967). Stationary and Related Stochastic Processes; Sample Function Properties and their Applications. Wiley, New York.
[8] Csörgő, M. (1979). Brownian Motion - Wiener Process. Canad. Math. Bull., 22 (3), 257-279.
[9] Csörgő, M. (1983). Quantile Processes with Statistical Applications. Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania.
[10] Csörgő, M. (1999). Random Walking Around Financial Mathematics. Preprint.
[11] Csörgő, M., Csörgő, S. and Horváth, L. (1986). An Asymptotic Theory for Empirical Reliability and Concentration Processes. Springer-Verlag, Berlin.
[12] Csörgő, M. and Horváth, L. (1986). Invariance Principles for Changepoint Problems. Techn. Rep. Series of the Laboratory for Research in Statistics and Probability, Carleton U. - U. of Ottawa, No. 80.
[13] Csörgő, M. and Horváth, L. (1988a). Nonparametric Methods for Changepoint Problems. Handbook of Statistics, Vol. 7, 403-425. Elsevier Science Publishers B.V. (North-Holland).
[14] Csörgő, M. and Horváth, L. (1988b). Invariance Principles for Changepoint Problems. Journal of Multivariate Analysis, 27, 151-168.
[15] Csörgő, M. and Horváth, L. (1993). Weighted Approximations in Probability and Statistics. John Wiley, Chichester.
[16] Csörgő, M. and Horváth, L. (1997). Limit Theorems in Change-point Analysis. John Wiley, Chichester.
[17] Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.
[18] Donsker, M. (1951). An Invariance Principle for Certain Probability Limit Theorems. Mem. Amer. Math. Soc., No. 6.
[19] Doob, J.L. (1953). Stochastic Processes. John Wiley, New York.
[20] Duffie, D. (1996). Dynamic Asset Pricing Theory (2nd edn). Princeton University Press, Princeton, New Jersey.
[21] Eastwood, B.J. and Eastwood, V.R. (1998). Tabulating Weighted sup-Norm Functionals of Brownian Bridges via Monte Carlo Simulation. Asymptotic Methods in Probability and Statistics - a Volume in Honour of Miklós Csörgő (ed. B. Szyszkowicz), 707-719, Elsevier Science B.V., Amsterdam.
[22] Ferger, D. and Stute, W. (1992). Convergence of Changepoint Estimators. Stoch. Proc. Appl., 42, 345-351.
[23] Gombay, E. (1998). U-Statistics for Change under Alternatives. Preprint.
[24] Gombay, E. and Horváth, L. (1997). An Application of the Likelihood Method to Change-Point Detection. Environmetrics, 8, 459-467.
[25] Gombay, E., Horváth, L. and Hušková, M. (1996). Estimators and Tests for Change in the Variance. Statistics and Decisions, 14, 145-159.
[26] Hall, P. (1979). On the Invariance Principle for U-Statistics. Stoch. Proc. Appl., 9, 163-174.
[27] Hoeffding, W. (1948). A Class of Statistics with Asymptotically Normal Distribution. Ann. Math. Statist., 19, 293-325.
[28] Hoeffding, W. (1961). The Strong Law of Large Numbers for U-Statistics. Univ. of North Carolina Institute of Statistics Mimeo Series, No. 302.
[29] Hull, J. (1993). Options, Futures, and Other Derivative Securities (2nd edn). Prentice Hall International Editions, Englewood Cliffs, NJ.
[30] Huse, V. (1988). On Some Nonparametric Methods for Changepoint Problems.
PhD Thesis, Carleton University, Ottawa, Canada.
[31] Janson, S. and Wichura, M.J. (1983). Invariance Principles for Stochastic Area and Related Stochastic Integrals. Stoch. Proc. and their Appl., 16, 71-84.
[32] Karatzas, I. and Shreve, S. (1988). Brownian Motion and Stochastic Calculus. Springer-Verlag, New York, NY.
[33] Kendall, M.G. and Stuart, A. (1963). The Advanced Theory of Statistics. Volume 1, Distribution Theory. Charles Griffin & Company Limited, London.
[34] Kolmogorov, A.N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giorn. Inst. Ital. Attuari, 4, 83-91.
[35] Lee, C.-B. (1996). Nonparametric Multiple Change-point Estimators. Stat. and Prob. Letters, 27, 295-304.
[36] Levin, B. and Kline, J. (1985). The CUSUM Test of Homogeneity with an Application in Spontaneous Abortion Epidemiology. Statist. Med., 4, 469-488.
[37] Lombard, F. (1987). Rank Tests for Changepoint Problems. Biometrika, 74, 615-624.
[38] Lombard, F. and Carroll, R.J. (1992). Change Point Estimation via Running Cusums. Technical Report No. 155, 1992, Statistics Department, Texas A&M University.
[39] Loynes, R.M. (1970). An Invariance Principle for Reversed Martingales. Proc. Amer. Math. Soc., 25, 56-64.
[40] Major, P. (1979). An Improvement of Strassen's Invariance Principle. Ann. Probability, 7, 55-61.
[41] Miller, R.G., Jr. and Sen, P.K. (1972). Weak Convergence of U-Statistics and Von Mises' Differentiable Statistical Functions. Ann. Math. Statist., 43, 31-41.
[42] Resnick, S. (1992). Adventures in Stochastic Processes. Birkhäuser, Boston.
[43] Salopek, D.M. (1997). Tolerance to Arbitrage: Inclusion of Fractional Brownian Motion to Model Stock Price Fluctuations. PhD Thesis, Carleton University, Ottawa, Canada.
[44] Sen, P.K. (1977). Almost Sure Convergence of Generalized U-Statistics. Ann. Prob., 5, 287-290.
[45] Serbinowska, M. (1996). Consistency of the Estimator of the Number of Change-points in Binomial Observations. Statist. Probab. Letters, 29, 337-344.
[46] Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. John Wiley, Chichester.
[47] Shorack, G.R. and Wellner, J.A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.
[48] Slutsky, E. (1925). Über stochastische Asymptoten und Grenzwerte. Math. Annalen, 5, 93.
[49] Spanos, A. (1986). Statistical Foundations of Econometric Modelling. Cambridge University Press, Cambridge.
[50] Szyszkowicz, B. (1991). Weighted Stochastic Processes under Contiguous Alternatives. C.R. Math. Rep. Acad. Sci. Canada, 13, 211-216.
[51] Szyszkowicz, B. (1992). Weak Convergence of Stochastic Processes in Weighted Metrics and their Applications to Contiguous Changepoint Analysis. PhD Thesis, Carleton University, Ottawa, Canada.
[52] Szyszkowicz, B. (1995). Weighted Sequential Empirical Type Processes with Applications to Change-point Problems. Techn. Rep. Series of the Laboratory for Research in Statistics and Probability, Carleton U. - U. of Ottawa, No. 276.
[53] Szyszkowicz, B. (1998). Weighted Sequential Empirical Type Processes with Applications to Change-point Problems. Handbook of Statistics, Vol. 16 (N. Balakrishnan and C.R. Rao, eds.), 573-630.
[54] Vostrikova, L.Ju. (1981). Detecting "Disorder" in Multidimensional Random Processes. Soviet Mathematics Doklady, 24, 55-59.
[55] Yao, Q. (1993). Tests for Change-Points with Epidemic Alternatives. Biometrika, 80, 179-191.
[56] Yao, Y.-C. (1988). Estimating the Number of Change-points via Schwarz's Criterion. Statist. and Probab. Letters, 6, 181-189.