TRANSCRIPT
INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

Bell & Howell Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
Multiple Changepoints with an
Application to Financial Modeling
A thesis submitted to
the Faculty of Graduate Studies
in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy
School of
Mathematics and Statistics
Carleton University
Ottawa, Ontario, Canada
April 1999
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
"Cogito ergo sum."
- René Descartes
Abstract

This dissertation deals with asymptotic methods in probability and statistics which may, for example, be used in business and finance. In particular, we study U-statistics based processes which may be used to detect (multiple or structural) changes in the distribution of independent observations.

In Chapter 1 we give an overview of well-known results which will be used throughout this thesis. Chapters 2 - 5 deal with change-point analysis.

Chapter 2 summarizes some basic properties of U-statistics and their use in the context of change-point analysis.

In Chapter 3 we review and study U-statistics based processes under the null hypothesis of no change as well as under the alternative hypothesis of at most one change in the distribution.

Chapter 4 deals with U-statistics based processes when the alternative allows at most two changes in their distribution. As special cases, epidemic alternatives are investigated as well.

Using similar arguments as in Chapter 3 and Chapter 4, we extend our theory of at most one and two changes to multiple change-points in Chapter 5.

Besides the mentioned general theory, in Chapters 3 - 5 we study in particular changes in the mean and the variance, respectively. In addition to that, we study multiple changes in the mean and/or variance and give an application in Chapter 6, which deals with changes of volatility in the financial stock market, using the Black-Scholes set-up.

The many repetitions in describing the hypotheses considered are done on purpose for the convenience of the reader. We prefer to give as many details as possible rather than to send the reader back to previous chapters.

In Chapter 1, Chapter 2 and Chapter 3 we summarize well-known results. In addition, we discuss in Chapter 3 some results which are believed to be new. Results in Chapters 4 and 5, as well as their application in Chapter 6, are also believed to be new. Theorems, corollaries and lemmas due to other authors will be accompanied by their names in brackets.
Acknowledgements

Before thanking all those people who guided and helped me over the past years and who were a very important part of my life, I would like to quote a part of a story by Charles Krauthammer¹ about the extraordinary mathematician Paul Erdős:

A few years ago, Ron Graham (a friend and benefactor of Erdős) tells me, Erdős heard of a promising young mathematician who wanted to go to Harvard but was short the money needed. Erdős arranged to see him and lent him $1,000. (The sum total of the money Erdős carried around at any one time was about $30.) He told the young man he could pay it back when he was able to. Recently, the young man called Graham to say that he had gone through Harvard and was now teaching at Michigan and could finally pay the money back. What should he do?

Graham consulted Erdős. Erdős said, "Tell him to do with $1,000 what I did."

The reason why I tell this story is that a similar thing happened to me. If it were not for all the help of my Professors Miklós Csörgő (Carleton University) and Pál Révész (Technical University of Vienna) it would not have been possible for me to study in Canada. I would like to thank them for the opportunity they have given me and I hope one day I will be able to do the same for someone else.

I would like to thank Professor Miklós Csörgő for all the discussions and meetings we had, for all his suggestions, his patience, perseverance and financial support. I have learned a lot from him. For that I am honored to have been his student. What makes Professor Csörgő truly exceptional is that he continues to teach his students, even though he is officially retired.

¹Washington Post Writers Group, A life that added up to something. This article may be found on the following webpage: http://cecm.sfu.ca/personal/jborwein/erdos.html
As well, I am thankful to the School of Mathematics and Statistics for all their financial support. In addition, my thanks also go to: the Pacific Institute for the Mathematical Sciences for inviting me to participate in the Industrial Problem Solving Workshop in Calgary last June; the Paul Erdős Center of Mathematics in Budapest for inviting me to the Workshop on Random Walks last July to give a talk on my research; Professor Barbara Szyszkowicz (Carleton University) for funding to attend the Meeting of the Canadian Mathematical Society in Kingston last December; and the Fields Institute for inviting me to attend the Workshop on Probability in Finance last January.
I would also like to thank Dr. Ričardas Zitikis of the Laboratory for Research in Statistics and Probability, Carleton University, for all the discussions and meetings we had, his encouragement and everything he taught me. As well, many thanks to Professor Pál Révész, who was my 'Diplomarbeit' (thesis) supervisor in Vienna, Austria, for teaching me all the basics I needed and for offering a helping hand the last couple of years.
My experience in Canada would not have been worthwhile if it were not for my friends, who were there for me whenever I needed them and who made my stay in Ottawa so enjoyable and unforgettable. In particular, I would like to thank Kulwinder Saini, Paul Wear and Nick Xenos for their proofreading on several occasions. I would also like to express my gratitude to Mrs. Gillian Murray and Mrs. Adrienne Richter for the many questions they answered for me with so much patience.

To my best friend and fiancée, Silvia: her love, support and understanding sustained me over the past years. Finally, I would like to thank those without whose help, love, understanding and support I would not have come this far: my family. Thanks for everything!
Dedicated to my parents, Melitta and Albert Orasch
Contents

Acceptance sheet . . . ii
Abstract . . . iii
Acknowledgements . . . v
Dedication . . . vii
Table of Contents . . . viii
List of Figures . . . xi
List of Symbols . . . xiii

1 Preliminaries 1
1.1 Basic Definitions . . . 1
1.2 Stochastic Processes . . . 3

2 U-Statistics in Change-point Analysis 8
2.1 Introduction . . . 8
2.2 Motivation . . . 11
2.3 Definition of U-Statistics . . . 12
2.4 Generalized U-Statistics . . . 14
2.5 Variance of U-Statistics . . . 16
2.6 Some Convergence Results for U-Statistics . . . 17

3 At Most One Change-point 21
3.1 Introduction . . . 21
3.1.1 Notations under the Null Hypothesis H_0 . . . 22
3.1.2 Notations under the Alternative H_A . . . 26
3.2 Antisymmetric Kernels . . . 28
3.2.1 Asymptotic Results under H_0 . . . 28
3.2.2 Asymptotic Results under H_A . . . 33
3.2.3 Estimating the Time of Change . . . 33
3.3 Symmetric Kernels . . . 35
3.3.1 Asymptotic Results under H_0 . . . 35
3.3.2 Asymptotic Results under H_A . . . 40
3.3.3 Estimating the Time of Change . . . 45
3.4 Change in the Mean . . . 53
3.5 Change in the Variance . . . 61

4 At Most Two Change-points 70
4.1 Introduction . . . 70
4.1.1 Notations under the Null Hypothesis H_0 . . . 72
4.1.2 Notations under the Alternative H_A^(2) . . . 74
4.2 Antisymmetric Kernels . . . 76
4.2.1 Asymptotic Results under H_0 . . . 76
4.2.2 Asymptotic Results under H_A^(2) . . . 85
4.3 Symmetric Kernels . . . 92
4.3.1 Asymptotic Results under H_0 . . . 92
4.3.2 Asymptotic Results under H_A^(2) . . . 100
4.4 Changes in the Mean . . . 109
4.5 Changes in the Variance . . . 115
4.6 Epidemic Alternatives . . . 119

5 Multiple Change-points 131
5.1 Introduction . . . 131
5.1.1 Notations under the Null Hypothesis H_0 . . . 133
5.1.2 Notations under the Alternative H_A^(s) . . . 136
5.2 Antisymmetric Kernels . . . 138
5.2.1 Asymptotic Results under H_0 . . . 138
5.2.2 Asymptotic Results under H_A^(s) . . . 148
5.3 Symmetric Kernels . . . 154
5.3.1 Asymptotic Results under H_0 . . . 154
5.3.2 Asymptotic Results under H_A^(s) . . . 163
5.4 Multiple Changes in the Mean . . . 172
5.5 Multiple Changes in the Variance . . . 174
5.6 Multiple Changes in the Mean and/or Variance . . . 175
5.7 Estimating the Number of Change-points . . . 177

6 Applying Change-point Theory to the Financial Market 186
6.1 Introduction . . . 186
6.2 Derivative Securities . . . 187
6.2.1 Forward Contracts . . . 187
6.2.2 Futures Contracts . . . 188
6.2.3 Options . . . 188
6.3 Modeling the Behavior of Stock Prices . . . 190
6.4 The Black-Scholes Formula . . . 194
6.5 Changes in the Volatility . . . 198

Bibliography 203
List of Figures

3.2.1 The limiting function u_λ(t) with θ_{1,2} = 10 takes its maximum value of 2.2 at t = λ = 1/2. Note: the x-axis denotes t and the y-axis denotes u_λ(t). . . . 34
3.3.2 Summation Area . . . 42
3.3.3 The limiting function u_λ(t) with θ_1 = 1, θ_2 = 2 and θ_{1,2} = 3/10 takes its maximum value of 0.55 at t = 0.475. . . . 46
3.3.4 The limiting function u_λ(t) with θ_1 = 3, θ_2 = 2 and θ_{1,2} = 1 takes its maximum value of 5 at t = λ. . . . 47
3.3.5 The limiting function v_λ(t) with θ_1 = 1, θ_2 = 2 and θ_{1,2} = 3/10 takes its maximum value of 0.63 at t = λ = 7/10. . . . 50
3.4.6 The data X_1, ..., X_100 are i.i.d. N(1,1)-distributed and X_101, ..., X_1000 are i.i.d. N(4,1)-distributed. . . . 54
3.4.7 A geometrical interpretation of E{nS(k) - kS(n)} = 0 under H_0. . . . 56
3.5.8 The data X_1, ..., X_700 are i.i.d. N(0,1)-distributed and X_701, ..., X_1000 are i.i.d. N(0,4)-distributed. . . . 62
4.4.1 The data X_1, ..., X_300 are i.i.d. N(0,1)-distributed, X_301, ..., X_700 are i.i.d. N(3,1)-distributed and X_701, ..., X_1000 are i.i.d. N(2,1)-distributed. . . . 110
4.4.2 A geometrical interpretation of E{k_2 S(k_1) + (n - k_1) S(k_2) - k_2 S(n)} = 0 under H_0. . . . 111
4.5.3 The data X_1, ..., X_100 are i.i.d. N(0,2)-distributed, X_101, ..., X_300 are i.i.d. N(0,1)-distributed and X_301, ..., X_1000 are i.i.d. N(0,3)-distributed. . . . 116
4.6.4 The data X_1, ..., X_300 and X_701, ..., X_1000 are i.i.d. N(0,1)-distributed and X_301, ..., X_700 are i.i.d. N(3,1)-distributed. . . . 120
4.6.5 Summation Area . . . 124
4.6.6 The limiting function u_{1/3,2/3}(t_1, t_2) with θ_{1,2} = 10 takes its maximum value of 2.2 at the point (1/3, 2/3). . . . 125
4.6.7 The limiting function u_{1/10,2/10}(t_1, t_2) with θ_{1,2} = 10 takes its maximum value of 0.9 at the point (1/10, 2/10). . . . 126
4.6.8 The limiting function u_{1/10,9/10}(t_1, t_2) with θ_{1,2} = 10 takes its maximum value of 1.6 at the point (1/10, 9/10). . . . 127
4.6.9 The limiting function u_{λ1,λ2}(t_1, t_2) with θ_{1,2} = 10, θ_{1,3} = 20 and θ_{2,3} = 12 takes its maximum value of 3.05. . . . 130
6.3.1 Daily stock prices with different volatilities σ_1 = 0.2 and σ_2 = 0.4. . . . 193
6.5.2 Price of a European call option depending on σ for K = $15 (above) and K = $25 (below). . . . 199
List of Symbols

statement holds almost surely
number of elements in the set A
Brownian bridge
expectation of the random variable X
null hypothesis
alternative hypothesis of s changes
kernel function with (x, y) ∈ R × R
indicator function: I(X ≤ x) = 1 if X ≤ x, otherwise 0
independent identically distributed random variables
natural logarithm
set of all positive integers: {1, 2, 3, ...}
probability space
probability that the random variable X is less than or equal to x
set of all real numbers: {x : -∞ < x < ∞}
sum of the squared random variables X_1, ..., X_n
number of change-points
price process of a stock, 0 ≤ t ≤ T
sum of the random variables X_1, ..., X_n
Wiener process
X is a normal random variable with mean μ and variance σ²
largest integer less than or equal to x
average of the random variables X_1, ..., X_n
lim_{c→∞} sup_n P{|X_n| > c} = 0, i.e., X_n is bounded in probability
X_n converges to zero in probability
X_n converges almost surely to zero
variance of the random variable X
equality in distribution
convergence of real numbers
convergence almost surely
convergence in probability
convergence in distribution
equal by definition, where the new expression is on the dotted side
indicates the end of a proof
"A mathematician is a machine for
turning coffee into theorems."
- Pál Erdős
Chapter 1
Preliminaries
In this chapter we give some important definitions used throughout this thesis,
which focuses on probability and statistics as well as on stochastic processes.
1.1 Basic Definitions
In this section some fundamental concepts of probability theory are given. The basic mathematical tool for this purpose is measure theory. We give a brief survey of some concepts that will be required in the following chapters. For further references we refer, for example, to Billingsley (1986).
Definition 1.1.1 Ω, an arbitrary space or set of points ω, stands for the set of all possible outcomes ω of an experiment. A class F of subsets¹ of an arbitrary nonempty space Ω is called an algebra or field if the following are satisfied:

Ω ∈ F,

A ∈ F implies Aᶜ ∈ F, where Aᶜ denotes the complement of A,

A, B ∈ F implies A ∪ B ∈ F.

A class F of subsets of Ω is called a σ-field if, in addition to being a field, the field also satisfies

A_1, A_2, ... ∈ F implies A_1 ∪ A_2 ∪ ... ∈ F.

¹Many authors also use the symbol A instead of F.
Definition 1.1.2 A set function is a real-valued function defined on some class of subsets of Ω. A set function μ on a field F of subsets of Ω is a countably additive measure if the following is satisfied:

μ(A) ∈ [0, ∞] for A ∈ F,

μ(∅) = 0,

if ∪_{k=1}^∞ A_k ∈ F for a disjoint sequence of F-sets A_1, A_2, ..., then μ(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ μ(A_k),

the so-called countable additivity.

Moreover, the triple (Ω, F, μ) is a measure space if μ is a measure on a σ-field F of subsets of Ω. The pair (Ω, F) is a measurable space if F is a σ-field of subsets of Ω.
Definition 1.1.3 Consider two measurable spaces (Ω, F) and (Ω′, F′) and the mapping f : Ω → Ω′. Then f is measurable if f⁻¹(A′) = {ω ∈ Ω : f(ω) ∈ A′} ∈ F for every A′ ∈ F′. In a probability context, a real measurable function is called a random variable.
Definition 1.1.4 We call a set function P on a field F a probability measure if the following is satisfied:

0 ≤ P(A) ≤ 1 for all A ∈ F,

P(∅) = 0 and P(Ω) = 1,

if ∪_{k=1}^∞ A_k ∈ F for a disjoint sequence of F-sets A_1, A_2, ..., then P(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ P(A_k).

We call the triple (Ω, F, P) a probability (measure) space if F is a σ-field of subsets of Ω and P is a probability measure on F.
1.2 Stochastic Processes
In this section the general concept of a stochastic process and some of the most important properties of such processes will be introduced. We will also introduce the notion of a Gaussian process, which plays a fundamental role when talking about Wiener processes, Brownian bridges and Ornstein-Uhlenbeck processes. After introducing these stochastic processes we state Donsker's theorem, which is of crucial importance when dealing with weak convergence of partial sums. For further references on concepts related to stochastic processes we refer, for example, to Cramér and Leadbetter (1967).
Definition 1.2.1 Let (Ω, F, P) be a probability (measure) space and T be a given parameter set. Then a finite and real-valued function X = {X(t) = X(t, ω); t ∈ T} which is a measurable function of ω ∈ Ω for every fixed t is called a stochastic process.

The index set T is said to be discrete if it contains at most countably many points; otherwise it is said to be continuous.

A one-parameter stochastic process is a stochastic process with a one-dimensional index set T, while an s-parameter stochastic process has an s-dimensional index set.

If we fix t ∈ T, then X(t, ω), ω ∈ Ω, is a random variable, and if we fix ω ∈ Ω, then X(t, ω), t ∈ T, is the so-called sample path or trajectory function of X.
Next we define a special stochastic process which plays a crucial role in the area of stochastics.
Definition 1.2.2 Let X = {X(t) : t ∈ T} be a stochastic process on (Ω, F, P). If all its finite-dimensional distributions are normal, or in other words, if every finite-dimensional vector (X(t_1), ..., X(t_k)), t_i ∈ T, i = 1, ..., k, k ∈ N, of the process X has a multivariate normal distribution, then X is called a Gaussian process.

A Gaussian process is uniquely determined by its mean and its covariance function, since these two specify uniquely any multivariate normal distribution. One of
the most important Gaussian processes is the Wiener process. Throughout the rest of this section we use the notations from Csörgő and Révész (1981, Chapter 1).
Definition 1.2.3 A stochastic process {W(t; ω) = W(t); 0 ≤ t < ∞}, where ω ∈ Ω and (Ω, F, P) is a probability space, is called a (standard) Wiener process or Brownian motion if the following is satisfied:

W(t) - W(s) =_D N(0, t - s) for all 0 ≤ s < t < ∞, and W(0) = 0,

W(t) is an independent increment process, that is, W(t_2) - W(t_1), W(t_4) - W(t_3), ..., W(t_{2i}) - W(t_{2i-1}) are independent r.v.'s for all 0 ≤ t_1 < t_2 ≤ t_3 < t_4 ≤ ... ≤ t_{2i-1} < t_{2i} < ∞ (i = 2, 3, ...),

the sample path function W(t; ω) is continuous in t with probability 1.

The property that W(0) = 0 is merely a normalization and is a convenience rather than a basic requirement. If a process {W̃(t); 0 ≤ t < ∞} satisfies the other assumptions but not W̃(0) = 0, then the process {W̃(t) - W̃(0); 0 ≤ t < ∞} would be a standard Wiener process. The property that the sample path is continuous in t with probability 1 follows from the first two properties in the definition of a Wiener process in the sense that, given a process W with the first two properties, there always exists a version which satisfies the third property (cf. Doob (1953) and Resnick (1992)).
Clearly, the first two properties imply that the covariance function of a Wiener process is

EW(s)W(t) = EW(s ∧ t)W(s ∨ t)
= E(W(s ∧ t)W(s ∨ t) - W²(s ∧ t) + W²(s ∧ t))
= EW(s ∧ t)(W(s ∨ t) - W(s ∧ t)) + EW²(s ∧ t)
= EW(s ∧ t) E(W(s ∨ t) - W(s ∧ t)) + EW²(s ∧ t)
= s ∧ t,

using the independence of increments and EW(u) = 0, EW²(u) = u.
A constructive proof for the existence of this process is, for example, given by Csörgő and Révész (1981). For a historical summary and development of a Wiener process, as well as for further results, we refer to Csörgő (1979).
We now define another Gaussian process of crucial importance, the so-called Brownian bridge.

Definition 1.2.4 A stochastic process {B(t); 0 ≤ t ≤ 1} is called a Brownian bridge or tied down Brownian motion if the following is satisfied:

the joint distribution of B(t_1), ..., B(t_n) (0 ≤ t_1 < t_2 < ... < t_n ≤ 1; n = 1, 2, ...) is Gaussian, with EB(t) = 0,

the covariance function of B(t) is EB(s)B(t) = s ∧ t - st,

the sample path function of B(t; ω) is continuous in t with probability 1.

From the second property it follows that EB²(t) = t(1 - t), 0 ≤ t ≤ 1.

The existence of such a Gaussian process is an easy consequence of the following:
Lemma 1.2.1 Let {W(t); 0 ≤ t < ∞} be a Wiener process. Then

B(t) = W(t) - tW(1), 0 ≤ t ≤ 1,

is a Brownian bridge.
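Lemma 1.2.1 is easy to check by simulation. The sketch below is an illustration added here (not part of the thesis): it approximates W(t) on a grid by scaled partial sums of Gaussian increments, applies the standard transformation B(t) = W(t) - tW(1), and estimates Var B(t), which should be close to t(1 - t).

```python
import random
import math

def wiener_path(n, rng):
    """Approximate W(t) on the grid t = k/n by cumulative Gaussian increments."""
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, 1.0) / math.sqrt(n))
    return w  # w[k] approximates W(k/n)

def bridge_from_wiener(w):
    """B(t) = W(t) - t*W(1) on the same grid."""
    n = len(w) - 1
    return [w[k] - (k / n) * w[-1] for k in range(n + 1)]

rng = random.Random(0)
n, reps, t_idx = 100, 10000, 30            # evaluate at t = 0.3
vals = []
for _ in range(reps):
    b = bridge_from_wiener(wiener_path(n, rng))
    vals.append(b[t_idx])

var_hat = sum(v * v for v in vals) / reps  # EB(t) = 0, so this estimates Var B(t)
print(round(var_hat, 2))                   # should be close to 0.3 * 0.7 = 0.21
```

On the grid, Var B(0.3) equals 0.3 + 0.09 - 2(0.3)(0.3) = 0.21 exactly, so the Monte Carlo estimate matches the lemma's covariance s ∧ t - st.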
When considering a Wiener process, we can see that the increments of the process over non-overlapping time intervals are stationary and independent, but the process itself is not stationary. That is to say, we do not have the property: given a stochastic process {X(t); t ∈ T}, then for any positive integer k and any points t_1, ..., t_k ∈ T the joint distribution of X(t_1), ..., X(t_k) is the same as the joint distribution of X(t_1 + h), ..., X(t_k + h) for all h ∈ T, i.e., (X(t_1), ..., X(t_k)) =_D (X(t_1 + h), ..., X(t_k + h)). This means that a stationary process is a stochastic process whose finite-dimensional distributions remain unchanged through shifts in time. One can show that the covariance function EX(s)X(t) of such a process is
a function of |s - t|, the length of the interval. Hence, due to their respective covariance functions, it is obvious that Wiener and Brownian bridge processes are not stationary.

But we can define a stochastic process by modifying a Wiener process, such that the new process is stationary. Consider the Gaussian process {U(t) = W(t)/√t; 0 < t < ∞}. By the definition of a Wiener process it follows that EU(t) = 0, EU²(t) = 1 and EU(s)U(t) = √(s/t) for 0 < s ≤ t < ∞.
Definition 1.2.5 We define the stationary Gaussian process V(t), the so-called Ornstein-Uhlenbeck process, via

V(t) = e^{-t} W(e^{2t}), 0 ≤ t < ∞.

We have that EV(t) = 0, EV²(t) = 1, and EV(s)V(t) = e^{-|t-s|}.
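The covariance EV(s)V(t) = e^{-|t-s|} can also be checked empirically. The sketch below (an illustration added here, not from the thesis) samples the pair (V(s), V(t)), where V(u) = e^{-u} W(e^{2u}) is the standard time-changed-Wiener representation consistent with the covariance stated above, using the independence of Wiener increments:

```python
import random
import math

def ou_pair(s, t, rng):
    """Sample (V(s), V(t)) for V(u) = exp(-u) * W(exp(2u)), assuming s < t.

    W is needed only at a = exp(2s) and b = exp(2t), so we use
    W(a) ~ N(0, a) and W(b) - W(a) ~ N(0, b - a), independent.
    """
    a, b = math.exp(2 * s), math.exp(2 * t)
    wa = rng.gauss(0.0, math.sqrt(a))
    wb = wa + rng.gauss(0.0, math.sqrt(b - a))
    return math.exp(-s) * wa, math.exp(-t) * wb

rng = random.Random(1)
s, t, reps = 0.3, 0.8, 100000
acc = 0.0
for _ in range(reps):
    vs, vt = ou_pair(s, t, rng)
    acc += vs * vt
cov_hat = acc / reps
print(round(cov_hat, 2), round(math.exp(-abs(t - s)), 2))
```

The two printed values should nearly agree, reflecting that the covariance depends only on |t - s|, i.e., the process is stationary.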
Next, for further use, we quote the celebrated Donsker theorem. Let X_1, X_2, ... be i.i.d.r.v.'s with EX_1 = 0, EX_1² = 1 and distribution function F. Further, we put S_0 = 0 and S_n = S(n) = Σ_{i=1}^n X_i. We construct a sequence of stochastic processes {S_n(t); 0 ≤ t ≤ 1} on C(0, 1), the space of all continuous functions on the interval (0, 1), from the partial sums S_0, S_1, S_2, ..., S_n as follows:

S_n(t) = (S([nt]) + (nt - [nt]) X_{[nt]+1}) / √n, 0 ≤ t ≤ 1.

We quote Donsker's theorem (1951) as follows:

Theorem 1.2.1 (Donsker, 1951) We have, as n → ∞,

h(S_n(t)) →_D h(W(t))
for every continuous functional h : C(0, 1) → R, with respect to the sup-norm topology.

Before applying Donsker's theorem to some special functionals, we quote

Theorem 1.2.2 (Slutsky, 1925) Let X_n →_D X and Y_n →_P c, where c is a finite constant. Then, as n → ∞, X_n + Y_n →_D X + c, X_n Y_n →_D cX, and X_n / Y_n →_D X/c if c ≠ 0.
Donsker's theorem immediately implies that, as n → ∞,

S_n(t) →_D W(t), for t fixed in (0, 1].

Combined with the fact that (nt - [nt]) X_{[nt]+1} / √n →_P 0, Slutsky's theorem implies, as n → ∞,

S([nt]) / √n →_D W(t), for t fixed in (0, 1].
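As a quick numerical illustration (added here, not from the thesis), the convergence S([nt])/√n →_D W(t), i.e. to an N(0, t) variable, can be observed even for summands far from normal. Below, Rademacher (±1) variables (EX = 0, EX² = 1) are used, and the empirical mean and variance of S([nt])/√n are compared with 0 and t:

```python
import random

def scaled_partial_sum(n, t, rng):
    """S([nt]) / sqrt(n) for i.i.d. Rademacher summands (EX = 0, EX^2 = 1)."""
    k = int(n * t)
    s = sum(1 if rng.random() < 0.5 else -1 for _ in range(k))
    return s / n ** 0.5

rng = random.Random(2)
n, t, reps = 200, 0.5, 20000
draws = [scaled_partial_sum(n, t, rng) for _ in range(reps)]
mean_hat = sum(draws) / reps
var_hat = sum(d * d for d in draws) / reps
print(round(mean_hat, 2), round(var_hat, 2))  # approximately 0 and t = 0.5
```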
Moreover, we also have, as n → ∞,
"If I feel unhappy, I do mathematics to become happy.
If I am happy, I do mathematics to keep happy."
- Alfréd Rényi
Chapter 2

U-Statistics in Change-point Analysis

2.1 Introduction
The problem of abrupt parameter changes arises in many situations of daily life, as well as in a variety of experimental and mathematical sciences. For instance, in medicine one may be interested in testing whether treating HIV patients with a new drug stabilized the condition of the patients, and if not, in estimating the time(s) of change(s) in order to change the treatment. Another example is variance analysis in daily stock prices, where the use of change detection methods allows an investor to reduce his/her risk. In Chapter 6 a particular financial model is described and a special detection method is suggested. Detection of possible change-points is, for instance, also of interest in archaeology, in econometrics, in epidemiology, in nuclear physics and in quality control.

In practice, usually a (large) set of data is observed, for instance the daily stock prices over the past year. Then a statistical test should determine whether there was a change in the data or not. This set of data may be modeled by saying that we observe independent random variables over a special period of time, hence via a random process. Then we wish to detect whether a change could have occurred in the distribution that governs this random process as time goes by.

We wish to study such phenomena in terms of special stochastic processes based on U-statistics. It is needless to say that there are many other ways to study such phenomena. The construction of these processes is such that statistical tests can be
based on them for detecting possible changes in their distribution.

Change-point problems have originally arisen in the context of quality control, where one typically observes the output of a production line and would wish to signal deviations from an acceptable output level while observing the data. When one observes such a random process sequentially and stops observing at a random time of detecting change, then one speaks of a sequential procedure (e.g., stop a production line if a specified percentage of the output is not good). Otherwise, one usually observes a change in a chronologically ordered finite sequence for the sake of determining possible change(s) during the data collection (e.g., check whether a production line produced reasonable output or not). Most such fixed sample size non-sequential procedures are described in terms of asymptotic results ('infinite' sample size, i.e., n → ∞).

Depending on whether the distribution of the data is assumed to be known, we use either parametric or nonparametric models. For parametric models we refer to the survey of Csörgő and Horváth (1997) at the end of the first chapter of their book. For nonparametric cases we refer, for example, to Brodsky and Darkhovsky (1993), Csörgő and Horváth (1988a, 1988b, 1997), Ferger and Stute (1992), Szyszkowicz (1992, 1996, 1998), as well as to their bibliographies.
We now state the problem of testing for a change in the data in a more mathematical way. Suppose we would like to test the null hypothesis

H_0 : X_1, ..., X_n are independent identically distributed random variables

against the alternative that there is at most one change-point in the sequence X_1, ..., X_n, namely that we have

H_A : X_1, ..., X_n are independent random variables and there is an integer τ, 1 ≤ τ < n, such that P{X_1 ≤ t} = ... = P{X_τ ≤ t}, P{X_{τ+1} ≤ t} = ... = P{X_n ≤ t} for all t, and P{X_τ ≤ t_0} ≠ P{X_{τ+1} ≤ t_0} for some t_0.

This means that we are testing for having n independent random variables belonging to the same distribution, versus the first τ ones belonging to the same distribution
and the last n - τ ones to a different one. Therefore we will compare the first k observations to the last n - k ones by using a bivariate function h(x, y) that is often called the kernel function in the literature on U-statistics (see Section 2.3).
Instead of the alternative H_A of at most one change, we will also consider a more general one that allows at most s change-points¹ in the sequence X_1, ..., X_n, namely that we have

H_A^(s) : X_1, ..., X_n are independent random variables and there exist s, 1 ≤ s < n, integers τ_1 = τ_1(n), τ_2 = τ_2(n), ..., τ_s = τ_s(n), 1 ≤ τ_1 ≤ τ_2 ≤ ... ≤ τ_s < n, such that P{X_1 ≤ t} = ... = P{X_{τ_1} ≤ t}, P{X_{τ_1 + 1} ≤ t} = ... = P{X_{τ_2} ≤ t}, ..., P{X_{τ_s + 1} ≤ t} = ... = P{X_n ≤ t} for all t, and P{X_{τ_i} ≤ t_0} ≠ P{X_{τ_i + 1} ≤ t_0} for some t_0 and for all 1 ≤ i ≤ s.
Similarly as before, we are now testing for at most s changes in the distribution. Hence, we will have to use a statistic that will somehow 'feel' the possibility of s changes. We will split the given sample of size n into s + 1, 1 ≤ s < n, blocks, compare each of them, and combine the corresponding kernel functions h(x, y) appropriately.
In view of Chapters 3 - 6, here we give an overview of the so-called U-statistics which, for example, will be used to detect changes in the mean or the variance. We give some basic results, definitions, and examples (cf. Serfling (1980), Chapter 5). This class of statistics was introduced in a fundamental paper by Hoeffding (1948). The members of this class have good consistency properties and we only assume that our n observations are independent and identically distributed. An appealing feature of a U-statistic (cf. Serfling (1980), Section 5.3) is its simple structure as a sum of identically, but not necessarily independently, distributed random variables. However, by the special device of 'projection', a U-statistic may be approximated by a sum of i.i.d. r.v.'s and then classical limit theory for sums does carry over to U-statistics. For proofs we again refer to Serfling (1980).
¹In this as well as in the following parts of this work s will always denote the number of changes. Hence, whenever we are talking about s changes, we think of s as being an integer, i.e., s ∈ ℕ.
2.2 Motivation
As mentioned earlier, large parts of this thesis focus on constructing stochastic processes based on U-statistics. The construction of these processes is such that statistical tests can be based on them for detecting possible changes in their distribution.
Tests for at most one change-point which are based on processes of U-statistics were first studied by Csörgő and Horváth (1986, 1988b, 1997). They investigate the asymptotic properties (as n → ∞) of the U-statistics based process

Z_k := Σ_{1≤i≤k} Σ_{k<j≤n} h(X_i, X_j), 1 ≤ k < n, (2.2.1)

where the kernel h(x, y) is either symmetric, i.e.,

h(x, y) = h(y, x) for all x, y ∈ ℝ, (2.2.2)

or antisymmetric, i.e.,

h(x, y) = −h(y, x) for all x, y ∈ ℝ. (2.2.3)

Typical choices for (2.2.2) are xy, ½(x − y)² (sample variance), |x − y| (Gini's mean difference), or sign(x + y) (Wilcoxon's one-sample statistic).

Typical choices for (2.2.3) are (x − y) or sign(x − y), since sign(x − y) = −sign(y − x) and sign(0) = 0. Later on we will use ½(x − y)² to detect changes in the variance and x − y to detect changes in the mean.
Csörgő and Horváth (1988b, 1997) give various asymptotic distributions of the U-statistics based process {Z_k, 1 ≤ k < n} under the null hypothesis H_0 and the alternative H_A for symmetric and antisymmetric kernels. They also give tests that can be used to reject H_0 vs. H_A.
2.3 Definition of U-Statistics
Throughout this section we refer to Serfling (1980) and Casella and Berger (1990).
Definition 2.3.1 A parameter θ is said to be estimable of degree r for a family of distributions 𝓕, if r is the smallest sample size for which there exists a function h*(x_1, ..., x_r) such that

E_F h*(X_1, ..., X_r) = θ (2.3.1)

for all F ∈ 𝓕, where X_1, ..., X_r are independent observations on a distribution F. h*(x_1, ..., x_r) is called the kernel of θ and does not depend on F.

If h*(x_1, ..., x_r) = h*(x_{α_1}, ..., x_{α_r}) for all permutations (α_1, ..., α_r) of the integers (1, ..., r), then h*(x_1, ..., x_r) is called a symmetric kernel.
Let

h(x_1, ..., x_r) := (1/r!) Σ_p h*(x_{α_1}, ..., x_{α_r}), (2.3.2)

where Σ_p denotes the summation over all r! permutations (α_1, ..., α_r) of (1, ..., r). Then h of (2.3.2) is always a symmetric kernel.
Definition 2.3.2 We define the mean square error of an estimator V_n of a parameter θ to be the function of θ defined by

MSE_θ(V_n) := E_F(V_n − θ)².

Moreover, (E_F V_n − θ) is called the bias of a point estimator V_n of a parameter θ. V_n depends only on the random sample X_1, ..., X_n, i.e., V_n = V(X_1, ..., X_n). If the bias is identically (in θ) equal to 0, then the estimator V_n is called unbiased and satisfies

E_F V_n = θ for all θ.
Definition 2.3.3 A U-statistic of an estimable parameter θ of degree r is created with the symmetric kernel h(·) by forming

U_n := (n choose r)^{−1} Σ_c h(X_{α_1}, ..., X_{α_r}), (2.3.3)

where Σ_c denotes the summation over all (n choose r) combinations of r (n ≥ r) distinct elements (α_1, ..., α_r) from (1, ..., n).

U_n is an unbiased estimator, since E_F U_n = θ.
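Definition 2.3.3 can be transcribed directly into code; the following is an illustrative sketch (not from the thesis; the data values are arbitrary):

```python
import itertools
import math

def u_statistic(x, h, r):
    """U_n of Definition 2.3.3: the average of the symmetric kernel h
    over all (n choose r) subsets of r distinct observations."""
    n = len(x)
    total = sum(h(*combo) for combo in itertools.combinations(x, r))
    return total / math.comb(n, r)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean_u = u_statistic(data, lambda a: a, 1)                     # Example 2.3.1
var_u = u_statistic(data, lambda a, b: 0.5 * (a - b) ** 2, 2)  # Example 2.3.2
print(mean_u, var_u)   # the sample mean and the sample variance
```

Both outputs are the familiar unbiased estimators: mean_u is the sample mean, and var_u agrees with the sample variance computed with divisor n − 1.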
Example 2.3.1 The sample mean. If θ(F) = mean of F = μ(F) = ∫ x dF(x) and h(x) = x, then the corresponding U-statistic is

U_n = n^{−1} Σ_{i=1}^n X_i = X̄_n, the sample mean.
Example 2.3.2 The sample variance. If θ(F) = variance of F = σ²(F) = ∫ (x − μ)² dF(x) and h(x_1, x_2) = ½(x_1 − x_2)², then the corresponding U-statistic is

U_n = (n choose 2)^{−1} Σ_{1≤i<j≤n} ½(X_i − X_j)² = (n − 1)^{−1} Σ_{i=1}^n (X_i − X̄_n)², the sample variance.
When dealing with one-sample U-statistics in this work, we will always have r = 2. Therefore our kernel functions h will depend on two (real-valued) arguments only.
Next we will define special functions which will be used throughout this work for r = 2, e.g., change in the mean (see Section 3.4) or variance (see Section 3.5). We consider the symmetric kernel h(x_1, ..., x_r), where, for 1 ≤ c ≤ r − 1,

h_c(x_1, ..., x_c) := E_F h(x_1, ..., x_c, X_{c+1}, ..., X_r),

and put h_r := h. We center at expectation by defining

h̃_c(x_1, ..., x_c) := h_c(x_1, ..., x_c) − θ, 1 ≤ c ≤ r − 1,

and put h̃_r := h̃. Since E_F h̃ = 0,

E_F h̃_c(X_1, ..., X_c) = 0, 1 ≤ c ≤ r.

Furthermore, we define ζ_0 := 0 and, for 1 ≤ c ≤ r,

ζ_c := Var_F h_c(X_1, ..., X_c) = E_F h̃_c²(X_1, ..., X_c).

The functions h_c and h̃_c depend on F for c ≤ r − 1 and the role of these functions is technical only. E.g., they are used to calculate the variance of U-statistics (see Lemma 2.5.1). An application is given in Section 3.1.1, where we use the same methodology for r = 2.
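As an illustration of the functions h_c (a sketch that is not part of the thesis; F = N(0, 1) is an assumption made so that a closed form exists): for r = 2 and the variance kernel h(x_1, x_2) = ½(x_1 − x_2)², the projection h_1(x) = E_F h(x, X_2) equals ½(x² + 1). A Monte Carlo check:

```python
import numpy as np

rng = np.random.default_rng(1)
x_grid = np.array([-1.0, 0.0, 2.0])
samples = rng.normal(0.0, 1.0, 200_000)     # X_2 ~ F = N(0, 1), an assumption

# h_1(x) = E h(x, X_2), estimated by averaging the kernel over X_2
h1_mc = np.array([np.mean(0.5 * (x - samples) ** 2) for x in x_grid])
h1_exact = 0.5 * (x_grid ** 2 + 1.0)        # closed form in the normal case
print(h1_mc, h1_exact)   # the Monte Carlo values approximate the closed form
```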
2.4 Generalized U-Statistics
Throughout this section we refer to Serfling (1980).
In the case where we have one or several changes in the data, say we have s changes, s ∈ ℕ, we consider s + 1 independent collections of independent observations X_1^(1), ..., X_{n_1}^(1); X_1^(2), ..., X_{n_2}^(2); ...; X_1^(s+1), ..., X_{n_{s+1}}^(s+1), taken from the distribution functions F^(1), ..., F^(s+1), respectively. Then for h being symmetric within each of its (s + 1) blocks of arguments we have

E h(X_1^(1), ..., X_{a_1}^(1); ...; X_1^(s+1), ..., X_{a_{s+1}}^(s+1)) = θ(F^(1), ..., F^(s+1)), (2.4.1)

where θ denotes a parametric function for which there is an unbiased estimator.
Definition 2.4.1 The U-statistic for estimating θ is defined as

U_n^(s+1) := [Π_{j=1}^{s+1} (n_j choose a_j)]^{−1} Σ_c h(X_{i_{11}}^(1), ..., X_{i_{1a_1}}^(1); ...; X_{i_{(s+1)1}}^(s+1), ..., X_{i_{(s+1)a_{s+1}}}^(s+1)), (2.4.2)

where {i_{j1}, ..., i_{ja_j}} denotes a set of a_j distinct elements of the set {1, 2, ..., n_j}, 1 ≤ j ≤ (s + 1), and Σ_c denotes summation over all such combinations.
In our specific setup right now we will deal with one single change-point (s = 1) at time τ = τ(n) := [nλ], 0 < λ < 1, in the data set X_1, X_2, ..., X_n, and therefore we will have the two independent collections X_1, X_2, ..., X_τ and X_{τ+1}, X_{τ+2}, ..., X_n of independent observations taken from distributions F^(1) and F^(2), respectively. Hence, n_1 and n_2 in (2.4.2) are equal to τ and n − τ, respectively. Using the previous notations with a_1 = a_2 = 1 and s = 1, we get

U_n^(2) = (τ(n − τ))^{−1} Σ_{i=1}^τ Σ_{j=τ+1}^n h(X_i, X_j),

and furthermore,

E U_n^(2) = θ(F^(1), F^(2)).

We mention again that τ depends on n.
Example 2.4.1 The Wilcoxon 2-sample statistic. Consider two random samples as above and, in addition, we also assume that F^(1) and F^(2) are continuous. Then the U-statistic

U_n^(2) = (τ(n − τ))^{−1} Σ_{i=1}^τ Σ_{j=τ+1}^n 1{X_i ≤ X_j}

is an unbiased estimator for

θ(F^(1), F^(2)) = P{X_1 ≤ X_{τ+1}}.
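A numerical sketch of Example 2.4.1 (not from the thesis; the sample sizes and the normal distributions are illustrative assumptions), with kernel h(x, y) = 1{x ≤ y}:

```python
import numpy as np

def wilcoxon_u(x, y):
    """Two-sample U-statistic with kernel h(a, b) = 1{a <= b}:
    an unbiased estimator of P{X <= Y}."""
    x, y = np.asarray(x), np.asarray(y)
    return (x[:, None] <= y[None, :]).mean()

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 5000)   # sample from F(1)
y = rng.normal(1.0, 1.0, 5000)   # sample from F(2)
u = wilcoxon_u(x, y)
print(u)   # here P{X <= Y} = Phi(1/sqrt(2)), about 0.76, so u is near that
```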
2.5 Variance of U-Statistics
In Section 2.3 we have seen that U-statistics are unbiased estimators. In addition, we will see that this class of estimators is the best among all unbiased estimators (cf. Casella and Berger (1990) and Serfling (1980)).
Definition 2.5.1 An estimator V_n* is called best unbiased estimator or uniform minimum variance unbiased estimator of θ, if E_F V_n* = θ, for all θ, and for any other estimator V_n with E_F V_n = θ, we have Var_F V_n* ≤ Var_F V_n, for all θ.
We now state two lemmas due to Hoeffding (1948) which give an explicit formula for the calculation of the variance as well as upper and lower bounds and asymptotic properties.
Lemma 2.5.1 (Hoeffding, 1948) The variance of U_n as in (2.3.3) is given by

Var_F U_n = (n choose r)^{−1} Σ_{c=1}^r (r choose c)(n − r choose r − c) ζ_c,

where ζ_c is defined as in Section 2.3.

The calculation of the variance can be very difficult and therefore the next lemma is very useful. It gives us upper and lower bounds and the asymptotic behavior of the variance.
Lemma 2.5.2 (Hoeffding, 1948) The variance of U_n as in (2.3.3) satisfies

(r²/n) ζ_1 ≤ Var_F U_n ≤ (r/n) ζ_r, and n Var_F U_n → r² ζ_1 as n → ∞.
That U-statistics are the best in the class of unbiased estimators of θ(F) is stated in the following theorem:

Theorem 2.5.1 (Serfling, 1980) If S_n = S(X_1, ..., X_n) is an unbiased estimator of θ(F) based on the sample X_1, ..., X_n from the distribution F, then the corresponding U-statistic is also unbiased and Var_F(U_n) ≤ Var_F(S_n). Equality holds if and only if S_n = U_n.
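Theorem 2.5.1 is easy to observe in a small simulation (a sketch, not from the thesis; S_n below is a deliberately wasteful unbiased estimator of σ², chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20, 20_000
samples = rng.normal(0.0, 1.0, (reps, n))     # true variance = 1

# S_n: unbiased for sigma^2, but ignores all but the first two observations
s_n = 0.5 * (samples[:, 0] - samples[:, 1]) ** 2
# U_n: the corresponding U-statistic, the sample variance (Example 2.3.2)
u_n = samples.var(axis=1, ddof=1)

print(s_n.mean(), u_n.mean())   # both close to 1 (unbiasedness)
print(s_n.var(), u_n.var())     # Var(U_n) is far smaller than Var(S_n)
```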
2.6 Some Convergence Results for U-Statistics
The following a.s. behavior result for U-statistics is due to Hoeffding (1961) and follows from the Strong Law of Large Numbers. It is a very important result and will, for example, be used to prove Theorem 3.3.2.
Theorem 2.6.1 (Hoeffding, 1961) Let U_n be defined as in (2.3.3) and θ as in (2.3.1). If E_F|h| < ∞, then, as n → ∞,

U_n → θ a.s.
For generalized (s + 1)-sample U-statistics, Sen (1977, Theorem 1) obtains strong convergence of U_n^(s+1) under a stronger condition than that of Theorem 2.6.1.
Theorem 2.6.2 (Sen, 1977) Let U_n^(s+1) be defined as in (2.4.2), θ as in (2.4.1), and log⁺x = log(x ∨ 1), where x ∨ 1 denotes the maximum of x and 1. If E_{F^(1),...,F^(s+1)}{|h| log⁺|h|} < ∞, then, as n_j → ∞, 1 ≤ j ≤ (s + 1),

U_n^(s+1) → θ a.s.
We mention that in Sen's theorem the condition

E{|h| log⁺|h|} < ∞ (2.6.1)

could be replaced by the stronger condition

E h² < ∞, (2.6.2)

since (2.6.2) implies (2.6.1). We note that condition (2.6.1) is sufficient but not necessary (cf. Sen (1977, Section 3)).
In view of the next chapters, where we will establish convergence in probability results for a combination of U-statistics, we consider the two random samples X_1, X_2, ..., X_τ and X_{τ+1}, X_{τ+2}, ..., X_n taken from distributions F^(1) and F^(2), respectively, where τ = τ(n) := [nλ], 0 < λ < 1. If we assume that θ_1 := E_{F^(1)}{h(X_1, X_2)} is finite, then, as n → ∞, we have by Theorem 2.6.1 that

(τ(τ − 1)/2)^{−1} Σ_{1≤i<j≤τ} h(X_i, X_j) → θ_1 a.s. (2.6.3)

Assuming that θ_2 := E_{F^(2)}{h(X_{τ+1}, X_{τ+2})} is finite and defining Y_i := X_{τ+i}, 1 ≤ i ≤ n − τ, we consider

((n − τ)(n − τ − 1)/2)^{−1} Σ_{1≤i<j≤n−τ} h(Y_i, Y_j).

If n → ∞, then we have almost sure convergence of the last expression to θ_2. Hence we also have, as n → ∞,

((n − τ)(n − τ − 1)/2)^{−1} Σ_{τ<i<j≤n} h(X_i, X_j) → θ_2 a.s. (2.6.4)
When combining the two different random samples with n_1 = τ(n) := [nλ], n_2 = n − τ(n) = n − [nλ], 0 < λ < 1, and a_1 = a_2 = 1, we have to assume that E_{F^(1),F^(2)}{|h(X_i, X_j)| log⁺|h(X_i, X_j)|} < ∞, 1 ≤ i < j ≤ n. Then Theorem 2.6.2 applies and as n → ∞ we have that

(τ(n − τ))^{−1} Σ_{i=1}^τ Σ_{j=τ+1}^n h(X_i, X_j) → θ_{1,2} a.s., (2.6.5)

where θ_{1,2} := E_{F^(1),F^(2)}{h(X_τ, X_{τ+1})} and τ := [nλ], 0 < λ < 1.
If F = F(') = ~ ( ~ 1 , t hen for u (~ ) ( X , , . . . , X, ; X,,l, . . . , X, ) in (2.6 -5) we have,
where el := IEF{h(Xl, X 2 ) ) and T := [nX], O < X < 1. Therefore, in this case,
(2.6-5) holds tme assuming I E F { ~ ( X ~ , X ~ ) ) < oo only, just Like in (2.6.3). More-
over, (2.6.3), (2.6.4) and (2.6.5) converge in probability to the same finite value.
Example 2.6.1 Let us assume that F^(1) = N(μ_1, σ_1²) and F^(2) = N(μ_2, σ_2²), and define the kernel h(x, y) = ½(x − y)². This kernel was used in Example 2.3.2, where we considered the sample variance. Using the results from above and letting again τ(n) := [nλ], 0 < λ < 1, we have that, as n → ∞,

(τ(n − τ))^{−1} Σ_{i=1}^τ Σ_{j=τ+1}^n h(X_i, X_j) → θ_{1,2} = ½((μ_1 − μ_2)² + σ_1² + σ_2²) a.s.,

which reduces to θ_1 := E_F{h(X_1, X_2)} = σ² on account of μ_1 = μ_2 and σ_1² = σ_2² =: σ².
"If the facts don't fit the theory, change the facts."
- Albert Einstein
Chapter 3
At Most One Change-point
3.1 Introduction
We are to test the null hypothesis

H_0 : X_1, ..., X_n are independent identically distributed random variables

against the alternative that there is at most one (single) change-point in the sequence X_1, X_2, ..., X_n, namely that we have

H_A : X_1, ..., X_n are independent random variables and there is an integer τ, 1 ≤ τ < n, such that P{X_1 ≤ t} = ... = P{X_τ ≤ t}, P{X_{τ+1} ≤ t} = ... = P{X_n ≤ t} for all t, and P{X_τ ≤ t_0} ≠ P{X_{τ+1} ≤ t_0} for some t_0.
As mentioned in Section 2.2, tests for at most one change-point which are based on processes of U-statistics were studied by Csörgő and Horváth (1986, 1988b, 1997). Similarly, we will investigate the asymptotic properties (as n → ∞) of the U-statistics based process

Z_k := Σ_{1≤i≤k} Σ_{k<j≤n} h(X_i, X_j), 1 ≤ k < n, (3.1.1)

where the kernel h(x, y) is either symmetric or antisymmetric. We will state their basic results and, moreover, we will impose conditions such that we will have a
good estimator for the time of change not only in the antisymmetric case (cf. Section 3.2.3), but also in the symmetric one (cf. Section 3.3.3), which they have not investigated. Then we will apply the theoretical results to detect at most one (single) change in the mean (h antisymmetric) and in the variance (h symmetric). In particular, testing for at most one change in the mean is illustrated by using a geometrical argument (cf. Section 3.4). We mention that though in part we build on them, most of the results of Csörgő and Horváth (1986, 1988b, 1997) in this section become immediate consequences of the results in Chapter 5 which deal with multiple changes (put s = 1).
3.1.1 Notations under the Null Hypothesis H_0
We define¹

E h(X_i, X_j) =: θ, 1 ≤ i < j ≤ n,

and

E h²(X_i, X_j) =: γ, 1 ≤ i < j ≤ n.

We assume throughout the whole chapter that

γ < ∞, (3.1.4)

which of course implies

|θ| < ∞.

¹Note that in Section 2.3 and Section 2.6 we used the equivalent notation θ := E_F h(X_i, X_j), where F denotes the distribution function of the i.i.d. r.v.'s X_1, ..., X_n, instead.
Furthermore, the expected value of Z_k, 1 ≤ k < n, defined by

Z_k := Σ_{1≤i≤k} Σ_{k<j≤n} h(X_i, X_j),

where the kernel h(x, y) is either symmetric or antisymmetric, becomes

E Z_k = k(n − k)θ.

Let X and Y be independent identically distributed random variables and h be an antisymmetric kernel as defined in (2.2.3). Then we have that

E h(X, Y) = E h(Y, X) = −E h(X, Y),

which is possible only if

θ = E h(X, Y) = 0. (3.1.8)

Define

h̃(t) := E{h(t, X_2) − θ}. (3.1.9)

Then condition (3.1.4) implies that

E h̃²(X_1) < ∞.
We also assume that

0 < σ² := E h̃²(X_1), (3.1.10)

which is the so-called non-degenerate case when studying U-statistics. In this and the following chapters we will focus on this case, but we mention that the degenerate case in the context of change-point analysis was, for example, studied by Csörgő and Horváth (1997, Section 2.4).

We mention that we will use the symbol σ̂² in this chapter, as well as in the following ones, for the variance of the given data X_1, ..., X_n, and σ² for the expected value as in (3.1.10). The function in (3.1.9) induces the projection of U-statistics into sums of i.i.d. r.v.'s. We centralize Z_k by its mean, and put

U_k := Z_k − E Z_k = Z_k − k(n − k)θ, 1 ≤ k < n, (3.1.13)

where the kernel function h in Z_k is symmetric. Actually, we define the same for an antisymmetric kernel function, but since (3.1.8) holds,

E Z_k = 0, 1 ≤ k < n,

and in this case we define

Ū_k := Z_k, 1 ≤ k < n. (3.1.14)
Moreover, we may write Z_k as the sum of three U-statistics and accordingly, U_k, 1 ≤ k < n, equals

U_k = Z^(3) − (Z_k^(1) + Z_k^(2)) − k(n − k)θ,

where

Z_k^(1) := Σ_{1≤i<j≤k} h(X_i, X_j), Z_k^(2) := Σ_{k<i<j≤n} h(X_i, X_j), Z^(3) := Σ_{1≤i<j≤n} h(X_i, X_j).

Similarly, we define for an antisymmetric kernel

Ū_k = Z̄^(3) − (Z̄_k^(1) + Z̄_k^(2)),

where Z̄_k^(1), Z̄_k^(2) and Z̄^(3) are defined as above. For further use we also define

Ū_n(t) := Ū_{[(n+1)t]} / (σ n^{3/2}), 0 ≤ t < 1,

and

U_n(t) := U_{[(n+1)t]} / (σ n^{3/2}), 0 ≤ t < 1.
In this, and in the following chapters as well, U-statistic based processes that have a bar on top have an antisymmetric kernel and those without a bar a symmetric kernel. This will make it easier to distinguish between the antisymmetric and the symmetric cases.
3.1.2 Notations under the Alternative H_A
Let ~ ( ' ) ( t ) = lP {X , 5 t) and F(*) (t) = lP{X,+l 5 t ) be the respective distribution
functions of the observations before and after the postuiated change T, and put2
and
Assume throughout the whole chapter that Eh2 (Xi, Xj) is finite for all possible
choices of i and j, namely
which implies that
Due to (3.1.17) and (3.1.20), we can calculate the expected value of Zk7 1 5 k < n,
which is dehed by
where the kemel h(z, y) is either symmetric or antisymmetric. Namely, under HA,
* ~ o t e that in Section 2.4 and Section 2.6 we i w d the equident notations Bi := EFci1 h(Xi, X2), $2 := EFW h(X,+i, &+z) and 81.2 := IEF~*, ,F(2) h(X,, x,+~) instead.
3.1. Introduction 27
we have
- 7 % ~ + k ( ~ - k)&, i ~ k ~ . r (3.1.22)
~ ( n - k)ûl,2+ (n- k)(k-T)&, T 5 k < n-
Therefore, by using (3.1.8), we get that in the antisymmetric case

θ_1 = θ_2 = 0, (3.1.23)

and (3.1.22) becomes

E Z_k = k(n − τ)θ_{1,2}, 1 ≤ k ≤ τ,
E Z_k = τ(n − k)θ_{1,2}, τ ≤ k < n. (3.1.24)
When searching for the possible time of change τ, one has to investigate the behavior of the process Z_k, 1 ≤ k < n, under H_A, and when looking at the expected value of the process as a function in k, we see that it is increasing before the postulated change-point τ and decreasing after it, when assuming that h is antisymmetric and θ_{1,2} is positive. Moreover, it reaches its maximum at time τ, the change-point. Therefore, we have to find k, where the process Z_k, 1 ≤ k < n, reaches its maximum. Consequently, for θ_{1,2} positive, we define Ẑ_n to be the maximum (taken in k) over all Z_k's, 1 ≤ k < n, i.e.,

Ẑ_n = max_{1≤k<n} Σ_{1≤i≤k} Σ_{k<j≤n} h(X_i, X_j),

and as an estimate for the change-point τ we define

τ̂ := min{k : Z_k = max_{1≤m<n} Z_m}.
Since (3.1.24) is easier to handle than (3.1.22), more is known when we have an antisymmetric kernel than when we have a symmetric kernel: it is clear that we will need to impose special conditions on the expected value when using a symmetric kernel, such that the maximum again will be reached at time τ.
We mention that the moments under the null hypothesis of no change may be derived from the formulae under the alternative. Since under H_0

F^(1) = F^(2) = F

and hence

θ_1 = θ_2 = θ_{1,2} = θ,

the previous results reduce to the corresponding ones in Section 3.1.1.
3.2 Antisymmetric Kernels
We consider the processes Z_k, 1 ≤ k < n, n ≥ 2, where the kernel h is antisymmetric as in (2.2.3). By using the notations and assumptions from Sections 3.1.1 and 3.1.2, we give here some well known asymptotic results (cf. Csörgő and Horváth (1986, 1988b, 1997)) under H_0 (cf. Section 3.2.1) and H_A (cf. Section 3.2.2), as well as results on estimating the time of change τ (cf. Section 3.2.3).
3.2.1 Asymptotic Results under H_0
We wish to study the limiting behavior of the stochastic process {Ū_n(t), 0 ≤ t < 1}, as n → ∞, in the sup-norm under the null hypothesis of no change.

The asymptotic behavior of Ū_k will be derived from the following reduction principle (Lemma 3.2.1). It is a consequence of Janson and Wichura (1983, Theorem 2.1) and given explicitly by Huse (1988, Lemma 2.1.6).
Lemma 3.2.1 (Huse, 1988) Let h be an antisymmetric kernel and Ū_k as in (3.1.14). Then under H_0 the following statements hold true as n → ∞:

max_{1≤k≤n} |Z̄_k^(1) − Σ_{i=1}^k (k − 2i + 1)h̃(X_i)| = O_P(n),

max_{1≤k≤n} |Z̄_k^(2) − Σ_{i=k+1}^n (n + k − 2i + 1)h̃(X_i)| = O_P(n),

|Z̄^(3) − Σ_{i=1}^n (n − 2i + 1)h̃(X_i)| = O_P(n).
The basic idea for the proof is given by the above mentioned Janson and Wichura. Their results are stated in the context of stochastic area integrals. We will state their theorem (see Theorem 4.2.1) and will discuss how this lemma follows from their theorem in the context of at most two change-points in Chapter 4.

Janson and Wichura do not present a detailed proof. A detailed proof for this lemma is given by Huse (1988, Lemma 2.1.6). She proves that the sequence of random variables

Z_k^* := Z̄_k^(1) − Σ_{i=1}^k (k − 2i + 1)h̃(X_i), k = 1, 2, ...,

is a martingale. Using the notations of Billingsley (1986, Section 35), we quote:
Definition 3.2.1 Let X_1, X_2, ... be a sequence of random variables on a probability space (Ω, 𝓕, P) and let 𝓕_1, 𝓕_2, ... be a sequence of σ-fields in 𝓕. Then the sequence {(X_n, 𝓕_n) : n = 1, 2, ...} is a martingale if the following four conditions hold:

1. 𝓕_n ⊂ 𝓕_{n+1};
2. X_n is measurable with respect to 𝓕_n;
3. E|X_n| < ∞;
4. with probability 1, E[X_{n+1} | 𝓕_n] = X_n.

If instead of 4. we have

E[X_{n+1} | 𝓕_n] ≥ X_n with probability 1,

then the sequence {(X_n, 𝓕_n) : n = 1, 2, ...} is called a submartingale.
Having shown that Z_k^* is a martingale sequence, and using the Hájek-Rényi inequality (see Shorack and Wellner (1986, Appendix A)), Huse (1988) deduces that

max_{1≤k≤n} |Z_k^*| = O_P(n).

The other two statements of Lemma 3.2.1 are proved in a similar way. As a consequence, we have (cf. Corollary 2.1.7 of Huse (1988))
Corollary 3.2.1 (Huse, 1988) Under the same conditions as in Lemma 3.2.1 we have, as n → ∞,

max_{1≤k<n} |Ū_k − (n Σ_{i=1}^k h̃(X_i) − k Σ_{i=1}^n h̃(X_i))| = O_P(n).
Proof of Corollary 3.2.1. Since Ū_k = Z̄^(3) − (Z̄_k^(1) + Z̄_k^(2)), 1 ≤ k < n, we have that

max_{1≤k<n} |Ū_k − (n Σ_{i=1}^k h̃(X_i) − k Σ_{i=1}^n h̃(X_i))| = max_{1≤k<n} |Z̄^(3) − (Z̄_k^(1) + Z̄_k^(2)) − (n Σ_{i=1}^k h̃(X_i) − k Σ_{i=1}^n h̃(X_i))|.

We add and subtract special terms to apply the previous lemma. Accordingly, we use the previous lemma and get the desired result, namely

max_{1≤k<n} |Ū_k − (n Σ_{i=1}^k h̃(X_i) − k Σ_{i=1}^n h̃(X_i))| = O_P(n). □

We repeated here Huse's (1988) proof of Corollary 3.2.1 for the sake of demonstrating the usefulness of the reduction principle. Namely, Ū_k is approximated by sums of i.i.d. r.v.'s for which there are many limit theorems available.
Now we study the asymptotic behavior of Ū_k, 1 ≤ k < n, in the sup-norm. As mentioned before, the following results are due to Csörgő and Horváth (1988b, 1997).
Theorem 3.2.1 (Csörgő and Horváth, 1988b) Assume that H_0, (2.2.3), (3.1.4) and (3.1.10) hold. Then we can define a sequence of Brownian bridges {B_n(t), 0 ≤ t ≤ 1}_{n∈ℕ} such that, as n → ∞,

sup_{0<t<1} |Ū_n(t) − B_n(t)| = o_P(1),

where, for each n, {B_n(t), 0 ≤ t ≤ 1} =_D {B(t), 0 ≤ t ≤ 1}, with B being a standard Brownian bridge.
A proof is given by Csörgő and Horváth (1988b, Theorem 4.1). They show that n Σ_{i=1}^{[nt]} h̃(X_i) may be associated with nW([nt]) and [nt] Σ_{i=1}^n h̃(X_i) with [nt]W(n), where W(·) is a standard Wiener process. This implies the theorem by using Corollary 3.2.1 via multiplying both sides of its statement by 1/(σn^{3/2}). Theorem 3.2.1 implies that under H_0, as n → ∞,

sup_{0<t<1} |Ū_n(t)| →_D sup_{0<t<1} |B(t)|.

This means that the sup-functionals of |Ū_n(t)| converge in distribution to the sup-functional of a Brownian bridge. Consequently, we can use tables for the supremum of a Brownian bridge to accept or reject H_0. It is known that

P{sup_{0≤t≤1} |B(t)| ≤ x} = 1 − 2 Σ_{k=1}^∞ (−1)^{k−1} exp(−2k²x²), x > 0,

the well known limiting distribution of the two-sided Kolmogorov-Smirnov statistic (cf. Kolmogorov, 1933), which has been widely tabulated.
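The series above converges very fast, so critical values can be computed directly instead of being read from tables; a sketch (truncating at 100 terms is an arbitrary but more than sufficient choice):

```python
import math

def kolmogorov_cdf(x, terms=100):
    """P{ sup |B(t)| <= x } for a Brownian bridge B:
    K(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2)."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s

print(kolmogorov_cdf(1.358))   # about 0.95: 1.358 is the usual 5% critical value
```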
Tests based on sup_{0<t<1} |Ū_n(t)| are not sensitive in the tails. Hence, we note that a weighted version of Theorem 3.2.1 holds true as in Theorem 2.1 of Szyszkowicz (1991) if we replace (S([(n + 1)t]) − tS(n))/(σn^{1/2}) there by Ū_n(t). In particular, by using the weight function

q(t) = (t(1 − t) log log(1/(t(1 − t))))^{1/2},

we have, for example, that, as n → ∞,

sup_{0<t<1} |Ū_n(t)|/q(t) →_D sup_{0<t<1} |B(t)|/q(t),

and we can use tables for sup_{0<t<1} |B(t)|/q(t) (cf. Eastwood and Eastwood (1998)) and reject H_0 for large values of sup_{0<t<1} |Ū_n(t)|/q(t).
3.2.2 Asymptotic Results under H_A
Csörgő and Horváth (1988b, Theorem 3.1) studied the asymptotic behavior of Ū_k under the alternative H_A and proved the following theorem.

Theorem 3.2.2 (Csörgő and Horváth, 1988b) Assume that (2.2.3) and H_A hold and

E{|h(X_{[nλ]}, X_{[nλ]+1})| log⁺(|h(X_{[nλ]}, X_{[nλ]+1})|)} < ∞, (3.2.4)

where τ = τ(n) := [nλ], 0 < λ < 1, and log⁺x = log(x ∨ 1). Then, as n → ∞, for every t ∈ (0, 1),

n^{−2} Z_{[(n+1)t]} →_P ū_λ(t),

where ū_λ(t) := θ_{1,2} t(1 − λ) for t ∈ (0, λ] and ū_λ(t) := θ_{1,2} λ(1 − t) for t ∈ [λ, 1).

We note that Theorem 3.2.2 also holds if we replace condition (3.2.4) by condition (3.1.19). This theorem follows from Theorem 3.3.2, where h is assumed to be symmetric, by using the fact that θ_1 = θ_2 = 0.
Theorem 3.2.2 implies the consistency of tests based on {Ū_{[(n+1)t]}, 0 < t < 1}, where Ū_k = Z_k, 1 ≤ k < n. Assuming finite second moments as in (3.1.19), such that the results under H_0 hold, we can then reject H_0 vs. H_A when sup_{0<t<1} n^{−3/2} |Ū_{[(n+1)t]}| is large and θ_{1,2} ≠ 0.
3.2.3 Estimating the Time of Change
We have seen in Section 3.1 that an antisymmetric kernel depends only on θ_{1,2} but not on θ_1 or θ_2. Hence, if θ_{1,2} is positive we define

τ̂ := min{k : Z_k = max_{1≤m<n} Z_m},

since the point where Z_k, 1 ≤ k < n, reaches its unique maximum is at k = τ, as can be seen in Figure 3.2.1.

Figure 3.2.1: The limiting function ū_λ(t) with θ_{1,2} = 10 takes its maximum value of 2.5 at t = λ = 0.5. Note: The x-axis denotes t and the y-axis denotes ū_λ(t).

Otherwise, if θ_{1,2} is negative, we define

τ̃ := min{k : Z_k = min_{1≤m<n} Z_m},

since the point where Z_k, 1 ≤ k < n, reaches its unique minimum is at k = τ.
In practice θ_{1,2} is unknown, but since n^{−2} Z_{[(n+1)t]} = n^{−2} Ū_{[(n+1)t]} →_P ū_λ(t), the plot of n^{−2} Z_{[(n+1)t]} is likely to exhibit whether we have a ∩-type or a ∪-type function. E.g., if we have a ∩-type function then we take τ̂ to estimate τ. Ferger and Stute (1992) showed that τ̂ and τ̃, respectively, are strongly consistent estimators of τ, and Gombay (1998) gave the asymptotic distribution of max_{1≤k<n} Z_k and that of τ̂ − τ under the alternative H_A, as we will discuss in Section 3.3.3.
Now we are interested in the null distribution of τ̂, which may be described as the distribution of the argument of the maximum of Z_k, 1 ≤ k < n. From Theorem 3.2.1 we know that Ū_n(t) converges weakly to a Brownian bridge. Hence, τ̂/n converges in distribution to the time where a Brownian bridge reaches its maximum, and it follows from Birnbaum and Pyke (1958) (cf. Shorack and Wellner (1986, p. 385)) that the time where a Brownian bridge reaches its maximum is a Uniform(0,1)-distributed random variable. Hence, under H_0, τ̂ asymptotically takes every integer value from 1 to n with the same probability, as stated in the following theorem by Csörgő and Horváth (1997, Theorem 2.4.14):
Theorem 3.2.3 (Csörgő and Horváth, 1997) We assume that H_0, (2.2.3), (3.1.4) and (3.1.10) hold. Then

lim_{n→∞} P{τ̂/n ≤ t} = t, 0 ≤ t ≤ 1.
Of course, using similar arguments as before we get a similar result for the minimizer τ̃, which may be stated in a corollary as follows:

Corollary 3.2.2 We assume that H_0, (2.2.3), (3.1.4) and (3.1.10) hold. Then

lim_{n→∞} P{τ̃/n ≤ t} = t, 0 ≤ t ≤ 1.
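Theorem 3.2.3 can be illustrated by simulation (a sketch, not from the thesis; it uses the fact that for the mean-change kernel h(x, y) = x − y the double sum collapses to Z_k = nS_k − kS_n with S_k = X_1 + ... + X_k, which keeps the computation cheap):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 2000
taus = np.empty(reps)
for r in range(reps):
    x = rng.normal(0.0, 1.0, n)        # H0: i.i.d. observations, no change
    s = np.cumsum(x)
    k = np.arange(1, n)
    z = n * s[:-1] - k * s[-1]         # Z_k for h(x, y) = x - y
    taus[r] = (np.argmax(z) + 1) / n   # normalized argmax, tau_hat / n
print(taus.mean(), taus.var())         # near 0.5 and 1/12, as for Uniform(0,1)
```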
3.3 Symmetric Kernels
We consider the processes Z_k, 1 ≤ k < n, n ≥ 2, where the kernel h is symmetric as in (2.2.2). By using notations and assumptions from Sections 3.1.1 and 3.1.2 we give some well known results (cf. Csörgő and Horváth (1997)) under H_0 (cf. Section 3.3.1) and H_A (cf. Section 3.3.2). Furthermore, we will investigate under what assumptions we may define an estimator for the time of change (cf. Section 3.3.3), as in the antisymmetric case in Section 3.2.3.
3.3.1 Asymptotic Results under H_0
We wish to study the limiting behavior of the stochastic process {U_n(t), 0 ≤ t < 1}, as n → ∞, under the null hypothesis of no change in distribution. Similarly to the antisymmetric case, the asymptotic behavior of U_k will be derived from the following reduction principle (Lemma 3.3.1). It is a consequence of Theorem 1 of P. Hall (1979) and given explicitly by Huse (1988, Lemma 2.1.12).
Lemma 3.3.1 (Huse, 1988) Let h be a symmetric kernel and U_k as in (3.1.13). Then under H_0 the following statements hold true as n → ∞:

max_{1≤k≤n} |Z_k^(1) − (k(k − 1)/2)θ − (k − 1) Σ_{i=1}^k h̃(X_i)| = O_P(n),

max_{1≤k≤n} |Z_k^(2) − ((n − k)(n − k − 1)/2)θ − (n − k − 1) Σ_{i=k+1}^n h̃(X_i)| = O_P(n),

|Z^(3) − (n(n − 1)/2)θ − (n − 1) Σ_{i=1}^n h̃(X_i)| = O_P(n).

The proof is similar to that of Lemma 3.2.1. Again we can combine the three statements above and get the following corollary as an immediate consequence.

Corollary 3.3.1 (Huse, 1988) Under the same conditions as in Lemma 3.3.1 we have, as n → ∞,

max_{1≤k<n} |U_k − {(n − k) Σ_{i=1}^k h̃(X_i) + k (Σ_{i=1}^n h̃(X_i) − Σ_{i=1}^k h̃(X_i))}| = O_P(n).

We shall see that the limit of {U_k, 1 ≤ k < n}, as n → ∞, is a Gaussian process, which is identified in the next theorem. Let {Γ(t), 0 ≤ t ≤ 1} be a Gaussian process defined by

Γ(t) = (1 − t)W(t) + t(W(1) − W(t)), 0 ≤ t ≤ 1, (3.3.1)

where W is a standard Wiener process. Since

E W(t) = 0, 0 ≤ t ≤ 1,
we have that

E Γ(t) = (1 − t) E W(t) + t(E W(1) − E W(t)) = 0, 0 ≤ t ≤ 1.

By using the fact that

Var W(t) = t, 0 ≤ t ≤ 1,

the variance of this Gaussian process is

Var Γ(t) = Var((1 − t)W(t) + t(W(1) − W(t)))
         = (1 − t)² Var W(t) + t² (Var W(1) − Var W(t))
         = (1 − t)² t + t² (1 − t)
         = (1 − t)t, 0 ≤ t ≤ 1.
Let {B(t), 0 ≤ t ≤ 1} be a Brownian bridge. Then we have that

E B(t) = E Γ(t) = 0 and Var B(t) = Var Γ(t) = t(1 − t),

where 0 ≤ t ≤ 1. Although these expected values and variances are the same, the two stochastic processes are not the same. This can be seen by checking the covariance functions of these stochastic processes. Recall that a centered Gaussian process is uniquely determined by its covariance structure. By calculating the covariance function of the Brownian bridge {B(t), 0 ≤ t ≤ 1} and using the facts that

B(t) = W(t) − tW(1), 0 ≤ t ≤ 1,

and

E W(s)W(t) = s ∧ t, 0 ≤ s, t ≤ 1,

we get that
Cov[B(s), B(t)] = E B(s)B(t)
= E(W(s) − sW(1))(W(t) − tW(1))
= E(W(s)W(t) − sW(1)W(t) − tW(s)W(1) + stW²(1))
= E W(s)W(t) − s E W(1)W(t) − t E W(s)W(1) + st E W²(1)
= s ∧ t − s(1 ∧ t) − t(s ∧ 1) + st(1 ∧ 1)
= s ∧ t − st − ts + st
= s ∧ t − st, 0 ≤ s, t ≤ 1. (3.3.2)
Similarly, we get the covariance function of our Gaussian process {Γ(t), 0 ≤ t ≤ 1}, namely

Cov[Γ(s), Γ(t)] = E Γ(s)Γ(t)
= (1 − s)(1 − t) E W(s)W(t) + (1 − s)t E W(s)(W(1) − W(t))
  + s(1 − t) E (W(1) − W(s))W(t) + st E (W(1) − W(s))(W(1) − W(t))
= (1 − s)(1 − t)(s ∧ t) + (1 − s)t((s ∧ 1) − (s ∧ t))
  + s(1 − t)((1 ∧ t) − (s ∧ t))
  + st((1 ∧ 1) − (s ∧ 1) − (1 ∧ t) + (s ∧ t))
= (1 − s)(1 − t)(s ∧ t) + s(1 − t)(t − (s ∧ t))
  + (1 − s)t(s − (s ∧ t)) + st(1 − s − t + (s ∧ t))
= (s ∧ t)((1 − s)(1 − t) − s(1 − t) − (1 − s)t + st)
  + s(1 − t)t + (1 − s)ts + st(1 − s − t)
= (1 − 2s)(1 − 2t)(s ∧ t) + (3 − 2s − 2t)st,
0 ≤ s, t ≤ 1. (3.3.3)
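Both covariance formulas, (3.3.2) and (3.3.3), can be verified by simulating the Wiener process on a grid (a sketch, not from the thesis; grid size, replication count and the pair (s, t) are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
m, reps = 200, 10_000
t_grid = np.linspace(0.0, 1.0, m + 1)
dw = rng.normal(0.0, np.sqrt(1.0 / m), (reps, m))
w = np.concatenate([np.zeros((reps, 1)), np.cumsum(dw, axis=1)], axis=1)

gamma = (1.0 - t_grid) * w + t_grid * (w[:, -1:] - w)   # Gamma(t)
bridge = w - t_grid * w[:, -1:]                         # B(t) = W(t) - t W(1)

s, t = 0.3, 0.7
i, j = int(s * m), int(t * m)
cov_gamma = np.mean(gamma[:, i] * gamma[:, j])
cov_bridge = np.mean(bridge[:, i] * bridge[:, j])
# (3.3.3) gives 0.162 and (3.3.2) gives 0.09 at (s, t) = (0.3, 0.7)
print(cov_gamma, cov_bridge)
```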
Since the covariance functions of the Gaussian processes {Γ(t), 0 ≤ t ≤ 1} and {B(t), 0 ≤ t ≤ 1} are not the same, the two processes are also not the same. Hence, we are now in the position to state the next theorem, which is due to Csörgő and Horváth (1988b, 1997).
Theorem 3.3.1 (Csörgő and Horváth, 1988b) Assume that H_0, (2.2.2), (3.1.4) and (3.1.10) hold. Then we can define a sequence of Gaussian processes {Γ_n(t), 0 ≤ t ≤ 1}_{n∈ℕ} such that, as n → ∞,

sup_{0<t<1} |U_n(t) − Γ_n(t)| = o_P(1),

and, for each n, we have

{Γ_n(t), 0 ≤ t ≤ 1} =_D {Γ(t), 0 ≤ t ≤ 1}.

A proof is given by Csörgő and Horváth (1997, Theorem 2.4.1). They use Corollary 3.3.1 and the fact that we can define a Wiener process {W(t), 0 ≤ t < ∞} such that the partial sums Σ_{i=1}^k h̃(X_i) are suitably approximated by σW(k).
Theorem 3.3.1 implies that under H_0, as n → ∞,

sup_{0<t<1} |U_n(t)| →_D sup_{0<t<1} |Γ(t)|,

that is to say, the supremum of |U_n(t)| converges in distribution to the supremum of the absolute value of the Gaussian process Γ(·) as in (3.3.1). One can compute percentiles of its distribution function by producing tables for the latter random variable. Then we can use those tables to reject H_0 for large values of sup_{0<t<1} |U_n(t)|.
Tests based on sup_{0<t<1} |U_n(t)| are not sensitive in the tails. Hence, we note that a weighted version of Theorem 3.3.1 holds true as in Theorem 2.1 of Szyszkowicz (1991) if we replace (S([(n + 1)t]) − tS(n))/(σn^{1/2}) there by U_n(t) and the Brownian bridges by the Gaussian processes Γ_n(·). In particular, using the weight function

q(t) = (t(1 − t) log log(1/(t(1 − t))))^{1/2},

we have, for example, that, as n → ∞,

sup_{0<t<1} |U_n(t)|/q(t) →_D sup_{0<t<1} |Γ(t)|/q(t).

Producing tables for the distribution function of the random variable sup_{0<t<1} |Γ(t)|/q(t) is desirable for use in testing H_0 vs. H_A.
3.3.2 Asymptotic Results under H_A
Csörgő and Horváth (1988b, Theorem 3.1) studied the asymptotic behavior of our U-statistics based process under the alternative H_A and showed the following theorem.

Theorem 3.3.2 (Csörgő and Horváth, 1988b) Assume that (2.2.2) and H_A hold and

E{|h(X_{[nλ]}, X_{[nλ]+1})| log⁺(|h(X_{[nλ]}, X_{[nλ]+1})|)} < ∞, (3.3.7)

where τ = τ(n) := [nλ], 0 < λ < 1, and log⁺x = log(x ∨ 1). Then, with Z_k as in (3.1.1), we have, as n → ∞, for every t ∈ [0, 1],

n^{−2} Z_{[(n+1)t]} →_P u_λ(t),

where u_λ(t) := t(λ − t)θ_1 + t(1 − λ)θ_{1,2} for t ∈ [0, λ] and u_λ(t) := (1 − t)(t − λ)θ_2 + λ(1 − t)θ_{1,2} for t ∈ [λ, 1].
Again we mention that the theorem also holds if condition (3.3.7) is replaced by condition (3.1.19). Since the way of proving this theorem will be of interest when having an alternative of more than one change, we give a proof of the theorem. It is similar to the ones given by Csörgő and Horváth (1986, Theorem 3.1, and 1988b, Theorem 3.1) and Huse (1988, Theorem 2.3.7).
Proof of Theorem 3.3.2. We have τ = τ(n) = [nλ], 1 ≤ τ < n, the single change-point under the alternative H_A. We put m = [(n + 1)t] and first assume that 1 ≤ m ≤ τ.
In view of Theorem 2.6.1 by Hoeffding (1961) and Theorem 2.6.2 by Sen (1977), we have to change the summation of Z_m, 1 ≤ m ≤ τ, such that we can apply these theorems on U-statistics and generalized U-statistics. We have

Z_m = Σ_{1≤i≤m} Σ_{m<j≤τ} h(X_i, X_j) + Σ_{1≤i≤τ} Σ_{τ<j≤n} h(X_i, X_j) − Σ_{m<i≤τ} Σ_{τ<j≤n} h(X_i, X_j). (3.3.8)

We have just split Z_m into three parts, where the last two are now of the forms of generalized U-statistics. We also have to rewrite the first part for the sake of applying any of the previously mentioned theorems. To do this, we look at the summation areas of the sums

A_n := Σ_{1≤i≤m} Σ_{m<j≤τ} h(X_i, X_j), A_n^(1) := Σ_{1≤i<j≤τ} h(X_i, X_j), A_n^(2) := Σ_{1≤i<j≤m} h(X_i, X_j), A_n^(3) := Σ_{m<i<j≤τ} h(X_i, X_j),

where A_n is the summation area we want to change.

Figure 3.3.2: Summation Area

Figure 3.3.2 shows the areas A_n, A_n^(2) and A_n^(3) in comparison to that of A_n^(1) and we can see that

A_n = A_n^(1) − A_n^(2) − A_n^(3).
Note that now A_n^(1), A_n^(2) and A_n^(3) are U-statistics without their respective normalizing factors. In these U-statistics the underlying random variables are from the same distribution. Therefore, (3.3.8) becomes

Z_m = A_n^(1) − A_n^(2) − A_n^(3) + A_n^(4) − A_n^(5),

where A_n^(4) and A_n^(5) denote the last two summands in (3.3.8). We mention again that, except for their missing normalizing factors (see (2.3.3)), A_n^(1), A_n^(2) and A_n^(3) are (non-degenerate) one-sample U-statistics and A_n^(4) and A_n^(5) are generalized two-sample U-statistics. Hence, Hoeffding's Strong Law of Large Numbers (SLLN) applies and, as n → ∞, it yields

(τ(τ − 1)/2)^{−1} A_n^(1) → θ_1 a.s.
This together with τ = [nλ] implies

n^{−2} A_n^(1) → (λ²/2)θ_1 a.s.

Similarly, we obtain that

n^{−2} A_n^(2) → (t²/2)θ_1 a.s.

When looking at A_n^(3), we see that we cannot immediately apply Hoeffding's SLLN theorem, since the summation is taken over m + 1 ≤ i < j ≤ τ and we would need a 1 to start with, instead of m + 1. But since X_1, ..., X_τ are i.i.d. r.v.'s, we have

Σ_{m<i<j≤τ} h(X_i, X_j) =_D Σ_{1≤i<j≤τ−m} h(X_i, X_j), (3.3.13)

where τ = τ(n) := [nλ] and m = m(n) := [(n + 1)t]. Using now Hoeffding's SLLN theorem and taking (3.3.13) into consideration we get the following convergence in probability result:

n^{−2} A_n^(3) →_P ((λ − t)²/2)θ_1.
A_n^(4) and A_n^(5) are generalized two-sample U-statistics, except that the appropriate normalization factors are missing. Therefore Theorem 2.6.2 by Sen (1977) applies and we get

(τ(n − τ))^{−1} A_n^(4) → θ_{1,2} a.s.

Since

1/(τ(n − τ)) = 1/([nλ](n − [nλ])),

we obtain

n^{−2} A_n^(4) → λ(1 − λ)θ_{1,2} a.s.,

and similarly, we get

n^{−2} A_n^(5) →_P (λ − t)(1 − λ)θ_{1,2}.
Finally, as n → ∞, we arrive at

u_λ(t) = t(λ − t)θ_1 + t(1 − λ)θ_{1,2},   t ∈ [0, λ],   (3.3.18)
and this proves the theorem for 1 ≤ m ≤ τ. The proof for τ < m ≤ n is similar and hence omitted. □
Theorem 3.3.2 can be used to study the consistency of tests based on {U_{[(n+1)t]}, 0 ≤ t < 1}, where U_k = Z_k − k(n − k)θ, 1 ≤ k < n. Assuming finite second moments as in (3.1.19), such that the results under H_0 hold, we can consistently reject H_0 vs. H_A when sup_{0≤t<1} n^{−3/2} |U_{[(n+1)t]}| is large, except in the case where θ_1 = θ_2 = θ_{1,2} = 0.
3.3.3 Estimating the Time of Change
In Section 3.2.3 we considered an estimator for the time of change for an antisymmetric kernel. We either used the unique maximum or the unique minimum of Z_k, 1 ≤ k < n, to estimate the change-point τ = τ(n) := [nλ], 0 < λ < 1. In the symmetric case we will have similar results, but we will have to impose conditions³ on θ_1, θ_2, θ_{1,2} and λ, if we want u_λ(t) from Theorem 3.3.2 to have a unique maximum or minimum, respectively, at λ. To do this, we consider the function (cf. Theorem 3.3.2)

Differentiating with respect to t, we get the derivative of u_λ(t), namely

furthermore, we get the second derivative of u_λ(t), i.e.,

which tells us that u_λ(t) is a concave function if θ_1 and θ_2 in (3.1.17) are positive. Recall that θ_1 ≠ 0 and θ_2 ≠ 0, since we are dealing with symmetric kernels. Using the fact that a concave function has the form ∩, we have to search for a maximum. If both are negative, then u_λ(t) is a convex function, which has the form ∪, and we have to search for a minimum. So we define
³ Csörgő and Horváth (1998, Section 2.4) and Ferger and Stute (1992) suggest to impose conditions, but do not give them explicitly.
and
τ̃ = [nλ̃] := min{k : Z_k = min_{1≤m<n} Z_m}.
Let us first consider the case when θ_1 and θ_2 are positive. We know that

u_λ(0) = u_λ(1) = 0   and   u_λ(λ) = λ(1 − λ)θ_{1,2},

but we don't know whether the maximum is taken at t = λ, or at 0 < t_{max,1} < λ and/or at λ < t_{max,2} < 1. It could happen that the maximum is taken at t = t_{max,1} or t = t_{max,2}. Figure 3.3.3 shows the graph of the limiting function u_λ(t) where the maximum is taken at t = t_{max,2} and not at t = λ, and similarly, Figure 3.3.4 shows the graph of u_λ(t) when there are two local maxima at t = t_{max,1} and t = t_{max,2}.
Figure 3.3.3: The limiting function u_λ(t) with θ_1 = 1, θ_2 = 2, θ_{1,2} = 3 and λ = 1/10 takes its maximum value of 0.55 at t = 0.475.
Hence, we have to restrict ourselves to the case of a unique maximum at t = λ only. Similarly, if θ_1 and θ_2 are negative, we have to restrict ourselves to the case of a unique minimum at t = λ only.
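The possibility of a spurious maximum away from λ can be made concrete numerically. The sketch below assumes the piecewise form of the limit function: the branch on [0, λ] is (3.3.18), and the branch on [λ, 1] is taken, as an assumption, to be the mirror image of (3.3.18) under t ↦ 1 − t with θ_1 and θ_2 interchanged. With the parameters of Figure 3.3.3, this reproduces a global maximum away from λ.

```python
# Sketch: explore the limit function u_lambda(t) of Theorem 3.3.2 numerically.
# The first branch is (3.3.18); the branch on [lambda, 1] is assumed here by
# symmetry (t -> 1 - t, theta_1 <-> theta_2).

def u(t, lam, th1, th2, th12):
    """Limit function u_lambda(t) for a symmetric kernel (assumed piecewise form)."""
    if t <= lam:
        return t * (lam - t) * th1 + t * (1.0 - lam) * th12   # (3.3.18)
    return (1.0 - t) * (t - lam) * th2 + lam * (1.0 - t) * th12

# Parameters of Figure 3.3.3: theta_1 = 1, theta_2 = 2, theta_{1,2} = 3, lambda = 1/10.
grid = [i / 1000.0 for i in range(1001)]
vals = [u(t, 0.1, 1.0, 2.0, 3.0) for t in grid]
t_max = grid[vals.index(max(vals))]

# The global maximum is NOT at t = lambda = 0.1: it sits at t = 0.475
# (value 0.55125), so the argmax estimator would miss the change point here.
print(t_max, max(vals))
```

This is exactly the situation ruled out by the restriction (3.3.25) below.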
Figure 3.3.4: The limiting function u_λ(t) with θ_1 = 3, θ_2 = 2, θ_{1,2} = 1 and λ = 1/2 takes its maximum value of 1/3 at t = 1/3.
If θ_1 < 0 < θ_2 or θ_2 < 0 < θ_1, then we have a convex-concave or concave-convex function, respectively. Hence, we are looking for a minimum or maximum, respectively. Again, we have to restrict ourselves. We mention that in these cases there could also be a maximum or minimum, respectively.
In view of finding a unique maximum, we look at all possible extreme values of u_λ(t). We get that

Of course, we would like to have λ = t_{max,1} = t_{max,2}. Hence, we have to investigate whether there is a maximum before and/or after the change-point λ, namely whether
If now λ < t_{max,2} is satisfied, then we have to show that the function value at t_{max,2} is greater than the function value at λ. We calculate u_λ(t_{max,2}) by plugging the second part of (3.3.20) into (3.3.19). We consider
and show that the following inequality holds,

where we assume that θ_2 > 0. Otherwise the inequality is never satisfied, except when we are looking for the minimum. We have an '=' sign in (3.3.22) if
Similarly, we get that
and
where we assume that θ_1 > 0. We have an '=' sign in (3.3.23) if
By using the definitions of t_{max,1} and t_{max,2} in (3.3.20), (3.3.21) leads to
If (3.3.24) is not satisfied, then u_λ(λ) is the unique maximum. Therefore we have to restrict ourselves to the following choice of λ, namely

where 0 < θ_1 < ∞, 0 < θ_2 < ∞ and |θ_{1,2}| < ∞, to have a unique maximum at λ. If θ_1 < 0 < θ_2 or θ_2 < 0 < θ_1, then we may replace the right endpoint in (3.3.25) by 1 or the left endpoint by 0, respectively. Furthermore, (3.3.25) needs to hold true when 0 > θ_1 > −∞, 0 > θ_2 > −∞ and |θ_{1,2}| < ∞, to have a unique minimum at λ.
Using these conditions on θ_1, θ_2, θ_{1,2} and λ from (3.3.25), we are now in the position to state a theorem by Ferger and Stute (1992) for symmetric kernel functions that are also assumed to be bounded.
Theorem 3.3.3 (Ferger and Stute, 1992) Under suitable assumptions⁴ on the symmetric (or antisymmetric) bounded kernel function h(x, y), such that λ is the unique

⁴ Note that these assumptions, which were not given by Ferger and Stute (1992) explicitly, are the ones in (3.3.25) when 0 < θ_1 < ∞, 0 < θ_2 < ∞ and |θ_{1,2}| < ∞. As mentioned above, similar versions of (3.3.25) hold when θ_1, θ_2 and θ_{1,2} are chosen in a different way.
Figure 3.3.5: The limiting function u_λ(t) with θ_1 = 1, θ_2 = 2 and θ_{1,2} = 3 takes its maximum value of 0.63 at t = λ = 7/10.
maximizer (minimizer) of u_λ, and u_λ(t) ≠ 0, t ≠ λ, we have that, as n → ∞,

which implies that λ̂ is a strongly consistent estimator of λ. The same is true for λ̃.
Summarizing, we need to impose the special conditions from (3.3.25) on the bounded kernel h via θ_1, θ_2 and θ_{1,2}, and we will have a strongly consistent estimator for the time of change (cf. Theorem 3.3.3). In practice, we have to check whether the maximum or minimum, respectively, is taken in the latter interval. Moreover, since the indicated expected values of h are usually unknown, i.e., θ_1, θ_2 and θ_{1,2} are unknown, we have to estimate them from the data by using appropriate estimators.
If in (3.3.25) θ_{1,2} → ∞, i.e., when θ_{1,2} becomes very large (by our assumptions θ_1, θ_2 and θ_{1,2} are all finite), then the interval in (3.3.25) converges to [0, 1]. Then the estimators τ̂ and τ̃ work out fine. If on the other hand θ_1 → ∞ or θ_2 → ∞, then the interval in (3.3.25) converges to either [0, 0] or [1, 1], which means that we cannot use the estimators at all. Therefore, the optimal case is when we have |θ_1|, |θ_2| ≪ |θ_{1,2}| < ∞. If θ_{1,2} is between the others, then the interval will just cover a small area on the left or right half of the interval [0, 1].
Gombay (1998) defines
τ̂ = τ̂(n) := min{k : U_k = max_{1≤m<n} U_m}
as an estimator of τ and considered the distribution of τ̂ under H_A. She shows, for example, that for a symmetric non-degenerate kernel with finite second moment and under some technical assumptions, as n → ∞, under H_A we have

and

Moreover, she gives the distribution of τ̂ − τ, which depends on the underlying distribution function and on the change-point parameter λ. It behaves like the maximum of a two-sided random walk. In case of an antisymmetric kernel function h, Gombay (1998) shows that, under H_A, max_{1≤k<n} Û_k (cf. (3.1.12)) has the same limiting distribution as max_{1≤k<n} U_k.
Similarly to the antisymmetric case, we would like to find the distribution of τ̂ under H_0, the null hypothesis of no change. Since U_k is defined by

Z_k, 1 ≤ k < n, reaches its maximum when U_k + k(n − k)θ reaches its maximum.
We define
and get by using (3.3.27) that

Moreover, taking the sup on both sides,

But from Theorem 3.3.1 we know that under H_0, as n → ∞,

which implies that sup_{0≤t≤1} |U_n(t)| behaves like sup_{0≤t≤1} |Γ(t)|, where Γ(t) is the Gaussian process from (3.3.1). Of course, we may add and subtract a term in (3.3.29) and get that, as n → ∞,

The latter statement implies that for n large

Consequently, via (3.3.28) and the latter statement, we conclude that |Z_n(t)| in t reaches its maximum for large n where |Γ(t) + √n t(1 − t)θ| does. But the latter expression goes to ∞ as n → ∞, since √n t(1 − t)θ → ∞. Therefore, we do not know where the maximum of Z_n(t) is taken. We may also think about using a weighted version of Theorem 3.3.1. Indeed, using an appropriate weight function q in (3.3.28), we get that
and as an estimator of τ we now define

τ̂ := min{k : Z_k/q(k/n) = max_{1≤m<n} Z_m/q(m/n)}.

For example, using the weight function
we still have that

and therefore, as before, (3.3.30) goes to ∞ as n → ∞. Hence, in order to estimate τ, it seems that we need to centralize Z_k, 1 ≤ k < n, by its mean, i.e., we have to work with U_k as in (3.3.27). This, however, will not give us the desired result anymore, since

max_{1≤k<n} U_k ≠ max_{1≤k<n} Z_k.   (3.3.31)
In case of an antisymmetric kernel we had an '=' sign in (3.3.31), which made it possible to use Theorem 3.2.1 and get the distribution of τ̂ under H_0. But in case of a symmetric kernel, the fact that (3.3.31) holds does not allow us to use Theorem 3.3.1 to give the distribution of τ̂ under H_0. Hence, the distribution of τ̂ under H_0 is still unknown.
In principle, we may, however, compute the distribution of τ̂ in (3.3.26) under H_0, since (3.3.29) implies that |U_n(t)| reaches its maximum where |Γ(t)| does. Hence τ̂/n converges in distribution to the argument at which sup_{0<t<1} |Γ(t)| is attained.
3.4 Change in the Mean
We are to test the no-change in the mean null hypothesis

H_0: X_1, ..., X_n are independent identically distributed random variables with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,

against the at most one change in the mean alternative

H_A: X_1, ..., X_n are independent random variables and there is an integer τ, 1 ≤ τ < n, such that EX_1 = ... = EX_τ ≠ EX_{τ+1} = ... = EX_n and 0 < σ² = Var X_1 = ... = Var X_n < ∞.
Taking simulated values of the therein indicated (k, X_k), Figure 3.4.6 gives an example where a change in the mean occurs while the variance stays the same.

Figure 3.4.6: The data X_1, ..., X_{700} are i.i.d. N(1,1)-distributed and X_{701}, ..., X_{1000} are i.i.d. N(4,1)-distributed.
If we define the antisymmetric kernel

h(x, y) := x − y,

then the stochastic process Z_k from (3.1.1) may be written as

Z_k = Σ_{i=1}^{k} Σ_{j=k+1}^{n} (X_i − X_j) = k(n − k) ( (1/k) Σ_{i=1}^{k} X_i − (1/(n − k)) Σ_{j=k+1}^{n} X_j ),

which may be interpreted as comparing the mean before the unknown time k, 1 ≤ k < n, of a possible change in the mean to the mean after the change. Under H_0 this difference should be fluctuating near zero, while under the alternative of one change, the stochastic process Z_k, 1 ≤ k < n, will have a maximum or minimum at the time k = τ.
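As a small illustration (not from the thesis), the following sketch simulates data as in Figure 3.4.6 and evaluates Z_k through the scaled-difference-of-means representation above; the seed and the simple argmax rule are arbitrary choices.

```python
import random

# Sketch: the process Z_k for h(x, y) = x - y on data as in Figure 3.4.6.
# Z_k = k(n - k) * (mean of first k - mean of the rest), so all of Z_1, ...,
# Z_{n-1} can be computed in O(n) from prefix sums.

random.seed(42)
x = [random.gauss(1.0, 1.0) for _ in range(700)] + \
    [random.gauss(4.0, 1.0) for _ in range(300)]
n = len(x)

s, prefix = 0.0, [0.0]
for v in x:
    s += v
    prefix.append(s)          # prefix[k] = X_1 + ... + X_k
total = prefix[n]

z = [k * (n - k) * (prefix[k] / k - (total - prefix[k]) / (n - k))
     for k in range(1, n)]    # Z_1, ..., Z_{n-1}

tau_hat = 1 + max(range(n - 1), key=lambda i: abs(z[i]))
print(tau_hat)                # expected to sit near the true change point 700
```

Because the mean drops after k = 700 in this example, Z_k is ∪-shaped here (the first-block mean is smaller), and the extremum of |Z_k| locates the change.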
When testing for more change-points we will use a different argument, which in our present situation is similar to the following. Observe that under H_0

Then we will see that testing for a change in the mean can be illustrated by using a geometrical argument. Consider the linear function m(t) := t, t ∈ ℝ, which under H_0 joins all the points (k, (1/μ)E{S(k)}), k ∈ ℕ, if μ ≠ 0, and joins all the points (k, E{S(k)}), k ∈ ℕ, if μ = 0.
Without loss of generality let μ = 1. Then in Figure 3.4.7 we join all the points (k, E{S(k)}), k ∈ ℕ, via the straight line m(t) = t. We pick one k ∈ {1, ..., n − 1} and draw a horizontal line starting from B := (0, E{S(k)}), containing the point (k, E{S(k)}), and with terminus C := (n, E{S(k)}). We draw a vertical line from the terminus and intersect the t-axis. We denote this intersection by D := (n, E{S(0)}), where we define S(0) := 0. In this way we construct a rectangle,

Figure 3.4.7: A geometrical interpretation of E{nS(k) − kS(n)} = 0 under H_0.

denoted by ABCD (see Figure 3.4.7), where A := (0, E{S(0)}), with length n and height E{S(k)}.
Reflecting each point of the rectangle ABCD around the 45 degree line m(t) = t, we get a new rectangle AEFG, where A := (0, E{S(0)}) is the reflection point of itself, E := (0, E{S(n)}) is the reflection point of D := (n, E{S(0)}), F := (k, E{S(n)}) is the reflection point of C := (n, E{S(k)}), and G := (k, E{S(0)}) is the reflection point of B := (0, E{S(k)}), with length k and height E{S(n)}. Under H_0, both rectangles have the same area. Consequently, we have that
Thus, in principle, for each given k, 1 ≤ k < n, we constructed an unbiased estimator of zero, assuming that H_0 is true. We may also say that, viewed this way, testing for one change in the mean results in comparing the areas of two different rectangles with each other.
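The unbiasedness E{nS(k) − kS(n)} = 0 under H_0 can be checked with a quick Monte Carlo sketch (the values of n, k and μ below are arbitrary, not from the thesis):

```python
import random

# Monte Carlo check of the rectangle-area identity E{n S(k) - k S(n)} = 0
# under H_0, with S(k) = X_1 + ... + X_k. Under H_0, E S(k) = k*mu, so
# E{n S(k) - k S(n)} = n*k*mu - k*n*mu = 0 for every k.

random.seed(0)
n, k, mu, reps = 100, 30, 1.0, 5000
acc = 0.0
for _ in range(reps):
    x = [random.gauss(mu, 1.0) for _ in range(n)]
    s_k = sum(x[:k])
    s_n = s_k + sum(x[k:])
    acc += n * s_k - k * s_n      # area ABCD minus area AEFG (mu = 1)
mean_stat = acc / reps
print(mean_stat)                  # fluctuates around 0
```

One replication has standard deviation √(k(n − k)n) σ ≈ 458 here, so the Monte Carlo mean over 5000 replications should be within a few units of ten of zero.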
Instead of defining m(t) = t, we may also define m_μ(t) := μt, which joins all the points (k, E{S(k)}), k ∈ ℕ. In a similar vein as before we get the same results. Moreover, under H_A the slope of the function m_μ will change exactly at k = τ, where E{S(τ)} = μτ. Hence the two areas will not be the same. Moreover, the maximal difference between those two areas will occur when k = τ.
Continuing with using the kernel h(x, y) = x − y to test for at most one change in the mean, under H_0 and 1 ≤ i, j ≤ n,

Furthermore, assuming that

we have

i.e., we assume that the variance is finite. We define

h̃(t) := E{h(t, X_1)} = t − μ.

Since

E h̃²(X_1) = E{(X_1 − μ)²} = σ²,

it follows from (3.4.3) that
and, in addition to (3.4.3), we also assume that
Since h is an antisymmetric kernel, we know from Theorem 3.2.1 that under H_0, as n → ∞,

where for all 0 ≤ t ≤ 1

We recall also, for each n,

with B being a standard Brownian bridge.
Assuming the same conditions as in Theorem 3.2.2, we obtain that under H_A, as n → ∞,

sup_{0<t<1} |Ū_n(t)| → ∞ in probability.
Therefore we consistently reject H_0 if sup_{0<t<1} |Ū_n(t)| becomes too big, and we have to think about 'what is too big?'. On account of (3.4.4) it follows that under H_0, as n → ∞,
This in turn means that we can use tables for the supremum of the absolute value of a Brownian bridge to accept or reject H_0.
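Such tables can also be computed directly: P{sup_{0<t<1} |B(t)| ≤ x} is the Kolmogorov distribution 1 − 2 Σ_{k≥1} (−1)^{k−1} e^{−2k²x²}. A sketch that inverts it numerically for the 5% critical value:

```python
import math

# 'Tables for the supremum of the absolute value of a Brownian bridge' come
# from the Kolmogorov distribution
#   P{ sup_{0<t<1} |B(t)| <= x } = 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2).

def sup_bridge_cdf(x, terms=100):
    if x <= 0:
        return 0.0
    return 1.0 - 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                           for k in range(1, terms + 1))

def critical_value(alpha, lo=0.1, hi=3.0):
    """Bisection for the (1 - alpha)-quantile of sup|B(t)|."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sup_bridge_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(critical_value(0.05))   # approx. 1.358, the classical 5% Kolmogorov value
```

One would then reject H_0 at level 5% when the sup-statistic exceeds roughly 1.358.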
Since the latter test statistic is not sensitive on the tails, we may also use a weighted version of (3.4.4) by using, for example, the weight function

q(t) = ( t(1 − t) log log (1/(t(1 − t))) )^{1/2}.
Consequently, with this q, as n → ∞,

sup_{0<t<1} |Ū_n(t)|/q(t) →_D sup_{0<t<1} |B(t)|/q(t).
The variance σ² in (3.4.5) is usually unknown. Consequently, it has to be estimated on the basis of the same random sample. One possible way of estimating σ² is via the sample variance

where X̄_n = (X_1 + ... + X_n)/n is the sample mean. According to Csörgő and Horváth (1997, Section 2.1), the use of the so-called pooled variances

where

is preferable to that of (3.4.6).
By using the pooled variances, we take into account the possibility that there is a change-point in the data. Of course, this may also affect the variance. Therefore, we compute the variance before and after each time k, any of which could be a possible change, instead of the variance of all n data, since the latter does not take a possible change into consideration.
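The effect is easy to see numerically. The sketch below assumes one plausible normalization for (3.4.7), namely (1/n)(SS around the mean of the first k observations + SS around the mean of the rest); the exact normalization of (3.4.7) is not reproduced here, and the data sizes are those of Figure 3.4.6.

```python
import random

# Sketch of the pooled-variance idea: estimate sigma^2 at a candidate split k
# by pooling the variability around the two block means. The normalization
# (1/n) * (SS before k + SS after k) is an assumption standing in for (3.4.7).

def pooled_var(x, k):
    left, right = x[:k], x[k:]
    ml, mr = sum(left) / len(left), sum(right) / len(right)
    return (sum((v - ml) ** 2 for v in left) +
            sum((v - mr) ** 2 for v in right)) / len(x)

random.seed(1)
x = [random.gauss(1.0, 1.0) for _ in range(700)] + \
    [random.gauss(4.0, 1.0) for _ in range(300)]

m = sum(x) / len(x)
plain_var = sum((v - m) ** 2 for v in x) / (len(x) - 1)

# The global sample variance is inflated by the mean shift (roughly
# 1 + 0.7*0.3*3^2 = 2.89 here), while pooling at the true change point
# recovers a value near the common variance 1.
print(plain_var, pooled_var(x, 700))
```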
Csörgő and Horváth (1997, Section 2.1) show that, by using the Law of Large Numbers, weak uniform consistency of the sequence of estimators σ̂²_{[(n+1)t]} for estimating σ² can be established, namely, as n → ∞,

Since σ̂²_{[(n+1)t]} is a consistent estimator for σ², Ū_n(t) in (3.4.5) may be estimated by

and our previous result from (3.4.4) carries over. In particular, as n → ∞,
and, as n → ∞,

imply that we can still use tables for the distribution of sup_{0<t<1} |B(t)| and reject the null hypothesis of no change in the data if our test statistic sup_{0<t<1} |Û_n(t)| becomes too big.

Summarizing, the test statistic sup_{0<t<1} |Û_n(t)| may be consistently used to test H_0 against H_A.

Moreover, if H_0 is rejected, then the time of change may be estimated by
3.5 Change in the Variance
We are to test the no-change in the variance hypothesis

H_0: X_1, ..., X_n are independent identically distributed random variables with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,

against the at most one change in the variance alternative

H_A: X_1, ..., X_n are independent random variables and there is an integer τ, 1 ≤ τ < n, such that Var X_1 = ... = Var X_τ ≠ Var X_{τ+1} = ... = Var X_n, 0 < Var X_τ, Var X_{τ+1} < ∞, and EX_1 = ... = EX_n = μ.
Taking simulated values of the therein indicated (k, X_k), Figure 3.5.8 gives an example where a change in the variance occurs while the mean stays the same.

Figure 3.5.8: The data X_1, ..., X_{700} are i.i.d. N(0,1)-distributed and X_{701}, ..., X_{1000} are i.i.d. N(0,4)-distributed.
Let us assume throughout this section that the mean stays the same. If μ is known, then the problem seems to be very simple. Testing H_0 against H_A means that we are looking for the change in the mean of (X_i − μ)², 1 ≤ i ≤ n. So, similarly to Section 3.4, we look at the difference of

which under H_0 should be fluctuating near zero. Hence, we consider the stochastic process

If μ were known, then this process is like Z_k of Section 3.4 and, assuming appropriate moment conditions, a test statistic can be based on its sup-functional just like there. Indeed, assuming that 0 < σ*² = Var(X_1²) < ∞, the results of Section 3.4 apply in this context as well. Moreover, Gombay, Horváth and Hušková (1996) also show that, estimating μ if need be by its (under H_0) consistent estimator X̄_n, the just mentioned asymptotic results of Section 3.4 continue to hold true in this context as well, i.e., just as if μ were known. When μ is unknown, as an alternative approach, we may also use the symmetric kernel (cf. Example 2.3.2)

h(x, y) := (x − y)²/2,
where under H_0 and i ≠ j,

Therefore, this h is an unbiased estimator for the variance σ² of the X_i's. Furthermore, we have to assume that

and, on account of E(X_i − μ) = 0 and σ² = E(X_i − μ)², we have that under H_0 and i ≠ j,

= (1/4) [ 2 E(X_i − μ)⁴ + 6 (Var X_i)² ].
Therefore our assumption in (3.5.1) becomes
Since

h̃(t) := E{h(t, X_1)} − σ²,

it follows from (3.5.1) that
and, in addition to (3.5.1), we also assume that
Since h is a symmetric kernel, we know from Theorem 3.3.1 that under H_0, as n → ∞,

where we define U_n(t) as in (3.1.15) for 0 < t < 1 to be

U_n(t) = U_{[(n+1)t]} / (s n^{3/2}),

with

and

where

for each n.
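As a side check on the kernel itself (not part of the thesis argument): h(x, y) = (x − y)²/2 is the classical variance kernel, and its U-statistic over a sample coincides with the unbiased sample variance, since Σ_{i<j}(X_i − X_j)² = n Σ_i (X_i − X̄_n)².

```python
import random

# Quick check that the symmetric kernel h(x, y) = (x - y)^2 / 2 is the
# classical variance kernel: its U-statistic over a sample equals the sample
# variance (1/(n-1)) * sum (X_i - Xbar)^2, which is unbiased for sigma^2.

random.seed(7)
x = [random.gauss(0.0, 2.0) for _ in range(50)]
n = len(x)

u_stat = sum((x[i] - x[j]) ** 2 / 2.0
             for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2.0)

xbar = sum(x) / n
s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)

print(u_stat, s2)   # the two agree up to floating-point error
```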
Similarly to Section 3.4, where we investigated the change in the mean, we now have to estimate the usually unknown parameters σ² and s². Again, we can estimate σ² by the usual estimator of the variance from (3.4.6), or by the pooled variance σ̂²_{[(n+1)t]} from (3.4.7). The second one incorporates the possibility of a change in the data better than the usual estimator of the variance, (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄_n)²; Csörgő and Horváth (1997) therefore suggest the pooled variances.
We also have to estimate s². Since

and E(X_1 − μ)² = Var X_1, we estimate the second part in (3.5.4) again by σ̂²_n or by the pooled variances σ̂²_{[(n+1)t]}. To estimate E(X_1 − μ)⁴, we either use the estimator for the 4-th moments

where X̄_n = (X_1 + ... + X_n)/n, or we make use of the pooled 4-th moments, since they, similarly to the pooled variances, feel a possible change better. To do so, define
γ̂_{[(n+1)t]} = (1/n) ( Σ_{i=1}^{[(n+1)t]} (X_i − X̄_{[(n+1)t]})⁴ + Σ_{i=[(n+1)t]+1}^{n} (X_i − X̄*_{[(n+1)t]})⁴ ),  0 ≤ t < n/(n+1),

γ̂_{[(n+1)t]} = (1/n) Σ_{i=1}^{n} (X_i − X̄_n)⁴,  n/(n+1) ≤ t ≤ 1,
where X̄_{[(n+1)t]} and X̄*_{[(n+1)t]} are defined in (3.4.7). By using the Law of Large Numbers, weak uniform consistency of the sequence of estimators γ̂_{[(n+1)t]} for estimating E(X_1 − μ)⁴ can be established.
Hence, we estimate U_n(t) in (3.5.3) by

and the test statistic sup_{0<t<1} |Û_n(t)| may be used to test H_0 against H_A. This is due to the fact that, uniformly in t ∈ (0, 1), Û_n(t) is a consistent estimator of U_n(t), hence the results for U_n(t) carry over. For instance, assuming the same conditions as in Theorem 3.3.2, we know that under H_A, as n → ∞,

and from (3.5.2) it follows that under H_0, as n → ∞,

on account of sup_{0<t<1} |Û_n(t) − U_n(t)| = o_P(1). This means that the test statistic sup_{0<t<1} |Û_n(t)| converges in distribution to the supremum of the absolute value of the Gaussian process Γ(t), and we can hope to use tables for the distribution of the supremum of this Gaussian process to accept or reject H_0.
Moreover, if H_0 is rejected, then the time of change may be estimated by

τ̃ = [nλ̃] := min{k : Z_k = min_{1≤m<n} Z_m}, if Z_k, 1 ≤ k < n, is ∪-shaped.
Let us assume that θ_1, θ_2, θ_{1,2} > 0 (the other cases are similar). Then we have the unique maximum exactly at the change-point τ if the change-point lies in the following interval (cf. (3.3.25)):

where τ = τ(n) = [nλ], θ_1 = E(½(X_i − X_j)²), 1 ≤ i < j ≤ τ, θ_2 = E(½(X_i − X_j)²), τ < i < j ≤ n, and θ_{1,2} = E(½(X_i − X_j)²), 1 ≤ i ≤ τ < j ≤ n. These parameters depend on the change-point τ, hence we have to know them a priori to check whether (3.5.7) is satisfied or not.
Suppose we estimate θ_1, θ_2 and θ_{1,2} by using τ̂, the point where the maximum is reached (or τ̃, respectively). Hence, we estimate θ_1 and θ_2 by the usual estimator for the variance, the so-called second moment. To estimate the third parameter, we use the fact that θ_{1,2} = ½(Var X_i + (EX_i)² − 2 EX_i EX_j + Var X_j + (EX_j)²). Considering the ∩-shaped case, the respective estimators are
This is due to the fact that the estimators θ̂_1, θ̂_2 and θ̂_{1,2} may be very bad if τ̂ is far apart from τ. Therefore it is necessary to know the parameters θ_1, θ_2 and θ_{1,2} a priori. Otherwise the procedure of estimating the unknown change-point τ by τ̂, the point where the stochastic process Z_k, 1 ≤ k < n, takes its maximum, will not be reliable. Moreover, Theorem 3.3.3 of Ferger and Stute does not hold, since we do not have a unique maximum at k = τ. As mentioned before in Section 3.3.3, the above interval is very good if |θ_1|, |θ_2| ≪ |θ_{1,2}|.
In conclusion, we should also say that the 'change in the mean' approach of Gombay, Horváth and Hušková (1996) to testing for changes in the variance via the stochastic process (cf. also Section 2.8.7 in Csörgő and Horváth (1997))

appears to be preferable to that of (3.5.6) that we have just discussed.
"Courage is what it takes to stand up and speak.
Courage is also what it takes to sit down and listen."
- Winston Churchill
Chapter 4
At Most Two Change-points
4.1 Introduction
We are to test the null hypothesis
H_0: X_1, ..., X_n are independent identically distributed random variables,

against the alternative that there are at most two change-points in the sequence X_1, ..., X_n, namely that we have
We mention that the alternative H_A^{(2)} allows us to consider random variables X_1, X_2, ..., X_n with two changes in the distribution, but not necessarily involving three different distributions. For example, we could have a sample where the distribution before the first change is the same as the one after the second change, but in between them we have a different distribution. This is the so-called epidemic alternative, as we shall see in Section 4.6.
Suppose we were to test H_0 vs. H_A^{(2)} by using a properly normalized sup-functional of the stochastic process Z_{[(n+1)t]}, 0 < t < 1, from (3.1.1) as a test statistic. Borrowing the notations from (4.1.16) and (4.1.17), the in-probability limiting function of (1/n²) Z_{[(n+1)t]} under H_A^{(2)} will be seen (cf. (5.7.2)) to be

Moreover, if we put θ_1 = θ_2 = θ_3 = 0, θ_{1,2} = ((1 − λ_2)/λ_1) θ_{2,3} and θ_{1,3} = ((λ_1 − λ_2)/λ_1) θ_{2,3}, then u_{λ_1,λ_2}(t) is equal to zero for each t, 0 < t < 1, when testing for two changes. Consequently, sup_{0<t<1} |(1/n²) Z_{[(n+1)t]}| is not consistent in general when testing H_0 vs. H_A^{(2)}. This means that in some instances the alternative H_A^{(2)} will be rejected although it may be true. Therefore, we have to define a different stochastic process which kind of 'feels' the possibility of two changes. This will then allow us to study the behavior of its sup-functional and suggest a consistent test statistic for testing H_0 vs. H_A^{(2)}.
Our experience gained from the previous chapter suggests to use a stochastic process that depends on two time variables. To construct such a two-time parameter stochastic process, we split the given sample X_1, ..., X_n into three blocks and compare each of the blocks with the other two. This corresponds to the idea of comparing the mean before (the first change-point), between (the two change-points) and after (the two change-points). Hence, we are using a kernel h(x, y) of two variables x and y, and, since we will compare two blocks with each other at a time, we have three different possibilities to do so. Therefore, we define our new stochastic process in k_1 and k_2 as follows:

Z_{k_1,k_2} = Σ_{i=1}^{k_1} Σ_{j=k_1+1}^{k_2} h(X_i, X_j) + Σ_{i=1}^{k_1} Σ_{j=k_2+1}^{n} h(X_i, X_j) + Σ_{i=k_1+1}^{k_2} Σ_{j=k_2+1}^{n} h(X_i, X_j).

In this way we compare the three blocks (X_1, ..., X_{k_1}), (X_{k_1+1}, ..., X_{k_2}) and (X_{k_2+1}, ..., X_n) with each other, where k_1 and k_2 vary from 1 to n − 2 and from 2 to n − 1, respectively.
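A sketch of this two-time parameter scan for the concrete antisymmetric kernel h(x, y) = x − y (the kernel, data, and sample sizes here are illustrative assumptions, since the chapter keeps h generic): each block comparison Σ_{i∈I} Σ_{j∈J} (X_i − X_j) collapses to |J| S_I − |I| S_J with block sums S_I and S_J, so the whole (k_1, k_2) grid can be scanned cheaply. With two upward changes in the mean, the maximal |Z_{k_1,k_2}| should land near the true pair of change-points.

```python
import random

# Sketch: the two-time parameter process Z_{k1,k2} for h(x, y) = x - y.
# Each block comparison sum_{i in I} sum_{j in J} (X_i - X_j) equals
# |J|*S_I - |I|*S_J, so Z_{k1,k2} costs O(1) per pair after one prefix pass.

random.seed(3)
x = ([random.gauss(0.0, 1.0) for _ in range(100)] +     # before first change
     [random.gauss(2.0, 1.0) for _ in range(100)] +     # between the changes
     [random.gauss(4.0, 1.0) for _ in range(100)])      # after second change
n = len(x)

p = [0.0]
for v in x:
    p.append(p[-1] + v)                                  # p[k] = X_1 + ... + X_k

def z(k1, k2):
    s1, s2, s3 = p[k1], p[k2] - p[k1], p[n] - p[k2]
    c1, c2, c3 = k1, k2 - k1, n - k2
    return ((c2 * s1 - c1 * s2) +        # block 1 vs block 2
            (c3 * s1 - c1 * s3) +        # block 1 vs block 3
            (c3 * s2 - c2 * s3))         # block 2 vs block 3

best = max(((k1, k2) for k1 in range(1, n - 1) for k2 in range(k1 + 1, n)),
           key=lambda kk: abs(z(*kk)))
print(best)   # expected near the true change points (100, 200)
```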
We are to study the asymptotic properties (as n → ∞) of the process Z_{k_1,k_2}, 1 ≤ k_1 < k_2 ≤ n, which may be expressed in terms of U-statistics. To do so, we give some notations and definitions under the null hypothesis of no change and under the alternative of at most two changes.
4.1.1 Notations under the Null Hypothesis H_0
We define

and

E h²(X_i, X_j) =: V, 1 ≤ i < j ≤ n.

We assume throughout the whole chapter that

V < ∞,

which of course implies
Computing now the expected value of Z_{k_1,k_2}, we obtain

E Z_{k_1,k_2} = ((k_2 − k_1)k_1 + (n − k_2)k_2) θ,  1 ≤ k_1 < k_2 ≤ n.  (4.1.7)
We define, as in the previous chapter,

and again we assume that

We centralize Z_{k_1,k_2} by its mean, and consider the process

where the kernel function h in Z_{k_1,k_2} is symmetric. For an antisymmetric kernel we define

Moreover, we can write Z̄_{k_1,k_2} as the sum of four U-statistics, and thus we get

where
Similarly, for an antisymmetric kernel we define

where

For further use, via (4.1.10) and (4.1.13) respectively, we also define

and
4.1.2 Notations under the Alternative H_A^{(2)}
Let F^{(1)}(t) = P{X_{τ_1} ≤ t}, F^{(2)}(t) = P{X_{τ_1+1} ≤ t} and F^{(3)}(t) = P{X_{τ_2+1} ≤ t} be the respective distribution functions of the observations before the first change, between the first and second change, and after the second change, and put

Furthermore, we define

τ_1 := [nλ_1] and τ_2 := [nλ_2], 0 < λ_1 ≤ λ_2 < 1.  (4.1.17)
Similarly to (4.1.16), we also define the second moment of the kernel h by

and assume throughout the whole chapter that E h²(X_i, X_j) is finite for all possible choices of i and j, namely

Moreover, this implies that
We will also use a weaker assumption than that of the existence of a finite second moment of h, namely that for random variables from two different distributions we have

where log⁺ x = log(x ∨ 1).
4.2 Antisymmetric Kernels
We consider the processes Z_{k_1,k_2}, 1 ≤ k_1 < k_2 ≤ n, n ≥ 3, where the kernel h is antisymmetric as in (2.2.3). By using the notations and assumptions from Sections 4.1.1 and 4.1.2, we investigate the behavior of these processes under H_0 (cf. Section 4.2.1) and H_A^{(2)} (cf. Section 4.2.2).
4.2.1 Asymptotic Results under H_0
We wish to study the limiting behavior of

in the sup-norm under the null hypothesis of no change. For the indices k_1 and k_2, we write [(n + 1)t_1] and [(n + 1)t_2], respectively.

The asymptotic behavior of Ū_{[(n+1)t_1],[(n+1)t_2]}, 0 < t_1 < t_2 < 1, will be derived from the fact that we may write it in terms of U-statistics as in (4.1.13), which in turn may be replaced by sums of i.i.d. r.v.'s. Furthermore, we may then approximate these sums of i.i.d. r.v.'s by Wiener processes.
We state Theorem 2.1 of Janson and Wichura (1983) in the degenerate case which, via the proof of Theorem 4.1 of Csörgő and Horváth (1988b) (cf. Lemma 3.2.1 and Corollary 3.2.1 in this thesis), will be the basis for our approach as well.
Theorem 4.2.1 (Janson and Wichura, 1983) X_1, X_2, ... are i.i.d. r.v.'s with common distribution. If σ² = E h̃²(X_1) = 0 (degenerate case), then

where A_{[nt]} = Σ_{1≤i<j≤[nt]} h(X_i, X_j), h is an antisymmetric kernel, the A_r are independent stochastic area processes, and the sum on the right-hand side of (4.2.1) converges almost surely, uniformly in t. Moreover, a stochastic area process is defined to be the Itô integral A(t) = ∫_0^t (−V(s) dU(s) + U(s) dV(s)), with a two-dimensional Wiener process W(s) = (U(s), V(s)), s ≥ 0.
We mention that the U-statistics in (4.1.13) are assumed to have non-degenerate kernels h. We may, however, construct kernels h* that are degenerate in the same context. Recall that E h̃(X_1) = E h(X_1, X_2) = 0 and σ² = Var h̃(X_1) = E(h̃(X_1))² > 0, and let

h(x, y) = h*(x, y) + h̃(x) − h̃(y).
Since h is an antisymmetric kernel, it follows that

where for each x

Hence h* is an antisymmetric kernel, and it is degenerate (σ*² = E h̃*²(X_1) = 0) as well.
well. Moreover, for the U-statistic 2:) defined in (4.1.13) we get
which implies that
where now z:(~) is a U-statistic with an antiçymmetric and degenerate kernel h*,
and Theorem 4.2.1 can now be applied. We mention that the * in (4.2.2), as well
as in the following formulas, indicates that we consider the difference of one of the
U-statistics in (4.1.13) and a sum of i.i.d. r d s .
For convenience we let

and show that

Consequently, as to (4.2.4), we will approximate κ_n^{(1)}(t_1, t_2), which is a sum of i.i.d. r.v.'s, by appropriate Wiener processes, and by doing this we will get a Gaussian limiting distribution for a properly normed version of Ū_{[(n+1)t_1],[(n+1)t_2]}.

As to (4.2.4), we first let

and conclude

Hence, we may now use formula (4.1.13), and get

Based on (4.2.5), we arrive at the inequality

and proceed to prove the following reduction principle.
Lemma 4.2.1 Under H_0, as n → ∞, each of the maxima

max_{1≤[(n+1)t_1]≤[(n+1)t_2]≤n} |Z*^{(j)}|, j = 1, 2, 3, 4,

of the centered U-statistics from (4.2.2)–(4.2.5) is O_P(n).
Proof of Lemma 4.2.1 The first statement follows via (4.2.1) and the fact that, as n → ∞,

Z*^{(2)}_{[(n+1)t_1],[(n+1)t_2]} is a U-statistic (with an antisymmetric and degenerate kernel) from [(n + 1)t_1] + 1 to [(n + 1)t_2], minus a sum of i.i.d. r.v.'s. Hence, we have to shift the interval [[(n + 1)t_1 + 1], [(n + 1)t_2]] to the interval [1, [(n + 1)(t_2 − t_1)]] by using the fact that for i.i.d. r.v.'s X_1, X_2, ..., X_n
for each n. Moreover, by (4.2.1), as n → ∞,

and we may write sup_{0<t_2−t_1<1} = sup_{t_1<t_2<t_1+1}. Since (4.2.7) holds uniformly in t_1 and t_2 such that 0 < t_1 < t_2 < 1, we have sup_{t_1<t_2<t_1+1} = sup_{t_1<t_2<1}, and we arrive at, as n → ∞,

which proves the second statement in this lemma.
Similarly, Z*^{(3)}_{[(n+1)t_2],n} is a U-statistic (with an antisymmetric and degenerate kernel) from [(n + 1)t_2] + 1 to n, minus a sum of i.i.d. r.v.'s. Hence, we have to shift the interval [[(n + 1)t_2 + 1], n] to the interval [1, [n − (n + 1)t_2]]. Then

for each n, and, as n → ∞,

Restricting t_2 to the interval (t_1, 1) ⊂ (0, 1), where t_1 is between 0 and 1, the supremum over this interval will also be O_P(n), hence, as n → ∞,

which proves the third statement in this lemma.
By using (4.2.1) together with (4.2.2), we get, as n → ∞,

which proves the fourth and last statement in this lemma. □
As a consequence of Lemma 4.2.1 and (4.2.6), we also conclude

Corollary 4.2.1 Let h be an antisymmetric kernel and Ū_{[(n+1)t_1],[(n+1)t_2]} be defined as in (4.1.13). Under H_0, as n → ∞, we have

max_{1≤[(n+1)t_1]≤[(n+1)t_2]≤n} | Ū_{[(n+1)t_1],[(n+1)t_2]} − { [(n+1)t_2] Σ_{j=1}^{[(n+1)t_1]} h̃(X_j) + (n − [(n+1)t_1]) Σ_{j=1}^{[(n+1)t_2]} h̃(X_j) − [(n+1)t_2] Σ_{j=1}^{n} h̃(X_j) } | = O_P(n).
We emphasize that Ū_{[(n+1)t_1],[(n+1)t_2]} is now approximated by sums of i.i.d. r.v.'s, for which there are many limit theorems available. In particular, we are now in the position to study the asymptotic Gaussian behavior of Ū_{[(n+1)t_1],[(n+1)t_2]} in the sup-norm. We give the following theorem as well as a detailed proof.
Theorem 4.2.2 Assume that H_0, (2.2.3), (4.1.5) and (4.1.9) hold. Then we can define a sequence of Gaussian processes {Γ_n^a(t_1, t_2), 0 ≤ t_1 ≤ t_2 ≤ 1}_{n∈ℕ} such that, as n → ∞,

and, for each n,

where the Gaussian process Γ^a is defined via a linear combination of a standard Wiener process W as follows:

Γ^a(t_1, t_2) = t_2 W(t_1) + (1 − t_1) W(t_2) − t_2 W(1),  0 ≤ t_1 ≤ t_2 ≤ 1.
Proof of Theorem 4.2.2 We define for each n the two-time parameter Gaussian process

Indeed, we may write Γ_n^a in terms of a linear combination of independent increments of W, namely

The just mentioned independence follows from the definition of W.

Similarly, we conclude that Γ^a in (4.2.10) is a Gaussian process. Using the fact that Gaussian processes are uniquely determined by their covariance structure, and that Γ_n^a (for each n) and Γ^a have the same covariance, it follows that for each n
Having shown that the above defined Γ_n^a is a sequence of Gaussian processes, we
now prove (4.2.9) by using Corollary 4.2.1 and the fact that we can define a Wiener
process {W(t), 0 ≤ t < ∞} such that (cf. Csörgő and Révész (1981, Theorem S.2.2.1
by Major (1979) combined with (S.2.2.2))), as n → ∞,
Hence, bounding sup_{0<t1<t2<1} above by sup_{0<t1<1} or sup_{0<t2<1}, we also have, as
n → ∞,
(1/n^{1/2}) sup_{0<t1<t2<1} | Σ_{i=1}^{[(n+1)tj]} h̃(X_i) − W(n t_j) | = o_P(1), for j = 1, 2. (4.2.11)
Recall the definition of κ_n^{(1)}(t1, t2) as in (4.2.3), and define
ρ_n := sup_{0<t1<t2<1} | κ_n^{(1)}(t1, t2)/(θ n^{3/2}) − Γ_n^a(t1, t2) |.
Next we observe that, by using the definitions of κ_n^{(1)}(t1, t2) and Γ_n^a, we arrive, as n → ∞, at
ρ_n ≤ sup_{0<t1<t2<1} (1/n^{1/2}) | Σ_{j=1}^{[(n+1)t1]} h̃(X_j) − W(n t1) | + sup_{0<t1<t2<1} (1/n^{1/2}) | Σ_{j=1}^{[(n+1)t2]} h̃(X_j) − W(n t2) | + sup_{0<t1<t2<1} (1/n^{1/2}) | Σ_{j=1}^{n} h̃(X_j) − W(n) | = o_P(1),
where the o_P(1) statement follows from (4.2.11). Hence, it is proven that ρ_n = o_P(1).
To avoid confusion with the notation, we denoted the two-time-parameter Gaus-
sian process by Γ^a, where the upper index a indicates that this Gaussian process
corresponds to the case where we have an antisymmetric kernel. In the case of a sym-
metric kernel, we will use the upper index sym instead. Furthermore, we note that
for 0 ≤ t1 ≤ t2 ≤ 1
Var Γ^a(t1, t2) = t2 (1 − t1)(1 + t1 − t2),
and
where t0 = r0 := 0 and t3 = r3 := 1. Theorem 4.2.2 implies that under H0
Moreover, this allows us to produce tables for the unknown distribution of sup_{0<t1<t2<1}
|Γ^a(t1, t2)| and reject the null hypothesis of no change if sup_{0<t1<t2<1} |Ū_n(t1, t2)| be-
comes too large.
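Such tables can be approximated by Monte Carlo simulation. The sketch below assumes the representation Γ^a(t1, t2) = t2 W(t1) + (1 − t1) W(t2) − t2 W(1) (our reconstruction from the projection in Corollary 4.2.1; it is consistent with the variance formula Var Γ^a(t1, t2) = t2(1 − t1)(1 + t1 − t2) stated above). Grid size, replication count and seed are arbitrary choices:

```python
import numpy as np

def sup_gamma_a(rng, n_grid=100):
    """One realization of sup |t2*W(t1) + (1 - t1)*W(t2) - t2*W(1)|
    over 0 < t1 < t2 < 1, with W simulated on an equidistant grid."""
    t = np.arange(1, n_grid + 1) / n_grid
    # Brownian motion on the grid: cumulative sum of N(0, 1/n_grid) increments
    w = np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n_grid), n_grid))
    t1, t2 = t[:, None], t[None, :]
    field = t2 * w[:, None] + (1.0 - t1) * w[None, :] - t2 * w[-1]
    return np.abs(np.where(t1 < t2, field, 0.0)).max()

rng = np.random.default_rng(0)
samples = np.array([sup_gamma_a(rng) for _ in range(1000)])
q90, q95, q99 = np.quantile(samples, [0.90, 0.95, 0.99])
print(q90, q95, q99)  # simulated critical values for the sup-functional
```

The simulated upper quantiles play the role of the tabulated critical values: H0 is rejected when the observed sup-statistic exceeds them.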
4.2.2 Asymptotic Results under H_A^{(2)}
We wish to study the limiting behavior of Ũ_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1. We will show that (cf. Theorem 4.2.3) the convergence (4.2.13) holds,
where we will assume that h(x, y) is a nondegenerate antisymmetric kernel and that (4.1.21)
holds. Therefore (4.1.21) replaces the stronger assumption of a second finite mo-
ment as in (4.1.19), which was used to derive (cf. Section 4.2.1) the non-degenerate
convergence in distribution, as n → ∞,
where the stochastic process Ū_n(t1, t2) was defined as
and the limiting Gaussian process Γ^a(t1, t2) as
Although there will be a larger class of distributions (F^{(1)}, F^{(2)}, F^{(3)}) that satis-
fies (4.1.21) than (4.1.19), the non-degenerate results under H0 naturally require
that the class of distributions for which we are testing for two changes satisfies (4.1.19).
Nevertheless, we will use the weaker assumption (4.1.21) to derive the degenerate
asymptotic results under H_A^{(2)}.
The limiting function in (4.2.13) will depend on the location of the (at most)
two change-points [nλ1] and [nλ2], 0 < λ1 ≤ λ2 < 1. Moreover, we will see that
sup_{0<t1<t2<1} |Ū_n(t1, t2)| is consistent and goes to infinity in probability as n → ∞
under H_A^{(2)}.
The limiting function in (4.2.13) will be derived similarly to the results of Sec-
tion 4.3.2, which deal with symmetric kernels. Since an antisymmetric kernel may
be viewed as a symmetric kernel with θ1 = θ2 = θ3 = 0, we jump ahead
and use the results from Section 4.3.2 here. There it is shown that Z_{[(n+1)t1],[(n+1)t2]},
0 < t1 < t2 < 1, may be split into many double sums, where the summation is taken
over two blocks of r.v.'s that do not contain a change-point. Moreover, after proper
normalization each of these double sums converges in probability to the appropriate
limit (cf. (4.3.14)–(4.3.16)).
Since Z_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1, in (4.1.2) with k_i = [(n+1)t_i], 1 ≤ i ≤ 2, is the sum of the three double sums Σ_{i=1}^{[(n+1)t1]} Σ_{j=[(n+1)t1]+1}^{[(n+1)t2]} h(X_i, X_j),
Σ_{i=1}^{[(n+1)t1]} Σ_{j=[(n+1)t2]+1}^{n} h(X_i, X_j) and Σ_{i=[(n+1)t1]+1}^{[(n+1)t2]} Σ_{j=[(n+1)t2]+1}^{n} h(X_i, X_j), we define for technical purposes
where 0 ≤ x1 < x2 ≤ x3 ≤ x4 ≤ x5 < x6 ≤ x7 ≤ x8 < 1. Hence (4.2.14) is the sum of
exactly nine double sums, where the summation is taken over two blocks of r.v.'s that
do not contain a change-point. Here x2, x3, x6 and x7 play the role of dummy vari-
ables and are used to split blocks that contain change-points. At most two of them
will actually be used, but since the location of the change-points is unknown, we need
to consider two possible changes in each of the two blocks of r.v.'s being compared.
Hence, the convergence result from Section 4.3.2
may be applied to (4.2.14) and we get, as n → ∞,
where θ_{1,1} = θ_{2,2} = θ_{3,3} = θ = 0 and¹
Moreover, if the distributions before the first change and after the second change are
the same, then we also have θ_{1,3} = 0.
We are now in the position to go back to the definition of Z_{[(n+1)t1],[(n+1)t2]},
0 < t1 < t2 < 1, in (4.1.2) and write it in terms of double sums S(·) as in (4.2.14).
Thus we arrive at
¹We note that θ_{1,1} := θ1, θ_{2,2} := θ2 and θ_{3,3} := θ3, which were defined in (4.1.16).
where we define t0 := 0 and t3 := n/(n+1). Since we do not know the location of the
change-points [nλ1] and [nλ2], where 0 < λ1 ≤ λ2 < 1, we define the following
functions a_i = a_i(λ1, λ2, t1, t2), 1 ≤ i ≤ 6, which will be used to derive a formula
for (4.2.13) for all possible combinations of t1, t2, λ1 and λ2:
otherwise,  t1 < λ1 ≤ λ2 ≤ t2,
otherwise,  λ1 ≤ t2 < λ2 < 1,  otherwise,  t2 < λ1 ≤ λ2 < 1,
where 0 < a1 ≤ a2 ≤ … ≤ a6 < 1 and c_i ∈ [0, 1], 1 ≤ i ≤ 6. We need to define these a_i's, 1 ≤ i ≤ 6, since there are exactly six possibilities
to place two change-points in three blocks. Moreover, exactly two of the latter a_i's
will change to one of the values λ1 or λ2, while the other four a_i's will get the value
c_i, and they will drop out of the limiting function ū_{λ1,λ2}(t1, t2). Hence, the actual
values of the c_i's are not important. The c_i's are needed to give a general formula
for the limiting function.
We are now in the position to state the following theorem, which is an immediate
consequence of the previous arguments in this section.
Theorem 4.2.3 Assume that (2.2.3), (4.1.20), (4.1.21), and H_A^{(2)} hold. Define t0 :=
0 and t3 := n/(n+1).² If τ1 = τ1(n) := [nλ1] and τ2 = τ2(n) := [nλ2], 0 < λ1 ≤ λ2 < 1,
then, as n → ∞,
where
and ū_{λ1,λ2} and the a_i's, 1 ≤ i ≤ 6, are defined in (4.2.15) and (4.2.18), respectively.
We note that in Theorem 4.2.3 θ_{1,1} = θ_{2,2} = θ_{3,3} = θ = 0. Moreover, if the
distributions before the first change and after the second change are the same, then
we also have θ_{1,3} = 0.
The limiting function in (4.2.19) is defined for every possible combination of t1, t2, λ1 and λ2. If we write these cases out individually, then (4.2.19) may also be written
as³
²Note that t3 → 1 when n → ∞. In (4.3.17) t3 := n/(n+1), since n = [(n+1)t3] has to be satisfied, but in the limiting function (4.2.19) t3 = 1.
³We note that in the case of exactly one change-point, namely λ1 = λ2, the cases where 0 < t1 ≤
λ1 < t2 ≤ λ2 < 1, 0 < λ1 < t1 < t2 < λ2 < 1 and 0 < λ1 < t1 ≤ λ2 < t2 < 1 do not exist. We are left with the remaining three cases when λ1 = λ2.
Furthermore, if also θ_{1,3} = 0, then the distribution before the first and after the
second change are the same. Consequently, θ_{1,2} = −θ_{2,3}, and (4.2.19) may be
written as
(λ2 − λ1) t2 θ_{1,2},  0 < t1 < t2 ≤ λ1 ≤ λ2 < 1,
((t2 − λ1)(1 − λ2 + t1) + (λ2 − t2) t1) θ_{1,2},  0 < t1 ≤ λ1 < t2 ≤ λ2 < 1,
(λ2 − λ1)(1 − t2 + t1) θ_{1,2},  0 < t1 ≤ λ1 < λ2 < t2 < 1,
((λ2 − t1) λ1 + (1 − λ2)(t2 − λ1)) θ_{1,2},  0 < λ1 < t1 < t2 ≤ λ2 < 1,
((1 − t2)(t1 − λ1) + (λ2 − t1)(1 − t2 + λ1)) θ_{1,2},  0 < λ1 < t1 ≤ λ2 < t2 < 1,
(1 − t1)(λ2 − λ1) θ_{1,2},  0 < λ1 ≤ λ2 < t1 < t2 < 1. (4.2.21)
Assuming second finite moments Eh² instead of (4.1.21), Theorem 4.2.3 implies the
consistency of tests based on sup-functionals of {Ū_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1}.
This means that we can consistently reject H0 vs. H_A^{(2)} when
except in the case when θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = 0.
ū_{λ1,λ2}(t1, t2) is equal to 0 if and only if all θ_{i,j} involved are equal to 0. To show
this, we observe that each of the six parts of the limiting function in (4.2.20) may be
written as
where A_j, B_j, C_j and D_j depend on θ_{1,2}, θ_{1,3} and θ_{2,3}. Moreover, ū_{λ1,λ2}(t1, t2) = 0,
0 < t1 < t2 < 1, if and only if A_j = 0, B_j = 0, C_j = 0 and D_j = 0, 1 ≤ j ≤ 6. For
example, the linear system of the six independent equations D1 = 0, D2 = 0, …, D6 = 0 involves only three unknown parameters, which implies that there is only
one possible solution, namely θ_{1,2} = θ_{1,3} = θ_{2,3} = 0.
We observe that
θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = 0 implies
Moreover, it follows
then that under the null hypothesis H0 of no change
Thus, on assuming that θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = θ = 0, the sequence
is not consistent against any class of alternatives. On the other hand, if at least one
θ_{i,j} is not equal to 0 and we use T_n, then
P{H0 is rejected when using T_n | H_A^{(2)} is true} → 1 as n → ∞.
This implies that the limits of the sequence {T_n}_{n∈N} are different in probability
under H0 and H_A^{(2)}; namely, we have consistency of {T_n}_{n∈N}.
4.3 Symmetric Kernels
We consider the processes U_{k1,k2}, 1 ≤ k1 < k2 ≤ n, n ≥ 3, where the kernel h is
symmetric as in (2.2.2). By using the notations and assumptions from Sections 4.1.1
and 4.1.2, we investigate the behavior of these processes under H0 (cf. Section 4.3.1)
and H_A^{(2)} (cf. Section 4.3.2).
4.3.1 Asymptotic Results under H0
We wish to study the limiting behavior of
in the sup-norm under the null hypothesis of no change. For the indices, we write
[(n+1)t1] and [(n+1)t2], respectively.
The asymptotic behavior of U_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1, will be derived
from the following reduction principle (Corollary 4.3.1). It is a consequence of Theo-
rem 1 of Hall (1979), which may be reduced to the following corollary:
Corollary 4.3.1 (Hall, 1979) Let X1, X2, … be i.i.d. r.v.'s with common distribution.
If σ1² = Eh̃²(X1) = 0 (degenerate case), then n^{-1} Σ_{1≤i<j≤[nt]} h(X_i, X_j) converges weakly to
a stochastic process Y, where h is a symmetric kernel with mean 0 and finite second
moment.
We mention that the U-statistics in (4.1.12) are assumed to have a non-degenerate
kernel h, but we may, in our context, construct kernels g* that are degenerate. Re-
call that Eh̃(X1) = E(h(X1, X2) − θ) = 0 and σ² = Var h̃(X1) = E(h̃(X1))² > 0,
and let
Put
Since h is a symmetric kernel, it follows that
where for each x
Hence g* is a symmetric kernel with mean 0, and it is degenerate (g̃*^{(1)}(X1) =
0) as well. Finally, for the centralized U-statistic U_n*^{(4)} defined in (4.1.12) we get
which implies that
where U_n*^{(4)} is a centralized U-statistic with a symmetric and degenerate kernel g*,
and Corollary 4.3.1 can be applied. We note that U_n*^{(4)} corresponds to the U-statistic
Z_n*^{(4)} in (4.2.2) in the antisymmetric case, but they are different. Also, we have to
use different known results to get O_P(n) statements, namely Corollary 4.3.1 and
Theorem 4.2.1, respectively.
We mention that the * in (4.3.1), as well as in the following formulas, indicates
that we consider the difference of one of the centralized U-statistics in (4.1.12) and
its corresponding projection.
Similarly to the antisymmetric case (cf. (4.2.3)), let
and we will show that (cf. Corollary 4.3.2)
max_{1 ≤ [(n+1)t1] ≤ [(n+1)t2] ≤ n} | U_{[(n+1)t1],[(n+1)t2]} − κ_n^{(2)}(t1, t2) | = O_P(n). (4.3.2)
As a consequence of (4.3.2), we approximate κ_n^{(2)}(t1, t2), which is a sum of i.i.d. r.v.'s,
by appropriate Wiener processes, and by doing this we will get a Gaussian limiting
distribution for a properly normed version of U_{[(n+1)t1],[(n+1)t2]}. We first prove
Lemma 4.3.1 Let h be a symmetric kernel and the centralized U-statistics U*^{(i)}, including U_n*^{(4)}, be defined as in (4.1.12). Then under H0 the following statements hold true as n → ∞:
max_{1 ≤ [(n+1)t1] ≤ [(n+1)t2] ≤ n} | U*_{1,[(n+1)t1]} − [(n+1)t1] Σ_{i=1}^{[(n+1)t1]} h̃(X_i) | = O_P(n),
max_{1 ≤ [(n+1)t2] ≤ n} | U*_{[(n+1)t2]+1,n} − (n − [(n+1)t2]) Σ_{i=[(n+1)t2]+1}^{n} h̃(X_i) | = O_P(n).
Proof of Lemma 4.3.1 For technical purposes we define
Thus we obtain
= (n − 1) Σ_{j=1}^{n} h̃(X_j) − ([(n+1)t1] − 1) Σ_{j=1}^{[(n+1)t1]} h̃(X_j)
and we now use formula (4.1.12) to arrive at
Thus we have the inequality
By using Corollary 4.3.1 together with (4.3.1), as n → ∞, we get
Since h̃(X_i), 1 ≤ i ≤ n, are i.i.d. r.v.'s with Eh̃(X_i) = 0
and 0 < σ² = Var h̃(X_i) < ∞, the central limit theorem implies
Therefore (4.3.5) reduces to
which proves the fourth statement in this lemma.
The first statement in the lemma follows by combining (4.3.6) and the fact that,
as n → ∞,
U*^{(2)}_{[(n+1)t1],[(n+1)t2]} is a U-statistic (with a symmetric and degenerate kernel with
mean 0) from [(n+1)t1] + 1 to [(n+1)t2] minus a sum of i.i.d. r.v.'s. Hence, we have
to shift the interval [[(n+1)t1]+1, [(n+1)t2]] to the interval [1, [(n+1)(t2 − t1)]]
by using the fact that for i.i.d. r.v.'s X1, X2, …, Xn
for each n. Note that we also have equality in distribution when replacing h* by
g* := h* − θ. Moreover, by Corollary 4.3.1,
and we may write sup_{0<t2−t1<1} = sup_{t1<t2<1+t1}. Since (4.3.7) holds uniformly in t1 and t2 such that 0 < t1 < t2 < 1, we have sup_{t1<t2<1+t1} = sup_{0<t1<t2<1}, and we arrive
at
and similar arguments as in (4.3.6) yield
which proves the second statement in this lemma.
Similarly, we have to shift the interval [[(n+1)t2]+1, n] to the interval [1, n − [(n+1)t2]]. Then
for each n, and
and we may write sup_{0<1−t2<1} = sup_{0<t2<1}. If we restrict t2 to the
interval (t1, 1) ⊂ (0, 1), where t1 is between 0 and 1, then the supremum over this
interval will also be O_P(n); hence
Moreover, we have
which proves the third statement in this lemma. □
Moreover, using (4.3.4) we get
Corollary 4.3.2 Let h be a symmetric kernel and U_{[(n+1)t1],[(n+1)t2]} be defined as
in (4.1.10). Under H0, as n → ∞, we have
max_{1 ≤ [(n+1)t1] ≤ [(n+1)t2] ≤ n} | U_{[(n+1)t1],[(n+1)t2]} − { ([(n+1)t2] − 2[(n+1)t1]) Σ_{j=1}^{[(n+1)t1]} h̃(X_j) + (n + [(n+1)t1] − 2[(n+1)t2]) Σ_{j=1}^{[(n+1)t2]} h̃(X_j) + [(n+1)t2] Σ_{j=1}^{n} h̃(X_j) } | = O_P(n).
We emphasize that U_{[(n+1)t1],[(n+1)t2]} is now approximated by sums of i.i.d. r.v.'s. This enables us to study the Gaussian asymptotic behavior
of U_{[(n+1)t1],[(n+1)t2]} in the sup-norm.
Theorem 4.3.1 Assume that H0, (2.2.2), (4.1.5) and (4.1.9) hold. Then we can
define a sequence of Gaussian processes {Γ_n^{sym}(t1, t2), 0 ≤ t1 ≤ t2 ≤ 1}_{n∈N} such
that, as n → ∞,
and, for each n,
where the Gaussian process Γ^{sym} is defined via a linear combination of a standard
Wiener process W as follows:
Proof of Theorem 4.3.1 The proof goes along the lines of the proof of Theo-
rem 4.2.2. Namely, we combine Corollary 4.3.2 with (4.2.11) and the theorem follows
immediately. □
We note that for 0 ≤ t1 ≤ t2 ≤ 1
and
where t0 = r0 := 0 and t3 = r3 := 1. Theorem 4.3.1 implies that under H0
Moreover, this allows us to produce tables for the unknown distribution of sup_{0<t1<t2<1}
|Γ^{sym}(t1, t2)| and reject the null hypothesis of no change if sup_{0<t1<t2<1} |U_n(t1, t2)|
becomes too large.
4.3.2 Asymptotic Results under H_A^{(2)}
We wish to study the limiting behavior of Z_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1,
from (4.1.2). We will show that (cf. Theorem 4.3.2), as n → ∞,
where we will assume that h(x, y) is a nondegenerate symmetric kernel and (4.1.21)
holds. Thus (4.1.21) replaces the stronger assumption of a second finite moment as
in (4.1.19), which was used to derive the convergence in distribution result under H0
(cf. Section 4.3.1), as n → ∞,
where the stochastic process U_n(t1, t2) was defined as
and the limiting Gaussian process Γ^{sym}(t1, t2) as
The limiting function in (4.3.12) will depend on the location of the (at most)
two change-points [nλ1] and [nλ2], 0 < λ1 ≤ λ2 < 1. Moreover, we will see that
When looking at the definition of Z_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1, in (4.1.2)
with k_i = [(n+1)t_i], 1 ≤ i ≤ 2, we see that it may be expressed in terms of the sum
of the double sums Σ_{i=1}^{[(n+1)t1]} Σ_{j=[(n+1)t1]+1}^{[(n+1)t2]} h(X_i, X_j), Σ_{i=1}^{[(n+1)t1]} Σ_{j=[(n+1)t2]+1}^{n} h(X_i, X_j)
and Σ_{i=[(n+1)t1]+1}^{[(n+1)t2]} Σ_{j=[(n+1)t2]+1}^{n} h(X_i, X_j). Thus, each of these double sums is a sum of the form
where 0 ≤ a < b ≤ c < d ≤ n and a, b, c, d ∈ N. These double sums may be
associated with comparing the two blocks (X_{a+1}, …, X_b) and (X_{c+1}, …, X_d) with
each other. In the case of testing for (at most) one single change-point we compared the
block (X1, …, X_k) with the block (X_{k+1}, …, X_n). Now we have the three blocks
(X1, …, X_{k1}), (X_{k1+1}, …, X_{k2}) and (X_{k2+1}, …, X_n), and therefore three possibilities
to compare two different blocks with each other. Each block may contain at most
two change-points; hence we split each of the sums in (4.3.13) into three sums. Since
we do not know the location of the change-points, we write the latter sum as follows,
where 0 ≤ a < γ1 ≤ γ2 ≤ b ≤ c < γ3 ≤ γ4 ≤ d ≤ n. Consequently, we are comparing
blocks that do not have a change inside. Again we emphasize that there are at most
two changes in total, which implies that some of the new small blocks may have no
change inside or in between them. For example, if we were to have one change in
the first block and one change in the second block, then (4.3.14) reduces to
where the blocks involved, ending with (…, X_d), do not contain a change-point, and the two change-points are now postulated
to be at the positions γ1 and γ3, which themselves are also unknown. Since we do not
know in advance the location of the change-points, we consider sums as in (4.3.14).
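The splitting step in (4.3.14) is purely algebraic and can be checked numerically: cutting the first block at γ1 ≤ γ2 and the second at γ3 ≤ γ4 produces 3 × 3 = 9 sub-sums that add back up to the original double sum. A small sketch (the kernel and all cut points are arbitrary illustrations of ours):

```python
import numpy as np

def S(x, a, b, c, d, h):
    """Double sum over a < i <= b and c < j <= d of h(x_i, x_j) (1-based convention)."""
    return sum(h(x[i - 1], x[j - 1])
               for i in range(a + 1, b + 1) for j in range(c + 1, d + 1))

rng = np.random.default_rng(1)
x = rng.normal(size=30)
h = lambda u, v: (u - v) ** 2 / 2.0      # an arbitrary symmetric kernel
a, g1, g2, b = 0, 3, 7, 12               # cuts gamma_1 <= gamma_2 in the first block
c, g3, g4, d = 12, 15, 20, 30            # cuts gamma_3 <= gamma_4 in the second block

total = S(x, a, b, c, d, h)
pieces = sum(S(x, lo1, hi1, lo2, hi2, h)
             for lo1, hi1 in [(a, g1), (g1, g2), (g2, b)]
             for lo2, hi2 in [(c, g3), (g3, g4), (g4, d)])
print(abs(total - pieces))  # the nine sub-sums reproduce the full double sum
```

The identity holds for any kernel and any admissible cut points, which is exactly why the unknown change-point locations can be absorbed into the γ's.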
For technical purposes we define the following function
l(z) := 1, 0 < z ≤ λ1;  2, λ1 < z ≤ λ2;  3, λ2 < z < 1,
which will be used to remind ourselves of the location (either before, between or
after the change-points) of a block of r.v.'s which does not contain any change-point.
We split (4.3.14) into 3² = 9 different double sums, which are of the form
where 1 ≤ R1 = R1(n) := [nr1] < R2 = R2(n) := [nr2] ≤ R3 = R3(n) :=
[nr3] < R4 = R4(n) := [nr4] ≤ n are chosen properly according to the double
sums in (4.3.14). Furthermore, we will see that for each of these sums we have that,
as n → ∞,
This is an immediate consequence of Theorem 2.6.2 for generalized two-sample U-
statistics by Sen (1977) (compare the proof of Theorem 3.3.2), and Hoeffding's SLLN
(cf. Theorem 2.6.1) for two samples from the same distribution.
The double sum in (4.3.16) may be associated with comparing the two blocks
(X_{R1+1}, …, X_{R2}) and (X_{R3+1}, …, X_{R4}) with each other, where both do not have
any changes inside. If both belong to different distributions, then Theorem 2.6.2
applies, and (4.3.16) follows immediately. On the other hand, if both belong to the
same distribution, then we may consider the two blocks as one block. We do this
by deleting everything between those two blocks. We therefore consider the block
We may then use Hoeffding's SLLN, but first we have to write the double sum in (4.3.16)
in terms of U-statistics. Hence, we now write
and
Σ_{R2 < i < j ≤ R4 − (R3 − R2)} h(Y_i, Y_j) = Σ_{R1 < i < j ≤ R4 − (R3 − R2) − (R2 − R1)} h(Y_i, Y_j),
for each fixed n. Using now Hoeffding's SLLN, we get the following convergence
results for the U-statistics A_n^{(1)}, A_n^{(2)} and A_n^{(3)}, as n → ∞,
Hence, as n → ∞,
and, therefore, (4.3.16) also holds if two different blocks have the same distribution.
Note that the parameter θ_{l(·),l(·)} remains the same when using the block of i.i.d. r.v.'s Y.
Similarly to Section 4.2.2, we define
where 0 ≤ x1 < x2 ≤ x3 ≤ x4 ≤ x5 < x6 ≤ x7 ≤ x8 < 1. Moreover, since (4.3.18)
is a sum of nine double sums, (4.3.16) may be applied nine times, and the limiting
function of (4.3.18) can be written as (n → ∞)
We observe that for θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = θ
We are now in the position to go back to the definition of Z_{[(n+1)t1],[(n+1)t2]},
0 < t1 < t2 < 1, in (4.1.2), and we write it in terms of double sums S(·) as
in (4.3.18). Consequently,
" Since we do not know the Iocation of the where we define to := O and t3 := a- change-points [dl] and [nXz], where O < Al 5 & < l, m define the followbg
Al, XI I tl, al :=
cl, otherwise,
:= otherwise
where 0 < a1 ≤ a2 ≤ … ≤ a6 < 1 and c_i ∈ [0, 1], 1 ≤ i ≤ 6.
We need to define these a_i's, 1 ≤ i ≤ 6, since there are exactly six possibilities
to place two change-points in three blocks. Moreover, exactly two of the latter a_i's
will change to one of the values λ1 or λ2, while the other four a_i's will get the value
c_i and will drop out of the limiting function u_{λ1,λ2}(t1, t2). Hence, the actual
values of the c_i's are not important, but they are needed to give a general formula
for the limiting function.
We are now in the position to state the following theorem, which is an immediate
consequence of the previous arguments in this section.
Theorem 4.3.2 Assume that (2.2.2), (4.1.20), (4.1.21), and H_A^{(2)} hold. Define t0 :=
0 and t3 := n/(n+1).⁴ If τ1 = τ1(n) := [nλ1] and τ2 = τ2(n) := [nλ2], 0 < λ1 ≤ λ2 < 1, then, as n → ∞,
where
and u_{λ1,λ2} and the a_i's, 1 ≤ i ≤ 6, are defined in (4.3.19) and (4.3.21), respectively.
The limiting function in (4.3.22) is defined for every possible combination of t1,
t2, λ1 and λ2. If we spell out these cases individually, then (4.3.22) can also be
written as
⁴Note that t3 → 1 when n → ∞. In (4.3.20) t3 := n/(n+1), since n = [(n+1)t3] has to be satisfied, but in the limiting function (4.3.22) t3 = 1.
Assuming that Eh² is finite instead of (4.1.21), Theorem 4.3.2 implies the
consistency of tests based on sup-functionals of {U_{[(n+1)t1],[(n+1)t2]}, 0 < t1 < t2 < 1}. This means that we can consistently reject H0 vs. H_A^{(2)} when
except in the case when θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = 0.
u_{λ1,λ2}(t1, t2) is equal to 0 if and only if all θ_{i,j} involved are equal to 0. To show
this, we observe that each of the six parts of the limiting function above may be
written as
where A_j, B_j, C_j and D_j depend on θ1, θ2, θ3, θ_{1,3} and θ_{2,3}, and 1 ≤ m1 ≤ m2 ≤ 3
have to be chosen properly. Moreover, u_{λ1,λ2}(t1, t2) = 0, 0 < t1 < t2 < 1, if and only
if θ1 = 0, θ2 = 0, θ_{2,3} = 0, A_j = 0, B_j = 0, C_j = 0 and D_j = 0, 1 ≤ j ≤ 6. But
combining θ_{1,2} = θ_{1,3} = θ_{2,3} = 0 with the arguments from the antisymmetric case,
we get consistency except in the trivial case.
We observe that if θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = θ, then
which is the limit (n → ∞) of the expected value of (1/n²) Z_{[(n+1)t1],[(n+1)t2]} under H0
(cf. (4.1.7)). Moreover, it follows then that under the null hypothesis H0 of no
change
Thus, on assuming that θ1 = θ2 = θ3 = θ_{1,2} = θ_{1,3} = θ_{2,3} = θ = 0, the sequence
is not consistent against any class of alternatives. On the other hand, if at least one
θ_{i,j} is not equal to 0 and we use T_n, then
This implies that the limits of the sequence {T_n}_{n∈N} are different in probability
under H0 and H_A^{(2)}; namely, we have consistency of {T_n}_{n∈N}.
Example 4.3.1 We consider T_n and let c_n be such that
for some fixed 0 < α < 1 and n ∈ N large enough. We apply Theorem 4.3.2 and
get that
P{H0 is rejected when using T_n | H_A^{(2)} is true}
→ 1 as n → ∞,
where t1 and t2 are picked such that u_{λ1,λ2}(t1, t2) ≠ 0. Hence, T_n in this
example is, for n ∈ N large enough, unbiased at any level α and, as a sequence in n,
it is consistent against the class of all alternatives where at least one θ_{i,j} ≠ 0.
4.4 Changes in the Mean
We are to test the no-change in the mean null hypothesis
H0: X1, …, Xn are independent identically distributed random variables
with EXi = μ and 0 < σ² = Var Xi < ∞, 1 ≤ i ≤ n,
against the at most two changes in the mean alternative
H_A^{(2)}: X1, …, Xn are independent random variables and there are
two integers τ1 and τ2, 1 ≤ τ1 < τ2 < n, such that EX1 = … =
EX_{τ1} ≠ EX_{τ1+1} = … = EX_{τ2} ≠ EX_{τ2+1} =
… = EXn, and 0 < σ² = Var X1 = … = Var Xn < ∞.
Taking simulated values of the therein indicated (k, X_k), Figure 4.4.1 gives an ex-
ample where the mean changes two times while the variance stays the same.
Figure 4.4.1: The data X1, …, X500 are i.i.d. N(0,1)-distributed, X501, …, X700 are i.i.d. N(3,1)-distributed and X701, …, X1000 are i.i.d. N(2,1)-distributed.
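Data as in Figure 4.4.1 can be regenerated along the following lines (the block layout follows the figure caption; the seed is an arbitrary choice of ours):

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
x = np.concatenate([
    rng.normal(0.0, 1.0, 500),   # X_1,   ..., X_500  ~ N(0, 1)
    rng.normal(3.0, 1.0, 200),   # X_501, ..., X_700  ~ N(3, 1)
    rng.normal(2.0, 1.0, 300),   # X_701, ..., X_1000 ~ N(2, 1)
])
print(len(x), x[:500].mean(), x[500:700].mean(), x[700:].mean())
```

Plotting the pairs (k, X_k) of this sample reproduces the qualitative picture of the figure: two jumps in level with constant spread.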
Similarly as in the case of testing for at most one change-point, testing for at most
two change-points in the mean may be illustrated by using a geometrical argument.
Consider the linear function m(t) := t, t ∈ R, which under H0 joins all the points
(k, (1/μ)E(S(k))), k ∈ N, if μ ≠ 0, and joins all the points (k, E(S(k))), k ∈ N, if
μ = 0.
Figure 4.4.2: A geometrical interpretation of E{k2 S(k1) + (n − k1) S(k2) − k2 S(n)} = 0 under H0.
Without loss of generality let μ = 1. Then in Figure 4.4.2 we join all the
points (k, E{S(k)}), k ∈ N, via the straight line m(t) = t. We pick one k1 ∈
{1, …, n − 2} and then one k2 ∈ {k1 + 1, …, n − 1}.⁵ We draw a horizontal line
starting from B := (0, E{S(k1)}), containing the point (k1, E{S(k1)}), and with
terminus C := (k2, E{S(k1)}). We draw a vertical line from the terminus and
intersect the t-axis. We denote this intersection by D := (k2, E{S(0)}), where we
define S(0) := 0. In this way we construct a rectangle, denoted by ABCD (see
Figure 4.4.2), where A := (0, E{S(0)}), with length k2 and height E{S(k1)}. We
also draw a horizontal line starting from F := (k1, E{S(k2)}), containing the point
(k2, E{S(k2)}), and with terminus G := (n, E{S(k2)}). We then draw a vertical
line from the starting point and another one from the terminus, both intersecting
the t-axis at E := (k1, E{S(0)}) and H := (n, E{S(0)}), respectively. Similarly,
we construct another rectangle, denoted by EFGH (see Figure 4.4.2), with length
⁵In Figure 4.4.2 we may think of k1 and k2 as being defined by k1 := [n t1^{(k)}] and k2 := [n t2^{(k)}], respectively, 0 < t1^{(k)} < t2^{(k)} < 1.
(n - kl) and height E{S(k2)).
Reffecting each point of the rectangle EFGH around the 45 degree line m =
t, we get the nem rectangk BIJC, where B := (o,IE{s(~~))) is the refiection
point of E := (kl, E{s(o))) , I := (O, E{s(~))) iç the reflection point of H :=
(n, IE{S (O) )) , J := (k2, E{S (n) )) is the reflection point of G : = (n, E{S (k2) )) and C := (k271E{~(k1))) is the reflection point of F := (kl,E3{s(k2))). This new
rectangle has length k2 and height lE{S(n) - S(kl)).
Combining the two rectangles ABCD and BIJC with each other, we have con-
structed the rectangle AIJD, which has length kz and height E{S(n)). Moreover,
under a, the new rectangle RTJD has the same area as the sum of the two other
OIES, namely ABCD and EFGH. Consequently, we have that
Thus, in principle, for each given k1 and k2, 1 ≤ k1 < k2 < n, we have constructed an
unbiased estimator of zero, assuming that H0 is true. We may also say that, viewed
this way, testing for at most two changes in the mean results in comparing the areas of
three different rectangles with each other.
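That E{k2 S(k1) + (n − k1) S(k2) − k2 S(n)} = 0 under H0 also follows directly from E S(k) = kμ, since k2·k1·μ + (n − k1)·k2·μ − k2·n·μ = 0 for every μ. A quick numerical sketch of this unbiased estimator of zero (sample size and cut points are arbitrary choices of ours):

```python
import numpy as np

def area_statistic(x, k1, k2):
    """k2*S(k1) + (n - k1)*S(k2) - k2*S(n), with S(k) = x_1 + ... + x_k."""
    s = np.cumsum(x)
    n = len(x)
    return k2 * s[k1 - 1] + (n - k1) * s[k2 - 1] - k2 * s[-1]

n, k1, k2, mu = 200, 60, 140, 5.0
assert k2 * k1 + (n - k1) * k2 - k2 * n == 0   # the coefficient of mu vanishes exactly

rng = np.random.default_rng(7)
reps = np.array([area_statistic(rng.normal(mu, 1.0, n), k1, k2)
                 for _ in range(4000)])
print(reps.mean())  # close to 0 under H0, even though mu != 0
```

The exact cancellation of the coefficient of μ is the algebraic content of the rectangle comparison above.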
Continuing with the kernel h(x, y) = x − y to test for at most two changes
in the mean, and using the same arguments as in Section 3.4, we assume that under
H0
and
Since h is an antisymmetric kernel, we know from Theorem 4.2.2 that, as n → ∞, under H0
where for all 0 < t1 < t2 < 1
and
We note that
for each n.
Assuming the same conditions as in Theorem 4.2.3, we know that, as n → ∞, under H_A^{(2)}
Therefore we consistently reject H0 if sup_{0<t1<t2<1} |Ū_n(t1, t2)| becomes too large.
We note that (4.4.2) implies that under H0, as n → ∞,
and one would have to produce tables for the right-hand side random variable
in (4.4.5) for the sake of accepting or rejecting H0.
We note that by expressing (4.4.3) in terms of partial sums S([(n + 1)·]) we get
that
and now Donsker's theorem (see Theorem 1.2.1) yields (4.4.5) as well.
The variance σ² in (4.4.3) is usually unknown. Consequently, it has to be esti-
mated on the basis of the same random sample. One possible way of estimating σ²
is via the sample variance, namely
where X̄_n = (X1 + ⋯ + Xn)/n is the sample mean. Since σ̂_n² is a consistent estimator for σ²,
Ū_n(t1, t2) in (4.4.3) may be estimated by
Now our previous result from (4.4.2) carries over. In particular, as n → ∞, we
obtain
and hence also
Consequently, just as in (4.4.5), tables for the distribution of sup_{0<t1<t2<1} |Γ^a(t1, t2)| could again be used to consistently reject the null hypothesis of no change in the data, if our test statistic sup_{0<t1<t2<1} |Û_n(t1, t2)| becomes too large.
4.5 Changes in the Variance
We are to test the no-change in the variance hypothesis
H0: X1, …, Xn are independent identically distributed random variables
with EXi = μ and 0 < σ² = Var Xi < ∞, 1 ≤ i ≤ n,
against the at most two changes in the variance alternative
H_A^{(2)}: X1, …, Xn are independent random variables and there exist
two integers τ1 and τ2, 1 ≤ τ1 ≤ τ2 < n, such that Var X1 = … =
Var X_{τ1} ≠ Var X_{τ1+1} = … = Var X_{τ2}, Var X_{τ2} ≠ Var X_{τ2+1} = … = Var Xn, 0 < Var X1, Var X_{τ1+1}, Var X_{τ2+1} < ∞,
and EX1 = … = EXn = μ.
Taking simulated values of the therein indicated (k, X_k), Figure 4.5.3 gives an ex-
ample where the variance changes two times while the mean stays the same.
Similar arguments as in Section 3.5 suggest the use of the symmetric kernel
Using the same arguments as in Section 3.5, we assume that under H0
Figure 4.5.3: The data X1, …, X100 are i.i.d. N(0,2)-distributed, X101, …, X300 are i.i.d. N(0,1)-distributed and X301, …, X1000 are i.i.d. N(0,3)-distributed.
and
Since h is a symmetric kernel, we know from Theorem 4.3.1 that, as n → ∞, under
Ho
where for all 0 < t1 < t2 < 1
and
We note that
for each n.
Assuming the same conditions as in Theorem 4.3.2, we know that under the
alternative H_A^{(2)}, as n → ∞,
Similarly to previous sections, we have to estimate the usually unknown param-
eters σ² and ζ².
Again we estimate σ² by the usual estimator of the variance σ̂_n² as in (3.4.6). We
also have to estimate ζ, which is defined as
Hence, we estimate the second part of the latter equation by σ̂_n⁴ and the first part
via the estimator for the 4th centered sample moment
where X̄_n = (X1 + ⋯ + Xn)/n.
Hence, we estimate U_n(t1, t2) in (4.5.3) by
Now our previous result from (4.5.2) carries over. In particular, as n → ∞, we
obtain
and hence also, as n → ∞,
Thus, possible tables for the distribution of sup_{0<t1<t2<1} |Γ^{sym}(t1, t2)| could be used
to reject the null hypothesis of no change in the data, if our test statistic sup_{0<t1<t2<1} |Û_n(t1, t2)| becomes too large.
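For the variance kernel h(x, y) = (x − y)²/2 the projection is h̃(x) = ((x − μ)² − σ²)/2, so Var h̃(X1) = (μ4 − σ⁴)/4; this is what the two plug-in estimators above target. A sketch of the estimation step (the factor 1/4 reflects our normalization of the kernel and may differ from the thesis's exact definition of ζ):

```python
import numpy as np

def variance_change_scale(x):
    """Plug-in estimates: sigma^2 by the (centered) sample variance, and
    zeta^2 = (mu4 - sigma^4)/4 via the 4th centered sample moment."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    sigma2 = ((x - xbar) ** 2).mean()      # sigma_hat_n^2
    mu4 = ((x - xbar) ** 4).mean()         # 4th centered sample moment
    zeta2 = (mu4 - sigma2 ** 2) / 4.0
    return sigma2, zeta2

rng = np.random.default_rng(5)
sigma2, zeta2 = variance_change_scale(rng.normal(0.0, 2.0, 100_000))
print(sigma2, zeta2)
```

For N(0, σ) data one has μ4 = 3σ⁴, so with σ = 2 the targets are σ² = 4 and ζ² = (48 − 16)/4 = 8; both estimators are consistent, which is all that is needed for the studentized statistic.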
4.6 Epidemic Alternatives
In the past two sections we studied alternatives with at most two changes in the
mean and variance, respectively. Similarly, we will test for two changes and will also
require that the distributions before the first and after the second change are the
same. This kind of alternative, more or less as formulated by Levin and Kline (1985),
has been called the epidemic alternative, on postulating that an epidemic state runs
from time τ1 through τ2, after which the normal state is restored. Applications of
this model in an econometric context are studied by Broemeling and Tsurumi (1987).
We start with the case where we have an epidemic change in the mean, as
assumed in the alternative H_A^{(2)} below. In particular, we test the no-change in the mean hypothesis
H0: X1, …, Xn are independent identically distributed random variables
with EXi = μ and 0 < σ² = Var Xi < ∞, 1 ≤ i ≤ n,
against the epidemic change in the mean alternative
H_A^{(2)}: X1, …, Xn are independent random variables and there are two
integers τ1 and τ2, 1 < τ1 < τ2 < n, such that EX1 = … = EX_{τ1} =
EX_{τ2+1} = … = EXn, EX_{τ1+1} = … = EX_{τ2}, EX_{τ1} ≠ EX_{τ1+1} and
0 < σ² = Var X1 = … = Var Xn < ∞.
This alternative postulates that the mean changed at an unknown time τ1 and
that, at another unknown time τ2, it changed back to its original level. Figure 4.6.4
gives an example where the mean changes at some point τ1, and at point τ2 it
changes back to the original mean from before the first change at τ1.
Nonparametric tests for epidemic alternatives have been discussed in the literature
over the past two or so decades (cf. Csörgő and Horváth (1997, Section 2.8.4) and
Yao (1993) and their related references). Levin and Kline (1985) suggested the test
statistics
Figure 4.6.4: The data X_1, . . . , X_{τ1} and X_{τ2+1}, . . . , X_{1000} are i.i.d. N(0,1)-distributed and X_{τ1+1}, . . . , X_{τ2} are i.i.d. N(3,1)-distributed.
and
which may be interpreted as comparing the mean in the middle to the mean before
and after. Using Donsker's theorem (cf. Theorem 1.2.1) we have that under Ho
(1/(n^{1/2} σ)) T_n^{(2)} →_D sup_{0≤t≤1} B(t) − inf_{0≤t≤1} B(t), as n → ∞.
The latter convergence result uses the fact that
sup_{0≤t1≤t2≤1} |B(t2) − B(t1)| =_D sup_{0≤t≤1} B(t) − inf_{0≤t≤1} B(t).
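The identity above in fact holds pathwise: over ordered pairs t1 ≤ t2, the largest absolute increment of a path equals its range. The following sketch (the step count and seed are arbitrary choices) simulates a discretized Brownian bridge from a partial-sum process and checks the two quantities against each other.

```python
import random

def bridge_range_identity(n=400, seed=1):
    """Simulate a discretized Brownian bridge B(k/n) and return both
    sup B - inf B and sup over ordered pairs of |B(t2) - B(t1)|;
    the two agree pathwise."""
    rng = random.Random(seed)
    s, walk = 0.0, [0.0]
    for _ in range(n):
        s += rng.gauss(0.0, 1.0)
        walk.append(s)
    # Tie the walk down at both ends and rescale: a discretized bridge.
    bridge = [(walk[k] - (k / n) * walk[n]) / n ** 0.5 for k in range(n + 1)]
    range_stat = max(bridge) - min(bridge)
    pair_stat = max(abs(bridge[j] - bridge[i])
                    for i in range(n + 1) for j in range(i, n + 1))
    return range_stat, pair_stat

r_stat, pair_stat = bridge_range_identity()
```

Monte Carlo replications of `range_stat` give an approximation to the limiting null distribution quoted above.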
In a similar vein, Lombard (1987) proposed a quadratic version of the latter
statistic, namely
Again we may use Donsker's theorem to compute its limiting distribution under Ho,
namely
The likelihood ratio test statistic for testing Ho vs. H_A^(1), if the observations are
normal with constant variance, is calculated to be
T_n^{(4)} = max_{1≤k1<k2<n} {(k2 − k1)(1 − (k2 − k1)/n)} (···),
where the distribution of T_n^{(4)} is unknown. Instead of the latter expression, Yao
(1993) obtained large deviation approximations for
assuming that 0 < liminf_{n→∞} m1/n ≤ limsup_{n→∞} m2/n < 1. This statistic is similar to
T_n^{(4)}, but technical difficulties are avoided by trimming the endpoints. In practical
situations one may face problems when choosing proper m1 and m2.
Based on the recursive residuals w_k defined by
Yao (1993) introduced some new statistics. Let
and define
and
The distributions of these two statistics are unknown due to technical difficulties
arising from the (k2 − k1)^{1/2} in the denominator. To avoid these technical problems, Yao
(1993) considered a trimmed version of the latter statistic, namely
in the case of normal observations, and obtained large deviation approximations.
Csörgő and Horváth (1997) avoided the technical difficulties by multiplying these
statistics by their denominators. Namely, similarly to the statistics T_n^{(1)} and T_n^{(2)}, they
considered the statistics
and
as well as their quadratic analogue
Using the fact that, as n → ∞,
they obtained the following asymptotic distributions for the latter three statistics
and
which are the same as the ones for (1/(n^{1/2}σ)) T_n^{(1)}, (1/(n^{1/2}σ)) T_n^{(2)} and (1/(2n³σ²)) T_n^{(3)}, respectively.
Using the results from Section 4.4, we suggest the use of the test statistic T_n =
T_n(X_1, X_2, . . . , X_n) which is defined via
By (4.4.2), or by using Donsker's theorem, we have under Ho that
where the limiting Gaussian process Γ^a(t1, t2) is defined as
Moreover, under H_A^(1), T_n converges in probability to ∞, since the suitably normalized T_n converges
in probability to the supremum of the limiting function ū_{λ1,λ2}(t1, t2) defined as
ū_{λ1,λ2}(t1, t2) =
(λ2 − λ1) t2 θ_{1,2}, 0 < t1 < t2 ≤ λ1 < λ2 < 1,
((t2 − λ1)(1 − λ2 + t1) + (λ2 − t2)λ1) θ_{1,2}, 0 < t1 ≤ λ1 < t2 ≤ λ2 < 1,
(λ2 − λ1)(1 − t2 + t1) θ_{1,2}, 0 < t1 ≤ λ1 < λ2 < t2 < 1,
((λ2 − t1)λ1 + (1 − λ2)(t2 − λ1)) θ_{1,2}, 0 < λ1 < t1 < t2 ≤ λ2 < 1,
((1 − λ2)(t1 − λ1) + (λ2 − t1)(1 − t2 + λ1)) θ_{1,2}, 0 < λ1 < t1 ≤ λ2 < t2 < 1,
(1 − t1)(λ2 − λ1) θ_{1,2}, 0 < λ1 < λ2 < t1 < t2 < 1,
where τ1 = [nλ1], τ2 = [nλ2] and θ_{1,2} := E h(X_i, X_j) ≠ 0 for 1 ≤ i ≤ τ1 < j ≤ τ2 ≤ n and for 1 ≤ τ1 < i ≤ τ2 < j ≤ n.
The limiting function ū_{λ1,λ2}(t1, t2) depends on λ1, λ2, t1, t2 and θ_{1,2}. In Figure
4.6.5 we can see the six areas where t1 and t2 are defined. The areas A1, A4 and
A6 are triangles, while A2, A3 and A5 are rectangles. Each of them depends on the
location of at least one of the change-points λ1 and λ2.
Figure 4.6.5: Summation area of ū_{λ1,λ2}(t1, t2).
⁶If θ_{1,2} were equal to 0, then the test based on T_n would not be consistent.
A special point in Figure 4.6.5 is (λ1, λ2), which is the point where the areas
A2, A3, A4 and A5 intersect. Moreover, assuming that θ_{1,2} > 0, it turns out that
the point where ū_{λ1,λ2}(t1, t2) reaches its maximum is (λ1, λ2). We note that
ū_{λ1,λ2}(λ1, λ2) = (λ2 − λ1)(1 − λ2 + λ1) θ_{1,2}.
Figure 4.6.6: The limiting function ū_{1/3,2/3}(t1, t2) with θ_{1,2} = 10 takes its maximum value of 2.2 at the point (1/3, 2/3).
In Figure 4.6.6 the graph of the limiting function ū_{1/3,2/3}(t1, t2) is plotted. It can
be seen that ū_{1/3,2/3}(t1, t2) is defined in six different areas via six different functions.
Three of these surfaces have the shape of a triangle, and three that of a rectangle.
Moreover, the surfaces corresponding to the summation areas A1, A3, A4 and A6
are planes and the two others are not.
Figure 4.6.7 shows the limiting function ū_{1/10,2/10}(t1, t2). Thus, in this case, we
consider a situation where the two change-points are close to each other. This means that the
epidemic change from one mean to another and back was very short. Nevertheless,
the maximum is taken at the point (λ1, λ2) = (1/10, 2/10).
Figure 4.6.7: The limiting function ū_{1/10,2/10}(t1, t2) with θ_{1,2} = 10 takes its maximum value of 0.9 at the point (1/10, 2/10).
Figure 4.6.8 shows the limiting function ū_{1/10,9/10}(t1, t2), i.e., now the two change-points
are far apart from each other. This means that there was a long period
between the first and second change. Similarly to the previous two cases, the maximum
is taken at the point (λ1, λ2).
Figure 4.6.8: The limiting function ū_{1/10,9/10}(t1, t2) with θ_{1,2} = 10 takes its maximum value of 1.6 at the point (1/10, 9/10).
Therefore, in addition to the test statistic T_n, which tests Ho against H_A^(1), we
may also define an estimator for the times of change. Hence, we define as an
estimator for the change-points τ1 and τ2
assuming that θ_{1,2} > 0. If θ_{1,2} < 0, then we define as an estimator for the change-points
τ1 and τ2
⁷We use min{(k1, k2) : . . .} to denote the point (k1, k2) where the maximum (or minimum, respectively) is taken and which has the smallest distance (Euclidean norm) from the point (0, 0).
We note that τ̂ and τ̃ are two-dimensional vectors.
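A brute-force sketch of such an estimator is given below. The scan statistic used here, |S(k2) − S(k1) − ((k2 − k1)/n) S(n)| with S the partial-sum process, is an assumed stand-in for the exact definition of T_n in the text, and ties are broken, as in the footnote, by the admissible point closest to the origin.

```python
def estimate_epidemic_changepoints(x):
    """Estimate (tau1, tau2) as the argmax over 0 <= k1 < k2 <= n of the
    scan |S(k2) - S(k1) - ((k2 - k1)/n) * S(n)|, a sketch standing in for
    the estimator described in the text; among maximizers, the point
    closest to the origin is taken (cf. the footnote)."""
    n = len(x)
    s = [0.0]
    for v in x:
        s.append(s[-1] + v)
    best = None
    for k1 in range(n):
        for k2 in range(k1 + 1, n + 1):
            stat = abs(s[k2] - s[k1] - ((k2 - k1) / n) * s[n])
            # Larger statistic wins; exact ties go to the smaller norm.
            key = (-stat, k1 * k1 + k2 * k2)
            if best is None or key < best[0]:
                best = (key, (k1, k2))
    return best[1]
```

The double loop costs O(n²) evaluations, which is fine for moderate n; for long series one would trim or coarsen the grid.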
Similarly to an epidemic change in the mean, we may assume an epidemic change
in the variance. In particular, we now are to test the no-change in the variance null
hypothesis
Ho : X_1, . . . , X_n are independent identically distributed random variables
with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,
against the epidemic change in the variance alternative
H_A^(2) : X_1, . . . , X_n are independent random variables and there are two
integers τ1 and τ2, 1 ≤ τ1 < τ2 < n, such that Var X_1 = . . . = Var X_{τ1}
= Var X_{τ2+1} = . . . = Var X_n, Var X_{τ1+1} = . . . = Var X_{τ2}, Var X_{τ1} ≠ Var X_{τ1+1}, 0 < Var X_{τ1}, Var X_{τ1+1} < ∞, and EX_1 = . . . = EX_n = μ.
This alternative postulates that the variance changed at an unknown time τ1
and that, at another unknown time τ2, it changed back to its original level.
Using the results from Section 4.5, we suggest the use of the test statistic T_n =
T_n(X_1, X_2, . . . , X_n) which is defined via
where we replaced the unknown parameters by the estimators suggested in Section 4.5 (cf. (4.4.6)
and (4.5.4), respectively).
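A concrete sketch of this variance version: the mean is removed first, and the mean-change scan is then applied to the centered squares, since an epidemic change in the variance of X_i is an epidemic change in the mean of (X_i − μ)². The recipe below is an assumption standing in for the exact statistic built from the estimators in (4.4.6) and (4.5.4).

```python
def variance_epidemic_scan(x):
    """Two-parameter scan applied to the centered squares (X_i - Xbar)^2:
    an epidemic change in the variance of X shows up as an epidemic
    change in the mean of these squares."""
    n = len(x)
    xbar = sum(x) / n
    y = [(v - xbar) ** 2 for v in x]
    s = [0.0]
    for v in y:
        s.append(s[-1] + v)
    return max(abs(s[k2] - s[k1] - ((k2 - k1) / n) * s[n])
               for k1 in range(n) for k2 in range(k1 + 1, n + 1)) / n ** 0.5
```

Under a genuine variance burst in the middle of the sample, the scan value is driven by the drift of the squared observations and dominates its no-change counterpart.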
Assuming that the second moments of (X_i − X_j)², 1 ≤ i < j ≤ n, are all finite and σ² > 0,
then, under Ho, T_n converges in distribution to sup_{0<t1<t2<1} |Γ^{sw}(t1, t2)|, where the limiting Gaussian process Γ^{sw}(t1, t2) is defined as
Moreover, under H_A^(2), T_n converges in probability to ∞, since the suitably normalized T_n converges in
probability to the supremum of the limiting function ū_{λ1,λ2}(t1, t2) which is given by
We note that this limiting function corresponds to the one in (4.3.23), when we
put θ1 = θ_{1,3} = θ3 and θ_{1,2} = θ_{2,3}. The limiting function depends on λ1, λ2, t1, t2,
θ1, θ_{1,2} and θ2.
Similarly to the case of an epidemic change in the mean, the limiting function
ū_{λ1,λ2}(t1, t2) is defined via six different functions over six different areas. Unfortunately,
it is no longer true that the maximum or minimum, respectively, of the
limiting function is taken at (λ1, λ2). This is due to the fact that there are two
more parameters, namely θ1 and θ2, involved than before. Figure 4.6.9 gives an
example where neither the maximum nor the minimum is taken at (λ1, λ2). Hence,
an estimator for the times of change is only possible if both changes satisfy special
conditions. Such conditions can be found in a similar vein as in Section 3.3.3, where
we dealt with one change only.
Figure 4.6.9: The limiting function ū_{λ1,λ2}(t1, t2) with θ_{1,2} = 10, θ_{1,3} = 20 and θ_{2,3} = −12 takes its maximum value of 3.05 at a point different from (λ1, λ2).
Similar test statistics can also be found for different epidemic alternatives. The
main problem is to find an (anti-)symmetric kernel h(x, y) which will be appropriate
for the desired alternatives.
"Changing one thing for the better is worth more
than proving a thousand things are wrong."
- Anonyrnous
Chapter 5
Multiple Change-points
5.1 Introduction
In the previous two chapters we investigated test statistics which allowed us to
test for at most one or two change-points. In this chapter we generalize the main
ideas obtained in these chapters, especially the one where we tested for at most two
change-points, and study test statistics which allow us to test for at most s change-points,
1 ≤ s < n. Due to the increase in the number of possible change-points,
the results become more complex and so do the notations. We note that proofs are
done in a similar vein as in the previous chapters.
We are to test the null hypothesis
Ho : X_1, . . . , X_n are independent identically distributed random variables
against the alternative that there are at most s change-points in the sequence
X_1, . . . , X_n, namely that we have
H_A^(s) : X_1, . . . , X_n are independent random variables and there exist s,
1 ≤ s < n, integers τ1 = τ1(n), τ2 = τ2(n), . . . , τs = τs(n), 1 ≤
τ1 ≤ τ2 ≤ . . . ≤ τs < n, such that P{X_1 ≤ t} = . . . = P{X_{τ1} ≤ t},
P{X_{τ1+1} ≤ t} = . . . = P{X_{τ2} ≤ t}, . . . , P{X_{τs+1} ≤ t} = . . . =
P{X_n ≤ t} for all t and P{X_{τi} ≤ t0} ≠ P{X_{τi+1} ≤ t0} for some t0
and for all 1 ≤ i ≤ s.
We note that, just like in the case of epidemic alternatives (cf. Section 4.6), the
alternative H_A^(s) allows us to consider random variables X_1, X_2, . . . , X_n with s changes
in the distribution, which do not necessarily result in (s + 1) different distributions.
Since we are testing for at most s, 1 ≤ s < n, changes we need to define a
stochastic process which 'feels' the possibility of s changes. To do this we split the
given sample X_1, . . . , X_n into s + 1 blocks and compare each of the blocks with the
others. We continue using a kernel h(x, y) of two variables. Since, out of the s + 1
blocks, we always compare two blocks with each other, we have \binom{s+1}{2} different
possibilities to do so. Therefore, for the problem at hand, we define a sequence of
s-time parameter stochastic processes as follows:
where k0 := 0 and k_{s+1} := n. In this way we compare the (s+1) blocks (X_1, . . . , X_{k1}),
(X_{k1+1}, . . . , X_{k2}), . . . , (X_{ks+1}, . . . , X_n) with each other, where the k_i's, 1 ≤ i ≤ s,
vary from i to n + i − 1 − s.
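For small n and s the block-comparison process just described can be evaluated by brute force, as in the following sketch; the quadratic kernel h(x, y) = (x − y)²/2 is used purely as an example.

```python
from itertools import combinations

def block_comparison_stat(x, cuts, h):
    """Evaluate the s-time parameter process at one point (k1, ..., ks):
    split the sample at the given cut points into s+1 blocks and sum the
    kernel h over every pair of observations lying in different blocks,
    i.e. over all C(s+1, 2) block pairs."""
    bounds = [0] + list(cuts) + [len(x)]
    blocks = [x[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]
    total = 0.0
    for a, b in combinations(range(len(blocks)), 2):
        for xi in blocks[a]:
            for xj in blocks[b]:
                total += h(xi, xj)
    return total
```

Scanning this quantity over all admissible cut vectors (k1, . . . , ks) gives the brute-force analogue of the sup-statistic studied below; the cost grows quickly with s, which is why the asymptotic theory matters.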
When testing for at most one change, this process reduces to
the one we have already seen in Chapter 3. When testing for at most two changes,
the test statistic reduces to the one we studied in Chapter 4, namely to
Here we study the asymptotic properties (as n → ∞) of the sup-norm of the
process Z_{k1,k2,...,ks}, 1 ≤ k1 < k2 < · · · < ks < n, which will be seen to be based on a
combination of U-statistics. First we give some notations and definitions under the
null hypothesis of no change and the alternative of at most s changes.
5.1.1 Notations under the Null Hypothesis Ho
We define
and
Eh²(X_i, X_j) =: γ, 1 ≤ i < j ≤ n.
We assume throughout the whole chapter that
γ < ∞,
which of course implies
Furthermore, we have that
where 0 ≤ a < b < c ≤ n and a, b, c ∈ N. Hence,
= (n − k1)k1 θ + (n − k2)(k2 − k1) θ + (n − k3)(k3 − k2) θ + · · ·
= (n − k1)k1 θ + (n − k2)k2 θ − (n − k2)k1 θ + (n − k3)k3 θ − · · ·
where k_{s+1} := n. We define, as in the previous chapters,
and again we assume that
We centralize Z_{k1,k2,...,ks} by its mean, and consider the process
where the kernel function h in Z_{k1,k2,...,ks} is symmetric. For an antisymmetric kernel
we define
Moreover, we can write Z_{k1,k2,...,ks} as the sum of (s + 2) U-statistics, and thus we get
that
where
Similarly, for an antisymmetric kernel we define
where
For further use we also define the sequence of s-time parameter stochastic processes
and
5.1.2 Notations under the Alternative H_A^(s)
Let F^(1)(t) = P{X_{τ1} ≤ t}, F^(2)(t) = P{X_{τ1+1} ≤ t}, F^(3)(t) = P{X_{τ2+1} ≤ t}, . . . ,
F^(s+1)(t) = P{X_{τs+1} ≤ t} be the respective distribution functions of the observations
before the first, between the first and second, between the second and third,
. . . , and after the s-th change, respectively, and put
Furthermore, we will put
Similarly to (5.1.15), we also define the second moment of the kernel h by
where τ_q < i ≤ τ_{q+1} and τ_r < j ≤ τ_{r+1} for all 0 ≤ q ≤ r ≤ s. We assume throughout
the whole chapter that Eh²(X_i, X_j) is finite for all possible choices of i and j, namely
that
which implies that
For the sake of strong laws, we will also use a weaker assumption than a finite
second moment of h, namely that for random variables from different distributions
the following holds:
where log⁺ x = log(x ∨ 1).
5.2 Antisymmetric Kernels
We consider the s-time parameter processes Z_{k1,k2,...,ks}, 1 ≤ k1 < k2 < · · · < ks < n,
n ≥ s + 1, where the kernel h is antisymmetric as in (2.2.3). By using the notations
and assumptions from Sections 5.1.1 and 5.1.2, we now investigate the behavior
of these processes under Ho (cf. Section 5.2.1) and H_A^(s) (cf. Section 5.2.2),
as n → ∞.
5.2.1 Asymptotic Results under Ho
We wish to study the limiting behavior of
in the sup-norm under the null hypothesis of no change. For the indices k1, . . . , ks,
we write [(n + 1)t1], . . . , [(n + 1)ts], 0 < t1 < t2 < · · · < ts < 1, respectively.
The asymptotic behavior of Z_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]}, 0 < t1 < t2 < · · · < ts < 1,
will be derived from the following reduction principle (Lemma 5.2.1), which follows
from Theorem 4.2.1 by Janson and Wichura (1983).
Similarly, as in the case of testing for at most two change-points (cf. Section 4.2.1),
we define the antisymmetric and degenerate (σ*² = E h̃*²(X_1) = 0) kernel h* with
mean zero by
Hence, the U-statistic U_n^{(s+2)} defined in (5.1.12) can be written as
and, moreover,
where U_n^{*(s+2)} is a U-statistic with an antisymmetric and degenerate kernel h*, and
Theorem 4.2.1 can be applied. We mention that the * in (5.2.1), as well as in the
following formulas, indicates that we consider the difference of one of the U-statistics
in (5.1.12) and its corresponding projection.
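The degeneracy of h* can be checked numerically. In the sketch below, the projection h̃(x) = E h(x, X) and the projected kernel h*(x, y) = h(x, y) − h̃(x) + h̃(y) are computed exactly for a uniform law on a finite set; this explicit form of h* is an assumption, following the usual projection construction for antisymmetric kernels.

```python
def max_projection_residual(values, h):
    """Exact check, for a uniform law on a finite set, that the projected
    kernel hstar(x, y) = h(x, y) - htilde(x) + htilde(y) is degenerate,
    i.e. E hstar(x, X) = 0 for every fixed x."""
    m = len(values)
    # htilde(x) = E h(x, X), computed exactly over the finite support.
    htilde = {v: sum(h(v, w) for w in values) / m for v in values}

    def hstar(x, y):
        return h(x, y) - htilde[x] + htilde[y]

    return max(abs(sum(hstar(x, y) for y in values) / m) for x in values)

# Wilcoxon-type antisymmetric kernel, used purely as an example.
sign_kernel = lambda x, y: (y > x) - (y < x)
residual = max_projection_residual([0, 1, 2, 3], sign_kernel)
```

Since h is antisymmetric and the two arguments share one distribution, E h̃(X) = 0, so the residual vanishes exactly; with a non-degenerate kernel fed in directly, the same check would fail.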
We now proceed to prove the following reduction principle.
Lemma 5.2.1 Let h be an antisymmetric kernel and U^{*(1)}_{[(n+1)t1]}, U^{*(2)}_{[(n+1)t1],[(n+1)t2]}, . . .
be defined as in (5.1.12). Then under Ho the following statements hold true, as n → ∞:
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} |U^{*(1)}_{[(n+1)t1]}|
Proof of Lemma 5.2.1 The first statement follows from (4.2.1) and the fact that,
as n → ∞,
U^{*(i)}_{[(n+1)t_{i-1}],[(n+1)t_i]}, 2 ≤ i ≤ s, is a U-statistic (with an antisymmetric and
degenerate kernel) from [(n + 1)t_{i-1}] + 1 to [(n + 1)t_i] minus a sum of i.i.d. r.v.'s.
Hence, we have to shift the interval [[(n + 1)t_{i-1}] + 1, [(n + 1)t_i]] to the interval
[1, [(n + 1)(t_i − t_{i-1})]]. Moreover, using the fact that for i.i.d. r.v.'s X_1, X_2, . . . , X_n
for each n, we get, as n → ∞,
which proves statements 2 to s in this lemma.
Similarly, U^{*(s+1)}_{[(n+1)ts],n} is a U-statistic (with an antisymmetric and degenerate kernel)
from [(n + 1)ts] + 1 to n minus a sum of i.i.d. r.v.'s. Hence, we have to shift the
interval [[(n + 1)ts] + 1, n] to the interval [1, [n − (n + 1)ts]]. Using now the fact
that
{ Σ_{[(n+1)ts]+1 ≤ i < j ≤ n} h*(X_i, X_j), 0 < ts < 1 }
=_D { Σ_{1 ≤ i < j ≤ [n−(n+1)ts]} h*(X_i, X_j), 0 < ts < 1 },
for each n, we arrive, as n → ∞, at
which proves statement (s + 1) in this lemma.
Combining (4.2.1) together with (5.2.1) we get
which proves the last statement in this lemma. □
Now we are in the position to show that the maximum of Z_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]}
minus a sum of projected i.i.d. r.v.'s is also of order n. We state the following corollary
and give a detailed proof. The idea of the proof is similar to the one in Corollary
3.2.1; however, here we give a more general version. This corollary clearly
implies Corollary 3.2.1 and Corollary 4.2.1 when we put s = 1 and s = 2, respectively.
Corollary 5.2.1 Let h be an antisymmetric kernel and Z̄_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]} be
defined as in (5.1.12). Under Ho, as n → ∞, we have
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} | Z̄_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]} − Σ_{i=1}^s ([(n + 1)t_{i+1}] − [(n + 1)t_{i-1}]) Σ_{j=1}^{[(n+1)t_i]} h̃(X_j) | = O_P(n),
where [(n + 1)t0] := 0 and [(n + 1)t_{s+1}] := n.
Define t0 := 0 and t_{s+1} := n/(n + 1) and let
K_n^{(1)}(t1, t2, . . . , ts) := Σ_{i=1}^s ([(n + 1)t_{i+1}] − [(n + 1)t_{i-1}]) Σ_{j=1}^{[(n+1)t_i]} h̃(X_j)
and
Since t_{s+1} → 1 as n → ∞, for convenience we write H(1) instead of H(n/(n + 1)). We
rearrange the summation of K_n^{(1)}(t1, t2, . . . , ts) and we get the following:
K_n^{(1)}(t1, t2, . . . , ts) =
= Σ_{i=1}^s ([(n + 1)t_{i+1}] + [(n + 1)t_i] − [(n + 1)t_i] − [(n + 1)t_{i-1}]) H(t_i)
We are now in the position to split these sums in K_n^{(1)}(t1, t2, . . . , ts) and apply the
previous lemma. Since
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} | (U_n^{(s+2)} − Σ_{i=1}^n (n − 2i + 1) h̃(X_i))
− (U^{(1)}_{[(n+1)t1]} − Σ_{i=1}^{[(n+1)t1]} ([(n + 1)t1] − 2i + 1) h̃(X_i)) − · · · |
we have that
Theorem 5.2.1 Assume that Ho, (2.2.3), (5.1.4) and (5.1.8) hold. Then we can
define a sequence of Gaussian processes {Γ_n^a(t1, t2, . . . , ts), 0 ≤ t1 ≤ t2 ≤ · · · ≤ ts ≤
1}_{n∈N} such that, as n → ∞,
and, for each n,
where the Gaussian process Γ^a is defined via a linear combination of a standard
Wiener process W as follows:
where t0 := 0 and t_{s+1} := 1.
Proof of Theorem 5.2.1 The proof is similar to that of Theorem 4.2.2. Again
we use the fact that we can define a Wiener process {W(t), 0 ≤ t < ∞} such that
(cf. Csörgő and Révész (1981, Theorem S.2.2.1 by Major (1979) combined with
(S.2.2.2))), as n → ∞,
Combining this fact with Corollary 5.2.1 we get the desired result. □
Similarly, as in Chapter 4, the upper index a symbolizes that this Gaussian
process corresponds to the case where we have an antisymmetric kernel. Furthermore,
we note that for 0 ≤ t1 ≤ t2 ≤ · · · ≤ ts ≤ 1
and
where t0 = r0 := 0 and t_{s+1} = r_{s+1} := 1. The computation of the covariance is
straightforward. We note that
Theorem 5.2.1 implies that under Ho
Moreover, this allows us to produce tables for the unknown distribution of
sup_{0≤t1≤t2≤···≤ts≤1} |Γ^a(t1, t2, . . . , ts)| and reject the null hypothesis of no change, if
sup_{0≤t1≤t2≤···≤ts≤1} |Γ_n^a(t1, t2, . . . , ts)| becomes large.
We mention that in the case of at most one change, Theorem 5.2.1 reduces to
Theorem 3.2.1 which is due to Csörgő and Horváth (1988b, 1997). This is easy to see,
since with s = 1 we have
where t_{s+1} = t2 := 1, t0 := 0 and B is a Brownian bridge as in Theorem 3.2.1.
In the case of at most two changes, namely s = 2, Γ^a reduces to
where t_{s+1} = t3 := 1, t0 := 0 and Γ^a(t1, t2) is the Gaussian process from (4.2.10) as
in Theorem 4.2.2.
We are now in the position to combine test statistics for a different number
of change-points. To do this, we define the i-dimensional vectors (t1, t2, . . . , ti) ∈
(0, 1)^i ⊂ R^i and x_i ∈ R, 1 ≤ i ≤ s, and the Sup-Euclidean norm
Theorem 5.2.2 Assume that Ho, (2.2.3), (5.1.4) and (5.1.8) hold. Then we can
define sequences of Gaussian processes {Γ_n^a(t1), 0 ≤ t1 ≤ 1}, {Γ_n^a(t1, t2), 0 ≤ t1 <
t2 ≤ 1}, . . . , {Γ_n^a(t1, t2, . . . , ts), 0 ≤ t1 < t2 < · · · < ts ≤ 1, 1 ≤ s < n} such that
with the Sup-Euclidean norm we have componentwise convergence in distribution,
namely, as n → ∞,
where for each n and 1 ≤ i ≤ s
Proof of Theorem 5.2.2
This implies that we have componentwise convergence in distribution of the Γ_n^a's to
the appropriate Γ^a's using the appropriate norm. □
5.2.2 Asymptotic Results under H_A^(s)
where we will assume that h(x, y) is a nondegenerate antisymmetric kernel and (5.1.21)
holds. Therefore, in this context, (5.1.21) replaces the stronger assumption of a
finite second moment as in (5.1.19) which was used to derive (cf. Section 5.2.1) the
convergence in distribution result, as n → ∞,
where the stochastic process ū_n(t1, t2, . . . , ts) was defined as
and the limiting Gaussian process Γ^a(t1, t2, . . . , ts) as
with t0 := 0 and t_{s+1} := 1.
The limiting non-random function in (5.2.9) will depend on the location of the
change-points [nλ1], [nλ2], . . . , [nλs], 0 < λ1 ≤ λ2 ≤ · · · ≤ λs < 1. Moreover, we
will see that sup_{0<t1<t2<···<ts<1} |ū_n(t1, t2, . . . , ts)| is consistent and goes to infinity in
probability, as n → ∞, under H_A^(s).
In a similar vein as in Section 5.3.2, we define
where 0 ≤ x1 ≤ x2 ≤ x3 ≤ · · · ≤ x_{s+2} ≤ x_{s+3} ≤ x_{s+4} ≤ x_{s+5} ≤ · · · ≤ x_{2s+4} ≤ 1.
Here x2, . . . , x_{s+1}, x_{s+4}, . . . , x_{2s+2} and x_{2s+3} play the role of dummy variables and
are used to split blocks that may contain change-points. At most s of them will
actually be used, but since the location of the change-points is unknown, we need to
consider s possible changes in each of the blocks (X_{[(n+1)x1]+1}, . . . , X_{[(n+1)x_{s+2}]}) and
(X_{[(n+1)x_{s+3}]+1}, . . . , X_{[(n+1)x_{2s+4}]}). Hence, the convergence result from the following
Section 5.3.2 can be applied to (5.2.10) and we get
where θ_{1,1} = · · · = θ_{s+1,s+1} = θ = 0 and¹
¹We note that θ_{1,1} := θ_1, . . . , θ_{s+1,s+1} := θ_{s+1}, which were defined in (5.1.15).
Moreover, if the distribution between two different changes is the same then, by
the property of antisymmetric kernels, the corresponding θ_{i,j} is also equal to 0. For
example, if the distribution between the second and third and between the sixth
and seventh change-point are the same, then θ_{3,7} = 0.
We are now in the position to go back to the definition of Z_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]},
0 < t1 < t2 < · · · < ts < 1, in (5.1.1), and write it in terms of double sums S(·) as
in (5.2.10). Thus we arrive at
where we define t0 := 0 and t_{s+1} := n/(n + 1). Since we do not know the location of the
change-points [nλ1], [nλ2], . . . , [nλs], where 0 < λ1 ≤ λ2 ≤ · · · ≤ λs < 1, we define
the following functions a_i = a_i(λ1, λ2, . . . , λs, t1, t2, . . . , ts), 1 ≤ i ≤ (s + 1)s, which
will be used to derive a formula for (5.2.9) that will allow us to handle all possible
combinations of λ1, . . . , λs and t1, . . . , ts:
where 0 < a1 ≤ a2 ≤ · · · ≤ a_{(s+1)s} < 1 and q ∈ [0, 1], 1 ≤ i ≤ (s + 1)s.
We need to define these a_i's, 1 ≤ i ≤ (s + 1)s, since there are exactly (s + 1)s
possibilities to place s change-points in (s + 1) blocks. Moreover, exactly s of the latter
a_i's will change to one of the values λ_j, 1 ≤ j ≤ s, while the other s² a_i's will get
the value q and they will drop out of the limiting function ū_{λ1,λ2,...,λs}(t1, t2, . . . , ts).
We are now in the position to state the following theorem which is an immediate
consequence of the above arguments in this section.
Theorem 5.2.3 Assume that (2.2.3), (5.1.20), (5.1.21), and H_A^(s) hold. Define t0 :=
0 and t_{s+1} := n/(n + 1). If τ_i = τ_i(n) := [nλ_i], i = 1, 2, . . . , s, 0 < λ1 ≤ λ2 ≤ · · · ≤ λs < 1,
then, as n → ∞,
where q and the a_i, 1 ≤ i ≤ s(s + 1), are defined in (5.2.11) and (5.2.14), respectively.
We note that in Theorem 5.2.3 θ_{1,1} = θ_{2,2} = · · · = θ_{s+1,s+1} = θ = 0. Moreover,
if the distributions between the (i − 1)-th and i-th and between the (j − 1)-th and
j-th change are the same, then we also have θ_{i,j} = 0.
The limiting function in (5.2.15) is defined for every possible combination of
λ1, . . . , λs and t1, . . . , ts. As we will discuss in Section 5.2.2, the limiting function
ū_{λ1,λ2,...,λs}(t1, t2, . . . , ts) involves a number of different cases. Furthermore, exactly
\binom{s+2}{2} different θ_{i,j}'s, 1 ≤ i ≤ j ≤ s + 1, appear. Removing the ones which are 0,
namely θ_{1,1}, θ_{2,2}, . . . , θ_{s,s} and θ_{s+1,s+1}, then exactly \binom{s+1}{2} different θ_{i,j}'s,
1 ≤ i < j ≤ s + 1, appear.
Assuming that Eh² is finite instead of (5.1.21), then Theorem 5.2.3 implies the
consistency of tests based on sup-functionals of {ū_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]}, 0 < t1 <
t2 < · · · < ts < 1}. This means that we can consistently reject Ho vs. H_A^(s),
except in the case when θ_{i,j} = 0 for all 1 ≤ i ≤ j ≤ s + 1; indeed,
ū_{λ1,λ2,...,λs}(t1, t2, . . . , ts) is equal to 0 if and only if all θ_{i,j}'s involved are equal to 0.
Otherwise, by using the fact that λ1, λ2, . . . , λs are fixed a priori between
0 and 1 and do not depend on t_i, 1 ≤ i ≤ s, one can show that there is at least
one combination of λ1, . . . , λs and t1, . . . , ts such that ū_{λ1,λ2,...,λs}(t1, t2, . . . , ts) is not
equal to 0.
We observe that if θ_{i,j} = θ for all possible choices of 1 ≤ i ≤ j ≤ s + 1, then
since θ = 0. Moreover, it follows that in this case under the null hypothesis Ho of
no change
Thus, on assuming that θ_{i,j} = θ = 0 for all possible choices of 1 ≤ i ≤ j ≤ s + 1, the
sequence
is not consistent against any class of alternatives. On the other hand, if at least one
θ_{i,j} is not equal to 0 and we use T_n, then
P{Ho is rejected when using T_n | H_A^(s) is true} → 1, as n → ∞.
This implies that the limits of the sequence {T_n}_{n∈N} in this case are different in
probability under Ho and H_A^(s), and hence we have consistency of {T_n}_{n∈N}.
5.3 Symmetric Kernels
We consider the s-time parameter processes Z_{k1,k2,...,ks}, 1 ≤ k1 < k2 < · · · < ks < n,
n ≥ s + 1, where the kernel h is symmetric as in (2.2.2). By using the notations
and assumptions from Sections 5.1.1 and 5.1.2, we investigate the behavior of these
processes under Ho (cf. Section 5.3.1) and H_A^(s) (cf. Section 5.3.2), as n → ∞.
5.3.1 Asymptotic Results under Ho
We wish to study the limiting behavior of
in the sup-norm under the null hypothesis of no change. Again we will write
[(n + 1)t1], . . . , [(n + 1)ts], 0 < t1 < t2 < · · · < ts < 1, instead of the indices k1, . . . , ks.
Similarly, as in the case of testing for at most two change-points (cf. Section 4.3.1),
we define the symmetric degenerate (σ̃² = E g̃²(X_1) = 0) kernel g* with mean 0 as
Hence the centralized U-statistic U_n^{(s+2)} defined in (5.1.11) can be written as
which implies that
where U_n^{*(s+2)} is a centralized U-statistic with a symmetric and degenerate kernel
g*, and Corollary 4.3.1 can be applied.
We now proceed to prove the following reduction principle.
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} |U^{*(1)}_{[(n+1)t1]}|
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} |U^{*(s+1)}_{[(n+1)ts],n}|
Proof of Lemma 5.3.1 The first statement follows from Corollary 4.3.1 and the
fact that, as n → ∞,
and combined with similar arguments as in (4.3.6) we have, as n → ∞,
U^{*(i)}_{[(n+1)t_{i-1}],[(n+1)t_i]}, 2 ≤ i ≤ s, is a centralized U-statistic (with a symmetric and
degenerate kernel) from [(n + 1)t_{i-1}] + 1 to [(n + 1)t_i] minus a sum of i.i.d. r.v.'s.
Hence, we have to shift the interval [[(n + 1)t_{i-1}] + 1, [(n + 1)t_i]] to the interval
[1, [(n + 1)(t_i − t_{i-1})]]. Moreover, using the fact that for i.i.d. r.v.'s X_1, X_2, . . . , X_n
for each n, we get,
Since (5.3.2) holds,
similar arguments as in (4.3.6) yield, as n → ∞,
which proves statements 2 to s in this lemma.
Similarly, U^{*(s+1)}_{[(n+1)ts],n} is a centralized U-statistic (with a symmetric and degenerate
kernel) from [(n + 1)ts] + 1 to n minus a sum of i.i.d. r.v.'s. Hence, we have to shift
the interval [[(n + 1)ts] + 1, n] to the interval [1, [n − (n + 1)ts]]. Using now the
fact that
for each n, we arrive, as n → ∞, at
Moreover, we have
which proves statement (s + 1) in this lemma.
Corollary 4.3.1 combined with (5.3.1) leads to
Using the definition of U_n^{*(s+2)} we get
and by using (4.3.6) this reduces to
which proves the last statement in this lemma. □
Now we are in the position to show that the maximum of Z_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]}
minus a sum of projected i.i.d. r.v.'s is also of order n. We state the following corollary
and give a detailed proof. The idea of the proof is similar to the one in Corollary
3.3.1; however, here we give a more general version. This corollary clearly
implies Corollary 3.3.1 and Corollary 4.3.2 when we put s = 1 and s = 2, respectively.
Corollary 5.3.1 Let h be a symmetric kernel and U_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]} be
defined as in (5.1.11). Under Ho, as n → ∞, we have
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} | U_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]} − { Σ_{i=1}^s ([(n + 1)t_{i+1}]
+ [(n + 1)t_{i-1}] − 2[(n + 1)t_i]) Σ_{j=1}^{[(n+1)t_i]} h̃(X_j) + [(n + 1)ts] Σ_{i=1}^n h̃(X_i) } | = O_P(n),
where [(n + 1)t0] := 0 and [(n + 1)t_{s+1}] := n.
max_{1≤[(n+1)t1]≤[(n+1)t2]≤···≤[(n+1)ts]≤n} | U^{(1)}_{[(n+1)t1]} + U^{(2)}_{[(n+1)t1],[(n+1)t2]} + · · · |
Recall that t0 := 0 and t_{s+1} := n/(n + 1), and let
and
H(t_i) := Σ_{j=1}^{[(n+1)t_i]} h̃(X_j), 0 ≤ t_i < 1, with H(t0) := 0,
where again we will write H(1) instead of H(n/(n + 1)) on account of being interested in
large n. We rearrange the summation of K_n^{(2)}(t1, t2, . . . , ts) and we get the following:
We are now in the position to split these sums in K_n^{(2)}(t1, t2, . . . , ts) and apply the
previous lemma. Since
we have that
n
< - I U ( ~ + ~ ) n - n ~ h ( ~ i ) l + max i= 1 ll[(n+l)~r]<[(n+i)t2]~...~[(n+l)ts]<n I"E+l)tl]
+ max I + ) i<[(n+i)t,]<[(n+i)t2 j ~. . .<[ (n+ï ) t S ] s n [(n+l)taldnl
We now have U_{[(n+1)t1],[(n+1)t2],...,[(n+1)ts]} approximated by sums of i.i.d. r.v.'s, and
thus we can easily study the asymptotic behavior of the weak convergence of the process.
Theorem 5.3.1 Assume that Ho, (2.2.2), (5.1.4) and (5.1.8) hold. Then we can
define a sequence of Gaussian processes {Γ_n^{sw}(t1, t2, . . . , ts), 0 ≤ t1 ≤ t2 ≤ · · · ≤ ts ≤
1}_{n∈N} such that, as n → ∞,
and, for each n,
where the Gaussian process Γ^{sw} is defined via a linear combination of a standard
Wiener process W as follows:
where t0 := 0 and t_{s+1} := 1.
Proof of Theorem 5.3.1 Combining Corollary 5.3.1 with (5.2.6) gives the desired
result. □
We note that for 0 ≤ t1 ≤ t2 ≤ · · · ≤ ts ≤ 1
E Γ^{sw}(t1, t2, . . . , ts) = 0,
and
where t0 = r0 := 0 and t_{s+1} = r_{s+1} := 1. To compute the covariance, we use the
fact that
Theorem 5.3.1 implies that under Ho
We mention that in the case of at most one change, Theorem 5.3.1 reduces to
Theorem 3.3.1 which is due to Csörgő and Horváth (1988b, 1997). This is easy to see,
since with s = 1 we have
where t_{s+1} = t2 := 1, t0 := 0 and Γ is the Gaussian process from (3.3.1) as in
Theorem 3.3.1.
In the case of at most two changes, namely s = 2, Γ^{sw} reduces to
where t_{s+1} = t3 := 1, t0 := 0 and Γ^{sw}(t1, t2) is the Gaussian process from (4.3.10)
as in Theorem 4.3.1.
Furthermore, we mention that the Gaussian processes Γ^a from (5.2.5) and Γ^{sw}
from (5.3.6) are different and their relationship is as follows:
We are now also in the position to combine test statistics for a different number
of change-points and get the following theorem:
Theorem 5.3.2 Assume that Ho, (2.2.2), (5.1.4) and (5.1.8) hold. Then we can
define sequences of Gaussian processes {Γ_n^{sw}(t1), 0 ≤ t1 ≤ 1}, {Γ_n^{sw}(t1, t2), 0 ≤
t1 < t2 ≤ 1}, . . . , {Γ_n^{sw}(t1, t2, . . . , ts), 0 ≤ t1 < t2 < · · · < ts ≤ 1, 1 ≤ s < n} such
that with the Sup-Euclidean norm as in (5.2.8) we have, as n → ∞,
where for each n and 1 ≤ i ≤ s
The proof of this theorem goes along the lines of the proof of Theorem 5.2.2.
5.3.2 Asymptotic Results under H_A^(s)
where we will assume that h(x, y) is a nondegenerate symmetric kernel and (5.1.21)
holds. Therefore, in this context, (5.1.21) replaces the stronger assumption of a
second h i t e moment as in (5.1.19) which was used to derive (cf. Section 5.3.1) the
convergence in distribution statement, as n + CQ,
with t,+l := $ and the limiting Gaussian process rsp(t l , t 2 , . . . , t s ) as
with to := O and ts+l := 1.
The limiting function in (5.3.9) will depend on the locations of the change-points
[nλ_1], [nλ_2], ..., [nλ_s], 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_s < 1. Moreover, we will see that
sup_{0<t_1<t_2<...<t_s<1} |U_n(t_1, t_2, ..., t_s)| is consistent and goes to infinity in probability,
as n → ∞, under H_A^(s).

The general limiting function in (5.3.9) will involve many variables, since we have
to handle all possible combinations of the t_i's and λ_i's. This is due to the fact that we
have to consider every possible combination of change-points.
When looking at the definition of Z_{[(n+1)t_1],[(n+1)t_2],...,[(n+1)t_s]}, 0 < t_1 < t_2 < ... < t_s < 1, in (5.1.1), we see that we may split it into many double sums, where each of these double sums is a sum of the form
where 0 ≤ a < b ≤ c < d ≤ n and a, b, c, d ∈ ℕ. These double sums may be
associated with comparing the two blocks (X_{a+1}, ..., X_b) and (X_{c+1}, ..., X_d) with
each other. In the case of testing for s changes, we have to compare each of the
blocks (X_1, ..., X_{k_1}), (X_{k_1+1}, ..., X_{k_2}), ..., (X_{k_s+1}, ..., X_n) with each other. Of
course, each of these s + 1 blocks may contain s change-points, and therefore we split
each of the sums in (5.3.10) into s + 1 sums. Since we do not know the locations of
the s change-points in advance, we write the latter sum as follows,
Consequently, we are comparing blocks with each other that do not have a change
inside. When using (5.3.11), the double sum in (5.3.10) is split into (s + 1)² double
sums. Again we emphasize that there are at most s changes in total, which implies
that some of the new small blocks may be viewed as bigger ones, since there is no
change inside the blocks, nor in between them. Consider the case where we have
(s − 2) changes in the block (X_{a+1}, ..., X_b) and 2 changes in the block (X_{c+1}, ..., X_d);
then (5.3.11) reduces to
Since we do not know in advance the locations of the change-points, we consider
sums as in (5.3.11). For technical purposes, we define the following function,
which will be used to remind ourselves of the location (either before the first, or
between the first and second, or between the second and third, or ... or after the last
change-point) of a block of r.v.'s which does not contain any change-point.
We split (5.3.11) into (s + 1)² different double sums, which are of the form

where 1 ≤ R_1 = R_1(n) := [nr_1] < R_2 = R_2(n) := [nr_2] ≤ R_3 = R_3(n) :=
[nr_3] < R_4 = R_4(n) := [nr_4] ≤ n are chosen properly according to the double
sums in (5.3.11). Furthermore, we know from Section 4.3.2, as an immediate consequence of Theorem 2.6.2 by Sen (1977) and Hoeffding's SLLN (cf. Theorem 2.6.1),
that for each of these sums we have, as n → ∞,
The proof of (5.3.13) was discussed in the case of at most two change-points (cf.
Section 4.3.2), and may also be applied here.

Similarly to (5.3.11), we now define

where 0 ≤ x_1 < x_2 ≤ x_3 ≤ ... ≤ x_{s+2} ≤ x_{s+3} < x_{s+4} ≤ x_{s+5} < ... ≤ x_{2s+4} ≤ 1.
Moreover, since (5.3.14) is a sum of double sums, (5.3.13) can be applied many
times and the limiting function of (5.3.14) can be written as (n → ∞)

We note that if θ_{l(x_r),l(x_{q+1})} = 0 for all possible choices of r and q, then
We are now in a position to go back to the definition of Z_{[(n+1)t_1],[(n+1)t_2],...,[(n+1)t_s]},
0 < t_1 < t_2 < ... < t_s < 1, in (5.1.1), and write it in terms of double sums S(·) as
in (5.3.11). Thus we obtain
" Since we do not know the location of the where we d e h e to := O and tscl := ,- change-points [dl], [nA2], . . . , [ n A & where O < XI S A2 I . . . I AS < 1, we define
the following functions = %(A1, A*, - - -, As, tl, t2, . . ., t,), 1 5 i 5 ( S + 1)s, which
will be used to derive a formula for (5.3.9) that rvill d o w us to handle dl possible
combinations of Al, . . . , A, and tl,-. . , t,:
(Display (5.3.15): the piecewise definitions of a_1, ..., a_{(s+1)s}. Each a_i equals one of the λ_j's under the corresponding ordering of the t_i's and λ_j's (for instance, a_1 := λ_1 if λ_1 ≤ t_1), and equals c_i otherwise.)
where 0 < a_1 ≤ a_2 ≤ ... ≤ a_{s(s+1)} < 1 and c_i ∈ [0, 1], 1 ≤ i ≤ (s+1)s.

We need to define these a_i's, 1 ≤ i ≤ (s+1)s, since there are exactly (s+1)s
possibilities to place s change-points in (s+1) blocks. Moreover, exactly s of the
latter a_i's will take one of the values λ_j, 1 ≤ j ≤ s; the other s² a_i's will get
the value c_i and will drop out of the limiting function U_{λ_1,λ_2,...,λ_s}(t_1, t_2, ..., t_s).
The following theorem is an immediate consequence of the above arguments in
this section.
Theorem 5.3.3 Assume that (2.2.2), (5.1.20), (5.1.21), and H_A^(s) hold. Define t_0 :=
0 and t_{s+1} := n/(n+1).³ If τ_i = τ_i(n) := [nλ_i], i = 1, 2, ..., s, 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_s <
1, then, as n → ∞,

³Note that t_{s+1} → 1 as n → ∞. In (5.3.16) t_{s+1} := n/(n+1), since n = [(n+1)t_{s+1}] has to be satisfied, but in the limiting function (5.3.18) t_{s+1} = 1.
where
and c_i and the a_i's, 1 ≤ i ≤ s(s+1), are defined in (5.3.15) and (5.3.17), respectively.
The limiting function in (5.3.18) is defined for every possible combination of
λ_1, ..., λ_s and t_1, ..., t_s. Moreover, there are (2s choose s) possibilities to choose t_1, t_2, ..., t_s
given the postulated change-points λ_1, λ_2, ..., λ_s; hence (5.3.18) has (2s choose s) different
cases. This is why the definition of the limiting function U_{λ_1,λ_2,...,λ_s}(t_1, t_2, ..., t_s)
involves so many variables. Otherwise we would have to state all the different cases
explicitly. For example, in the case of 1 change-point we have 2 different cases, in the case
of 2 we have 6, in the case of 3 we have 20, in the case of 4 we have 70, and so on.
We note that in the limiting function in (5.3.18) exactly (s+2 choose 2) different θ_ij's,
1 ≤ i ≤ j ≤ s + 1, appear.⁴ This follows from the fact that there are exactly (s+2 choose 2)
ways to compare 2 blocks (i.e., both before the first change, both between the first
and the second change, ..., one before the first and the other between the first and
the second change, ...), where none of these blocks contains any of the s changes.
The problem is the same as choosing 2 balls (the 2 blocks) out of an urn with s + 2
balls (2 blocks and s change-points).
Assuming that the second moment as in (5.1.19) is finite instead of (5.1.21), Theorem 5.3.3 implies the
consistency of tests based on sup-functionals of {U_{[(n+1)t_1],[(n+1)t_2],...,[(n+1)t_s]}, 0 < t_1 <
t_2 < ... < t_s < 1}. This means that we can consistently reject H_0 vs. H_A^(s) when

except in the case when θ_ij = 0, 1 ≤ i ≤ j ≤ s + 1.

⁴We note that some of these θ_ij's can have the same value.

The function U_{λ_1,λ_2,...,λ_s}(t_1, t_2, ..., t_s) is equal to 0 if and only if all the θ_ij's involved
are equal to 0. Otherwise, by using the fact that λ_1, λ_2, ..., λ_s are fixed
a priori for each n between 0 and 1 and do not depend on t_i, 1 ≤ i ≤ s, one can
show that there is at least one combination of λ_1, ..., λ_s and t_1, ..., t_s such that
We observe that if θ_ij = θ for all possible choices of 1 ≤ i ≤ j ≤ s + 1, then
is not consistent against any class of alternatives. On the other hand, if at least one
θ_ij is not equal to 0 and we use T_n, then similarly to Example 4.3.1 we can show
that

P(H_0 is rejected when using T_n | H_A^(s) is true) → 1, as n → ∞.

This implies that the limits of the sequence {T_n}_{n∈ℕ} are different in probability
under H_0 and H_A^(s), and hence we have consistency of {T_n}_{n∈ℕ}.
5.4 Multiple Changes in the Mean
We are to test the no-change-in-the-mean null hypothesis

H_0 : X_1, ..., X_n are independent identically distributed random variables
with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,

against the at-most-s-changes-in-the-mean alternative
Similarly to the previous sections, we note that testing for change-points
in the mean can be illustrated via a geometrical argument, namely by comparing
special areas with each other (cf. Section 3.4 and Section 4.4). Again we consider the
linear function m(t) := t, t ∈ ℝ, which under H_0 joins all the points (k, (1/μ)E{S(k)}),
k ∈ ℕ, if μ ≠ 0, and joins all the points (k, E{S(k)}), k ∈ ℕ, if μ = 0.

Without loss of generality let μ = 1. Then, similarly to Section 4.4, we can draw
the graph of the 45-degree line m(t) = t, which joins the points (k, E{S(k)}), k ∈ ℕ.
In a similar vein as in Figure 4.4.2, where we focused on at most two change-points,
we can draw a more generalized graph, focusing on at most s change-points.
For each given k_1, k_2, ..., k_s, 1 ≤ k_1 < k_2 < ... < k_s < n, there correspond the s
rectangles with endpoints (k_{i−1}, E{S(0)}), (k_{i−1}, E{S(k_i)}), (k_{i+1}, E{S(k_i)}) and
(k_{i+1}, E{S(0)}), 1 ≤ i ≤ s, respectively, where k_0 := 0 and k_{s+1} := n. Each of these
rectangles, with length (k_{i+1} − k_{i−1}) and height E{S(k_i)}, has area (k_{i+1} − k_{i−1}) ×
E{S(k_i)}, 1 ≤ i ≤ s.
By reflecting some parts of these rectangles⁵ around the 45-degree line m(t) =
t, we can construct a new rectangle with endpoints (0, E{S(0)}), (0, E{S(n)}),
(k_s, E{S(n)}) and (k_s, E{S(0)}). This rectangle, with length k_s, height E{S(n)}
and area k_s × E{S(n)}, has, under H_0, the same area as the sum of the previous
areas. Consequently, for each given combination of 1 < k_1 < k_2 < ... < k_s < n,
with k_0 := 0 and k_{s+1} := n,

is an unbiased estimator of zero under H_0. Moreover, T^(n)_{k_1,k_2,...,k_s} is equivalent to Z_{k_1,k_2,...,k_s} in (5.1.1) with
h(x, y) = x − y. Since T^(n)_{k_1,k_2,...,k_s} is a linear combination of partial sums, with t_i = k_i/n,
1 ≤ i ≤ s, Theorem 5.2.1 yields, as n → ∞,

⁵Note that these rectangles are overlapping.
with t_0 := 0 and t_{s+1} := 1, and where the limiting process is the same as Γ_s(t_1, t_2,
..., t_s) from (5.2.5).
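Since h(x, y) = x − y is antisymmetric, every between-block double sum of the form (5.3.10) collapses into a linear combination of partial sums, which is exactly why T^(n)_{k_1,...,k_s} above is a linear combination of partial sums. A minimal numerical check of this identity (the data and block endpoints are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)  # synthetic sample; the identity holds for any data

def block_sum(x, a, b, c, d):
    """Double sum of h(X_i, X_j) = X_i - X_j over i in (a, b], j in (c, d]."""
    return sum(x[i] - x[j] for i in range(a, b) for j in range(c, d))

# Partial sums S(0), S(1), ..., S(n) with S(k) = X_1 + ... + X_k
S = np.concatenate(([0.0], np.cumsum(x)))

a, b, c, d = 10, 30, 50, 90  # arbitrary block endpoints with a < b <= c < d
direct = block_sum(x, a, b, c, d)
via_partial_sums = (d - c) * (S[b] - S[a]) - (b - a) * (S[d] - S[c])
print(np.isclose(direct, via_partial_sums))  # prints True
```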
5.5 Multiple Changes in the Variance
We are to test the no-change-in-the-variance hypothesis

H_0 : X_1, ..., X_n are independent identically distributed random variables
with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,

against the at-most-s-changes-in-the-variance alternative

H_A^(s) : X_1, ..., X_n are independent random variables and there exist s
integers τ_1, τ_2, ..., τ_s, 1 ≤ τ_1 ≤ τ_2 ≤ ... ≤ τ_s < n, such that
Var X_1 = ... = Var X_{τ_1} ≠ Var X_{τ_1+1} = ... = Var X_{τ_2}, Var X_{τ_1+1} =
... = Var X_{τ_2} ≠ Var X_{τ_2+1} = ... = Var X_{τ_3}, ..., Var X_{τ_{s−1}+1} =
... = Var X_{τ_s} ≠ Var X_{τ_s+1} = ... = Var X_n, 0 < Var X_1, Var X_{τ_1+1},
Var X_{τ_2+1}, ..., Var X_{τ_s+1} < ∞, and EX_1 = ... = EX_n = μ.
Again, as in Sections 3.5 and 4.5, we suggest the use of the symmetric kernel

and, after some algebraic manipulations, Z_{k_1,k_2,...,k_s} in (5.1.1) can be written as

where k_0 := 0, k_{s+1} := n, S(k) := Σ_{i=1}^k X_i and R(k) := Σ_{i=1}^k X_i². By Theorem 5.3.1,
as n → ∞, we have that under H_0
with t_0 := 0 and t_{s+1} := 1, and where the limiting process is the same as Γ_s^sym(t_1,
t_2, ..., t_s) from (5.3.6).
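The kernel display above did not reproduce; assuming the classical variance kernel h(x, y) = (x − y)²/2, whose mean under H_0 is σ², every between-block double sum of h reduces to an expression in the partial sums S(k) and the partial sums of squares R(k), which is why Z_{k_1,...,k_s} can be rewritten via S and R. A numerical check of this reduction under that assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=80)  # synthetic sample; the identity holds for any data

def block_sum(x, a, b, c, d):
    """Double sum of h(X_i, X_j) = (X_i - X_j)^2 / 2 over i in (a, b], j in (c, d]."""
    return sum((x[i] - x[j]) ** 2 / 2.0 for i in range(a, b) for j in range(c, d))

# Partial sums S(k) and partial sums of squares R(k), with S(0) = R(0) = 0
S = np.concatenate(([0.0], np.cumsum(x)))
R = np.concatenate(([0.0], np.cumsum(x ** 2)))

a, b, c, d = 5, 25, 40, 70  # arbitrary block endpoints with a < b <= c < d
direct = block_sum(x, a, b, c, d)
via_S_and_R = ((d - c) * (R[b] - R[a]) + (b - a) * (R[d] - R[c])) / 2.0 \
              - (S[b] - S[a]) * (S[d] - S[c])
print(np.isclose(direct, via_S_and_R))  # prints True
```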
5.6 Multiple Changes in the Mean and/or Variance
In the previous chapters, as well as in the previous sections of this chapter, we
considered tests for changes in the mean or changes in the variance separately.
Frequently, it is of interest to be able to test for changes in both the mean and
the variance. It turns out that this is not an easy task in general. Based on
U-statistics-type processes, here we propose a test statistic that will test for changes
either in the mean or in the variance or in both. Unfortunately, in the following setup
we are not able to distinguish between changes in both and changes in only one of
them. Nevertheless, this test can be used when the two depend on each other, namely
when the mean changes if and only if the variance changes (cf. Section 6.5). In the case of
independent normal variables, Gombay and Horváth (1997) proposed an estimator
for testing for one single change in the mean and/or variance using the likelihood ratio
test.
We are to test the no-change-in-the-mean-and-variance hypothesis

H_0 : X_1, ..., X_n are independent identically distributed random variables
with EX_i = μ and 0 < σ² = Var X_i < ∞, 1 ≤ i ≤ n,

against the at-most-s-changes-in-the-mean-and/or-variance alternative

H_A^(s) : X_1, ..., X_n are independent random variables and there exist s
integers τ_1, τ_2, ..., τ_s, 1 ≤ τ_1 < τ_2 ≤ ... ≤ τ_s < n, such that
EX_1 = ... = EX_{τ_1} ≠ EX_{τ_1+1} = ... = EX_{τ_2} and/or Var X_1 =
... = Var X_{τ_1} ≠ Var X_{τ_1+1} = ... = Var X_{τ_2}, ..., EX_{τ_{s−1}+1} = ... =
EX_{τ_s} ≠ EX_{τ_s+1} = ... = EX_n and/or Var X_{τ_{s−1}+1} = ... = Var X_{τ_s} ≠
Var X_{τ_s+1} = ... = Var X_n, and 0 < Var X_1, Var X_{τ_1+1}, Var X_{τ_2+1}, ..., Var X_{τ_s+1} < ∞.
it is reasonable to consider symmetric kernels of the form

Consequently, under H_0, h(X_i, X_j) is an unbiased estimator of θ = μ² + σ². It is
obvious that changes in μ or σ², or in both, will change θ. By using this kernel
function, we cannot distinguish which parameter changed. Nevertheless, it may
be used to detect whether there were any changes at all in either one, or in both, of these
parameters.
To apply our theory on U-statistic-based processes, we assume that under H_0

and put

Furthermore, we also assume that under H_0

is positive and finite. Hence, after some algebraic manipulations, Z_{k_1,k_2,...,k_s} in (5.1.1)
with the kernel from above can be written as

where k_0 := 0, k_{s+1} := n and R(k) := Σ_{i=1}^k X_i². By Theorem 5.3.1 we have that
under H_0, as n → ∞,

with t_0 := 0 and t_{s+1} := 1, and where the limiting process is the same as Γ_s^sym(t_1, t_2, ..., t_s) from (5.3.6).
5.7 Estimating the Number of Change-points
When dealing with multiple change-points, one of the questions that occurs is 'how
many change-points are there?'. This turns out not to be an easy task, and there
are not many papers written on this topic. Yao (1988) suggested an estimator
for the number of changes in the mean for normal observations with
common variance via a maximum likelihood argument and provided a consistent way
of estimating the true number of change-points. Serbinowska (1996) showed the
consistency of the maximum likelihood estimator for the number of changes in the case
of binomial observations. Other related references about multiple change-points can
be found in Csörgő and Horváth (1997) and Lee (1996).
Lee (1996) obtains a nonparametric estimator for the number of change-points
and proves its consistency. Namely, we are to test the null hypothesis H_0 from
Section 5.1 against the alternative

H_A^(s_n) : X_1, ..., X_n are independent random variables and there exist s_n
integers τ_1 = τ_1(n), τ_2 = τ_2(n), ..., τ_{s_n} = τ_{s_n}(n), 1 < τ_1 < τ_2 < ... <
τ_{s_n} < n, such that P{X_1 ≤ t} = ... = P{X_{τ_1} ≤ t}, P{X_{τ_1+1} ≤
t} = ... = P{X_{τ_2} ≤ t}, ..., P{X_{τ_{s_n}+1} ≤ t} = ... = P{X_n ≤ t}
for all t, and P{X_{τ_i} ≤ t_0} ≠ P{X_{τ_i+1} ≤ t_0} for some t_0 and for all
1 ≤ i ≤ s_n.
Using the idea of a window of observations (with length A_n), which is due to Lombard
and Carroll (1992), Lee (1996) considered the difference D_j of two weighted empirical
measures at each possible change-point location j as follows:

where c_i, 1 ≤ i ≤ A_n, is a sequence of positive numbers, ‖c‖ = (c_1² + ... + c_{A_n}²)^{1/2},
and x ∈ ℝ. Assuming that the difference between two successive change-points is
larger than 2A_n, we can compute the expected value of D_j^(n)(x) for each j. Note
that for each j we can think of 'comparing' the block of random variables (X_j,
X_{j−1}, ..., X_{j−A_n+2}, X_{j−A_n+1}) to the block of random variables (X_{j+1}, X_{j+2},
..., X_{j+A_n−1}, X_{j+A_n}). Moreover, there is at most one change-point in the combined
block (X_{j−A_n+1}, ..., X_{j+A_n}), since by our assumption the difference between two
successive change-points is larger than 2A_n. Hence, if the compared random variables
are both before or both after the change, then they have the same distribution.
Therefore,
for some r ∈ {1, ..., s_n} and x ∈ ℝ.

Let

δ_n(j) := (1/‖c‖) Σ_{i=|τ_r−j|+1}^{A_n} c_i, if |τ_r − j| ≤ A_n − 1 for some τ_r, and δ_n(j) := 0 otherwise.
Then it is easy to see that δ_n(j) is increasing if τ_r − A_n < j ≤ τ_r, and decreasing
if τ_r ≤ j < τ_r + A_n. Moreover, it takes its local maximum at the change-point τ_r. The
same holds true for D_j^(n), if we assume that sup_{x∈ℝ} (P(X_{τ_r} ≤ x) − P(X_{τ_r+1} ≤ x))
is positive a.s., 1 ≤ r ≤ s_n. Therefore, as an estimator for the true number of
change-points, Lee (1996) suggests the estimator ŝ_n with
change-points, Lee (1996) suggests the estimator Sn with
rî, := # { j : SU^ D:) (z) 2 Cn and SU^ DP) (z) > sup DC) (z) for ZER XER zER
j - R , < m < j and sup~(i"'(z) 1 s u p ~ g ) ( z ) for j < m < j + % ) , zER ZER
*+O and +O. Xt is obvious that j, = O if sup,,, where Cn satisfies cn n-tm A n n+w
D?) (z) < Cn for al1 possible j'ç.
Under assurnptions (Al) - (A3) of Lee (1996, page %?), Lee (1996) establishes
that, as n -+ oo,
1 1% note that, for example, q, 1 5 i 5 A,, may be chosen as ci = 1 or q = a + a for a > O and b 2 9.
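A simplified numerical sketch of Lee's counting estimator, with the illustrative choices c_i ≡ 1, the sup over x approximated on the observed sample points, and a hand-picked threshold C (the precise rate conditions on A_n and C_n in Lee (1996) are not reproduced here, so these are assumptions):

```python
import numpy as np

def window_stat(x, j, A):
    """D_j with c_i = 1: normalized sup-difference between the empirical measures
    of the blocks (X_{j-A+1}, ..., X_j) and (X_{j+1}, ..., X_{j+A})."""
    left, right = x[j - A:j], x[j:j + A]
    grid = np.sort(x)  # approximate the sup over x by the sample points
    F_left = (left[None, :] <= grid[:, None]).sum(axis=1)
    F_right = (right[None, :] <= grid[:, None]).sum(axis=1)
    return np.max(np.abs(F_left - F_right)) / np.sqrt(A)

def estimate_num_changes(x, A, C):
    """Count the positions j where D_j exceeds C and is a local maximum within +/- A."""
    n = len(x)
    D = np.full(n, -np.inf)
    for j in range(A, n - A):
        D[j] = window_stat(x, j, A)
    count = 0
    for j in range(A, n - A):
        left_max = np.max(D[j - A + 1:j])
        right_max = np.max(D[j + 1:j + A])
        # strict '>' on one side breaks ties, so a flat peak is counted only once
        if D[j] >= C and D[j] > left_max and D[j] >= right_max:
            count += 1
    return count

rng = np.random.default_rng(2)
# two large distributional changes, at positions 100 and 200, separated by more than 2A
x = np.concatenate([rng.normal(0, 1, 100),
                    rng.normal(5, 1, 100),
                    rng.normal(0, 1, 100)])
print(estimate_num_changes(x, A=30, C=3.5))
```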
Vostrikova (1981) suggested a binary segmentation method and also proved its
consistency. This binary segmentation procedure detects the number of change-points
and their locations simultaneously. Namely, if we were to test H_0 vs. H_A^(s)
(cf. Section 5.1), we first test H_0, the null hypothesis of no change, against H_A
(cf. Section 3.1), the alternative of one single change in the distribution. If H_0 is
not rejected, then we stop. On the other hand, if H_0 is rejected, then we know
that at least one change-point is indicated, and we test the two subsequences before
and after the already located change-point for the possibility of having at most one
further change-point in them. The procedure stops when no further subsequences
have change-points. The number of change-points found estimates the true number
of change-points, s.
This procedure suggests that when looking for the true number of change-points,
we should first test for one single change-point. On the other hand, this test statistic
has to be consistent under the alternative of more than only one single change-point.
But we saw at the beginning of Section 4.1 that, depending on the test statistics
used, testing for more than one change can be inconsistent in special cases.
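A sketch of the binary segmentation idea in code. The single-change statistic below is a standardized CUSUM, a stand-in for sup_{0<t<1} |U_n(t)| of Section 3.3, and the threshold is an illustrative choice, not a derived critical value:

```python
import numpy as np

def cusum_stat(x):
    """Max absolute standardized CUSUM over a segment, and the split position."""
    n = len(x)
    if n < 4:
        return 0.0, 0
    s = np.cumsum(x)
    k = np.arange(1, n)
    # |S_k - (k/n) S_n| / (sigma_hat * sqrt(n)), k = 1, ..., n-1
    stats = np.abs(s[:-1] - k / n * s[-1]) / (np.std(x) * np.sqrt(n) + 1e-12)
    j = int(np.argmax(stats))
    return float(stats[j]), j + 1

def binary_segmentation(x, threshold, offset=0):
    """Split recursively while the single-change test rejects; return change-points."""
    stat, k = cusum_stat(x)
    if stat <= threshold:
        return []
    return (binary_segmentation(x[:k], threshold, offset)
            + [offset + k]
            + binary_segmentation(x[k:], threshold, offset + k))

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 120), rng.normal(3, 1, 120)])
print(binary_segmentation(x, threshold=2.0))  # e.g., [120] (split near the true change)
```

Note the caveat discussed below: the split point maximizing the statistic need not be an actual change-point, so the count can overshoot.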
Recall the notation from Section 5.1.2, especially

Suppose we were to test H_0 vs. H_A^(s) by using a properly normalized sup-functional
of the stochastic process Z_{[(n+1)t]}, 0 < t < 1, from (3.1.1) as a test statistic. After
some algebraic manipulations it turns out that the limiting function of the suitably
normalized Z_{[(n+1)t]} under H_A^(s) is of the form
where the A_i and B_j are functions of the λ_k, 1 ≤ k ≤ s + 1, and 1 ≤ p ≤ r ≤ s + 1.
We can see that U_{λ_1,λ_2,...,λ_s}(t) = 0 for all 0 < t < 1 if and only if θ_{j+1,j+1} = 0, A_j = 0
and B_j = 0, where 0 ≤ j ≤ s. Hence, any symmetric kernel will be consistent
when testing H_0 vs. H_A^(s), but antisymmetric kernels may fail to be. This suggests
combining the procedure suggested by Vostrikova (1981) with the statistics studied
in Section 3.3, namely sup_{0<t<1} |U_n(t)| as defined in (3.1.15). Thus we aim at
having a Vostrikova-type procedure in a non-parametric context. Define

and the following matrices, which depend on the number of change-points s and on the index î, which in turn depends on t and the locations of the change-points:
Obviously, the product of the first three matrices, and that of the last three matrices, is
in ℝ^{1×1}. The structure of the matrices is quite complicated. This is due to the
fact that for fixed s we still have to consider where t is located. This, of course,
changes the index î, which also changes the dimensions of the three matrices A, B
and C. With similar arguments as in Section 5.2.2, namely using Theorem 2.6.2
and Hoeffding's SLLN (cf. Theorem 2.6.1), respectively, we arrive at the following
theorem (cf. Theorem 5.3.3).
Theorem 5.7.1 Assume that (2.2.2), (5.1.20), (5.1.21) and H_A^(s) from Section 5.1
hold. Define t_0 := 0 and t_{s+1} := n/(n+1). If τ_i = τ_i(n) := [nλ_i], i = 1, 2, ..., s,
0 < λ_1 < λ_2 < ... < λ_s < 1, then, as n → ∞,

where

and the matrices A_{s,î}, B_{s,î}, C_î, D_s, E_s and F are defined above.
We observe that if θ_ij = θ for all possible choices of 1 ≤ i ≤ j ≤ s + 1, then, as
n → ∞,

which corresponds to the case where we do not have any changes, as in (3.1.7), when
we replace EZ_k by the suitably normalized EZ_{[(n+1)t]}. Moreover, it follows that under the null
hypothesis H_0 of no change, as n → ∞,
Assuming that there are two changes in the distribution, the latter theorem
implies the following. As n → ∞,

We note that this is the same limiting function as used in (4.1.1).
Theorem 3.3.2 implies that

is consistent against any class of alternatives if at least one θ_ij is not equal to 0.
Hence,

This implies that the limits of the sequence {T_n}_{n∈ℕ} are different in probability
under H_0 and H_A^(s), and hence we have consistency of {T_n}_{n∈ℕ}.
Now we continue along the lines of Vostrikova (1981), and search for the argument
at which the test statistic T_n takes its maximum. Unfortunately, the maximum
is not always taken at one of the change-points, as we saw in the case of one single
change-point in Section 3.3.3. Therefore we do not know whether the point at which we
split the intervals is a change-point or not. Hence it is possible that the number of
detected change-points can be too big. Of course, if the change-points are such
that the maximum is always taken at one of them, then the procedure works
out fine.
Summarizing, we showed that T_n, which was used in Section 3.3 in the context of
testing for one single change-point, is also consistent when testing for s, 1 ≤ s < n,
change-points. Due to the fact that the local maxima are not necessarily taken at
the change-points, one may, for example, prefer the use of the estimator ŝ_n in (5.7.1)
proposed by Lee (1996) to estimate the true number of change-points.
"'Imagination is more important
than knowledge."
- Albert Einstein
Chapter 6
Applying Change-point Theory to the
Financial Market
6.1 Introduction
In 1973, F. Black and M. Scholes derived a formula for option pricing that has
since been called the Black-Scholes formula. They worked closely together with
R. Merton, and in 1997 Merton and Scholes were awarded the Nobel Prize in
Economic Sciences. Black died in 1995 in his mid-fifties. Thousands of traders
and investors now use this formula every day to value stock options in markets
throughout the world. Black, Merton and Scholes thus contributed to the rapid
growth of markets for derivatives in the last 10 years.
The derivation of the Black-Scholes formula involves many areas of probability
theory, for example, martingale theory, Wiener processes, Itô processes, stochastic
integration and stochastic differentiation. In the next sections we give a glimpse of
some of the basic notions.
The Black-Scholes formula is used to compute the value of the so-called European
options and other derivative securities. In their model the so-called volatility
parameter σ is assumed to be a constant. It (the variance (volatility)) is frequently
estimated via historical data. Using the results from the previous sections, we will
propose a test procedure for testing for possible changes in the volatility in the
Black-Scholes setup.
In practice, changes of the variance (volatility) are very important to know about,
since these changes will affect the behavior of an investor. The higher the variance
(volatility), the higher the value of an option in the Black-Scholes model.

Using the methodologies from the previous chapters, we will aim at explaining
how to detect changes in the variance (volatility). Also, we will see that changes
in the mean of the stock price¹ do not affect the Black-Scholes formula, and hence the
value of an option.
For further reference and a more detailed description, we refer to J. Hull (1993),
especially Chapters 1, 9 and 10. We will closely follow his presentation of the matter
at hand. For a review of the Black-Scholes formula we also refer, for example, to
Duffie (1996), and to Csörgő (1999), who details the original derivation of Black and
Scholes (1973).
6.2 Derivative Securities
A derivative security is a variable depending on other, more basic underlying variables.
During the last few years, derivative securities have become more important in
the financial market. Different kinds of derivatives are, for example, forward
and futures contracts or options. We will give an overview of these derivatives.
6.2.1 Forward Contracts
A forward contract is an agreement, for example between two financial institutions
or between a financial institution and one of its clients, to buy or sell an asset at a given time
for a certain price, the so-called delivery price. It is not traded on an exchange and
the parties usually know each other. One of the parties agrees to buy the underlying
asset at a certain specified date and price, which is called the long position, and the
other party agrees to sell it, which is called the short position. The holder of the

¹We emphasize that throughout this chapter we will denote the stock price at time t by S(t). This is the notation widely used in the finance literature. On the other hand, in the mathematical literature S(n) usually denotes the sum of n random variables. Although the latter was already used in previous chapters, we will change our notation in this chapter.
short position delivers the asset to the holder of the long position in return for a
cash amount, the delivery price. A forward contract is worth zero when it is first
entered into. Depending on the movements in the price of the asset, it can have a
positive or negative value later on. The pay-off from a long position in a forward
contract on one unit of an asset is

where K is the delivery price and S(T) the spot price of the underlying asset at
maturity time T of the contract. Of course, the pay-off from a short position is
Both pay-offs can be positive or negative.
6.2.2 Futures Contracts
A futures contract is an agreement between two parties to buy or sell an asset at a
given time for a certain price. It is traded on an exchange and the parties usually
do not know each other. The underlying asset could be a commodity, e.g., sugar,
lumber, gold or cows, or a financial asset, e.g., currencies, treasury bills or bonds.
An exact delivery date is usually not specified, but there is a delivery month. The
exchange specifies the period during the month when delivery must be made.
6.2.3 Options
There are two different types of options. The call option gives the holder the right,
but not the obligation, to buy the underlying asset, e.g., stocks, foreign currencies,
commodities, or futures contracts, at a given time t = T, the maturity, for a certain
price K, the so-called exercise or strike price. Similarly, a put option gives the holder
the right to sell. There are different kinds of options.
Suppose we sign at time t = 0 a contract which gives us the right (option) to
buy one share of a stock at a specified price K at a specified time T. Then we will
exercise the option, that is, realize the right to buy at the exercise price K, if
the price of the stock S(T) is greater than K, the price we agreed to pay for the
stock. On the other hand, if K ≥ S(T), we do not exercise the option, since the
price of the stock S(T) is at most the specified price K. Of course, exactly the
opposite is true if we have the option to sell instead of the option to buy.
It is clear that an investor must pay a fixed amount of money to purchase an
option contract. The Black-Scholes formula allows us to compute how much somebody
should be willing to pay for an option contract. Moreover, there are two sides
to every option contract: the investor who has taken the long position (i.e., who has
bought the option) and the investor who has taken the short position (i.e., who has
sold or written the option). The investor with the short position receives cash up
front but may have liabilities later. The profits or losses of the two investors are the
reverse of each other.
For now, let us consider a call option in the long position for one single share of
a stock. We recall that K is the delivery price. Then the pay-off when buying at a
specified time T is

(S(T) − K)+ = max(S(T) − K, 0),

which is the so-called European call option (option exercised at maturity t = T)
mentioned before. As there are different options, by allowing one to buy (exercise
the option) at any time from now (t = 0) to maturity (t = T), the pay-off is

{(S(t) − K)+, 0 ≤ t ≤ T},

which is the so-called American call option (option exercised at any time from t = 0
to maturity t = T). If we allow one to buy when the price of the stock is at a maximum
over a specified period, then the pay-off is
which is the so-called call-on-maximum option, or lookback option. Similarly, we
may allow one to buy based on the average of the stock price over a specified
period of time. Then the pay-off is

which is the so-called Asian option, or call-on-average option.
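These pay-offs are straightforward to compute for a given price path. A small sketch (the path values and strike K are illustrative assumptions; the American option is omitted since its pay-off depends on an exercise strategy):

```python
import numpy as np

K = 22.0  # strike (exercise) price, illustrative
# S(t) sampled at a few time points; the last entry plays the role of S(T)
path = np.array([20.0, 21.5, 25.0, 23.0, 21.0])

def european_call(path, K):
    return max(path[-1] - K, 0.0)    # (S(T) - K)^+

def lookback_call(path, K):
    return max(path.max() - K, 0.0)  # pay-off based on the maximum price

def asian_call(path, K):
    return max(path.mean() - K, 0.0) # pay-off based on the average price

print(european_call(path, K))  # 0.0, since S(T) = 21 < K
print(lookback_call(path, K))  # 3.0, since max S(t) = 25
print(asian_call(path, K))     # about 0.1, since the average is 22.1
```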
Some of these options are riskier than others. Hence, each of them will have
a different price. For example, an American option does not have the same risk as
a European option, since we may exercise at any time from now to maturity. This
makes the American option more expensive than the European one. Of course, there
are many other options and there is a lot of freedom in the characteristics of an
option.
6.3 Modeling the Behavior of Stock Prices
Stock prices are usually assumed to follow a Markov process. This means that the
probability distribution of the price at any particular future time depends only on
the current stock price S(t). Hence, the present price of a stock S(t) impounds all
the information contained in a record of past prices S(t − t*), t* < t. If the Markov
property did not hold, then one could make large profits by comparing charts of
past stock prices. Hence, it is a reasonable assumption.
While a Wiener process has mean or drift rate 0 and variance rate 1, we can define the
so-called generalized Wiener process for a variable x, which has drift μ and variance
rate σ², namely

dx = μ dt + σ dW.  (6.3.1)

The μ dt term means that x has an expected drift of μ per unit time; hence in a
time interval of length T, x increases in expectation by an amount μT. The σ dW term is the
so-called noise added to the path followed by x.

Since the parameters μ and σ may also depend on the variable x and the time
t, we modify equation (6.3.1) and get a so-called Itô process

dx = μ(x, t) dt + σ(x, t) dW.
Now we are in a position to define a process for the stock price S(t). A first guess
would be that S(t) follows a generalized Wiener process with constant expected drift
and constant variance. However, this process fails, since the expected percentage
return required by investors from a stock is independent of S(t). For example, an
investor will require a 10% per annum expected return when S(t) is $10 as well as
when it is $20. Hence, we assume that the expected drift rate of S(t) is μS(t) for
some constant parameter μ. Therefore,
and, when compounded continuously
so that
When considering a stock, we also have to consider its volatility. We do this by
assuming that the variance of the percentage return in a short period of time Δt is the
same, independent of S(t). Hence, we express S(t) by an Itô process with instantaneous
expected drift rate μS(t) and instantaneous variance rate σ²S²(t), i.e., we assume

dS(t) = μS(t) dt + σS(t) dW(t).

We mention that μ ∈ ℝ is the constant expected rate of return and σ > 0 the so-called
stock price volatility. μ depends on the level of interest rates in the economy.
The lower the level of interest rates, the lower the expected return required on any
stock. Typical values for σ are in the range 0.2 to 0.4, i.e., 20% to 40%.
We may even simulate the movements in the stock price by using Monte Carlo
simulation. We give an example of one possible pattern of a stock price movement,
since different random variables will produce another pattern. But by repeating this
procedure many times, we may even estimate the distribution function of S(T) for a
given T. Since
(S(t + Δt) − S(t)) / S(t) = μΔt + σε√Δt

with ε ~ N(0, 1), we may compute the price of the stock the following day, S(t + Δt), as
follows:

S(t + Δt) = S(t) (1 + μΔt + σε√Δt),

where again ε ~ N(0, 1).
We now consider S(t) over a period of one year, T = 1, and we consider daily
changes in S(t), Δt = 1/365. We assume an initial stock price S(0) of $20 and
an expected return of 16% per annum, hence μ = 0.16. Moreover, we draw a graph
with a volatility of 20% per annum, σ₁ = 0.2, and 40% per annum, σ₂ = 0.4. For
both patterns we will use the same generated random variables ε₁, …, ε₃₆₅, which
are standard normally distributed. Again we mention that these are only two possible
patterns, and they depend on the generated random variables ε₁, …, ε₃₆₅, but we
can see the effect of different variances on the stock price.
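The simulation just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the thesis; the function name `simulate_path` and the use of Python's `random` module are our own choices.

```python
import random

def simulate_path(s0, mu, sigma, n_steps, eps):
    """Discretized stock path: S(t + dt) = S(t) * (1 + mu*dt + sigma*eps*sqrt(dt))."""
    dt = 1.0 / n_steps
    sqrt_dt = dt ** 0.5
    path = [s0]
    for e in eps:
        s = path[-1]
        path.append(s * (1.0 + mu * dt + sigma * e * sqrt_dt))
    return path

# One year of daily steps with S(0) = $20 and mu = 0.16, as in the text.
random.seed(1)
eps = [random.gauss(0.0, 1.0) for _ in range(365)]   # same draws for both paths
path_20 = simulate_path(20.0, 0.16, 0.2, 365, eps)   # sigma_1 = 0.2
path_40 = simulate_path(20.0, 0.16, 0.4, 365, eps)   # sigma_2 = 0.4
```

Since both paths reuse the same ε₁, …, ε₃₆₅, any difference between them is due purely to the volatility, mirroring the comparison in Figure 6.3.1.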
Figure 6.3.1: Daily stock prices with different volatilities σ₁ = 0.2 and σ₂ = 0.4.

In Figure 6.3.1 we show the patterns of the stock price for the different volatilities,
and we see that the stock with a volatility of 40% per annum fluctuates much more
around the expected drift than the one with 20%. Hence, the higher the volatility,
the higher the risk for an investor. This is one of the motivations where change-point
theory comes into the picture, for, as the volatility changes, the behavior of
an investor might also change. This will also affect how to price an option which
depends on the behavior of the stock S(t) and hence also on its volatility.
For an owner of a stock, the movements in the stock tend to offset each other.
But for an owner of an option this is different. For example, an owner of a call
option (see Section 6.2.3), who has the right to buy the underlying stock at a specified
time and price, benefits from a price increase, but has limited downside risk, namely
the premium paid, if the price decreases. Therefore the value of an option increases
as the volatility increases.
6.4 The Black-Scholes Formula
Using the notions from Csörgő (1999, Section 4) in this section, we assume, as
in (6.3.3), that the stock price process {S = S(t), t ≥ 0} is governed (driven) by
a standard Wiener process {W = W(t), 0 ≤ t < ∞} on some probability space
(Ω, F, P) via the Itô process

dS(t) = μS(t) dt + σS(t) dW(t),  0 ≤ t ≤ T.    (6.4.1)
Assume further that the value or price C of a European call option at any time
t ∈ [0, T] depends only on the underlying stock price S(t) and the time t, i.e., we
have

C = C(t, S(t)),    (6.4.2)

and, in addition, that the real-valued function C = C(t, S(t)) on [0, T] × (0, ∞) is
continuously differentiable in t and twice continuously differentiable in S, denoted
by C ∈ C^{1,2}([0, T] × (0, ∞)). We note that the processes in (6.4.1) and (6.4.2) are
driven by the same Wiener process. Itô showed that C is again an Itô process, and
by Itô's chain rule formula

dC = (∂C/∂t) dt + (∂C/∂S) dS + ½ (∂²C/∂S²) (dS)²,    (6.4.3)

where dS is given by (6.4.1). Hence we have

dC = (∂C/∂t + μS ∂C/∂S + ½ σ²S² ∂²C/∂S²) dt + σS (∂C/∂S) dW,    (6.4.4)
where, for computing (μS dt + σS dW)², we used the following "multiplication table"
for differentials (cf. Karatzas and Shreve (1988, p. 154)):

dt · dt = 0,  dt · dW = 0,  dW · dW = dt.
The geometric Brownian motion process

S(t) = S(0) exp((μ − σ²/2) t + σ W(t)),  0 ≤ t ≤ T,    (6.4.5)

is a solution of (6.4.1), starting from S(0) at time 0. Indeed, on letting f(t, W) :=
S(t), Itô's formula (cf. (6.4.3)) for the Itô process in (6.4.1) is

df = (∂f/∂t) dt + (∂f/∂W) dW + ½ (∂²f/∂W²) dt.    (6.4.6)
By (6.4.5) we arrive at the differentials

∂f/∂t = (μ − σ²/2) S(t),  ∂f/∂W = σ S(t),  ∂²f/∂W² = σ² S(t),

which, in turn, via (6.4.6) yield (6.4.1) as desired, namely

dS(t) = (μ − σ²/2) S(t) dt + σS(t) dW(t) + ½ σ²S(t) dt
      = σS(t) dW(t) + μS(t) dt,  0 ≤ t ≤ T.
The uniqueness of the solution S in (6.4.5) for (6.4.1) follows from a general result
of Itô, which states that a stochastic differential equation with Lipschitz continuous
coefficients has a unique solution.
By (6.4.5), we have that for each t ∈ [0, T],

log S(T) − log S(t) = log (S(T)/S(t)) = (μ − σ²/2)(T − t) + σ (W(T) − W(t)).
Following Black and Scholes (1973), we now derive the Black-Scholes formula.
Their assumptions are:

• The stock price S(t) follows a geometric Brownian motion with constant μ and
constant σ,

• No transaction costs, no taxes, and no dividends involved,

• Short selling of securities is permitted,

• The trading is continuous,

• No arbitrage opportunities, which means no riskless profit by simultaneously
entering into transactions in more than one market, e.g., to buy 1000 shares
of a stock in New York for $320 and sell them, after considering the exchange rate,
for $350 in Vienna,

• The risk-free interest rate r is constant and the same for all maturities.
Black and Scholes define a portfolio such that the holder of it is short one deriva-
tive security and long an amount of ∂C/∂S shares. Hence, the value of the portfolio,
say P, is

P = −C + (∂C/∂S) S.

Using (6.4.1) and (6.4.4) we get that

dP = −dC + (∂C/∂S) dS = (−∂C/∂t − ½ σ²S² ∂²C/∂S²) dt.

Using the no-arbitrage condition, it follows that we have

dP = r P dt,
which, in turn, implies the Black-Scholes partial differential equation

∂C/∂t + rS ∂C/∂S + ½ σ²S² ∂²C/∂S² = rC,    (6.4.9)
where C is a function of S and t. It is very interesting to see that the expected
return on the stock, μ, drops out of the equation and does not have any influence on
pricing the option.

Note that this equation does not involve any variables that are affected by the risk
preferences of investors. It would not be independent of risk preferences if it involved
the expected return μ. In fact, the higher the level of risk aversion by investors, the
higher μ will be for any given stock, hence μ depends on risk preferences (cf. Hull
(1993, Section 10.8)). Remarkably, under the Black-Scholes model, μ drops out of
the equation.
Subject to C(T, S) = (S − K)⁺, equation (6.4.9) can be solved, i.e., in case of
a European call option as in Section 6.2.3 (cf. Black and Scholes (1973)), the price
C of the option at any time t = t₀ ∈ [0, T] is given by the Black-Scholes formula

C = S(t₀) Φ(d) − K e^{−r(T−t₀)} Φ(d − σ√(T − t₀)),    (6.4.10)

where

d = (log(S(t₀)/K) + (r + σ²/2)(T − t₀)) / (σ√(T − t₀)).
The formula says that the option value C is higher the higher the price of the share
today, S(t₀), the higher the volatility of the share price, σ, the higher the risk-free
interest rate, r, the longer the time to maturity, T, the higher the probability that
the option will be exercised, and the lower the strike price, K. We note that

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−u²/2} du

denotes the standard normal distribution function.
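As a check on the formula, it can be implemented directly; the sketch below uses only the Python standard library (math.erf for Φ), and the function names std_normal_cdf and black_scholes_call are our own.

```python
from math import erf, exp, log, sqrt

def std_normal_cdf(x):
    """Phi(x), the standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(s0, k, tau, r, sigma):
    """Black-Scholes price of a European call with time to maturity tau = T - t0."""
    d = (log(s0 / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return s0 * std_normal_cdf(d) - k * exp(-r * tau) * std_normal_cdf(d - sigma * sqrt(tau))
```

One can verify numerically the monotonicity stated in the text: the price increases in σ (positive vega) and decreases in the strike price K.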
6.5 Changes in the Volatility
In the previous section we stated the Black-Scholes formula that allows us to com-
pute the value of a European call option assuming that S(t) is a geometric Brownian
motion with constants μ and σ². Since most of the factors in this formula are fixed
in advance, e.g., we know the price of the share today, S(t), the time of maturity,
T, the strike price, K, and, more or less, the risk-free interest rate, r, we propose to
test whether the volatility, σ, may have changed or not. Frequently, σ is estimated
from historical data, for instance, via the stock prices of the last n days.
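A standard historical estimate of σ is the annualized sample standard deviation of the daily log-returns; a minimal sketch (the function name historical_volatility is our own, and we annualize with 365 periods as in this chapter):

```python
import random
from math import exp, log, sqrt

def historical_volatility(prices, periods_per_year=365):
    """Annualized sample std. dev. of the log-returns log(S(t_i)/S(t_{i-1}))."""
    returns = [log(b / a) for a, b in zip(prices, prices[1:])]
    n = len(returns)
    mean = sum(returns) / n
    var = sum((x - mean) ** 2 for x in returns) / (n - 1)
    return sqrt(var * periods_per_year)

# Synthetic check: exact geometric Brownian motion increments with sigma = 0.2.
random.seed(7)
dt = 1.0 / 365
prices = [20.0]
for _ in range(5000):
    z = (0.10 - 0.5 * 0.2 ** 2) * dt + 0.2 * sqrt(dt) * random.gauss(0.0, 1.0)
    prices.append(prices[-1] * exp(z))
sigma_hat = historical_volatility(prices)
```

On simulated geometric Brownian motion data the estimator recovers the true σ up to sampling error, but, as discussed below, a change in σ during the estimation window distorts it.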
We will see that the problem of detecting changes in the volatility of stock prices
is a very difficult task due to its complicated behavior. Although under the assumptions
of the Black-Scholes model we have a consistent estimator to test for changes in
the volatility, this will no longer be the case if we change the model a bit. This will
be clear when we exchange the assumption of geometric Brownian motion (cf. (6.4.5))
for that of geometric fractional Brownian motion (cf. (6.5.2)).
In Figure 6.5.2 we see the dependency of the price of a European call option on
the volatility σ. As σ becomes large, the price of the option increases. Hence, when
estimating the volatility using historical data, a change in the volatility will affect its
estimator. We fixed the spot price S(0) = $20, the risk-free interest rate r = 0.04, and
T = 1/2, which is equivalent to a period of half a year, and then we can see graphs
for the strike prices K = $15 and K = $25, respectively. Of course, when the strike
price is below the spot price, the option will be more expensive, since we buy
at time t = 0 the right to buy in half a year a share for less money than the spot
price S(0). The price of the option goes down if the spot price is below the strike
price.
Let S(t₁), …, S(t_{n+1}) be the stock prices of the last n + 1 days, following a
geometric Brownian motion under the Black-Scholes model (cf. (6.4.5), where μ is
Figure 6.5.2: Price of a European call option depending on σ for K = $15 (above) and K = $25 (below).
replaced by r), and let Δt = t_i − t_{i−1}, 1 < i ≤ n + 1. Typically t_i = i/365. Thus,

Z̃_i := log (S(t_i)/S(t_{i−1})) = (r − σ²/2) Δt + σ (W(t_i) − W(t_{i−1})) ~ N((r − σ²/2) Δt, σ² Δt),

which is the continuously compounded return in the i-th interval, and S(t_i) =
S(t_{i−1}) e^{Z̃_i}. We note that the Z̃_i's are i.i.d. r.v.'s, normally distributed as in-
dicated. Since we are interested in testing for changes in the volatility σ, we define

Z_i := Z̃_{i+1}/√Δt ~ N((r − σ²/2) √Δt, σ²),  1 ≤ i ≤ n.
We note in passing that in this specific set-up the volatility σ in the Black-Scholes
formula (cf. (6.4.10)) can be estimated via the maximum likelihood estimator (MLE).
Using the fact that, in this case, T and Δt are both known, we get the unique positive
root of the likelihood equation,

σ̂² = (2/Δt) (√(1 + (Δt/n) Σ_{i=1}^{n} (Z_i − r√Δt)²) − 1),    (6.5.1)

which may replace the usually suggested sample variance for estimating σ in (6.4.10).
Suppose we are testing for two changes in the volatility (variance). From Sec-
tion 4.5 we know that we may use a sup-functional of a properly normalized U-
statistics based process with kernel function h(x, y) = ½(x − y)² to test for changes
in the variance.
Recall that for this test statistic we had to assume that under H₀ we have
i.i.d. r.v.'s with finite non-zero variance σ² and under H_A^{(2)} we have independent
random variables with at most two changes in the variance, but no changes in the
mean. Since r and Δt are constant, we observe that our random variables Z_i,
1 ≤ i ≤ n, satisfy the assumptions made under H₀, but violate the assumptions
under H_A^{(2)}, since EZ_i depends on Var Z_i = σ² if r ≠ σ²/2.

On the other hand, we saw in Section 5.6 that by using the kernel h(x, y) = (x² + y²)/2,
we can test for changes in the mean and/or variance. In general, the test does not
distinguish between changes in the mean and changes in the variance, but it will
reject when at least one of them changes. Fortunately, in our present model, T and
Δt are constant, which means that EZ_i will change if and only if σ² changes. Thus,
in this specific set-up, the statistics studied in Section 5.6 will reject H₀ when the
mean and the variance change simultaneously. Namely, on estimating σ² via (6.5.1)
and the mean via a similar procedure, an appropriate test statistic can now be based
on the normalized differences R(k_j) − ((k_j − k_{j−1})/n) R(n), with k₀ := 0, k₃ := n
and R(k) := Σ_{i=1}^{k} Z_i², to test for at most two change-points in the volatility σ.
Under H₀ such a test statistic converges in probability to the sup-functional of the
Gaussian process in (4.3.10), and under H_A^{(2)} it converges in probability to ∞.
Assuming that we detected at most two changes, i.e., H_A^{(2)} holds, it is not
clear how to estimate σ in the Black-Scholes formula (cf. (6.4.10)), since we don't
know in general where the change-points are. Unfortunately, the maximum of the
test statistic is not necessarily taken at the times of change.
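To illustrate the idea on data (a deliberately simplified sketch, not the exact statistic of this section), a CUSUM-type scan over the squared observations locates a single variance change; the function name variance_change_point and its crude normalization are our own:

```python
import random

def variance_change_point(xs):
    """Return (k_hat, stat): the index maximizing the CUSUM of squares,
    a crude estimator of a single change-point in the variance."""
    n = len(xs)
    sq = [x * x for x in xs]
    total = sum(sq)
    best_k, best_val = 1, 0.0
    partial = 0.0
    for k in range(1, n):
        partial += sq[k - 1]
        val = abs(partial - k * total / n)  # deviation from the no-change line
        if val > best_val:
            best_k, best_val = k, val
    return best_k, best_val / n ** 0.5

# A sequence whose standard deviation jumps from 1 to 3 at observation 150.
random.seed(42)
xs = [random.gauss(0.0, 1.0) for _ in range(150)] + \
     [random.gauss(0.0, 3.0) for _ in range(150)]
k_hat, stat = variance_change_point(xs)
```

As the text cautions, the argmax need not coincide exactly with the true change time, although with a jump this pronounced it should land near observation 150.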
In the previous setup, which was essential to test for changes in the variance, we
assumed that the stock price S(t) follows a geometric Brownian motion. That the
test statistic is not applicable in every model will be clear if we assume that the
stock price, say S_H(t), follows a geometric fractional Brownian motion instead of a
geometric Brownian motion. Hence, instead of (6.4.5), we start with a geometric
fractional Brownian motion of order 0 < H < 1 of the form (cf. Csörgő (1999))

S_H(t) = S_H(0) exp(μt − (σ²/2) t^{2H} + σ W_H(t)),  t ≥ 0,    (6.5.2)

which is driven by a centered Gaussian process {W_H(t); 0 ≤ t < ∞} with stationary incre-
ments and W_H(0) = 0, i.e.,

E W_H(t) = 0,  E(W_H(t) W_H(s)) = ½ (t^{2H} + s^{2H} − |t − s|^{2H}),  s, t ≥ 0.
We note that H is the so-called Hurst constant. For further references to the frac-
tional Brownian motion in the context of financial modeling we refer to Salopek
(1997, Chapter 1) as well as to her references. Obviously, for H = ½ we have a geo-
metric Brownian motion. Using now this geometric fractional Brownian motion, we
may modify the Black-Scholes formula and calculate the price C of a European
call option at t₀ = 0 as follows:

C = S_H(0) Φ(d_H) − K e^{−rT} Φ(d_H − σT^H),  d_H = (log(S_H(0)/K) + rT + (σ²/2) T^{2H}) / (σ T^H),
where

Z_i^H := (Δt)^{−H} log (S_H(t_i)/S_H(t_{i−1}))

has variance σ², but the Z_i^H's fail to be independent if H ≠ ½. Note that under the
null hypothesis of no change, EZ_i^H depends on t_i^{2H} − t_{i−1}^{2H}, and

EZ_i^H = (Δt)^{−H} (rΔt − (σ²/2)(t_i^{2H} − t_{i−1}^{2H})).

Therefore the joint distribution of Z_1^H, …, Z_n^H will be different from the product of
the corresponding marginal distributions, and our proposed test statistic fails, due
to the dependence of the observations.
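The failure of independence for H ≠ ½ can be seen directly from the covariance of W_H above: for unit-spaced increments of W_H (fractional Gaussian noise), the lag-k autocovariance is γ_H(k) = ½((k+1)^{2H} − 2k^{2H} + (k−1)^{2H}), which vanishes for all k ≥ 1 only when H = ½. A quick numeric check (the function name fgn_autocovariance is ours):

```python
def fgn_autocovariance(h, k):
    """Autocovariance at lag k >= 1 of unit-spaced increments of W_H,
    derived from Cov(W_H(t), W_H(s)) = (t^{2H} + s^{2H} - |t - s|^{2H}) / 2."""
    return 0.5 * ((k + 1) ** (2 * h) - 2 * k ** (2 * h) + (k - 1) ** (2 * h))

gamma_half = fgn_autocovariance(0.5, 1)  # Brownian case: independent increments
gamma_high = fgn_autocovariance(0.8, 1)  # H > 1/2: positively correlated increments
gamma_low = fgn_autocovariance(0.3, 1)   # H < 1/2: negatively correlated increments
```

Only in the Brownian case H = ½ does the lag-1 autocovariance vanish, which is exactly why the log-differences stop being i.i.d. under geometric fractional Brownian motion.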
Summarizing, we can test for changes in the volatility when the stock prices
are assumed to follow a geometric Brownian motion. This assumption allows us
to produce i.i.d. r.v.'s by looking at the log-differences of the stock prices S(t). We
suggested a test statistic that will test for at most two changes in the volatility.
Using the results from the previous sections, we may also test for more, or also fewer,
change-points.
Bibliography
[1] Billingsley, P. (1986). Probability and Measure. 2nd ed. Wiley, New York.
[2] Birnbaum, Z.W. and Pyke, R. (1958). On Some Distributions Related to the Statistic D_n^+. Ann. Math. Statist., 29, 179-187.
[3] Black, F. and Scholes, M. (1973). The Pricing of Options and Corporate Liabilities. Journal of Political Economy, 81, 637-654.
[4] Brodsky, B.E. and Darkhovsky, B.S. (1993). Nonparametric Methods in Change-Point Problems. Kluwer, Dordrecht.
[5] Broemeling, L.D. and Tsurumi, H. (1987). Econometrics and Structural Change. Marcel Dekker, New York.
[6] Casella, G. and Berger, R. (1990). Statistical Inference. Duxbury Press, Belmont.
[7] Cramér, H. and Leadbetter, M.R. (1967). Stationary and Related Stochastic Processes; Sample Function Properties and their Applications. Wiley, New York.
[8] Csörgő, M. (1979). Brownian Motion - Wiener Process. Canad. Math. Bull., 22 (3), 257-279.
[9] Csörgő, M. (1983). Quantile Processes with Statistical Applications. Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania.
[10] Csörgő, M. (1999). Random Walking Around Financial Mathematics. Preprint.
[11] Csörgő, M., Csörgő, S. and Horváth, L. (1986). An Asymptotic Theory for Empirical Reliability and Concentration Processes. Springer-Verlag, Berlin.
[12] Csörgő, M. and Horváth, L. (1986). Invariance Principles for Changepoint Problems. Techn. Rep. Series of the Laboratory for Research in Statistics and Probability, Carleton U. - U. of Ottawa, No. 80.
[13] Csörgő, M. and Horváth, L. (1988a). Nonparametric Methods for Changepoint Problems. Handbook of Statistics, Vol. 7, 403-425. Elsevier Science Publishers B.V. (North-Holland).
[14] Csörgő, M. and Horváth, L. (1988b). Invariance Principles for Changepoint Problems. Journal of Multivariate Analysis, 27, 151-168.
[15] Csörgő, M. and Horváth, L. (1993). Weighted Approximations in Probability and Statistics. John Wiley, Chichester.
[16] Csörgő, M. and Horváth, L. (1997). Limit Theorems in Change-point Analysis. John Wiley, Chichester.
[17] Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.
[18] Donsker, M. (1951). An Invariance Principle for Certain Probability Limit Theorems. Mem. Amer. Math. Soc., No. 6.
[19] Doob, J.L. (1953). Stochastic Processes. John Wiley, New York.
[20] Duffie, D. (1996). Dynamic Asset Pricing Theory (2nd edn). Princeton University Press, Princeton, New Jersey.
[21] Eastwood, B.J. and Eastwood, V.R. (1998). Tabulating Weighted sup-Norm Functionals of Brownian Bridges via Monte Carlo Simulation. Asymptotic Methods in Probability and Statistics - a Volume in Honour of Miklós Csörgő (ed. B. Szyszkowicz), 707-719, Elsevier Science B.V., Amsterdam.
[22] Ferger, D. and Stute, W. (1992). Convergence of Changepoint Estimators. Stoch. Proc. Appl., 42, 345-351.
[23] Gombay, E. (1998). U-Statistics for Change under Alternatives. Preprint.
[24] Gombay, E. and Horváth, L. (1997). An Application of the Likelihood Method to Change-Point Detection. Environmetrics, 8, 459-467.
[25] Gombay, E., Horváth, L. and Hušková, M. (1996). Estimators and Tests for Change in the Variance. Statistics and Decisions, 14, 145-159.
[26] Hall, P. (1979). On the Invariance Principle for U-Statistics. Stoch. Proc. Appl., 9, 163-174.
[27] Hoeffding, W. (1948). A Class of Statistics with Asymptotically Normal Distribution. Ann. Math. Statist., 19, 293-325.
[28] Hoeffding, W. (1961). The Strong Law of Large Numbers for U-Statistics. Univ. of North Carolina Institute of Statistics Mimeo Series, No. 302.
[29] Hull, J. (1993). Options, Futures, and Other Derivative Securities (2nd edn). Prentice Hall International Editions, Englewood Cliffs, NJ.
[30] Huse, V. (1988). On Some Nonparametric Methods for Changepoint Problems.
PhD Thesis, Carleton University, Ottawa, Canada.
[31] Janson, S. and Wichura, M.J. (1983). Invariance Principles for Stochastic Area and Related Stochastic Integrals. Stoch. Proc. and their Appl., 16, 71-84.
[32] Karatzas, I. and Shreve, S. (1988). Brownian Motion and Stochastic Calculus. Springer-Verlag, New York, NY.
[33] Kendall, M.G. and Stuart, A. (1963). The Advanced Theory of Statistics. Volume 1, Distribution Theory. Charles Griffin & Company Limited, London.
[34] Kolmogorov, A.N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giorn. Inst. Ital. Attuari, 4, 83-91.
[35] Lee, C.-B. (1996). Nonparametric Multiple Change-point Estimators. Stat. and Prob. Letters, 27, 295-304.
[36] Levin, B. and Kline, J. (1985). The CUSUM Test of Homogeneity with an Application in Spontaneous Abortion Epidemiology. Statist. Med., 4, 469-488.
[37] Lombard, F. (1987). Rank Tests for Changepoint Problems. Biometrika, 74, 615-624.
[38] Lombard, F. and Carroll, R.J. (1992). Change Point Estimation via Running Cusums. Technical Report No. 155, 1992, Statistics Department, Texas A&M University.
[39] Loynes, R.M. (1970). An Invariance Principle for Reversed Martingales. Proc. Amer. Math. Soc., 25, 56-64.
[40] Major, P. (1979). An Improvement of Strassen's Invariance Principle. Ann. Probability, 7, 55-61.
[41] Miller, R.G., Jr. and Sen, P.K. (1972). Weak Convergence of U-Statistics and Von Mises' Differentiable Statistical Functions. Ann. Math. Statist., 43, 31-41.
[42] Resnick, S. (1992). Adventures in Stochastic Processes. Birkhäuser, Boston.
[43] Salopek, D.M. (1997). Tolerance to Arbitrage: Inclusion of Fractional Brownian Motion to Model Stock Price Fluctuations. PhD Thesis, Carleton University, Ottawa, Canada.
[44] Sen, P.K. (1977). Almost Sure Convergence of Generalized U-Statistics. Ann. Prob., 5, 287-290.
[45] Serbinowska, M. (1996). Consistency of the Estimator of the Number of Change-points in Binomial Observations. Statist. Probab. Letters, 29, 337-344.
[46] Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. John Wiley, Chichester.
[47] Shorack, G.R. and Wellner, J.A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.
[48] Slutsky, E. (1925). Über stochastische Asymptoten und Grenzwerte. Math. Annalen, 5, 93.
[49] Spanos, A. (1986). Statistical Foundations of Econometric Modelling. Cambridge University Press, Cambridge.
[50] Szyszkowicz, B. (1991). Weighted Stochastic Processes under Contiguous Alternatives. C.R. Math. Rep. Acad. Sci. Canada, 13, 211-216.
[51] Szyszkowicz, B. (1992). Weak Convergence of Stochastic Processes in Weighted Metrics and their Applications to Contiguous Changepoint Analysis. PhD Thesis, Carleton University, Ottawa, Canada.
[52] Szyszkowicz, B. (1995). Weighted Sequential Empirical Type Processes with Applications to Change-point Problems. Techn. Rep. Series of the Laboratory for Research in Statistics and Probability, Carleton U. - U. of Ottawa, No. 276.
[53] Szyszkowicz, B. (1998). Weighted Sequential Empirical Type Processes with Applications to Change-point Problems. Handbook of Statistics, Vol. 16 (N. Balakrishnan and C.R. Rao, eds.), 573-630.
[54] Vostrikova, L.Ju. (1981). Detecting "Disorder" in Multidimensional Random Processes. Soviet Mathematics Doklady, 24, 55-59.
[55] Yao, Q. (1993). Tests for Change-Points with Epidemic Alternatives. Biometrika, 80, 179-191.
[56] Yao, Y.-C. (1988). Estimating the Number of Change-points via Schwarz's Criterion. Statist. and Probab. Letters, 6, 181-189.