
Structural Change Tests under Heteroskedasticity:

Joint Estimation versus Two-Steps Methods

Pierre Perron∗

Boston University

Yohei Yamamoto†

Hitotsubashi University

August 22, 2020

Abstract

There has been a recent upsurge of interest in testing for structural changes in

heteroskedastic time series, as changes in the variance invalidate the asymptotic dis-

tribution of conventional structural change tests. Several tests have been proposed

that are robust to general forms of heteroskedastic errors. The most popular use a

two-steps approach: first estimate the residuals assuming no changes in the regression

coefficients; second, use the residuals to approximate the heteroskedastic asymptotic

distribution or take an entire sample average to construct a test for which the variance

process is averaged out. An alternative approach was proposed by Perron, Yamamoto

and Zhou (2020) who provided a test for changes in the coefficients allowing for changes

in the variance of the error term. We show that it transforms the variance profile into

one that effectively has very little impact on the size of the test. With respect to the

power properties, the two-steps procedures can suffer from non-monotonic power prob-

lems in dynamic models and in static models with a correction for serial correlation in

the error. Most have power equal to size with zero-mean regressors. Even when the

two-steps tests have power, it is generally lower than that of the latter test.

JEL Classification Number: C14, C22

Keywords: heteroskedasticity, structural change test, non-monotonic power, vari-

ance profile, likelihood ratio test, CUSUM test, U-statistic.

∗Department of Economics, Boston University, 270 Bay State Rd., Boston, MA, 02215 ([email protected]).

†Department of Economics, Hitotsubashi University, 2-1 Naka, Kunitachi, Tokyo, Japan, 186-8601 ([email protected]).

1 Introduction

There has been a recent upsurge of interest in testing for structural changes in heteroskedastic

time series, as changes in the variance invalidate the asymptotic distribution of the conven-

tional structural change tests in a conditional mean model, such as the cumulative sum

(CUSUM) and supremum of likelihood ratio ($\sup LR_T$) tests. Several methods have been

proposed in both the statistics and econometrics literature to have tests that are robust, at

least asymptotically, to general forms of heteroskedastic errors or variance profiles. The most

popular method can be labelled as a two-steps approach: first estimate the residuals assum-

ing the null hypothesis of no change in the regression coefficients; second, use the residuals to

approximate the heteroskedastic asymptotic distribution or take an entire sample average to

construct a test for which the variance process is averaged out. Examples of the first group

include Zhou (2013) who proposed a wild bootstrap procedure to mimic the heteroskedastic

Wiener process, Xu (2015) who used the null residuals to construct the time transformed

Wiener process, and Górecki, Horváth and Kokoszka (2018) who applied a Karhunen-Loeve

expansion of the Brownian bridge with the kernels constructed using the null residuals. Ex-

amples of the second group include Dalla, Giraitis and Phillips (2017) who provided a simple

U-statistic for which the heteroskedastic null residuals are averaged out in the entire sample

long-run variance estimate and Zhang and Wu (2019) who proposed a U-statistic avoiding

estimating the long-run variance using a feasible generalized least squares procedure. These

tests control size asymptotically under general variance profiles and are successful in pro-

viding an exact size close to the nominal level in finite samples. However, when a change

in the conditional mean caused by some parameter change occurs, the residuals assuming

the null hypothesis may be a poor approximation to the true errors. Hence, power issues

become a concern; see Pitarakis (2004) and Perron and Yamamoto (2019) who investigated

the power problem when structural changes are present both in the coefficients and in the

error variance but either one is neglected; see also Hansen (2000).1

Perron, Yamamoto and Zhou (2020) provided a comprehensive treatment to test for

structural changes in both the coefficients and the variance of the errors in a linear regression

model. An important ingredient of their framework is that it jointly estimates structural

1Cavaliere and Taylor (2006, 2008) considered a wild bootstrap method for testing changes in the per-

sistence coefficient. Pein, Sieling, and Munk (2017) also considered structural change tests for both mean

and variance but restricted to occur at the same dates. Bock, Collilieux, Gullamon, Lebarbier and Pascal

(2020) used a model selection approach. Dalla, Giraitis and Robinson (2020) developed asymptotic theory

for a semi-parametric model with changing mean and variance, but did not propose testing methods.

changes in the regression coefficients and error variance, allowing the break dates to be

different or overlap. Related to the topic of this paper, they proposed the $\sup LR_{3,T}$ test for

changes in the coefficients allowing for changes in the variance of the error term. Because

the changes in the coefficients are jointly accounted for in constructing the test, the variance

process can be consistently estimated even under the alternative hypothesis. This suggests

that substantial power gains can be expected. Even though, under the null hypothesis of

no change in coefficients, it does allow for a one-time change, it still has the correct size

asymptotically (and in finite samples). However, a potential drawback, compared to the

tests mentioned above, is that it accounts for heteroskedasticity only partially via a few

major abrupt changes. Hence, some size-distortions may be expected in the context of a

general class of variance profiles. This is something we shall address extensively.

In this paper, we carefully assess the size and power properties of the tests discussed

above. With respect to size, our findings are summarized as follows. First, the exact size of

the two-steps tests is close to nominal size. Second, although the size of the $\sup LR_{3,T}$ test
is not completely immune to general forms of variance profiles other than abrupt structural
changes, accounting for a few structural changes in the variance can flatten or detrend the
original variance process, thus considerably reducing the size distortions. Another way to state
this important fact is as follows. In many cases, the original $\sup LR_T$ test that ignores potential
changes in the variance of the errors is very little affected by various types of variance profiles
(e.g., increases early in the sample, periodic variations, etc.). By allowing just a few breaks
in variance (one or two in most cases), the $\sup LR_{3,T}$ test in a sense transforms the variance

profile into one that effectively has very little impact on the size of the test. It can either

completely flatten, partially flatten or detrend the variance process by rescaling the pre- and

post- break levels of variances. We provide some analytic explanations about this feature.

With respect to the power properties, we find that the two-steps methods can, in impor-

tant practical cases, partially or completely lose power. First, the two-steps methods often

suffer from a non-monotonic power problem, i.e., power going to zero as the magnitude of

the change under the alternative increases. This occurs most notably in dynamic models

with lagged dependent variables and in static models when a correction for serial correlation

in the errors is applied.2 Second, many of the two-steps methods have trivial power, i.e.,

2This issue can be traced back to Perron (1991) and further documented by Vogelsang (1999). Crainiceanu

and Vogelsang (2007) investigated the problem when a heteroskedasticity and autocorrelation consistent

variance is used. See Deng and Perron (2008) for some asymptotic analyses in the case of CUSUM tests,

Kim and Perron (2009) for the tests proposed by Andrews (1993) and Andrews and Ploberger (1994), and

Perron and Yamamoto (2016) for Elliott and Müller’s (2006) test.


power equal to or below size, when the regressors associated with changing coefficients have

zero-mean. This is because these tests simply look at the unconditional mean of the depen-

dent variable when estimating the variance profile since they do not account for parameter

variations in the coefficients. Hence, they are fooled and assign all the variance change in the

dependent variable to a change in the variance of the errors. Third, even when the two-steps

tests have power, they are generally less powerful than the $\sup LR_{3,T}$ test.

The rest of the paper is structured as follows. Section 2 presents the model, the hy-

potheses and conventional tests valid with constant variance as they are key ingredients of

the procedures to be discussed later: the CUSUM and $\sup LR_T$ tests. We also discuss the

test proposed by Perron, Yamamoto and Zhou (2020). Section 3 provides a review of five

two-steps methods designed to be robust to heteroskedasticity. These are picked as repre-

sentative of this approach. In Section 4, we investigate the size of the tests via simulations,

followed by an analytic explanation about why the $\sup LR_{3,T}$ test can have good size even
when the variance process is not an abrupt structural change. In (almost) all cases covered,
the $\sup LR_{3,T}$ test allowing for one (or sometimes two) changes in variance is remarkably robust
(exact size close to nominal size) to various forms of changes even though it is designed to
model abrupt changes. Section 5 investigates the power properties of the $\sup LR_{3,T}$ test
compared with those of the two-steps methods. Both abrupt changes and random walk
parameter variations are considered. Overall, the results show a clear power advantage for the
$\sup LR_{3,T}$ test. Section 6 provides brief concluding remarks with a discussion of the main

remaining contentious issues. An appendix contains some technical derivations.

2 Model, hypotheses, and conventional tests

We consider a linear regression model

$$y_t = x_t'\beta_t + u_t \quad \text{for } t = 1, \ldots, T, \tag{1}$$

where $y_t$ is a scalar dependent variable, $x_t$ is a $p$-dimensional vector of regressors, $\beta_t$ is
a $p$-dimensional vector of coefficients which are possibly time varying and $u_t$ is a scalar
error term that satisfies $E(u_t) = E(u_t \mid x_t) = 0$ for all $t$. The goal is to test if structural
changes are present in the coefficients, so that the null hypothesis is $H_0: \beta_t = \beta$ for all $t$
versus the alternative hypothesis $H_1: \beta_t \neq \beta$ for some $t$. In particular, we allow for
the unconditional variance of the error term to be time varying such that $E(u_t^2) = \sigma_t^2$. For
simplicity, we assume $\sigma_t^2$ to be a deterministic function. Note that the model above could
be generalized to allow partial structural changes, i.e., some of the elements of $\beta_t$ are not

subject to change. Since this has no impact at all on the message we wish to convey, we

simply keep the pure structural change model (1) for simplicity of exposition.

It is useful to first discuss two leading tests for structural changes, valid with constant

variance ($\sigma_t^2 = \sigma^2$ for all $t$) as they are key ingredients of the procedures to be discussed
later. The first is the CUSUM test proposed by Brown, Durbin, and Evans (1975):

$$CUSUM = \max_{1 \le k \le T} \left[ \frac{S_k - (k/T)\, S_T}{\sqrt{T}} \right]^2 \Big/ \hat{\sigma}^2,$$

where $S_k = \sum_{t=1}^{k} \tilde{u}_t$ and $\tilde{u}_t = y_t - x_t'(\sum_{s=1}^{T} x_s x_s')^{-1} \sum_{s=1}^{T} x_s y_s$ are the OLS regression residuals
from estimating (1) under $H_0$ and $\hat{\sigma}^2$ is a consistent estimate of the long run variance of
$u_t$ (or $2\pi$ times the spectral density function at frequency zero of $u_t$ when it is stationary)
using the entire sample given by

$$\hat{\sigma}^2 = \hat{\gamma}_0 + 2 \sum_{j=1}^{T-1} w(j/b)\, \hat{\gamma}_j, \tag{2}$$

where $\hat{\gamma}_j = T^{-1} \sum_{t=j+1}^{T} \tilde{u}_t \tilde{u}_{t-j}$ for $j = 0, \ldots, T-1$, with $w(\cdot)$ some weight function and $b$ a
bandwidth parameter whose exact choice varies across different papers; we shall specify the
exact form used when discussing the various tests considered. The asymptotic distribution
is given by the supremum of the squared Brownian bridge:

$$CUSUM \Rightarrow \sup_{0 \le r \le 1} [W(r) - r W(1)]^2, \tag{3}$$

where “⇒” denotes weak convergence under the Skorohod topology and $W(r)$ for $r \in [0, 1]$ is
the standard Wiener process. The CUSUM test as proposed by Brown, Durbin, and Evans

(1975) uses recursive residuals. However, Ploberger and Kramer (1992) showed that using

OLS residuals instead of recursive residuals yields a valid test, though the limit distribution

under the null hypothesis is different, namely as stated in (3) with the Brownian Bridge

$W(r) - rW(1)$ instead of a scaled Wiener process. Since the papers we shall review use
the OLS based CUSUM, we only consider this version. The second test is the $\sup LR_T$ test
proposed by Quandt (1958, 1960) and further analyzed by Andrews (1993):

$$\sup LR_T = 2\left[\sup_{\lambda \in \Lambda_\epsilon} \log \hat{L}_T(T_1) - \log \tilde{L}_T\right],$$

where $T_1 = [T\lambda]$ with $\lambda \in (0, 1)$ a constant called the break fraction. As required, $\lambda$ is
restricted to be in a subset of the $[0, 1]$ interval and we assume that $\lambda \in \Lambda_\epsilon = [\epsilon, 1 - \epsilon]$,
called a set of permissible break fractions with $\epsilon$ being a small positive constant, called a
trimming parameter. Throughout, we set $\epsilon = 0.15$ although this does not affect any of the
qualitative results. The log-likelihood function under the alternative hypothesis is

$$\log \hat{L}_T(T_1) = -(T/2)(\log 2\pi + 1) - (T/2) \log \hat{\sigma}^2$$

with $\hat{\sigma}^2 = T^{-1} \sum_{t=1}^{T} \hat{u}_t^2$ where $\hat{u}_t$ are the OLS residuals from estimating (1) assuming a
structural change in the coefficients at date $T_1$. Under $H_0$, the log-likelihood function is

$$\log \tilde{L}_T = -(T/2)(\log 2\pi + 1) - (T/2) \log \tilde{\sigma}^2$$

with $\tilde{\sigma}^2 = T^{-1} \sum_{t=1}^{T} \tilde{u}_t^2$ where $\tilde{u}_t$ are the OLS regression residuals assuming no change in the
coefficients (as previously defined). The asymptotic distribution of the test is

$$\sup LR_T \Rightarrow \sup_{\lambda \in \Lambda_\epsilon} \frac{[\lambda W_p(1) - W_p(\lambda)]'[\lambda W_p(1) - W_p(\lambda)]}{\lambda(1 - \lambda)}, \tag{4}$$

where $W_p(\cdot)$ is a $p$-dimensional standard Wiener process. The asymptotic null distributions
of the CUSUM and the $\sup LR_T$ tests are not valid when heteroskedasticity is present

in the errors. We first review in Section 3, the test proposed by Perron, Yamamoto, and

Zhou (2020) based on the likelihood approach and in Section 4 five different proposals using

variants of the CUSUM methodology.
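The two conventional statistics can be sketched in a few lines of Python. This is only an illustrative sketch for the constant-variance case without any correction for serial correlation (so the long-run variance is simply the residual variance); the function names `ols_residuals`, `cusum_stat` and `sup_lr_stat` are hypothetical and not taken from any of the cited papers.

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def cusum_stat(y, X):
    """OLS-based CUSUM: max_k [(S_k - (k/T) S_T) / sqrt(T)]^2 / sigma2.

    Assumption: sigma2 is the residual variance (gamma_0 only, no HAC terms).
    """
    u = ols_residuals(y, X)
    T = len(u)
    S = np.cumsum(u)                      # partial sums S_1, ..., S_T
    k = np.arange(1, T + 1)
    sigma2 = np.mean(u ** 2)
    return np.max(((S - (k / T) * S[-1]) / np.sqrt(T)) ** 2) / sigma2

def sup_lr_stat(y, X, eps=0.15):
    """sup LR for one break in all coefficients, searched over trimmed dates."""
    T, p = X.shape
    ssr0 = np.sum(ols_residuals(y, X) ** 2)           # no-break fit
    lo = max(int(np.floor(eps * T)), p)
    hi = min(int(np.floor((1 - eps) * T)), T - p)
    best = -np.inf
    for T1 in range(lo, hi + 1):
        ssr1 = (np.sum(ols_residuals(y[:T1], X[:T1]) ** 2)
                + np.sum(ols_residuals(y[T1:], X[T1:]) ** 2))
        # 2[log L_hat - log L_tilde] = T [log(sigma_tilde^2) - log(sigma_hat^2)]
        best = max(best, T * (np.log(ssr0 / T) - np.log(ssr1 / T)))
    return best
```

Both functions return the raw statistic only; critical values would still have to be taken from the limit distributions in (3) and (4).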

3 Test jointly estimating structural changes in coefficients and in variance

Perron, Yamamoto, and Zhou (2020) (henceforth PYZ) recently provided a comprehensive

framework allowing various tests for multiple structural changes in the coefficients and vari-

ance of the errors in a linear regression model. An important ingredient of their approach

is that it jointly estimates structural changes in the regression coefficients and those in the

error variance, allowing for the break dates in the two components to be different or over-

lap. Hence, joint tests can be constructed. Here, the interest is in testing for changes in

the coefficients allowing for changes in the variance. Hence, the relevant test from PYZ is

their $\sup LR_{3,T}$ test. Let the coefficients be subject to $m$ structural changes at dates
$T^c_1, \ldots, T^c_m$ with $T^c_i = [T\lambda^c_i]$ for $i = 1, \ldots, m$ (where $[\cdot]$ denotes the greatest lower integer)
and let the error variance $\sigma_t^2$ be subject to $n$ structural changes at dates $T^v_1, \ldots, T^v_n$ with
$T^v_i = [T\lambda^v_i]$; the break dates in coefficients and variance can be different, the same, or
overlap; the only restriction is some separation between the break dates; see below. To estimate

the break dates, the algorithm of Qu and Perron (2007), building on Bai and Perron (2003),

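is used. The test is

$$\sup LR_{3,T} = 2\Big[\sup_{(\lambda^c_1, \ldots, \lambda^c_m;\, \lambda^v_1, \ldots, \lambda^v_n) \in \Lambda_\epsilon} \log \hat{L}_T(T^c_1, \ldots, T^c_m;\, T^v_1, \ldots, T^v_n) - \sup_{(\lambda^v_1, \ldots, \lambda^v_n) \in \Lambda_{v,\epsilon}} \log \tilde{L}_T(T^v_1, \ldots, T^v_n)\Big],$$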

where the log-likelihood function under the alternative hypothesis is

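$$\log \hat{L}_T(T^c_1, \ldots, T^c_m;\, T^v_1, \ldots, T^v_n) = -(T/2)(\log 2\pi + 1) - \sum_{i=1}^{n+1} [(T^v_i - T^v_{i-1})/2] \log \hat{\sigma}_i^2, \tag{5}$$

where $\hat{\sigma}_i^2 = (T^v_i - T^v_{i-1})^{-1} \sum_{t=T^v_{i-1}+1}^{T^v_i} (y_t - x_t'\hat{\beta}_j)^2$ for $i = 1, \ldots, n+1$ with, for $j = 1, \ldots, m+1$,

$$\hat{\beta}_j = \Big(\sum_{t=T^c_{j-1}+1}^{T^c_j} (x_t x_t'/\hat{\sigma}_t^2)\Big)^{-1} \sum_{t=T^c_{j-1}+1}^{T^c_j} (x_t y_t/\hat{\sigma}_t^2).$$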

The log-likelihood under the null hypothesis is

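$$\log \tilde{L}_T(T^v_1, \ldots, T^v_n) = -(T/2)(\log 2\pi + 1) - \sum_{i=1}^{n+1} [(T^v_i - T^v_{i-1})/2] \log \tilde{\sigma}_i^2,$$

where $\tilde{\sigma}_i^2 = (T^v_i - T^v_{i-1})^{-1} \sum_{t=T^v_{i-1}+1}^{T^v_i} (y_t - x_t'\tilde{\beta})^2$ for $i = 1, \ldots, n+1$ with

$$\tilde{\beta} = \Big(\sum_{t=1}^{T} (x_t x_t'/\tilde{\sigma}_t^2)\Big)^{-1} \sum_{t=1}^{T} (x_t y_t/\tilde{\sigma}_t^2)$$

the coefficient estimate assuming no structural change. We let $\hat{\sigma}_t^2$ denote $\hat{\sigma}_i^2$ if
$T^v_{i-1} + 1 \le t \le T^v_i$, likewise for $\tilde{\sigma}_t^2$. The set $\Lambda_\epsilon$ is the union of the permissible break fractions for the
coefficients and variance breaks and $\Lambda_{v,\epsilon}$ that for the variance break fractions. Hence,

$$\Lambda_\epsilon = \{(\lambda^c_1, \ldots, \lambda^c_m, \lambda^v_1, \ldots, \lambda^v_n);\ \text{for } (\lambda_1, \ldots, \lambda_K) = (\lambda^c_1, \ldots, \lambda^c_m) \cup (\lambda^v_1, \ldots, \lambda^v_n),\ |\lambda_{j+1} - \lambda_j| \ge \epsilon\ (j = 1, \ldots, K-1),\ \lambda_1 \ge \epsilon,\ \lambda_K \le 1 - \epsilon\}$$

and

$$\Lambda_{v,\epsilon} = \{(\lambda^v_1, \ldots, \lambda^v_n);\ |\lambda^v_{j+1} - \lambda^v_j| \ge \epsilon\ (j = 1, \ldots, n-1),\ \lambda^v_1 \ge \epsilon,\ \lambda^v_n \le 1 - \epsilon\}$$

with $K$ the total number of break dates in coefficients and variance. Note that this test encompasses
the $\sup LR_T$ setting $m = 1$ and $n = 0$. PYZ showed that a bound on the limit distribution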

is the asymptotic distribution in Bai and Perron (1998). Namely,

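$$\sup LR_{3,T} \Rightarrow \sup_{(\lambda^c_1, \ldots, \lambda^c_m) \in \Lambda^c_\epsilon} \sum_{j=1}^{m} \frac{\|\lambda^c_j W_p(\lambda^c_{j+1}) - \lambda^c_{j+1} W_p(\lambda^c_j)\|^2}{\lambda^c_{j+1} \lambda^c_j (\lambda^c_{j+1} - \lambda^c_j)} \le \sup_{(\lambda^c_1, \ldots, \lambda^c_m) \in \Lambda_{c,\epsilon}} \sum_{j=1}^{m} \frac{\|\lambda^c_j W_p(\lambda^c_{j+1}) - \lambda^c_{j+1} W_p(\lambda^c_j)\|^2}{\lambda^c_{j+1} \lambda^c_j (\lambda^c_{j+1} - \lambda^c_j)},$$

where

$$\Lambda^c_\epsilon = \{(\lambda^c_1, \ldots, \lambda^c_m);\ \text{for } (\lambda_1, \ldots, \lambda_K) = (\lambda^c_1, \ldots, \lambda^c_m) \cup (\lambda^{v0}_1, \ldots, \lambda^{v0}_n),\ |\lambda_{j+1} - \lambda_j| \ge \epsilon\ (j = 1, \ldots, K-1),\ \lambda_1 \ge \epsilon,\ \lambda_K \le 1 - \epsilon\}$$

and $\Lambda_{c,\epsilon} = \{(\lambda^c_1, \ldots, \lambda^c_m);\ |\lambda^c_{j+1} - \lambda^c_j| \ge \epsilon\ (j = 1, \ldots, m-1),\ \lambda^c_1 \ge \epsilon,\ \lambda^c_m \le 1 - \epsilon\}$. This implies
that, theoretically, some conservative size distortions are present even asymptotically. They
show, however, that they are very minor and of no important consequences for inference.
Note that this test is designed to account for abrupt structural changes in the error variance,
not for general forms of heteroskedasticity. We shall address this problem carefully later. In
anticipation of the results, it will be shown that the $\sup LR_{3,T}$ test is surprisingly robust to
a wide range of possible time variation in the variance $\sigma_t^2$.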
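As a quick check on the remark that the joint test encompasses the standard sup LR test, the special case of one coefficient break and no variance break ($m = 1$, $n = 0$) can be written out. The reduction below is a sketch that assumes the usual conventions $T^v_0 = 0$ and $T^v_1 = T$ for a single variance regime, which are not stated explicitly in the text.

```latex
% Special case m = 1, n = 0: one variance regime covering t = 1, ..., T
% (assumed conventions: T^v_0 = 0 and T^v_1 = T).
\begin{align*}
\log \hat{L}_T(T^c_1) &= -\tfrac{T}{2}(\log 2\pi + 1) - \tfrac{T}{2}\log \hat{\sigma}_1^2,
  & \hat{\sigma}_1^2 &= T^{-1} \sum_{t=1}^{T} (y_t - x_t'\hat{\beta}_j)^2, \\
\log \tilde{L}_T &= -\tfrac{T}{2}(\log 2\pi + 1) - \tfrac{T}{2}\log \tilde{\sigma}_1^2,
  & \tilde{\sigma}_1^2 &= T^{-1} \sum_{t=1}^{T} (y_t - x_t'\tilde{\beta})^2, \\
\sup LR_{3,T} &= 2\Big[\sup_{\lambda^c_1} \log \hat{L}_T(T^c_1) - \log \tilde{L}_T\Big]
  = \sup_{\lambda^c_1} T\big[\log \tilde{\sigma}_1^2 - \log \hat{\sigma}_1^2\big].
\end{align*}
```

With a single variance regime the weights $1/\hat{\sigma}_t^2$ are constant over the sample, so the regime-wise and full-sample coefficient estimates are plain OLS and the last line coincides with the sup LR statistic of Section 2.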

When allowing and correcting for serial correlation in the errors, we use the following

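robust Wald type statistic: $\sup_{(\lambda^c_1, \ldots, \lambda^c_m) \in \Lambda_{c,\epsilon}} Wald_{3,T}(m, n \mid m = 0, n)$, where

$$Wald_{3,T}(m, n \mid m = 0, n) = T\hat{\beta}'R'(R\hat{V}(\hat{\beta})R')^{-1}R\hat{\beta} \tag{6}$$

with $\hat{\beta} = (\hat{\beta}_1', \ldots, \hat{\beta}_{m+1}')'$ the QMLE of $\beta$ under a given partition of the sample, $R$ is the
conventional matrix such that $(R\beta)' = (\beta_1' - \beta_2', \ldots, \beta_m' - \beta_{m+1}')$ and $\hat{V}(\hat{\beta})$ is an estimate of the
covariance matrix of $\hat{\beta}$ robust to serial correlation and heteroskedasticity, i.e., a consistent
estimate of $V(\hat{\beta}) = \mathrm{plim}_{T \to \infty}\, T(\bar{X}'\bar{X})^{-1}\,\Omega\,(\bar{X}'\bar{X})^{-1}$, where $\Omega = \lim_{T \to \infty} E(\bar{X}'\bar{U}\bar{U}'\bar{X})/T$,
with $\bar{X} = \mathrm{diag}(\bar{X}_1, \ldots, \bar{X}_{m+1})$, $\bar{X}_i = (\bar{x}_{T^c_{i-1}+1}, \ldots, \bar{x}_{T^c_i})'$, $\bar{U} = (\bar{u}_1, \ldots, \bar{u}_T)'$, $\bar{x}_t = (x_t/\sigma_j)$ and
$\bar{u}_t = (u_t/\sigma_j)$, for $T^{v0}_{j-1} < t \le T^{v0}_j$ ($j = 1, \ldots, n+1$). In practice, the computation of this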

test can be very involved. Following Bai and Perron (1998), we first use the dynamic pro-

gramming algorithm to get the break points corresponding to the global maximizers of the

likelihood function (5), then plug the estimates into (6) to construct the test. This will not

affect the consistency of the test since the break fractions are consistently estimated.

4 Two-steps methods

We consider five methods recently proposed in this growing literature. They all take what

we call a two-steps approach. In the first step, they estimate the regression residuals under

the null hypothesis of no change in the coefficients. In the second step, they use these

residuals to construct a correction so that the asymptotic distribution of the test is valid

under heteroskedasticity. Some take an entire sample average to construct a test in which

the variance process is averaged out. In no way do we dispute their theoretical results and

the validity of their tests under the null hypothesis.

The works considered are the following. Zhou (2013) who proposed a wild bootstrap
procedure for partial sums of the null residuals to mimic the heteroskedastic Wiener process;
Xu (2015) who used the null residuals to construct the time indicator for the
heteroskedastic Wiener process; Górecki, Horváth and Kokoszka (2018) who applied a Karhunen-Loeve


expansion to (the square of) the Brownian bridge with the kernels constructed using the

residuals under the null hypothesis. The last two papers use a two-steps approach to av-

erage the null residuals: Dalla, Giraitis and Phillips (2017) provided a simple U-statistic

in which the heteroskedastic null residuals are averaged out in the entire sample using a

long run variance estimate; Zhang and Wu (2019) proposed a U-statistic avoiding the use

of a long-run variance estimate using a Feasible Generalized Least Squares (FGLS) pro-

cedure. The estimate of the variance (when assuming serially uncorrelated errors) or the

long-run variance (when correcting for possible serial correlation) using the entire sample is

constructed by

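$$\hat{\sigma}^2 = \begin{cases} \hat{\gamma}_0 & \text{when no serial correlations are accounted for} \\ \hat{\gamma}_0 + 2\sum_{j=1}^{T-1} w(j/b)\, \hat{\gamma}_j & \text{when serial correlations are accounted for} \end{cases} \tag{7}$$

where $\hat{\gamma}_j = T^{-1} \sum_{t=j+1}^{T} \tilde{u}_t \tilde{u}_{t-j}$ for $j = 0, 1, \ldots, T-1$, with $\tilde{u}_t$ being the regression residuals
under the null hypothesis of no structural change in the coefficients. When it is constructed
using a subsample of $\tilde{u}_t$ for $t = 1, \ldots, k$, it is denoted by $\hat{\sigma}_k^2$. Unless otherwise stated, we use
the Quadratic Spectral kernel

$$w(x) = \frac{25}{12\pi^2 x^2} \left( \frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5) \right)$$

with the bandwidth selected via the AR(1) approximation of Andrews (1991), namely

$$b = 1.3221 \left[ \frac{4\tilde{\rho}^2\, T}{(1 - \tilde{\rho})^4} \right]^{1/5}, \tag{8}$$

where $\tilde{\rho}$ is the OLS estimate in the regression $\tilde{u}_t = \tilde{\rho}\tilde{u}_{t-1} + \tilde{e}_t$. When the authors impose
a specific choice for $b$ such as the correction of Andreou (2008) used by Xu (2015) and a
minimum of $b$ given by $1.25\,T^{1/5}$ used by Zhang and Wu (2019), we incorporate them.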
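The variance estimate in (7), the Quadratic Spectral kernel and the bandwidth rule in (8) translate fairly directly into code. The following Python sketch is illustrative only: the function names are hypothetical, the author-specific bandwidth modifications mentioned above are not implemented, and the bandwidth formula breaks down if the estimated AR(1) coefficient equals one.

```python
import numpy as np

def qs_kernel(x):
    """Quadratic Spectral kernel, with the limiting value 1 at x = 0."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.ones_like(x)
    nz = x != 0
    z = 6.0 * np.pi * x[nz] / 5.0
    out[nz] = 25.0 / (12.0 * np.pi ** 2 * x[nz] ** 2) * (np.sin(z) / z - np.cos(z))
    return out

def andrews_bandwidth(u):
    """AR(1) plug-in bandwidth for the QS kernel, as in rule (8)."""
    T = len(u)
    rho = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)   # OLS of u_t on u_{t-1}
    return 1.3221 * (4.0 * rho ** 2 * T / (1.0 - rho) ** 4) ** 0.2

def long_run_variance(u, b=None):
    """gamma_0 if b is None (no serial correlation), else the QS-weighted sum."""
    T = len(u)
    gamma0 = np.mean(u ** 2)
    if b is None or b <= 0:
        return gamma0
    lags = np.arange(1, T)
    gammas = np.array([np.sum(u[j:] * u[:-j]) / T for j in lags])
    return gamma0 + 2.0 * np.sum(qs_kernel(lags / b) * gammas)
```

A typical call would be `long_run_variance(u, andrews_bandwidth(u))` on the null residuals, with `long_run_variance(u)` giving the uncorrected version.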

4.1 Zhou (2013)

Zhou (2013) considered a test statistic based on the CUSUM test without estimating the
subsample long run variance, given by:

$$CUSUM_Z = \max_{1 \le k \le T} \frac{|S_k - (k/T)\, S_T|}{\sqrt{T}},$$

where $S_k = \sum_{t=1}^{k} \tilde{u}_t$. Its asymptotic distribution involves a heteroskedastic Wiener process
and is approximated by $J$ replications of the bootstrapped statistic $\Phi^{(j)}_i$ ($j = 1, 2, \ldots, J$):

$$\Phi^{(j)}_i = [b(T - b + 1)]^{-1/2} \sum_{s=1}^{i} (S_{s,b} - (b/T)\, S_T)\, R^{(j)}_s$$

for $i = 1, \ldots, T - b + 1$, where $S_{s,b} = \sum_{t=s}^{s+b-1} \tilde{u}_t$ and $R^{(j)}_s \sim N(0, 1)$ being an external random
variable independent of all variables entering the construction of the test. We set the
bandwidth $b = 1$ when no serial correlation in the errors is accounted for and the selection rule
(8) when it is accounted for. We then compute

$$M^{(j)} = \max_{b+1 \le i \le T-b+1} \left| \Phi^{(j)}_i - \frac{i}{T - b + 1}\, \Phi^{(j)}_{T-b+1} \right|$$

for $j = 1, \ldots, J$. With the ordered statistics $M^{(1)} \le M^{(2)} \le \cdots \le M^{(J)}$, the $p$-value is computed
by $1 - j^*/J$ where $j^* = \max\{j : M^{(j)} \le CUSUM_Z\}$.
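A rough Python sketch of this bootstrap follows. It is an illustration under stated assumptions rather than Zhou's implementation: the function name is hypothetical, the block length `b` is taken to be an integer (a value from rule (8) would need rounding), and the input `u` is assumed to hold the OLS residuals under the null.

```python
import numpy as np

def zhou_bootstrap_pvalue(u, b=1, J=499, seed=0):
    """Statistic max_k |S_k - (k/T) S_T| / sqrt(T) and its bootstrap p-value."""
    rng = np.random.default_rng(seed)
    T = len(u)
    S = np.cumsum(u)
    k = np.arange(1, T + 1)
    stat = np.max(np.abs(S - (k / T) * S[-1])) / np.sqrt(T)

    # Block sums S_{s,b} = u_s + ... + u_{s+b-1}, for s = 1, ..., T-b+1.
    csum = np.concatenate(([0.0], S))
    block = csum[b:] - csum[:-b]
    centred = block - (b / T) * S[-1]
    n = T - b + 1
    i = np.arange(1, n + 1)

    M = np.empty(J)
    for j in range(J):
        R = rng.standard_normal(n)                 # external N(0,1) multipliers
        Phi = np.cumsum(centred * R) / np.sqrt(b * n)
        dev = np.abs(Phi - (i / n) * Phi[-1])
        M[j] = np.max(dev[b:])                     # i = b+1, ..., T-b+1
    j_star = np.sum(M <= stat)                     # largest j with M_(j) <= stat
    return stat, 1.0 - j_star / J
```

Counting the bootstrap draws that do not exceed the observed statistic is equivalent to locating it among the ordered draws, which is why no explicit sort is needed.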

4.2 Xu (2015)

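Xu (2015) also proposed to use the CUSUM test3

$$CUSUM_{Xu} = \max_{1 \le k \le T} \frac{|S_k - (k/T)\, S_T|}{\hat{\sigma}\sqrt{T}},$$

where $S_k = \sum_{t=1}^{k} \tilde{u}_t$ and $\hat{\sigma}$ is constructed by the square root of (7). The asymptotic
distribution is approximated by a $T$-discrete steps heteroskedastic Wiener process $W^{(j)}_T(\eta(r))$
with $r = 0, 1/T, 2/T, \ldots, 1$ such that

$$M^{(j)} = \max_{0 \le r \le 1} \left| W^{(j)}_T(\eta(r)) - r\, W^{(j)}_T(1) \right|$$

with $W^{(j)}_T(\eta(r)) = T^{-1/2} \sum_{t=1}^{[T\eta(r)]} e^{(j)}_t$ where $e^{(j)}_t \sim N(0, 1)$ for $t = 1, \ldots, T$ and the time
indicator is specified by $\eta(r) = 0$ if $r = 0$, $\eta(r) = 1$ if $r = 1$ and

$$\eta(r) = \Big(\sum_{t=1}^{T} \tilde{u}_t^2\Big)^{-1} \Big(\sum_{t=1}^{[Tr]} \tilde{u}_t^2\Big) \quad \text{if } 0 < r < 1.$$

With the ordered statistics $M^{(1)} \le M^{(2)} \le \cdots \le M^{(J)}$, the $p$-value is computed by $1 - j^*/J$
where $j^* = \max\{j : M^{(j)} \le CUSUM_{Xu}\}$. When serial correlation in the errors is accounted for, as
Xu (2015) suggested, we use $b = \min\{\tilde{b}, 0.85\}$ with $\tilde{b}$ obtained from the selection rule (8);
for more details, see Andreou (2008).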
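The time-transformed simulation can be sketched in Python as follows. This is an illustrative reading of the procedure, not Xu's code: the function name is hypothetical, the scale defaults to the square root of the residual variance (the case without serial correlation), and the small constant added before flooring is only a guard against floating-point rounding.

```python
import numpy as np

def xu_pvalue(u, sigma=None, J=499, seed=0):
    """CUSUM scaled by sigma, with a p-value from a time-changed Wiener process."""
    rng = np.random.default_rng(seed)
    T = len(u)
    S = np.cumsum(u)
    k = np.arange(1, T + 1)
    if sigma is None:
        sigma = np.sqrt(np.mean(u ** 2))           # no serial correlation correction
    stat = np.max(np.abs(S - (k / T) * S[-1])) / (sigma * np.sqrt(T))

    c = np.cumsum(u ** 2)
    eta = c / c[-1]                                # eta(k/T), with eta(1) = 1
    idx = np.minimum(np.floor(T * eta + 1e-9).astype(int), T)   # [T * eta(r)]
    r = k / T

    M = np.empty(J)
    for j in range(J):
        e = rng.standard_normal(T)
        W = np.concatenate(([0.0], np.cumsum(e))) / np.sqrt(T)  # W[i] = T^{-1/2} sum_{t<=i} e_t
        M[j] = np.max(np.abs(W[idx] - r * W[T]))
    j_star = np.sum(M <= stat)
    return stat, 1.0 - j_star / J
```

The index array `idx` is what carries the variance profile: periods with large squared residuals advance the clock of the simulated Wiener process faster.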

4.3 Górecki, Horváth and Kokoszka (2018)

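Górecki, Horváth and Kokoszka (2018) proposed to use the statistic

$$U_{GHK} = T^{-1} \sum_{k=1}^{T} \left[ \frac{S_k - (k/T)\, S_T}{\sqrt{T}} \right]^2$$

3Xu (2015) also proposed a version of the $\sup LR_T$ test, however, its properties are essentially the same
as the CUSUM test considered here.

where $S_k = \sum_{t=1}^{k} \tilde{u}_t$. The asymptotic distribution is approximated by (the truncated version
of) the Karhunen-Loeve expansion of the squared Brownian bridge

$$\int_{r=0}^{1} \Big[ W(r) - \int_{r=0}^{1} W(r) \Big]^2 \approx \sum_{\ell=1}^{L} \phi_\ell\, (N^{(j)}_\ell)^2,$$

where $N^{(j)}_\ell \sim N(0, 1)$ is an external random variable. The kernel $\phi_\ell$ for $\ell = 1, \ldots, L$ is
obtained as the largest eigenvalues of a $T \times T$ matrix whose $(k, l)$ element is

$$c(k, l) = \hat{\sigma}_{\min} - (k/T)\, \hat{\sigma}_l - (l/T)\, \hat{\sigma}_k + (k/T)(l/T)\, \hat{\sigma},$$

where $\hat{\sigma}$ is constructed by the square root of (7) and $\hat{\sigma}_{\min}$, $\hat{\sigma}_k$, and $\hat{\sigma}_l$ are the subsample
estimates. Using $J$ draws for $N^{(j)}_\ell$ ($j = 1, \ldots, J$), the Brownian bridge is approximated via $J$
replications of $M^{(j)} = \sum_{\ell=1}^{L} \phi_\ell\, (N^{(j)}_\ell)^2$. With the ordered statistics $M^{(1)} \le M^{(2)} \le \cdots \le M^{(J)}$, the
$p$-value is computed by $1 - j^*/J$ where $j^* = \max\{j : M^{(j)} \le U_{GHK}\}$.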

4.4 Dalla, Giraitis and Phillips (2017)

Dalla, Giraitis and Phillips (2017) considered the same U-statistic as Górecki, Horváth and
Kokoszka (2018), except that they scale it by the full sample estimate $\hat{\sigma}^2$ in order to cancel
the effect of the variance profile, so that

$$U_{DGP} = T^{-1} \sum_{k=1}^{T} \left[ \frac{S_k - (k/T)\, S_T}{\sqrt{T}} \right]^2 \Big/ \hat{\sigma}^2,$$

where $S_k = \sum_{t=1}^{k} \tilde{u}_t$ and $\hat{\sigma}^2$ is constructed by (7). The asymptotic distribution is given by
$\int_{r=0}^{1} [W(r) - \int_{r=0}^{1} W(r)]^2$.

4.5 Zhang and Wu (2019)

Zhang and Wu (2019) considered a U-statistic given by:

= −2P

=+1

P

=+16= ( )0

where = e and = |− |, when no serial correlation is accounted for. When serial cor-relation in the errors is accounted for, the residuals are constructed by e = ∗ −∗0 , where

= (P

=2 ∗∗0 )−1(P

=2 ∗∗0 ) with Cochrane-Orcutt (1949) type transformed variables

∗ = − −1 and ∗ = − −1. In this case, is obtained as the OLS estimator

of an (1) model 4 applied to the residuals ∗ . Note that ∗ is obtained by accounting for

local coefficients variations using the specification:

∗ = − 00 − 0¡(0 − )(

35)¢1

4Note that one could use an () model with the lag order estimated via information criteria. Doing

so provides almost identical results, hence, we only report results with the (1) specification.


The test statistic is = , where

2 = 2−2P

=+1

P

=+16=[( )(0)]

2

is an estimate of the long-run variance of , whether or not serial correlation in the error

is accounted for. Then, the test statistic asymptotically follows a standard normal distribution under
$H_0$. They recommend the bandwidth to be set at the minimum of the value given by
the rule (8) and $1.25\,T^{1/5}$ to avoid inflating it.

5 Simulation design

The simulations design consists of the specifications for the variance profiles, discussed in

Section 5.1, and the conditional mean regression model discussed in Section 5.2. Note that

the $\sup LR_{3,T}$ test depends on the prior specification of the number of changes in variance.
This is undesirable; hence, for practical applications, we use the $\sup LR_{10,T}(1, n+1 \mid 1, n)$
test of PYZ, which tests the null hypothesis of $n$ breaks in variance versus the alternative
hypothesis of $n+1$ breaks. The relevant limit distribution is presented in PYZ. We start with
$n = 1$ and continue until a non-rejection occurs subject to a maximal value of $n = 3$. Hence,
we basically search for the “best” number of breaks in variance within the set $n \in \{1, 2, 3, 4\}$.
This is enough for all cases considered. Typically, the selected values are $n = 1$ or $n = 2$.
Ignoring the value $n = 0$ allows us to avoid problematic cases for which the test would not
reject when two breaks in variance are needed; see Bai and Perron (2006) for a discussion.
In any event, even if no break in variance is present and one allows a break, the size and
power of the $\sup LR_{3,T}$ test are unaffected; see Perron and Yamamoto (2019). This version
of the test is denoted by $\sup LR^*_{3,T}$.

5.1 Variance profiles

To assess the finite sample size and power of the tests, the data are generated from model

(1) and the errors are drawn independently from a $N(0, 1)$ multiplied by $\sigma_t$. We
experimented with many specifications for the temporal pattern of $\sigma_t^2$, and chose the following
four functions as the most illustrative. They are sufficient to convey the relevant message.
Throughout, the parameter $\delta$ captures the magnitude of changes. The first case (VC1) has
a single structural change at date $t = [\lambda_0 T]$, so that $\sigma_t^2 = 1$ for $t \le [\lambda_0 T]$ and $\sigma_t^2 = (1 + \delta)^2$
for $t > [\lambda_0 T]$, where we set $\lambda_0 = 0.3$ and $0.75$. The second case (VC2) considers smooth

changes in the form of a trigonometric function 2 = 2 sin( )2, where we set = 2


and 4. The third case (VC3) considers trending variance; the first growing linearly (VC3L)

so that 2 = ( ) and the second having a quadratic trend (VC3Q) with 2 = [( )]2.

The fourth case (VC4) has a short but large spike with length 5% of the sample so that the
variance is given by $\sigma_t^2 = (1 + \delta)^2$, for $[(\lambda_0 - 0.025)T] \le t \le [(\lambda_0 + 0.025)T]$, and $\sigma_t^2 = 1$,
otherwise. We set $\lambda_0 = 0.3$, $0.5$, $0.7$.
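For concreteness, the two abrupt profiles (VC1 and VC4) can be generated as in the Python sketch below. The function names and the argument names `delta` and `lam0` are hypothetical labels for the magnitude and break-fraction parameters; the smooth and trending profiles (VC2, VC3) are deliberately left out of this sketch.

```python
import numpy as np

def vc1(T, delta, lam0):
    """One-time variance shift: 1 up to [lam0*T], (1+delta)^2 afterwards."""
    t = np.arange(1, T + 1)
    return np.where(t <= int(lam0 * T), 1.0, (1.0 + delta) ** 2)

def vc4(T, delta, lam0):
    """Short spike of height (1+delta)^2 centred at lam0, about 5% of the sample."""
    t = np.arange(1, T + 1)
    lo = int((lam0 - 0.025) * T)
    hi = int((lam0 + 0.025) * T)
    return np.where((t >= lo) & (t <= hi), (1.0 + delta) ** 2, 1.0)

def draw_errors(sigma2, rng):
    """Independent N(0,1) draws multiplied by sigma_t."""
    return np.sqrt(sigma2) * rng.standard_normal(len(sigma2))
```

For example, `draw_errors(vc1(100, 1.0, 0.75), np.random.default_rng(0))` would give one error path with a late doubling of the standard deviation.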

5.2 Regression models

We consider the following four linear regression models all specified by (1) with $u_t/\sigma_t \sim N(0, 1)$.
1) The “Mean Model” uses $x_t = 1$ for all $t$ and the tests are constructed
without accounting for serial correlations in $u_t$ (i.e., assuming correctly that the errors are
uncorrelated). 2) The “Mean Model (HAC)” also uses $x_t = 1$ for all $t$ but the tests are
constructed allowing for potential serial correlation in the errors. They involve the long-run
(or HAC) variance estimate with the idiosyncratic modifications proposed by the authors
as stated in the previous section; for the $\sup LR^*_{3,T}$ test, we use Kejriwal’s (2009) hybrid
correction. The fact that there is no serial correlation in the true errors is inconsequential.
In fact the results reported below would simply be exacerbated (i.e., the power would be
even lower). 3) The “Zero-Mean Regressor” case sets $x_t = [1, z_t]'$ where $z_t$ is drawn from
a standard normal distribution independent of $u_t$ and the coefficients are $\beta_t = [0, \beta_{2t}]'$. 4)
The “Dynamic Model” includes a lagged dependent variable in $x_t$ so that $x_t = [1, y_{t-1}]'$ and
$\beta_t = [\beta_{1t}, 0]'$. In order to treat all methods on an equal basis, the tests are constructed for a
pure structural change setting, i.e., allowing all elements of $\beta_t$ to change. The sample size is
set to $T = 100$, $300$ for size and $T = 100$ for power. The size and the power are computed
using a 5% nominal level based on 1,000 Monte Carlo replications. The number of bootstrap
repetitions is $J = 499$ for the tests of Zhou (2013), Xu (2015), and Górecki, Horváth, and
Kokoszka (2018).
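The four designs differ only in the regressors handed to the tests, which the following Python sketch makes explicit for data generated under the null. It is a sketch under assumptions: the function name is hypothetical, the constant coefficient value `beta` and the initial condition $y_0 = 0$ in the dynamic model are illustrative choices not specified in the text, and the HAC variant shares the data-generating process of the mean model.

```python
import numpy as np

def simulate_null(model, sigma2, rng, beta=1.0):
    """Return (y, X) for one design under H0 (no change in coefficients)."""
    T = len(sigma2)
    u = np.sqrt(sigma2) * rng.standard_normal(T)      # heteroskedastic errors
    if model in ("mean", "mean_hac"):
        X = np.ones((T, 1))                           # x_t = 1
        y = beta + u
    elif model == "zero_mean_regressor":
        z = rng.standard_normal(T)                    # independent of u_t
        X = np.column_stack([np.ones(T), z])          # x_t = [1, z_t]'
        y = beta * z + u                              # intercept coefficient is 0
    elif model == "dynamic":
        y = beta + u                                  # coefficient on y_{t-1} is 0
        ylag = np.concatenate(([0.0], y[:-1]))        # assumed y_0 = 0
        X = np.column_stack([np.ones(T), ylag])       # x_t = [1, y_{t-1}]'
    else:
        raise ValueError(f"unknown model: {model}")
    return y, X
```

A size experiment would loop over replications, apply a test to each `(y, X)` pair and record the rejection frequency at the 5% level.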

6 Size properties

For the size properties of the tests, we first present results from simulation experiments in

Section 6.1. These will show that all tests, including the $\sup LR^*_{3,T}$, have exact size close to
the nominal 5% level. This may seem surprising for the $\sup LR^*_{3,T}$ since it is designed for

abrupt changes in variance. Some analytical explanations are provided in Section 6.2.


6.1 Simulation results

We first examine the finite sample size, focusing on the effect of structural changes in variance

on the sizes of the various tests. We also present the result for the conventional $\sup LR_T$ and
CUSUM tests to assess whether a particular type of variance change causes size distortions

to tests that do not account for variance changes; i.e., to see if there is anything to correct

in practice. The results for a heteroskedasticity robust version of Xu’s (2015) test are also

presented. The results of the other two-steps tests are in line with Xu (2015) with the

exception of the test of Zhang and Wu (2019) which yields conservative size distortions in

almost all cases as mentioned in the original paper. Hence, these are not reported and the

results reported for Xu’s (2015) test should be viewed as representative of the five tests

robust to heteroskedasticity discussed in the previous section. We refer to it as Xu's test.

Table 1 presents the exact size for nominal 5% size tests using the four regression spec-

ifications. Overall, we obtain similar results across the regression specifications. Hence, we

focus on the results for VC0 and highlight the main differences for the other model speci-

fication afterwards. When the variance is constant (VC0), all the tests have an exact size

close to the nominal level. This is also the case when the variance process follows a smooth

trigonometric function (VC2) for which the size distortions of the conventional tests are

negligible, meaning that there is nothing to correct. In fact, the $\sup LR_T$ test has size closer
to 5% in most cases along with the CUSUM and Xu's (2015) tests. The same applies to VC1 with an

early sharp increase. These are all cases where the changes in variance cause no (or little)

size-distortions.

The effect of heteroskedasticity is more pronounced when the variance has an abrupt late
increase (VC1) with the break fraction at 0.75 or a trend (VC3), in which case the standard
tests have liberal size distortions. For the case of a spike (VC4) the tests are conservative but
still not far from the exact size of Xu's (2015) test. The size improves somewhat when $T = 300$.
For VC1, the size of the $\sup LR_T$ test becomes close to 30% when $\lambda_0 = 0.75$; however, this
distortion is completely fixed when the $\sup LR^*_{3,T}$ test is used. This is expected as the latter
can account for the structural breaks in the error variance. For VC3L, the conventional
$\sup LR_T$ and CUSUM tests show size distortions and they become worse with VC3Q.

What is interesting, and somewhat surprising at first sight, is that the $\sup LR^*_{3,T}$ test has
a good size in most cases even though the variance process is far from being one of abrupt
structural change. For VC3Q, some liberal size distortions are present with $T = 100$, but
the size is near 5% when $T = 300$. The size of the $\sup LR^*_{3,T}$ test is also closer to 5% (less
conservative distortions) when $T = 300$.


When using a HAC variance estimate, the size distortions are considerably reduced for

the $\sup LR_3^*$ test for VC3Q. For VC4, the $\sup LR$ and the CUSUM tests show conservative size

distortions unless the dynamic model is used. With a dynamic model, they show liberal size

distortions when $T = 100$. However, the size is close to 5% when $T = 300$. For the case of

a dynamic model with a lagged dependent variable, the size distortions for the $\sup LR$ test are

important in all cases. The CUSUM test tends to be conservative, except for a one-time

increase in variance late in the sample. However, the $\sup LR_3^*$ test again has an exact size

very close to 5%. For VC4, it has liberal size distortions when $T = 100$. However, the size

is close to 5% when $T = 300$. The XU test has an exact size close to the nominal level, except

for VC4, in which case it is slightly conservative.

The bottom line is the following. The standard $\sup LR$ and CUSUM tests can have

substantial size distortions, especially in models with a lagged dependent variable. The tests

designed to be robust to general forms of changes in variance have, in general, good size,

though they can be conservative in some cases. However, in (almost) all cases covered (and

others not reported here), the $\sup LR_3^*$ test is remarkably robust (exact size close to nominal

size) to various forms of changes even though it is designed to model abrupt changes. The

next section provides some theoretical explanations for this curious feature.

6.2 Analytic explanations

This subsection provides an analytic explanation of why the $\sup LR_3$ test can have good size even when the variance process does not consist of abrupt structural changes. To this end, we focus on a single structural change in variance. The $\sup LR_3$ test estimates “the most likely” break date in variance through the maximum likelihood method. In essence it minimizes the overall sum of squared residuals. If we disregard the parameter uncertainty in the coefficients for simplicity, the quasi log-likelihood with a single structural change in variance at date $T_v$ is

$$\log L_T(T_v) \propto -\left[ T_v \log \hat{\sigma}_1^2 + (T - T_v) \log \hat{\sigma}_2^2 \right],$$

where $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ are the sample variances of the pre- and post-break samples. Denote their probability limits by $\bar{\sigma}_1^2$ and $\bar{\sigma}_2^2$, respectively. Then, with $\lambda_v = T_v/T$, the probability limit of the log-likelihood (scaled by $T^{-1}$) is such that

$$\lim_{T\to\infty} T^{-1} \log L_T(T_v) \propto -\left[ \lambda_v \log \bar{\sigma}_1^2 + (1-\lambda_v) \log \bar{\sigma}_2^2 \right].$$

The maximizer of this function, denoted by $\lambda_v^*$, is what we refer to as “the most likely variance break fraction”. This, of course, depends on the underlying variance profile. We derive $\lambda_v^*$ for the cases VC1 to VC4 in Appendix A. The log-likelihood of the $\sup LR_3$ test when the model has variance $\sigma_t^2$ is equivalent to the log-likelihood of the $\sup LR$ test when the model has variance $\sigma_t^{*2}$ given by

$$\sigma_t^{*2} = \begin{cases} \sigma_t^2 \, (\bar{\sigma}^2/\bar{\sigma}_1^2) & \text{for } t = 1, \ldots, [\lambda_v^* T] \\ \sigma_t^2 \, (\bar{\sigma}^2/\bar{\sigma}_2^2) & \text{for } t = [\lambda_v^* T] + 1, \ldots, T \end{cases},$$

where $\bar{\sigma}^2$ is the probability limit of the full-sample variance estimate. Then, $\sigma_t^{*2}$ can be considered as a modified variance profile; see Appendix B. Figure 1 shows the original variance profile $\sigma_t^2$ with a solid line and the modified variance profile $\sigma_t^{*2}$ with a broken line for VC1 to VC4, where the magnitude of the variance change is set to 10, although this does not affect the overall shapes. For VC1, the modified variance profile is completely flattened, as expected for this case of a one-time abrupt change. For VC2, the modified profile retains the same overall shape as the original one, with only minor local changes. This is to be expected since the process is essentially stationary; hence, no breaks are detected. It is the reason why the $\sup LR$ test is little affected by such changes. For VC3, since the original variance process is monotonically increasing, the variance profile of the post-break regime is divided by a larger value than the variance profile of the pre-break regime. Hence, it effectively acts as a detrending device via a rescaling of the levels of the variance profile. For VC4, the spike is flattened as the subperiod which includes the spike is divided by a larger value. In sum, allowing for a single break in variance can either completely flatten, partially flatten, or detrend the original variance function by rescaling the pre- and post-break variance profiles.
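The rescaling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: finite-sample regime averages stand in for the probability limits, the function name `modified_profile` is hypothetical, and the two example profiles (an abrupt jump from 1 to 6 and a linear trend) are illustrative choices rather than the exact simulation designs.

```python
def modified_profile(sigma2, k):
    """Rescale a variance profile around a single break after observation k.

    sigma2 : list of variances sigma_t^2 for t = 1..T
    k      : number of observations in the pre-break regime (0 < k < T)
    Each regime is multiplied by (full-sample average / regime average),
    with sample averages used in place of probability limits (assumption).
    """
    T = len(sigma2)
    full = sum(sigma2) / T
    pre = sum(sigma2[:k]) / k
    post = sum(sigma2[k:]) / (T - k)
    return [s * full / (pre if t < k else post) for t, s in enumerate(sigma2)]


T = 400

# Hypothetical abrupt profile: variance 1, then 6 after 75% of the sample.
abrupt = [1.0 if t < 300 else 6.0 for t in range(T)]
flat = modified_profile(abrupt, 300)      # break placed at the true date

# Hypothetical linear trend, rescaled around an arbitrary break at 25%.
trend = [2.0 * (t + 1) / T for t in range(T)]
rescaled = modified_profile(trend, 100)
```

With the break placed at the true date, the abrupt profile becomes constant. For the trend, the two regime averages are equalized, which is the detrending effect discussed in the text.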

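We now consider the asymptotic distributions of the test statistics. Define a continuous

analogue $\sigma_r^2$, with $r \in [0, 1]$, to $\sigma_t^2$ for $t = 1, \ldots, T$, that is, $\int_0^r \sigma_s^2 \, ds = \lim_{T\to\infty} T^{-1} \sum_{t=1}^{[Tr]} \sigma_t^2$. Note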

that $\sigma_r^2$ depends on the parameter governing the magnitude of the variance changes. Under the assumption that $\sigma_r^2$ is bounded, it is known that the asymptotic distribution of the $\sup LR$ test is:

$$\sup LR \Rightarrow \sup_{\lambda \in \Lambda} \frac{[W(g(\lambda)) - \lambda W(1)]^2}{\lambda(1-\lambda)},$$

where $W(g(\lambda))$ is the time-transformed Wiener process with $g(\lambda) \equiv \int_0^{\lambda} \sigma_r^2 \, dr \big/ \int_0^1 \sigma_r^2 \, dr$; see Hall and Heyde (1980) and Davidson (1994). Similarly, the asymptotic distribution of the $\sup LR_3$ test is obtained by simply replacing $\sigma_r^2$ with $\sigma_r^{*2}$ (similarly defined using $\sigma_t^{*2}$):

$$\sup LR_3 \Rightarrow \sup_{\lambda \in \Lambda} \frac{[W(g^*(\lambda)) - \lambda W(1)]^2}{\lambda(1-\lambda)},$$

where $g^*(\lambda) \equiv \int_0^{\lambda} \sigma_r^{*2} \, dr \big/ \int_0^1 \sigma_r^{*2} \, dr$. In Table 2, we evaluate the asymptotic size of the $\sup LR$ and $\sup LR_3$ tests at the 5% nominal level using these distributions with the Wiener process approximated by 5,000 discrete steps. The asymptotic size is very close to the finite sample size presented in Table 1. The results show that including a single break in variance can drastically improve the size of the $\sup LR_3$ test in all cases.
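A rough Monte Carlo sketch of this kind of calculation is given below. It is not the code behind Table 2. The trimming of 0.15, the 200 discretization steps, the 1,000 replications, and the example profile (standard deviation jumping from 1 to 5 at 75% of the sample) are all assumptions made for illustration, and the 5% critical value is itself simulated under a constant variance.

```python
import math
import random


def time_scale_increments(profile, n):
    """Increments of g(lambda) = int_0^lambda s2 / int_0^1 s2 over n equal cells."""
    s2 = [profile((i + 0.5) / n) for i in range(n)]
    total = sum(s2)
    return [v / total for v in s2]


def sup_functional(z, dg, trim=0.15):
    """sup over lambda of [W(g(lambda)) - lambda W(1)]^2 / [lambda (1 - lambda)]."""
    n = len(z)
    steps = [math.sqrt(d) * e for d, e in zip(dg, z)]  # increments of W(g(.))
    w1 = sum(steps)                                     # W(g(1)) = W(1)
    path, best = 0.0, 0.0
    for i in range(1, n):
        path += steps[i - 1]
        lam = i / n
        if trim <= lam <= 1.0 - trim:
            best = max(best, (path - lam * w1) ** 2 / (lam * (1.0 - lam)))
    return best


def simulate(profile, n=200, reps=1000, seed=0):
    rng = random.Random(seed)
    dg = time_scale_increments(profile, n)
    return [sup_functional([rng.gauss(0.0, 1.0) for _ in range(n)], dg)
            for _ in range(reps)]


# Simulated 5% critical value under a constant variance (g is the identity).
homo = sorted(simulate(lambda r: 1.0))
crit = homo[int(0.95 * len(homo))]

# Rejection rate when the variance jumps late in the sample (hypothetical profile).
late_jump = simulate(lambda r: 1.0 if r < 0.75 else 25.0, seed=1)
size = sum(s > crit for s in late_jump) / len(late_jump)
print(round(crit, 2), round(size, 3))
```

Replacing the profile by its rescaled counterpart and recomputing `size` gives the analogous quantity for the modified variance profile.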

These results can help understand why accounting for one variance break can yield tests with better size. Let us consider the difference between the maximands of the heteroskedastic and homoskedastic asymptotic distributions given by:

$$\Delta(\lambda) = \frac{[W(g(\lambda)) - \lambda W(1)]^2 - [W(\lambda) - \lambda W(1)]^2}{\lambda(1-\lambda)} = \frac{W^2(g(\lambda)) - 2\lambda W(g(\lambda)) W(1) - W^2(\lambda) + 2\lambda W(\lambda) W(1)}{\lambda(1-\lambda)}.$$

Using $E[W^2(\lambda)] = \lambda$ and $E[W(\lambda) W(1)] = \lambda$, the expected value conditional on $\lambda$ is:

$$E[\Delta(\lambda) \mid \lambda] = \frac{g(\lambda) - 2\lambda g(\lambda) - \lambda + 2\lambda^2}{\lambda(1-\lambda)} = \frac{(1 - 2\lambda)[g(\lambda) - \lambda]}{\lambda(1-\lambda)},$$

i.e., the distance between $g(\lambda)$ and $\lambda$ weighted by $(1-2\lambda)/[\lambda(1-\lambda)]$. If we further assume that $\lambda$ is chosen at random under $H_0$, e.g., $\lambda \sim U[0, 1]$ for simplicity, we can compute the unconditional counterpart, which acts as a measure of distance between the original and transformed variance profiles: $D = \int_0^1 \left| (1 - 2\lambda)[g(\lambda) - \lambda] / [\lambda(1-\lambda)] \right| d\lambda$, or simply take the dominating part $G = \int_0^1 |g(\lambda) - \lambda| \, d\lambda$, denoted by $G$ since it is akin to a Gini coefficient for $g(\lambda)$. Table 3 reports the values of $D$ and $G$ for each variance profile. Clearly, the larger $D$ and $G$ are, the more size distortion we observe in Table 2. These measures go a long way toward getting an intuition about the effect of heteroskedasticity on the size of the tests. In Figure 2, we illustrate $G$ as the area between the forty-five-degree line and the heteroskedastic time scale $g(\lambda)$. This explains why the size distortions of the $\sup LR$ test are small for VC2, as the area is narrow. It also shows that the larger the area, the more severe the distortions become; e.g., VC1 (break fraction 0.75), VC3, and VC4. More importantly, the area shrinks with the modified profile and this illustrates how rescaling the pre- and the post-break levels of the variance profile can reduce the size distortions.
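The two distance integrals can be approximated numerically as in the sketch below. The function name `distance_measures`, the midpoint rule, and the linear example profile are assumptions for illustration. The code evaluates the raw integrals exactly as written in the text, so its output need not coincide with whatever scaling is used for the entries of Table 3.

```python
def distance_measures(profile, n=20000):
    """Midpoint-rule approximations of the weighted and unweighted distances.

    profile : function r -> sigma_r^2 on [0, 1] (any positive scale)
    Returns (weighted, plain), where
      weighted = int |(1 - 2 lam) (g(lam) - lam) / (lam (1 - lam))| d lam
      plain    = int |g(lam) - lam| d lam
    and g is the normalized cumulative variance (the time scale).
    """
    s2 = [profile((i + 0.5) / n) for i in range(n)]
    total = sum(s2)
    cum, weighted, plain = 0.0, 0.0, 0.0
    for i, v in enumerate(s2):
        lam = (i + 0.5) / n
        g = (cum + 0.5 * v) / total       # time scale at the cell midpoint
        cum += v
        gap = abs(g - lam)
        weighted += abs(1.0 - 2.0 * lam) / (lam * (1.0 - lam)) * gap / n
        plain += gap / n
    return weighted, plain


# Constant variance: the time scale is the 45-degree line, so both are zero.
w_const, p_const = distance_measures(lambda r: 1.0)

# Hypothetical linearly trending variance: g(lam) = lam^2.
w_lin, p_lin = distance_measures(lambda r: r)
print(round(w_lin, 4), round(p_lin, 4))
```

For the linear profile the integrals can be checked by hand: the unweighted distance is $\int_0^1 (\lambda - \lambda^2) d\lambda = 1/6$ and the weighted one reduces to $\int_0^1 |1 - 2\lambda| d\lambda = 1/2$.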

7 Power properties

We now investigate via simulations the power properties of the sup∗3 test (henceforth

denoted by $SLR_3$) and the two-steps methods outlined in Section 4: Zhou (2013; ZRB),

Xu (2015; XU), Górecki, Horváth and Kokoszka (2018; GHK), Dalla, Giraitis and Phillips

(2017; VS), and Zhang and Wu (2019; ZWU). We consider power against (1) coefficients


with a single structural change and (2) coefficients following a random walk. We include the

latter case because 3 is designed to detect abrupt structural changes, while the other

tests are agnostic about the nature of the changes. Hence, this will allow assessing how

robust 3 is to non-abrupt changes. We select three variance profiles: VC0, VC1, and

VC3L, as they cover the most important cases and are representative of the main features

that can arise. To enable us to make direct comparisons, we set the average variance levels

to be the same across specifications, that is, we set $\sigma_t^2 = 1$ for all $t$ for VC0; $\sigma_t^2 = 0.5$ for

$t \le [0.5T]$ and $\sigma_t^2 = 1.5$ for $t > [0.5T]$ for VC1; and $\sigma_t^2 = 2(t/T)$ for VC3L. We evaluate the

power functions under the four regression models; (i) the Mean Model without accounting for

serial correlations in the errors, (ii) the Mean Model (HAC) accounting for serial correlations

in the errors, (iii) the Zero-Mean Regressor (without HAC), and (iv) the Dynamic Model

(with HAC). The HAC variance specifications follow the authors’ recommendations and, for

the sup∗3 test, we use Kejriwal’s (2009) hybrid correction.

7.1 Structural change in coefficients

We first consider model (1) with one structural change in coefficients at date 0 with a)

= for the Mean Model, b) = [0 ] for the Zero-Mean Regressor, and c) = [ 0]

for the Dynamic Model, where = 0 for ≤ 0, and = for ≥ 0, with 0 = [025 ],

without loss of generality at least qualitatively. The break magnitude varies from 0 to 10.

Figure 3.1 shows the power functions of the tests for VC0 (constant variance). Here,

all methods have a similar power function that quickly reaches one as increases. When

serial correlation in the errors is accounted for (Mean Model (HAC)), the power of all tests

initially increases as increases until around = 2. Then the power functions of all tests,

except 3 start decreasing as becomes larger and eventually reach zero (except for

, whose power eventually goes back up). This is the well-known non-monotonic power

problem, which has been documented before as stated earlier but seems to be ignored re-

peatedly. If the structural changes in the coefficients are neglected, they translate into level

shifts in the regression residuals. Then, as shown in Perron (1990), such level shifts inflate

the autocovariance estimates for the residuals. They also inflate the bandwidth used

when constructing the HAC variances as the persistence parameter is biased toward one

(e.g., Deng and Perron, 2008, among others). Some studies use a modified version. For

instance, Xu (2015) proposed using the suggestion of Andreou (2008) to avoid the large

bandwidth. However, it remains that autocovariance estimates are inflated resulting in a

power function that is non-monotonic. The same problem occurs when the regression model


includes a lagged dependent variable. This is again, following Perron (1989, 1990), because

the coefficient estimate of the lagged dependent variable is then biased toward one if struc-

tural changes are present. The estimated model is then akin to using first-differences of

the series, thereby transforming informative structural changes in the conditional mean into

outliers. An exception is the test which does not show non-monotonic power in the

Mean Model (HAC) because it avoids the HAC variance estimate and uses FGLS. However,

it performs rather miserably when a lagged dependent variable is present. In contrast, 3

does not show any non-monotonic power as it accounts for structural changes in the mean

so that it obtains uncontaminated residuals. In all cases, the power increases to one rapidly

without any decrease as increases.
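The mechanism behind the non-monotonic power can be illustrated with a small simulation. The sketch below is an illustration only: it uses i.i.d. standard normal errors, a mean shift at one quarter of the sample, and hypothetical helper names. It shows how the first-order autocorrelation of residuals computed under the no-break model is pushed toward one as the neglected shift grows.

```python
import random


def lag1_autocorr(x):
    """First-order sample autocorrelation of a series."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    return sum(a * b for a, b in zip(d[1:], d[:-1])) / sum(v * v for v in d)


def no_break_residual_autocorr(shift, T=400, frac=0.25, seed=0):
    """Autocorrelation of demeaned data when a mean shift is ignored.

    The series is white noise plus a level shift of size `shift` after
    observation [frac * T]. Demeaning with the full-sample mean gives the
    residuals of the mean model estimated without a break.
    """
    rng = random.Random(seed)
    T0 = int(frac * T)
    y = [(shift if t >= T0 else 0.0) + rng.gauss(0.0, 1.0) for t in range(T)]
    return lag1_autocorr(y)


rhos = {shift: no_break_residual_autocorr(shift) for shift in (0, 2, 5, 10)}
for shift, rho in rhos.items():
    print(shift, round(rho, 3))
```

Although the errors are serially uncorrelated, the estimated persistence rises with the shift, which in turn inflates a HAC variance estimate (and its bandwidth) built from these residuals.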

A case of interest that is particularly revealing of the properties of the tests is with a

Zero-Mean Regressor. Here, the change in the parameter causes a change in the variance of

the dependent variable even if no change in the variance of the errors occurs. Since the two-

steps type tests construct the quantities of interest using the residuals assuming no change,

they are fooled and assign all the variance change in the dependent variable to a change in

the variance of the errors. This results in no power at all; i.e., power equal to size. This holds

for all tests, except since it is essentially based on the regression scores . Hence,

in a sense, it has power mostly by luck since the presence of changes in coefficients that are

unaccounted for inflate the residuals. However, as discussed previously completely

fails when a lagged dependent variable is present for the same reasons as discussed above.

The 3 continues to have the highest power function, monotonic in .
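The zero-mean regressor case can be illustrated as follows. This is a sketch under simple assumptions (i.i.d. standard normal regressor and errors, a coefficient moving from 0 to 3 at one quarter of the sample, hypothetical function name). It shows that the residuals from the no-break regression have a mean close to zero in both regimes, while their variance differs across regimes even though the error variance is constant.

```python
import random


def regime_residual_moments(delta, T=2000, frac=0.25, seed=0):
    """Residual mean and variance by regime when a coefficient break is ignored.

    Model: y_t = beta_t * x_t + e_t with beta_t = 0 before [frac * T] and
    delta afterwards; x_t and e_t are i.i.d. N(0, 1) (assumptions).
    """
    rng = random.Random(seed)
    T0 = int(frac * T)
    x = [rng.gauss(0.0, 1.0) for _ in range(T)]
    y = [(delta if t >= T0 else 0.0) * x[t] + rng.gauss(0.0, 1.0) for t in range(T)]

    # OLS slope of the regression without intercept and without a break.
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    res = [yi - b * xi for xi, yi in zip(x, y)]

    def mean_var(r):
        m = sum(r) / len(r)
        return m, sum((v - m) ** 2 for v in r) / len(r)

    return mean_var(res[:T0]), mean_var(res[T0:])


(pre_mean, pre_var), (post_mean, post_var) = regime_residual_moments(3.0)
print(round(pre_mean, 2), round(pre_var, 2), round(post_mean, 2), round(post_var, 2))
```

A procedure that looks only at these residuals sees what appears to be a change in the error variance, with no shift in their level, which is the sense in which the two-steps tests are fooled.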

Figure 3.2 presents the power functions for the case with VC1. We observe similar

patterns: non-monotonic power with the Mean Model (HAC) and with the Dynamic Model,

a trivial power with Zero-Mean Regressor as for VC0. For the Mean Model (HAC), all

methods have a very low power function, the only one increasing with being the 3 and

. When dealing with a model having a lagged dependent variable, all tests, except the

3, have non-monotonic power functions.

Finally, we consider a linearly trending variance process with the results reported in

Figure 3.3. They show that 3 still has the highest power, which is monotonic in all

cases. This is because it detrends or flattens the original variance profile as discussed in

Section 6.2. Again, non-monotonic power with the Mean Model (HAC) and with the

Dynamic Model is observed for all two-step tests, the test having virtually no power

in the Mean Model (HAC). For the Zero-Mean Regressor case all two-steps tests have trivial

power, except for , which again has near zero power for the Dynamic Model.


7.2 Random walk coefficients

The previous subsection illustrates the power advantage of $SLR_3$ over the two-steps methods.

However, it may be suspected that the advantage comes from the particular pattern of

structural changes in the coefficients, i.e., abrupt ones to which the test is tailored. To

address this concern, we consider coefficients following a random walk, namely = −1+,

where ∼ (0 (√ )2). The models and variance profiles remain the same.

Figure 4.1 shows the power functions for VC0. The general shapes remain qualitatively

similar. With the Mean Model, all tests have similar power functions except that

has slightly lower power. The two-steps tests suffer from the non-monotonic power problem

with Mean Model (HAC) (except for ) and with the Dynamic Model. The power

of 3 does not reach one and shows some gradual power decline as increases because

the unaccounted coefficient variations can inflate the HAC variance and the persistence

parameter estimate. However, 3 still dominates the other tests. An exception is with

the Mean Model (HAC), in which case has higher power than 3 for large ,

though again its power is very low with the Dynamic Model. Figures 4.2-4.3 present the

power functions for VC1 and VC3L, respectively. The results are qualitatively the same.

Hence, overall, the 3 test continues to have a strong power advantage, except for the

Mean Model (HAC) for large values of the alternative, in which case the test is more

powerful, though still very deficient with a dynamic model. The fact that the 3 test

retains decent power should be expected since tests for an abrupt change in coefficients are

consistent against general parameter variation processes; e.g., Andrews (1993).

8 Conclusion

In this study, we carefully assessed the effect of accounting for heteroskedasticity on the size

and power of structural change tests in linear regression models. In particular, we focused on

the two-steps nature of some representative existing nonparametric methods, which obtain

regression residuals assuming no coefficient change, then use them to fully approximate the

heteroskedastic asymptotic distributions. These methods have good size. However, they

suffer from several deficiencies when it comes to power. First, most are prone to the non-

monotonic power problem discussed at length in the literature, when serial correlation in the

errors is accounted for or when the regressors include lagged dependent variables. Second,

most tests simply look at the unconditional mean of the dependent variable when estimating

the variance profile, exactly because they do not account for parameter variations in the


conditional mean model. Hence, they only have trivial power when the regressors have zero-

mean. Third, even when the power is monotonically increasing, the tests are, in general, not

as powerful as the sup∗3 proposed by Perron, Yamamoto and Zhou (2020), who take a

different approach that accounts jointly for changes in the coefficients and in the variance of the errors.

The sup∗3 test of Perron, Yamamoto and Zhou (2020) has surprising properties.

First, although the size is not completely immune to general forms of heteroskedastic pat-

terns other than abrupt structural changes, accounting for a few structural changes in the

variance can flatten or detrend the original variance process, thus considerably reducing the

size distortions. Another way to state this important fact is as follows. In many cases, the

original sup test that ignores potential changes in the variance of the errors is very little

affected by various types of variance profiles (e.g., increases early in the sample, periodic

variations, etc.). By allowing just a few breaks in variance (one or two in most cases), the

sup∗3 test in a sense transforms the variance profile into one that effectively has very

little impact on the size of the test. It can either completely flatten, partially flatten or

detrend the variance process by rescaling the pre- and post- break levels of variances. We

also showed that the sup∗3 test is immune to the non-monotonic power problem (since

it treats changes in coefficients and variance jointly) and attains high power even when the

regressors have zero-mean. In general, it has the highest power.

It is clear that within the class of abrupt changes in coefficients and variance of the errors,

the $\sup LR_3^*$ test has the correct asymptotic size and the best power. Some may argue that

since the sup∗3 test does not have precisely an asymptotic size of 5% for all types of

variance profiles (within some broader class), it is an invalid test from the start. This ignores

the general principles of hypothesis testing, which deals with a good balance between type I

(incorrectly rejecting when the null is true) and type II (incorrectly not rejecting when the

null is false) errors. By sticking to this philosophy of considering size only irrespective of

power, we end with tests that are only marginally better than throwing coins in important

cases of interest. Our argument is that the sup∗3 test involves some but very little size

distortions in general, and none when the changes in the variance of the errors are abrupt.

The other tests have the correct size in general but can fail drastically when it comes to

power. Hence, the trade-off should favor the sup∗3 test until something better is found.


Appendix A: Derivation of the “most likely” variance break fraction

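In this Appendix, we derive the “most likely” variance break fraction under VC1 to VC4. As in the text, we ignore parameter uncertainty, for simplicity. The log-likelihood function with one variance break at date $T_v$ (with $\lambda_v = T_v/T$) is then

$$\ell(T_v) = -\left[ T_v \log \hat{\sigma}_1^2 + (T - T_v) \log \hat{\sigma}_2^2 \right].$$

We have $\lim_{T\to\infty} \hat{\sigma}_1^2 = \lambda_v^{-1} \int_0^{\lambda_v} \sigma_r^2 \, dr$ and $\lim_{T\to\infty} \hat{\sigma}_2^2 = (1 - \lambda_v)^{-1} \int_{\lambda_v}^1 \sigma_r^2 \, dr$, which can

easily be computed given a specific variance profile. For VC1, they are the pre- and post-break variance levels, and

$\lambda_v^*$ equals the true variance break fraction. For VC2, after some algebra, we obtain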

lim→∞ 21 = 2()−1R 0sin2()

=2

2− 2

4sin(2)

and

lim→∞ 22 = 2(1− )−1R 1sin2()

=2

2− 2

4(1− )sin(2)− sin(2)

where 2 =R 102. Then, the probability limit of (minus) the log-likelihood becomes

lim→∞¡−−1¢ = log

∙2 − 2

2sin(2)

¸+(1− ) log

∙2 − 2

2(1− )sin(2)− sin(2)

¸

Although this is not globally concave in $\lambda_v$, the numerical solutions are $\lambda_v^* = 0.025$ and $0.975$. We present results for $\lambda_v^* = 0.025$ in Figure 1 (using $\lambda_v^* = 0.975$ provides essentially the same results). For VC3L, the limits of the sample variances are

lim→∞ 21 = 2()−1R 0 = 22

lim→∞ 22 = 2(1− )−1R 1 = 2(1 + )2

For VC3Q, we have:

lim→∞ 21 = 2()−1R 02 = 223

lim→∞ 22 = 2(1− )−1R 12 = 2(1 + + 2)3

Hence, the limit of the log-likelihood for VC3L is,

lim→∞¡−−1¢ = log

¡22

¢+ (1− ) log

¡2(1 + )2

¢


and for VC3Q,

lim→∞¡−−1¢ = log

¡223

¢+ (1− ) log

¡2(1 + + 2)3

¢

Hence, for VC3L, $\lambda_v^* = 0.225$ and for VC3Q, $\lambda_v^* = 0.285$. For VC4, the most likely break fraction is chosen at the end of the spike when the spike location is below 0.5 and at the beginning of the spike when it is above 0.5. When the spike location equals 0.5, either the location plus 0.025 or the location minus 0.025 can maximize the probability limit of the log-likelihood. Hence, when reporting results, we use the location plus 0.025, although the results with the location minus 0.025 are essentially the same.
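The numerical search for the most likely break fraction can be sketched as below. The grid search, the 1% trimming at each end, and the function name are assumptions of this illustration, and the profile is a quadratic trend in the spirit of VC3Q (the scale is irrelevant for the maximizer).

```python
import math


def most_likely_break_fraction(profile, n=2000, trim=0.01):
    """Grid search for the maximizer of the limit log-likelihood.

    Minimizes lam * log(s1) + (1 - lam) * log(s2), where s1 and s2 are the
    average variances before and after lam, approximated on n equal cells.
    """
    s2 = [profile((i + 0.5) / n) for i in range(n)]
    total = sum(s2)
    best_lam, best_val, cum = None, math.inf, 0.0
    for i in range(1, n):
        cum += s2[i - 1]
        lam = i / n
        if lam < trim or lam > 1.0 - trim:
            continue
        pre = cum / i                      # average variance before the break
        post = (total - cum) / (n - i)     # average variance after the break
        val = lam * math.log(pre) + (1.0 - lam) * math.log(post)
        if val < best_val:
            best_lam, best_val = lam, val
    return best_lam


lam_quadratic = most_likely_break_fraction(lambda r: r ** 2)
print(lam_quadratic)
```

For the quadratic profile the search should land close to the value of 0.285 reported for VC3Q.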

Appendix B: Derivation of the modified variance profiles

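We show that the log-likelihood of a linear model with variance profile $\sigma_t^2$ which accounts for one variance break is asymptotically the same as the log-likelihood of the linear model with variance profile $\sigma_t^{*2}$ which accounts for no variance break. Let the error term be $u_t = \sigma_t \eta_t$ where $E(\eta_t^2) = 1$. Again, for simplicity, we ignore the uncertainty due to the estimation of the coefficient parameters. Then, we can simply consider the log-likelihood under the null hypothesis of no coefficient change. If we omit asymptotically negligible terms, the log-likelihood accounting for one variance break at date $T_v$ (see the proof of Theorem 1 (c) of PYZ) is

$$\begin{aligned}
\ell(T_v) &= -T_v (\tilde{\sigma}_1^2/\bar{\sigma}_1^2) - (T - T_v)(\tilde{\sigma}_2^2/\bar{\sigma}_2^2) \\
&= -\bar{\sigma}_1^{-2} \sum_{t=1}^{T_v} u_t^2 - \bar{\sigma}_2^{-2} \sum_{t=T_v+1}^{T} u_t^2 \\
&= -\sum_{t=1}^{T_v} (\sigma_t^2/\bar{\sigma}_1^2) \eta_t^2 - \sum_{t=T_v+1}^{T} (\sigma_t^2/\bar{\sigma}_2^2) \eta_t^2. \qquad \text{(A.1)}
\end{aligned}$$

Similarly, the log-likelihood accounting for no variance break but with $u_t^* = \sigma_t^* \eta_t$ is

$$\ell = -T (\tilde{\sigma}^2/\bar{\sigma}^2) = -\bar{\sigma}^{-2} \sum_{t=1}^{T} u_t^{*2} = -\sum_{t=1}^{T} (\sigma_t^{*2}/\bar{\sigma}^2) \eta_t^2. \qquad \text{(A.2)}$$

Hence, (A.1), evaluated at $T_v = [\lambda_v^* T]$, and (A.2) yield

$$\frac{\sigma_t^{*2}}{\bar{\sigma}^2} = \begin{cases} \sigma_t^2/\bar{\sigma}_1^2 & \text{for } t = 1, \ldots, [\lambda_v^* T] \\ \sigma_t^2/\bar{\sigma}_2^2 & \text{for } t = [\lambda_v^* T] + 1, \ldots, T \end{cases}$$

or

$$\sigma_t^{*2} = \begin{cases} \sigma_t^2 \, (\bar{\sigma}^2/\bar{\sigma}_1^2) & \text{for } t = 1, \ldots, [\lambda_v^* T] \\ \sigma_t^2 \, (\bar{\sigma}^2/\bar{\sigma}_2^2) & \text{for } t = [\lambda_v^* T] + 1, \ldots, T \end{cases}.$$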


References

Andreou, E. (2008) “Restoring monotone power in CUSUM test,” Economics Letters, 98, 48-58.

Andrews, D.W.K. (1991) “Heteroskedasticity and autocorrelation consistent covariance matrix estimation,” Econometrica, 59, 817-858.

Andrews, D.W.K. (1993) “Tests for parameter instability and structural change with unknown change point,” Econometrica, 61, 821-856.

Andrews, D.W.K. and W. Ploberger (1994) “Optimal tests when a nuisance parameter is present only under the alternative,” Econometrica, 62, 1383-1414.

Bai, J. and P. Perron (1998) “Estimating and testing linear models with multiple structural changes,” Econometrica, 66, 47-78.

Bai, J. and P. Perron (2003) “Computation and analysis of multiple structural change models,” Journal of Applied Econometrics, 18, 1-22.

Bai, J. and P. Perron (2006) “Multiple structural change models: a simulation analysis,” in Econometric Theory and Practice: Frontiers of Analysis and Applied Research, D. Corbae, S. Durlauf and B.E. Hansen (eds.), Cambridge University Press: Cambridge: 212-237.

Bock, O., X. Collilieux, F. Guillamon, E. Lebarbier, and C. Pascal (2020) “A breakpoint detection in the mean model with heterogenous variance on fixed time-intervals,” Statistics and Computing, 30, 195-207.

Brown, R.L., J. Durbin, and J.M. Evans (1975) “Techniques for testing the constancy of regression relationships over time,” Journal of the Royal Statistical Society B, 37, 149-163.

Cavaliere, G. and A.M.R. Taylor (2006) “Testing for a change in persistence in the presence of a volatility shift,” Oxford Bulletin of Economics and Statistics, 68, 761-781.

Cavaliere, G. and A.M.R. Taylor (2008) “Testing for a change in persistence in the presence of non-stationary volatility,” Journal of Econometrics, 147, 84-98.

Cochrane, D. and G.H. Orcutt (1949) “Applications of least squares regressions to relationships containing autocorrelated error terms,” Journal of the American Statistical Association, 44, 32-61.

Crainiceanu, C.M. and T.J. Vogelsang (2007) “Nonmonotonic power for tests of mean shift in a time series,” Journal of Statistical Computation and Simulation, 77, 457-476.

Dalla, V., L. Giraitis, and P.C.B. Phillips (2017) “Testing mean stability of a heteroskedastic time series,” Unpublished manuscript, Yale University.

Dalla, V., L. Giraitis, and P.M. Robinson (2020) “Asymptotic theory for time series with changing mean and variance,” forthcoming in Journal of Econometrics.


Davidson, J. (1994) Stochastic Limit Theory: An Introduction for Econometricians. Oxford: Oxford University Press.

Deng, A. and P. Perron (2008) “The limit distribution of the CUSUM of squares test under general mixing conditions,” Econometric Theory, 24, 809-822.

Elliott, G. and U.K. Müller (2006) “Efficient tests for general persistent time variation in regression coefficients,” The Review of Economic Studies, 73, 907-940.

Górecki, T., L. Horváth, and P. Kokoszka (2018) “Change point detection in heteroscedastic time series,” Econometrics and Statistics, 7, 63-88.

Hall, P. and C.C. Heyde (1980) Martingale Limit Theory and its Applications. New York: Academic Press.

Hansen, B.E. (2000) “Testing for structural change in conditional models,” Journal of Econometrics, 97, 93-115.

Kejriwal, M. (2009) “Tests for a mean shift with good size and monotonic power,” Economics Letters, 102, 78-82.

Kim, D. and P. Perron (2009) “Assessing the relative power of structural break tests using a framework based on the approximate Bahadur slope,” Journal of Econometrics, 149, 26-51.

Pein, F., H. Sieling, and A. Munk (2017) “Heterogeneous change point inference,” Journal of Royal Statistical Society Series B, 79, 1207-1227.

Perron, P. (1989) “The great crash, the oil price shock and the unit root hypothesis,” Econometrica, 57, 1361-1401.

Perron, P. (1990) “Testing for a unit root in a time series with a changing mean,” Journal of Business and Economic Statistics, 8, 153-162.

Perron, P. (1991) “A test for changes in a polynomial trend function for a dynamic time series,” Econometric Research Program Research Memorandum No. 363, Princeton University. Reprinted in Time Series Econometrics: Volume 2: Structural Change, (Perron, P., ed.), World Scientific, 2019, 1-65.

Perron, P. and Y. Yamamoto (2016) “On the usefulness or lack thereof of optimality criteria for structural change tests,” Econometric Reviews, 35(5), pp 782-844.

Perron, P. and Y. Yamamoto (2019) “Pitfalls of two steps testing for changes in the error variance and coefficients of a linear regression models,” Econometrics, 7, 22.

Perron, P., Y. Yamamoto, and J. Zhou (2020) “Testing jointly for structural changes in the error variance and coefficients of a linear regression model,” Quantitative Economics, 11, 1019-1057.

Pitarakis, J.-Y. (2004) “Least-squares estimation and tests of breaks in mean and variance under misspecification,” Econometrics Journal, 7, 32-54.


Ploberger, W. and W. Krämer (1992) “The CUSUM test with OLS residuals,” Econometrica, 60, 271-285.

Qu, Z. and P. Perron (2007) “Estimating and testing multiple structural changes in multivariate regressions,” Econometrica, 75, 459-502.

Quandt, R.E. (1958) “The estimation of the parameters of a linear regression system obeying two separate regimes,” Journal of the American Statistical Association, 53, 873-880.

Quandt, R.E. (1960) “Tests of the hypothesis that a linear regression system obeys two separate regimes,” Journal of the American Statistical Association, 55, 324-330.

Vogelsang, T.J. (1999) “Sources of nonmonotonic power when testing for a shift in mean of a dynamic time series,” Journal of Econometrics, 88, 283-299.

Xu, K.-L. (2015) “Testing for structural change under non-stationary variances,” Econometrics Journal, 18, 274-305.

Zhang, E. and J. Wu (2019) “Testing for structural changes in linear regressions with time-varying variance,” forthcoming in Communication in Statistics-Theory and Methods.

Zhou, Z. (2013) “Heteroscedasticity and autocorrelation robust structural change detection,” Journal of American Statistical Association, 108, 726-740.


Table 1: Exact size of the structural change tests under heteroskedasticity

a) Mean Model

                                  T = 100                     T = 300
Profile Param.    Magn.  supLR supLR∗3 CUSUM    XU   supLR supLR∗3 CUSUM    XU
VC0     -             0    5.0     4.0   3.8   6.4     5.0     4.1   4.5   5.3
VC1     0.3           5    6.6     3.7   6.1   6.4     6.6     2.3   5.8   6.0
                     10    5.7     4.5   4.1   5.7     6.4     2.9   5.8   6.1
        0.75          5   27.5     4.6  16.8   7.0    25.9     3.9  15.6   5.9
                     10   26.2     5.7  16.3   5.4    27.7     3.7  18.9   5.8
VC2     2             5    4.8     1.8   3.1   5.6     5.1     2.8   4.4   5.1
                     10    4.3     1.9   3.4   5.6     5.1     2.5   4.3   5.4
        4             5    3.8     3.5   1.7   2.9     4.7     2.4   3.1   5.1
                     10    5.8     4.0   3.7   5.4     5.4     2.5   4.1   5.3
VC3L    linear        5    8.6     3.9   5.7   6.1    10.1     3.4   6.6   5.7
                     10    8.1     4.1   4.5   4.8    10.0     3.6   6.6   5.8
VC3Q    quadratic     5   13.6    10.5   7.4   4.4    13.8     5.2   9.8   5.6
                     10   14.3     9.7   8.5   5.9    15.0     5.3  10.6   5.4
VC4     0.3           5    3.3     1.8   2.7   3.8     4.8     2.2   4.7   4.5
                     10    4.2     0.6   3.9   4.0     7.1     3.9   6.8   3.9
        0.5           5    1.7     1.7   2.5   5.9     2.2     3.2   3.4   4.9
                     10    0.3     1.6   0.7   2.4     1.8     4.0   3.1   4.8
        0.7           5    3.1     1.0   2.8   4.0     4.3     2.1   4.4   4.0
                     10    2.9     0.9   2.9   3.0     7.5     3.7   7.3   4.5

b) Mean Model (HAC)

                                  T = 100                     T = 300
Profile Param.    Magn.  supLR supLR∗3 CUSUM    XU   supLR supLR∗3 CUSUM    XU
VC0     -             0    6.9     5.7   3.8   6.2     5.5     5.5   3.9   6.2
VC1     0.3           5    9.3     3.2   4.6   5.8     7.0     2.1   5.7   5.8
                     10    8.3     2.8   3.2   4.8     6.8     2.0   5.5   4.8
        0.75          5   30.8     9.1  15.0   4.9    27.0     7.0  15.7   4.9
                     10   31.6     7.9  15.2   3.3    30.3     8.5  19.6   3.3
VC2     2             5    6.4     2.1   3.4   5.0     6.2     2.0   4.3   5.0
                     10    5.7     2.0   2.7   5.0     5.6     3.0   4.2   5.0
        4             5    6.3     4.9   2.2   3.5     5.5     3.8   3.1   3.5
                     10    6.4     5.6   2.2   4.4     6.5     3.9   4.3   4.4
VC3L    linear        5    9.6     6.9   3.6   3.9     9.5     4.8   6.2   3.9
                     10   10.6     6.2   5.0   5.3    11.6     4.4   7.2   5.3
VC3Q    quadratic     5   15.7     7.5   7.0   3.9    14.3     5.6   8.2   3.9
                     10   16.9     7.1   7.6   5.0    16.3     6.6  10.2   5.0
VC4     0.3           5    3.2     1.6   1.7   2.7     4.4     1.3   3.5   2.7
                     10    3.1     0.6   2.1   1.7     5.5     0.9   4.1   1.7
        0.5           5    2.8     0.9   1.7   4.4     2.9     1.5   3.6   4.4
                     10    1.1     0.2   1.6   4.0     1.0     1.0   1.4   4.0
        0.7           5    4.1     1.4   2.3   2.8     4.3     1.9   3.9   2.8
                     10    2.7     0.7   1.7   1.8     5.9     1.3   4.8   1.8

Table 1 (cont’d): Exact size of the structural change tests under heteroskedasticity

c) Zero-Mean Regressor

                                  T = 100                     T = 300
Profile Param.    Magn.  supLR supLR∗3 CUSUM    XU   supLR supLR∗3 CUSUM    XU
VC0     -             0    3.9     5.4   4.1   5.6     5.1     4.1   3.1   4.1
VC1     0.3           5    7.3     4.9   4.6   6.2     7.4     2.9   4.0   4.2
                     10    8.5     7.1   4.7   6.0     7.5     3.8   4.9   6.0
        0.75          5   35.6     5.8  15.3   5.8    37.0     3.8  15.8   4.1
                     10   39.2     6.7  16.0   5.3    40.9     5.8  19.3   4.9
VC2     2             5    5.4     4.8   4.3   5.7     5.5     2.1   4.0   4.4
                     10    5.3     2.5   4.2   5.9     5.2     3.1   4.4   5.8
        4             5    5.6     4.6   2.5   4.8     5.4     2.1   3.7   4.5
                     10    7.2     5.6   4.2   5.7     6.6     3.4   4.5   5.5
VC3L    linear        5    8.7     7.5   3.1   3.4    11.4     4.0   5.1   4.3
                     10   11.0     6.8   4.2   4.9    12.4     4.8   7.0   6.3
VC3Q    quadratic     5   18.3    15.5   6.0   3.4    18.8     7.7   8.6   4.8
                     10   19.5    17.0   7.8   5.2    21.0     9.9   9.2   5.5
VC4     0.3           5    4.9     2.6   3.5   4.6     6.9     1.7   5.8   5.4
                     10    7.1     1.4   2.7   2.6    11.5     4.3   7.5   4.8
        0.5           5    1.9     1.4   2.8   5.3     1.9     2.5   3.0   5.1
                     10    0.9     3.0   0.8   2.9     1.3     4.3   3.1   4.4
        0.7           5    5.9     1.7   2.4   3.9     5.3     2.4   4.4   4.4
                     10    7.0     1.8   3.1   2.9    10.9     4.0   7.5   5.2

d) Dynamic Model

                                  T = 100                     T = 300
Profile Param.    Magn.  supLR supLR∗3 CUSUM    XU   supLR supLR∗3 CUSUM    XU
VC0     -             0    3.8     3.4   2.5   4.1     5.2     4.8   4.7   5.8
VC1     0.3           5   10.2     3.5   4.4   5.1    11.4     2.6   5.5   5.4
                     10    9.7     5.2   4.4   5.4    13.7     3.3   5.1   4.6
        0.75          5   37.7     4.6  13.8   4.8    40.6     3.1  17.0   6.0
                     10   46.7     4.8  18.7   5.7    50.7     3.9  18.4   4.2
VC2     2             5    8.7     3.8   2.9   4.7    11.6     4.7   3.9   5.7
                     10    7.0     4.6   2.5   4.3    12.0     3.7   3.8   4.0
        4             5    7.0     4.6   2.1   3.6    11.0     5.0   4.3   5.7
                     10    9.0     5.6   2.2   4.3    10.7     5.0   4.8   5.4
VC3L    linear        5    8.0     4.6   3.5   3.8    10.8     3.4   6.2   5.7
                     10   11.1     4.7   4.3   4.3    11.7     3.8   6.6   5.3
VC3Q    quadratic     5   13.7     9.1   6.3   4.3    17.8     7.3  10.2   5.7
                     10   17.8     9.1   7.3   3.8    17.4     5.9   9.2   5.1
VC4     0.3           5   38.3     8.5   2.6   3.1    56.4     7.4   4.2   3.9
                     10   63.2    14.1   1.0   1.3    88.8     6.7   5.5   3.0
        0.5           5   25.5     6.2   1.6   3.7    51.2     6.0   2.8   4.7
                     10   58.3    11.6   0.8   2.0    85.7     5.0   1.7   3.1
        0.7           5   25.6     4.3   2.0   3.2    51.5     5.9   3.9   3.7
                     10   54.7     5.1   0.8   1.0    85.1     6.9   5.6   3.2

Figure 1: The original and the modified variance profiles after accounting for one break

[Figure panels, each plotting the original and the modified variance profile: VC1 (break fraction 0.3 and 0.75); VC2 (parameter 2 and 4); VC3 (linear and quadratic); VC4 (spike location 0.3 and 0.5).]

Table 2: Asymptotic size of the supLR and supLR3 tests (%)

                         supLR                 supLR3
Profile Param.       1      5     10       1      5     10
VC0     -          5.1                   3.5
VC1     0.3        6.6    6.4    6.3     3.6    3.2    2.9
        0.75      13.8   29.0   29.4     3.3    4.1    2.3
VC2     2          5.6    4.6    6.6     5.6    4.7    6.5
        4          6.0    5.1    5.3     5.7    5.1    4.7
VC3L              10.1   12.3    9.6     6.3    7.4    6.2
VC3Q              16.3   15.4   16.7     8.8    8.1    9.2
VC4     0.3        6.1    8.7   15.0     4.4    3.6    4.5
        0.5        4.8    4.1    4.3     4.3    2.7    2.1
        0.7        3.5    9.3   12.4     2.9    2.8    4.6

Column headings 1, 5 and 10 are the magnitudes of the variance change.

Table 3: Measures of size distortions

                          Weighted measure                          Gini-type measure
                      supLR              supLR3                supLR              supLR3
Profile Param.     1     5    10      1     5    10         1     5    10      1     5    10
VC0     -       0.00               0.00                  0.00               0.00
VC1     0.3     0.63  0.89  0.91   0.00  0.00  0.00      0.37  0.52  0.54   0.00  0.00  0.00
        0.75    1.03  2.15  2.32   0.00  0.00  0.00      0.58  1.21  1.30   0.00  0.00  0.00
VC2     2       0.27  0.27  0.27   0.25  0.25  0.25      0.08  0.08  0.08   0.08  0.08  0.08
        4       0.15  0.15  0.15   0.17  0.17  0.17      0.14  0.14  0.14   0.14  0.14  0.14
VC3L            1.00  1.00  1.00   0.39  0.39  0.39      0.60  0.60  0.60   0.24  0.24  0.24
VC3Q            1.50  1.50  1.50   0.66  0.66  0.66      0.90  0.90  0.90   0.35  0.35  0.35
VC4     0.3     0.20  0.95  1.28   0.14  0.36  0.41      0.14  0.67  0.90   0.04  0.12  0.13
        0.5     0.18  0.88  1.18   0.15  0.53  0.64      0.11  0.56  0.75   0.10  0.36  0.43
        0.7     0.20  0.95  1.28   0.14  0.36  0.41      0.14  0.67  0.90   0.04  0.12  0.13

Column headings 1, 5 and 10 are the magnitudes of the variance change.

Figure 2: Measure of size distortions

[Panels: VC1 (0 = 0.3, 0 = 0.75); VC2 (= 2, = 4); VC3 (Linear, Quadratic); VC4 (0 = 0.3, 0 = 0.7). Series in each panel: original, modified. Both axes run from 0.0 to 1.0.]

Figure 3.1: Power of structural change tests when the coefficients have a single abrupt break; VC0

[Panels: Mean Model; Mean Model (HAC); Zero-Mean Regressor; Dynamic Model. Legend: SLR3, ZRB, GHK, XU, VS, ZWU. Horizontal axis 0 to 10, vertical axis 0.0 to 1.0.]


Power of structural change tests when the coefficients have a single abrupt break; VC1

[Four panels. Legend: SLR3, ZRB, GHK, XU, VS, ZWU. Horizontal axis 0 to 10, vertical axis 0.0 to 1.0.]


Power of structural change tests when the coefficients have a single abrupt break; VC3L

[Four panels. Legend: SLR3, ZRB, GHK, XU, VS, ZWU. Horizontal axis 0 to 40, vertical axis 0.0 to 1.0.]


Power of structural change tests when the coefficients follow a random walk; VC0

[Four panels. Legend: SLR3, ZRB, GHK, XU, VS, ZWU. Horizontal axis 0 to 10, vertical axis 0.0 to 1.0.]


Power of structural change tests when the coefficients follow a random walk; VC1

[Four panels. Legend: SLR3, ZRB, GHK, XU, VS, ZWU. Horizontal axis 0 to 10, vertical axis 0.0 to 1.0.]


Power of structural change tests when the coefficients follow a random walk; VC3L

[Four panels. Legend: SLR3, ZRB, GHK, XU, VS, ZWU. Horizontal axis 0 to 40, vertical axis 0.0 to 1.0.]
