
Predictive Regressions:

Method of Least Autocorrelation

Yan Liu∗

Emory University

Qi Zhu†

Emory University

February 1, 2006

Abstract

Conventional predictive regressions produce biased and inefficient coefficient estimates in small samples when the predicting variable is Gaussian first-order autoregressive and its innovations are highly correlated with the error series of the return. We propose a new estimation method (the method of least autocorrelation) to solve this problem, conditional on the assumption of serially uncorrelated innovations to the dependent variable. The simulation results demonstrate that our method produces more accurate and efficient point estimates of the true parameter and can significantly improve performance in power and size tests compared with the least-squares-based methods of Stambaugh (1999), Lewellen (2004), and Amihud and Hurvich (2004). Our empirical results provide some fresh evidence on the predictability of stock returns during the post-war period.

JEL classification: C12, C13, C32, G12

Key Words: Predictability, Method of Least Autocorrelation, Hypothesis Testing

∗Emory University, Atlanta, GA 30322; [email protected]
†Emory University, Atlanta, GA 30322; [email protected]


1 Introduction

The predictability of stock returns has been a central research topic in finance and economics

for at least five decades, since Kendall (1953). Empirical studies by Fama and French (1988)

and others1 document the predictability of stock returns using various lagged predictive financial

variables, including dividend yield, the earning-price ratio, the book-to-market ratio, and interest

rates. Tests of predictability are typically and predominantly assessed in the context of a

predictive regression model in which rates of return are regressed against the lagged values of

a stochastic explanatory (a.k.a. forecasting, predictive) variable. However, existing estimation

methods produce biased and inefficient coefficient estimates in small samples when the predictive

variable is Gaussian first-order autoregressive and its innovations are highly correlated with the

error series of the return. This paper addresses this problem by proposing a new estimation

method that obtains more efficient estimates with reduced bias.

Mankiw and Shapiro (1986) and Stambaugh (1986) first discern the econometric difficulties

inherent in predictive regressions when the regressor is Gaussian first-order autoregressive with

errors that are correlated with the error series of the dependent return variable. Nelson and

Kim (1993) show that in this case the OLS estimates of the slope coefficient and its standard errors are substantially biased, which leads to unreliable t-ratios for valid inference in finite samples.

This is why conventional tests for the predictability of stock returns based on the standard t-test from OLS tend to overreject the null of non-predictability in Monte Carlo simulations. This problem in predictive regressions is pervasive in empirical work, since many predictive financial variables are highly persistent and not really exogenous but lagged endogenous (predetermined), with disturbances co-varying with the disturbances of the regression. Such a simultaneity problem in the system of a univariate model for the forecasting variable and the predictive regression model violates the Gauss-Markov assumptions on which OLS estimation relies to deliver unbiased and efficient estimates.

A rather large literature on stock return predictability attempts to resolve this econometric problem of efficiently estimating and making valid inferences about the coefficient of the lagged predictive variable in the regression. These studies usually tackle the problem along two fronts.

One approach is to correct the bias of the OLS estimate, using information conveyed by the autoregressive process of the predictive variable. Kothari and Shanken (1997) and Stambaugh (1999) derive a bias expression for the OLS estimator of the slope coefficient based on Kendall's (1954) first-order bias-correction expression for the autocorrelation parameter in an AR(1) model. Amihud and

1 Keim and Stambaugh (1986), Campbell and Shiller (1988), Cutler, Poterba, and Summers (1991), Balvers, Cosimano, and McDonald (1990), Schwert (1990), Fama (1990), and Kothari and Shanken (1997).


Hurvich (2004) propose a two-stage augmented regression method implementing a second-order bias correction. Lewellen (2004) estimates and corrects the slope coefficient and its t-statistic by assuming a true AR(1) autoregressive coefficient close to one.

The other approach mainly focuses on the statistical inference for the regression coefficient. Nelson and Kim (1993) conclude that valid inferences cannot be drawn from conventional t tables. Hodrick (1992) finds substantial bias in test statistics for long-horizon forecasting in Monte Carlo experiments. Kothari and Shanken (1997) employ a nonparametric bootstrapping procedure to estimate the standard error of the bias-corrected coefficient estimate. Stambaugh (1999) derives the exact small-sample Bayesian posterior distributions for the regression parameters under different specifications of priors. Lewellen's (2004) conditional approach does a good job in power and size tests when the true autoregressive process of the forecasting variable is close to a unit root. Amihud, Hurvich, and Wang (2004) propose a new hypothesis testing procedure based on the augmented regression method. Polk, Thompson, and Vuolteenaho (2005) introduce a neural network to approximate the critical value of the t-statistic in predictive regressions. Finally, Campbell and Yogo (2005) develop a pretest to determine whether the conventional t-test leads to invalid inference, as well as a new Bonferroni test. It is noteworthy that the hypothesis testing procedures surveyed above almost universally rely on the biased or bias-corrected OLS estimation of the regression coefficient.2

In light of the econometric problems arising from typical OLS estimation, we propose a new and convenient method (the Method of Least Autocorrelation, MLA henceforth) with no reliance on the estimation of any unknown nuisance parameters. The method produces a more accurate and efficient estimate of the true slope coefficient conditional on one (and only one) commonly maintained assumption in the literature: serially uncorrelated innovations in the predictive regression.

As a matter of fact, the assumption of i.i.d. innovations has always been, either explicitly or implicitly, maintained in the predictive regression literature surveyed above.3 Our estimation method directly utilizes this assumption: it takes the zero autocorrelation of the true error series in the predictive regression as the population autocorrelation condition (akin to a population moment condition in the GMM framework) and the autocorrelations of the estimated residual series as the sample autocorrelation condition, the sample counterpart of the population autocorrelation condition. Analogous to method of moments estimators, the parameter of interest is estimated by matching the sample moment, or the sample autocorrelation condition in our case, to its corresponding

2 OLS estimates of the coefficient in predictive regressions are also widely used in the literature on out-of-sample forecasting, e.g., Goyal and Welch (2003a, 2003b).
3 An independently and identically distributed (i.i.d.) series is a sufficient condition for the series to be non-autocorrelated.


population moment, or the population autocorrelation condition here. Our estimator selects the parameter value for which the autocorrelations of the fitted residuals are as close as possible to those of the true innovations, which are all zero under the i.i.d. assumption. There are two potential motivations for this approach.

First, unlike previous methods built on the least-squares estimator, our approach does not rely on the problematic estimation of the unknown nuisance parameter in the autoregressive process of the predictive variable to sequentially correct the biased slope coefficient in the regression. The information conveyed in the time-series properties of the predictive variables is now irrelevant to the estimation of the parameter. Thus we avoid the dilemma of correcting a bias by using another biased estimate of an unknown nuisance parameter, as described above.

The second motivation arises particularly when we deal with models such as predictive regressions. The i.i.d. assumption on the innovations to stock returns is maintained not only for statistical convenience in estimation but also because of a fundamental belief about the financial market underlying the model specification. From a statistical perspective, a violation of zero serial correlation of the disturbances would pose additional estimation difficulties for the already troubled OLS estimate in predictive regressions. Though it remains unclear to what extent the absence of the i.i.d. disturbance assumption deteriorates OLS estimation, for the sake of both methods being valid, we would rather follow the conventional model setup in the large body of preceding literature on predictive regressions.4 Maybe more importantly, from a model specification perspective, the i.i.d. innovations guarantee that the predictive regression model is well-specified, such that the predictive variable is capable of predicting all the explainable variation in stock returns and leaves the unexplainable residuals as pure white noise.5 Of course, throughout this paper, as in most of the related literature, the model of predictive regressions is presumed to be correctly specified.6 Finally, from the perspective of market efficiency, Stephen Ross (2004, p. 42) argues that,

“Furthermore, it (market efficiency) implies that the future returns on assets will largely depend not on the information currently possessed by the market but, rather, on new information that comes to the market, that is, on news.”

4 In a sense, the number of assumptions required for our estimation method to be valid is dramatically reduced to only one, compared to the six classical assumptions for OLS.
5 See Luger (2005) for a detailed discussion of residual-based model misspecification tests.
6 Shanken and Tamayo (2005) propose a Bayesian approach to explore the possibility of model uncertainty in stock return predictability.


Thus, the innovations to asset returns in the predictive regressions possess the feature of “news”: they are independent (though not necessarily identically distributed) of each other, which in turn implies serially uncorrelated disturbances.

The rest of the paper is organized as follows. In Section 2, we present the model of the predictive regression and describe the MLA estimator. In Section 3, we demonstrate and compare the performance of MLA with existing estimation methods in Monte Carlo simulations. We apply the MLA to U.S. stock returns and reexamine the empirical evidence on predictability in Section 4. Section 5 concludes with a discussion of potential extensions of the method.

2 The Regression Model

2.1 Model Setup

We consider a single-variable predictive regression model, which is conventional in the literature. Let y_t denote the predictable variable in period t, for example, the excess stock return, and let x_{t-1} denote a predicting variable observed at t − 1, for example, the dividend yield.

The model is
$$y_t = \alpha + \beta x_{t-1} + e_t, \tag{1}$$
$$x_t = \theta + \rho x_{t-1} + u_t, \tag{2}$$
with observations t = 1, ..., T. The parameter β is the unknown parameter of interest. The parameter ρ is the unknown degree of persistence in the predicting variable x_{t-1}; the autoregressive coefficient ρ of x_t satisfies the constraint ρ < 1 to meet the stationarity condition for x_t.

The innovations (e_t, u_t) are serially independent and identically distributed, but they are correlated with each other. We assume they follow a bivariate normal distribution,
$$(e_t, u_t)' \overset{i.i.d.}{\sim} N(0, \Sigma), \qquad \mathrm{corr}(e_t, u_t) = \sigma_{eu}, \qquad \Sigma = \begin{pmatrix} \sigma_e^2 & \sigma_{eu} \\ \sigma_{ue} & \sigma_u^2 \end{pmatrix}.$$

The correlation between e_t and u_t violates the OLS assumption that the independent variable x_{t-1} is uncorrelated with the error e_t at all leads and lags. Therefore, the simple OLS estimate of β is upwardly biased. The estimation errors in the two equations are closely related,7
$$E(\hat{\beta} - \beta) = \gamma E(\hat{\rho} - \rho), \tag{3}$$

7 See the proof in Appendix A for details.


where $\gamma = \mathrm{cov}(e,u)/\mathrm{var}(u) = \sigma_{eu}\,\sigma_e/\sigma_u$.

The well-known downward bias in the estimate of ρ induces an upward bias in the estimate of β through the amplification factor γ (when γ is negative).

In order to correct this bias of the OLS estimate, the mainstream approach is to approximate the bias in the autoregressive estimate of the predictor variable. The following are three existing methods applying this approach.

2.2 Existing Methods

2.2.1 First-Order Bias Correction: Stambaugh (1999)

Kendall (1954) shows that the analytical form of $E(\hat{\rho} - \rho)$ is $-(1 + 3\rho)/n + O(n^{-2})$. Stambaugh (1999) applies this result and obtains the following first-order bias-corrected estimator,
$$\hat{\beta}_S = \hat{\beta} + \sigma_{eu}\frac{\sigma_e}{\sigma_u}\,(1 + 3\hat{\rho})/n, \tag{4}$$
where $\hat{\beta}$, $\hat{\rho}$, $\hat{e}$, and $\hat{u}$ are obtained from OLS estimation.

Stambaugh (1999) also derives the exact small-sample Bayesian posterior distribution for the estimates.
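To fix ideas, here is a minimal Python sketch of the correction in Eqn. (4); the regression helpers and the use of sample moments for σeu, σe, and σu are implementation assumptions on our part, not code from Stambaugh (1999).

```python
import numpy as np

def stambaugh_fbc(y, x):
    """First-order bias-corrected slope estimate, Eqn. (4); a sketch.

    y : returns y_1..y_n;  x : predictor x_0..x_n (one leading observation).
    """
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    X = np.column_stack([np.ones(n), x[:-1]])
    b = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS for Eqn. (1)
    p = np.linalg.lstsq(X, x[1:], rcond=None)[0]   # OLS for Eqn. (2)
    e, u = y - X @ b, x[1:] - X @ p                # fitted residuals
    # gamma = corr(e,u) * sd(e)/sd(u); Kendall: E(rho_hat - rho) = -(1 + 3 rho)/n
    gamma = np.corrcoef(e, u)[0, 1] * e.std(ddof=1) / u.std(ddof=1)
    return b[1] + gamma * (1 + 3 * p[1]) / n       # bias-corrected slope
```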

2.2.2 Augmented Regression Model: Amihud and Hurvich (2004)

Since $e_t = \gamma u_t + v_t$, the predictive regression (1) can be rewritten as
$$y_t = \alpha + \beta x_{t-1} + \gamma u_t + v_t, \tag{5}$$
where $v_t$ is independent of $x_t$ and $u_t$ at all leads and lags. Thus this regression meets the classical assumptions of OLS. The key is to estimate the correct $u_t$ from Eqn. (2). Amihud and Hurvich (2004) apply a second-order bias correction to the OLS estimate of ρ and then calculate the fitted error terms $\hat{u}^{AH}_t$. The approach takes the following steps:

1. Estimate Eqn. (2) by OLS and obtain $\hat{\rho}$. Then apply the second-order bias correction, $\hat{\rho}^{AH} = \hat{\rho} + (1 + 3\hat{\rho})/n + 3(1 + 3\hat{\rho})/n^2$;

2. With $\hat{\rho}^{AH}$, obtain the fitted residuals $\hat{u}^{AH}_t$;

3. With the fitted residuals $\hat{u}^{AH}_t$, estimate Eqn. (5) and obtain the bias-corrected estimate $\hat{\beta}^{AH}$.
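A minimal Python sketch of the three steps; retaining the OLS intercept when computing the step-2 residuals is our own implementation assumption:

```python
import numpy as np

def amihud_hurvich_arm(y, x):
    """Two-stage augmented regression estimate, Eqns. (2) and (5); a sketch."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    X = np.column_stack([np.ones(n), x[:-1]])
    # Step 1: OLS for Eqn. (2), then the second-order bias correction of rho
    theta_hat, rho_hat = np.linalg.lstsq(X, x[1:], rcond=None)[0]
    rho_ah = rho_hat + (1 + 3 * rho_hat) / n + 3 * (1 + 3 * rho_hat) / n**2
    # Step 2: fitted AR(1) residuals using the corrected rho (OLS intercept kept)
    u_ah = x[1:] - theta_hat - rho_ah * x[:-1]
    # Step 3: augmented regression y_t = a + b*x_{t-1} + g*u_t + v_t
    Z = np.column_stack([np.ones(n), x[:-1], u_ah])
    return np.linalg.lstsq(Z, y, rcond=None)[0][1]   # bias-corrected slope
```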


2.2.3 Lewellen's Method: Lewellen (2004)

Based on the empirical observation that the autoregressive coefficient ρ is close to unity for some predicting variables, Lewellen (2004) proposes a conservative bias-adjusted estimator,
$$\hat{\beta}_L = \hat{\beta} + \sigma_{eu}\frac{\sigma_e}{\sigma_u}\,(0.9999 - \hat{\rho}). \tag{6}$$
The beauty of this method lies in the improvement in the power of the test. When the predictor variable contains a unit root, Lewellen's (2004) test is a uniformly most powerful test. However, the method only applies to highly persistent predictor variables.
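The adjustment itself is a one-liner; the sketch below assumes its inputs come from the OLS fits of Eqns. (1) and (2):

```python
def lewellen_adjusted(beta_hat, gamma_hat, rho_hat, rho_true=0.9999):
    """Lewellen's conservative bias adjustment, Eqn. (6); a sketch.

    gamma_hat = corr(e, u) * sd(e)/sd(u) from the OLS residuals; assuming the
    true rho is (nearly) one, rho_true - rho_hat bounds the bias in rho_hat.
    """
    return beta_hat + gamma_hat * (rho_true - rho_hat)
```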

These reduced-bias estimates of the regression coefficient critically hinge on the estimation of the autoregressive parameter ρ, an unknown nuisance parameter in this context. It is also well known that the OLS estimator of ρ can be wildly biased when the variable is highly persistent. The presence of a biased estimate of the nuisance parameter ρ, and its critical role in correcting the bias of the regression coefficient β, is undesirable for delivering an unbiased estimate of β. To avoid this problem, we might either follow Lewellen (2004) and restrict the true ρ to a fixed non-random number, or develop an estimator of the regression coefficient that is free from the estimation of the unknown nuisance parameter ρ.

2.3 Method of Least Autocorrelation (MLA)

The MLA is based on the sample autocorrelations of a covariance-stationary and invertible innovation process $e_t$. It is convenient to assume the innovations $e_t$ are serially independent and identically distributed (the i.i.d. assumption). The first k autocorrelations of the innovation $e_t$ are $\rho' = [\rho_1, \rho_2, ..., \rho_k]$.8 By the i.i.d. assumption, we have $\rho_j = 0$, $j = 1, ..., k$.

With a realization of the series $y_t$, $x_t$ and a given β, for $t = 2, ..., T$, the sample autocorrelations of the fitted residuals are given by $\hat{\rho}' = [\hat{\rho}_1, \hat{\rho}_2, ..., \hat{\rho}_k]$, where
$$\hat{\rho}_j = \frac{\sum_{t=1}^{T-j} \hat{e}_t \hat{e}_{t+j}}{\sum_{t=1}^{T} \hat{e}_t^2}, \tag{7}$$
and $\hat{e}_t$ is the fitted residual series.
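In code, Eqn. (7) is immediate; a minimal sketch:

```python
import numpy as np

def sample_autocorrs(e, k):
    """Sample autocorrelations rho_hat_1..rho_hat_k of a residual series, Eqn. (7)."""
    e = np.asarray(e, float)
    denom = e @ e                                   # sum of squared residuals
    return np.array([e[:-j] @ e[j:] for j in range(1, k + 1)]) / denom
```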

From Bartlett (1946), there is convergence in distribution,
$$\sqrt{T}\,(\hat{\rho} - \rho) \Rightarrow N(0, C), \tag{8}$$
where C is a k × k covariance matrix whose (i, j)th element is given by
$$C_{ij} = \sum_{h=1}^{\infty} \left(\rho_{h+i} + \rho_{h-i} - 2\rho_i \rho_h\right)\left(\rho_{h+j} + \rho_{h-j} - 2\rho_j \rho_h\right). \tag{9}$$

8 Notice that here ρ denotes the autocorrelations of the innovation, which are different from the autoregressive coefficient of the predictor variable.


By the assumption of serially uncorrelated innovations, we have $\rho_j = 0$, $j = 1, ..., k$ (in which case Eqn. (9) reduces to $C_{ij} = \delta_{ij}$, since only the $h = i$ term with $\rho_0 = 1$ survives), and there is convergence in probability,
$$\mathrm{plim}\ \hat{\rho} = O, \tag{10}$$
where O is a vector of zeros.

Our estimator minimizes the distance between the sample autocorrelations of the fitted residuals and the zero population autocorrelations of the true innovations. Based on this condition, the general form of the MLA estimator $\hat{\beta}_{MLA}$ is defined as
$$\hat{\beta}_{MLA} = \arg\min_{\beta}\ \hat{\rho}(\beta)' W \hat{\rho}(\beta), \tag{11}$$
where W is a weighting matrix and $\hat{\beta}_{MLA}$ is the MLA estimate of β that minimizes the quadratic objective function specified in Eqn. (11).

Here W is a symmetric, positive-definite weighting matrix. If W = I, the k × k identity matrix, we have the equally weighted form of the estimator.

The only assumption of the MLA is that the innovations are serially uncorrelated. In the predictive regression, the least-squares-based methods fail because the OLS assumption that the independent variable is unrelated to the innovations is violated. The MLA does not make this assumption, and the problem associated with least-squares-based methods disappears.

Similar to the generalized method of moments (GMM) estimator of Hansen (1982), the MLA estimator is obtained by minimizing the quadratic objective function (11). The GMM estimator makes use of the moment conditions $g_T(\beta) = T^{-1}\sum_{t=1}^{T} g(y_t, \beta)$, while the MLA estimator uses the sample autocorrelations of the fitted residuals, $\{\hat{\rho}_j,\ j = 1, ..., k\}$. Our method critically relies on the assumption of a serially uncorrelated innovation sequence, which is a broadly accepted assumption in the predictive regression literature.

Recall the criterion function
$$\hat{\beta}_{MLA} = \arg\min_{\beta}\ \hat{\rho}(\beta)' W \hat{\rho}(\beta),$$
with $\hat{\rho}' = [\hat{\rho}_1, \hat{\rho}_2, ..., \hat{\rho}_k]$.

In order to obtain the MLA estimator by minimizing this function, we need to choose the order k and the weighting matrix W. In practice, one way to select the weighting matrix is as follows (see the sketch after this list):

1. Select the order of the MLA, k;

2. Estimate the sample autocorrelations of the predicted variable $y_t$ up to order k, which gives $a' = [a_1, ..., a_k]$;

3. Then the weighting matrix is given by the k × k diagonal matrix $W^d$ with (j, j)th element $W^d_{jj} = [a_j^2]^{-1}$.
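Putting the pieces together, the following is a minimal sketch of the estimator with the diagonal weighting matrix above; concentrating out the intercept by demeaning the trial residuals, and the bounded one-dimensional search, are implementation choices of ours rather than part of the method's definition:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def _acf(e, k):
    """Sample autocorrelations up to order k, mirroring Eqn. (7)."""
    return np.array([e[:-j] @ e[j:] for j in range(1, k + 1)]) / (e @ e)

def mla_estimate(y, x, k=10):
    """MLA estimate of beta, Eqn. (11), with W_jj = 1 / a_j^2; a sketch."""
    y, xlag = np.asarray(y, float), np.asarray(x, float)[:-1]
    a = _acf(y - y.mean(), k)              # step 2: autocorrelations of y
    w = 1.0 / a**2                          # step 3: diagonal weights W^d
    def objective(beta):
        e = y - beta * xlag
        e = e - e.mean()                    # concentrate out the intercept
        r = _acf(e, k)
        return r @ (w * r)                  # rho(beta)' W rho(beta)
    res = minimize_scalar(objective, bounds=(-100.0, 100.0), method="bounded")
    return res.x
```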

It is in general the case that the asymptotic variance of the MLA estimator decreases as the order of the MLA increases. However, the performance of the estimator might decline when the number of autocorrelations is increased beyond a certain level. The reason for this trade-off is that, even though the information used by the estimator increases with the order of the MLA, the weighting matrix becomes harder to estimate precisely. Nevertheless, we found a substantial efficiency improvement from this estimated weighting matrix relative to the identity matrix. With a selected order of MLA and the weighting matrix, we can obtain the MLA estimator numerically by minimizing the quadratic objective function (11).9

A related study is Tieslau, Schmidt, and Baillie (1996), who estimate the fractional differencing parameter of an ARFIMA model by minimizing the difference between the sample and population autocorrelations of the dependent variable.

3 Simulations and Comparisons

The performance of the method of least autocorrelation is investigated and compared with alternative least-squares-based estimation methods in a Monte Carlo simulation study. We are interested in the unbiasedness and efficiency of the estimators, and in their respective inference performance in hypothesis testing. The data generating process (DGP) follows models (1) and (2) with i.i.d. bivariate normal error vectors. Throughout the simulations, the standard deviations of e_t and u_t are set to 4.068 and 0.043, obtained from Table 2 of Lewellen (2004),10 and β is arbitrarily fixed at 0.5.11
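For concreteness, a sketch of one simulated trial from this DGP; the default arguments, the zero values for alpha and theta, and the initialization of x at its unconditional mean are placeholders of ours (the paper fixes alpha and theta at values implied by Table 2 of Lewellen, 2004):

```python
import numpy as np

def simulate_predictive(T=120, beta=0.5, rho=0.95, corr_eu=-0.9,
                        sigma_e=4.068, sigma_u=0.043,
                        alpha=0.0, theta=0.0, rng=None):
    """One trial from models (1)-(2) with i.i.d. bivariate normal errors."""
    rng = np.random.default_rng(rng)
    cov = corr_eu * sigma_e * sigma_u
    Sigma = [[sigma_e**2, cov], [cov, sigma_u**2]]
    eu = rng.multivariate_normal([0.0, 0.0], Sigma, size=T)
    x = np.empty(T + 1)
    x[0] = theta / (1 - rho) if rho < 1 else 0.0   # placeholder initial condition
    for t in range(T):
        x[t + 1] = theta + rho * x[t] + eu[t, 1]   # Eqn. (2)
    y = alpha + beta * x[:-1] + eu[:, 0]           # Eqn. (1)
    return y, x
```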

We first briefly compare the performance of MLA estimators using different orders of autocorrelation for a given sample size. Then we report simulation results on the unbiasedness and efficiency of the point estimates from MLA (k = 10) and from alternative estimation methods. Finally, we implement hypothesis tests on the simulated data using MLA and alternative estimation/testing procedures, and compare their respective test powers under a wide range of parametric specifications.

9 The derivation of a general formula for the asymptotic variance of MLA estimators, as well as the optimal weighting matrix, is still under study.
10 We also experimented with a couple of other pairs of σe and σu values obtained from OLS estimation on actual data; the relative performance of the various estimation methods does not change qualitatively in the simulation results. Intuitively, increasing σe and σu uniformly deteriorates the performance of all methods.
11 The parameters α and θ are nuisance parameters of no statistical importance here. To be concrete, we fix the values of α and θ throughout the simulation at those implied by Lewellen's estimates in Table 2 of Lewellen (2004).


3.1 The Performance of the MLA Estimator when k = 2, 5, 10, and 20

The orders of autocorrelation, k, considered in the MLA estimation are chosen quite arbitrarily for a data set with 120 observations. The purpose of comparing MLA estimators with different k is to verify our conjecture that the variance of the MLA estimator in small samples decreases as the number of autocorrelations increases. However, there is a tradeoff between increasing the order of autocorrelation and losing precision in the estimated weighting matrix in any finite sample.12 Moreover, we find that increasing the number of autocorrelations also significantly raises the computing burden in the estimation, especially when a bootstrapping procedure is applied to draw the small-sample null distribution of the MLA estimator.

We consider a parameter space over ρ ∈ {.5, .525, . . . , .975, .99, .997, 1}13 and corr(e_t, u_t) ∈ {−.8, −.81, . . . , −.99}14 and generate 5000 simulated trials (data sets) for each pair of ρ and corr(e_t, u_t) from models (1) and (2) with T = 120. We estimate the slope coefficient for each data set using MLA with different numbers of autocorrelations, k = 2, 5, 10, and 20, respectively. From the 5000 point estimates for each pair of ρ and corr(e, u), we then compute the respective mean biases (Bias), standard deviations (SD), mean absolute deviations (MAD), and root-mean-square errors (RMSE).

We illustrate our findings in Figures 1 through 4 for a selection of ρ = 0.997, 0.95, 0.8, and 0.5. Figure 1 shows the mean bias of MLA estimates with different k. In general, there is a pattern for the mean bias of MLA estimators, regardless of k, to shift upward (from negative bias to positive bias) as ρ decreases. Estimations with small k (= 2, 5) tend to have smaller mean bias than those with large k (= 10, 20) at high ρ (= 0.997, 0.95), while this pattern reverses at ρ = 0.5, where high-order MLA estimators outperform the low-order estimators in mean bias; the two are indistinguishable for ρ in between.15

Figure 2 is a vivid illustration of our conjecture that in small samples the standard error of the estimator decreases as the number of autocorrelations used in estimation increases. The standard deviation of the MLA estimates with k = 20 stays the smallest, while that of the estimates with k = 2 is the

12 Thus, we speculate that there might exist an optimal k for any sample size T at which the marginal reduction in the variance of the MLA estimator is exhausted. However, the exact identification of the optimal k is beyond the scope of the current paper and certainly warrants future effort.
13 This covers a reasonable range of autoregressive coefficients for the various financial variables empirically used as predictive variables.
14 This comprises the typically observed correlations of disturbances between stock returns and potential predictive variables.
15 In practice, without explicit knowledge of the true ρ, the MLA estimator with large k may be preferred for the relative stability of the absolute magnitude of its mean bias across different ρ and for other desirable properties discussed below.


largest across all settings of ρ and corr(e, u). We further report mean absolute deviations (MAD) in Figure 3 and root-mean-square errors (RMSE) in Figure 4 to demonstrate that increasing the number of autocorrelations in the MLA estimation does improve the efficiency of the estimator to a certain degree in small samples.

3.2 Point Estimation of Slope Coefficient

We next apply five different estimation methods to the simulated data specified above: simple OLS (OLS), Stambaugh's first-order bias-corrected method (FBC), Amihud and Hurvich's (2004) augmented regression method (ARM), Lewellen's (2004) method (ρ ≈ 1), and the method of least autocorrelation (MLA, k = 10). Mean biases (Bias), standard deviations (SD), mean absolute deviations (MAD), and root-mean-square errors (RMSE) are calculated from the 5000 point estimates for each pair of ρ and corr(e, u) to assess the unbiasedness and efficiency of the different estimators.

We first report the mean bias of the point estimates from the different estimation methods in Figure 5. In Panel A, we set ρ = 0.997 in the DGP and vary corr(e_t, u_t) between −0.8 and −0.99. Among all estimation methods, simple OLS produces the largest mean (upward) bias. FBC and ARM considerably reduce the bias of the OLS point estimate but still exhibit sizable upward biases. It is unsurprising that Lewellen's method generates very small mean bias, as the true ρ = 0.997 is close enough to unity. The mean bias of the MLA estimates is comparable to, or literally indistinguishable from, Lewellen's for all corr(e, u).

Panel B reports the mean biases of estimates from the different estimation methods when ρ = 0.95, in which case MLA generates the smallest mean bias in absolute value of all methods. It also illustrates that (1) Lewellen's point estimates are no better than simple OLS estimates in absolute bias when the true ρ moves away from unity, and (2) FBC and ARM improve their bias correction of the OLS estimate when the regressor becomes less persistent. We remove the mean bias plot for Lewellen's method in Panel C, where ρ = 0.85, and focus on the comparison of mean biases among FBC, ARM, and MLA. The MLA estimates continue to be the least mean biased, while FBC and ARM further improve their bias-corrected estimates, with mean bias closer to zero than in Panel B. We last present Panel D with ρ = 0.5. The FBC and ARM bias-corrected estimates eventually fully correct the bias in the OLS estimate, with mean bias closest to zero, while the MLA estimates are slightly biased upward. It is obvious from these graphs for selected ρ that, in terms of the mean bias of point estimates, MLA dominates the rest of the estimation methods as the least biased estimator across a wide range of parametric specifications for ρ and corr(e, u).16

16 We have similar plots for every ρ ∈ {.5, .525, . . . , .975, .99, .997, 1}; results are available upon request.


In comparing the unbiasedness of point estimates from the various estimation methods, we are cautious in interpreting evidence from the mean-bias measure, as it ignores other important distributional characteristics of the point estimates. For instance, an estimator can generate point estimates with zero mean bias simply because its extremely large positively biased estimates happen to cancel out its equally large negatively biased estimates, an undesirable feature for any estimation method even if its mean bias looks completely fine. To better assess the unbiasedness of the MLA estimates, we employ the measure of mean absolute deviation (MAD)17 to detect any potential problem not captured by the mean bias.

Panel A of Figure 6 reports the MADs of the five competing estimation methods when ρ = 0.997. Among them, simple OLS unsurprisingly produces the largest MAD. The MADs of FBC and ARM are almost identical in the graph.18 Both methods reduce the bias of the simple OLS estimates and have smaller MADs. Under the setting of ρ = 0.997, a nearly unit-root autoregressive process for the predictive variable, Lewellen's method can, in principle, "exactly" correct the bias arising from OLS estimation. Conditional on the true nuisance parameter ρ = 1 (ρ = 0.997 in the simulation), Lewellen's method is shown to be the least biased. The MAD of the MLA estimates scores higher than Lewellen's but considerably lower than those of FBC and ARM.

Panel B of Figure 6, where ρ = 0.95, illustrates another example of how a persistent regressor can bias the point estimate of the slope coefficient. Lewellen's estimates now have the largest MAD of all estimation methods, a dramatic deterioration in unbiasedness relative to the previous panel. The MADs of the FBC and ARM estimates shift slightly upward, though they remain lower than the MAD of the simple OLS estimates. The MLA generates the least biased estimates of all methods in terms of MAD. When ρ further declines to 0.85 in Panel C, or to as low as 0.5 in Panel D,19 the estimates obtained from MLA continue to beat the least-squares-based bias-corrected estimates by a large margin in MAD.

A more comprehensive picture of the MAD of the point estimates is presented in Figure 7. Each grid point corresponds to a pair of ρ and corr(e_t, u_t) in the simulation, and the Z-axis records the MAD of the point estimates for each pair. The MLA method consistently generates point estimates with MAD uniformly smaller than the MADs from the OLS, FBC, and ARM methods across a

17 Results for root-mean-square error (RMSE) are also available upon request.
18 As discussed in Sections 2.2.1 and 2.2.2, FBC and ARM differ only in the approximation order used for the corrected autoregressive coefficient ρ, the nuisance parameter used to correct the OLS estimate of the slope coefficient in the predictive regression.
19 Again, we remove the plot of Lewellen's MAD in both graphs to focus on FBC, ARM, and MLA.


wide range of ρ and corr(e, u). Panels 1 through 4 are alike in the shape of the surface, but they differ greatly in scale.

We are also interested in the efficiency of the different estimators. The smaller the dispersion of the estimates, the more efficient the estimator, ceteris paribus. We calculate the standard deviations of the point estimates obtained from the 5000 Monte Carlo trials for each pair of ρ and corr(e_t, u_t), as we do for the mean bias and MAD comparisons. Results are presented in Figure 8. For ρ = 0.997, 0.95, 0.8, and 0.5, the standard deviations of the point estimates from MLA are always the second smallest of all. Although the point estimates from Lewellen's method always have the smallest standard deviations for every ρ, this is of little statistical use, as we know its point estimates are wildly biased when ρ is not sufficiently close to unity. By construction, the point estimates of OLS, FBC, and ARM have standard deviations proportional to one another and slightly different in magnitude. Nonetheless, the standard deviations of the point estimates from OLS, FBC, and ARM are uniformly larger than those of the MLA estimates.

In summary, the results presented above demonstrate that the MLA estimator (k = 10) estimates the slope coefficient in predictive regressions more efficiently and with much reduced bias, compared to existing estimation methods based on least squares.

3.3 Power of Hypothesis Testing

The performance of hypothesis testing on the different estimates of the slope coefficient in predictive regressions is also investigated in a Monte Carlo study. We compare the rejection rates of the MLA test of predictability to the rejection rates of six alternative tests of predictability. The same parameter values are maintained for σe, σu, θ, and α as in the previous simulations. We further fix the correlation of disturbances between stock returns and predictive variables at −0.955, an ad hoc value obtained from Table 2 of Lewellen (2004). We then vary the true β ∈ {0, .4, .8, 1.2, 1.6, 2} and the nuisance parameter ρ ∈ {.999, .997, .995, .993, .99, .985, .975, .95, .9, .8}, and generate 1500 Monte Carlo simulations with T = 150 for each pair of β and ρ. When statistical inference is obtained from a bootstrapping procedure, we perform 500 bootstrap replications for each data set. One-sided hypothesis tests are performed at a nominal significance level of 5%. The null hypothesis, in all cases, is H0 : β = 0.

Table 1 shows rejection rates for the simple OLS test, the OLS bootstrapping test, Stambaugh's FBC bootstrapping test, Amihud and Hurvich's ARM test, Lewellen's conditional ρ ≈ 1 test, the Bonferroni joint test, and the MLA bootstrapping test.20 The first row of each panel in the table illustrates the size of the test for each testing procedure. Under the null hypothesis of β = 0,

20 Detailed testing procedures are documented in Appendix B.


the rejection rate, or the size of the test, should equal the nominal 5% significance level across different ρ. The rest of the rows in each panel report the power of the hypothesis test under various β-ρ pairs.

There are several key results in the table. First, the simple OLS test rejects too often under the null (the row with β = 0), as has long been recognized in the prior literature. Also, the rejection rates decrease monotonically in ρ for each true β = 0.4, 0.8, 1.2, 1.6, and 2; as the predictive variable becomes less and less persistent, the power of the test based on the standard t-statistic gradually declines.

Second, the tests based on OLS bootstrapping and Stambaugh's FBC bootstrapping perform fairly well in size, both generating rejection rates a little below the nominal 5% significance level. However, the power of both tests is in general unsatisfactory. They reject the null at best half of the time, even when the true β is as large as 2. Moreover, the power of the tests falls quickly as ρ gets smaller. For ρ ≤ 0.99, most of the rejection rates fall into the single digits, which makes these tests of predictability too conservative to trust.

Third, the test based on Amihud and Hurvich's ARM method has the right size when ρ is very close to 1 and over-rejects the null when ρ becomes smaller. The power of the test suffers the same weakness as the two bootstrap tests above: it rejects the null in only about half of the occasions even when the true β is far from zero. However, the ARM approach seems to overcome the problem of rapidly deteriorating power across ρ; there is no obvious declining trend in the rejection rates as the predictive variable becomes less persistent.

Fourth, Lewellen's conditional ρ ≈ 1 test by itself and the joint test based on the modified Bonferroni p-value reject too rarely under the null, with rejection rates always below the nominal 5% significance level. The power of the two tests is quite strong when ρ is very close to 1 and the true β is as large as 2. However, the power drops dramatically to near zero as ρ gets smaller, a property inherent to the construction of Lewellen's conditional test, which critically relies on the assumption that the true autoregressive coefficient of the predictive variable is close to 1.

Finally, the last panel of the table reports the size (first row) and the power of the test using the bootstrapped MLA. Compared to the above tests, the MLA bootstrap test has almost exactly the nominal 5% size across all ρ. Meanwhile, the power of the test is uniformly better than that of any of the above tests for all pairs of β and ρ. The rejection rate can be as high as 90% when the true β is as close to zero as 0.4, while none of the alternative tests can obtain a rejection rate above 50% at this level of β. As ρ decreases, the power of the test declines only gradually, much more slowly than the deterioration in the power of the OLS bootstrapping, FBC bootstrapping, Lewellen's conditional test, and the Bonferroni joint test. For instance, even when ρ is as low as 0.95, it still maintains a


fairly high rejection rate: around 60% for β = 0.4, 82.3% for β = 0.8, 92.2% for β = 1.2, 96.1% for β = 1.6, and 98% for β = 2.

In summary, the MLA bootstrap test of predictability significantly outperforms all the alternative tests in both size and power. In terms of size, the MLA bootstrap test consistently rejects the null at the nominal 5% significance level across all specifications of ρ, while the alternative tests either under-reject or over-reject the null by a much larger margin. In terms of power, the MLA bootstrap test is universally more powerful than the alternative tests considered here. Aware of the advantages of MLA estimation for hypothesis testing in a simulated world, we now take the method to real data to test for the predictability of stock returns using a variety of predictive financial variables.

4 Empirical Illustration

In this section, we use a common model of predictive regression studied by Stambaugh (1999), Lewellen (2004), Amihud and Hurvich (2004), Campbell and Yogo (2005), and Goyal and Welch (2005). We estimate models in which monthly stock returns are predicted by lagged financial variables, including the dividend-price ratio (DP), earnings-price ratio (EP), book-to-market ratio (BM), corporate bond returns (CORP), default yield spread (DFY), long-term government bond returns (LTGR), term spread (SPREAD), and Net Equity Expansion (NetIss). The predictability tests are carried out using the different testing/estimation procedures outlined above.

4.1 Data Construction and Descriptive Statistics

We obtain all of our data from Amit Goyal's website. He compiles and maintains a comprehensive data set comprising a wide range of historical financial time series collected from a number of different data sources and prepared for Goyal and Welch (2005). We next briefly describe the financial variables used in this paper; more technical details can be found in Goyal and Welch (2005). The whole sample period covers January 1946 to December 2003.

We define Stock Returns (R) as the continuously compounded returns, including dividends, on month-end values of the S&P 500 index from CRSP. Dividend Price ratio (DP) is the difference between the log of dividends and the log of prices. Dividends are twelve-month moving sums of dividends paid on the S&P 500 index. Earnings Price ratio (EP) is the difference between the log of earnings and the log of prices. Earnings are twelve-month moving sums of earnings on the S&P 500 index. Book to Market ratio (BM) is the ratio of book value to market value for the Dow Jones Industrial Average. Corporate Bond returns (CORP) are long-term corporate bond returns from Ibbotson's Stocks, Bonds, Bills and Inflation Yearbook. Default Yield Spread (DFY) is the difference between BAA- and AAA-rated corporate bond yields. Long Term Government Bond returns (LTGR) are from Ibbotson's Stocks, Bonds, Bills and Inflation Yearbook. Term Spread (SPREAD) is the difference between the long-term government bond yield and the T-bill rate. Net Equity Expansion (NetIss) is the twelve-month moving sum of net issues by NYSE-listed stocks divided by the total market capitalization of NYSE stocks.

Table 2 provides summary statistics of the data. Motivated by Lewellen's approach, we reorganize and split the data into four sub-samples, 1946-2003, 1946-1994, 1946-1972, and 1973-1994, to account for potential structural shifts in market fundamentals over time.

Table 2 also reports the autoregressive coefficient, ρ, of the forecasting variables in the predictive regression. It shows that some financial ratios are extremely persistent while others are not. DP, EP, and BM are the most persistent, with empirical autoregressive coefficients as large as 0.997. These three variables are of particular interest to Lewellen (2004), since his estimation method is conditional on the true ρ being close to 1. As the autoregressive process becomes less persistent, for variables such as DFY and SPREAD, we know from the simulation studies above that Lewellen's method is less trustworthy and performs poorly in both power and size. As ρ further declines into the range of CORP, NetIss, and LTGR, tests such as the OLS bootstrap or ARM have relatively stronger power. The purpose of considering all these variables in the empirical tests is to illustrate the capability of the MLA-based test to handle predictability tests for a wide range of financial variables with different autoregressive processes, as we have shown in the simulation studies. The only assumption under which the predictability test based on MLA is uniformly more powerful than all alternative methods is that the disturbances of the predictive regression are serially uncorrelated, or equivalently, that the predictive model we are estimating is correctly specified.

4.2 Empirical Results

This section presents empirical evidence of return predictability using various financial ratios over different subperiods. We first focus on BM, DP, and EP, all of which receive the most attention in the literature and have the most persistent autoregressive processes across all sub-samples. Next, we study the less persistent financial variables, such as DFY and SPREAD. Finally, we consider CORP, NetIss, and LTGR, none of which can be characterized as persistent financial ratios in the data. Using the MLA procedure, whose advantageous performance in estimation and hypothesis testing has been documented above, we hope to shed more light on the issue of stock return predictability.


4.2.1 Predicting with BM, DP, and EP

As shown in the summary statistics, BM, DP, and EP are the most persistent financial ratios of all the predictive variables considered here. We report the estimated coefficient β, t-statistic, p-value, R-squared from simple OLS, and the correlation of the fitted disturbances e and u, in turn, across four different subperiods in Panels A and B of Table 3. For each variable, we use six alternative estimation/testing methods.

Conditional on ρ ≈ 1, Lewellen (2004) finds strong evidence that DP predicts stock returns over various sample periods and somewhat "weaker" but still limited forecasting power for BM and EP. As a conditionally uniformly most powerful test when ρ ≈ 1, Lewellen's point estimate of the slope coefficient and testing power have been recognized as reliable by Campbell and Yogo (2005). We can thus compare the results from MLA to those from Lewellen's method first.

For DP, Lewellen's method, again, finds strong evidence of predictability, with p-values below the 1% significance level in three of the four subperiods, the exception being the 1973-1994 period.21 The test result from MLA concurs with Lewellen's finding in subperiod 1946-1994, rejecting the null of no predictability at conventional significance levels; the point estimate of the slope coefficient is 0.339 in MLA vs. 0.541 in Lewellen's method. For the period 1946-1972, our method marginally fails to reject the null at the 10% significance level, with an estimated slope of 0.23, while Lewellen's method finds evidence of predictability with an estimate almost three times larger. For the period 1973-1994, during which Lewellen's method generates a slope estimate of 0.295 with a p-value slightly above the 10% significance level, MLA estimates the slope coefficient to be 0.383, which is statistically different from zero at the 10% significance level. The most striking conflict between the results of Lewellen's method and the MLA method occurs in the 1946-2003 sample. The test statistic from the MLA bootstrap approach implies no predictive power of DP for stock returns, which disagrees with Lewellen's conclusion but seems consistent with the results from the OLS BT, FBC BT, and ARM approaches.

As for BM and EP, the MLA method finds predictive power for BM in the period 1946-1972 and for EP in the period 1946-1994, respectively, while Lewellen's method indicates that BM has forecasting power in both periods ending in 1994 and that EP is an effective predictive variable only for the period 1946-2003. It is interesting to note that BM becomes less persistent, with ρ = 0.976, during the period 1946-1972, where MLA finds evidence for the predictability of returns. This casts some reasonable doubt on the validity of Lewellen's conditional test in circumstances where the predictive variable has an autocorrelation below 0.99, since the power of the test using Lewellen's method can drop dramatically toward zero, as we have shown in the previous Monte Carlo experiment. Thus we are cautious in interpreting Lewellen's results as definitive evidence against BM as an effective

21 This is a subperiod not studied in Lewellen (2004).


forecasting variable for returns, since the most powerful test under the condition of ρ = 0.976, such as MLA, might be more trustworthy.

The results from the rest of the estimation methods are less interesting to interpret here; in the face of highly persistent predictive variables, they are likely to have either severe bias in point estimation or distorted size and power in hypothesis testing, or both.

4.2.2 Predicting with DFY and SPREAD

We now turn to a set of predictive variables, DFY and SPREAD, whose persistence is modest, typically with ρ less than 0.98 in all sub-periods. Obviously, Lewellen's method is incapable of handling the predictability tests efficiently in this circumstance. Therefore, we are interested in comparing the results of MLA to those of the OLS bootstrap, FBC, and ARM methods across the different sample periods.

Simply put, using the MLA method we can only conclude that SPREAD has forecasting power for returns in the period 1946-2003, with an estimated slope coefficient of 0.36. However, the test results from both FBC BT and ARM strongly suggest that DFY and SPREAD are significant forecasters of stock returns in all periods. Comparing the slope estimates of MLA with those of FBC BT/ARM, we find a general pattern: the slopes estimated by MLA are usually much smaller in magnitude than those from FBC BT or ARM. It is worth noting that, judging from the simulation studies, there should not be much discrepancy in the estimates of β between the MLA and FBC/ARM approaches if the true data generating process is correctly specified by the predictive regression model in (1) and (2). One possibility for such conflicting test results and point estimates between MLA and FBC/ARM is model misspecification, such as missing variables or nonlinearity, which is obviously beyond the research scope of the current paper.

4.2.3 Predicting with CORP, NetIss and LTGR

The last set of predictive variables consists of CORP, NetIss, and LTGR. These three variables have the least persistent autoregressive processes, with empirical ρ between 0.2 and 0.02 across the different subperiods.

The MLA test indicates that LTGR can predict stock returns in the 1946-2003 period, which is also confirmed by the alternative testing methods based on least-squares estimation, such as FBC BT and ARM. Beyond that, the MLA finds no evidence of predictability for any variable in any period. These results stand in striking contrast to those from the FBC BT and ARM methods, which find significant forecasting ability for CORP and LTGR in all periods.


5 Conclusion and Extension

The literature has provided divergent evidence on stock return predictability over the last 30 years, mainly because the estimation problems of predictive regressions have not been solved efficiently. Most existing methods concentrate on least-squares-based estimation to draw inference. This paper addresses the problem by proposing a new estimation method, the method of least autocorrelation, conditional on the assumption that the true disturbances to stock returns are serially uncorrelated. Our method produces more efficient estimates with reduced bias in Monte Carlo experiments. The simulation results also indicate that the power and size properties are significantly improved compared with existing methods based on least squares.

We empirically apply our estimation method to a number of predictive variables from the literature to test for the predictability of stock returns. Conditioning on the prior belief that the true innovations to returns are i.i.d., we obtain point estimates consistent with the assumption and draw inferences accordingly. The MLA estimates differ to a certain degree from the estimates of existing methods, while the testing results are in general similar to one another.

The method proposed in this paper is intuitively analogous to the GMM and is easy to implement when the first-order autocorrelation is considered. The methodology can be generalized to include higher orders of autocorrelation as conditional "quasi-moments" with an appropriate weighting matrix. The parameter can be obtained numerically by minimizing the distance between the autocorrelation(s) of the fitted residuals and their theoretical counterparts, which are zero under the i.i.d. assumption. The method can easily be extended to multiple-predictor models.

The MLA approach does not depend on the assumption of identical conditional variance. Therefore the method is also robust to the presence of heteroskedasticity in the disturbances to stock returns. The only required condition is that the autocorrelation of the true innovations is, or is believed to be, close to zero.

A more ambitious application of MLA is to test the market efficiency hypothesis or model misspecification, both of which have strong implications for the serial autocorrelation of model residuals, as in Luger (2005).


A Proof 1

Consider the predictive regression model,
$$y_t = \alpha + \beta x_{t-1} + e_t,$$
$$x_t = \theta + \rho x_{t-1} + u_t,$$
where
$$(e_t, u_t)' \overset{i.i.d.}{\sim} N(0, \Sigma), \qquad \mathrm{corr}(e_t, u_t) = \sigma_{eu}, \qquad \Sigma = \begin{pmatrix} \sigma_e^2 & \sigma_{eu} \\ \sigma_{ue} & \sigma_u^2 \end{pmatrix}.$$

Let $b = (\alpha\ \ \beta)'$ and $p = (\theta\ \ \rho)'$. Then the OLS estimates of $b$ and $p$ are given by
$$\hat{b} = b + (X'X)^{-1}X'e, \tag{12}$$
$$\hat{p} = p + (X'X)^{-1}X'u. \tag{13}$$

Since $\mathrm{corr}(e_t, u_t) = \sigma_{eu} \neq 0$, we have
$$e_t = \gamma u_t + v_t, \tag{14}$$
where $\gamma = \mathrm{cov}(e,u)/\mathrm{var}(u) = \sigma_{eu}\,\sigma_e/\sigma_u$, and $v$ is white noise, independent of $u$ at all leads and lags.

Inserting Eqn. (14) into Eqn. (12), we have
$$\hat{b} - b = (X'X)^{-1}X'(\gamma u + v) = \gamma (X'X)^{-1}X'u + (X'X)^{-1}X'v = \gamma(\hat{p} - p) + \eta, \tag{15}$$
where $\eta = (X'X)^{-1}X'v$ and $E(\eta) = 0$. Taking expectations on both sides of Eqn. (15), we have
$$E(\hat{\beta} - \beta) = \gamma E(\hat{\rho} - \rho).$$


B Estimation Methods used in Hypothesis Testing

B.1 Simple OLS (OLS)

We use the standard t-statistic to draw inference on the point estimate from the simple OLS regression. It has been widely recognized since Mankiw and Campbell (1991) that simple OLS estimation of the slope coefficient in predictive regressions rejects the null hypothesis of no predictability too often in simulation studies. We present the testing results from simple OLS estimation mainly as a benchmark for the alternative testing procedures.

B.2 OLS Bootstrapping (OLS BT)

Bootstrapping the standard error of the OLS estimate of the slope coefficient in the predictive regression has been used by Nelson and Kim (1993), Kothari and Shanken (1997), and Polk, Thompson, and Vuolteenaho (2005). We specifically use the parametric bootstrapping procedure following Polk, Thompson, and Vuolteenaho (2005) and Amihud, Hurvich, and Wang (2004); technical details can be found in Amihud, Hurvich, and Wang (2004).
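A stripped-down sketch of the resimulation step is given below, assuming bivariate-normal innovations and β = 0 imposed under the null; the helper names (ols_slope, fit_nuisance, simulate_null) are ours, and the cited papers give the exact procedure.

```python
import numpy as np

def ols_slope(y, x_lag):
    """Slope from the OLS regression of y on a constant and the lagged predictor."""
    X = np.column_stack([np.ones(len(x_lag)), x_lag])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def fit_nuisance(y, x):
    """OLS fit of theta, rho and the innovation covariance, with beta = 0
    imposed on the return equation (y is y_1..y_T, x is x_0..x_T)."""
    X = np.column_stack([np.ones(len(y)), x[:-1]])
    theta, rho = np.linalg.lstsq(X, x[1:], rcond=None)[0]
    u = x[1:] - theta - rho * x[:-1]
    e = y - y.mean()                     # return residuals under beta = 0
    return theta, rho, np.cov(np.vstack([e, u]))

def simulate_null(y, x, theta, rho, Sigma, rng):
    """One parametric-bootstrap sample (returns, lagged predictor) under beta = 0."""
    T = len(y)
    ev, uv = rng.multivariate_normal([0.0, 0.0], Sigma, size=T).T
    xs = np.empty(T + 1)
    xs[0] = x[0]
    for t in range(T):
        xs[t + 1] = theta + rho * xs[t] + uv[t]
    return y.mean() + ev, xs[:-1]

# e.g. 500 bootstrapped slopes under the null, for a percentile critical value:
# rng = np.random.default_rng(0)
# theta, rho, Sigma = fit_nuisance(y, x)
# slopes = [ols_slope(*simulate_null(y, x, theta, rho, Sigma, rng))
#           for _ in range(500)]
```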

B.3 Stambaugh’s First Order Bias Correction Bootstrapping (FBC BT)

The standard error of the estimate is obtained as in the OLS bootstrapping method, except that the point estimate of the slope coefficient is corrected using Stambaugh's first-order bias-correction formula.
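Concretely, combining Kendall's (1954) first-order expression E(ρ̂ − ρ) ≈ −(1 + 3ρ)/T with the relation E(β̂ − β) = γE(ρ̂ − ρ) from Appendix A gives the plug-in correction sketched below; this is our reading of the standard formula, not code from the paper.

```python
def stambaugh_fbc(beta_hat, rho_hat, gamma_hat, T):
    """First-order bias-corrected slope estimate.
    gamma_hat is cov(e, u) / var(u) estimated from the fitted residuals;
    the slope bias is approximately -gamma_hat * (1 + 3 * rho_hat) / T."""
    return beta_hat + gamma_hat * (1 + 3 * rho_hat) / T
```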

B.4 Amihud and Hurvich’s Augmented Regression Method (ARM)

The point estimate and its estimated standard error are obtained using the augmented regression method proposed in Amihud and Hurvich (2004). The standard t-statistic is appropriate for drawing inferences on the estimate. See Amihud and Hurvich (2004) and Amihud, Hurvich, and Wang (2004) for details.

B.5 Lewellen’s Method (Lewellen)

Lewellen (2004) assumes ρ ≈ 1 to estimate an upper bound for the bias in the estimate of the slope coefficient. The bias-corrected predictive coefficient and its estimated standard deviation yield an ordinary t-statistic for hypothesis testing. See Lewellen (2004) for details.

B.6 Bonferroni Joint Test (Joint)

A Bonferroni test combines a conditional and an unconditional test to calculate an overall significance level that reflects the probability of rejecting using either test. Lewellen (2004) proposes using a p-value equal to min(2P, P + D), where P is the smaller of the two stand-alone p-values and D is the p-value for testing ρ = 1, based on the sampling distribution of ρ̂. We take the p-values obtained from Lewellen's method and from Stambaugh's bootstrapping method as the two stand-alone conditional and unconditional tests, respectively, as in the snippet below.
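In code the combination is a one-liner; in this sketch p_cond and p_uncond are the two stand-alone p-values and p_rho1 is the p-value for testing ρ = 1 (the names are ours).

```python
def bonferroni_joint_p(p_cond, p_uncond, p_rho1):
    """Modified Bonferroni p-value of Lewellen (2004): min(2P, P + D)."""
    P = min(p_cond, p_uncond)      # smaller stand-alone p-value
    return min(2 * P, P + p_rho1)  # D = p-value for testing rho = 1
```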

B.7 Method of Least Autocorrelation Bootstrapping (MLA)

We follow the parametric bootstrap procedure of Polk, Thompson, and Vuolteenaho (2005) to obtain the (one-sided) 95th-percentile critical value from the distribution of MLA point estimates, bootstrapped 500 times under the null of β = 0 for each Monte Carlo repetition. The MLA point estimate of the slope coefficient from that Monte Carlo data set is then compared with the bootstrapped 95th-percentile critical value; if it exceeds the critical value, we reject the null. Because the combined Monte Carlo and bootstrap computations are costly, we set the order of autocorrelations for all MLA estimations to k = 5. As discussed above, including higher orders of autocorrelation in the estimation improves the unbiasedness and efficiency of the MLA estimator, so we conjecture that the power of the test with higher orders would be even better than what we report here.
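Putting the pieces together, one Monte Carlo repetition of this test might look like the sketch below, reusing the hypothetical mla_beta, fit_nuisance, and simulate_null helpers from the earlier sketches.

```python
import numpy as np

def mla_bootstrap_test(y, x, k=5, n_boot=500, level=0.95, rng=None):
    """One-sided bootstrap test of beta = 0 based on the MLA point estimate.
    y is y_1..y_T; x is x_0..x_T. Rejects when the sample MLA estimate
    exceeds the bootstrapped 95th-percentile critical value."""
    rng = rng or np.random.default_rng()
    beta_hat = mla_beta(y, x[:-1], k)          # MLA estimate on the data
    theta, rho, Sigma = fit_nuisance(y, x)     # nuisance fit with beta = 0 imposed
    boot = np.empty(n_boot)
    for b in range(n_boot):
        ys, x_lag = simulate_null(y, x, theta, rho, Sigma, rng)
        boot[b] = mla_beta(ys, x_lag, k)       # MLA estimate on each null sample
    return beta_hat > np.quantile(boot, level)
```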


References

Amihud, Y. and C. Hurvich, 2004, Predictive Regressions: A Reduced-Bias Estimation Method.

Journal of Financial and Quantitative Analysis, 39, 813-841.

Amihud, Y., C. Hurvich and Y. Wang, 2004, Hypothesis Testing in Predictive Regressions. Work-

ing Paper, New York University.

Bacchetta, Philippe, and Eric van Wincoop, 2000, Does exchange-rate stability increase trade and

welfare? American Economic Review 90, 1093-1109.

Backus, David K., Patrick J. Kehoe, and Finn E. Kydland, 1992, International real business

cycles. Journal of Political Economy 100, 745-775.

Balvers, Ronald J., Thomas F. Cosimano, and Bill McDonald, 1990, Predicting Stock Returns in

an Efficient Market, Journal of Finance, 45, 1109-1128.

Bartlett, M.S., 1946, On the theoretical specification and sampling properties of autocorrelated time-series. Journal of the Royal Statistical Society B8, 27-41.

Campbell, J. and R. Shiller, 1988, The dividend-price ratio and expectations of future dividends

and discount factors, Review of Financial Studies, 1, 195-227.

Campbell, J. and M. Yogo, 2005, Efficient Tests of Stock Return Predictability. forthcoming,

Journal of Financial Economics.

Cutler, David M., James M. Poterba, and Lawrence H. Summers, 1991, Speculative Dynamics,

Review of Economic Studies, 58, 529-546.

Fama, E.F., French, K.R., 1988, Dividend Yields and expected stock returns. Journal of Financial

Economics, 22, 3-26.

Fama, Eugene F., 1990, Stock Returns, Real Returns, and Economic Activity, Journal of Finance,

45, 1089-1108.

Goyal, A. and I. Welch, 2003a, Predicting the equity premium with dividend ratios. Management

Science, 49, 639-654.

Goyal, A. and I. Welch, 2003b, A Note on "Predicting Returns with Financial Ratios". Working Paper.

Hodrick, R.J., 1992, Dividend yields and expected stock returns: Alternative procedures for

inference and measurement. Review of Financial Studies, 5, 357-386.

Keim, D.B., Stambaugh, R.F., 1986, Predicting returns in the stock and bond markets. Journal

of Financial Economics, 17, 357-390.

Kendall, M.G., 1953, The analysis of economic time series, Part I: Prices. Journal of the Royal Statistical Society, Series A, 116, 11-25.


Kendall, M.G., 1954, Note on Bias in the Estimation of Autocorrelation. Biometrika, 41, 403-404.

Kothari, S.P., Shanken, J., 1997, Book-to-market, dividend yield, and expected market returns:

A time-series analysis. Journal of Financial Economics, 44, 169-203.

Lewellen, J., 2004, Predicting returns with financial ratios. Journal of Financial Economics, 74,

209-235.

Luger, Richard, 2005, Exact Permutation Tests for Non-nested Non-linear Regression Models,

Journal of Econometrics, forthcoming.

Mankiw, N.G. and M.D. Shapiro, 1986, Do we reject too often? Small sample properties of tests of rational expectations models. Economics Letters, 20, 139-145.

Nelson, C.R., Kim, M.J., 1993, Predictable stock returns: The role of small sample bias. Journal

of Finance, 48, 641-661.

Polk, C., S. Thompson and T. Vuolteenaho, 2005, Cross-sectional forecasts of the equity premium. Forthcoming, Journal of Financial Economics.

Ross, Stephen, 2004, Neoclassical Finance. Princeton University Press, p. 42.

Schwert, G. William, 1990, Stock Returns and Real Activity: A Century of Evidence, Journal of

Finance, 45, 1237-1257.

Shanken, Jay and Ane Tamayo, 2005, Dividend Yield, Risk, and Mispricing: A Bayesian Analysis. Mimeo, Emory University.

Stambaugh, R., 1986, Bias in Regressions with Lagged Stochastic Regressors. Working Paper,

University of Chicago.

Stambaugh, R., 1999, Predictive Regressions. Journal of Financial Economics, 54, 375-421.

Tieslau, M.A., Schmidt, P. and Baillie, R.T., 1996, A minimum distance estimator for long

memory processes. Journal of Econometrics, 71, 249-264.


[Figure 1 here. Four panels (Panel A: ρ = 0.997; Panel B: ρ = 0.950; Panel C: ρ = 0.850; Panel D: ρ = 0.500) plot the bias against σ_{e,u} for MLA with k = 2, 5, 10, 20.]

Figure 1: Simulated Mean Bias for MLA

Consider the data-generating process: y_t = α + βx_{t−1} + e_t; x_t = θ + ρx_{t−1} + u_t, with i.i.d. bivariate normal innovations. The plot shows the bias of the method-of-least-autocorrelation estimates with order k = 2, 5, 10, 20. β is set to 0.5 and ρ is set as indicated in the figure. The grid values range over σ_{e,u} ∈ {−0.99, −0.98, ..., −0.81, −0.80}. For each grid point there are 5,000 Monte Carlo simulations of T = 120 observations. The bias is calculated as Bias = β̂ − β and averaged across the simulations, where β̂ is the parameter estimate from the above method and β = 0.5 is the true parameter.


[Figure 2 here. Four panels (Panel A: ρ = 0.997; Panel B: ρ = 0.950; Panel C: ρ = 0.850; Panel D: ρ = 0.500) plot the standard deviation (S.D.) against σ_{e,u} for MLA with k = 2, 5, 10, 20.]

Figure 2: Simulated Standard Deviation (S.D.) for MLA

Consider the data-generating process: y_t = α + βx_{t−1} + e_t; x_t = θ + ρx_{t−1} + u_t, with i.i.d. bivariate normal innovations. The plot shows the standard deviation of the method-of-least-autocorrelation estimates with order k = 2, 5, 10, 20. β is set to 0.5 and ρ is set as indicated in the figure. The grid values range over σ_{e,u} ∈ {−0.99, −0.98, ..., −0.81, −0.80}. For each grid point there are 5,000 Monte Carlo simulations of T = 120 observations. The S.D. is calculated as S.D. = std(β̂), where β̂ is the parameter estimate from the above method.


[Figure 3 here. Four panels (Panel A: ρ = 0.997; Panel B: ρ = 0.950; Panel C: ρ = 0.850; Panel D: ρ = 0.500) plot the mean absolute deviation (MAD) against σ_{e,u} for MLA with k = 2, 5, 10, 20.]

Figure 3: Simulated Mean Absolute Deviation (MAD) for MLA

Consider the data-generating process: y_t = α + βx_{t−1} + e_t; x_t = θ + ρx_{t−1} + u_t, with i.i.d. bivariate normal innovations. The plot shows the MAD of the method-of-least-autocorrelation estimates with order k = 2, 5, 10, 20. β is set to 0.5 and ρ is set as indicated in the figure. The grid values range over σ_{e,u} ∈ {−0.99, −0.98, ..., −0.81, −0.80}. For each grid point there are 5,000 Monte Carlo simulations of T = 120 observations. The MAD is calculated as MAD = |β̂ − β| and averaged across the simulations, where β̂ is the parameter estimate from the above method and β = 0.5 is the true parameter.


[Figure 4 here. Four panels (Panel A: ρ = 0.997; Panel B: ρ = 0.950; Panel C: ρ = 0.850; Panel D: ρ = 0.500) plot the root mean square error (RMSE) against σ_{e,u} for MLA with k = 2, 5, 10, 20.]

Figure 4: Simulated Root Mean Square Error (RMSE) for MLA

Consider the data-generating process: y_t = α + βx_{t−1} + e_t; x_t = θ + ρx_{t−1} + u_t, with i.i.d. bivariate normal innovations. The plot shows the RMSE of the method-of-least-autocorrelation estimates with order k = 2, 5, 10, 20. β is set to 0.5 and ρ is set as indicated in the figure. The grid values range over σ_{e,u} ∈ {−0.99, −0.98, ..., −0.81, −0.80}. For each grid point there are 5,000 Monte Carlo simulations of T = 120 observations. The RMSE is calculated as RMSE = ((1/N) Σ (β̂ − β)²)^{1/2} across the N = 5,000 Monte Carlo replications, where β̂ is the parameter estimate from the above method and β = 0.5 is the true parameter.


[Figure 5 here. Four panels (Panel A: ρ = 0.997; Panel B: ρ = 0.950; Panel C: ρ = 0.850; Panel D: ρ = 0.500) plot the bias against σ_{e,u} for OLS, FBC, Lewellen, MLA, and ARM.]

Figure 5: Simulated Mean Bias Comparison

Consider the data-generating process: y_t = α + βx_{t−1} + e_t; x_t = θ + ρx_{t−1} + u_t, with i.i.d. bivariate normal innovations. 'OLS' reports the bias of the simple OLS estimates, 'FBC' the bias of Stambaugh's (1999) first-order bias-adjusted estimates, 'ARM' the bias of Amihud and Hurvich's (2004) augmented regression estimates, 'Lewellen' the bias of Lewellen's (2004) bias-adjusted estimates, and 'MLA' the bias of the method-of-least-autocorrelation estimates. β is set to 0.5 and ρ is set as indicated in the figure. The grid values range over σ_{e,u} ∈ {−0.99, −0.98, ..., −0.81, −0.80}. For each grid point there are 5,000 Monte Carlo simulations of T = 120 observations. The bias is calculated as Bias = β̂ − β and averaged across the simulations, where β̂ is the parameter estimate from the above methods and β = 0.5 is the true parameter.


[Figure 6 here. Four panels (Panel A: ρ = 0.997; Panel B: ρ = 0.950; Panel C: ρ = 0.850; Panel D: ρ = 0.500) plot the MAD against σ_{e,u} for OLS, FBC, Lewellen, MLA, and ARM.]

Figure 6: Simulated Mean Absolute Deviation (MAD) Comparison

Consider the data-generating process: y_t = α + βx_{t−1} + e_t; x_t = θ + ρx_{t−1} + u_t, with i.i.d. bivariate normal innovations. 'OLS' reports the MAD of the simple OLS estimates, 'FBC' the MAD of Stambaugh's (1999) first-order bias-adjusted estimates, 'ARM' the MAD of Amihud and Hurvich's (2004) augmented regression estimates, 'Lewellen' the MAD of Lewellen's (2004) bias-adjusted estimates, and 'MLA' the MAD of the method-of-least-autocorrelation estimates. β is set to 0.5 and ρ is set as indicated in the figure. The grid values range over σ_{e,u} ∈ {−0.99, −0.98, ..., −0.81, −0.80}. For each grid point there are 5,000 Monte Carlo simulations of T = 120 observations. The MAD is calculated as MAD = |β̂ − β| and averaged across the simulations, where β̂ is the parameter estimate from the above methods and β = 0.5 is the true parameter.


[Figure 7 here. Four surface panels (MLA (k = 10), OLS, FBC, and ARM) plot the MAD against ρ and σ_{e,u}.]

Figure 7: Simulated Mean Absolute Deviation (MAD) Comparison

Consider the data-generating process: y_t = α + βx_{t−1} + e_t; x_t = θ + ρx_{t−1} + u_t, with i.i.d. bivariate normal innovations. 'OLS' reports the MAD of the simple OLS estimates, 'FBC' the MAD of Stambaugh's (1999) first-order bias-adjusted estimates, 'ARM' the MAD of Amihud and Hurvich's (2004) augmented regression estimates, and 'MLA' the MAD of the method-of-least-autocorrelation estimates. The grid values range over σ_{e,u} ∈ {−0.99, −0.98, ..., −0.81, −0.80} and ρ ∈ {0.5, 0.525, ..., 0.95, 0.975, 0.99, 0.997, 1}. For each grid point there are 5,000 Monte Carlo simulations of T = 120 observations. The MAD is calculated as MAD = |β̂ − β| and averaged across the simulations, where β̂ is the parameter estimate from the above methods and β = 0.5 is the true parameter.


[Figure 8 here. Four panels (Panel A: ρ = 0.997; Panel B: ρ = 0.950; Panel C: ρ = 0.850; Panel D: ρ = 0.500) plot the standard deviation (S.D.) against σ_{e,u} for OLS, FBC, Lewellen, MLA, and ARM.]

Figure 8: Simulated Standard Deviation (S.D.) Comparison

Consider the data-generating process: y_t = α + βx_{t−1} + e_t; x_t = θ + ρx_{t−1} + u_t, with i.i.d. bivariate normal innovations. 'OLS' reports the standard deviation of the simple OLS estimates, 'FBC' that of Stambaugh's (1999) first-order bias-adjusted estimates, 'ARM' that of Amihud and Hurvich's (2004) augmented regression estimates, 'Lewellen' that of Lewellen's (2004) bias-adjusted estimates, and 'MLA' that of the method-of-least-autocorrelation estimates. β is set to 0.5 and ρ is set as indicated in the figure. The grid values range over σ_{e,u} ∈ {−0.99, −0.98, ..., −0.81, −0.80}. For each grid point there are 5,000 Monte Carlo simulations of T = 120 observations. The S.D. is calculated as S.D. = std(β̂), where β̂ is the parameter estimate from the above methods.


Table 1.

Power Test (5%)

The table reports power for ordinary least squares (OLS), OLS bootstrapping (OLS BT), Stambaugh's (1999) first-order bias-correction bootstrapping (FBC BT), Amihud and Hurvich's (2004) augmented regression method (ARM), Lewellen's (2004) conditional test (ρ ≈ 1), the Bonferroni joint test (Joint), and the method of least autocorrelation (MLA) with order k = 5. The models are: y_t = α + βx_{t−1} + e_t and x_t = θ + ρx_{t−1} + u_t. The null is H0: β ≤ 0 and the significance level is 5%. β and ρ are set as indicated in the table. There are 1,500 Monte Carlo simulations and 500 bootstrapping simulations. The bootstrap is carried out as follows: for each grid point we simulate 1,500 sample data sets, and for each simulated sample we bootstrap 500 new data sets from the model with bivariate normal innovations, setting β = 0 and the other parameters to their OLS estimates. We set the bootstrapped critical value to the 95th percentile of the bootstrapped t-statistics.

Test       β     ρ = 0.999  0.997  0.995  0.993  0.990  0.985  0.975  0.950  0.900  0.800

OLS 0.0 0.449 0.473 0.467 0.489 0.527 0.477 0.445 0.360 0.257 0.165

0.4 0.634 0.654 0.623 0.631 0.634 0.626 0.582 0.412 0.252 0.154

0.8 0.808 0.808 0.781 0.775 0.749 0.736 0.659 0.484 0.333 0.187

1.2 0.893 0.885 0.875 0.847 0.849 0.796 0.733 0.547 0.378 0.229

1.6 0.944 0.943 0.945 0.924 0.918 0.886 0.815 0.621 0.409 0.249

2.0 0.981 0.981 0.970 0.963 0.968 0.933 0.894 0.695 0.453 0.285

OLS BT 0.0 0.027 0.029 0.033 0.029 0.023 0.030 0.023 0.035 0.054 0.044

0.4 0.057 0.054 0.068 0.057 0.056 0.049 0.049 0.038 0.048 0.036

0.8 0.136 0.122 0.088 0.070 0.064 0.067 0.062 0.059 0.061 0.052

1.2 0.271 0.200 0.179 0.142 0.130 0.079 0.074 0.065 0.066 0.071

1.6 0.392 0.310 0.282 0.204 0.151 0.126 0.084 0.091 0.083 0.088

2.0 0.544 0.478 0.348 0.297 0.251 0.190 0.129 0.103 0.099 0.098

FBC BT 0.0 0.025 0.025 0.030 0.023 0.019 0.025 0.021 0.027 0.045 0.032

0.4 0.058 0.043 0.065 0.048 0.047 0.044 0.046 0.029 0.036 0.025

0.8 0.139 0.116 0.073 0.066 0.053 0.057 0.051 0.051 0.042 0.044

1.2 0.261 0.191 0.166 0.118 0.110 0.071 0.061 0.050 0.045 0.054

1.6 0.367 0.282 0.251 0.182 0.132 0.113 0.063 0.076 0.053 0.063

2.0 0.507 0.435 0.322 0.269 0.210 0.162 0.099 0.081 0.071 0.073

(To be cont’d)


Table 1. (cont’d)

Power Test (5%)

Test       β     ρ = 0.999  0.997  0.995  0.993  0.990  0.985  0.975  0.950  0.900  0.800

ARM 0.0 0.049 0.065 0.088 0.105 0.122 0.173 0.187 0.165 0.115 0.081

0.4 0.062 0.089 0.105 0.144 0.171 0.191 0.231 0.203 0.138 0.091

0.8 0.113 0.141 0.166 0.185 0.237 0.243 0.271 0.229 0.175 0.111

1.2 0.189 0.201 0.221 0.259 0.279 0.346 0.313 0.292 0.193 0.127

1.6 0.279 0.297 0.329 0.356 0.358 0.427 0.400 0.305 0.222 0.155

2.0 0.431 0.447 0.479 0.478 0.511 0.510 0.476 0.375 0.271 0.161

ρ ≈ 1 0.0 0.029 0.012 0.003 0.002 0.003 0.001 0.000 0.000 0.000 0.000

0.4 0.382 0.160 0.075 0.030 0.010 0.002 0.000 0.000 0.000 0.000

0.8 0.795 0.594 0.399 0.197 0.069 0.019 0.001 0.000 0.000 0.000

1.2 0.921 0.865 0.772 0.597 0.339 0.069 0.004 0.000 0.000 0.000

1.6 0.974 0.953 0.906 0.817 0.634 0.273 0.023 0.000 0.000 0.000

2.0 0.990 0.983 0.970 0.942 0.851 0.539 0.100 0.000 0.000 0.000

Joint 0.0 0.034 0.017 0.016 0.011 0.007 0.011 0.010 0.013 0.020 0.015

0.4 0.305 0.111 0.079 0.040 0.025 0.018 0.020 0.017 0.017 0.011

0.8 0.740 0.522 0.330 0.157 0.054 0.028 0.019 0.026 0.017 0.024

1.2 0.892 0.825 0.692 0.521 0.264 0.065 0.032 0.021 0.020 0.027

1.6 0.961 0.926 0.890 0.787 0.562 0.219 0.034 0.036 0.022 0.039

2.0 0.987 0.977 0.953 0.932 0.804 0.480 0.091 0.041 0.034 0.041

MLA 0.0 0.049 0.057 0.043 0.051 0.045 0.045 0.045 0.054 0.042 0.042

(k = 5) 0.4 0.914 0.947 0.979 0.961 0.895 0.920 0.715 0.564 0.326 0.145

0.8 0.976 0.996 0.958 0.986 0.984 0.930 0.933 0.823 0.589 0.183

1.2 0.979 0.996 0.971 0.994 0.994 0.970 0.963 0.922 0.785 0.301

1.6 0.992 0.996 0.985 0.996 0.996 0.991 0.988 0.961 0.836 0.483

2.0 0.995 0.999 0.996 0.996 0.997 0.996 0.995 0.98 0.923 0.486


Table 2.

Summary Statistics, 1946-2003, 1946-1994, 1946-1972 and 1973-1994

The table reports summary statistics for the S&P 500 index return (R), book-to-market (log(BM)), dividend yield (log(DP)), earnings-price ratio (log(EP)), default yield spread (DFY), term spread (Spread), inflation (Inf), corporate bond returns (Corp), net issuing (NetIss), and long-term government bond returns (LTGR). Observations are monthly and the variables are expressed in percent; log(x) equals the natural log of x expressed in percent. The data come from Amit Goyal's dataset prepared for Goyal and Welch (2005). ρ is the first-order autocorrelation.

            1946.01 - 2003.12                      1946.01 - 1994.12
        Mean   S.D.  Skew.  Kurt.  ρ          Mean   S.D.  Skew.  Kurt.  ρ

R 0.598 4.206 -0.573 5.113 0.016 0.557 4.103 -0.526 5.565 0.020

log(BM) 3.964 0.515 -0.992 3.405 0.995 4.143 0.304 -0.179 2.685 0.990

log(DP) 1.229 0.415 -0.706 3.380 0.995 1.367 0.265 0.549 2.194 0.987

log(EP) 1.918 0.418 -0.065 2.557 0.994 2.023 0.349 0.297 2.060 0.992

DFY 0.907 0.412 1.528 5.510 0.974 0.922 0.435 1.442 5.028 0.975

Spread 1.496 1.308 -0.033 3.414 0.950 1.409 1.300 -0.115 3.475 0.945

Corp 0.513 2.223 0.573 7.685 0.152 0.461 2.228 0.789 8.206 0.171

NetIss 0.037 5.724 0.734 5.551 0.122 0.005 5.777 0.794 5.870 0.136

LTGR 0.489 2.527 0.645 6.872 0.064 0.427 2.493 0.904 7.613 0.067

            1946.01 - 1972.12                      1973.01 - 1994.12
        Mean   S.D.  Skew.  Kurt.  ρ          Mean   S.D.  Skew.  Kurt.  ρ

R 0.592 3.703 -0.425 3.077 0.041 0.515 4.554 -0.570 6.537 0.003

log(BM) 4.124 0.179 0.216 2.096 0.976 4.168 0.406 -0.333 1.832 0.993

log(DP) 1.374 0.290 0.696 2.118 0.991 1.358 0.231 0.090 1.811 0.979

log(EP) 1.975 0.335 0.828 2.629 0.993 2.082 0.357 -0.281 2.050 0.990

DFY 0.683 0.204 1.073 4.586 0.973 1.216 0.461 0.951 3.353 0.957

Spread 0.980 0.682 -0.181 3.004 0.952 1.935 1.642 -0.919 3.385 0.933

Corp 0.219 1.414 0.801 6.800 0.144 0.759 2.909 0.444 5.629 0.163

NetIss -0.958 4.968 0.189 3.115 0.075 1.187 6.452 0.942 6.162 0.127

LTGR 0.167 1.642 0.537 5.965 -0.038 0.747 3.221 0.629 5.321 0.081


Table 3.

Empirical Results

The table reports predictive regressions for S&P 500 index returns for five periods, 1946.01-

2003.12, 1946.01-1994.12, 1946.01-1972.12, 1973.01-1994.12 and 1995.01-2003.12.

The predictor variables include: book-to-market (BM), dividend yield (DP), earnings-price ratio (EP), default yield spread (DFY), term spread (Spread), inflation (Inf), corporate bond returns (Corp), net issuing (NetIss), and long-term government bond returns (LTGR). Observations are monthly and the variables are expressed in percent. BM, DP, and EP are expressed in percent and then taken in natural logs. The data come from Amit Goyal's dataset prepared for Goyal and Welch (2005).

'OLS' reports the simple OLS estimates, 'OLS BT' reports the bootstrapped p-value based on the simple OLS estimates, 'FBC BT' reports Stambaugh's (1999) first-order bias-adjusted estimate and bootstrapped p-value, 'ARM' reports Amihud and Hurvich's (2004) augmented regression estimates, 'ρ ≈ 1' reports Lewellen's (2004) bias-adjusted estimate and p-value assuming that ρ is close to one, 'Joint' reports the modified Bonferroni p-value based on 'FBC BT' and 'ρ ≈ 1', and 'MLA' reports estimates from the method of least autocorrelation with the bootstrapped t-statistic and p-value. The models are:

y_t = α + βx_{t−1} + e_t  (16)
x_t = θ + ρx_{t−1} + u_t  (17)

All of the p-values are one-sided and the null hypothesis is H0: β = 0.


Panel A: Empirical Results, 1946-2003 and 1946-1994

                         1946.01 - 2003.12                        1946.01 - 1994.12
                 β       t-stat   p-value  R²      σ_eu      β       t-stat   p-value  R²      σ_eu
BM      OLS      0.195    4.898    0.000   0.001   -0.737    0.802   19.729    0.000   0.003   -0.862
        OLS BT   0.195    4.898    0.022                     0.802   19.729    0.867
        FBC BT  -0.154      -      0.292                     0.254     -       0.001
        ARM     -0.156   -3.900    1.000                     0.252    6.167    0.000
        ρ ≈ 1    0.030    0.145    0.443                     0.409    1.443    0.075
        MLA BT  -0.303   -1.187    0.182                    -0.090   -0.314    0.621

DP      OLS      0.663    5.409    0.000   0.004   -0.978    1.637   13.552    0.000   0.011   -0.978
        OLS BT   0.663    5.409    0.846                     1.637   13.552    0.755
        FBC BT   0.118      -      0.385                     1.000     -       0.025
        ARM      0.115    0.937    0.174                     0.997    8.211    0.000
        ρ ≈ 1    0.400    4.990    0.000                     0.541    4.028    0.000
        MLA BT  -0.167   -0.258    0.502                     0.339    0.554    0.013

EP      OLS      0.722    8.922    0.000   0.005   -0.919    0.862   10.488    0.000   0.005   -0.940
        OLS BT   0.722    8.922    0.982                     0.862   10.488    0.010
        FBC BT   0.248      -      0.090                     0.288     -       0.040
        ARM      0.246    3.029    0.001                     0.285    3.450    0.000
        ρ ≈ 1    0.278    1.853    0.032                     0.066    0.403    0.343
        MLA BT  -0.140   -0.260    0.557                     0.183    0.481    0.057

DFY     OLS      0.750    4.700    0.000   0.005    0.033    0.995    6.031    0.000   0.011    0.039
        OLS BT   0.750    4.700    0.660                     0.995    6.031    0.035
        FBC BT   0.759      -      0.040                     1.006     -       0.011
        ARM      0.759    4.753    0.000                     1.006    6.098    0.000
        ρ ≈ 1    0.790    2.035    0.021                     1.037    2.666    0.004
        MLA BT  -0.134   -0.232    0.438                     0.058    0.094    0.382

Spread  OLS      0.230    2.866    0.002   0.005    0.032    0.288    3.278    0.001   0.008    0.029
        OLS BT   0.230    2.866    0.976                     0.288    3.278    0.409
        FBC BT   0.232      -      0.031                     0.290     -       0.015
        ARM      0.232    2.889    0.002                     0.290    3.299    0.001
        ρ ≈ 1    0.245    1.992    0.023                     0.304    2.313    0.011
        MLA BT   0.360    1.826    0.060                     0.018    0.078    0.571

CORP    OLS      0.206    2.966    0.002   0.012    0.251    0.260    3.535    0.000   0.020    0.290
        OLS BT   0.206    2.966    0.571                     0.260    3.535    0.569
        FBC BT   0.207      -      0.001                     0.262     -       0.000
        ARM      0.207    2.979    0.001                     0.262    3.552    0.000
        ρ ≈ 1    0.610    6.718    0.000                     0.705    7.504    0.000
        MLA BT  -0.026   -0.064    0.406                    -0.013   -0.040    0.433

NetIss  OLS      0.003    0.093    0.463   0.000   -0.673    0.004    0.123    0.451   0.000   -0.673
        OLS BT   0.003    0.093    0.979                     0.004    0.123    0.236
        FBC BT   0.002      -      0.482                     0.002     -       0.476
        ARM      0.002    0.058    0.477                     0.002    0.084    0.467
        ρ ≈ 1   -0.435  -15.808    1.000                    -0.413  -14.366    1.000
        MLA BT  -0.001   -0.003    0.484                     0.570    0.009    0.488

LTGR    OLS      0.156    2.523    0.006   0.009    0.182    0.196    2.956    0.002   0.014    0.248
        OLS BT   0.156    2.523    0.391                     0.196    2.956    0.053
        FBC BT   0.156      -      0.007                     0.197     -       0.000
        ARM      0.156    2.531    0.006                     0.197    2.968    0.002
        ρ ≈ 1    0.438    5.177    0.000                     0.574    6.430    0.000
        MLA BT  -0.023   -0.043    0.006                    -0.013   -0.023    0.494


Panel B: Empirical Results, 1946-1972 and 1973-1994

                         1946.01 - 1972.12                        1973.01 - 1994.12
                 β       t-stat   p-value  R²      σ_eu      β       t-stat   p-value  R²      σ_eu
BM      OLS      2.185   44.148    0.000   0.011   -0.832    0.477    7.115    0.000   0.002   -0.886
        OLS BT   2.185   44.148    0.196                     0.477    7.115    0.283
        FBC BT   1.235      -      0.000                    -0.779     -       0.675
        ARM      1.226   24.619    0.000                    -0.794  -11.731    1.000
        ρ ≈ 1    0.313    0.492    0.312                     0.494    1.536    0.063
        MLA BT   0.389    0.719    0.079                    -0.139   -0.106    0.654

DP      OLS      1.228    8.439    0.000   0.009   -0.977    2.381   11.751    0.000   0.014   -0.978
        OLS BT   1.228    8.439    0.934                     2.381   11.751    0.419
        FBC BT   0.067      -      0.193                     0.971     -       0.028
        ARM      0.056    0.380    0.352                     0.955    4.664    0.000
        ρ ≈ 1    0.671    4.464    0.000                     0.295    1.161    0.123
        MLA BT   0.230    0.282    0.111                     0.383    0.387    0.054

EP      OLS      1.065   10.437    0.000   0.009   -0.931    0.689    5.182    0.000   0.003   -0.949
        OLS BT   1.065   10.437    0.364                     0.689    5.182    0.074
        FBC BT   0.021      -      0.046                    -0.589     -       0.177
        ARM      0.012    0.115    0.454                    -0.604   -4.497    0.900
        ρ ≈ 1    0.403    1.803    0.036                    -0.279   -1.125    0.869
        MLA BT   0.140    0.282    0.112                    -0.153   -0.219    0.597

DFY     OLS      1.331    4.625    0.000   0.005   -0.006    1.724    8.102    0.000   0.030    0.069
        OLS BT   1.331    4.625    0.180                     1.724    8.102    0.845
        FBC BT   1.325      -      0.080                     1.758     -       0.004
        ARM      1.325    4.604    0.000                     1.758    8.264    0.000
        ρ ≈ 1    1.317    1.306    0.096                     1.821    3.007    0.001
        MLA BT   0.303    0.284    0.289                     0.056    0.064    0.269

Spread  OLS      0.640    3.748    0.000   0.014    0.092    0.275    2.497    0.007   0.010    0.015
        OLS BT   0.640    3.748    0.459                     0.275    2.497    0.173
        FBC BT   0.659      -      0.022                     0.277     -       0.067
        ARM      0.660    3.861    0.000                     0.277    2.512    0.006
        ρ ≈ 1    0.719    2.388    0.009                     0.283    1.636    0.051
        MLA BT   0.019    0.043    0.428                     0.032    0.125    0.455

CORP    OLS      0.214    1.497    0.068   0.007    0.190    0.280    3.039    0.001   0.032    0.358
        OLS BT   0.214    1.497    0.776                     0.280    3.039    0.734
        FBC BT   0.216      -      0.073                     0.283     -       0.000
        ARM      0.216    1.512    0.066                     0.283    3.069    0.001
        ρ ≈ 1    0.643    3.423    0.000                     0.748    6.434    0.000
        MLA BT  -0.587   -0.017    0.464                     0.004    0.008    0.461

NetIss  OLS     -0.057   -1.405    0.920   0.006   -0.700    0.050    1.169    0.122   0.005   -0.673
        OLS BT  -0.057   -1.405    0.237                     0.050    1.169    0.000
        FBC BT  -0.059      -      0.911                     0.047     -       0.126
        ARM     -0.059   -1.448    0.926                     0.047    1.104    0.135
        ρ ≈ 1   -0.538  -13.392    1.000                    -0.367   -8.589    1.000
        MLA BT   0.809    0.020    0.441                    -0.001    0.000    0.585

LTGR    OLS      0.338    2.740    0.003   0.023    0.089    0.155    1.827    0.034   0.012    0.347
        OLS BT   0.338    2.740    0.189                     0.155    1.827    0.541
        FBC BT   0.339      -      0.003                     0.157     -       0.042
        ARM      0.339    2.744    0.003                     0.157    1.852    0.033
        ρ ≈ 1    0.544    3.061    0.001                     0.605    5.463    0.000
        MLA BT  -0.759   -0.007    0.138                     0.004    0.014    0.148
