on mean variance portfolio optimization - college of business

43
On Mean Variance Portfolio Optimization: Improving Performance Through Better Use of Hedging Relations * Shingo Goto University of South Carolina Yan Xu University of Rhode Island November 2009 This version: February 2012 * We thank Yufeng Han, Jingzhi (Jay) Huang, Steven Mann, Zhan Shi, Walter Torous, Raman Uppal, Rossen Valkanov, Masahiro Watanabe, Hong Yan, Frank Yu, Yuzhao Zhang and Guofu Zhou for helpful comments and discussions. This research also benefits greatly from discussions with a number of investment professionals. We are especially grateful to Miguel Alvarez at Deutsche Bank Securities; Nick Baturin at Bloomberg; Vinod Chandrashekaran, Hong Leng Chuah, Mohanaraman Gopalan, Daniel Morillo, Jeff Shen, and Li Wang at Black- rock/BGI; Ernie Chow, Jonathan Howe, and Scott Shaffer at Senstato Capital Management; Julian Douglass and Matt Kurbat at CPP Investment Board; Xiaowei Li at Fullgoal Fund Management; David Stewart at Durant Capital Management; Dawn Jia at State Street Global Advisors; and Jim Wang at Bosera Funds. Of course all errors and inadequacy are the authors’ own. Moore School of Business, University of South Carolina, 1705 College St., Columbia, SC 29208, USA; phone: +1 803 777 6644; fax +1 803 777 6876; email: [email protected]. College of Business Administration, University of Rhode Island; 7 Lippitt Road, Kingston, RI 02881-0802, USA; phone: +1 401 874 4190; fax: +1 401 874 4312; email: yan [email protected].

Upload: others

Post on 09-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

On Mean Variance Portfolio Optimization: Improving

Performance Through Better Use of Hedging Relations ∗

Shingo Goto†

University of South Carolina

Yan Xu‡

University of Rhode Island

November 2009

This version: February 2012

∗We thank Yufeng Han, Jingzhi (Jay) Huang, Steven Mann, Zhan Shi, Walter Torous, Raman Uppal, Rossen

Valkanov, Masahiro Watanabe, Hong Yan, Frank Yu, Yuzhao Zhang and Guofu Zhou for helpful comments

and discussions. This research also benefits greatly from discussions with a number of investment professionals.

We are especially grateful to Miguel Alvarez at Deutsche Bank Securities; Nick Baturin at Bloomberg; Vinod

Chandrashekaran, Hong Leng Chuah, Mohanaraman Gopalan, Daniel Morillo, Jeff Shen, and Li Wang at Black-

rock/BGI; Ernie Chow, Jonathan Howe, and Scott Shaffer at Senstato Capital Management; Julian Douglass and

Matt Kurbat at CPP Investment Board; Xiaowei Li at Fullgoal Fund Management; David Stewart at Durant

Capital Management; Dawn Jia at State Street Global Advisors; and Jim Wang at Bosera Funds. Of course all

errors and inadequacy are the authors’ own.

†Moore School of Business, University of South Carolina, 1705 College St., Columbia, SC 29208, USA; phone:

+1 803 777 6644; fax +1 803 777 6876; email: [email protected].

‡College of Business Administration, University of Rhode Island; 7 Lippitt Road, Kingston, RI 02881-0802,

USA; phone: +1 401 874 4190; fax: +1 401 874 4312; email: yan [email protected].

On Mean Variance Portfolio Optimization: Improving

Performance Through Better Use of Hedging Relations

Abstract

Portfolio optimization achieves risk reduction beyond naıve diversification by ex-

ploiting hedging relations among stocks. Stevens (1998) shows that the inverse

covariance matrix prescribes the hedge trades where a portfolio of stocks hedges

each one with all other stocks to minimize portfolio risk. However, in a large port-

folio of, for example, 100 stocks, using the other 99 stocks to hedge one stock is not

necessarily optimal, as it entails large estimation errors due to multicollinearity. To

improve the mean-variance portfolio optimization, we propose a method to reduce

the effects of estimation errors in the hedge trades. We introduce a sparse inverse

covariance matrix estimator, which effectively reduces the number of stocks in each

hedge trade. Many, but not all, of the inverse are zero in the proposed inverse,

meaning that each hedge trade uses only a subset of the stocks in a given portfo-

lio. Supporting our motivation, portfolios formed on the proposed estimator achieve

significant out-of-sample risk reduction in a range of empirical applications. Their

out-of-sample performance compares favorably with those under no-short-sale re-

striction (Jagannathan and Ma (2003)), using shrunk covariance matrix (e.g. Ledoit

and Wolf (2003, 2004)), and using industry factor model (Chan, karceski and Lakon-

ishok (1999)). By improving hedge trades, our approach delivers higher certainty

equivalent returns after transaction costs.

Mean variance portfolio optimization relies on covariances to transform expected returns

into optimal portfolio weights. Consider a portfolio of N stocks. A portfolio manager uses an

asset pricing model to predict a vector of expected returns, denoted by µ, and employs a risk

model to predict the covariance matrix, denoted by Σ.1 Then, it is well known that her optimal

portfolio weights are summarized by

w ∝ Σ−1µ.

In this familiar result, the inverse covariance matrix Σ−1 plays a pivotal role in transforming

the return forecast µ into the optimal portfolio weights w. Therefore, we take the liberty of

calling the inverse covariance matrix, Ψ ≡ Σ−1, the mean variance optimizer.2

While simple and forceful in theory, the idea of mean variance portfolio optimization turns

out to be surprisingly difficult to implement. In many real world situations, the available number

of historical return observations per stock (T ) is not much larger than the number of stocks

(N).3 Then, the inverse covariance matrix (the mean variance optimizer) estimated from the

sample becomes very sensitive to small perturbations in the estimated covariances, making it

unstable from one period to another. As a consequence, the mean variance optimizer tends

to produce poor out-of-sample performance. The mean variance optimization tends to erode,

rather than enhance, the gains from naıve diversification policies such as equal-weighting (e.g.,

Jobson and Korkie (1981a), DeMiguel, Garlappi, and Uppal (2009).) Michaud (1989) even calls

the mean variance optimization the “error maximization.”4

1We assume that a riskless asset exists for both borrowing and lending. Unless otherwise stated, the termstock return in this paper refers to excess return, which is calculated by subtracting the return of the risklessasset from the total return.

2Inverse covariance matrix is also called “precision matrix” or “concentration matrix” (Dempster (1972)).

3For example, consider an active manager who runs a number of portfolios based on her proprietary sourcesof information (“signals”). Each signal portfolio tends not to have a large number of historical returns.

4Please see Brandt (2009) for a comprehensive review of the literature.

1

To regain our confidence in the mean variance optimization framework, we propose an im-

provement of the mean variance optimizer Ψ = Σ−1. In principle, the mean variance optimizer

achieves risk reduction beyond naıve diversification by hedging relations among stocks. Specifi-

cally, since stock returns are correlated, we can use each stock to hedge others. Motivated by this

simple idea, our proposed estimator of Ψ = Σ−1 delivers a significant portfolio risk reduction

out-of-sample. Our approach has three features that are different from previous contributions.

First, our motivation to enhance the hedging relations among stocks differentiates us from

many existing papers that start from assuming a structure on the return generating process

(e.g., linear factor model). To elaborate, Stevens (1998) shows that the inverse covariance

matrix of stock returns reveals the optimal hedging trades among stocks. Specifically, the i-th

row (or column) of Ψ is proportional to the stock’s minimum variance hedge portfolio. The

hedge portfolio consists of a long position in the i-th stock, and a short position in the “tracking

portfolio” of the other N − 1 stocks to hedge the i-th stock. We aim to enhance the mean

variance optimizer Ψ = Σ−1 by improving each hedge portfolio.

Second, we focus on estimating the inverse covariance matrix Ψ = Σ−1 by directly imposing

a structure on itself, rather than on the covariance matrix Σ first then inverting as existing

approaches do. For example, Chan, Karceski, and Lakonishok (1999) use a low dimensional

factor structure on the covariance matrix to improve the optimized portfolio’s out-of-sample

performance.5 Many commercial risk models (e.g. APT, Axioma, MSCI Barra, Northfield,

etc.) also appear to impose multivariate factor structures on the covariance matrix, though

their model details are proprietary and unknown.6 Ledoit and Wolf (2003, 2004a,b) shrink

5Fan, Fan, and Lv (2008) also report the advantage of employing a factor structure in optimal portfolioconstruction.

6There is also a voluminous work in the area of “robust portfolio optimization” with emphasis on the optimiza-tion process, especially from practitioners’ perspectives. Please see Fabozzi, Kolm, Pachamanova, and Focardi

2

the sample covariance matrix toward a more parsimonious target matrix, such as a constant

correlation matrix or a covariance matrix with a one factor structure. In contrast to them, our

approach applies an Occam’s razor to the inverse covariance matrix (mean variance optimizer)

Ψ = Σ−1, rather than to the covariance matrix Σ.

Third, our Occam’s razor promotes sparsity of Ψ = Σ−1. That is, our estimation method-

ology drives some of the off-diagonal elements of Ψ to zero, and these zero elements indeed

constitute a significant fraction of our estimator in empirical applications. Note that the spar-

sity of Ψ implies neither the sparsity of the covariance matrix Σ ≡ Ψ−1 nor the sparsity of the

optimal portfolio weights.7 It generally utilizes most stocks in the portfolio.

Our proposed estimator belongs to a wide class of shrinkage estimators, in that we bias

(shrink) the estimator in a direction to reduce estimation errors. However, the gains from

reducing estimation errors outweigh the costs of deviating from the optimal hedge suggested

by the past evidence. For example, each hedge portfolio is supposed to hedge one stock with

the other N − 1. But do we really need to use all the remaining N − 1 stocks? Does the fact

that the i-th and j-th stocks hedge each other strongly in the sample period imply that this

particular hedging relationship will remain in the future? With finite samples, these questions

are very difficult to answer. In this situation, we do not shy away from choosing a biased answer

– less hedging than the data suggests, or even no hedging – because such a solution reduces the

magnitude of estimation errors. As we show, our estimator of Ψ = Σ−1 is typically sparse in

empirical applications, meaning that each hedge portfolio includes only a subset of the stocks

in a given portfolio.

(2007) for a review. However, our approach is quite distinct from this venue of research.

7For example, a diagonal Ψ is sparse because all of its off-diagonal elements are zero. In this extreme case,for global minimum variance portfolio, each portfolio weight is the diagonal element itself thus non-zero.

3

To promote shrinkage and sparsity, we estimate Ψ by maximum likelihood with an addi-

tional constraint on the sum of the absolute values (i.e., l1 norm) of its off-diagonal elements.8

Intuitively, this estimation methodology works as follows: The constraint imposes a penalty on

the absolute quantities of the hedge trades, thus shrinking their overall trade size. Meanwhile,

different stocks compete with each other to enter the hedged portfolios. The methodology turns

off the j-th stock in the i-th hedged portfolio if the marginal gain from retaining the stock is

not worth the cost. This is called the “soft thresholding”. In effect, this approach restricts

the (i, j)th and (j, i)th elements of the inverse covariance matrix to be zero. In this way, we

encourage shrinkage and sparsity simultaneously in the estimation of Ψ.

Aiming for risk reduction, we test examine the out-of-sample performance of the proposed

sparse mean variance optimizer using a few representative datasets in the US and international

stock markets. Our datasets include portfolios available on Ken French’s webpage, and 100

portfolios of randomly selected 100 individual stocks from the NYSE/AMEX universe. The

proposed estimator of the mean variance optimizer accomplishes a substantial reduction in the

out-of-sample risk of the global minimum variance portfolio compared to the one based on the

sample covariance matrix. Empirically, the proposed estimator also compares favorably with

the portfolio with no-short-sale restriction (Jagannathan and Ma (2003)) and with the portfolio

based on the shrinkage covariance matrix estimator of Ledoit and Wolf (2004a). The proposed

also dominates industry factor models in improving portfolio performance of individual stocks.

The relative strength of our proposed estimator is more prominent in datasets with a large

number of assets per periods, N/T . The optimizer produces stable portfolio weights even when

8We follow Yuan and Lin (2007) and Rothman, Bickel, Levina, and Zhu (2008) in setting up the likelihoodfunction with the penalty, and employ the “glasso” algorithm of Friedman, Hastie, and Tibshirani (2008) to solvethe estimation problem.

4

N/T exceeds one, that is, when the sample covariance matrix is not invertible. The reduction in

the portfolio risk is also accompanied by a high and stable level of Sharpe ratios and certainty

equivalent returns. For many datasets, the proposed estimator of the inverse covariance matrix

Ψ delivers positive gains in certainty equivalent returns after accounting for transaction costs.

Furthermore, with the improved optimizer Ψ in place, no-short-sale constraint no longer helps

improve the out-of-sample portfolio performance. Consistent with Green and Hollifield (1992),

by improving hedge trades, Ψ helps achieve further portfolio risk reduction beyond the no-short-

sale constraint.

The rest of this paper is structured as follows. Section I. elaborates on Stevens’ (1998)

framework to motivate our approach. Section II. proposes an improved estimator of Ψ. After

setting up the out-of-sample portfolio analysis in Section III., Section IV. presents evidence on

its out-of-sample performance. Section V. concludes.

I. The Role of Hedging in Portfolio Risk Minimization

In order to motivate our approach, we first discuss the role of hedge portfolios in mean variance

portfolio optimization. Stevens (1998) shows that the inverse covariance matrix Σ−1 reveals

the optimal hedging relations among stocks. Specifically, the i-th row (or column) of Σ−1 is

proportional to the i-th stock’s hedge portfolio. The i-th hedge portfolio consists of taking (1)

a long position in i-th stock and (2) a short position in a portfolio of the other N − 1 stocks

that tracks the i-th stock return. Each tracking portfolio can be estimated from the following

regression:

ri,t = αi +

N∑k=1,k 6=i

βi|krk,t + εi,t, (1)

5

where ri,t denotes the i-th stock return in period t. εi,t is the unhedgeable component of ri,t,

whose variance is denoted by υi = V ar (εi,t). υi is a measure of the i-th stock’s unhedgeable

risk. The objective of each hedge regression is to minimize υi, and consequently, we can view

(1) as an OLS estimation problem. Stevens (1998) calls this a “regression hedge.” When βi|j is

different from zero in population, it implies that the j-th stock provides a greater hedge for the

i-th stock beyond the effects of other N − 2 stocks in the portfolio.

Let us denote the N×N covariance matrix and its inverse by Σ and define Ψ = Σ−1 =[ψij],

with ψij as the (i, j)th element. Then, Stevens (1998) establishes the following identity between

Ψ and the hedge regression (1):9

ψij =

−βi|j

υiif i 6= j

1υi

if i = j

. (2)

Remember that the hedge coefficient βi|j represents the ability of the j-th stock to hedge the

i-th stock beyond the effects of other N − 2 stocks. We can view ψij as a measure of marginal

hedgeability between the i-th and j-th stocks conditional on the presence of all other stocks in

the portfolio.10 If the i-th and j-th stocks are uncorrelated with each other after controlling for

all other stocks, then βi|j = 0 and hence ψij = 0 must hold.

We can gain more useful insights about the inverse covariance matrix by looking at its i-th

row:

Ψ (i, ·) =1

υi

[−βi|1, ...,−βi|i−1, 1,−βi|i+1, ...,−βi|N

]. (3)

9For the symmetry ψij = ψji, see Stevens’ (1998) footnote 3.

10Much earlier in the statistics literature, Dempster (1972) shows that ψij represents the conditional dependencebetween the i-th variable and the j-th variable given all other variables in the system.

6

The identity (2) and the expression for the hedge portfolio holdings (3) form the basis for

subsequent analysis.

We can regard (3) as a vector of stock holdings in the i-th stock’s hedge portfolio. It involves

a unit long position in the i-th stock and a short position of the hedge portfolio constructed

from the regression (1). The hedge portfolio is a long/short portfolio whose holdings do not

necessarily sum to a prefixed value (e.g. one). However, holdings of each asset are scaled by

1/υi, meaning that the portfolio takes a larger (smaller) position in a stock when its unhedgeable

risk is smaller (larger). We call Ψ the mean variance optimizer because it provides a detailed

prescription about the hedging trades we need to make to minimize the portfolio risk.

II. An Improved Mean Variance Optimizer

A. Objective

The framework laid out in section I. is useful to understand the sources of potential large esti-

mation errors in mean variance portfolio optimization. As shown above, each hedge regression

(1) contains a constant and N − 1 stock returns as regressors. However, many of these stock

returns are highly correlated. Furthermore, in many practical applications, the number of avail-

able historical returns to estimate the regression is not much larger than the number of stocks.

Therefore, estimation of the hedge regression with practical sample size subjects us to the econo-

metric problem: multicollinearity.11 The undesirable consequences of multicollinearity are that

the estimated hedge coefficients (β’s) have such large estimation errors that the estimates are

unstable from sample to sample, and hence too unreliable to be useful. By the identity (2), this

11Please see Judge, Griffiths, Hill, Lutkepohl, and Lee (1980) , Chapter 12, for a detailed discussion of multi-collinearity.

7

also implies that the off-diagonal elements of Ψ are also susceptible to large estimation errors

and instability.

We conquer the multicollinearity problem in two ways.12 First, in the hedged portfolio of a

stock, we shrink the portfolio holdings of other stocks. The fact that the i-th and j-th stocks

hedge each other in the past is not necessarily indicative of a continued mutual hedging relation

going forward. In the presence of this uncertainty, we prefer erring on the non-hedging side,

because it reduces the estimation errors. From the identity (2), this is equivalent to shrinking

the off-diagonal elements of the inverse covariance matrix. The rationale behind the shrinkage

is well known: an interior optimum exists in the trade-off between bias and estimation error if

our objective is to minimize the forecast error variances. On the one hand, shrinkage introduces

biases in the optimal weights of a hedge portfolio; on the other hand, it reduces the estimation

errors in the estimated portfolio weights. For proper levels of shrinkage, gains from the reduction

of estimation errors dominate the costs of biases.

Secondly, our estimation method curbs the multicollinearity more directly by just turning off

stocks that are not worth including in the hedge portfolio. In other words, our method restricts

some of the hedge portfolio weights to zero, which is equivalent to imposing zero restrictions on

the off-diagonal elements of Ψ. This corresponds to a strong form of shrinkage. When (i, j)-th

element of Ψ is zero, it simply means that the i-th and j-th stocks do not help hedge each other

in the presence of other N − 2 stocks in the portfolio. For example, when N stock returns are

driven by a small number of common factors, we may not need to use all stocks to hedge each

other.

12An alternative solution to mitigate the multicollinearity problem is to increase the number of historicalreturns (T ). For example, Jagannathan and Ma (2003) report a significant gain from using daily returns, insteadof monthly returns, to estimate the sample covariance matrix. However, in practical applications, decision makersare often constrained by the available number of historical returns.

8

In statistical literature, such a solution is called ‘lasso’ (Tibshirani, 1996). Specifically, the

shrunk parameters from hedge regression in the (1) have the following relationship with those

obtained from an OLS regression:

βi|k = sign(βOLS

i|k

)(∣∣∣βOLSi|k

∣∣∣− γ)+

where (x)+ = max(x, 0), βOLS

i|k is the full set of least square estimates, and γ is a certain soft

threshold determined by the tuning parameter. Clearly, when the magnitude of an OLS point

estimate is below the soft threshold, βi|k is set to zero. Consequently, (i, j)-th element of Ψ is

set to zero. When the absolute value of an OLS point estimate is above the soft threshold, its

sign will be given to that of βi|k, however the magnitude of βi|k is shrunk by the magnitude of

the soft threshold.

The zero restriction may again introduce potential misspecification biases (omitted regressors

in hedge regressions), but it leads to lower estimation errors. In this way, we promote the sparsity

of the mean variance optimizer Ψ = Σ−1. Needless to say, it does not imply the sparsity of the

covariance matrix Σ ≡ Ψ−1 since all stock returns are still correlated with each other. As a

matter of fact, Σ is hardly sparse even when Ψ is sparse.

B. Relation to the Recent Literature

The trade-off between the misspecification biases and estimation errors has been emphasized

in a few recent papers. For example, Jagannathan and Ma (2003) demonstrate that “wrong

constraints” (leading to misspecifications) can help improve the out-of-sample performance of

optimized portfolios to the extent that they reduce estimation errors. Ledoit and Wolf’s (2003,

2004a,b) main impetus for shrinking the sample covariance matrix is also to strike an optimal

9

balance between the misspecification biases and the estimation errors. Our approach shares the

same objective, but suggests a different route to achieve it.

An oft-cited criticism against mean variance optimization is that optimized portfolios often

involve extreme and unstable weights. Green and Hollifield (1992) argue that the extreme

weights are due to the presence of a dominant systematic factor (i.e., the market factor) in the

covariance structure: large long and short positions are unavoidable to hedge the systematic

risk regardless of the estimation errors. In response, Jagannathan and Ma (2003) demonstrate

that the measurement errors in the estimated covariance matrix still play a prominent role in

deteriorating the portfolio performance, as imposing no-short-sale constraint – that is “wrong”

in population – can improve the portfolio performance by curtailing estimation errors. Our

estimator of Ψ = Σ−1 combines some of the salient features in both papers. First, it focuses

on reducing estimation errors by promoting the parsimony of Ψ. Second, it aims to capture

the hedging relations better, so that it aids the portfolio to achieve a better hedging of the

systematic risk.

In a recent contribution, DeMiguel, Garlappi, Nogales, and Uppal (2009) (“DGNU” for

short) propose a general framework to constrain the norms of the final weighting solutions.

Their general and flexible framework encompasses many existing portfolio weighting methods,

including those of Jagannathan and Ma (2003) and Ledoit and Wolf (2003, 2004a,b). While we

believe DGNU sets a new stage in the portfolio optimization literature, we also feel that it is

worthwhile to step back to reexamine the key input (Ψ = Σ−1) for portfolio optimization. From

a practical perspective, constraining the norms of the portfolio weights is interpreted as using

a prior on the weighting solution (output) as DGNU note, but most portfolio managers acquire

and process information about the inputs rather than the outputs.

10

Furthermore, hedging relations among stocks are not necessarily stable, and portfolio man-

agers may choose to rely on their priors in choosing inputs under certain market conditions.

For example, it is known that hedging relations among various style factors, such as size and

value, have undergone substantial shifts periodically.13 With such shifts, historical covariance

provides less useful guidance, but some portfolio managers are more adept at predicting the op-

timal hedging relations among the factors. Our approach provides more flexibility to portfolio

managers as they can revise the elements of Ψ = Σ−1 to better reflect their private information

when necessary. This is possible because our approach makes the relationship between hedging

relations and optimal portfolio weights transparent.

Needless to say, out-of-sample performance of the optimized portfolios depends not only on

the optimizer Ψ = Σ−1, but also on the expected return forecast µ. Given the well-known perils

of using sample means, expected return µ should further depend on an asset pricing model

and the manager’s private information.14 Since there is a one-to-one mapping between (w,Ψ)

and (µ,Ψ), Tu and Zhou (2009a) ? propose to build economic objectives into the prior on

portfolio weights w rather than on µ. This provides a sensible approach when accompanied by

an informative input (or prior) for Ψ. Meanwhile, the success of this approach hinges on the

quality of Ψ, as the optimal weighting solution is susceptible to estimation errors in Ψ. Our focus

is on the significance of improving Ψ independent of the choice of µ, as an improved estimator of

Ψ should complement recent innovations in the literature. Consequently, we leave the problem

of finding an optimal input of µ outside the scope of this study. In empirical applications, our

13For example, rolling correlations among the Market, SMB, and HML factor returns have alternated signsbetween strongly negative and strongly positive.

14To improve the quality of µ, recent literature incorporates various theoretical restrictions in a prior distribu-tion of from a Bayesian perspective. Seminal works include Black and Litterman (1992) , Pastor (2000), Pastorand Stambaugh (2000), and MacKinlay and Pastor (2000).

11

primary focus is on the out-of-sample portfolio risk reduction of the global minimum variance

portfolio.

Nevertheless, the importance of improving the mean variance optimizer Ψ should not be

understated. While there is a general perception that estimation error in µ is far more costly

than the estimation error in Ψ, Kan and Zhou (2007) demonstrate that when N/T is not so

small, there is a very significant interactive effect between the estimation errors in µ and Ψ that

can make the optimized portfolios very unstable and unreliable.15 This is the situation in which

a stable optimizer is particularly called for. It is thus of our interest to see if our our estimator

of Ψ = Σ−1 achieves a significant reduction in the out-of-sample portfolio risk when N/T is

large.

C. Empirical Implementation

The two solutions we propose – (i) shrinkage and (ii) selection of stocks in each hedge regres-

sion – are not incompatible. We now have a statistical technology to address both objectives

simultaneously in a parsimonious manner: the least absolute shrinkage and selection operator

(lasso) (Tibshirani (1996)). They key innovation of lasso is to constrain the l1 norm of the

parameters that need to be estimated. To take advantage of this innovation, we estimate the

inverse covariance matrix Ψ directly by maximum likelihood, but with a constraint on the l1

norm of its off-diagonal elements.16

15See Chan, Karceski, and Lakonishok (1999), Jagannathan and Ma (2003), Ledoit and Wolf (2003, 2004a),among others, for similar perspectives. Kan and Zhou (2007) and Tu and Zhou (2009b) theoretically demonstratethat, in the presence of estimation errors, mean variance investors should allocate a significant fraction of wealthto the global minimum variance portfolio and equal weighted portfolio when N/T is large.

16Lasso has been receiving attention in recent econometrics literature. Caner (2009) studies a lasso-typeestimator which is formed by the GMM objective function with the addition of a penalty term. He shows thatthe lasso-type GMM correctly selects the true model much more often than other regular procedures. Bai andNg (2008) consider lasso to perform selection and shrinkage simultaneously in factor forecasting of time series .

12

Let Rt = (r1,t, . . . , rN,t)′ be a vector of N excess stock returns at time t. We consider a

problem of estimating the inverse covariance matrix Ψ from a sample of T historical observa-

tions of Rt. Put differently, T specifies the length of the “estimation window.” Rt denotes the

“centered” vector of Rt, whose elements have zero time-series means in the estimation window.17

Then, the log likelihood for the inverse covariance matrix Ψ = Σ−1 in the estimation window is

T

2ln |Ψ| − 1

2

T∑l=1

R′t−l+1ΨRt−l+1. (4)

We seek an estimator Ψ that maximizes the likelihood function (4) subject to the following

constraint: ∑N

i=1

∑N

j=1i 6=j

∣∣ψij∣∣ ≤ τ , (5)

where τ ≥ 0 is a tuning parameter so that the sum of∣∣ψij∣∣ (across all i and j, i 6= j) must be less

than or equal to τ . Expression (5) constrains the sum of the absolute values of its off-diagonal

elements, ψij (i 6= j) . In other words, we constrain the l1 norm of the off-diagonal elements of

the inverse covariance matrix Ψ.18

Intuitively, the constraint (5) promotes the overall shrinkage by restricting the sum of∣∣ψij∣∣ .

Meanwhile, it imposes the restriction of the form∣∣ψij∣∣ = 0 in the following way. Under the

constraint (5), different elements of ψij compete with each other to remain non-zero. However,

if the marginal gain from retaining the j-th stock in the i-th stock’s hedged portfolio does

not justify the cost, then we turn off the j-th stock in the i-th hedged portfolio. Then, by

the identity (2) and symmetry, this effectively restricts the (i, j)th and (j, i)th elements of the

17The MLE estimator of the mean is always the sample mean.

18Yuan and Lin (2007) and Rothman, Bickel, Levina, and Zhu (2008) solve this estimation problem in adifferent setup (graphical model).

13

inverse covariance matrix to be zero.

A Lagrangian expression of this constrained maximum likelihood estimation problem is

maxΨ=[ψij]

T

2ln |Ψ| − T

2tr(SΨ)− ρ∑N

i=1

∑N

j=1i 6=j

∣∣ψij∣∣ , (6)

where S is the sample covariance matrix:

S =1

T

T∑l=1

R′t−l+1Rt−l+1.

The regularization parameter ρ ≥ 0 denotes the penalty on the l1 norm,∑N

i=1

∑Nj 6=i∣∣ψij∣∣ . The

estimator, Ψρ, depends on the regularization parameter ρ, as signified by the subscript. A

larger value of ρ promotes more sparsity of Ψρ whereas ρ = 0 makes the solution identical to the

unconstrained MLE solution. We will discuss the choice of ρ when we evaluate out-of-sample

performance in the following section. We conduct our estimation using the graphical lasso

(glasso) algorithm of Friedman, Hastie, and Tibshirani (2008).19 In Appendix, we explain how

the estimation problem (6) is related to the system of hedge portfolios introduced in Section I.

19The “penalized” maximum likelihood estimation problem (6) is closely related to the model selection problemconsidered by Akaike (1974) and Schwarz (1978). They propose information criteria (AIC and BIC) to trade-off between model misspecification bias and estimation errors in selecting from competing models. For example,

according to AIC, the optimal inverse covariance matrix Ψ solves: ΨAIC = arg max T2

ln |Ψ|− T2tr(SΨ)−Card(Ψ)

where Card(Ψ) is the number of non-zero elements of Ψ, while Schwarz’s BIC uses lnT2Card(Ψ) instead of

Card(Ψ). The information criteria penalize the number of parameters Card(Ψ) to discourage model complexityand overfitting. Unfortunately, when N is not small, the penalties on Card(Ψ) make our estimation problem

infeasible because Ψ can be sparse in 2n(n−1)/2 different ways. However, by replacing Card(Ψ) or lnT2Card(Ψ)

with the l1 norm ρ∑Ni=1

∑Nj 6=i

∣∣ψij∣∣, we can combine model selection with parameter estimation within a feasible

convex optimization problem. Indeed, graphical lasso (glasso) algorithm accomplishes two goals in just one step.Friedman, Hastie, and Tibshirani (2008) make the glasso algorithm publicly available in R programming language.

14

III. Out-of-Sample Evaluation: Setup

A. Dataset and Methodology

To test the out-of-sample performance of the proposed mean variance optimizer, Ψρ, we employ

the following datasets listed in Table 1. First, the datasets cover the representative portfolios,

both US and international, that are of interest to both academic researchers and industry

practitioners. They can be classified as: portfolios formed on size and book-to-market ratio

(#1,2), industry portfolios (#3,4), country market indexes (#7,10), value and growth portfolios

in international markets (#8), and their combinations (#5,6,9). The number of assets ranges

between 15 and 148.20 Because we are interested in cases in which N/T is not small, our datasets

have more assets than those studied by DeMiguel, Garlappi, Nogales, and Uppal (2009) and

Tu and Zhou (2009a,b). Second, following Jagannathan and Ma (2003), we extract individual

stocks from AMEX and NYSE with stock price no less than 5 dollars, and with previous 120

months price record available. In order for our results not to be decided purely by a certain

combination of some individual stocks, we randomly choose 100 individual stocks at a time for

100 times, and report the average results in the following tables.

Table 1 about here.

We primarily focus on the out-of-sample performance of the minimum variance portfolio

of risky assets (stocks). Since the minimum variance locus must be constructed from risky

assets only and without risk-free ones, this portfolio also requires full initial investments. We

thus exclude any “spread assets” such as the SMB and HML factor portfolios and futures

contracts, otherwise they will allow investors to obtain risky exposure without making initial

20We thank Ken French for making many of the datasets available.

15

investments (except margins). Then spread assets won’t have bona fide returns unless we further

assume that they are covered by risk-free assets, a contradiction with above. Therefore, our

out-of-sample analysis excludes spread assets and imposes the usual constraint 1′Nw = 1, where

1N denotes the N × 1 vector of ones.21

We evaluate the out-of-sample portfolio performance using the standard “rolling-horizon”

approach. In each month t, we construct the global minimum variance portfolios using the past

120 months (10 years) of stock returns (the “estimation window,” T = 120). Next, we hold such

portfolios for one month and calculate the portfolio returns in month t + 1 out-of-sample. We

continue this process by adding the return for the next period in the dataset and dropping the

earliest return from the estimation window. The choice of the rolling estimation window size,

T = 120, follows the standard practice in the literature. Meanwhile, we are also interested in

the case of large N/T where a stable optimizer is called for. Therefore, our analysis is conducted

on datasets that have at least 100 assets (#2,6,8,9), and also on those with N > T in which the

sample covariance matrix is not invertible.22 Column [5] of Table 2 reports the N/T ratios of

our datasets.

B. Choice of the Regularization Parameter ρ

Our proposed Ψρ depends on the regularization parameter ρ [equation (6)]. Since choosing the

parameter after observing out-of-sample performance induces a look-ahead bias and thus is not

appropriate, we fix the first 120 months in-sample, in which we search for the value of ρ that

21Of course, if our primary focus is on the Sharpe ratio rather than the portfolio risk, then it is certainlyinteresting to release the 1′Nw = 1 constraint and incorporate various spread assets into analysis. However, our

current paper focuses on the out-of-sample performance of Ψρ and abstracts away from the prediction of expectedreturns.

22To conserve space, we only report the results for T = 120. We have also conducted an analysis using a shorterestimation window T = 60, and results are available upon request. In fact,with T = 60 the proposed estimatorachieves more significant gains in forecasting future covariances and reducing out-of-sample risk in many datasets.

16

maximizes the predictive likelihood using a grid with increment of 0.1. Then we adhere to this

choice throughout the out-of-sample “testing period” and burn in the first 120 out-of-sample

“training period” portfolio returns. Our approach is simple and conservative, but the optimizer

can deliver consistent performance when the optimal value of ρ remains stable over time.

For each data set, columns [1] and [2] of Table 2 summarize the out-of-sample training and

testing periods. For example, the out-of-sample testing period starts in July 1983 for the first

6 datasets and in January 1995 for the next three datasets, and so on. The total number of

testing periods is 330 months for the first 6 datasets, 192 months for the next three datasets,

168 months for the tenth dataset (MSCI), and 216 months for the individual stocks.

Table 2 about here.

C. Descriptives for Ψρ

Column [4] of Table 2 reports the values of ρ chosen in the training period, which range from

0.3 to 5.9 across datasets.23 The next column [5] reports the “sparsity” of Ψρ, measured by the

percentage of zero off-diagonal elements. The time-series average of the sparsity ranges from

22.4 percent to 47.1 percent, meaning that a significant fraction of the inverse covariance matrix

is set to zero. In this sense, the proposed estimator Ψρ is indeed highly sparse.

Table 3 reports the condition numbers for Ψρ and S−1 as well as for Ledoit and Wolf’s (2004a)

shrunk covariance matrix (sample covariance matrix shrunk to constant correlation matrix), the

inverse of which is denoted by Σ−1LW .

24 When covariance matrices are ill-conditioned, i.e., when

23Recall that we have 100 randomized 100-individual stock samples from AMEX and NYSE. 5.9 thus is theaverage value of from these 100 samples.

24In constructing the Ledoit-Wolf shrunk covariance matrix ΣLW , we replace the sample covariance matrix

(S) with a weighted average of the sample covariance matrix and a shrinkage target. For the latter, we use aconstant correlation matrix, that is, the correlation between any two stocks is just the average of all the pairwisecorrelations from the sample covariance matrix. We appeal to the principle of parsimony and choose this shrinkage

17

condition numbers are large, estimation errors amplify in the inverse operation. It can be seen

that Ψρ and Σ−1LW behave much better than S−1 as the condition numbers are much smaller for

Ψρ and Σ−1LW than for S−1. As the ratio N/T gets larger, the sample covariance matrix becomes

more ill-conditioned. When N/T exceeds one, S−1 does not exist. Nevertheless, Ψρ and Σ−1LW

can still yield stable condition numbers.

Clearly ΣLW and our proposed estimator Ψρ share the same objective to achieve a supe-

rior prediction of covariances by reducing estimation errors. However, Σ−1LW inherits the well-

conditionedness from shrinking ΣLW toward a parsimonious structure. In contrast, our proposed

estimator Ψρ directly shrinks the inverse covariance matrix toward a sparse structure. Table

3 shows that both Ψρ and Σ−1LW help stabilize S−1 in situations when the sample covariance

matrix is ill-conditioned or even non-invertible.

Table 3 about here.

IV. Out-of-Sample Evaluation: Evidence

A. Out-of-Sample Portfolio Risk Minimization

We are primarily interested in the ability of the proposed mean variance optimizer Ψρ in reducing

the out-of-sample portfolio risk. Our focus is on the global minimum variance portfolio, because

this portfolio depends only on the estimator of the inverse covariance matrix Σ−1 but not on

expected returns. For a given estimator of the inverse covariance matrix, Σ−1, the global

target instead of one obtained by further assuming a one-factor structure (such as a market model). In fact, Eltonand Gruber (1973) and Elton, Gruber, and Urich (1978) show that the constant correlation model produces betterforecasts of the future correlation matrix than those obtained from the market model or the sample correlationmatrix. We then solve the optimal weight (shrinkage intensity) by minimizing the loss function of Ledoit and Wolf(2004b), which is represented by the Frobenius norm of the distance between the true covariance and shrinkageestimator.

18

minimum variance portfolio is

wGMV =1

1′N Σ−11NΣ−11N .

By replacing Σ−1 with the proposed sparse estimator Ψρ, we propose the global minimum

variance portfolio: wGMV-Ψρ= (1′N Ψρ1N )−1Ψρ1N . We denote this portfolio by GMV-Ψρ.

We compare the out-of-sample portfolio risk of GMV-Ψρ with those of the following alter-

native portfolios.

• The sample-based global minimum variance portfolio, denoted by GMV-S−1. Its portfolio

weights are summarized by wGMV-S−1 = (1′N S−11N )−1S−11N .

• The equal-weighted (1/N) portfolio, wEW = 1N 1N = (1′N1N )−11N . We obtain this port-

folio (denoted by EW) when we replace Σ−1 with the identity matrix. EW (1/N) does

not require any estimation and hence is free of estimation errors. Its strong and stable

out-of-sample performance is well known. (see DeMiguel, Garlappi, and Uppal (2009) for

a recent review).

• The sample-based global minimum variance portfolio with no-short-sale constraint such

that every single portfolio weight has to be non-negative, as proposed by Jagannathan

and Ma (2003). We denote this portfolio by GMV-JM. Specifically, GMV-JM minimizes

w′Sw subject to 1′Nw = 1 and wi ≥ 0 for i = 1, ..., N where wi denotes i-th element of w.

S is the sample covariance matrix.25

• The global minimum variance portfolio constructed from the Ledoit and Wolf’s (2004a)

25Jagannathan and Ma (2003) show that the constrained solution corresponds to a shrunk sample covariancematrix, in which covariance between a certain asset and others is reduced when the constraint on this asset isbinding.

19

shrinkage estimator, Σ−1LW . We call this portfolio GMV-LW. Its portfolio weights are sum-

marized by wLW = (1′N Σ−1LW1N )−1Σ−1

LW1N .

In the out-of-sample test, we first obtain the time-series of out-of-sample returns for the

five portfolios: GMV-Ψρ, GMV-S−1, EW (1/N), GMV-JM, and GMV-LW. Then we calculate

the out-of-sample portfolio variances and standard deviations, the latter denoted by σΨρ, σS−1 ,

σEW , σJM , and σLW , respectively.

We then inspect if the proposed portfolio GMV-S−1 achieves a reduction in the out-of-

sample portfolio risk compared to the four alternatives. We test the null hypothesis of no

difference in out-of-sample portfolio risk by computing bootstrap two-sided confidence intervals

for σS−1 −σΨρ, σEW −σΨρ

, σJM −σΨρ, and σLW −σΨρ

. We again apply Politis and Romano’s

(1994) stationary bootstrap with expected block size of 5, following the standard practice in

the recent literature (e.g., Ledoit and Wolf (2008), DeMiguel, Garlappi, and Uppal (2009),

DeMiguel, Garlappi, Nogales, and Uppal (2009)).

Columns under heading [1] of Table 4 report the out-of-sample variances of the five portfolios.

The portfolio variance of GMV-S−1 gets very large in datasets with large N/T . The portfolio

cannot even be constructed when N > T , because S−1 does not exist then. On the other hand,

EW (1/N), GMV-JM, and GMV-LW attain stable portfolio variance across datasets.

Furthermore, our proposed portfolio GMV-Ψρ always generates lower portfolio variance than

alternative portfolios in most datasets except for IntMkt (#7), where GMV-LW maintains a

slightly even lower variance. Columns under heading [2] tabulate differences in out-of-sample

portfolio risks (standard deviations) between GMV-Ψρ and GMV-LW. Improvements in out-

of-sample risk reduction from GMV-S−1, EW (1/N) or GMV-JM to GMV-Ψρ are evident and

statistically significant. GMV-Ψρ also enhances significant out-of-sample risk reduction from

20

GMV-LW for the datasets #1, #2 and #11, but the difference is smaller in magnitude and

insignificant for the rest of datasets. Still, GMV-Ψρ is never significantly dominated in portfolio

risk minimization. Therefore, the proposed portfolio compares favorably with the one based

on no-short-sale restrictions (Jagannathan and Ma (2003)) and the one with Ledoit and Wolf’s

(2004a) shrinkage covariance matrix estimator.26

Table 4 about here.

B. Out-of-Sample Sharpe Ratio

In principle, reduction in the portfolio risk can improve the Sharpe ratio if the mean returns

remain the same. Although mean returns are susceptible to estimation errors, Sharpe ratio is

among the most widely used performance measures. It is also known that the global minimum

variance portfolio, that ignores the estimated mean returns altogether but exploits covariances

among stocks, often achieves a higher Sharpe ratio than other portfolios (e.g. Jorion (1985,

1986), DeMiguel, Garlappi, and Uppal (2009)). Therefore, we calculate out-of-sample Sharpe

ratios for GMV-Ψρ and the four alternative portfolios. Columns under heading [1] of Table 5

tabulate monthly Sharpe ratios. GMV-Ψρ generates the highest or second-highest out-of-sample

Sharpe ratios among the eleven datasets except for #4 and #10. For the US portfolios (#1-6),

GMV-Ψρ yields Sharpe ratios between 0.126 and 0.295 during the testing period between July

1983 and Dec 2010. These Sharpe ratios are higher than those of the EW (1/N) portfolio, which

26We would rather add that we do not intend to claim a general superiority of Ψρ-based portfolios to the onesproposed by Jagannathan and Ma (2003) and Ledoit and Wolf (2003, 2004a,b). All of these portfolios involve someflexibility in implementation. For example, Jagannathan and Ma’s (2003) portfolio strategy can accommodatevarious upper and lower bounds for individual portfolio weights. Ledoit and Wolf’s (2003, 2004a,b) shrinkageestimator depends on the shrinkage target and the shrinkage intensity parameter. Our choice of the constantcorrelation matrix follows Ledoit and Wolf’s recommendation, but other choices are certainly possible. The

proposed sparse estimator Ψρ also involves a choice of the regularization parameter ρ. The relative performanceof the portfolios may depend on datasets, estimation windows, and performance measures as well as testingmethodologies.

21

range between 0.113 and 0.139. The value-weighted market portfolio has a Sharpe ratio of 0.113

during the same period. Therefore, GMV-Ψρ indeed achieves higher Sharpe ratios than the

equally-weighted and value-weighted diversification policies. For the 100 randomized samples

of 100-individual stocks, GMV-Ψρ is dominated by the EW (1/N) strategy (0.181), still, its

Sharpe ratio of 0.147 is considerably above the other remaining strategies with Sharpe ratios

ranging between 0.100 and 0.105.

Table 5 about here.

Detecting reliable differences in the Sharpe ratios are difficult due to large estimation errors

in mean returns. Furthermore, in the presence of fat-tails, serial correlation, and volatility

clustering, the conventional Jobson and Korkie’s (1981b) test is not appropriate. We therefore

employ Ledoit and Wolf’s (2008) studentized circular block bootstrap (with block size 5) to

test the null hypothesis of no difference in Sharpe ratios.27 Results are shown in Table 5 under

heading [2]. GMV-Ψρ achieves significantly higher Sharpe ratios than GMV-S−1, EW (1/N),

and GMV-JM in four to six datasets. Meanwhile, only for dataset #1 and #5, GMV-S−1

attains the highest Sharpe ratio that is significantly higher than GMV-Ψρ and other alternative

portfolios. And, only for dataset #1 is Sharpe ratio of GMV-Ψρ significantly worse than one of

the competing portfolios. In addition, for industry portfolios (#3) and country indexes (#7,10)

we do not find any statistically discernible differences in Sharpe ratios between GMV-Ψρ and

the four alternative portfolios. As for the individual stocks, all the difference between GMV-Ψρ

and other strategies are statistically significantly.

In summary, the proposed portfolio GMV-Ψρ achieves high and stable Sharpe ratios that

compare favorably with alternative portfolios. However, given large estimation errors in mean

27DeMiguel, Garlappi, Nogales, and Uppal (2009) also adopt this approach.

22

returns, we refrain from drawing strong conclusions for the differences in Sharpe ratios in this

exercise.

C. Economic Gains from Improved Portfolio Optimization

Mean variance portfolio optimization accomplishes portfolio risk reduction beyond the EW

(1/N) diversification rule by exploiting the hedging relations among stocks. Assessing the eco-

nomic significance of portfolio risk reduction, however, is difficult without additional assump-

tions. Furthermore, meaningful assessment of economic gains must account for the effect of

transaction costs, the amount of which grows rapidly with the hedge trades induced turnovers.

To this end, we calculate the annualized certainty equivalent excess return (CER) of each

portfolio after subtracting its transaction cost (TCost). Specifically, the TCost-adjusted CER

is

TCost-adjusted CERq = µq −γ

2σ2q − TCostq

where µq and σ2q are the time-series (annualized) mean and variance of out-of-sample excess

returns for portfolio q. Following the standard practice in the literature (e.g., Brandt (2009)),

we set the risk aversion coefficient γ to be 5.28 TCostq is the annualized transaction costs

associated with portfolio q, measured by the annualized turnover multiplied by proportional

transaction costs of 50 basis points per trade.29 We can interpret TCost-adjusted CERq as

the increase in the risk-free rate that an investor is willing to trade for a risky portfolio q after

28Tu and Zhou (2009a,b) use γ = 3. With this choice, EW (1/N) portfolio of US domestic stocks yield TCost-adjusted CERs higher than 3 percent, which we deem to be higher than most investors would require. However,results for the case of γ = 3 (available upon request) are qualitatively similar to the results reported in this paper.

29According to Domowitz, Glen, and Madhavan (2001), average proportional trading costs for US equities(across NYSE, AMEX, NASDAQ for 1990-1998) is 38.1 basis points per trade. This almost coincides with theearlier estimate of 38 basis points by Stoll and Whaley (1983) for NYSE stocks. Average transaction cost for the18 countries in the MSCI dataset is 44.6 basis points per trade.

23

accounting for transaction costs. For example, suppose that a portfolio q has a TCost-adjusted

CER of one percent. In this case, the investor is indifferent between the portfolio q and an

asset that guarantees riskless return of one percent plus the risk-free rate. A higher value of

TCost-adjusted CER indicates that the portfolio has a more desirable risk-return characteristic.

In order to get a sense of the size of trades, Table 6 reports monthly turnover in columns

under heading [1]. The reported turnover can be interpreted as the average fraction of wealth

traded in each rebalancing period. Not surprisingly, the EW (1/N) portfolio has the lowest

turnover because the diversification rule does not involve any hedge trades. The no-short-sale

constraint also keeps the turnover of the GMV-JM low. On the contrary, GMV-Ψρ and GMV-

LW portfolio strategies actively employ hedge trades to reduce portfolio risk, and hence entail

higher turnover. For GMV-Ψρ and GMV-LW, monthly turnovers range between 0.160 and

0.886 and between 0.134 and 1.220 respectively. However, these portfolios attain a much lower

turnover than GMV-S−1 by reducing the effects of estimation errors.

The columns of Table 6 under heading [2] report TCost-adjusted CERs of the five portfolios.

GMV-Ψρ clearly realizes positive economic gains over GMV-S−1 in all datasets except #1.

While GMV-Ψρ incurs much larger transaction costs than the EW (1/N) and the no-short-sale

constrained GMV-JM portfolios, its gains from portfolio risk reduction still exceed the increased

transaction costs for most datasets. For example, GMV-Ψρ achieves positive economic gains

over EW (1/N) in all datasets but #4 and individual stocks (#11). It also achieves positive

economic gains over GMV-JM in 8 datasets except #3, #4 and #10. Therefore, for an investor

with risk aversion coefficient of γ = 5, economic gains from increased hedge trades to reduce

portfolio risk are worth their transaction costs in most datasets. Economic gains from our

proposed estimator Ψρ also compare favorably with those from the Ledoit and Wolf’s shrinkage

24

estimator Σ−1LW , as GMV-Ψρ achieves higher TCost-adjusted CERs than GMV-LW in seven

datasets. Overall, GMV-Ψρ yields the highest or second highest TCost-adjusted CERs among

the five portfolios in all datasets, indicating the consistency of its economic gains across different

datasets.

Table 6 about here.

Still, we caution a strong conclusion from this analysis. First of all, TCost-adjusted CERs

include a time-series mean of out-of-sample portfolio returns that are subject to large estima-

tion errors. Second, this analysis tends to underestimate the gains from the minimum variance

optimized portfolios in favor of EW (1/N) strategy, given the assumption that the optimized

portfolios are rebalanced in every period. In practice, portfolio managers incorporate transac-

tion costs explicitly in making their optimal portfolio decisions. Consequently, they rebalance

portfolios only when predicted utility gains (e.g., certainty equivalent returns) exceed the trans-

action costs, thereby lowering their turnover. Seeking an optimal trade-off between portfolio risk

reduction and transaction costs, however, requires dynamic optimization with a specific penalty

function on the transaction costs. We abstract from this practical implementation issue. Our

focus is on improving the estimator of the inverse covariance matrix, because such an improved

estimator can help improve our portfolio decisions in any setup.

D. Effects of No-Short-Sale Restriction

In some practical situations, portfolio managers impose no-short-sale constraint (non-negativity

constraint) on their portfolios to avoid extreme positions. However, Green and Hollifield (1992)

show that imposing no-short-sale constraint inhibits their ability to hedge the dominant sys-

tematic risk (i.e., the market factor) in minimizing portfolio risk. In response, Jagannathan and

25

Ma (2003) demonstrate that imposing no-short-sale constraint (non-negativity constraint) does

not necessarily hurt the portfolio performance because the estimated covariance matrix contains

large measurement errors. In summary, the effects of no-short-sale constraint on portfolio per-

formance depend on the magnitude of measurement errors in the estimated covariance matrix.

If large estimation errors are still present in Ψρ, then adding the no-short-sale restriction can

still help improve the performance of the GMV-Ψρ portfolio.

Consequently, we examine how the non-negativity constraint affects the performance of the

GMV-Ψρ portfolio. We use GMV-JM-Ψρ to denote the GMV-Ψρ portfolio with no-short-sale

restriction, i.e., all portfolio weights of GMV-JM-Ψρ are non-negative. In other words, GMV-

JM-Ψρ replaces S−1 with Ψρ in GMV-JM.

Table 7 compares the out-of-sample portfolio performance of GMV-Ψρ, GMV-JM, and GMV-

JM-Ψρ. Obviously, the non-negativity restriction limits the optimizer’s ability to diversify port-

folio risk. With the same restriction but using different inverse covariances, GMV-JM and

GMV-JM-Ψρ have very similar portfolio risk and Sharpe ratio out-of-sample, though GMV-

JM-Ψρ yields lower turnover than GMV-JM in all datasets. This result suggests that, with the

improved estimator of the inverse covariance matrix Ψρ in place, the non-negativity constraint

no longer helps improve the portfolio performance.30 It also indicates that the improved per-

formance of GMV-Ψρ is achieved through improved hedge trades that entail short positions in

some constituents of the portfolio. By constraining the hedge trades, no-short-sale restriction

leads to lower portfolio performance once the estimation errors are contained.

Table 7 about here.

30This is consistent with Jagannathan and Ma’s (2003) observation that non-negativity constraint leads toreduction in portfolio performance when we apply a factor structure or a shrinkage to the covariance matrixestimation, or when we use daily returns (low N/T ) to estimate the sample covariance matrix.

26

E. Comparison with Industry Factor Models

In practice, factor models are widely used to estimate covariance matrices. By imposing partic-

ular structures to reduce the number of free parameters, factor models offer alternative ways of

reducing estimation errors (Chan, Karceski, and Lakonishok 1999). Our datasets #1 to #10 are

portfolio themselves. However, to compare the performance between GMV-Ψρ and the global

minimum variance portfolio based on a factor model, it is more appropriate and meaningful to

cast sights on individual stocks, i.e., dataset #11 (individuals). Since it is commonly known

that industry factors can explain most of the common variations in individual stock returns, we

use a 30-industry factor model.31 We denote the global minimum variance portfolio based on

this industry factor model as GMV-Σ−1IND.

Table 8 about here.

Table 8 reports the comparison between GMV-Ψρ and GMV-Σ−1IND. Since we have 100

randomized samples of 100 individual stocks, we also report the distributions of conditional

number of estimated inverse covariance matrix, monthly variance, Sharpe ratio, and Tcost-

adjusted CER. Generally, for all these items, GMV-Ψρ has better results than GMV-Σ−1IND. For

example, regarding the monthly portfolio variance, GMV-Ψρ has all the smaller value, in terms

of mean, standard deviation, maximum and minimum. Again, we draw statistical inference from

the distribution of results from 100 samples, and the reduction in volatility is also significant

statistically (t-stat= 2.56). Similarly, GMV-Ψρ attains higher Sharpe ratios than GMV-Σ−1IND

in terms of mean, minimum, and maximum. Furthermore, the standard deviation of Sharpe

ratio is also lower for GMV-Ψρ than for GMV-Σ−1IND. These results suggest that the proposed

31Again from the website of Ken French.

27

estimator Ψρ achieves a better prediction of the inverse covariance matrix Σ−1 better than the

industry factor model. The gain in the mean Sharpe ratio from the industry factor model to

our proposed model is also statistically significant (t-stat= 2.02).

Finally, when it comes to Tcost-adjusted CER, there is a same pattern between two portfolios

as that of Sharpe ratio. The test for difference also achieves a high t-stat as 5.74. Overall, GMV-

Ψρ overperforms GMV-ΨIND in all the criteria for 100 randomized samples of 100 individual

stocks drawn from the NYSE/AMEX universe.

F. Effects of updating ρ

Given our “rolling-horizon” approach, it is natural to ask if we can improve the portfolio per-

formance by re-estimating the tuning parameter ρ period by period. This practice is however

computation-intensive. Still, to examine the potential benefits by updating the ρ, we take the

following approach. For every datasets from #1 to #10, we split the sample into two halves,

and update ρ for the purpose of second half sample. For example for dataset #1, we re-solve

the glasso problem and start to build the portfolio with the updated ρ in 1997:04, which lies in

the middle of the out-of-sample period between 1983:07 and 2010:12. The newly constructed

GMV-Ψρ is denoted ‘with new ρ’ to be contrasted with the previous one without updating

(‘with old ρ’) in the same second half sample period.

Table 9 about here.

Table 9 reports the effects of updating ρ except datasets #3 and #5 for which re-estimation

results in identical choice of ρ. We then report the monthly variance, Sharpe ratio and Tcost-

adjusted CERs with both the ‘old’ and ‘new’ ρs, and calculate the differences. Overall there is

a fairly obvious benefit to update ρ even only once. In 7 out of 8 datasets (except dataset #6),

28

updating the tuning parameter helps bringing the portfolio volatility down. And in 6 out of 8

datasets (except datasets #2 and #6), updating also boosts up the Tcost-adjusted CERs.

V. Conclusion

Appealing to the idea that the inverse covariance matrix prescribes the optimal hedging relations

among stocks, we propose a method to reduce the estimation errors in the inverse covariance

matrix by shrinking the estimated hedge portfolio weights. The proposed estimator of the inverse

covariance matrix is sparse, meaning that a significant fraction of its off-diagonal elements are

zero.

We show that the proposed estimator produces superior forecasts of covariances, and ac-

complishes a significantly better task in out-of-sample portfolio risk minimization compared

to the sample-based (maximum likelihood) estimator of the covariance matrix, especially in

datasets with large N/T ratio. Furthermore, out-of-sample performance of the global minimum

variance portfolio with the sparse inverse covariance matrix compares favorably to those of the

equal-weighted portfolio, the no-short-sale constrained portfolio (Jagannathan and Ma (2003)),

and the portfolio with shrunk covariance matrix (e.g. Ledoit and Wolf (2004a)) as well as the

portfolio using an industry factor model. These results support the initial motivation of our

analysis: By mitigating estimation errors in the hedge portfolios, we can enhance the ability

of mean-variance optimizer in reducing the out-of-sample portfolio risk. We further show that,

with the proposed estimator, economic gains from improved hedge trades exceed conventional

level of transaction costs. Moreover, additional no-short-sale restriction does not help enhance

the out-of-sample performance because it inhibits better use of hedge trades.

29

References

Akaike, Hirotsugu, “A new look at the statistical model identification.” IEEE Transactions on AutomaticControl , 19 (1974), 716–723.

Bai, Jushan and Serena Ng, “Forecasting economic time series using targeted predictors.” Journal of Econo-metrics, 146 (2008), 304–317.

Banerjee, Onureena, Laurent El Ghaoui, and Alexandre d’Aspremont, “Model Selection Through Sparse Max-imum Likelihood Estimation for Multivariate Gaussian or Binary Data.” Journal of Machine LearningResearch, 9 (2008), 485–516.

Black, Fisher and Robert Litterman, “Global Portfolio Optimization.” Financial Analyst Journal , 48 (1992),28–43.

Brandt, Michael, “Portfolio Choice Problems.” In Handbook of Financial Econometrics, Lars Hansen, eds.:North-Holland (2009).

Caner, Mehmet, “Lasso-type GMM estimator.” Econometric Theory , 25 (2009), 270–290.

Chan, Louis K. C., Jason Karceski, and Josef Lakonishok, “On Portfolio Optimization: Forecasting Covariancesand Choosing the Risk Model.” Review of Financial Studies, 12 (1999), 937–974.

DeMiguel, Victor, Lorenzo Garlappi, and Raman Uppal, “Optimal versus Naive Diversification: How InefficientIs the 1/N Portfolio Strategy?” Review of Financial Studies, forthcoming , (2009).

, , Javier Nogales, and Raman Uppal, “A Generalized Approach to Portfolio Optimization: ImprovingPerformance By Constraining Portfolio Norms.” Management Science, forthcoming , (2009).

Dempster, Arthur. P., “Covariance Selection.” Biometrics, 28 (1972), 157–175.

Domowitz, Ian, Jack Glen, and Ananth Madhavan, “Liquidity, Volatility, and Equity Trading Costs AcrosCountries and Over Time.” International Finance, 4 (2001), 221–255.

Elton, Edwin J. and Martin J. Gruber, “Estimating the Dependence Structure of Share Prices–Implications forPortfolio Selection.” Journal of Finance, 28 (1973), 1203–1232.

, , and Thomas J. Urich, “Are Betas Best?” Journal of Finance, 33 (1978), 1375–1384.

Fabozzi, Frank J., Petter N. Kolm, Dessislava Pachamanova, and Sergio M. Focardi, Robust Portfolio Optimiza-tion and Management (Hoboken, NJ: John Wiley & Sons, Inc., 2007).

Fan, Jianqing, Fan Yingying, and Lv Jinchi, “High dimensional covariance matrix estimation using a factormodel.” Journal of Econometrics, 147 (2008), 186–197.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani, “Sparse inverse covariance estimation with the graph-ical lasso.” Biostatistics, 9 (2008), 432–441.

Green, Richard C. and Hollifield, Burton, “When Will Mean-Variance Efficient Portfolios be Well Diversified?”Journal of Finance, 47 (1992), 1785–1809.

Jagannathan, Ravi and Tongshu Ma, “Risk Reduction in Large Portfolios: Why Imposing the Wrong ConstraintsHelps.” Journal of Finance, 58 (2003), 1651–1683.

Jobson, J. D. and Bob Korkie, “Putting Markowitz theory to work.” Journal of Portfolio Management , 7 (1981),70–74.

and Bob M. Korkie, “Performance Hypothesis Testing with the Sharpe and Treynor Measures.” Journalof Finance, 36 (1981), 889–908.

Jorion, Philippe, “International Portfolio Diversification with Estimation Risk.” The Journal of Business, 58(3)(1985), 259–278.

30

, “Bayes-Stein Estimation for Portfolio Analysis.” The Journal of Financial and Quantitative Analysis,21(3) (1986), 279–292.

Judge, George G., William E. Griffiths, R. Carter Hill, Helmut Lutkepohl, and Tsoung-Chao Lee, The Theoryand Practice of Econometrics Wiley Series in Probability and Statistics, Hoboken, NJ: John Wiley & Sons,Inc.: (1980).

Kan, Raymond and Guofu Zhou, “Optimal Portfolio Choice with Parameter Uncertainty.” Journal of Financialand Quantitative Analysis, 42 (2007), 621–656.

Ledoit, Olivier and Michael Wolf, “Improved estimation of the covariance matrix of stock returns with anapplication to portfolio selection.” Journal of Empirical Finance, 10 (2003), 603 – 621.

and , “Honey, I Shrunk the Sample Covariance Matrix.” Journal of Portfolio Management , Summer2004 (2004), 110–119.

and , “A well-conditioned estimator for large-dimensional covariance matrices.” Journal of Multi-variate Analysis, 88 (2004), 365–411.

and , “Robust performance hypothesis testing with the Sharpe ratio.” Journal of Empirical Finance,15 (2008), 850–859.

MacKinlay, A. Craig and Lubos Pastor, “Asset Pricing Models: Implications for Expected Returns and PortfolioSelection.” The Review of Financial Studies, 13 (2000), 883–916.

Michaud, Richard, “The Markowitz Optimzation Enigma: Is ‘Optimized’ Optimal?” Financial Analyst Journal ,45 (1989), 31–42.

Pastor, Lubos, “Portfolio Selection and Asset Pricing Models.” The Journal of Finance, 55 (2000), 179–223.

and Robert F. Stambaugh, “Comparing asset pricing models: an investment perspective.” Journal ofFinancial Economics, 56 (2000), 335 – 381.

Politis, Dimitris N. and Halbert White, “Automatic block-length selection for the dependent bootstrap.” Econo-metric Reviews, 23 (2004), 53–70.

and Joseph P. Romano, “The Stationary Bootstrap.” Journal of the American Statistical Association, 89(1994), 1303–1313.

Rothman, Adam J., Peter J. Bickel, Elizaveta Levina, and Ji Zhu, “Sparse permutation invariant covarianceestimation.” Electronic Journal of Statistics, 2 (2008), 494–515.

Schwarz, Gideon E., “Estimating the dimension of a model.” Annals of Statistics, 6 (1978), 461–464.

Stevens, Guy V. G., “On the Inverse of the Covariance Matrix in Portfolio Analysis.” Journal of Finance, 53(1998), 1821–1827.

Stoll, Hans R. and Robert E. Whaley, “Transaction costs and the small firm effect.” Journal of FinancialEconomics, 12 (1983), 57–79.

Tibshirani, Robert, “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society.Series B (Methodological), 58 (1996), 267–288.

Tu, Jun and Guofu Zhou, “Incorporating Economic Objectives into Bayesian Priors: Portfolio Choice underParameter Uncertainty.” Journal of Financial and Quantitative Analysis, 45 (2010), 954–986.

and , “Markowitz Meets Talmud: A Combination of Sophisticated and Naive Diversification Strate-gies.” Journal of Financial Economics, 99 (2011), 204–215.

Yuan, Ming and Yi Lin, “Model selection and estimation in the gaussian graphical model.” Biometrika, 94(2007), 19–35.

31

Appendix

In this appendix we show the relationship between the estimation of Ψ and constructing the

system of hedge portfolios. Again the constrained maximum likelihood estimation problem is

maxΨ=[ψij]

T

2ln |Ψ| − T

2tr(SΨ)− ρ∑N

i=1

∑N

j=1i 6=j

∣∣ψij∣∣ , (7)

where S is the sample covariance matrix.

By constraining maximum likelihood, graphical lasso algorithm also imposes constraints on

hedge regressions. Banerjee, El Ghaoui and d’Asprement (2008) show that the problem (7) is

convex and consider estimation as follows. Letting W be a perturbation of the sample estimator

S, they show that one can solve the problem by optimizing over each row and corresponding

column of W . Suppose we rearrange the stocks so that the last row and column correspond to

the stock one wants to hedge by other stocks. Then partitioning W and S,

W =

W11 w12

w′12 w22

, S =

S11 s12

s′12 s22

,Using convex duality, Banerjee, El Ghaoui and d’Asprement show the dual problem of the above

turns out to be

minβ

{1

2

∥∥∥W 1/211 β − b

∥∥∥2+ ρ ‖β‖1

}, (8)

where b = W−1/211 s12 and β = W−1

11 w12.

Now this dual exactly resembles a lasso least-squares problem (Tibshirani (1996)) applied

on hedge regression (1) (Stevens (1998)). Without the constraint on its l1 norm, β is exactly the

vector of hedge coefficients [see expression (3)], the result of regressing the last stock return on

all the previous N − 1 stocks (thus the expression W−111 w12). However, when the solution to the

minimization problem is subject to the l1 norm constraint, the lasso will shrink some element

of the row vector β to zero. Again the lasso estimator contains bias but reduces estimation

variation.

This duality further motivates Friedman, Hastie, and Tibshirani’s (2008) glasso algorithm,

which recursively solves each row and update the lasso problem until convergence, ensuring the

symmetry of the Ψ. For the equivalence between the above two problems, please see Banerjee,

El Ghaoui and d’Asprement (2008) or Friedman, Hastie, and Tibshirani (2008).

32

Tab

le1:

Dat

ad

escr

ipti

on

#[1

]D

escr

ipto

r[2

]D

escr

ipti

on[3

]M

kts

[4]

N[5

]N

/T

[6]

Tim

ep

erio

d

1SZBM25

25(5×

5)p

ortf

olio

sfo

rmed

on

size

and

book

-to-

mar

ket.

US

25

0.2

08

07/1963-1

2/2010

2SZBM100

100

(10×

10)

por

tfoli

os

form

edon

size

and

book

-to-

mar

ket.

US

100

0.8

33

07/1963-1

2/2010

3IN

D30

30in

du

stry

por

tfol

ios.

US

30

0.2

50

07/1963-1

2/2010

4IN

D48

48in

du

stry

por

tfol

ios.

US

48

0.4

00

07/1963-1

2/2010

5SZBM25+IN

D30

Com

bin

atio

nofSZBM

25an

dIND

30.

US

55

0.4

58

07/1963-1

2/2010

6SZBM100+IN

D48

Com

bin

atio

nofSZBM

100

an

dIND

48.

US

148

1.2

33

07/1963-1

2/2010

7IntM

ktC

ountr

ym

arket

port

foli

osof

15d

evel

op

edco

untr

ies.

Int’

l15

0.1

25

01/1975-1

2/2010

8IntValG

roV

alu

ean

dgr

owth

por

tfoli

osin

15d

evel

oped

cou

ntr

ies.

Int’

l118

0.9

83

01/1975-1

2/2010

9IntM

kt+IntValG

roC

omb

inat

ion

ofIntM

kt

andIntValGro.

Int’

l133

1.1

08

01/1975-1

2/2010

10MSCI

MS

CI

cou

ntr

yin

dex

esfo

r18

dev

elop

edm

arke

ts,

incl

ud

ing

US

.In

t’l

18

0.1

50

01/1977-1

2/2010

11Individuals

100

stock

sfr

omN

YS

Ean

dA

ME

X.

US

100

0.8

33

01/1973-1

2/2010

Not

es:

Th

ista

ble

list

sth

eva

riou

ste

stin

gp

ortf

olio

sw

eco

nsi

der

.C

olu

mn

[1]

give

sth

eab

bre

via

tion

use

dto

refe

rto

the

test

ing

port

foli

osin

colu

mn

[2],

from

eith

erU

Sor

Inte

rnat

ion

alca

pit

alm

arke

ts(C

olu

mn

[3])

.C

olu

mn

[4]

rep

ort

sth

enu

mb

erof

ass

ets

inea

chd

atase

t,an

dco

lum

n[5

]re

port

sth

eN/T

rati

ow

hen

T=

120.

Th

esa

mp

lep

erio

dof

each

data

set

issh

own

inC

olu

mn

[6].

ForIND

48,

we

augm

ent

the

data

wit

han

old

ind

ust

rycl

assi

fica

tion

[fro

mJu

l196

3to

Dec

2004

;use

dby

DeM

igu

el,

Garl

ap

pi,

and

Up

pal

(200

9)]

wit

hth

ose

from

an

ewon

efr

omJan

2005

toD

ec20

10.IntM

kts

are

US

doll

arre

turn

sof

the

valu

e-w

eighte

dco

untr

ym

arke

tp

ortf

olio

sof

the

foll

owin

g15

cou

ntr

ies:

Au

stra

lia,

Bel

giu

m,

Can

ada,

Fra

nce

,G

erm

any,

Hon

gK

on

g,

Italy

,Jap

an

,N

eth

erla

nd

,N

orw

ay,

Sin

gap

ore

,S

pai

n,

Sw

eden

,S

wit

zerl

and

,an

dU

K.IntValGro

conta

ins

valu

ean

dgro

wth

port

foli

os

inea

chof

the

15

cou

ntr

ies

usi

ng

fou

rva

luat

ion

rati

os:

book

-to-

mar

ket

(B/M

);ea

rnin

gs-p

rice

(E/P

);ca

shea

rnin

gs

top

rice

(CE

/P

);an

dd

ivid

end

yie

ld(D

/P

).T

he

valu

ep

ortf

olio

sco

nta

infi

rms

inth

eto

p30

%of

ara

tio

an

dth

egr

owth

por

tfoli

os

conta

infi

rms

inth

eb

otto

m30%

.W

eex

clu

de

the

CE

/P

base

dva

lue

and

grow

thp

ortf

olio

sfo

rN

orw

ay,

du

eto

ala

rge

nu

mb

erof

mis

sin

gva

lues

.MSCI

conta

ins

US

dol

lar

retu

rns

onth

eM

SC

Ico

untr

yin

dex

esfo

r:A

ust

rali

a,A

ust

ria,

Bel

giu

m,

Can

ada,

Den

mark

,F

rance

,G

erm

any,

Hon

gK

on

g,

Ital

y,Jap

an

,N

eth

erla

nd

s,N

orw

ay,

Sin

gap

ore,

Sp

ain

,S

wed

en,

Sw

itze

rlan

d,

UK

,an

dU

S.

33

Tab

le2:

Ou

t-of-

Sam

ple

Per

iod

s,R

egu

lari

zati

onP

aram

eter

,an

dS

pars

ity

for

Est

imate

d

Ou

t-of

-Sam

ple

An

alysi

sP

erio

dP

rop

ose

dO

pti

miz

er:

Ψρ

#Dataset

[1]

Tra

inin

gp

erio

d[2

]T

esti

ng

per

iod

[3]Tf

[4]ρ

[5]

Sp

ars

ity

1SZBM25

197

3:07-

198

3:06

1983

:07-

2010

:12

450

0.3

28.5

%

2SZBM100

197

3:07-

198

3:06

1983

:07-

2010

:12

450

1.7

45.0

%

3IN

D30

197

3:07-

198

3:06

1983

:07-

2010

:12

450

1.1

27.8

%

4IN

D48

197

3:07-

198

3:06

1983

:07-

2010

:12

450

1.3

32.2

%

5SZBM25+IN

D30

197

3:07-

198

3:06

1983

:07-

2010

:12

450

0.5

32.3

%

6SZBM100+IN

D48

197

3:07-

198

3:06

1983

:07-

2010

:12

450

1.9

47.1

%

7IntM

kt198

5:01-

199

4:12

1995

:01-

2010

:12

312

0.2

22.4

%

8IntValG

ro198

5:01-

199

4:12

1995

:01-

2010

:12

312

1.1

44.4

%

9IntM

kt+IntValG

ro198

5:01-

199

4:12

1995

:01-

2010

:12

312

0.8

44.0

%

10

MSCI

198

7:01-

199

6:12

1997

:01-

2010

:12

288

0.7

30.5

%

11

Individuals

198

3:01-

199

2:12

1993

:01-

2010

:12

336

5.9

32.4

%

Not

es:

For

each

data

set,

this

tab

lere

por

ts:

[1]th

etr

ain

ing

per

iod

;[2

]te

stin

gp

erio

d;

and

[3]th

enu

mb

erof

test

ing

per

iod

s(m

onth

s)fo

ran

out-

of-s

am

ple

anal

ysi

s;[4

]th

ere

gula

riza

tion

par

amet

erρ

chos

enfr

omth

etr

ain

ing

per

iod

;an

d[5

]th

ed

egre

eof

spars

ity,

wh

ich

isth

eav

erage

per

centa

ge

of

zero

off-d

iago

nal

elem

ents

inth

ees

tim

ated

inve

rse

cova

riance

mat

rix

Ψρ.

34

Tab

le3:

Con

dit

ion

Nu

mb

ers

for

Est

imat

edIn

vers

eC

ovar

ian

ceM

atri

ces

#Dataset

N/T

[1]

Ψρ

[2]S−

1[3

]Σ−

1LW

mea

nst

dm

axm

inm

ean

std

max

min

mea

nst

dm

ax

min

1SZBM25

0.21

660

142

1048

396

1767

300

261

210

88285

38

376

214

2SZBM100

0.83

881

189

1482

473

8446

331

138

1707

09

348

11

3306

3449

12240

562

3IN

D30

0.25

192

6631

994

511

214

980

208

162

49

289

75

4IN

D48

0.40

327

140

591

149

1497

841

3195

502

303

114

546

127

5SZBM25+IN

D30

0.46

1116

288

1743

628

1137

932

22186

05464

8560

124

1002

391

6SZBM100+IN

D48

1.23

1153

252

1916

636

∞∞

∞∞

4279

4740

17796

1008

7IntM

kt0.

13

514

197

1072

270

1294

821

399

4479

95

46

175

34

8IntValG

ro0.

98

1865

358

2972

1360

VL

NV

LN

VL

NV

LN

722

191

1045

372

9IntM

kt+IntValG

ro1.

11

2844

527

4508

2057

∞∞

∞∞

902

238

1308

462

10

MSCI

0.15

127

7333

167

231

144

601

96

77

24

127

39

11

Individuals

0.83

321

8653

916

6V

LN

VL

NV

LN

VL

N709

432

2207

265

Not

es:

For

each

dat

ase

t,th

ista

ble

rep

ort

sco

nd

itio

nnu

mb

ers

ofth

eth

ree

esti

mat

ors

of

the

inve

rse

cova

rian

cem

atr

ix:

[1]

Ψρ

isth

ep

rop

ose

dsp

ars

ees

tim

ator

wit

hth

ere

gu

lari

zati

onp

aram

eterρ

give

nin

colu

mn

[4]

ofT

able

2;

[2]S−

1is

the

inve

rse

of

the

sam

ple

cova

rian

cem

atr

ix;

[3]

Σ−

1LW

isth

ein

vers

eof

the

Led

oit-

Wol

fsh

run

kco

vari

ance

mat

rix

(sam

ple

cova

rian

cem

atr

ixsh

run

kto

con

stant

corr

elati

onm

atri

x).

Con

dit

ion

nu

mb

erfo

rth

e(i

nve

rse)

cova

rian

cem

atri

xis

the

rati

oof

the

max

imal

eigen

valu

eto

the

min

imal

eige

nva

lue.

Hig

her

valu

eof

the

con

dit

ion

nu

mb

erin

dic

ates

grea

ter

nu

mer

ical

inst

abil

ity

ofth

ein

vers

eop

erati

on

.C

on

dit

ion

nu

mb

erofS−

1is

show

nas

“in

fin

ite

(∞)”

wh

enN

>T

(T=

120)

,as

the

min

imal

eige

nva

lue

ofS−

1is

zero

.VLN

mea

ns

“ver

yla

rge

nu

mb

er.”

35

Tab

le4:

Ou

t-of

-Sam

ple

Por

tfol

ioR

isk

(Month

ly)

[1]

Mon

thly

Var

ian

ce( %

2)

[2]

Red

uct

ion

inM

onth

lyV

ola

tili

ty(%

)#

Dataset

σ2 Ψρ

σ2 s−

2 EW

σ2 JM

σ2 LW

σs−

1−σ

Ψρ

σEW−σ

Ψρ

σJM−σ

Ψρ

σLW−σ

Ψρ

1SZBM25

13.5

214

.51

24.7

223

.20

21.4

40.1

31.

30*

**1.1

4***

0.9

5***

12

54

32

SZBM100

13.3

065

.30

25.4

358

.57

21.9

84.

43**

*1.4

0**

*4.0

1*

1.0

4***

15

34

23

IND30

12.3

514

.49

22.5

315

.97

13.0

50.2

91.

23*

**0.4

8**

0.1

01

35

42

4IN

D48

12.4

517

.59

22.8

816

.22

13.1

50.

66**

*1.2

5**

*0.5

0*

0.1

01

45

32

5SZBM25+IN

D30

10.0

114

.32

22.8

815

.91

11.0

50.

62**

*1.6

2**

*0.8

2***

0.1

61

35

42

6SZBM100+IN

D48

10.7

0T<

N24

.12

16.4

312

.74

T<

N1.

64*

**0.7

8***

0.3

01

54

32

7IntM

kt20

.06

20.2

829

.33

25.4

319

.80

0.0

20.9

4*

0.5

6-0

.03

23

54

18

IntValG

ro15

.15

100

4.95

29.6

123

.72

15.6

027

.81*

**1.

55*

**0.9

8*

0.0

61

54

32

9IntM

kt+IntValG

ro15

.25

T<

N29

.56

23.6

415

.92

T<

N1.

53*

**0.9

6*

0.0

81

54

32

10MSCI

16.5

417

.41

32.2

524

.84

16.6

70.

11*

*1.6

1**

*0.9

2*

0.0

21

35

42

11Individuals

10.1

019

.44

23.7

913

.40

11.3

61.

21**

*1.7

2**

*0.4

5***

0.1

9***

14

53

2

Not

es:

For

each

dat

ase

t,th

ista

ble

rep

ort

sth

em

onth

lyva

rian

ces

ofou

t-of

-sam

ple

retu

rns

for

the

five

port

foli

os:

GM

V-Ψ

ρ,

GM

V-

S−

1,

EW

(1/N

),G

MV

-JM

,an

dG

MV

-LW

2 Ψρ,σ

2 s−1,σ

2 EW,σ

2 JM

,an

2 LW

den

ote

thei

rre

spec

tive

vari

an

ces,

calc

ula

ted

from

the

tim

e-se

ries

of

out-

of-s

am

ple

retu

rns

inth

ete

stin

gp

erio

d.T<N

ind

icat

esth

atth

ep

ortf

olio

can

not

be

con

stru

cted

du

eto

the

non

-inve

rtib

ilit

yof

the

sam

ple

cova

rian

cem

atri

x.

Col

um

ns

un

der

[1]

tab

ula

teth

ep

oint

esti

mat

esof

the

ou

t-of-

sam

ple

month

lyre

turn

vari

ance

sin

%2.

Nu

mb

ers

bel

owth

eva

rian

ces

initalics

are

the

ran

kin

gof

port

foli

ova

rian

ceam

on

gth

efi

vep

ort

foli

os:

1in

dic

ates

the

small

est

vari

an

cean

d5

ind

icat

esth

ela

rges

t.C

olu

mn

su

nd

er[2

]ta

bu

late

the

mea

nd

iffer

ence

sin

the

ou

t-of-

sam

ple

por

tfol

iost

an

dar

dd

evia

tion

s,b

ase

don

Pol

itis

and

Rom

ano’

s(1

994)

stat

ion

ary

boot

stra

pw

ith

exp

ecte

db

lock

size

of

5.

We

test

the

nu

llof

no

diff

eren

cein

dir

ectl

yby

con

stru

ctin

gth

etw

o-si

ded

boot

stra

pin

terv

als

for

the

diff

eren

ce.∗∗∗ ,∗∗,

an

d∗

ind

icate

sign

ifica

nce

at

the

1%

,5%

an

d10%

leve

l.

36

Tab

le5:

Ou

t-of

-Sam

ple

Sh

arp

eR

atio

(Mon

thly

)

[1]

Mon

thly

Shar

pe

Rat

ios

(SR

s)[2

]D

iffer

ence

sin

SR

sb/w

GM

V-Ψ

ρan

d#

Dataset

GM

V-Ψ

ρG

MV

-S−

1E

W(1

/N)

GM

V-J

MG

MV

-LW

GM

V-S−

1E

W(1

/N)

GM

V-J

MG

MV

-LW

1SZBM25

0.29

50.

340

0.13

70.

105

0.22

7-0

.045

**0.1

58*

**0.

190

***

0.0

68

21

45

32

SZBM100

0.26

00.

112

0.11

30.

065

0.215

0.14

8**

0.14

7**

*0.1

95*

**0.0

45

14

35

23

IND30

0.15

40.

129

0.13

70.

149

0.15

50.0

26

0.01

70.

005

-0.0

01

25

43

14

IND48

0.12

60.

070

0.12

90.

151

0.119

0.05

6**

-0.0

03

-0.0

25

0.0

07

35

21

45

SZBM25+IN

D30

0.27

10.

337

0.13

90.

157

0.26

7-0

.066

0.132

***

0.11

4***

0.0

04

21

54

36

SZBM100+IN

D48

0.26

7T<

N0.

119

0.12

30.

315

T<

N0.

148

***

0.14

4***

-0.0

48

25

43

17

IntM

kt0.

134

0.12

90.

123

0.07

30.

129

0.005

0.0

11

0.06

00.0

05

12

45

38

IntValG

ro0.

257

0.05

30.

139

0.14

10.

224

0.20

3**

0.11

7*0.1

15*

0.0

33

15

43

29

IntM

kt+IntValG

ro0.

273

T<

N0.

138

0.14

20.

227

T<

N0.1

36*

*0.

132

**0.

046**

15

43

210

MSCI

0.08

30.

082

0.10

20.

137

0.09

10.

000

-0.0

19-0

.055

-0.0

09

45

21

311

Individuals

0.14

70.

100

0.18

10.

101

0.10

50.

047*

**-0

.034

***

0.04

6**

*0.0

42***

25

13

2

Not

es:

Th

ista

ble

rep

orts

the

ou

t-of-

sam

ple

mon

thly

Sh

arp

era

tios

for

the

foll

owin

gfi

ve

port

foli

os:

GM

V-Ψ

ρ,

GM

V-S−

1,

EW

(1/N

),G

MV

-JM

,an

dG

MV

-LW

.T<N

ind

icat

esth

atth

ep

ortf

olio

can

not

be

con

stru

cted

du

eto

the

non

-inve

rtib

ilit

yof

the

sam

ple

cova

rian

cem

atri

x.

Colu

mn

su

nd

er[1

]re

por

tth

ele

vel

ofm

onth

lyS

har

pe

rati

os.

Nu

mb

ers

bel

owth

eS

harp

era

tios

initalics

ind

icate

the

ran

kin

gof

Sh

arp

era

tios

am

ong

the

five

por

tfol

ios:

1in

dic

ates

the

hig

hes

tS

har

pe

rati

oan

d5

ind

icate

sth

elo

wes

t.C

olu

mn

su

nd

er[2

]re

por

td

iffer

ence

sin

Sh

arp

era

tios

bet

wee

nG

MV

-Ψρ

and

oth

erfo

ur

por

tfol

ios.

We

test

the

nu

llof

no

diff

eren

cein

Sh

arp

era

tios

by

con

stru

ctin

gtw

o-si

ded

boot

stra

pin

terv

als

for

the

diff

eren

ce,

usi

ng

the

stu

den

tize

dci

rcu

lar

blo

ckb

oots

trap

of

Led

oit

and

Wol

f(2

008

)w

ith

ab

lock

size

equ

alto

5.∗∗∗ ,∗∗,

and∗

ind

icat

esi

gnifi

can

ceat

the

1%,

5%an

d10%

leve

l.

37

Tab

le6:

Tu

rnov

eran

dT

Cos

ts-a

dju

sted

Cer

tain

tyE

qu

ival

ent

Exce

ssR

etu

rns

(CER

s)

[1]

Mon

thly

Turn

over

[2]

TC

osts

-adju

sted

CE

R(a

nnual

%)

#Dataset

GM

V-Ψ

ρG

MV

-S−

1E

W(1

/N)

GM

V-J

MG

MV

-LW

GM

V-Ψ

ρG

MV

-S−

1E

W(1

/N

)G

MV

-JM

GM

V-L

W

1SZBM25

0.44

20.

796

0.01

80.

095

0.30

66.

316.

430.6

6-1

.44

4.3

72

14

53

2SZBM100

0.53

47.

970

0.02

50.

282

1.22

04.

17-5

6.55

-0.9

4-1

3.3

2-1

.83

15

32

43

IND30

0.23

40.

464

0.03

00.

045

0.22

91.

40-1

.26

0.8

82.1

11.4

43

54

12

4IN

D48

0.29

80.

778

0.03

30.

062

0.32

7-0

.19

-6.4

20.3

42.0

8-0

.73

35

21

45

SZBM25+IN

D30

0.49

91.

626

0.02

50.

043

0.41

84.

301.2

70.9

72.4

84.8

32

45

31

6SZBM100+IN

D48

0.52

7T<

N0.

028

0.06

61.

140

4.11

T<

N-0

.38

0.6

42.8

21

54

32

7IntM

kt0.

234

0.33

00.

026

0.08

70.

134

-0.2

4-1

.12

-0.9

8-3

.72

0.1

52

42

51

8IntValG

ro0.

718

134.

598

0.03

40.

102

0.47

53.

14-1

088.

770.0

20.5

43.0

71

54

32

9IntM

kt+IntValG

ro0.

886

T<

N0.

033

0.10

20.

534

2.92

T<

N-0

.09

0.5

52.8

91

54

32

10MSCI

0.19

20.

268

0.02

80.

101

0.13

9-2

.09

-2.7

2-2

.89

0.1

4-1

.36

35

41

211

Individuals

0.16

05.

220

0.03

30.

109

0.31

91.

54-3

1.9

93.2

7-0

.24

-0.7

52

51

34

Not

es:

Colu

mn

su

nd

er[1

]re

port

turn

over

of

the

five

por

tfol

ios:

GM

V-Ψ

ρ,

GM

V-S−

1,

EW

(1/N

),G

MV

-JM

,an

dG

MV

-LW

.P

lease

see

the

text

for

thei

rd

efin

itio

n.

Cal

cula

tion

ofth

ep

ortf

olio

turn

over

foll

ows

DeM

igu

el,

Garl

app

i,an

dU

pp

al

(2009).

T<N

ind

icat

esth

at

the

por

tfol

ioca

nn

ot

be

con

stru

cted

du

eto

the

non

-inve

rtib

ilit

yof

the

sam

ple

cova

rian

cem

atr

ix.

Colu

mn

su

nd

er[2

]re

port

the

valu

esof

the

tran

sact

ion

cost

adju

sted

cert

ainty

equ

ival

ent

exce

ssre

turn

s(T

Cos

t-ad

just

edC

ER

s),

show

nin

an

nu

al

per

centa

ge

poin

ts.

Th

etr

ansa

ctio

nco

stof

each

isca

lcu

late

das

50b

asis

poi

nts

tim

esm

onth

lytu

rnov

erti

mes

12

(to

an

nu

ali

ze).

38

Tab

le7:

Eff

ects

ofN

o-S

hor

t-S

ale

Res

tric

tion

[1]

Mon

thly

Var

iance

(%2)

[2]

Mon

thly

Shar

pe

Rat

io(S

R)

[3]

Month

lyT

urn

over

#Dataset

GM

V-Ψ

ρG

MV

-JM

GM

V-J

M-Ψ

ρG

MV

-Ψρ

GM

V-J

MG

MV

-JM

-Ψρ

GM

V-Ψ

ρG

MV

-JM

GM

V-J

M-Ψ

ρ

1SZBM25

13.5

223

.20

23.1

40.

295

0.10

50.

105

0.4

42

0.0

95

0.0

90

2SZBM100

13.3

058

.57

57.9

00.

260

0.06

50.

047

0.5

34

0.2

82

0.2

82

3IN

D30

12.3

515

.97

16.2

50.

154

0.14

90.

152

0.2

34

0.0

45

0.0

40

4IN

D48

12.4

516

.22

16.4

50.

126

0.15

10.

151

0.2

98

0.0

62

0.0

58

5SZBM25+IN

D30

10.0

115

.91

15.9

20.

271

0.15

70.

157

0.4

99

0.0

43

0.0

44

6SZBM100+IN

D48

10.7

016

.43

16.4

40.

267

0.12

30.

126

0.5

27

0.0

66

0.0

82

7IntM

kt20

.06

25.4

325

.56

0.13

40.

073

0.07

50.2

34

0.0

87

0.0

83

8IntValG

ro15

.15

23.7

223

.61

0.25

70.

141

0.144

0.7

18

0.1

02

0.0

90

9IntM

kt+IntValG

ro15

.25

23.6

423

.51

0.27

30.

142

0.14

50.8

86

0.1

02

0.0

96

10MSCI

16.5

424

.84

24.6

00.

083

0.13

70.

143

0.1

92

0.1

01

0.0

95

11Individuals

10.1

013

.40

13.9

10.

147

0.10

10.

105

0.1

60

0.1

09

0.1

05

Not

es:

Th

ista

ble

com

pare

sG

MV

-Ψρ,

GM

V-J

M,

and

GM

V-J

M-Ψ

ρin

term

sof

[1]

ou

t-of-

sam

ple

mon

thly

retu

rnva

rian

ce;

[2]

month

lyS

har

pe

Rat

io,

and

[3]

mon

thly

turn

over

.G

MV

-JM

-Ψρ

rep

lace

sth

ein

vers

esa

mp

leco

vari

an

cem

atr

ixu

sed

inG

MV

-JM

wit

hth

ep

rop

osed

spar

sein

vers

eco

vari

ance

mat

rix

Ψρ.In

oth

erw

ords,

GM

V-J

M-Ψ

ρis

GM

V-Ψ

ρw

ith

no-

short

-sale

rest

rict

ion

(e.g

.,Jaga

nn

ath

anan

dM

a(2

003)

).

39

Table 8: Comparison with Indusry factor models

GMV-Ψρ GMV-IND

Conditional number

mean 321 568

std 86 155

max 539 1031

min 166 289

Monthly Variance(%2)

mean 10.097 11.011

std 2.226 2.563

max 14.919 15.870

min 6.048 6.519

Reduction in average Monthly Volatility (%) 0.137

(GMV-Σ−1IND−GMV-Ψρ) (2.564)

Monthly Sharpe Ratios (SRs)

mean 0.147 0.138

std 0.031 0.035

max 0.237 0.229

min 0.065 0.036

Difference in average SR 0.010

(GMV-Ψρ−GMV-Σ−1IND) (2.022)

Monthly Turnover 0.160 0.281

Tcosts-adjusted CER (annual %)

mean 1.537 0.429

std 1.247 1.477

max 4.866 4.101

min -2.471 -4.947

Difference in average CER 1.108

(GMV-Ψρ−GMV-Σ−1IND) (5.735)

Notes: This table compares two portfolios, GMV-Ψρ, and GMV-Σ−1IND in terms of out-of-sample

conditional number, monthly return variance, monthly Sharpe Ratio, monthly turnover andTcost adjusted certainty equivalent return (CER). Please see the text for their definition. Cal-culation of the portfolio turnover follows DeMiguel, Garlappi, and Uppal (2009). The tests forthe differences in the average value of monthly volatility, Sharpe Ratio and Tcost adjusted CERare conducted using 100 sample, where 100 individual stocks randomly picked from NYSE andAMEX. 40

Tab

le9:

Eff

ects

ofu

pd

atin

[1]

Mon

thly

Vari

ance

(%2)

[2]

Mon

thly

Sharp

eR

ati

o(S

R)

[3]

TC

ost

s-adju

sted

CE

R(a

nnual

%)

#Dataset

oldρ

new

ρw

ith

oldρ

wit

hnew

ρch

ange

wit

holdρ

wit

hnew

ρch

ange

wit

holdρ

wit

hnew

ρch

ange

1SZBM25

0.3

0.2

15.9

8915

.787

-0.2

020.

204

0.2

17

0.0

13

2.4

32

2.8

45

0.4

13

2SZBM100

1.7

0.8

15.6

97

15.3

47

-0.3

50

0.14

90.

140

-0.0

08

-0.7

77

-1.8

76

-1.0

98

4IN

D48

1.3

1.5

12.3

47

12.2

28

-0.1

19

-0.0

19

-0.0

16

0.0

03

-6.1

71

-5.9

37

0.2

34

6SZBM100+IN

D48

1.9

1.1

11.2

39

10.9

53

-0.2

86

0.11

30.

107

-0.0

06

-1.8

46

-2.6

37

-0.7

91

7IntM

kt0.

21.

020

.363

19.8

17

-0.5

45

0.16

60.

171

0.0

04

1.3

42

2.0

88

0.746

8IntValG

ro1.

11.

614

.961

15.1

38

0.17

60.

183

0.175

-0.0

08

0.4

33

0.6

69

0.236

9IntM

kt+IntValG

ro0.

81.

315

.150

14.8

54

-0.2

96

0.19

60.

191

-0.0

06

0.1

56

0.8

61

0.7

05

10MSCI

0.7

1.1

13.4

22

13.3

04

-0.1

18

0.05

60.

060

0.0

05

-2.7

53

-2.3

99

0.3

53

Not

es:

Th

ista

ble

exam

ines

the

effec

tof

up

dat

ingρs.

For

each

dat

aset

,w

esp

lit

the

ou

t-of-

sam

ple

test

ing

per

iod

into

two

even

hal

ves.

We

then

up

dat

eth

ees

tim

ati

onofρ

for

the

seco

nd

hal

fof

the

out-

of-s

amp

lete

stin

gp

erio

d,

then

reco

nst

ruct

the

GM

V-Ψ

ρ,

port

foli

o(‘

wit

hn

ewρ’)

,an

dco

mp

are

the

resu

lts

wit

hth

ose

pre

vio

us

resu

lts

wit

hou

tu

pd

ati

ngρ

(‘w

ith

oldρ’)

.T

he

com

pari

son

sare

new

resu

lts

min

us

old

resu

lts

inte

rms

of

[1]ou

t-of

-sam

ple

mon

thly

retu

rnva

rian

ce;

[2]m

onth

lyS

har

pe

Rat

io,an

d[3

]T

cost

-ad

just

edC

ER

.D

atase

t#

3an

d#

5ar

en

ot

inth

eco

mp

aris

onas

re-e

stim

ates

ofρ

do

not

gen

erat

ea

new

valu

e.

41