Sparse Weighted Norm Minimum Variance Portfolio
Yu-Min Yen ∗†
June 10, 2013
Abstract
We propose to impose a weighted l1 and squared l2 norm penalty on the portfolio weights to improve the performance of portfolio optimization when the number of assets N becomes large. We show that under certain conditions, the realized risk of the optimal portfolio obtained from the strategy can asymptotically be less than that of some benchmark portfolios with high probability. An intuitive interpretation of why including a smaller number of assets may be beneficial in the high dimensional situation is built on a constraint between the sparsity of the optimal portfolio weight vector and the realized risk. The theoretical results also imply that the penalty parameters for the weighted norm penalty can be specified as a function of the number of assets N and the sample size n used to estimate the parameters of the portfolio optimization. We then adopt a coordinate-wise descent algorithm to solve the penalized weighted norm portfolio optimization. We find that the performance of the weighted norm strategy dominates the other benchmarks for the Fama-French 100 size and book-to-market ratio portfolios, but is mixed for individual stocks. We also propose several alternative norm penalties and show that their performances are comparable to those of the weighted norm strategy.
KEYWORDS: Sparsity and diversity; Weighted norm portfolio; Coordinate-wise descent algorithm.
JEL Codes: C40, C61, G11.
∗Postdoc Research Fellow, Institute of Economics, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan (E-mail: [email protected]). I would like to thank Yeutien Chou, Jack Favilukis, Christian Julliard, Oliver Linton, Antonio Mele, Philippe Mueller, Kevin Sheppard, Andrea Vedolin and Tso-Jung Yen for helpful comments.
†The program codes for replicating all numerical results shown in this paper are available from the author.
1 Introduction
In this paper, we propose to impose a weighted l1 and squared l2 norm penalty on the portfolio weights to improve the performance of portfolio optimization when the number of assets N becomes large. An optimal portfolio often performs worse than expected when the number of assets N is large relative to the sample size n used to estimate the parameters of the portfolio optimization. The main reason is that when the sample size n is not large enough relative to the number of assets N, the cumulative estimation errors of the estimated parameters become non-negligible, which deteriorates the accuracy of the solution to the portfolio optimization.
In such a situation, if we want to reduce the impact of the estimation errors, we can consider choosing a smaller number of assets, say N′ ≤ N, in the portfolio optimization. This is equivalent to imposing an l0 norm penalty on the portfolio weights. To obtain the optimal number of assets N′, however, we need to solve a combinatorial optimization, which in general is intractable for large N. An alternative is to impose an l1 norm penalty on the portfolio weights. The l1 norm penalty, like the l0 norm penalty, induces sparsity (zero components) in the portfolio weight vector, which in turn automatically selects and excludes certain assets. The l1 norm is also a convex function of the portfolio weights, and this convex relaxation keeps the modified portfolio optimization tractable even when N becomes very large. In fact, the l1 norm is the only norm penalty that simultaneously produces sparsity and is a convex function of the portfolio weights.
While the l1 norm penalty can reduce the number of assets, it may also cause under-diversification and extreme portfolio weights in the optimal portfolio. To mitigate these two problems, we propose to additionally impose a squared l2 norm penalty on the portfolio weights. The squared l2 norm penalty does not produce any sparsity, but it can efficiently regularize the size of the portfolio weights. Thus the squared l2 penalty alleviates the problems of under-diversification and extreme portfolio weights. With the l1 and squared l2 norm penalties jointly imposed on the portfolio weights in a minimum variance portfolio (mvp) optimization, we call the resulting optimal portfolio the weighted norm minimum variance portfolio.
While combining the l1 and squared l2 norm penalties in the portfolio optimization is rarely seen in the previous literature1, using them separately is not a new idea (e.g. Brodie et al., 2009; DeMiguel et al., 2009a; Fan et al., 2012; Welsch and Zhou, 2007). In this paper, we show that the number of assets included in the portfolio is the key to explaining why the weighted norm approach can work well when the number of available assets N becomes large. We compare the one-period-ahead out-of-sample (oos) conditional variances of
1Various norm penalties have been widely used in statistics for model selection and ill-posed problems, and there is a large literature devoted to studying their properties. Use of the l1 norm penalty dates back to Tibshirani (1996). The squared l2 norm penalty was originally proposed by the Russian mathematician Andrey N. Tikhonov. Combining the l1 and squared l2 norm penalties for regression problems was first proposed in Zou and Hastie (2005). Summaries of recent developments on norm penalty approaches in statistics can be found in Hastie et al. (2009) and Buhlmann and van de Geer (2011).
different portfolios. The oos conditional variance is a reasonable measure of the risk that an investor will immediately face in the next period when she decides to adopt a certain portfolio strategy in the current period. We prove that under certain conditions, the one-period-ahead oos conditional variance of the weighted norm mvp will be less than that of some benchmark portfolios with high probability, as the number of assets N and the sample size n both become large. One of the conditions explicitly characterizes a constraint between the sparsity of the portfolio weight vector and the true conditional portfolio variances. The constraint states a relationship between two fundamental factors affecting portfolio performance, namely how many assets and which assets should be included in the portfolio, and in turn provides a heuristic justification for why including fewer assets may be beneficial in the high dimensional situation.
The theoretical results also imply that the penalty parameters for the weighted norms can be specified as a function of N and n. This specification helps us tackle the problem of choosing the optimal penalty parameters. Previous research suggests using nonparametric methods such as cross validation to choose the optimal penalty parameters (DeMiguel et al., 2009a). However, cross validation is computationally intensive and may produce very unstable sequences of penalty parameters. The instability of the penalty parameters may damage the performance of the weighted norm strategy. By contrast, our specification for the penalty parameters is easier to implement and produces stable sequences of penalty parameters, and consequently leads to satisfactory results in the empirical analysis.
In addition to the econometric side, we also explain why using the norm penalty in the portfolio optimization is reasonable from other perspectives. Imposing the norm penalty resembles an investor who wants to limit transaction costs or exposure to risky assets, or who faces liquidity constraints such as margin requirements. The approach can also be viewed as an investor making a decision based on the marginal increment of the portfolio variance. This view is closely related to Gabaix (2011), who showed that many different bounded rationality and psychological phenomena can be delineated by models that incorporate the l1 norm penalty into individuals' optimization problems. We also discuss the relations between the weighted norm approach, the maximum a posteriori probability (MAP) estimator and the minimum mean square deviation problem.
To solve the weighted norm mvp optimization problem, we use a coordinate-wise descent algorithm proposed by Yen and Yen (2011). The algorithm is fast, efficient and can be easily extended to mvp optimization problems with various norm penalties. The algorithms previously used to solve such norm constrained portfolio optimization problems are either quadratic programming or least angle regression (LARS) type algorithms (Efron et al., 2004). Recently, coordinate-wise descent algorithms have been shown to be powerful tools for solving large dimensional variable selection problems in which the norm penalties are imposed on covariate coefficients (Friedman et al., 2007). We also demonstrate that the coordinate-wise descent algorithm can be used to solve portfolio optimization with various norm penalties in
the empirical analysis.
We use two real data sets, the Fama-French 100 size and book-to-market portfolios (FF100) and three hundred stocks randomly chosen from the CRSP data bank (CRSP300), to demonstrate how the weighted norm approach performs in the real world. The covariance matrix is estimated by the sample covariance estimator with an expanding window scheme. The covariance matrix estimated with the expanding window scheme may be less capable of capturing the dynamics of the true conditional covariance matrix than one estimated with a rolling window scheme. However, the expanding window scheme makes the estimated covariance matrix more stable, leading to a decrease in the portfolio turnover rate. The other source of the reduction in the portfolio turnover rate is the l1 norm penalty. Unlike explicitly imposing a constraint on the portfolio turnover rate (e.g., DeMiguel et al., 2010; Kirby and Ostdiek, 2011), we find that these two sources are enough to prevent high transaction costs.
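The expanding window scheme described above can be sketched as follows (a minimal illustration of our own, not the paper's code; function and variable names are assumptions):

```python
import numpy as np

def expanding_cov(returns, t0):
    """Sample covariance matrices under an expanding-window scheme.

    returns : (T, N) array of asset returns.
    t0      : first estimation date; the window at date t uses returns[0:t].
    Yields (t, Sigma_hat) for t = t0, ..., T.
    """
    T, N = returns.shape
    for t in range(t0, T + 1):
        window = returns[:t]                      # window start fixed at 0
        Sigma_hat = np.cov(window, rowvar=False)  # (N, N) sample covariance
        yield t, Sigma_hat

# A rolling-window scheme would instead use returns[t - L:t] for a fixed L;
# the expanding window changes less from one rebalancing date to the next,
# which stabilizes the estimated weights and lowers turnover.
```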
We compare the performance of the weighted norm mvp with three benchmark portfolio strategies: 1/N, the no-shortsale minimum variance portfolio (nsmvp) and the global minimum variance portfolio (gmvp). For the FF100, the weighted norm mvp yields an annualized variance from 82.97% to 98.60% and an annualized Sharpe ratio from 1.06 to 1.18 when different penalty parameters are used. For the three benchmark strategies, the annualized variances are 311.16% (1/N), 176.21% (nsmvp) and 77.66% (gmvp), and the annualized Sharpe ratios are 0.45 (1/N), 0.74 (nsmvp) and 0.94 (gmvp). For the CRSP300, the performance of the weighted norm mvp is mixed. Although it produces an annualized variance lower than those of the three benchmark strategies, it fails to achieve a higher Sharpe ratio than the no-shortsale mvp. These empirical results are robust to different portfolio rebalancing frequencies.
Previous literature has argued that adding the estimated return vector to the portfolio optimization often damages portfolio performance (Jagannathan and Ma, 2003; DeMiguel et al., 2009a). We re-examine this argument by imposing an additional target return constraint in the weighted norm mvp optimization. We find that the performance is not as good as in the case without the target return constraint, which is in line with the previous findings.
Finally, we investigate whether imposing different forms of norm penalties in the portfolio optimization can deliver better portfolio performance than the weighted norm penalty does. Three novel alternative penalties are introduced, and the reasons why they can be used in the portfolio optimization are also discussed. We find they deliver performances at least comparable to those of the weighted norm mvp and the other benchmarks.
Besides the literature mentioned above, our work also relates to recent developments in penalized norm approaches for large dimensional portfolio optimization. Fastrich et al. (2012) demonstrated how imposing nonconvex norm penalties can improve portfolio performance when the number of assets becomes large. Fan et al. (2013) analyzed properties of the risks of large dimensional portfolios obtained from portfolio optimization with covariance matrices estimated by various methods. Our theoretical results are different from
theirs. While Fan et al. (2013) focused on how to estimate the risks of the portfolios and evaluate the accuracy of such estimations, our theoretical analysis compares the risks of different portfolio strategies, and explains why some of them can generate lower risks than others in the large dimensional situation.
In addition to the norm penalty approach, many other methods have been proposed for improving portfolio performance when the number of assets N becomes large. The most frequently used one may be to assign the portfolio weights with some simple rules, avoiding massive estimation of the parameters used in the portfolio optimization. The value weighted and equally weighted (1/N) portfolios are such examples. DeMiguel et al. (2009b) showed how such simple strategies can outperform more sophisticated ones. We can also treat the optimal portfolio weights as a function of parameters in a parsimonious structural model. The parameters can be directly estimated and used to construct the optimal portfolio weights. Brandt et al. (2009) and Kirby and Ostdiek (2011) showed that such methods can deliver better portfolio performance than some benchmark portfolio strategies. Another frequently used approach is to construct more robust statistical estimators for the mean vector and covariance matrix of the asset returns, such as bias-adjusted or Bayesian shrinkage estimators (El Karoui, 2010; Jorion, 1986; Kan and Zhou, 2007; Ledoit and Wolf, 2003; Lai, Xing, and Chen, 2011), and use them in the portfolio optimization problems. We can also combine the improved portfolios to form a new portfolio; for example, Frahm and Christoph (2011) and Tu and Zhou (2011) showed that a suitable linear combination of the weights of a benchmark portfolio and of a more sophisticated strategy often performs better than either one alone. It is natural to incorporate the latter two approaches into the norm penalty strategy.
The rest of the paper is organized as follows. In Section 2, we introduce the weighted norm approach and describe some basic properties of the weighted norm mvp. We discuss the theoretical results in Section 3. In Section 4 we give some explanations of the norm constrained mvp problem from various economic and statistical perspectives. We then describe the coordinate-wise descent algorithm for solving the weighted norm mvp optimization in Section 5. In Section 6, we present the empirical results. Section 7 concludes.
2 Methodology
2.1 The Weighted Norm MVP Optimization
The weighted norm mvp (minimum variance portfolio) optimization is defined as

min_w  w^T Σ w + λ1 ‖w‖_1 + λ2 ‖w‖_2^2,   subject to Aw = u,   (1)
where w is the N × 1 portfolio weight vector and Σ is the N × N covariance matrix of the asset returns (or asset excess returns). The ith diagonal term of Σ is the variance of the ith asset return, σ_i², i = 1, …, N, and the (i, j)th off-diagonal term of Σ is the covariance of the ith and jth asset returns, σ_ij, i, j = 1, …, N, i ≠ j. The objective function of the optimization problem (1) is the portfolio variance plus a penalty function on the portfolio weights, subject to a set of linear constraints. Without the penalty, the portfolio optimization is the global minimum variance portfolio (gmvp) optimization.
We call the penalty function imposed in (1) the weighted norm penalty, which is a combination of two norm penalty functions. One is the l1 norm penalty ‖w‖_1 = ∑_{i=1}^N |w_i| and the other is the squared l2 norm penalty ‖w‖_2^2 = ∑_{i=1}^N w_i^2. The parameters λ1, λ2 ∈ R_+ are the penalty parameters. Aw = u is a system of k linear constraints on the portfolio weights, where dim(A) = k × N and dim(u) = k × 1. To guarantee that the objective function in (1) is a convex function of w, Σ should be positive semidefinite (psd). If the set of solutions of Aw = u is non-empty, w is called feasible. With Σ psd and w feasible, (1) is a well defined convex optimization problem.
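The role of λ2 here can be made concrete: adding λ2 to the diagonal shifts every eigenvalue of Σ up by λ2, so even a singular estimated covariance matrix (e.g. a sample covariance with fewer observations than assets) becomes positive definite. A small numerical illustration of our own:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 20, 50                        # fewer observations than assets
R = rng.standard_normal((n, N))
Sigma = np.cov(R, rowvar=False)      # rank at most n - 1 < N: singular

lam2 = 0.1
Sigma_prime = Sigma + lam2 * np.eye(N)

# eigvalsh returns eigenvalues in ascending order; adding lam2 * I shifts
# every eigenvalue of Sigma up by exactly lam2, making Sigma_prime
# positive definite even though Sigma itself is singular.
eig_min = np.linalg.eigvalsh(Sigma)[0]
eig_min_prime = np.linalg.eigvalsh(Sigma_prime)[0]
```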
Among the linear constraints, the most frequently used is the full investment constraint, which requires the sum of the portfolio weights to equal 1 (or a constant): 1_N^T w = w^T 1_N = 1, where 1_N is the N × 1 vector whose elements are all 1's. Another example is the target return constraint, in which the expected portfolio return should attain a certain required level: μ^T w = w^T μ = μ̄, where μ is the N × 1 vector of expected asset returns and μ̄ is the required portfolio return.
The linear constraints can also be specified for other purposes. Cochrane (2011) considered constraining the covariance of the portfolio return and a factor f,

cov(R_p, f) = ∑_{i=1}^N w_i cov(R_i, f) = σ_{f,R}^T w = ξ_f,   (2)

where σ_{f,R} is the N × 1 vector whose ith element is the covariance of the ith asset return and the factor f, and ξ_f is the required level of comovement of the portfolio return with the factor f. The motivation for such a linear constraint is that the investor may want to limit the volatility of her holdings as the factor f fluctuates. For instance, the investor may fear that her labor income declines simultaneously with the values of her holdings. Hence she may choose to hold a portfolio which has a low or negative comovement with her labor income. Such a portfolio can be constructed by solving a mvp optimization with the linear constraint (2), treating the factor f as her labor income stream.
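As a sketch of how such constraint systems are assembled (our own construction, not from the paper), stacking the full investment constraint with the comovement constraint (2) gives a k = 2 system Aw = u:

```python
import numpy as np

def build_constraints(sigma_fR, xi_f):
    """Stack the full investment constraint 1'w = 1 with the
    comovement constraint sigma_fR' w = xi_f into A w = u.

    sigma_fR : (N,) vector, covariance of each asset return with the factor f.
    xi_f     : required comovement level of the portfolio with f.
    """
    N = sigma_fR.shape[0]
    A = np.vstack([np.ones(N), sigma_fR])  # dim(A) = k x N with k = 2
    u = np.array([1.0, xi_f])              # dim(u) = k x 1
    return A, u

# e.g. xi_f = 0 asks for a portfolio uncorrelated with labor income f
A, u = build_constraints(np.array([0.2, -0.1, 0.05]), 0.0)
```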
2.2 Some Notations and Basic Properties
The Lagrangian of the optimization problem (1) is given by

L(w, γ; Σ, λ1, λ2) = w^T Σ w + λ1 ‖w‖_1 + λ2 ‖w‖_2^2 + γ^T (u − Aw),
where γ is a k × 1 vector of the Lagrange multipliers for the linear constraints Aw = u.
Let w* = (w*_1, …, w*_N) denote the optimal solution of (1), and let S = {i : w*_i ≠ 0} and S^c = {i : w*_i = 0} denote the sets of assets with nonzero and zero optimal portfolio weights respectively. Let |S| and |S^c| denote the cardinalities of the sets S and S^c. Without loss of generality, we can rearrange the optimal portfolio weight vector w*: let the first |S| elements of w* be the nonzero optimal portfolio weights, and the remaining |S^c| = N − |S| elements be the zero optimal portfolio weights. Then we can partition Σ as

Σ = ( Σ_ss      Σ_ss^c
      Σ_s^c s   Σ_s^c s^c ).
Here Σ_ss (Σ_s^c s^c) is the |S| × |S| (|S^c| × |S^c|) principal submatrix of Σ obtained by deleting the last |S^c| (the first |S|) columns and rows of Σ. The matrix Σ_ss is the covariance matrix of the |S| assets with nonzero optimal portfolio weights, and Σ_s^c s^c is the covariance matrix of the |S^c| assets with zero optimal portfolio weights. In the off-diagonal parts of Σ, Σ_ss^c = Σ_s^c s^T with dim(Σ_ss^c) = |S| × |S^c| and dim(Σ_s^c s) = |S^c| × |S|; Σ_ss^c and Σ_s^c s collect the covariances of pairs of assets i and j with i ∈ S and j ∈ S^c. In a similar fashion, we can partition A as A = (A_s, A_s^c), where dim(A_s) = k × |S| and dim(A_s^c) = k × |S^c|.
Let w*_s be the |S| × 1 vector of nonzero optimal portfolio weights and γ* be the k × 1 vector of Lagrange multipliers for the linear constraints Aw = u from solving (1). At the stationary point, the following KKT conditions should hold:
2 (Σ_ss + λ2 I_ss) w*_s = A_s^T γ* − λ1 sign(w*_s),   (3)
A_s w*_s = u,   (4)
‖2Σ'_s^c s w*_s − A_s^c^T γ*‖_∞ ≤ λ1,   (5)
where I_ss is the |S| × |S| identity matrix, sign(·) is the sign function and ‖·‖_∞ is the sup norm. Let 0_{|S^c|} be the 1 × |S^c| zero vector; the optimal portfolio weight vector is then given by w* = (w*_s, 0_{|S^c|}). We can use the above KKT conditions to solve for the vectors w*_s and γ*, as shown in the following Lemma.
Lemma 1 Let Σ'_ss = Σ_ss + λ2 I_ss, M_s = A_s Σ_ss^{-1} A_s^T and M'_s = A_s Σ'_ss^{-1} A_s^T. We then have

w*_s = w*_{2,s} + (λ1/2) ( Σ'_ss^{-1} A_s^T δ_{2,s} − Σ'_ss^{-1} sign(w*_s) ),
γ* = γ*_{2,s} + λ1 δ_{2,s},

where w*_{2,s} = Σ'_ss^{-1} A_s^T M'_s^{-1} u, δ_{2,s} = M'_s^{-1} A_s Σ'_ss^{-1} sign(w*_s), γ*_{2,s} = 2 M'_s^{-1} u, and sign(w*_s) is the |S| × 1 vector whose elements are the signs of the optimal nonzero portfolio weights.
Proof of Lemma 1 can be found in Appendix 8.1. We can see that w*_{2,s} is just the optimal portfolio weight vector of the following mvp optimization,

w*_{2,s} = arg min_w  w^T Σ'_ss w,   subject to A_s w = u,   (6)
and γ*_{2,s} is the vector of Lagrange multipliers for the linear constraints A_s w = u in (6). Note that (6) is a mvp optimization penalized by the squared l2 norm penalty2. Without the squared l2 norm penalty, (6) becomes a gmvp optimization. Let w*_{un,s} be the optimal portfolio weight vector of such a gmvp optimization, i.e.,

w*_{un,s} = arg min_w  w^T Σ_ss w,   subject to A_s w = u,   (7)

and let γ*_{un,s} be the vector of Lagrange multipliers for the linear constraints A_s w = u in (7). As with w*_{2,s} and γ*_{2,s}, we can solve for w*_{un,s} and γ*_{un,s} explicitly: w*_{un,s} = Σ_ss^{-1} A_s^T M_s^{-1} u and γ*_{un,s} = 2 M_s^{-1} u. Let σ²_s := w*^T Σ w*, σ²_un := w*_un^T Σ w*_un, σ²_{2,s} := w*_{2,s}^T Σ_ss w*_{2,s} and σ²_{un,s} := w*_{un,s}^T Σ_ss w*_{un,s} denote the minimum portfolio variances obtained from solving the different mvp optimizations. Here σ²_s is the minimum portfolio variance from solving the weighted norm mvp optimization (1)3; σ²_un and w*_un are the portfolio variance and optimal portfolio weight vector of the gmvp with all assets; and σ²_{2,s} and σ²_{un,s} are the minimum portfolio variances from solving (6) and (7).
With the above notation, we have the following Lemma, which provides a key inequality for the proof of the main results in Section 3.2.
Lemma 2 Let φ_j(Σ_ss), j = 1, …, |S|, denote the eigenvalues of Σ_ss. Let

Σ'_ss = Σ_ss + λ2 I_ss,
φ_min(Σ_ss) = min_{j=1,…,|S|} φ_j(Σ_ss),

where I_ss denotes the |S| × |S| identity matrix. Suppose φ_min(Σ_ss) > 0. Then

0 ≤ σ²_s − σ²_{un,s} ≤ c_{s,1}(1 + c_{s,1}) σ²_{un,s} − λ2 ‖w*_s‖_2^2 + (λ1/2) ( ‖w*_{2,s}‖_1 − ‖w*_s‖_1 ),

where c_{s,1} = λ2^{1/2} (φ_min(Σ_ss))^{−1/2}.
Proof of Lemma 2 can be found in Appendix 8.2. The Lemma indicates that σ²_s is bounded by a linear combination of the scaled σ²_{un,s}, the penalty parameters and the norm penalties. Lemma 1 and Lemma 2 are stated in a deterministic way, but the results still hold when Σ is replaced by a positive semidefinite estimate of the covariance matrix.
2.3 Relation to the Benchmark Portfolios
It is easy to see that the weighted norm mvp optimization (1) can be solved as an l1 penalized portfolio optimization with the covariance matrix Σ' = Σ + λ2 I_NN, where I_NN denotes the N × N identity matrix. In the special situation when λ1 = 0 and λ2 → ∞, the optimal weights from
2The objective function of (6) is the sum of the portfolio variance and the squared l2 norm penalty on the portfolio weights with penalty parameter λ2.
3Note that w*^T Σ w* = w*_s^T Σ_ss w*_s.
solving (1) will converge to 1/N (DeMiguel et al., 2009a). When the linear constraint is the full investment constraint w^T 1_N = 1 and only the l1 penalty is active (λ2 = 0), it can be shown that once λ1 exceeds some threshold λ̄ > 0, the optimal weight vectors of the weighted norm mvp and the no-shortsale mvp (nsmvp) are identical (Brodie et al., 2009; DeMiguel et al., 2009a; Fan et al., 2012). In this situation, using any λ1 ≥ λ̄ in solving (1) will only reproduce the optimal no-shortsale weight vector. In practice, pinning down the upper bound λ̄ is important if we want to search for the optimal λ1 over a range of possible values. Yen and Yen (2011) show that the upper bound λ̄ can be easily obtained from the optimal no-shortsale solution.
Furthermore, from the KKT conditions, it can be shown that the optimal nonzero portfolio weights of the nsmvp can be obtained by solving (7) if we replace S with S_ns = {i : w*_{ns,i} > 0}, where w*_{ns,i} is the optimal no-shortsale portfolio weight for asset i and S_ns is the set of assets with nonzero weights in the optimal nsmvp. In other words, the nsmvp, as a mvp with a heavy l1 norm penalty, can be equivalently obtained from the gmvp optimization by using the dimension-reduced covariance matrix Σ_ss with S = S_ns and assigning zero weights to assets i ∉ S_ns.
This way of constructing the no-shortsale mvp is different from Jagannathan and Ma (2003), who argued that the no-shortsale mvp can be equivalently constructed from the gmvp optimization by using the shrinkage type covariance matrix Σ_JM = Σ − (ν 1_N^T + 1_N ν^T), where ν is the vector of Lagrange multipliers for the nonnegativity constraints w_i ≥ 0, i = 1, …, N. Here, however, we show that the no-shortsale mvp is a typical sparse portfolio which can be constructed by casting the gmvp optimization on a suitable subset S_ns of the whole set of available assets, without shrinking any element of the covariance matrix used.
3 An Econometric Analysis on the Weighted Norm MVP
3.1 Basic Settings
In this section we analyze theoretical properties of the weighted norm mvp from an econometric perspective. We introduce randomness into the portfolio optimization problem, and consider the case when the covariance matrix is estimated by some estimator Σ̂. Let R_t ∈ R^N denote the return vector of the N assets at period t. Suppose R_t has mean μ = (μ_1, …, μ_N) and covariance Σ. Let Σ̂ denote some estimate of Σ based on the data {R_t}_{t=1}^n. The ith diagonal term of Σ̂ is the estimated variance of the ith asset return, σ̂_i², i = 1, …, N, and the (i, j)th off-diagonal term of Σ̂ is the estimated covariance of the ith and jth asset returns, σ̂_ij, i, j = 1, …, N, i ≠ j. To distinguish from the deterministic case, we add a hat to all the notation used in the previous sections to denote the counterparts obtained from solving the same mvp with Σ replaced by Σ̂. For example, ŵ* = (ŵ*_1, …, ŵ*_N) will denote the optimal portfolio weight vector from solving (1) with Σ replaced by Σ̂, and Ŝ = {i : ŵ*_i ≠ 0} and Ŝ^c = {i : ŵ*_i = 0} will denote the sets of assets with nonzero and zero optimal weights in the weighted norm portfolio. Given Ŝ and Ŝ^c, we have the following partitions:
Σ̂ = ( Σ̂_ss      Σ̂_ss^c
       Σ̂_s^c s   Σ̂_s^c s^c ),

and A = (A_s, A_s^c) and ŵ* = (ŵ*_s, 0_{|Ŝ^c|}).
Let the out-of-sample (oos) return of the weighted norm mvp at t = n + 1 be R_wp,n+1 = ŵ*^T R_{n+1}. Given {R_t}_{t=1}^n, the out-of-sample (oos) conditional variance of the weighted norm mvp is

var(R_wp,n+1 | {R_t}_{t=1}^n) = ŵ*_s^T Σ_ss ŵ*_s,   (8)

which can be viewed as a measure of the risk that an investor will immediately face at period n + 1 when she allocates wealth according to ŵ*. Thus the oos conditional portfolio variance is also called the realized risk or out-of-sample risk (El Karoui, 2009, 2010). In a similar fashion, let R_unp,n+1 = ŵ_un^T R_{n+1} be the oos return of the gmvp; the oos conditional variance of the gmvp is

var(R_unp,n+1 | {R_t}_{t=1}^n) = ŵ_un^T Σ ŵ_un.   (9)
We also have the oos conditional variance of the 1/N portfolio,

var(N^{-1} 1_N^T R_{n+1} | {R_t}_{t=1}^n) = N^{-2} 1_N^T Σ 1_N.   (10)

Let σ²_{1/N} := N^{-2} 1_N^T Σ 1_N. Unlike (8) and (9), the oos conditional variance of the 1/N portfolio is non-random, since its weights always take the deterministic value 1/N.
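The distinction between realized risk and true risk can be made concrete with a small simulation (our own illustration, with an assumed equicorrelated Σ): the gmvp weights are computed from Σ̂, but their realized risk (9) is evaluated under the true Σ, and it can never fall below the infeasible true minimum 1/(1_N^T Σ^{-1} 1_N):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 40, 120
Sigma = 0.3 * np.ones((N, N)) + 0.7 * np.eye(N)  # assumed true covariance
L = np.linalg.cholesky(Sigma)
R = rng.standard_normal((n, N)) @ L.T            # n returns with covariance Sigma

Sigma_hat = np.cov(R, rowvar=False)              # estimated from the sample
ones = np.ones(N)

# gmvp weights computed from Sigma_hat (full investment constraint)
w_un = np.linalg.solve(Sigma_hat, ones)
w_un /= ones @ w_un

risk_gmvp = w_un @ Sigma @ w_un                  # (9): realized (oos conditional) risk
risk_equal = ones @ Sigma @ ones / N**2          # (10): non-random 1/N risk
true_min = 1.0 / (ones @ np.linalg.solve(Sigma, ones))  # infeasible true minimum
# true_min <= risk_gmvp always; with N large relative to n, the gap
# reflects the cumulative estimation error in Sigma_hat.
```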
3.2 Main Results
When the number of assets N and the sample size n both become large, and given two easily implementable benchmark portfolio strategies, the gmvp and 1/N, is it still worth using the weighted norm mvp strategy? To answer this question, we compare their realized risks and examine the probability that the realized risk of the weighted norm mvp (8) is smaller than those of the gmvp (9) and the 1/N portfolio (10). Before we state the main results, we introduce a definition and some conditions that are used in the theoretical analysis.
Definition 1 (Asymptotic feasible set of the active constituents) We call S the asymptotic feasible set of the active constituents for the weighted norm mvp if S is a subset of {1, …, N} such that, as n and N → ∞, the following KKT conditions hold:

2Σ̂'_ss ŵ*_s = A_s^T γ̂*_s − λ1 sign(ŵ*_s),
A_s ŵ*_s = u,
‖2Σ̂'_s^c s ŵ*_s − A_s^c^T γ̂*_s‖_∞ ≤ λ1.

In short, S ⊆ {1, …, N} is a possible realization of Ŝ as n and N → ∞.
The following five conditions are needed for the proof of the main results.

Condition 1 All elements of Σ have finite values, and all eigenvalues of Σ are bounded away from 0 and ∞, i.e., 0 < φ_min(Σ) ≤ φ_max(Σ) < ∞.

Condition 2 The number of linear constraints k is fixed, and the set of solutions of Aw = u is non-empty. The elements of A and u are nonrandom.

Condition 3 When either Σ or Σ̂ is used, for 0 ≤ λ1, λ2 < ∞, the weighted norm mvp optimization is feasible as n and N → ∞. An optimization problem is feasible if there exists at least one feasible point at which the optimal value of the objective function is finite.

Condition 4 There exists a constant B0 such that sup_{S ⊆ {1,…,N}} P(‖ŵ*_s‖_1 > B0) = o(1).

Condition 5 N < n and lim_{n→∞} n^{-1} N = ρ_N ∈ (0, 1).
We state the main theoretical results in the following two Theorems.
Theorem 1 Suppose R_t ∼ i.i.d. N(μ, Σ) for t = 1, …, n, Σ̂ is the sample covariance matrix estimated using the data {R_t}_{t=1}^n, and Conditions 1 to 5 hold. Let S be the asymptotic feasible set of the active constituents for the weighted norm mvp and assume |S| >> k. Let a_min = min_{i,j=1,…,N} a_ij, where a_ij is a constant with the same definition as in Lemma 4. If

λ1 = λ2 = λ_{n,N} = B0 √(2 log N / (a_min n)),

and, as n and N → ∞, σ²_{un,s} √(log n) → ∞ and the following maximum ratio of portfolio variances (MRPV) condition holds,

sup_{S ⊆ {1,…,N}} ( (σ²_{un,s} − σ²_un) / σ²_{un,s} − ρ_s ) ≤ 0,   (11)

where ρ_s = lim_{n→∞} n^{-1} |S|, then as n and N → ∞, P( ŵ*^T Σ ŵ* ≤ ŵ_un^T Σ ŵ_un ) → 1.
Theorem 2 Suppose the same assumptions as in Theorem 1 hold, except (11). Let η_N = σ̄² (N^{-1} log N)^{1/2 − ε}, where σ̄² > 0 and 0 < ε < 2^{-1} are two constants. Now assume that, as n and N → ∞, 0 < η_N < σ²_{1/N} and the following maximum ratio of portfolio variances (MRPV) condition holds,

sup_{S ⊆ {1,…,N}} ( (σ²_{un,s} − σ²_{1/N}) / σ²_{un,s} − ζ_s ) ≤ 0,   (12)

where ζ_s = min( ρ_s − η_N/σ²_{un,s}, 0 ). Then as n and N → ∞, P( ŵ*^T Σ ŵ* ≤ σ²_{1/N} ) → 1.
Proofs of Theorems 1 and 2 can be found in Appendix 8.6. In addition to Lemma 1 and Lemma 2, the proof relies on Lemma 3, which bounds the eigenvalues of the sample covariance Σ̂, and Lemma 4, which provides an upper bound for the tail probability of the estimation errors of the elements of Σ̂4.
Theorems 1 and 2 state that under certain conditions, the realized risk (conditional variance) of the weighted norm mvp will be less than those of the gmvp and 1/N with high probability as the number of assets N and the sample size n both become large. The weighted norm penalty tends to produce a sparse optimal portfolio weight vector, so that only a smaller number of assets (say |S|) is included in the portfolio. When only a smaller number of assets is included, the true risk of this portfolio, σ²_{un,s}, may be well approximated by the realized risk of the weighted norm mvp, ŵ*^T Σ ŵ*. By contrast, the realized risk of the gmvp, ŵ_un^T Σ ŵ_un, may be a bad approximation of its true risk σ²_un due to large N. Note that the true risk of the gmvp, σ²_un, is the minimum value that any realized risk can achieve (if the corresponding mvp optimization is subject to the same linear constraints Aw = u). So if the true risk of the portfolio with the smaller number of assets, σ²_{un,s}, is not too different from σ²_un, then the realized risk of the weighted norm mvp has a good chance of being smaller than that of the gmvp, since the latter may lie farther from σ²_un due to its bad empirical properties.
The above result implies that, for the weighted norm strategy to work well, σ²_{un,s} and σ²_un should not be too different. How close should the two be? This is stated by the maximum ratio of portfolio variances (MRPV) condition (11): for every possible asymptotic feasible set of the active constituents S, the difference between the two should not exceed σ²_{un,s} ρ_s, where ρ_s = lim_{n→∞} n^{-1} |S|.
While the impact of the estimation errors can be mitigated by shrinking the size of the optimal portfolio weight vector, the MRPV condition characterizes a link between how many assets and which assets should be included in the portfolio. In Theorem 1, the MRPV condition also implies that the number of selected assets |S| should increase with the sample size n. To see this, consider the case of fixing |S| < N: the upper bound σ²_{un,s} ρ_s goes to zero as n → ∞, but σ²_{un,s} − σ²_un > 0 holds whenever |S| < N, which violates the MRPV condition.
However, this does not mean that including more assets is always beneficial. In addition to
inducing more estimation errors, including more assets might not ease the MRPV condition.
Consider the case in which |S′| assets are selected, with |S′| > |S| but S ⊄ S′, so ρs′ > ρs. Nevertheless
it is possible that σ2un,s < σ2un,s′, since S is not a subset of S′. In turn, the MRPV condition
might hold for the portfolio with assets S, but not for the one with assets S′. Therefore
including a suitable set of assets in the portfolio is also important for ensuring that the
weighted norm strategy works.5
4We discuss properties of the normally distributed returns, the sample covariance matrix and the estimation errors in Appendices 8.3 to 8.5.
5Note that the situation |S′| > |S| but S ⊄ S′ is possible when the weighted norm strategy is used. Suppose that λ1 > λ′1. The l1 norm penalty does not always guarantee that the set of assets being selected under λ1 is
For the case of comparing with σ21/N, the MRPV condition is (12), which requires that
σ2un,s be no greater than σ21/N. The realized risk of the 1/N is σ21/N, and the realized
risk of the weighted norm mvp is bounded below by σ2un,s. Hence if σ2un,s > σ21/N, the weighted
norm mvp cannot have a smaller realized risk than the 1/N. We also require that σ21/N
not decrease too fast with N. As n and N become large, the approximation error of w∗TΣw∗ to
σ2un,s vanishes at the rate Op(√(log n/n)) (see Section 8.6). If σ21/N were to decrease too fast
with N, say at the rate O((logN/N)^{1/2+ε}), where ε > 0 is a constant,6 the approximation error
of w∗TΣw∗ to σ2un,s would exceed σ21/N in probability. Therefore in this situation the weighted
norm mvp is unlikely to beat the 1/N in realized risk. An example of such a situation is when
Σ = IN×N. Suppose the linear constraint is wT1 = 1; then σ21/N = N−1 = σ2un, and it
follows that P(w∗TΣw∗ ≤ σ21/N) = 0. Another example is when Σ is a Toeplitz matrix. In
this case, if σ2i = 1 and σij = c^{|i−j|}, where i ≠ j and 0 < c < 1, then
\[
\sigma^2_{1/N} = \frac{1}{N^2}\left[\frac{2}{1-c}\left((N-1)c - \frac{c^2 - c^{N+1}}{1-c}\right) + N\right] = O\!\left(N^{-1}\right),
\]
which also violates the condition.
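The Toeplitz example can be checked numerically. The sketch below is an illustration we add (function names are ours): it compares the closed-form expression above with the variance of the 1/N portfolio computed directly from Σ.

```python
import numpy as np

def sigma2_1overN_direct(N, c):
    # Toeplitz covariance with unit variances and sigma_ij = c^{|i-j|}
    idx = np.arange(N)
    Sigma = c ** np.abs(idx[:, None] - idx[None, :])
    w = np.full(N, 1.0 / N)  # the 1/N portfolio
    return w @ Sigma @ w

def sigma2_1overN_closed(N, c):
    # closed-form expression from the text, which is O(1/N)
    return (2.0 / (1 - c) * ((N - 1) * c
            - (c ** 2 - c ** (N + 1)) / (1 - c)) + N) / N ** 2
```

For example, with c = 0.5 the two expressions agree for any N, and N · σ21/N approaches a constant, consistent with the O(N−1) rate.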
The number of assets selected is controlled by the penalty parameter of the l1 norm penalty,
which is specified as
\[
\lambda_1 = \lambda_{n,N} = B_0\sqrt{\frac{2\log N}{a_{\min}\,n}}.
\]
The constant amin is related to the elements of Σ (see the proof of Lemma 3). Such a specification
satisfies the requirement that |S| should increase with n, and, more importantly, it provides a
practical guideline for setting the penalty parameters, as we will see in Section 6.2.
Finally, we should note that the sparsity-inducing property makes the weighted norm
strategy different from ordinary shrinkage estimators. Ordinary shrinkage estimators
shrink particular elements of the estimated covariance matrix toward some targets, and in
turn reduce the estimation errors. The weighted norm strategy, however, not only shrinks
particular elements of the estimated covariance matrix, but also reduces the dimension of
the estimated covariance matrix used in the portfolio optimization. Consequently the resulting mvp
only holds assets from a certain subset of the whole asset universe. Thus ordinary
shrinkage estimators can be viewed as aiming to approximate σ2un directly, while the weighted norm strategy aims
to approximate σ2un,s. As we show above, how the weighted norm strategy performs depends
crucially on how far σ2un,s deviates from σ2un.
a subset of those being selected under λ′1, since the optimal weights are not always monotonically decreasing with the penalty parameter of the l1 norm penalty.
6Note that here lim_{n→∞} N/n = ρN ∈ (0, 1) should hold.
3.3 Unconditional Portfolio Variance
If the asset returns are i.i.d., then by the law of total variance, the unconditional variances of the
portfolio returns for the weighted norm mvp and the gmvp are given by
\[
\mathrm{var}\left(R_{wp,n+1}\right) = E\left(\mathbf{w}^{*T}\Sigma\mathbf{w}^{*}\right) + \boldsymbol{\mu}^{T}\mathrm{cov}\left(\mathbf{w}^{*}\right)\boldsymbol{\mu},
\]
\[
\mathrm{var}\left(R_{unp,n+1}\right) = E\left(\mathbf{w}_{un}^{T}\Sigma\mathbf{w}_{un}\right) + \boldsymbol{\mu}^{T}\mathrm{cov}\left(\mathbf{w}_{un}\right)\boldsymbol{\mu},
\]
where cov(w∗) and cov(wun) are the covariance matrices of w∗ and wun; their analytical
expressions are not derived here. From Theorem 1, under certain conditions, P(w∗TΣw∗ ≤ wTunΣwun) → 1
as N and n both become large. It follows that
\[
E\left(\mathbf{w}^{*T}\Sigma\mathbf{w}^{*}\right) \le E\left(\mathbf{w}_{un}^{T}\Sigma\mathbf{w}_{un}\right)
\]
as N and n both become large. Thus if µT cov(w∗)µ ≤ µT cov(wun)µ, then var(Rwp,n+1) ≤ var(Runp,n+1) will also hold.
As for the 1/N, its unconditional variance also equals σ21/N, and therefore var(Rwp,n+1) ≤ σ21/N if and only if
\[
E\left(\mathbf{w}^{*T}\Sigma\mathbf{w}^{*}\right) + \boldsymbol{\mu}^{T}\mathrm{cov}\left(\mathbf{w}^{*}\right)\boldsymbol{\mu} \le \sigma^{2}_{1/N}.
\]
From Theorem 2, under certain conditions, P(w∗TΣw∗ ≤ σ21/N) → 1 as N and n both
become large. It implies that
\[
E\left(\mathbf{w}^{*T}\Sigma\mathbf{w}^{*}\right) \le \sigma^{2}_{1/N}
\]
will hold as N and n both become large. Thus, for var(Rwp,n+1) ≤ σ21/N to hold, either
µT cov(w∗)µ should not be too large or E(w∗TΣw∗) should be small enough compared
with σ21/N.
If we assume that the covariances between the elements of w∗ are small, µT cov(w∗)µ
may be approximated by Σ_{i=1}^{N} µ2i var(w∗i). In a similar fashion, µT cov(wun)µ may be
approximated by Σ_{i=1}^{N} µ2i var(wun,i) if the covariances between the elements of wun are also
small. The variances var(w∗i) and var(wun,i), i = 1, . . . , N, are closely related to the portfolio
turnover rates, and we will see that empirically, the weighted norm mvp tends to have a lower
turnover rate than the gmvp.
4 Some Explanations on the Weighted Norm MVP Optimization
4.1 Individual’s Financing Constraint
The l1 norm penalty in (1) is equivalent to imposing the norm constraint ‖w‖l1 ≤ c in the
portfolio optimization. This norm constraint is called a gross exposure constraint in Fan, Zhang,
and Yu (2012). It can be viewed as the investor wanting to minimize the portfolio variance while
still limiting the investment positions exposed to the risky assets. The constant c is the
maximum allowable amount of investment in the risky assets, which reflects the investor's
concern about parameter uncertainty due to statistical estimation errors.
Now consider a more general version of the l1 norm constraint:
\[
\sum_{i=1}^{N} \nu_i \left|w_i\right| \le c,
\]
where νi is a nonnegative constant. In Brodie et al. (2009), νi is viewed as a measure of
transaction cost, such as the bid-ask spread of asset i. We can also interpret νi as the margin
requirement on asset i (Garleanu and Pedersen, 2011). The case of (1) is then equivalent to
treating all such transaction costs as equal to one.
4.2 Decision Based on Marginal Increment of the Portfolio Variance
The weighted norm mvp optimization can be viewed as a penalized l1 norm mvp
optimization in which the investor uses Σ′ = Σ + λ2IN×N as the covariance matrix. Now
suppose that, out of fear of estimation errors, the investor believes Σ′ is the valid in-sample
estimate of the covariance matrix. To simplify the analysis, suppose also that the mvp optimization
is only subject to the full investment constraint. At the stationary point, the marginal change
of the (valid) in-sample portfolio variance due to a change of wi is given by
\[
\frac{\partial \mathbf{w}^{T}\Sigma'\mathbf{w}}{\partial w_i} = 2w_i\left(\sigma_i^2 + \lambda_2\right) + 2\sum_{j\neq i}^{N} w_j\,\sigma_{ij} = \gamma - \lambda_1 \tag{13}
\]
if wi > 0, and
\[
\frac{\partial \mathbf{w}^{T}\Sigma'\mathbf{w}}{\partial w_i} = 2w_i\left(\sigma_i^2 + \lambda_2\right) + 2\sum_{j\neq i}^{N} w_j\,\sigma_{ij} = \gamma + \lambda_1 \tag{14}
\]
if wi < 0. If λ1 = 0, the marginal change is the Lagrangian multiplier γ, which is the shadow
price to measure how the portfolio variance changes when the investor’s wealth changes. The
marginal change of wTΣ′w due to an increase of a positive weight is γ − λ1, which is smaller
than that due to an increase of a negative weight (γ+λ1). It can be shown that γ > λ1 always
holds at the stationary point, and hence the marginal changes γ − λ1 and γ + λ1 are always
positive. For wi = 0, from the KKT conditions,
\[
\gamma - \lambda_1 \le \frac{\partial \mathbf{w}^{T}\Sigma'\mathbf{w}}{\partial w_i} \le \gamma + \lambda_1. \tag{15}
\]
If λ1 becomes large, it is more likely that ∂wTΣ′w/∂wi will fall into the interval [γ − λ1, γ + λ1],
and then more assets will be excluded from the optimal portfolio. Meanwhile, it is less likely that
(14) will still hold as λ1 increases, since γ + λ1 will also increase. As mentioned, if λ1 exceeds
some upper bound λ, the portfolio holds only no-shortsale positions, and in this extreme case
only (13) will hold. Thus the optimal no-shortsale solution is equivalent to the optimal solution
of the mvp with a subset of all assets.
The weighted norm portfolio optimization can be viewed as a decision process in which the
investor assigns the penalty parameter to decide whether to include an asset in the portfolio
or not. The decision is based on how the (in sample) portfolio variance changes due to an
increase of the asset’s weight. At the stationary point, if the increase of the asset’s weight
causes a large (small) enough increment of the portfolio variance, say γ+λ1 (γ−λ1), then the
asset’s weight should have a negative (positive) sign. If the increase of the asset’s weight only
causes a mild increase of the portfolio variance, say between [γ − λ1, γ + λ1], then the asset
will not be included in the portfolio. We can also interpret this as the investor being concerned with
both the sign and the magnitude of the asset weight. If increasing an asset's weight only makes
the portfolio variance change at some level between γ − λ1 and γ + λ1, then the investor
views such information as insufficient to determine the sign of the asset weight, and hence it
is better not to include the asset in the portfolio. This reflects the investor's attitude toward parameter
uncertainty, and the penalty parameter λ1 controls the degree of this attitude.
When λ1 becomes extremely large, only a small number of assets will be included in the
portfolio. The assets with negative weights can then have much larger impacts on the portfolio
variance than the assets with positive weights, and including the assets with negative weights
would be risky, since a small change in their weights may cause a large change in the portfolio
variance. Thus the investor will try to avoid such assets. In this situation, the assets included
in the portfolio will all have positive signs, and changes in their weights will only have small
impacts on the portfolio variance.
4.3 Relation to Bounded Rationality and Psychological Phenomena
Gabaix (2011) showed that adding the l1 norm penalty to an individual's optimization problem
can generate rich bounded rationality and psychological effects. In his model, the individual
tries to simplify her optimal decision process by considering only a few important parameters
in the utility function. To achieve this, the individual first minimizes a cost function for
choosing the important parameters in the utility function. The cost function has a quadratic
form plus the l1 norm penalty on the parameters. With the l1 norm penalty, it is easy to
achieve a sparse solution for the parameter vector. Such a decision process can be interpreted
as the individual preferring not to frame her view of the real world too complexly. It is a reasonable
setting, since no one can freely consider a large amount of relevant information in her
decisions. If some of the relevant information is not so important, the individual had better
discard it. The individual can then consider her optimal actions based on the simplified utility
function.
To gain more insight into Gabaix (2011), we can further generalize the l1 norm penalty ‖w‖l1 as
\[
\left\|\mathbf{w} - \mathbf{w}_0\right\|_{l_1}.
\]
In the portfolio optimization, w0 can be viewed as the investor's default decision on the asset
allocation. In our case, we set this default weight vector equal to zero. With such a generalized
l1 norm penalty, we can imagine that some elements of the optimal weight vector will naturally
equal the default weights. It means that the investor's decisions with respect to some of the
assets will not be changed. In practice, this property is helpful in reducing portfolio turnover
rates, as shown in DeMiguel et al. (2010), who set w0 equal to the optimal weight vector of
the minimum variance portfolio.
In economics, such a sticking-to-default effect is called inattention, while in psychology it is
called an endowment effect. This effect often arises in the real world when the investor faces too
many assets to choose from. The investor may fear that simultaneously making many decisions will
deteriorate the overall quality of these decisions. To avoid such a situation, the investor can keep
as many initial decisions as possible and change some of them only if it is really necessary.
4.4 The Maximum a Posteriori Probability (MAP) Estimator
Zou and Hastie (2005) show that regression coefficients regularized by the elastic net constraint
can be viewed as having a compromise prior between the Gaussian and Laplace distributions.
Based on this result, we can give the optimal portfolio weights obtained from solving (1) a similar
statistical explanation. Let Rt be the N × 1 vector of asset returns at time t, t = 1, . . . , T.
Suppose that, given w, the estimated mean return R̄, and the portfolio variance σ2por, the investor
believes that the portfolio returns wTRt | w, R̄, σ2por are i.i.d. N(wTR̄, σ2por) for all t. Also suppose
that the investor has a prior belief that the weight vector w follows a distribution with
density proportional to
\[
\pi\left(\mathbf{w}\right) = \exp\left(-\psi\left(\alpha\left\|\mathbf{w}\right\|_{l_1} + (1-\alpha)\left\|\mathbf{w}\right\|_{l_2}^{2}\right)\right) I\{A\mathbf{w} = \mathbf{u}\},
\]
where ψ > 0, α ∈ [0, 1], and Aw = u is the set of linear constraints on the portfolio weights.
Conditional on σ2por, {Rt}_{t=1}^{n} and R̄, the density of the posterior distribution of the portfolio
weights w is given by
\[
p\left(\mathbf{w} \,\middle|\, \sigma^2_{por}, \{R_t\}_{t=1}^{n}, \bar{R}\right) \propto \exp\left(-\frac{T-1}{2\sigma^2_{por}}\mathbf{w}^{T}\Sigma\mathbf{w}\right) \times \pi\left(\mathbf{w}\right)
= \exp\left(-\frac{T-1}{2\sigma^2_{por}}\mathbf{w}^{T}\Sigma''\mathbf{w} - \psi\alpha\left\|\mathbf{w}\right\|_{l_1}\right) \times I\{A\mathbf{w}=\mathbf{u}\} \tag{16}
\]
where Σ is the sample covariance matrix of Rt, and
\[
\Sigma'' = \Sigma + \frac{2\sigma^2_{por}\psi\left(1-\alpha\right)}{T-1} I_{N\times N}.
\]
Thus maximizing the log posterior density with respect to w is equivalent to
solving problem (1) with the sample covariance matrix Σ, λ1 = (T − 1)−1(2σ2porψ)α and λ2 = (T − 1)−1(2σ2porψ)(1 − α),
and the optimal w is the maximum a posteriori probability (MAP) estimator of w. The above
result is related to Propositions 8 and 9 in DeMiguel et al. (2009a), which state the
case α = 1. Furthermore, equation (16) implies that the investor has a prior belief that w
follows a distribution with density proportional to exp(−ψα‖w‖l1) I{Aw = u} and that the
regularized covariance matrix estimator Σ′′ is used as the covariance matrix estimate.
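The mapping from the prior hyperparameters (ψ, α) to the penalty parameters can be written out directly. The following is a minimal sketch of that mapping (function names are ours, not from the paper):

```python
import numpy as np

def map_penalties(T, sigma2_por, psi, alpha):
    """Map the prior hyperparameters (psi, alpha) to the penalties
    lambda_1 and lambda_2 of problem (1), as implied by (16)."""
    scale = 2.0 * sigma2_por * psi / (T - 1)
    return scale * alpha, scale * (1.0 - alpha)

def regularized_covariance(Sigma_hat, lam2):
    # Sigma'' in (16): the sample covariance plus lambda_2 on the diagonal
    return Sigma_hat + lam2 * np.eye(Sigma_hat.shape[0])
```

Note that α fixes the split between the l1 and squared l2 penalties, while ψ and σ2por only scale their common magnitude.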
4.5 MVP Optimization as a Minimum Mean Square Deviation Problem
The mvp optimization shares some characteristics with squared loss-based linear regression
estimation. Consider the following minimum mean square deviation problem:
\[
\min_{\mathbf{w}} \; E\left(Y - \mathbf{s}^{T}\mathbf{w}\right)^{2}, \quad \text{subject to } A\mathbf{w} = \mathbf{u}, \tag{17}
\]
where the expectation is taken with respect to Y. To solve (17), we seek a vector w that
minimizes the mean square deviation between sTw and Y given that Aw = u. Suppose Y is
the dependent variable and s is an N × 1 vector of predictors. E(Y − sTw)2 can then be interpreted as
the expected squared prediction error, and Aw = u as a constraint on the coefficient vector.
Now let R be an N × 1 random asset return vector. If we set Y = RTw and s = E(R), the objective
function becomes the portfolio variance E((R − E(R))Tw)2, and (17) becomes the
mvp optimization. It is equivalent to seeking the minimum mean square deviation between
(R − E(R))Tw and zero, subject to Aw = u.
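The equivalence between the mvp objective and the mean square deviation in (17) is easy to verify numerically. A small simulation sketch (the returns and weights below are arbitrary illustrations we add):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.normal(0.0005, 0.01, size=(500, 5))    # simulated returns, n x N
w = np.full(5, 0.2)                            # an arbitrary feasible weight vector
Sigma = np.cov(R, rowvar=False, bias=True)     # covariance with 1/n normalization
# mean square deviation of (R - E[R])'w from zero equals the variance w'Sigma w
msd = np.mean(((R - R.mean(axis=0)) @ w) ** 2)
assert np.isclose(msd, w @ Sigma @ w)
```

The `bias=True` option matters: the mean of squared demeaned portfolio returns uses a 1/n normalization, so the covariance estimate must too.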
When the number of covariates becomes large relative to the sample size, recent research
on high dimensional variable selection in linear regression shows that regularization methods
can work well not only for model selection but also for improving out-of-sample predictions.
Bai and Ng (2008) and De Mol, Giannone, and Reichlin (2008) show that regression penalized by
the l1 norm penalty can perform at least as well as or better than traditional methods in
predicting important macroeconomic indicators when a large number of predictors are jointly
considered. As for the mvp, when the number of assets becomes large relative to the sample
size, it is reasonable to expect the regularization methods to deliver the same improvements in
reducing the out-of-sample portfolio variance as they do for the mean squared prediction error
of the OLS regression, since the mvp optimization and the OLS estimation share a similar
property in searching for the optimal solution: namely, finding the optimal coefficient vector w
that minimizes the mean square deviation between two points.
5 Coordinate-Wise Descent Algorithm
In this section we introduce a coordinate-wise descent algorithm proposed by Yen and Yen
(2011) to solve the weighted norm mvp optimization (1). We focus on the case in which the
objective function is only subject to the full investment constraint wT1 = 1. Let f(w) =
f(w1, . . . , wN) be an objective function and assume f(w) is convex in w. Suppose we would
like to solve minw f(w). The coordinate-wise descent algorithm starts by fixing wi for i =
2, . . . , N and then finding a value of w1 that minimizes f(w). The iteration step is then carried
out over i = 2, 3, . . . , N before going back to i = 1, and the procedure is repeated until w
has converged.
Friedman et al. (2007) demonstrate that coordinate-wise descent algorithms can be powerful
tools for solving regression problems regularized by convex constraints. Since then, the
approach has become popular in statistics for solving various norm constrained regression
problems. Theoretical properties of this type of algorithm can be found in Tseng (2001).
Yen and Yen (2011) develop an efficient coordinate-wise descent algorithm to solve the
weighted norm mvp optimization. When we only have the full investment constraint wT1N = 1,
we can use the following scheme to update the portfolio weights wi, i = 1, . . . , N:
\[
w_i \leftarrow \frac{ST\left(\gamma - z_i, \lambda_1\right)}{2\left(\sigma_i^2 + \lambda_2\right)}, \tag{18}
\]
where ST(x, y) = sign(x)(|x| − y)+ is the soft thresholding function and zi = 2 Σ_{j≠i} wjσij.
Let S+ = {i : wi > 0} and S− = {i : wi < 0}. With the updated portfolio weights, we can
then use the following scheme to update the Lagrangian multiplier γ:
\[
\gamma \leftarrow \frac{1 + \sum_{i\in S^{+}\cup S^{-}} \frac{z_i}{2\left(\sigma_i^2+\lambda_2\right)} - \lambda_1\left(\sum_{i\in S^{-}} \frac{1}{2\left(\sigma_i^2+\lambda_2\right)} - \sum_{i\in S^{+}} \frac{1}{2\left(\sigma_i^2+\lambda_2\right)}\right)}{\sum_{i\in S^{+}\cup S^{-}} \frac{1}{2\left(\sigma_i^2+\lambda_2\right)}}. \tag{19}
\]
The derivations of (18) and (19) can be found in Appendix 8.7. If the set of nonzero weights
S+ ∪ S− is correctly identified, convergence of the algorithm guarantees that the minimum is
attained. Through updating γ, the full investment constraint can further be satisfied. Once
both w and γ have converged, we obtain the optimal solution of the weighted norm mvp
optimization.
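The update schemes (18) and (19) combine into a compact solver. The sketch below is our own minimal implementation of the iteration just described, not the authors' code; it assumes a positive definite sample covariance matrix:

```python
import numpy as np

def soft_threshold(x, y):
    # ST(x, y) = sign(x) * (|x| - y)_+
    return np.sign(x) * max(abs(x) - y, 0.0)

def weighted_norm_mvp(Sigma, lam1, lam2, max_iter=10000, tol=1e-12):
    """Coordinate-wise descent for min w'Sigma w + lam1*||w||_1 + lam2*||w||_2^2
    subject to w'1 = 1, using the update schemes (18) and (19)."""
    N = Sigma.shape[0]
    d = np.diag(Sigma)
    w = np.full(N, 1.0 / N)        # start from the 1/N portfolio
    gamma = 2.0 * w @ Sigma @ w    # rough initial Lagrange multiplier
    for _ in range(max_iter):
        w_old = w.copy()
        for i in range(N):         # scheme (18), one coordinate at a time
            z_i = 2.0 * (Sigma[i] @ w - d[i] * w[i])
            w[i] = soft_threshold(gamma - z_i, lam1) / (2.0 * (d[i] + lam2))
        active = w != 0
        if not active.any():
            break                  # penalty too heavy: no active assets left
        a = 1.0 / (2.0 * (d + lam2))
        z = 2.0 * (Sigma @ w - d * w)
        neg, pos = active & (w < 0), active & (w > 0)
        # scheme (19): pick gamma so the full investment constraint holds
        gamma = (1.0 + (a * z)[active].sum()
                 - lam1 * (a[neg].sum() - a[pos].sum())) / a[active].sum()
        if np.max(np.abs(w - w_old)) < tol:
            break
    return w, gamma
```

With λ1 = 0 the iteration reduces to Gauss-Seidel on the KKT system, so the output can be checked against the closed-form solution (Σ + λ2I)−11 / (1T(Σ + λ2I)−11).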
The approach for updating the weights and the Lagrangian multiplier given above can be
generalized. For example, suppose the set of linear constraints Aw = u should be satisfied,
where A is a k × N matrix, u is a k × 1 vector, and k ≥ 1. The Lagrangian of the weighted
norm mvp optimization is the objective function minus γ(Aw − u), where γ = (γ1, . . . , γk) is
a 1 × k vector of Lagrangian multipliers. We can follow procedures similar to the one given
above to derive the update forms for wi and γ, and then w, γ1, γ2, . . . , γk can be updated
sequentially. Yen and Yen (2011) showed how to perform such a sequential update when an
additional target return constraint wTµ = µ is imposed. We will use their scheme to solve the
weighted norm mvp with both the full investment and target return constraints in the
empirical analysis.
6 Empirical Results
6.1 Performance Measures
We estimate the covariance matrix with the sample covariance estimator. We adopt an
expanding window scheme with initial window length τ0 = ⌈1.2N⌉ for the estimations.
Suppose there are T̄ periods in total; the length of the testing period (from τ0 + 1 to T̄) for the
portfolio strategies is T = T̄ − τ0. Let Σt denote the sample covariance matrix estimated at
period t = τ0, . . . , T̄ − 1. We plug Σt into (1) and solve the weighted norm mvp optimization
at each period t; the resulting optimal weights at period t are denoted by w∗i,t, i = 1, . . . , N.
We first calculate the out-of-sample (oos) portfolio return at period t + 1, which is defined
as7
\[
R_{oos,p,t+1} = \sum_{i=1}^{N} w^{*}_{i,t} R_{i,t+1}.
\]
We then calculate the turnover rate of the portfolio strategy (TOR). Suppose at the end of period
t − 1 the investor has wealth Πt−1 that can be invested in the assets. Given the optimal
weight w∗i,t−1 for period t, at the end of this period the value of asset i is Πt−1w∗i,t−1(1 + Ri,t)
and the investor's total wealth is Πt = Πt−1(1 + Roos,p,t). With the optimal weight w∗i,t for
period t + 1, the investor's wealth to be invested in asset i is Πtw∗i,t. We define the turnover
rate of asset i from t to t + 1 as
\[
TOR_{i,t+1} = \left| w^{*}_{i,t} - \frac{w^{*}_{i,t-1}\left(1 + R_{i,t}\right)}{1 + R_{oos,p,t}} \right|, \tag{20}
\]
which is just the proportion of wealth at the end of period t that needs to be reallocated to asset i in
order to reach the amount Πtw∗i,t. We then define the turnover rate of the portfolio strategy
(TOR) from t to t + 1 as the sum of the turnover rates (20) over all assets:
\[
TOR_{p,t+1} = \sum_{i=1}^{N} TOR_{i,t+1}.
\]
We impose a transaction fee ε, and define the oos net portfolio return at period t + 1 as
\[
R^{net}_{oos,p,t+1} = \left(1 - \varepsilon\, TOR_{p,t+1}\right)\left(1 + R_{oos,p,t+1}\right) - 1.
\]
7All the performance measures introduced in the following are calculated using the optimal weights of the weighted norm mvp. The same calculations are applied to all the other portfolio strategies by using their optimal weights.
The oos net portfolio return is then used to calculate the sample variance (SV), the Sharpe
ratio (SR) and the certainty equivalent return (CER).8
To see how sparsity affects the portfolio performances, we can look at the proportion of
active constituents (PAC) of the portfolio at period t,
\[
PAC_t = \frac{\left|S^{+}_{t} \cup S^{-}_{t}\right|}{N},
\]
where S+t = {i : w∗i,t > 0} and S−t = {i : w∗i,t < 0}. We are also interested in how the portfolio
strategy works on the diversity among the assets. To measure this, we calculate the Herfindahl-Hirschman
index (HHI) for the portfolio weights at period t, which is defined as
\[
HHI_t = \frac{\sum_{i=1}^{N}\left|w^{*}_{i,t}\right|^{2}}{\left(\sum_{i=1}^{N}\left|w^{*}_{i,t}\right|\right)^{2}} = \frac{\left\|\mathbf{w}^{*}_{t}\right\|_{2}^{2}}{\left\|\mathbf{w}^{*}_{t}\right\|_{1}^{2}}.
\]
The norm penalty generates a sparse solution for the portfolio optimization, and the resulting
optimal portfolio will have its weights concentrated on the active constituents (assets with
nonzero weights). Hence measuring concentration among the optimal weights of these
active constituents may be more informative. For this purpose, we also calculate the
adjusted normalized Herfindahl-Hirschman index (ANHHI) at period t,
\[
ANHHI_t = \frac{HHI_t - \frac{1}{\left|S_t\right|}}{1 - \frac{1}{\left|S_t\right|}},
\]
where St = {i : w∗i,t ≠ 0}.9 Finally, we define the shortsale-long ratio (SLR) of the portfolio at
period t as
\[
SLR_t = \frac{\sum_{i\in S^{-}_{t}}\left|w^{*}_{i,t}\right|}{\sum_{i\in S^{+}_{t}}\left|w^{*}_{i,t}\right|}.
\]
The shortsale-long ratio is useful for clarifying how the norm penalty and the linear constraints
affect the short and long positions in the portfolio.
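The measures above can be computed from the weight paths in a few lines. The sketch below is our own helper (function and argument names are ours), covering one rebalancing step:

```python
import numpy as np

def performance_measures(w_t, w_tm1, R_t, R_tp1, eps=0.0035):
    """Sketch of the Section 6.1 measures for one rebalancing step.
    w_tm1, w_t: optimal weights at t-1 and t; R_t, R_tp1: returns at t, t+1."""
    R_oos_t = w_tm1 @ R_t                           # oos return realized at t
    drift = w_tm1 * (1.0 + R_t) / (1.0 + R_oos_t)   # weights after period-t returns
    tor = np.abs(w_t - drift).sum()                 # turnover rate, sum of (20)
    R_oos = w_t @ R_tp1                             # oos return at t+1
    R_net = (1.0 - eps * tor) * (1.0 + R_oos) - 1.0  # net oos return
    s = np.count_nonzero(w_t)
    pac = s / w_t.size                              # proportion of active constituents
    hhi = (w_t ** 2).sum() / np.abs(w_t).sum() ** 2
    anhhi = (hhi - 1.0 / s) / (1.0 - 1.0 / s) if s > 1 else 1.0
    slr = -w_t[w_t < 0].sum() / w_t[w_t > 0].sum()  # shortsale-long ratio
    return dict(R_oos=R_oos, R_net=R_net, TOR=tor, PAC=pac,
                HHI=hhi, ANHHI=anhhi, SLR=slr)
```

For a sanity check, the 1/N portfolio should give PAC = 1, HHI = 1/N, ANHHI = 0 and SLR = 0.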
6.2 Setting the Penalty Parameters
Theorem 1 suggests that we can set the penalty parameters as functions of the number of
assets N and the sample size n. Let λ1,t and λ2,t denote the penalty parameters for period t. We
8The certainty equivalent return used here is based on a quadratic utility function with risk aversion parameter ψ, the same as the one used in DeMiguel et al. (2009b). We set ψ = 5 in the following empirical study.
9Note that for ANHHIt, we use 1/|St| rather than 1/N as the normalizing constant. The reason is that, unlike HHIt, which measures concentration of the optimal weights among all N assets, ANHHIt measures concentration of weights among the active assets.
propose the following settings:
\[
\lambda_{1,t} = \alpha\, a_t B_t \sqrt{\frac{2\log N}{n_t}}, \qquad
\lambda_{2,t} = \left(1-\alpha\right) a_t B_t \sqrt{\frac{2\log N}{n_t}},
\]
where at = mini=1,...,N σ2i,t and Bt = ‖w∗t−1‖1. The parameter nt is the sample size used at
period t, which increases with t under the expanding window scheme.10 The specification
shown here keeps the form √(2 logN/nt), but adds the scale parameters at, Bt and α.
at is the minimum diagonal term of the estimated covariance matrix (the minimum variance
among all N assets). The reason for this setting is that, as shown in the proof of Lemma
3, aij is inversely related to σ2ij or σ4i; therefore amin should also be inversely related to σ2ij or
σ4i. We find that approximating amin by at works well in practice. In addition, including at also
makes the penalty parameters invariant to changes in units. The choice of Bt aims
to satisfy Condition 4 in Section 3.2, and we find that this choice also works well. The parameter
α ∈ [0, 1] adjusts the relative importance of the l1 and squared l2 norm penalties, and it
helps us see how changing the relative importance between the two penalties affects the
portfolio performances.
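These settings translate directly into code. A sketch (our own helper, taking the period-t sample covariance and last period's optimal weights as inputs):

```python
import numpy as np

def penalty_parameters(Sigma_t, w_prev, n_t, alpha):
    """Penalty parameters for period t, following the Section 6.2 settings."""
    N = Sigma_t.shape[0]
    a_t = np.diag(Sigma_t).min()   # minimum asset variance, a proxy for a_min
    B_t = np.abs(w_prev).sum()     # l1 norm of last period's optimal weights
    base = a_t * B_t * np.sqrt(2.0 * np.log(N) / n_t)
    return alpha * base, (1.0 - alpha) * base   # (lambda_{1,t}, lambda_{2,t})
```

Note that α only splits a common scale between the two penalties, so λ1,t + λ2,t is invariant to α.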
6.3 Main Results
We first compare the performances of different portfolio strategies when the full investment
constraint wT1 = 1 is imposed. The benchmark strategies we consider are the no-shortsale mvp
(nsmvp), the 1/N portfolio and the global minimum variance portfolio (gmvp). The data used are
returns of the Fama-French 100 size and book-to-market ratio portfolios (FF100) and of 300 stocks
randomly chosen from the CRSP data bank (CRSP300).11
Tables 1 and 2 show the results when daily return data are used and the portfolios are
rebalanced at daily frequency. The whole sampling period for the FF100 is from July-
12-1987 to Dec-31-2010 (5,415 observations) and for the CRSP300 from July-30-1998 to
Dec-31-2010 (3,127 observations); the testing period for the FF100 is from Jan-02-1990
to Dec-31-2010 (5,295 observations) and for the CRSP300 from Jan-03-2000 to Dec-31-2010
(2,767 observations). We impose a transaction fee of ε = 35 basis points when calculating the oos
net portfolio returns. The tables include the annualized sample variance of the oos net returns
(SV), the annualized Sharpe ratio (SR) obtained from the oos net portfolio returns, and average
values of the other six performance measures: the certainty equivalent return (CER), portfolio
turnover rate (TOR), proportion of active constituents (PAC), Herfindahl-Hirschman index
(HHI), adjusted normalized Herfindahl-Hirschman index (ANHHI), and shortsale-long ratio of
10Under the expanding window scheme, nt = t, t = τ0, . . . , T̄ − 1.
11The data for the FF100 can be downloaded from Professor Kenneth French's website: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/datalibrary.html. The returns of the 100 portfolios used here are value weighted returns of their constituents. The return data for the CRSP300 are available from the author.
the portfolio (SLR). In each table, the values in parentheses are the bootstrap standard errors of
the corresponding quantities, obtained using the stationary bootstrap of Politis and Romano
(1994).
For the daily FF100, we can see that the weighted norm strategy delivers a lower
annualized portfolio variance (SV) than the no-shortsale mvp (nsmvp) and the 1/N portfolio, and
the 1/N portfolio has the highest SV. The result differs from DeMiguel et al. (2009b), who
showed that the 1/N strategy can dominate other more sophisticated portfolio strategies on
many different performance measures. As for the annualized Sharpe ratio (SR), the weighted
norm mvp performs better than the benchmark portfolio strategies. For different values of
α, the annualized SR of the weighted norm mvp is always greater than 1 (ranging from 1.07 to 1.19
as α varies from 0 to 1). The weighted norm mvp also dominates the benchmark portfolio
strategies in the average certainty equivalent return (CER). We can also see that the proportion of
active constituents (PAC) is inversely related to the SV, which suggests that increasing sparsity
does not seem to help reduce portfolio volatility in this case.
As for the daily CRSP300, the weighted norm mvp yields a lower SV than the benchmark
portfolio strategies for a range of α. The lowest SV occurs when α = 0.2, in which case
on average only 76% of the assets are selected. It suggests that optimally increasing sparsity may
work for reducing portfolio volatility. However, the weighted norm mvp does not enjoy the
highest SR and CER; these are achieved by the nsmvp, a special case of
the penalized l1 norm minimum variance portfolio.
Overall the average portfolio turnover rate (TOR) of the 1/N portfolio is lower than those
of the other strategies, a fact widely documented in the previous literature. As for the
weighted norm mvp, the TOR declines with α, which suggests that imposing
more l1 penalty helps stabilize the portfolio weight vector. For the average proportion of
active constituents (PAC), the weighted norm mvp with α > 0 and the nsmvp consistently include
fewer than N assets. The PAC also decreases with α, since increasing α is equivalent to
imposing more l1 penalty, which induces more sparsity in the portfolio weight vector. The
nsmvp on average has the sparsest optimal portfolio weight vector. This is expected, since the
nsmvp is essentially the same as the minimum variance portfolio with a heavy l1 penalty.
In general, with the full investment constraint, a portfolio with a sparse weight vector will
concentrate on a few assets, and consequently the problem of extreme portfolio weights will
arise. The phenomenon can be measured using the HHI and ANHHI for the portfolio
weights. The 1/N portfolio assigns equal weights to each asset and hence its HHI and ANHHI
have the lowest values (1/N and 0) among all portfolios. For the weighted norm mvp, the HHI
and ANHHI increase with α and the PAC. It is not surprising that the nsmvp, which has the
lowest average PAC, has the highest values of HHI and ANHHI. The weighted norm mvp with
α = 0, in which only the squared l2 norm penalty is imposed, has lower values of HHI and
ANHHI than the gmvp. It suggests that additionally imposing the squared l2 norm penalty
helps alleviate the problem of extreme portfolio weights. Finally, the shortsale-long ratio
is positively related to α, which confirms that the l1 norm penalty with the full investment
constraint facilitates long positions in the weighted norm mvp.
Tables 3 to 5 show the results when the weekly FF100 and CRSP300 and the monthly FF100
return data are used.12 Since the weekly (monthly) return data are used for estimating
the sample covariance matrix, the portfolios are rebalanced weekly (monthly). The results are
qualitatively similar to those obtained from the daily data. For the FF100, the weighted norm
mvp consistently yields a higher SR than the three benchmark portfolios, but for the CRSP300
it still fails to achieve a higher SR and CER than the nsmvp.
6.4 With the Target Return Constraint
The previous literature argues that, empirically, adding the estimated mean return vector to the
portfolio optimization often damages the portfolio performances (Jagannathan and Ma, 2003;
DeMiguel et al., 2009a). Accurately estimating the means of returns is considered to be
more difficult than accurately estimating the variances or covariances of the returns. This
is perhaps the main reason why researchers often focus on portfolio optimization without
imposing the target return constraint wTµ = µ, under which we need to estimate the mean return
vector. In this section we investigate whether imposing the weighted norm penalty can help
improve the portfolio performances under the target return constraint. The mean return vector
µ is estimated by the sample mean of the returns with the expanding window scheme. Table
6 shows the results. The target return µ (annualized) is set at high and low levels according
to the means of the net oos portfolio returns of the FF100 and CRSP300. We keep the same
penalty parameter settings as in Section 6.2.
Overall, the SV, SR and CER shown here are worse than in the case without the target return
constraint. One exception is the CRSP300 with µ = 5%, for which the SR and CER are higher
than in the case without the target return constraint. The value of µ has a significant impact
on the performance measures, but a higher µ does not necessarily result in a higher SR and CER.
For example, for the CRSP300, the weighted norm mvp yields an SR of around 0.59 to 1 when
the annualized required return is µ = 5%. But as µ increases to 10%, the resulting SR decreases
to between 0.40 and 0.74. For the FF100, the situation is the opposite. The weighted norm mvp
yields an SR from 1.02 to 1.06 when µ = 20%, while when µ is lowered to 10%, the resulting SR is
only around 0.59 to 0.64. In sum, the results shown here are in line with what the previous
literature has found: adding the estimated mean returns to the portfolio optimization does little
or nothing to improve the portfolio performances. However, we should note that the penalty
parameters used here do not incorporate any information about the estimated mean return vector, and
¹² The testing periods are the same as in the daily cases, but the sampling periods are now different. The sampling period for the weekly FF100 is from the 38th week of 1987 to the last week of 2010 (1,215 observations); for the CRSP300 it is from the sixth week of 1993 to the last week of 2010 (933 observations). The weekly CRSP300 return data are available from the author. The sampling period for the monthly FF100 is from Jan-1980 to Dec-2010 (372 observations), and the monthly return data can also be downloaded from Professor Kenneth French's website.
this perhaps is another reason for the inferior performances of the weighted norm mvp with
the target return constraint.
6.5 Alternative Norm Penalties
We then investigate whether imposing different norm penalties on the portfolio weights can deliver better portfolio performance than the weighted norm penalty. As mentioned in Section 4.5, the penalized portfolio optimization can be viewed as a constrained regression problem. We therefore borrow some ideas from statistics. The first alternative penalty function we consider is the berhu penalty (Owen, 2007),

$$\lambda\sum_{i=1}^{N}\left(|w_i|\,I\{|w_i| < \kappa\} + \frac{w_i^2 + \kappa^2}{2\kappa}\,I\{|w_i| \ge \kappa\}\right), \qquad (21)$$

where $I\{A\}$ is the indicator function such that $I\{A\} = 1$ if event $A$ is true and $I\{A\} = 0$ otherwise. The name "berhu" comes from the fact that it is the reverse of the Huber loss. In statistics, the Huber loss function is designed to mitigate the effects of large error terms in regression estimation. The Huber loss is a hybrid of two parts: a quadratic function for errors of relatively small magnitude, and a linear function for errors whose absolute value is relatively large. By separating the impacts of small- and large-magnitude errors, the Huber loss can be used in robust regression estimation. The berhu penalty (21) is also a combination of the l1 and squared l2 norm penalties. However, unlike the weighted norm penalty, the berhu penalty adopts a different regularization rule. If $|w_i|$ is small (less than $\kappa$), it is regularized by the l1 norm penalty; if $|w_i|$ is large (greater than or equal to $\kappa$), it is regularized by the squared l2 norm penalty. By specifying the parameter $\kappa$, the berhu penalty allows us to regularize large and small portfolio weights separately, much as the Huber loss treats small and large error terms differently in regression estimation. This setting provides an alternative way to deal robustly with the problem of extreme portfolio weights in the portfolio optimization.
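As a quick illustration (our own sketch, not the paper's code; the function name and parameter values are arbitrary), the berhu penalty (21) can be evaluated as follows. Note that the two branches agree at $|w_i| = \kappa$, so the penalty is continuous:

```python
import numpy as np

def berhu_penalty(w, lam, kappa):
    """Berhu (reverse Huber) penalty of eq. (21): l1 for small weights,
    scaled squared l2 for weights at or above the threshold kappa."""
    a = np.abs(np.asarray(w, dtype=float))
    small = a < kappa
    return lam * (a[small].sum() + ((a[~small] ** 2 + kappa ** 2) / (2.0 * kappa)).sum())

# Continuity at the threshold: both branches give kappa when |w_i| = kappa.
print(berhu_penalty([0.05], lam=1.0, kappa=0.05))   # ~0.05
```

For $|w_i|$ below $\kappa$ the penalty equals $\lambda|w_i|$; above $\kappa$ it grows quadratically, which is what regularizes extreme weights more aggressively.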
We also consider the following generalized l1 norm penalty,¹³

$$\lambda_1\|w\|_1 + \lambda_2\|w - w_0\|_1.$$

This is a more general version of the norm penalty considered in DeMiguel et al. (2010), in which only the $\|w - w_0\|_1$ part is imposed ($\lambda_1 = 0$). The additional $\|w\|_1$ imposed here serves to induce more sparsity. We have seen the penalty $\|w - w_0\|_1$ in Section 4.3, where we related it to an individual's inattention or the endowment effect. The vector $w_0$ in the second l1 norm penalty is called the target portfolio weight vector. DeMiguel et al. (2010) suggested using the optimal gmvp weight vector as the target portfolio weight vector, and showed that such a choice can help reduce the portfolio turnover rate. Here we use 1/N or the optimal no-shortsale weight vector as the target portfolio weight vector. We use 1/N because it generally results in the lowest portfolio turnover rate among the benchmark portfolios, and the optimal no-shortsale weight vector because of its dominant performance in the CRSP300.

¹³ The derivation of the coordinate-wise descent algorithm for solving the mvp penalized by the generalized l1 norm penalty can be found in Appendix 8.9.
We also try to assign different penalty values to different assets. To do this, we consider a multiple-stage portfolio optimization with the following penalty:

$$\lambda\sum_{i=1}^{N}\epsilon\exp\!\left(-\epsilon\left|w_i^{*(l)}\right|\right)|w_i|, \qquad (22)$$

where $w_i^{*(l)}$, $l = 0, 1, \dots$, is the portfolio weight for asset $i$ obtained from some portfolio optimization, and $\epsilon > 0$ is a tuning parameter. We call (22) the adaptive penalty. Note that all of the norm penalties we have used so far treat each asset equally: the penalty parameter has the same value for each of them. The above modified l1 norm penalty imposes different degrees of penalty on the assets through the factor $\epsilon\exp(-\epsilon|w_i^{*(l)}|)$. To obtain $w_i^{*(l)}$, we propose to use a multi-stage approach:

• Step 1: Solve the l1 norm penalized portfolio optimization. Denote the solution $w_i^{*(0)}$, $i = 1, \dots, N$.

• Step 2: Plug $w_i^{*(0)}$ into (22). Then solve the portfolio optimization with penalty (22), and denote the solution $w_i^{*(1)}$, $i = 1, \dots, N$.

• Step 3: Plug $w_i^{*(1)}$ into (22). Then solve the portfolio optimization with penalty (22); the solution, denoted $w_i^{*(2)}$, $i = 1, \dots, N$, is used as the portfolio weight for asset $i$.

We can continue with steps 4, 5, ... to obtain $w_i^{*(3)}, w_i^{*(4)}, \dots$, terminating when a convergence condition is met. However, we find in our empirical analysis that going up to step 3 is enough to ensure good convergence. Imposing (22) can be viewed as using the majorization-minimization method to approximately solve an l0 norm penalized portfolio optimization; a more detailed discussion can be found in Appendix 8.9.
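A minimal sketch of the multi-stage scheme follows. The solver `solve_penalized_mvp` is a hypothetical placeholder for whatever penalized optimizer is used at each stage; only the reweighting logic is taken from (22):

```python
import numpy as np

def adaptive_scalings(w_prev, eps):
    """Per-asset l1 scalings eps*exp(-eps*|w_i^{*(l)}|) from eq. (22):
    assets with larger previous weights receive a lighter penalty."""
    return eps * np.exp(-eps * np.abs(w_prev))

def multi_stage(solve_penalized_mvp, N, eps, lam, n_stages=3):
    """Steps 1-3: start from a plain l1 solution, then re-solve with the
    adaptive penalty, feeding each stage's solution into the next one."""
    w = solve_penalized_mvp(lam * np.ones(N))           # Step 1: plain l1 penalty
    for _ in range(n_stages - 1):                       # Steps 2 and 3
        w = solve_penalized_mvp(lam * adaptive_scalings(w, eps))
    return w
```

A zero previous weight receives the full scaling $\epsilon$, while a large previous weight is nearly unpenalized, which is how the scheme mimics an l0 penalty.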
For the three alternative norm penalties, we set $\lambda = a_tB_t\sqrt{2\log N/n_t}$. Tables 7 and 8 show the results for the daily FF100 and CRSP300. For the FF100, using the berhu penalty yields an SR of around 0.83 to 1.11 and a CER of around 0.04 to 0.05 as we vary the parameter $\kappa$ from 0.02 to 0.1. For the CRSP300, the SR ranges from 0.99 to 1.08 and the CER from 0.033 to 0.037 as $\kappa$ is varied within the same range. Both the SR and CER of the berhu penalty for the CRSP300 are better than those of the weighted norm mvp. As for the generalized l1 norm penalty, we use TWN−l1 (TWNS−l1) and TWN (TWNS) to denote the cases with and without the additional l1 norm penalty when the target weight vector $w_0$ equals 1/N (the optimal no-shortsale weight vector). It can be seen that using the optimal no-shortsale weight vector as $w_0$ yields a higher SR and CER than using the 1/N weight, regardless of whether the additional l1 norm penalty is added. Imposing the additional l1 penalty works well for the CRSP300, but not for the FF100. As for the adaptive penalty, it can obtain a higher SR than the weighted norm mvp for both the FF100 and CRSP300 if we set the parameter $\epsilon$ carefully. The number of iterations $l$ seems to have no effect on the performance. The SR for the FF100 reaches 1.21 when $\epsilon = 1$. For the CRSP300, setting $\epsilon = 1$ and 2.5 yields SRs of up to 0.96 and 0.99, similar to that of the weighted norm mvp. Overall, the results indicate that the three alternative penalties can achieve performances comparable to those of the weighted norm penalty.
7 Conclusion
We provide both theoretical and empirical evidence to show that imposing the weighted l1 and
squared l2 norm penalty on the portfolio weights can improve performances of the portfolio
optimization when the number of assets N becomes large. We use an efficient coordinate-
wise descent algorithm on solving the penalized norm portfolio optimization with the penalty
parameter setting derived from our theoretical analysis. We also link the weighted norm
strategy to some interesting issues in finance, economics and statistics, and provides alternative
explanations on why using the weighted norm strategy is reasonable.
In this paper we only use the sample covariance matrix and sample mean return vector
estimated with the expanding window scheme in the portfolio optimization. More sophisticated
estimations, such as using high frequency data or factor models, might provide better results.
There are some interesting issues which have not been thoughtfully discussed in the paper. For
example, how the theoretical properties of the weighted norm mvp change when 1) allowing
random components in the linear constraints, 2) the data generating process becomes more
complex and 3) more sophisticated estimations of the covariance matrix and mean return vector
are used. El Karoui (2009, 2010) provided some results on the first two issues for the global
minimum variance portfolio. Fan et al. (2013) provided a rigorous analysis for the latter two
issues when the l1 norm constrained mvp is considered. The above three modifications will
make properties of the high dimensional covariance matrix, mean return vector estimations
and the penalized norm portfolio optimization become more complicated than those shown in
this paper, and analyzing the properties and empirical performances of the penalized norm
portfolios under the situation is interesting and challenging.
8 Appendix
8.1 Proof of Lemma 1
Proof. The optimal nonzero weights $w_s^*$ and Lagrange multipliers can be solved from the following system of equations:

$$\begin{pmatrix} 2(\Sigma_{ss} + \lambda_2 I_{ss}) & A_s^T \\ A_s & 0 \end{pmatrix}\begin{pmatrix} w_s^* \\ -\gamma^* \end{pmatrix} = \begin{pmatrix} -\lambda_1\,\mathrm{sign}(w_s^*) \\ u \end{pmatrix}.$$

By using the matrix inverse formula for block matrices, we can obtain the expressions for $w_s^*$ and $\gamma^*$ shown in Lemma 1.
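As an illustration of this system (a toy sketch with made-up dimensions and data, not the paper's code), one can assemble the block matrix and solve it directly; by construction the solution satisfies the linear constraint $A_sw_s^* = u$ exactly:

```python
import numpy as np

# Toy instance of the KKT system above (dimensions and data are invented):
# [2(Sigma_ss + l2*I)  A_s^T; A_s  0] [w_s; -gamma] = [-l1*sign(w_s); u]
rng = np.random.default_rng(0)
m, k = 4, 1                                # |S| active assets, k linear constraints
B = rng.standard_normal((50, m))
Sigma_ss = B.T @ B / 50                    # a positive definite covariance block
lam1, lam2 = 0.01, 0.1
A_s = np.ones((k, m))                      # full-investment constraint row
u = np.ones(k)
sign_ws = np.ones(m)                       # hypothesized signs of the active weights
K = np.block([[2.0 * (Sigma_ss + lam2 * np.eye(m)), A_s.T],
              [A_s, np.zeros((k, k))]])
rhs = np.concatenate([-lam1 * sign_ws, u])
sol = np.linalg.solve(K, rhs)
w_s, gamma = sol[:m], -sol[m]
print(round(w_s.sum(), 6))                 # 1.0 (the constraint row enforces A_s w_s = u)
```

In practice the signs of $w_s^*$ are not known in advance, which is why the coordinate-wise descent algorithm in Appendix 8.7 is used instead of solving this system directly.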
8.2 Proof of Lemma 2
Proof. Multiplying both sides of (3) by $w_s^{*T}$ yields

$$2\sigma_s^2 + 2\lambda_2\|w_s^*\|_2^2 = w_s^{*T}A_s^T\gamma^* - \lambda_1 w_s^{*T}\,\mathrm{sign}(w_s^*).$$

By (4) and $w^{*T}\,\mathrm{sign}(w^*) = \|w^*\|_1$, it follows that

$$2\sigma_s^2 + 2\lambda_2\|w_s^*\|_2^2 = u^T\gamma^* - \lambda_1\|w_s^*\|_1.$$

From Lemma 1, $u^T\gamma^* = u^T\gamma_{2,s}^* + \lambda_1 u^T\delta_{2,s}$. It can be shown that

$$u^T\gamma_{2,s}^* = 2\left(\sigma_{2,s}^2 + \lambda_2\|w_{2,s}^*\|_2^2\right).$$

Therefore

$$\sigma_s^2 - \sigma_{un,s}^2 = \sigma_{2,s}^2 - \sigma_{un,s}^2 + \lambda_2\left(\|w_{2,s}^*\|_2^2 - \|w_s^*\|_2^2\right) + \frac{\lambda_1}{2}\left(u^T\delta_{2,s} - \|w_s^*\|_1\right). \qquad (23)$$

Also

$$u^T\delta_{2,s} = w_{2,s}^{*T}\,\mathrm{sign}(w_s^*) \le \sum_{i\in S}\left|w_{2,s,i}^*\,\mathrm{sign}(w_{s,i}^*)\right| \le \|w_{2,s}^*\|_1.$$

We then prove that

$$\sigma_{2,s}^2 - \sigma_{un,s}^2 + \lambda_2\|w_{2,s}^*\|_2^2 \le c_{s,1}(c_{s,1}+1)\,\sigma_{un,s}^2.$$

To see this, since $\Sigma'_{ss} = \Sigma_{ss} + \lambda_2 I_{ss}$, it can be shown that

$$A_s\Sigma_{ss}^{-1}A_s^T - A_s\Sigma_{ss}'^{-1}A_s^T = \lambda_2 A_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^T. \qquad (24)$$

Furthermore,

$$\sigma_{2,s}^2 - \sigma_{un,s}^2 + \lambda_2\|w_{2,s}^*\|_2^2 = \lambda_2\left[u^TM_s'^{-1}\left(A_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^T\right)M_s^{-1}u\right].$$

Note that if a $k\times k$ matrix $M$ is positive semidefinite, one has the following Cauchy–Schwarz type inequality:

$$\left|x^TMy\right| \le \sqrt{x^TMx}\sqrt{y^TMy},$$

where $x$ and $y$ are $k\times1$ column vectors. Since $\Sigma_{ss}'^{-1}$ and $\Sigma_{ss}^{-1}$ are both positive semidefinite, given $|S| \gg k$, the matrix $A_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^T$ is also positive semidefinite. It follows that

$$u^TM_s'^{-1}\left(A_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^T\right)M_s^{-1}u \le \sqrt{u^TM_s'^{-1}\left(A_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^T\right)M_s'^{-1}u} \qquad (25)$$
$$\times\sqrt{u^TM_s^{-1}\left(A_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^T\right)M_s^{-1}u}. \qquad (26)$$

In the following we derive upper bounds for (25) and (26). For (25), by $\varphi_{\min}(\Sigma_{ss}) > 0$,

$$u^TM_s'^{-1}\left(A_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^T\right)M_s'^{-1}u \le \frac{1}{\varphi_{\min}(\Sigma_{ss})}\sum_{j=1}^{|S|}\frac{1}{\varphi_j(\Sigma_{ss}) + \lambda_2}\|x_sq_{j,s}\|_2^2,$$

where $x_s = u^TM_s'^{-1}A_s$. The vector $q_{j,s}$ is the $j$th column of a square matrix $Q_s$ such that $\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1} = Q_s\Lambda_sQ_s^T$, $Q_sQ_s^T = I_{ss}$, and $\Lambda_s$ is a diagonal matrix whose $j$th diagonal element is $(\varphi_j(\Sigma_{ss})(\varphi_j(\Sigma_{ss})+\lambda_2))^{-1}$. Then

$$\sum_{j=1}^{|S|}\frac{1}{\varphi_j(\Sigma_{ss})+\lambda_2}\|x_sq_{j,s}\|_2^2 = \sigma_{2,s}^2 + \lambda_2\|w_{2,s}^*\|_2^2.$$

Combining the above results, (25) is bounded by $(\varphi_{\min}(\Sigma_{ss}))^{-\frac12}\left(\sigma_{2,s}^2 + \lambda_2\|w_{2,s}^*\|_2^2\right)^{\frac12}$. For (26), since the matrix $A_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^T$ is positive semidefinite, by (24),

$$x^TM_sx - x^TM_s'x = \lambda_2x^TA_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^Tx \ge 0,$$

where $x$ is a $k\times1$ column vector. By $x^TM_s'x \ge 0$,

$$x^TM_sx \ge \lambda_2x^TA_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^Tx$$

for any $k\times1$ column vector $x$. Therefore

$$u^TM_s^{-1}\left(A_s\Sigma_{ss}'^{-1}\Sigma_{ss}^{-1}A_s^T\right)M_s^{-1}u \le \frac{u^TM_s^{-1}M_sM_s^{-1}u}{\lambda_2} = \frac{\sigma_{un,s}^2}{\lambda_2}.$$

The second term (26) is thus bounded by $\lambda_2^{-\frac12}\sigma_{un,s}$. Note that $\sigma_{2,s}^2 \ge \sigma_{un,s}^2$, since the former is obtained by restricting the portfolio weights with the l2 norm, while the latter is obtained without imposing any constraint on the portfolio weights. Therefore

$$\sigma_{2,s}^2 - \sigma_{un,s}^2 + \lambda_2\|w_{2,s}^*\|_2^2 \le c_{s,1}\sqrt{\left(\sigma_{2,s}^2 + \lambda_2\|w_{2,s}^*\|_2^2\right)\sigma_{un,s}^2},$$

where $c_{s,1} = \lambda_2^{\frac12}(\varphi_{\min}(\Sigma_{ss}))^{-\frac12}$. Let $a = \sigma_{2,s}^2 + \lambda_2\|w_{2,s}^*\|_2^2$ and $b = \sigma_{un,s}^2$. The above result shows that $a - b \le c_{s,1}\sqrt{ab}$. Since $a$ and $b$ are both nonnegative,

$$\sqrt a - \sqrt b \le \frac{c_{s,1}\sqrt{ab}}{\sqrt a + \sqrt b} \le \frac{c_{s,1}\sqrt{ab}}{\sqrt a} = c_{s,1}\sqrt b.$$

Therefore $\sqrt{ab} \le (c_{s,1}+1)\,b$, and

$$\sigma_{2,s}^2 - \sigma_{un,s}^2 + \lambda_2\|w_{2,s}^*\|_2^2 \le c_{s,1}(c_{s,1}+1)\,\sigma_{un,s}^2.$$

Combining the above results, the proof is completed.
8.3 Normality of Asset Returns
Suppose that $R_t \overset{iid}{\sim} N(\mu,\Sigma)$, $t = 1,\dots,n$, where $\mu = (\mu_1,\dots,\mu_N)$ is the mean return vector and $\Sigma$ is the covariance matrix, and the calibrated $\hat\Sigma$ is the sample covariance matrix. Let $1_n$ denote an $n\times1$ column vector in which all components are 1. It can be shown that

$$\hat\Sigma = \frac{1}{n-1}\Sigma^{\frac12}Z^TH_nZ\Sigma^{\frac12},$$

where $\bar R = \frac1n\sum_{t=1}^nR_t$ is the sample mean return, $\mathbf R = (R_1,\dots,R_n)^T$ is an $n\times N$ return matrix, $H_n = I_{nn} - n^{-1}1_n1_n^T$ is an idempotent matrix, and $H_n\mathbf R = \mathbf R - \bar{\mathbf R}$. The matrix $Z = (Z_1,\dots,Z_n)^T$ is an $n\times N$ matrix of standard normal random vectors,

$$Z_t \overset{iid}{\sim} N(0, I_{NN}), \quad t = 1,\dots,n.$$

Obviously, $R_t = \mu + Z_t\Sigma^{\frac12}$.

With i.i.d. normally distributed returns and the sample covariance estimator, it is well known that

$$\hat\Sigma \sim \frac{W(\Sigma, N, n-1)}{n-1}, \qquad (27)$$

where $W(\Sigma, N, n-1)$ denotes the Wishart distribution with parameters $\Sigma$, $N$ and $n-1$. If $A$ is a $k\times N$ deterministic matrix, then

$$A\hat\Sigma A^T \sim \frac{W(A\Sigma A^T, k, n-1)}{n-1}. \qquad (28)$$

If $\hat\Sigma$ is positive definite with probability one, then

$$\left(A\hat\Sigma^{-1}A^T\right)^{-1} \sim \frac{W\!\left((A\Sigma^{-1}A^T)^{-1}, k, n-1-N+k\right)}{n-1}. \qquad (29)$$

Following a similar way as in the previous section, one can also partition the return vector $R_t$. Given some nonrandom sets $S$ and $S^c$, let $R_{s,t}\in\mathbb R^{|S|}$ and $R_{s^c,t}\in\mathbb R^{|S^c|}$, $t = 1,\dots,n$, be the first $|S|$ and the remaining $|S^c| = N - |S|$ elements of $R_t$; then

$$R_{s,t} \overset{iid}{\sim} N(\mu_s, \Sigma_{ss}), \qquad R_{s^c,t} \overset{iid}{\sim} N(\mu_{s^c}, \Sigma_{s^cs^c}),$$

where

$$\mu = (\mu_s, \mu_{s^c}) \quad\text{and}\quad \Sigma = \begin{pmatrix}\Sigma_{ss} & \Sigma_{ss^c} \\ \Sigma_{s^cs} & \Sigma_{s^cs^c}\end{pmatrix}.$$

Furthermore, when $\hat\Sigma$ is the sample covariance matrix,

$$\hat\Sigma_{ss} \sim \frac{W(\Sigma_{ss}, |S|, n-1)}{n-1}, \qquad \hat\Sigma_{s^cs^c} \sim \frac{W(\Sigma_{s^cs^c}, |S^c|, n-1)}{n-1},$$

where $\hat\Sigma_{ss}$ and $\hat\Sigma_{s^cs^c}$ are the sample covariance estimates of $\Sigma_{ss}$ and $\Sigma_{s^cs^c}$, respectively. Furthermore, (28) and (29) also hold when $\Sigma$ and $\hat\Sigma$ are replaced by the submatrices $\Sigma_{ss}$ ($\Sigma_{s^cs^c}$) and $\hat\Sigma_{ss}$ ($\hat\Sigma_{s^cs^c}$), and $A$ is replaced by deterministic matrices $A_s$ (with dimension $k\times|S|$) and $A_{s^c}$ (with dimension $k\times|S^c|$). The above property holds for any $S\subseteq\{1,\dots,N\}$, and is useful in the following proofs when we construct the probability of a certain event.
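A small Monte Carlo sketch (ours, with arbitrary parameter values) is consistent with (27): since the mean of $W(\Sigma, N, n-1)/(n-1)$ is $\Sigma$, the average sample covariance matrix over many replications should be close to $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
n, reps = 50, 2000
L = np.linalg.cholesky(Sigma)
S_mean = np.zeros_like(Sigma)
for _ in range(reps):
    R = rng.standard_normal((n, 2)) @ L.T   # R_t iid N(0, Sigma)
    S_mean += np.cov(R, rowvar=False)       # sample covariance, divisor n-1
S_mean /= reps
# S_mean should be close to Sigma, the mean of W(Sigma, N, n-1)/(n-1).
```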
8.4 Eigenvalues of the Sample Covariance Matrix
We then discuss some issues on the convergence of the extreme eigenvalues of a sample covariance matrix. By using Theorem 5.11 in Bai and Silverstein (2010), we can obtain a useful result on the bounded eigenvalues of the sample covariance matrix with i.i.d. normal samples.

Lemma 3 Suppose $Z_i \overset{iid}{\sim} N(0, \sigma^2I_{NN})$, and

$$\hat\Sigma = \frac{\Sigma^{\frac12}Z^TH_nZ\Sigma^{\frac12}}{n-1},$$

where $Z$ is an $n\times N$ matrix whose $i$th row is $Z_i$, $\Sigma$ is an $N\times N$ symmetric and positive semidefinite matrix, and $H_n = I_{nn} - n^{-1}1_n1_n^T$. Let $\varphi_j(M)$, $j = 1,\dots,N$, denote the eigenvalues of a matrix $M$, and let $\varphi_{\min}(M)$ and $\varphi_{\max}(M)$ denote the smallest and largest eigenvalues of $M$, respectively. As $\lim_{n\to\infty}N/n = \rho_N\in(0,1)$, almost surely

$$\sigma^2(1-\sqrt{\rho_N})^2\varphi_{\min}(\Sigma) \le \lim_{n\to\infty}\varphi_j(\hat\Sigma) \le \sigma^2(1+\sqrt{\rho_N})^2\varphi_{\max}(\Sigma)$$

for every $j = 1,\dots,N$.

To prove Lemma 3, we use Theorem 5.11 in Bai and Silverstein (2010). For completeness, we restate their theorem here.

Theorem 3 (Bai and Silverstein, 2010) Let $\mathbb C^N$ denote the set of $N$-dimensional complex vectors. Assume $Z_i\in\mathbb C^N$, $i = 1,\dots,n$, are i.i.d. with mean $0_N$ and covariance matrix $\Sigma$ whose diagonal terms are $\Sigma_{jj} = \sigma^2$ and whose off-diagonal terms are $\Sigma_{jl} = 0$, $j,l = 1,\dots,N$, $j\ne l$. Suppose $Z_i$ also has finite fourth moment. As $\lim_{n\to\infty}N/n = \rho_N\in(0,1)$, then almost surely

$$\lim_{n\to\infty}\varphi_{\min}(\tilde\Sigma) = \sigma^2(1-\sqrt{\rho_N})^2, \qquad \lim_{n\to\infty}\varphi_{\max}(\tilde\Sigma) = \sigma^2(1+\sqrt{\rho_N})^2,$$

where $\tilde\Sigma = (n-1)^{-1}\sum_{i=1}^nZ_iZ_i^T$.

The proof of Lemma 3 is straightforward, and can be accomplished by applying the above result to the matrix $Z^TH_nZ$.

Proof. Since $\Sigma$ and $\hat\Sigma$ are both positive semidefinite, for any arbitrary $N\times1$ vector $x$,

$$0 \le \varphi_{\min}\!\left(\frac{Z^TH_nZ}{n-1}\right)x^T\Sigma x \le x^T\hat\Sigma x \le \varphi_{\max}\!\left(\frac{Z^TH_nZ}{n-1}\right)x^T\Sigma x.$$

Then it can be shown that

$$\varphi_{\max}(\hat\Sigma) \le \varphi_{\max}\!\left(\frac{Z^TH_nZ}{n-1}\right)\varphi_{\max}(\Sigma), \qquad \varphi_{\min}(\hat\Sigma) \ge \varphi_{\min}\!\left(\frac{Z^TH_nZ}{n-1}\right)\varphi_{\min}(\Sigma).$$

Note that the above inequalities hold for every $n > 1$ and $N \ge 1$. Since $H_n$ is idempotent, it can be shown that

$$Z^TH_nZ = (Z^TH_n)(H_nZ) = \sum_{i=1}^n(Z_i - \bar Z)(Z_i - \bar Z)^T,$$

where $\bar Z = n^{-1}\sum_{i=1}^nZ_i$. Since $Z_i \overset{iid}{\sim} N(0,\sigma^2I_{NN})$, $Z_i - \bar Z$ has mean $0_N$ and a diagonal covariance matrix whose diagonal elements equal $n^{-1}(n-1)\sigma^2$. By Theorem 5.11 in Bai and Silverstein (2010), as $\lim_{n\to\infty}N/n = \rho_N\in(0,1)$, almost surely

$$\lim_{n\to\infty}\varphi_{\min}\!\left(\frac{Z^TH_nZ}{n-1}\right) = \sigma^2(1-\sqrt{\rho_N})^2, \qquad \lim_{n\to\infty}\varphi_{\max}\!\left(\frac{Z^TH_nZ}{n-1}\right) = \sigma^2(1+\sqrt{\rho_N})^2,$$

since $n^{-1}(n-1)\sigma^2\to\sigma^2$. Therefore, as $\lim_{n\to\infty}N/n = \rho_N\in(0,1)$, almost surely

$$\lim_{n\to\infty}\varphi_{\max}(\hat\Sigma) \le \sigma^2(1+\sqrt{\rho_N})^2\varphi_{\max}(\Sigma), \qquad \lim_{n\to\infty}\varphi_{\min}(\hat\Sigma) \ge \sigma^2(1-\sqrt{\rho_N})^2\varphi_{\min}(\Sigma).$$
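Lemma 3 can be checked numerically (a sketch with our own choices $\Sigma = I_{NN}$, $\sigma^2 = 1$, $n = 2000$, $N = 500$): every eigenvalue of the sample covariance matrix should fall near the interval $[(1-\sqrt{\rho_N})^2, (1+\sqrt{\rho_N})^2]$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 2000, 500                            # rho_N = N/n = 0.25
Z = rng.standard_normal((n, N))             # Z_t iid N(0, I_NN)
S = np.cov(Z, rowvar=False)                 # sample covariance (Sigma = I, sigma^2 = 1)
eig = np.linalg.eigvalsh(S)
rho = N / n
lo, hi = (1 - np.sqrt(rho)) ** 2, (1 + np.sqrt(rho)) ** 2   # 0.25 and 2.25
# All eigenvalues of S should lie close to the interval [lo, hi].
```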
8.5 Estimation Errors
Suppose that $\hat\sigma_{ij}$ and $\hat\sigma_i^2$ are consistent estimators of $\sigma_{ij}$ and $\sigma_i^2$, respectively. Without loss of generality, we may assume that $\hat\sigma_{ij}$ and $\sigma_{ij}$ have the following linear relationship,

$$\hat\sigma_{ij} = \sigma_{ij} + \omega_{ij}(n), \qquad \hat\sigma_i^2 = \sigma_i^2 + \omega_{ii}(n),$$

where $\omega_{ij}(n)$ ($\omega_{ii}(n)$) is the estimation error for $\sigma_{ij}$ ($\sigma_i^2$) from using $\hat\sigma_{ij}$ ($\hat\sigma_i^2$), and is a function of the sample size $n$. If $\hat\sigma_{ij}$ and $\hat\sigma_i^2$ are consistent, then as $n\to\infty$, $\omega_{ij}(n)$ and $\omega_{ii}(n)$ should vanish to zero with high probability. $\hat\Sigma$ may be expressed as

$$\hat\Sigma = \Sigma + \Omega(n), \qquad (30)$$

where $\Omega(n)$ is an $N\times N$ estimation error matrix whose $(i,j)$th element is $\omega_{ij}(n)$, $i,j = 1,\dots,N$. It is also symmetric, but may not be positive semidefinite. Furthermore, by the same partition used for $\Sigma$, it follows that

$$\Omega(n) = \begin{pmatrix}\Omega_{ss}(n) & \Omega_{ss^c}(n) \\ \Omega_{s^cs}(n) & \Omega_{s^cs^c}(n)\end{pmatrix}.$$

Then the difference between the oos variances of the weighted norm mvp and the gmvp can be expressed as

$$w_s^{*T}\Sigma_{ss}w_s^* - w_{un}^{*T}\Sigma w_{un}^* = D_{1,s} + D_{2,s}, \qquad (31)$$

where

$$D_{1,s} = \hat\sigma_s^2 - \hat\sigma_{un}^2, \qquad (32)$$
$$D_{2,s} = w_s^{*T}\Omega_{ss}(n)w_s^* - w_{un}^{*T}\Omega(n)w_{un}^*. \qquad (33)$$

That is, the difference is the sum of the difference between the in-sample variances and the difference between the estimation errors.

Given $\hat S = S$, Fan et al. (2009) show that

$$w_s^{*T}\Omega_{ss}(n)w_s^* \le \max_{i,j}|\omega_{ij}(n)|\,\|w_s^*\|_1^2.$$

Therefore, if $\|w_s^*\|_1^2$ is bounded, the estimation error term of the weighted norm mvp is bounded by the maximum $|\omega_{ij}(n)|$ scaled by $\|w_s^*\|_1^2$. Here, the size of the optimal weight vector plays an important role in reducing the estimation errors. If $w_s^*$ is the optimal no-shortsale weight vector with $w_s^{*T}1_N = 1$, then $\|w_s^*\|_1^2 = 1$. The no-shortsale mvp has the smallest $\|w_s^*\|_1^2$ among mvps with the full investment constraint, and this is the main reason why it can efficiently eliminate the estimation errors.

The term $D_{1,s}$ can be further expressed as

$$\left(\hat\sigma_{2,s}^2 - \hat\sigma_{un}^2\right) + \lambda_2\left(\|w_{2,s}^*\|_2^2 - \|w_s^*\|_2^2\right) + \frac{\lambda_1}{2}\left(u^T\delta_{2,s} - \|w_s^*\|_1\right). \qquad (34)$$

The first term of (34) is the difference between the optimal in-sample portfolio variances from (6) and (7), and it is nonnegative. To see this, first note that $\hat\sigma_{un}^2 \le \hat\sigma_{un,s}^2$, since the in-sample portfolio variance from optimally choosing over a larger set of assets will always be no greater than that from optimally choosing over a smaller subset of the same assets. Furthermore, $\hat\sigma_{un,s}^2 \le \hat\sigma_{2,s}^2$, since the latter is obtained by restricting the portfolio weights with the l2 norm on the assets in $S$, while the former is obtained without such a constraint. The sum of the second and third terms of (34), as shown in the proof of Lemma 2, is the difference between the optimal in-sample portfolio variances of (1) and (6), and it is nonnegative. Therefore (34) is in general nonnegative.

The term $D_{2,s}$ is the difference between the estimation errors, and it is determined by the estimation error matrix and its principal submatrix, together with the two optimal weight vectors. Given that $D_{1,s} \ge 0$, to make (42) nonpositive, $D_{2,s}$ needs to be large enough to offset the nonnegativity caused by $D_{1,s}$. Furthermore, we have the following lemma for the estimation error $\omega_{ij}(n)$ when the returns are i.i.d. normal and the sample variance and covariance estimators are used.

Lemma 4 Suppose $R_t \overset{iid}{\sim} N(\mu,\Sigma)$, $t = 1,\dots,n$, and every element of the covariance matrix $\Sigma$ is finite. If $\hat\Sigma$ is the sample covariance matrix, then

$$P(|\omega_{ij}(n)| \ge v) = O\!\left(n^{-\frac12}\exp(-a_{ij}v^2n)\right),$$

where $a_{ij}$ is positive and depends on $\sigma_i^2$, $\sigma_j^2$ and $\sigma_{ij}$.
Proof. Note that

$$|\omega_{ij}(n)| = |\hat\sigma_{ij} - \sigma_{ij}| \le \left|\frac{1}{n-1}\sum_{k=1}^n(R_{ik}R_{jk} - \sigma_{ij})\right| + \left|\frac{1}{n-1}\left(n\bar R_i\bar R_j - \sigma_{ij}\right)\right|. \qquad (35)$$

The first term can be shown to satisfy

$$\left|\frac{1}{n-1}\sum_{k=1}^n(R_{ik}R_{jk} - \sigma_{ij})\right| \le E_{n,1} + E_{n,2},$$

where

$$E_{n,1} = \left|\frac{1}{4(n-1)}\sum_{k=1}^n\left((R_{ik}+R_{jk})^2 - (2\sigma_{ij}+\sigma_i^2+\sigma_j^2)\right)\right|,$$
$$E_{n,2} = \left|\frac{1}{4(n-1)}\sum_{k=1}^n\left((R_{ik}-R_{jk})^2 - (\sigma_i^2+\sigma_j^2-2\sigma_{ij})\right)\right|.$$

Let $Var_{ij}^+ = \sigma_i^2+\sigma_j^2+2\sigma_{ij}$ and $Var_{ij}^- = \sigma_i^2+\sigma_j^2-2\sigma_{ij}$. It is known that

$$R_{ik}+R_{jk} \sim N(0, Var_{ij}^+), \qquad R_{ik}-R_{jk} \sim N(0, Var_{ij}^-).$$

Then

$$P(E_{n,1}\ge v) = P\!\left(\left|\sum_{k=1}^n\left(\left(\frac{R_{ik}+R_{jk}}{\sqrt{Var_{ij}^+}}\right)^2 - 1\right)\right| \ge \frac{4(n-1)v}{Var_{ij}^+}\right) = P\!\left(\chi_n^2 \ge n+\frac{4(n-1)v}{Var_{ij}^+}\right) + P\!\left(\chi_n^2 \le n-\frac{4(n-1)v}{Var_{ij}^+}\right), \qquad (36)$$

where $v\ge0$ is a constant. By similar arguments,

$$P(E_{n,2}\ge v) \le P\!\left(\chi_n^2 \ge n+\frac{4(n-1)v}{Var_{ij}^-}\right) + P\!\left(\chi_n^2 \le n-\frac{4(n-1)v}{Var_{ij}^-}\right). \qquad (37)$$

We first discuss the second terms in (36) and (37). If

$$v \ge \frac{n}{4(n-1)}\max\!\left(Var_{ij}^-, Var_{ij}^+\right),$$

then

$$P\!\left(\chi_n^2 \le n-\frac{4(n-1)v}{Var_{ij}^+}\right) = P\!\left(\chi_n^2 \le n-\frac{4(n-1)v}{Var_{ij}^-}\right) = 0,$$

since $\chi_n^2$ is nonnegative with probability one. Let

$$\theta_{ij} = \frac{4}{\max\!\left(Var_{ij}^-, Var_{ij}^+\right)}.$$

Suppose that

$$0 < n - \frac{4(n-1)v}{\max\!\left(Var_{ij}^-, Var_{ij}^+\right)} = n - \theta_{ij}vn + \theta_{ij}v \le n.$$

Then

$$\max\!\left(P\!\left(\chi_n^2 \le n-\frac{4(n-1)v}{Var_{ij}^-}\right), P\!\left(\chi_n^2 \le n-\frac{4(n-1)v}{Var_{ij}^+}\right)\right) = P\!\left(\chi_n^2 \le (1-\theta_{ij}v)n + \theta_{ij}v\right) \le P\!\left(\chi_n^2 \le (1-\theta_{ij}v)n\right) + P\!\left(\chi_n^2 \le \theta_{ij}v\right).$$

We know that

$$P\!\left(\chi_n^2 \le (1-\theta_{ij}v)n\right) = \frac{\Gamma\!\left(\frac n2, \frac{(1-\theta_{ij}v)n}{2}\right)}{\Gamma\!\left(\frac n2\right)}, \qquad P\!\left(\chi_n^2 \le \theta_{ij}v\right) = \frac{\Gamma\!\left(\frac n2, \frac{\theta_{ij}v}{2}\right)}{\Gamma\!\left(\frac n2\right)},$$

where $\Gamma(u,z) = \int_0^zx^{u-1}\exp(-x)\,dx$ is the lower incomplete gamma function and $\Gamma(u)$ is the gamma function. It can be shown that

$$\Gamma(u,z) \le \frac{z^u\exp(-z)}{u-\frac{uz}{u+1}} = \frac{u+1}{u}\,\frac{z^u\exp(-z)}{u+1-z}.$$

Then

$$\ln\Gamma(u,z) \le \ln(u+1) - \ln u + u\ln z - z - \ln(u+1-z),$$
$$\ln\Gamma(u,\tau u) \le \ln(u+1) - \ln u + u(\ln\tau + \ln u) - \tau u - \ln(u+1-\tau u).$$

If $u>0$, the logarithm of the gamma function is

$$\ln\Gamma(u) = \left(u-\frac12\right)\ln u - u + \frac12\ln2\pi + 2\int_0^\infty\frac{\arctan(x/u)}{\exp(2\pi x)-1}\,dx = \left(u-\frac12\right)\ln u - u + \frac12\ln2\pi + O(1).$$

Therefore

$$\ln\frac{\Gamma(u,z)}{\Gamma(u)} \le \ln\!\left(\frac{u+1}{u}\right) + \ln\frac{u^{\frac12}}{u+1-z} - (\ln u - (\ln z + 1))u - z - \frac12\ln2\pi - O(1),$$
$$\ln\frac{\Gamma(u,\tau u)}{\Gamma(u)} \le \ln\!\left(\frac{u+1}{u}\right) - \frac12\ln u - (\tau-(1+\ln\tau))u - \ln(1-\tau) - \frac12\ln2\pi - O(1).$$

Note that $\tau > 1+\ln\tau$ if $\tau\in(0,1)$. Furthermore, if $\tau\in(0,1)$,

$$(1-\tau)^2 \lesssim \tau - (1+\ln\tau).$$

Then, setting $u = \frac n2$,

$$P(\chi_n^2 \le \tau n) = \frac{\Gamma\!\left(\frac n2, \frac{\tau n}{2}\right)}{\Gamma\!\left(\frac n2\right)} = O\!\left(n^{-\frac12}\exp\!\left(-\frac{(1-\tau)^2n}{2}\right)\right).$$

Replacing $\tau$ with $1-\theta_{ij}v$ and with $n^{-1}\theta_{ij}v$, respectively, we then have

$$\frac{\Gamma\!\left(\frac n2, \frac{(1-\theta_{ij}v)n}{2}\right)}{\Gamma\!\left(\frac n2\right)} = O\!\left(n^{-\frac12}\exp\!\left(-\frac{\theta_{ij}^2v^2n}{2}\right)\right), \qquad \frac{\Gamma\!\left(\frac n2, \frac{\theta_{ij}v}{2}\right)}{\Gamma\!\left(\frac n2\right)} = O\!\left(n^{-\frac12}\exp\!\left(-\frac{(\ln n)n}{2}\right)\right).$$

Therefore, when $n$ grows large,

$$\max\!\left(P\!\left(\chi_n^2 \le n-\frac{4(n-1)v}{Var_{ij}^-}\right), P\!\left(\chi_n^2 \le n-\frac{4(n-1)v}{Var_{ij}^+}\right)\right) = O\!\left(n^{-\frac12}\exp\!\left(-\frac{\theta_{ij}^2v^2n}{2}\right)\right). \qquad (38)$$

For the first terms in (36) and (37), it can be shown that

$$\max\!\left(P\!\left(\chi_n^2 \ge n+\frac{4(n-1)v}{Var_{ij}^-}\right), P\!\left(\chi_n^2 \ge n+\frac{4(n-1)v}{Var_{ij}^+}\right)\right) \le P\!\left(\chi_n^2 \ge (1+\theta_{ij}v)n - \theta_{ij}v\right) = \frac{\bar\Gamma\!\left(\frac n2, \frac{(1+\theta_{ij}v)n-\theta_{ij}v}{2}\right)}{\Gamma\!\left(\frac n2\right)},$$

where $\bar\Gamma(u,z) = \int_z^\infty x^{u-1}\exp(-x)\,dx$ is the upper incomplete gamma function. By arguments similar to those used to prove (38), one can derive the upper bound for these terms. In sum, we can conclude that

$$P\!\left(\left|\frac{1}{n-1}\sum_{k=1}^n(R_{ik}R_{jk}-\sigma_{ij})\right| \ge v\right) = O\!\left(n^{-\frac12}\exp(-a'_{ij}v^2n)\right), \qquad (39)$$

where $a'_{ij}$ is positive and depends on $\sigma_i^2$, $\sigma_j^2$ and $\sigma_{ij}$. As for the second term of (35), it can be shown that

$$\left|\frac{1}{n-1}\left(n\bar R_i\bar R_j - \sigma_{ij}\right)\right| \le E_{3,n} + E_{4,n},$$

where

$$E_{3,n} = \left|\frac{1}{n(n-1)}\sum_{k=1}^n(R_{ik}R_{jk}-\sigma_{ij})\right|, \qquad E_{4,n} = \left|\frac{1}{n-1}\sum_{k=2}^nR_{i1}R_{jk}\right|.$$

Then it follows that

$$P(E_{3,n}\ge v) \le P\!\left(\chi_n^2 \ge n+\frac{4(n-1)nv}{Var_{ij}^+}\right) + P\!\left(\chi_n^2 \le n-\frac{4(n-1)nv}{Var_{ij}^+}\right) + P\!\left(\chi_n^2 \ge n+\frac{4(n-1)nv}{Var_{ij}^-}\right) + P\!\left(\chi_n^2 \le n-\frac{4(n-1)nv}{Var_{ij}^-}\right) \qquad (40)$$

and

$$P(E_{4,n}\ge v) \le 2P\!\left(\chi_{n-1}^2 \ge n+\frac{4(n-1)v}{\sigma_i^2+\sigma_j^2}\right) + 2P\!\left(\chi_{n-1}^2 \le n-\frac{4(n-1)v}{\sigma_i^2+\sigma_j^2}\right), \qquad (41)$$

since $\mathrm{cov}(R_{i1},R_{jk}) = 0$ for $k = 2,\dots,n$ (by the i.i.d. property of the returns). By arguments similar to those used to prove (39), it is not difficult to see that (40) and (41) have the same order as (39). Thus, in sum,

$$P(|\omega_{ij}(n)|\ge v) = O\!\left(n^{-\frac12}\exp(-a_{ij}v^2n)\right),$$

where $a_{ij}$ is positive and determined by $a'_{ij}$, $\sigma_i^2$, $\sigma_j^2$ and $\sigma_{ij}$.
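As an informal numerical check of Lemma 4 (our own simulation, not from the paper), the typical size of the sample-covariance error $|\hat\sigma_{ij}-\sigma_{ij}|$ should shrink quickly as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)

def mean_abs_error(n, reps=200):
    """Average |sigma_hat_12 - sigma_12| over Monte Carlo replications
    of n iid bivariate normal returns with true covariance 0.5."""
    errs = []
    for _ in range(reps):
        R = rng.standard_normal((n, 2)) @ L.T
        S = np.cov(R, rowvar=False)
        errs.append(abs(S[0, 1] - 0.5))
    return float(np.mean(errs))
```

A call such as `mean_abs_error(1000)` should be clearly smaller than `mean_abs_error(50)`, consistent with the exponential tail bound.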
8.6 Proof of Theorem 1 and Theorem 2
8.6.1 Proof of Theorem 1
Proof. Given $S\subseteq\{1,\dots,N\}$, by Lemma 2, one has

$$w_s^{*T}\Sigma_{ss}w_s^* - w_{un}^{*T}\Sigma w_{un}^* = \left(\hat\sigma_s^2 - \hat\sigma_{un}^2\right) - \left(w_{un}^{*T}\Omega(n)w_{un}^* - w_s^{*T}\Omega_{ss}(n)w_s^*\right) \qquad (42)$$
$$\le (c_{s,1}(c_{s,1}+1)+1)\,\hat\sigma_{un,s}^2 - \hat\sigma_{un}^2 - \lambda_2\|w_s^*\|_2^2 + \frac{\lambda_1}{2}\left(u^T\delta_{2,s} - \|w_s^*\|_1\right) - \left(w_{un}^{*T}\Omega(n)w_{un}^* - \max_{i,j\in S}|\omega_{ij}(n)|\,\|w_s^*\|_1^2\right)$$
$$\le c_{s,2}\hat\sigma_{un,s}^2 - \left(\hat\sigma_{un}^2 + w_{un}^{*T}\Omega(n)w_{un}^*\right) + \left(\lambda_1c_{s,3} - \lambda_2c_{s,4}\right) - \lambda_1\|w_s^*\|_1 + \max_{i,j\in S}|\omega_{ij}(n)|\,\|w_s^*\|_1^2,$$

where $c_{s,1} = \lambda_2^{\frac12}\left(\varphi_{\min}(\hat\Sigma_{ss})\right)^{-\frac12}$, $c_{s,2} = c_{s,1}(c_{s,1}+1)+1$, $c_{s,3} = 2^{-1}\left(u^T\delta_{2,s} + \|w_s^*\|_1\right)$, and $c_{s,4} = \|w_s^*\|_2^2$. As shown by El Karoui (2009),

$$w_{un}^{*T}\Sigma w_{un}^* \cong \frac{1}{1-\rho_N}\hat\sigma_{un}^2,$$

and it follows that $w_{un}^{*T}\Omega(n)w_{un}^* - \rho_N\hat\sigma_{un}^2 = O_p\!\left(\rho_N(1-\rho_N)^{-1}\right)$, and it is positive. Therefore the last inequality is bounded by

$$c_{s,2}\hat\sigma_{un,s}^2 - \hat\sigma_{un}^2 - \rho_N\hat\sigma_{un}^2 + \left(\lambda_1c_{s,3} - \lambda_2c_{s,4}\right) - \lambda_1\|w_s^*\|_1 + \max_{i,j\in S}|\omega_{ij}(n)|\,\|w_s^*\|_1^2.$$

We now take a closer look at $c_{s,3}$. With some algebra, it can be shown that $\lambda_1c_{s,3}$ is the difference between two optimal values of objective functions, those of the weighted norm and the squared l2 norm optimizations, when $\hat\Sigma_{ss}$ is used. Note that the squared l2 norm optimization is a special case of the weighted norm optimization with $\lambda_1 = 0$. By the assumption that the weighted norm optimization is feasible for $0\le\lambda_1,\lambda_2<\infty$ as $n, N\to\infty$, we can conclude that $c_{s,3} = O_p(1)$. With similar arguments, $c_{s,4} = O_p(1)$, since it is the squared l2 norm penalty term of the objective function of the weighted norm portfolio optimization.

Let

$$A_{1,s} = \left\{c_{s,2}\hat\sigma_{un,s}^2 - \rho_N\hat\sigma_{un}^2 + \left(\lambda_1c_{s,3} - \lambda_2c_{s,4}\right) + \max_{i,j\in S}|\omega_{ij}(n)|\,\|w_s^*\|_1^2 \le \hat\sigma_{un}^2 + \lambda_1\|w_s^*\|_1\right\};$$

$A_{1,s}$ is a sufficient condition for (42) to be nonpositive. Define the following three events,

$$B_{1,s} = \left\{\lambda_1c_{s,3} - \lambda_2c_{s,4} \le B_{1,n}\right\},$$
$$B_{2,s} = \left\{\max_{i,j\in S}|\omega_{ij}(n)|\,\|w_s^*\|_1^2 \le \lambda_1\|w_s^*\|_1\right\},$$
$$B_{3,s} = \left\{c_{s,2}\hat\sigma_{un,s}^2 - \rho_N\hat\sigma_{un}^2 \le \hat\sigma_{un}^2 - B_{1,n}\right\},$$

where $B_{1,n}$ is a finite nonnegative constant. It follows that $\bigcap_{i=1}^3B_{i,s}\subseteq A_{1,s}$. Therefore

$$P\!\left(w_s^{*T}\Sigma_{ss}w_s^* \le w_{un}^{*T}\Sigma w_{un}^*\mid\hat S = S\right) = P\!\left(w_s^{*T}\Sigma_{ss}w_s^* \le w_{un}^{*T}\Sigma w_{un}^*\right) \ge P(A_{1,s}) \ge 1 - \sum_{i=1}^3P\!\left(B_{i,s}^c\right).$$

We set

$$\lambda_1 = \lambda_2 = \lambda_{n,N} = B_0\sqrt{\frac{2\log N}{a_{\min}n}},$$

so that $\lambda_2 = o(1)$, and $B_{1,n} = (\log n)^{-\frac12}$, where $B_0 > 0$ is a constant such that $\sup_{S\subseteq\{1,\dots,N\}}P\!\left(\|w_s^*\|_1 > B_0\right) = o(1)$. For $P(B_{1,s}^c)$, we have

$$P(B_{1,s}^c) = P\!\left(c_{s,3} - c_{s,4} > B_2\sqrt{\frac{n}{\log n\log N}}\right) \le P\!\left(c_{s,3} - c_{s,4} > \frac{B_2\sqrt n}{\log n}\right), \qquad (43)$$

where $B_2 = \sqrt{a_{\min}}/(\sqrt2B_0)$. Since $c_{s,3} - c_{s,4} = O_p(1)$ and $B_2\sqrt n/\log n\to\infty$, (43) converges to zero as $n, N\to\infty$. We next consider $P(B_{2,s}^c)$. By Lemma 4 and the assumption that $\sup_{S\subseteq\{1,\dots,N\}}P(\|w_s^*\|_1 > B_0) = o(1)$, one can show that

$$P(B_{2,s}^c) = P\!\left(\max_{i,j\in S}|\omega_{ij}(n)|\,\|w_s^*\|_1 > \lambda_1\right) \le P\!\left(\left\{\max_{i,j\in S}|\omega_{ij}(n)|B_0 > \lambda_1\right\}\cap\left\{\|w_s^*\|_1 \le B_0\right\}\right) + o(1)$$
$$\le P\!\left(\max_{i,j\in S}|\omega_{ij}(n)|B_0 > \lambda_1\right) + o(1) = N^2O\!\left(n^{-\frac12}\exp\!\left(-\frac{a_{\min}\lambda_1^2n}{B_0^2}\right)\right) + o(1) = O\!\left(n^{-\frac12}\right).$$

For $P(B_{3,s}^c)$, we first show that

$$c_{s,2}\hat\sigma_{un,s}^2 \overset{p}{\to} \frac{n-|S|-1+k}{n-1}\sigma_{un,s}^2 \qquad (44)$$

as $n$ and $n-|S|$ grow large. Applying Lemma 3 to $\hat\Sigma_{ss}$, it can be shown that, almost surely,

$$\lim_{n\to\infty}\varphi_{\min}(\hat\Sigma_{ss}) \ge \varphi_{\min}(\Sigma_{ss})(1-\sqrt{\rho_s})^2 > 0,$$

where $\rho_s = \lim_{n\to\infty}|S|/n$. Therefore

$$\lim_{n\to\infty}c_{s,1} = \lim_{n\to\infty}\sqrt{\frac{\lambda_2}{\varphi_{\min}(\hat\Sigma_{ss})}} \le \frac{1}{1-\sqrt{\rho_s}}\sqrt{\frac{\lambda_2}{\varphi_{\min}(\Sigma_{ss})}} = \frac{\bar c_{s,1}}{1-\sqrt{\rho_s}},$$

where $\bar c_{s,1} = \lambda_2^{\frac12}(\varphi_{\min}(\Sigma_{ss}))^{-\frac12}$. Now with $\lambda_2 = \lambda_{n,N}$, $\bar c_{s,1} = o(1)$, and thus $c_{s,1} = o_p(1)$. Therefore $c_{s,2} = c_{s,1}(c_{s,1}+1)+1 \overset{p}{\to} 1$. For $\hat\sigma_{un,s}^2$, it has been shown that, given $S$,

$$\hat\sigma_{un,s}^2 = u^T\!\left(A_s\hat\Sigma_{ss}^{-1}A_s^T\right)^{-1}u \overset{d}{=} u^T\!\left(A_s\Sigma_{ss}^{-1}A_s^T\right)^{-1}u\,\frac{\chi_{n-1-|S|+k}^2}{n-1} = \sigma_{un,s}^2\,\frac{\chi_{n-1-|S|+k}^2}{n-1}.$$

Thus, as $n$ and $n-|S|$ become large, $\hat\sigma_{un,s}^2 \overset{p}{\to} (n-1)^{-1}(n-|S|-1+k)\sigma_{un,s}^2$ by $\chi_{n-1-|S|+k}^2/(n-|S|-1+k) \overset{p}{\to} 1$. Combining these results, claim (44) follows. Furthermore, as $k$ is fixed and $k\ll|S|$, we have

$$\frac{n-|S|-1+k}{n-1}\sigma_{un,s}^2 \to (1-\rho_s)\sigma_{un,s}^2$$

as $n$ and $|S|$ grow large. Thus

$$P(B_{3,s}^c) = P\!\left(c_{s,2}\hat\sigma_{un,s}^2 - \rho_N\hat\sigma_{un}^2 > \hat\sigma_{un}^2 - B_{1,n}\right) \to P\!\left((1-\rho_s)\sigma_{un,s}^2 > \sigma_{un}^2\right),$$

by setting $B_{1,n} = (\log n)^{-\frac12}$.

We then prove that if (11) holds, $P\!\left(w^{*T}\Sigma w^* \le w_{un}^{*T}\Sigma w_{un}^*\right)\to1$ as $n\to\infty$. We construct a lower bound on the unconditional probability by aggregating $P\!\left(w_s^{*T}\Sigma_{ss}w_s^* \le w_{un}^{*T}\Sigma w_{un}^*\mid\hat S = S\right)$ over the different realized $S$:

$$P\!\left(w^{*T}\Sigma w^* \le w_{un}^{*T}\Sigma w_{un}^*\right) = \sum_{S\subseteq\{1,\dots,N\}}P\!\left(w_s^{*T}\Sigma_{ss}w_s^* \le w_{un}^{*T}\Sigma w_{un}^*\mid\hat S = S\right)P\!\left(\hat S = S\right)$$
$$\ge \sum_{S\subseteq\{1,\dots,N\}}\left(1-\sum_{i=1}^3P\!\left(B_{i,s}^c\right)\right)P\!\left(\hat S = S\right) = 1 - \sum_{i=1}^3\sum_{S\subseteq\{1,\dots,N\}}P\!\left(B_{i,s}^c\right)P\!\left(\hat S = S\right).$$

First, note that the upper bound of $P(B_{2,s}^c)$ is independent of $S$; thus

$$\sum_{S\subseteq\{1,\dots,N\}}P\!\left(B_{2,s}^c\right)P\!\left(\hat S = S\right) = O\!\left(n^{-\frac12}\right).$$

For the other two terms, we have

$$\sum_{S\subseteq\{1,\dots,N\}}P\!\left(B_{1,s}^c\right)P\!\left(\hat S = S\right) = \sum_{S\subseteq\{1,\dots,N\}}P\!\left(\lambda_1c_{s,3} - \lambda_2c_{s,4} > B_{1,n}\right)P\!\left(\hat S = S\right) \le \sup_{S\subseteq\{1,\dots,N\}}P\!\left(c_{s,3} - c_{s,4} > \frac{B_2\sqrt n}{\log n}\right), \qquad (45)$$

$$\sum_{S\subseteq\{1,\dots,N\}}P\!\left(B_{3,s}^c\right)P\!\left(\hat S = S\right) = \sum_{S\subseteq\{1,\dots,N\}}P\!\left(c_{s,2}\hat\sigma_{un,s}^2 - \rho_N\hat\sigma_{un}^2 > \hat\sigma_{un}^2 - B_{1,n}\right)P\!\left(\hat S = S\right) \le \sup_{S\subseteq\{1,\dots,N\}}P\!\left(c_{s,2}\hat\sigma_{un,s}^2 - \rho_N\hat\sigma_{un}^2 > \hat\sigma_{un}^2 - \frac{1}{\sqrt{\log n}}\right). \qquad (46)$$

By the assumption that the weighted norm optimization is feasible, $c_{s,3} - c_{s,4} = O_p(1)$ holds for every $S$ as $n$ and $N\to\infty$; thus (45) converges to zero as $n\to\infty$. For (46), as $n$ and $n-|S|\to\infty$, by the assumption that $\sigma_{un,s}^2\sqrt{\log n}\to\infty$ as $n$ and $N\to\infty$,

$$P\!\left(c_{s,2}\hat\sigma_{un,s}^2 - \rho_N\hat\sigma_{un}^2 > \hat\sigma_{un}^2 - B_{1,n}\right) \to P\!\left((1-\rho_s)\sigma_{un,s}^2 > \sigma_{un}^2 - O\!\left(\frac{1}{\sqrt{\log n}}\right)\right) = P\!\left(\frac{\sigma_{un,s}^2-\sigma_{un}^2}{\sigma_{un,s}^2} - \rho_s > -\frac{1}{\sigma_{un,s}^2}O\!\left(\frac{1}{\sqrt{\log n}}\right)\right)$$
$$\to P\!\left(\frac{\sigma_{un,s}^2-\sigma_{un}^2}{\sigma_{un,s}^2} - \rho_s > 0\right) \le P\!\left(\sup_{S\subseteq\{1,\dots,N\}}\left(\frac{\sigma_{un,s}^2-\sigma_{un}^2}{\sigma_{un,s}^2} - \rho_s\right) > 0\right) = 0$$

for every $S$, if the MRPV condition holds. Combining the above results, the conclusion follows.
8.6.2 Proof of Theorem 2
Proof. Following the proof of Theorem 1, to compare $w_s^{*T}\Sigma_{ss}w_s^*$ with $\sigma_{1/N}^2$, we have

$$w_s^{*T}\Sigma_{ss}w_s^* - \sigma_{1/N}^2 = \hat\sigma_s^2 + w_s^{*T}\Omega_{ss}(n)w_s^* - \sigma_{1/N}^2$$
$$\le c_{s,2}\hat\sigma_{un,s}^2 - \sigma_{1/N}^2 + \left(\lambda_1c_{s,3} - \lambda_2c_{s,4}\right) - \lambda_1\|w_s^*\|_1 + \max_{i,j\in S}|\omega_{ij}(n)|\,\|w_s^*\|_1^2. \qquad (47)$$

Note that for every $S$, $w_s^{*T}\Sigma_{ss}w_s^* - \sigma_{1/N}^2 \le 0$ implies that $\sigma_{un,s}^2 - \sigma_{1/N}^2 \le 0$ holds, and hence the derived results should be checked for whether they violate this condition. Let

$$A'_{1,s} = \left\{c_{s,2}\hat\sigma_{un,s}^2 + \left(\lambda_1c_{s,3} - \lambda_2c_{s,4}\right) + \max_{i,j\in S}|\omega_{ij}(n)|\,\|w_s^*\|_1^2 \le \sigma_{1/N}^2 + \lambda_1\|w_s^*\|_1\right\};$$

$A'_{1,s}$ is a sufficient condition for (47) to be nonpositive. Define the following events,

$$B'_{1,s} = \left\{\lambda_1c_{s,3} - \lambda_2c_{s,4} \le \eta_N\right\}, \qquad B'_{3,s} = \left\{c_{s,2}\hat\sigma_{un,s}^2 \le \sigma_{1/N}^2 - \eta_N\right\},$$

where $\eta_N = \sigma^2\left(N^{-1}\log N\right)^{\frac12-\epsilon}$. Note that $\sigma_{1/N}^2 - \eta_N > 0$ by the assumption that $0 < \eta_N < \sigma_{1/N}^2$, where $\sigma^2 > 0$ and $0 < \epsilon < 2^{-1}$ are two constants. We then have $B'_{1,s}\cap B_{2,s}\cap B'_{3,s} \subseteq A'_{1,s}$. For $P(B'^c_{1,s})$, it follows that

$$P(B'^c_{1,s}) = P\!\left(c_{s,3} - c_{s,4} > B'_2\!\left(\frac{n}{\log N}\right)^\epsilon\right) \le P\!\left(c_{s,3} - c_{s,4} > B'_2\!\left(\frac{n}{\log n}\right)^\epsilon\right),$$

where

$$B'_2 = \sigma^2B_2\rho_N^{\epsilon-\frac12}$$

is a constant. Since $c_{s,3} - c_{s,4} = O_p(1)$ and $B'_2(n/\log n)^\epsilon\to\infty$, $P(B'^c_{1,s})$ converges to zero as $n$ and $N\to\infty$. In a similar fashion to (45), as $n\to\infty$,

$$\sup_{S\subseteq\{1,\dots,N\}}P\!\left(c_{s,3} - c_{s,4} > B'_2\!\left(\frac{n}{\log n}\right)^\epsilon\right)\to0,$$

since $c_{s,3} - c_{s,4} = O_p(1)$ holds for every $S$ as $n$ and $N\to\infty$. We then need to show that

$$\sum_{S\subseteq\{1,\dots,N\}}P\!\left(B'^c_{3,s}\right)P\!\left(\hat S = S\right)\to0.$$

Following the same strategy as in the proof of Theorem 1 and setting $\lambda_1 = \lambda_2 = \lambda_{n,N}$, it can be shown that if $\rho_s - \eta_N/\sigma_{un,s}^2 > 0$,

$$P\!\left(c_{s,2}\hat\sigma_{un,s}^2 > \sigma_{1/N}^2 - \eta_N\right) \to P\!\left((1-\rho_s)\sigma_{un,s}^2 - \sigma_{1/N}^2 > -\eta_N\right) = P\!\left(\frac{\sigma_{un,s}^2-\sigma_{1/N}^2}{\sigma_{un,s}^2} > \rho_s - \frac{\eta_N}{\sigma_{un,s}^2}\right)$$
$$\le P\!\left(\sup_{S\subseteq\{1,\dots,N\}}\frac{\sigma_{un,s}^2-\sigma_{1/N}^2}{\sigma_{un,s}^2} > 0\right) = 0,$$

since we require $\sigma_{un,s}^2 - \sigma_{1/N}^2 \le 0$ for every $S$. If $\rho_s - \eta_N/\sigma_{un,s}^2 \le 0$, we need

$$P\!\left(c_{s,2}\hat\sigma_{un,s}^2 > \sigma_{1/N}^2 - \eta_N\right) \to P\!\left(\frac{\sigma_{un,s}^2-\sigma_{1/N}^2}{\sigma_{un,s}^2} > \rho_s - \frac{\eta_N}{\sigma_{un,s}^2}\right) \le P\!\left(\sup_{S\subseteq\{1,\dots,N\}}\left(\frac{\sigma_{un,s}^2-\sigma_{1/N}^2}{\sigma_{un,s}^2} - \rho_s + \frac{\eta_N}{\sigma_{un,s}^2}\right) > 0\right) = 0.$$

In sum, if the MRPV condition (12) holds, then $P\!\left(c_{s,2}\hat\sigma_{un,s}^2 > \sigma_{1/N}^2 - \eta_N\right)\to0$, and the proof is complete.
8.7 Derivation of the Coordinate-Wise Descent Algorithm
From the KKT conditions, when the linear constraint is $\mathbf{w}^T\mathbf{1}_N = 1$, by fixing $w_j$, $j = 1, \ldots, N$, $j \neq i$, we can solve $w_i$ as
$$w_i = \frac{ST(\gamma - z_i, \lambda_1)}{2(\sigma^2_i + \lambda_2)},$$
where $ST(x, y) = \mathrm{sign}(x)(|x| - y)_+$ is the soft-thresholding function and $z_i = 2\sum_{j \neq i}^N w_j \sigma_{ij}$. Let $S_+ = \{i : w_i > 0\}$ and $S_- = \{i : w_i < 0\}$. Then we know that
$$\mathbf{w}^T\mathbf{1}_N = \gamma \sum_{i \in S_+ \cup S_-} \frac{1}{2(\sigma^2_i + \lambda_2)} - \sum_{i \in S_+ \cup S_-} \frac{z_i}{2(\sigma^2_i + \lambda_2)} + \lambda_1 \left(\sum_{i \in S_-} \frac{1}{2(\sigma^2_i + \lambda_2)} - \sum_{i \in S_+} \frac{1}{2(\sigma^2_i + \lambda_2)}\right).$$
Since $\mathbf{w}^T\mathbf{1}_N = 1$, we can solve for $\gamma$ as
$$\gamma = \frac{1 + \sum_{i \in S_+ \cup S_-} \frac{z_i}{2(\sigma^2_i + \lambda_2)} - \lambda_1 \left(\sum_{i \in S_-} \frac{1}{2(\sigma^2_i + \lambda_2)} - \sum_{i \in S_+} \frac{1}{2(\sigma^2_i + \lambda_2)}\right)}{\sum_{i \in S_+ \cup S_-} \frac{1}{2(\sigma^2_i + \lambda_2)}}.$$
To implement the algorithm, we set the initial value of each weight $w^{(0)}_1 = w^{(0)}_2 = \cdots = w^{(0)}_N = N^{-1}$ and $\gamma^{(0)} > \lambda_1$. The algorithm starts by updating $w_1, w_2, \ldots, w_N$ sequentially, and then uses the updated vector $\mathbf{w}$ to update $\gamma$. The procedure terminates when $\mathbf{w}$ and $\gamma$ have converged. We summarize the algorithm as follows.
Algorithm 1 Coordinate-wise descent update for the weighted norm mvp optimization with the full investment constraint
1. Fix $\lambda_1$ and $\lambda_2$ at some constant levels.
2. Initialize $\mathbf{w}^{(0)} = N^{-1}\mathbf{1}_N$ and $\gamma^{(0)} > \lambda_1$.
3. For $i = 1, \ldots, N$ and $k > 0$,
$$w^{(k)}_i \leftarrow \frac{ST\left(\gamma^{(k-1)} - z^{(k)}_i, \lambda_1\right)}{2(\sigma^2_i + \lambda_2)},$$
where
$$z^{(k)}_i = 2\left(\sum_{j < i} w^{(k)}_j \sigma_{ij} + \sum_{j > i} w^{(k-1)}_j \sigma_{ij}\right).$$
4. For $k > 0$, update $\gamma$ as
$$\gamma^{(k)} \leftarrow \left[\sum_{i \in S^{(k)}_+ \cup S^{(k)}_-} \frac{1}{2(\sigma^2_i + \lambda_2)}\right]^{-1} \times \left[1 + \sum_{i \in S^{(k)}_+ \cup S^{(k)}_-} \frac{z^{(k)}_i}{2(\sigma^2_i + \lambda_2)} - \lambda_1 \left(\sum_{i \in S^{(k)}_-} \frac{1}{2(\sigma^2_i + \lambda_2)} - \sum_{i \in S^{(k)}_+} \frac{1}{2(\sigma^2_i + \lambda_2)}\right)\right],$$
where $S^{(k)}_+ = \left\{i : w^{(k)}_i > 0\right\}$ and $S^{(k)}_- = \left\{i : w^{(k)}_i < 0\right\}$.
5. Repeat 3 and 4 until $\mathbf{w}^{(k)}$ and $\gamma^{(k)}$ converge.
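Algorithm 1 can be sketched in a few lines of numpy. The block below is our own minimal illustration, not the author's replication code: the function names (`weighted_norm_mvp`, `soft_threshold`), the tolerance, and the iteration cap are assumptions, and the sketch simply alternates the coordinate updates and the $\gamma$ update as described above.

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding operator ST(x, lam) = sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def weighted_norm_mvp(Sigma, lam1, lam2, max_iter=1000, tol=1e-10):
    """Coordinate-wise descent sketch of Algorithm 1 for
    min_w w'Sigma w + lam1*||w||_1 + lam2*||w||_2^2  s.t.  w'1 = 1."""
    N = Sigma.shape[0]
    w = np.full(N, 1.0 / N)            # w^(0) = 1/N
    gamma = lam1 + 1.0                 # gamma^(0) > lam1
    diag = np.diag(Sigma)
    for _ in range(max_iter):
        w_old, gamma_old = w.copy(), gamma
        # step 3: update each weight with the others held fixed
        for i in range(N):
            z_i = 2.0 * (Sigma[i] @ w - diag[i] * w[i])
            w[i] = soft_threshold(gamma - z_i, lam1) / (2.0 * (diag[i] + lam2))
        # step 4: update gamma from the currently active (nonzero) weights
        active = w != 0.0
        if not active.any():
            break
        d = 1.0 / (2.0 * (diag[active] + lam2))
        z = 2.0 * (Sigma[active] @ w - diag[active] * w[active])
        neg = w[active] < 0.0
        gamma = (1.0 + np.sum(z * d)
                 - lam1 * (d[neg].sum() - d[~neg].sum())) / d.sum()
        if max(np.max(np.abs(w - w_old)), abs(gamma - gamma_old)) < tol:
            break
    return w
```

At a joint fixed point of the two updates the budget constraint $\mathbf{w}^T\mathbf{1}_N = 1$ holds exactly, so the returned weights should sum to approximately one once the iteration has converged.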
8.8 Stochastic Dominance Test
The performance measures shown in Tables 1 and 2, such as the Sharpe ratio or certainty equivalent return, do not take a general framework of utility maximization into account. If an individual is endowed with an arbitrary (nondecreasing) utility function, should she prefer the weighted norm mvp to the benchmark portfolios? The concepts of first-order (FSD) and second-order (SSD) stochastic dominance can help us answer this question. Given the returns of two portfolio strategies 1 and 2, say $R_1$ and $R_2$, if $R_1$ first-order stochastically dominates $R_2$, this is equivalent to saying that every expected utility maximizer will prefer strategy 1 to strategy 2; that is, strategy 1 clearly delivers higher expected utility than strategy 2. If $R_1$ second-order stochastically dominates $R_2$, this is equivalent to saying that every risk-averse expected utility maximizer will prefer strategy 1 to strategy 2; in this sense, strategy 1 is less risky than strategy 2.
In practice, to see whether $R_1$ first- or second-order stochastically dominates $R_2$, we can implement formal statistical tests by comparing functionals of their cumulative distribution functions. Now let strategy 1 be the weighted norm mvp and strategy 2 be some other benchmark portfolio. The null hypothesis for the test is
H0 : The weighted norm mvp FSD (SSD) the benchmarks.
If we cannot reject the null, there is not enough evidence to conclude that the weighted norm mvp does not FSD (SSD) the benchmark portfolio. We adopt the subsampling method suggested by Linton et al. (2005) to construct critical values and p-values for the test.
The formal definitions of FSD and SSD are as follows.
Definition 2 Let $u(\cdot)$ be a nondecreasing ($u'(\cdot) \geq 0$) von Neumann-Morgenstern utility function, and let $F_1(r)$ and $F_2(r)$ be the cumulative distribution functions (c.d.f.) of random variables $R_1$ and $R_2$, respectively. $R_2$ is first-order stochastically dominated by $R_1$, i.e. $R_1$ FSD $R_2$, if and only if
$$E(u(R_1)) \geq E(u(R_2)),$$
for all $u(\cdot)$ and with strict inequality for some $u(\cdot)$; or
$$F_1(r) \leq F_2(r),$$
for all $r$ and with strict inequality for some $r$.
Definition 3 Let $u(\cdot)$ be a nondecreasing ($u'(\cdot) \geq 0$) and concave ($u''(\cdot) \leq 0$) von Neumann-Morgenstern utility function, and let $F_1(r)$ and $F_2(r)$ be the cumulative distribution functions (c.d.f.) of random variables $R_1$ and $R_2$, respectively. $R_2$ is second-order stochastically dominated by $R_1$, i.e. $R_1$ SSD $R_2$, if and only if
$$E(u(R_1)) \geq E(u(R_2)),$$
for all $u(\cdot)$ and with strict inequality for some $u(\cdot)$; or
$$\int_{-\infty}^{r} F_1(x)\,dx \leq \int_{-\infty}^{r} F_2(x)\,dx,$$
for all $r$ and with strict inequality for some $r$.
Note that the above definition of SSD does not require $R_1$ and $R_2$ to have equal means. Does it matter? To see this,
$$E(u(R_1)) - E(u(R_2)) = \int_{-\infty}^{\infty} u(r)\,dF_1(r) - \int_{-\infty}^{\infty} u(r)\,dF_2(r)$$
$$= \int_{-\infty}^{\infty} u'(r)\left(F_2(r) - F_1(r)\right)dr$$
$$= u'(r)\int_{-\infty}^{r}\left(F_2(x) - F_1(x)\right)dx\,\Bigg|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} u''(r)\left(\int_{-\infty}^{r}\left(F_2(x) - F_1(x)\right)dx\right)dr.$$
Clearly, if the second condition in Definition 3 holds and $u(r)$ is nondecreasing and concave, then $E(u(R_1)) \geq E(u(R_2))$ no matter whether $R_1$ and $R_2$ have the same mean or not.
To see whether one random variable first- or second-order stochastically dominates the other, we compare functionals of their empirical c.d.f.'s. We can empirically estimate the c.d.f. by
$$\hat{F}_i(r) = \frac{1}{T}\sum_{t=1}^{T} 1\left\{R_{it} \leq r\right\},$$
$i = 1, 2$. Let $\Delta^{(1)}_{1,2}(r) := F_1(r) - F_2(r)$ and $\delta^{*(1)} := \sup_r \Delta^{(1)}_{1,2}(r)$. To test whether $R_1$ FSD $R_2$, we can form the null hypothesis
$$H_0 : \delta^{*(1)} \leq 0.$$
The null hypothesis states that for all $r$, $F_1(r) \leq F_2(r)$. If we cannot reject the null, there is not enough evidence to conclude that $R_1$ FSD $R_2$ does not hold. The empirical analogue of $\delta^{*(1)}$ is
$$\hat{\delta}^{*(1)} = \sup_r \sqrt{T}\left(\hat{\Delta}^{(1)}_{1,2}(r)\right),$$
where $\hat{\Delta}^{(1)}_{1,2}(r) = \hat{F}_1(r) - \hat{F}_2(r)$.
Let $CF_i(r) := \int_{-\infty}^{r} F_i(x)\,dx$. For testing SSD, first note that by integrating by parts, given $F_i(-\infty) := 0$, it can be shown that
$$CF_i(r) = F_i(r)\,r - \int_{-\infty}^{r} x\,dF_i(x) = \int_{-\infty}^{r} r\,dF_i(x) - \int_{-\infty}^{r} x\,dF_i(x) = \int_{-\infty}^{r} (r - x)\,dF_i(x).$$
Therefore we can empirically estimate $CF_i(r)$ by
$$\hat{CF}_i(r) = \frac{1}{T}\sum_{t=1}^{T} (r - R_{it})\,1\left\{R_{it} \leq r\right\}.$$
Let $\Delta^{(2)}_{1,2}(r) := CF_1(r) - CF_2(r)$ and $\delta^{*(2)} := \sup_r \Delta^{(2)}_{1,2}(r)$. To test whether $R_1$ SSD $R_2$, we can form the null hypothesis
$$H_0 : \delta^{*(2)} \leq 0.$$
The null hypothesis states that for all $r$, $CF_1(r) \leq CF_2(r)$. If we cannot reject the null, there is not enough evidence to conclude that $R_1$ SSD $R_2$ does not hold. The empirical analogue of $\delta^{*(2)}$ is
$$\hat{\delta}^{*(2)} = \sup_r \sqrt{T}\left(\hat{\Delta}^{(2)}_{1,2}(r)\right),$$
where $\hat{\Delta}^{(2)}_{1,2}(r) = \hat{CF}_1(r) - \hat{CF}_2(r)$. Let
$$\underline{R} = \min(R_{11}, \ldots, R_{1T}, R_{21}, \ldots, R_{2T}), \qquad \bar{R} = \max(R_{11}, \ldots, R_{1T}, R_{21}, \ldots, R_{2T}).$$
To numerically evaluate $\hat{\delta}^{*(1)}$ and $\hat{\delta}^{*(2)}$, we divide the interval $[\underline{R}, \bar{R}]$ into 200 equally spaced grid points and search over $r \in [\underline{R}, \bar{R}]$ on the grid to maximize $\hat{\Delta}^{(1)}_{1,2}(r)$ or $\hat{\Delta}^{(2)}_{1,2}(r)$.
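The grid evaluation of the two empirical statistics can be sketched directly from the formulas above. The function below is our own illustration (the name `sd_statistics` and the grid size default are assumptions): it builds the empirical c.d.f.'s and integrated c.d.f.'s on an equally spaced grid over $[\underline{R}, \bar{R}]$ and takes the scaled suprema.

```python
import numpy as np

def sd_statistics(r1, r2, n_grid=200):
    """Empirical FSD/SSD statistics: sup_r sqrt(T)*(F1-F2) and
    sup_r sqrt(T)*(CF1-CF2), evaluated on an equally spaced grid
    over [min of all returns, max of all returns]."""
    r1, r2 = np.asarray(r1, float), np.asarray(r2, float)
    T = len(r1)
    grid = np.linspace(min(r1.min(), r2.min()),
                       max(r1.max(), r2.max()), n_grid)
    # empirical c.d.f.s: F_i(r) = (1/T) sum_t 1{R_it <= r}
    F1 = (r1[None, :] <= grid[:, None]).mean(axis=1)
    F2 = (r2[None, :] <= grid[:, None]).mean(axis=1)
    # integrated c.d.f.s: CF_i(r) = (1/T) sum_t (r - R_it) 1{R_it <= r}
    CF1 = ((grid[:, None] - r1[None, :]) * (r1[None, :] <= grid[:, None])).mean(axis=1)
    CF2 = ((grid[:, None] - r2[None, :]) * (r2[None, :] <= grid[:, None])).mean(axis=1)
    delta1 = np.sqrt(T) * np.max(F1 - F2)    # FSD statistic
    delta2 = np.sqrt(T) * np.max(CF1 - CF2)  # SSD statistic
    return delta1, delta2
```

With identical return series both statistics are zero, and when the first series dominates the second, the FSD statistic is nonpositive.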
We are interested in whether the weighted norm mvp first- (or second-) order stochastically dominates the other three strategies (1/N, no-shortsales and GMVP). Formally, this can be stated as $R_{\alpha}$ FSD $R_l$ (or $R_{\alpha}$ SSD $R_l$), where $R_{\alpha}$ is the return of the weighted norm mvp with parameter value $\alpha$ and $R_l$ is the return of strategy $l$, $l = 1/N$, no-shortsales and GMVP. To empirically construct critical values and p-values, we adopt the subsampling method suggested by Linton et al. (2005). We briefly describe the scheme as follows. Suppose we have $T$ realized return observations, $R_1, R_2, \ldots, R_T$. The test statistic, $\sqrt{T}\theta_T(R_1, R_2, \ldots, R_T)$, is a function of the $T$ observations, and is given by $\sup_r \hat{\Delta}^{(1)}_{1,2}(r)$ for the FSD test and by $\sup_r \hat{\Delta}^{(2)}_{1,2}(r)$ for the SSD test. Let $G_T(x) = P\left(\sqrt{T}\theta_T(R_1, R_2, \ldots, R_T) \leq x\right)$ be the distribution function of the test statistic. Following Linton et al. (2005), we approximate $G_T(x)$ by
$$\hat{G}_{T,b}(x) = \frac{1}{T - b + 1}\sum_{t=1}^{T-b+1} 1\left\{\sqrt{b}\,\theta_{T,b,t} \leq x\right\},$$
where $\theta_{T,b,t} := \theta_b(R_t, R_{t+1}, \ldots, R_{t+b-1})$, $t = 1, \ldots, T - b + 1$, is the function $\theta$ evaluated on the subsample $R_t, R_{t+1}, \ldots, R_{t+b-1}$. Note that each $\sqrt{b}\,\theta_{T,b,t}$ is the test statistic for the FSD (or SSD) test obtained from a subsample of size $b$, and the subsampling scheme is essentially very similar to a rolling window scheme. With this approximation, we then define the subsample critical value at significance level $\delta$ as $\hat{g}_{T,b}(\delta) = \inf\left\{x : \delta \geq 1 - \hat{G}_{T,b}(x)\right\}$, and the subsample p-value as $\hat{p}_{T,b} = 1 - \hat{G}_{T,b}\left(\sqrt{T}\theta_T\right)$. The decision rule is to reject the null if $\sqrt{T}\theta_T > \hat{g}_{T,b}(\delta)$ or $\hat{p}_{T,b} < \delta$.
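The subsampling p-value above can be sketched as follows. This is our own illustrative code (the names `subsample_pvalue` and `theta_fsd` are assumptions, and the block ignores practical refinements such as the choice of the block length $b$): it computes the full-sample statistic $\sqrt{T}\theta_T$ and compares it with the $T - b + 1$ overlapping-block statistics $\sqrt{b}\,\theta_{T,b,t}$.

```python
import numpy as np

def subsample_pvalue(r1, r2, b, stat_fn):
    """Subsampling p-value in the spirit of Linton et al. (2005):
    p = 1 - G_hat(sqrt(T)*theta_T) = fraction of block statistics
    sqrt(b)*theta_b that exceed the full-sample statistic."""
    r1, r2 = np.asarray(r1, float), np.asarray(r2, float)
    T = len(r1)
    full_stat = np.sqrt(T) * stat_fn(r1, r2)
    sub_stats = np.array([np.sqrt(b) * stat_fn(r1[t:t + b], r2[t:t + b])
                          for t in range(T - b + 1)])
    return np.mean(sub_stats > full_stat)

def theta_fsd(r1, r2):
    """Unscaled FSD sup statistic theta = sup_r (F1(r) - F2(r)),
    evaluated at the pooled sample points."""
    grid = np.concatenate([r1, r2])
    F1 = (r1[None, :] <= grid[:, None]).mean(axis=1)
    F2 = (r2[None, :] <= grid[:, None]).mean(axis=1)
    return np.max(F1 - F2)
```

The same `subsample_pvalue` wrapper applies to the SSD test once `stat_fn` is replaced by the corresponding sup statistic of the integrated c.d.f. differences.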
Tables 9 and 10 show subsample p-values of the stochastic dominance tests for the daily FF100 and CRSP300 data. For the daily FF100, there is strong evidence against the hypotheses that the weighted norm mvp FSD the 1/N and the nsmvp, but in favour of the weighted norm mvp FSD the gmvp. These results seem to be inconsistent with those shown in the previous section, in which the 1/N and nsmvp both have high portfolio variance and low average annualized portfolio returns.14 As for the CRSP300, there is some evidence supporting that the weighted norm mvp FSD the nsmvp and gmvp. Overall, the FSD test suggests that it is still hard to say whether the weighted norm mvp is preferred to the benchmark portfolios by an individual endowed with an arbitrary risk preference.
Unlike the FSD test, the results for the SSD test are more consistent. For the two data sets, we cannot reject that the weighted norm mvp second-order stochastically dominates the benchmark portfolios at the 0.05 significance level. This suggests that an individual with risk-averse preferences is more likely to choose the weighted norm strategy than the benchmark ones when making decisions on asset allocation.
8.9 Coordinate-Wise Descent on MVP Penalized by the Generalized l1 Penalty
This subsection provides a derivation of the coordinate-wise descent algorithm for solving the mvp penalized by the generalized l1 norm penalty in Section 6.5. As the linear constraint is the full investment constraint, the penalized mvp optimization is given by
$$\min_{\mathbf{w}} \; \mathbf{w}^T\Sigma\mathbf{w} + \lambda_1 \|\mathbf{w}\|_1 + \lambda_2 \|\mathbf{w} - \mathbf{w}_0\|_1, \quad \text{subject to } \mathbf{w}^T\mathbf{1} = 1.$$
Let $w_{0,i} \geq 0$, $i = 1, \ldots, N$, denote the elements of $\mathbf{w}_0$. At the stationary point, the following subgradient equation should hold:
$$2\sigma^2_i w_i + 2\sum_{j \neq i}^N \sigma_{ij} w_j + \lambda_1 \mathrm{sign}(w_i) + \lambda_2 \mathrm{sign}(w_i - w_{0,i}) - \gamma = 0,$$
for $i = 1, \ldots, N$, and also $\mathbf{w}^T\mathbf{1} = 1$. Again, let $z_i = 2\sum_{j \neq i}^N \sigma_{ij} w_j$. By fixing $w_j$, $j \neq i$, one can solve $w_i$ as
$$w_i = \begin{cases} \dfrac{\gamma - z_i - (\lambda_1 + \lambda_2)}{2\sigma^2_i} & \text{if } \gamma - z_i > (\lambda_1 + \lambda_2) + 2\sigma^2_i w_{0,i}, \\[4pt] w_{0,i} & \text{if } (\lambda_1 - \lambda_2) + 2\sigma^2_i w_{0,i} \leq \gamma - z_i \leq (\lambda_1 + \lambda_2) + 2\sigma^2_i w_{0,i}, \\[4pt] \dfrac{\gamma - z_i - (\lambda_1 - \lambda_2)}{2\sigma^2_i} & \text{if } \lambda_1 - \lambda_2 < \gamma - z_i < (\lambda_1 - \lambda_2) + 2\sigma^2_i w_{0,i}, \\[4pt] 0 & \text{if } -\lambda_1 - \lambda_2 \leq \gamma - z_i \leq \lambda_1 - \lambda_2, \\[4pt] \dfrac{\gamma - z_i + (\lambda_1 + \lambda_2)}{2\sigma^2_i} & \text{if } \gamma - z_i < -\lambda_1 - \lambda_2. \end{cases}$$
Let $\Delta_1 = \{i : w_{0,i} < w_i < \infty\}$, $\Delta_2 = \{i : w_i = w_{0,i}\}$, $\Delta_3 = \{i : 0 < w_i < w_{0,i}\}$, and $\Delta_4 = \{i : -\infty < w_i < 0\}$. One can solve for $\gamma$ by using the full investment constraint:
$$\gamma = \frac{1 - \sum_{i \in \Delta_2} w_{0,i} + \sum_{i \in \Delta_1} \frac{z_i + (\lambda_1 + \lambda_2)}{2\sigma^2_i} + \sum_{i \in \Delta_3} \frac{z_i + (\lambda_1 - \lambda_2)}{2\sigma^2_i} + \sum_{i \in \Delta_4} \frac{z_i - (\lambda_1 + \lambda_2)}{2\sigma^2_i}}{\sum_{i \in \Delta_1 \cup \Delta_3 \cup \Delta_4} \frac{1}{2\sigma^2_i}}.$$

14 The annualized average oos net portfolio returns for the 1/N and nsmvp are 11.5416% and 13.4645% respectively, while the lowest two annualized average oos net portfolio returns for the weighted norm mvp are 13.3457% (α = 0) and 14.2830% (α = 0.2).
The algorithm can be summarized as follows.
Algorithm 2 Coordinate-wise descent update for mvp penalized by the generalized l1 penalty.
1. Fix $\lambda_1$ and $\lambda_2$ at some constant levels.
2. Initialize $\mathbf{w}^{(0)} = N^{-1}\mathbf{1}_N$ and $\gamma^{(0)} > \max(\lambda_1, \lambda_2)$.
3. For $i = 1, \ldots, N$ and $k > 0$,
$$w^{(k)}_i \leftarrow \begin{cases} \dfrac{\gamma^{(k-1)} - z^{(k)}_i - (\lambda_1 + \lambda_2)}{2\sigma^2_i} & \text{if } \gamma^{(k-1)} - z^{(k)}_i > (\lambda_1 + \lambda_2) + 2\sigma^2_i w_{0,i}, \\[4pt] w_{0,i} & \text{if } (\lambda_1 - \lambda_2) + 2\sigma^2_i w_{0,i} \leq \gamma^{(k-1)} - z^{(k)}_i \leq (\lambda_1 + \lambda_2) + 2\sigma^2_i w_{0,i}, \\[4pt] \dfrac{\gamma^{(k-1)} - z^{(k)}_i - (\lambda_1 - \lambda_2)}{2\sigma^2_i} & \text{if } \lambda_1 - \lambda_2 < \gamma^{(k-1)} - z^{(k)}_i < (\lambda_1 - \lambda_2) + 2\sigma^2_i w_{0,i}, \\[4pt] 0 & \text{if } -\lambda_1 - \lambda_2 \leq \gamma^{(k-1)} - z^{(k)}_i \leq \lambda_1 - \lambda_2, \\[4pt] \dfrac{\gamma^{(k-1)} - z^{(k)}_i + (\lambda_1 + \lambda_2)}{2\sigma^2_i} & \text{if } \gamma^{(k-1)} - z^{(k)}_i < -\lambda_1 - \lambda_2, \end{cases}$$
where
$$z^{(k)}_i = 2\left(\sum_{j < i} w^{(k)}_j \sigma_{ij} + \sum_{j > i} w^{(k-1)}_j \sigma_{ij}\right).$$
4. For $k > 0$, update $\gamma$ as
$$\gamma^{(k)} \leftarrow \left[\sum_{i \in \Delta^{(k)}_1 \cup \Delta^{(k)}_3 \cup \Delta^{(k)}_4} \frac{1}{2\sigma^2_i}\right]^{-1} \times \left[1 - \sum_{i \in \Delta^{(k)}_2} w_{0,i} + \sum_{i \in \Delta^{(k)}_1} \frac{z^{(k)}_i + (\lambda_1 + \lambda_2)}{2\sigma^2_i} + \sum_{i \in \Delta^{(k)}_3} \frac{z^{(k)}_i + (\lambda_1 - \lambda_2)}{2\sigma^2_i} + \sum_{i \in \Delta^{(k)}_4} \frac{z^{(k)}_i - (\lambda_1 + \lambda_2)}{2\sigma^2_i}\right],$$
where
$$\Delta^{(k)}_1 = \left\{i : \gamma^{(k-1)} - z^{(k)}_i > (\lambda_1 + \lambda_2) + 2\sigma^2_i w_{0,i}\right\},$$
$$\Delta^{(k)}_2 = \left\{i : (\lambda_1 - \lambda_2) + 2\sigma^2_i w_{0,i} \leq \gamma^{(k-1)} - z^{(k)}_i \leq (\lambda_1 + \lambda_2) + 2\sigma^2_i w_{0,i}\right\},$$
$$\Delta^{(k)}_3 = \left\{i : \lambda_1 - \lambda_2 < \gamma^{(k-1)} - z^{(k)}_i < (\lambda_1 - \lambda_2) + 2\sigma^2_i w_{0,i}\right\},$$
$$\Delta^{(k)}_4 = \left\{i : \gamma^{(k-1)} - z^{(k)}_i < -\lambda_1 - \lambda_2\right\}.$$
5. Repeat 3 and 4 until $\mathbf{w}^{(k)}$ and $\gamma^{(k)}$ converge.
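The five-branch update in step 3 is a small closed-form function of $\gamma - z_i$. The sketch below is our own illustration (the name `gen_l1_update` is an assumption): it maps each interval for $\gamma - z_i$ to the corresponding solution of the scalar subgradient equation.

```python
def gen_l1_update(gamma, z_i, w0_i, sig2_i, lam1, lam2):
    """Closed-form coordinate update for the generalized l1 penalty:
    solves the scalar subgradient equation for w_i given gamma and z_i."""
    u = gamma - z_i
    if u > (lam1 + lam2) + 2.0 * sig2_i * w0_i:
        return (u - (lam1 + lam2)) / (2.0 * sig2_i)   # w_i above the target weight
    if (lam1 - lam2) + 2.0 * sig2_i * w0_i <= u <= (lam1 + lam2) + 2.0 * sig2_i * w0_i:
        return w0_i                                    # w_i sticks at the target weight
    if lam1 - lam2 < u < (lam1 - lam2) + 2.0 * sig2_i * w0_i:
        return (u - (lam1 - lam2)) / (2.0 * sig2_i)   # 0 < w_i < w0_i
    if -lam1 - lam2 <= u <= lam1 - lam2:
        return 0.0                                     # w_i shrunk to zero
    return (u + (lam1 + lam2)) / (2.0 * sig2_i)        # w_i negative
```

With, say, $\sigma^2_i = 0.5$, $w_{0,i} = 0.1$, $\lambda_1 = 0.2$, $\lambda_2 = 0.1$, varying $\gamma - z_i$ across the five intervals exercises each branch in turn.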
8.10 The Multistage Portfolio Optimization and the l0 Norm Penalty
In the following, we show that the multistage portfolio optimization in Section 6.5 can be viewed as using the majorization-minimization method to approximately solve an l0 norm penalized portfolio optimization. The l0 norm of $\mathbf{w}$ is defined as $\|\mathbf{w}\|_0 = \sum_{i=1}^N |w_i|^0$, where
$$|w_i|^0 := \begin{cases} 1 & \text{if } w_i \neq 0, \\ 0 & \text{if } w_i = 0. \end{cases}$$
Therefore penalizing the portfolio weights by the l0 norm is equivalent to restricting the number of assets included in the portfolio. The penalized mvp optimization is given by
$$\min_{\mathbf{w}} \; \mathbf{w}^T\Sigma\mathbf{w} + \lambda \|\mathbf{w}\|_0, \quad \text{subject to } A\mathbf{w} = \mathbf{u}. \tag{48}$$
As mentioned in the beginning of the paper, an optimization problem involving the l0 norm penalty is difficult to solve in practice. To make the problem tractable, one may approximate the l0 norm penalty by
$$AP(\mathbf{w}, \varepsilon) = \sum_{i=1}^N \left(1 - \frac{1}{\exp(\varepsilon |w_i|)}\right),$$
and then solve the optimization with the approximated penalty $AP(\mathbf{w}, \varepsilon)$. It can be seen that $AP(\mathbf{w}, \varepsilon)$ converges to $\|\mathbf{w}\|_0$ as $\varepsilon$ grows large:
$$\lim_{\varepsilon \to \infty} AP(\mathbf{w}, \varepsilon) = \|\mathbf{w}\|_0.$$
Then (48) becomes
$$\min_{\mathbf{w}} \; \left(\lim_{\varepsilon \to \infty} \mathbf{w}^T\Sigma\mathbf{w} + \lambda AP(\mathbf{w}, \varepsilon)\right), \quad \text{subject to } A\mathbf{w} = \mathbf{u}. \tag{49}$$
However, we now meet another difficulty. Since $AP(\mathbf{w}, \varepsilon)$ is concave in $|w_i|$, the objective function is not guaranteed to be a convex function of $\mathbf{w}$, and thus a coordinate-descent type algorithm is not applicable here. To circumvent this, one can adopt the majorization-minimization approach. A real valued function $f(x, y)$ is said to majorize a real valued function $g(x)$ at the point $y$ if
$$f(x, y) \geq g(x) \quad \text{for all } x \in \mathbb{R}, \qquad f(y, y) = g(y).$$
Suppose $x = y^*$ minimizes $f(x, y)$; then
$$g(y^*) = f(y^*, y) + g(y^*) - f(y^*, y) \leq f(y^*, y) + g(y) - f(y, y) \leq g(y).$$
Now let $y^* = v^{(l+1)}$ and $y = v^{(l)}$; then
$$g\left(v^{(l+1)}\right) \leq g\left(v^{(l)}\right).$$
That is, if one wants to find a sequence $v^{(l)}$ that decreases the function $g(x)$, one can achieve this by sequentially minimizing its majorization function $f\left(x, v^{(l)}\right)$ with respect to $x$:
$$v^{(l+1)} = \arg\min_x f\left(x, v^{(l)}\right).$$
An algorithm which sequentially minimizes a majorization function of an objective function in order to minimize that objective is called the majorization-minimization (MM) algorithm. For solving (49), we can try to find a majorization function of the objective function of (49) and apply the MM algorithm. However, the majorization should be convex in $\mathbf{w}$; otherwise we still face the same difficulty as in solving (49).
Since $\mathbf{w}^T\Sigma\mathbf{w}$ is already a convex function of $\mathbf{w}$, one can just find a convex majorization function of $AP(\mathbf{w}, \varepsilon)$ to replace $AP(\mathbf{w}, \varepsilon)$, and the new objective function will be a convex function of $\mathbf{w}$. Note that for all $x \geq 0$ and $\varepsilon > 0$, $1 - \exp(-\varepsilon x)$ is a concave function of $x$, and it can be shown that
$$1 - \exp(-\varepsilon x) \leq 1 - \exp(-\varepsilon y) + \varepsilon \exp(-\varepsilon y)(x - y),$$
for all $x, y \geq 0$ and $\varepsilon > 0$. The right hand side of the above inequality is a linear function of $x$ which is tangent to the graph of $1 - \exp(-\varepsilon x)$ at the point $y$. Let
$$APM\left(\mathbf{w}, \mathbf{w}^{*(l)}, \varepsilon\right) = \sum_{i=1}^N \left[1 - \frac{1}{\exp\left(\varepsilon \left|w^{*(l)}_i\right|\right)} + \frac{\varepsilon}{\exp\left(\varepsilon \left|w^{*(l)}_i\right|\right)}\left(|w_i| - \left|w^{*(l)}_i\right|\right)\right].$$
It can be seen that $APM\left(\mathbf{w}, \mathbf{w}^{*(l)}, \varepsilon\right)$ majorizes $AP(\mathbf{w}, \varepsilon)$ at the point $\mathbf{w}^{*(l)}$, and it is also a convex function of $\mathbf{w}$. Therefore a minimizer of $\mathbf{w}^T\Sigma\mathbf{w} + \lambda AP(\mathbf{w}, \varepsilon)$ can be sought by sequentially solving
$$\mathbf{w}^{*(l+1)} = \arg\min_{\mathbf{w} : A\mathbf{w} = \mathbf{u}} \; \mathbf{w}^T\Sigma\mathbf{w} + \lambda APM\left(\mathbf{w}, \mathbf{w}^{*(l)}, \varepsilon\right),$$
which is equivalent to sequentially solving
$$\min_{\mathbf{w}} \; \mathbf{w}^T\Sigma\mathbf{w} + \lambda \sum_{i=1}^N \frac{\varepsilon}{\exp\left(\varepsilon \left|w^{*(l)}_i\right|\right)} |w_i|, \quad \text{subject to } A\mathbf{w} = \mathbf{u}.$$
As only the full investment constraint is imposed, the above optimization can easily be solved by using Algorithm 1 with asset-specific $\lambda_{1,i} = \lambda \varepsilon \exp\left(-\varepsilon \left|w^{*(l)}_i\right|\right)$ and $\lambda_2 = 0$.
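The majorization property of $APM$ can be checked numerically. The sketch below is our own illustration (the function names are assumptions): it computes $AP$, its tangent majorization $APM$, and the per-asset l1 weights $\lambda \varepsilon \exp(-\varepsilon |w^{*(l)}_i|)$ that the next MM step would pass to Algorithm 1.

```python
import numpy as np

def ap(w, eps):
    """Smooth l0 approximation AP(w, eps) = sum_i (1 - exp(-eps*|w_i|))."""
    return np.sum(1.0 - np.exp(-eps * np.abs(w)))

def apm(w, w_star, eps):
    """Tangent-line majorization APM(w, w*, eps) of AP at the point w*."""
    a = np.exp(-eps * np.abs(w_star))
    return np.sum(1.0 - a + eps * a * (np.abs(w) - np.abs(w_star)))

def mm_l1_weights(w_star, eps, lam):
    """Per-asset l1 penalty weights lam * eps * exp(-eps*|w*_i|)
    used in the next reweighted-l1 step of the MM scheme."""
    return lam * eps * np.exp(-eps * np.abs(w_star))
```

By concavity of $1 - e^{-\varepsilon x}$ on $x \geq 0$, `apm(w, w_star, eps)` is never below `ap(w, eps)`, with equality at `w = w_star`, which is exactly the majorization condition used above.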
Table 1: The table shows results of performances for daily FF100 data when the full investment constraint wT1 = 1 is imposed. We estimate Σ by the sample covariance matrix with an expanding window scheme. Here N = 100 and the initial window length τ0 = 120. For the weighted norm constraint, we set λ1 = λ1,t, λ2 = λ2,t and vary α at six different levels. The testing period is from Jan-02-1990 to Dec-31-2010 and T = 5,295. SV denotes the sample variance of out-of-sample net returns (when the transaction fees are deducted); the sample variance is annualized. The transaction fee we consider is 35 basis points. SR denotes the annualized Sharpe ratio, and the yearly risk-free rate used is 3.63%. Certainty equivalence is obtained with ψ = 5. Columns TOR and PAC show the average turnover rate and the average proportion of active assets, respectively. Columns HHI and ANHHI show average values of the Herfindahl–Hirschman index and of the adjusted normalized Herfindahl–Hirschman index. Column SLR shows the average short-long ratio. In parentheses are the bootstrap standard errors obtained using the stationary bootstrap of Politis and Romano (1994).
SV(%) SR CER(%) TOR PAC HHI ANHHI SLR
α = 0    82.9718   1.0666   0.0451   0.0407   1.0000   0.0202   0.0103   0.5696
         (14.6744) (0.4704) (0.0158) (0.0034) (0.0000) (0.0004) (0.0005) (0.0037)
α = 0.2  86.6844   1.1442   0.0485   0.0301   0.6496   0.0323   0.0170   0.4713
         (16.0203) (0.4721) (0.0160) (0.0026) (0.0077) (0.0007) (0.0008) (0.0050)
α = 0.4  90.3660   1.1752   0.0502   0.0257   0.5207   0.0422   0.0230   0.4163
         (17.1536) (0.4709) (0.0162) (0.0022) (0.0091) (0.0011) (0.0012) (0.0054)
α = 0.6  93.4194   1.1827   0.0509   0.0231   0.4316   0.0515   0.0284   0.3758
         (17.9204) (0.4697) (0.0164) (0.0020) (0.0083) (0.0015) (0.0015) (0.0057)
α = 0.8  96.1646   1.1861   0.0514   0.0215   0.3714   0.0608   0.0340   0.3432
         (18.5558) (0.4682) (0.0166) (0.0019) (0.0078) (0.0018) (0.0018) (0.0061)
α = 1    98.6034   1.1847   0.0517   0.0202   0.3204   0.0700   0.0389   0.3173
         (19.0446) (0.4648) (0.0167) (0.0019) (0.0072) (0.0023) (0.0023) (0.0063)
NSMVP    176.2114  0.7409   0.0362   0.0082   0.1049   0.1650   0.0709   0.0000
         (38.2327) (0.3494) (0.0182) (0.0012) (0.0034) (0.0075) (0.0048) (0.0000)
1/N      311.1636  0.4485   0.0150   0.0051   1.0000   0.0100   0.0000   0.0000
         (67.1655) (0.2488) (0.0195) (0.0004) (0.0000) (0.0000) (0.0000) (0.0000)
GMVP     77.6752   0.9444   0.0400   0.0726   1.0000   0.0212   0.0113   0.6263
         (12.0891) (0.4802) (0.0160) (0.0133) (0.0000) (0.0005) (0.0005) (0.0053)
Table 2: The table shows results of performances for daily CRSP300 data when the full investment constraint wT1 = 1 is imposed. We estimate Σ by the sample covariance matrix with an expanding window scheme. Here N = 300 and the initial window length τ0 = 360. For the weighted norm constraint, we set λ1 = λ1,t, λ2 = λ2,t and vary α at six different levels. The testing period is from Jan-03-2000 to Dec-31-2010 and T = 2,767. SV denotes the sample variance of out-of-sample net returns (when the transaction fees are deducted); the sample variance is annualized. The transaction fee we consider is 35 basis points. SR denotes the annualized Sharpe ratio, and the yearly risk-free rate used is 2.50%. Certainty equivalence is obtained with ψ = 5. Columns TOR and PAC show the average turnover rate and the average proportion of active assets, respectively. Columns HHI and ANHHI show average values of the Herfindahl–Hirschman index and of the adjusted normalized Herfindahl–Hirschman index. Column SLR shows the average short-long ratio. In parentheses are the bootstrap standard errors obtained using the stationary bootstrap of Politis and Romano (1994).
SV(%) SR CER(%) TOR PAC HHI ANHHI SLR
α = 0    50.7208   0.5576   0.0208   0.0468   1.0000   0.0132   0.0099   0.3539
         (15.2769) (0.4650) (0.0124) (0.0052) (0.0000) (0.0003) (0.0003) (0.0084)
α = 0.2  49.7156   0.8009   0.0276   0.0320   0.7619   0.0221   0.0178   0.2361
         (16.4991) (0.4928) (0.0120) (0.0030) (0.0077) (0.0005) (0.0005) (0.0092)
α = 0.4  50.8529   0.8865   0.0302   0.0267   0.6417   0.0289   0.0238   0.1795
         (17.5011) (0.5073) (0.0121) (0.0025) (0.0095) (0.0008) (0.0007) (0.0092)
α = 0.6  51.9853   0.9265   0.0315   0.0236   0.5544   0.0344   0.0285   0.1440
         (18.2561) (0.5122) (0.0122) (0.0023) (0.0096) (0.0009) (0.0009) (0.0086)
α = 0.8  52.9633   0.9514   0.0324   0.0215   0.4911   0.0390   0.0324   0.1198
         (18.8245) (0.5134) (0.0122) (0.0020) (0.0085) (0.0011) (0.0010) (0.0079)
α = 1    53.8325   0.9705   0.0331   0.0201   0.4499   0.0431   0.0359   0.1021
         (19.2547) (0.5152) (0.0123) (0.0019) (0.0076) (0.0012) (0.0011) (0.0074)
NSMVP    67.4157   1.1038   0.0395   0.0126   0.2054   0.0734   0.0574   0.0000
         (24.1749) (0.5062) (0.0134) (0.0010) (0.0067) (0.0014) (0.0010) (0.0000)
1/N      404.4271  0.6587   0.0225   0.0157   1.0000   0.0033   0.0000   0.0000
         (112.3455)(0.3344) (0.0293) (0.0009) (0.0000) (0.0000) (0.0000) (0.0000)
GMVP     52.3428   0.4147   0.0168   0.0537   1.0000   0.0144   0.0111   0.3673
         (15.3964) (0.4513) (0.0127) (0.0080) (0.0000) (0.0003) (0.0003) (0.0104)
Table 3: The table shows results of performances for weekly FF100 data when the full investment constraint wT1 = 1 is imposed. We estimate Σ by the sample covariance matrix with an expanding window scheme. Here N = 100 and the initial window length τ0 = 120. For the weighted norm constraint, we set λ1 = λ1,t, λ2 = λ2,t and vary α at six different levels. The testing period is from the first week of 1990 to the last week of 2010 and T = 1,095. SV denotes the sample variance of out-of-sample net returns (when the transaction fees are deducted); the sample variance is annualized. The transaction fee we consider is 35 basis points. SR denotes the annualized Sharpe ratio, and the yearly risk-free rate used is 3.63%. Certainty equivalence is obtained with ψ = 5. Columns TOR and PAC show the average turnover rate and the average proportion of active assets, respectively. Columns HHI and ANHHI show average values of the Herfindahl–Hirschman index and of the adjusted normalized Herfindahl–Hirschman index. Column SLR shows the average short-long ratio. In parentheses are the bootstrap standard errors obtained using the stationary bootstrap of Politis and Romano (1994).
SV(%) SR CER(%) TOR PAC HHI ANHHI SLR
α = 0    122.1502  0.9108   0.2047   0.0934   1.0000   0.0162   0.0063   0.6038
         (26.1549) (0.4318) (0.0850) (0.0082) (0.0000) (0.0003) (0.0003) (0.0020)
α = 0.2  128.6224  0.9074   0.2059   0.0686   0.5393   0.0317   0.0134   0.4608
         (28.5429) (0.4291) (0.0865) (0.0071) (0.0059) (0.0011) (0.0010) (0.0032)
α = 0.4  135.7170  0.9254   0.2119   0.0588   0.3781   0.0461   0.0200   0.3801
         (30.5520) (0.4167) (0.0859) (0.0063) (0.0050) (0.0018) (0.0016) (0.0042)
α = 0.6  141.8157  0.9331   0.2153   0.0558   0.2904   0.0616   0.0278   0.3205
         (32.0246) (0.4061) (0.0856) (0.0063) (0.0056) (0.0026) (0.0023) (0.0051)
α = 0.8  146.9892  0.9144   0.2123   0.0534   0.2278   0.0786   0.0355   0.2758
         (33.2714) (0.3976) (0.0857) (0.0071) (0.0054) (0.0038) (0.0032) (0.0059)
α = 1    151.2552  0.8700   0.2029   0.0598   0.1853   0.0992   0.0464   0.2446
         (34.0646) (0.3896) (0.0861) (0.0134) (0.0056) (0.0060) (0.0048) (0.0068)
NSMVP    211.963   0.5994   0.1357   0.0260   0.0936   0.1795   0.0756   0.0000
         (50.4148) (0.3053) (0.0874) (0.0033) (0.0043) (0.0118) (0.0082) (0.0000)
1/N      317.9742  0.4641   0.0761   0.0119   1.0000   0.0100   0.0000   0.0000
         (77.8459) (0.2417) (0.0945) (0.0011) (0.0000) (0.0000) (0.0000) (0.0000)
GMVP     118.647   0.8010   0.1806   0.3160   1.0000   0.0176   0.0076   0.7558
         (18.5033) (0.4428) (0.0892) (0.0599) (0.0000) (0.0004) (0.0004) (0.0084)
Table 4: The table shows results of performances for monthly FF100 data when the full investment constraint wT1 = 1 is imposed. We estimate Σ by the sample covariance matrix with an expanding window scheme. Here N = 100 and the initial window length τ0 = 120. For the weighted norm constraint, we set λ1 = λ1,t, λ2 = λ2,t and vary α at six different levels. The testing period is from January 1990 to December 2010 and T = 252. SV denotes the sample variance of out-of-sample net returns (when the transaction fees are deducted); the sample variance is annualized. The transaction fee we consider is 35 basis points. SR denotes the annualized Sharpe ratio, and the yearly risk-free rate used is 3.63%. Certainty equivalence is obtained with ψ = 5. Columns TOR and PAC show the average turnover rate and the average proportion of active assets, respectively. Columns HHI and ANHHI show average values of the Herfindahl–Hirschman index and of the adjusted normalized Herfindahl–Hirschman index. Column SLR shows the average short-long ratio. In parentheses are the bootstrap standard errors obtained using the stationary bootstrap of Politis and Romano (1994).
SV(%) SR CER(%) TOR PAC HHI ANHHI SLR
α = 0    144.469   0.7928   0.7957   0.1990   1.0000   0.0153   0.0053   0.5760
         (27.1036) (0.3711) (0.3508) (0.0287) (0.0000) (0.0001) (0.0002) (0.0009)
α = 0.2  157.8873  0.7550   0.7642   0.1514   0.4785   0.0366   0.0159   0.3847
         (30.3782) (0.3643) (0.3630) (0.0294) (0.0092) (0.0005) (0.0007) (0.0020)
α = 0.4  167.6569  0.7416   0.7534   0.1150   0.2926   0.0612   0.0277   0.2844
         (34.3140) (0.3534) (0.3635) (0.0190) (0.0065) (0.0015) (0.0019) (0.0030)
α = 0.6  174.3271  0.7361   0.7493   0.0973   0.2050   0.0807   0.0331   0.2270
         (37.5616) (0.3401) (0.3578) (0.0156) (0.0043) (0.0022) (0.0029) (0.0041)
α = 0.8  180.8716  0.7119   0.7236   0.0911   0.1629   0.0996   0.0401   0.1892
         (40.6771) (0.3271) (0.3536) (0.0147) (0.0033) (0.0042) (0.0047) (0.0038)
α = 1    187.1527  0.6869   0.6957   0.0987   0.1397   0.1245   0.0563   0.1605
         (42.8446) (0.3148) (0.3500) (0.0168) (0.0033) (0.0062) (0.0060) (0.0041)
NSMVP    219.3593  0.5222   0.4900   0.0581   0.1001   0.1692   0.0738   0.0000
         (49.5957) (0.2663) (0.3453) (0.0077) (0.0054) (0.0101) (0.0074) (0.0000)
1/N      305.5895  0.5053   0.4019   0.0287   1.0000   0.0100   0.0000   0.0000
         (62.2650) (0.2234) (0.3597) (0.0042) (0.0000) (0.0000) (0.0000) (0.0000)
GMVP     255.2823  0.3886   0.2881   1.2783   1.0000   0.0170   0.0071   0.8379
         (36.0606) (0.2874) (0.3937) (0.2674) (0.0000) (0.0002) (0.0002) (0.0114)
Table 5: The table shows results of performances for weekly CRSP300 data when the full investment constraint wT1 = 1 is imposed. We estimate Σ by the sample covariance matrix with an expanding window scheme. Here N = 300 and the initial window length τ0 = 360. For the weighted norm constraint, we set λ1 = λ1,t, λ2 = λ2,t and vary α at six different levels. The testing period is from the first week of 2000 to the last week of 2010 and T = 573. SV denotes the sample variance of out-of-sample net returns (when the transaction fees are deducted); the sample variance is annualized. The transaction fee we consider is 35 basis points. SR denotes the annualized Sharpe ratio, and the yearly risk-free rate used is 2.50%. Certainty equivalence is obtained with ψ = 5. Columns TOR and PAC show the average turnover rate and the average proportion of active assets, respectively. Columns HHI and ANHHI show average values of the Herfindahl–Hirschman index and of the adjusted normalized Herfindahl–Hirschman index. Column SLR shows the average short-long ratio. In parentheses are the bootstrap standard errors obtained using the stationary bootstrap of Politis and Romano (1994).
SV(%) SR CER(%) TOR PAC HHI ANHHI SLR
α = 0    75.5405   0.4162   0.0813   0.1473   1.0000   0.0082   0.0049   0.4822
         (25.9349) (0.4583) (0.0743) (0.0143) (0.0000) (0.0003) (0.0003) (0.0059)
α = 0.2  69.3276   0.6094   0.1123   0.0884   0.6555   0.0177   0.0126   0.2800
         (27.6839) (0.5044) (0.0716) (0.0095) (0.0037) (0.0004) (0.0004) (0.0079)
α = 0.4  70.1200   0.6767   0.1233   0.0693   0.5160   0.0247   0.0183   0.2038
         (29.3341) (0.5194) (0.0717) (0.0078) (0.0042) (0.0005) (0.0005) (0.0089)
α = 0.6  71.3702   0.7241   0.1314   0.0603   0.4410   0.0300   0.0226   0.1634
         (30.5486) (0.5275) (0.0717) (0.0069) (0.0048) (0.0005) (0.0006) (0.0089)
α = 0.8  72.6740   0.7605   0.1378   0.0550   0.3974   0.0344   0.0262   0.1363
         (31.4910) (0.5310) (0.0715) (0.0063) (0.0060) (0.0006) (0.0006) (0.0086)
α = 1    74.3669   0.7836   0.1423   0.0515   0.3668   0.0386   0.0297   0.1153
         (32.5348) (0.5310) (0.0715) (0.0060) (0.0073) (0.0006) (0.0007) (0.0084)
NSMVP    95.4867   0.9127   0.1737   0.0340   0.1956   0.0656   0.0491   0.0000
         (43.6905) (0.4872) (0.0705) (0.0039) (0.0069) (0.0023) (0.0018) (0.0000)
1/N      393.5178  0.6621   0.1115   0.0347   1.0000   0.0033   0.0000   0.0000
         (132.7015)(0.3422) (0.1421) (0.0030) (0.0000) (0.0000) (0.0000) (0.0000)
GMVP     88.2586   0.2131   0.0441   0.2131   1.0000   0.0086   0.0053   0.5492
         (26.6200) (0.3741) (0.0709) (0.0315) (0.0000) (0.0003) (0.0003) (0.0128)
Table 6: The table shows results of performances of the weighted norm mvp for daily CRSP300 and FF100 data when the full investment constraint wT1 = 1 and the target return constraint wTµ = µ are imposed. The target return µ shown in the table is annualized. We estimate µ and Σ by the sample mean and covariance matrix with an expanding window scheme. Here N = 100 and 300 for FF100 and CRSP300 respectively, and the initial window length is τ0 = 1.2N. For the weighted norm constraint, we set λ1 = λ1,t, λ2 = λ2,t and vary α at three different levels. SV denotes the sample variance of out-of-sample net returns (when the transaction fees are deducted); the sample variance is annualized. The transaction fee we consider is 35 basis points. SR denotes the annualized Sharpe ratio, and the yearly risk-free rates for FF100 and CRSP300 are 3.63% and 2.5%, respectively. Certainty equivalence is obtained with ψ = 5. Column PAC shows the average proportion of active assets. In parentheses are the bootstrap standard errors obtained using the stationary bootstrap of Politis and Romano (1994).
FF100, Jan-02-1990 to Dec-31-2010, T = 5,295
µ = 10% µ = 20%
SV(%) SR CER(%) TOR PAC SV(%) SR CER(%) TOR PAC
α = 0    96.6592   0.5857   0.0279   0.0615   1.0000   83.7073   1.0200   0.0435   0.0502   1.0000
         (18.6169) (0.4286) (0.0167) (0.0034) (0.0000) (13.8393) (0.4591) (0.0157) (0.0053) (0.0000)
α = 0.6  111.2906  0.6429   0.0305   0.0488   0.4938   94.7331   1.0596   0.0463   0.0382   0.4328
         (22.2266) (0.4135) (0.0171) (0.0027) (0.0111) (16.8293) (0.4509) (0.0163) (0.0049) (0.0075)
α = 1    118.5553  0.6170   0.0295   0.0468   0.3801   100.7166  1.0592   0.0470   0.0373   0.3212
         (23.5070) (0.4017) (0.0173) (0.0030) (0.0096) (17.9121) (0.4453) (0.0166) (0.0052) (0.0069)
CRSP300, Jan-03-2000 to Dec-31-2010, T = 2,767
µ = 5% µ = 10%
SV(%) SR CER(%) TOR PAC SV(%) SR CER(%) TOR PAC
α = 0    52.5846   0.5876   0.0218   0.0530   1.0000   50.2325   0.4033   0.0164   0.0474   1.0000
         (16.4380) (0.4681) (0.0127) (0.0049) (0.0000) (14.6432) (0.4542) (0.0125) (0.0053) (0.0000)
α = 0.6  53.6010   0.9629   0.0328   0.0320   0.5674   51.0724   0.7185   0.0254   0.0246   0.5585
         (19.2751) (0.5067) (0.0123) (0.0019) (0.0081) (17.4733) (0.4937) (0.0124) (0.0024) (0.0088)
α = 1    55.3200   0.9966   0.0341   0.0290   0.4603   52.9987   0.7406   0.0263   0.0213   0.4549
         (20.1596) (0.5111) (0.0124) (0.0015) (0.0068) (18.5730) (0.4948) (0.0126) (0.0020) (0.0069)
Table 7: The table shows results of performances for daily FF100 data when three alternative penalties are imposed: the berhu, generalized l1 norm, and adaptive penalties. The linear constraint is the full investment constraint wT1 = 1. We estimate Σ by the sample covariance matrix with an expanding window scheme. Here N = 100 and the initial window length τ0 = 120. For each penalty, we uniformly set the penalty parameter equal to $a_t B_t \sqrt{2\log N / n_t}$. The testing period is from Jan-02-1990 to Dec-31-2010 and T = 5,295. SV denotes the sample variance of out-of-sample net returns (when the transaction fees are deducted); the sample variance is annualized. The transaction fee we consider is 35 basis points. SR denotes the annualized Sharpe ratio, and the yearly risk-free rate used is 3.63%. Certainty equivalence is obtained with ψ = 5. Columns TOR and PAC show the average turnover rate and the average proportion of active assets, respectively. In parentheses are the bootstrap standard errors obtained using the stationary bootstrap of Politis and Romano (1994).
Berhu Penalty
SV(%) SR CER(%) TOR PAC
κ = 0.02  146.0434  0.8283   0.0400   0.0143   0.5483
          (32.1981) (0.3924) (0.0180) (0.0010) (0.0049)
κ = 0.05  119.1196  0.9940   0.0460   0.0178   0.4142
          (25.0352) (0.4391) (0.0176) (0.0013) (0.0062)
κ = 0.1   106.9375  1.1123   0.0498   0.0191   0.3531
          (22.1391) (0.4676) (0.0175) (0.0016) (0.0074)
Generalized l1 Norm Penalty
SV(%) SR CER(%) TOR PAC
TWN       104.0272  1.0798   0.0482   0.0221   1.0000
          (20.2157) (0.4463) (0.0166) (0.0018) (0.0000)
TWN − l1  114.8295  1.0561   0.0483   0.0166   0.6677
          (23.2896) (0.4355) (0.0170) (0.0014) (0.0047)
TWNS      98.5689   1.1792   0.0515   0.0200   0.3090
          (18.9658) (0.4638) (0.0166) (0.0018) (0.0068)
TWNS − l1 111.5312  1.0922   0.0495   0.0155   0.2196
          (22.5187) (0.4421) (0.0169) (0.0015) (0.0055)
Adaptive Penalty
SV(%) SR CER(%) TOR PAC
ε = 1, l = 1    96.5597   1.2096   0.0524   0.0208   0.2978
                (18.2046) (0.4696) (0.0163) (0.0019) (0.0075)
ε = 2.5, l = 1  107.2337  1.1244   0.0504   0.0178   0.1778
                (20.2679) (0.4479) (0.0166) (0.0019) (0.0052)
ε = 1, l = 2    96.4462   1.2107   0.0524   0.0210   0.2952
                (18.1246) (0.4692) (0.0163) (0.0020) (0.0077)
ε = 2.5, l = 2  107.8656  1.1139   0.0500   0.0190   0.1519
                (20.0121) (0.4452) (0.0166) (0.0025) (0.0052)
Table 8: The table shows results of performances for daily CRSP300 data when three alternative penalties are imposed: the berhu, generalized l1 norm, and adaptive penalties. The linear constraint is the full investment constraint wT1 = 1. We estimate Σ by the sample covariance matrix with an expanding window scheme. Here N = 300 and the initial window length τ0 = 360. For each penalty, we uniformly set the penalty parameter equal to $a_t B_t \sqrt{2\log N / n_t}$. The testing period is from Jan-03-2000 to Dec-31-2010 and T = 2,767. SV denotes the sample variance of out-of-sample net returns (when the transaction fees are deducted); the sample variance is annualized. The transaction fee we consider is 35 basis points. SR denotes the annualized Sharpe ratio, and the yearly risk-free rate used is 2.5%. Certainty equivalence is obtained with ψ = 5. Columns TOR and PAC show the average turnover rate and the average proportion of active assets, respectively. In parentheses are the bootstrap standard errors obtained using the stationary bootstrap of Politis and Romano (1994).
Berhu Penalty
SV(%) SR CER(%) TOR PAC
κ = 0.02     57.2220 (18.1071)   1.0880 (0.5389)   0.0372 (0.0133)   0.0230 (0.0019)   0.4932 (0.0054)
κ = 0.05     54.3572 (18.6517)   1.0274 (0.5264)   0.0349 (0.0125)   0.0210 (0.0018)   0.4617 (0.0064)
κ = 0.1      53.7601 (19.1032)   0.9926 (0.5181)   0.0337 (0.0122)   0.0201 (0.0018)   0.4517 (0.0075)
Generalized l1 Norm Penalty
SV(%) SR CER(%) TOR PAC
TWN          54.9798 (18.7627)   0.7611 (0.4787)   0.0271 (0.0126)   0.0286 (0.0023)   1.0000 (0.0000)
TWN − l1     57.5252 (20.7838)   0.9593 (0.5066)   0.0333 (0.0127)   0.0196 (0.0017)   0.6340 (0.0070)
TWNS         55.2197 (19.5056)   0.9643 (0.4856)   0.0331 (0.0119)   0.0191 (0.0018)   0.4346 (0.0063)
TWNS − l1    58.2940 (20.7498)   1.0160 (0.4924)   0.0352 (0.0123)   0.0158 (0.0014)   0.3240 (0.0045)
Adaptive Penalty
SV(%) SR CER(%) TOR PAC
ε = 1, l = 1     53.8823 (18.8170)   0.9597 (0.4949)   0.0328 (0.0119)   0.0200 (0.0018)   0.4468 (0.0076)
ε = 2.5, l = 1   59.0413 (20.8862)   0.9925 (0.4926)   0.0346 (0.0123)   0.0151 (0.0013)   0.2851 (0.0053)
ε = 1, l = 2     53.8860 (18.8191)   0.9592 (0.4949)   0.0328 (0.0119)   0.0200 (0.0018)   0.4467 (0.0076)
ε = 2.5, l = 2   59.2664 (20.9691)   0.9842 (0.4914)   0.0344 (0.0124)   0.0151 (0.0013)   0.2808 (0.0053)
Table 9: The table shows p-values of the stochastic dominance tests proposed by Linton et al. (2005). The data used here are the realized daily net returns (after transaction fees are deducted) of FF100 under the different portfolio strategies. The transaction fee we consider is 35 basis points. The testing period is from Jan-02-1990 to Dec-31-2010 and T = 5,295. FSD and SSD denote first- and second-order stochastic dominance, respectively. The p-values are obtained from the subsampling method, and the subsample size is set to 300.
1/N NSMVP GMVP
FSD SSD FSD SSD FSD SSD
α = 0      0.0000   0.4235   0.0000   0.4582   0.7714   0.1229
α = 0.2    0.0000   0.3881   0.0000   0.4023   0.5783   0.1111
α = 0.4    0.0000   0.3811   0.0000   0.3805   0.6769   0.1087
α = 0.6    0.0000   0.3755   0.0000   0.3765   0.4658   0.1063
α = 0.8    0.0000   0.3705   0.0000   0.3717   0.4660   0.1019
α = 1      0.0000   0.3639   0.0000   0.3647   0.3004   0.0915
Table 10: The table shows p-values of the stochastic dominance tests proposed by Linton et al. (2005). The data used here are the realized daily net returns (after transaction fees are deducted) of CRSP300 under the different portfolio strategies. The transaction fee we consider is 35 basis points. The testing period is from Jan-03-2000 to Dec-31-2010 and T = 2,767. FSD and SSD denote first- and second-order stochastic dominance, respectively. The p-values are obtained from the subsampling method, and the subsample size is set to 300.
1/N NSMVP GMVP
FSD SSD FSD SSD FSD SSD
α = 0      0.0000   0.0539   0.1803   0.0960   0.5985   0.7848
α = 0.2    0.0000   0.0636   0.0045   0.0985   0.1864   0.5733
α = 0.4    0.0000   0.0827   0.0446   0.1102   0.3165   0.3780
α = 0.6    0.0000   0.0879   0.1001   0.1179   0.4652   0.3383
α = 0.8    0.0000   0.0891   0.0713   0.1216   0.4344   0.3428
α = 1      0.0000   0.0900   0.0928   0.1240   0.9955   0.3408
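The bootstrap standard errors reported in parentheses in the performance tables come from the stationary bootstrap of Politis and Romano (1994), which resamples blocks of geometrically distributed length so that the resampled series remains stationary. A minimal sketch for a single return series follows; the block-length parameter p and replication count B are illustrative choices, not the paper's settings.

```python
import numpy as np

def stationary_bootstrap_se(x, stat=np.mean, p=0.1, B=1000, seed=0):
    """Stationary-bootstrap standard error of stat(x).

    Each resampled index either continues the previous block (with
    probability 1 - p, wrapping circularly) or starts a fresh block at
    a uniformly drawn position, so block lengths are geometric with
    mean 1/p. p and B are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    stats = np.empty(B)
    for b in range(B):
        idx = np.empty(n, dtype=int)
        idx[0] = rng.integers(n)
        for t in range(1, n):
            if rng.random() < p:                 # start a new block
                idx[t] = rng.integers(n)
            else:                                # continue current block
                idx[t] = (idx[t - 1] + 1) % n    # circular wrap
        stats[b] = stat(x[idx])
    return stats.std(ddof=1)
```

Because short-run serial dependence is preserved inside each block, this scheme is better suited to daily portfolio returns than an i.i.d. bootstrap.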
References
Bai, J. and S. Ng (2008): “Forecasting economic time series using targeted predictors,”
Journal of Econometrics, 146, 304–317.
Bai, Z. and J. W. Silverstein (2010): Spectral Analysis of Large Dimensional Random
Matrices, Springer, 2nd ed.
Brandt, M. W., P. Santa-Clara, and R. Valkanov (2009): “Covariance Regularization
By Parametric Portfolio Policies: Exploiting Characteristics in the Cross-Section of Equity
Returns,” The Review of Financial Studies, 22, 3411–3447.
Brodie, J., I. Daubechies, C. De Mol, D. Giannone, and I. Loris (2009): “Sparse and
stable Markowitz portfolios,” Proceedings of the National Academy of Sciences of the United
States of America, 106, 12267–12272.
Bühlmann, P. and S. van de Geer (2011): Statistics for High-Dimensional Data: Methods,
Theory and Applications, Springer.
Cochrane, J. H. (2011): “Presidential Address: Discount Rates,” Journal of Finance, 66,
1047–1108.
De Mol, C., D. Giannone, and L. Reichlin (2008): “Forecasting using a large number of
predictors: Is Bayesian shrinkage a valid alternative to principal components?” Journal of
Econometrics, 146, 318–328.
DeMiguel, V., L. Garlappi, F. J. Nogales, and R. Uppal (2009a): “A generalized ap-
proach to portfolio optimization: Improving performance by constraining portfolio norms,”
Management Science, 55, 798–812.
DeMiguel, V., L. Garlappi, and R. Uppal (2009b): “Optimal versus naive diversification:
How inefficient is the 1/N portfolio strategy?” Review of Financial Studies, 22, 1915–1953.
DeMiguel, V., F. J. Nogales, and R. Uppal (2010): “Stock Return Serial Dependence
and Out-of-Sample Portfolio Performance,” SSRN eLibrary.
Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani (2004): “Least angle regression,”
The Annals of Statistics, 32, 407–451.
El Karoui, N. (2009): “On the realized risk of high-dimensional Markowitz portfolios,”
Technical Report 784, Department of Statistics, UC Berkeley.
——— (2010): “High-dimensionality effects in the Markowitz problem and other quadratic
programs with linear constraints: Risk underestimation,” The Annals of Statistics, 38, 3487–
3566.
Fan, J., Y. Liao, and X. Shi (2013): “Risks of Large Portfolios,” SSRN eLibrary.
Fan, J., J. Zhang, and K. Yu (2009): “Asset allocation and risk assessment with gross
exposure constraints for vast portfolios,” Tech. rep., Princeton University.
——— (2012): “Vast Portfolio Selection with Gross-exposure Constraints,” Journal of the
American Statistical Association, 107, 592–606.
Fastrich, B., S. Paterlini, and P. Winker (2012): “Constructing Optimal Sparse Port-
folios Using Regularization Methods,” SSRN eLibrary.
Frahm, G. and C. Memmel (2011): “Dominating estimators for minimum-variance port-
folios,” Journal of Econometrics, forthcoming.
Friedman, J., T. Hastie, H. Höfling, and R. Tibshirani (2007): “Pathwise coordinate
optimization,” The Annals of Applied Statistics, 1, 302–332.
Gabaix, X. (2011): “A Sparsity-based model of bounded rationality,” NBER Working Paper
16911.
Gârleanu, N. and L. H. Pedersen (2011): “Margin-based Asset Pricing and Deviations
from the Law of One Price,” Review of Financial Studies, 24, 1980–2022.
Hastie, T., R. Tibshirani, and J. Friedman (2009): The Elements of Statistical Learning,
Springer, 2nd ed.
Jagannathan, R. and T. Ma (2003): “Risk reduction in large portfolios: Why imposing
the wrong constraints helps,” Journal of Finance, 58, 1651–1684.
Jorion, P. (1986): “Bayes-Stein estimation for portfolio analysis,” Journal of Financial and
Quantitative Analysis, 21, 279–292.
Kan, R. and G. F. Zhou (2007): “Optimal portfolio choice with parameter uncertainty,”
Journal of Financial and Quantitative Analysis, 42, 621–656.
Kirby, C. and B. Ostdiek (2011): “Optimal Active Portfolio Management with Uncondi-
tional Mean-Variance Risk Preferences,” SSRN eLibrary.
Lai, T. L., H. Xing, and Z. Chen (2011): “Mean-variance portfolio optimization when
means and covariances are unknown,” The Annals of Applied Statistics, forthcoming.
Ledoit, O. and M. Wolf (2003): “Improved estimation of the covariance matrix of stock
returns with an application to portfolio selection,” Journal of Empirical Finance, 10, 603–
621.
Linton, O., E. Maasoumi, and Y.-J. Whang (2005): “Consistent Testing for Stochastic
Dominance under General Sampling Schemes,” Review of Economic Studies, 72, 735–765.
Owen, A. B. (2007): “A robust hybrid of lasso and ridge regression,” Contemporary Mathe-
matics, 443, 59–72.
Politis, D. N. and J. P. Romano (1994): “The Stationary Bootstrap,” Journal of the
American Statistical Association, 89, 1303–1313.
Tibshirani, R. (1996): “Regression shrinkage and selection via the lasso,” Journal of the
Royal Statistical Society: Series B (Statistical Methodology), 58, 267–288.
Tseng, P. (2001): “Convergence of a block coordinate descent method for nondifferentiable
minimization,” Journal of Optimization Theory and Applications, 109, 475–494.
Tu, J. and G. Zhou (2011): “Markowitz meets Talmud: A combination of sophisticated and
naive diversification strategies,” Journal of Financial Economics, 99, 204–215.
Welsch, R. E. and X. Zhou (2007): “Application of robust statistics to asset allocation
models,” Revstat, 5, 97–114.
Yen, Y.-M. and T.-J. Yen (2011): “Solving Norm Constrained Portfolio Optimization via
Coordinate-Wise Descent Algorithms,” Working paper.
Zou, H. and T. Hastie (2005): “Regularization and variable selection via the elastic net,”
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 301–320.