the cross-section of investing skill
Post on 16-Jun-2022
1 Views
Preview:
TRANSCRIPT
The Cross-Section of Investing Skill
Ravi Sastry
Columbia Business School
rsastry13@gsb.columbia.edu
www.columbia.edu/∼rvs2103/
this version: November 21, 2011
Abstract
Building on insights from the economics of superstars, I develop an
efficient method for estimating the skill of mutual fund managers. Outliers
are especially helpful for disentangling skill from luck when I explicitly
model the cross-sectional distribution of managerial skill using a flexible
and realistic function. Forecasted performance is dramatically improved
relative to standard regression estimates: an investor selecting (avoiding)
the best (worst) decile of funds would improve risk-adjusted performance
by 2% (3%) annually. Forecasted performance also helps to predict fund
flows, consistent with the smart money effect. The distribution of skill
is found to be fat-tailed and positively skewed; its shape helps to explain
the convexity of fund flows.
1 Introduction
With over $3 trillion in aggregate assets under management, the scale of the ac-
tive mutual fund industry is a testament to the conviction among investors that
skilled managers exist and can be identified in advance. Yet the unpredictabil-
ity of good performance remains the principal stylized fact in the mutual fund
literature, and is often interpreted to imply the absence of skill among fund
managers and the inability of investors to benefit from active mutual funds.1 In
1Berk and Green (2004) propose a model that reconciles unpredictability with the existenceof skill, but all benefits from skill accrue to fund managers rather than their investors.
1
this paper, I will show that this “fact” is the consequence of strong, unrealistic,
and often implicit assumptions about the distribution of “alpha”—namely that
alphas are independent across funds, or that alpha is normally distributed. I
propose a hierarchical model of investment performance that relaxes both as-
sumptions, and I provide empirical support for alternative “facts”: alphas are
predictable, and extreme performance is not always luck.
The assumption that alphas are independent across funds is what justifies
any fund-by-fund approach to performance evaluation.2 Although seemingly
reasonable, Jones and Shanken (2005) show that this is equivalent to assum-
ing, counterfactually, that the distribution of alphas across managers is known.
In the realistic case where the distribution of alphas is acknowledged to be un-
known, the returns of all funds intuitively help to illuminate the abilities of fund
managers as a group, which in turn enables improved inference for any partic-
ular fund. Jones and Shanken present a hierarchical model that formalizes this
cross-sectional “learning across funds”.
They also impose normality on the cross-sectional distribution of alpha, but
there is little evidence that alpha follows this convenient form.3 Rather, it is
a modeling choice intended to improve inference by reducing the influence of
outliers. When extreme alpha estimates are thought to be spurious, reasonable
remedies include shrinkage towards a normal distribution, or still more cautious
parameter filters that simply drop funds with estimated alphas outside some
allowable range.4 The surviving funds’ alphas can be interpreted with greater
confidence, but little is learned about the dropped funds. This is a disadvantage
of employing such sharp, and inherently subjective, parameter filters. But the
larger cost of treating outliers as problematic statistical artifacts is that the
extreme alphas might instead be the most interesting and informative cases—
economic rather than statistical outliers.
The “superstars” literature documents convex returns to talent across a
range of disciplines. Rosen (1981) describes the phenomenon: “In certain kinds
of economic activity there is concentration of output among a few individu-
als, marked skewness in the associated distributions of income, and very large
2Carhart (1997) uses OLS, but more recent methods allow substantially improved inference.See, e.g., Pastor and Stambaugh (2002), Bollen and Busse (2005), and Mamaysky et al. (2007).
3Kosowski et al. (2006) and Fama and French (2010) employ distinct bootstrap techniquesand document non-normalities in the alpha distribution. Kosowski et al. find more pronouncednon-normalities than do Fama and French. They also find stronger evidence of positive alphasnet of fees.
4Mamaysky et al. (2007) use a Kalman filter to estimate alphas and market betas, and
then drop funds where(α, βmkt
)falls outside a predefined region.
2
rewards at the top.” Subsequent work explores the pronounced disparities in,
for example: CEO compensation, rock concert revenues, television ratings for
NBA games, and citation counts for finance academics.5 Across diverse fields of
endeavor, observed outcomes are distinctly non-normal and positively skewed,
regardless of the distribution of underlying, and unobservable, talent. Viewed
through this lens, extreme alphas are the natural result of differences in innate
investment skill.
This perspective requires some semantic clarification. Alpha is often consid-
ered synonymous with investing skill, but these are distinct concepts. Invest-
ment skill itself—like the related attribute of intelligence—cannot be directly
observed, although it is associated with observable manager characteristics.6
Alpha is a measure of investment performance, an outcome that we expect to
be increasing in managerial skill. The economics of superstars shows that the
mapping between talent and outcomes is often convex, and that extreme alphas
may be the rule rather than the exception. The approach to performance eval-
uation in this paper incorporates these insights.7 (For the sake of consistency
with the mutual fund literature, alpha and skill are used interchangeably in the
rest of this paper.)
I present a hierarchical model where the cross-sectional distribution of alpha
is described by a mixture-of-normals, and each fund manager represents a draw
from this distribution. The mixture-of-normals is a flexible and realistic model-
ing choice, allowing—but not imposing—both skewness and fat tails. The data
will dictate whether these features are present.
Out-of-sample tests reveal significant short-term alpha predictability. The
“mixture” alphas are striking improvements over the estimates from OLS, which
in turn outperform alphas from a hierarchical model with a normal cross-
sectional distribution: imposing the wrong structure on the cross-section of
alpha can be worse than imposing no structure at all. The bottom decile port-
folio of mutual funds formed using the mixture-of-normals distribution is more
than twice as bad as the OLS bottom decile portfolio, while the top “mixture”
portfolio is nearly twice as good as the OLS top portfolio. (The alphas of the
top and bottom “mixture” decile portfolios are +5.4% and -6.1%, respectively.)
5These four examples are from Gabaix and Landier (2008), Krueger (2005), Hausman andLeonard (1997), and Chung and Cox (1990), respectively.
6See, e.g., Chevalier and Ellison (1999), Cohen et al. (2008), and Coval and Moskowitz(2001) who relate managerial performance to undergraduate institution quality, social ties tocorporate board members, and geographic proximity, respectively.
7The distinction between alpha and skill is also central to the model of Berk and Green.
3
I find that the true distribution of alpha is fat-tailed and positively skewed,
consistent with the economics of superstars. I also find that fund flows are
sensitive to these “mixture” alphas—evidence that investors respond to, and
benefit from, the skills of mutual fund managers. Finally, I show that the non-
normality of the alpha distribution can help to explain the convexity of fund
flows.8
The rest of the paper proceeds as follows. Section 2 describes the hierarchical
model in detail. Section 3 explains the data used for the analysis. Section 4
presents the main empirical results. Section 5 discusses these results and their
relation to the mutual fund literature. Section 6 concludes.
2 Model
I adopt a hierarchical Bayesian model, similar in spirit to Jones and Shanken
(2005), that allows for skewness and fat-tails in the cross-sectional distribution
of managerial skill. The parameters of this distribution are determined by the
data, including “outliers”. Skill is measured relative to a four-factor model
including the Fama and French (1993) factors and the momentum factor of
Carhart (1997). Prior distributions are weakly informative—proper but diffuse.
2.1 Background
The (semi-strong) efficient markets hypothesis implies that abnormal mutual
fund returns should not persist. Carhart (1997) provides strong supporting ev-
idence: adjusting for a momentum factor in addition to the three Fama-French
(1993) factors produces ranked portfolios of mutual funds without significant
persistence among top performing funds. However, Berk and Green (2004)
showed that if fund returns are decreasing functions of assets under manage-
ment, the absence of persistence in factor-model skill estimates is not incon-
sistent with the existence of skill.9 Furthermore, studies using shorter holding
periods than the one-year holding periods employed by Carhart have found
evidence of persistence.10 Skill may indeed exist, but it appears difficult to
identify.
8Fund flows are much more responsive to high performance than to poor performance.9Berk and Green (2004) also assume rational investor learning in the context of a common-
knowledge prior on the distribution of skill. This is an unrealistic feature of their model.10See, e.g., Bollen and Busse (2004).
4
Imperfect empirical methodologies may explain the difficulty. Jones and
Shanken (2005) observe that “. . . the maximum OLS alpha estimate [among a
population of funds] becomes unbounded as the number of funds approaches
infinity, even when the true alphas are all zero.”11 Mamaysky et al. (2007)
demonstrate that “. . . sorting on the estimated alphas populates the top and
bottom deciles not with the best and worst funds, but with those having the
greatest estimation error.” Carhart’s result appears less damning upon reconsid-
eration: alpha may seem non-persistent simply because it is poorly measured.
One “solution” is to drop funds with suspiciously extreme alphas. Another
approach is to use shrinkage estimators.12 Bayesian inference automatically
incorporates shrinkage towards a prior distribution. Inferences are made on all
funds, but the accuracy of the posterior estimates depends very strongly on the
shape of the prior.
Baks et al. (2001) employ a prior on skill consisting of the right tail of a
truncated normal distribution and a probability mass at the point of truncation.
They provide a procedure for eliciting the parameters of this prior from the
investor. Jones and Shanken use a normal prior on skill but do not specify its
mean or variance, instead estimating these along with all fund-level parameters
in a hierarchical procedure. Both priors are simple, but they do not reflect the
richness of the true distribution of skill.
Kosowski et al. (2006) document a complex non-normal distribution of mu-
tual fund alphas. They note that “the separation of luck from skill becomes ex-
tremely sensitive to the assumed joint distribution from which the fund alphas
are drawn” and motivate their non-parametric bootstrap approach by invoking
the “intractability of parametrically modeling the joint distribution of mutual
fund alphas across several hundred funds.” This paper tackles the same problem
head-on. The prevailing cross-sectional distribution of skill among mutual fund
managers is indeed complex, but it is not intractable.
2.2 Model specification
Excess fund returns are assumed to follow a standard linear factor structure,
Rjt −Rft = αj + Ftβ
j + εjt , (1)
11They assume a standard factor model with independent residuals, as in (1).12See Efron and Morris (1977) for an introduction to shrinkage estimation.
5
where εjt ∼ N(0, V jε
), factor loadings βj and residuals εj are cross-sectionally
uncorrelated, and the T ×K matrix of factor returns F is observable.
The investor assumes that true manager alphas are independent draws from
a finite mixture of two normal distributions, with density
f(αj)
=
N∑i=1
wi ·1√Vi· φ(αj − µi√
Vi
), (2)
where N = 2, φ (x) is the standard normal density, µi and Vi are the un-
known mean and variance of mixture component i, and the wi are unknown
non-negative weights that sum to one.13 This conditional distribution of alpha
is simple and flexible, allowing both skewness and excess kurtosis in the cross-
section of managerial skill but imposing neither.14 It also suggests an intuitive
interpretation of the two mixture components as “good” and “bad” manager
types.
Greater flexibility could be achieved at the expense of parsimony. Increasing
the number of normal mixture components is straightforward, although interpre-
tation is problematic for N ≥ 4. One could alternately employ finite mixtures
of Student’s t or Laplace distributions—imposing excess kurtosis—or a more
general unimodal skewed distribution.15
In the absence of strong justification for any specific alternative distribution,
however, it is reasonable to begin with the simplest choice that allows some
variability in skewness and kurtosis. The distribution in (2) fits the bill, while
minimizing computational complexity and the impact of specification searching.
2.3 Bayesian approach
The hierarchical model defined by (1) and (2) is conceptually simple, yet es-
timation with classical methods is not feasible. Consider the sheer number of
parameters: for each of the J funds there are K + 2 parameters, in addition
13Without additional restrictions, the mixture of normals in (2) is not identified. An ad-ditional identifying restriction, µ2 ≥ µ1, is imposed without loss of generality to allow esti-mation. Stephens (2000) describes the problem in more detail and proposes an alternativeresolution.
14The distribution is conditional because it depends on the unknown population hyper-parameters (µi, Vi, wi) that will be estimated along with the fund-level parameters(αj , βj , V jε
).
15See Fernandez and Steel (1998) for a general approach to modeling skewed unimodaldistributions in a Bayesian context. Their method allows skewness and kurtosis to varyindependently; an advantage over the simpler model in (2).
6
to 5 population-level hyper-parameters.16 For any reasonable values of J and
K, the prospects for successfully implementing a high-dimensional numerical
optimization, as required by MLE or GMM, are dim.
The Markov chain Monte Carlo (MCMC) approach to the estimation of
Bayesian models breaks this curse of dimensionality. Rather than dealing di-
rectly with a single intractable [J · (K + 2) + 5]-dimensional joint posterior dis-
tribution, we can equivalently work with the [J · (K + 2) + 5] individual 1-
dimensional conditional posterior distributions, which together fully charac-
terize the joint posterior distribution. This equivalence is guaranteed by the
Hammersley-Clifford theorem.17
2.4 Prior specification
A complete Bayesian model requires both likelihoods and priors. The former
are defined by (1) and (2). The latter are described below.
An ideal prior distribution (i) accurately represents the investor’s prior in-
formation and beliefs while (ii) enabling easy sampling from the posterior dis-
tribution. The second condition is often automatically satisfied by a conjugate
prior, which gives rise to a known analytical posterior distribution from the
same family as the prior. For example, a normal prior and a normal likelihood
result in a normal posterior. Conjugacy is a valuable property, but is not always
consistent with (i). In such cases, I relax (ii) in favor of (i).
A common criticism of Bayesian inference is that the inherently subjective
prior distribution can exert substantial influence on posterior parameter esti-
mates. I minimize subjectivity by choosing intentionally uninformative priors.
By this, I mean diffuse proper distributions. These are sometimes also called
weakly informative priors to distinguish them from improper priors.18
Putting aside complex approaches to uninformative priors that are them-
selves subject to criticism,19 I follow the pragmatic lead of Leamer (1978):
16The K+ 2 parameters are the managerial skill and the idiosyncratic variance, in additionto K factor loadings.
17See, e.g., Robert and Casella (2004).18Any valid probability density must have a finite integral over the relevant parameter
space. An improper prior is a function whose integral is infinite, hence the impropriety.Typical improper priors would be f (θ) ∝ 1 for a location parameter, or f (σ) ∝ 1
σfor a scale
parameter. While superficially appealing, such priors do not always lead to valid posteriors.(See, e.g., Lindley (1972).) They are not even uninformative. The improper prior f (θ) appearsto be uninformative with respect to θ, but it also suggests that |θ| is extraordinarily large andthat 1
θis very close to zero.
19For example, the Jeffreys prior or Zellner’s g-prior.
7
“. . . we can find a prior distribution that will have little impact on
the posterior. It is important to understand that this is not a prior
intended to represent ignorance. . . In practice, the sample may dom-
inate the prior information and the posterior distribution may be
inconsequentially different from a posterior distribution correspond-
ing to an improper [uninformative] prior. A prior density that is
relatively uniform where the likelihood function attains its maximum
is likely to imply such a posterior.”20
The priors on µi and each βjk are independent normal distributions with
means of zero and large variances, with the exception of βjmkt which has a prior
mean of one. The priors on Vi and V εj are independent gamma distributions
with shape parameters of one and large scale parameters.21 The prior on w =
[w1, ..., wN ] is a Dirichlet distribution with an N × 1 vector of ones as the
parameter.22 The prior on αj is already defined by (2). See Appendix A for
a detailed presentation of the prior and posterior distributions for all model
parameters.
2.5 Hierarchical structure
Having specified a complete Bayesian model in some detail, it may seem self-
defeating to select priors that convey no information. Such a view overlooks
the importance of the hierarchical structure of the model. The power of this
structure to influence posterior parameter estimates is not undone by arbitrarily
diffuse priors, as long as they are proper.23 Indeed, the chief justification for
choosing diffuse priors is to ensure that the estimation results are driven by the
model itself rather than any specific choice of priors.
If we altered the model to eliminate (2) and estimated (1) with diffuse priors,
this would be equivalent to OLS. However, the estimates generated by the full
model are radically different from those given by OLS—these differences must
be attributed to the hierarchical structure itself.
20pp. 61-63, emphasis added.21A more common choice would be inverse-gamma distributions, which have the virtue of
conjugacy. However, these are not consistent with condition (i) or Leamer’s rule of thumb,especially when the posterior parameter estimates are likely to be close to zero. See Gelman(2006).
22This results in a marginal prior for each wi that is beta distributed with parameters 1and N − 1. For the degenerate case where N = 2 that is employed here, this is simply theuniform distribution on the unit interval.
23Improper priors are another story, since they can effectively preclude the cross-sectionallearning that is integral to hierarchical model estimation.
8
3 Data
Mutual fund data are obtained from the CRSP Survivor-Bias-Free US Mutual
Fund Database, which begins in 1961. Monthly returns are net holding period
returns—all estimated alphas are after costs and management fees.24 This
takes the perspective of an investor considering allocating capital to active fund
managers, rather than that of a philosophical investigation of the existence of
skill. Alpha only matters to the extent that it exceeds costs.
Monthly factor returns are obtained from Kenneth French’s website via
WRDS. I use a four-factor model, consisting of the three Fama-French factors
and a momentum factor, throughout the analysis.
The CRSP data are incomplete in several key ways. Multiple share classes
of a single fund appear as separate funds in CRSP, with no foolproof way of
identifying and grouping them. Addressing this shortcoming is especially im-
portant in this analysis, since double-counting funds would impair estimation
of the cross-sectional distribution of skill. Furthermore, CRSP data regarding
fund objectives and fund managers are not comprehensive. Each of these is
addressed in turn.
3.1 Multiple share classes
The cross-sectional distribution of skill in (2) can only be estimated if each fund
in the sample represents a distinct draw from the distribution. Many mutual
funds have multiple share classes, and each class appears in the CRSP database
as a separate fund. To proceed with the analysis, these “funds” need to be
identified in order to avoid double-counting. Unfortunately, there is no explicit
share class variable in the database.
CRSP suggests matching “funds” that share the same portfolio of holdings,
but portfolio data are not available until 1998. The only alternative is to exam-
ine management company and fund names, matching as deemed appropriate.
This latter approach is employed here.
Funds without fund name data are consequently excluded from the analysis.
This affects approximately 10% of the funds at the beginning of the sample in
1961 but generally disallows less than 5% of all funds.
24To reconstruct gross returns, one should add back management expenses and tradingcosts. Expense ratios are available for most funds, so these could be added back to thereported net returns. However, trading costs would need to be estimated in order to computeapproximate gross returns. See, e.g., Wermers (2000) and French (2008).
9
02000000
4000000
6000000
8000000
1.0
0e+
07
Num
ber
of F
und N
am
e P
airs
0 20 40 60 80 100Levenshtein distance
Panel A: Entire Sample
05000
10000
15000
20000
25000
Num
ber
of F
und N
am
e P
airs
0 2 4 6 8 10Levenshtein distance
Panel B: Distance <= 10
Figure 1: Levenshtein distances of all pairs of fund names. Panel A shows theentire CRSP sample. The mode is quite large – between 40 and 50. Panel Bzooms in to show only distances less than or equal to 10, clearly indicating thepresence of a secondary mode at 1.
Fund names of the remaining funds are filtered to remove all non-alphabetic
characters and extra spaces. Based on the general pattern that share classes of
the same fund will have the same name followed by a suffix denoting the share
class, funds are then grouped according to their names’ pairwise Levenshtein
distances.25 Panel A of Figure 1 shows a histogram of all pairwise Levenshtein
distances for the entire sample of 20,000+ funds. The mode is between 40 and
50, and it appears that very few pairs of funds have similar names. Panel B
of Figure 1 shows the same histogram only for distances less than or equal to
10. Here we can see that there is a secondary mode for distances equal to 1.
Although most random pairs of funds have names that are nothing alike, some
pairs have names that are nearly identical.
Funds with Levenshtein distances less than 3 are assumed to be share classes
of a single actual fund. From each grouping of fund share classes, the class with
the most assets under management is retained to represent the fund. Figure 2
shows the total number of “funds” in the CRSP database, the number of funds
25The Levenshtein distance between two strings is the minimum number of edits neededto transform one string into the other, with the allowable edit operations being insertion,deletion, or substitution of a single character. See Levenshtein (1966).
10
01
00
00
20
00
03
00
00
Nu
mb
er
of
Fu
nd
s
1960 1970 1980 1990 2000 2010
Year
funds in CRSP database
funds with fund name data
funds after share class filter
Figure 2: Number of funds in the sample before and after the share class filter.Essentially all funds have fund names recorded in the database (dashed line),but the majority do not survive the share class filter (dotted line).
that have fund name data, and the number of true funds that survive the share
class filter.
3.2 Fund objectives
This analysis is confined to actively managed domestic equity mutual funds.
Fund names and Lipper class (where available) are screened to drop money
market, bond, balanced, and international funds. Index funds are identified
by screening fund names for explicit references to indexing. Closet indexers
cannot be identified in this way, and any such funds remain in the sample.
This corresponds to the challenge faced by investors in prospectively identifying
closet indexers. Furthermore, such funds will in general have estimated alphas
close to zero, less their trading costs and management fees. They are unlikely
to populate the highest or lowest deciles of ranked mutual funds, which are the
basis of the key results of this study.
11
01
00
02
00
03
00
04
00
0N
um
be
r o
f F
un
ds
1960 1970 1980 1990 2000 2010Date
distinct active domestic equity funds
distinct active domestic equity funds with manager name data
Figure 3: Effect of requiring fund manager information on sample size. Thesolid line represents the fund sample used in the rest of the analysis. Thesudden jump in the number of funds in 2007 corresponds to a change in thedata sources used by CRSP.
3.3 Fund managers
Data on fund managers in the CRSP database are not comprehensive. No
manager name data is available before 1998. Even after 1998, in many cases,
the fund manager field is missing or recorded as team.26 Utilizing individual
fund manager data is therefore not feasible without impoverishing the sample.
The full impact of requiring individual manager data is depicted in Figure 3.
One consequence of the lack of reliable manager data is that the funds them-
selves become the objects of analysis. Ideally, when a fund changes its portfolio
manager, the new prior on its alpha would depend on the new manager’s previ-
ous fund’s returns. Since the data do not allow managers to be tracked in this
way, I use funds’ returns as if no management changes have occurred.27 This
does not introduce any bias but does weaken the model’s predictive ability for
funds that have changed managers during the evaluation period.
26Massa et al. (2010) discuss why fund management companies may elect to have anony-mously managed funds.
27This approach assumes fund returns are generated according to (1) and depend only onthe fund manager. If the fund family makes a difference, then future returns are related topast returns even after a managerial change. Brown and Wu (2011) estimate such a model ina Bayesian setting.
12
4 Results
For each month, alphas are estimated for all funds with at least 12 monthly
observations during the 2-year window [t− 23, t]. Funds are evaluated using
each of three methods: OLS, hierarchical Bayes with a mixture-of-normals prior,
and hierarchical Bayes with a “standard” normal prior.28 All funds are ranked
according to their estimated alphas and assigned to deciles. Funds that are
missing during month t+ 2 are dropped; equally-weighted portfolios are formed
from the remaining funds.29 These portfolios are held for one month and then
rebalanced according to the updated alpha estimates.
Following Busse and Irvine (2006), the alpha of each of these decile portfolios
is estimated using standard OLS regressions. When estimating the alpha of a
portfolio of funds, there is likely to be little difference between the OLS measure
and any reasonable Bayesian measure based on the same model. Using the OLS
post-ranking-period measure avoids any bias that could result from using the
Bayesian measure for both the ranking-period and the post-ranking-period.
4.1 Parameters of the mixture-of-normals distribution
Posterior means and 95% confidence intervals of the parameters in (2) are shown
in Figure 4 and Figure 5. Each rolling estimation window is 24 months long, so
the estimates depicted for month t correspond to fund performance during the
window [t− 23, t].
Figure 4 shows posterior estimates of both mixture component means, µ1
and µ2, and the weight on the second mixture component, w2. By construction,
and without loss of generality, the mean of the second component is greater than
or equal to the mean of the first component. Panel A shows that µ1 is generally
indistinguishable from zero, although it does diverge from zero towards the end
of the sample – positively during the very early 2000s and negatively since
then. Panel B shows that µ2 follows a pattern similar to µ1, essentially always
statistically equal to zero. Panel C shows that the proportion of managers drawn
from the second, high skill, component has always been low and has steadily
decreased over time – falling from 20% at the start of the sample to 10% at the
end.
28This last method follows Jones and Shanken (2005) and allows a direct assessment of thevalue of using a sufficiently flexible prior distribution.
29A one-month gap is introduced between the ranking-period and the post-ranking-periodto eliminate spurious microstructure-induced correlations.
13
Figure 5 shows posterior estimates of the standard deviations of both mixture
components. Far more managers are drawn from the first component, so the
confidence intervals on√V1 are much tighter than those on
√V2. Nevertheless,
it is clear that√V2 is an order of magnitude larger than
√V1.
The overall distribution has substantial excess kurtosis because the indi-
vidual variances are unequal; in this case V2 � V1. This, plus the fact that
µ2 > µ1 gives rise to positive skewness as well. Both findings are consistent
with the economics of superstars.
Although one virtue of the mixture-of-normals distribution was the poten-
tially straightforward interpretation of the components as good and bad man-
ager types, the high dispersion of the second mixture component suggests a
similar but more nuanced interpretation. Managers with extremely high or low
relative alphas are likely to be drawn from the second mixture component, con-
sistent with positively correlated alpha and risk-taking: managers drawn from
this high-alpha, high-variance component are taking aggressive risks that pay
off on average but that expose their funds to significant losses.
4.2 Comparison of skill distribution moments, by prior
Posterior moments of the distribution of skill for all three “priors” are shown in
Figure 6.30 Moments for both hierarchical Bayes priors—normal and mixture-
of-normals—are computed directly from their posterior parameter estimates.
Moments for the OLS distribution are computed from the estimated fund alphas.
Panel A compares the estimated means of the prevailing distribution of skill
from OLS and the mixture model. The estimated mean from the normal hi-
erarchical model is nearly identical to that of the mixture model and is not
depicted. The posterior mean from the mixture model behaves as we might
expect: it generally follows the OLS mean but shrinkage mitigates the larger
fluctuations.
Panel B compares the estimated standard deviations of the prevailing dis-
tribution of skill. The susceptibility of OLS to the influence of outliers is clear.
The normal hierarchical model generates the lowest estimates—shrinkage in
its strongest form. In between lies the estimated standard deviation from the
30OLS may not be an explicit Bayesian prior, but OLS estimates will exactly match Bayesianestimates if the priors are dogmatically diffuse; e.g. normal distributions with infinite vari-ances. If such strictly uninformative, improper priors are specified, there can be no shrinkageand the posterior alpha estimates will depend only on the likelihood. Thus, Bayesian estima-tion would yield the maximum-likelihood estimates for a regression model such as (1), whichare equivalent to the OLS estimates.
14
−.4
−.2
0.2
%,
mo
nth
ly
1960m1 1970m1 1980m1 1990m1 2000m1 2010m1
Panel A: Mixture Component 1 Mean−
.50
.51
1.5
%,
mo
nth
ly
1960m1 1970m1 1980m1 1990m1 2000m1 2010m1
Panel B: Mixture Component 2 Mean
0.1
.2.3
.4p
rop
ort
ion
1960m1 1970m1 1980m1 1990m1 2000m1 2010m1
Panel C: Weight on Mixture Component 2
Figure 4: Rolling posterior estimates and 95% confidence intervals for the meansof both mixture components and the weight on the second mixture componentin (2). The mean of the second component is constrained to be higher than themean of the second component.
15
0.1
.2.3
.4%
, m
on
thly
1960m1 1970m1 1980m1 1990m1 2000m1 2010m1
Panel A: Mixture Component 1 Standard Deviation0
24
6%
, m
on
thly
1960m1 1970m1 1980m1 1990m1 2000m1 2010m1
Panel B: Mixture Component 2 Standard Deviation
Figure 5: Rolling posterior estimates and 95% confidence intervals for the stan-dard deviations of both mixture components in (2).
16
−.4−.20.2.4 19
60
m1
19
70
m1
19
80
m1
19
90
m1
20
00
m1
20
10
m1
OL
SM
ixtu
re
Panel A
: M
eans
0.511.52 19
60
m1
19
70
m1
19
80
m1
19
90
m1
20
00
m1
20
10
m1
OL
SN
orm
al
Mix
ture
Panel B
: S
tandard
Devia
tions
−505 19
60
m1
19
70
m1
19
80
m1
19
90
m1
20
00
m1
20
10
m1
OL
SM
ixtu
re
Panel C
: S
kew
ness
01020304050 19
60
m1
19
70
m1
19
80
m1
19
90
m1
20
00
m1
20
10
m1
OL
SM
ixtu
re
Panel D
: K
urt
osis
Fig
ure
6:C
omp
aris
onof
the
mom
ents
ofth
ed
istr
ibu
tion
of
skil
lov
erti
me,
by
mod
el.
InP
an
elA
,th
em
ean
for
anorm
al
pri
oris
alm
ost
iden
tica
lto
that
for
am
ixtu
reof
norm
als
,an
dis
om
itte
dfo
rth
esa
keof
clari
ty.
InP
an
elC
,all
skew
nes
sva
lues
wit
hm
agn
itu
de
grea
ter
than
5ar
ese
tto
+5
or-5
,as
ap
pro
pri
ate
,to
pre
vent
ou
tlie
rsfr
om
dom
inati
ng
the
figu
re.
Sim
ilarl
y,in
Pan
elD
,al
lku
rtos
isva
lues
grea
ter
than
50are
set
to50.
17
mixture hierarchical model. This model is flexible enough to take account of
extreme alphas without being dominated by them.
Panel C compares the estimated skewness of the prevailing distribution of
skill from OLS and the mixture model. The normal model, of course, is con-
strained to have zero skewness. OLS cannot decide whether skill is positively or
negatively skewed. Indeed, the series is so volatile and the excursions from zero
are so large (in both directions) that it cannot possibly correspond to reality.
The mixture model produces relatively stable and comprehensible estimates of
mild positive skewness.
Panel D tells a similar story via the kurtosis estimates. The normal model
is constrained to have a kurtosis of three, while both OLS and the mixture
model “agree” that kurtosis is likely to be much higher—although they diverge
regarding how much higher. Here is clear evidence that the default Bayesian
approach, with a normal prior on alpha, over-shrinks the estimated alphas in
this case. Fitting to a normal distribution assumes away large outliers, while a
more flexible model can learn from them.
4.3 Estimated cross-sectional distribution of alpha
The moments of the distribution of alpha are revelatory, but the distribution
itself completes the story. The overall distribution of skill is shown in Figure
7. This view makes the distribution appear normal, but the pivotal deviations
from normality are in the tails.
Understanding the tail behavior requires an examination of the individual
mixture components. Both components, scaled to reflect their relative weights,
are shown in Figure 8. The two components differ dramatically in terms of
relative weighting and dispersion. Component 1 represents approximately 85%
of all managers and requires that alpha is fairly close to zero. Component 2,
representing the remaining 15% of fund managers, allows for extreme alphas in
both directions.
Figure 9 illustrates this extreme tail behavior. Although component 1 assigns
approximately zero probability to alphas larger than 0.6% monthly, component
2 assigns small positive probabilities to alphas as high as 3% monthly. Rela-
tive to a normal model of alpha, which completely discounts the possibility of
“superstar” fund managers capable of dramatic outperformance, the mixture-
of-normals allows the upper echelon of managers to exhibit superior alphas.
18
-1.0 -0.5 0.0 0.5 1.0
0.0
0.5
1.0
1.5
2.0
Α
pHΑL
Figure 7: Overall estimated cross-sectional distribution of alpha from 2004,summing the individual components shown in Figure 8. Alpha is monthly andmeasured in percent. The distribution appears generally normal in this zoomed-out view, since the weight on mixture component 1 is much larger than theweight on mixture component 2. A closer examination of the tails, depicted inFigure 9, reveals pronounced non-normality.
-1.0 -0.5 0.0 0.5 1.0
0.0
0.1
0.2
0.3
0.4
Α
pHΑL
Figure 8: Estimated mixture components from 2004, chosen to be representa-tive. Component 1 is depicted by the solid line, while the higher-mean com-ponent 2 is depicted by the dashed line. Alpha is monthly and measured inpercent. Component 2 has a slightly higher mean and a much higher variancethan component 1.
19
0.5 1.0 1.5 2.0 2.5 3.0
0.000
0.005
0.010
0.015
0.020
0.025
0.030
0.035
Α
pHΑL
Figure 9: Tail behavior in the cross-sectional distribution of skill from 2004—azoomed-in view of the distributions in Figure 8. Component 1 is depicted bythe solid line, while the higher-mean component 2 is depicted by the dashedline. Alpha is monthly and measured in percent. The probability of observingextremely high alphas is several orders of magnitude higher than it would be ina model that imposed normality on alpha, a result of allowing for the secondcomponent.
4.4 Decile portfolio performance
Decile portfolios are formed using all three ranking methods and their abnormal
returns are estimated using OLS. The power of the mixture-of-normals prior is
made clear in this test. Numerical results are given in Table 1 and shown
graphically in Figure 10.
Even OLS reveals substantial short-term predictability. The worst funds
(deciles 1 and 2) continue to under-perform and the best funds (deciles 9 and
10) continue to outperform. Carhart documented persistent poor performance
over a longer time frame, attributing it to high fees. Bollen and Busse, among
others, have more recently documented persistent outperformance in the short
term using OLS. Thus, the results from the OLS decile portfolios are just as
expected.
Results using rankings from the normal hierarchical model are telling—they
are substantially weaker than those from OLS. The performance of the bottom
decile portfolio is better than the OLS bottom decile portfolio, and the perfor-
mance of the top decile is worse. The normal prior does a worse job of sorting
20
Out-of-Sample Monthly Alphas of Decile Portfolios
Decile OLS Normal Prior Mixture Prior
1 -0.21* -0.14* -0.51*(0.05) (0.04) (0.03)
2 -0.12* -0.16* -0.37*(0.04) (0.04) (0.04)
3 -0.07 -0.09* -0.34*(0.04) (0.04) (0.04)
4 -0.05 -0.07 -0.18*(0.03) (0.04) (0.04)
5 -0.03 -0.02 -0.10*(0.03) (0.04) (0.04)
6 0.00 0.04 0.25*(0.03) (0.08) (0.05)
7 0.03 0.08 0.25*(0.03) (0.05) (0.05)
8 0.06 0.06 0.32*(0.04) (0.04) (0.04)
9 0.10* 0.15* 0.36*(0.04) (0.05) (0.04)
10 0.27* 0.15* 0.45*(0.07) (0.04) (0.04)
Table 1: Out-of-sample monthly alphas of decile portfolio by ranking method,1961-2010. At the end of each month t, alphas are estimated for each mutualfund using each method and the past 24 monthly returns. Funds are ranked,and equally-weighted decile portfolios are formed and held during month t+ 2.Alphas for these portfolios are estimated using OLS, with bootstrapped standarderrors.
21
−.5
0.5
%,
mo
nth
ly
0 2 4 6 8 10decile
Panel A: OLS
−.5
0.5
%,
mo
nth
ly
0 2 4 6 8 10decile
Panel B: Normal
−.5
0.5
%,
mo
nth
ly
0 2 4 6 8 10decile
Panel C: Mixture
Figure 10: Out-of-sample OLS alphas and 95% confidence intervals for decileportfolios, by ranking method, 1961-2010. Data are taken from Table 1. PanelA shows decile portfolios formed from funds ranked on their OLS alphas. PanelB shows decile portfolios formed from funds ranked on their hierarchical Bayesalphas with a normal prior. Panel C shows decile portfolios formed from fundsranked on their hierarchical Bayes alphas with a mixture-of-normals prior.
22
funds. There is valuable information in the extreme alpha estimates that the
normal model is forced to disregard.
The results from the mixture hierarchical model are striking: all portfolios
have statistically significant alphas, decile 1 under-performs OLS decile 1 by 30
basis points and decile 10 outperforms OLS decile 10 by 18 basis points. When
annualized, these translate to 360 basis points and 216 basis points, respectively.
The differences relative to the normal hierarchical estimates are even starker.
In absolute terms, the mixture decile 1 portfolio posts an annual alpha of -6.1%
and the mixture decile 10 portfolio posts an annual alpha of +5.4% during the
fifty-year period 1961-2010. The mixture bottom decile portfolio is more than
twice as bad as the OLS bottom decile portfolio, while the mixture top decile
portfolio is nearly twice as good as the OLS top decile portfolio.
4.5 Fund-by-fund alpha correlations
The mixture model generates superior portfolios of funds; its fund-level esti-
mates are better as well. Using the same two-year estimation windows, we can
compare the estimated fund alphas for the interval [t− 23, t] to the estimated
alphas for the non-overlapping interval [t+ 2, t+ 25].31 Let the vector of esti-
mated fund alphas generated by a particular model k on the interval [ta, tb] be
denoted by αk[ta,tb], and define the correlation of the alphas on the first interval
with the alphas on the second interval to be
ρk1,k2t = correlation(αk1[t−23,t], α
k2[t+2,t+25]
). (3)
Funds that do not appear in both intervals are dropped before the correlation
is computed. I expect that the typical fund used in this part of the analysis is
consequently better than average. Even if the level of alpha is biased upwards,
this does not imply that the serial correlation is biased.
When comparing the performance of two models, k1 and k2, in predicting
the estimates generated by model k3, it is also useful to define:
k1,k3∆k2,k3t = ρk1,k3t − ρk2,k3t . (4)
31By evaluating the estimation methods in this way, I assume that the short-term pre-dictability in fund alphas is not completely offset by any un-modeled longer-term negativecorrelation.
23
Panel A: Serial Correlation of Estimated Fund Alphas, by Method
ρOLS,OLSt ρNormal,Normalt ρMixture,Mixturet ρNormal,OLSt ρMixture,OLS
t
0.141* 0.116* 0.171* 0.104* 0.159*(0.006) (0.005) (0.005) (0.004) (0.005)
Panel B: Differences in Serial Correlations of Estimated Fund Alphas
Normal,Normal∆OLS,OLSt
Mixture,Mixture∆OLS,OLSt
Mixture,OLS∆OLS,OLSt
-0.025* 0.031* 0.019*(0.006) (0.004) (0.004)
Table 2: In Panel A, serial correlations of estimated fund alphas are com-puted for rolling, non-overlapping, two-year intervals, 1961-2010. In the lasttwo columns, the hierarchical models are used to predict OLS alphas. In PanelB, differences in these correlations are reported.
Results are shown in Table 2.32 From the first three columns in Panel
A, we see that all alphas are positively serially correlated, consistent with the
predictability suggested by the performance of the decile portfolios. The last two
columns show the correlations of alphas generated by the hierarchical models in
the first period with the OLS alphas from the second period. These are positive
as well, and indicate that the positive serial correlations are not artifacts of the
Bayesian procedure.
Panel B of Table 2 shows the differences in the correlations. In the second
column, we can see that the mixture model’s estimates are indeed more strongly
serially correlated than the OLS estimates. Most critically, the third column
shows that the mixture model predicts the OLS estimates better than OLS
itself.
5 Discussion and Implications
The persistent performance of the decile portfolios is impossible to fully reconcile
with the rational model of Berk and Green (2004). In their model, managers
can possess investing skill but do not exhibit persistent non-zero alphas due to
convexities in the costs of managing their funds. As funds grow in size, positive
alpha becomes increasingly difficult to generate. In equilibrium all funds have
32The same analysis was also performed using rank correlations with essentially identicalresults.
24
zero alphas in expectation, but vary in size according to their managers’ varying
levels of skill. This equilibrium requires investors to evaluate managers in the
context of a known cross-sectional distribution of alpha, and to dynamically
re-allocate capital across funds in each period.
In contrast to the equilibrium predictions of Berk and Green, I find markedly
persistent performance in the short-term. At least one of their assumptions
must not hold. Costs may not be convex. Reuter and Zitzewitz (2010) use
fund flows associated with discrete changes in Morningstar fund ratings to show
that their effect on fund performance is inconsistent with the Berk and Green
hypothesis. There may be frictions associated with capital re-allocation that
prevent investors from responding immediately to new estimates of manager
skill.
The most indisputable deviation from the Berk and Green world, however,
is that investors do not know the true cross-sectional distribution of alpha. In
reality, as in the model presented here, this distribution and the individual fund
alphas must be estimated simultaneously. Such an analysis requires MCMC
methods—somewhat beyond the capabilities of the typical mutual fund investor,
or even the typical financial advisor. The sheer difficulty of the evaluation is
sufficient to cause slow learning, even among competent and vigilant investors.
Even if all of the other assumptions of Berk and Green hold, this slow learning
implies the short-term persistence of alphas that I document.
Flow predictability regressions presented below show that investors do re-
spond to the information contained in the Bayesian mixture alphas. Whatever
heuristics they employ to evaluate funds must incorporate the intuition of my
model if not its details.
5.1 Convexity of fund flows
Fund flows have a convex relationship with past fund performance. Outflows in
response to poor performance are less pronounced than the inflows in response
to good performance.33 Lynch and Musto (2003) offer an explanation: if poor
performance makes changes in investment strategy more likely, then poor per-
formance is less informative than good performance regarding future returns.
Berk and Green (2004) also predict a convex flow-performance relationship—a
consequence of the convex costs of management in their model.34
33See, e.g., Sirri and Tufano (1998) and Chevalier and Ellison (1997).34See Figure 3 in Berk and Green (2004).
25
Another possibility is that the optimal updating rule for posterior skill es-
timates is itself convex. While a normal prior (along with a normal likelihood)
gives rise to a linear updating rule, the mixture prior proposed here generates
an updating rule that is nonlinear for observed returns that are not extremely
far from zero, and convex over the region of greatest interest.35
Assume that the true cross-sectional distribution of skill is a mixture-of-
normals, as in (2). For comparison, I also consider a normal prior on skill. To
obtain the best approximation, the mean and variance of the normal prior are
set to match the mean and variance of the mixture distribution:
µ = (1− w2) · µ1 + w2 · µ2 (5)
and
V = (1− w2) · V1 + w2 · V2 + w2 · (1− w2) · (µ1 − µ2)2
. (6)
Assume that fund returns are simply managerial skill plus Gaussian distur-
bances: a simplified version of the model in (1) with no risk factors.
After observing a single return R, investors update their estimates of man-
agerial skill. The posterior expectation of skill under the normal prior is:
Enormal [α] =
(µ
V+R
Vε
)·(
1
V+
1
Vε
)−1
. (7)
The posterior expectation of skill under the mixture prior is:
Emixture [α] =
N∑wi·
i=1
(µiVi
+R
Vε
)·(
1
Vi+
1
Vε
)−1
, (8)
where
wi =wi√Vi + Vε
· exp
[−1
2· (R− µi)2
Vi + Vε
]. (9)
To make this more concrete, the posterior expectations, (7) and (8), of alpha
corresponding to each of these priors after a single return R are shown in Figure
11. Here we can easily see how sensitive the posterior beliefs are to the investor’s
prior, even when both possible priors have matching first and second moments.
35The details depend on the model parameters, but the intuition is that extreme obser-vations are almost certainly drawn from the mixture component with the highest variance.Since this component is normal, the updating rule has a linear asymptote.
26
-4 -2 0 2 4
-1.0
-0.5
0.0
0.5
1.0
1.5
2.0
R
Α
`
Posterior Estimates
Figure 11: Updating rules for managerial skill with a mixture-of-normals prior(dashed line) and a normal prior with matching first and second moments (solidline). R is a monthly return measured in percent. The parameters of the mixturedistribution are: µ1 = −1, µ2 = 0, V1 = 0.01, V2 = 1, and w2 = 0.1. Pluggingthese into (5) and (6), we get the following parameters for the “matching”normal distribution: µ = −0.9 and V = 0.2. The updating rule for the normalprior is linear. The updating rule for the mixture prior has a linear asymptotebut is nonlinear for non-extreme returns, and is generally convex for monthlyreturns between -4 and +4 percent.
27
The updating rule for the mixture prior is generally convex for non-extreme
monthly returns. If the relationship between skill estimates and observed returns
is convex, and the relationship between fund flows and skill estimates is not too
concave, we should expect to observe convex flows.
The mixture-of-normals prior is not special in this regard. Any non-normal
prior will generate a nonlinear updating rule. A more flexible prior than the
two-component mixture should be able to match the observed flow-performance
relationship even more closely.
5.2 Predictability of fund flows
This paper describes a method of accurately estimating mutual fund alphas
that fully exploits the information contained in the cross-section of funds. It is
natural to ask what resemblance this bears to the methods investors actually use
to evaluate funds. Although their methods are not observable, their conclusions
are revealed by the fund flow data.
Results for a series of fund flow predictability regressions are shown in Table
3. The dependent variable is the flow ratio, defined as the net fund flow divided
by the fund’s total assets under management in the previous month. Regression
(1) is patterned after the baseline regression model of Sirri and Tufano (1998).
This shows that investors are highly responsive to past returns, although they
are more responsive to high returns than to low returns—the effect of being in
the top quintile of past returns is positive and significant even after controlling
for average past returns, Rt −Rmkt. This convexity is a robust feature of mutual
fund flows.
Regression (2) adds the lagged OLS alpha as a predictor, while regression
(3) adds the lagged Bayesian mixture alpha. Regression (4) includes them both.
Regression (2) shows that investors are indeed responsive to the information in
OLS alphas. This information does not subsume the effect of past returns, but
the coefficient on Rt −Rmkt is somewhat closer to zero. Regression (3) shows
that investors are also responsive to the information in Bayesian mixture alphas.
In regression (4), we see that both the OLS alphas and the Bayesian mix-
ture alphas predict fund flows in a statistically significant manner. Although the
point estimate of the coefficient on the Bayesian mixture alpha is higher than
that of the OLS alpha, 0.33 versus 0.11, this difference is not significant. The
key finding is that the information contained in the Bayesian mixture alphas, in-
corporating knowledge of the skewness and excess kurtosis in the cross-sectional
28
distribution of alpha, influences investors’ capital allocations, beyond the infor-
mation contained in OLS alphas and directly in past returns. Although not
statistically significant, it also appears that including the Bayesian mixture al-
phas, in regressions (3) and (4), decreases the convexity of fund flows, consistent
with the theoretical predictions of section 5.1.
5.3 Smart money effect
A general finding in the literature is that mutual fund flows help to predict
mutual fund performance.36 The “smart money” effect refers to the interpreta-
tion that investors are actually able to anticipate future fund performance and
allocate capital accordingly, but this is not the only interpretation.37 One way
to assess the “smartness” of investors is to examine the incremental impact that
estimated alphas have for fund return prediction, controlling for fund flows.
Table 4 presents estimates from a series of regressions seeking to explain
fund returns in terms of past flows and returns. Regression (3) shows that past
flows do predict fund returns; adding past returns in regression (4) weakens
the predictive power of flows. As expected, regressions (1) and (2) show that
including lagged OLS and Bayesian mixture alphas does improve fund return
prediction.
The most interesting results are from regression (5), which includes past flows
and both OLS and Bayesian mixture alphas. The coefficient on lagged flow ratio
remains significant—flows are predictive beyond the information contained in
estimated alphas. Had the coefficients on lagged flows all been insignificant, this
would have implied that alphas predict fund returns because both are driven
by managerial skill: flows are truly smart money. The failure of this test does
not imply that investors are not smart, however, since this is a joint test of the
accuracy of the estimated alphas and the smart money effect. But the strength
of the return predictability documented in section 4.4 and the relative weakness
of the flow predictability documented in section 5.2 suggest a mechanism other
than “smart money” for the predictability of fund returns by fund flows.
Also noteworthy is the coefficient on lagged OLS alphas in regression (5): it
is negative and significant. Controlling for Bayesian mixture alphas, future fund
returns are higher when OLS alphas are lower, and vice versa. This is a clear
36See, e.g., Gruber (1996) and Zheng (1999).37For example, Lou (2011) presents evidence that fund return predictability is the conse-
quence of flow-induced price pressure on underlying stock holdings.
29
Fund Flow Predictability, Fama-MacBeth Regressions
Flow Ratio (1) (2) (3) (4)
Top Return Quintile 0.62* 0.50* 0.38* 0.43*(0.17) (0.16) (0.16) (0.16)
Bottom Return Quintile 0.10 0.21 0.26 0.39(0.17) (0.18) (0.17) (0.27)
Rt −Rmkt (%) 2.45* 1.99* 2.14* 2.16*(0.24) (0.27) (0.24) (0.35)
σ(Rt −Rmkt) (%) -0.04 -0.04 -0.04 -0.06(0.03) (0.04) (0.03) (0.04)
Lagged Expense Ratio (%) -0.15 -0.13 -0.11 -0.09(0.18) (0.17) (0.18) (0.18)
Total Flow into Fund Category ($M) 0.00 0.00 0.00 0.00(0.00) (0.00) (0.00) (0.00)
Lagged Log(Assets Under Mgmt. ($M)) 0.36* -0.38* -0.38* -0.37*(0.05) (0.05) (0.05) (0.05)
Lagged OLS α (%) 0.13* 0.11*(0.02) (0.05)
Lagged Bayesian Mixture α (%) 0.65* 0.33*(0.06) (0.14)
Constant 0.95 1.35* 1.67* 1.40*(0.56) (0.58) (0.60) (0.64)
R2 0.09 0.09 0.09 0.10
Table 3: Results for Fama-MacBeth regressions predicting flow ratio, definedas this month’s net flows divided by last month’s total assets under manage-ment; 1963-2010. Regression (1) is based directly on the baseline regression inSirri and Tufano (1998), and shows both the responsiveness of flows to raw pastperformance, as well as the convexity of fund flows. (2) adds OLS alphas as apredictor, while (3) adds the Bayesian mixture alphas. (4) includes both alphasas predictors. Although overall R2 does not change dramatically, the stronglysignificant coefficients on the Bayesian mixture alphas indicate that they con-tain information not contained in the other predictors, and that investors aresensitive to this information.
30
0.2
.4.6
.8fr
action o
f m
anagers
with p
ositiv
e s
kill
1960m1 1970m1 1980m1 1990m1 2000m1 2010m1
Figure 12: Estimated fraction of managers with positive alpha using theBayesian mixture model.
illustration of how poorly OLS distinguishes signal from noise in the context of
investment performance evaluation.
5.4 Value of active management
Even if some fund managers possess skill, investors do not benefit unless man-
agers generate positive alpha after management fees and trading costs. Since
all alphas in this study are estimated using net fund returns to the investor,
we can directly evaluate the prevalence of skill over time. Panel A of Figure 6
shows the overall mean of the prevailing distribution of skill—it does not diverge
substantially from the implied OLS mean—which seems to lend support to the
Berk and Green hypothesis: the typical fund has an alpha quite close to zero.
Looking at the mean alone is misleading, however. Figure 7 shows that the
prevailing distribution of skill also has fat tails and positive skewness. Instead
of simply examining the mean, it is more informative to compute the fraction
of managers with positive alpha directly from the estimated parameters of the
prevailing cross-sectional distribution. The results are shown in Figure 12.
Barras et al. (2010) classify funds as either unskilled, zero-alpha, or skilled,
and find that 75% of funds are zero-alpha while less than 5% are skilled.38
This paper uses fund-level data to estimate the underlying distribution—sharply
separating the negative and positive regions of skill. There is no need for a
38See Figure 4 in Barras et al. (2010) for a comparison to Figure 12. Their estimatedfraction of skill is both much lower and smoother than the estimates presented here.
31
“Smart Money” Effect, Fama-MacBeth Regressions
Monthly Fund Return (%) (1) (2) (3) (4) (5)
Lagged OLS α (%) 0.19* -0.15*(0.01) (0.01)
Lagged Bayesian Mixture α (%) 0.11* 0.11*(0.01) (0.01)
Rt −Rmkt (%) 0.24* 0.17*(0.01) (0.01)
Flow Ratio in month t−1 0.002* 0.001* 0.001*(0.00) (0.00) (0.00)
Flow Ratio in month t−2 0.001* 0.00 0.00(0.00) (0.00) (0.00)
Flow Ratio in month t−3 0.00 0.00 0.00(0.00) (0.00) (0.00)
Constant 0.45* 0.53* 0.49* 0.38* 0.49*(0.09) (0.09) (0.01) (0.01) (0.01)
R2 0.02 0.02 0.01 0.02 0.02
Table 4: Results for Fama-MacBeth regressions predicting monthly fund re-turns; 1963-2010. (1) and (2) show the predictive power of OLS and Bayesianmixture alphas, respectively. (3) shows that flows do help to predict fund per-formance. They continue to predict performance when (4) past returns areincluded, and (5) when OLS and Bayesian mixture alphas are included. Thisevidence, combined with the results from sections 4.4 and 5.2, suggests that“smart money” is not the principal mechanism explaining why fund flows pre-dict fund returns.
32
zero-alpha category unless there is a point mass at zero in the underlying skill
distribution. Kosowski et al. (2006) use a bootstrap approach to compute a
continuous measure of skill, and find that approximately 5% of managers show
evidence of skill net of costs.
The most recent estimates in Figure 12 suggest that approximately 20% of
managers have skill, although the fraction has been much higher in the past.
The volatility of the estimates over time and the magnitude of the maximum
estimates rightly suggest that the distribution has not been perfectly estimated.
Nevertheless, the decile portfolios performed well: an imperfectly estimated
distribution still greatly assists in the proper ordering of funds.
Even if sizeable fractions of managers possess skill after costs and fees, we
would like to know if the active mutual fund industry is providing value in
aggregate. Fama and French (2010) employ a bootstrap methodology and find
that the aggregate portfolio of mutual funds has a negative net alpha, although
there is evidence of superior skill in the extreme upper tail of the cross-sectional
distribution of skill.
This study used only one representative share class from each fund. To
construct a comparable net industry-wide alpha, we would need to take into
account all share classes of each fund, their expense ratios, and assets under
management, following French (2008). This has not been done, so the impor-
tant question of whether the industry provides value to investors remains open.
Even still, it is safe to speculate that such an analysis would have conclusions
somewhat more favorable to the mutual fund industry than French, given the
higher fraction of skilled managers found here.
6 Conclusion
The model explored in this paper is deliberately kept simple to focus atten-
tion on the sensitivity of alpha estimates to the assumptions made regarding
the cross-section of skill. Existing methodologies ignore the conclusions of the
“superstars” literature: the returns to investing skill are likely to be positively
skewed and fat-tailed. An optimal approach to performance evaluation should
take this into account and explicitly model the cross-sectional distribution of
alpha accordingly.
The hierarchical model I propose is such an approach. While it does not
impose skewness or excess kurtosis, it “learns” from the data that such features
33
are indeed present. Despite the simplicity of the mixture-of-normals distribu-
tion, the resulting estimates for fund alphas exhibit strong predictability across
the entire population of funds—it is not merely the very best and the very worst
funds whose performance persists.
In addition to accurately measuring fund alphas, this model also provides a
possible explanation for the convexity of fund flows. Any non-normal prior on
the distribution of alpha will lead to a nonlinear updating rule: high observed
returns can exert much greater influence than low observed returns on the pos-
terior mean of a given fund’s alpha. The convexity of fund flows could thus be
a rational response to the non-normality of alpha.
34
A MCMC Algorithm
This section provides details about the Bayesian inference procedure, includ-
ing the priors on all parameters and a step-by-step summary of the Markov
chain Monte Carlo (MCMC) algorithm used to generate samples from the joint
posterior distribution of the parameters.
A.1 Parameters to be estimated(µi, Vi, wi, α
j , βjk, Vjε
)for i = 1, ..., N ; j = 1, ..., J ; k = 1, ...,K; where N is the
number of mixture components in (2), J is the number of funds, and K is the
number of risk factors in (1).
A.2 Latent indicator
To simplify the estimation procedure, I introduce a latent J × 1 vector z, where
each element is Bernoulli distributed,39
zj ∼ Bernoulli (w2) . (10)
The conditional (on zj) prior on skill is now normal, instead of the original
mixture of normals in (2):
αj ∼ N(µzj+1, Vzj+1
). (11)
A.3 Priors
All priors are proper but diffuse. They are scaled to correspond to the monthly
percent returns used in the analysis. For example, an annual alpha of 12%
would appear here as 1.
µ1 ∼ N(apriorµ = 0, Apriorµ = 10
)(12)
µ2 ∼ N(apriorµ = 0, Apriorµ = 10
)· 1µ2≥µ1
(13)
Vi ∼ Gamma (1, 10) (14)
39In the general case with more than two mixture components, zj is categorically distributedwith vector parameter w.
35
w ∼ Dirichlet(πprior =
(1
1
))(15)
βjmkt ∼ N(apriorβmkt
= 1, Apriorβ = 5)
(16)
βjk ∼ N(apriorβ = 0, Apriorβ = 5
)(17)
V jε ∼ Gamma (1, 5) (18)
A.4 Sampling algorithm
1. Draw z conditional on w, µi, Vi, α.
• Compute (vector) probabilities ω conditional on each mixture compo-
nent:
ωi =1√Vi· φ(α− µi√
Vi
). (19)
• Compute (vector) unconditional posterior probability:
p2 =ω2
ω1 + ω2. (20)
• Draw z ∼ Bernoulli (p2).
2. Partition α conditional on z.
• Partition α into two vectors, α1 and α2, corresponding to those el-
ements of α from each of the mixture components, and let Ni =
rows(αi).
3. Draw Vi conditional on z, µi, αi.
• Because of the gamma prior, the posterior on Vi is not a known
distribution and is sampled using a random walk Metropolis-Hastings
draw.
4. Draw µi conditional on Vi, αi.
36
• Compute posterior hyper-parameters for each mixture component:
Aposteriorµ,i =
(1
Apriorµ
+NiVi
)−1
(21)
aposteriorµ,i
Aposteriorµ,i
=apriorµ
Apriorµ
+
Ni∑n=1
αi,n
Vi(22)
• Draw µi ∼ N(aposteriorµ,i , Aposteriorµ,i
).
5. Impose identifying constraint µ2 ≥ µ1.
• If the most recent draws of µi violate this constraint, revert to the
previous values.
6. Draw w conditional on z.
• Compute posterior vector hyper-parameter:
πposterior = πprior +
(N1
N2
). (23)
• Draw w ∼ Dirichlet(πposterior
).
7. Draw V jε conditional on Rj , F, αj , βj .
• Because of the gamma prior, the posterior on V jε is not a known
distribution and is sampled using a random walk Metropolis-Hastings
draw.
8. Draw βj conditional on Rj , F, αj , V jε
• Compute posterior hyper-parameters for each βjk:
Aposteriorβ,j,k =
(1
Apriorβ,k
+F
′
k · FkV jε
)−1
(24)
aposteriorβ,j,k
Aposteriorβ,j,k
=apriorβ,k
Apriorβ,k
+
F′
k ·Rj − αj ·T∑t=1
Fk,t − F′
k · F−k · β−k
V jε(25)
37
• Draw βjk ∼ N(aposteriorβ,j,k , Aposteriorβ,j,k
).
9. Draw αj conditional on zj , µzj+1, Vzj+1, βj , V jε , F,R
j .
• Compute posterior hyper-parameters for each αj :
Aposteriorα,j =
(1
Vzj+1+
T
V jε
)−1
(26)
aposteriorα,j
Aposteriorα,j
=µzj+1
Vzj+1+
T∑t=1
(Rjt − Ft · βj
)V jε
(27)
• Draw αj ∼ N(aposteriorα,j , Aposteriorα,j
).
References
[1] Baks, Klaas P., Andrew Metrick, and Jessica Wachter, 2001, Should In-
vestors Avoid All Actively Managed Mutual Funds? A Study in Bayesian
Performance Evaluation, Journal of Finance 56, 45-85.
[2] Barras, Laurent, Oliver Scaillet, and Russ Wermers, 2010, False Discov-
eries in Mutual Fund Performance: Measuring Luck in Estimated Alphas,
Journal of Finance 65, 179-216.
[3] Berk, Jonathan B. and Richard C. Green, 2004, Mutual Fund Flows and
Performance in Rational Markets, Journal of Political Economy 112, 1269-
1295.
[4] Bollen, Nicolas P. B. and Jeffrey A. Busse, 2004, Short-Term Persistence
in Mutual Fund Performance, Review of Financial Studies 18, 569-597.
[5] Brown, David P. and Youchang Wu, Mutual Fund Families and
Performance Evaluation (February 15, 2011). Available at SSRN:
http://ssrn.com/abstract=1690539
[6] Busse, Jeffrey A. and Paul J. Irvine, 2006, Bayesian Alphas and Mutual
Fund Persistence, Journal of Finance 61, 2251-2288.
[7] Carhart, Mark M., 1997, On Persistence in Mutual Fund Performance,
Journal of Finance 52, 57-82.
38
[8] Chevalier, Judith and Glenn Ellison, 1997, Risk Taking by Mutual Funds
as a Response to Incentives, Journal of Political Economy 105, 1167-1200.
[9] Chevalier, Judith and Glenn Ellison, 1999, Are Some Mutual Fund Man-
agers Better Than Others? Cross-Sectional Patterns in Behavior and Per-
formance, Journal of Finance 54, 875-899.
[10] Chung, Kee H. and Ramond A. K. Cox, 1990, Patterns of Productivity in
the Finance Literature: A Study of the Bibliometric Distributions, Journal
of Finance 45, 301-309.
[11] Cohen, Lauren, Andrea Frazzini, and Christopher Malloy, 2008, The Small
World of Investing: Board Connections and Mutual Fund Returns, Journal
of Political Economy 116, 951-979.
[12] Coval, Joshua D. and Tobias J. Moskowitz, 2001, The Geography of Invest-
ment: Informed Trading and Asset Prices, Journal of Political Economy
109, 811-841.
[13] Efron, Bradley and Carl Morris, 1977, Stein’s Paradox in Statistics, Scien-
tific American 236, 119-127.
[14] Fama, Eugene F. and Kenneth R. French, 1993, Common risk factors in
the returns on bonds and stocks, Journal of Financial Economics 33, 3-53.
[15] Fama, Eugene F. and Kenneth R. French, 2010, Luck versus Skill in the
Cross-Section of Mutual Fund Returns, Journal of Finance 65, 1915-1947.
[16] Fernandez, Carmen and Mark F. J. Steel, 1998, On Bayesian Modeling of
Fat Tails and Skewness, Journal of the American Statistical Association
93, 359-371.
[17] French, Kenneth R., 2008, Presidential Address: The Cost of Active In-
vesting, Journal of Finance 63, 1537-1573.
[18] Gabaix, Xavier and Augustin Landier, 2008, Why Has CEO Pay Increased
So Much?, Quarterly Journal of Economics 123, 49-100.
[19] Gelman, Andrew, 2006, Prior distributions for variance parameters in hi-
erarchical models, Bayesian Analysis 1, 515-533.
[20] Gruber, Martin J., 1996, The Growth in Actively Managed Funds, Journal
of Finance 51, 783-810.
39
[21] Hausman, Jerry A. and Gregory K. Leonard, 1997, Superstars in the Na-
tional Basketball Association: Economic Value and Policy, Journal of Labor
Economics 15, 586-624.
[22] Jones, Christopher S. and Jay Shanken, 2005, Mutual fund performance
with learning across funds, Journal of Financial Economics 78, 507-552.
[23] Kosowski, Robert, Allan Timmerman, Russ Wermers, and Hal White, 2006,
Can Mutual Fund “Stars” Really Pick Stocks? New Evidence from a Boot-
strap Analysis, Journal of Finance 61, 2551-2595.
[24] Krueger, Alan B., 2005, The Economics of Real Superstars: The Market
for Rock Concerts in the Material World, Journal of Labor Economics 23,
1-30.
[25] Leamer, Edward E., 1978, Specification Searches.
[26] Levenshtein, V. I., 1966, Binary Codes Capable of Correcting Deletions,
Insertions, and Reversals, Soviet Physics Doklady 10, 707-710.
[27] Lindley, D. V., 1972, Bayesian Statistics, A Review.
[28] Lou, Dong, 2011, A Flow-Based Explanation for Return Predictability,
working paper.
[29] Pastor, Lubos and Robert F. Stambaugh, 2002, Mutual fund performance
and seemingly unrelated assets, Journal of Financial Economics 63, 315-
349.
[30] Lynch, Anthony W. and David K. Musto, 2003, How Investors Interpret
Past Fund Returns, Journal of Finance 58, 2033-2058.
[31] Mamaysky, Harry, Matthew Spiegel, and Hong Zhang, 2007, Improved
Forecasting of Mutual Fund Alphas and Betas, Review of Finance 11, 359-
400.
[32] Massa, Massimo, Jonathan Reuter, and Eric Zitzewitz, 2010, When should
firms share credit with employees? Evidence from anonymously managed
mutual funds, Journal of Financial Economics 95, 400-424.
[33] Reuter, Jonathan and Eric Zitzewitz, 2010, How Much Does Size Erode
Mutual Fund Performance? A Regression Discontinuity Approach, NBER
Working Paper 16329.
40
[34] Robert, Christian P. and George Casella, 2004, Monte Carlo Statistical
Methods.
[35] Rosen, Sherwin, 1981, The Economics of Superstars, American Economic
Review 71, 845-858.
[36] Sirri, Erik R. and Peter Tufano, 1998, Costly Search and Mutual Fund
Flows, Journal of Finance 53, 1589-1622.
[37] Stephens, Matthew, 2000, Dealing with label switching in mixture models,
Journal of the Royal Statistical Society B 62, 795-809.
[38] Wermers, Russ, 2000, Mutual Fund Performance: An Empirical Decompo-
sition into Stock-Picking Talent, Style, Transactions Costs, and Expenses,
Journal of Finance 55, 1655-1695.
[39] Zheng, Lu, 1999, Is Money Smart? A Study of Mutual Fund Investors’
Fund Selection Ability, Journal of Finance 54, 901-933.
41
top related