Schema econ #14 Cod. 20191

Econometrics

Black-Litterman Model
OLS
VAR and volatility estimation
Stock for the long run
Style analysis (OLS application)
Principal component
Logarithmic random walk
Types of return and their properties
Markowitz optimization portfolio (Algebra calculus application)
Probability Mathematics and Laws
Matlab question

Black-Litterman Model

The scope of the Black-Litterman model is to estimate market expected returns while avoiding the Markowitz optimization pitfall [1]. The basic idea is to use, as weights for the market allocation, those implied by some well diversified index, adjusted with our own views expressed as departures from that index's asset allocation. It is an application of Bayesian statistics: basically, we look for a new distribution given some new information provided by us.

The proposed methodology is a multi-step process. First, we estimate the B-L variables:

o We choose a market index, from which we obtain the corresponding weights. Here we are making some assumptions on the index: the chosen market proxy should be mean-variance efficient. This assumption is not really strong, since it is reasonable that a market proxy is at least not too mean-variance inefficient. However, we should remember that a subset of an efficient portfolio is not in general efficient [2].

o The available market information is distributed according to a Normal, $\mu_r \sim N(\mu_{mkt};\ \Gamma\Sigma)$, where the mean equals the estimated market expected return and the variance is the Var-Cov matrix times a scalar smaller than one.

o We already know the relationship between the Var-Cov matrix, the weights, the market return and the risk aversion coefficient, as defined by the Markowitz optimization; hence we can invert that formula and recover the implicit market expectations:

$w_{mkt} = \gamma\,\Sigma^{-1}(\mu_{mkt} - \mathbf{1}\,r_f)$

Since the estimated expected market return strongly depends on the choice of the proxy index, to lessen this problem we should use a big portfolio. However, the bigger the portfolio, the more numerically demanding the computation; so, to keep the problem manageable, we can exploit the CAPM: we use a big portfolio and estimate, for each of our securities, just the betas, so we do not need to estimate the whole Var-Cov matrix [3].

[1] The problem overcome by this model is the high volatility of historical returns, which does not allow narrow confidence intervals at high probability levels. Because of this high volatility there is a large sampling error, which prevents the Markowitz method from properly recovering the weights of the market portfolio. This is a consequence of the data: the volatility of the mean estimate is so high that no reasonably narrow interval lets the mean be significantly different from 0, and it also pushes towards more conservative volatility estimates, suitable for a long-term investor but not for a trader or hedger.
[2] It could be the case only if the sub-portfolio has been built by a random sampling technique, so that it has the same sub-class exposures.
[3] There is a drawback: stocks with low correlation with the market tend to give unstable results, so it is necessary to implement a multifactor model.

o The Γ is the scaling parameter of the Var-Cov matrix; its meaning is to account for the relative importance given to the market information versus our view information. What matters is the ratio between it and the view confidence matrix: the higher the ratio, the higher the confidence in the market.

o We make some assumptions on the Var-Cov matrix. The matrix is usually estimated from monthly historical data (usually a three-year time frame) or by smoothed estimates.

A typical problem in the Var-Cov matrix is the overestimation of correlations, which lowers the positive effect of diversification: if two securities have similar expected returns and high correlation, there will be an over-concentration in the asset with the higher expected return. There exists a procedure [4] to mitigate this problem, similar to the adjusted beta.

o The risk aversion parameter (assuming the absence of the risk free asset) is given by the Markowitz formula: variance over expected excess return. Note that the denominator is an a priori guess, since it is what we are looking for; we can use an iterated process.

$\gamma = \dfrac{\sigma_{mkt}^2}{\mu_{mkt} - r_f}$

o The views must be given in numerical form, so that it is possible to immediately check their effect on the allocation. The asset manager's views are expressed on portfolio returns and are summarized by a Normal, with mean equal to the expected returns of those portfolios (given the manager's views) and a diagonal Var-Cov matrix expressing the confidence in those views [5]:

$P\mu_r \sim N(V;\ \varphi)$

where P contains the weights that give V as expected return, given the expected returns of the securities in the market.

Given all the previous information, Black and Litterman propose to combine these two sets of information through an optimization, aiming to minimize the distance between our parameter and both the market's and the manager's information:

$\min_{\mu}\ (\mu-\mu_{mkt})'\,(\Gamma\Sigma)^{-1}\,(\mu-\mu_{mkt}) + (P\mu-V)'\,\varphi^{-1}\,(P\mu-V)$

Note that if we use only the market portfolio information, the investor ends up with the market portfolio itself; the innovation of the model is the possibility to add views and thus obtain a different allocation.

The solution can be expressed in two equivalent ways:

$\mu = \mu_{mkt} - K\,(V - P\mu_{mkt})$

which can be read in the spirit of the tangency portfolio of the Markowitz optimization theorem: to $\mu_{mkt}$ we add a spread position representing the view correction.

$\mu_{bl} = \left((\Gamma\Sigma)^{-1} + P'\varphi^{-1}P\right)^{-1}\left((\Gamma\Sigma)^{-1}\mu_{mkt} + P'\varphi^{-1}V\right)$

Which is like a weighted average. The equivalent weights are

$\omega_{bl} = g\,\gamma\left(\Sigma^{-1}(\mu_{mkt}-r_f) + \Sigma^{-1}K(V-P\mu_{mkt})\right) = g\left(\omega_{mkt} + \gamma\,\Sigma^{-1}K(V-P\mu_{mkt})\right)$

If $\mathbf{1}'P' = 0$ then $g = 1$; the parameter g is a constant that makes the weights sum to 1.

[4] You blend (i.e. take a weighted average of) the estimated matrix with a reference matrix made of ones on the diagonal and, off the diagonal, the average of the off-diagonal elements of the estimated matrix.
[5] We should define the matrix values so as to ideally build a 95% confidence interval within which our views are contained.
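As a quick illustration of the two steps just described, here is a minimal MATLAB sketch (the two-asset universe, all numbers and the single view are made up for illustration): first the reverse optimization that recovers the implied market expected returns from the index weights, then the weighted-average form of the posterior mean.

% Minimal Black-Litterman sketch on an illustrative two-asset universe.
% Step 1: invert w_mkt = gamma * inv(Sigma) * (mu_mkt - rf) to get mu_mkt.
% Step 2: combine the market prior with one view via the weighted-average solution.
Sigma  = [0.04 0.01; 0.01 0.09];   % assumed Var-Cov matrix
w_mkt  = [0.6; 0.4];               % weights taken from a broad index
rf     = 0.01;                     % risk free rate
gamma  = 0.40;                     % variance over excess return (a priori guess)
Gamma  = 0.05;                     % scaling of Sigma for the market prior

mu_mkt = rf + (1/gamma) * Sigma * w_mkt;     % implied market expected returns

P   = [1 -1];                      % one view: asset 1 outperforms asset 2 ...
V   = 0.02;                        % ... by 2%
phi = 0.001;                       % confidence (variance) of the view

A     = inv(Gamma*Sigma) + P' * (1/phi) * P;
mu_bl = A \ (inv(Gamma*Sigma)*mu_mkt + P' * (1/phi) * V);
disp([mu_mkt mu_bl])               % market-implied vs posterior expected returns

Note that with a very large phi (no confidence in the view) the posterior collapses back to mu_mkt, which is the point made above about using only the market information.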

OLS

OLS, ordinary least squares, is a method used to estimate the parameters of a linear regression: a dependent variable regressed on common factors. It assumes a linear relationship between the dependent variable and the coefficients, not necessarily in the independent variables.

Besides OLS there exist other methods to estimate the regression parameters: method of moments and maximum likelihood. The OLS estimate consists of minimizing $\varepsilon'\varepsilon$, the sum of squared errors $[Y-(\beta_0+\beta_1 X)]^2$: we want our model to equal Y on average. This method is preferred because it has an analytic solution and, under certain hypotheses, it is superior to any other method, as proved by the Gauss-Markov theorem. An estimator needs an important property to be useful, namely unbiasedness; if this cannot be achieved we require consistency, an asymptotic property which needs weaker hypotheses on the error and on its correlation with the independent variables.

Setting the first derivative to 0 (a sufficient condition, since the objective is convex) we end up with:

$\hat\beta = (X'X)^{-1}X'Y$, which in the univariate case can be written as $\hat\beta = \beta + \dfrac{\sum_i (x_i-\bar x)\varepsilon_i}{\sum_i (x_i-\bar x)^2}$, so that $E(\hat\beta) = \beta$, and $V(\hat\beta) = \sigma_\varepsilon^2\,(X'X)^{-1}$.

$\hat\beta_0 = \bar y - \bar x\hat\beta$, with $E(\hat\beta_0) = \beta_0 + E(\beta - \hat\beta)\,\bar x$ and

$V(\hat\beta_0) = V(\bar y - \bar x\hat\beta) = V\!\left(\sum_i y_i\left(\tfrac{1}{n} - \tfrac{(x_i-\bar x)\,\bar x}{\sum_i(x_i-\bar x)^2}\right)\right) = \sigma^2\sum_i\left(\tfrac{1}{n} - \tfrac{(x_i-\bar x)\,\bar x}{\sum_i(x_i-\bar x)^2}\right)^2 = \sigma^2\left(\tfrac{1}{n} + \tfrac{\bar x^2}{\sum_i(x_i-\bar x)^2}\right)$,

where the cross term vanishes because $\sum_i(x_i-\bar x) = 0$. From this formula we see that we can increase the estimation quality by increasing the range of the independent variable.
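A minimal MATLAB sketch of these formulas on simulated data (the data-generating numbers are arbitrary): the OLS coefficients, their estimated Var-Cov matrix and the resulting t-ratios.

% OLS on simulated data: beta_hat = (X'X)^(-1) X'Y, V(beta_hat) = s2 * (X'X)^(-1)
n   = 200;
x   = randn(n,1);
Y   = 1 + 2*x + 0.5*randn(n,1);            % "true" model: beta0 = 1, beta1 = 2
X   = [ones(n,1) x];                       % intercept included

beta_hat = (X'*X) \ (X'*Y);                % OLS estimator
res      = Y - X*beta_hat;                 % residuals
s2       = (res'*res) / (n - size(X,2));   % estimate of the error variance
V_beta   = s2 * inv(X'*X);                 % estimated Var-Cov of the coefficients
t_ratio  = beta_hat ./ sqrt(diag(V_beta)); % t-ratios against H0: beta = 0
disp([beta_hat sqrt(diag(V_beta)) t_ratio])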

$Y = X\beta + \varepsilon$; $\hat Y = X(X'X)^{-1}X'Y$; $\hat\varepsilon = (I - X(X'X)^{-1}X')\,Y$

As the formulas show, the randomness of the dependent variable comes from the presence of the error; hence the conditional and unconditional distributions coincide. Furthermore, under the weak hypotheses we can say:

$E(\hat Y) = X\beta = E(Y)$, $V(\hat Y) = \sigma_\varepsilon^2\,X(X'X)^{-1}X'$, while $V(Y) = \sigma_\varepsilon^2 I_n$

$E(\hat\varepsilon\,\hat Y') = 0$ and $V(\hat\varepsilon) = \sigma_\varepsilon^2\left(I - X(X'X)^{-1}X'\right)$

OLS requires certain hypotheses to work properly and to allow the user to build confidence intervals.

The weak hypotheses are three, and together they ensure that the OLS estimators are BLUE:

o The expected value of the error is 0 (always the case if the intercept is included in the model) and the errors are not correlated with X; if X is random we should require $E(\varepsilon \mid X) = 0$.

o The variance of the error is constant and the correlation among errors is 0. If this hypothesis fails, so that $V(\varepsilon) = \Sigma$, we can still estimate β with generalized least squares, $\beta_{gls} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$, which is still BLUE. We need to transform the original equation into one with constant error variance. Here is the proof: in GLS, $V(\varepsilon) = \Sigma = BB'$; $B^{-1}Y = B^{-1}X\beta + B^{-1}\varepsilon$, i.e. $Y^{\circ} = X^{\circ}\beta + \varepsilon^{\circ}$, where $Y^{\circ} = B^{-1}Y$ and $V(\varepsilon^{\circ}) = B^{-1}\Sigma B'^{-1} = I$, so we can still minimize $\varepsilon^{\circ\prime}\varepsilon^{\circ} = (Y^{\circ} - X^{\circ}\beta)'(Y^{\circ} - X^{\circ}\beta)$.

o Note that $V(\hat\varepsilon) \neq V(\varepsilon)$, since $\hat\varepsilon = \varepsilon - (\hat\beta_0 - \beta_0) - (\hat\beta - \beta)X$, hence $V(\hat\varepsilon) = \sigma_\varepsilon^2\left(I - X(X'X)^{-1}X'\right)$.

o The matrix X is full rank, to avoid multicollinearity and to ensure that (X'X) is invertible. The effect of multicollinearity is an increase in the variance of the betas (see the sketch after this list):

$V(\hat\beta_j) = \dfrac{\sigma_\varepsilon^2}{(1-R_j^2)\sum_i (x_{ij}-\bar x_j)^2}$, where $R_j^2$ is the result of regressing $X_j$ on the other independent variables.
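The sketch referenced in the last bullet, again on simulated data (assumed numbers): the variance of a coefficient estimated in the presence of an almost collinear regressor matches the variance-inflation expression above.

% Multicollinearity and the variance of a coefficient (simulated data)
n  = 500;
x1 = randn(n,1);
x2 = 0.95*x1 + 0.1*randn(n,1);          % almost collinear with x1
y  = 1 + 2*x1 + 0.5*randn(n,1);         % x2 plays no role in the "true" model

X  = [ones(n,1) x1 x2];
b  = (X'*X) \ (X'*y);
s2 = sum((y - X*b).^2) / (n - size(X,2));
se = sqrt(diag(s2 * inv(X'*X)));        % standard errors, inflated for x2

Z    = [ones(n,1) x1];                  % auxiliary regression of x2 on the others
x2h  = Z * ((Z'*Z) \ (Z'*x2));
R2j  = 1 - sum((x2 - x2h).^2) / sum((x2 - mean(x2)).^2);
varj = s2 / ((1 - R2j) * sum((x2 - mean(x2)).^2));
disp([se(3)^2 varj])                    % the two quantities coincide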

The Gauss-Markov theorem states that $\beta_{ols}$ is BLUE, using the definition of variance efficiency: if $\beta^{i}$ and $\beta^{ii}$ are both unbiased estimators, $\beta^{i}$ is not worse than $\beta^{ii}$ iff $V(\beta^{ii}) - V(\beta^{i})$ is at least positive semi-definite. Consider a generic linear unbiased estimator $\tilde\beta = \left((X'X)^{-1}X' + C\right)Y$:

$E(\tilde\beta) = \left((X'X)^{-1}X' + C\right)X\beta = \beta + CX\beta$, hence $CX = 0$ for it to be unbiased;

$V(\tilde\beta) = \sigma_\varepsilon^2\left((X'X)^{-1}X' + C\right)\left((X'X)^{-1}X' + C\right)' = \sigma_\varepsilon^2\left[(X'X)^{-1}X'X(X'X)^{-1} + (X'X)^{-1}(CX)' + CX(X'X)^{-1} + CC'\right] = \sigma_\varepsilon^2\left[(X'X)^{-1} + CC'\right]$,

which exceeds the OLS variance by the positive semi-definite matrix $\sigma_\varepsilon^2 CC'$.

We should also consider that if we want to estimate a set of linear functions of β, say Hβ with H non-random, the definition of BLUE estimator is invariant to this. We call this property "invariance to linear transforms", and it is the strongest argument in favour of this definition of "not worse" estimator. An implicit hypothesis in the theorem is that the class of estimators considered is linear in the dependent variable.

The strong hypotheses are two: the errors are independent of each other and of X, and they are distributed as a Normal. It follows that the betas have the same (Normal) distribution, since they are linear combinations of the errors. Under these hypotheses we can build confidence intervals and test the statistical significance of the model parameters.

There are several tests used in statistics to assess the fit and the significance of the coefficients, both jointly and one by one:

o The t-ratio: since the error variance is unobservable we use the sample variance, so instead of the Gaussian distribution we use the t-Student distribution, $\dfrac{\hat\theta - \theta_0}{s_{\hat\theta}}$, together with the percentile $z_a$ to define the confidence interval and see whether 0 is included, or with the p-value; the degrees of freedom "n" are used for each estimator. The general idea is to divide the numerator (the hypothesis) by its standard deviation. The paired sample is a procedure to test the difference between estimators by introducing the difference "d" as a new parameter in the model; the standard deviation is then computed automatically by the model and accounts for the potential correlation [6] among the estimators.

o The F-test is used to test several hypotheses jointly. The F ratio is defined as

$F = \dfrac{\left(V(\varepsilon_r) - V(\varepsilon_{ur})\right)/q}{V(\varepsilon_{ur})/(n-k-1)} = \dfrac{\left(R_{ur}^2 - R_r^2\right)/q}{\left(1-R_{ur}^2\right)/(n-k-1)}$

where k is the number of parameters and q is the number of restrictions (factors) tested. The F-test on a single variable in general gives the same result as a two-sided t-test.

o The $R^2 = \dfrac{V(\hat Y)}{V(Y)} = 1 - \dfrac{V(\hat\varepsilon)}{V(Y)} = \rho^2(y;\hat y)$, provided the constant is included in the regression or, more precisely, provided $\bar Y = \bar{\hat Y}$. This measure can only increase when new independent variables are added. Note that in the univariate case $R^2 = \rho^2(x,y)$, since $\hat y$ is a linear combination of x; the underlying decomposition is $V(y) = V(\hat y) + V(\hat\varepsilon)$.

Some considerations based on exam questions:

o Cov(r1, r2), where both returns have been regressed on the same factors, equals $\beta_1\beta_2 V(f)$.

o Remember that the expected value of each beta in a multivariate regression is the true beta, and that for any pair of betas $V(\hat\beta_1 - \hat\beta_2) = V(\hat\beta_1) + V(\hat\beta_2) - 2\,Cov(\hat\beta_1,\hat\beta_2) = \sigma_e^2\left[(X'X)^{-1}_{11} + (X'X)^{-1}_{22} - 2(X'X)^{-1}_{12}\right]$.

o If we use the estimated OLS parameters to make inference in a region outside the X used (forecasting), we have to assume that the betas in the new region are the same and are still distributed according to a Normal with the same parameters. The target quantity for the confidence interval is $x_f\hat\beta - x_f\beta - \varepsilon = \hat y_f - y_f$. The forecast interval is $\tilde x\hat\beta \pm z_\alpha\sqrt{V(\tilde\varepsilon)}$, where $V(\tilde\varepsilon) = V(\tilde x\hat\beta - y_f)$.

o If the constant is included in the model we have $Cov(Y,\hat Y) = V(\hat Y)$, and the fitted value of y at the average value of the X's is the average of the fitted values itself, which also equals the average of the actual y:

$\bar{\hat y} = \dfrac{\mathbf{1}'X}{n}\hat\beta = \dfrac{\mathbf{1}'X}{n}(X'X)^{-1}X'Y = \dfrac{\mathbf{1}'Y}{n} = \bar y$

If we use a model without intercept, in general $E(\hat\varepsilon) \neq 0$.

o $V(y) = \beta'V(X)\beta + V(\varepsilon)$

o The mean square error is $E_\theta\!\left((T - E_\theta(T))^2\right) + \left(E_\theta(T) - \theta\right)^2$, i.e. variance plus squared bias, where T is the estimator and θ the true value.

o If we do not consider the complete model but omit one independent variable which is correlated with the included ones, our coefficients will be biased, since the included variables become correlated with the error: $\tilde\beta_1 = \beta_1 + \beta_2\delta$, where δ is the coefficient of the omitted $x_2$ regressed on $x_1$. With true model $y = B_0 + B_1x + B_2z + \varepsilon_1$ and estimated model $y = a_0 + a_1x + \varepsilon_2$, the bias is $E(a_1) - B_1 = B_2\gamma_1$, with $\gamma_1$ obtained from the auxiliary regression $z = x\gamma_1 + \varepsilon_3$.

o If the intercept is excluded from the model (and it is actually different from zero) then the estimates of the betas are biased, and $E(\hat\varepsilon) = \beta_0\mathbf{1}$. However, if the intercept really is 0, the coefficient variance will be lower.

o If Cov(Xi; Xj) = 0 then each beta can be estimated by the univariate formula. From the normal equations, $X'Y = X'X\hat\beta$, and subtracting the means, $Cov(X,Y) = V\hat\beta$, where V is the Var-Cov matrix of X; if the statement is true, that matrix is diagonal, so each $\hat\beta_j = Cov(x_j, y)/V(x_j)$.

[6] $\varepsilon'\varepsilon = (Y - X\beta)'(Y - X\beta) = Y'Y - \beta'X'Y - Y'X\beta + \beta'X'X\beta$; taking the first derivative with respect to β we end up with $-2X'Y + 2(X'X)\beta$.

VAR and volatility estimation

Before talking about VAR and its estimation procedures, we should spend a few words on volatility itself, on its meaning and on how to estimate it. In finance, volatility is used as a measure of risk, to get a sense of the unpredictability of an event. It is usually computed by looking at the historical behaviour of a variable, or by looking at the derivatives market, i.e. the implied volatility: the one that makes the market price consistent with the other inputs, given a pre-specified pricing formula.

In finance, tail behaviour is essential to estimate VAR [7], which is used to assess the maximum possible future loss over a certain time interval. The VAR inputs are the exposure amount and the percentile indicating the given probability of experiencing a loss at least equal to the one indicated by the percentile itself. As we can see, the hypothesis on the distribution of the tails of returns is the key to giving meaning to this tool. The book proposes four possible data distributions:

- The parametric one is the first methodology proposed. It consists of a Gaussian distribution with parameters inferred from historical data. The parameters needed to find any quantile are σ and μ; they are estimated from historical data, and in particular the volatility is estimated using RiskMetrics.

$r_\alpha = z_\alpha\,\sigma$, with $\mu = 0$.

o Our goals are to estimate the quantile and its lower bound, since the variance is itself estimated: $P(R \le r_\alpha) = \alpha \Rightarrow z_\alpha$. The variance of the squared quantile is

$V(r_\alpha^2) = z_\alpha^4\,\dfrac{\sum_i\lambda^{2i}}{\left(\sum_i\lambda^{i}\right)^2}\,V(\sigma^2) = z_\alpha^4\,\dfrac{\sum_i\lambda^{2i}}{\left(\sum_i\lambda^{i}\right)^2}\,2\sigma^4$,

where we are assuming $\mu = 0$ and $\mu_4 = 3\sigma^4$ as a proxy. So the lower bound is $-\sqrt{r_\alpha^2 + z_\alpha\sqrt{V(r_\alpha^2)}}$.

o This method has several limits highlighted by empirical evidence; in fact the underlying hypothesis of Gaussian returns is contradicted by the data.

- Mixture of Gaussian distributions. It consists of a mixture of two or more Gaussian (or other) distributions with different parameters, weighted by their probability of occurrence. The general idea is to use the parameters of the first component for the normal regime and those of the second for the exceptional one. The blended distribution can only be estimated numerically, by maximum likelihood:

$\prod_i f(r_i;\ \mu,\sigma_1,\sigma_2,P) = l(\sigma_1,\sigma_2,P)$, where μ is estimated before running the quasi-likelihood function (in log form). However, the tails still decline at an exponential rate, like the Gaussian distribution; this method is like a GARCH model with infinitely many components, so the unconditional distribution has a non-constant variance.

- The non-parametric approach consists of using the empirical distribution, based on a frequentist probability approach. We use the empirical cumulative distribution function; no parameters are needed.

o The confidence intervals are built by finding the i-th ordered observation giving the wanted empirical probability under the frequentist approach: $\hat\alpha = \dfrac{\#\{r_i \le r\}}{n}$.

o To find the lower bound we need the volatility of the frequentist probability. We compute the probability of occurrence of that i-th observation using a binomial distribution, $\binom{n}{i}\alpha^{i}(1-\alpha)^{n-i}$; the cumulative distribution is $\sum_i\binom{n}{i}\alpha^{i}(1-\alpha)^{n-i}$, with $E(\cdot) = n\alpha$ and $V(\cdot) = n\alpha(1-\alpha)$.

With n large the distribution converges to a Gaussian:

$\Phi_{0;1}\!\left(\dfrac{j - n\alpha}{\sqrt{n\alpha(1-\alpha)}}\right) \;\rightarrow\; j < n\alpha + z_\alpha\sqrt{n\alpha(1-\alpha)}$,

where j is the index of the ordered observation that maximizes $P(r_j \ge r_\alpha) \le \beta$, so it is the lower bound.

[7] Positive correlation will reduce the variance.

o The drawback is the limited insight provided for extreme quantiles, since the observations become either granular (non-contiguous) or totally absent; hence this method is weak against a parametric alternative (high sampling error).

- Semi-parametric is a blend of a parametric model to estimate the central values (close to the mean) and a non-parametric one for the tails; the non-parametric part is also used to find where to plug in the tail model.

o The parametric part for the central values is a Gaussian distribution, as in the parametric method.

o The non-parametric part suggested for the tail data consists of building a function that approximates the behaviour of the tails: $P(R \le r) = L(r)\,|r|^{-a}$, where L(.) is a slowly varying function and a is the speed at which the tail goes to 0.

o To estimate "a" (the only parameter) we represent the log frequency distribution as $\ln(F_r(r)) = \ln(L(r)) - a\ln|r| \rightarrow C - a\ln|r| + \varepsilon$, where ε represents all the approximations made and C is a constant (the log of a slowly varying function is essentially constant); a is estimated by OLS. This gives a polynomially declining rate for the tails.

o Then we graphically search for the plug-in point, i.e. the point from which the empirical cumulative distribution (in log-log scale) starts to behave as a linear function.

When we have found this subset of data, we use it to estimate the quantile with:

$\dfrac{\alpha_1}{\alpha_2} = \dfrac{L(r_{\alpha_1})}{L(r_{\alpha_2})}\left(\dfrac{r_{\alpha_2}}{r_{\alpha_1}}\right)^{a} \;\rightarrow\; r_{\alpha_2} = r_{\alpha_1}\left(\dfrac{\alpha_1}{\alpha_2}\right)^{1/a}$, with $\alpha_2 \le \alpha_1$,

hence, given a, the target probability and the plug-in point, we can extrapolate the quantile. The procedure to find the lower bound for the quantile starts from the estimation error of a:

$-\hat a = \dfrac{cov\!\left(\ln(F_r(r));\ \ln|r|\right)}{V(\ln|r|)}, \qquad V(\hat a) = \dfrac{V(\varepsilon_i)}{V(\ln|r|)}\cdot\dfrac{1}{n}$

$\underline a = \hat a - t_{1-\beta,\,n-3}\sqrt{\dfrac{V(\varepsilon_i)}{V(\ln|r|)}\cdot\dfrac{1}{n}}$, where $V(\varepsilon_i) = \dfrac{\sum_i\varepsilon_i^2}{n-2}$;

the lower bound is then $r_{\alpha_1}\left(\dfrac{\alpha_1}{\alpha_2}\right)^{1/\underline a}$.
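A compact MATLAB sketch of the three quantile estimators discussed in this section, applied to simulated heavy-tailed returns (sample size, λ and the plug-in point are arbitrary choices; trnd, norminv and quantile come from the Statistics Toolbox): the parametric Gaussian quantile with a RiskMetrics-style smoothed variance, the non-parametric empirical quantile, and the semi-parametric tail extrapolation with the tail index a estimated by OLS on the log empirical frequency.

% Three VAR quantile estimators on simulated returns (illustrative sketch)
n      = 2000;
r      = 0.01*trnd(4,n,1);                 % heavy-tailed returns (Student-t, 4 dof)
alpha  = 0.01;                             % target 1% quantile

% 1) Parametric: Gaussian quantile with RiskMetrics smoothed variance (mu = 0)
lambda = 0.95;
s2     = var(r);                           % starting value for the recursion
for t = 2:n
    s2 = lambda*s2 + (1-lambda)*r(t)^2;
end
r_par  = norminv(alpha) * sqrt(s2);

% 2) Non-parametric: empirical quantile from the ordered sample
rs   = sort(r);
r_np = rs(max(1, floor(alpha*n)));

% 3) Semi-parametric: tail index a from ln F(r) = C - a*ln|r| on the left tail
plugin = quantile(rs, 0.10);               % plug-in point chosen at the 10% quantile
tl     = rs(rs < plugin);                  % left-tail observations
Femp   = (1:numel(tl))' / n;               % empirical cumulative frequency
b      = [ones(numel(tl),1) log(abs(tl))] \ log(Femp);
a_hat  = -b(2);
r_sp   = plugin * (0.10/alpha)^(1/a_hat);  % extrapolated 1% quantile

disp([r_par r_np r_sp])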

Stock for the long run

Stocks for the long run is a common mistake in the finance field. It states that an investor should choose his investment strategy by picking the stocks with the highest expected return, without considering the underlying risk. The statement is based on two ex-ante hypotheses and one ex-post hypothesis, which come from the intuition (in an LRW world) that after a certain time period any kind of return can be achieved, regardless of the risk:

- First hp: given the Sharpe ratio formula $\sqrt{n}\,\dfrac{\mu}{\sigma}$, the idea is that with a sufficiently big n any result can be achieved; in other words, there is a time horizon over which the probability of obtaining a given expected return is reached (usually with a confidence of 95%). It is a direct consequence of the LRW hypothesis, which states that the return grows at rate n while volatility grows at rate $\sqrt{n}$ [8], so the lower bound $n\mu - \sqrt{n}\,z_a\sigma$ eventually becomes positive.

- Second hp: taking two investment strategies with the same mean and variance, one in 10 uncorrelated securities for one year and the other in just one security for 10 years, the hypothesis suggests the existence of time diversification.

- Third hp (a posteriori): looking at the historical performance of the US stock exchange, it makes sense to invest in it compared with other investment strategies.

As can be seen, this is a consequence of how we build confidence intervals; however, it can be proven wrong:

[8] A general limit of the VAR methodology is that it does not give information on the event causing the loss, only on the probability of that event. It also ignores the distribution beyond the estimated quantile; furthermore it is a pro-cyclical measure: since many of the proposed methodologies use historical parameters (or, more generally, data) from a past time interval, a positive (negative) trend will bring a positive (negative) momentum that biases the estimate downward (upward).

- First critique: it makes some hypotheses on the investor's utility function, i.e. on how he chooses his investment strategy. The statement assumes that investors choose only by comparing Sharpe ratios over the long run, and that they never change their strategy. A further comment concerns the strategy: accepting the Sharpe criterion only over a certain long time frame, while rejecting the same criterion for each sub-period, amounts to assuming a peculiar utility function for the investor. The statement is also wrong, since the investment is not superior for any given horizon, even though for sufficiently long horizons this strategy seems the best among all the others. Furthermore, since we are interested in the total return (not in the expected return), we notice that the range of possible total returns $(r_1\, r_2 \cdots r_n)$ increases at rate n over time, hence the uncertainty is not declining.

- Second critique: it rests on the wrong idea that two investment strategies with different time frames are comparable; thus there is no such thing as time diversification.

- Third critique: since the US stock market has shown the highest return over the last century, you should invest in stocks. This is an ex-post statement and it cannot be projected into the future; in fact the positive US trend has been sustained by the economic growth of that economy, and we cannot infer from historical data a similar future success.
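A small illustration of the first hypothesis and of the role played by the confidence interval: under assumed LRW parameters (made-up numbers), the MATLAB lines below find the horizon at which the lower bound $n\mu - \sqrt{n}\,z_a\sigma$ turns positive.

% Horizon at which the 95% lower bound on the cumulative return becomes positive,
% assuming LRW scaling: the mean grows like n, the volatility like sqrt(n).
mu    = 0.06;                          % assumed annual expected (log) return
sigma = 0.20;                          % assumed annual volatility
z     = 1.645;                         % one-sided 95% quantile of the standard Normal

n      = 1:100;
lb     = n*mu - sqrt(n)*z*sigma;       % lower bound of the cumulative return
n_star = find(lb > 0, 1);              % first horizon with a positive bound
fprintf('Lower bound positive from year %d on\n', n_star);

The critiques above still apply: the bound on the cumulative return eventually turns positive, but the dispersion of the total return keeps growing with n.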

Style analysis (OLS application)

Style analysis is a statistical way to compare an asset manager's performance with a specified ex-post portfolio built from market indexes: we want to know whether the manager has been able to out-perform the market, and hence whether he deserved the management fees. This capability to add value should not be replicable by investors using public information; it is an ex-post analysis.

The suggested methodology consists of regressing the fund return on some indexes, which the investor subjectively assumes to be a good proxy of the management strategy.

We consider the spread between the return and the estimated return (hence we are considering the error) and analyse whether its mean is statistically significant and whether the cumulative sum of the errors shows any trend.

Sharpe suggests building the model following this procedure:

o Set the constraint on the betas: they must sum to one, with no intercept. This can be done with an ex-ante method or an ex-post one (normalizing the values); keep in mind that the two methods do not give the same results. This is a simplification made by Sharpe to ensure a self-financing strategy and to avoid the presence of a constant return over time (even the risk free asset cannot achieve this result [9]).

o The regression is made on sub-samples of constant length, rolled forward one observation at a time.

The critiques of this methodology consist of three points:

o Setting the weights to maintain constant relative proportions is a limit and a costly strategy; alternatives exist: buy and hold, trend strategies, or even changes within the constant-weight framework.

o If the fund manager knows how he will be judged and knows more than the investor about the composition of the market portfolio, he can easily out-perform it; however it is not an easy task to replicate the market portfolio ex-ante.

o The analysis does not consider the difference in variance produced by the two strategies, and this can give an advantage to the fund manager.

There are three possible conclusions from the analysis, depending on the error value $e_t = r_t - \hat r_t$:

o The cumulative error is negative: this is strong evidence against the fund's performance, since a totally passive strategy would have been more efficient.

o The cumulative error is zero, or not statistically significantly different from 0: it is hard to assess whether the management performance is unsatisfactory.

o The cumulative error is positive: it cannot be considered evidence of the quality of the management team, since this measure alone is affected by many strong simplifying assumptions and does not consider the volatility difference between the passive strategy implemented and the actual one.

[9] This relates to the theme of explosive growth in unit-root autoregressive models, of which the LRW is one.
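A minimal MATLAB sketch of the procedure (simulated fund and index returns, arbitrary window length): a rolling regression with no intercept, betas normalized ex post to sum to one, and the cumulative tracking error used for the assessment above.

% Style analysis sketch on simulated data: rolling constrained regression
T     = 120;                                   % monthly observations
F     = 0.01 + 0.03*randn(T,3);                % three style indexes
wtrue = [0.5 0.3 0.2];                         % "true" style weights of the fund
rfund = F*wtrue' + 0.002*randn(T,1) + 0.001;   % fund return with a small alpha

win = 36;                                      % rolling window length
e   = zeros(T,1);
for t = win:T
    b    = F(t-win+1:t,:) \ rfund(t-win+1:t);  % OLS without intercept
    b    = b / sum(b);                         % ex-post normalization: sum to one
    e(t) = rfund(t) - F(t,:)*b;                % style tracking error at time t
end
plot(cumsum(e)); title('Cumulative style tracking error');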

Principal component

One of the key tasks in the asset management industry is to estimate $E(r) = \delta\beta$, where δ are the prices of risk and β the sensitivities. Those prices are usually proxied with portfolio returns; the problem is that we need to jointly estimate both the factors and the betas, so there is an infinite range of possible weights to use as a solution.

There exist two methods to test the meaningfulness of a model. The first is to check whether the intercept equals zero; however it is not very powerful and, furthermore, it is not a good criterion (we may have a really well-fitting model that fails the test). The other is to test the linear relationship between returns and betas. This second method is a two-step process: first we estimate the beta for each portfolio, then we run a cross-sectional regression to check that the estimated betas and factors are consistent with market data [10]. We may add other terms, like the square of beta or error terms, to see whether those terms are meaningful.

Principal component analysis is an old, alternative method to estimate factors and betas by using the spectral theorem, where the number of principal components is less than or equal to the number of original variables. The rationale of the method is to proxy the unobservable factors with portfolio returns, built so as to satisfy suitable constraints; basically we choose as factors $f_j = r\,x_j$. We need to jointly estimate the factors and the betas.

$r = F_q X_q' + \varepsilon$, where $\varepsilon = F_{m-q}X_{m-q}'$

This transformation is defined in such a way that the first principal component has the highest possible variance (that is, it accounts for as much of the variability in the data as possible [11]), and each succeeding component in turn has the highest possible variance under the constraint that it be orthogonal to (uncorrelated with) the preceding components; hence the largest element of the error matrix is smaller than the smallest of the factors'. Principal components are guaranteed to be independent only if the data set is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables.

Assume that we know the variance and that the Var-Cov matrix is time independent. This last assumption is added just to simplify the calculus; in fact more complex methodologies exist to apply principal components. The returns' variance can be represented through the spectral theorem. A further assumption is that V(r) is a full rank matrix, so its rank k equals m, the number of returns used.

$V(r) = X\Lambda X' = \sum_i x_i x_i'\lambda_i$, with $X'X = I$

o where the $x_i$ are the eigenvectors and Λ is the diagonal matrix of eigenvalues, ordered from the highest to the smallest value starting from the upper-left position [12].

o The factors proposed are portfolio returns, computed using the eigenvectors and the market returns. Since each portfolio is given by $f_j = r\,x_j$, where $x_j$ is an eigenvector, each of these portfolios is independent of the others, so we can use the univariate formula to compute our betas:

$\beta_j = \dfrac{Cov(f_j; r)}{V(f_j)} = \dfrac{E(x_j'r'r) - E(x_j'r)E(r)}{V(f_j)} = \dfrac{x_j'V(r)}{V(f_j)} = \dfrac{x_j'\lambda_j}{\lambda_j} = x_j'$

so the betas are the eigenvector of the specified factor.

o The variance of these factors equals the diagonal matrix of the spectral decomposition, and it is diagonal: $V(f_j) = V(r\,x_j) = x_j'X\Lambda X'x_j = \lambda_j$; $V(F) = \Lambda$.

o Since our model completely explains the return behaviour, to turn it into a model closer to a common regression we rearrange the formula: we divide the factors into two groups, the first forming the variables matrix, the residual forming the error matrix.

The residual matrix has mean 0 and is uncorrelated with the factors:

$V(\varepsilon) = V(r - f_j\beta_j) = X_{-j}\Lambda_{-j}X_{-j}'$

Thus the highest value in the residual Var-Cov matrix is smaller than those of the factors. The factor matrix rank equals q, the number of factors considered (q = j).

[10] The theory of finance justifies this statement. We can use short-term risk free rate investments.
[11] To increase the power we group the returns into boxes which maximize the distance between observations.
[12] By "possible" we mean given the constraint that the squared sum of the weights equals one; otherwise there would be no bound, since the variance could be arbitrarily changed by multiplying by a constant. Other alternatives exist, such as using the modulus, but those methodologies do not allow an analytic solution.

There is a drawback in this methodology: it does not in general respect the pricing theory, which states that there should be no extra remuneration for not bearing any risk. In fact the residuals can be correlated with some returns, so they are not idiosyncratic, and this risk is not negligible; an asset can have an excess return even if it is not correlated with the included factors.

There is another way to build principal components: maximize the portfolio risk under the constraints that each portfolio is orthogonal to the other components and that the sum of the squared weights equals one.

o We build a Lagrangian to maximize the variance under the constraint, and we end up with the weights being the eigenvectors and the variances being the diagonal elements of the spectral decomposition of the variance of returns:

$\max_{\theta}\ V(r\theta) \;\Rightarrow\; L = \theta'V(r)\theta - \lambda(\theta'\theta - 1)$

The $\theta'\theta$ constraint is imposed to obtain an analytic solution, even if it does not have an economic meaning; in fact, in general the linear combination of θ and the returns is not a portfolio [13].

$\left(X\Lambda X' - \lambda I\right)\theta = 0 \quad\text{only if } \theta = x_j$

o The book suggests looking at the marginal contribution to the total variance of each component, to notice how basically all the variance is explained by the first three components.

[13] Remember that the Var-Cov matrix must be positive definite; if it is only PSD we cannot directly apply the theorem. $|V(r) - \lambda I| = 0$ is the characteristic equation, of order equal to the rank of the Var-Cov matrix, so in general it can only be solved numerically.

Assuming an unknown Var-Cov matrix: we can start from an a priori estimate of V(r) using historical data; however the quality could be too low, and that is why another methodology is suggested. We can estimate the components one by one, starting from the largest.

o This method consists of maximizing the variance with the usual constraint x'x = 1, leaving all the estimation error in the last components, since in this way we improve the estimate of the first ones.
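A minimal MATLAB sketch of the decomposition above on simulated returns (the covariance structure is arbitrary): the eigen-decomposition of the sample Var-Cov matrix, the principal-component portfolios f_j = r*x_j, and the share of total variance each component explains.

% Principal components of simulated returns via the spectral decomposition
T = 500;  m = 4;
r = randn(T,m) * chol(0.5*eye(m) + 0.5*ones(m));   % returns with common correlation

S        = cov(r);                     % sample Var-Cov matrix
[X, Lam] = eig(S);                     % S = X*Lam*X', with X'X = I
[lam,ix] = sort(diag(Lam),'descend');
X        = X(:,ix);                    % eigenvectors ordered by eigenvalue

F = r * X;                             % principal-component "portfolios" f_j = r*x_j
disp(var(F))                           % their variances are (close to) the eigenvalues
disp(lam'/sum(lam))                    % marginal contribution to the total variance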

Logarithmic random walk

In finance we are interested in forecasting returns; however the uncertainty around returns is not predictable (differently from games of chance), so we need to make assumptions on a possible probability distribution. One of the first models used is the LRW. It assumes that the price evolution over time is approximated by a stochastic difference equation:

$\ln P_t = \ln P_{t-1} + \varepsilon_t, \qquad r_t = \varepsilon_t$

As the equation shows, the current price level depends on the past evolution and on an idiosyncratic component, so it is like saying that price movements are driven by a modelled chance, with the underlying distribution assumed to be Gaussian. The log form is used since it allows multi-period returns to preserve normality: the log of a product is the sum of the logs (linear functions preserve the underlying distribution). The idiosyncratic component has zero mean, constant variance, and zero covariance across time. Sometimes the hypothesis is added that the errors are jointly normally distributed, hence independent of each other, consistently with the time window; in fact if the observations are aggregated over periods, the new idiosyncratic component will be uncorrelated only over the new time window, while it will be correlated with the intermediate ones, hence those intermediate observations must be dropped. Note that aggregating the variance over time with a correlation structure between observations no longer gives "n" times the one-period variance but $V(r) = n\left(\sigma^2 + (n-1)\sigma^2\rho\right)$; hence the variance increases at a higher rate than the LRW variance when the correlation is positive.

Nowadays the logarithmic random walk is simply used as a descriptive device to make accruals on returns, since no other alternative has reached enough consensus in the finance field. However the LRW hypotheses are contradicted by empirical data: prices do not evolve purely by chance as suggested by the LRW, there is strong empirical evidence against constant variance and in favour of the presence of correlation among securities, and it can lead to negative price levels.

The accrual convention consists of annualizing returns by multiplying the one-period expected return by the number of periods and the one-period volatility by its square root; this is the correct procedure under the LRW, while for actual securities it remains just an accrual convention.

Another proposed model is the geometric RW, $P = P_0 e^{r}$, which is basically the LRW applied to prices instead of returns; this model implies a log-normal distribution (hence a positive skewness, which is related to the number of periods considered). Some useful properties are: the price cannot become negative, and volatility is a function of the price level (lower for small prices, bigger for large ones).
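A short MATLAB sketch of the two processes just described (drift, volatility and horizon are arbitrary): a log price following the LRW, the corresponding geometric path (always positive), and a rough check that the variance of the n-period return grows like n.

% Simulate an LRW on log prices and the corresponding geometric price path
T     = 250;
mu    = 0.0003;  sigma = 0.01;         % assumed daily drift and volatility
eps_  = mu + sigma*randn(T,1);         % Gaussian idiosyncratic component

logP  = log(100) + cumsum(eps_);       % ln P_t = ln P_{t-1} + eps_t
P     = exp(logP);                     % geometric path: never negative

n   = 20;                              % aggregation over n periods
r_n = logP(n+1:end) - logP(1:end-n);   % overlapping n-period log returns
disp([var(r_n) n*sigma^2])             % the two should be of comparable size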

Types of return and their properties

Even if in finance we are interested in the price evolution over time, all models and assumptions are based on returns. The easiest hypothesis made on their possible evolution is basically to suppose the existence of a stationary process in prices, a statement which is much debated in finance. There are two types of return; neither is better than the other, it depends on what we want to do:

- Linear returns are best used for portfolio returns [14] over one time period, to compute the expected return and variance of a portfolio; the log return of a portfolio, instead, has no linear function aggregating the securities, hence different combinations of stocks have a non-linear relationship, making any optimization problem extremely difficult, because the portfolio return is a non-linear function of the stock returns:

$\prod_i(r_i+1) - 1 \;\Rightarrow\; E\!\left(\prod_i(r_i+1) - 1\right) \neq \prod_i E(r_i+1) - 1$ (not linear)

- Logarithmic returns are best used for single-stock returns over time; in this case the cumulative return depends only on the first and last elements of the time series:

$r_{t+1} = \ln\!\left(\dfrac{P_t\,P_{t+1}}{P_{t-1}\,P_t}\right) = \ln\prod_i\dfrac{P_i}{P_{i-1}} = \sum_i r_i$

while for a portfolio $E\!\left[\ln\sum_i w_i r_i\right] \neq \ln\sum_i w_i E(r_i)$.

- The relationship between these returns can be better understood using the Taylor expansion of the log return, which coincides with the linear return if truncated at the first term:

$\ln(x) = \ln(1) + \dfrac{x-1}{1} - \dfrac{(x-1)^2}{2} + \dots$

This formula shows that the difference between the linear and the log return (for price ratios far from 1) is always greater than zero, since $\ln(x) < x - 1$.

In finance the ratio of consecutive prices (maybe corrected by taking into account accruals) is often modeled as a random variable with an expected value very near to 1. This implies that the two definitions shall give different values with sizable probability only when the variance (or more in general the dispersion) of the price ratio distribution is non-negligible, so that observations far from the expected value have non-negligible probability. Since standard models in finance assume that variance of returns increases when the time between returns increases, this implies that the two definitions shall more likely imply different values when applied to long term returns.
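A tiny MATLAB check of this point (illustrative price ratios): near 1 the two definitions almost coincide, while for large moves they diverge, with the log return always below the linear one.

% Linear vs logarithmic returns for price ratios near and far from 1
ratio = [1.001 1.01 1.10 1.50 0.60];   % P_t / P_{t-1}
lin   = ratio - 1;                     % linear return
logr  = log(ratio);                    % logarithmic return
disp([ratio; lin; logr; lin - logr])   % the last row is always >= 0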

The mean is hard to estimate due to the relative size of volatility, which is so big that the confidence interval basically ends up including 0. Furthermore, increasing the sampling frequency does not provide any benefit: nothing changes, since the monthly μ is the annual one divided by 12 and the monthly variance scales in the same way, so the precision of the estimated mean depends only on the total length of the sample period, not on the frequency.

We are going to estimate the volatility measure using historical data; however there exist several procedures, from the simplest one based on the LRW hypotheses to more complex ones that properly address some empirical features of volatility.

The one based on the LRW is simply the equally weighted sum of squared deviations of each observation from the mean, $\frac{\sum_i(x_i-\bar x)^2}{n-1}$; however this measure has one big drawback, namely the assumption that the marginal contribution of the newest observation to the estimate equals that of the oldest. To overcome this assumption with a procedure more tailored to the market, the financial industry has introduced the RiskMetrics procedure: the new formula is an exponentially smoothed estimate with coefficient usually around 0.95, bounded between 0 and 1, under the hypothesis of zero mean [15].

[14] We can use the absolute sum of the θ, but then only numerical solutions are available.
[15] $\dfrac{P_t - P_{t-1}}{P_{t-1}} = r_t$; $\sum_i w_i r_i$, $E\!\left(\sum_i w_i r_i\right)$, $V\!\left(\sum_i w_i r_i\right)$: it is linear.

$\sigma_t^2 = \dfrac{\sum_i \lambda^{i} r_{t-i}^2}{\sum_i\lambda^{i}}$, with $\sum_i\lambda^{i} = \dfrac{1}{1-\lambda}$ for $i\to\infty$, so $\sigma_t^2 = (1-\lambda)\sum_i\lambda^{i} r_{t-i}^2$

$(1-\lambda)\left[r_t^2 + \sum_i\lambda^{i} r_{t-i-1}^2\right] = (1-\lambda)\left[r_t^2 + \lambda\sum_i\lambda^{i-1} r_{t-i-1}^2\right] = (1-\lambda)\,r_t^2 + \lambda\,\sigma_{t-1}^2$

Alternatively it can be written as $\lambda V_{t-1} + (1-\lambda)r_t^2 - \dfrac{\lambda^{n+1} r_{t-n-1}^2}{\sum_i\lambda^{i}}$, where the last term is zero for n large.

o The drawback of this estimate is the loss of the unbiasedness property (if the data have constant volatility), and the formula basically truncates the available information: with daily data, at most about one year is effectively used, even for a high λ coefficient.

For a generic weighted estimator $\sum_i w_i r_i^2$: $E\!\left(\sum_i w_i r_i^2\right) = \sum_i w_i E(r_i^2) = \sigma^2\sum_i w_i \neq \sigma^2$ in general; it equals $\sigma^2$ only if the weights sum to one (as with $w_i = 1/n$). Moreover $V\!\left(\sum_i w_i r_i^2\right) = (\mu_4 - \sigma^4)\sum_i w_i^2$, which, under that constraint, is minimized by $w_i = 1/n$ (via the Lagrangian).

The variance estimation (reducing the variance of the variance), on the other hand, has an error which is small relative to the estimate itself, and it can be improved by increasing the frequency:

$V(\hat\sigma^2) = V\!\left(\tfrac{1}{n}\sum_i r_i^2\right) = \tfrac{1}{n}\left(\mu_4 - E^2(r^2)\right)$,

where the fourth moment is computed assuming Gaussian returns, $\mu_4 = 3\sigma^4 + 6\sigma^2\mu^2 + \mu^4$, and $E^2(r^2) = (\sigma^2 + \mu^2)^2$; at monthly frequency the formula becomes $\dfrac{2}{n}\sigma^2\!\left(\dfrac{\sigma^2}{12} + \dfrac{2\mu^2}{144}\right)$, which is smaller than before.

Both volatility estimators suffer from the so-called ghost-feature problem: an extremely large new observation has a big impact on the level of the estimate. This behaviour is asymmetric (extremely low observations are capped) and it is more severe for the classic formula, where the volatility level changes abruptly when the outlier leaves the sample, or decays at rate 1/n if the whole sample is kept. For the smoothed estimator the outlier decays with factor $\lambda^{k}$.

Markowitz optimization portfolio (Algebra calculus application)

Markowitz portfolio optimization is a methodology to build mean-variance efficient portfolios using a set of stocks. In general this is not related to the CAPM, which is a general equilibrium model; however if we consider the whole market the Markowitz optimal portfolio becomes the CAPM market portfolio itself.

The model assumes that the criterion used in the market is mean-variance efficiency and that the investment time window is unique and set at the beginning of the investment process (no changes after that).

o The hypotheses needed to apply this method are that we know both the expected values of the single stocks and the Var-Cov matrix; if those assumptions fail there will be problems with sampling error.

o One possible solution: the portfolio is built to minimize the variance under the constraint of achieving a specific return. One of the most important results is that the relative weights in the risky portfolio are the same and do not depend on the chosen return. This is a first instance of the separation theorem: the expected return we want to achieve depends solely on the allocation between the risk free asset and the risky portfolio.

$R_\pi = (1 - w'\mathbf{1})\,r_f + w'R; \qquad V(R_\pi) = w'\Sigma w$

$\min_{w}\ w'\Sigma w \ \text{ s.t. } E(R_\pi) = c; \qquad \text{or equivalently}\quad \max_w\ E(R_\pi) - \tfrac{1}{\lambda}\,w'\Sigma w$

o The same result can be achieved by maximizing the return for a given level of risk; the tangency portfolio in this case is the same as the solution of the previous problem. This is a sort of mean-variance utility function.

The variance of the overall return is always equal to that of the risky portfolio times the squared weight invested in it. The ratio of the expected value to its standard deviation is the same for all these portfolios, hence all the portfolios have the same marginal contribution to the composition of the stock portfolio risk.


We want to show two results: first, that $w = \lambda\,\Sigma^{-1}(\mu_r - r_f)$, where $\lambda = \dfrac{V(R_\pi)}{E(R_\pi)-r_f}$ is the slope, and second that $\sqrt{q} = \dfrac{E(R_\pi)-r_f}{\sqrt{V(R_\pi)}}$ is the Sharpe ratio; basically we want to show that all the portfolios share the same value:

o If we consider the weights, the target return gives $w'(\mu_r - r_f) = c - r_f$; plugging the Markowitz weights into this, we get $\lambda = \dfrac{c - r_f}{(\mu_r - r_f)'\Sigma^{-1}(\mu_r - r_f)} = \dfrac{c - r_f}{q}$.

o If we plug this λ back into the Markowitz weights: $w = \dfrac{(c - r_f)\,\Sigma^{-1}(\mu_r - r_f)}{q}$.

o For the market allocation: $\mathbf{1}'w = \dfrac{(E(r_m) - r_f)\,\mathbf{1}'\Sigma^{-1}(\mu_r - r_f)}{q_m}$.

o The portfolio return is $r_f + \gamma\,w'(r - r_f)$; computing the expectation and the variance of this equation, solving each for γ and equating them, we end up with $\gamma = \dfrac{E(R_\pi) - r_f}{w'(\mu_r - r_f)}$ and $\gamma^2 = \dfrac{V(R_\pi)}{w'\Sigma w}$.
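A small MATLAB sketch of the weight formula above with made-up inputs: the minimum-variance weights for a target return c are proportional to $\Sigma^{-1}(\mu_r - r_f)$, so their relative composition does not depend on c (the separation result stated above).

% Markowitz weights for two target returns: same relative composition
Sigma = [0.04 0.01 0.00; 0.01 0.09 0.02; 0.00 0.02 0.16];  % assumed Var-Cov
mu    = [0.05; 0.08; 0.11];                                % assumed expected returns
rf    = 0.02;

q = (mu - rf)' * (Sigma \ (mu - rf));          % (mu-rf)' * inv(Sigma) * (mu-rf)
w = @(c) (c - rf) * (Sigma \ (mu - rf)) / q;   % weights for target return c

w1 = w(0.06);  w2 = w(0.10);
disp([w1/sum(w1)  w2/sum(w2)])   % identical relative weights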

Investors take on risk in order to generate higher expected returns. This trade-off implies that an investor must balance the return contribution of each security against its portfolio risk contribution. Central to achieving this balance is some measure of the correlation of each investment's returns with those of the portfolio.

We do not believe there is one optimally estimated covariance matrix. Rather, we use approaches designed to balance trade-offs along several dimensions and choose parameters that make sense for the task at hand.

o One important trade-off arises from the desire to track time varying volatilities, which must be balanced against the imprecision that results from using only recent data. This balance is very different when the investment horizon is short, for example a few weeks, versus when it is longer, such as a quarter or a year.

o Another trade-off arises from the desire to extract as much information from the data as possible, which argues toward measuring returns over short intervals. This desire must be balanced against the reality that the structure of volatility and correlation is not stable and may be contaminated by mean-reverting noise over very short intervals, such as intraday or even daily returns

All the portfolios must have the same Sharpe ratio; if we can invest in the risk free asset we can add it as an intercept, otherwise a self-financing strategy should have an intercept of zero.

$V(R_\pi) = \dfrac{\sigma^2}{n} + \dfrac{n-1}{n}\,\rho\sigma^2 \;\xrightarrow{n\to\infty}\; V(R_\pi) = \rho\sigma^2$ (systematic risk),

assuming an equally weighted portfolio. The first term is $\sum_i\dfrac{V(r_i)}{k^2}$, the second term $\sum_{i\neq j}\dfrac{Cov(r_i;r_j)}{k^2}$.

Probability Mathematics and Laws

- $P(B\mid A) = \dfrac{P(A\cap B)}{P(A)}$ is the conditional probability; this formula allows us to update a probability with new information. Bayes proposed an alternative (more usable) formula:

$P(B\mid A) = \dfrac{P(A\mid B)\,P(B)}{P(A)}$, where $P(A) = P(A\mid B)P(B) + P(A\mid \bar B)P(\bar B)$

- $P(A\mid B) = P(A)$ if A and B are independent; $P(A\cap B) = P(A)\,P(B)$ if A and B are independent.

- $P(A\cup B) = P(A) + P(B) - P(A\cap B)$

- $V(A) = E\!\left(E(A^2\mid B)\right) - \left(E\!\left(E(A\mid B)\right)\right)^2 = E(A^2) - E(A)^2$

- Any random variable has an associated distribution:

o Continuous case: the density is $f_x(x)$, with $\int f_x(x)\,dx = 1$, and the cumulative probability is $F(a) = \int_{-\infty}^{a} f_x(u)\,du$, which is a strictly increasing function.

The concept of percentile (q bounded between 0 and 1): $F^{-1}(q) = z$, where z is the value below which q% of the data/values lie.

The expected value is $\int_{-\infty}^{+\infty} x\,f_x(x)\,dx$, while the variance is $\int_{-\infty}^{+\infty} (x - E(x))^2 f_x(x)\,dx$; those are population moments, computed with the real CDF.

- Each distribution is defined by three kinds of parameters:

o Location parameters are those which shift the distribution to the right or left.
o Shape parameters are the residual category.
o Scale parameters are those which change the σ and nothing else.

- Possible (useful) distributions:

o Binomial or Bernoulli distribution. The parameters are "n", the number of experiments, and "p", the probability of success in each experiment (assumed constant across experiments): $\binom{n}{k}p^{k}(1-p)^{n-k}$, where k is the target number of successes; for n large the binomial approximates a Gaussian distribution.

o Lognormal: a right-skewed distribution.

o Multivariate distribution: the distribution of the joint behaviour of two or more variables, $\int\!\!\int f(x,y)\,dx\,dy$ in the bivariate case.

- Matrix operations:

o Rank: $rank(AB) \le \min\left(rank(A), rank(B)\right)$; for $A_{n\times k}$ with $k < n$, $rank(A) = k$ at most.

o A matrix is invertible iff it is square and full rank.

o $A_{n\times k}\,B_{k\times n} = C_{n\times n}$: A must have as many columns as B has rows; the resulting matrix has A's number of rows and B's number of columns.

o $A^{-1}A = A\,A^{-1} = I$

o $\dfrac{\partial\, X'F}{\partial X} = F$; $\dfrac{\partial\, X'FX}{\partial X} = 2FX$ (for symmetric F)

o $V\!\left[\sum_{i=1}^{n} x_i\right] = nV(x)$ for uncorrelated $x_i$ with equal variance.

o To sum the columns of a matrix: $c_{n\times 1}'\,A_{n\times m} = B_{1\times m}$ (with c a vector of ones).

o To sum the rows: $A\,b_{m\times 1}$ (with b a vector of ones).

o With two vectors we can build a matrix: $c_{n\times 1}\,b_{m\times 1}' = B_{n\times m}$.

- Inequalities:

o Tchebichev inequality (it considers the modulus, so both tails are accounted for): $P\left(|X - E(X)| < \lambda\sigma\right) \ge 1 - \dfrac{1}{\lambda^2}$

o Vysochanskij-Petunin inequality: $P\left(|X - E(X)| < \lambda\sigma\right) \ge 1 - \dfrac{4}{9\lambda^2}$, with $\lambda > 1.63$

o Cantelli (one-sided): $P\left(X - E(X) < \lambda\sigma\right) \ge \dfrac{\lambda^2}{1 + \lambda^2}$
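A quick MATLAB check of these bounds by simulation (a standard Gaussian sample, purely illustrative): the empirical probabilities sit above each lower bound.

% Empirical check of the Tchebichev, Vysochanskij-Petunin and Cantelli bounds
x     = randn(1e6,1);                          % unimodal, symmetric sample
lam   = 2;
p_two = mean(abs(x - mean(x)) < lam*std(x));   % two-sided probability
p_one = mean(x - mean(x) < lam*std(x));        % one-sided probability

bounds = [1 - 1/lam^2, 1 - 4/(9*lam^2), lam^2/(1 + lam^2)];
disp([p_two p_two p_one; bounds])              % each empirical value exceeds its bound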

- Distribution measures:

o Skewness measures the asymmetry of a distribution; it is the third centered moment. A positive value indicates right asymmetry, a negative value left asymmetry.

$\dfrac{E\left(X - E(X)\right)^3}{\sigma^3}$ (population), $\dfrac{1}{n}\sum_i\left(\dfrac{x_i - \bar x}{s}\right)^3$ (sample)

o Kurtosis measures how the distribution mass sits in the shoulders versus the tails: the higher the value, the higher the concentration in the tails. It is affected by asymmetry as well and it is always > 0.

$\dfrac{E\left(X - E(X)\right)^4}{\sigma^4}$ (population), $\dfrac{1}{n}\sum_i\left(\dfrac{x_i - \bar x}{s}\right)^4$ (sample)

Matlab question

- Data = xlsread('nome file', 'worksheet', 'range'); if worksheet = -1 it opens an Excel window to interactively select the data. The worksheet can be given either as a string or as a number.
- xlswrite('nome file', dati, 'worksheet', 'range') writes the data to the file.
- inv(A) computes the inverse of the matrix.
- [coeff, latent] = pcacov(A) performs the principal component decomposition of a covariance matrix: "coeff" contains the eigenvectors, "latent" the eigenvalues (the diagonal of Λ).
- flipud(A) flips the matrix upside down: the last row becomes the first.
- cov(A) computes the Var-Cov matrix.
- for i = 0:4:12 (…) end, where 0 is the starting value, 4 is the step and 12 is the final value.
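A short example combining the commands above (the file name, sheet and range are placeholders): read a matrix of returns from Excel, compute its Var-Cov matrix and run the principal component decomposition.

% Example use of the commands above ('returns.xls', sheet and range are placeholders)
R = xlsread('returns.xls', 'Sheet1', 'A2:D253');   % matrix of returns
S = cov(R);                                        % Var-Cov matrix
[coeff, latent] = pcacov(S);                       % eigenvectors and eigenvalues
explained = latent / sum(latent);                  % share of variance per component

for i = 1:3                                        % loop over the first components
    fprintf('Component %d explains %.1f%% of the variance\n', i, 100*explained(i));
end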
