
FinM 331/Stat 339 Financial Data Analysis, Winter 2010

Floyd B. Hanson, Visiting Professor
Email: [email protected]
Master of Science in Financial Mathematics Program, University of Chicago

Lecture 5
6:30-9:30 pm, 01 February 2010, Ryerson 251 in Chicago
7:30-10:30 pm, 01 February 2010 at UBS in Stamford
7:30-10:30 am, 02 February 2010 at Spring in Singapore


5. Regression and Maximum Likelihood Tools:

5.1 Parametric Regression:[a]

Parametric regression involves fitting to the sample data a small number of parameters, e.g., mean $\mu$ and variance $\sigma^2$, while by contrast nonparametric regression involves fitting more of the local properties of the data, with much data preparation to smooth the fitted curves. In addition, there are many flavors of parametric regression. Some do not rely much on random properties, like simple least squares, relating a response data vector, analogous to a dependent variable, to a model depending on a few predictors or regressors, analogous to independent variables, and a few unknown parameters or coefficients that are to be estimated. Other regression methods, such as maximum likelihood estimation, as the name implies, rely on more probabilistic methods, but then so do stochastic models in finance.

[a] See Rice (2007) Chapt. 14 or Carmona (2004) Chapt. 3 for more information, or Weisberg (2005) Chapts. 1-4 for even more details on both statistical and computational errors.


5.2 Ordinary or Simple Least Squares Regression for Univariate Linear Models:

Given a sample of n observations of responses $\vec Y = [Y_i]_{n\times 1}$ that appear to satisfy a nearly linear relationship with respect to the same size sample of observations $\vec X = [X_i]_{n\times 1}$, called predictors or regressors. The nearly linear relationship could be suggested by plotting $\vec Y$ against $\vec X$ using MATLAB's scatter(X,Y); scatter plot function and eyeballing the plotted points. Least squares suggests that our cost function be the squares of the deviations from the linear model $y = \phi(x) \equiv b + mx$, where m is the slope and b is the intercept, and that we minimize, or find the least value of, the sum of the squares of the deviations of the data from the straight line, i.e., our objective is the quadratic cost function,

$$ L_2(b, m) = \sum_{i=1}^n (Y_i - b - mX_i)^2. \qquad (5.1) $$

The observations will not likely satisfy the model $\phi(x)$; instead there will be an error, $Y_i = b + mX_i + e_i$, or $e_i \equiv Y_i - b - mX_i$, and $L_2(b, m) = \sum_{i=1}^n e_i^2 = (n-1)\sigma_e^2$.


Alternatively, one could minimize the sum of the absolute deviations,
$$ L_1(b, m) = \sum_{i=1}^n |Y_i - mX_i - b|, \qquad (5.2) $$
but this cost model does not have such nice and justifiable properties.

The minimum can be found by searching for the critical points with respect to the unknown parameter set (b, m), so
$$ \frac{\partial L_2}{\partial b}(b, m) = -2\sum_{i=1}^n (Y_i - mX_i - b) = -2n(\bar Y - m\bar X - b) \stackrel{*}{=} 0, \qquad (5.3) $$
yielding the implicit optimal intercept estimate, $b = b^* = \bar Y - m\bar X$, where $(\bar X, \bar Y)$ is the usual sample mean vector of the coordinates. Next,
$$ \frac{\partial L_2}{\partial m}(b, m) = -2\sum_{i=1}^n (Y_i - mX_i - b)X_i = -2n(\overline{XY} - m\overline{X^2} - b\bar X) \stackrel{*}{=} 0, \qquad (5.4) $$
yielding the least squares solution,
$$ m = m^* = \frac{\overline{XY} - \bar X\,\bar Y}{\overline{X^2} - (\bar X)^2} = \frac{\sigma_{x,y}}{\sigma_x^2} \quad \text{if } \sigma_x^2 > 0. \qquad (5.5) $$


Hence, the estimated slope m is given as the ratio of the data sample covariance to the variance of the predictor $\vec X$, both being biased or unbiased quadratic moments (so long as the same normalization is used in numerator and denominator, since the factors then cancel). However, this estimated slope formula has possibly small differences in the numerator and denominator, so its computation should be handled with care due to "catastrophic cancellations". These are cancellations where significant digits are lost in subtractions.

The estimated intercept b is usually calculated once the estimated slope m is calculated, since it serves no numerical purpose to eliminate the slope from the intercept formula, but that could lead to a less precise calculation.

The observation error $\vec e$ between the observed and model responses comes with statistical conditions: $E[\vec e\,] = \vec 0$, since otherwise there would be bias (any non-zero value should be in the model and not in the noise), and $\mathrm{Cov}[\vec e, \vec e^{\,\top}] = \sigma_e^2 I_n$ for constant "sigma" $\sigma_e$, or else it will interfere with the least squares fit.
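As a check on Eqs. (5.3)-(5.5), here is a minimal MATLAB sketch (hypothetical simulated data, not one of the lecture's examples) computing the slope and intercept directly from the sample moments; cov and var use the same unbiased normalization, so the ratio in (5.5) is unaffected by the biased/unbiased choice:

% Slope and intercept from sample moments, per Eqs. (5.3)-(5.5):
n = 100;
X = rand(n,1); % hypothetical predictor data
Y = 0.25 + 1.0*X + 0.25*randn(n,1); % hypothetical response data
C = cov(X,Y); % 2x2 sample covariance matrix (unbiased)
mstar = C(1,2)/C(1,1); % m* = sigma_{x,y}/sigma_x^2, Eq. (5.5)
bstar = mean(Y) - mstar*mean(X); % b* = Ybar - m*Xbar, Eq. (5.3)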


The quality of the linear regression is judged by how close the noted coefficient of determination $R^2$ value is to one, and it is defined by the formula,
$$ R^2 \equiv 1 - \frac{\mathrm{SSE}}{\mathrm{TSS}} = 1 - \frac{(n-2)\sigma_e^2}{(n-1)\sigma_y^2}, \qquad (5.6) $$
{Almost Total Correction!}, where
$$ \mathrm{SSE} = \mathrm{RSS} \equiv \sum_{i=1}^n (Y_i - \hat Y_i)^2 \equiv \sum_{i=1}^n \hat e_i^2 = (n-2)\sigma_e^2 \qquad (5.7) $$
is the sum of squared errors (SSE) or residual sum of squares (RSS), with $\hat Y_i \equiv mX_i + b$ the $i$th estimated response, the two fewer degrees of freedom due to the two fitted parameters accounting for the factor $n-2$. The
$$ \mathrm{TSS} \equiv \sum_{i=1}^n (Y_i - \bar Y)^2 = (n-1)\sigma_y^2 \qquad (5.8) $$
is the total sum of squares (TSS) of the deviations.


• Linear Regression Translation to Linear Algebra & MATLAB:

Translating the linear regression to matrix-vector form, combine the parameters into a parameter vector $\vec p = [b, m]^\top$; then the independent variable $(n\times 1)$ $\vec X$ is extended to the $(n\times 2)$-matrix $A = [\mathrm{ones}(n,1), \vec X]$, so that the usual linear algebraic equation $A*\vec x = \vec b$ is equivalent to the form $A\vec p = \vec Y$, with the correspondence that $\vec b = \vec Y$ is the known RHS and $\vec x = \vec p$ is the unknown. The linear regression, least squares problem is to find the minimum of the scalar quadratic form:
$$ S(\vec x) \equiv (A\vec x - \vec b)^\top(A\vec x - \vec b) = \vec x^\top A^\top A\vec x - 2\vec x^\top A^\top\vec b + \vec b^{\,\top}\vec b, \qquad (5.9) $$
after transpose algebra ($(AB)^\top = B^\top A^\top$). Since, if $\vec V$ is fixed, then
$$ \nabla_x[\vec x^\top\vec V\,] = \nabla_x[\vec V^\top\vec x\,] = \vec V \qquad (5.10) $$
and
$$ \nabla_x[S](\vec x) = 2(A^\top A\vec x - A^\top\vec b\,) \stackrel{*}{=} \vec 0, \qquad (5.11) $$
at a critical point $\vec x^*$ if it exists, where the least squares solution is
$$ \vec p^* = \vec x^* = (A^\top A)^{-1}A^\top\vec b. \qquad (5.12) $$


• Ordinary Least Squares for Simple Linear Fit Example:

[Figure: "Ordinary Least Square Linear Example:" — scatter plot of the (x, y) data on roughly [0,1]×[-0.2,1.6], with the true line y = b + mx, the lscov fit and the backslash fit overlaid; legend: scatter plot, True y=m+bx, lscov fit, backslash fit.]

Figure 5.1: Ordinary least squares fit using indistinguishable MATLAB lscov and A\b backslash methods, with similar results for a simple, straight line fit with scatterplot data.


• MATLAB Code for Simple Ordinary Least Squares Application:

function LStest
clc
fprintf('\nLStest Output (%s):',datestr(now));
m = 1; b = 0.25; % True parm values;
fprintf('\nTrue: b = %7.4f; m = %7.4f;',b,m);
ptrue = [b;m];
n = 100;
x = rand(n,1); % Simulated Uniform x-data;
sigma_e = 0.25;
err = sigma_e*randn(n,1); % Simulated Normal Mean-Zero Error;
fprintf('\nsigma_e = %7.4f;',sigma_e);
y = b+m*x+err; % Simulated y-data with linear model;
TSS = (n-1)*var(y);
fprintf('\nTSS = %9.3e',TSS);
A = [ones(n,1) x];
fprintf('\nsize(A)=[%i,%i]; size(y)=[%i,%i];',size(A),size(y));
[preg,sep,mse] = lscov(A,y); % Least Sqs Method, No Cov input
pbsl = A\y; % BackSlash (BS) Method (fast);
breg = preg(1); mreg = preg(2);
yhatreg = breg+mreg*x;
yhatbsl = pbsl(1)+pbsl(2)*x;
SSEreg = sum((y-yhatreg).^2); % Also, RSS=ResSumSqs
SSEbsl = sum((y-yhatbsl).^2);


fprintf('\nRSS=SSEreg = %9.3e; SSEbsl = %9.3e;',SSEreg,SSEbsl);
fprintf('\nRelResreg = %9.3e;',sqrt(norm(y-yhatreg)/norm(y)));
fprintf('\nRelResbsl = %9.3e;',sqrt(norm(y-yhatbsl)/norm(y)));
Rsqreg = 1-SSEreg/TSS; Rsqbsl = 1-SSEbsl/TSS;
fprintf('\nRsqreg = %9.3e; Rsqbsl = %9.3e;',Rsqreg,Rsqbsl);
fprintf('\nTrue: b =%7.4f; m =%7.4f;',b,m);
fprintf('\nlscov: b = %7.4f; m = %7.4f;',breg,mreg);
fprintf('\nbkslh: b = %7.4f; m = %7.4f;',pbsl(1),pbsl(2));
fprintf('\nsqrt(norm(preg-ptrue)/norm(ptrue)) =%7.4f;' ...
    ,sqrt(norm(preg-ptrue)/norm(ptrue)));
fprintf('\nsqrt(norm(pbsl-ptrue)/norm(ptrue)) =%7.4f;' ...
    ,sqrt(norm(pbsl-ptrue)/norm(ptrue)));
fprintf('\nsqrt(norm(preg-pbsl)/norm(pbsl)) =%7.4f;' ...
    ,sqrt(norm(preg-pbsl)/norm(pbsl)));
fprintf('\nlscov: sep = [%9.3e,%9.3e];',sep);
fprintf('\nlscov: mse = %9.3e;',mse); % Caution: different scaling
xg = 0:0.1:1;
ytrue = b+m*xg;
yhatregp = breg+mreg*xg;
yhatbslp = pbsl(1)+pbsl(2)*xg;
%
figure(1); nfig = 1;
scrsize = get(0,'ScreenSize'); % figure spacing for target screen
ss = [5.0,4.5,4.0,3.5]; % figure spacing factors


scatter(x,y,8); hold on;
plot(xg,ytrue,'-g',xg,yhatregp,'-.k',xg,yhatbslp,'--r','LineWidth',2);
axis tight; hold off;
title('Ordinary Least Square Linear Example:' ...
    ,'Fontsize',24,'FontWeight','Bold');
xlabel('x','Fontsize',24,'FontWeight','Bold');
ylabel('y','Fontsize',24,'FontWeight','Bold');
legend('scatter plot',' True y=m+bx',' lscov fit',' backslash fit' ...
    ,'Location','NorthWest');
set(gca,'Fontsize',20,'FontWeight','Bold','LineWidth',3);
set(gcf,'Color','White','Position' ...
    ,[scrsize(3)/ss(nfig) 60 scrsize(3)*0.60 scrsize(4)*0.80]); %[l,b,w,h]
fprintf('\n ');

============= LStest Output =========================

LStest Output (30-Jan-2010 14:34:46):

True: b = 0.2500; m = 1.0000;

sigma_e = 0.2500;


TSS = 1.659e+01

RSS=SSEreg = 6.494e+00; SSEbsl = 6.494e+00;

RelResreg = 5.403e-01;

RelResbsl = 5.403e-01;

Rsqreg = 6.085e-01; Rsqbsl = 6.085e-01;

True: b = 0.2500; m = 1.0000;

lscov: b = 0.2253; m = 1.0873;

bkslh: b = 0.2253; m = 1.0873;

sqrt(norm(preg-ptrue)/norm(ptrue)) = 0.2967;

sqrt(norm(pbsl-ptrue)/norm(ptrue)) = 0.2967;

sqrt(norm(preg-pbsl)/norm(pbsl)) = 0.0000;

lscov: sep = [5.125e-02,8.809e-02];

lscov: mse = 6.627e-02;

>>

Remark: Some scaling differences mean that variables like mse, the supposed mean square error, are quite different from, say, the relative residual RelRes=sqrt(norm(y-yhat)/norm(y)).
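As a quick arithmetic check on the output above, lscov's mse output is the residual sum of squares per degree of freedom for this two-parameter fit, SSEreg/(n-2) = 6.494/98 ≈ 6.627e-02, matching the printed lscov mse; in code (run in the LStest workspace):

mse_check = SSEreg/(n-2); % = 6.627e-02, agrees with the lscov mse output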


5.3 Multiple Linear Regressor Models with a Univariate Response:

Generalizing the prior univariate predictor variable, assuming again univariate response sample data $\vec Y = [Y_i]_{n\times 1}$, consider the linear relationship to the m-dimensional multivariate predictor with sample data array $X = [X_{i,j}]_{n\times m}$ and $\vec 1 \equiv [1]_{n\times 1}$,
$$ \vec Y \simeq p_0\vec 1 + X\vec p \qquad (5.13) $$
with predictor-independent coefficient $p_0$ and linear predictor coefficient or parameter vector $\vec p = [p_i]_{m\times 1}$. Define the linear error in the model to be
$$ \vec e = [e_i]_{n\times 1} \equiv \vec Y - p_0\vec 1 - X\vec p. \qquad (5.14) $$
Next we absorb the $p_0$ parameter by defining extended linear parameter and model basis arrays with the constant vector $\vec X_0 = [X_{i,0}]_{n\times 1} \equiv \vec 1$,
$$ \vec p_{ex} \equiv [p_{i-1}]_{(m+1)\times 1} \quad \& \quad A \equiv [\vec 1, X] = [X_{i,j-1}]_{n\times(m+1)}, \qquad (5.15) $$
yielding the multivariate linear model in compact form,
$$ \vec Y = A\vec p_{ex} + \vec e. \qquad (5.16) $$


• Multiple Linear (Multilinear) Least Squares Regression:[a]

The least squares objective corresponds to minimizing the quadratic cost or residual sum of squares in 2-norm form,
$$ L(\vec p_{ex}) = \mathrm{RSS}(\vec p_{ex}) = (\vec Y - A\vec p_{ex})^\top(\vec Y - A\vec p_{ex}) = ||\vec Y - A\vec p_{ex}||^2. \qquad (5.17) $$
In order to find the multivariate critical points with respect to the unknown parameter vector $\vec p_{ex}$, we expand the quadratic form into more elementary products,
$$ L(\vec p_{ex}) = \vec Y^\top\vec Y - 2\vec p_{ex}^{\,\top}A^\top\vec Y + \vec p_{ex}^{\,\top}A^\top A\vec p_{ex}, \qquad (5.18) $$
where the reverse product transpose identity $(AB)^\top = B^\top A^\top$ has been used. Next the gradient peel theorem,[b] $\nabla_{\vec p}[\vec p^{\,\top}\vec Y] = \vec Y$, with three applications, implies the critical point conditions,
$$ \nabla_{\vec p_{ex}}[L(\vec p_{ex})] = -2A^\top\vec Y + 2A^\top A\vec p_{ex} \stackrel{*}{=} \vec 0_{(m+1)\times 1}. \qquad (5.19) $$

[a] MATLAB reserves multivariate linear regression for a vector system of multiple linear regressor models of responses $\vec y$ and an array of predictors X, as in a portfolio application.

[b] Hanson '07, Online Appendix B Preliminaries: Probability and Analysis Results, p. B45, http://www.math.uic.edu/~hanson/pub/SIAMbook/bk0BprelimAppendfinal.pdf


This leads to the optimal parameter estimate, provided that the $(m+1)\times(m+1)$ square matrix $A^\top A$ is invertible,
$$ \hat p_{ex} = \vec p_{ex}^{\,*} = (A^\top A)^{-1}A^\top\vec Y, \qquad (5.20) $$
but the non-square A will not be invertible in the ordinary sense, $A^\top A$ may not be positive definite for $m > 1$, and there are possible catastrophic cancellation problems with the estimated parameter formula. The optimum, i.e., minimum, $\hat p_{ex}$, of the parameter vector depends on technical conditions, the most important being that the objective is a quadratic form, so the parameter vector is linear in the response observation vector $\vec Y$. This estimate is unbiased conditional on A: with $E[\vec e\,] = \vec 0$ and $\mathrm{Cov}[\vec e, \vec e^{\,\top}] = \sigma_e^2 I_n$ from the usual assumptions, i.e., $\vec e \stackrel{dist}{=} N(\vec 0, \sigma_e^2 I_n)$, and assuming also that $E[e_i^2 | X_i]$ is constant,
$$ E[\hat p_{ex} | A] = (A^\top A)^{-1}A^\top E[\vec Y | A] = (A^\top A)^{-1}A^\top E[A\vec p_{ex} + \vec e\,| A] = (A^\top A)^{-1}A^\top A\vec p_{ex} = \vec p_{ex}, \qquad (5.21) $$
which we take as the true parameter value.


In general, $A^\top A$ may not be positive definite due to negative X-correlations. Testing for positive definiteness, using $\bar{\vec X} \equiv \frac{1}{n}\big[\sum_{k=1}^n X_{k,i}\big]_{m\times 1}$ and $\overline{X^\top X} \equiv \frac{1}{n}\big[\sum_{k=1}^n X_{k,i}X_{k,j}\big]_{m\times m}$,
$$ \frac{1}{n}\vec p_{ex}^{\,\top}A^\top A\vec p_{ex} = [p_0\ \vec p^{\,\top}]\begin{bmatrix} 1 & \bar{\vec X}^\top \\ \bar{\vec X} & \overline{X^\top X} \end{bmatrix}\begin{bmatrix} p_0 \\ \vec p \end{bmatrix} = p_0^2 + 2p_0\bar{\vec X}^\top\vec p + \vec p^{\,\top}\overline{X^\top X}\,\vec p = \big|\big|p_0 + \bar{\vec X}^\top\vec p\,\big|\big|^2 + \vec p^{\,\top}\mathrm{Cov}\big[X^\top, X\big]\vec p, \qquad (5.22) $$
where the completing-the-square technique was used and $\mathrm{Cov}[X^\top, X] \equiv \frac{1}{n}\big[\sum_{k=1}^n (X_{k,i} - \bar X_i)(X_{k,j} - \bar X_j)\big]_{m\times m}$. So, when m = 1 there is one predictor variable with a sum of squares and positive semi-definiteness, but for m > 1 definiteness may not hold if there are negative elements of the variable covariance. Care must be taken in choosing statistical software.


The least squares parameter solution has a geometric interpretation: the least squares residual $\vec{Res} = \vec Y - A\hat p_{ex}$ must be orthogonal to the predictor $A\hat p_{ex}$. This follows from some linear algebra starting with the scalar product orthogonality test:
$$ (A\hat p_{ex})^\top\vec{Res} = \hat p_{ex}^{\,\top}A^\top(\vec Y - A\hat p_{ex}) = \hat p_{ex}^{\,\top}(A^\top\vec Y - A^\top A\hat p_{ex}) \stackrel{*}{=} \hat p_{ex}^{\,\top}\vec 0_{(m+1)\times 1} = 0, \qquad (5.23) $$
by Eq. (5.19). Hence, the residual $\vec{Res}$, the response observations $\vec Y$ and the optimal predicted response $A\hat p_{ex}$ form a right triangle with $\vec Y$ on the hypotenuse.


The estimated parameter covariance matrix is similarly given by
$$ \Sigma_{\hat p_{ex}|A} = E\big[(\hat p_{ex} - \vec p_{ex})(\hat p_{ex} - \vec p_{ex})^\top\big|A\big] = (A^\top A)^{-1}A^\top E\big[\vec e\,\vec e^{\,\top}\big]A(A^\top A)^{-1} = \sigma_e^2\cdot(A^\top A)^{-1}. \qquad (5.24) $$
The estimated response vector is defined by
$$ \hat Y \equiv A\hat p_{ex} = A(A^\top A)^{-1}A^\top\vec Y \equiv H\vec Y, \qquad (5.25) $$
where $H \equiv A(A^\top A)^{-1}A^\top = [H_{i,j}]_{n\times n}$ is called the hat or prediction matrix; note that it is a square symmetric matrix. This matrix enters into the raw residuals or errors as $\hat e \equiv \vec Y - \hat Y = (I_n - H)\vec Y$ with mean $E[\hat e|A] = \vec 0$ and covariance $\Sigma_{\hat e|A} = \sigma_e^2(I_n - H)$, obviously correlated due to the $(I_n - H)$.


The minimum residual sum of squares is given, as in Eq. (5.7), by $\mathrm{RSS} = ||\hat e||^2 \equiv ||\vec Y - \hat Y||^2$, and the unbiased variance associated with this residual sum of squares, when $n > m+1$, is $\sigma_{\hat e}^2 = ||\hat e||^2/(n-m-1)$, where the number of degrees of freedom has been reduced by the number of parameters, $(m+1)$. The standardized residuals are given approximately by
$$ e_i^{(std)} \simeq \hat e_i\big/\big(\sigma_{\hat e}\sqrt{1 - H_{i,i}}\,\big). \qquad (5.26) $$
The F-test can be used for a test of normality, such that
$$ F = \frac{\mathrm{SSreg}}{\sigma_e^2} = \frac{\mathrm{TSS} - \mathrm{RSS}}{\sigma_e^2}, \qquad (5.27) $$
where SSreg is the sum of squares due to the regression. (Note the similarity to $R^2$.)

In computational statistics, the relative residual,
$$ \mathrm{RelRes}_y = ||\vec Y - \hat Y||/||\vec Y||, \qquad (5.28) $$
is often used, since it is readily calculated once the parameter estimate is available.
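A minimal sketch of these diagnostics, assuming a design matrix A and response vector y as in the LStest example above:

% Hat matrix and standardized residuals, per Eqs. (5.25)-(5.26):
H = A*((A'*A)\A'); % hat or prediction matrix, n x n
ehat = y - H*y; % raw residuals, (I_n - H)*y
dof = size(A,1) - size(A,2); % n - (m+1) degrees of freedom
sig2e = (ehat'*ehat)/dof; % unbiased residual variance
estd = ehat./sqrt(sig2e*(1 - diag(H))); % standardized residuals, Eq. (5.26)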


• MATLAB Functions for Least Squares:

MATLAB contains a large number of functions for linear regression and its variations, but only a few are listed here (a comparison sketch of the basic solution routes follows the end of this list).

1. Backslash method or left matrix division (\ or mldivide): remember the statement that A does not have an ordinary inverse; well, it has a generalized inverse such that $\hat p = A\backslash\vec Y$ (read this from right to left as A divided into $\vec Y$, as you would read $\vec Y/A$ in the reverse order), producing a least squares approximation solution,[a] with the method depending on the input. The mldivide function uses many methods, depending on A. The more general regress function uses the powerful backslash technique, but produces added statistics.

2. [phat,stdp,mse]=lscov(A,Y); produces the least squares solution to $A\vec x = \vec b$ by minimizing $(A\vec p - \vec Y)^\top(A\vec p - \vec Y)$ for the (m×1) phat from the (n×m) A and the (n×1) Y (in our notation, $(n,m)\to(n,m+1)$ and $(A,Y,p)\to(A,\vec Y,\vec p_{ex})$). The extra outputs are the estimated standard errors stdp of phat and the mean squared error mse. See the examples in help lscov.

[a] D. & N. Higham '05, MATLAB Guide, pp. 126-127, say that it actually performs faster and more accurately than some other methods.


3. [phat,pbci,r,rci,stats]=regress(Y,X,alpha); primarily produces the ℓ×1 estimated linear parameter coefficient array phat = $\hat p$ for the responses in the n×1 univariate response data Y from multilinear regression on the n×ℓ predictor data X. (Caution: if there is a constant parameter $p_0$, then the first column of X must be a column of ones and ℓ = m+1.) The pbci is an ℓ×2 array of estimated parameter confidence intervals, one for each of the ℓ parameters, using the complementary confidence parameter alpha; r is the n×1 array of linear fit residuals or errors; rci is an n×2 array of residual confidence intervals for outlier diagnosis, an outlier being outside the 100(1-alpha)% residual confidence interval; and stats is a 1×4 statistics row vector containing [R²-statistic, F-statistic, P-value of F, $\hat\sigma_e^2$]. See help regress for an example application with ℓ = 4, where A=[ones(size(x1)),x1,x2,x1.*x2], the fourth linear term is the data cross product x1.*x2, and a scatter3 scatterplot of y vs. (x1, x2) results. Note, multilinear only means a linear combination of functions of the data, which are also called basis functions.


4. stats=regstats(y,x,model,whichstats); produces similar linear regression results to regress, but makes possible a richer set of statistical diagnostic tests and allows linear combinations of both linear and quadratic functions of $\vec x$. The option model can be 'linear', 'purequadratic', 'interaction' or 'quadratic', the latter being a general quadratic function. The option whichstats allows many statistical tests and measures with output in the output structure stats; using 'all' gives all of them, but if the output structure is omitted, then the desired statistics can be selected on an easier-to-use GUI menu.


5. [b,stats]=robustfit(x,y,wfun,tune); primarily produces the weighted least squares or linear regression for the ℓ×1 estimated linear parameter coefficient array b = $\hat p$ for the responses in the n×1 univariate response data y from multilinear regression on the n×ℓ predictor data x. (Caution: if a constant parameter $p_0$ is in the model, then robustfit automatically adds ones as the first column of x and ℓ = m at input, so do NOT add ones. Also, the input order for x,y in robustfit is opposite to the y,x in regress and regstats.) See help robustfit for a long list of options. If the options wfun,tune are omitted, the bi-square weight function and its tuning constant are used.

{Note: The robustfit weighting can be used for identifying data outliers, which in traditional practice can then be discarded or lessened by weighting, or could be enhanced if value at risk were a serious concern. See also rcoplot, the residual case order plot.}


6. [p,S,mu]=polyfit(x,y,n); finds the fit for vector y as a polynomial p of degree n in the standardized vector variable $\hat x$=(x-mu(1))/mu(2) (i.e., note m = 1 here), a highly recommended centering and scaling of the data to improve any method's computation, where mu=[mean(x),std(x)];. Also, S is a structure that gives extra information (see help), read on output by [y,delta]=polyval(p,x,S,mu); where y±delta gives the error, supposedly within 50% confidence.


7. [U,D,V]=svd(x,0); produces the singular value decomposition (SVD) of the n×m array x, where U is an n×n orthogonal array (i.e., its transpose is its inverse), V is an m×m orthogonal array and D is generally an n×m diagonal array; the second 0 input option reduces the decomposition to the necessary m×m economy size when n≥m (but in our notation n≥m'+1 with the added constant column, so for svd then m=m'+1).[a] SVD used to be one of the set of methods used by the backslash method.

[a] See D. & N. Higham '05, MATLAB Guide, pp. 130-131. In general, the least squares problem is susceptible to ill-conditioning due to possible problems with catastrophic cancellation, but SVD was developed as a robust method for this problem. See also G. Forsythe, M. Malcolm and C. Moler, Computer Methods for Mathematical Computations, 1977, for pre-MATLAB background by Moler and his mentor Forsythe. Note, mathematical infinite precision is unrealistic compared to the computational finite precision in practice.
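As referenced in items 1-2 above, the different solution routes can be compared in a minimal sketch (hypothetical data; pinv is MATLAB's SVD-based pseudoinverse, standing in here for the SVD route):

% Comparing backslash, lscov and the SVD pseudoinverse on one problem:
n = 100; x = rand(n,1);
y = 0.25 + x + 0.25*randn(n,1); % hypothetical data
A = [ones(n,1) x];
p1 = A\y; % backslash (mldivide)
p2 = lscov(A,y); % least squares with optional statistics
p3 = pinv(A)*y; % SVD-based pseudoinverse solution
disp([p1 p2 p3]); % the three columns should agree to rounding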


5.4 Weighted Least Squares:

Since outliers or extreme rare events happen, there may be a desire to reduce their bias if the objective is equilibrium-type models; or, if the interest is the larger values at risk, the analyst may want to enhance their effect, to compensate for the bias of regression toward averaging out the outliers.

Let $W = [w_i\delta_{i,j}]_{n\times n}$ be a diagonal matrix of constant positive weights $w_i$, so $W^{-1} = [\delta_{i,j}/w_i]_{n\times n}$, $W^{1/2} = \sqrt W = [\sqrt{w_i}\,\delta_{i,j}]_{n\times n}$ and $W^\top = W$. Next, the observational error or residual $\vec e$ is replaced by a weighted version $\vec e_w = \vec e/\sqrt W$, so that the basic statistics are $E[\vec e_w] = \vec 0_n$ and $\mathrm{Cov}[\vec e_w, \vec e_w^{\,\top}] = \sigma_e^2 I_n W^{-1}$.


The residual is then
$$ \vec{Res} = \vec Y - A\vec p_{ex} = \vec e/\sqrt W \qquad (5.29) $$
with extended observations array $A = [\vec 1, X]$, and the residual sum of squares is
$$ \mathrm{RSS}(\vec p_{ex}) = \vec e^{\,\top}\vec e = (\vec Y - A\vec p_{ex})^\top W(\vec Y - A\vec p_{ex}), \qquad (5.30) $$
so the theoretical least squares estimate is
$$ \hat p_{ex} = (A^\top WA)^{-1}A^\top W\vec Y. \qquad (5.31) $$
The implementation of weighted least squares (WLS), see MATLAB's robustfit, begins with scaling the predictor observations by letting
$$ M \equiv \sqrt W A = \big[\sqrt{w_i}\,X_{i,j-1}\big]_{n\times(m+1)}, \qquad (5.32) $$
recalling that $\vec X_0 = \vec 1$, and the response observations by letting
$$ \vec Z \equiv \sqrt W\,\vec Y = \big[\sqrt{w_i}\,Y_i\big]_{n\times 1}, \qquad (5.33) $$
so the least squares parameter estimate becomes
$$ \hat p_{ex} = (M^\top M)^{-1}M^\top\vec Z. \qquad (5.34) $$


• Ordinary and Weighted Least Squares for Simple Linear Fit Example with Normal-Poisson Error:

[Figure: "Ordinary & Weighted Least Square Linear Example:" — scatter plot of the (x, y) data on roughly [0,1]×[0,2.5], with the true line y = b + mx, the Regress fit and the Robustfit overlaid; legend: scatter plot, True y=m+bx, Regress fit, Robustfit.]

Figure 5.2: Ordinary least squares fit using MATLAB's regress and weighted least squares robustfit methods, with dissimilar results for a straight line fit with scatterplot data from combined normal and Poisson jump random error.


• MATLAB Code for Ordinary & Weighted Least Squares Application with Normal-Poisson Error:

function RegRobtest
clc
fprintf('\nRegRobtest Output (%s):',datestr(now));
m = 1; b = 0.25; % True parm values;
fprintf('\nTrue: b = %7.4f; m = %7.4f;',b,m);
ptrue = [b;m];
n = 100;
x = rand(n,1); % Simulated Uniform x-data;
sigma_e = 0.30; nu_e = 1.5; Lambda_e = 0.05;
% JD-Simulation Zero-Mean error:
err = sigma_e*randn(n,1) ...
    + nu_e*(poissrnd(Lambda_e,n,1)-Lambda_e*ones(n,1));
fprintf('\nsigma_e = %7.4f; nu_e = %7.4f; Lambda_e = %7.4f;'...
    ,sigma_e,nu_e,Lambda_e);
y = b+m*x+err; % Simulated y-data with linear model;
TSS = (n-1)*var(y);
fprintf('\nTSS = %9.3e',TSS);
A = [ones(n,1) x];
fprintf('\nsize(A)=[%i,%i]; size(y)=[%i,%i];',size(A),size(y));
[preg,pci,res,resci,statreg] = regress(y,A); % Multilinear Regression;
breg = preg(1); mreg = preg(2);
yhatreg = breg+mreg*x;


fprintf('\nRelResReg =%9.3e;',sqrt(norm(y-yhatreg)/norm(y)));
[prob,statrob] = robustfit(x,y); % Weighted Least Sqs Method;
yhatrob = prob(1)+prob(2)*x;
fprintf('\nRelResRob =%9.3e;',sqrt(norm(y-yhatrob)/norm(y)));
fprintf('\nTrue: b =%7.4f; m =%7.4f;',b,m);
fprintf('\nRegress: b =%7.4f; m =%7.4f;',breg,mreg);
fprintf('\nRobustfit: b =%7.4f; m =%7.4f;',prob(1),prob(2));
fprintf('\nsqrt(norm(preg-ptrue)/norm(ptrue)) =%7.4f;' ...
    ,sqrt(norm(preg-ptrue)/norm(ptrue)));
fprintf('\nsqrt(norm(prob-ptrue)/norm(ptrue)) =%7.4f;' ...
    ,sqrt(norm(prob-ptrue)/norm(ptrue)));
fprintf('\nsqrt(norm(preg-prob)/norm(prob)) =%7.4f;' ...
    ,sqrt(norm(preg-prob)/norm(prob)));
SSEreg = sum((y-yhatreg).^2); % Also, RSS=ResSumSqs
SSErob = sum((y-yhatrob).^2); % Also, RSS=ResSumSqs
fprintf('\nRSS: SSEreg =%9.3e; SSErob =%9.3e;',SSEreg,SSErob);
Rsqreg = 1-SSEreg/TSS; Rsqrob = 1-SSErob/TSS;
fprintf('\nRsqreg =%9.3e; Rsqrob =%9.3e;',Rsqreg,Rsqrob);
fprintf('\nstatreg: R^2 =%7.4f; F = %7.4f;' ...
    ,statreg(1,1),statreg(1,2));
fprintf(' P-value(F) =%7.4f; Var(error) =%7.4f;' ...
    ,statreg(1,3),statreg(1,4)); % statreg = [R^2, F, P-value(F), error variance]
fprintf('\nstatrob Sigmas: OLS_s=%7.4f; Robust_s=%7.4f;' ...
    ,statrob.ols_s,statrob.robust_s);


fprintf(' MAD_s=%7.4f; final_s=%7.4f;' ...
    ,statrob.mad_s,statrob.s);
fprintf('\nstatrob: SE_p = [%7.4f; %7.4f];',statrob.se);
fprintf('\nstatrob: Corr_ps=[%7.4f, %7.4f; %7.4f, %7.4f];' ...
    ,statrob.coeffcorr);
statrob,
%
xg = 0:0.1:1;
ytrue = b+m*xg;
yhatregg = breg+mreg*xg;
yhatrobg = prob(1)+prob(2)*xg;
%
figure(1); nfig = 1;
scrsize = get(0,'ScreenSize'); % figure spacing for target screen
ss = [5.0,4.5,4.0,3.5]; % figure spacing factors
scatter(x,y,8); hold on;
plot(xg,ytrue,'-g',xg,yhatregg,':k',xg,yhatrobg,'--r','LineWidth',2);
axis tight; hold off;
title('Ordinary & Weighted Least Square Linear Example:'...
    ,'Fontsize',24,'FontWeight','Bold');
xlabel('x','Fontsize',24,'FontWeight','Bold');
ylabel('y','Fontsize',24,'FontWeight','Bold');
legend('scatter plot',' True y=m+bx',' Regress fit',' Robustfit' ...
    ,'Location','NorthWest');


set(gca,'Fontsize',20,'FontWeight','Bold','LineWidth',3);
set(gcf,'Color','White','Position' ...
    ,[scrsize(3)/ss(nfig) 60 scrsize(3)*0.60 scrsize(4)*0.80]); %[l,b,w,h]
fprintf('\n ');

=========== CORRECTED Output ====================

RegRobtest Output (30-Jan-2010 14:04:30):

b = 0.2500; m = 1.0000;

sigma_e = 0.3000; nu_e = 1.5000; Lambda_e = 0.0500;

TSS = 3.247e+01

size(A)=[100,2]; size(y)=[100,1];

RelResReg =7.140e-01;

RelResRob =7.229e-01;

True: b = 0.2500; m = 1.0000;

Regress: b = 0.2241; m = 1.0794;

Robustfit: b = 0.1434; m = 1.0245;

sqrt(norm(preg-ptrue)/norm(ptrue)) = 0.2846;

sqrt(norm(prob-ptrue)/norm(ptrue)) = 0.3258;

sqrt(norm(preg-prob)/norm(prob)) = 0.3072;

RSS: SSEreg =2.372e+01; SSErob =2.492e+01;

Rsqreg =2.695e-01; Rsqrob =2.326e-01;

statreg: R^2 = 0.2695; F = 36.1469; P-value(F) = 0.2695;

Var(error) = 0.2695;

statrob Sigmas: OLS_s= 0.4920; Robust_s= 0.3220; MAD_s= 0.2732;

final_s= 0.3302;


statrob: SE_p = [ 0.0690; 0.1205];

statrob: Corr_ps=[ 1.0000, -0.8780; -0.8780, 1.0000];

statrob =

ols_s: 0.4920

robust_s: 0.3220

mad_s: 0.2732

s: 0.3302

resid: [100x1 double]

rstud: [100x1 double]

se: [2x1 double]

covb: [2x2 double]

coeffcorr: [2x2 double]

t: [2x1 double]

p: [2x1 double]

w: [100x1 double]

R: [2x2 double]

dfe: 98

h: [100x1 double]

>>

Remark: Some differences, with a reasonable intercept (b) but a poor slope (m), the parameter RMS difference being 0.3072. The OLS-sigma is much greater than the Robust-sigma or the Final-sigma.


5.5 General Linear Least Squares (GLLS) Method for General Linear Model (GLM):

The use of only linear predictor variables is highly restrictive and unrealistic, since the model should depend on the available science models and not on the modeler's limited toolbox of models. In the general least squares method a general linear model is assumed, in which the response variable Y is a linear combination of functions of the predictor variable $\vec X$, including linear and nonlinear ones, but the model must be linear in the parameters, $\vec c = [c_i]_{p\times 1}$, which happen to be the coefficients,
$$ y = \sum_{k=1}^p c_k\phi_k(\vec x), \qquad (5.35) $$
where the $\phi_k(\vec x)$ are the general nonlinear fit functions. Examples are $\phi_k(\vec x) = 1$, the constant function; $\phi_k(\vec x) = x_i$, linear in the component $x_i$; or $\phi_k(\vec x) = x_ix_j$, the interaction function if $i \ne j$ and pure quadratic if $i = j$.


Unfortunately, this excludes parameters inside the functions, like $\phi_k(\vec x; \vec\beta)$, such as when trying to fit multi-parameter densities where the parameters cannot be formed into linear combination coefficients.

Hence, we need to fit the response-predictor data to determine a coefficient parameter p-vector $\hat c$ with respect to the corresponding fit basis function p-vector $\vec\Phi(\vec x) = [\phi_i(\vec x)]_{p\times 1}$. The objective is to minimize the GLLS chi-square function:
$$ \chi^2(\vec c) = \sum_{i=1}^n \Big(Y_i - \sum_{k=1}^p c_k\phi_k(\vec X_i)\Big)^2\Big/\sigma_i^2, \qquad (5.36) $$
where the $w_i = 1/\sigma_i^2$ are the weight constants or functions for $i = 1{:}n$. For simplicity, it is assumed that $\sigma_i^2 = \sigma_{y,i}^2$ is a variance associated with the response variable $Y_i$, so that, for the time being, $\sigma_{x,i}^2 \equiv 0$, i.e., we have precise $\vec X_i$, or $\delta\vec X_i \equiv 0$.


Here, the array
$$ A = [A_{i,j}]_{n\times p} \equiv \sqrt W\,\Phi^\top(X) \equiv \big[\phi_j(\vec X_i)/\sigma_i\big]_{n\times p}, \qquad (5.37) $$
is called the design matrix, where $\vec X_i = [X_{i,j}]_{1\times m}$, m is the number of predictor variables, for $i = 1{:}n$ predictor observations, while $X = [X_{i,j}]_{n\times m}$ is the array of all predictor observations and $W = [w_i\delta_{i,j}]_{n\times n}$ is the diagonal weight matrix. Further, let $\vec Y = [Y_i]_{n\times 1}$ be the vector of observed responses with assigned variance vector $\vec\sigma \equiv [\sigma_i]_{n\times 1}$.

It is reasonable to assume that $n \gg p$, i.e., that there is much more data than there are parameters, so that the system of equations is severely over-determined and an averaged solution is necessary. The more over-determined the system is, the better it is for the statistics. The number of degrees of freedom (DOF) is DOF = n-p > 0. Let the scaled output be $\vec Z = [Z_i]_{n\times 1} \equiv [Y_i/\sigma_i]_{n\times 1} \equiv \sqrt W\,[Y_i]_{n\times 1}$.


Thus, Eq. (5.36) can be written in matrix-vector form,
$$ \chi^2(\vec c) = (\vec Z - A\vec c)^\top(\vec Z - A\vec c). \qquad (5.38) $$
The optimal critical condition for the coefficient parameters is
$$ \vec 0_p \stackrel{*}{=} \nabla_c[\chi^2](\vec c^{\,*}) = -2\big(A^\top\vec Z - A^\top A\vec c^{\,*}\big), \qquad (5.39) $$
yielding the least squares normal equation for the parameter vector,
$$ A^\top A\vec c^{\,*} = A^\top\vec Z, \qquad (5.40) $$
and the estimate,
$$ \hat c = \vec c^{\,*} = (A^\top A)^{-1}A^\top\vec Z. \qquad (5.41) $$
This is essentially the same form obtained for either ordinary or weighted least squares regression. In fact, there is an example in the MATLAB help for the OLS regress function,
A=[ones(size(x1)),x1,x2,x1.*x2],   (5.42)
that shows how you can include simple GLLS basis functions as OLS variables. In a sense, the raw data [x1, x2] is formed into substitute vectors, as long as they remain independent,
[X1,X2,X3,X4]=[ones(size(x1)),x1,x2,x1.*x2].   (5.43)
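A minimal GLLS sketch in the same spirit (hypothetical data and assumed response sigmas), fitting the basis {1, x, x^2} by ordinary backslash after the scaling of Eq. (5.37):

% GLLS with basis functions {1, x, x.^2}, per Eqs. (5.36)-(5.41):
n = 200; x = rand(n,1);
sig = 0.1*ones(n,1); % assumed response sigmas
y = 1 - 2*x + 3*x.^2 + sig.*randn(n,1); % hypothetical data
Phi = [ones(n,1) x x.^2]; % basis function array
A = Phi./(sig*ones(1,3)); % design matrix, Eq. (5.37)
Z = y./sig; % scaled responses
chat = A\Z; % least squares estimate, Eq. (5.41)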


• Singular Value Decomposition (SVD):

Although the product matrix $A^\top A$ is symmetric and is usually invertible, it contains no more information than A itself, and it is desirable to work just with A. The singular value decomposition is an ideal way of doing this for the least squares design matrix A. The SVD of A has the form,
$$ A = UDV^\top, \qquad (5.44) $$
where $U = [U_{i,j}]_{n\times n}$ is column orthogonal, $U^\top U = I_n$, or $\sum_{k=1}^n U_{k,i}U_{k,j} = \delta_{i,j} = \vec U_i^{(col)}\cdot\vec U_j^{(col)}$; $V = [V_{i,j}]_{p\times p}$ is square and row or column orthogonal, $V^\top V = I_p = VV^\top$; and $D = [s_i\delta_{i,j}]_{n\times p}$ is a diagonal matrix, the same size as A, with the singular values $s_i$ along the diagonal, measuring the effect of A on a vector with respect to a norm, such that $\min_i(s_i) \le ||A\vec u||/||\vec u|| \le \max_i(s_i)$, $||\vec u|| \ne 0$. Thus,
$$ D = I_nDI_p = U^\top UDV^\top V = U^\top AV. \qquad (5.45) $$
The SV problem (SVP) is $A\vec V_j^{(col)} = s_j\vec U_j^{(col)}$. See MATLAB's svd function.


For GLLS, if A has an SVD, i.e., $A = UDV^\top$, then using the transpose reversed product algebra, i.e., $(AB)^\top = B^\top A^\top$,
$$ A^\top = (UDV^\top)^\top = (V^\top)^\top D^\top U^\top = VD^\top U^\top, \qquad (5.46) $$
since the transpose of a transpose does nothing to a matrix, and noting that for non-square D the diagonal is not symmetric, so $D^\top \ne D$. The desired product $A^\top A$ decomposes as
$$ A^\top A = VD^\top U^\top UDV^\top = VD^\top I_nDV^\top = VD^\top DV^\top, \qquad (5.47) $$
where $D^\top D = [s_i^2\delta_{i,j}]_{p\times p}$, which is square and genuinely symmetric.

Next, applying the SVD (5.46) and (5.47) to the normal equation (5.40) for the parameter vector,
$$ VD^\top DV^\top\vec c^{\,*} = VD^\top U^\top\vec Z, \qquad (5.48) $$
and inverting the coefficients V, $(D^\top D)$ and $V^\top$ of $\vec c^{\,*}$ one by one yields
$$ \hat c = \vec c^{\,*} = V\widehat DU^\top\vec Z, \qquad (5.49) $$
where $\widehat D = (D^\top D)^{-1}D^\top = [\delta_{i,j}/s_i]_{p\times n}$ would be "$D^{-1}$" if invertible, but D is almost always not square.


• SVD Parameter Solution:

Hence, the least squares solution, given the SVD matrices for A, is reduced to matrix-vector multiplication, a big advantage for a matrix computational system like MATLAB. However, by expanding the multiplications in matrix elements, they can be reduced to partial matrix multiplication:
$$ \vec c^{\,*} = [c_i^*]_{p\times 1} = \Big[\sum_{j=1}^p V_{i,j}\sum_{k=1}^p \frac{\delta_{j,k}}{s_j}\sum_{\ell=1}^n U_{\ell,k}Z_\ell\Big]_{p\times 1} = \Big[\sum_{j=1}^p V_{i,j}\frac{1}{s_j}\sum_{\ell=1}^n U_{\ell,j}Z_\ell\Big]_{p\times 1} = \sum_{j=1}^p \frac{1}{s_j}\Big(\big(\vec U_j^{(col)}\big)^\top\vec Z\Big)\vec V_j^{(col)}, \qquad (5.50) $$
where the column vectors are $\vec U_j^{(col)} \equiv [U_{i,j}]_{n\times 1}$ and $\vec V_j^{(col)} \equiv [V_{i,j}]_{p\times 1}$. Note that $\big(\big(\vec U_j^{(col)}\big)^\top\vec Z\big)$ is a scalar product, so the vector-orientation of $\vec c^{\,*}$ comes mainly from $\vec V_j^{(col)}$.


• SVD Parameter Sensitivities:

The vector form of the SVD parameter solution is also useful for determining the parameter sensitivities to the output responses, e.g., the sensitivity of parameter $c_j^*$ to the original response $Y_i = \sigma_iZ_i$ is
$$ \frac{\partial c_j^*}{\partial Y_i} = \sum_{k=1}^p \frac{1}{s_k}\Big(\big(\vec U_k^{(col)}\big)^\top\frac{\partial\vec Z}{\partial Y_i}\Big)V_{j,k} = \frac{1}{\sigma_i}\sum_{k=1}^p \frac{1}{s_k}U_{i,k}V_{j,k}. \qquad (5.51) $$
This sensitivity can be used to estimate the variance of $c_j^*$, under restored response weighting,
$$ \sigma^2_{c_j^*} = \sum_{i=1}^n \sigma_i^2\Big(\frac{\partial c_j^*}{\partial Y_i}\Big)^2 = \sum_{i=1}^n\Big(\sum_{k=1}^p \frac{1}{s_k}U_{i,k}V_{j,k}\Big)\Big(\sum_{\ell=1}^p \frac{1}{s_\ell}U_{i,\ell}V_{j,\ell}\Big) = \sum_{k=1}^p \frac{1}{s_k^2}V_{j,k}^2, \qquad (5.52) $$
using the column orthogonality $\sum_{i=1}^n U_{i,k}U_{i,\ell} = \delta_{\ell,k}$. Similarly, the estimated covariance is $\mathrm{cov}(c_i^*, c_j^*) = \sum_{k=1}^p V_{i,k}V_{j,k}/s_k^2$.


• Fitting General Nonlinear Models (GNLM):

Given a general nonlinear model $y = f(x;\vec c)$, beyond the linear combination form $y = \sum_{j=1}^p c_j\phi_j(x)$, the primary technique is to transform the GNLM, if possible, to a form suitable for least squares. Due to complexity, the GNLM can be numerically ill-conditioned, i.e., difficult to compute. Some examples:

1. Exponential-like density: $y = f(x;\vec c) = c_2\exp(-c_1x)$. Let $z = \ln(y) = -c_1x + \ln(c_2)$, so that $z = \sum_{j=1}^2 b_j\phi_j(x)$ with $\phi_1(x) = 1$, $\phi_2(x) = x$, $b_1 = \ln(c_2)$ and $b_2 = -c_1$. The problem is that for the single parameter exponential distribution $c_1 = c_2 = 1/\mu$, so least squares errors are likely to lead to inconsistencies.

2. Normal-like density: $y = f(x;\vec c) = \exp\big(-(x-c_1)^2/(2c_2)\big)/\sqrt{2\pi c_2}$. Let $z = \ln(y) = -(x-c_1)^2/(2c_2) - \ln(\sqrt{2\pi c_2})$, so that $z = \sum_{j=1}^3 b_j\phi_j(x)$ with $\phi_1(x) = 1$, $\phi_2(x) = x$, $\phi_3(x) = x^2$, $b_1 = -c_1^2/(2c_2) - \ln(\sqrt{2\pi c_2})$, $b_2 = c_1/c_2$ and $b_3 = -1/(2c_2)$. The problem is that the two parameter normal distribution leads to three related coefficients and inconsistencies.
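A minimal sketch of example 1 (hypothetical noisy exponential data, with multiplicative noise so y stays positive for the logarithm), recovering (c1, c2) with polyfit:

% Fitting y = c2*exp(-c1*x) by log-linearization, as in example 1:
x = linspace(0.1,3,50)';
y = 2.0*exp(-1.5*x).*(1 + 0.05*randn(50,1)); % hypothetical data, y > 0
bfit = polyfit(x,log(y),1); % z = ln(y) = -c1*x + ln(c2)
c1 = -bfit(1); % slope gives c1
c2 = exp(bfit(2)); % intercept gives c2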


• General Nonlinear Least Squares Fit Methods:

Let the nonlinear objective be
$$ \chi^2(\vec c) = \chi^2(\vec X, \vec Y, \vec\sigma; \vec c) = \sum_{i=1}^n \frac{(Y_i - f(X_i;\vec c))^2}{\sigma_i^2}. \qquad (5.53) $$
Searching for critical points in p dimensions,
$$ \frac{\partial\chi^2}{\partial c_j}(\vec c) = -2\sum_{i=1}^n \frac{(Y_i - f(X_i;\vec c))}{\sigma_i^2}\frac{\partial f}{\partial c_j}(X_i;\vec c) \stackrel{*}{=} 0 \qquad (5.54) $$
for $j = 1{:}p$, where $\vec c^{\,*} = \mathrm{argmin}_{\vec c}\big[\chi^2(\vec c)\big]$. The difficulties are that there is no closed-form linear normal equation, there is no SVD or similar method, and the optimization may be ill-conditioned, with local minima hiding a global minimum.


• General Nonlinear Least Squares Fit Functions in MATLAB:

Nonlinear least squares is the kind of method needed to find the market distribution for assets or other financial instruments. Derivative optimization methods, like the method of steepest descent and variations of Newton's method, may be time-consuming, and for complicated distribution models the derivative formulas may be quite complex. For general users, it may be desirable to use direct methods, which by definition do not use derivatives, or derivative methods for which the code asks only for an input of the function $\chi^2$ and an initial value of the parameter $\vec c_0$. Some of these in MATLAB, listed in the simplest or next-to-simplest form:


1. c=fminsearch(chi2,c0): The chi2 is a function handle for the objective function, assumed to include the nonlinear fit function f; c0 is an input starting vector value and c is the output answer. This is the granddaddy of direct search and uses the Nelder-Mead (1965) down-hill simplex method, which employs flexible super-triangles in the parameter space to search for unconstrained minima. It is fast and fairly robust, but it is just a basic minimization method. The related function fminunc is an unconstrained derivative minimum search method in the Optimization Toolbox. Functions fminbnd and fmincon treat constrained minima, the first by direct search and the second by derivative search. See MATLAB help for more information.
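A minimal fminsearch sketch (hypothetical data, unit sigmas assumed in the chi-square of Eq. (5.53)) for the exponential model of the previous subsection:

% Direct search for y = c(2)*exp(-c(1)*x) via fminsearch:
x = linspace(0.1,3,50)';
y = 2.0*exp(-1.5*x) + 0.02*randn(50,1); % hypothetical data
chi2 = @(c) sum((y - c(2)*exp(-c(1)*x)).^2); % Eq. (5.53), sigma_i = 1
c0 = [1; 1]; % starting parameter guess
c = fminsearch(chi2,c0); % down-hill simplex minimizer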


Figure 5.3: MATLAB down-hill simplex algorithm illustration: Fminsearch Algorithm, Unconstrained Nonlinear Optimization page, Optimization Toolbox, 2008. Triangular simplex with (x(1), x(2), x(n+1) ≈ x(3)).


2. [c,Res,J,Cov,mse]=nlinfit(X,Y,f,c0): This is a nonlinear regression function, where X is the input predictor, Y is the input response, f is the function handle for the nonlinear fit function, c0 is an input starting vector value and c is the output answer. The extra outputs are the residuals Res, the Jacobian J, the covariance matrix Cov and an error term mse. It is similar to robustfit multilinear weighted least squares, but nlinfit also uses the Jacobian of derivatives for f. Also, nlinfit has several auxiliary functions and a GUI tool (a minimal usage sketch follows the nlintool item below):

• cci=nlparci(c,Res,'covar',Cov): This gives 95% confidence intervals cci for the output parameters c,Res,Cov of nlinfit; else insert the input option pair ,'alpha',alpha. The 'covar' is the parameter label for Cov.

• [ypred,delta]=nlpredci(f,X,c,Res,'covar',Cov): This gives predicted values ypred with 95% CI half-widths delta. It also uses the output parameters c,Res,Cov of nlinfit.


• nlintool(X,Y,f,c0,alpha,'Xname','Yname'): This is the graphical user interface for nlinfit; it can take optional arguments like the complementary CI parameter alpha and the plot XY-labels 'Xname','Yname'.
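A minimal nlinfit sketch for the same exponential model (hypothetical data; this assumes the five-output form of nlinfit in recent Statistics Toolbox versions, with the coefficient vector as the model handle's first argument):

% Nonlinear regression with nlinfit and parameter confidence intervals:
x = linspace(0.1,3,50)';
y = 2.0*exp(-1.5*x) + 0.02*randn(50,1); % hypothetical data
f = @(c,x) c(2)*exp(-c(1)*x); % model handle: coefficients first
c0 = [1; 1];
[c,Res,J,Cov,mse] = nlinfit(x,y,f,c0);
cci = nlparci(c,Res,'covar',Cov); % 95% parameter confidence intervals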


5.6 Maximum Likelihood Estimation Method:

Maximum likelihood estimation methods are a form of regression where the objective is derived from the maximum of the likelihood, i.e., the most probable state of the distribution. The maximum probability occurs at the maximum of the density, called the mode. For example, the normal distribution density is
$$ f_Z^{(n)}(z; \mu, \sigma^2) = \exp\big(-(z-\mu)^2/(2\sigma^2)\big)\big/\sqrt{2\pi\sigma^2} \qquad (5.55) $$
and, checking for critical points,
$$ 0 \stackrel{*}{=} \big(f_Z^{(n)}\big)'(z^*; \mu, \sigma^2) = -\frac{(z^*-\mu)}{\sigma^2}f_Z^{(n)}(z^*; \mu, \sigma^2), \qquad (5.56) $$
we find the critical point and mode to be at the mean, $z^* = \mu$, since that can be the only interior critical point, the density vanishing only at infinity. Note that we effectively computed the logarithmic derivative $d\ln(f_Z^{(n)})/dz = (f_Z^{(n)})'/f_Z^{(n)}$. Often the log-likelihood, using the logarithm of the density rather than the density itself, is preferred for the maximum likelihood, to avoid the over-dominance of exponentials.


Often we are looking for a multidimensional mode (the value of the variate where the maximum probability occurs) and, in analogy with the normal density, it is easier to look at the exponent by using the logarithm than at the exponential itself. For if the kernel, or most important part of the distribution, is of the form $K(x) = \exp(\phi(x))$, then $\ln(K(x)) = \phi(x)$ and
$$ K'(x) = \phi'(x)K(x) = (\log(K))'(x)K(x), \qquad (5.57) $$
so the determination of the maximum likelihood location or mode by critical point analysis of this simple kernel is related to the simpler critical point analysis of the logarithm of the kernel, i.e., the zeros of the logarithmic derivative of the kernel, assumed unimodal.


• A Probabilistic Introduction to Maximum Log-Likelihood (MLE) for Financial, Exploratory Data Analysis Applications:[a]

Let us return to the geometric Brownian motion, or linear diffusion with constant coefficients, model for an asset price A(t), where
$$ dA(t) = A(t)(\mu\,dt + \sigma\,dW(t)), \qquad (5.58) $$
where the statistics $(\mu, \sigma)$ are constant and $W(t)$ is the theoretical model for Brownian motion. Since the equation is linear in the asset price, it is convenient to transform to log-pricing, using the stochastic dt-precision chain rule, to get the arithmetic Brownian motion,
$$ d\log(A(t)) = (\mu - \sigma^2/2)dt + \sigma\,dW(t). \qquad (5.59) $$
Further, since our financial data is discrete, and assuming the time-step $\Delta t$ is sufficiently small, the equation is converted to asset log-return form,
$$ LR_i = \Delta\log(A_i) = m_\ell + \sqrt{v_\ell}\,Z_i, \qquad (5.60) $$
where, simplifying, $m_\ell \equiv (\mu - \sigma^2/2)\Delta t$ and $v_\ell \equiv \sigma^2\Delta t > 0$ are log-coefficients from using $\Delta W_i = \sqrt{\Delta t}\,Z_i$ with $Z_i \stackrel{dist}{=} \mathrm{IID}\ N(0, 1)\ \forall i$.

[a] See Carmona '04, p. 124ff; Hull '06, p. 567ff; Hanson '01-'08 computational finance papers, http://www.math.uic.edu/~hanson/compfinpapers.html.


The log-return distribution for each i, with $X = LR_i$, is given by definition,
$$ F_X(x_i) \equiv \mathrm{Prob}[X \le x_i] = \mathrm{Prob}[m_\ell + \sqrt{v_\ell}\,Z_i \le x_i] \stackrel{alg}{=} \mathrm{Prob}[Z_i \le (x_i - m_\ell)/\sqrt{v_\ell}\,] \stackrel{N}{=} F_Z^{(n)}\big((x_i - m_\ell)/\sqrt{v_\ell}; 0, 1\big). \qquad (5.61) $$
Thus, upon differentiation, the ith likelihood function is the density function,
$$ LH_i(m_\ell, v_\ell) = f_X(x_i) = \frac{1}{\sqrt{v_\ell}}f_Z^{(n)}\big((x_i - m_\ell)/\sqrt{v_\ell}; 0, 1\big) \stackrel{N}{=} \frac{1}{\sqrt{2\pi v_\ell}}\exp\big(-0.5(x_i - m_\ell)^2/v_\ell\big). \qquad (5.62) $$
Further, the $Z_i$ are IID normal, so the total density is the product of all the individual densities for the log-return data count $i = 1{:}n$.


Hence,
$$ LH^{(n)}(m_\ell, v_\ell) = f_{\vec X}(\vec x) \stackrel{IID}{=} \prod_{i=1}^n f_X(x_i) = \frac{1}{(\sqrt{v_\ell}\,)^n}\prod_{i=1}^n f_Z^{(n)}\big((x_i - m_\ell)/\sqrt{v_\ell}; 0, 1\big) \stackrel{N}{=} \frac{1}{(\sqrt{2\pi v_\ell}\,)^n}\prod_{i=1}^n \exp\big(-0.5(x_i - m_\ell)^2/v_\ell\big). \qquad (5.63) $$

For further simplicity, we take logarithms, turning the products into sums, to get the log-likelihood function (the natural logarithm, here $\log \Leftrightarrow \ln$),
$$ LLH^{(n)}(m_\ell, v_\ell) = \log\big(f_{\vec X}(\vec x)\big) = \sum_{i=1}^n \log(f_X(x_i)) = -\frac{1}{2v_\ell}\sum_{i=1}^n (x_i - m_\ell)^2 - \frac{n}{2}\log(2\pi v_\ell), \qquad (5.64) $$
ending up with the negative of the least squares objective. Also, the log-likelihood function is often multiplied by a minus sign, since most optimal solvers are written as minimizers.


The system bias in linear regression is that it is natural for normally distributed randomness. There are several variants of maximum likelihood estimation based on other distributions, e.g., multinomial maximum likelihood estimation[a] and MATLAB's generic maximum likelihood function mle, specialized to 17 named distribution options using the response data. Seeking critical points,
$$ \frac{\partial LLH^{(n)}}{\partial m_\ell}(m_\ell, v_\ell) = \frac{1}{v_\ell}\sum_{i=1}^n (x_i - m_\ell) \stackrel{*}{=} 0 \qquad (5.65) $$
and
$$ \frac{\partial LLH^{(n)}}{\partial v_\ell}(m_\ell, v_\ell) = \frac{1}{2v_\ell^2}\sum_{i=1}^n (x_i - m_\ell)^2 - \frac{n}{2v_\ell} \stackrel{*}{=} 0, \qquad (5.66) $$
gives the simultaneous estimates,
$$ \hat m_\ell = m_\ell^* = \frac{1}{n}\sum_{i=1}^n x_i \equiv \bar x \quad \& \quad \hat v_\ell = v_\ell^* = \frac{1}{n}\sum_{i=1}^n (x_i - m_\ell^*)^2 \equiv \sigma_x^2. \qquad (5.67) $$
This justifies using the mean and variance directly from the log-return data.

[a] For instance, Hanson with Westman and Zhu '04a and '04b, computational finance papers, http://www.math.uic.edu/~hanson/compfinpapers.html.


This is essentially what one would expect, and could have been obtained from mean(LR) and var(LR). Often, practitioners may a priori set $m_\ell = 0$, assuming that the mean log-return is usually small anyway. Finally, we need to convert back to the standard model coefficients, assuming that $\Delta t$, usually one trading day in years, is known:
$$ \sigma = \frac{\sigma_x}{\sqrt{\Delta t}} \quad \& \quad \mu = \frac{\bar x + \sigma_x^2/2}{\Delta t}, \qquad (5.68) $$
the latter form seeming to contradict the practice of throwing out the mean $m_\ell$, since $\sigma_x^2/2 > 0$.
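A minimal sketch of the MLE estimates (5.67) and the conversion (5.68), for a hypothetical daily log-return vector LR:

% MLE estimates from log-returns, per Eqs. (5.67)-(5.68):
dt = 1/250; % one trading day in years (assumed convention)
LR = 0.0005 + 0.01*randn(1000,1); % hypothetical daily log-returns
ml = mean(LR); % m_l* = xbar, Eq. (5.67)
vl = mean((LR - ml).^2); % v_l* = biased sample variance, Eq. (5.67)
sigma = sqrt(vl/dt); % Eq. (5.68)
mu = (ml + vl/2)/dt; % Eq. (5.68)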

One advantage of maximum likelihood estimation (MLE) is that all problems are not forced into one template objective, such as least squares for the normal noise model, but instead yield an objective that is more natural for the target problem and its distribution.


(not presented)
• Using Subsamples to Get Some Estimate of the Time Dependence of Model Coefficients:

If the sample size n is sufficiently large, then subsamples can be used to get an estimate of the time dependence of the model coefficients. Since the sample index i is associated with a trading day, picking i = k for some trading day $t_k$, marking a log-return between i and i+1, and then a half bandwidth $i_w$, so the window size is $n_w = 2i_w + 1$, our MLE estimates would be
$$ \hat m_{\ell,k} = \frac{1}{n_w}\sum_{i=k-i_w}^{k+i_w} x_i \equiv \bar x_k \qquad (5.69) $$
and
$$ \hat v_{\ell,k} = \frac{1}{n_w}\sum_{i=k-i_w}^{k+i_w} (x_i - \hat m_{\ell,k})^2 \equiv \sigma^2_{x,k}, \qquad (5.70) $$
where $k - i_w \ge 1$ and $k + i_w \le n$. The index k represents the center of a moving window in time.
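A windowed version of those estimates, a minimal sketch with half bandwidth iw (reusing the hypothetical LR vector from the sketch above):

% Moving-window MLE estimates, per Eqs. (5.69)-(5.70):
iw = 10; nw = 2*iw + 1; % half bandwidth and window size
n = length(LR);
mlk = zeros(n,1); vlk = zeros(n,1);
for k = 1+iw:n-iw % keeps k-iw >= 1 and k+iw <= n
    win = LR(k-iw:k+iw); % moving window centered at k
    mlk(k) = mean(win); % Eq. (5.69)
    vlk(k) = mean((win - mlk(k)).^2); % Eq. (5.70)
end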


(not presented)
• General Maximum Likelihood Function in MATLAB:

[chat,cci]=mle(Y,'distribution','DistName','alpha',alpha): This is a maximum likelihood estimator for a large number of specialized distributions. The required input is the data Y, and the optional input is the 'distribution' parameter paired with the name value 'DistName', which can be 'bernoulli', 'exponential', 'generalized pareto', 'lognormal', 'normal' (default), 'poisson', 'uniform' and others. The output is the estimated parameter vector chat and the parameter confidence intervals cci at complementary level alpha; if the pair 'alpha',alpha is omitted, then the CI is at the MATLAB default of 0.05, i.e., a 95% CI. Many of the mle special distributions are also prepackaged, such as binofit or poissfit. The auxiliary function mlecov, with similar arguments, outputs the parameter estimate covariance matrix.


* Reminder: Lecture 5 Homework posted in Chalk Assignments, due by Lecture 6 in Chalk Assignments!

* Summary of Lecture 5:

1. Ordinary Least Squares or Linear Regression

2. Multilinear Regression

3. Weighted Least Squares

4. General Linear Least Squares

5. Maximum Likelihood Estimation Method (part 1)
