BIOL 582 Supplemental Material Matrices, Matrix calculations, GLM using matrix algebra

Upload: shana-bryan

Post on 03-Jan-2016


Page 1: BIOL 582 Supplemental Material

Matrices, Matrix calculations, GLM using matrix algebra

Page 2: Why bother with matrices?

• Compact method of expressing mathematical operations (including statistics)
• Generalize from one to many variables (i.e., from vectors to matrices)
• Matrix operations have geometric interpretations in data spaces
• Many data types (e.g., morphometric data) cannot be measured with a single variable, so multivariate methods are required to properly address hypotheses

Page 3: Scalars, vectors, and matrices

• Scalar: a number, e.g. $b = 3$
• Vector: an ordered list (array) of scalars ($n$ rows × 1 column)
• Matrix: a rectangular array of scalars ($n$ rows × $p$ columns)

$$\mathbf{e} = \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix} \qquad \mathbf{X} = \begin{bmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & \ddots & \vdots \\ X_{n1} & \cdots & X_{np} \end{bmatrix}$$

Page 4: Matrix transpose

• Interchange rows and columns
• Represented by $\mathbf{A}^t$ or $\mathbf{A}'$
• Vector transpose works identically

$$\mathbf{A} = \begin{bmatrix} a & d \\ b & e \\ c & f \end{bmatrix} \qquad \mathbf{A}^t = \mathbf{A}' = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}$$
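
A minimal sketch of the transpose in Python with NumPy (an assumed tool; the slides do not name any software). The numbers 1 to 6 stand in for the symbols a to f.

```python
import numpy as np

# 3 x 2 matrix laid out like the slide: first column a, b, c; second column d, e, f
A = np.array([[1, 4],
              [2, 5],
              [3, 6]])

At = A.T  # transpose: rows and columns are interchanged

print(A.shape, At.shape)  # (3, 2) (2, 3)
```

Transposing twice returns the original matrix, which is a quick sanity check when a dimension mismatch shows up later.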

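Page 5: Matrix addition and subtraction

• Matrices must have the same dimensions
• Add/subtract element-wise
• Vector addition/subtraction works identically

Addition:

$$\mathbf{A} + \mathbf{B} = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix} + \begin{bmatrix} 6 & 8 \\ 5 & 9 \end{bmatrix} = \begin{bmatrix} 8 & 12 \\ 6 & 12 \end{bmatrix}$$

Subtraction:

$$\mathbf{A} - \mathbf{B} = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix} - \begin{bmatrix} 6 & 8 \\ 5 & 9 \end{bmatrix} = \begin{bmatrix} -4 & -4 \\ -4 & -6 \end{bmatrix}$$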
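
The same two matrices can be added and subtracted in a short NumPy sketch (NumPy is an assumption here, not something the slides prescribe):

```python
import numpy as np

A = np.array([[2, 4],
              [1, 3]])
B = np.array([[6, 8],
              [5, 9]])

total = A + B        # element-wise sum
difference = A - B   # element-wise difference

print(total)
print(difference)
```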

Page 6: Matrix multiplication

• Scalar multiplication: multiply the scalar by each element in the matrix or vector
• Matrix/vector multiplication is a summed multiplication (sums of row-by-column products)
• Inner dimensions allow multiplication
• Outer dimensions determine size of result
• Order of matrices makes a difference: $\mathbf{AB} \neq \mathbf{BA}$

For $\mathbf{AB}$ with $\mathbf{A}$ of size $n_1 \times p_1$ and $\mathbf{B}$ of size $n_2 \times p_2$, the inner dimensions are $p_1$ and $n_2$ (they must be equal), and the outer dimensions $n_1 \times p_2$ give the size of the result.

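Page 7: Matrix multiplication

• Scalar multiplication:

$$a\mathbf{E} = 3 \begin{bmatrix} 2 \\ 7 \\ 5 \end{bmatrix} = \begin{bmatrix} 6 \\ 21 \\ 15 \end{bmatrix}$$

• Matrix multiplication:

$$\mathbf{AB} = \begin{bmatrix} \sum_{i=1}^{n} a_{1i} b_{i1} & \sum_{i=1}^{n} a_{1i} b_{i2} \\ \sum_{i=1}^{n} a_{2i} b_{i1} & \sum_{i=1}^{n} a_{2i} b_{i2} \end{bmatrix}$$

$$\mathbf{AB} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 3 & 3 \\ 4 & 2 \end{bmatrix} = \begin{bmatrix} 2 + 6 + 12 & 1 + 6 + 6 \\ 8 + 15 + 24 & 4 + 15 + 12 \end{bmatrix} = \begin{bmatrix} 20 & 13 \\ 47 & 31 \end{bmatrix}$$

Inner dimensions MUST AGREE!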
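
A short NumPy sketch of the worked example (NumPy and its `@` operator are an assumed tool choice). It also shows why order matters: here AB is 2 × 2 while BA is 3 × 3.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2 x 3
B = np.array([[2, 1],
              [3, 3],
              [4, 2]])       # 3 x 2

AB = A @ B   # inner dimensions 3 and 3 agree; result is 2 x 2
BA = B @ A   # inner dimensions 2 and 2 agree; result is 3 x 3

scaled = 3 * np.array([2, 7, 5])  # scalar multiplication touches every element

print(AB)
print(BA.shape)
print(scaled)
```

If the inner dimensions do not agree (for example `A @ A`), NumPy raises a `ValueError` rather than returning a result.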

Page 8: Matrix multiplication: inner and outer products

• Inner (scalar) product: vector multiplication resulting in a scalar (weighted linear combination)
• Outer (matrix) product: vector multiplication resulting in a matrix

Inner product:

$$\mathbf{a}^t\mathbf{b} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 6 \\ 5 \\ 4 \end{bmatrix} = 28$$

Outer product:

$$\mathbf{a}\mathbf{b}^t = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 6 & 5 & 4 \end{bmatrix} = \begin{bmatrix} 6 & 5 & 4 \\ 12 & 10 & 8 \\ 18 & 15 & 12 \end{bmatrix}$$

Inner dimensions MUST AGREE!
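
Both products can be checked with the same two vectors in a brief NumPy sketch (an assumed tool, not part of the original slides):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([6, 5, 4])

inner = a @ b            # 1*6 + 2*5 + 3*4, a single scalar
outer = np.outer(a, b)   # 3 x 3 matrix of all pairwise products

print(inner)
print(outer)
```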

Page 9: Special matrices

• I: Identity matrix (equivalent to ‘1’ for matrices)
• 1: A matrix of all ones
• 0: A matrix of all zeros
• Diagonal: only the diagonal contains non-zero elements
• Square: $n = p$
• Symmetric: off-diagonal elements are mirrored across the diagonal: $X_{ij} = X_{ji}$

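Page 10: Special matrices

$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \mathbf{1} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \qquad \mathbf{0} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

Symmetric and diagonal examples:

$$\mathbf{T} = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 5 & 6 \\ 4 & 6 & 3 \end{bmatrix} \qquad \mathbf{D} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$

$\mathbf{AI} = \mathbf{A}$:

$$\begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}$$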

Page 11: Special matrices

• Orthogonal: square matrix with the property $\mathbf{A}\mathbf{A}^t = \mathbf{I}$
• VERY useful for statistics and other fields (e.g., morphometrics)
• Orthogonal matrices can be thought of as rigid rotations of data sets (shown later)

Orthonormal example:

$$\mathbf{A}\mathbf{A}^t = \begin{bmatrix} .7071 & -.7071 \\ .7071 & .7071 \end{bmatrix} \begin{bmatrix} .7071 & .7071 \\ -.7071 & .7071 \end{bmatrix} \approx \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
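
A hedged NumPy sketch of the orthogonality check, assuming the example matrix is a 45-degree rotation (entries of magnitude 0.7071, which is cos 45° = sin 45° rounded). The point (3, 4) is a made-up illustration.

```python
import numpy as np

theta = np.pi / 4  # 45 degrees; cos and sin are both about 0.7071
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

product = A @ A.T
is_orthogonal = np.allclose(product, np.eye(2))  # A A^t equals I up to rounding

# A rigid rotation leaves distances from the origin unchanged.
point = np.array([3.0, 4.0])
rotated = A @ point

print(is_orthogonal)
print(np.linalg.norm(point), np.linalg.norm(rotated))
```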

Page 12: Matrix inversion

• Can’t divide matrices ($\mathbf{A}/\mathbf{B}$), so calculate the inverse (reciprocal) of the denominator and multiply
• Inverses have the property that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$
• Inverses are tedious to calculate, so in practice we use a computer
• Only works for square matrices whose determinant ≠ 0!
• Determinant: combination of diagonal and off-diagonal elements. For $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$: $\det(\mathbf{A}) = |\mathbf{A}| = ad - bc$
• A matrix whose determinant = 0 is singular (has no inverse)

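Page 13: Matrix inversion: example

• For the 2 × 2 case:

$$\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \qquad \mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \qquad |\mathbf{A}| = \det(\mathbf{A}) = ad - bc$$

Example:

$$\mathbf{A} = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} \qquad \mathbf{A}^{-1} = \frac{1}{4 \cdot 2 - 3 \cdot 1} \begin{bmatrix} 4 & -3 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 4/5 & -3/5 \\ -1/5 & 2/5 \end{bmatrix} = \begin{bmatrix} 0.8 & -0.6 \\ -0.2 & 0.4 \end{bmatrix}$$

Confirm:

$$\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} .8 & -.6 \\ -.2 & .4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$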
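
The worked inverse can be confirmed numerically. This is a sketch using NumPy's `np.linalg.det` and `np.linalg.inv` (an assumed tool choice), alongside the 2 × 2 formula written out by hand.

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [1.0, 4.0]])

det = np.linalg.det(A)     # ad - bc = 2*4 - 3*1
A_inv = np.linalg.inv(A)   # library inverse

# 2 x 2 formula: swap the diagonal, negate the off-diagonal, divide by the determinant
a, b, c, d = A.ravel()
by_hand = np.array([[d, -b],
                    [-c, a]]) / (a * d - b * c)

print(det)
print(A_inv)
print(A @ A_inv)   # should be the identity, up to rounding
```

For a singular matrix (determinant 0), `np.linalg.inv` raises `LinAlgError` instead of returning an inverse.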

Page 14: Matrix multiplication: geometric interpretations

• Multiplying data and other matrices has geometric interpretations
• X = data matrix
• XI = X: No change to X
• cIX = Y: Change of scale (e.g., enlargement)
• XD = Y: Stretching if D is diagonal
• XT = Y: Rigid rotation if T is p × p orthogonal
• XT = Y: Shear if T is not orthogonal (T can be decomposed into rotation, dilation, rotation)
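
To make these interpretations concrete, here is a small NumPy sketch that applies each kind of matrix to the corners of a unit square. The data, the 30-degree angle, and the particular D and S matrices are all made-up illustrations, and `side_lengths` is a hypothetical helper.

```python
import numpy as np

# Corners of a unit square, one point per row (n = 4 points, p = 2 variables)
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

I2 = np.eye(2)
D = np.diag([2.0, 0.5])                            # diagonal: stretch axis 1, shrink axis 2
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # orthogonal: rigid rotation
S = np.array([[1.0, 0.0],
              [0.5, 1.0]])                         # not orthogonal: shear

unchanged = X @ I2
enlarged = X @ (2 * I2)   # change of scale with c = 2
stretched = X @ D
rotated = X @ R
sheared = X @ S

def side_lengths(P):
    """Lengths of the four sides of the quadrilateral whose vertices are the rows of P."""
    return np.linalg.norm(np.roll(P, -1, axis=0) - P, axis=1)

print(side_lengths(rotated))   # rotation keeps every side at length 1
print(side_lengths(sheared))   # shear changes some side lengths
```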

Page 15: Matrix multiplication: visual examples

[Figure: an original image and its transformations under matrix multiplication. Panels: Original; Scalar (2); Scalar (1/2); Rotations; Shears and Projections. Images from C.A.B. Smith, 1969.]

Page 16: GLM in matrix form

• The GLM model: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}$
• Independent variable/s: $\mathbf{X}$
• Dependent variable/s: $\mathbf{Y}$
• Solve for ‘regression coefficients’ $\boldsymbol{\beta}$
• $\hat{\boldsymbol{\beta}}$ found from: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}$

Note: in general, vectors are lower case and matrices are upper case, but using upper case is more encompassing.

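Page 17: GLM in matrix form: Solving for β

$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ (this is the model), with fitted values $\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$

• Why this equation? $\hat{\boldsymbol{\beta}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}$

Start with:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}$$

Make ‘X’ a square matrix (premultiply both sides by $\mathbf{X}^t$):

$$\mathbf{X}^t\mathbf{Y} = \mathbf{X}^t\mathbf{X}\boldsymbol{\beta}$$

Multiply by inverse:

$$(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{X}\boldsymbol{\beta}$$

$$(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y} = \hat{\boldsymbol{\beta}}$$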

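Page 18: GLM in matrix form: Deriving univariate regression

$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y}$

1. Expand matrices, where:

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix} \qquad \mathbf{X} = \begin{bmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \qquad \boldsymbol{\beta} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$$

2. Begin rewrite:

$$\mathbf{X}^t\mathbf{X} = \begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix} \qquad \mathbf{X}^t\mathbf{Y} = \begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix}$$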

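Page 19: GLM in matrix form: Deriving univariate regression

2. From before:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y} = \begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix}$$

3. Calculate inverse:

$$\begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix}^{-1} = \begin{bmatrix} \dfrac{\sum X^2}{n\sum X^2 - \sum X \sum X} & \dfrac{-\sum X}{n\sum X^2 - \sum X \sum X} \\ \dfrac{-\sum X}{n\sum X^2 - \sum X \sum X} & \dfrac{n}{n\sum X^2 - \sum X \sum X} \end{bmatrix}$$

4. Multiply:

$$\begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} \dfrac{\sum X^2 \sum Y - \sum X \sum XY}{n\sum X^2 - \sum X \sum X} \\ \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \sum X \sum X} \end{bmatrix}$$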
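
As a check on the derivation, the sketch below compares the matrix solution with the expanded scalar formulas for the intercept and slope. The data values are hypothetical, chosen only for illustration, and NumPy is an assumed tool.

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 8.0])   # hypothetical predictor values
y = np.array([1.0, 4.0, 4.0, 9.0])   # hypothetical responses
n = len(x)

# Matrix route: solve (X^t X) beta = X^t Y
X = np.column_stack([np.ones(n), x])
beta_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Scalar route: the expanded formulas from the "Multiply" step
denom = n * np.sum(x**2) - np.sum(x) ** 2
b0 = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / denom
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom

print(beta_matrix)
print(b0, b1)
```

The two routes should agree to rounding error; `np.linalg.solve` is used instead of forming the inverse explicitly, which is a common numerical preference rather than anything the slides require.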

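Page 20: GLM in matrix form: Calculating sums of squares (SS)

• F-ratio is: SSM/SSE (with df corrections)
• Need to calculate full and reduced model SS
• Full model (contains all terms):

$$SSE_f = \sum (y_i - \hat{y}_i)^2 = (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})^t (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})$$

• Reduced model (X# has 1 less term, i.e. a column of x values, in it):

$$SSE_r = (\mathbf{Y} - \mathbf{X}_{\#}\hat{\boldsymbol{\beta}}_{\#})^t (\mathbf{Y} - \mathbf{X}_{\#}\hat{\boldsymbol{\beta}}_{\#})$$

• Significance based on:

$$F_s = \frac{(SSE_r - SSE_f)/\Delta k}{SSE_f/(n - k - 1)}$$

where $\Delta k$ is the number of terms dropped to form the reduced model and $k$ is the number of predictor columns in the full model.

• Or one can always use a random permutation approach…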
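
One way to sketch the full-versus-reduced comparison in code is shown below. The function names `sse` and `f_ratio` are hypothetical, the data in the usage lines are made up, and the error degrees of freedom are taken as n minus the number of columns of the full design matrix (which equals n − k − 1 when that matrix includes a column of 1s).

```python
import numpy as np

def sse(X, Y):
    """Residual sum of squares (Y - X b)^t (Y - X b) for the least-squares fit."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta_hat
    return float(resid @ resid)

def f_ratio(X_full, X_reduced, Y):
    """F statistic comparing a full model with a nested reduced model."""
    n = len(Y)
    delta_k = X_full.shape[1] - X_reduced.shape[1]   # number of terms dropped
    df_error = n - X_full.shape[1]                   # n - k - 1 with an intercept column
    sse_f, sse_r = sse(X_full, Y), sse(X_reduced, Y)
    return ((sse_r - sse_f) / delta_k) / (sse_f / df_error)

# Usage with made-up data: intercept + one predictor versus intercept only
x = np.array([1.0, 2.0, 4.0, 6.0, 7.0, 9.0])
Y = np.array([2.0, 2.5, 4.5, 5.0, 7.5, 8.0])
X_full = np.column_stack([np.ones(len(Y)), x])
X_reduced = np.ones((len(Y), 1))
print(f_ratio(X_full, X_reduced, Y))
```

The P-value would then come from an F distribution with (Δk, n − k − 1) degrees of freedom, or from a permutation procedure; neither is computed in this sketch.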

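Page 21: Regression example

• The Data (for matrix form):

$$\mathbf{Y} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix} \qquad x = \begin{bmatrix} 1 \\ 4 \\ 5 \\ 7 \\ 9 \end{bmatrix} \qquad \mathbf{X} = \begin{bmatrix} 1 & 1 \\ 1 & 4 \\ 1 & 5 \\ 1 & 7 \\ 1 & 9 \end{bmatrix}$$

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y} = \begin{bmatrix} 0.9348 & -0.1413 \\ -0.1413 & 0.0272 \end{bmatrix} \begin{bmatrix} 15 \\ 97 \end{bmatrix} = \begin{bmatrix} 0.3152 \\ 0.5163 \end{bmatrix}$$

$$SSE_f = (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})^t (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = 0.1902$$

$$SSE_r = (\mathbf{Y} - \mathbf{X}_{\#}\hat{\boldsymbol{\beta}}_{\#})^t (\mathbf{Y} - \mathbf{X}_{\#}\hat{\boldsymbol{\beta}}_{\#}) = 10.0000$$

Source       df                SS                      MS (SS/df)   F        P
Regression   Δk = 1            SSEr - SSEf = 9.8098    9.8098       154.71   0.0011
Error        n - Δk - 1 = 3    SSEf = 0.1902           0.0634
Total        n - 1 = 4         SST = SSEr = 10

$\hat{y} = 0.315 + 0.516x$, or $\hat{Y} = 0.315 + 0.516X$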
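
The numbers in this example can be reproduced with a few lines of NumPy (an assumed tool; `fit` is a hypothetical helper name). The sketch follows the slide's formula literally, including the explicit inverse.

```python
import numpy as np

Y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([1.0, 4.0, 5.0, 7.0, 9.0])
n = len(Y)

X_full = np.column_stack([np.ones(n), x])   # column of 1s plus the x values
X_red = np.ones((n, 1))                     # reduced model: column of 1s only

def fit(X, Y):
    """Return least-squares coefficients and the residual sum of squares."""
    beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
    resid = Y - X @ beta_hat
    return beta_hat, float(resid @ resid)

beta_hat, sse_f = fit(X_full, Y)
_, sse_r = fit(X_red, Y)

delta_k = 1
F = ((sse_r - sse_f) / delta_k) / (sse_f / (n - delta_k - 1))

print(np.round(beta_hat, 4))              # [0.3152 0.5163]
print(round(sse_f, 4), round(sse_r, 4))   # 0.1902 10.0
print(round(F, 2))                        # 154.71
```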

Page 22: Using GLM for ANOVA

• Analysis of Variance (ANOVA) is the standard way of comparing means among multiple groups.
• ANOVA is the cornerstone of most applied stats courses in life science fields
• Linear regression equation: $y_i = b_0 + b_1 x_i + \varepsilon_i$
• ANOVA equation: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, with $\mu$ estimated by $\bar{y}$ and $\alpha_i$ by $\bar{y}_i - \bar{y}$

Page 23: Using GLM for ANOVA

• Same idea, but must use special X-matrix coding
• Recode k groups in k - 1 dummy variables (columns) of X
• Generally, column 1 yields $\bar{y}$, column 2 yields the deviation from $\bar{y}$ for the mean of group 1, etc.

$$\mathbf{X} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & -1 \\ 1 & -1 \\ 1 & -1 \end{bmatrix}$$

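Page 24: Using GLM for ANOVA: Example 1

• The Data:

$$\mathbf{Y} = \begin{bmatrix} 5 & 4 & 4 & 4 & 3 & 7 & 5 & 6 & 6 & 6 \end{bmatrix}^t \qquad \mathbf{X} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 \end{bmatrix}^t$$

$\bar{y}_1 = 4$, $\bar{y}_2 = 6$, $\bar{y} = 5$, $n_1 = 5$, $n_2 = 5$

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{b}_0 \\ \hat{b}_1 \end{bmatrix} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{Y} = \begin{bmatrix} 5 \\ -1 \end{bmatrix}$$

$$\bar{y}_1 = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 5 \\ -1 \end{bmatrix} = 4 \qquad \bar{y}_2 = \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 5 \\ -1 \end{bmatrix} = 6$$

$$SSE_f = (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})^t (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = 4$$

$$SSE_r = (\mathbf{Y} - \mathbf{X}_{\#}\hat{\boldsymbol{\beta}}_{\#})^t (\mathbf{Y} - \mathbf{X}_{\#}\hat{\boldsymbol{\beta}}_{\#}) = 14$$

(Reduced model is one column of 1s)

Source   df                SS                  MS (SS/df)   F    P
Group    Δk = 1            SSEr - SSEf = 10    10           20   0.0021
Error    n - Δk - 1 = 8    SSEf = 4            0.5
Total    n - 1 = 9         SST = SSEr = 14

DEMO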
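
A minimal NumPy sketch of this two-group example, assuming the second column of X codes group 1 as +1 and group 2 as −1 (the coding that gives the coefficients 5 and −1). `fit` is a hypothetical helper name.

```python
import numpy as np

Y = np.array([5, 4, 4, 4, 3, 7, 5, 6, 6, 6], dtype=float)
group_code = np.array([1] * 5 + [-1] * 5, dtype=float)  # +1 = group 1, -1 = group 2
n = len(Y)

X_full = np.column_stack([np.ones(n), group_code])
X_red = np.ones((n, 1))   # reduced model: one column of 1s

def fit(X, Y):
    """Return least-squares coefficients and the residual sum of squares."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta_hat
    return beta_hat, float(resid @ resid)

beta_hat, sse_f = fit(X_full, Y)
_, sse_r = fit(X_red, Y)

delta_k = 1
F = ((sse_r - sse_f) / delta_k) / (sse_f / (n - delta_k - 1))

print(beta_hat)       # overall mean, then group 1's deviation from it
print(sse_f, sse_r)
print(F)
```

The first coefficient is the overall mean and the second is group 1's deviation from it, so the group means are recovered as 5 + (−1)(1) = 4 and 5 + (−1)(−1) = 6.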

Page 25: GLM final comments

• As we will learn, ANOVA, ANCOVA, multiple regression, MANOVA, MANCOVA, and multivariate multiple regression are all variants of the same GLM procedure.
• To a computer using matrix calculations to perform GLM, all of these “different” analytical approaches are the same.
• If there are 4 groups, then 4 - 1 = 3 dummy variables are needed. If there are 88 groups, then 88 - 1 = 87 dummy variables are needed. For a groups, there are ALWAYS a - 1 “factor” levels (dummy variables).

Page 26: GLM final comments

• Dummy variables are “indicator” variables
• E.g., $y_i = \mu + \alpha_i + \varepsilon_i$ can be written as $y_i = b_0 + b_1 Z_i + \varepsilon_i$, where Z is an indicator: 1 if in the group; 0 if not in the group.
• There are two ways to form the design matrix (X):

$$\mathbf{X} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & -1 & -1 & -1 \\ 1 & -1 & -1 & -1 \end{bmatrix}$$

All group means are expressed as deviations from the overall group mean

$$\mathbf{X} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}$$

Groups 2-4 means are expressed as deviations from the first group mean

Analytically, these are no different, but different software packages use different approaches!
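
The claim that the two codings are analytically equivalent can be illustrated with a short NumPy sketch. The response values are hypothetical (4 groups, 2 observations each), and `fitted` is a hypothetical helper name; only the layout of the two design matrices follows the slide.

```python
import numpy as np

groups = np.repeat([0, 1, 2, 3], 2)                        # 4 groups, 2 observations each
Y = np.array([3.0, 5.0, 6.0, 8.0, 2.0, 4.0, 9.0, 11.0])   # hypothetical responses
n, a = len(Y), 4

# Coding 1: indicators for groups 2-4; group 1 is the reference (all zeros)
X_treat = np.column_stack(
    [np.ones(n)] + [(groups == g).astype(float) for g in (1, 2, 3)]
)

# Coding 2: indicators for groups 1-3; the last group is -1 in every dummy column
X_effect = np.ones((n, a))
for col, g in enumerate((0, 1, 2), start=1):
    X_effect[:, col] = (groups == g).astype(float)
    X_effect[groups == 3, col] = -1.0

def fitted(X, Y):
    """Return least-squares coefficients and fitted values."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    return beta_hat, X @ beta_hat

b_treat, yhat_treat = fitted(X_treat, Y)
b_effect, yhat_effect = fitted(X_effect, Y)

print(b_treat)    # first group mean, then deviations of groups 2-4 from it
print(b_effect)   # overall mean, then deviations of groups 1-3 from it
print(np.allclose(yhat_treat, yhat_effect))
```

The coefficients differ because they answer different questions, but the fitted values (the group means) and therefore the sums of squares are identical.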