Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Upload: scarlett-baker

Post on 23-Dec-2015


Page 1: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Multivariate Data and Matrix Algebra Review

BMTRY 726 Spring 2012

Page 2: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

What is ‘Multivariate’ Data?

• Data in which each sampling unit contributes to more than one outcome. For example, by sampling unit:
– Cancer patients: serum concentrations on a panel of protein markers are collected in chemotherapy patients.
– Smoking cessation participants: background information and smoking behavior are collected at multiple visits.
– Post-operative patients: multiple measures of how a patient is doing post-operatively (patient self-reported pain, opioid consumption, ICU/hospital length of stay).
– Diabetics: each subject is assigned to a different glucose control option (medication, diet, or diet and medication), and fasting blood glucose is monitored at 0, 3, 6, 9, 12, and 15 months.

Page 3: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Goals of Multivariate Analysis

• Data reduction and structural simplification
– Say we collect 16 biological markers to examine patient response to chemotherapy.
– Ideally we might like to summarize patient response as some simple combination of the markers.
– How can variation in p=16 markers be summarized?

Page 4: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Goals of Multivariate Analysis

• Sorting and grouping data
– Participants are enrolled in a smoking cessation program for several years
– Information about the background of each subject and smoking behavior is collected at multiple visits
– Some patients quit while others do not
– Can we use the background and smoking behavior information to classify those that quit and those that do not, in order to screen future participants?

Page 5: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Goals of Multivariate Analysis

• Investigating dependence among variables
– Subjects take a standardized test with different categories of questions
  • Sentence completion
  • Number sequences
  • Orientation of patterns
  • Arithmetic (etc.)
– Can correlation among scores be attributed to variation in one or more unobserved factors?
  • Intelligence
  • Mathematical ability

Page 6: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Goals of Multivariate Analysis

• Prediction based on relationship between variables
– We conduct a microarray experiment to compare tumor and healthy tissue
– We want to develop a reliable classification tool based on the gene expression information from our experiment

Page 7: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Goals of Multivariate Analysis

• Hypothesis testing
– Participants in a diabetes study are placed into one of three treatment groups
– Fasting blood glucose is evaluated at 0, 3, 6, 9, 12, and 15 months
– We want to test the hypothesis that treatment groups are different.

Page 8: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Multivariate Data Properties

• What property or properties of multivariate data make commonly used statistical approaches inappropriate?

Page 9: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Notation & Data Organization

• Consider an example where we have 15 tumor markers collected on 30 tissue samples

• The 15 markers are variables and our samples represent the subjects in the data.

• These data can most easily be expressed as a 30 by 15 array (samples as rows, markers as columns)

Page 10: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Notation & Data Organization

• More generally, let j = 1,2,…,p represent a set of variables collected in a study

• And let i = 1,2,…,n represent the samples

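$$
\mathbf{X}_{n \times p} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
$$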

Page 11: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Random Vectors

• Each experimental unit has multiple outcome measures, thus we can arrange the ith subject’s j = 1,2,…,p outcomes as a vector.

$$
\mathbf{X}_i =
\begin{bmatrix}
X_{i1} \\ X_{i2} \\ \vdots \\ X_{ip}
\end{bmatrix}
$$

• $\mathbf{X}_i$ is a random vector, and its individual elements are random variables

• p denotes the number of outcomes for subject i

• i = 1,2,…,n indexes the subjects, where n is the number of subjects

Page 12: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Descriptive Statistics

• We can calculate familiar descriptive statistics for this array
– Mean
– Variance
– Covariance (Correlation)

Page 13: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Arranged as Arrays

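• Means

$$
\bar{\mathbf{x}} =
\begin{bmatrix}
\bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p
\end{bmatrix}
$$

• Covariance

$$
\mathbf{S} =
\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{bmatrix}
$$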
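The mean vector and covariance matrix can be computed directly from an n x p data array. The following is a minimal sketch in plain Python, assuming the usual sample definitions with divisor n − 1 (the transcript does not show which divisor the slides use) and a small made-up data set; `column_means` and `sample_covariance` are illustrative names, not part of the course material.

```python
def column_means(X):
    """Mean of each of the p columns of an n x p data array."""
    n, p = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]


def sample_covariance(X):
    """p x p matrix with entries s_jk; divisor n - 1 is an assumption."""
    n, p = len(X), len(X[0])
    xbar = column_means(X)
    return [
        [
            sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in X) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


# Hypothetical data: n = 3 subjects (rows), p = 2 variables (columns).
X = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
xbar = column_means(X)
S = sample_covariance(X)
print(xbar)
print(S)
```

The diagonal entries of `S` are the variances of the individual variables, and `S` is symmetric because s_jk and s_kj are computed from the same products.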

Page 14: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Distance

• Many multivariate statistics are based on the idea of distance
• For example, if we are comparing two groups we might look at the difference in their means
• Euclidean distance

Page 15: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Distance

• But why is Euclidean distance inappropriate in statistics?
• This leads us to the idea of statistical distance
• Consider a case where we have two measures

Page 18: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

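Distance

• Our expression of statistical distance can be generalized to p variables and to any fixed set of points

$$
P = (x_1, x_2, \ldots, x_p), \qquad Q = (y_1, y_2, \ldots, y_p)
$$

$$
d(P, Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}} + \cdots + \frac{(x_p - y_p)^2}{s_{pp}}}
$$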
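To see how statistical distance differs from Euclidean distance, the sketch below scales each squared coordinate difference by that variable's variance before summing. The points and variances are made up for illustration, and `statistical_distance` is a hypothetical helper name.

```python
import math


def euclidean_distance(P, Q):
    """Ordinary straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(P, Q)))


def statistical_distance(P, Q, variances):
    """Distance with each squared difference divided by s_jj."""
    return math.sqrt(
        sum((x - y) ** 2 / s for x, y, s in zip(P, Q, variances))
    )


# Hypothetical example: the first variable is four times as variable
# as the second, so a difference of 2 units in it counts for less.
P = (2.0, 1.0)
Q = (0.0, 0.0)
variances = (4.0, 1.0)

d_euclid = euclidean_distance(P, Q)
d_stat = statistical_distance(P, Q, variances)
print(d_euclid, d_stat)
```

When every variance equals 1 the two distances coincide, which is one way to read Euclidean distance as a special case.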

Page 19: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Basic Matrix Operations

• Can I add $\mathbf{A}_{2 \times 3}$ and $\mathbf{B}_{3 \times 3}$?

• What is the product of matrix A and scalar c?

• When can I multiply the two matrices A and B?
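The three questions above come down to checking dimensions. The sketch below, with made-up matrices stored as lists of rows, shows one way to test whether two matrices conform for addition or multiplication; the helper names are illustrative only.

```python
def shape(M):
    """(rows, columns) of a matrix stored as a list of rows."""
    return (len(M), len(M[0]))


def can_add(A, B):
    """Addition needs identical dimensions."""
    return shape(A) == shape(B)


def can_multiply(A, B):
    """AB needs the number of columns of A to equal the rows of B."""
    return shape(A)[1] == shape(B)[0]


def scalar_multiply(c, A):
    """Multiply every element of A by the scalar c."""
    return [[c * a for a in row] for row in A]


def matmul(A, B):
    """Matrix product AB, assuming can_multiply(A, B) is True."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[1, 2, 3], [4, 5, 6]]             # 2 x 3
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # 3 x 3

print(can_add(A, B), can_multiply(A, B), can_multiply(B, A))
print(scalar_multiply(2, A))
print(shape(matmul(A, B)))
```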

Page 20: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Matrix Transposes

• The transpose of an n x m matrix A, denoted as A’, is an m x n matrix whose ijth element is the jith element of A

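• Properties of a transpose:

$$
(\mathbf{A}')' = \mathbf{A}
$$

$$
(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'
$$

$$
(\mathbf{A}\mathbf{B})' = \mathbf{B}'\mathbf{A}'
$$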
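The transpose properties can be checked numerically on small matrices. This is a sketch with arbitrary integer matrices chosen only so that the sums and products are defined; `transpose`, `add`, and `matmul` are illustrative helpers.

```python
def transpose(M):
    """Swap rows and columns: element (i, j) becomes element (j, i)."""
    return [list(col) for col in zip(*M)]


def add(A, B):
    """Elementwise sum of two matrices with the same dimensions."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    """Matrix product AB for conformable A and B."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[0, 1], [1, 0], [2, 2]]   # 3 x 2, same shape as A
C = [[1, 0, 2], [0, 1, 1]]     # 2 x 3, so AC is defined

print(transpose(transpose(A)) == A)
print(transpose(add(A, B)) == add(transpose(A), transpose(B)))
print(transpose(matmul(A, C)) == matmul(transpose(C), transpose(A)))
```

Note the reversed order in the last check: the transpose of a product is the product of the transposes taken in the opposite order.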

Page 21: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Types of Matrices

• Square matrix:

• Idempotent:

• Symmetric:

• A square matrix is diagonal:

Page 22: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

More Definitions

• An n x n matrix A is nonsingular if there exists a matrix $\mathbf{B}_{n \times n}$ such that

$$
\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A} = \mathbf{I}
$$

• B is the multiplicative inverse of A and can be written as

$$
\mathbf{B} = \mathbf{A}^{-1}
$$

• A square matrix with no multiplicative inverse is said to be….

• We can calculate the inverse of a matrix, assuming one exists, but it is tedious (let the computer do it).
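As a small illustration of letting the computer do it, the sketch below inverts a 2 x 2 matrix with the standard closed-form formula and verifies that AB = BA = I. It uses exact fractions to avoid rounding; the matrix is made up and `inverse_2x2` is a hypothetical helper, not a general-purpose routine.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product AB for conformable A and B."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix; raises if no inverse exists."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix has no multiplicative inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[4, 7], [2, 6]]
B = inverse_2x2(A)
identity = [[1, 0], [0, 1]]

print(B)
print(matmul(A, B) == identity, matmul(B, A) == identity)
```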

Page 24: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

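Matrix Determinant

• The determinant of a square matrix A is a scalar given by

$$
|\mathbf{A}| =
\begin{cases}
a_{11} & \text{if } n = 1 \\
\sum_{j=1}^{n} a_{1j}\,|\mathbf{A}_{1j}|\,(-1)^{1+j} & \text{if } n > 1
\end{cases}
$$

where $\mathbf{A}_{1j}$ is the $(n-1) \times (n-1)$ matrix obtained by deleting the first row and jth column of A.

• What is the determinant of

$$
\mathbf{A} =
\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{bmatrix}
\; ?
$$

$$
|\mathbf{A}| = a_{11}a_{22} - a_{12}a_{21}
$$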

Page 25: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

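Matrix Determinant

• What about the determinant of the 3x3 matrix?

$$
\mathbf{A} =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
$$

$$
|\mathbf{A}| = a_{11}
\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12}
\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13}
\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
$$

$$
|\mathbf{A}| = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32}
$$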

Page 26: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Matrix Determinant

• Using this result, what is the determinant of

$$
\mathbf{A} =
\begin{bmatrix}
1 & 4 & 0 \\
2 & 2 & 1 \\
1 & 3 & 0
\end{bmatrix}
\; ?
$$
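The first-row cofactor expansion translates into a short recursive function. The sketch below applies it to the matrix above, taking the entries exactly as they appear in the transcript (any minus signs lost in extraction would change the answer); `minor` and `det` are illustrative names.

```python
def minor(A, i, j):
    """Matrix left after deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # (-1)**j with 0-based j matches (-1)**(1 + j) with 1-based j.
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))


A = [[1, 4, 0], [2, 2, 1], [1, 3, 0]]
det_A = det(A)
print(det_A)  # 1*(0 - 3) - 4*(0 - 1) + 0 = 1
```

Cofactor expansion is fine for small matrices like this one but grows factorially with n, so numerical software uses elimination-based methods instead.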

Page 27: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Orthogonal and Orthonormal Vectors

• A collection of m-dimensional vectors, x1, x2,…, xp, is orthogonal if…

• The collection of vectors is said to be orthonormal if what 2 conditions are met?

Page 29: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Linear Dependence

• The p m-dimensional vectors, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p$, are linearly dependent if there is a set of constants, c1,c2,…,cp, not all zero, for which

$$
c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_p\mathbf{x}_p = \mathbf{0}
$$

• Conversely, if no such set of constants (not all zero) exists, the vectors are linearly independent.

Page 30: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Rank of a Matrix

• Row rank is the number of linearly independent rows
• Column rank is the number of linearly independent columns
• Find the column rank of

$$
\mathbf{A} =
\begin{bmatrix}
1 & 2 & 4 \\
3 & 0 & 6 \\
5 & 3 & 13
\end{bmatrix}
$$

• How are row and column rank related?
• What does rank tell us about linear dependence of the vectors that make up the matrix?
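One way to find the rank is to row-reduce the matrix and count the pivots. The sketch below does this with exact fractions for the matrix above; `rank` is a hypothetical helper written for this example, not a library routine.

```python
from fractions import Fraction


def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]


def rank(A):
    """Number of pivots found by Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0  # index of the next pivot row
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r


A = [[1, 2, 4], [3, 0, 6], [5, 3, 13]]
print(rank(A), rank(transpose(A)))

# The third column is 2 * (first column) + (second column).
print(all(2 * row[0] + row[1] == row[2] for row in A))
```

Ranking the transpose counts independent columns of the original matrix, so the two numbers printed on the first line compare row rank with column rank for this example.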

Page 31: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Orthogonal Matrices

• A square matrix $\mathbf{A}_{n \times n}$ is said to be orthogonal if its columns form an orthonormal set.

• This can easily be determined by showing that

$$
\mathbf{A}'\mathbf{A} = \mathbf{I}
$$
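The check A'A = I is easy to automate. The sketch below uses a made-up 2 x 2 matrix built from a 3-4-5 triangle, stored as exact fractions so the comparison with the identity is exact; `is_orthogonal` is an illustrative name.

```python
from fractions import Fraction


def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product AB for conformable A and B."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def is_orthogonal(A):
    """True when A'A equals the identity matrix."""
    n = len(A)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(transpose(A), A) == identity


# Columns have unit length and zero dot product with each other.
Q = [[Fraction(3, 5), Fraction(-4, 5)], [Fraction(4, 5), Fraction(3, 5)]]
# Columns are linearly independent but not orthonormal.
M = [[1, 1], [0, 1]]

print(is_orthogonal(Q), is_orthogonal(M))
```

With floating-point entries an exact equality test would be too strict, and a tolerance-based comparison would be the safer choice.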

Page 32: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Eigenvalues and Eigenvectors

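• The eigenvalues of a matrix $\mathbf{A}_{p \times p}$, ordered as

$$
\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0
$$

(the lower bound of zero holds for a positive semi-definite matrix such as a covariance matrix), are the solutions to

$$
\mathbf{A}\mathbf{e}_i = \lambda_i \mathbf{e}_i
$$

for a set of eigenvectors, $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_p$. We typically normalize so that

$$
\mathbf{e}_i'\mathbf{e}_i = 1, \qquad \mathbf{e}_i'\mathbf{e}_j = 0 \text{ for all } i \ne j
$$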
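For a 2 x 2 symmetric matrix the eigenvalues and normalized eigenvectors have a closed form, which makes the defining relations easy to verify by hand or in code. The sketch below is limited to that 2 x 2 symmetric case and uses a made-up matrix; `eig_sym_2x2` is a hypothetical helper, and larger problems would normally be handed to a numerical library.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, d]]."""
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
        pairs.sort(key=lambda pair: pair[0], reverse=True)
        return [lam for lam, _ in pairs], [vec for _, vec in pairs]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    values = [mean + radius, mean - radius]
    vectors = []
    for lam in values:
        v = (b, lam - a)           # solves (A - lam I) v = 0 when b != 0
        norm = math.hypot(*v)      # rescale so that e'e = 1
        vectors.append((v[0] / norm, v[1] / norm))
    return values, vectors


a, b, d = 2.0, 1.0, 2.0            # A = [[2, 1], [1, 2]]
values, vectors = eig_sym_2x2(a, b, d)
(e11, e12), (e21, e22) = vectors

print(values)
print(e11 * e11 + e12 * e12, e11 * e21 + e12 * e22)
```

The last line prints the squared length of the first eigenvector and the dot product of the two eigenvectors, which should be close to 1 and 0 respectively.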

Page 33: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Quadratic Forms

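• Given a symmetric matrix $\mathbf{A}_{n \times n}$ and an n-dimensional vector x, we can write the quadratic form as $\mathbf{x}'\mathbf{A}\mathbf{x}$.

• For example, find the quadratic form for the following expression, where $\mathbf{x}' = [x_1 \;\; x_2]$: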

2 21 1 2 23 5x x x x
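To make the link between a symmetric matrix and its quadratic form concrete, the sketch below evaluates x'Ax for a hypothetical matrix A = [[2, 1], [1, 3]] (chosen for illustration, not the slide's example) and compares it with the expanded polynomial 2x1² + 2x1x2 + 3x2². Each off-diagonal entry contributes half of the cross-product coefficient.

```python
def quadratic_form(A, x):
    """Evaluate x'Ax as the double sum of x_i * a_ij * x_j."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def expanded(x1, x2):
    """The same quadratic form written out term by term."""
    return 2 * x1 ** 2 + 2 * x1 * x2 + 3 * x2 ** 2


# Hypothetical symmetric matrix: diagonal entries are the coefficients of
# the squared terms; the off-diagonal entries sum to the cross-term coefficient.
A = [[2, 1], [1, 3]]

for x in [(1, 0), (0, 1), (1, 1), (2, -1)]:
    print(x, quadratic_form(A, x), expanded(*x))
```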

Page 35: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

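Trace

• Let A be an n x n matrix; the trace of A is given by

$$
\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii}
$$

• Properties of the trace:

1. $\mathrm{tr}(c\mathbf{A}) = c\,\mathrm{tr}(\mathbf{A})$

2. $\mathrm{tr}(\mathbf{A} \pm \mathbf{B}) = \mathrm{tr}(\mathbf{A}) \pm \mathrm{tr}(\mathbf{B})$

3. $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A})$

4. $\mathrm{tr}(\mathbf{A}\mathbf{B}') = \mathrm{tr}(\mathbf{B}'\mathbf{A}) = \mathrm{tr}(\mathbf{A}'\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A}')$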
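The trace properties listed above can be spot-checked on a pair of small matrices. This is only a numerical check on made-up integer matrices, not a proof; the helper names are illustrative.

```python
def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def transpose(M):
    return [list(col) for col in zip(*M)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7]]
c = 3

print(trace([[c * a for a in row] for row in A]) == c * trace(A))   # property 1
print(trace(add(A, B)) == trace(A) + trace(B))                      # property 2
print(trace(matmul(A, B)) == trace(matmul(B, A)))                   # property 3
print(trace(matmul(A, transpose(B))) == trace(matmul(transpose(A), B)))  # property 4
```

Property 3 holds even though AB and BA are generally different matrices; only their diagonal sums agree.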

Page 36: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

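Positive Definite Matrices

• A symmetric matrix A is said to be positive definite if

a. $\mathbf{x}'\mathbf{A}\mathbf{x} > 0$ for all $\mathbf{x} \in R^p$ except $\mathbf{x}' = [0 \;\; 0 \;\; \cdots \;\; 0]$

this implies

1. $\mathrm{rank}(\mathbf{A}) = p$

2. $|\mathbf{A}| > 0$

3. $\mathbf{A}^{-1}$ exists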

Page 37: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Positive Definite Matrices

• A real symmetric matrix is:

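b. negative definite if $\mathbf{x}'\mathbf{A}\mathbf{x} < 0$ for all non-zero $\mathbf{x}$

c. positive semi-definite if $\mathbf{x}'\mathbf{A}\mathbf{x} \ge 0$ for all non-zero $\mathbf{x}$

d. negative semi-definite if $\mathbf{x}'\mathbf{A}\mathbf{x} \le 0$ for all non-zero $\mathbf{x}$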
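For a symmetric matrix, the sign of x'Ax over all non-zero x is determined by the signs of the eigenvalues: all positive gives positive definite, all non-negative gives positive semi-definite, and so on. The sketch below uses that standard fact (it is not stated in the slides) to classify a 2 x 2 symmetric matrix; `classify_sym_2x2` and the tolerance `tol` are assumptions made for this example.

```python
import math


def classify_sym_2x2(a, b, d, tol=1e-12):
    """Classify [[a, b], [b, d]] using its smallest and largest eigenvalue."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    smallest, largest = mean - radius, mean + radius
    if smallest > tol:
        return "positive definite"
    if largest < -tol:
        return "negative definite"
    if smallest >= -tol:
        return "positive semi-definite"
    if largest <= tol:
        return "negative semi-definite"
    return "indefinite"


print(classify_sym_2x2(2.0, 1.0, 2.0))    # eigenvalues 3 and 1
print(classify_sym_2x2(1.0, 1.0, 1.0))    # eigenvalues 2 and 0
print(classify_sym_2x2(-2.0, 0.0, -1.0))  # eigenvalues -1 and -2
print(classify_sym_2x2(1.0, 2.0, 1.0))    # eigenvalues 3 and -1
```

The function reports the strictest label that applies, so a positive definite matrix is not also reported as positive semi-definite even though it satisfies that weaker condition.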

Page 38: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

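Back to Random Vectors

• Define Y as a random vector

$$
\mathbf{Y} =
\begin{bmatrix}
Y_1 \\ Y_2 \\ \vdots \\ Y_p
\end{bmatrix}
$$

• Then the population mean vector is:

$$
E(\mathbf{Y}) =
\begin{bmatrix}
E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_p)
\end{bmatrix}
$$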

Page 39: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Random Vectors Cont’d

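• So $Y_i$ is a random variable whose mean and variance can be expressed by:

mean:

$$
E(Y_i) = \int_{-\infty}^{\infty} y_i f_i(y_i)\,dy_i = \mu_i
$$

variance:

$$
V(Y_i) = \int_{-\infty}^{\infty} (y_i - \mu_i)^2 f_i(y_i)\,dy_i = E(Y_i - \mu_i)^2 = \sigma_i^2 = \sigma_{ii}
$$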

Page 40: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

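Covariance of Random Vectors

• We then define the covariance between the ith and jth trait in Y as

$$
\mathrm{Cov}(Y_i, Y_j) = E\left[(Y_i - \mu_i)(Y_j - \mu_j)\right] = \sigma_{ij}
$$

• Yielding the covariance matrix

$$
V(\mathbf{Y}) = \boldsymbol{\Sigma} =
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{bmatrix}
$$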

Page 41: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

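Correlation Matrix of Y

• The correlation matrix for Y is

$$
\mathbf{P}_{p \times p} =
\begin{bmatrix}
1 & \rho_{12} & \cdots & \rho_{1p} \\
\rho_{21} & 1 & \cdots & \rho_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{p1} & \rho_{p2} & \cdots & 1
\end{bmatrix}
$$

where

$$
\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j} = \rho_{ji}
$$

Note

$$
\mathbf{P} = \mathrm{diag}\left(\sigma_{11}^{-1/2}, \sigma_{22}^{-1/2}, \ldots, \sigma_{pp}^{-1/2}\right)\,\boldsymbol{\Sigma}\,\mathrm{diag}\left(\sigma_{11}^{-1/2}, \sigma_{22}^{-1/2}, \ldots, \sigma_{pp}^{-1/2}\right)
$$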
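Converting a covariance matrix to a correlation matrix only requires dividing each covariance by the two corresponding standard deviations. The sketch below does this for a made-up 2 x 2 covariance matrix; `correlation_from_covariance` is an illustrative name.

```python
import math


def correlation_from_covariance(Sigma):
    """rho_ij = sigma_ij / (sigma_i * sigma_j), with sigma_i = sqrt(sigma_ii)."""
    p = len(Sigma)
    sd = [math.sqrt(Sigma[i][i]) for i in range(p)]
    return [[Sigma[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


# Hypothetical covariance matrix: variances 4 and 9, covariance 2.
Sigma = [[4.0, 2.0], [2.0, 9.0]]
P = correlation_from_covariance(Sigma)
print(P)  # off-diagonal entries are 2 / (2 * 3)
```

The diagonal of the result is always 1 because each variable is perfectly correlated with itself, and the matrix inherits its symmetry from the covariance matrix.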

Page 42: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

Properties of a Covariance Matrix

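• $\boldsymbol{\Sigma}$ is symmetric (i.e. $\sigma_{ij} = \sigma_{ji}$ for all i,j)

• $\boldsymbol{\Sigma}$ is positive semi-definite for any vector of constants

$$
\mathbf{c} =
\begin{bmatrix}
c_1 \\ c_2 \\ \vdots \\ c_p
\end{bmatrix}
\quad \text{yielding} \quad \mathbf{c}'\boldsymbol{\Sigma}\mathbf{c} \ge 0
$$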

Page 43: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

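Linear Combinations

• Consider linear combinations of the elements of Y

$$
Z = c_1Y_1 + c_2Y_2 + \cdots + c_pY_p
= \begin{bmatrix} c_1 & c_2 & \cdots & c_p \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_p \end{bmatrix}
= \mathbf{c}'\mathbf{Y}
$$

• If Y has mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$, then

$$
E(Z) = \mathbf{c}'\boldsymbol{\mu} = c_1\mu_1 + c_2\mu_2 + \cdots + c_p\mu_p
$$

$$
V(Z) = \mathbf{c}'\boldsymbol{\Sigma}\mathbf{c} = \sum_{i=1}^{p}\sum_{j=1}^{p} c_i c_j \sigma_{ij} \ge 0
$$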
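The two formulas for E(Z) and V(Z) are a dot product and a double sum, which the sketch below evaluates for made-up values of the mean vector, covariance matrix, and constants. The names `mu`, `Sigma`, and `c` mirror the slide's symbols; the numbers are hypothetical.

```python
def mean_of_combination(c, mu):
    """E(Z) = c'mu for Z = c'Y."""
    return sum(ci * mi for ci, mi in zip(c, mu))


def variance_of_combination(c, Sigma):
    """V(Z) = c' Sigma c, the double sum of c_i * c_j * sigma_ij."""
    p = len(c)
    return sum(c[i] * c[j] * Sigma[i][j] for i in range(p) for j in range(p))


# Hypothetical population values for a p = 2 random vector Y.
mu = [1.0, 2.0]
Sigma = [[4.0, 2.0], [2.0, 9.0]]
c = [1.0, -1.0]          # Z = Y1 - Y2

EZ = mean_of_combination(c, mu)
VZ = variance_of_combination(c, Sigma)
print(EZ, VZ)  # 1 - 2 = -1 and 4 - 2 - 2 + 9 = 9
```

With c = (1, −1) the variance is σ11 + σ22 − 2σ12, the familiar formula for the variance of a difference.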

Page 44: Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012

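Linear Combinations Cont’d

• If $\boldsymbol{\Sigma}$ is not positive definite then

$$
V(\mathbf{c}'\mathbf{Y}) = \mathbf{c}'\boldsymbol{\Sigma}\mathbf{c} = 0
$$

for at least one

$$
\mathbf{c} =
\begin{bmatrix}
c_1 \\ c_2 \\ \vdots \\ c_p
\end{bmatrix}
\ne \mathbf{0}
$$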
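A small numerical case shows what a zero-variance combination looks like. The sketch below assumes a hypothetical covariance matrix for two perfectly correlated variables with equal variance, so that their difference has no variability at all.

```python
def variance_of_combination(c, Sigma):
    """V(c'Y) = c' Sigma c as a double sum."""
    p = len(c)
    return sum(c[i] * c[j] * Sigma[i][j] for i in range(p) for j in range(p))


# Hypothetical covariance matrix that is positive semi-definite but not
# positive definite: Y1 and Y2 have correlation 1 and equal variances.
Sigma = [[1.0, 1.0], [1.0, 1.0]]

c_zero = [1.0, -1.0]   # non-zero c with zero variance: Z = Y1 - Y2
c_other = [1.0, 1.0]   # a combination that still varies: Z = Y1 + Y2

print(variance_of_combination(c_zero, Sigma))
print(variance_of_combination(c_other, Sigma))
```

Here the determinant of the covariance matrix is 1·1 − 1·1 = 0, so the matrix is singular, which matches the earlier statement that a positive definite matrix must have a positive determinant.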