
Univariate and Multivariate Statistical Distributions

PRE 906: Structural Equation Modeling, Lecture #2 – January 28, 2015


Today’s Class

• Random variables: definitions and types

• Univariate distributions
  - General terminology
  - Univariate normal (Gaussian) distribution

• Brief introduction to matrix algebra

• Multivariate distributions
  - The multivariate normal distribution


Key Questions for Today’s Lecture

1. Why is knowledge about random variables important to learn for use with SEM?

2. What are the two parameters of the univariate normal distribution and what do they govern?

3. What are matrices, and what basic algebraic operations can be conducted on them?

4. What are the two matrices of parameters of the multivariate normal distribution and what do they govern?


RANDOM VARIABLES AND STATISTICAL DISTRIBUTIONS


Why is there a Lecture about Random Variables in SEM Class?

• Random variables are key to statistical analyses:
  - They provide the distributional specifics that accompany assumptions about data when analyses are conducted
  - They provide the necessary functions to estimate models for specific analyses
  - They provide a mechanism to describe a set of multivariate data through statistical parameters that we use for scientific inference

• Structural equation models make a number of assumptions that involve random variables:
  - Assumptions about the distribution of the data
  - Assumptions about the distribution of error (if any exists)
  - Assumptions about the distribution of the latent variables

• Typically, SEMs have used normal distributions for everything, though any distribution can be used

• This class will introduce you to larger classes of models with normal and non-normal assumptions, so your analyses don't have to be typical and can better fit your data


Random Variables

Random: situations in which the certainty of the outcome is unknown and is at least in part due to chance

+

Variable: a value that may change given the scope of a given problem or set of operations

=

Random Variable: a variable whose outcome depends on chance

(possible values might represent the possible outcomes of a yet-to-be-performed experiment)

Today we will denote a random variable with a lowercase letter: 𝑥


Key Features of Random Variables

• Each random variable is described by a probability density/mass function (PDF), 𝑓(𝑥), that indicates the relative frequency of occurrence
  - A PDF is a mathematical function that gives a rough picture of the distribution from which a random variable is drawn


Types of Random Variables

• Random variables have different types:

1. Continuous. Examples of continuous random variables:
  - 𝑥 represents the height of a person
  - 𝑌𝑝 (the outcome/DV in a GLM)

2. Discrete (also generally called categorical). Example of a discrete random variable:
  - 𝑥 represents the gender of a person

3. Mixture of continuous and discrete. Example of a mixture:
  - 𝑥 represents response time (if between 0 and 45 seconds)


Uses of Distributions in Data Analysis

• Statistical models make distributional assumptions on various parameters and/or parts of data

• These assumptions govern:
  - How models are estimated
  - How inferences are made
  - How missing data are incorporated or modeled within an analysis

• If data do not follow an assumed distribution, inferences may be inaccurate
  - It is important to pick distributions that are plausible for the analysis of the data you have


CONTINUOUS UNIVARIATE DISTRIBUTIONS


Continuous Univariate Distributions

• To demonstrate how continuous distributions work and look, we will discuss three:
  - Uniform distribution
  - Normal distribution
  - Chi-square distribution

• Each is described by a set of parameters, which we will later see are what give us our inferences when we analyze data

• What we then do is put constraints on those parameters based on hypothesized effects in data


Uniform Distribution

• The uniform distribution is shown to help set up how continuous distributions work

• For a continuous random variable 𝑥 that ranges over [𝑎, 𝑏], the uniform probability density function is:

𝑓(𝑥) = 1/(𝑏 − 𝑎)

• The uniform distribution has two parameters:
  - 𝑎 – the lower limit
  - 𝑏 – the upper limit

• 𝑥 ∼ 𝑈(𝑎, 𝑏)

More on the Uniform Distribution

• To demonstrate how PDFs work, we will try a few values:

   𝑥      𝑎     𝑏     𝑓(𝑥)
   .5     0     1     1/(1 − 0) = 1
   .75    0     1     1/(1 − 0) = 1
   15     0     20    1/(20 − 0) = .05
   15     10    20    1/(20 − 10) = .1

• The uniform PDF has the feature that all values of 𝑥 are equally likely across the sample space of the distribution
  - Therefore, you do not see 𝑥 in the PDF 𝑓(𝑥)

• The mean of the uniform distribution is (𝑎 + 𝑏)/2

• The variance of the uniform distribution is (𝑏 − 𝑎)²/12
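These density values can be checked directly in R with the built-in dunif() function; a minimal sketch reproducing the table:

```r
# Uniform density at a few values: f(x) = 1 / (b - a) whenever a <= x <= b
dunif(0.50, min = 0,  max = 1)   # 1.00
dunif(0.75, min = 0,  max = 1)   # 1.00
dunif(15,   min = 0,  max = 20)  # 0.05
dunif(15,   min = 10, max = 20)  # 0.10
```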


Univariate Normal Distribution

• For a continuous random variable 𝑥 (ranging from −∞ to ∞), the univariate normal distribution function is:

f(x) = \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left( -\frac{(x - \mu_x)^2}{2\sigma_x^2} \right)

• The shape of the distribution is governed by two parameters:
  - The mean 𝜇𝑥
  - The variance 𝜎𝑥²
  - These parameters are called sufficient statistics (they contain all the information about the distribution)

• The skewness (lean) and kurtosis (peakedness) are fixed

• Standard notation for normal distributions is 𝑥 ∼ 𝑁(𝜇𝑥, 𝜎𝑥²)
  - Read as: "𝑥 follows a normal distribution with mean 𝜇𝑥 and variance 𝜎𝑥²"

• Linear combinations of random variables following normal distributions result in a random variable that is normally distributed


Univariate Normal Distribution

[Figure: normal density curve]

• 𝑓(𝑥) gives the height of the curve (relative frequency) for any value of 𝑥, 𝜇𝑥, and 𝜎𝑥²

• R provides this for you directly using the dnorm() function


More on the Univariate Normal Distribution

• To demonstrate how the normal distribution works, we will try a few values:

   𝑥      𝜇𝑥     𝜎𝑥²    𝑓(𝑥)
   .5     0      1      0.352
   .75    0      1      0.301
   .5     0      5      0.174
   .75    −2     1      0.009
   −2     −2     1      0.399

• The values of 𝑓(𝑥) were obtained using R's dnorm() function
  - Most statistics packages have a normal density function

• The mean of the normal distribution is 𝜇𝑥

• The variance of the normal distribution is 𝜎𝑥²
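A minimal R sketch reproducing the table with dnorm(); the key subtlety is that dnorm() takes the standard deviation, so a variance must pass through sqrt() first:

```r
# dnorm(x, mean, sd) is parameterized by the standard deviation,
# so a variance of 5 must enter as sd = sqrt(5)
dnorm(0.50, mean = 0,  sd = 1)        # 0.352
dnorm(0.75, mean = 0,  sd = 1)        # 0.301
dnorm(0.50, mean = 0,  sd = sqrt(5))  # 0.174
dnorm(0.75, mean = -2, sd = 1)        # 0.009
dnorm(-2,   mean = -2, sd = 1)        # 0.399
dnorm(0.50, mean = 0,  sd = 5)        # 0.079: the result if the variance is
                                      # passed as the sd by mistake
```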


Chi-Square Distribution

• Another frequently used univariate distribution is the chi-square distribution:
  - The sampling distribution of the variance follows a chi-square distribution
  - Likelihood ratios follow a chi-square distribution

• For a continuous random variable 𝑥 (ranging from 0 to ∞), the chi-square distribution is given by:

f(x) = \frac{1}{2^{\nu/2}\,\Gamma\!\left(\frac{\nu}{2}\right)} x^{\nu/2 - 1} \exp\left( -\frac{x}{2} \right)

• Γ(⋅) is called the gamma function

• The chi-square distribution is governed by one parameter, 𝜈 (the degrees of freedom)
  - The mean is equal to 𝜈; the variance is equal to 2𝜈
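R provides the chi-square density as dchisq(); a quick simulation-based sketch of the mean and variance claims (simulated values only approximate 𝜈 and 2𝜈):

```r
dchisq(3, df = 5)             # density at x = 3 with nu = 5: about 0.154
draws <- rchisq(1e5, df = 5)  # simulate 100,000 chi-square variables
mean(draws)                   # approximately nu = 5
var(draws)                    # approximately 2 * nu = 10
```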


(Univariate) Chi-Square Distribution

[Figure: chi-square density curve, with 𝑥 on the horizontal axis and 𝑓(𝑥) on the vertical axis]


FROM MARGINAL DISTRIBUTIONS TO JOINT DISTRIBUTIONS


Moving from One to Multiple Random Variables

• When more than one random variable is present, there are several different types of statistical distributions. We will consider the following:
  - Marginal distribution: the distribution of one variable only (either 𝑓(𝑥) or 𝑓(𝑧))
  - Joint distribution 𝑓(𝑥, 𝑧): the distribution of both variables (both 𝑥 and 𝑧)
  - Conditional distribution: the distribution of one variable, conditional on values of the other (note: conditional distributions will not be discussed until we get to generalized models later in the course)
    - 𝑓(𝑥 | 𝑧): the distribution of 𝑥 given 𝑧
    - 𝑓(𝑧 | 𝑥): the distribution of 𝑧 given 𝑥


Marginal Distributions

• Marginal distributions are what we have worked with exclusively up to this point: they represent the distribution of one variable by itself
  - Continuous univariate distributions:
    - Uniform
    - Normal
    - Chi-square
  - Categorical distributions (also not discussed until later in the course):
    - Bernoulli
    - Binomial
    - Multinomial
    - Poisson


Joint Distributions

• Joint distributions describe the distribution of more than one variable, simultaneously
  - Representations of multiple variables collected together

• Commonly, the joint distribution function is denoted with all random variables separated by commas
  - In our example, 𝑓(𝑥, 𝑧) is the joint distribution of the outcomes of a random variable 𝑥 and a random variable 𝑧

• Joint distributions are multivariate distributions
  - We will cover the continuous multivariate normal distribution today

• Most joint distributions are described symbolically using matrix algebra (so we will discuss that briefly)


BRIEF PRIMER ON MATRIX ALGEBRA

For Reading and Deciphering Multivariate Distributions


Definitions

• We begin this section with some general definitions (from dictionary.com):

Matrix:

1. A rectangular array of numeric or algebraic quantities subject to mathematical operations

2. The substrate on or within which a fungus grows

Algebra:

1. A branch of mathematics in which symbols, usually letters of the alphabet, represent numbers or members of a specified set and are used to represent quantities and to express general relationships that hold for all members of the set

2. A set together with a pair of binary operations defined on the set. Usually, the set and the operations include an identity element, and the operations are commutative or associative


Why Learn Matrix Algebra?

• Matrix algebra can seem very abstract and removed from the purposes of this class (and statistics in general)

• Learning matrix algebra is important for:
  - Understanding how statistical methods work, and when to use them (or not)
  - Understanding what statistical methods mean
  - Reading and writing results from new statistical methods

• Today's class is the first lecture in learning the language of multivariate statistics and structural equation modeling


DATA EXAMPLE AND R


A Guiding Example

• To demonstrate matrix algebra, we will make use of data

• Imagine that I collected SAT test scores for both the Math (SATM) and Verbal (SATV) sections from 1,000 students

• The descriptive statistics of this data set are given below:

[Table: descriptive statistics for the SATV and SATM scores]


DEFINITIONS OF MATRICES, VECTORS, AND SCALARS


Matrices

• A matrix is a rectangular array of data
  - Used for storing numbers

• Matrices can have unlimited dimensions; for our purposes, all matrices will have two dimensions: rows and columns

• Matrices are symbolized by boldface font in text, typically with capital letters, with size noted as (r rows × c columns):

\mathbf{X} = \begin{bmatrix} 520 & 580 \\ 520 & 550 \\ \vdots & \vdots \\ 540 & 660 \end{bmatrix}_{(1000 \times 2)}

(Column 1 holds the SAT Verbal scores; Column 2 holds the SAT Math scores)

Vectors

• A vector is a matrix where one dimension is equal to 1
  - Column vector: a matrix of size 𝑟 × 1

\boldsymbol{x}_{\cdot 1} = \begin{bmatrix} 520 \\ 520 \\ \vdots \\ 540 \end{bmatrix}_{(1000 \times 1)}

  - Row vector: a matrix of size 1 × 𝑐

\boldsymbol{x}_{1 \cdot} = \begin{bmatrix} 520 & 580 \end{bmatrix}_{(1 \times 2)}

• Vectors are typically written in boldface font text, usually with lowercase letters

• The dots in the subscripts 𝒙∙1 and 𝒙1∙ represent the dimension aggregated across in the vector:
  - 𝒙1∙ is the first row and all columns of 𝐗
  - 𝒙∙1 is the first column and all rows of 𝐗
  - Sometimes the row and column indices are separated by a comma (making it possible to read double-digit indices in either dimension)


Matrix Elements

• A matrix (or vector) is composed of a set of elements
  - Each element is denoted by its position in the matrix (row and column)

• For our matrix of data 𝐗 (size 1000 rows × 2 columns), each element is denoted by 𝑥ᵢⱼ:
  - The first subscript is the index for the rows: i = 1, …, r (= 1000)
  - The second subscript is the index for the columns: j = 1, …, c (= 2)

\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{1000,1} & x_{1000,2} \end{bmatrix}_{(1000 \times 2)}


Scalars

• A scalar is just a single number

• The name scalar is important: the number “scales” a vector – it can make a vector “longer” or “shorter”

• Scalars are typically written without boldface: 𝑥₁₁ = 520

• Each element of a matrix is a scalar


Matrix Transpose

• The transpose of a matrix is a reorganization of the matrix formed by switching the indices for the rows and columns:

\mathbf{X} = \begin{bmatrix} 520 & 580 \\ 520 & 550 \\ \vdots & \vdots \\ 540 & 660 \end{bmatrix}_{(1000 \times 2)} \qquad \mathbf{X}^T = \begin{bmatrix} 520 & 520 & \cdots & 540 \\ 580 & 550 & \cdots & 660 \end{bmatrix}_{(2 \times 1000)}

• An element 𝑥ᵢⱼ in the original matrix 𝐗 is 𝑥ⱼᵢ in the transposed matrix 𝐗ᵀ

• Transposes are used to align matrices for operations where the sizes of matrices matter (such as matrix multiplication)


Types of Matrices

• Square matrix: a matrix with the same number of rows and columns
  - Correlation/covariance matrices are square matrices

• Diagonal matrix: a square matrix with non-zero diagonal elements (𝑥ᵢⱼ ≠ 0 for 𝑖 = 𝑗) and zeros on the off-diagonal elements (𝑥ᵢⱼ = 0 for 𝑖 ≠ 𝑗):

\mathbf{A} = \begin{bmatrix} 2.759 & 0 & 0 \\ 0 & 1.643 & 0 \\ 0 & 0 & 0.879 \end{bmatrix}

• Symmetric matrix: a square matrix with all elements reflected across the diagonal (𝑎ᵢⱼ = 𝑎ⱼᵢ)
  - Correlation and covariance matrices are symmetric matrices


VECTORS


Vectors in Space…

• Vectors (row or column) can be represented as lines on a Cartesian coordinate system (a graph)

• Consider the vectors:

\mathbf{a} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}

[Figure: plot of vectors 𝐚 and 𝐛 from the origin]

• Question: how would a column vector for each of our example variables (SATM and SATV) be plotted?


Vector Addition

• Vectors can be added together, resulting in a new vector

• Vector addition is done element-wise, by adding each of the respective elements together
  - The new vector has the same number of rows and columns:

\mathbf{c} = \mathbf{a} + \mathbf{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \end{bmatrix}

  - Geometrically, this creates a new vector along either of the previous two, starting at the origin and ending at a new point in space

• In data: a new variable (say, SAT total) is the result of vector addition:

\mathbf{SAT}_{TOTAL} = \boldsymbol{x}_{\cdot 1} + \boldsymbol{x}_{\cdot 2}


Vector Addition: Geometrically

[Figure: geometric view of vector addition]

Vector Multiplication by Scalar

• Vectors can be multiplied by scalars
  - All elements are multiplied by the scalar:

\mathbf{d} = 2\mathbf{a} = 2\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}

• Scalar multiplication changes the length of the vector:

\|\mathbf{d}\| = \sqrt{2^2 + 4^2} = \sqrt{20} = 4.47

• This is where the term scalar comes from: a scalar ends up "rescaling" (resizing) a vector

• In data: in the GLM (where 𝐗 is a matrix of data), the fixed effects (slopes) are scalars multiplying the data


Scalar Multiplication: Geometrically

[Figure: geometric view of scalar multiplication]

Linear Combinations

• Addition of a set of vectors, each multiplied by a scalar, is called a linear combination:

\mathbf{y} = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \cdots + a_k\mathbf{x}_k

• Here, 𝐲 is the linear combination
  - For all k vectors, the set of all possible linear combinations is called their span
  - The span is typically not thought about in most analyses, but it becomes important when working with things that don't exist as observed data (latent variables)

• In data: linear combinations happen frequently:
  - Linear models (i.e., regression and ANOVA)
  - Principal components analysis


Linear Dependencies

• A set of vectors is said to be linearly dependent if

a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \cdots + a_k\mathbf{x}_k = \mathbf{0}

and 𝑎₁, 𝑎₂, …, 𝑎ₖ are not all zero

• Example: let's make a new variable, SAT total:

\mathbf{SAT}_{total} = 1 \cdot \mathbf{SATV} + 1 \cdot \mathbf{SATM}

• The new variable is linearly dependent with the others:

1 \cdot \mathbf{SATV} + 1 \cdot \mathbf{SATM} + (-1) \cdot \mathbf{SAT}_{total} = \mathbf{0}

• In data: (multi)collinearity is a linear dependency. Linear dependencies are bad for analyses that use matrix inverses, as the sketch below shows
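A small R demonstration of this problem using simulated scores (hypothetical data, not the class data set): adding an exact sum as a third column drops the matrix rank and makes the cross-products matrix singular.

```r
set.seed(906)
SATV <- rnorm(1000, mean = 500, sd = 50)  # simulated verbal scores
SATM <- rnorm(1000, mean = 500, sd = 50)  # simulated math scores
SATtotal <- SATV + SATM                   # an exact linear combination

X <- cbind(SATV, SATM, SATtotal)
qr(X)$rank             # 2, not 3: the columns are linearly dependent
# solve(crossprod(X))  # would fail: the matrix is computationally singular
```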


Inner (Dot) Product of Vectors

• An important concept in vector geometry is the inner product of two vectors
  - The inner product is also called the dot product:

\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T\mathbf{b} = a_{11}b_{11} + a_{21}b_{21} + \cdots + a_{N1}b_{N1} = \sum_{i=1}^{N} a_{i1}b_{i1}

• The dot or inner product is related to the angle between vectors and to the projection of one vector onto another

• From our example: 𝐚 ∙ 𝐛 = 1 × 2 + 2 × 3 = 8

• From our data: 𝒙∙1 ⋅ 𝒙∙2 = 251,928,400

• In data: the angle between vectors is related to the correlation between variables, and the projection is related to regression/ANOVA/linear models
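In R, the inner product can be written either with element-wise multiplication or with the matrix-multiplication operator; a minimal sketch using the example vectors:

```r
a <- c(1, 2)
b <- c(2, 3)
sum(a * b)   # 8: element-wise products, summed
t(a) %*% b   # 8: the same value, returned as a 1 x 1 matrix
```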


MATRIX ALGEBRA


Moving from Vectors to Matrices

• A matrix can be thought of as a collection of vectors
  - Matrix operations are vector operations on steroids

• Matrix algebra defines a set of operations and entities on matrices
  - I will present a version meant to mirror your previous algebra experiences

• Definitions: identity matrix, zero vector, ones vector

• Basic operations: addition, subtraction, multiplication, "division"


Matrix Addition and Subtraction

• Matrix addition and subtraction are much like vector addition/subtraction

• Rules: matrices must be the same size (rows and columns)

• Method: the new matrix is constructed of element-by-element addition/subtraction of the previous matrices

• Order: the order of the matrices (pre- and post-) does not matter


Matrix Addition/Subtraction

[Worked example: element-by-element addition and subtraction of two matrices]

Matrix Multiplication

• Matrix multiplication is a bit more complicated
  - The new matrix may be a different size from either of the two multiplying matrices:

\mathbf{A}_{(r \times c)}\,\mathbf{B}_{(c \times k)} = \mathbf{C}_{(r \times k)}

• Rules: the pre-multiplying matrix must have a number of columns equal to the number of rows of the post-multiplying matrix

• Method: the elements of the new matrix consist of the inner (dot) products of the row vectors of the pre-multiplying matrix and the column vectors of the post-multiplying matrix

• Order: the order of the matrices (pre- and post-) matters


Matrix Multiplication

[Worked example: matrix multiplication via row-by-column inner products]

Multiplication in Statistics

• Many statistical formulae with summation can be re-expressed with matrices

• A common matrix multiplication form is 𝐗ᵀ𝐗:
  - Diagonal elements: \sum_{i=1}^{N} X_i^2
  - Off-diagonal elements: \sum_{i=1}^{N} X_{ia} X_{ib}

• For our SAT example:

\mathbf{X}^T\mathbf{X} = \begin{bmatrix} \sum_{i=1}^{N} SATV_i^2 & \sum_{i=1}^{N} SATV_i \, SATM_i \\ \sum_{i=1}^{N} SATV_i \, SATM_i & \sum_{i=1}^{N} SATM_i^2 \end{bmatrix} = \begin{bmatrix} 251{,}797{,}800 & 251{,}928{,}400 \\ 251{,}928{,}400 & 254{,}862{,}700 \end{bmatrix}
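In R this quantity is t(X) %*% X, or equivalently crossprod(X); a sketch assuming X is the 1000 × 2 score matrix described earlier:

```r
# Assuming X is the 1000 x 2 matrix with columns SATV and SATM
XtX <- t(X) %*% X  # sums of squares on the diagonal, cross-products off it
crossprod(X)       # identical result; avoids forming the transpose explicitly
```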


Identity Matrix

• The identity matrix is a matrix that, when pre- or post-multiplied by another matrix, results in the original matrix:

\mathbf{A}\mathbf{I} = \mathbf{A} \qquad \mathbf{I}\mathbf{A} = \mathbf{A}

• The identity matrix is a square matrix that has:
  - Diagonal elements = 1
  - Off-diagonal elements = 0

\mathbf{I}_{(3 \times 3)} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}


Zero Vector

• The zero vector is a column vector of zeros:

\mathbf{0}_{(3 \times 1)} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}

• When pre- or post-multiplied, the result is the zero vector:

\mathbf{A}\mathbf{0} = \mathbf{0} \qquad \mathbf{0}\mathbf{A} = \mathbf{0}


Ones Vector

• A ones vector is a column vector of 1s:

\mathbf{1}_{(3 \times 1)} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}

• The ones vector is useful for calculating statistical terms, such as the mean vector and the covariance matrix


Matrix “Division”: The Inverse Matrix

• Division from algebra:
  - First: \frac{a}{b} = \frac{1}{b}a = b^{-1}a
  - Second: \frac{a}{a} = 1

• "Division" in matrices serves a similar role
  - For square and symmetric matrices, an inverse matrix 𝐀⁻¹ is a matrix that, when pre- or post-multiplied with another matrix, produces the identity matrix:

\mathbf{A}^{-1}\mathbf{A} = \mathbf{I} \qquad \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}

• Calculation of the matrix inverse is complicated
  - Even computers have a tough time

• Not all matrices can be inverted
  - Non-invertible matrices are called singular matrices
  - In statistics, singular matrices are commonly caused by linear dependencies


The Inverse

• In data: the inverse shows up constantly in statistics
  - Models that assume some type of (multivariate) normality need an inverse covariance matrix

• Using our SAT example:
  - Our data matrix was size (1000 × 2), which is not invertible
  - However, 𝐗ᵀ𝐗 was size (2 × 2): square and symmetric

\mathbf{X}^T\mathbf{X} = \begin{bmatrix} 251{,}797{,}800 & 251{,}928{,}400 \\ 251{,}928{,}400 & 254{,}862{,}700 \end{bmatrix}

  - The inverse is:

(\mathbf{X}^T\mathbf{X})^{-1} = \begin{bmatrix} 3.61 \times 10^{-7} & -3.57 \times 10^{-7} \\ -3.57 \times 10^{-7} & 3.56 \times 10^{-7} \end{bmatrix}
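In R, solve() with a single matrix argument returns the inverse; a sketch using the cross-products matrix from the slide:

```r
XtX <- matrix(c(251797800, 251928400,
                251928400, 254862700), nrow = 2)
solve(XtX)                     # the inverse shown above
round(XtX %*% solve(XtX), 10)  # recovers the 2 x 2 identity matrix
```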


Matrix Algebra Operations

• (𝐀 + 𝐁) + 𝐂 = 𝐀 + (𝐁 + 𝐂)

• 𝐀 + 𝐁 = 𝐁 + 𝐀

• 𝑐(𝐀 + 𝐁) = 𝑐𝐀 + 𝑐𝐁

• (𝑐 + 𝑑)𝐀 = 𝑐𝐀 + 𝑑𝐀

• (𝐀 + 𝐁)ᵀ = 𝐀ᵀ + 𝐁ᵀ

• (𝑐𝑑)𝐀 = 𝑐(𝑑𝐀)

• (𝑐𝐀)ᵀ = 𝑐𝐀ᵀ

• 𝑐(𝐀𝐁) = (𝑐𝐀)𝐁

• 𝐀(𝐁𝐂) = (𝐀𝐁)𝐂

• 𝐀(𝐁 + 𝐂) = 𝐀𝐁 + 𝐀𝐂

• (𝐀𝐁)ᵀ = 𝐁ᵀ𝐀ᵀ

• For 𝐱ⱼ such that 𝐀𝐱ⱼ exists:

\sum_{j=1}^{N} \mathbf{A}\mathbf{x}_j = \mathbf{A} \sum_{j=1}^{N} \mathbf{x}_j

\sum_{j=1}^{N} (\mathbf{A}\mathbf{x}_j)(\mathbf{A}\mathbf{x}_j)^T = \mathbf{A} \left( \sum_{j=1}^{N} \mathbf{x}_j\mathbf{x}_j^T \right) \mathbf{A}^T


BIVARIATE NORMAL DISTRIBUTION

A Multivariate Distribution with Two Variables


Bivariate Normal Distribution

• The bivariate normal is a joint distribution for two variables
  - Each variable is normally distributed marginally (by itself)
  - Together, they form a bivariate normal distribution

• The bivariate normal density provides the relative frequency of observing any pair of observations, 𝐱ᵢ = [𝑥ᵢ₁ 𝑥ᵢ₂]:

f(x_1, x_2) = \frac{1}{2\pi\sigma_{x_1}\sigma_{x_2}\sqrt{1 - \rho^2}} \exp\left( -\frac{z}{2(1 - \rho^2)} \right)

where

z = \frac{(x_{i1} - \mu_{x_1})^2}{\sigma_{x_1}^2} - \frac{2\rho(x_{i1} - \mu_{x_1})(x_{i2} - \mu_{x_2})}{\sigma_{x_1}\sigma_{x_2}} + \frac{(x_{i2} - \mu_{x_2})^2}{\sigma_{x_2}^2}
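A direct R implementation of this density, written from the formula above (the helper name dbvnorm and its arguments are ours, not a package function); with ρ = 0 the joint density factors into the product of the two marginal normal densities:

```r
# Bivariate normal density; s1 and s2 are standard deviations, rho the correlation
dbvnorm <- function(x1, x2, mu1 = 0, mu2 = 0, s1 = 1, s2 = 1, rho = 0) {
  z <- (x1 - mu1)^2 / s1^2 -
    2 * rho * (x1 - mu1) * (x2 - mu2) / (s1 * s2) +
    (x2 - mu2)^2 / s2^2
  exp(-z / (2 * (1 - rho^2))) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))
}

dbvnorm(0, 0)        # 0.159 = 1/(2*pi)
dnorm(0) * dnorm(0)  # same value: with rho = 0, the marginals multiply
```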


Bivariate Normal Plot #1

\boldsymbol{\mu} = \begin{bmatrix} \mu_{x_1} \\ \mu_{x_2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{x_1}^2 & \sigma_{x_1 x_2} \\ \sigma_{x_1 x_2} & \sigma_{x_2}^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

[Figures: 3D density surface and 2D contour plot]


Bivariate Normal Plot #2

\boldsymbol{\mu} = \begin{bmatrix} \mu_{x_1} \\ \mu_{x_2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{x_1}^2 & \sigma_{x_1 x_2} \\ \sigma_{x_1 x_2} & \sigma_{x_2}^2 \end{bmatrix} = \begin{bmatrix} 1 & .5 \\ .5 & 1 \end{bmatrix}

[Figures: 3D density surface and 2D contour plot]


MULTIVARIATE NORMAL DISTRIBUTIONS (VARIABLES ≥ 2)


Multivariate Normal Distribution

• The multivariate normal distribution is the generalization of the univariate normal distribution to multiple variables
  - The bivariate normal distribution just shown is part of the MVN

• The MVN provides the relative likelihood of observing all V variables for a subject p simultaneously:

\mathbf{x}_p = \begin{bmatrix} x_{p1} & x_{p2} & \cdots & x_{pV} \end{bmatrix}

• The multivariate normal density function is:

f(\mathbf{x}_p) = \frac{1}{(2\pi)^{V/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{(\mathbf{x}_p^T - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_p^T - \boldsymbol{\mu})}{2} \right)


The Multivariate Normal Distribution

f(\mathbf{x}_p) = \frac{1}{(2\pi)^{V/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{(\mathbf{x}_p^T - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_p^T - \boldsymbol{\mu})}{2} \right)

• The mean vector is:

\boldsymbol{\mu} = \begin{bmatrix} \mu_{x_1} \\ \mu_{x_2} \\ \vdots \\ \mu_{x_V} \end{bmatrix}

• The covariance matrix is:

\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{x_1}^2 & \sigma_{x_1 x_2} & \cdots & \sigma_{x_1 x_V} \\ \sigma_{x_1 x_2} & \sigma_{x_2}^2 & \cdots & \sigma_{x_2 x_V} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{x_1 x_V} & \sigma_{x_2 x_V} & \cdots & \sigma_{x_V}^2 \end{bmatrix}

  - The covariance matrix must be non-singular (invertible)
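A minimal R sketch of this density, written directly from the formula (dmvn is our own helper; the mvtnorm package's dmvnorm() provides the same computation if installed):

```r
# Multivariate normal density from the formula above
dmvn <- function(x, mu, Sigma) {
  V <- length(mu)
  dev <- x - mu                         # deviations from the mean vector
  q <- t(dev) %*% solve(Sigma) %*% dev  # quadratic form in the exponent
  as.numeric(exp(-q / 2) / sqrt((2 * pi)^V * det(Sigma)))
}

dmvn(c(0, 0), mu = c(0, 0), Sigma = diag(2))  # 0.159 = 1/(2*pi): the bivariate case
```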

Comparing Univariate and Multivariate Normal Distributions

• The univariate normal distribution:

f(x_p) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_p - \mu)^2}{2\sigma^2} \right)

• The univariate normal, rewritten with a little algebra:

f(x_p) = \frac{1}{(2\pi)^{1/2} |\sigma^2|^{1/2}} \exp\left( -\frac{(x_p - \mu)(\sigma^2)^{-1}(x_p - \mu)}{2} \right)

• The multivariate normal distribution:

f(\mathbf{x}_p) = \frac{1}{(2\pi)^{V/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{(\mathbf{x}_p^T - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_p^T - \boldsymbol{\mu})}{2} \right)

• When 𝑉 = 1 (one variable), the MVN reduces to the univariate normal distribution


The Exponent Term

• The term in the exponent (without the −1/2) is called the squared Mahalanobis distance:

d^2(\mathbf{x}_p) = (\mathbf{x}_p^T - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_p^T - \boldsymbol{\mu})

  - Sometimes called the statistical distance
  - Describes how far an observation is from its mean vector, in standardized units
  - Like a multivariate Z score (but, if the data are MVN, it is actually distributed as a 𝜒² variable with degrees of freedom equal to the number of variables in 𝐱)
  - Can be used to assess whether data follow a multivariate normal distribution
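R's built-in mahalanobis() function returns exactly this squared distance; a sketch of the χ² check on simulated data (MASS::mvrnorm() is used here only to generate example MVN draws):

```r
library(MASS)                                      # for mvrnorm()
x  <- mvrnorm(1000, mu = c(0, 0), Sigma = diag(2)) # simulated bivariate normal data
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# If the data are MVN, d2 follows a chi-square with df = number of variables (2)
qqplot(qchisq(ppoints(1000), df = 2), d2,
       xlab = "Chi-square quantiles", ylab = "Squared Mahalanobis distance")
```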


Multivariate Normal Notation

• Standard notation for the multivariate normal distribution of 𝑣 variables is 𝑁ᵥ(𝝁, 𝚺)
  - Our SAT example would use a bivariate normal: 𝑁₂(𝝁, 𝚺)

• In data: the multivariate normal distribution serves as the basis for most every statistical technique commonly used in the social and educational sciences:
  - General linear models (ANOVA, regression, MANOVA)
  - General linear mixed models (HLM/multilevel models)
  - Factor and structural equation models (EFA, CFA, SEM, path models)
  - Multiple imputation for missing data

• Simply put, the world of commonly used statistics revolves around the multivariate normal distribution
  - Understanding it is the key to understanding many statistical methods


Bivariate Normal Plot #1

\boldsymbol{\mu} = \begin{bmatrix} \mu_{x_1} \\ \mu_{x_2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{x_1}^2 & \sigma_{x_1 x_2} \\ \sigma_{x_1 x_2} & \sigma_{x_2}^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

[Figures: 3D density surface and 2D contour plot]

Bivariate Normal Plot #2 (Multivariate Normal)

\boldsymbol{\mu} = \begin{bmatrix} \mu_{x_1} \\ \mu_{x_2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{x_1}^2 & \sigma_{x_1 x_2} \\ \sigma_{x_1 x_2} & \sigma_{x_2}^2 \end{bmatrix} = \begin{bmatrix} 1 & .5 \\ .5 & 1 \end{bmatrix}

[Figures: 3D density surface and 2D contour plot]

Multivariate Normal Properties

• The multivariate normal distribution has some useful properties that show up in statistical methods

• If 𝐗 is distributed multivariate normally:

1. Linear combinations of 𝐗 are normally distributed

2. All subsets of 𝐗 are multivariate normally distributed

3. A zero covariance between a pair of variables of 𝐗 implies that the variables are independent

4. Conditional distributions of 𝐗 are multivariate normal


WRAPPING UP AND REFOCUSING


Key Questions for Today’s Lecture

1. Why is knowledge about random variables important to learn for use with SEM?

2. What are the two parameters of the univariate normal distribution and what do they govern?

3. What are matrices, and what basic algebraic operations can be conducted on them?

4. What are the two matrices of parameters of the multivariate normal distribution and what do they govern?


Concluding Remarks

• The univariate and multivariate normal distributions serve as the backbone of the most frequently used statistical procedures
  - They are very robust and very useful

• Understanding their form and function will help you learn about a whole host of statistical routines immediately
  - This course is essentially a course based in normal distributions

• Inspecting normality is a tricky thing
  - More often than not, data will not be normal
  - More often than not, no one will check (don't ask, don't tell)

• Next week: linear models in matrices, repeated measures ANOVA, and multivariate ANOVA
  - Classical statistics, described as background for modern methods
