TRANSCRIPT
ELEG-636: Statistical Signal Processing
Gonzalo R. Arce
Department of Electrical and Computer Engineering, University of Delaware
Spring 2010
Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring 2010 1 / 96
Course Objectives & Structure
Objective: Given a discrete-time sequence x(n), develop
Statistical and spectral signal representations
Filtering, prediction, and system identification algorithms
Optimization methods that are statistical and adaptive
Course Structure:
Weekly lectures [notes: www.ece.udel.edu/∼arce]
Periodic homework (theory & Matlab implementations) [15%]
Midterm & Final examinations [85%]
Textbook:
Haykin, Adaptive Filter Theory.
Broad applications in communications, imaging, and sensors. Emerging applications include brain-imaging techniques, brain-machine interfaces, and implantable devices.
Neurofeedback presents real-time physiological signals from MRIs in a visual or auditory form to provide information about brain activity. These signals are used to train the patient to alter neural activity in a desired direction.
Traditionally, feedback using EEGs or other mechanisms has not focused on the brain because the resolution is not good enough.
Probability
Signal Characterization
Assumption: Many methods take x(n) to be deterministic.
Reality: Real-world signals are usually statistical in nature.
Thus
. . . , x(−1), x(0), x(1), . . .
can be interpreted as a sequence of random variables. We begin by analyzing each observation x(n) as a R.V. Then, to capture dependencies, we consider random vectors:
. . . , x(n), x(n + 1), . . . , x(n + N − 1), x(n + N), . . .
where the N samples x(n), x(n + 1), . . . , x(n + N − 1) form the vector x(n).
Random Variables
Definition
For a space S, the subsets, or events, of S have associated probabilities.
To every event δ, we assign a number x(δ), which is called a R.V.
The distribution function of x is
Pr{x ≤ x0} = Fx(x0), −∞ < x0 < ∞
Properties:
1. F(+∞) = 1, F(−∞) = 0
2. F(x) is continuous from the right: F(x+) = F(x)
3. Pr{x1 < x ≤ x2} = F(x2) − F(x1)
Example
Fair toss of two coins: H=heads, T=Tails
Define numerical assignments:
Events(δ)   Prob.   X(δ)   Y(δ)
HH          1/4     1      −100
HT          1/4     2      −100
TH          1/4     3      −100
TT          1/4     4      500
These assignments yield different distribution functions:
Fx(2) = Pr{HH, HT} = 1/2
Fy(2) = Pr{HH, HT, TH} = 3/4
How do we attain an intuitive interpretation of the distribution function?
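As a concrete illustration (not part of the slides), the two distribution functions can be computed directly from the outcome table above:

```python
# Illustrative check: distribution functions F_X and F_Y from the
# coin-toss assignment table (fair toss of two coins).
outcomes = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}
X = {"HH": 1, "HT": 2, "TH": 3, "TT": 4}
Y = {"HH": -100, "HT": -100, "TH": -100, "TT": 500}

def cdf(assign, x0):
    """F(x0) = Pr{value <= x0}: sum probabilities of qualifying outcomes."""
    return sum(p for ev, p in outcomes.items() if assign[ev] <= x0)

print(cdf(X, 2))  # F_X(2) = Pr{HH, HT} = 0.5
print(cdf(Y, 2))  # F_Y(2) = Pr{HH, HT, TH} = 0.75
```

The same numerical assignments give different staircase distributions, which is exactly the point of the example.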
Distribution Plots
[Figure: staircase distribution functions Fx(x) and Fy(y) for the coin-toss example]
Note the properties hold: F(+∞) = 1 and F(−∞) = 0; F(x) is continuous from the right, F(x+) = F(x); and Pr{x1 < x ≤ x2} = F(x2) − F(x1).
Definition
The probability density function is defined as
f(x) = dF(x)/dx, or F(x) = ∫_{−∞}^{x} f(α) dα
Thus F(∞) = 1 ⇒ ∫_{−∞}^{∞} f(x) dx = 1
Types of distributions:
Continuous: Pr{x = x0} = 0 ∀ x0
Discrete: F(xi) − F(xi−) = Pr{x = xi} = Pi, in which case f(x) = Σi Pi δ(x − xi)
Mixed: discontinuous but not discrete
Distribution examples
Uniform: x ∼ U(a, b) a < b
f(x) = 1/(b − a) for x ∈ [a, b], and 0 otherwise
[Figure: uniform pdf and cdf]
Gaussian: x ∼ N(µ, σ)
f(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}
[Figure: Gaussian pdf and cdf, centered at µ]
Gaussian Distribution Example
Example
Consider the Normal (Gaussian) distribution PDF and CDF for µ = 0, σ² = 0.2, 1.0, 5.0 and µ = −2, σ² = 0.5.
[Figure: Gaussian pdfs and cdfs for µ = 0 with σ² = 0.2, 1.0, 5.0, and for µ = −2 with σ² = 0.5]
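A minimal sketch of evaluating the Gaussian pdf and cdf (illustrative, not from the slides; the cdf uses the closed form via the error function):

```python
import math

# N(mu, sigma^2) pdf and cdf; default parameters match the mu = 0,
# sigma^2 = 1.0 curve in the example above.
def gauss_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = (1/2)(1 + erf((x - mu)/(sigma*sqrt(2))))
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
```

At the mean, the pdf equals 1/(√(2π)σ) and the cdf equals 1/2, as symmetry requires.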
Binomial: x ∼ B(p, q) p + q = 1
Example
Toss a coin n times. What is the probability of getting k heads?
For p + q = 1, where q is the probability of a tail and p is the probability of a head:
Pr{x = k} = C(n, k) p^k q^{n−k}
[NOTE: C(n, k) = n! / ((n − k)! k!)]
⇒ f(x) = Σ_{k=0}^{n} C(n, k) p^k q^{n−k} δ(x − k)
⇒ F(x) = Σ_{k=0}^{m} C(n, k) p^k q^{n−k} for m ≤ x < m + 1
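The binomial pmf and staircase cdf above can be sketched directly (illustrative; math.comb supplies C(n, k)):

```python
from math import comb

# Binomial pmf Pr{x = k} = C(n, k) p^k q^(n-k), q = 1 - p
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# F(x) sums the pmf for k = 0, ..., floor(x)
def binom_cdf(x, n, p):
    m = int(x)
    return sum(binom_pmf(k, n, p) for k in range(m + 1))
```

For example, binom_pmf(0, 4, 0.5) returns 1/16, matching the fair-coin calculation later in this section.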
Binomial Distribution Example I
Example
Toss a coin n times. What is the probability of getting k heads? For n = 9, p = q = 1/2 (fair coin).
[Figure: binomial pmf Pr{x = k} for n = 9, p = 1/2]
Binomial Distribution Example II
Example
Toss a coin n times. What is the probability of getting k heads? For n = 20, p = 0.5, 0.7 and n = 40, p = 0.5.
[Figure: binomial pmfs for n = 20, p = 0.5; n = 20, p = 0.7; and n = 40, p = 0.5]
Probability Conditional Distributions
Conditional Distributions
Definition
The conditional distribution of x given event “M” has occurred is
Fx(x0|M) = Pr{x ≤ x0 | M} = Pr{x ≤ x0, M} / Pr{M}
Example
Suppose M = {x ≤ a}. Then
Fx(x0|M) = Pr{x ≤ x0, M} / Pr{x ≤ a}
If x0 ≥ a, what happens?
Special Cases
Special Case: x0 ≥ a
Pr{x ≤ x0, x ≤ a} = Pr{x ≤ a}
⇒ Fx(x0|M) = Pr{x ≤ x0, M} / Pr{x ≤ a} = Pr{x ≤ a} / Pr{x ≤ a} = 1
Special Case: x0 ≤ a
⇒ Fx(x0|M) = Pr{x ≤ x0, M} / Pr{x ≤ a} = Pr{x ≤ x0} / Pr{x ≤ a} = Fx(x0) / Fx(a)
Conditional Distribution Example
Example
Suppose Fx(x) is as shown. [Figure: example distribution Fx(x)]
What does Fx(x|M) look like? Note M = {x ≤ a}.
⇒ Fx(x0|M) = Fx(x0)/Fx(a) for x0 ≤ a, and 1 for a ≤ x0
[Figure: conditional distribution Fx(x | x ≤ a)]
Distribution properties hold for conditional cases:
Limiting cases: F(∞|M) = 1 and F(−∞|M) = 0
Probability range: Pr{x0 ≤ x ≤ x1 | M} = F(x1|M) − F(x0|M)
Density–distribution relations:
f(x|M) = ∂F(x|M)/∂x
F(x0|M) = ∫_{−∞}^{x0} f(x|M) dx
Example (Fair Coin Toss)
Toss a fair coin 4 times. Let x be the number of heads. Determine Pr{x = k}.
Recall
Pr{x = k} = C(n, k) p^k q^{n−k}
In this case
Pr{x = k} = C(4, k) (1/2)^4
Pr{x = 0} = Pr{x = 4} = 1/16
Pr{x = 1} = Pr{x = 3} = 1/4
Pr{x = 2} = 3/8
Density and Distribution Plots for Fair Coin (n = 4) Ex.
[Figure: pmf Pr{x = k} and staircase distribution F(x) for the fair coin, n = 4]
What type of distribution is this? Discrete. Thus,
F(xi) − F(xi−) = Pr{x = xi} = Pi
F(x) = ∫_{−∞}^{x} f(α) dα = ∫_{−∞}^{x} Σi Pi δ(α − xi) dα
Conditional Case
Example (Conditional Fair Coin Toss)
Toss a fair coin 4 times. Let x be the number of heads. Suppose M = [at least one flip produces a head]. Determine Pr{x = k | M}.
Recall,
Pr{x = k | M} = Pr{x = k, M} / Pr{M}
Thus first determine Pr{M}:
Pr{M} = 1 − Pr{No heads} = 1 − 1/16 = 15/16
Next determine Pr{x = k | M} for the individual cases, k = 0, 1, 2, 3, 4:
Pr{x = 0 | M} = Pr{x = 0, M} / Pr{M} = 0
Pr{x = 1 | M} = Pr{x = 1, M} / Pr{M} = Pr{x = 1} / Pr{M} = (1/4)/(15/16) = 4/15
Pr{x = 2 | M} = Pr{x = 2} / Pr{M} = (3/8)/(15/16) = 6/15
Pr{x = 3 | M} = 4/15
Pr{x = 4 | M} = 1/15
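The conditional pmf above can be verified numerically (illustrative sketch):

```python
from math import comb

# Condition the binomial pmf (n = 4, p = 1/2) on M = "at least one head".
n = 4
pmf = [comb(n, k) * 0.5**n for k in range(n + 1)]      # Pr{x = k}
pM = 1 - pmf[0]                                        # Pr{M} = 15/16
# Pr{x = k | M}: zero for k = 0, otherwise Pr{x = k}/Pr{M}
cond = [0.0] + [pmf[k] / pM for k in range(1, n + 1)]
```

Note the conditional probabilities still sum to one, so conditioning simply renormalizes the mass over the surviving outcomes.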
Conditional and Unconditional Density Functions
[Figure: unconditional pmf Pr{x = k} and conditional pmf Pr{x = k | M}]
Are they proper density functions?
Probability Total Probability and Bayes’ Theorem
Total Probability and Bayes’ Theorem
Let M1, M2, . . . , Mn form a partition of S, i.e.,
∪i Mi = S and Mi ∩ Mj = ∅ for i ≠ j
Then
F(x) = Σi Fx(x|Mi) Pr(Mi)
f(x) = Σi fx(x|Mi) Pr(Mi)
Aside
Pr{A|B} = Pr{A, B} / Pr{B} = (Pr{B, A}/Pr{A}) · (Pr{A}/Pr{B}) = Pr{B|A} Pr{A} / Pr{B}
From this we get
Pr{M | x ≤ x0} = Pr{x ≤ x0 | M} Pr{M} / Pr{x ≤ x0} = F(x0|M) Pr{M} / F(x0)
and
Pr{M | x = x0} = f(x0|M) Pr{M} / f(x0)
By integration,
∫_{−∞}^{∞} Pr{M | x = x0} f(x0) dx0 = ∫_{−∞}^{∞} f(x0|M) Pr{M} dx0 = Pr{M} ∫_{−∞}^{∞} f(x0|M) dx0 = Pr{M}
⇒ Pr{M} = ∫_{−∞}^{∞} Pr{M | x = x0} f(x0) dx0
Putting it all Together: Bayes’ Theorem
Bayes’ Theorem:
f(x0|M) = Pr{M | x = x0} f(x0) / Pr{M} = Pr{M | x = x0} f(x0) / ∫_{−∞}^{∞} Pr{M | x = x0} f(x0) dx0
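A small numeric illustration of Bayes' theorem (assumed setup, not from the slides): take x ~ U(0, 1) and M = {x ≤ a}, so Pr{M | x = x0} is an indicator, Pr{M} comes out of the denominator integral, and f(x0|M) should equal f(x0)/F(a) on [0, a]:

```python
# x ~ U(0, 1), M = {x <= a}; verify f(x0|M) = Pr{M|x=x0} f(x0) / Pr{M}
a = 0.25
f = lambda x: 1.0 if 0 <= x <= 1 else 0.0           # pdf of U(0, 1)
prM_given_x = lambda x: 1.0 if x <= a else 0.0      # Pr{M | x = x0}

# midpoint rule for Pr{M} = integral of Pr{M|x=x0} f(x0) dx0
n = 10000
h = 1.0 / n
prM = sum(prM_given_x((i + 0.5) * h) * f((i + 0.5) * h) for i in range(n)) * h

f_cond = lambda x0: prM_given_x(x0) * f(x0) / prM   # posterior density
```

Here f_cond(x0) equals 1/a = 4 on [0, 0.25] and 0 elsewhere, matching the conditional-distribution result derived earlier in this section.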
Probability Functions of a R.V.
Functions of a R.V.
Problem Statement
Let x and g(x) be RVs such that
y = g(x)
Question: How do we determine the distribution of y?
Note
Fy(y0) = Pr{y ≤ y0} = Pr{g(x) ≤ y0} = Pr{x ∈ Ry0}
where Ry0 = {x : g(x) ≤ y0}
Question: If y = g(x) = x², what is Ry0?
Example
Let y = g(x) = x². Determine Fy(y0).
[Figure: the parabola y = x² and the region −√y0 ≤ x ≤ √y0]
Note that
Fy(y0) = Pr(y ≤ y0) = Pr(−√y0 ≤ x ≤ √y0) = Fx(√y0) − Fx(−√y0)
Example
Let x ∼ N(µ, σ) and
y = U(x) = 1 if x > µ; 0 if x ≤ µ
Determine fy (y0) and Fy (y0).
General Function of a Random Variable Case
To determine the density of y = g(x) in terms of fx(x0), look at g(x):
fy(y0) dy0 = Pr(y0 ≤ y ≤ y0 + dy0)
= Pr(x1 ≤ x ≤ x1 + dx1) + Pr(x2 + dx2 ≤ x ≤ x2) + Pr(x3 ≤ x ≤ x3 + dx3)
[Figure: g(x) crossing the level y0 at roots x1, x2, x3]
fy(y0) dy0 = Pr(x1 ≤ x ≤ x1 + dx1) + Pr(x2 + dx2 ≤ x ≤ x2) + Pr(x3 ≤ x ≤ x3 + dx3)
= fx(x1) dx1 + fx(x2)|dx2| + fx(x3) dx3   (∗)
Note that
dx1 = (dx1/dy0) dy0 = dy0 / (dy0/dx1) = dy0 / g′(x1)
Similarly
dx2 = dy0 / g′(x2) and dx3 = dy0 / g′(x3)
Thus (∗) becomes
fy(y0) dy0 = [fx(x1)/g′(x1)] dy0 + [fx(x2)/|g′(x2)|] dy0 + [fx(x3)/g′(x3)] dy0
or
fy(y0) = fx(x1)/g′(x1) + fx(x2)/|g′(x2)| + fx(x3)/g′(x3)
Function of a R.V. Distribution General Result
Set y = g(x) and let x1, x2, . . . be the roots, i.e.,
y = g(x1) = g(x2) = · · ·
Then
fy(y) = fx(x1)/|g′(x1)| + fx(x2)/|g′(x2)| + · · ·
Example
Suppose x ∼ U(−1, 2) and y = x². Determine fy(y).
[Figure: fx(x) = 1/3 on [−1, 2] and the mapping y = x²]
Note that g(x) = x² ⇒ g′(x) = 2x.
Consider the special cases separately.
Case 1: 0 ≤ y ≤ 1
y = x² ⇒ x = ±√y
fy(y) = fx(x1)/|g′(x1)| + fx(x2)/|g′(x2)| = (1/3)/|2√y| + (1/3)/|−2√y| = (1/3)/√y
Case 2: 1 ≤ y ≤ 4
y = x² ⇒ x = √y
fy(y) = fx(x1)/|g′(x1)| = (1/3)/(2√y) = (1/6)/√y
Result: For x ∼ U(−1, 2) and y = x²,
fy(y) = (1/3)/√y for 0 ≤ y ≤ 1, and (1/6)/√y for 1 < y ≤ 4
[Figure: fx(x) and the resulting fy(y)]
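The result can be checked through the distribution function (illustrative sketch): for x ~ U(−1, 2), Fx(t) = (t + 1)/3 on [−1, 2], and Fy(y0) = Fx(√y0) − Fx(−√y0) from the earlier slide.

```python
import math

# x ~ U(-1, 2): cdf clamped to [0, 1]
def Fx(t):
    return min(max((t + 1) / 3.0, 0.0), 1.0)

# y = x^2: F_y(y0) = F_x(sqrt(y0)) - F_x(-sqrt(y0))
def Fy(y0):
    if y0 <= 0:
        return 0.0
    s = math.sqrt(y0)
    return Fx(s) - Fx(-s)

# numerical derivative of F_y recovers the density; at y = 0.25 it
# should be close to 1/(3*sqrt(0.25)) = 2/3
h = 1e-6
deriv = (Fy(0.25 + h) - Fy(0.25 - h)) / (2 * h)
```

Consistently with the piecewise density, Fy(1) = 2/3 (the mass of x on [−1, 1]) and Fy(4) = 1.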
Example
Let x ∼ N(µ, σ) and y = e^x. Determine fy(y).
Note g(x) ≥ 0 and g′(x) = e^x.
Also, there is a single root (inverse solution): x = ln(y)
Therefore,
fy(y) = fx(x)/|g′(x)| = fx(x)/e^x
Expressing this in terms of y through substitution yields
fy(y) = fx(ln(y))/e^{ln(y)} = fx(ln(y))/y
Note that x is Gaussian:
fx(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}
⇒ fy(y) = (1/(√(2π) y σ)) e^{−(ln(y)−µ)²/(2σ²)}, for y > 0
[Figure: log-normal density]
Distribution of Fx(x)
For any RV with continuous distribution Fx(x), the RV y = Fx(x) is uniform on [0, 1].
Proof: Note 0 < y < 1. Since
g(x) = Fx(x)
g′(x) = fx(x)
Thus
fy(y) = fx(x)/g′(x) = fx(x)/fx(x) = 1
[Figure: y = Fx(x) maps x into a uniform RV on [0, 1]]
Thus the function g(x) = Fx(x) maps x into a RV uniform on [0, 1].
The converse also holds: if y is uniform on [0, 1], then x = Fx⁻¹(y) has distribution Fx(x).
Combining the two operations yields synthesis: uniform samples are passed through the inverse distribution of a desired target to produce samples with that distribution.
[Figure: block diagram of the synthesis mapping]
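The synthesis idea can be sketched as inverse-transform sampling (illustrative; the exponential target with rate lam is an assumed choice, with F⁻¹(u) = −ln(1 − u)/λ):

```python
import math

lam = 2.0
n = 10000
# deterministic grid standing in for uniform samples on (0, 1)
us = [(i + 0.5) / n for i in range(n)]
# pass each uniform value through the inverse exponential cdf
xs = [-math.log(1 - u) / lam for u in us]

# half of the synthesized samples should fall below the exponential
# median ln(2)/lam, and their mean should approach 1/lam
median = math.log(2) / lam
frac = sum(1 for x in xs if x <= median) / n
mean = sum(xs) / n
```

Any continuous target distribution with a computable inverse cdf can be synthesized this way from uniform samples.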
Probability Mean, Median and Variance
Mean, Median and Variance
Definitions
Mean: E{x} = ∫_{−∞}^{∞} x f(x) dx
Conditional Mean: E{x|M} = ∫_{−∞}^{∞} x f(x|M) dx
Example
Suppose M = {x ≥ a}. Then
E{x|M} = ∫_{−∞}^{∞} x f(x|M) dx = (∫_{a}^{∞} x f(x) dx) / (∫_{a}^{∞} f(x) dx)
For a function of a RV, y = g(x),
E{y} = ∫_{−∞}^{∞} y fy(y) dy = ∫_{−∞}^{∞} g(x) fx(x) dx
Example
Suppose g(x) is a step function, g(x) = 1 for x ≤ x0 and 0 otherwise. Determine E{g(x)}.
E{g(x)} = ∫_{−∞}^{∞} g(x) fx(x) dx = ∫_{−∞}^{x0} fx(x) dx = Fx(x0)
Median
Definitions
Median = m:
∫_{−∞}^{m} f(x) dx = ∫_{m}^{∞} f(x) dx = 1/2
i.e., the median m satisfies Pr{x ≤ m} = Pr{x ≥ m}
Example
Let x ∼ λ e^{−λx} U(x). Then m = ln(2)/λ
Definition (Variance)
Variance: σ² = ∫_{−∞}^{∞} (x − η)² f(x) dx
where η = E{x}. Thus,
σ² = E{(x − η)²} = E{x²} − E²{x}
Example
For x ∼ N(η, σ²), determine the variance.
f(x) = (1/(√(2π) σ)) e^{−(x−η)²/(2σ²)}
Note: f(x) is symmetric about x = η ⇒ E{x} = η
Also, ∫_{−∞}^{∞} f(x) dx = 1 ⇒ ∫_{−∞}^{∞} e^{−(x−η)²/(2σ²)} dx = √(2π) σ
∫_{−∞}^{∞} e^{−(x−η)²/(2σ²)} dx = √(2π) σ
Differentiating w.r.t. σ:
⇒ ∫_{−∞}^{∞} ((x − η)²/σ³) e^{−(x−η)²/(2σ²)} dx = √(2π)
Rearranging yields
∫_{−∞}^{∞} (x − η)² (1/(√(2π) σ)) e^{−(x−η)²/(2σ²)} dx = σ²
or
E{(x − η)²} = σ²
Probability Moments
Definition (Moments)
Moments:
mn = E{x^n} = ∫_{−∞}^{∞} x^n f(x) dx
Central Moments:
µn = E{(x − η)^n} = ∫_{−∞}^{∞} (x − η)^n f(x) dx
From the binomial theorem,
µn = E{(x − η)^n} = E{ Σ_{k=0}^{n} C(n, k) x^k (−η)^{n−k} } = Σ_{k=0}^{n} C(n, k) mk (−η)^{n−k}
⇒ µ0 = 1, µ1 = 0, µ2 = σ², µ3 = m3 − 3η m2 + 2η³
Example
Let x ∼ N(0, σ²). Prove
E{x^n} = 0 for n = 2k + 1, and E{x^n} = 1 · 3 · · · (n − 1) σ^n for n = 2k
For n odd,
E{x^n} = ∫_{−∞}^{∞} x^n f(x) dx = 0
since x^n is an odd function and f(x) is an even function.
To prove the second part, use the fact that
∫_{−∞}^{∞} e^{−αx²} dx = √(π/α)
Differentiate
∫_{−∞}^{∞} e^{−αx²} dx = √(π/α)
with respect to α, k times:
⇒ ∫_{−∞}^{∞} x^{2k} e^{−αx²} dx = (1 · 3 · · · (2k − 1) / 2^k) √(π/α^{2k+1})
Let α = 1/(2σ²). Then
∫_{−∞}^{∞} x^{2k} e^{−x²/(2σ²)} dx = 1 · 3 · · · (2k − 1) σ^{2k+1} √(2π)
Setting n = 2k and rearranging,
∫_{−∞}^{∞} x^n (1/(√(2π) σ)) e^{−x²/(2σ²)} dx = 1 · 3 · · · (n − 1) σ^n   [QED]
Note: Variance is a measure of a RV’s concentration around its mean.
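A numeric check (illustrative) of the Gaussian moment formula: the odd moments vanish and E{x⁴} = 3σ⁴.

```python
import math

sigma = 1.5

def gauss_moment(n, sigma, lim=10.0, steps=100000):
    """Midpoint integration of x^n times the N(0, sigma^2) pdf
    over [-lim*sigma, lim*sigma] (tails beyond are negligible)."""
    a, b = -lim * sigma, lim * sigma
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x**n * math.exp(-x * x / (2 * sigma**2))
    return total * h / (math.sqrt(2 * math.pi) * sigma)
```

For n = 2 this recovers σ², for n = 3 it returns (numerically) zero, and for n = 4 it returns 1·3·σ⁴.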
Probability Tchebycheff Inequality
Tchebycheff Inequality
For any ε > 0,
Pr(|x − η| ≥ ε) ≤ σ²/ε²
To prove this, note
Pr(|x − η| ≥ ε) = ∫_{−∞}^{η−ε} f(x) dx + ∫_{η+ε}^{∞} f(x) dx = ∫_{|x−η|≥ε} f(x) dx
Also note that
σ² = ∫_{−∞}^{∞} (x − η)² f(x) dx ≥ ∫_{|x−η|≥ε} (x − η)² f(x) dx
[Figure: density f(x) with the tails beyond η − ε and η + ε shaded]
σ² ≥ ∫_{|x−η|≥ε} (x − η)² f(x) dx
Using the fact that |x − η| ≥ ε in the above gives
σ² ≥ ε² ∫_{|x−η|≥ε} f(x) dx = ε² Pr{|x − η| ≥ ε}
Rearranging gives the desired result:
⇒ Pr{|x − η| ≥ ε} ≤ (σ/ε)²   QED
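The bound can be checked against a case where the tail probability is known exactly (illustrative, assumed example): for x ~ U(0, 1), η = 1/2, σ² = 1/12, and Pr{|x − η| ≥ ε} = max(0, 1 − 2ε).

```python
# Exact uniform tail probability versus the Tchebycheff bound sigma^2/eps^2
var = 1.0 / 12.0

def tail(eps):
    """Exact Pr{|x - 1/2| >= eps} for x ~ U(0, 1)."""
    return max(0.0, 1.0 - 2.0 * eps)

def bound(eps):
    """Tchebycheff bound."""
    return var / eps**2

# the exact tail never exceeds the bound
checks = [(eps, tail(eps) <= bound(eps)) for eps in (0.1, 0.2, 0.3, 0.5)]
```

The bound is loose here (e.g., at ε = 0.1 it exceeds 1), which is typical: Tchebycheff holds for every distribution, so it cannot be tight for any particular one.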
Probability Jensen’s Inequality
Jensen’s Inequality
For a real convex function Ψ and a random variable x,
Ψ(E[x]) ≤ E[Ψ(x)]
with equality when Ψ is not strictly convex, e.g., Ψ(x) = x. The inequality is reversed for concave Ψ.
Suppose Ω is a measurable subset of the real line and f(x) is a non-negative function such that
∫_{−∞}^{∞} f(x) dx = 1
where f is a probability density function. Then Jensen’s inequality becomes the following statement about integrals: if g is any real-valued measurable function and Ψ is convex over the range of g, then
Ψ( ∫_{−∞}^{∞} g(x) f(x) dx ) ≤ ∫_{−∞}^{∞} Ψ(g(x)) f(x) dx
If g(x) = x, then this form of the inequality reduces to the commonly used special case:
Ψ( ∫_{−∞}^{∞} x f(x) dx ) ≤ ∫_{−∞}^{∞} Ψ(x) f(x) dx
Probability Characteristic & Moment Generating Functions
Definition (Characteristic Function)
The characteristic function of a random variable x with pdf fx(x) is defined by
φx(ω) = E(e^{jωx}) = ∫_{−∞}^{∞} e^{jωx} fx(x) dx
If fx(x) is symmetric about 0 (fx(x) = fx(−x)), then φx(ω) is real.
The magnitude of the characteristic function is bounded by
|φx(ω)| ≤ φx(0) = 1
Theorem (Characteristic Function for the sum of independent RVs)
Let x1, x2, . . . , xN be independent (but not necessarily identically distributed) RVs and set sN = Σ_{i=1}^{N} ai xi, where the ai are constants. Then
φ_sN(ω) = Π_{i=1}^{N} φ_xi(ai ω)
The theorem can be proved by a simple extension of the following: let x and y be independent. Then
φ_{x+y}(ω) = E(e^{jω(x+y)}) = E(e^{jωx} e^{jωy}) = E(e^{jωx}) E(e^{jωy}) = φx(ω) φy(ω)
Example
Determine the characteristic function of the sample mean operating on iid samples.
Note x̄ = (1/N) Σ_{i=1}^{N} xi ⇒ ai = 1/N
⇒ φ_x̄(ω) = Π_{i=1}^{N} φ_xi(ai ω) = (φ_xi(ω/N))^N
The moment generating function is realized by making the substitution jω → s in the above.
Definition (Moment Generating Function)
The moment generating function of a random variable x with pdf fx(x) is defined by
Φx(s) = E(e^{sx}) = ∫_{−∞}^{∞} e^{sx} fx(x) dx
Note Φx(jω) = φx(ω)
Theorem (Moment Generation)
Provided that Φx(s) exists in an open interval around s = 0, the following holds:
mn = E(x^n) = Φx^{(n)}(0) = (d^n Φx/ds^n)(0)
Simply noting that Φx^{(n)}(s) = E(x^n e^{sx}) proves the result.
Example
Let x be exponentially distributed,
f(x) = λ e^{−λx} U(x)
Determine η = m1, m2, and σ².
Note
Φx(s) = λ ∫_{0}^{∞} e^{sx} e^{−λx} dx = λ ∫_{0}^{∞} e^{−x(λ−s)} dx = λ/(λ − s)
Thus
Φx^{(1)}(0) = 1/λ and Φx^{(2)}(0) = 2/λ²
and
E{x} = 1/λ, E{x²} = 2/λ² ⇒ σ² = 1/λ²
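The moments obtained from Φx(s) = λ/(λ − s) can be confirmed by integrating the density directly (illustrative numeric sketch; λ = 0.5 is an assumed value):

```python
import math

lam = 0.5

def exp_moment(n, lam, lim=200.0, steps=200000):
    """Midpoint integration of x^n * lam * exp(-lam x) over [0, lim]."""
    h = lim / steps
    return sum(((i + 0.5) * h) ** n * lam * math.exp(-lam * (i + 0.5) * h)
               for i in range(steps)) * h

m1 = exp_moment(1, lam)   # should approach 1/lam
m2 = exp_moment(2, lam)   # should approach 2/lam^2
var = m2 - m1**2          # should approach 1/lam^2
```

Differentiating the MGF and numerically integrating the density agree, as the moment-generation theorem guarantees.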
Probability Bivariate Statistics
Bivariate Statistics
Given two RVs, x and y, the bivariate (joint) distribution is given by
F(x0, y0) = Pr{x ≤ x0, y ≤ y0}
Properties:
F(−∞, y) = F(x, −∞) = 0
F(∞, ∞) = 1
Fx(x) = F(x, ∞), Fy(y) = F(∞, y)
Special Cases
Case 1: M = {x1 ≤ x ≤ x2, y ≤ y0}
[Figure: vertical strip x1 ≤ x ≤ x2 below y0]
⇒ Pr{M} = F(x2, y0) − F(x1, y0)
Case 2: M = {x ≤ x0, y1 ≤ y ≤ y2}
[Figure: horizontal strip y1 ≤ y ≤ y2 left of x0]
⇒ Pr{M} = F(x0, y2) − F(x0, y1)
Case 3: M = {x1 ≤ x ≤ x2, y1 ≤ y ≤ y2}. Then
[Figure: rectangular region x1 ≤ x ≤ x2, y1 ≤ y ≤ y2]
Pr{M} = F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1)
where the last term is added back because that region was subtracted twice.
Probability Joint Statistics
Definition (Joint Statistics)
f(x, y) = ∂²F(x, y)/(∂x ∂y)
and
F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(α, β) dα dβ
In general, for some region M, the joint statistics are
Pr{(x, y) ∈ M} = ∫∫_M f(x, y) dx dy
Marginal Statistics: Fx(x) = F(x, ∞) and Fy(y) = F(∞, y)
⇒ fx(x) = ∫_{−∞}^{∞} f(x, y) dy
⇒ fy(y) = ∫_{−∞}^{∞} f(x, y) dx
Probability Independence
Independence
Definition (Independence)
Two RVs x and y are statistically independent if for arbitrary events (regions) x ∈ A and y ∈ B,
Pr{x ∈ A, y ∈ B} = Pr{x ∈ A} Pr{y ∈ B}
Letting A = {x ≤ x0} and B = {y ≤ y0}, we see x and y are independent iff
Fx,y(x, y) = Fx(x) Fy(y)
and by differentiation
fx,y(x, y) = fx(x) fy(y)
If x and y are independent RVs, then
z = q(x) and w = h(y)
are also independent.
Function of two RVs
Given two RVs, let z = g(x, y). Define Dz to be the xy-plane region where
{z ≤ z0} = {g(x, y) ≤ z0} = {(x, y) ∈ Dz}
Then
Fz(z0) = Pr{z ≤ z0} = Pr{(x, y) ∈ Dz} = ∫∫_{Dz} f(x, y) dx dy
Example
Let z = x + y. Then z ≤ z0 gives the region x + y ≤ z0, which is delineated by the line x + y = z0.
[Figure: half-plane below the line x + y = z0]
Thus
Fz(z0) = ∫∫_{Dz} f(x, y) dx dy = ∫_{−∞}^{∞} ∫_{−∞}^{z0−y} f(x, y) dx dy
We can obtain fz(z) by differentiation:
∂Fz(z)/∂z = ∫_{−∞}^{∞} (∂/∂z) ∫_{−∞}^{z−y} f(x, y) dx dy
fz(z) = ∫_{−∞}^{∞} f(z − y, y) dy   (∗)
Note that if x and y are independent,
f(x, y) = fx(x) fy(y)   (∗∗)
Thus utilizing (∗∗) in (∗),
fz(z) = ∫_{−∞}^{∞} fx(z − y) fy(y) dy = fx(z) ∗ fy(z)   (a convolution)
Example
Let z = x + y where x and y are independent with
fx(x) = α e^{−αx} U(x)
fy(y) = α e^{−αy} U(y)
Then
fz(z) = ∫_{−∞}^{∞} fx(z − y) fy(y) dy = α² ∫_{0}^{z} e^{−α(z−y)} e^{−αy} dy = α² e^{−αz} ∫_{0}^{z} dy = α² z e^{−αz} U(z)
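The convolution result can be checked by discretizing the integral (illustrative sketch; α = 1 is an assumed value):

```python
import math

alpha = 1.0
h = 0.001

def fx(t):
    """Exponential density alpha * exp(-alpha t) U(t)."""
    return alpha * math.exp(-alpha * t) if t >= 0 else 0.0

def fz_numeric(z):
    """Midpoint discretization of the convolution integral over [0, z]."""
    steps = int(z / h)
    return sum(fx(z - (k + 0.5) * h) * fx((k + 0.5) * h) for k in range(steps)) * h
```

At each z the numeric convolution should match the closed form α² z e^{−αz}, the Erlang(2) density.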
Example
Let z = max(x, y). Determine Fz(z0) and fz(z0).
Note
Fz(z0) = Pr{z ≤ z0} = Pr{max(x, y) ≤ z0} = Pr{x ≤ z0, y ≤ z0} = Fxy(z0, z0)
[Figure: quadrant region x ≤ z0, y ≤ z0]
If x and y are independent,
Fz(z0) = Fx(z0) Fy(z0)
and
fz(z0) = ∂Fz(z0)/∂z0 = (∂Fx(z0)/∂z0) Fy(z0) + (∂Fy(z0)/∂z0) Fx(z0) = fx(z0) Fy(z0) + fy(z0) Fx(z0)
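As a concrete instance (assumed example, not from the slides): for x, y ~ U(0, 1) independent, Fz(z) = z² and fz(z) = 2z on [0, 1].

```python
# max of two independent U(0, 1) RVs: f_z = f_x F_y + f_y F_x
def Fu(t):
    """cdf of U(0, 1), clamped."""
    return min(max(t, 0.0), 1.0)

def fu(t):
    """pdf of U(0, 1)."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def fz(z0):
    return fu(z0) * Fu(z0) + fu(z0) * Fu(z0)   # = 2 z0 on [0, 1]
```

The density piles up near 1, as expected: the maximum of two uniforms is more likely to be large.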
Probability Joint Moments
Joint Moments
For RVs x and y and function z = g(x, y),
E{z} = ∫_{−∞}^{∞} z fz(z) dz
E{g(x, y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy
Definition (Covariance)
For RVs x and y,
Cxy = Cov(x, y) = E[(x − ηx)(y − ηy)] = E[xy] − ηx E[y] − ηy E[x] + ηx ηy = E[xy] − ηx ηy
Definition (Correlation Coefficient)
The correlation coefficient is given by
r = Cxy / (σx σy)
Note that
0 ≤ E{[a(x − ηx) + (y − ηy)]²} = E{(x − ηx)²} a² + 2 E{(x − ηx)(y − ηy)} a + E{(y − ηy)²} = σx² a² + 2 Cxy a + σy²
This is a non-negative quadratic function of a ⇒ its roots are imaginary and the discriminant is non-positive:
4Cxy² − 4σx²σy² ≤ 0
⇒ Cxy² ≤ σx²σy²
Thus,
|Cxy| ≤ σx σy and |r| = |Cxy|/(σx σy) ≤ 1
Definition (Uncorrelated)
Two RVs are uncorrelated if their covariance is zero:
Cxy = 0 ⇒ r = Cxy/(σx σy) = (E{xy} − E{x}E{y})/(σx σy) = 0 ⇒ E{xy} = E{x}E{y}
Thus
Cxy = 0 ⇔ E{xy} = E{x}E{y}
Result
If x and y are independent, then
Exy = ExEy
and x and y are uncorrelated
Note: Converse is not true (in general)
Converse only holds for Gaussian RVs
Independence is a stronger condition than uncorrelated
Definition (Orthogonality)
Two RVs are orthogonal if
Exy = 0
Note: If x and y are correlated, they are not orthogonal
Example
Consider the correlation between two RVs, x and y, with samples shown in a scatter plot.
[Figure: scatter plots illustrating different correlation values]
Probability Sequences and Vectors of Random Variables
Sequences and Vectors of Random Variables
Definition (Vector Distribution)
Let x be a sequence of RVs. Take N samples to form the random vector
x = [x1, x2, . . . , xN]^T
Then the vector distribution function is
Fx(x0) = Pr{x1 ≤ x01, x2 ≤ x02, . . . , xN ≤ x0N} ≜ Pr{x ≤ x0}
Special Case: For complex data
x = xr + jxi
[x1, x2, . . . , xN]^T = [xr1, xr2, . . . , xrN]^T + j [xi1, xi2, . . . , xiN]^T
The distribution in the complex case is defined as
Fx(x0) = Pr{xr ≤ x0r, xi ≤ x0i} ≜ Pr{x ≤ x0}
The density function is given by
fx(x) = ∂^N Fx(x) / (∂x1 ∂x2 . . . ∂xN)
Fx(x0) = ∫_{−∞}^{x01} ∫_{−∞}^{x02} · · · ∫_{−∞}^{x0N} fx(x) dx1 dx2 . . . dxN
Properties:
Fx([∞, ∞, · · · , ∞]^T) = 1
∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fx(x) dx = 1
Fx([x1, x2, · · · , −∞, · · · , xN]^T) = 0
Also
F([∞, x2, x3, · · · , xN]^T) = F([x2, x3, · · · , xN]^T)
∫_{−∞}^{∞} f([x1, x2, x3, · · · , xN]^T) dx1 = f([x2, x3, · · · , xN]^T)
Setting xi = ∞ in the cdf eliminates this sample.
Integrating over (−∞, ∞) along xi in the pdf eliminates this sample.
Joint Distribution
Definitions (Joint Distribution and Density)
Given two random vectors x and y, the joint distribution and density are
Fxy(x0, y0) = Pr{x ≤ x0, y ≤ y0}
fxy(x, y) = ∂^N ∂^M Fxy(x, y) / (∂x1 ∂x2 · · · ∂xN ∂y1 ∂y2 · · · ∂yM)
Definition (Vector Independence)
The vectors are independent iff
Fxy(x, y) = Fx(x) Fy(y)
or equivalently
fxy(x, y) = fx(x) fy(y)
Expectations & Moments
Objective: Obtain partial description of process generating x
Solution: Use moments
The first moment, or mean, is
mx = E{x} = [m1, m2, . . . , mN]^T = ∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} x fx(x) dx
⇒ mk = ∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} xk fx(x) dx1 dx2 · · · dxN = ∫_{−∞}^{∞} xk fxk(xk) dxk
where fxk(xk) is the marginal distribution of xk.
Definition (Correlation Matrix)
A complete set of second moments is given by the correlation matrix
Rx = E{xx^H} = E{xx*^T} =
[ E{|x1|²}     E{x1 x2*}    · · ·   E{x1 xN*}
  E{x2 x1*}    E{|x2|²}     · · ·   E{x2 xN*}
  ...          ...          . . .   ...
  E{xN x1*}    E{xN x2*}    · · ·   E{|xN|²} ]
Result
The correlation matrix is Hermitian symmetric:
(Rx)^H = (E{xx^H})^H = E{(xx^H)^H} = E{xx^H} = Rx
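A small pure-Python sketch (toy data, illustrative) estimating Rx from complex sample vectors and verifying the Hermitian-symmetry and positive-semidefiniteness results:

```python
# Estimate R_x = E{x x^H} as the sample average of outer products.
samples = [
    [1 + 1j, 2 - 1j, 0.5 + 0j],
    [-1 + 0.5j, 1 + 1j, 2 + 0j],
    [0 + 1j, -1 - 1j, 1 + 0.5j],
]
N = len(samples[0])

R = [[sum(x[i] * x[j].conjugate() for x in samples) / len(samples)
      for j in range(N)] for i in range(N)]

# Hermitian symmetry: R[i][j] = conj(R[j][i])
hermitian = all(abs(R[i][j] - R[j][i].conjugate()) < 1e-12
                for i in range(N) for j in range(N))

# quadratic form a^H R a = E{|a^H x|^2}: (numerically) real and >= 0
a = [1 + 0j, -0.5 + 0.5j, 0.25 - 1j]
quad = sum(a[i].conjugate() * R[i][j] * a[j] for i in range(N) for j in range(N))
```

The quadratic form is an average of squared magnitudes, which is why positive semi-definiteness holds for any choice of a.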
Definition (Covariance Matrix)
The set of second central moments is given by the covariance
Cx = E{(x − mx)(x − mx)^H} = E{xx^H} − mx E{x^H} − E{x} mx^H + mx mx^H = Rx − mx mx^H
Result
The covariance matrix is Hermitian symmetric:
Cx = Cx^H
Result
The correlation and covariance matrices are positive semi-definite
a^H Rx a ≥ 0 and a^H Cx a ≥ 0 (∀ a)
To prove this, note
a^H Rx a = a^H E{xx^H} a = E{a^H x x^H a} = E{(a^H x)(a^H x)^H} = E{|a^H x|²} ≥ 0
In most cases, R and C are positive definite:
a^H Rx a > 0 and a^H Cx a > 0
⇒ no linear dependencies in Rx or Cx
Definitions (Cross-Correlation and Cross-Covariance)
For random vectors x and y,
Cross-correlation ≜ Rxy = E{xy^H}
Cross-covariance ≜ Cxy = E{(x − mx)(y − my)^H} = Rxy − mx my^H
Definition (Uncorrelated Vectors)
Two vectors x and y are uncorrelated if
Cxy = Rxy − mx my^H = 0
or equivalently
Rxy = E{xy^H} = mx my^H
Note that as in the scalar case
independence ⇒ uncorrelated
uncorrelated ⇏ independence
Also, x and y are orthogonal if
Rxy = ExyH = 0
Example
Let x and y have the same dimension. If
z = x + y
find Rz and Cz
By definition,
Rz = E{(x + y)(x + y)^H} = E{xx^H} + E{xy^H} + E{yx^H} + E{yy^H} = Rx + Rxy + Ryx + Ry
Similarly,
Cz = Cx + Cxy + Cyx + Cy
Note: If x and y are uncorrelated,
Rz = Rx + mx my^H + my mx^H + Ry
and
Cz = Cx + Cy
Definition (Multivariate Gaussian Density)
For an N-dimensional random vector x with mean mx and covariance Cx, the multivariate Gaussian pdf is
fx(x) = (1 / ((2π)^{N/2} |Cx|^{1/2})) e^{−(1/2)(x − mx)^H Cx^{−1} (x − mx)}
Note the similarity to the univariate case:
fx(x) = (1/(√(2π) σ)) e^{−(1/2)(x − m)²/σ²}
Example
Let N = 2 (bivariate case) and x be real. Then
x = [x1, x2]^T and mx = E{x} = [m1, m2]^T
C_x = E{(x − m_x)(x − m_x)^T} = E{xx^T} − m_x m_x^T

    = E{ [ x_1^2, x_1 x_2 ; x_2 x_1, x_2^2 ] } − [ m_1^2, m_1 m_2 ; m_2 m_1, m_2^2 ]

    = [ E{x_1^2} − m_1^2, E{x_1 x_2} − m_1 m_2 ; E{x_2 x_1} − m_2 m_1, E{x_2^2} − m_2^2 ]

Recall that

σ_x^2 = E{x^2} − E^2{x}

and

r = (E{x_1 x_2} − m_1 m_2) / (σ_{x1} σ_{x2})
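The elementwise identity C_x = E{xx^T} − m_x m_x^T holds exactly for sample averages too, which makes it easy to check numerically. A sketch, not from the notes:

```python
import numpy as np

# Sketch: both routes to the sample covariance agree exactly, and the
# sample correlation coefficient r lies in [-1, 1].
rng = np.random.default_rng(2)
X = rng.standard_normal((2, 400)) + np.array([[1.0], [2.0]])
m = X.mean(axis=1, keepdims=True)

Cx = (X - m) @ (X - m).T / X.shape[1]          # E{(x - m)(x - m)^T}
Cx_alt = X @ X.T / X.shape[1] - m @ m.T        # E{x x^T} - m m^T
r = Cx[0, 1] / np.sqrt(Cx[0, 0] * Cx[1, 1])    # correlation coefficient

print(np.allclose(Cx, Cx_alt), -1 <= r <= 1)
```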
Rearranging:

C_x = [ σ_{x1}^2, r σ_{x1} σ_{x2} ; r σ_{x1} σ_{x2}, σ_{x2}^2 ]

Also,

C_x^{−1} = (1 / (σ_{x1}^2 σ_{x2}^2 − r^2 σ_{x1}^2 σ_{x2}^2)) [ σ_{x2}^2, −r σ_{x1} σ_{x2} ; −r σ_{x1} σ_{x2}, σ_{x1}^2 ]

         = (1 / (σ_{x1}^2 σ_{x2}^2 (1 − r^2))) [ σ_{x2}^2, −r σ_{x1} σ_{x2} ; −r σ_{x1} σ_{x2}, σ_{x1}^2 ]

Substituting into the Gaussian pdf and simplifying:

f_x(x) = (1 / (2π |C_x|^{1/2})) e^{−(1/2)(x − m_x)^T C_x^{−1} (x − m_x)}

       = (1 / (2π σ_{x1} σ_{x2} (1 − r^2)^{1/2})) exp{ −(1 / (2(1 − r^2))) [ (x_1 − m_1)^2/σ_{x1}^2 − 2r(x_1 − m_1)(x_2 − m_2)/(σ_{x1} σ_{x2}) + (x_2 − m_2)^2/σ_{x2}^2 ] }
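That the expanded bivariate expression agrees with the general matrix form can be confirmed by evaluating both at a point; the values of s1, s2, r, and the means below are arbitrary illustrative choices:

```python
import numpy as np

# Sketch: the matrix form and the expanded bivariate form of the Gaussian
# pdf give identical values.
s1, s2, r = 1.5, 0.7, 0.4
m = np.array([1.0, -2.0])
C = np.array([[s1**2, r*s1*s2], [r*s1*s2, s2**2]])
x = np.array([0.3, -1.1])
d = x - m

# Matrix form: (2*pi)^{-1} |C|^{-1/2} exp(-0.5 d^T C^{-1} d)
f_matrix = np.exp(-0.5 * d @ np.linalg.solve(C, d)) / \
           (2 * np.pi * np.sqrt(np.linalg.det(C)))

# Expanded scalar form from the derivation
q = (d[0]**2/s1**2 - 2*r*d[0]*d[1]/(s1*s2) + d[1]**2/s2**2) / (1 - r**2)
f_expanded = np.exp(-0.5 * q) / (2 * np.pi * s1 * s2 * np.sqrt(1 - r**2))

print(np.isclose(f_matrix, f_expanded))   # True
```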
Note: If uncorrelated, r = 0, and

f_x(x) = (1 / (2π σ_{x1} σ_{x2})) exp{ −(1/2) [ (x_1 − m_1)^2/σ_{x1}^2 + (x_2 − m_2)^2/σ_{x2}^2 ] } = f_{x1}(x_1) f_{x2}(x_2)
Gaussian special case result:
uncorrelated ⇒ independent
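The factorization can be verified at any point: with r = 0 the joint pdf equals the product of the two univariate marginals. A sketch with arbitrary assumed parameter values:

```python
import numpy as np

# Sketch: with r = 0 the bivariate Gaussian factors into its marginals.
s1, s2 = 1.2, 0.8
m1, m2 = 0.5, -1.0
x1, x2 = 1.0, -0.4

# Univariate Gaussian pdf
uni = lambda x, m, s: np.exp(-0.5 * (x - m)**2 / s**2) / (np.sqrt(2*np.pi) * s)

# Joint pdf with r = 0
joint = np.exp(-0.5 * ((x1-m1)**2/s1**2 + (x2-m2)**2/s2**2)) / (2*np.pi*s1*s2)

print(np.isclose(joint, uni(x1, m1, s1) * uni(x2, m2, s2)))   # True
```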
Example
Examine the contours defined by
(x − m_x)^T C_x^{−1} (x − m_x) = constant
Why? For all values on the contour
fx(x) = constant
[Contour plots of the bivariate Gaussian pdf, each centered at (m_1, m_2):
r = 0, σ_{x1} = σ_{x2}: circular contours;
r = 0, σ_{x1} > σ_{x2}: axis-aligned ellipses, elongated along x_1;
r > 0, σ_{x1} > σ_{x2}: tilted ellipses;
r > 0, σ_{x1} < σ_{x2}: tilted ellipses, elongated along x_2.]
[Contour plot for r < 0 and σ_{x1} > σ_{x2}, centered at (m_1, m_2), with the marginal densities f_{x1}(x_1) and f_{x2}(x_2) sketched along the axes.]

Integrating over x_2 yields f_{x1}(x_1)
Integrating over x_1 yields f_{x2}(x_2)
Additional Gaussian (surface) examples:
[Surface plots of the bivariate Gaussian pdf for: r = 0, σ_{x1} = σ_{x2}; r = 0, σ_{x1} < σ_{x2}; r > 0, σ_{x1} < σ_{x2}; r < 0, σ_{x1} < σ_{x2}.]
Transformations of a vector

Let the N functions g_1(·), g_2(·), …, g_N(·) map x to z, where

z_1 = g_1(x_1, x_2, …, x_N)
z_2 = g_2(x)
⋮
z_N = g_N(x)

(Forward mapping)

Let g_1(·), g_2(·), …, g_N(·) be independent and yield a one-to-one transformation, such that there exists a set of functions

x_1 = h_1(z), x_2 = h_2(z), …, x_N = h_N(z)

where z = [z_1, z_2, …, z_N]^T.

(Reverse mapping)

Question: How do we determine the distribution f_z(z)?
Let N = 2 and consider the probability of being in the region defined by

[z_1, z_1 + dz_1] and [z_2, z_2 + dz_2]

[Figure: differential region A_z in the (z_1, z_2) plane and the equivalent region A_x in the (x_1, x_2) plane.]

Identify an equivalent area in the x_1, x_2 domain and equate the probabilities:

Pr{(z_1, z_2) ∈ A_z} = Pr{(x_1, x_2) ∈ A_x}

f_{z1 z2}(z_1, z_2) Area(A_z) = f_{x1 x2}(x_1, x_2) Area(A_x)
Area(A_x) / Area(A_z) = abs( J( x_1 x_2 / z_1 z_2 ) ) = 1 / abs( J( z_1 z_2 / x_1 x_2 ) )

The Jacobians are defined as the determinants

J( x_1 x_2 / z_1 z_2 ) = | ∂x_1/∂z_1  ∂x_1/∂z_2 ; ∂x_2/∂z_1  ∂x_2/∂z_2 |

and

J( z_1 z_2 / x_1 x_2 ) = | ∂z_1/∂x_1  ∂z_1/∂x_2 ; ∂z_2/∂x_1  ∂z_2/∂x_2 |

Note that

∂x_1/∂z_1 = ∂h_1(z)/∂z_1 and ∂z_1/∂x_1 = ∂g_1(x)/∂x_1
Thus

f_{z1 z2}(z_1, z_2) Area(A_z) = f_{x1 x2}(x_1, x_2) Area(A_x)

⇒ f_{z1 z2}(z_1, z_2) = f_{x1 x2}(x_1, x_2) / (Area(A_z)/Area(A_x)) = f_{x1 x2}(x_1, x_2) / abs( J( z_1 z_2 / x_1 x_2 ) )

General Case Result (Functions of Vectors)

f_z(z) = f_x(x) / abs( J( z / x ) )

where

J( z / x ) = | ∂z_1/∂x_1 ⋯ ∂z_1/∂x_N ; ⋮ ⋱ ⋮ ; ∂z_N/∂x_1 ⋯ ∂z_N/∂x_N |
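A familiar nonlinear instance of this result (an illustrative sketch, not from the notes) is the Cartesian-to-polar map z_1 = r, z_2 = θ applied to a standard bivariate Gaussian, where the Jacobian determinant is 1/r and hence f_z = r · f_x:

```python
import numpy as np

# Sketch: z1 = r = sqrt(x1^2 + x2^2), z2 = theta = atan2(x2, x1).
x1, x2 = 0.6, -0.8
r = np.hypot(x1, x2)

# Analytic Jacobian J(z/x): rows are gradients of r and theta w.r.t. (x1, x2)
J = np.array([[x1 / r,      x2 / r],
              [-x2 / r**2,  x1 / r**2]])
detJ = np.linalg.det(J)                   # equals 1/r

fx = np.exp(-0.5 * (x1**2 + x2**2)) / (2 * np.pi)   # standard Gaussian
fz = fx / abs(detJ)                       # general result: f_z = f_x / |J(z/x)|

print(np.isclose(detJ, 1 / r), np.isclose(fz, r * fx))
```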
Example (Linear Transformation)

Let z and x be linearly related:

z_1 = a_11 x_1 + a_12 x_2
z_2 = a_21 x_1 + a_22 x_2

or z = Ax and x = A^{−1} z. Then

J( z_1 z_2 / x_1 x_2 ) = | ∂z_1/∂x_1  ∂z_1/∂x_2 ; ∂z_2/∂x_1  ∂z_2/∂x_2 | = | a_11  a_12 ; a_21  a_22 | = |A|
Let A^{−1} = [ b_11  b_12 ; b_21  b_22 ]. Then

[ x_1 ; x_2 ] = [ b_11  b_12 ; b_21  b_22 ] [ z_1 ; z_2 ]

and

f_{z1 z2}(z_1, z_2) = f_{x1 x2}(x_1, x_2) / abs|A| = f_{x1 x2}(b_11 z_1 + b_12 z_2, b_21 z_1 + b_22 z_2) / abs|A|

General Case Result (Linear Transformations)

For the case where z = Ax,

f_z(z) = (1 / abs|A|) f_x(A^{−1} z)
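For a Gaussian x this formula can be cross-checked directly: z = Ax is again Gaussian with mean A m_x and covariance A C_x A^T, so both routes to f_z(z) must agree. A sketch with assumed numerical values:

```python
import numpy as np

# Minimal Gaussian pdf helper (real case)
def gauss(x, m, C):
    d = x - m
    N = len(m)
    return np.exp(-0.5 * d @ np.linalg.solve(C, d)) / \
           ((2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(C)))

A = np.array([[2.0, 1.0], [0.5, 3.0]])
m = np.array([1.0, -1.0])
C = np.array([[1.0, 0.3], [0.3, 2.0]])
z = np.array([0.7, 0.2])

# Route 1: transformation formula f_z(z) = f_x(A^{-1} z) / |det A|
f_formula = gauss(np.linalg.solve(A, z), m, C) / abs(np.linalg.det(A))
# Route 2: z is Gaussian with mean A m and covariance A C A^T
f_direct = gauss(z, A @ m, A @ C @ A.T)

print(np.isclose(f_formula, f_direct))   # True
```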
Vector Statistics for Linear Transformations

For such linear transformations, z = Ax:

E{z} = E{Ax} = A m_x

Similarly,

E{zz^H} = E{Ax(Ax)^H} = E{A xx^H A^H} = A E{xx^H} A^H ⇒ R_z = A R_x A^H

By similar arguments it is easy to show

C_z = A C_x A^H

Note: A linear transformation of the vector results in simple linear transformations of its statistics.
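These identities hold exactly for the corresponding sample estimates as well, since Z = AX implies ZZ^H = A XX^H A^H term by term. A sketch, not from the notes:

```python
import numpy as np

# Sketch verifying Rz = A Rx A^H and Cz = A Cx A^H on sample estimates.
rng = np.random.default_rng(3)
A = np.array([[1.0, 2.0], [0.0, -1.0]])
X = rng.standard_normal((2, 300)) + np.array([[0.5], [1.5]])
Z = A @ X

Rx = X @ X.T / X.shape[1]
Rz = Z @ Z.T / Z.shape[1]
mx, mz = X.mean(axis=1, keepdims=True), Z.mean(axis=1, keepdims=True)
Cx = Rx - mx @ mx.T
Cz = Rz - mz @ mz.T

print(np.allclose(Rz, A @ Rx @ A.T), np.allclose(Cz, A @ Cx @ A.T))
```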