TRANSCRIPT
ELEG-636: Statistical Signal Processing
Gonzalo R. Arce
Department of Electrical and Computer Engineering, University of Delaware
Spring 2010
Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring 2010 1 / 96
Course Objectives & Structure
Objective: Given a discrete-time sequence x(n), develop
Statistical and spectral signal representations
Filtering, prediction, and system identification algorithms
Optimization methods that are statistical and adaptive
Course Structure:
Weekly lectures [notes: www.ece.udel.edu/∼arce]
Periodic homework (theory & Matlab implementations) [15%]
Midterm & Final examinations [85%]
Textbook:
Haykin, Adaptive Filter Theory.
Broad applications in communications, imaging, and sensors. Emerging applications include brain-imaging techniques, brain-machine interfaces, and implantable devices.
Neurofeedback presents real-time physiological signals from MRIs in a visual or auditory form to provide information about brain activity. These signals are used to train the patient to alter neural activity in a desired direction.
Traditionally, feedback using EEGs or other mechanisms has not focused on the brain because the resolution is not good enough.
Probability
Signal Characterization
Assumption: Many methods take x(n) to be deterministic.
Reality: Real-world signals are usually statistical in nature.
Thus
. . . , x(−1), x(0), x(1), . . .
can be interpreted as a sequence of random variables. We begin by analyzing each observation x(n) as a R.V. Then, to capture dependencies, we consider random vectors:
. . . , x(n), x(n + 1), . . . , x(n + N − 1), x(n + N), . . .
where the N samples x(n), x(n + 1), . . . , x(n + N − 1) form the vector x(n).
Random Variables
Definition
For a space S, the subsets, or events, of S have associated probabilities.
To every event δ, we assign a number x(δ), which is called a R.V.
The distribution function of x is
Pr{x ≤ x0} = Fx(x0), −∞ < x0 < ∞
Properties:
1. F(+∞) = 1, F(−∞) = 0
2. F(x) is continuous from the right: F(x+) = F(x)
3. Pr{x1 < x ≤ x2} = F(x2) − F(x1)
Example
Fair toss of two coins: H=heads, T=Tails
Define numerical assignments:
Events(δ)   Prob.   X(δ)   Y(δ)
HH          1/4     1      −100
HT          1/4     2      −100
TH          1/4     3      −100
TT          1/4     4      500
These assignments yield different distribution functions:
Fx(2) = Pr{HH, HT} = 1/2
Fy(2) = Pr{HH, HT, TH} = 3/4
How do we attain an intuitive interpretation of the distribution function?
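As a concrete illustration (not part of the slides), the two distribution functions can be computed directly from the outcome table above:

```python
# Illustrative check: distribution functions F_X and F_Y from the
# coin-toss assignment table (fair toss of two coins).
outcomes = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}
X = {"HH": 1, "HT": 2, "TH": 3, "TT": 4}
Y = {"HH": -100, "HT": -100, "TH": -100, "TT": 500}

def cdf(assign, x0):
    """F(x0) = Pr{value <= x0}: sum probabilities of qualifying outcomes."""
    return sum(p for ev, p in outcomes.items() if assign[ev] <= x0)

print(cdf(X, 2))  # F_X(2) = Pr{HH, HT} = 0.5
print(cdf(Y, 2))  # F_Y(2) = Pr{HH, HT, TH} = 0.75
```

The same numerical assignments give different staircase distributions, which is exactly the point of the example.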
Distribution Plots
[Figure: staircase distribution functions Fx(x) and Fy(y) for the coin-toss example]
Note the properties hold: F(+∞) = 1 and F(−∞) = 0; F(x) is continuous from the right, F(x+) = F(x); and Pr{x1 < x ≤ x2} = F(x2) − F(x1).
Definition
The probability density function is defined as
f(x) = dF(x)/dx, or F(x) = ∫_{−∞}^{x} f(α) dα
Thus F(∞) = 1 ⇒ ∫_{−∞}^{∞} f(x) dx = 1
Types of distributions:
Continuous: Pr{x = x0} = 0 ∀ x0
Discrete: F(xi) − F(xi−) = Pr{x = xi} = Pi, in which case f(x) = Σi Pi δ(x − xi)
Mixed: discontinuous but not discrete
Distribution examples
Uniform: x ∼ U(a, b) a < b
f(x) = 1/(b − a) for x ∈ [a, b], and 0 otherwise
[Figure: uniform pdf and cdf]
Gaussian: x ∼ N(µ, σ)
f(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}
[Figure: Gaussian pdf and cdf, centered at µ]
Gaussian Distribution Example
Example
Consider the Normal (Gaussian) distribution PDF and CDF for µ = 0, σ² = 0.2, 1.0, 5.0 and µ = −2, σ² = 0.5.
[Figure: Gaussian pdfs and cdfs for µ = 0 with σ² = 0.2, 1.0, 5.0, and for µ = −2 with σ² = 0.5]
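A minimal sketch of evaluating the Gaussian pdf and cdf (illustrative, not from the slides; the cdf uses the closed form via the error function):

```python
import math

# N(mu, sigma^2) pdf and cdf; default parameters match the mu = 0,
# sigma^2 = 1.0 curve in the example above.
def gauss_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = (1/2)(1 + erf((x - mu)/(sigma*sqrt(2))))
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
```

At the mean, the pdf equals 1/(√(2π)σ) and the cdf equals 1/2, as symmetry requires.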
Binomial: x ∼ B(p, q) p + q = 1
Example
Toss a coin n times. What is the probability of getting k heads?
For p + q = 1, where q is the probability of a tail and p is the probability of a head:
Pr{x = k} = C(n, k) p^k q^{n−k}
[NOTE: C(n, k) = n! / ((n − k)! k!)]
⇒ f(x) = Σ_{k=0}^{n} C(n, k) p^k q^{n−k} δ(x − k)
⇒ F(x) = Σ_{k=0}^{m} C(n, k) p^k q^{n−k} for m ≤ x < m + 1
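The binomial pmf and staircase cdf above can be sketched directly (illustrative; math.comb supplies C(n, k)):

```python
from math import comb

# Binomial pmf Pr{x = k} = C(n, k) p^k q^(n-k), q = 1 - p
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# F(x) sums the pmf for k = 0, ..., floor(x)
def binom_cdf(x, n, p):
    m = int(x)
    return sum(binom_pmf(k, n, p) for k in range(m + 1))
```

For example, binom_pmf(0, 4, 0.5) returns 1/16, matching the fair-coin calculation later in this section.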
Binomial Distribution Example I
Example
Toss a coin n times. What is the probability of getting k heads? For n = 9, p = q = 1/2 (fair coin).
[Figure: binomial pmf Pr{x = k} for n = 9, p = 1/2]
Binomial Distribution Example II
Example
Toss a coin n times. What is the probability of getting k heads? For n = 20, p = 0.5, 0.7 and n = 40, p = 0.5.
[Figure: binomial pmfs for n = 20, p = 0.5; n = 20, p = 0.7; and n = 40, p = 0.5]
Probability Conditional Distributions
Conditional Distributions
Definition
The conditional distribution of x given event “M” has occurred is
Fx(x0|M) = Pr{x ≤ x0 | M} = Pr{x ≤ x0, M} / Pr{M}
Example
Suppose M = {x ≤ a}. Then
Fx(x0|M) = Pr{x ≤ x0, M} / Pr{x ≤ a}
If x0 ≥ a, what happens?
Special Cases
Special Case: x0 ≥ a
Pr{x ≤ x0, x ≤ a} = Pr{x ≤ a}
⇒ Fx(x0|M) = Pr{x ≤ x0, M} / Pr{x ≤ a} = Pr{x ≤ a} / Pr{x ≤ a} = 1
Special Case: x0 ≤ a
⇒ Fx(x0|M) = Pr{x ≤ x0, M} / Pr{x ≤ a} = Pr{x ≤ x0} / Pr{x ≤ a} = Fx(x0) / Fx(a)
Conditional Distribution Example
Example
Suppose Fx(x) is as shown. [Figure: example distribution Fx(x)]
What does Fx(x|M) look like? Note M = {x ≤ a}.
⇒ Fx(x0|M) = Fx(x0)/Fx(a) for x0 ≤ a, and 1 for a ≤ x0
[Figure: conditional distribution Fx(x | x ≤ a)]
Distribution properties hold for conditional cases:
Limiting cases: F(∞|M) = 1 and F(−∞|M) = 0
Probability range: Pr{x0 ≤ x ≤ x1 | M} = F(x1|M) − F(x0|M)
Density–distribution relations:
f(x|M) = ∂F(x|M)/∂x
F(x0|M) = ∫_{−∞}^{x0} f(x|M) dx
Example (Fair Coin Toss)
Toss a fair coin 4 times. Let x be the number of heads. Determine Pr{x = k}.
Recall
Pr{x = k} = C(n, k) p^k q^{n−k}
In this case
Pr{x = k} = C(4, k) (1/2)^4
Pr{x = 0} = Pr{x = 4} = 1/16
Pr{x = 1} = Pr{x = 3} = 1/4
Pr{x = 2} = 3/8
Density and Distribution Plots for Fair Coin (n = 4) Ex.
[Figure: pmf Pr{x = k} and staircase distribution F(x) for the fair coin, n = 4]
What type of distribution is this? Discrete. Thus,
F(xi) − F(xi−) = Pr{x = xi} = Pi
F(x) = ∫_{−∞}^{x} f(α) dα = ∫_{−∞}^{x} Σi Pi δ(α − xi) dα
Conditional Case
Example (Conditional Fair Coin Toss)
Toss a fair coin 4 times. Let x be the number of heads. Suppose M = [at least one flip produces a head]. Determine Pr{x = k | M}.
Recall,
Pr{x = k | M} = Pr{x = k, M} / Pr{M}
Thus first determine Pr{M}:
Pr{M} = 1 − Pr{No heads} = 1 − 1/16 = 15/16
Next determine Pr{x = k | M} for the individual cases, k = 0, 1, 2, 3, 4:
Pr{x = 0 | M} = Pr{x = 0, M} / Pr{M} = 0
Pr{x = 1 | M} = Pr{x = 1, M} / Pr{M} = Pr{x = 1} / Pr{M} = (1/4)/(15/16) = 4/15
Pr{x = 2 | M} = Pr{x = 2} / Pr{M} = (3/8)/(15/16) = 6/15
Pr{x = 3 | M} = 4/15
Pr{x = 4 | M} = 1/15
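The conditional pmf above can be verified numerically (illustrative sketch):

```python
from math import comb

# Condition the binomial pmf (n = 4, p = 1/2) on M = "at least one head".
n = 4
pmf = [comb(n, k) * 0.5**n for k in range(n + 1)]      # Pr{x = k}
pM = 1 - pmf[0]                                        # Pr{M} = 15/16
# Pr{x = k | M}: zero for k = 0, otherwise Pr{x = k}/Pr{M}
cond = [0.0] + [pmf[k] / pM for k in range(1, n + 1)]
```

Note the conditional probabilities still sum to one, so conditioning simply renormalizes the mass over the surviving outcomes.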
Conditional and Unconditional Density Functions
[Figure: unconditional pmf Pr{x = k} and conditional pmf Pr{x = k | M}]
Are they proper density functions?
Probability Total Probability and Bayes’ Theorem
Total Probability and Bayes’ Theorem
Let M1, M2, . . . , Mn form a partition of S, i.e.,
∪i Mi = S and Mi ∩ Mj = ∅ for i ≠ j
Then
F(x) = Σi Fx(x|Mi) Pr(Mi)
f(x) = Σi fx(x|Mi) Pr(Mi)
Aside
Pr{A|B} = Pr{A, B} / Pr{B} = (Pr{B, A}/Pr{A}) · (Pr{A}/Pr{B}) = Pr{B|A} Pr{A} / Pr{B}
From this we get
Pr{M | x ≤ x0} = Pr{x ≤ x0 | M} Pr{M} / Pr{x ≤ x0} = F(x0|M) Pr{M} / F(x0)
and
Pr{M | x = x0} = f(x0|M) Pr{M} / f(x0)
By integration,
∫_{−∞}^{∞} Pr{M | x = x0} f(x0) dx0 = ∫_{−∞}^{∞} f(x0|M) Pr{M} dx0 = Pr{M} ∫_{−∞}^{∞} f(x0|M) dx0 = Pr{M}
⇒ Pr{M} = ∫_{−∞}^{∞} Pr{M | x = x0} f(x0) dx0
Putting it all Together: Bayes’ Theorem
Bayes’ Theorem:
f(x0|M) = Pr{M | x = x0} f(x0) / Pr{M} = Pr{M | x = x0} f(x0) / ∫_{−∞}^{∞} Pr{M | x = x0} f(x0) dx0
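A small numeric illustration of Bayes' theorem (assumed setup, not from the slides): take x ~ U(0, 1) and M = {x ≤ a}, so Pr{M | x = x0} is an indicator, Pr{M} comes out of the denominator integral, and f(x0|M) should equal f(x0)/F(a) on [0, a]:

```python
# x ~ U(0, 1), M = {x <= a}; verify f(x0|M) = Pr{M|x=x0} f(x0) / Pr{M}
a = 0.25
f = lambda x: 1.0 if 0 <= x <= 1 else 0.0           # pdf of U(0, 1)
prM_given_x = lambda x: 1.0 if x <= a else 0.0      # Pr{M | x = x0}

# midpoint rule for Pr{M} = integral of Pr{M|x=x0} f(x0) dx0
n = 10000
h = 1.0 / n
prM = sum(prM_given_x((i + 0.5) * h) * f((i + 0.5) * h) for i in range(n)) * h

f_cond = lambda x0: prM_given_x(x0) * f(x0) / prM   # posterior density
```

Here f_cond(x0) equals 1/a = 4 on [0, 0.25] and 0 elsewhere, matching the conditional-distribution result derived earlier in this section.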
Probability Functions of a R.V.
Functions of a R.V.
Problem Statement
Let x and g(x) be RVs such that
y = g(x)
Question: How do we determine the distribution of y?
Note
Fy(y0) = Pr{y ≤ y0} = Pr{g(x) ≤ y0} = Pr{x ∈ Ry0}
where Ry0 = {x : g(x) ≤ y0}
Question: If y = g(x) = x², what is Ry0?
Example
Let y = g(x) = x². Determine Fy(y0).
[Figure: the parabola y = x² and the region −√y0 ≤ x ≤ √y0]
Note that
Fy(y0) = Pr(y ≤ y0) = Pr(−√y0 ≤ x ≤ √y0) = Fx(√y0) − Fx(−√y0)
Example
Let x ∼ N(µ, σ) and
y = U(x) = 1 if x > µ; 0 if x ≤ µ
Determine fy (y0) and Fy (y0).
General Function of a Random Variable Case
To determine the density of y = g(x) in terms of fx(x0), look at g(x):
fy(y0) dy0 = Pr(y0 ≤ y ≤ y0 + dy0)
= Pr(x1 ≤ x ≤ x1 + dx1) + Pr(x2 + dx2 ≤ x ≤ x2) + Pr(x3 ≤ x ≤ x3 + dx3)
[Figure: g(x) crossing the level y0 at roots x1, x2, x3]
fy(y0) dy0 = Pr(x1 ≤ x ≤ x1 + dx1) + Pr(x2 + dx2 ≤ x ≤ x2) + Pr(x3 ≤ x ≤ x3 + dx3)
= fx(x1) dx1 + fx(x2)|dx2| + fx(x3) dx3   (∗)
Note that
dx1 = (dx1/dy0) dy0 = dy0 / (dy0/dx1) = dy0 / g′(x1)
Similarly
dx2 = dy0 / g′(x2) and dx3 = dy0 / g′(x3)
Thus (∗) becomes
fy(y0) dy0 = [fx(x1)/g′(x1)] dy0 + [fx(x2)/|g′(x2)|] dy0 + [fx(x3)/g′(x3)] dy0
or
fy(y0) = fx(x1)/g′(x1) + fx(x2)/|g′(x2)| + fx(x3)/g′(x3)
Function of a R.V. Distribution General Result
Set y = g(x) and let x1, x2, . . . be the roots, i.e.,
y = g(x1) = g(x2) = · · ·
Then
fy(y) = fx(x1)/|g′(x1)| + fx(x2)/|g′(x2)| + · · ·
Example
Suppose x ∼ U(−1, 2) and y = x². Determine fy(y).
[Figure: fx(x) = 1/3 on [−1, 2] and the mapping y = x²]
Note that g(x) = x² ⇒ g′(x) = 2x.
Consider the special cases separately.
Case 1: 0 ≤ y ≤ 1
y = x² ⇒ x = ±√y
fy(y) = fx(x1)/|g′(x1)| + fx(x2)/|g′(x2)| = (1/3)/|2√y| + (1/3)/|−2√y| = (1/3)/√y
Case 2: 1 ≤ y ≤ 4
y = x² ⇒ x = √y
fy(y) = fx(x1)/|g′(x1)| = (1/3)/(2√y) = (1/6)/√y
Result: For x ∼ U(−1, 2) and y = x²,
fy(y) = (1/3)/√y for 0 ≤ y ≤ 1, and (1/6)/√y for 1 < y ≤ 4
[Figure: fx(x) and the resulting fy(y)]
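The result can be checked through the distribution function (illustrative sketch): for x ~ U(−1, 2), Fx(t) = (t + 1)/3 on [−1, 2], and Fy(y0) = Fx(√y0) − Fx(−√y0) from the earlier slide.

```python
import math

# x ~ U(-1, 2): cdf clamped to [0, 1]
def Fx(t):
    return min(max((t + 1) / 3.0, 0.0), 1.0)

# y = x^2: F_y(y0) = F_x(sqrt(y0)) - F_x(-sqrt(y0))
def Fy(y0):
    if y0 <= 0:
        return 0.0
    s = math.sqrt(y0)
    return Fx(s) - Fx(-s)

# numerical derivative of F_y recovers the density; at y = 0.25 it
# should be close to 1/(3*sqrt(0.25)) = 2/3
h = 1e-6
deriv = (Fy(0.25 + h) - Fy(0.25 - h)) / (2 * h)
```

Consistently with the piecewise density, Fy(1) = 2/3 (the mass of x on [−1, 1]) and Fy(4) = 1.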
Example
Let x ∼ N(µ, σ) and y = e^x. Determine fy(y).
Note g(x) ≥ 0 and g′(x) = e^x.
Also, there is a single root (inverse solution): x = ln(y)
Therefore,
fy(y) = fx(x)/|g′(x)| = fx(x)/e^x
Expressing this in terms of y through substitution yields
fy(y) = fx(ln(y))/e^{ln(y)} = fx(ln(y))/y
Note that x is Gaussian:
fx(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}
⇒ fy(y) = (1/(√(2π) y σ)) e^{−(ln(y)−µ)²/(2σ²)}, for y > 0
[Figure: log-normal density]
Distribution of Fx(x)
For any RV with continuous distribution Fx(x), the RV y = Fx(x) is uniform on [0, 1].
Proof: Note 0 < y < 1. Since
g(x) = Fx(x)
g′(x) = fx(x)
Thus
fy(y) = fx(x)/g′(x) = fx(x)/fx(x) = 1
[Figure: y = Fx(x) maps x into a uniform RV on [0, 1]]
Thus the function g(x) = Fx(x) maps x into a RV uniform on [0, 1].
The converse also holds: if y is uniform on [0, 1], then x = Fx⁻¹(y) has distribution Fx(x).
Combining the two operations yields synthesis: uniform samples are passed through the inverse distribution of a desired target to produce samples with that distribution.
[Figure: block diagram of the synthesis mapping]
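The synthesis idea can be sketched as inverse-transform sampling (illustrative; the exponential target with rate lam is an assumed choice, with F⁻¹(u) = −ln(1 − u)/λ):

```python
import math

lam = 2.0
n = 10000
# deterministic grid standing in for uniform samples on (0, 1)
us = [(i + 0.5) / n for i in range(n)]
# pass each uniform value through the inverse exponential cdf
xs = [-math.log(1 - u) / lam for u in us]

# half of the synthesized samples should fall below the exponential
# median ln(2)/lam, and their mean should approach 1/lam
median = math.log(2) / lam
frac = sum(1 for x in xs if x <= median) / n
mean = sum(xs) / n
```

Any continuous target distribution with a computable inverse cdf can be synthesized this way from uniform samples.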
Probability Mean, Median and Variance
Mean, Median and Variance
Definitions
Mean: E{x} = ∫_{−∞}^{∞} x f(x) dx
Conditional Mean: E{x|M} = ∫_{−∞}^{∞} x f(x|M) dx
Example
Suppose M = {x ≥ a}. Then
E{x|M} = ∫_{−∞}^{∞} x f(x|M) dx = (∫_{a}^{∞} x f(x) dx) / (∫_{a}^{∞} f(x) dx)
For a function of a RV, y = g(x),
E{y} = ∫_{−∞}^{∞} y fy(y) dy = ∫_{−∞}^{∞} g(x) fx(x) dx
Example
Suppose g(x) is a step function, g(x) = 1 for x ≤ x0 and 0 otherwise. Determine E{g(x)}.
E{g(x)} = ∫_{−∞}^{∞} g(x) fx(x) dx = ∫_{−∞}^{x0} fx(x) dx = Fx(x0)
Median
Definitions
Median = m:
∫_{−∞}^{m} f(x) dx = ∫_{m}^{∞} f(x) dx = 1/2
i.e., the median m satisfies Pr{x ≤ m} = Pr{x ≥ m}
Example
Let x ∼ λ e^{−λx} U(x). Then m = ln(2)/λ
Definition (Variance)
Variance: σ² = ∫_{−∞}^{∞} (x − η)² f(x) dx
where η = E{x}. Thus,
σ² = E{(x − η)²} = E{x²} − E²{x}
Example
For x ∼ N(η, σ²), determine the variance.
f(x) = (1/(√(2π) σ)) e^{−(x−η)²/(2σ²)}
Note: f(x) is symmetric about x = η ⇒ E{x} = η
Also, ∫_{−∞}^{∞} f(x) dx = 1 ⇒ ∫_{−∞}^{∞} e^{−(x−η)²/(2σ²)} dx = √(2π) σ
∫_{−∞}^{∞} e^{−(x−η)²/(2σ²)} dx = √(2π) σ
Differentiating w.r.t. σ:
⇒ ∫_{−∞}^{∞} ((x − η)²/σ³) e^{−(x−η)²/(2σ²)} dx = √(2π)
Rearranging yields
∫_{−∞}^{∞} (x − η)² (1/(√(2π) σ)) e^{−(x−η)²/(2σ²)} dx = σ²
or
E{(x − η)²} = σ²
Probability Moments
Definition (Moments)
Moments:
mn = E{x^n} = ∫_{−∞}^{∞} x^n f(x) dx
Central Moments:
µn = E{(x − η)^n} = ∫_{−∞}^{∞} (x − η)^n f(x) dx
From the binomial theorem,
µn = E{(x − η)^n} = E{ Σ_{k=0}^{n} C(n, k) x^k (−η)^{n−k} } = Σ_{k=0}^{n} C(n, k) mk (−η)^{n−k}
⇒ µ0 = 1, µ1 = 0, µ2 = σ², µ3 = m3 − 3η m2 + 2η³
Example
Let x ∼ N(0, σ²). Prove
E{x^n} = 0 for n = 2k + 1, and E{x^n} = 1 · 3 · · · (n − 1) σ^n for n = 2k
For n odd,
E{x^n} = ∫_{−∞}^{∞} x^n f(x) dx = 0
since x^n is an odd function and f(x) is an even function.
To prove the second part, use the fact that
∫_{−∞}^{∞} e^{−αx²} dx = √(π/α)
Differentiate
∫_{−∞}^{∞} e^{−αx²} dx = √(π/α)
with respect to α, k times:
⇒ ∫_{−∞}^{∞} x^{2k} e^{−αx²} dx = (1 · 3 · · · (2k − 1) / 2^k) √(π/α^{2k+1})
Let α = 1/(2σ²). Then
∫_{−∞}^{∞} x^{2k} e^{−x²/(2σ²)} dx = 1 · 3 · · · (2k − 1) σ^{2k+1} √(2π)
Setting n = 2k and rearranging,
∫_{−∞}^{∞} x^n (1/(√(2π) σ)) e^{−x²/(2σ²)} dx = 1 · 3 · · · (n − 1) σ^n   [QED]
Note: Variance is a measure of a RV’s concentration around its mean.
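A numeric check (illustrative) of the Gaussian moment formula: the odd moments vanish and E{x⁴} = 3σ⁴.

```python
import math

sigma = 1.5

def gauss_moment(n, sigma, lim=10.0, steps=100000):
    """Midpoint integration of x^n times the N(0, sigma^2) pdf
    over [-lim*sigma, lim*sigma] (tails beyond are negligible)."""
    a, b = -lim * sigma, lim * sigma
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x**n * math.exp(-x * x / (2 * sigma**2))
    return total * h / (math.sqrt(2 * math.pi) * sigma)
```

For n = 2 this recovers σ², for n = 3 it returns (numerically) zero, and for n = 4 it returns 1·3·σ⁴.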
Probability Tchebycheff Inequality
Tchebycheff Inequality
For any ε > 0,
Pr(|x − η| ≥ ε) ≤ σ²/ε²
To prove this, note
Pr(|x − η| ≥ ε) = ∫_{−∞}^{η−ε} f(x) dx + ∫_{η+ε}^{∞} f(x) dx = ∫_{|x−η|≥ε} f(x) dx
Also note that
σ² = ∫_{−∞}^{∞} (x − η)² f(x) dx ≥ ∫_{|x−η|≥ε} (x − η)² f(x) dx
[Figure: density f(x) with the tails beyond η − ε and η + ε shaded]
σ² ≥ ∫_{|x−η|≥ε} (x − η)² f(x) dx
Using the fact that |x − η| ≥ ε in the above gives
σ² ≥ ε² ∫_{|x−η|≥ε} f(x) dx = ε² Pr{|x − η| ≥ ε}
Rearranging gives the desired result:
⇒ Pr{|x − η| ≥ ε} ≤ (σ/ε)²   QED
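The bound can be checked against a case where the tail probability is known exactly (illustrative, assumed example): for x ~ U(0, 1), η = 1/2, σ² = 1/12, and Pr{|x − η| ≥ ε} = max(0, 1 − 2ε).

```python
# Exact uniform tail probability versus the Tchebycheff bound sigma^2/eps^2
var = 1.0 / 12.0

def tail(eps):
    """Exact Pr{|x - 1/2| >= eps} for x ~ U(0, 1)."""
    return max(0.0, 1.0 - 2.0 * eps)

def bound(eps):
    """Tchebycheff bound."""
    return var / eps**2

# the exact tail never exceeds the bound
checks = [(eps, tail(eps) <= bound(eps)) for eps in (0.1, 0.2, 0.3, 0.5)]
```

The bound is loose here (e.g., at ε = 0.1 it exceeds 1), which is typical: Tchebycheff holds for every distribution, so it cannot be tight for any particular one.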
Probability Jensen’s Inequality
Jensen’s Inequality
For a real convex function Ψ and a random variable x,
Ψ(E[x]) ≤ E[Ψ(x)]
with equality when Ψ is not strictly convex, e.g., Ψ(x) = x. The inequality is reversed for concave Ψ.
Suppose Ω is a measurable subset of the real line and f(x) is a non-negative function such that
∫_{−∞}^{∞} f(x) dx = 1
where f is a probability density function. Then Jensen’s inequality becomes the following statement about integrals: if g is any real-valued measurable function and Ψ is convex over the range of g, then
Ψ( ∫_{−∞}^{∞} g(x) f(x) dx ) ≤ ∫_{−∞}^{∞} Ψ(g(x)) f(x) dx
If g(x) = x, then this form of the inequality reduces to the commonly used special case:
Ψ( ∫_{−∞}^{∞} x f(x) dx ) ≤ ∫_{−∞}^{∞} Ψ(x) f(x) dx
Probability Characteristic & Moment Generating Functions
Definition (Characteristic Function)
The characteristic function of a random variable x with pdf fx(x) is defined by
φx(ω) = E(e^{jωx}) = ∫_{−∞}^{∞} e^{jωx} fx(x) dx
If fx(x) is symmetric about 0 (fx(x) = fx(−x)), then φx(ω) is real.
The magnitude of the characteristic function is bounded by
|φx(ω)| ≤ φx(0) = 1
Theorem (Characteristic Function for the sum of independent RVs)
Let x1, x2, . . . , xN be independent (but not necessarily identically distributed) RVs and set sN = Σ_{i=1}^{N} ai xi, where the ai are constants. Then
φ_sN(ω) = Π_{i=1}^{N} φ_xi(ai ω)
The theorem can be proved by a simple extension of the following: let x and y be independent. Then
φ_{x+y}(ω) = E(e^{jω(x+y)}) = E(e^{jωx} e^{jωy}) = E(e^{jωx}) E(e^{jωy}) = φx(ω) φy(ω)
Example
Determine the characteristic function of the sample mean operating on iid samples.
Note x̄ = (1/N) Σ_{i=1}^{N} xi ⇒ ai = 1/N
⇒ φ_x̄(ω) = Π_{i=1}^{N} φ_xi(ai ω) = (φ_xi(ω/N))^N
The moment generating function is realized by making the substitution jω → s in the above.
Definition (Moment Generating Function)
The moment generating function of a random variable x with pdf fx(x) is defined by
Φx(s) = E(e^{sx}) = ∫_{−∞}^{∞} e^{sx} fx(x) dx
Note Φx(jω) = φx(ω)
Theorem (Moment Generation)
Provided that Φx(s) exists in an open interval around s = 0, the following holds:
mn = E(x^n) = Φx^{(n)}(0) = (d^n Φx/ds^n)(0)
Simply noting that Φx^{(n)}(s) = E(x^n e^{sx}) proves the result.
Example
Let x be exponentially distributed,
f(x) = λ e^{−λx} U(x)
Determine η = m1, m2, and σ².
Note
Φx(s) = λ ∫_{0}^{∞} e^{sx} e^{−λx} dx = λ ∫_{0}^{∞} e^{−x(λ−s)} dx = λ/(λ − s)
Thus
Φx^{(1)}(0) = 1/λ and Φx^{(2)}(0) = 2/λ²
and
E{x} = 1/λ, E{x²} = 2/λ² ⇒ σ² = 1/λ²
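The moments obtained from Φx(s) = λ/(λ − s) can be confirmed by integrating the density directly (illustrative numeric sketch; λ = 0.5 is an assumed value):

```python
import math

lam = 0.5

def exp_moment(n, lam, lim=200.0, steps=200000):
    """Midpoint integration of x^n * lam * exp(-lam x) over [0, lim]."""
    h = lim / steps
    return sum(((i + 0.5) * h) ** n * lam * math.exp(-lam * (i + 0.5) * h)
               for i in range(steps)) * h

m1 = exp_moment(1, lam)   # should approach 1/lam
m2 = exp_moment(2, lam)   # should approach 2/lam^2
var = m2 - m1**2          # should approach 1/lam^2
```

Differentiating the MGF and numerically integrating the density agree, as the moment-generation theorem guarantees.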
Probability Bivariate Statistics
Bivariate Statistics
Given two RVs, x and y, the bivariate (joint) distribution is given by
F(x0, y0) = Pr{x ≤ x0, y ≤ y0}
Properties:
F(−∞, y) = F(x, −∞) = 0
F(∞, ∞) = 1
Fx(x) = F(x, ∞), Fy(y) = F(∞, y)
Special Cases
Case 1: M = {x1 ≤ x ≤ x2, y ≤ y0}
[Figure: vertical strip x1 ≤ x ≤ x2 below y0]
⇒ Pr{M} = F(x2, y0) − F(x1, y0)
Case 2: M = {x ≤ x0, y1 ≤ y ≤ y2}
[Figure: horizontal strip y1 ≤ y ≤ y2 left of x0]
⇒ Pr{M} = F(x0, y2) − F(x0, y1)
Case 3: M = {x1 ≤ x ≤ x2, y1 ≤ y ≤ y2}. Then
[Figure: rectangular region x1 ≤ x ≤ x2, y1 ≤ y ≤ y2]
Pr{M} = F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1)
where the last term is added back because that region was subtracted twice.
Probability Joint Statistics
Definition (Joint Statistics)
f(x, y) = ∂²F(x, y)/(∂x ∂y)
and
F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(α, β) dα dβ
In general, for some region M, the joint statistics are
Pr{(x, y) ∈ M} = ∫∫_M f(x, y) dx dy
Marginal Statistics: Fx(x) = F(x, ∞) and Fy(y) = F(∞, y)
⇒ fx(x) = ∫_{−∞}^{∞} f(x, y) dy
⇒ fy(y) = ∫_{−∞}^{∞} f(x, y) dx
Probability Independence
Independence
Definition (Independence)
Two RVs x and y are statistically independent if for arbitrary events (regions) x ∈ A and y ∈ B,
Pr{x ∈ A, y ∈ B} = Pr{x ∈ A} Pr{y ∈ B}
Letting A = {x ≤ x0} and B = {y ≤ y0}, we see x and y are independent iff
Fx,y(x, y) = Fx(x) Fy(y)
and by differentiation
fx,y(x, y) = fx(x) fy(y)
If x and y are independent RVs, then
z = q(x) and w = h(y)
are also independent.
Function of two RVs
Given two RVs, let z = g(x, y). Define Dz to be the xy-plane region where
{z ≤ z0} = {g(x, y) ≤ z0} = {(x, y) ∈ Dz}
Then
Fz(z0) = Pr{z ≤ z0} = Pr{(x, y) ∈ Dz} = ∫∫_{Dz} f(x, y) dx dy
Example
Let z = x + y. Then z ≤ z0 gives the region x + y ≤ z0, which is delineated by the line x + y = z0.
[Figure: half-plane below the line x + y = z0]
Thus
Fz(z0) = ∫∫_{Dz} f(x, y) dx dy = ∫_{−∞}^{∞} ∫_{−∞}^{z0−y} f(x, y) dx dy
We can obtain fz(z) by differentiation:
∂Fz(z)/∂z = ∫_{−∞}^{∞} (∂/∂z) ∫_{−∞}^{z−y} f(x, y) dx dy
fz(z) = ∫_{−∞}^{∞} f(z − y, y) dy   (∗)
Note that if x and y are independent,
f(x, y) = fx(x) fy(y)   (∗∗)
Thus utilizing (∗∗) in (∗),
fz(z) = ∫_{−∞}^{∞} fx(z − y) fy(y) dy = fx(z) ∗ fy(z)   (a convolution)
Example
Let z = x + y where x and y are independent with
fx(x) = α e^{−αx} U(x)
fy(y) = α e^{−αy} U(y)
Then
fz(z) = ∫_{−∞}^{∞} fx(z − y) fy(y) dy = α² ∫_{0}^{z} e^{−α(z−y)} e^{−αy} dy = α² e^{−αz} ∫_{0}^{z} dy = α² z e^{−αz} U(z)
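The convolution result can be checked by discretizing the integral (illustrative sketch; α = 1 is an assumed value):

```python
import math

alpha = 1.0
h = 0.001

def fx(t):
    """Exponential density alpha * exp(-alpha t) U(t)."""
    return alpha * math.exp(-alpha * t) if t >= 0 else 0.0

def fz_numeric(z):
    """Midpoint discretization of the convolution integral over [0, z]."""
    steps = int(z / h)
    return sum(fx(z - (k + 0.5) * h) * fx((k + 0.5) * h) for k in range(steps)) * h
```

At each z the numeric convolution should match the closed form α² z e^{−αz}, the Erlang(2) density.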
Example
Let z = max(x, y). Determine Fz(z0) and fz(z0).
Note
Fz(z0) = Pr{z ≤ z0} = Pr{max(x, y) ≤ z0} = Pr{x ≤ z0, y ≤ z0} = Fxy(z0, z0)
[Figure: quadrant region x ≤ z0, y ≤ z0]
If x and y are independent,
Fz(z0) = Fx(z0) Fy(z0)
and
fz(z0) = ∂Fz(z0)/∂z0 = (∂Fx(z0)/∂z0) Fy(z0) + (∂Fy(z0)/∂z0) Fx(z0) = fx(z0) Fy(z0) + fy(z0) Fx(z0)
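As a concrete instance (assumed example, not from the slides): for x, y ~ U(0, 1) independent, Fz(z) = z² and fz(z) = 2z on [0, 1].

```python
# max of two independent U(0, 1) RVs: f_z = f_x F_y + f_y F_x
def Fu(t):
    """cdf of U(0, 1), clamped."""
    return min(max(t, 0.0), 1.0)

def fu(t):
    """pdf of U(0, 1)."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def fz(z0):
    return fu(z0) * Fu(z0) + fu(z0) * Fu(z0)   # = 2 z0 on [0, 1]
```

The density piles up near 1, as expected: the maximum of two uniforms is more likely to be large.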
Probability Joint Moments
Joint Moments
For RVs x and y and function z = g(x, y),
E{z} = ∫_{−∞}^{∞} z fz(z) dz
E{g(x, y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy
Definition (Covariance)
For RVs x and y,
Cxy = Cov(x, y) = E[(x − ηx)(y − ηy)] = E[xy] − ηx E[y] − ηy E[x] + ηx ηy = E[xy] − ηx ηy
Definition (Correlation Coefficient)
The correlation coefficient is given by
r = Cxy / (σx σy)
Note that
0 ≤ E{[a(x − ηx) + (y − ηy)]²} = E{(x − ηx)²} a² + 2 E{(x − ηx)(y − ηy)} a + E{(y − ηy)²} = σx² a² + 2 Cxy a + σy²
This is a non-negative quadratic function of a ⇒ its roots are imaginary and the discriminant is non-positive:
4Cxy² − 4σx²σy² ≤ 0
⇒ Cxy² ≤ σx²σy²
Thus,
|Cxy| ≤ σx σy and |r| = |Cxy|/(σx σy) ≤ 1
Definition (Uncorrelated)
Two RVs are uncorrelated if their covariance is zero:
Cxy = 0 ⇒ r = Cxy/(σx σy) = (E{xy} − E{x}E{y})/(σx σy) = 0 ⇒ E{xy} = E{x}E{y}
Thus
Cxy = 0 ⇔ E{xy} = E{x}E{y}
Result
If x and y are independent, then
Exy = ExEy
and x and y are uncorrelated
Note: Converse is not true (in general)
Converse only holds for Gaussian RVs
Independence is a stronger condition than uncorrelated
Definition (Orthogonality)
Two RVs are orthogonal if
Exy = 0
Note: If x and y are correlated, they are not orthogonal
Example
Consider the correlation between two RVs, x and y, with samples shown in a scatter plot.
[Figure: scatter plots illustrating different correlation values]
Probability Sequences and Vectors of Random Variables
Sequences and Vectors of Random Variables
Definition (Vector Distribution)
Let x be a sequence of RVs. Take N samples to form the random vector
x = [x1, x2, . . . , xN]^T
Then the vector distribution function is
Fx(x0) = Pr{x1 ≤ x01, x2 ≤ x02, . . . , xN ≤ x0N} ≜ Pr{x ≤ x0}
Special Case: For complex data
x = xr + jxi
[x1, x2, . . . , xN]^T = [xr1, xr2, . . . , xrN]^T + j [xi1, xi2, . . . , xiN]^T
The distribution in the complex case is defined as
Fx(x0) = Pr{xr ≤ x0r, xi ≤ x0i} ≜ Pr{x ≤ x0}
The density function is given by
fx(x) = ∂^N Fx(x) / (∂x1 ∂x2 . . . ∂xN)
Fx(x0) = ∫_{−∞}^{x01} ∫_{−∞}^{x02} · · · ∫_{−∞}^{x0N} fx(x) dx1 dx2 . . . dxN
Properties:
Fx([∞, ∞, · · · , ∞]^T) = 1
∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fx(x) dx = 1
Fx([x1, x2, · · · , −∞, · · · , xN]^T) = 0
Also
F([∞, x2, x3, · · · , xN]^T) = F([x2, x3, · · · , xN]^T)
∫_{−∞}^{∞} f([x1, x2, x3, · · · , xN]^T) dx1 = f([x2, x3, · · · , xN]^T)
Setting xi = ∞ in the cdf eliminates this sample.
Integrating over (−∞, ∞) along xi in the pdf eliminates this sample.
Joint Distribution
Definitions (Joint Distribution and Density)
Given two random vectors x and y, the joint distribution and density are
Fxy(x0, y0) = Pr{x ≤ x0, y ≤ y0}
fxy(x, y) = ∂^N ∂^M Fxy(x, y) / (∂x1 ∂x2 · · · ∂xN ∂y1 ∂y2 · · · ∂yM)
Definition (Vector Independence)
The vectors are independent iff
Fxy(x, y) = Fx(x) Fy(y)
or equivalently
fxy(x, y) = fx(x) fy(y)
Expectations & Moments
Objective: Obtain partial description of process generating x
Solution: Use moments
The first moment, or mean, is
mx = E{x} = [m1, m2, . . . , mN]^T = ∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} x fx(x) dx
⇒ mk = ∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} xk fx(x) dx1 dx2 · · · dxN = ∫_{−∞}^{∞} xk fxk(xk) dxk
where fxk(xk) is the marginal distribution of xk.
Definition (Correlation Matrix)
A complete set of second moments is given by the correlation matrix
Rx = E{xx^H} = E{xx*^T} =
[ E{|x1|²}     E{x1 x2*}    · · ·   E{x1 xN*}
  E{x2 x1*}    E{|x2|²}     · · ·   E{x2 xN*}
  ...          ...          . . .   ...
  E{xN x1*}    E{xN x2*}    · · ·   E{|xN|²} ]
Result
The correlation matrix is Hermitian symmetric:
(Rx)^H = (E{xx^H})^H = E{(xx^H)^H} = E{xx^H} = Rx
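A small pure-Python sketch (toy data, illustrative) estimating Rx from complex sample vectors and verifying the Hermitian-symmetry and positive-semidefiniteness results:

```python
# Estimate R_x = E{x x^H} as the sample average of outer products.
samples = [
    [1 + 1j, 2 - 1j, 0.5 + 0j],
    [-1 + 0.5j, 1 + 1j, 2 + 0j],
    [0 + 1j, -1 - 1j, 1 + 0.5j],
]
N = len(samples[0])

R = [[sum(x[i] * x[j].conjugate() for x in samples) / len(samples)
      for j in range(N)] for i in range(N)]

# Hermitian symmetry: R[i][j] = conj(R[j][i])
hermitian = all(abs(R[i][j] - R[j][i].conjugate()) < 1e-12
                for i in range(N) for j in range(N))

# quadratic form a^H R a = E{|a^H x|^2}: (numerically) real and >= 0
a = [1 + 0j, -0.5 + 0.5j, 0.25 - 1j]
quad = sum(a[i].conjugate() * R[i][j] * a[j] for i in range(N) for j in range(N))
```

The quadratic form is an average of squared magnitudes, which is why positive semi-definiteness holds for any choice of a.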
Definition (Covariance Matrix)
The set of second central moments is given by the covariance
Cx = E{(x − mx)(x − mx)^H} = E{xx^H} − mx E{x^H} − E{x} mx^H + mx mx^H = Rx − mx mx^H
Result
The covariance matrix is Hermitian symmetric:
Cx = Cx^H
Result
The correlation and covariance matrices are positive semi-definite
a^H Rx a ≥ 0 and a^H Cx a ≥ 0 (∀ a)
To prove this, note
a^H Rx a = a^H E{xx^H} a = E{a^H x x^H a} = E{(a^H x)(a^H x)^H} = E{|a^H x|²} ≥ 0
In most cases, R and C are positive definite:
a^H Rx a > 0 and a^H Cx a > 0
⇒ no linear dependencies in Rx or Cx
Definitions (Cross-Correlation and Cross-Covariance)
For random vectors x and y,
Cross-correlation ≜ Rxy = E{xy^H}
Cross-covariance ≜ Cxy = E{(x − mx)(y − my)^H} = Rxy − mx my^H
Definition (Uncorrelated Vectors)
Two vectors x and y are uncorrelated if
Cxy = Rxy − mx my^H = 0
or equivalently
Rxy = E{xy^H} = mx my^H
Note that as in the scalar case
independence ⇒ uncorrelated
uncorrelated ⇏ independence
Also, x and y are orthogonal if
Rxy = ExyH = 0
Example
Let x and y have the same dimension. If
z = x + y
find Rz and Cz
By definition,
Rz = E{(x + y)(x + y)^H} = E{xx^H} + E{xy^H} + E{yx^H} + E{yy^H} = Rx + Rxy + Ryx + Ry
Similarly,
Cz = Cx + Cxy + Cyx + Cy
Note: If x and y are uncorrelated,
Rz = Rx + mx my^H + my mx^H + Ry
and
Cz = Cx + Cy
Definition (Multivariate Gaussian Density)
For an N-dimensional random vector x with mean mx and covariance Cx, the multivariate Gaussian pdf is
fx(x) = (1 / ((2π)^{N/2} |Cx|^{1/2})) e^{−(1/2)(x − mx)^H Cx^{−1} (x − mx)}
Note the similarity to the univariate case:
fx(x) = (1/(√(2π) σ)) e^{−(1/2)(x − m)²/σ²}
Example
Let N = 2 (bivariate case) and x be real. Then
x = [x1, x2]^T and mx = E{x} = [m1, m2]^T
C_x = E{(x − m_x)(x − m_x)^T} = E{xx^T} − m_x m_x^T

    = E{ [ x_1^2, x_1 x_2 ; x_2 x_1, x_2^2 ] } − [ m_1^2, m_1 m_2 ; m_2 m_1, m_2^2 ]

    = [ E{x_1^2} − m_1^2, E{x_1 x_2} − m_1 m_2 ; E{x_2 x_1} − m_2 m_1, E{x_2^2} − m_2^2 ]

Recall that

σ_x^2 = E{x^2} − E^2{x}

and

r = (E{x_1 x_2} − m_1 m_2) / (σ_{x1} σ_{x2})
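The elementwise identity C_x = E{xx^T} − m_x m_x^T holds exactly for sample averages too, which makes it easy to check numerically. A sketch, not from the notes:

```python
import numpy as np

# Sketch: both routes to the sample covariance agree exactly, and the
# sample correlation coefficient r lies in [-1, 1].
rng = np.random.default_rng(2)
X = rng.standard_normal((2, 400)) + np.array([[1.0], [2.0]])
m = X.mean(axis=1, keepdims=True)

Cx = (X - m) @ (X - m).T / X.shape[1]          # E{(x - m)(x - m)^T}
Cx_alt = X @ X.T / X.shape[1] - m @ m.T        # E{x x^T} - m m^T
r = Cx[0, 1] / np.sqrt(Cx[0, 0] * Cx[1, 1])    # correlation coefficient

print(np.allclose(Cx, Cx_alt), -1 <= r <= 1)
```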
Rearranging:

C_x = [ σ_{x1}^2, r σ_{x1} σ_{x2} ; r σ_{x1} σ_{x2}, σ_{x2}^2 ]

Also,

C_x^{−1} = (1 / (σ_{x1}^2 σ_{x2}^2 − r^2 σ_{x1}^2 σ_{x2}^2)) [ σ_{x2}^2, −r σ_{x1} σ_{x2} ; −r σ_{x1} σ_{x2}, σ_{x1}^2 ]

         = (1 / (σ_{x1}^2 σ_{x2}^2 (1 − r^2))) [ σ_{x2}^2, −r σ_{x1} σ_{x2} ; −r σ_{x1} σ_{x2}, σ_{x1}^2 ]

Substituting into the Gaussian pdf and simplifying:

f_x(x) = (1 / (2π |C_x|^{1/2})) e^{−(1/2)(x − m_x)^T C_x^{−1} (x − m_x)}

       = (1 / (2π σ_{x1} σ_{x2} (1 − r^2)^{1/2})) exp{ −(1 / (2(1 − r^2))) [ (x_1 − m_1)^2/σ_{x1}^2 − 2r(x_1 − m_1)(x_2 − m_2)/(σ_{x1} σ_{x2}) + (x_2 − m_2)^2/σ_{x2}^2 ] }
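That the expanded bivariate expression agrees with the general matrix form can be confirmed by evaluating both at a point; the values of s1, s2, r, and the means below are arbitrary illustrative choices:

```python
import numpy as np

# Sketch: the matrix form and the expanded bivariate form of the Gaussian
# pdf give identical values.
s1, s2, r = 1.5, 0.7, 0.4
m = np.array([1.0, -2.0])
C = np.array([[s1**2, r*s1*s2], [r*s1*s2, s2**2]])
x = np.array([0.3, -1.1])
d = x - m

# Matrix form: (2*pi)^{-1} |C|^{-1/2} exp(-0.5 d^T C^{-1} d)
f_matrix = np.exp(-0.5 * d @ np.linalg.solve(C, d)) / \
           (2 * np.pi * np.sqrt(np.linalg.det(C)))

# Expanded scalar form from the derivation
q = (d[0]**2/s1**2 - 2*r*d[0]*d[1]/(s1*s2) + d[1]**2/s2**2) / (1 - r**2)
f_expanded = np.exp(-0.5 * q) / (2 * np.pi * s1 * s2 * np.sqrt(1 - r**2))

print(np.isclose(f_matrix, f_expanded))   # True
```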
Note: If uncorrelated, r = 0, and

f_x(x) = (1 / (2π σ_{x1} σ_{x2})) exp{ −(1/2) [ (x_1 − m_1)^2/σ_{x1}^2 + (x_2 − m_2)^2/σ_{x2}^2 ] } = f_{x1}(x_1) f_{x2}(x_2)
Gaussian special case result:
uncorrelated ⇒ independent
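The factorization can be verified at any point: with r = 0 the joint pdf equals the product of the two univariate marginals. A sketch with arbitrary assumed parameter values:

```python
import numpy as np

# Sketch: with r = 0 the bivariate Gaussian factors into its marginals.
s1, s2 = 1.2, 0.8
m1, m2 = 0.5, -1.0
x1, x2 = 1.0, -0.4

# Univariate Gaussian pdf
uni = lambda x, m, s: np.exp(-0.5 * (x - m)**2 / s**2) / (np.sqrt(2*np.pi) * s)

# Joint pdf with r = 0
joint = np.exp(-0.5 * ((x1-m1)**2/s1**2 + (x2-m2)**2/s2**2)) / (2*np.pi*s1*s2)

print(np.isclose(joint, uni(x1, m1, s1) * uni(x2, m2, s2)))   # True
```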
Example
Examine the contours defined by
(x − m_x)^T C_x^{−1} (x − m_x) = constant
Why? For all values on the contour
fx(x) = constant
[Contour plots of the bivariate Gaussian pdf, each centered at (m_1, m_2):
r = 0, σ_{x1} = σ_{x2}: circular contours;
r = 0, σ_{x1} > σ_{x2}: axis-aligned ellipses, elongated along x_1;
r > 0, σ_{x1} > σ_{x2}: tilted ellipses;
r > 0, σ_{x1} < σ_{x2}: tilted ellipses, elongated along x_2.]
[Contour plot for r < 0 and σ_{x1} > σ_{x2}, centered at (m_1, m_2), with the marginal densities f_{x1}(x_1) and f_{x2}(x_2) sketched along the axes.]

Integrating over x_2 yields f_{x1}(x_1)
Integrating over x_1 yields f_{x2}(x_2)
Additional Gaussian (surface) examples:
[Surface plots of the bivariate Gaussian pdf for: r = 0, σ_{x1} = σ_{x2}; r = 0, σ_{x1} < σ_{x2}; r > 0, σ_{x1} < σ_{x2}; r < 0, σ_{x1} < σ_{x2}.]
Transformations of a vector

Let the N functions g_1(·), g_2(·), …, g_N(·) map x to z, where

z_1 = g_1(x_1, x_2, …, x_N)
z_2 = g_2(x)
⋮
z_N = g_N(x)

(Forward mapping)

Let g_1(·), g_2(·), …, g_N(·) be independent and yield a one-to-one transformation, such that there exists a set of functions

x_1 = h_1(z), x_2 = h_2(z), …, x_N = h_N(z)

where z = [z_1, z_2, …, z_N]^T.

(Reverse mapping)

Question: How do we determine the distribution f_z(z)?
Let N = 2 and consider the probability of being in the region defined by

[z_1, z_1 + dz_1] and [z_2, z_2 + dz_2]

[Figure: differential region A_z in the (z_1, z_2) plane and the equivalent region A_x in the (x_1, x_2) plane.]

Identify an equivalent area in the x_1, x_2 domain and equate the probabilities:

Pr{(z_1, z_2) ∈ A_z} = Pr{(x_1, x_2) ∈ A_x}

f_{z1 z2}(z_1, z_2) Area(A_z) = f_{x1 x2}(x_1, x_2) Area(A_x)
Area(A_x) / Area(A_z) = abs( J( x_1 x_2 / z_1 z_2 ) ) = 1 / abs( J( z_1 z_2 / x_1 x_2 ) )

The Jacobians are defined as the determinants

J( x_1 x_2 / z_1 z_2 ) = | ∂x_1/∂z_1  ∂x_1/∂z_2 ; ∂x_2/∂z_1  ∂x_2/∂z_2 |

and

J( z_1 z_2 / x_1 x_2 ) = | ∂z_1/∂x_1  ∂z_1/∂x_2 ; ∂z_2/∂x_1  ∂z_2/∂x_2 |

Note that

∂x_1/∂z_1 = ∂h_1(z)/∂z_1 and ∂z_1/∂x_1 = ∂g_1(x)/∂x_1
Thus

f_{z1 z2}(z_1, z_2) Area(A_z) = f_{x1 x2}(x_1, x_2) Area(A_x)

⇒ f_{z1 z2}(z_1, z_2) = f_{x1 x2}(x_1, x_2) / (Area(A_z)/Area(A_x)) = f_{x1 x2}(x_1, x_2) / abs( J( z_1 z_2 / x_1 x_2 ) )

General Case Result (Functions of Vectors)

f_z(z) = f_x(x) / abs( J( z / x ) )

where

J( z / x ) = | ∂z_1/∂x_1 ⋯ ∂z_1/∂x_N ; ⋮ ⋱ ⋮ ; ∂z_N/∂x_1 ⋯ ∂z_N/∂x_N |
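A familiar nonlinear instance of this result (an illustrative sketch, not from the notes) is the Cartesian-to-polar map z_1 = r, z_2 = θ applied to a standard bivariate Gaussian, where the Jacobian determinant is 1/r and hence f_z = r · f_x:

```python
import numpy as np

# Sketch: z1 = r = sqrt(x1^2 + x2^2), z2 = theta = atan2(x2, x1).
x1, x2 = 0.6, -0.8
r = np.hypot(x1, x2)

# Analytic Jacobian J(z/x): rows are gradients of r and theta w.r.t. (x1, x2)
J = np.array([[x1 / r,      x2 / r],
              [-x2 / r**2,  x1 / r**2]])
detJ = np.linalg.det(J)                   # equals 1/r

fx = np.exp(-0.5 * (x1**2 + x2**2)) / (2 * np.pi)   # standard Gaussian
fz = fx / abs(detJ)                       # general result: f_z = f_x / |J(z/x)|

print(np.isclose(detJ, 1 / r), np.isclose(fz, r * fx))
```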
Example (Linear Transformation)

Let z and x be linearly related:

z_1 = a_11 x_1 + a_12 x_2
z_2 = a_21 x_1 + a_22 x_2

or z = Ax and x = A^{−1} z. Then

J( z_1 z_2 / x_1 x_2 ) = | ∂z_1/∂x_1  ∂z_1/∂x_2 ; ∂z_2/∂x_1  ∂z_2/∂x_2 | = | a_11  a_12 ; a_21  a_22 | = |A|
Let A^{−1} = [ b_11  b_12 ; b_21  b_22 ]. Then

[ x_1 ; x_2 ] = [ b_11  b_12 ; b_21  b_22 ] [ z_1 ; z_2 ]

and

f_{z1 z2}(z_1, z_2) = f_{x1 x2}(x_1, x_2) / abs|A| = f_{x1 x2}(b_11 z_1 + b_12 z_2, b_21 z_1 + b_22 z_2) / abs|A|

General Case Result (Linear Transformations)

For the case where z = Ax,

f_z(z) = (1 / abs|A|) f_x(A^{−1} z)
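For a Gaussian x this formula can be cross-checked directly: z = Ax is again Gaussian with mean A m_x and covariance A C_x A^T, so both routes to f_z(z) must agree. A sketch with assumed numerical values:

```python
import numpy as np

# Minimal Gaussian pdf helper (real case)
def gauss(x, m, C):
    d = x - m
    N = len(m)
    return np.exp(-0.5 * d @ np.linalg.solve(C, d)) / \
           ((2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(C)))

A = np.array([[2.0, 1.0], [0.5, 3.0]])
m = np.array([1.0, -1.0])
C = np.array([[1.0, 0.3], [0.3, 2.0]])
z = np.array([0.7, 0.2])

# Route 1: transformation formula f_z(z) = f_x(A^{-1} z) / |det A|
f_formula = gauss(np.linalg.solve(A, z), m, C) / abs(np.linalg.det(A))
# Route 2: z is Gaussian with mean A m and covariance A C A^T
f_direct = gauss(z, A @ m, A @ C @ A.T)

print(np.isclose(f_formula, f_direct))   # True
```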
Vector Statistics for Linear Transformations

For such linear transformations, z = Ax:

E{z} = E{Ax} = A m_x

Similarly,

E{zz^H} = E{Ax(Ax)^H} = E{A xx^H A^H} = A E{xx^H} A^H ⇒ R_z = A R_x A^H

By similar arguments it is easy to show

C_z = A C_x A^H

Note: A linear transformation of the vector results in simple linear transformations of its statistics.
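These identities hold exactly for the corresponding sample estimates as well, since Z = AX implies ZZ^H = A XX^H A^H term by term. A sketch, not from the notes:

```python
import numpy as np

# Sketch verifying Rz = A Rx A^H and Cz = A Cx A^H on sample estimates.
rng = np.random.default_rng(3)
A = np.array([[1.0, 2.0], [0.0, -1.0]])
X = rng.standard_normal((2, 300)) + np.array([[0.5], [1.5]])
Z = A @ X

Rx = X @ X.T / X.shape[1]
Rz = Z @ Z.T / Z.shape[1]
mx, mz = X.mean(axis=1, keepdims=True), Z.mean(axis=1, keepdims=True)
Cx = Rx - mx @ mx.T
Cz = Rz - mz @ mz.T

print(np.allclose(Rz, A @ Rx @ A.T), np.allclose(Cz, A @ Cx @ A.T))
```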