Lecture Notes 3
Random Vectors
• Specifying a Random Vector
• Mean and Covariance Matrix
• Coloring and Whitening
• Gaussian Random Vectors
Specifying a Random Vector
• Let X1, X2, . . . , Xn be random variables defined on the same probability space. We define a random vector (RV) as
  X = [X1 X2 · · · Xn]^T
• X is completely specified by its joint cdf for x = (x1, x2, . . . , xn):
  FX(x) = P{X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn},  x ∈ R^n
• If X is continuous, i.e., FX(x) is a continuous function of x, then X can be specified by its joint pdf:
  fX(x) = fX1,X2,...,Xn(x1, x2, . . . , xn),  x ∈ R^n
• If X is discrete then it can be specified by its joint pmf:
  pX(x) = pX1,X2,...,Xn(x1, x2, . . . , xn),  x ∈ 𝒳^n
• A marginal cdf (pdf, pmf) is the joint cdf (pdf, pmf) for a subset of {X1, . . . , Xn}; e.g., for
  X = [X1 X2 X3]^T
the marginals are
  fX1(x1) , fX2(x2) , fX3(x3)
  fX1,X2(x1, x2) , fX1,X3(x1, x3) , fX2,X3(x2, x3)
• The marginals can be obtained from the joint in the usual way. For the previous example,
  FX1(x1) = lim_{x2,x3→∞} FX(x1, x2, x3)
  fX1,X2(x1, x2) = ∫_{−∞}^{∞} fX1,X2,X3(x1, x2, x3) dx3
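For a concrete discrete illustration, here is a minimal numpy sketch (the joint pmf values are made up purely for illustration) that obtains marginals by summing the joint pmf over the remaining coordinates:

```python
import numpy as np

# Hypothetical joint pmf of (X1, X2, X3), each taking values in {0, 1}:
# p[i, j, k] = P{X1 = i, X2 = j, X3 = k}
p = np.array([[[0.10, 0.05], [0.15, 0.10]],
              [[0.05, 0.20], [0.10, 0.25]]])
assert np.isclose(p.sum(), 1.0)

# Marginal pmf of X1: sum out x2 and x3
p_x1 = p.sum(axis=(1, 2))

# Marginal pmf of (X1, X2): sum out x3
p_x1x2 = p.sum(axis=2)

print(p_x1)      # P{X1 = 0}, P{X1 = 1}
print(p_x1x2)    # 2x2 table of P{X1 = i, X2 = j}
```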
• Conditional cdfs (pdfs, pmfs) can also be defined in the usual way. E.g., the conditional pdf of X^n_{k+1} = (Xk+1, . . . , Xn) given X^k = (X1, . . . , Xk) is
  f_{X^n_{k+1} | X^k}(x^n_{k+1} | x^k) = fX(x1, x2, . . . , xn) / fX^k(x1, x2, . . . , xk) = fX(x) / fX^k(x^k)
• Chain Rule: We can write
  fX(x) = fX1(x1) fX2|X1(x2|x1) fX3|X1,X2(x3|x1, x2) · · · fXn|X^{n−1}(xn|x^{n−1})
Proof: By induction. The chain rule holds for n = 2 by definition of conditional pdf. Now suppose it is true for n − 1. Then
  fX(x) = fX^{n−1}(x^{n−1}) fXn|X^{n−1}(xn|x^{n−1})
        = fX1(x1) fX2|X1(x2|x1) · · · f_{X_{n−1}|X^{n−2}}(x_{n−1}|x^{n−2}) fXn|X^{n−1}(xn|x^{n−1}),
which completes the proof.
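As a quick numerical sanity check of the chain rule, the following sketch (reusing a made-up discrete joint pmf) factors p(x1, x2, x3) into p(x1) p(x2|x1) p(x3|x1, x2) and verifies that the product recovers the joint:

```python
import numpy as np

# Hypothetical joint pmf p[i, j, k] = P{X1 = i, X2 = j, X3 = k}
p = np.array([[[0.10, 0.05], [0.15, 0.10]],
              [[0.05, 0.20], [0.10, 0.25]]])

p1 = p.sum(axis=(1, 2))              # p(x1)
p12 = p.sum(axis=2)                  # p(x1, x2)
p2_given_1 = p12 / p1[:, None]       # p(x2 | x1)
p3_given_12 = p / p12[:, :, None]    # p(x3 | x1, x2)

# Chain rule: p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1, x2)
reconstructed = p1[:, None, None] * p2_given_1[:, :, None] * p3_given_12
assert np.allclose(reconstructed, p)
```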
Independence and Conditional Independence
• Independence is defined in the usual way; e.g., X1, X2, . . . , Xn are independent if
  fX(x) = ∏_{i=1}^{n} fXi(xi)  for all (x1, . . . , xn)
• Important special case, i.i.d. r.v.s: X1, X2, . . . , Xn are said to be independent, identically distributed (i.i.d.) if they are independent and have the same marginals
  Example: if we flip a coin n times independently, we generate i.i.d. Bern(p) r.v.s X1, X2, . . . , Xn
• R.v.s X1 and X3 are said to be conditionally independent given X2 if
  fX1,X3|X2(x1, x3|x2) = fX1|X2(x1|x2) fX3|X2(x3|x2)  for all (x1, x2, x3)
• Conditional independence neither implies nor is implied by independence; X1 and X3 independent given X2 does not mean that X1 and X3 are independent (or vice versa)
• Example: Coin with random bias. Given a coin with random bias P ∼ fP(p), flip it n times independently to generate the r.v.s X1, X2, . . . , Xn, where Xi = 1 if the i-th flip is heads and 0 otherwise
◦ X1, X2, . . . , Xn are not independent
◦ However, X1, X2, . . . , Xn are conditionally independent given P ; in fact, they are i.i.d. Bern(p) for every P = p (see the simulation sketch below)
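A small simulation sketch of this example, assuming (only for illustration) a uniform prior on the bias P: marginally the flips are positively correlated, while conditioned on P lying in a narrow band they are nearly uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200_000                                    # number of independent experiments
P = rng.uniform(0, 1, size=m)                  # random bias P ~ Unif[0, 1] (illustrative prior)
X1 = (rng.uniform(size=m) < P).astype(float)   # first flip
X2 = (rng.uniform(size=m) < P).astype(float)   # second flip

# Marginally, Cov(X1, X2) = Var(P) = 1/12 > 0, so X1 and X2 are dependent
print(np.cov(X1, X2)[0, 1])                    # ~ 0.083

# Conditioned on P being (nearly) fixed, the flips are (nearly) uncorrelated
mask = (P > 0.45) & (P < 0.55)
print(np.cov(X1[mask], X2[mask])[0, 1])        # ~ 0 (up to the small residual spread in P)
```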
• Example: Additive noise channel. Consider an additive noise channel with signal X , noise Z , and observation Y = X + Z , where X and Z are independent
◦ Although X and Z are independent, they are not in general conditionally independent given Y
Mean and Covariance Matrix
• The mean of the random vector X is defined as
  E(X) = [E(X1) E(X2) · · · E(Xn)]^T
• Denote the covariance between Xi and Xj , Cov(Xi, Xj), by σij (so the variance of Xi is denoted by σii, Var(Xi), or σXi^2)
• The covariance matrix of X is defined as
  ΣX = [ σ11 σ12 · · · σ1n ]
       [ σ21 σ22 · · · σ2n ]
       [  ⋮    ⋮   ⋱    ⋮  ]
       [ σn1 σn2 · · · σnn ]
• For n = 2, we can use the definition of correlation coefficient to obtain
  ΣX = [ σ11 σ12 ]  =  [ σX1^2             ρX1,X2 σX1 σX2 ]
       [ σ21 σ22 ]     [ ρX1,X2 σX1 σX2    σX2^2          ]
Properties of Covariance Matrix ΣX
• ΣX is real and symmetric (since σij = σji)
• ΣX is positive semidefinite, i.e., the quadratic form
  a^T ΣX a ≥ 0  for every real vector a
Equivalently, all the eigenvalues of ΣX are nonnegative, and also all principal minors are nonnegative
• To show that ΣX is positive semidefinite we write
  ΣX = E[(X − E(X))(X − E(X))^T],
i.e., as the expectation of an outer product. Thus
  a^T ΣX a = a^T E[(X − E(X))(X − E(X))^T] a
           = E[a^T (X − E(X))(X − E(X))^T a]
           = E[(a^T (X − E(X)))^2] ≥ 0
Which of the Following Can Be a Covariance Matrix?
1.  [ 1 0 0 ]
    [ 0 1 0 ]
    [ 0 0 1 ]

2.  [ 1 2 1 ]
    [ 2 1 1 ]
    [ 1 1 1 ]

3.  [ 1 0 1 ]
    [ 1 2 1 ]
    [ 0 1 3 ]

4.  [ −1 1 1 ]
    [  1 1 1 ]
    [  1 1 1 ]

5.  [ 1 1 1 ]
    [ 1 2 1 ]
    [ 1 1 3 ]

6.  [ 1 2 3 ]
    [ 2 4 6 ]
    [ 3 6 9 ]
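A matrix is a valid covariance matrix iff it is symmetric with nonnegative eigenvalues (i.e., positive semidefinite). A short numpy sketch that checks the candidates above:

```python
import numpy as np

def is_valid_covariance(S, tol=1e-10):
    """Check symmetry and positive semidefiniteness via eigenvalues."""
    S = np.asarray(S, dtype=float)
    if not np.allclose(S, S.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(S) >= -tol))

candidates = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 2, 1], [2, 1, 1], [1, 1, 1]],
    [[1, 0, 1], [1, 2, 1], [0, 1, 3]],
    [[-1, 1, 1], [1, 1, 1], [1, 1, 1]],
    [[1, 1, 1], [1, 2, 1], [1, 1, 3]],
    [[1, 2, 3], [2, 4, 6], [3, 6, 9]],
]
for k, S in enumerate(candidates, 1):
    print(k, is_valid_covariance(S))
# Expected: only 1, 5, and 6 are valid; 2 has a negative eigenvalue,
# 3 is not symmetric, and 4 has a negative diagonal entry
```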
Coloring and Whitening
• Square root of a covariance matrix: Let Σ be a covariance matrix. Then there exists an n × n matrix Σ^{1/2} such that Σ = Σ^{1/2}(Σ^{1/2})^T. The matrix Σ^{1/2} is called the square root of Σ
• Coloring: Let X be a white RV, i.e., one with zero mean and ΣX = aI , a > 0. Assume without loss of generality that a = 1
Let Σ be a covariance matrix; then the RV Y = Σ^{1/2}X has covariance matrix Σ (why?)
Hence we can generate a RV with any prescribed covariance from a white RV
• Whitening: Given a zero mean RV Y with nonsingular covariance matrix Σ, the RV X = Σ^{−1/2}Y is white
Hence, we can generate a white RV from any RV with nonsingular covariance matrix
• Coloring and whitening have applications in simulations, detection, and estimation
Finding Square Root of Σ
• For convenience, we assume throughout that Σ is nonsingular
• Since Σ is symmetric, it has n real eigenvalues λ1, λ2, . . . , λn and n corresponding orthogonal eigenvectors u1, u2, . . . , un
Further, since Σ is positive definite, the eigenvalues are all positive
• Thus, we have
  Σui = λi ui,  λi > 0,  i = 1, 2, . . . , n
  ui^T uj = 0  for every i ≠ j
Without loss of generality assume that the ui vectors are unit vectors
• The first set of equations can be rewritten in the matrix form
ΣU = UΛ,
where
U = [u1 u2 . . . un]
and Λ is a diagonal matrix with diagonal elements λi
• Note that U is a unitary matrix (U^T U = U U^T = I), hence
  Σ = U Λ U^T
and the square root of Σ is
  Σ^{1/2} = U Λ^{1/2},
where Λ^{1/2} is a diagonal matrix with diagonal elements λi^{1/2}
• The inverse of the square root is straightforward to find as
  Σ^{−1/2} = Λ^{−1/2} U^T
• Example: Let
  Σ = [ 2 1 ]
      [ 1 3 ]
To find the eigenvalues of Σ, we find the roots of the polynomial equation
  det(Σ − λI) = λ^2 − 5λ + 5 = 0,
which gives λ1 = 3.62, λ2 = 1.38
To find the eigenvectors, consider
  [ 2 1 ] [ u11 ]  =  3.62 [ u11 ]
  [ 1 3 ] [ u12 ]          [ u12 ]
and u11^2 + u12^2 = 1, which yields
  u1 = [ 0.53 ]
       [ 0.85 ]
Similarly, we can find the second eigenvector
  u2 = [ −0.85 ]
       [  0.53 ]
Hence,
  Σ^{1/2} = U Λ^{1/2} = [ 0.53 −0.85 ] [ √3.62    0   ]  =  [ 1     −1   ]
                        [ 0.85  0.53 ] [   0    √1.38 ]     [ 1.62  0.62 ]
The inverse of the square root is
  Σ^{−1/2} = Λ^{−1/2} U^T = [ 1/√3.62     0     ] [  0.53 0.85 ]  =  [  0.28 0.45 ]
                            [    0     1/√1.38  ] [ −0.85 0.53 ]     [ −0.72 0.45 ]
Geometric Interpretation
• To generate a RV Y with covariance matrix Σ from a white RV X, we use the transformation Y = U Λ^{1/2} X
• Equivalently, we first scale each component of X to obtain the RV Z = Λ^{1/2} X, and then rotate Z using U to obtain Y = U Z
• We can visualize this by plotting the contours x^T I x = c, z^T Λ^{−1} z = c, and y^T Σ^{−1} y = c
[Figure: contours in the (x1, x2), (z1, z2), and (y1, y2) planes. The circle in the x-plane is scaled into an axis-aligned ellipse in the z-plane via Z = Λ^{1/2}X, then rotated into a tilted ellipse in the y-plane via Y = UZ; the inverse maps are X = Λ^{−1/2}Z and Z = U^T Y]
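A small matplotlib sketch of this picture (using the Σ from the example above): points on the unit circle are scaled by Λ^{1/2} and then rotated by U, tracing out the ellipse y^T Σ^{−1} y = 1:

```python
import numpy as np
import matplotlib.pyplot as plt

Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, U = np.linalg.eigh(Sigma)

theta = np.linspace(0, 2 * np.pi, 400)
X = np.vstack([np.cos(theta), np.sin(theta)])   # x^T x = 1 (circle)
Z = np.diag(np.sqrt(lam)) @ X                   # z = Lambda^{1/2} x (axis-aligned ellipse)
Y = U @ Z                                       # y = U z (rotated ellipse, y^T Sigma^{-1} y = 1)

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for ax, pts, title in zip(axes, [X, Z, Y], ["white: x", "scaled: z", "colored: y"]):
    ax.plot(pts[0], pts[1])
    ax.set_aspect("equal")
    ax.set_title(title)
plt.show()
```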
Cholesky Decomposition
• Σ has many square roots: if Σ^{1/2} is a square root, then for any unitary matrix V , Σ^{1/2}V is also a square root, since Σ^{1/2}V V^T (Σ^{1/2})^T = Σ
• The Cholesky decomposition is an efficient algorithm for computing a lower triangular square root, which can be used to perform coloring causally (sequentially)
• For n = 3, we want to find a lower triangular matrix (square root) A such that
  Σ = [ σ11 σ12 σ13 ]   [ a11  0   0  ] [ a11 a21 a31 ]
      [ σ21 σ22 σ23 ] = [ a21 a22  0  ] [  0  a22 a32 ]
      [ σ31 σ32 σ33 ]   [ a31 a32 a33 ] [  0   0  a33 ]
The elements of A are computed in a raster scan manner:
  a11:  σ11 = a11^2                  ⇒  a11 = √σ11
  a21:  σ21 = a21 a11                ⇒  a21 = σ21/a11
  a22:  σ22 = a21^2 + a22^2          ⇒  a22 = √(σ22 − a21^2)
  a31:  σ31 = a11 a31                ⇒  a31 = σ31/a11
  a32:  σ32 = a21 a31 + a22 a32      ⇒  a32 = (σ32 − a21 a31)/a22
  a33:  σ33 = a31^2 + a32^2 + a33^2  ⇒  a33 = √(σ33 − a31^2 − a32^2)
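numpy's built-in Cholesky routine returns exactly this kind of lower triangular square root. A brief sketch (the 3 × 3 Σ below is an arbitrary positive definite matrix chosen for illustration):

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 5.0, 3.0],
                  [1.0, 3.0, 6.0]])

A = np.linalg.cholesky(Sigma)           # lower triangular square root
assert np.allclose(A @ A.T, Sigma)

# The hand formulas above give the same entries, e.g.
assert np.isclose(A[0, 0], np.sqrt(Sigma[0, 0]))               # a11 = sqrt(sigma11)
assert np.isclose(A[1, 0], Sigma[1, 0] / A[0, 0])              # a21 = sigma21 / a11
assert np.isclose(A[1, 1], np.sqrt(Sigma[1, 1] - A[1, 0]**2))  # a22

# Causal coloring: each Y_i depends only on X_1, ..., X_i
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 100_000))   # white samples
Y = A @ X
print(np.cov(Y))                        # ~ Sigma
```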
• The inverse of a lower triangular square root is also lower triangular
• Coloring and whitening summary:
◦ Coloring: X with ΣX = I  →  Σ^{1/2}  →  Y with ΣY = Σ
◦ Whitening: Y with ΣY = Σ  →  Σ^{−1/2}  →  X with ΣX = I
◦ The lower triangular square root and its inverse can be efficiently computed using the Cholesky decomposition
Gaussian Random Vectors
• A random vector X = (X1, . . . , Xn) is a Gaussian random vector (GRV) (or X1, X2, . . . , Xn are jointly Gaussian r.v.s) if the joint pdf is of the form
  fX(x) = (1 / ((2π)^{n/2} |Σ|^{1/2})) e^{−(1/2)(x − µ)^T Σ^{−1} (x − µ)},
where µ is the mean and Σ is the covariance matrix of X, and |Σ| > 0, i.e., Σ is positive definite
• Verify that this joint pdf is the same as the case n = 2 from Lecture Notes 2
• Notation: X ∼ N (µ,Σ) denotes a GRV with given mean and covariance matrix
• Since Σ is positive definite, Σ^{−1} is positive definite. Thus if x − µ ≠ 0,
  (x − µ)^T Σ^{−1} (x − µ) > 0,
which means that the contours of equal pdf are ellipsoids
• The GRV X ∼ N (0, aI), where I is the identity matrix and a > 0, is called white; its contours of equal joint pdf are spheres centered at the origin
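As a sanity check of the density formula, the following sketch evaluates it directly and compares the result with scipy.stats.multivariate_normal (used here only as an independent cross-check; the µ, Σ, and x values are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
x = np.array([0.5, 1.5])

# Joint pdf of a GRV, written out from the formula above
n = len(mu)
d = x - mu
pdf = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))

assert np.isclose(pdf, multivariate_normal(mean=mu, cov=Sigma).pdf(x))
print(pdf)
```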
Properties of GRVs
• Property 1: For a GRV, uncorrelatedness implies independence
This can be verified by substituting σij = 0 for all i ≠ j in the joint pdf.
Then Σ becomes diagonal and so does Σ^{−1}, and the joint pdf reduces to the product of the marginals, with Xi ∼ N (µi, σii)
For the white GRV X ∼ N (0, aI), the r.v.s are i.i.d. N (0, a)
• Property 2: A linear transformation of a GRV yields a GRV, i.e., given any m × n matrix A, where m ≤ n and A has full rank m,
  Y = AX ∼ N (Aµ, AΣA^T)
• Example: Let
  X ∼ N( 0 ,  [ 2 1 ] )
              [ 1 3 ]
Find the joint pdf of
  Y = [ 1 1 ] X
      [ 1 0 ]
Solution: From Property 2, we conclude that
  Y ∼ N( 0 ,  [ 1 1 ] [ 2 1 ] [ 1 1 ] )  =  N( 0 ,  [ 7 3 ] )
              [ 1 0 ] [ 1 3 ] [ 1 0 ]              [ 3 2 ]
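A quick numerical confirmation of this example, both by computing AΣA^T directly and by estimating the covariance of Y = AX from samples:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 0.0]])
Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])

print(A @ Sigma @ A.T)                  # [[7, 3], [3, 2]]

# Monte Carlo check: transform samples of X ~ N(0, Sigma) and estimate Cov(Y)
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=200_000).T
Y = A @ X
print(np.cov(Y))                        # ~ [[7, 3], [3, 2]]
```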
Before we prove Property 2, let us show that
  E(Y) = Aµ  and  ΣY = AΣA^T
These results follow from linearity of expectation. First, the expectation:
  E(Y) = E(AX) = A E(X) = Aµ
Next consider the covariance matrix:
  ΣY = E[(Y − E(Y))(Y − E(Y))^T]
     = E[(AX − Aµ)(AX − Aµ)^T]
     = A E[(X − µ)(X − µ)^T] A^T = AΣA^T
Of course this is not sufficient to show that Y is a GRV; we must also show that the joint pdf has the right form
We do so using the characteristic function for a random vector
• Definition: If X ∼ fX(x), the characteristic function of X is
  ΦX(ω) = E(e^{iω^T X}),
where ω is an n-dimensional real valued vector and i = √−1
Thus
  ΦX(ω) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fX(x) e^{iω^T x} dx
This is the inverse of the multi-dimensional Fourier transform of fX(x), which implies that there is a one-to-one correspondence between ΦX(ω) and fX(x). The joint pdf can be found by taking the Fourier transform of ΦX(ω), i.e.,
  fX(x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} (1/(2π)^n) ΦX(ω) e^{−iω^T x} dω
• Example: The characteristic function for X ∼ N (µ, σ^2) is
  ΦX(ω) = e^{−(1/2)ω^2 σ^2 + iµω},
and for a GRV X ∼ N (µ, Σ),
  ΦX(ω) = e^{−(1/2)ω^T Σ ω + iω^T µ}
• Now let’s go back to proving Property 2
Since A is an m × n matrix, Y = AX and ω are m-dimensional. Therefore the characteristic function of Y is
  ΦY(ω) = E(e^{iω^T Y})
        = E(e^{iω^T AX})
        = ΦX(A^T ω)
        = e^{−(1/2)(A^T ω)^T Σ (A^T ω) + iω^T Aµ}
        = e^{−(1/2)ω^T (AΣA^T) ω + iω^T Aµ}
Thus Y = AX ∼ N (Aµ, AΣA^T)
• An equivalent definition of a GRV: X is a GRV iff for every real vector a ≠ 0, the r.v. Y = a^T X is Gaussian (see HW for proof)
• Whitening transforms a GRV into a white GRV; conversely, coloring transforms a white GRV into a GRV with prescribed covariance matrix
• Property 3: Marginals of a GRV are Gaussian, i.e., if X is a GRV then for any subset {i1, i2, . . . , ik} ⊂ {1, 2, . . . , n} of indexes, the RV
  Y = [Xi1 Xi2 · · · Xik]^T
is a GRV
• To show this we use Property 2. For example, let n = 3 and Y = [X1 X3]^T
We can express Y as a linear transformation of X:
  Y = [ 1 0 0 ] [ X1 ]     [ X1 ]
      [ 0 0 1 ] [ X2 ]  =  [ X3 ]
                [ X3 ]
Therefore
  Y ∼ N( [ µ1 ] ,  [ σ11 σ13 ] )
         [ µ3 ]    [ σ31 σ33 ]
• As we have seen in Lecture Notes 2, the converse of Property 3 does not hold in general, i.e., Gaussian marginals do not necessarily mean that the r.v.s are jointly Gaussian
• Property 4: Conditionals of a GRV are Gaussian; more specifically, if
  X = [ X1 ]  ∼  N( [ µ1 ] ,  [ Σ11 Σ12 ] )
      [ X2 ]        [ µ2 ]    [ Σ21 Σ22 ]
where X1 is a k-dim RV and X2 is an (n − k)-dim RV, then
  X2 | {X1 = x} ∼ N( Σ21 Σ11^{−1} (x − µ1) + µ2 ,  Σ22 − Σ21 Σ11^{−1} Σ12 )
Compare this to the case of n = 2 and k = 1:
  X2 | {X1 = x} ∼ N( (σ21/σ11)(x − µ1) + µ2 ,  σ22 − σ12^2/σ11 )
• Example: Consider
  [ X1 ]        [ 1 ]    [ 1  2  1 ]
  [ X2 ]  ∼  N( [ 2 ] ,  [ 2  5  2 ] )
  [ X3 ]        [ 2 ]    [ 1  2  9 ]
partitioned with k = 1, i.e., Σ11 = 1, Σ12 = [2 1], Σ21 = [2 1]^T, Σ22 = [5 2; 2 9]
From Property 4, it follows that
  E(X2 | X1 = x) = [ 2 ] (x − 1) + [ 2 ]  =  [ 2x    ]
                   [ 1 ]           [ 2 ]     [ x + 1 ]
  Σ_{X2|X1=x} = [ 5 2 ] − [ 2 ] [ 2 1 ]  =  [ 1 0 ]
                [ 2 9 ]   [ 1 ]             [ 0 8 ]
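The same computation in numpy, using the block formulas of Property 4 (here k = 1, so Σ11 is a 1 × 1 block; the helper function below is hypothetical, written only for this illustration):

```python
import numpy as np

mu1, mu2 = np.array([1.0]), np.array([2.0, 2.0])
S11 = np.array([[1.0]])
S12 = np.array([[2.0, 1.0]])
S21 = S12.T
S22 = np.array([[5.0, 2.0], [2.0, 9.0]])

def conditional(x):
    """Mean and covariance of X2 given X1 = x (block formulas of Property 4)."""
    mean = S21 @ np.linalg.solve(S11, np.atleast_1d(x) - mu1) + mu2
    cov = S22 - S21 @ np.linalg.solve(S11, S12)
    return mean, cov

mean, cov = conditional(3.0)
print(mean)   # [6., 4.]  i.e. [2x, x + 1] at x = 3
print(cov)    # [[1., 0.], [0., 8.]]
```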
• The proof of Property 4 follows from Properties 1 and 2 and the orthogonality principle (HW exercise)