EE401 (Semester 1) Jitkomut Songsiri
1. Review of Probability
• Random Experiments
• The Axioms of Probability
• Conditional Probability
• Independence of Events
• Sequential Experiments
1-1
Random Experiments
An experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions
Examples:
• Select a ball from an urn containing balls numbered 1 to n
• Toss a coin and note the outcome
• Roll a die and note the outcome
• Measure the time between page requests in a Web server
• Pick a number at random between 0 and 1
Review of Probability 1-2
Sample space
Sample space is the set of all possible outcomes, denoted by S
• obtained by listing all the elements, e.g., S = {H,T}, or
• giving a property that specifies the elements, e.g., S = {x | 0 ≤ x ≤ 3}
Same experimental procedure may have different sample spaces
(figure: sample space S1 is the unit square 0 ≤ x, y ≤ 1; sample space S2 is the triangular region 0 ≤ y ≤ x ≤ 1)
• Experiment 1: Pick two numbers at random between zero and one
• Experiment 2: Pick a number X at random between 0 and 1, then pick a number Y at random between 0 and X
Review of Probability 1-3
Three possibilities for the number of outcomes in sample spaces
finite, countably infinite, uncountably infinite
Examples:
S1 = {1, 2, 3, . . . , 10}
S2 = {HH,HT,TT,TH}
S3 = {x ∈ Z | 0 ≤ x ≤ 10}
S4 = {1, 2, 3, . . .}
S5 = {(x, y) ∈ R × R | 0 ≤ y ≤ x ≤ 1}
S6 = set of functions X(t) for which X(t) = 0 for t ≥ t0
Discrete sample space: if S is countable (S1, S2, S3, S4)
Continuous sample space: if S is not countable (S5, S6)
Review of Probability 1-4
Events
An event is a subset of a sample space, consisting of the outcomes that satisfy certain conditions
Examples: Ak denotes an event corresponding to the experiment Ek
E1 : Select a ball from an urn containing balls numbered 1 to 10
A1 : An even-numbered ball (from 1 to 10) is selected
S1 = {1, 2, 3, . . . , 10}, A1 = {2, 4, 6, 8, 10}
E2 : Toss a coin twice and note the sequence of heads and tails
A2 : The two tosses give the same outcome
S2 = {HH,HT,TT,TH}, A2 = {HH,TT}
Review of Probability 1-5
E3 : Count # of voice packets containing only silence from 10 speakers
A3 : No active packets are produced
S3 = {x ∈ Z | 0 ≤ x ≤ 10}, A3 = {0}
Two events of special interest:
• Certain event, S, which consists of all outcomes and hence always occurs
• Impossible event or null event, ∅, which contains no outcomes and never occurs
Review of Probability 1-6
Review of Set Theory
• A = B if and only if A ⊂ B and B ⊂ A
• A ∪B (union): set of outcomes that are in A or in B
• A ∩B (intersection): set of outcomes that are in A and in B
• A and B are disjoint or mutually exclusive if A ∩ B = ∅
• Ac (complement): set of all elements not in A
• A ∪B = B ∪A and A ∩B = B ∩A
• A ∪ (B ∪ C) = (A ∪B) ∪ C and A ∩ (B ∩ C) = (A ∩B) ∩ C
• A∪ (B ∩C) = (A∪B)∩ (A∪C) and A∩ (B ∪C) = (A∩B)∪ (A∩C)
• DeMorgan’s Rules
(A ∪B)c = Ac ∩Bc, (A ∩B)c = Ac ∪Bc
Review of Probability 1-7
Axioms of Probability
Probabilities are numbers assigned to events indicating how likely it is that the events will occur
A probability law is a rule that assigns a number P(A) to each event A
P(A) is called the probability of A and satisfies the following axioms
Axiom 1 P (A) ≥ 0
Axiom 2 P (S) = 1
Axiom 3 If A ∩B = ∅ then P (A ∪B) = P (A) + P (B)
Review of Probability 1-8
Probability Facts
• P (Ac) = 1− P (A)
• P (A) ≤ 1
• P (∅) = 0
• If A1, A2, . . . , An are pairwise mutually exclusive then
P(A1 ∪ A2 ∪ · · · ∪ An) = P(A1) + P(A2) + · · · + P(An)
• P (A ∪B) = P (A) + P (B)− P (A ∩B)
• If A ⊂ B then P (A) ≤ P (B)
Review of Probability 1-9
Conditional Probability
The probability of event A given that event B has occurred
The conditional probability, P (A|B), is defined as
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
If B is known to have occurred, then A can occur only if A ∩ B occurs
Simply renormalizes the probability of events that occur jointly with B
Useful in finding probabilities in sequential experiments
Review of Probability 1-10
Example: Tree diagram of picking balls
Selecting two balls at random without replacement
(tree diagram: the first draw is B1 with probability 2/5 or W1 with probability 3/5; the second-draw branches have probabilities 1/4, 3/4, 2/4, 2/4, giving path probabilities 1/10, 3/10, 3/10, 3/10)
B1, B2 are the events of getting a black ball in the first and second draw
P(B2|B1) = 1/4, P(W2|B1) = 3/4, P(B2|W1) = 2/4, P(W2|W1) = 2/4
The probability of a path is the product of the transition probabilities along the path
P(B1 ∩ B2) = P(B2|B1)P(B1) = (1/4)(2/5) = 1/10
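A minimal Python sketch (not part of the original slides; the urn contents of 2 black and 3 white balls are inferred from the branch probabilities) can check these path probabilities by simulation:

import random

trials = 100_000
both_black = 0
second_white = 0
for _ in range(trials):
    urn = ["B", "B", "W", "W", "W"]   # 2 black, 3 white
    random.shuffle(urn)
    first, second = urn[0], urn[1]     # two draws without replacement
    both_black += (first == "B" and second == "B")
    second_white += (second == "W")

print(both_black / trials)    # close to P(B1 and B2) = 1/10
print(second_white / trials)  # close to P(W2) = 3/5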
Review of Probability 1-11
Example: Tree diagram of Binary Communication
(tree diagram: input 0 with probability 1 − p and input 1 with probability p; each input is received correctly with probability 1 − ε and flipped with probability ε, giving path probabilities (1 − p)(1 − ε), (1 − p)ε, pε, p(1 − ε))
Ai: event the input was i, Bi: event the receiver output was i
P (A0 ∩B0) = (1− p)(1− ε)
P (A0 ∩B1) = (1− p)ε
P (A1 ∩B0) = pε
P (A1 ∩B1) = p(1− ε)
Review of Probability 1-12
Theorem on Total Probability
Let B1, B2, . . . , Bn be mutually exclusive events such that
S = B1 ∪B2 ∪ · · · ∪Bn
(their union equals the sample space)
Event A can be partitioned as
A = A ∩ S = (A ∩B1) ∪ (A ∩B2) ∪ · · · ∪ (A ∩Bn)
Since A ∩Bk are disjoint, the probability of A is
P (A) = P (A ∩B1) + P (A ∩B2) + · · ·+ P (A ∩Bn)
or equivalently,
P (A) = P (A|B1)P (B1) + P (A|B2)P (B2) + · · ·+ P (A|Bn)P (Bn)
Review of Probability 1-13
Example: revisit the tree diagram of picking two balls
(same tree diagram as before: first draw B1 (prob 2/5) or W1 (prob 3/5); second-draw branch probabilities 1/4, 3/4, 2/4, 2/4; path probabilities 1/10, 3/10, 3/10, 3/10)
Find the probability of the event that the second ball is white
P(W2) = P(W2|B1)P(B1) + P(W2|W1)P(W1) = (3/4)(2/5) + (1/2)(3/5) = 3/5
Review of Probability 1-14
Bayes’ Rule
The conditional probability of event A given B is related to the inverse conditional probability of event B given A by
P(A|B) = P(B|A)P(A)/P(B)
• P (A) is called a priori probability
• P (A|B) is called a posteriori probability
Let A1, A2, . . . , An be a partition of S
P(Ai|B) = P(B|Ai)P(Ai) / (P(B|A1)P(A1) + · · · + P(B|An)P(An))
Review of Probability 1-15
Example: Binary Channel
(binary channel diagram: inputs 0 and 1; each bit is received correctly with probability 1 − ε and flipped with probability ε)
Ai: event the input was i
Bi: event the receiver output was i
Input is equally likely to be 0 or 1
P (B1) = P (B1|A0)P (A0)+P (B1|A1)P (A1) = ε(1/2)+(1−ε)(1/2) = 1/2
Applying Bayes’ Rule, we obtain
P(A0|B1) = P(B1|A0)P(A0)/P(B1) = (ε/2)/(1/2) = ε
If ε < 1/2, input 1 is more likely than 0 when 1 is observed
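A short Python sketch (illustrative only, not from the slides) evaluates this posterior for an arbitrary input prior p = P(A1):

def posterior_input0_given_output1(p, eps):
    # Bayes' rule: P(A0|B1) = P(B1|A0)P(A0) / P(B1)
    prior0, prior1 = 1 - p, p
    pB1 = eps * prior0 + (1 - eps) * prior1   # total probability
    return eps * prior0 / pB1

print(posterior_input0_given_output1(p=0.5, eps=0.1))  # equals eps = 0.1 for equally likely inputs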
Review of Probability 1-16
Independence of Events
Events A and B are independent if
P (A ∩B) = P (A)P (B)
• Knowledge of event B does not alter the probability of event A
• This implies P (A|B) = P (A)
(figure: events A, B, and C shown as shaded regions of the unit square in the (x, y) plane)
A and B are independent; A and C are not independent
Review of Probability 1-17
Example: System Reliability
(block diagram: controller connected to peripheral units 1, 2, and 3)
• System is ’up’ if the controller and at least two units are functioning
• Controller fails with probability p
• Peripheral unit fails with probability a
• All components fail independently
• A: event the controller is functioning
• Bi: event unit i is functioning
• F : event two or more peripheral units are functioning
Find the probability that the system is up
Review of Probability 1-18
The event F can be partitioned as
F = (B1 ∩ B2 ∩ B3^c) ∪ (B1 ∩ B2^c ∩ B3) ∪ (B1^c ∩ B2 ∩ B3) ∪ (B1 ∩ B2 ∩ B3)
Thus,
P(F) = P(B1)P(B2)P(B3^c) + P(B1)P(B2^c)P(B3) + P(B1^c)P(B2)P(B3) + P(B1)P(B2)P(B3)
     = 3(1 − a)^2 a + (1 − a)^3
P(system is up) = P(A ∩ F) = P(A)P(F) = (1 − p)P(F) = (1 − p){3(1 − a)^2 a + (1 − a)^3}
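A small Python sketch (not from the slides; the failure probabilities p = 0.01 and a = 0.1 are illustrative) evaluates this reliability expression:

def prob_system_up(p, a):
    # P(F): at least two of the three independent units work (a unit fails with probability a)
    pF = 3 * (1 - a) ** 2 * a + (1 - a) ** 3
    # the controller works with probability (1 - p), independently of the units
    return (1 - p) * pF

print(prob_system_up(p=0.01, a=0.1))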
Review of Probability 1-19
Sequential Independent Experiments
• Consider a random experiment consisting of n independent experiments
• Let A1, A2, . . . , An be events of the experiments
• We can compute the probability of events of the sequential experiment
P (A1 ∩A2 ∩ · · · ∩An) = P (A1)P (A2) . . . P (An)
• Example: Bernoulli trial
– Perform an experiment and note if the event A occurs
– The outcome is “success” or “failure”
– The probability of success is p and failure is 1 − p
Review of Probability 1-20
Binomial Probability
• Perform n Bernoulli trials and observe the number of successes
• Let X be the number of successes in n trials
• The probability of X is given by the Binomial probability law
P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k}, for k = 0, 1, . . . , n
• The binomial coefficient
\binom{n}{k} = n!/(k!(n − k)!)
is the number of ways of picking k out of n for the successes
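A minimal Python sketch (illustrative, not part of the slides) computes this PMF and checks that it sums to 1:

from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(3, 10, 1/3))
print(sum(binomial_pmf(k, 10, 1/3) for k in range(11)))  # = 1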
Review of Probability 1-21
Example: Error Correction Coding
(binary channel diagram with crossover probability ε)
• Transmit each bit three times
• Decoder takes a majority vote of the received bits
Compute the probability that the receiver makes an incorrect decision
• View each transmission as a Bernoulli trial
• Let X be the number of wrong bits from the receiver
P(X ≥ 2) = \binom{3}{2} ε^2 (1 − ε) + \binom{3}{3} ε^3
Review of Probability 1-22
Multinomial Probability
• Generalize the binomial probability law to the occurrence of more than one event
• Let B1, B2, . . . , Bm be possible events with
P (Bk) = pk, and p1 + p2 + · · ·+ pm = 1
• Suppose n independent repetitions of the experiment are performed
• Let Xj be the number of times each Bj occurs
• The probability of the vector (X1,X2, . . . ,Xm) is given by
P(X1 = k1, X2 = k2, . . . , Xm = km) = \frac{n!}{k_1! k_2! \cdots k_m!} p_1^{k_1} p_2^{k_2} \cdots p_m^{k_m}
where k1 + k2 + · · · + km = n
Review of Probability 1-23
Geometric Probability
• Repeat independent Bernoulli trials until the first success occurs
• Let X be the number of trials until the occurrence of the first success
• The probability of this event is called the geometric probability law
P (X = k) = (1− p)k−1p, for k = 1, 2, . . .
• The geometric probabilities sum to 1:
\sum_{k=1}^{∞} P(X = k) = p \sum_{k=1}^{∞} q^{k−1} = p/(1 − q) = 1, where q = 1 − p
• The probability that more than n trials are required before a success:
P(X > n) = (1 − p)^n
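A short Python simulation (an illustrative sketch, not from the slides) confirms the tail formula:

import random

p, n, trials = 1/3, 4, 100_000

def trials_until_success(p):
    k = 1
    while random.random() >= p:   # failure occurs with probability 1 - p
        k += 1
    return k

estimate = sum(trials_until_success(p) > n for _ in range(trials)) / trials
print(estimate, (1 - p) ** n)     # both close to P(X > n) = (1 - p)^n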
Review of Probability 1-24
Example: Error Control by Retransmission
• A sends a message to B over a radio link
• B can detect if the messages have errors
• The probability of transmission error is q
• Find the probability that a message needs to be transmitted more than two times
Each transmission is a Bernoulli trial with probability of success p = 1− q
The probability that more than 2 transmissions are required is
P(X > 2) = q^2
Review of Probability 1-25
Sequential Dependent Experiments
Sequence of subexperiments in which the outcome of a given subexperiment determines which subexperiment is performed next
Example: Select the urn for the first draw by flipping a fair coin
(figure: two urns, each containing balls labeled 0 and 1 in different proportions)
Draw a ball, note the number on the ball and replace it back in its urn
The urn used in the next experiment depends on # of the ball selected
Review of Probability 1-26
Trellis Diagram
(trellis diagram: the possible sequences of outcomes 0 and 1 over successive draws; branch labels are the transition probabilities 2/3, 1/3, 1/6, 5/6 and the initial probabilities 1/2)
The probability of a sequence of outcomes is the product of the probabilities along the path
Review of Probability 1-27
Markov Chains
Let A1, A2, . . . , An be a sequence of events from n sequential experiments
The probability of a sequence of events is given by
P (A1A2 · · ·An) = P (An|A1A2 · · ·An−1)P (A1A2 · · ·An−1)
If the outcome An−1 alone determines which subexperiment is performed nth, and hence An, then
P (An|A1A2 · · ·An−1) = P (An|An−1)
and the sequential experiments are called Markov Chains
Thus,
P (A1A2 · · ·An) = P (An|An−1)P (An−1|An−2) · · ·P (A2|A1)P (A1)
Find P (0011) in the urn example
Review of Probability 1-28
The probability of the sequence 0011 is given by
P (0011) = P (1|1)P (1|0)P (0|0)P (0)
where the transition probabilities are
P(1|1) = 5/6, P(1|0) = 1/3, P(0|0) = 2/3
and the initial probability is given by
P(0) = 1/2
Hence,
P(0011) = (5/6)·(1/3)·(2/3)·(1/2) = 5/54
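A minimal Python sketch (not part of the slides) evaluates such sequence probabilities for the urn chain:

# P(cur|prev) for the urn example, plus the initial probabilities
P = {("0", "0"): 2/3, ("0", "1"): 1/3, ("1", "0"): 1/6, ("1", "1"): 5/6}
init = {"0": 1/2, "1": 1/2}

def seq_prob(seq):
    # P(a1 a2 ... an) = P(a1) * product of P(a_k | a_{k-1})
    prob = init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= P[(prev, cur)]
    return prob

print(seq_prob("0011"))   # 5/54, about 0.0926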
Review of Probability 1-29
References
Chapter 2 in A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edition, Pearson Prentice Hall, 2009
Review of Probability 1-30
EE401 (Semester 1) Jitkomut Songsiri
2. Random Variables
• definition
• probability measures: CDF, PMF, PDF
• expected values and moments
• examples of RVs
2-1
Definition
(figure: a random variable X maps an outcome ζ in the sample space S to a real number x = X(ζ) on the real line; SX is the range of X)
a random variable X is a function mapping an outcome to a real number
• the sample space, S, is the domain of the random variable
• SX is the range of the random variable
example: toss a coin three times and note the sequence of heads and tails
S = {HHH,HHT,HTH,THH,HTT,THT,TTH,TTT}
Let X be the number of heads in the three tosses
SX = {0, 1, 2, 3}
Random Variables 2-2
Types of Random Variables
Discrete RVs take values from a countable set
example: let X be the number of times a message needs to be transmitted until it arrives correctly
SX = {1, 2, 3, . . .}
Continuous RVs take values in a continuous range (uncountably many possible values)
example: let X be the time it takes before receiving the next phone call
Mixed RVs have some part taking values over an interval like typical continuous variables, and part concentrated on particular values like discrete variables
Random Variables 2-3
Probability measures
Cumulative distribution function (CDF)
F (a) = P (X ≤ a)
Probability mass function (PMF) for discrete RVs
p(k) = P (X = k)
Probability density function (PDF) for continuous RVs
f(x) = dF(x)/dx
Random Variables 2-4
Cumulative Distribution Function (CDF)
Properties
0 ≤ F(a) ≤ 1
F(a) → 1 as a → ∞,   F(a) → 0 as a → −∞
(two example plots of CDFs F(a) increasing from 0 to 1)
F(b) − F(a) = \int_a^b f(x) dx   (continuous),    F(a) = \sum_{k ≤ a} p(k)   (discrete)
Random Variables 2-5
Probability Density Function
Probability Density Function (PDF)
• f(x) ≥ 0
• P(a ≤ X ≤ b) = \int_a^b f(x) dx
• F(x) = \int_{−∞}^{x} f(u) du
Probability Mass Function (PMF)
• p(k) ≥ 0 for all k
• \sum_{k ∈ S} p(k) = 1
Random Variables 2-6
Expected values
let g(X) be a function of random variable X
E[g(X)] = \sum_{x ∈ S} g(x) p(x)   (X discrete),    E[g(X)] = \int_{−∞}^{∞} g(x) f(x) dx   (X continuous)
Mean
µ = E[X] = \sum_{x ∈ S} x p(x)   (X discrete),    µ = \int_{−∞}^{∞} x f(x) dx   (X continuous)
Variance
σ^2 = var[X] = E[(X − µ)^2]
nth Moment
E[X^n]
Random Variables 2-7
Facts
Let Y = g(X) = aX + b, a, b are constants
• E[Y] = aE[X] + b
• var[Y] = a^2 var[X]
• var[X] = E[X^2] − (E[X])^2
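A quick numerical check of these facts (an illustrative Python sketch with arbitrary parameter choices, not part of the slides):

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)    # any distribution works here
a, b = 3.0, -1.0
y = a * x + b

print(y.mean(), a * x.mean() + b)               # E[Y] = a E[X] + b
print(y.var(), a**2 * x.var())                  # var[Y] = a^2 var[X]
print(x.var(), (x**2).mean() - x.mean()**2)     # var[X] = E[X^2] - (E[X])^2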
Random Variables 2-8
Example of Random Variables
Discrete RVs
• Bernoulli
• Binomial
• Geometric
• Negative binomial
• Poisson
• Uniform
Continuous RVs
• Uniform
• Exponential
• Gaussian (Normal)
• Gamma
• Rayleigh
• Cauchy
• Laplacian
Random Variables 2-9
Bernoulli random variables
let A be an event of interest
a Bernoulli random variable X is defined as
X = 1 if A occurs and X = 0 otherwise
it can also be given by the indicator function for A
X(ζ) = 1 if ζ ∈ A, and X(ζ) = 0 if ζ ∉ A
PMF: p(1) = p, p(0) = 1− p, 0 ≤ p ≤ 1
Mean: E[X ] = p
Variance: var[X ] = p(1− p)
Random Variables 2-10
Example of Bernoulli PMF: p = 1/3
(plots of the Bernoulli PMF p(x) and CDF F(x) for p = 1/3)
Random Variables 2-11
Binomial random variables
• X is the number of successes in a sequence of n independent trials
• each experiment yields success with probability p
• when n = 1, X is a Bernoulli random variable
• SX = {0, 1, 2, . . . , n}
• ex. transmission errors in a binary channel: X is the number of errors in n independent transmissions
PMF
p(k) = P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k}, k = 0, 1, . . . , n
Mean: E[X] = np
Variance: var[X] = np(1 − p)
Random Variables 2-12
Example of Binomial PMF: p = 1/3, n = 10
(plots of the binomial PMF p(x) and CDF F(x) for p = 1/3, n = 10)
Random Variables 2-13
Geometric random variables
• repeat independent Bernoulli trials, each has probability of success p
• X is the number of experiments required until the first success occurs
• SX = {1, 2, 3, . . .}
• ex. message transmissions: X is the number of times a message needs to be transmitted until it arrives correctly
PMF: p(k) = P(X = k) = (1 − p)^{k−1} p
Mean: E[X] = 1/p
Variance: var[X] = (1 − p)/p^2
Random Variables 2-14
Example of Geometric PMF: p = 1/3
(plot of geometric PMFs p(x))
• parameters: p = 1/4, 1/3, 1/2
Random Variables 2-15
Negative binomial (Pascal) random variables
• repeat independent Bernoulli trials until observing the rth success
• X is the number of trials required until the rth success occurs
• X can be viewed as the sum of r independent geometric RVs
• SX = {r, r + 1, r + 2, . . .}
PMF
p(k) = P(X = k) = \binom{k − 1}{r − 1} p^r (1 − p)^{k−r}, k = r, r + 1, . . .
Mean: E[X] = r/p
Variance: var[X] = r(1 − p)/p^2
Random Variables 2-16
Example of negative binomial PMF: p = 1/3, r = 5
(plots of the negative binomial PMF p(x) and CDF F(x) for p = 1/3, r = 5)
Random Variables 2-17
Poisson random variables
• X is a number of events occurring in a certain period of time
• events occur with a known average rate
• the expected number of occurrences in the interval is λ
• SX = {0, 1, 2, . . .}
• examples:
– number of emissions of a radioactive mass during a time interval
– number of queries arriving in t seconds at a call center
– number of packet arrivals in t seconds at a multiplexer
PMF
p(k) = P(X = k) = \frac{λ^k}{k!} e^{−λ}, k = 0, 1, 2, . . .
Mean: E[X] = λ
Variance: var[X] = λ
Random Variables 2-18
Example of Poisson PMF: λ = 2
(plots of the Poisson PMF p(x) and CDF F(x) for λ = 2)
Random Variables 2-19
Derivation of Poisson distribution
• approximate a binomial RV when n is large and p is small
• define λ = np, in 1898 Bortkiewicz showed that
p(k) = \binom{n}{k} p^k (1 − p)^{n−k} ≈ \frac{λ^k}{k!} e^{−λ}
Proof.
p(0) = (1 − p)^n = (1 − λ/n)^n ≈ e^{−λ}, as n → ∞
\frac{p(k + 1)}{p(k)} = \frac{(n − k)p}{(k + 1)(1 − p)} = \frac{(1 − k/n)λ}{(k + 1)(1 − λ/n)}
take the limit n → ∞:
p(k + 1) = \frac{λ}{k + 1} p(k) = \left(\frac{λ}{k + 1}\right)\left(\frac{λ}{k}\right) \cdots \left(\frac{λ}{1}\right) p(0) = \frac{λ^{k+1}}{(k + 1)!} e^{−λ}
Random Variables 2-20
Comparison of Poisson and Binomial PMFs
(two panels comparing the PMFs: left panel p = 1/2, n = 10; right panel p = 1/10, n = 100)
• red: Poisson
• blue: Binomial
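A short Python sketch (illustrative, not from the slides) reproduces this comparison numerically for the large-n, small-p case:

from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

n, p = 100, 0.1            # large n, small p  ->  lambda = np = 10
for k in (5, 10, 15):
    print(k, binom_pmf(k, n, p), poisson_pmf(k, n * p))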
Random Variables 2-21
Uniform random variables
Discrete Uniform RVs
• X has n possible values, x1, . . . , xn that are equally probable
• PMF
p(x) = 1/n if x ∈ {x1, . . . , xn}, and 0 otherwise
Continuous Uniform RVs
• X takes values on an interval [a, b], all equally probable
f(x) = 1/(b − a) for x ∈ [a, b], and 0 otherwise
• Mean: E[X ] = (a+ b)/2
• Variance: var[X ] = (b− a)2/12
Random Variables 2-22
Example of Discrete Uniform PMF: X = 0, 1, 2, . . . , 10
(plots of the discrete uniform PMF p(x) and CDF F(x))
Example of Continuous Uniform PDF: X ∈ [0, 2]
(plots of the uniform PDF f(x) and CDF F(x))
Random Variables 2-23
Exponential random variables
• arise when describing the time between occurrence of events
• examples:
– the time between customer demands for call connections
– the time used for a bank teller to serve a customer
• λ is the rate at which events occur
• a continuous counterpart of the geometric random variable
f(x) = λe^{−λx} if x ≥ 0, and 0 if x < 0
Mean: E[X] = 1/λ
Variance: var[X] = 1/λ^2
Random Variables 2-24
Example of Exponential PDF
(plots of the exponential PDF f(x) and CDF F(x))
• parameters: λ = 1, 1/2, 1/3
Random Variables 2-25
Memoryless property
P (X > t+ h|X > t) = P (X > h)
• P(X > t + h|X > t) is the probability of having to wait at least h additional seconds given that one has already been waiting t seconds
• P(X > h) is the probability of waiting at least h seconds when one first begins to wait
• thus, the probability of waiting at least an additional h seconds is the same regardless of how long one has already been waiting
Proof.
P(X > t + h|X > t) = P{(X > t + h) ∩ (X > t)}/P(X > t), for h > 0
                   = P(X > t + h)/P(X > t) = e^{−λ(t+h)}/e^{−λt} = e^{−λh}
Random Variables 2-26
this is not the case for other non-negative continuous RVs
in fact, the conditional probability
P(X > t + h|X > t) = (1 − P(X ≤ t + h))/(1 − P(X ≤ t)) = (1 − F(t + h))/(1 − F(t))
depends on t in general
Random Variables 2-27
Gaussian (Normal) random variables
• arise as the outcome of the central limit theorem
• the sum of a large number of RVs is distributed approximately normally
• many results involving Gaussian RVs can be derived in analytical form
• let X be a Gaussian RV with parameters mean µ and variance σ2
Notation X ∼ N (µ, σ2)
f(x) = \frac{1}{\sqrt{2πσ^2}} \exp\left(−\frac{(x − µ)^2}{2σ^2}\right), −∞ < x < ∞
Mean E[X ] = µ
Variance var[X ] = σ2
Random Variables 2-28
let Z ∼ N (0, 1) be the normalized Gaussian variable
CDF of Z is
F_Z(z) = \frac{1}{\sqrt{2π}} \int_{−∞}^{z} e^{−t^2/2} dt ≜ Φ(z)
then the CDF of X ∼ N(µ, σ^2) can be obtained by
F_X(x) = Φ\left(\frac{x − µ}{σ}\right)
in MATLAB, the error function is defined as
erf(x) = \frac{2}{\sqrt{π}} \int_0^{x} e^{−t^2} dt
hence, Φ(z) can be computed via the erf command as
Φ(z) = \frac{1}{2}\left[1 + erf\left(\frac{z}{\sqrt{2}}\right)\right]
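The same identity can be evaluated outside MATLAB; a minimal Python sketch (illustrative, using the standard-library erf) is:

from math import erf, sqrt

def std_normal_cdf(z):
    # Phi(z) = (1/2) [1 + erf(z / sqrt(2))]
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    # F_X(x) = Phi((x - mu) / sigma) for X ~ N(mu, sigma^2)
    return std_normal_cdf((x - mu) / sigma)

print(std_normal_cdf(0.0), std_normal_cdf(1.96))   # 0.5 and about 0.975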
Random Variables 2-29
Example of Gaussian PDF
(plots of the Gaussian PDF f(x) and CDF F(x))
• parameters: µ = 1, σ = 1, 2, 3
Random Variables 2-30
Gamma random variables
• appears in many applications:
– the time required to service customers in a queuing system
– the lifetime of devices in reliability studies
– the defect clustering behavior in VLSI chips
• let X be a Gamma variable with parameters α, λ
f(x) = \frac{λ(λx)^{α−1} e^{−λx}}{Γ(α)}, x ≥ 0; α, λ > 0
where Γ(z) is the gamma function, defined by
Γ(z) = \int_0^{∞} x^{z−1} e^{−x} dx, z > 0
Mean: E[X] = α/λ    Variance: var[X] = α/λ^2
Random Variables 2-31
Properties of the gamma function
Γ(1/2) =√π
Γ(z + 1) = zΓ(z) for z > 0
Γ(m+ 1) = m!, for m a nonnegative integer
Special cases
a Gamma RV becomes
• exponential RV when α = 1
• m-Erlang RV when α = m, a positive integer
• chi-square RV with k DOF when α = k/2, λ = 1/2
Random Variables 2-32
Example of Gamma PDF
(plot of Gamma PDFs f(x))
• blue: α = 0.2, λ = 0.2 (long tail)
• green: α = 1, λ = 0.5 (exponential)
• red: α = 3, λ = 1/2 (Chi square with 6 DOF)
• black: α = 5, 20, 50, 100 and α/λ = 10 (α-Erlang with mean 10)
Random Variables 2-33
m-Erlang random variables
(timeline: events occur at times t1, t2, . . ., with interarrival times X1, X2, . . . , Xm, Xm+1 between them)
• the kth event occurs at time tk
• the times X1, X2, . . . , Xm between events are exponential RVs
• N(t) denotes the number of events in t seconds, which is a Poisson RV
• Sm = X1 + X2 + · · · + Xm is the elapsed time until the mth event occurs
we can show that Sm is an m-Erlang random variable
Random Variables 2-34
Proof. Sm ≤ t iff m or more events occur in t seconds
F(t) = P(Sm ≤ t) = P(N(t) ≥ m) = 1 − \sum_{k=0}^{m−1} \frac{(λt)^k}{k!} e^{−λt}
to get the density function of Sm, we take the derivative of F(t):
f(t) = \frac{dF(t)}{dt} = \sum_{k=0}^{m−1} \frac{e^{−λt}}{k!}\left(λ(λt)^k − kλ(λt)^{k−1}\right) = \frac{λ(λt)^{m−1} e^{−λt}}{(m − 1)!}  ⇒  Erlang distribution with parameters m, λ
the sum of m exponential RVs with rate λ is an m-Erlang RV
if m becomes large, the m-Erlang RV should approach the normal RV
Random Variables 2-35
Rayleigh random variables
• arise when observing the magnitude of a vector
• ex. the absolute value of a random complex number whose real and imaginary parts are i.i.d. Gaussian
PDF
f(x) = \frac{x}{σ^2} e^{−x^2/2σ^2}, x ≥ 0, σ > 0
Mean: E[X] = σ\sqrt{π/2}
Variance: var[X] = \frac{4 − π}{2} σ^2
the Chi square distribution is a generalization of Rayleigh distribution
Random Variables 2-36
Example of Rayleigh PDF
(plots of the Rayleigh PDF f(x) and CDF F(x))
• parameters: σ = 1, 2, 3
Random Variables 2-37
Cauchy random variables
f(x) = \frac{1/π}{1 + x^2}, −∞ < x < ∞
• Cauchy distribution does not have any moments
• no mean, variance or higher moments defined
• Z = X/Y is the standard Cauchy if X and Y are independent standard Gaussians
(plot of the Cauchy PDF f(x))
Random Variables 2-38
Laplacian random variables
PDF
f(x) = \frac{α}{2} e^{−α|x − µ|}, −∞ < x < ∞
Mean: E[X] = µ
Variance: var[X] = 2/α^2
• arise as the difference between two i.i.d exponential RVs
• unlike Gaussian, the Laplace density is expressed in terms of theabsolute difference from the mean
Random Variables 2-39
Example of Laplacian PDF
(plot of Laplacian PDFs f(x))
• parameters: µ = 1, α = 1, 2, 3, 4, 5
Random Variables 2-40
Related MATLAB commands
• cdf returns the values of a specified cumulative distribution function
• pdf returns the values of a specified probability density function
• randn generates random numbers from the standard Gaussiandistribution
• rand generates random numbers from the standard uniform distribution
• random generates random numbers drawn from a specified distribution
Random Variables 2-41
References
Chapter 3, 4 in A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edition, Pearson Prentice Hall, 2009
Random Variables 2-42
EE401 (Semester 1) Jitkomut Songsiri
3. Functions of random variables
• linear and quadratic transformations
• general transformations
• characteristic function
• Markov and Chebyshev inequalities
• Chernoff bound
3-1
Functions of random variables
let X be an RV and g(x) be a real-valued function defined on the real line
• Y = g(X), Y is also an RV
• CDF of Y will depend on g(x) and CDF of X
Example: define g(x) as
g(x) = (x)^+ = x if x ≥ 0, and 0 if x < 0
• an input voltage X passes through a halfwave rectifier
• A/D converter: a uniform quantizer maps the input to the closest point
• Y is # of active speakers in excess of M , i.e., Y = (X −M)+
Functions of random variables 3-2
CDF of Y = g(X)
probability of equivalent events:
P (Y in C) = P (g(X) in C) = P (X in B)
where B is the equivalent event of values of X such that g(X) is in C
Functions of random variables 3-3
Example: Voice Transmission System
• X is # of active speakers in a group of N speakers
• let p be the probability that a speaker is active
• a voice transmission system can transmit up to M signals at a time
• let Y be the number of signals discarded, so Y = (X − M)^+
Y takes values from the set SY = {0, 1, . . . , N − M}
we can compute PMF of Y as
P(Y = 0) = P(X in {0, 1, . . . , M}) = \sum_{k=0}^{M} p_X(k)
P(Y = k) = P(X = M + k) = p_X(M + k), for 0 < k ≤ N − M
Functions of random variables 3-4
Affine functions
define Y = aX + b, a > 0. Find CDF and PDF of Y
If a > 0,
(figure: the event {Y ≤ y} is equivalent to {X ≤ (y − b)/a})
P(Y ≤ y) = P(aX + b ≤ y) = P(X ≤ (y − b)/a)
thus,
F_Y(y) = F_X\left(\frac{y − b}{a}\right)
PDF of Y is obtained by differentiating the CDF with respect to y:
f_Y(y) = \frac{1}{a} f_X\left(\frac{y − b}{a}\right)
Functions of random variables 3-5
Example: Affine function of a Gaussian
let X ∼ N(m, σ^2):
f_X(x) = \frac{1}{\sqrt{2πσ^2}} \exp\left(−\frac{(x − m)^2}{2σ^2}\right)
let Y = aX + b, with a > 0
from page 3-5,
f_Y(y) = \frac{1}{a} f_X\left(\frac{y − b}{a}\right) = \frac{1}{\sqrt{2π(aσ)^2}} \exp\left(−\frac{(y − b − am)^2}{2(aσ)^2}\right)
• Y also has a Gaussian distribution, with mean b + am and variance (aσ)^2
• thus, a linear function of a Gaussian is also a Gaussian
Functions of random variables 3-6
Example: Quadratic functions
define Y = X2. find CDF and PDF of Y
for a positive y, we have
{Y ≤ y} ⟺ {−\sqrt{y} ≤ X ≤ \sqrt{y}}
thus,
F_Y(y) = 0 for y < 0, and F_Y(y) = F_X(\sqrt{y}) − F_X(−\sqrt{y}) for y > 0
(figure: the event {Y ≤ y} for Y = X^2)
differentiating with respect to y gives
f_Y(y) = \frac{f_X(\sqrt{y})}{2\sqrt{y}} + \frac{f_X(−\sqrt{y})}{2\sqrt{y}}
for X ∼ N (0, 1), Y is a Chi-square random variable with one DOF
Functions of random variables 3-7
General functions of random variables
suppose Y = g(X) is a transformation (could be many-to-one)
to find fY (y) for a fixed y, we solve the equation y = g(x) to get x
this may have multiple roots:
y = g(x1) = g(x2) = · · · = g(xn) = · · ·
it can be shown that
f_Y(y) = \frac{f_X(x_1)}{|g'(x_1)|} + · · · + \frac{f_X(x_n)}{|g'(x_n)|} + · · ·
where g′(x) is the derivative (Jacobian) of g(x)
Functions of random variables 3-8
Affine transformation: Y = aX + b, g′(x) = a
the equation y = ax+ b has a single solution x = (y − b)/a for every y, so
f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y − b}{a}\right)
Quadratic transformation: Y = aX^2, a > 0, g'(x) = 2ax
if y ≤ 0, then the equation y = ax^2 has no real solutions, so f_Y(y) = 0
if y > 0, then it has two solutions
x_1 = \sqrt{y/a},   x_2 = −\sqrt{y/a}
and therefore
f_Y(y) = \frac{1}{2\sqrt{ay}}\left(f_X(\sqrt{y/a}) + f_X(−\sqrt{y/a})\right)
Functions of random variables 3-9
Log of uniform variables
verify that if X has a standard uniform distribution U(0, 1), then
Y = − ln(X)/λ
has an exponential distribution with parameter λ
for Y = y, we can solve X = x = e^{−λy} ⇒ unique root
• the Jacobian is g'(x) = −1/(λx) = −e^{λy}/λ
• f_Y(y) = 0 when x = e^{−λy} ∉ [0, 1], i.e., when y < 0
• when y ≥ 0 (i.e., e^{−λy} ∈ [0, 1]), we have
f_Y(y) = \frac{f_X(e^{−λy})}{|−1/(λx)|} = λ e^{−λy}
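A minimal Python sketch (not part of the slides; λ = 2 is an arbitrary illustrative choice) verifies this inverse-transform construction by simulation:

import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
u = rng.uniform(0.0, 1.0, size=100_000)   # X ~ U(0, 1)
y = -np.log(u) / lam                      # Y = -ln(X)/lambda

print(y.mean(), 1 / lam)       # exponential mean 1/lambda
print(y.var(), 1 / lam**2)     # exponential variance 1/lambda^2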
Functions of random variables 3-10
Amplitude samples of a sinusoidal waveform
let Y = cosX where X ∼ U(0, 2π], find the pdf of Y
for |y| > 1 there is no solution of x ⇒ fY (y) = 0
for |y| < 1 the equation y = cosx has two solutions:
x1 = cos−1(y), x2 = 2π − x1
the Jacobians are
g'(x_1) = −sin(x_1) = −sin(cos^{−1}(y)) = −\sqrt{1 − y^2},   g'(x_2) = \sqrt{1 − y^2}
since f_X(x) = 1/2π in the interval (0, 2π], we obtain
f_Y(y) = \frac{1}{π\sqrt{1 − y^2}}, for −1 < y < 1
note that although f_Y(±1) = ∞, the probability that Y = ±1 is 0
Functions of random variables 3-11
Transform Methods
• moment generating function
• characteristic function
Functions of random variables 3-12
Moment generating functions
for a random variable X , the moment generating function (MGF) of X is
Φ(t) = E[e^{tX}]
Continuous:  Φ(t) = \int_{−∞}^{∞} e^{tx} f(x) dx
Discrete:  Φ(t) = \sum_k e^{t x_k} p(x_k)
• except for a sign change, Φ(t) is the 2-sided Laplace transform of pdf
• knowing Φ(t) is equivalent to knowing f(x)
Functions of random variables 3-13
Moment theorem
computing any moments of X is easily obtained by
E[X^n] = \frac{d^n Φ(t)}{dt^n}\bigg|_{t=0}
because
E[e^{tX}] = E\left[1 + tX + \frac{(tX)^2}{2!} + · · · + \frac{(tX)^n}{n!} + · · ·\right] = 1 + tE[X] + \frac{t^2}{2!}E[X^2] + · · · + \frac{t^n}{n!}E[X^n] + · · ·
note that Φ(0) = 1
Functions of random variables 3-14
MGF of Gaussian variables
the MGF of X ∼ N (µ, σ2) is given by
Φ(t) = e^{µt + σ^2 t^2/2}
it can be derived by completing the square in the exponent:
Φ(t) = \frac{1}{\sqrt{2πσ^2}} \int_{−∞}^{∞} e^{−\frac{(x − µ)^2}{2σ^2}} e^{tx} dx
where
−\frac{(x − µ)^2}{2σ^2} + tx = −\frac{(x − (µ + σ^2 t))^2}{2σ^2} + µt + \frac{σ^2 t^2}{2}
from the MGF, we obtain
Φ'(0) = µ,   Φ''(0) = µ^2 + σ^2
Functions of random variables 3-15
Characteristic functions
the characteristic function (CF) of a random variable X is defined by
Continuous:  Φ(ω) = E[e^{jωX}] = \int_{−∞}^{∞} f(x) e^{jωx} dx
Discrete:  Φ(ω) = E[e^{jωX}] = \sum_k e^{jω x_k} p(x_k)
• Φ(ω) is obtained from the moment generating function when t = jω
• Φ(ω) is simply the Fourier transform of the PDF or PMF of X
• every pdf and its characteristic function form a unique Fourier pair:
Φ(ω) ⇐⇒ f(x)
Functions of random variables 3-16
the characteristic function is maximum at the origin because f(x) ≥ 0:
|Φ(ω)| ≤ Φ(0) = 1
Linear transformation: if Y = aX + b, then
Φ_Y(ω) = e^{jbω} Φ_X(aω)
Gaussian variables: let X ∼ N(µ, σ^2); the characteristic function of X is
Φ(ω) = e^{jµω} · e^{−σ^2ω^2/2}
Binomial variables: parameters are n, p and q = 1 − p
Φ(ω) = (pe^{jω} + q)^n
Functions of random variables 3-17
Markov and Chebyshev Inequalities
Markov inequality
let X be a nonnegative RV with mean E[X ]
P(X ≥ a) ≤ \frac{E[X]}{a},   a > 0
Chebyshev inequality
let X be an RV with mean µ and variance σ^2
P(|X − µ| ≥ a) ≤ \frac{σ^2}{a^2}
Functions of random variables 3-18
example: manufacturing of low grade resistors
• assume the average resistance is 100 ohms (measured by a statistical analysis)
• some of the resistors have different values of resistance
if all resistors over 200 ohms are to be discarded, what is the maximum fraction of resistors that will be discarded?
using the Markov inequality with µ = 100 and a = 200,
P(X ≥ 200) ≤ 100/200 = 0.5
the percentage of discarded resistors cannot exceed 50% of the total
Functions of random variables 3-19
if the variance of the resistance is known to equal 100, find the probability that the resistance values are between 50 and 150
P(50 ≤ X ≤ 150) = P(|X − 100| ≤ 50) = 1 − P(|X − 100| ≥ 50)
by the Chebyshev inequality,
P(|X − 100| ≥ 50) ≤ \frac{σ^2}{(50)^2} = 1/25
hence,
P(50 ≤ X ≤ 150) ≥ 1 − 1/25 = 24/25
Functions of random variables 3-20
Chernoff bound
the Chernoff bound is given by
P(X ≥ a) ≤ \inf_{t ≥ 0} E[e^{t(X − a)}]
which can be expressed as
log P(X ≥ a) ≤ \inf_{t ≥ 0} \{−ta + \log E[e^{tX}]\}
• E[e^{tX}] is the moment generating function
• \log E[e^{tX}] is called the cumulant generating function
• the Chernoff bound is useful when E[e^{tX}] has an analytical expression
Functions of random variables 3-21
Example: X is Gaussian with zero mean and unit variance
the cumulant generating function is
log E[e^{tX}] = t^2/2
hence,
log P(X ≥ a) ≤ \inf_{t ≥ 0} \{−ta + t^2/2\} = −a^2/2
and the Chernoff bound gives
P(X ≥ a) ≤ e^{−a^2/2}
which is tighter than the Chebyshev inequality:
P(|X| ≥ a) ≤ 1/a^2  ⟹  P(X ≥ a) ≤ 1/(2a^2)
Functions of random variables 3-22
(plot comparing the Chernoff bound, the Chebyshev bound, and P(X ≥ a) as functions of a)
when a is small, the Chebyshev bound is useless while the Chernoff bound is tighter
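A minimal Python sketch (illustrative, not part of the slides) reproduces the comparison numerically for the standard Gaussian:

from math import erf, exp, sqrt

def exact_tail(a):        # P(X >= a) for X ~ N(0, 1)
    return 0.5 * (1.0 - erf(a / sqrt(2.0)))

def chernoff(a):          # exp(-a^2 / 2)
    return exp(-a**2 / 2)

def chebyshev(a):         # 1 / (2 a^2), only useful for a > 0
    return 1.0 / (2.0 * a**2)

for a in (0.5, 1.0, 2.0, 3.0):
    print(a, exact_tail(a), chernoff(a), chebyshev(a))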
Functions of random variables 3-23
References
Chapter 3, 4 in A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edition, Pearson Prentice Hall, 2009
Functions of random variables 3-24
EE401 (Semester 1) Jitkomut Songsiri
4. Pairs of Random Variables
• probabilities
• conditional probability and expectation
• independence
• joint characteristic function
• functions of random variables
4-1
Definition
let ζ be an outcome in the sample space S
a pair of RVs Z(ζ) is a function that maps ζ to a pair of real numbers
Z(ζ) = (X(ζ), Y (ζ))
example: a web page provides the user with a choice either to watch an ad or move directly to the requested page
let ζ be the patterns of user arrivals to a webpage
• N1(ζ): the number of times the webpage is directly requested
• N2(ζ): the number of times the ad is chosen
(N1(ζ), N2(ζ)) assigns a pair of nonnegative integers to each outcome ζ
Pairs of Random Variables 4-2
Events of interest
events involving a pair of RVs (X, Y ) can be represented by regions
Example:
(figure: regions A, B, and C in the (x, y) plane)
A = {X + Y ≤ 4},   B = {min(X, Y) ≤ 5},   C = {X^2 + Y^2 ≤ 25}
• A: total revenue from two sources is less than 4
• C: total noise power is less than r^2 (here r = 5)
Pairs of Random Variables 4-3
Events and Probabilities
we consider events that have the product form:
C = {X in A} ∩ {Y in B}
the probability of product-form events is
P(C) = P({X in A} ∩ {Y in B}) = P(X in A, Y in B)
Pairs of Random Variables 4-4
Probability for pairs of random variables
Joint cumulative distribution function
FXY (a, b) = P (X ≤ a, Y ≤ b)
Properties:
(figure: the joint CDF evaluated at (a, b) is the probability of the semi-infinite region x ≤ a, y ≤ b)
• a joint CDF is a nondecreasing function of x and y:
FXY (x1, y1) ≤ FXY (x2, y2), if x1 ≤ x2 and y1 ≤ y2
• FXY (x1,−∞) = 0, FXY (−∞, y1) = 0, FXY (∞,∞) = 1
• P (x1 < X ≤ x2, y1 < Y ≤ y2)
= FXY (x2, y2)− FXY (x2, y1)− FXY (x1, y2) + FXY (x1, y1)
Pairs of Random Variables 4-5
Joint PMF for discrete RVs
pXY (x, y) = P (X = x, Y = y), (x, y) ∈ S
Joint PDF for continuous RVs
f_{XY}(x, y) = \frac{∂^2 F_{XY}(x, y)}{∂x ∂y}
Marginal PMF
p_X(x) = \sum_{y ∈ S} p_{XY}(x, y),   p_Y(y) = \sum_{x ∈ S} p_{XY}(x, y)
Marginal PDF
f_X(x) = \int_{−∞}^{∞} f_{XY}(x, z) dz,   f_Y(y) = \int_{−∞}^{∞} f_{XY}(z, y) dz
Pairs of Random Variables 4-6
Example 1: Jointly Gaussian Random Variables
if X,Y are jointly Gaussian, a joint pdf of X and Y can be given by
f_{XY}(x, y) = \frac{1}{2π\sqrt{1 − ρ^2}} \exp\left(−\frac{x^2 − 2ρxy + y^2}{2(1 − ρ^2)}\right), −∞ < x, y < ∞,  with |ρ| < 1
(surface plot of f_{XY}(x, y))
find the marginal PDF’s
Pairs of Random Variables 4-7
the marginal pdf of X is found by integrating fXY (x, y) over y:
f_X(x) = \frac{e^{−x^2/2(1−ρ^2)}}{2π\sqrt{1 − ρ^2}} \int_{−∞}^{∞} e^{−(y^2 − 2ρxy)/2(1−ρ^2)} dy
       = \frac{e^{−x^2/2}}{\sqrt{2π}} \int_{−∞}^{∞} \frac{e^{−(y − ρx)^2/2(1−ρ^2)}}{\sqrt{2π(1 − ρ^2)}} dy
       = \frac{e^{−x^2/2}}{\sqrt{2π}}
• the second step follows from completing the square in (y − ρx)2
• the last integral equals 1 since its integrand is a Gaussian pdf withmean ρx and variance 1− ρ2
• the marginal pdf of X is also a Gaussian with mean 0 and variance 1
• from the symmetry of f_{XY}(x, y) in x and y, the marginal pdf of Y is also the same as that of X
Pairs of Random Variables 4-8
Example 2
consider X and Y with a joint PDF
f_{XY}(x, y) = c e^{−x} e^{−y}, 0 ≤ y ≤ x < ∞
find the constant c, the marginal PDFs and P (X + Y ≤ 1)
(figure: the support region 0 ≤ y ≤ x and the event {X + Y ≤ 1})
the constant c is found from the normalization condition:
1 = \int_0^{∞} \int_0^{x} c e^{−x} e^{−y} dy\, dx  ⟹  c = 2
Pairs of Random Variables 4-9
the marginal PDFs are obtained by
f_X(x) = \int_0^{∞} f_{XY}(x, y) dy = \int_0^{x} 2e^{−x} e^{−y} dy = 2e^{−x}(1 − e^{−x}), 0 ≤ x < ∞
f_Y(y) = \int_0^{∞} f_{XY}(x, y) dx = \int_y^{∞} 2e^{−x} e^{−y} dx = 2e^{−2y}, 0 ≤ y < ∞
P(X + Y ≤ 1) can be found by integrating over the intersection of the region where the joint PDF is nonzero and the event {X + Y ≤ 1}:
P(X + Y ≤ 1) = \int_0^{1/2} \int_y^{1−y} 2e^{−x} e^{−y} dx\, dy = \int_0^{1/2} 2e^{−y}\left[e^{−y} − e^{−(1−y)}\right] dy = 1 − 2e^{−1}
Pairs of Random Variables 4-10
Conditional Probability
Discrete RVs
the conditional PMF of Y given X = x is defined by
p_{Y|X}(y|x) = P(Y = y|X = x) = \frac{P(X = x, Y = y)}{P(X = x)} = \frac{p_{XY}(x, y)}{p_X(x)}
Continuous RVs
the conditional PDF of Y given X = x is defined by
f_{Y|X}(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}
Pairs of Random Variables 4-11
Example: Number of defects in a region
• let X be the total number of defects on a chip
X ∼ Poisson(α)
• let Y be the number of defects falling in region R
• if X = n (given), then Y is binomial with (n, p)
p_{Y|X}(k|n) = \binom{n}{k} p^k (1 − p)^{n−k} for 0 ≤ k ≤ n, and 0 for k > n
• we can show that Y ∼ Poisson(αp)
Pairs of Random Variables 4-12
P(Y = k) = \sum_{n=0}^{∞} P(Y = k|X = n) P(X = n)
         = \sum_{n=k}^{∞} \binom{n}{k} p^k (1 − p)^{n−k} \frac{α^n e^{−α}}{n!}
         = \frac{(αp)^k e^{−α}}{k!} \sum_{n=k}^{∞} \frac{[(1 − p)α]^{n−k}}{(n − k)!}
         = \frac{(αp)^k e^{−α} e^{(1−p)α}}{k!} = \frac{(αp)^k}{k!} e^{−αp}
Pairs of Random Variables 4-13
Example: Customers arrive at a service station
• let N be # of customers arriving at a station during time t
N ∼ Poisson(βt)
• let T be the service time for each customer
T ∼ exponential(α)
• we can show that the number of customers that arrive during the service time is a geometric RV with probability of success α/(α + β)
Pairs of Random Variables 4-14
P(N = k) = \int_0^{∞} P(N = k|T = t) f_T(t) dt = \int_0^{∞} \frac{(βt)^k}{k!} e^{−βt} · α e^{−αt} dt = \frac{αβ^k}{k!} \int_0^{∞} t^k e^{−(α+β)t} dt
let r = (α + β)t, then
P(N = k) = \frac{αβ^k}{k!(α + β)^{k+1}} \int_0^{∞} r^k e^{−r} dr = \left(\frac{α}{α + β}\right)\left(\frac{β}{α + β}\right)^k
(the last integral is a gamma function and is equal to k!)
Pairs of Random Variables 4-15
Conditional Expectation
the conditional expectation of Y given X = x is defined by
Continuous RVs
Continuous RVs:  E[Y|X = x] = \int_{−∞}^{∞} y\, f_{Y|X}(y|x) dy
Discrete RVs:  E[Y|X = x] = \sum_y y\, p_{Y|X}(y|x)
• E[Y |X ] is the center of mass associated with the conditional pdf or pmf
• E[Y |X ] can be viewed as a function of random variable X
• E[E[Y |X ]] = E[Y ]
Pairs of Random Variables 4-16
in fact, we can show that
E[h(Y )] = E[E[h(Y )|X ]]
for any function h(·) such that E[|h(Y)|] < ∞
Proof.
E[E[h(Y)|X]] = \int_{−∞}^{∞} E[h(Y)|x] f_X(x) dx
             = \int_{−∞}^{∞} \int_{−∞}^{∞} h(y) f_{Y|X}(y|x) dy\, f_X(x) dx
             = \int_{−∞}^{∞} h(y) \int_{−∞}^{∞} f_{XY}(x, y) dx\, dy
             = \int_{−∞}^{∞} h(y) f_Y(y) dy = E[h(Y)]
Pairs of Random Variables 4-17
Example: Average defects of a chip in a region
From the example on page 4-12,
E[Y] = E[E[Y|X]] = \sum_{n=0}^{∞} np\, P(X = n) = p \sum_{n=0}^{∞} n\, P(X = n) = p E[X] = αp
Pairs of Random Variables 4-18
Example: Average arrivals in a service time
from the example on page 4-14,
E[N] = E[E[N|T]] = \int_0^{∞} E[N|T = t] f_T(t) dt = \int_0^{∞} βt f_T(t) dt = βE[T] = \frac{β}{α}
Pairs of Random Variables 4-19
Example: Variance of arrivals in a service time
same example as on page 4-14
N is a Poisson RV with parameter βt when T = t is given, so
E[N|T = t] = βt,   E[N^2|T = t] = (βt) + (βt)^2
the second moment of N can be calculated by
E[N^2] = E[E[N^2|T]] = \int_0^{∞} E[N^2|T = t] f_T(t) dt = \int_0^{∞} (βt + β^2t^2) f_T(t) dt = βE[T] + β^2 E[T^2]
Pairs of Random Variables 4-20
therefore,
var(N) = E[N^2] − (E[N])^2 = β^2 E[T^2] + βE[T] − β^2(E[T])^2 = β^2 var(T) + βE[T]
• if T is not random (E[T] is constant and var(T) = 0), the mean and variance of N are those of a Poisson RV with parameter βE[T]
• when T is random, the mean of N remains the same, but var(N) increases by the term β^2 var(T)
• note that the above result holds for any distribution f_T(t)
• if T is exponential with parameter α, then E[T] = 1/α and var(T) = 1/α^2, so
E[N] = \frac{β}{α},   var(N) = \frac{β^2}{α^2} + \frac{β}{α}
Pairs of Random Variables 4-21
Independence of two random variables
X and Y are independent if and only if
FXY (x, y) = FX(x)FY (y), ∀x, y
this is equivalent to
Discrete Random Variables
pXY (x, y) = pX(x)pY (y)
pY |X(y|x) = pY (y)
Continuous Random Variables
fXY (x, y) = fX(x)fY (y)
f_{Y|X}(y|x) = f_Y(y)
If X and Y are independent, so are any pair of functions g(X) and h(Y )
Pairs of Random Variables 4-22
Example
let X and Y be Gaussian RVs with zero mean and unit variance
the product of the marginal pdf’s of X and Y is
f_X(x) f_Y(y) = \frac{1}{2π} \exp\left(−\frac{x^2 + y^2}{2}\right), −∞ < x, y < ∞
from the example on page 4-7, the joint pdf of X and Y is
f_{XY}(x, y) = \frac{1}{2π\sqrt{1 − ρ^2}} \exp\left(−\frac{x^2 − 2ρxy + y^2}{2(1 − ρ^2)}\right), −∞ < x, y < ∞
therefore the jointly Gaussian X and Y are independent if and only if
ρ = 0
ρ is called correlation coefficient between X and Y
Pairs of Random Variables 4-23
Expected Values and Covariance
the expected value of Z = g(X,Y ) is defined as
E[Z] = \int_{−∞}^{∞} \int_{−∞}^{∞} g(x, y) f_{XY}(x, y) dx\, dy   (X, Y continuous)
E[Z] = \sum_x \sum_y g(x, y) p_{XY}(x, y)   (X, Y discrete)
• E[X + Y ] = E[X ] +E[Y ]
• E[XY ] = E[X ]E[Y ] if X and Y are independent
Covariance of X and Y
cov(X,Y ) = E[(X −E[X ])(Y −E[Y ])]
• cov(X,Y ) = E[XY ]−E[X ]E[Y ]
• cov(X,Y ) = 0 if X and Y are independent (the converse is NOT true)
Pairs of Random Variables 4-24
Correlation Coefficient
denote
σ_X = \sqrt{var(X)},   σ_Y = \sqrt{var(Y)}
the standard deviations of X and Y
the correlation coefficient of X and Y is defined by
ρ_{XY} = \frac{cov(X, Y)}{σ_X σ_Y}
• −1 ≤ ρXY ≤ 1
• ρXY gives the linear dependence between X and Y : for Y = aX + b,
ρXY = 1 if a > 0 and ρXY = −1 if a < 0
• X and Y are said to be uncorrelated if ρXY = 0
Pairs of Random Variables 4-25
if X and Y are independent then X and Y are uncorrelated
but the converse is NOT true
example: uncorrelated but dependent random variables
let θ be a uniform RV in the interval (0, 2π) and let
X = cos θ, Y = sin θ
(figure: the point (X, Y) = (cos θ, sin θ) lies on the unit circle)
• the marginals of X and Y are arcsine pdf's
• the product of the marginals of X and Y is nonzero in the square region
• (X, Y) is a point on the unit circle, so they are dependent
E[XY] = \frac{1}{2π} \int_0^{2π} \sin φ \cos φ\, dφ = \frac{1}{4π} \int_0^{2π} \sin 2φ\, dφ = 0
since E[X ] = E[Y ] = 0, the above eq. implies X and Y are uncorrelated
Pairs of Random Variables 4-26
Joint Characteristic Function
the joint characteristic function of X and Y is defined by
Φ_{XY}(λ, ω) = E[e^{j(λX + ωY)}] = \int_{−∞}^{∞} \int_{−∞}^{∞} e^{j(λx + ωy)} f_{XY}(x, y) dx\, dy
the joint characteristic function is a 2D Fourier transform
if X and Y are independent,
Φ_{XY}(λ, ω) = E[e^{jλX}] E[e^{jωY}] = Φ_X(λ) Φ_Y(ω)
Pairs of Random Variables 4-27
Example
let U ∼ N (0, 1) and V ∼ N (0, 1) be independent RVs
defineX = U + V, Y = U − V
the joint characteristic function of X,Y is obtained by
Φ_{XY}(λ, ω) = E[e^{j(λ(U+V) + ω(U−V))}] = E[e^{j(λ+ω)U + j(λ−ω)V}]
             = E[e^{j(λ+ω)U}] E[e^{j(λ−ω)V}] = Φ_U(λ + ω) Φ_V(λ − ω)
             = e^{−(λ+ω)^2/2} e^{−(λ−ω)^2/2} = e^{−(λ^2 + ω^2)} = Φ_X(λ) Φ_Y(ω)
X and Y are also Gaussian with zero mean and variance 2
Pairs of Random Variables 4-28
from the identity
\frac{∂^2 E[e^{j(λX + ωY)}]}{∂λ ∂ω} = j^2 E[XY e^{j(λX + ωY)}]
the joint characteristic function is also useful for finding E[XY], since
E[XY] = \frac{1}{j^2} \frac{∂^2 E[e^{j(λX + ωY)}]}{∂λ ∂ω}\bigg|_{λ=0, ω=0} = −\frac{∂^2 e^{−(λ^2 + ω^2)}}{∂λ ∂ω}\bigg|_{λ=0, ω=0} = −e^{−(λ^2 + ω^2)}(4λω)\big|_{λ=0, ω=0} = 0
thus X and Y are uncorrelated
(note that X and Y have zero mean)
Pairs of Random Variables 4-29
Function of Multiple Random Variables
• sum of random variables: Z = X + Y
• division of random variables: Z = X/Y
• linear transformation
Pairs of Random Variables 4-30
Sum of Random Variables
Let Z = X + Y
P(Z ≤ z) = P(X + Y ≤ z): integrate the joint pdf f_{XY} over the region {x + y ≤ z}
CDF of Z:  F_Z(z) = P(Z ≤ z) = \int_{−∞}^{∞} \int_{−∞}^{z−x} f_{XY}(x, y) dy\, dx
PDF of Z:  f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{−∞}^{∞} f_{XY}(x, z − x) dx
when X, Y are independent, the pdf of Z has the form of a convolution:
f_Z(z) = \int_{−∞}^{∞} f_X(x) f_Y(z − x) dx
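As a sanity check of the convolution formula, a short Python sketch (illustrative, not from the slides) compares a simulated sum of two independent exp(1) RVs with the convolution result f_Z(z) = z e^{−z} (the 2-Erlang density):

import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.exponential(scale=1.0, size=n)    # X ~ exp(1)
y = rng.exponential(scale=1.0, size=n)    # Y ~ exp(1), independent of X
z = x + y

hist, edges = np.histogram(z, bins=200, range=(0, 10), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for g in (0.5, 1.0, 2.0, 4.0):
    i = np.argmin(np.abs(centers - g))
    print(centers[i], hist[i], centers[i] * np.exp(-centers[i]))   # empirical vs z e^{-z}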
Pairs of Random Variables 4-31
Example
find the pdf of the sum Z = X + Y
X, Y are jointly Gaussian with zero mean and unit variance with correlation coefficient ρ = −1/2
f_Z(z) = \int_{−∞}^{∞} f_{XY}(x, z − x) dx
       = \frac{1}{2π\sqrt{1 − ρ^2}} \int_{−∞}^{∞} e^{−(x^2 − 2ρx(z−x) + (z−x)^2)/2(1−ρ^2)} dx
       = \frac{1}{2π\sqrt{3/4}} \int_{−∞}^{∞} e^{−(x^2 − xz + z^2)/2(3/4)} dx
       = \frac{e^{−z^2/2}}{\sqrt{2π}}
the sum of these two nonindependent Gaussians is also a Gaussian RV
Pairs of Random Variables 4-32
Characteristic function of a sum
let X and Y be independent RVs and define
Z = X + Y
then the CF of Z is the product of the CFs of X and Y:
Φ_Z(ω) = Φ_X(ω) Φ_Y(ω)
proof.
Φ_Z(ω) = E[e^{jωZ}] = E[e^{jω(X+Y)}] = E[e^{jωX}] E[e^{jωY}]   (∵ X and Y are independent)
       = Φ_X(ω) Φ_Y(ω)
Pairs of Random Variables 4-33
Example: sum of independent binomials
let X and Y be i.i.d. binomials RVs with parameters n, p
P_X(k) = P_Y(k) = \binom{n}{k} p^k q^{n−k}   (q = 1 − p)
first compute the CF of X and Y:
Φ_X(ω) = Φ_Y(ω) = \sum_{k=0}^{n} e^{jωk} \binom{n}{k} p^k q^{n−k} = (pe^{jω} + q)^n
the CF of Z is then given by
Φ_Z(ω) = Φ_X(ω) Φ_Y(ω) = (pe^{jω} + q)^{2n}
conclusion: Z is also a binomial with parameters 2n and p
Pairs of Random Variables 4-34
Division of Random Variables
let Z = X/Y
if Y = y (given), then Z = X/y, a scaled version of X
therefore, if Y = y is fixed, the conditional pdf of Z is a scaled version of the conditional pdf of X:
f_{Z|Y}(z|y) = |y| f_{X|Y}(yz|y)
use this result to find the pdf of Z:
f_Z(z) = \int_{−∞}^{∞} f_{Z|Y}(z|y) f_Y(y) dy = \int_{−∞}^{∞} |y| f_{X|Y}(yz|y) f_Y(y) dy = \int_{−∞}^{∞} |y| f_{XY}(yz, y) dy
Pairs of Random Variables 4-35
Division of Exponential RVs
let X and Y be exponential RVs with mean 1
f_X(x) = e^{−x}, x ≥ 0,   f_Y(y) = e^{−y}, y ≥ 0
assume that X, Y are independent, so
f_{XY}(x, y) = f_X(x) f_Y(y) = e^{−(x+y)}
the pdf of Z = Y/X can be determined by
f_Z(z) = \int_0^{∞} y e^{−yz} e^{−y} dy = \frac{1}{(z + 1)^2}, z > 0
Pairs of Random Variables 4-36
Linear Transformation
let A be an invertible linear transformation such that
(U, V) = A(X, Y)  ⟺  (X, Y) = A^{−1}(U, V)
(figure: an area element dx dy at (x, y) maps to a parallelogram of area dM at (u, v))
P(X ∈ dx, Y ∈ dy) = f_{XY}(x, y) dx\, dy,   P(U ∈ du, V ∈ dv) = f_{UV}(u, v) dM
it can be shown that
dM = |det A| dx\, dy,    f_{UV}(u, v) = \frac{1}{|det A|} f_{XY}(x, y)
Pairs of Random Variables 4-37
Example: Linear Transformation of a Gaussian
let X and Y be jointly Gaussian RVs with the joint pdf
f_{XY}(x, y) = \frac{1}{2π\sqrt{3/4}} \exp\left(−\frac{2(x^2 − xy + y^2)}{3}\right)
let U and V be obtained from (X, Y) by
\begin{bmatrix} U \\ V \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ −1 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}  ⟺  \begin{bmatrix} X \\ Y \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & −1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} U \\ V \end{bmatrix}
therefore the pdf of U and V is
f_{UV}(u, v) = \frac{1}{π\sqrt{3}} \exp\left(−(u^2/3 + v^2)\right)
U and V become independent, zero-mean Gaussian RVs
Pairs of Random Variables 4-38
References
Chapter 5 in A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edition, Pearson Prentice Hall, 2009
Pairs of Random Variables 4-39
EE401 (Semester 1) Jitkomut Songsiri
5. Random Vectors
• probabilities
• characteristic function
• cross correlation, cross covariance
• Gaussian random vectors
• functions of random vectors
5-1
Random vectors
we denote X a random vector
X is a function that maps each outcome ζ to a vector of real numbers
an n-dimensional random variable has n components:
X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}
also called a multivariate or multiple random variable
Random Vectors 5-2
Probabilities
Joint CDF
F(x) ≜ F_X(x_1, x_2, . . . , x_n) = P(X_1 ≤ x_1, X_2 ≤ x_2, . . . , X_n ≤ x_n)
Joint PMF
p(x) ≜ p_X(x_1, x_2, . . . , x_n) = P(X_1 = x_1, X_2 = x_2, . . . , X_n = x_n)
Joint PDF
f(x) ≜ f_X(x_1, x_2, . . . , x_n) = \frac{∂^n}{∂x_1 \cdots ∂x_n} F(x)
Random Vectors 5-3
Marginal PMF
p_{X_j}(x_j) = P(X_j = x_j) = \sum_{x_1} \cdots \sum_{x_{j−1}} \sum_{x_{j+1}} \cdots \sum_{x_n} p_X(x_1, x_2, . . . , x_n)
Marginal PDF
f_{X_j}(x_j) = \int_{−∞}^{∞} \cdots \int_{−∞}^{∞} f_X(x_1, x_2, . . . , x_n)\, dx_1 \cdots dx_{j−1} dx_{j+1} \cdots dx_n
Conditional PDF: the PDF of X_n given X_1, . . . , X_{n−1} is
f(x_n|x_1, . . . , x_{n−1}) = \frac{f_X(x_1, . . . , x_n)}{f_{X_1,...,X_{n−1}}(x_1, . . . , x_{n−1})}
Random Vectors 5-4
Characteristic Function
the characteristic function of an n-dimensional RV is defined by
Φ(ω) = Φ(ω_1, . . . , ω_n) = E[e^{j(ω_1X_1 + \cdots + ω_nX_n)}] = \int e^{jω^T x} f(x) dx
where
ω = (ω_1, ω_2, . . . , ω_n)^T,   x = (x_1, x_2, . . . , x_n)^T
Φ(ω) is the n-dimensional Fourier transform of f(x)
Random Vectors 5-5
Independence
the random variables X1, . . . , Xn are independent if
the joint pdf (or pmf) is equal to the product of their marginal’s
Discrete:  p_X(x_1, . . . , x_n) = p_{X_1}(x_1) \cdots p_{X_n}(x_n)
Continuous:  f_X(x_1, . . . , x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)
we can specify an RV by the characteristic function in place of the pdf,
X1, . . . , Xn are independent if
Φ(ω) = Φ1(ω1) · · ·Φn(ωn)
Random Vectors 5-6
Example: White noise signal in communication
the n samples X1, . . . Xn of a noise signal have the joint pdf:
f_X(x_1, . . . , x_n) = \frac{e^{−(x_1^2 + \cdots + x_n^2)/2}}{(2π)^{n/2}}   for all x_1, . . . , x_n
the joint pdf is the n-product of one-dimensional Gaussian pdf’s
thus, X1, . . . ,Xn are independent Gaussian random variables
Random Vectors 5-7
Expected Values
the expected value of a function
g(X) = g(X1, . . . , Xn)
of a vector random variable X is defined by
E[g(X)] = \int g(x) f(x) dx   (continuous),    E[g(X)] = \sum_x g(x) p(x)   (discrete)
Mean vector
µ = E[X] = E\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} ≜ \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix}
Random Vectors 5-8
Correlation and Covariance matrices
Correlation matrix has the second moments of X as its entries:
R ≜ E[XX^T] = \begin{bmatrix} E[X_1X_1] & E[X_1X_2] & \cdots & E[X_1X_n] \\ E[X_2X_1] & E[X_2X_2] & \cdots & E[X_2X_n] \\ \vdots & \vdots & \ddots & \vdots \\ E[X_nX_1] & E[X_nX_2] & \cdots & E[X_nX_n] \end{bmatrix},   with R_{ij} = E[X_iX_j]
Covariance matrix has the second-order central moments as its entries:
C ≜ E[(X − µ)(X − µ)^T],   with C_{ij} = cov(X_i, X_j) = E[(X_i − µ_i)(X_j − µ_j)]
Random Vectors 5-9
Symmetric matrix
A ∈ Rn×n is called symmetric if A = AT
Facts: if A is symmetric
• all eigenvalues of A are real
• all eigenvectors of A are orthogonal
• A admits a decomposition
A = UDUT
where UTU = UUT = I (U is unitary) and D is diagonal
(of course, the diagonals of D are eigenvalues of A)
Random Vectors 5-10
Unitary matrix
a matrix U ∈ Rn×n is called unitary if
UTU = UUT = I
example:  \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & −1 \\ 1 & 1 \end{bmatrix},   \begin{bmatrix} \cos θ & −\sin θ \\ \sin θ & \cos θ \end{bmatrix}
Facts:
• a real unitary matrix is also called orthogonal
• a unitary matrix is always invertible and U−1 = UT
• columns vectors of U are mutually orthogonal
• norm is preserved under a unitary transformation:
y = Ux =⇒ ‖y‖ = ‖x‖
Random Vectors 5-11
Positive definite matrix
a symmetric matrix A is positive semidefinite, written as A ⪰ 0, if
x^T A x ≥ 0, ∀x ∈ R^n
and positive definite, written as A ≻ 0, if
x^T A x > 0, for all nonzero x ∈ R^n
Facts: A ⪰ 0 if and only if
• all eigenvalues of A are non-negative
• all principal minors of A are non-negative
Random Vectors 5-12
example: A = \begin{bmatrix} 1 & −1 \\ −1 & 2 \end{bmatrix} ⪰ 0 because
x^T A x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 1 & −1 \\ −1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 + 2x_2^2 − 2x_1x_2 = (x_1 − x_2)^2 + x_2^2 ≥ 0
or we can check from
• eigenvalues of A are 0.38 and 2.61 (real and positive)
• the principal minors are 1 and \begin{vmatrix} 1 & −1 \\ −1 & 2 \end{vmatrix} = 1 (all positive)
note: A ⪰ 0 does not mean all entries of A are positive!
Random Vectors 5-13
Properties of correlation and covariance matrices
let X be a (real) n-dimensional random vector with mean µ
Facts:
• R and C are n× n symmetric matrices
• R and C are positive semidefinite
• If X1, . . . ,Xn are independent, then C is diagonal
• the diagonals of C are given by the variances of Xk
• if X has zero mean, then R = C
• C = R− µµT
Random Vectors 5-14
Cross Correlation and Cross Covariance
let X,Y be vector random variables with means µX, µY respectively
Cross Correlation
cor(X,Y) = E[XYT ]
if cor(X,Y) = 0 then X and Y are said to be orthogonal
Cross Covariance
cov(X, Y) = E[(X − µ_X)(Y − µ_Y)^T] = cor(X, Y) − µ_X µ_Y^T
if cov(X,Y) = 0 then X and Y are said to be uncorrelated
Random Vectors 5-15
Affine transformation
let Y be an affine transformation of X:
Y = AX+ b
where A and b are deterministic matrices
• µY = AµX + b
µY = E[AX+ b] = AE[X] +E[b] = AµX + b
• CY = ACXAT
CY = E[(Y − µY )(Y − µY )T ] = E[(A(X− µX))(A(X− µX))T ]
= AE[(X− µX)(X− µX)T ]AT = ACXAT
Random Vectors 5-16
Diagonalization of covariance matrix
suppose a random vector Y is obtained via a linear transformation of X
(block diagram: Y = AX)
• the covariance matrices of X,Y are CX,CY respectively
• A may represent linear filter, system gain, etc.
• the covariance of Y is CY = ACXAT
Problem: choose A such that CY becomes ’diagonal’
in other words, the variables Y1, . . . , Yn are required to be uncorrelated
Random Vectors 5-17
since CX is symmetric, it has the decomposition:
CX = UDUT
where
• D is diagonal and its entries are eigenvalues of CX
• U is unitary and the columns of U are eigenvectors of CX
diagonalization: pick A = UT to obtain
CY = ACXAT = AUDUTAT = UTUDUTU = D
as desired
Random Vectors 5-18
one can write X in terms of Y as
X = UU^TX = UY = \begin{bmatrix} U_1 & U_2 & \cdots & U_n \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \sum_{k=1}^{n} Y_k U_k
this equation is called Karhunen-Loeve expansion
• X can be expressed as a weighted sum of the eigenvectors Uk
• the weighting coefficients are uncorrelated random variables Yk
Random Vectors 5-19
example: X has the covariance matrix
C_X = \begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix}
design a transformation Y = AX s.t. the covariance of Y is diagonal
the eigenvalues of C_X and the corresponding eigenvectors are
λ_1 = 6, u_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},   λ_2 = 2, u_2 = \begin{bmatrix} 1 \\ −1 \end{bmatrix}
u_1 and u_2 are orthogonal, so if we normalize u_k so that ‖u_k‖ = 1 then
U = \begin{bmatrix} \frac{u_1}{\sqrt{2}} & \frac{u_2}{\sqrt{2}} \end{bmatrix}
is unitary
therefore, C_X = UDU^T where
U = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & −1/\sqrt{2} \end{bmatrix},   D = \begin{bmatrix} 6 & 0 \\ 0 & 2 \end{bmatrix}
thus, if we choose A = U^T then C_Y = D which is diagonal as desired
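A minimal numpy sketch (illustrative only, not from the slides) carries out the same diagonalization:

import numpy as np

Cx = np.array([[4.0, 2.0],
               [2.0, 4.0]])

# eigendecomposition Cx = U D U^T (eigh returns eigenvalues in ascending order)
eigvals, U = np.linalg.eigh(Cx)
A = U.T                                   # choose A = U^T
Cy = A @ Cx @ A.T                         # equals diag(eigvals)
print(eigvals)                            # [2. 6.]
print(np.round(Cy, 10))                   # diagonal matrix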
Random Vectors 5-20
Whitening transformation
we wish to find a transformation Y = AX such that
CY = I
• a white noise property: the covariance is the identity matrix
• all components in Y are all uncorrelated
• the variances of Yk are normalized to 1
from C_Y = AC_XA^T and the eigenvalue decomposition of C_X,
C_Y = AUDU^TA^T = AUD^{1/2}D^{1/2}U^TA^T
denote D^{1/2} the square root of D with D ⪰ 0, i.e., D^{1/2}D^{1/2} = D
D = diag(d_1, . . . , d_n)  ⟹  D^{1/2} = diag(\sqrt{d_1}, . . . , \sqrt{d_n})
can you find A that makes CY the identity matrix ?
Random Vectors 5-21
Gaussian random vector
X1, . . . , Xn are said to be jointly Gaussian if their joint pdf is given by
f(x) ≜ f_X(x_1, x_2, . . . , x_n) = \frac{1}{(2π)^{n/2} \det(Σ)^{1/2}} \exp\left(−\frac{1}{2}(x − µ)^T Σ^{−1} (x − µ)\right)
µ is the mean (n × 1) and Σ ≻ 0 is the covariance matrix (n × n):
µ = \begin{bmatrix} µ_1 \\ µ_2 \\ \vdots \\ µ_n \end{bmatrix},   Σ = \begin{bmatrix} Σ_{11} & Σ_{12} & \cdots & Σ_{1n} \\ Σ_{21} & Σ_{22} & \cdots & Σ_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ Σ_{n1} & Σ_{n2} & \cdots & Σ_{nn} \end{bmatrix}
and
µk = E[Xk], Σij = E[(Xi − µi)(Xj − µj)]
Random Vectors 5-22
example: the joint density function of x (not normalized) is given by
f(x_1, x_2, x_3) = \exp\left(−\frac{x_1^2 + 3x_2^2 + 2(x_3 − 1)^2 + 2x_1(x_3 − 1)}{2}\right)
• f is an exponential of a negative quadratic in x, so x must be a Gaussian
f(x_1, x_2, x_3) = \exp\left(−\frac{1}{2} \begin{bmatrix} x_1 \\ x_2 \\ x_3 − 1 \end{bmatrix}^T \begin{bmatrix} 1 & 0 & 1 \\ 0 & 3 & 0 \\ 1 & 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 − 1 \end{bmatrix}\right)
• the mean vector is (0, 0, 1) and the covariance matrix is
C = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 3 & 0 \\ 1 & 0 & 2 \end{bmatrix}^{−1} = \begin{bmatrix} 2 & 0 & −1 \\ 0 & 1/3 & 0 \\ −1 & 0 & 1 \end{bmatrix}
• the variance of x_1 is the highest while that of x_2 is the smallest
• x1 and x2 are uncorrelated, so are x2 and x3
Random Vectors 5-23
examples of Gaussian density contours (level sets of the exponent):
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T \begin{bmatrix} Σ_{11} & Σ_{12} \\ Σ_{12} & Σ_{22} \end{bmatrix}^{−1} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 1
(figure: three contour plots in the (x_1, x_2) plane)
uncorrelated:  Σ = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix};   different variance:  Σ = \begin{bmatrix} 1/2 & 0 \\ 0 & 1 \end{bmatrix};   correlated:  Σ = \begin{bmatrix} 1 & −1 \\ −1 & 2 \end{bmatrix}
Random Vectors 5-24
Properties of Gaussian variables
many results on Gaussian RVs can be obtained analytically:
• marginal’s of X is also Gaussian
• conditional pdf of Xk given the other variables is a Gaussian distribution
• uncorrelated Gaussian random variables are independent
• any affine transformation of a Gaussian is also a Gaussian
these are well-known facts
and more can be found in the areas of estimation, statistical learning, etc.
Random Vectors 5-25
Characteristic function of Gaussian
Φ(ω) = Φ(ω_1, ω_2, . . . , ω_n) = e^{jµ^Tω} e^{−ω^TΣω/2}
Proof. By definition and rearranging the quadratic term in the exponent,
Φ(ω) = \frac{1}{(2π)^{n/2}|Σ|^{1/2}} \int e^{jx^Tω} e^{−\frac{1}{2}(x − µ)^TΣ^{−1}(x − µ)} dx
     = \frac{e^{jµ^Tω} e^{−ω^TΣω/2}}{(2π)^{n/2}|Σ|^{1/2}} \int e^{−\frac{1}{2}(x − µ − jΣω)^TΣ^{−1}(x − µ − jΣω)} dx
     = \exp(jµ^Tω) \exp\left(−\frac{1}{2}ω^TΣω\right)
(the integral equals 1 since it is a form of Gaussian distribution)
for a one-dimensional Gaussian with zero mean and variance Σ = σ^2,
Φ(ω) = e^{−σ^2ω^2/2}
Random Vectors 5-26
Affine Transformation of a Gaussian is Gaussian
let X be an n-dimensional Gaussian, X ∼ N (µ,Σ) and define
Y = AX+ b
where A is m× n and b is m× 1 (so Y is m× 1)
Φ_Y(ω) = E[e^{jω^TY}] = E[e^{jω^T(AX + b)}] = E[e^{jω^TAX} · e^{jω^Tb}] = e^{jω^Tb} Φ_X(A^Tω)
       = e^{jω^Tb} · e^{jµ^TA^Tω} · e^{−ω^TAΣA^Tω/2}
       = e^{jω^T(Aµ + b)} · e^{−ω^TAΣA^Tω/2}
we read off that Y is Gaussian with mean Aµ + b and covariance AΣA^T
Random Vectors 5-27
Marginal of Gaussian is Gaussian
the kth component of X is obtained by
X_k = \begin{bmatrix} 0 & \cdots & 1 & \cdots & 0 \end{bmatrix} X ≜ e_k^T X
(e_k is a standard unit column vector; all entries are zero except the kth position)
hence, X_k is simply a linear transformation (in fact, a projection) of X
X_k is then a Gaussian with mean
e_k^T µ = µ_k
and variance
e_k^T Σ e_k = Σ_{kk}
Random Vectors 5-28
Uncorrelated Gaussians are independent
suppose (X,Y) is a jointly Gaussian vector with
mean µ = \begin{bmatrix} µ_x \\ µ_y \end{bmatrix} and covariance \begin{bmatrix} C_X & 0 \\ 0 & C_Y \end{bmatrix}
in other words, X and Y are uncorrelated Gaussians:
cov(X, Y) = E[XY^T] − E[X]E[Y]^T = 0
the joint density can be written as
f_{XY}(x, y) = \frac{1}{(2π)^n |C_X|^{1/2}|C_Y|^{1/2}} \exp\left(−\frac{1}{2}\begin{bmatrix} x − µ_x \\ y − µ_y \end{bmatrix}^T \begin{bmatrix} C_X^{−1} & 0 \\ 0 & C_Y^{−1} \end{bmatrix} \begin{bmatrix} x − µ_x \\ y − µ_y \end{bmatrix}\right)
           = \frac{1}{(2π)^{n/2}|C_X|^{1/2}} e^{−\frac{1}{2}(x − µ_x)^TC_X^{−1}(x − µ_x)} · \frac{1}{(2π)^{n/2}|C_Y|^{1/2}} e^{−\frac{1}{2}(y − µ_y)^TC_Y^{−1}(y − µ_y)}
proving the independence
Random Vectors 5-29
we can also see from the characteristic function
Φ(ω_1, ω_2) = E\left[\exp\left(j\begin{bmatrix} ω_1 \\ ω_2 \end{bmatrix}^T \begin{bmatrix} X \\ Y \end{bmatrix}\right)\right]
            = \exp\left(j\begin{bmatrix} ω_1 \\ ω_2 \end{bmatrix}^T \begin{bmatrix} µ_x \\ µ_y \end{bmatrix}\right) · \exp\left(−\frac{1}{2}\begin{bmatrix} ω_1 \\ ω_2 \end{bmatrix}^T \begin{bmatrix} C_X & 0 \\ 0 & C_Y \end{bmatrix} \begin{bmatrix} ω_1 \\ ω_2 \end{bmatrix}\right)
            = \exp(jω_1^Tµ_x) \exp\left(−\frac{ω_1^TC_Xω_1}{2}\right) · \exp(jω_2^Tµ_y) \exp\left(−\frac{ω_2^TC_Yω_2}{2}\right)
            ≜ Φ_1(ω_1) · Φ_2(ω_2)
proving the independence
proving the independence
Random Vectors 5-30
Conditional of Gaussian is Gaussian
let Z be an n-dimensional Gaussian which can be decomposed as
Z = \begin{bmatrix} X \\ Y \end{bmatrix} ∼ N\left(\begin{bmatrix} µ_x \\ µ_y \end{bmatrix}, \begin{bmatrix} Σ_{xx} & Σ_{xy} \\ Σ_{xy}^T & Σ_{yy} \end{bmatrix}\right)
the conditional pdf of X given Y is also Gaussian with conditional mean
µ_{X|Y} = µ_x + Σ_{xy}Σ_{yy}^{−1}(Y − µ_y)
and conditional covariance
Σ_{X|Y} = Σ_{xx} − Σ_{xy}Σ_{yy}^{−1}Σ_{xy}^T
Random Vectors 5-31
Proof:
from the matrix inversion lemma, Σ^{−1} can be written as
Σ^{−1} = \begin{bmatrix} S^{−1} & −S^{−1}Σ_{xy}Σ_{yy}^{−1} \\ −Σ_{yy}^{−1}Σ_{xy}^TS^{−1} & Σ_{yy}^{−1} + Σ_{yy}^{−1}Σ_{xy}^TS^{−1}Σ_{xy}Σ_{yy}^{−1} \end{bmatrix}
where S is called the Schur complement of Σ_{xx} in Σ and
S = Σ_{xx} − Σ_{xy}Σ_{yy}^{−1}Σ_{xy}^T,   det Σ = det S · det Σ_{yy}
we can show that Σ ≻ 0 if and only if S ≻ 0 and Σ_{yy} ≻ 0
Random Vectors 5-32
from fX|Y(x|y) = fX(x,y)/fY(y), we calculate the exponent terms
\begin{bmatrix} x-\mu_x \\ y-\mu_y \end{bmatrix}^T \Sigma^{-1} \begin{bmatrix} x-\mu_x \\ y-\mu_y \end{bmatrix} - (y-\mu_y)^T\Sigma_{yy}^{-1}(y-\mu_y)

= (x-\mu_x)^T S^{-1}(x-\mu_x) - (x-\mu_x)^T S^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu_y)
  - (y-\mu_y)^T\Sigma_{yy}^{-1}\Sigma_{xy}^T S^{-1}(x-\mu_x)
  + (y-\mu_y)^T\left(\Sigma_{yy}^{-1}\Sigma_{xy}^T S^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\right)(y-\mu_y)

= \left[x-\mu_x - \Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu_y)\right]^T S^{-1} \left[x-\mu_x - \Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu_y)\right]

\triangleq (x-\mu_{X|Y})^T\Sigma_{X|Y}^{-1}(x-\mu_{X|Y})
f_{X|Y}(x|y) is an exponential of a quadratic function in x, so it has the form of a Gaussian
Random Vectors 5-33
Standard Gaussian vectors
for an n-dimensional Gaussian vector X ∼ N (µ,C) with C ≻ 0
let A be an n × n invertible matrix such that

AA^T = C

(A is called a factor of C)

then the random vector

Z = A^{-1}(X - \mu)

is a standard Gaussian vector, i.e.,

Z \sim \mathcal{N}(0, I)
(obtain A via eigenvalue decomposition or Cholesky factorization)
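a minimal numpy sketch of this whitening step using a Cholesky factor (the mean and covariance values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])
C  = np.array([[2.0, 0.8],
               [0.8, 1.0]])

A = np.linalg.cholesky(C)                  # lower-triangular factor, A @ A.T = C
X = rng.multivariate_normal(mu, C, size=100_000)
Z = (np.linalg.inv(A) @ (X - mu).T).T      # Z = A^{-1}(X - mu)

print(Z.mean(axis=0))                      # ~ [0, 0]
print(np.cov(Z.T))                         # ~ identity
```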
Random Vectors 5-34
Functions of random vectors
• minimum and maximum of random variables
• general transformation
• affine transformation
Random Vectors 5-35
Minimum and Maximum of RVs
let X1,X2, . . . ,Xn be independent RVs
define the minimum and maximum of RVs by
Y = min(X1,X2, . . . ,Xn), Z = max(X1, X2, . . . , Xn)
the maximum of X1, X2, . . . , Xn is less than or equal to z iff Xi ≤ z for all i, so

F_Z(z) = P(X_1 \leq z)P(X_2 \leq z)\cdots P(X_n \leq z) = (F_X(z))^n

the minimum of X1, X2, . . . , Xn is greater than y iff Xi > y for all i, so

1 - F_Y(y) = P(X_1 > y)P(X_2 > y)\cdots P(X_n > y) = (1 - F_X(y))^n

and

F_Y(y) = 1 - (1 - F_X(y))^n

(the last equality in each line assumes the Xi are identically distributed with common cdf F_X)
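a quick check with iid Uniform(0, 1) variables (a minimal numpy sketch; n and the test point z are arbitrary choices): the empirical CDFs of the max and min match (F_X(z))^n and 1 − (1 − F_X(z))^n

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 5, 200_000
X = rng.uniform(0.0, 1.0, size=(trials, n))   # n iid Uniform(0,1) variables per trial
Zmax = X.max(axis=1)
Ymin = X.min(axis=1)

z = 0.8
print((Zmax <= z).mean(), z**n)               # F_Z(z)  vs  (F_X(z))^n
print((Ymin <= z).mean(), 1 - (1 - z)**n)     # F_Y(z)  vs  1 - (1 - F_X(z))^n
```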
Random Vectors 5-36
Example: Merging of independent Poisson arrivals
[figure: n sources (Source 1, Source 2, . . . , Source n) sending web page requests that are multiplexed at a server]
• Ti denotes the interarrival times for source i
• Ti has exponential distribution with rate λi
• find the distribution of the interarrival times between consecutive requests at the server
Random Vectors 5-37
each Ti satisfies the memoryless property, so the time that has elapsed since the last arrival is irrelevant
the time until the next arrival at the multiplexer is
Z = min(T1, T2, . . . , Tn)
therefore, the cdf of Z can be computed by:
1 - F_Z(z) = P(\min(T_1, T_2, \ldots, T_n) > z)
= P(T_1 > z)P(T_2 > z)\cdots P(T_n > z)
= (1 - F_{T_1}(z))(1 - F_{T_2}(z))\cdots(1 - F_{T_n}(z))
= e^{-\lambda_1 z}\cdots e^{-\lambda_n z} = e^{-(\lambda_1 + \lambda_2 + \cdots + \lambda_n)z}
the interarrival time is an exponential RV with rate λ1 + λ2 + · · ·+ λn
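a simulation check (a minimal numpy sketch with illustrative rates): the empirical rate of Z = min(T1, . . . , Tn) should be close to λ1 + · · · + λn

```python
import numpy as np

rng = np.random.default_rng(4)
rates = np.array([0.5, 1.0, 2.0])      # lambda_i for each source
trials = 200_000

# interarrival times for each source (scale = 1/lambda_i)
T = rng.exponential(1.0 / rates, size=(trials, rates.size))
Z = T.min(axis=1)                      # time until the next arrival at the server

print(1.0 / Z.mean(), rates.sum())     # empirical rate  vs  lambda_1 + ... + lambda_n
```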
Random Vectors 5-38
General transformation
let X be a vector random variable
define Z = g(X) : Rn → Rn and assume that g is invertible
so that for Z = z we can solve for x uniquely:
x = g−1(z)
then the joint pdf of Z is given by
fZ(z) =fX(g
−1(z))
|detJ |
where detJ is the determinant of the Jacobian matrix:
J =
∂g1/∂x1 · · · ∂g1/∂xn... . . . ...
∂gn/∂x1 · · · ∂gn/∂xn
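a small worked check of the change-of-variables formula (a scipy sketch; the map g and the test point are illustrative, not from the slides): for X ~ N(0, I) in R^2 and Z = (exp(X1), X2), the formula reproduces the known lognormal × normal density

```python
import numpy as np
from scipy import stats

# X = (X1, X2) ~ N(0, I);  Z = g(X) = (exp(X1), X2), which is invertible
# Jacobian of g is diag(exp(x1), 1), so |det J| = exp(x1) = z1
z1, z2 = 1.7, -0.4
x1, x2 = np.log(z1), z2                                    # x = g^{-1}(z)

f_X = stats.norm.pdf(x1) * stats.norm.pdf(x2)
f_Z = f_X / abs(z1)                                        # change-of-variables formula
check = stats.lognorm.pdf(z1, s=1) * stats.norm.pdf(z2)    # known closed form
print(f_Z, check)                                          # should agree
```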
Random Vectors 5-39
Affine transformation
if X is a continuous random vector and A is an invertible matrix
then Y = AX + b has pdf

f_Y(y) = \frac{1}{|\det A|}\, f_X(A^{-1}(y - b))
Gaussian case: let X ∼ N (0,Σ)
f_Y(y) = \frac{1}{(2\pi)^{n/2}|\det A|\,|\Sigma|^{1/2}} \exp\left( -\frac{1}{2}(y-b)^T A^{-T}\Sigma^{-1}A^{-1}(y-b) \right)

= \frac{1}{(2\pi)^{n/2}|A\Sigma A^T|^{1/2}} \exp\left( -\frac{1}{2}(y-b)^T(A\Sigma A^T)^{-1}(y-b) \right)
we read off that Y is also Gaussian with mean b and covariance AΣAT
this agrees with the results on pages 5-16 and 5-27
Random Vectors 5-40
Example: Sum of jointly Gaussian
a special case of linear transformation is
Z = a1X1 + a2X2 + · · ·+ anXn
where X1, . . . , Xn are jointly Gaussian
Z can be written as
Z = \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix} \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} \triangleq AX
Z is simply a linear transformation of a Gaussian
Random Vectors 5-41
therefore, Z is Gaussian with mean
E[Z] = A\mu = \sum_{i=1}^{n} a_i E[X_i]

and variance

var(Z) = cov(Z) = A\Sigma A^T = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\, cov(X_i, X_j)
if X1, . . . , Xn are independent Gaussians, i.e., cov(Xi, Xj) = 0 for i ≠ j,
then the variance of Z is reduced to
var(Z) = \sum_{i=1}^{n} a_i^2\, cov(X_i, X_i) = \sum_{i=1}^{n} a_i^2\, var(X_i)
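a simulation check of the mean and variance formulas (a minimal numpy sketch; a, μ, Σ are illustrative values):

```python
import numpy as np

rng = np.random.default_rng(5)
a     = np.array([1.0, -2.0, 0.5])
mu    = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, -0.4],
                  [0.0, -0.4, 0.5]])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Z = X @ a                            # Z = a1 X1 + ... + an Xn

print(Z.mean(), a @ mu)              # E[Z]   = A mu
print(Z.var(),  a @ Sigma @ a)       # var(Z) = A Sigma A^T
```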
Random Vectors 5-42
References
Chapter 6 in A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edition, Pearson Prentice Hall, 2009
Random Vectors 5-43
EE401 (Semester 1) Jitkomut Songsiri
6. Sums of Random Variables
• mean and variance
• PDF of sums of independent RVs
• laws of large numbers
• central limit theorems
6-1
Mean and Variance
let X1,X2, . . . ,Xn be a sequence of RVs
regardless of statistical dependence, we have
E[X1 +X2 + · · ·+Xn] = E[X1] + E[X2] + · · ·+ E[Xn]
the variance of a sum of RVs is, however, NOT equal to the sum of variances
var(X_1 + X_2 + \cdots + X_n) = \sum_{k=1}^{n} var(X_k) + \sum_{j=1}^{n}\sum_{\substack{k=1 \\ k\neq j}}^{n} cov(X_j, X_k)
If X1, X2, . . . , Xn are uncorrelated, then
var(X1 +X2 + · · ·+Xn) = var(X1) + var(X2) + · · ·+ var(Xn)
Sums of Random Variables 6-2
PDF of sums of independent RVs
consider the sum of n independent RVs
Sn = X1 +X2 + · · ·+Xn
the characteristic function of Sn is
ΦS(ω) = E[ejωSn] = E[ejω(X1+X2+···+Xn)]
= E[ejωX1] · · ·E[ejωXn]
= ΦX1(ω) · · ·ΦXn(ω)
thus the pdf of S_n is found by taking the inverse Fourier transform of \Phi_S(\omega):

f_S(x) = \mathcal{F}^{-1}[\Phi_{X_1}(\omega)\cdots\Phi_{X_n}(\omega)]
Sums of Random Variables 6-3
Example
find the pdf of a sum of n independent exponential RVs
all exponential variables have parameter α
the characteristic function of a single exponential RV is
\Phi_X(\omega) = \frac{\alpha}{\alpha - j\omega}

the characteristic function of the sum is

\Phi_S(\omega) = \left( \frac{\alpha}{\alpha - j\omega} \right)^n
we see that Sn is an n-Erlang RV
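a quick check by simulation (a minimal scipy sketch; n, α and the evaluation point are illustrative): the empirical CDF of the sum matches the Erlang (Gamma) CDF

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, alpha = 4, 2.0                      # number of summands and exponential rate
S = rng.exponential(1.0 / alpha, size=(200_000, n)).sum(axis=1)

# the n-Erlang distribution is Gamma(shape=n, scale=1/alpha)
x = 2.0
print((S <= x).mean(), stats.gamma.cdf(x, a=n, scale=1.0 / alpha))
```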
Sums of Random Variables 6-4
Sample mean
let X be an RV with E[X ] = µ (unknown)
X1, X2, . . . , Xn denote n independent, repeated measurements of X
Xj’s are independent, identically distributed (i.i.d.) RVs
the sample mean of the sequences is used to estimate E[X ]:
M_n = \frac{1}{n}\sum_{j=1}^{n} X_j
two statistical quantities for characterizing the sample mean’s properties:
• E[Mn]: we say Mn is unbiased if E[Mn] = µ
• var(Mn): we examine this value when n is large
Sums of Random Variables 6-5
the sample mean is an unbiased estimator for µ:
E[M_n] = E\left[ \frac{1}{n}\sum_{j=1}^{n} X_j \right] = \frac{1}{n}\sum_{j=1}^{n} E[X_j] = \mu
suppose var(X) = σ2 (true variance)
since Xj’s are i.i.d, the variance of Mn is
var(M_n) = \frac{1}{n^2}\sum_{j=1}^{n} var(X_j) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
hence, the variance of the sample mean approaches zero as the number of samples increases
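a simulation illustrating the σ²/n behaviour (a minimal numpy sketch; the Gaussian distribution and σ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma = 2.0
for n in [10, 100, 1000]:
    # 10 000 independent sample means, each computed from n iid measurements
    M = rng.normal(0.0, sigma, size=(10_000, n)).mean(axis=1)
    print(n, M.var(), sigma**2 / n)    # empirical variance of M_n  vs  sigma^2 / n
```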
Sums of Random Variables 6-6
Weak Law of Large Numbers
let X1, X2, . . . , Xn be a sequence of iid RVs with finite mean E[X] = \mu and variance \sigma^2

for any \epsilon > 0,

\lim_{n\to\infty} P[\,|M_n - \mu| < \epsilon\,] = 1

• for large enough n, the sample mean will be close to the true mean with high probability

• Proof. apply the Chebyshev inequality:

P[|M_n - \mu| \geq \epsilon] \leq \frac{\sigma^2}{n\epsilon^2} \;\Longrightarrow\; P[|M_n - \mu| < \epsilon] \geq 1 - \frac{\sigma^2}{n\epsilon^2}
Sums of Random Variables 6-7
scattergram of 1000 realizations of the sample mean
[figure: four scatter plots of the sample means Mn in the (x1, x2) plane, axes from −0.4 to 0.4, for n = 100, 1000, 10000, 100000]
• Mn’s are computed from 2-dimensional Gaussian with zero mean
• as n increases, the sample means Mn concentrate around zero with high probability
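a sketch to reproduce this experiment (assuming numpy and matplotlib; 1000 realizations as on the slide):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
fig, axes = plt.subplots(2, 2, figsize=(8, 8))
for ax, n in zip(axes.ravel(), [100, 1000, 10_000, 100_000]):
    # 1000 realizations of the sample mean of n iid 2-D standard Gaussian vectors
    M = np.array([rng.standard_normal((n, 2)).mean(axis=0) for _ in range(1000)])
    ax.scatter(M[:, 0], M[:, 1], s=2)
    ax.set_xlim(-0.4, 0.4); ax.set_ylim(-0.4, 0.4)
    ax.set_title(f'n = {n}'); ax.set_xlabel('x1'); ax.set_ylabel('x2')
plt.tight_layout()
plt.show()
```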
Sums of Random Variables 6-8
Strong Law of Large Numbers
let X1, X2, . . . , Xn be a sequence of iid RVs with finite mean E[X] = \mu and finite variance, then

P\left[ \lim_{n\to\infty} M_n = \mu \right] = 1
• Mk is the sample mean computed using X1 through Xk

• with probability 1, every sequence of sample mean calculations will eventually approach and stay close to E[X] = \mu
• the strong law implies the weak law
Sums of Random Variables 6-9
Central Limit Theorem
let X1,X2, . . . ,Xn be a sequence of iid RVs with
finite mean E[X ] = µ and finite variance σ2
let Sn be the sum of the first n RVs in the sequences:
Sn = X1 +X2 + · · ·+Xn
and define
Zn =Sn − nµ
σ√n
then
limn→∞
P (Zn ≤ z) =1√2π
∫ z
−∞e−x2/2dx
as n becomes large, the CDF of normalized Sn approaches Gaussiandistribution
Sums of Random Variables 6-10
Proof of Central Limit Theorem
first note that
Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{1}{\sigma\sqrt{n}}\sum_{k=1}^{n}(X_k - \mu)
the characteristic function of Zn is given by
\Phi_{Z_n}(\omega) = E[e^{j\omega Z_n}] = E\left[ \exp\left( \frac{j\omega}{\sigma\sqrt{n}}\sum_{k=1}^{n}(X_k - \mu) \right) \right]

= E\left[ \prod_{k=1}^{n} e^{j\omega(X_k - \mu)/\sigma\sqrt{n}} \right]

= \left( E\left[e^{j\omega(X - \mu)/\sigma\sqrt{n}}\right] \right)^n
(using the fact that Xk’s are iid)
Sums of Random Variables 6-11
expanding the exponential expression gives
E\left[e^{j\omega(X-\mu)/\sigma\sqrt{n}}\right] = E\left[ 1 + \frac{j\omega}{\sigma\sqrt{n}}(X-\mu) + \frac{(j\omega)^2}{2!\,n\sigma^2}(X-\mu)^2 + \cdots \right] \approx 1 - \frac{\omega^2}{2n}

(the higher-order terms can be neglected as n becomes large)
then we obtain
\Phi_{Z_n}(\omega) \to \left( 1 - \frac{\omega^2}{2n} \right)^n \to e^{-\omega^2/2}, \quad \text{as } n \to \infty
Sums of Random Variables 6-12
[figure: two plots of F(x) versus x comparing the CDF of the sum of n exponential RVs (blue) with the Gaussian CDF (red dashed), for n = 5 (left) and n = 50 (right)]
• blue lines are the CDFs of the sum of n exponential RVs with λ = 1, where n = 5 (left) and n = 50 (right)

• red dashed line is the CDF of a Gaussian RV with the same mean (n/λ) and variance n/λ^2
• as n increases, the CDF approaches that of Gaussian distribution
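a numerical version of this comparison (a minimal numpy/scipy sketch; the evaluation points are arbitrary): the empirical CDF of S_n and the matching Gaussian CDF get closer as n grows

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
lam = 1.0
for n in [5, 50]:
    S = rng.exponential(1.0 / lam, size=(100_000, n)).sum(axis=1)
    gauss = stats.norm(loc=n / lam, scale=np.sqrt(n) / lam)   # same mean n/lam, variance n/lam^2
    for x in [n / lam, n / lam + np.sqrt(n) / lam]:
        print(n, x, (S <= x).mean(), gauss.cdf(x))            # agreement improves with n
```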
Sums of Random Variables 6-13
[figure: two plots of F(x) versus x comparing the CDF of the sum of n Bernoulli RVs (blue) with the Gaussian CDF (red dashed), for n = 5 (left) and n = 25 (right)]
• blue lines are the CDFs of the sum of n Bernoulli RVs with p = 1/2, where n = 5 (left) and n = 25 (right)

• red dashed line is the CDF of a Gaussian with mean np and variance np(1 − p)
Sums of Random Variables 6-14
Example
the time between events is iid exponential RVs with mean m sec
find the probability that the 1000th event occurs in the time interval (1000 ± 50)m
• Xj is the time between events
• Sn is the time of the nth event (then Sn = X1 +X2 + · · ·+Xn)
• E[Sn] = nm and var(Sn) = nm2
the CLT gives
P(950m \leq S_{1000} \leq 1050m) = P\left( \frac{950m - 1000m}{m\sqrt{1000}} \leq Z_n \leq \frac{1050m - 1000m}{m\sqrt{1000}} \right)

\approx \Phi(1.58) - \Phi(-1.58) \approx 0.886
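the numerical value can be obtained with scipy (a minimal sketch, not part of the original slides):

```python
from scipy import stats

z = 50 / 1000**0.5                               # (1050m - 1000m) / (m sqrt(1000)) ≈ 1.58
print(stats.norm.cdf(z) - stats.norm.cdf(-z))    # ≈ 0.886
```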
Sums of Random Variables 6-15
References
Chapter 7 in A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edition, Pearson Prentice Hall, 2009
Sums of Random Variables 6-16