BTRY 4080 / STSCI 4080, Fall 2009
Theory of Probability
Instructor: Ping Li
Department of Statistical Science
Cornell University
General Information
• Lectures: Tue, Thu 10:10-11:25 am, Malott Hall 253
• Section 1: Mon 2:55 - 4:10 pm, Warren Hall 245
• Section 2: Wed 1:25 - 2:40 pm, Warren Hall 261
• Instructor: Ping Li, pingli@cornell.edu,
Office Hours: Tue, Thu 11:25 am - 12 pm, 1192 Comstock Hall
• TA: Xiao Luo, lx42@cornell.edu. Office hours:
(1) Mon, 4:10 - 5:10pm Warren Hall 245;
(2) Wed, 2:40 - 3:40pm, Warren Hall 261.
• Prerequisites: Math 213 or 222 or equivalent; a course in statistical methods
is recommended
• Textbook: Ross, A First Course in Probability, 7th edition, 2006
• Exams: Locations TBD
– Prelim 1: In Class, September 29, 2009
– Prelim 2: In Class, October 20, 2009
– Prelim 3: In Class, TBD
– Final Exam: Wed. December 16, 2009, from 2pm to 4:30pm.
– Policy: Closed book, closed notes
• Homework: Weekly
– Please turn in your homework either in class or to BSCB front desk
(Comstock Hall, 1198).
– No late homework will be accepted.
– Before computing your overall homework grade, the assignment with the lowest grade (if ≥ 25%) will be dropped, and the one with the second-lowest grade (if ≥ 50%) will also be dropped.
– It is the students’ responsibility to keep copies of the submitted homework.
• Grading:
1. The homework: 30%
2. The prelim with the lowest grade: 5%
3. The other two prelims: 15% each
4. The final exam: 35%
• Course Letter Grade Assignment
A ≈ 90% (in the absolute scale)
C ≈ 60% (in the absolute scale)
In borderline cases, participation in section and class interactions will be used as
a determining factor.
Material
Topic Textbook
Combinatorics 1.1 - 1.6
Axioms of Probability 2.1 - 2.5
Conditional Probability and Independence 3.1 - 3.5
Discrete Random Variables 4.1 - 4.9
Continuous Random Variables 5.1 - 5.7
Jointly Distributed Random Variables 6.1 - 6.7
Expectation 7.1 - 7.9
As time permits: Limit Theorems 8.1 - 8.6
Poisson Processes and Markov Chains 9.1 - 9.2
Simulation 10.1 - 10.4
Probability Is Increasingly Popular & Truly Useful
An Example: Probabilistic Counting
Consider two sets S1, S2 ⊆ Ω = {0, 1, 2, ..., D − 1}, for example
D = 2000
S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}
Q: What is the size of the intersection, a = |S1 ∩ S2|?
A: Easy! a = |{15, 193, 563}| = 3.
Challenges
• What if the size of the space is D = 2^64?
• What if there are 10 billion sets, instead of just 2?
• What if we only need approximate answers?
• What if we need the answers really quickly?
• What if we are mostly interested in sets that are highly similar?
• What if the sets are dynamic and frequently updated?
• ...
Search Engines
Google/Yahoo!/Bing have each collected at least 10^10 (10 billion) web pages.
Two examples (among many tasks):
• (Very quickly) Determine if two Web pages are near-duplicates.
A document can be viewed as a collection (set) of words or contiguous words. This is a counting of set intersection problem.
• (Very quickly) Determine the number of co-occurrences of two words.
A word can be viewed as the set of documents containing that word. This is also a counting of set intersection problem.
Term Document Matrix
[Figure: a term-document matrix with words as rows (Word 1, Word 2, ..., Word n), documents as columns (Doc 1, Doc 2, ..., Doc m), and 0/1 entries indicating whether a word occurs in a document.]
≥ 10^10 documents (Web pages).
≥ 10^5 common English words. (What about the total number of contiguous words?)
One Method for Probabilistic Counting of Intersections
Consider two sets S1, S2 ⊂ Ω = {0, 1, 2, ..., D − 1}
S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}
One can first generate a random permutation π:
π : Ω = {0, 1, 2, ..., D − 1} −→ {0, 1, 2, ..., D − 1}
Apply π to sets S1 and S2 (and all other sets). For example
π(S1) = {13, 139, 476, 597, 698, 1355, 1932, 2532}
π(S2) = {13, 236, 476, 683, 798, 1456, 1968, 2532, 3983}
Store only the minimums:
min π(S1) = 13, min π(S2) = 13
Suppose we can theoretically calculate the probability
P = Pr (min π(S1) = min π(S2))
S1 and S2 are more similar =⇒ larger P (i.e., closer to 1).
Generate k (a small number of) random permutations and each time store the minimums. Count the number of matches to estimate the similarity between S1 and S2.
Later, we can easily show that
P = Pr (min π(S1) = min π(S2)) = |S1 ∩ S2| / |S1 ∪ S2| = a / (f1 + f2 − a)
where a = |S1 ∩ S2|, f1 = |S1|, f2 = |S2|.
Therefore, we can estimate P as a binomial probability from k permutations.
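To see how this estimator behaves, here is a minimal Matlab sketch (not from the slides; the choice k = 1000 is arbitrary) that simulates the procedure on the two example sets:

D = 2000;
S1 = [0 15 31 193 261 495 563 919];
S2 = [5 15 19 193 209 325 563 1029 1921];
k = 1000;                       % number of independent random permutations
matches = 0;
for t = 1:k
    perm = randperm(D) - 1;     % a random permutation of {0, 1, ..., D-1}
    % perm(x+1) is the image of element x under the permutation
    if min(perm(S1+1)) == min(perm(S2+1))
        matches = matches + 1;
    end
end
Phat = matches / k              % should be close to a/(f1+f2-a) = 3/14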
Combinatorial Analysis
Ross: Chapter 1, Sections 1.1 - 1.6
Basic Principles of Counting, Section 1.2
1. The multiplicative principle
A project requires two sub-tasks, A and B.
There are m ways to complete A, and there are n ways to complete B.
Q: How many different ways to complete this project?
2. The additive principle
A project can be completed either by doing task A, or by doing task B.
There are m ways to complete A, and there are n ways to complete B.
Q: How many different ways to complete this project?
The Multiplicative Principle
[Figure: routes from Origin to Destination, passing first through one of m stops A:1, ..., A:m and then through one of n stops B:1, ..., B:n.]
Q: How many different routes are there from the origin to the destination?
Route 1: A:1 + B:1
Route 2: A:1 + B:2
...
Route n: A:1 + B:n
Route n+1: A:2 + B:1
...
Route ?: A:m + B:n
The Additive Principle
[Figure: routes from Origin to Destination, going either through one of m stops A:1, ..., A:m or through one of n stops B:1, ..., B:n.]
Q: How many different routes are there from the origin to the destination?
Example:
A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors,
and 2 seniors. A subcommittee of 4, consisting of 1 person from each class, is to
be chosen.
Q: How many different subcommittees are possible?
—————–
Multiplicative or additive?
Example:
In a 7-place license plate, the first 3 places are to be occupied by letters and the
final 4 by numbers, e.g., ITH-0827.
Q: How many different 7-place license plates are possible?
—————–
Multiplicative or additive?
Example:
How many license plates would be possible if repetition among letters or numbers
were prohibited?
Permutations and Combinations, Sections 1.3, 1.4
Need to select 3 letters from the 26 English letters to make a license plate.
Q: How many different arrangements are possible?
———————
The answer differs if
• Order of 3 letters matters =⇒ Permutation
e.g., ITH, TIH, HIT, are all different.
• Order of 3 letters does not matter =⇒ Combination
Permutations
Basic formula :
Suppose we have n objects. How many different permutations are there?
n × (n − 1) × (n − 2) × ... × 1 = n!
Why?
[Figure: n positions in a row. Position 1 has n choices, position 2 has n−1 choices, position 3 has n−2 choices, ..., position n has 1 choice.]
Which counting principle? Multiplicative or additive?
Example:
A baseball team consists of 9 players.
Q: How many different batting orders are possible?
Example:
A class consists of 6 men and 4 women. The students are ranked according to
their performance in the final, assuming no ties.
Q: How many different rankings are possible?
Q: If the women are ranked among themselves and the men among themselves
how many different rankings are possible?
Example:
How many different letter arrangements can be formed using P,E,P ?
The brute-force approach
• Step 1: Permutations assuming P1 and P2 are different.
P1P2E   P1EP2
P2P1E   P2EP1
EP1P2   EP2P1
• Step 2: Removing duplicates, assuming P1 and P2 are the same.
PPE   PEP   EPP
Q: How many duplicates?
Example:
How many different letter arrangements can be formed using P,E,P,P,E,R ?
Answer:
6! / (3! × 2! × 1!) = 60
Why?
A More General Example of Permutations:
How many different permutations are there of n objects, of which n1 are alike, n2 are alike, ..., nr are alike?
• Step 1: Assuming all n objects are distinct, there are n! permutations.
• Step 2: For each arrangement, there are n1! × n2! × ... × nr! ways to switch objects and still obtain the same arrangement.
• Step 3: The total number of distinct permutations is therefore
n! / (n1! × n2! × ... × nr!)
[Figure: a row of 12 elements, 1 1 1 2 3 4 5 6 7 8 9 10, in which the three 1's are identical.]
How many ways are there to switch the elements and still have the same permutation?
Example:
A signal consists of 9 flags hung in a line, which can be made from a set of 4
white flags, 3 red flags, and 2 blue flags. All flags of the same color are identical.
Q: How many different signals are possible?
Combinations
Basic question of combinations: the number of different groups of r objects that can be formed from a total of n objects.
Basic formula of combinations:
C(n, r) = n! / ((n − r)! × r!)
Pronounced "n-choose-r".
Basic Formula of Combinations
Step 1: When order matters, the number of different ordered arrangements of r objects formed from n objects is n × (n − 1) × (n − 2) × ... × (n − r + 1).
[Figure: r positions; position 1 has n choices, position 2 has n−1 choices, ..., position r has n−r+1 choices.]
Step 2: When order does NOT matter, each group of r objects corresponds to r! different ordered arrangements.
Step 3: Therefore, when order does NOT matter, the number of different groups of r objects is
[n × (n − 1) × ... × (n − r + 1)] / r!
= [n × (n − 1) × ... × (n − r + 1) × (n − r) × (n − r − 1) × ... × 1] / [r! × (n − r) × (n − r − 1) × ... × 1]
= n! / ((n − r)! × r!)
= C(n, r)
Example:
A committee of 3 is to be formed from a group of 20 people.
Q: How many different committees are possible?
Example:
How many different committees consisting of 2 women and 3 men can be formed,
from a group of 5 women and 7 men?
Example (1.4.4c):
Consider a set of n antennas of which m are defective and n−m are functional.
Q: How many linear orderings are there in which no two defectives are
consecutive?
A: C(n − m + 1, m).
Line up the n − m functional antennas. Insert at most one defective antenna between two functional antennas. Counting the two ends, there are in total n − m + 1 places where a defective antenna can be inserted.
Useful Facts about the Binomial Coefficient C(n, r)
1. C(n, 0) = 1 and C(n, n) = 1. Recall 0! = 1.
2. C(n, r) = C(n, n − r)
3. C(n, r) = C(n − 1, r − 1) + C(n − 1, r)
Proof 1, by algebra:
C(n − 1, r − 1) + C(n − 1, r) = (n−1)! / ((n−r)! (r−1)!) + (n−1)! / ((n−r−1)! r!) = (n−1)! (r + n − r) / ((n−r)! r!) = n! / ((n−r)! r!)
Proof 2, by a combinatorial argument:
A selection of r objects either contains object 1 (C(n−1, r−1) ways) or does not contain object 1 (C(n−1, r) ways).
4. Some computational tricks
• Avoid computing the factorial n! directly; it is easily very large: 170! = 7.2574 × 10^306.
• Take advantage of C(n, r) = C(n, n − r).
• Convert products to additions:
C(n, r) = n(n − 1)(n − 2)...(n − r + 1) / r! = exp( Σ_{j=0}^{r−1} log(n − j) − Σ_{j=1}^{r} log j )
• Use accurate approximations of the log factorial function (log n!). Check
http://en.wikipedia.org/wiki/Factorial.
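A small Matlab illustration of the last two tricks (a sketch, not from the slides), using the built-in gammaln, where gammaln(n+1) = log(n!):

logC = @(n,r) gammaln(n+1) - gammaln(r+1) - gammaln(n-r+1);   % log C(n,r)
exp(logC(10,3))        % = 120, agrees with nchoosek(10,3)
logC(2000,800)         % for large n, exp() of this would overflow in double
                       % precision (exp overflows near 709.78), so keep the log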
The Binomial Theorem
(x + y)^n = Σ_{k=0}^{n} C(n, k) x^k y^{n−k}
Example for n = 2:
(x + y)^2 = x^2 + 2xy + y^2, with C(2, 0) = C(2, 2) = 1 and C(2, 1) = 2.
Proof of the Binomial Theorem by Induction
Two key components in proof by induction:
1. Verify a base case, e.g., n = 1.
2. Assume the case n − 1 is true, prove for the case n.
Proof:
The base case is trivially true (but always check!).
Assume
(x + y)^{n−1} = Σ_{k=0}^{n−1} C(n − 1, k) x^k y^{n−1−k}
Then
Σ_{k=0}^{n} C(n, k) x^k y^{n−k}
= Σ_{k=1}^{n−1} C(n, k) x^k y^{n−k} + x^n + y^n
= Σ_{k=1}^{n−1} C(n − 1, k) x^k y^{n−k} + Σ_{k=1}^{n−1} C(n − 1, k − 1) x^k y^{n−k} + x^n + y^n
= y Σ_{k=0}^{n−1} C(n − 1, k) x^k y^{n−1−k} − y^n + Σ_{k=1}^{n−1} C(n − 1, k − 1) x^k y^{n−k} + x^n + y^n
= y (x + y)^{n−1} + Σ_{j=0}^{n−2} C(n − 1, j) x^{j+1} y^{n−1−j} + x^n   (let j = k − 1)
= y (x + y)^{n−1} + x Σ_{j=0}^{n−1} C(n − 1, j) x^j y^{n−1−j} − x^n + x^n
= y (x + y)^{n−1} + x (x + y)^{n−1} = (x + y)^n
Multinomial Coefficients
A set of n distinct items is to be divided into r distinct groups of respective sizes n1, n2, ..., nr, where Σ_{i=1}^{r} ni = n.
Q: How many different divisions are possible?
Answer:
C(n; n1, n2, ..., nr) = n! / (n1! n2! ... nr!)
Proof
Step 1: There are C(n, n1) ways to form the first group of size n1.
Step 2: There are C(n − n1, n2) ways to form the second group of size n2.
...
Step r: There are C(n − n1 − n2 − ... − n_{r−1}, nr) ways to form the last group of size nr.
Therefore, the total number of divisions is
C(n, n1) × C(n − n1, n2) × ... × C(n − n1 − n2 − ... − n_{r−1}, nr)
= [n! / ((n − n1)! n1!)] × [(n − n1)! / ((n − n1 − n2)! n2!)] × ... × [(n − n1 − ... − n_{r−1})! / ((n − n1 − ... − n_{r−1} − nr)! nr!)]
= n! / (n1! n2! ... nr!)
Example:
A police department in a small city consists of 10 officers. The policy is to have 5
patrolling the streets, 2 working full time at the station, and 3 on reserve at the
station.
Q: How many different divisions of 10 officers into 3 groups are possible?
The Number of Integer Solutions of Equations
Suppose x1, x2, ..., xr are positive integers and x1 + x2 + ... + xr = n.
Q: How many distinct vectors (x1, x2, ..., xr) satisfy the constraints?
A: C(n − 1, r − 1).
Line up n 1's. Choose r − 1 of the n − 1 gaps between adjacent 1's at which to draw dividing lines, forming r groups (the numbers).
Suppose x1, x2, ..., xr are non-negative integers and x1 + x2 + ... + xr = n.
Q: How many distinct vectors (x1, x2, ..., xr) satisfy the constraints?
A: C(n + r − 1, r − 1), e.g., by arranging the n 1's together with r − 1 dividing bars, which now may be adjacent or at the ends.
One can also let yi = xi + 1, i = 1 to r. Then the problem becomes finding positive integers (y1, ..., yr) such that y1 + y2 + ... + yr = n + r, and the previous result gives C(n + r − 1, r − 1).
Example:
8 identical blackboards are to be divided among 4 schools.
Q (a): How many divisions are possible?
Q (b): How many, if each school must receive at least 1 blackboard?
More Examples for Chapter 1
From a group of 8 women and 6 men a committee consisting of 3 men and 3
women is to be formed.
(a): How many different committees are possible?
(b): How many, if 2 of the men refuse to serve together?
(c): How many, if 2 of the women refuse to serve together?
(d): How many, if 1 man and 1 woman refuse to serve together?
Solution:
(a): How many different committees are possible?
C(8, 3) × C(6, 3) = 56 × 20 = 1120
(b): How many, if 2 of the men refuse to serve together?
C(8, 3) × [C(4, 3) + C(2, 1) C(4, 2)] = 56 × 16 = 896
(c): How many, if 2 of the women refuse to serve together?
[C(6, 3) + C(2, 1) C(6, 2)] × C(6, 3) = 50 × 20 = 1000
(d): How many, if 1 man and 1 woman refuse to serve together?
C(7, 3) C(5, 3) + C(7, 3) C(5, 2) + C(7, 2) C(5, 3) = 350 + 350 + 210 = 910
Or C(8, 3) C(6, 3) − C(7, 2) C(5, 2) = 910
Fermat's Combinatorial Identity
Theoretical Exercise 11:
C(n, k) = Σ_{i=k}^{n} C(i − 1, k − 1),   n ≥ k
There is an interesting proof (different from the book's) using the result for the number of integer solutions of equations, by a recursive argument.
Proof:
Let f(m, r) be the number of positive integer solutions of
x1 + x2 + ... + xr = m.
We know f(m, r) = C(m − 1, r − 1).
Suppose m = n + 1 and r = k + 1. Then
f(n + 1, k + 1) = C(m − 1, r − 1) = C(n, k)
x1 can take values from 1 to (n + 1) − k, because xi ≥ 1 for i = 1 to k + 1.
Decompose the problem into (n + 1) − k sub-problems by fixing a value for x1.
If x1 = 1, then x2 + x3 + ... + x_{k+1} = m − 1 = n.
The total number of positive integer solutions of this sub-problem is f(n, k).
If x1 = 1, there are f(n, k) possibilities, i.e., C(n − 1, k − 1).
If x1 = 2, there are f(n − 1, k) possibilities, i.e., C(n − 2, k − 1).
...
If x1 = (n + 1) − k, there are f(k, k) possibilities, i.e., C(k − 1, k − 1).
Thus, by the additive counting principle,
f(n + 1, k + 1) = Σ_{j=1}^{n+1−k} f(n + 1 − j, k) = Σ_{j=1}^{n+1−k} C(n − j, k − 1)
But is this what we want? Yes!
By the additive counting principle,
f(n + 1, k + 1) = Σ_{j=1}^{n+1−k} f(n + 1 − j, k)
= Σ_{j=1}^{n+1−k} C(n − j, k − 1)
= C(n − 1, k − 1) + C(n − 2, k − 1) + ... + C(k − 1, k − 1)
= C(k − 1, k − 1) + C(k, k − 1) + ... + C(n − 1, k − 1)
= Σ_{i=k}^{n} C(i − 1, k − 1)
Example: A company has 50 employees, 10 of whom are in management positions.
Q: How many ways are there to choose a committee consisting of 4 people, if there must be at least one person of management rank?
A:
C(50, 4) − C(40, 4) = 138910 = Σ_{k=1}^{4} C(10, k) C(40, 4 − k)
———-
Can we also compute the answer as C(10, 1) × C(49, 3) = 184240? No.
It is not correct because the two sub-tasks, C(10, 1) and C(49, 3), overlap: a committee with more than one manager is counted once for each of its managers, so we cannot simply apply the multiplicative principle.
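A quick check of the overlap (not in the slides): committees with exactly 2 managers (C(10, 2) C(40, 2) = 35100 of them) are counted twice by C(10, 1) C(49, 3), those with 3 managers (C(10, 3) C(40, 1) = 4800) three times, and those with 4 managers (C(10, 4) = 210) four times. The overcount is therefore 35100 + 2 × 4800 + 3 × 210 = 45330 = 184240 − 138910.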
Axioms of Probability
Chapter 2, Sections 2.1 - 2.5, 2.7
Section 2.1
In this chapter we introduce the concept of the probability of an event and then
show how these probabilities can be computed in certain situations. As a
preliminary, however, we need the concept of the sample space and the events of
an experiment.
Sample Space and Events, Section 2.2
Definition of Sample Space:
The sample space of an experiment, denoted by S, is the set of all possible
outcomes of the experiment.
Example:
S = {girl, boy}, if the outcome of an experiment is the determination of the gender of a newborn.
Example:
If the experiment consists of flipping two coins (H/T), then the sample space is
S = {(H, H), (H, T), (T, H), (T, T)}
Example:
If the outcome of an experiment is the order of finish in a race among 7 horses having post positions 1, 2, 3, 4, 5, 6, 7, then
S = {all 7! permutations of (1, 2, 3, 4, 5, 6, 7)}
Example:
If the experiment consists of tossing two dice, then
S = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}. What is the size |S|?
Example:
If the experiment consists of measuring (in hours) the lifetime of a transistor, then
S = {t : 0 ≤ t < ∞}.
Definition of Events:
Any subset E of the sample space S is an event.
Example:
S = {(H, H), (H, T), (T, H), (T, T)},
E = {(H, H)} is an event. E = {(H, T), (T, T)} is also an event.
Example:
S = {t : 0 ≤ t < ∞},
E = {t : 1 ≤ t ≤ 5} is an event,
but {t : −10 ≤ t ≤ 5} is NOT an event.
Set Operations on Events
Let A, B, C be events of the same sample space S:
A ⊂ S, B ⊂ S, C ⊂ S. (Notation for subset: ⊂)
Set Union:
A ∪ B = the event which occurs if either A OR B occurs.
Example:
S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.
A = {1}, B = {1, 3, 5}, C = {2, 4, 6},
A ∪ B = {1, 3, 5}, B ∪ C = {1, 2, 3, 4, 5, 6},
A ∪ B ∪ C = {1, 2, 3, 4, 5, 6}
Set Intersection:
A ∩ B = AB = the event which occurs if both A AND B occur.
Example:
S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.
A = {1}, B = {1, 3, 5}, C = {2, 4, 6},
A ∩ B = {1}, B ∩ C = ∅, A ∩ B ∩ C = ∅
Mutually Exclusive:
If A ∩ B = ∅, then A and B are mutually exclusive.
Set Complement:
A^c = S − A = the event which occurs if A does not occur.
Example:
S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.
A = {1, 3, 5, ...} = the set of all positive odd integers.
B = {2, 4, 6, ...} = the set of all positive even integers.
A^c = S − A = B, B^c = S − B = A.
Laws of Set Operations
Commutative Laws:
A ∪ B = B ∪ A A ∩ B = B ∩ A
Associative Laws:
(A ∪ B) ∪ C = A ∪ (B ∪ C) = A ∪ B ∪ C
(A ∩ B) ∩ C = A ∩ (B ∩ C) = A ∩ B ∩ C = ABC
Distributive Laws:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) = AC ∪ BC
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
These laws can be verified by Venn diagrams.
DeMorgan's Laws
(⋃_{i=1}^{n} Ei)^c = ⋂_{i=1}^{n} Ei^c
(⋂_{i=1}^{n} Ei)^c = ⋃_{i=1}^{n} Ei^c
Proof of (⋃_{i=1}^{n} Ei)^c = ⋂_{i=1}^{n} Ei^c
Let A = (⋃_{i=1}^{n} Ei)^c and B = ⋂_{i=1}^{n} Ei^c.
1. Prove A ⊆ B:
Suppose x ∈ A. =⇒ x does not belong to any Ei.
=⇒ x belongs to every Ei^c. =⇒ x ∈ ⋂_{i=1}^{n} Ei^c = B.
2. Prove B ⊆ A:
Suppose x ∈ B. =⇒ x belongs to every Ei^c.
=⇒ x does not belong to any Ei. =⇒ x ∈ (⋃_{i=1}^{n} Ei)^c = A.
3. Consequently, A = B.
Proof of (⋂_{i=1}^{n} Ei)^c = ⋃_{i=1}^{n} Ei^c
Let Fi = Ei^c.
We have proved (⋃_{i=1}^{n} Fi)^c = ⋂_{i=1}^{n} Fi^c. Hence
(⋂_{i=1}^{n} Ei)^c = (⋂_{i=1}^{n} Fi^c)^c = ((⋃_{i=1}^{n} Fi)^c)^c = ⋃_{i=1}^{n} Fi = ⋃_{i=1}^{n} Ei^c
Axioms of Probability, Section 2.3
For each event E in the sample space S, there exists a value P (E), referred to
as the probability of E.
P (E) satisfies three axioms.
Axiom 1: 0 ≤ P (E) ≤ 1
Axiom 2: P (S) = 1
Axiom 3: For any sequence of mutually exclusive events E1, E2, ... (that is, EiEj = ∅ whenever i ≠ j),
P(⋃_{i=1}^{∞} Ei) = Σ_{i=1}^{∞} P(Ei)
Consequences of the Three Axioms
Consequence 1: P(∅) = 0.
P(S) = P(S ∪ ∅) = P(S) + P(∅) =⇒ P(∅) = 0
Consequence 2:
If S = ⋃_{i=1}^{n} Ei and the mutually exclusive events Ei are equally likely, meaning P(E1) = P(E2) = ... = P(En), then
P(Ei) = 1/n, i = 1 to n
Some Simple Propositions, Section 2.4
Proposition 1: P (Ec) = 1 − P (E)
Proof: S = E ∪ Ec =⇒ 1 = P (S) = P (E) + P (Ec)
Proposition 2: If E ⊂ F , then P (E) ≤ P (F )
Proof:
E ⊂ F =⇒ F = E ∪ (Ec ∩ F )
=⇒ P (F ) = P (E) + P (Ec ∩ F ) ≥ P (E)
Proposition 3: P (E ∪ F ) = P (E) + P (F ) − P (EF )
Proof:
E ∪F = (E −EF )∪(F −EF )∪EF , union of three mutually exclusive sets.
=⇒ P (E ∪ F ) = P (E − EF ) + P (F − EF ) + P (EF )
= P (E) − P (EF ) + P (F ) − P (EF ) + P (EF )
= P (E) + P (F ) − P (EF )
Note that E = EF ∪ (E − EF ), union of two mutually exclusive sets.
A Generalization of Proposition 3:
P(E ∪ F ∪ G) = P(E) + P(F) + P(G) − P(EF) − P(EG) − P(FG) + P(EFG)
Proof:
Let B = F ∪ G. Then P(E ∪ B) = P(E) + P(B) − P(EB)
P(B) = P(F ∪ G) = P(F) + P(G) − P(FG)
P(EB) = P(E(F ∪ G)) = P(EF ∪ EG) = P(EF) + P(EG) − P(EF·EG) = P(EF) + P(EG) − P(EFG)
Proposition 4: P (E1 ∪ E2 ∪ ... ∪ En) = ?
Read this proposition and its proof yourself.
We will temporarily skip this proposition and relevant examples 5m, 5n, and 5o.
However, we will come back to those material at a later time.
Example:
A total of 36 members of a club play tennis, 28 play squash, and 18 play
badminton. Furthermore, 22 of the members play both tennis and squash, 12
play both tennis and badminton, 9 play both squash and badminton, and 4 play all
three sports.
Q: How many members of this club play at least one of these sports?
Solution:
A = members playing tennis
B = members playing squash
C = members playing badminton
|A ∪ B ∪ C| = ??
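For reference, the generalization of Proposition 3 (inclusion-exclusion) gives
|A ∪ B ∪ C| = 36 + 28 + 18 − 22 − 12 − 9 + 4 = 43 members.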
Sample Space Having Equally Likely Outcomes, Sec. 2.5
Assumption: In an experiment, all outcomes in the sample space S are equally
likely to occur.
For example, S = {1, 2, 3, ..., N}. Assuming equally likely outcomes,
P({i}) = 1/N, i = 1 to N
For any event E:
P(E) = (number of outcomes in E) / (number of outcomes in S) = |E| / |S|
Example: If two dice are rolled, what is the probability that the sum of the
upturned faces will equal 7?
S = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}
E = ??
Example: If 3 balls are randomly drawn from a bag containing 6 white and 5
black balls, what is the probability that one of the drawn balls is white and the
other two black?
S = ??
E = ??
Example: An urn contains n balls, one of which is special. If k of these balls are withdrawn randomly, what is the probability that the special ball is chosen?
Revisit Probabilistic Counting of Set Intersections
Consider two sets S1, S2 ⊂ Ω = {0, 1, 2, ..., D − 1}. For example,
S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}
One can first generate a random permutation π:
π : Ω = {0, 1, 2, ..., D − 1} −→ {0, 1, 2, ..., D − 1}
Apply π to sets S1 and S2. For example
π(S1) = {13, 139, 476, 597, 698, 1355, 1932, 2532}
π(S2) = {13, 236, 476, 683, 798, 1456, 1968, 2532, 3983}
Store only the minimums:
min π(S1) = 13, min π(S2) = 13
Suppose we can theoretically calculate the probability
P = Pr (min π(S1) = min π(S2))
S1 and S2 are more similar =⇒ larger P (i.e., closer to 1).
Generate k (a small number of) random permutations and each time store the minimums. Count the number of matches to estimate the similarity between S1 and S2.
Now, we can easily (How?) show that
P = Pr (min π(S1) = min π(S2)) = |S1 ∩ S2| / |S1 ∪ S2| = a / (f1 + f2 − a)
where a = |S1 ∩ S2|, f1 = |S1|, f2 = |S2|.
Therefore, we can estimate P as a binomial probability from k permutations.
How many permutations (what value of k) do we need?
That depends on the variance of this binomial distribution.
Poker Example (2.5.5f): A poker hand consists of 5 cards. If the cards have
distinct consecutive values and are not all of the same suit, we say that the hand
is a straight. What is the probability that one is dealt a straight?
A 2 3 4 5 6 7 8 9 10 J Q K A
Example: A 5-card poker hand is a full house if it consists of 3 cards of one denomination and 2 cards of another denomination. (That is, a full house is three of a kind plus a pair.) What is the probability that one is dealt a full house?
Example (The Birthday Attack): If n people are present in a room, what is the probability that no two of them celebrate their birthday on the same day of the year (365 days)? How large need n be so that this probability is less than 1/2?
p(n) = (365 × 364 × ... × (365 − n + 1)) / 365^n = (365/365) × (364/365) × ... × ((365 − n + 1)/365)
p(23) = 0.4927 < 1/2,   p(50) = 0.0296
[Figure: p(n) plotted for n = 0 to 60; the probability of no shared birthday decreases from 1 toward 0, dropping below 1/2 near n = 23.]
Matlab Code
n = 5:60;
for i = 1:length(n)
    p(i) = 1;                          % p(i) = prob. of no shared birthday among n(i) people
    for j = 2:n(i)
        p(i) = p(i) * (365-j+1)/365;   % multiply in the j-th person's factor
    end;
end;
figure; plot(n,p,'linewidth',2); grid on; box on;
xlabel('n'); ylabel('p(n)');
How to improve the efficiency for computing many n’s?
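One possible answer (a sketch, not from the slides): the products p(n) are nested, so a single cumulative product computes all of them at once.

q = (365 - (1:60) + 1) / 365;     % q(j) = (365-j+1)/365
p = cumprod(q);                   % p(j) = prob. of no shared birthday among j people
n = 5:60;
figure; plot(n, p(n), 'linewidth', 2); grid on; box on;
xlabel('n'); ylabel('p(n)');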
Example: A football team consists of 20 offensive and 20 defensive players.
The players are to be paired into groups of 2 for the purpose of determining
roommates. The pairing is done at random.
(a) What is the probability that there are no offensive-defensive roommate pairs?
(b) What is the probability that there are 2i offensive-defensive roommate pairs?
Denote the answer to (b) by P2i. The answer to (a) is then P0.
Solution Steps:
1. Total ways of dividing the 40 players into 20 unordered pairs:
ordered: C(40; 2, 2, ..., 2) = 40! / (2!)^20
unordered: 40! / ((2!)^20 (20)!)
2. Ways of forming 2i offensive-defensive pairs:
C(20, 2i) × C(20, 2i) × (2i)!
3. Ways of pairing the 20 − 2i remaining players within each group:
[(20 − 2i)! / (2^{10−i} (10 − i)!)]^2
4. Applying the multiplicative principle and the equally likely outcome assumption:
P_{2i} = [C(20, 2i)]^2 (2i)! [(20 − 2i)! / (2^{10−i} (10 − i)!)]^2 / [40! / (2^{20} (20)!)]
5. Obtaining numerical probability values.
Stirling's approximation: n! ≈ n^{n+1/2} e^{−n} √(2π)
Be careful!
• Better approximations are available.
• n! can easily overflow, and so can C(n, k).
• The probability itself never overflows: P ∈ [0, 1].
• Look for possible cancellations.
• Rearrange terms in the numerator and denominator to avoid overflow.
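A Matlab sketch applying this advice (a sketch, not from the slides): compute P_{2i} entirely in log space via gammaln, where gammaln(n+1) = log(n!):

lf = @(n) gammaln(n + 1);                                   % lf(n) = log(n!)
i = 0:10;
logP = 2*(lf(20) - lf(2*i) - lf(20-2*i)) + lf(2*i) ...      % [C(20,2i)]^2 (2i)!
     + 2*(lf(20-2*i) - (10-i)*log(2) - lf(10-i)) ...        % [(20-2i)!/(2^(10-i)(10-i)!)]^2
     - (lf(40) - 20*log(2) - lf(20));                       % divided by 40!/(2^20 20!)
P2i = exp(logP);       % e.g. P2i(1) = P_0 is about 1.34e-6
sum(P2i)               % = 1 up to rounding, a useful sanity check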
Conditional Probability and Independence
Chapter 3, Sections 3.1 - 3.5
Introduction, Section 3.1
Conditional Probability: one of the most important concepts in probability:
• Calculating probabilities when partial (side) information is available.
• Computing desired probabilities may be easier.
A Famous Brain-Teaser
• There are 3 doors.
• Behind one door is a prize.
• Behind the other 2 is nothing.
• First, you pick one of the 3 doors at random.
• One of the doors that you did not pick will be opened to reveal nothing.
• Q: Do you stick with the door you chose in the first place, or do you switch?
A Joke: Tom & Jerry Conversation
• Jerry: I've heard that you will take a plane next week. Are you afraid that there might be terrorist activities?
• Tom: Yes. That's exactly why I am prepared.
• Jerry: How?
• Tom: It is reported that the chance of having a bomb on a plane is 1/10^6. According to what I learned in my probability class (4080), the chance of having two bombs on the plane will be 1/10^12.
• Jerry: So?
• Tom: Therefore, to dramatically reduce the chance from 1/10^6 to 1/10^12, I will bring one bomb onto the plane myself!
• Jerry: AH...
Section 3.2, Conditional Probabilities
Example: Suppose we toss a fair die twice.
• What is the probability that the sum of the 2 dice is 8?
Sample space S = ??   Event E = ??
• What if we know the first die is 3?
(Reduced) S = ??   (Reduced) E = ??
The Basic Formula of Conditional Probability
Two events E, F ⊆ S.
P (E|F ): conditional probability that E occurs given that F occurs.
If P(F) > 0, then
P(E|F) = P(EF) / P(F)
The Basic Formula of Conditional Probability
P(E|F) = P(EF) / P(F)
How to understand this formula? Think of the Venn diagram.
Given that F occurs,
• What is the new (i.e., reduced) sample space S′?
• What is the new (i.e., reduced) event E′?
• P(E|F) = |E′| / |S′| = (|E′|/|S|) / (|S′|/|S|) = ??
Example (3.2.2c):
In the card game bridge the 52 cards are dealt out equally to 4 players — called
East, West, North, and South. If North and South have a total of 8 spades among
them, what is the probability that East has 3 of the remaining 5 spades?
The reduced sample space S′ = ??   The reduced event E′ = ??
A transformation of the basic formula:
P(E|F) = P(EF) / P(F)
=⇒
P(EF) = P(E|F) × P(F) = P(F|E) × P(E)
P(E|F) may be easier to compute than P(F|E), or vice versa.
Example:
A total of n balls are sequentially and randomly chosen, without replacement,
from an urn containing r red and b blue balls (n ≤ r + b).
Given that k of the n balls are blue, what is the probability that the first ball
chosen is blue?
Two solutions:
1. Working with the reduced sample space — Easy!
Basically a problem of choosing one ball from n balls randomly.
2. Working with the original sample space and basic formula — More difficult!
Working with original sample space:
Event B: the first ball chosen is blue.
Event Bk: a total of k blue balls are chosen.
Desired probability:
P(B|Bk) = P(BBk) / P(Bk) = P(Bk|B) P(B) / P(Bk)
P(B) = ??   P(Bk) = ??   P(Bk|B) = ??
Desired probability:
P(B|Bk) = P(BBk) / P(Bk) = P(Bk|B) P(B) / P(Bk)
P(B) = b / (r + b)
P(Bk) = C(b, k) C(r, n − k) / C(r + b, n)
P(Bk|B) = C(b − 1, k − 1) C(r, n − k) / C(r + b − 1, n − 1)
Therefore,
P(B|Bk) = P(Bk|B) P(B) / P(Bk)
= [C(b − 1, k − 1) C(r, n − k) / C(r + b − 1, n − 1)] × [b / (r + b)] × [C(r + b, n) / (C(b, k) C(r, n − k))]
= k / n
Example (3.2.2f):
Suppose an urn contains 8 red balls and 4 white balls. We draw 2 balls from the
urn without replacement. Assume that at each draw each ball in the urn is equally
likely to be chosen.
Q: What is the probability that both balls drawn are red?
Two solutions:
1. The direct approach
2. The conditional probability approach
Two solutions:
1. The direct approach:
C(8, 2) / C(12, 2) = 28/66 = 14/33
2. The conditional probability approach:
(8/12) × (7/11) = 14/33
The Multiplication Rule
An extremely important formula in science
P (E1E2E3...En) = P (E1)P (E2|E1)P (E3|E1E2)...P (En|E1E2...En−1)
Proof?
Example (3.2.2g):
An ordinary deck of 52 playing cards is randomly divided into 4 piles of 13 cards
each.
Q: Compute the probability that each pile has exactly 1 ace.
A: Two solutions:
1. Direct approach.
2. Conditional probability (as in the textbook).
Solution by the direct approach
• The size of the sample space:
C(52, 13) C(39, 13) C(26, 13) = 52! / (13! × 13! × 13! × 13!) = 52! / (13!)^4
• Fix the 4 aces, one in each pile.
• Then only 12 more cards need to be selected for each pile, from the 48 non-ace cards:
C(48, 12) C(36, 12) C(24, 12) = 48! / (12!)^4
• Account for the 4! ways of assigning the four aces to the four piles.
• The desired probability:
[C(48, 12) C(36, 12) C(24, 12) / (C(52, 13) C(39, 13) C(26, 13))] × 4! = [48! / (12!)^4] × [(13!)^4 / 52!] × 4! = 0.1055
Solution using conditional probability
Define
• E1 = the ace of spades is in any one of the piles
• E2 = the ace of spades and the ace of hearts are in different piles
• E3 = the aces of spades, hearts, and diamonds are all in different piles
• E4 = all four aces are in different piles
The desired probability:
P(E1E2E3E4) = P(E1) P(E2|E1) P(E3|E1E2) P(E4|E1E2E3)
• P(E1) = 1
• P(E2|E1) = 39/51
• P(E3|E1E2) = 26/50
• P(E4|E1E2E3) = 13/49
• P(E1E2E3E4) = (39/51) × (26/50) × (13/49) = 0.1055.
Section 3.3, Bayes' Formula
The basic Bayes formula:
P(E) = P(E|F) P(F) + P(E|F^c) (1 − P(F))
Proof:
• By Venn diagram. Or
• By algebra using the conditional probability formula
Example (3.3.3a):
An accident-prone person will have an accident within a fixed one-year period
with probability 0.4. This probability decreases to 0.2 for a non-accident-prone
person. Suppose 30% of the population is accident prone.
Q 1: What is the probability that a random person will have an accident within a
fixed one-year period?
Q 2: Suppose a person has an accident within a year. What is the probability that
he/she is accident prone?
Example:
Let p be the probability that the student knows the answer and 1 − p the
probability that the student guesses. Assume that a student who guesses will be correct with probability 1/m, where m is the total number of multiple-choice alternatives.
Q: What is the conditional probability that a student knew the answer, given that
he/she answered it correctly?
Solution:
Let C = event that the student answers the question correctly.
Let K = event that he/she actually knows the answer.
P (K|C) =??
Solution:
Let C = event that the student answers the question correctly.
Let K = event that he/she actually knows the answer.
P(K|C) = P(KC) / P(C) = P(C|K) P(K) / [P(C|K) P(K) + P(C|K^c) P(K^c)]
= (1 × p) / (1 × p + (1/m)(1 − p))
= mp / (1 + (m − 1)p)
————
What happens when m is large?
Example (3.3.3d):
A lab blood test is 95% effective in detecting a certain disease when it is present. However, the test also yields a false positive result for 1% of the healthy persons tested. Assume 0.5% of the population actually has the disease.
Q: What is the probability that a person has the disease given that the test result is positive?
Solution:
Let D = event that the tested person has the disease.
Let E = event that the test result is positive.
P (D|E) = ??
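For reference, Bayes' formula gives
P(D|E) = (0.95 × 0.005) / (0.95 × 0.005 + 0.01 × 0.995) = 0.00475 / 0.0147 ≈ 0.323.
Even given a positive test, the disease remains unlikely, because it is so rare in the population to begin with.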
The Generalized Bayes Formula
Suppose F1, F2, ..., Fn are mutually exclusive events such that ⋃_{i=1}^{n} Fi = S. Then
P(E) = Σ_{i=1}^{n} P(E|Fi) P(Fi),
which is easy to understand from the Venn diagram.
P(Fj|E) = P(EFj) / P(E) = P(E|Fj) P(Fj) / Σ_{i=1}^{n} P(E|Fi) P(Fi)
Example (3.3.3m):
A bin contains 3 different types of disposable flashlights. The probability that a
type 1 flashlight will give over 100 hours of use is 0.7, with the corresponding
probabilities for type 2 and type 3 flashlights being 0.4 and 0.3 respectively.
Suppose that 20% of the flashlights in the bin are type 1, and 30% are type 2,
and 50% are type 3.
Q 1: What is the probability that a randomly chosen flashlight will give more than
100 hours of use?
Q 2: Given the flashlight lasted over 100 hours, what is the conditional probability
that it was type j?
Q 1: What is the probability that a randomly chosen flashlight will give more than 100 hours of use?
0.7 × 0.2 + 0.4 × 0.3 + 0.3 × 0.5 = 0.14 + 0.12 + 0.15 = 0.41
Q 2: Given the flashlight lasted over 100 hours, what is the conditional probability that it was type j?
Type 1: 0.14/0.41 = 0.3415
Type 2: 0.12/0.41 = 0.2927
Type 3: 0.15/0.41 = 0.3659
Independent Events
Definition:
Two events E and F are said to be independent if
P (EF ) = P (E)P (F )
Consequently, if E and F are independent, then
P (E|F ) = P (E), P (F |E) = P (F ).
Independence is different from mutual exclusivity!
Check Independence by Verifying P (EF ) = P (E)P (F )
Example: Two coins are flipped and all 4 outcomes are equally likely. Let E be
the event that the first lands heads and F the event that the second lands tails.
Verify that E and F are independent.
Example: Suppose we toss 2 fair dice. Let E denote the event that the sum of
two dice is 6 and F denote the event that the first die equals 4.
Verify that E and F are not independent.
Proposition 4.1:
If E and F are independent, then so are E and F c.
Proof:
Given P (EF ) = P (E)P (F )
Need to show P (EF c) = P (E)P (F c).
Which formula contains both P (EF ) and P (EF c)?
Independence of Three Events
Definition:
The three events E, F, and G are said to be independent if
P (EFG) = P (E)P (F )P (G)
P (EF ) = P (E)P (F )
P (EG) = P (E)P (G)
P (FG) = P (F )P (G)
Independence of More Than Three Events
Definition:
The events, Ei, i = 1 to n, are said to be independent if, for every
E1, E2, ..., Er , r ≤ n, of these events,
P (E1E2...Er) = P (E1)P (E2)...P (Er)
Example (3.4.4f):
An infinite sequence of independent trials is to be performed. Each trial results in
a success with probability p and a failure with probability 1 − p.
Q: What is the probability that
1. at least one success occurs in the first n trials.
2. exactly k successes occur in the first n trials.
3. all trials result in successes?
Solution:
Q: What is the probability that
1. at least one success occurs in the first n trials?
1 − (1 − p)^n
2. exactly k successes occur in the first n trials?
C(n, k) p^k (1 − p)^{n−k}
How is this formula related to the binomial theorem?
3. all trials result in successes?
p^n → 0 as n → ∞, whenever p < 1.
Example (3.4.4h):
Independent trials, consisting of rolling a pair of fair dice, are performed. What is
the probability that an outcome of 5 appears before an outcome of 7 when the
outcome is the sum of the dice?
Solution 1:
Let En denote the event that no 5 or 7 appears on the first n − 1 trials and a 5 appears on the nth trial. Then the desired probability is
P(⋃_{n=1}^{∞} En) = Σ_{n=1}^{∞} P(En)
Why are Ei and Ej mutually exclusive?
P (En) = P ( no 5 or 7 on the first n − 1 trials AND a 5 on the nth trial)
By independence
P (En) = P ( no 5 or 7 on the first n − 1 trials)×P ( a 5 on the nth trial)
P ( a 5 on the nth trial) = P (a 5 on any trial)
P ( no 5 or 7 on the first n − 1 trials) = [P (no 5 or 7 on any trial)]n−1
P (no 5 or 7 on any trial) = 1 − P (either a 5 OR a 7 on any trial)
P (either a 5 OR a 7 on any trial) = P (a 5 on any trial) + P (a 7 on any trial)
P(En) = P(no 5 or 7 on the first n − 1 trials) × P(a 5 on the nth trial)
P(En) = (1 − 4/36 − 6/36)^{n−1} × (4/36) = (1/9) (13/18)^{n−1}
The desired probability:
P(⋃_{n=1}^{∞} En) = Σ_{n=1}^{∞} P(En) = (1/9) × 1 / (1 − 13/18) = 2/5.
Solution 2 (using a conditional argument):
Let E = the desired event.
Let F = the event that the first trial is a 5.
Let G = the event that the first trial is a 7.
Let H = the event that the first trial is neither a 5 nor 7.
P (E) = P (E|F )P (F ) + P (E|G)P (G) + P (E|H)P (H)
A General Result:
If E and F are mutually exclusive events of an experiment, then when
independent trials of this experiment are performed, E will occur before F with
probability
P(E) / (P(E) + P(F))
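As a check, this recovers Solution 1: with E = {a 5 on a given trial} and F = {a 7 on a given trial}, P(E) = 4/36 and P(F) = 6/36, so E occurs before F with probability (4/36) / (4/36 + 6/36) = 2/5.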
The Gambler’s Ruin Problem (Example 3.4.4k)
Two gamblers, A and B, bet on the outcomes of successive flips of a coin. On
each flip, if the coin comes up heads, A collects 1 unit from B, whereas if it comes
up tail, A pays 1 unit to B. They continue until one of them runs out of money.
If it is assumed that the successive flips of the coin are independent and each flip
results in a head with probability p, what is the probability that A ends up with all
the money if he starts with i units and B starts with N − i units?
Intuition: What is this probability if the coin is fair (i.e., p = 1/2)?
Solution:
E = event that A ends with all the money when he starts with i.
H = event that first flip lands heads.
Pi = P (E) = P (E|H)P (H) + P (E|Hc)P (Hc)
P0 =??
PN =??
P (E|H) =??
P (E|Hc) =??
Pi = P(E) = P(E|H) P(H) + P(E|H^c) P(H^c)
= p P_{i+1} + (1 − p) P_{i−1} = p P_{i+1} + q P_{i−1}
=⇒ P_{i+1} = (Pi − q P_{i−1}) / p   =⇒   P_{i+1} − Pi = (q/p) (Pi − P_{i−1})
P2 − P1 = (q/p) P1
P3 − P2 = (q/p) (P2 − P1) = (q/p)^2 P1
...
Pi − P_{i−1} = (q/p)^{i−1} P1
Pi = P1 [1 + (q/p) + (q/p)^2 + ... + (q/p)^{i−1}] = P1 [1 − (q/p)^i] / [1 − (q/p)]   if q ≠ p
Because PN = 1, we know
P1 = [1 − (q/p)] / [1 − (q/p)^N]   if q ≠ p
Pi = [1 − (q/p)^i] / [1 − (q/p)^N]   if q ≠ p
(Do we really have to worry about q = p = 1/2?)
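A Matlab sketch comparing the closed form with simulation (a sketch, not from the slides; the values p = 0.6, N = 10, i = 5 are arbitrary choices):

p = 0.6; q = 1 - p; N = 10; i = 5;
Pi = (1 - (q/p)^i) / (1 - (q/p)^N)   % closed form, about 0.8836
wins = 0; trials = 1e5;
for t = 1:trials
    x = i;                           % A's current fortune
    while x > 0 && x < N
        x = x + 2*(rand < p) - 1;    % +1 with probability p, -1 with probability q
    end
    wins = wins + (x == N);
end
wins / trials                        % should be close to Pi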
The Matching Problem (Example 3.5.5d)
At a party n men take off their hats. The hats are then mixed up, and each man
randomly selects one. We say that a match occurs if a man selects his own hat.
Q: What is the probability of
(a) no matches;
(b) exactly k matches?
Solution (a): using recursion and conditional probabilities.
1. Let E = event that no match occurs. Denote Pn = P(E).
Let M = event that the first man selects his own hat.
2. Pn = P(E) = P(E|M) P(M) + P(E|M^c) P(M^c)
P(E|M) = 0, P(M^c) = (n−1)/n, so Pn = ((n−1)/n) P(E|M^c)
3. Express Pn as a function of P_{n−1}, P_{n−2}, etc.:
P(E|M^c) = P_{n−1} + (1/(n−1)) P_{n−2}, so Pn = ((n−1)/n) P_{n−1} + (1/n) P_{n−2}.
4. Identify the boundary conditions:
P1 = 0, P2 = 1/2.
5. Prove Pn by induction:
Pn = 1/2! − 1/3! + 1/4! − ... + (−1)^n/n!.
(What is a good approximation of Pn?)
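One good approximation (an added remark): Pn = Σ_{k=0}^{n} (−1)^k / k! is the partial sum of the series for e^{−1}, so Pn → e^{−1} ≈ 0.36788 as n → ∞, and the convergence is very fast: the error is below 1/(n+1)!, so already by n = 7 the approximation is accurate to four decimal places.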
Solution (b):
1. Fix a group of k men.
2. The probability that these k men select their own hats:
(1/n) × (1/(n−1)) × (1/(n−2)) × ... × (1/(n−k+1)) = (n−k)!/n!
3. The probability that none of the remaining n − k men selects his own hat:
P_{n−k}
4. Multiplicative counting principle:
[(n−k)!/n!] × P_{n−k}
5. Adjustment for the C(n, k) ways of selecting the k men:
[(n−k)!/n!] × P_{n−k} × C(n, k)
P(·|F) Is a Probability
Conditional probabilities satisfy all the properties of ordinary probabilities.
Proposition 5.1
• 0 ≤ P (E|F ) ≤ 1
• P (S|F ) = 1
• If Ei, i = 1, 2, ..., are mutually exclusive events, then
P(⋃_{i=1}^{∞} Ei | F) = Σ_{i=1}^{∞} P(Ei|F)
Conditionally Independent Events
Definition:
Two events, E1 and E2, are said to be conditionally independent given event F , if
P (E1|E2F ) = P (E1|F )
or, equivalently
P (E1E2|F ) = P (E1|F )P (E2|F )
Laplace’s Rule of Succession (Example 3.5.5e)
There are k + 1 coins in a box. The ith coin will, when flipped, turn up heads with
probability i/k, i = 0, 1, ..., k. A coin is randomly selected from the box and is
then repeatedly flipped.
Q: If the first n flip all result in heads, what is the conditional probability that the
(n + 1)st flip will do likewise?
Key: Conditioning on that the ith coin is selected, the outcomes will be
conditionally independent.
Solution :
Let Ci = event that the ith coin is selected.
Let Fn = event the first n flips all result in heads.
Let H = event that the (n + 1)st flip is a head.
The desired probability:
P(H|Fn) = Σ_{i=0}^{k} P(H|FnCi) P(Ci|Fn)
By conditional independence, P(H|FnCi) = ??
Solve P(Ci|Fn) using Bayes' formula.
The desired probability:
P(H|Fn) = Σ_{i=0}^{k} P(H|FnCi) P(Ci|Fn)
By conditional independence, P(H|FnCi) = P(H|Ci) = i/k.
Using Bayes' formula,
P(Ci|Fn) = P(CiFn) / P(Fn) = P(Fn|Ci) P(Ci) / Σ_{j=0}^{k} P(Fn|Cj) P(Cj) = (i/k)^n (1/(k+1)) / Σ_{j=0}^{k} (j/k)^n (1/(k+1))
The final answer:
P(H|Fn) = Σ_{j=0}^{k} (j/k)^{n+1} / Σ_{j=0}^{k} (j/k)^n
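A remark (not in the slides): for large k, the sums are Riemann approximations to integrals, Σ_{j=0}^{k} (j/k)^n ≈ k ∫_0^1 x^n dx = k/(n+1), so P(H|Fn) ≈ (n+1)/(n+2), which is Laplace's rule of succession.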
Chapter 4: (Discrete) Random Variables
Definition:
A random variable is a function from the sample space S to the real numbers.
A discrete random variable is a random variable that takes on only a finite (or at most a countably infinite) number of values.
Example (1a):
Suppose the experiment consists of tossing 3 fair coins. Let Y denote the
number of heads appearing, then Y is a discrete random variable taking on one
of the values 0, 1, 2, 3 with probabilities
P(Y = 0) = P({(T, T, T)}) = 1/8
P(Y = 1) = P({(H, T, T), (T, H, T), (T, T, H)}) = 3/8
P(Y = 2) = P({(T, H, H), (H, T, H), (H, H, T)}) = 3/8
P(Y = 3) = P({(H, H, H)}) = 1/8
Check:
P(⋃_{i=0}^{3} {Y = i}) = Σ_{i=0}^{3} P{Y = i} = 1.
Example (1c)
Independent trials, consisting of the flipping of a coin having probability p of coming up heads, are continually performed until either a head occurs or a total of n flips is made. If we let X denote the number of times the coin is flipped, then X is a random variable taking on one of the values 1, 2, ..., n.
P(X = 1) = P{H} = p
P(X = 2) = P{(T, H)} = (1 − p)p
P(X = 3) = P{(T, T, H)} = (1 − p)^2 p
...
P(X = n − 1) = (1 − p)^{n−2} p
P(X = n) = (1 − p)^{n−1}
Check: Σ_{i=1}^{n} P(X = i) = 1??
Example (1d)
3 balls are randomly chosen from an urn containing 3 white, 3 red, and 5 black
balls. Suppose that we win $1 for each white ball selected and lose $1 for each
red ball selected. Let X denote our total winnings from the experiments.
Q 1: What possible values can X take on?
Q 2: What are the respective probabilities?
Solution 1: What possible values can X take on?
X ∈ {3, 2, 1, 0, −1, −2, −3}.
Solution 2: What are the respective probabilities?
P(X = 0) = [C(5, 3) + C(3, 1) C(3, 1) C(5, 1)] / C(11, 3) = 55/165
P(X = 3) = P(X = −3) = C(3, 3) / C(11, 3) = 1/165
The Coupon Collection Problem (Example 1e)
There are N distinct types of coupons, and each coupon obtained is equally likely to be of any one of the N types, independent of prior selections.
Define the random variable T = the number of coupons that need to be collected until one obtains a complete set, with at least one coupon of each type.
Q: What is P (T = n)?
Solution:
1. It suffices to study P(T > n),
because P(T = n) = P(T > n − 1) − P(T > n).
2. Define the event Aj = no type j coupon is collected among the first n.
3. P(T > n) = P(⋃_{j=1}^{N} Aj)
Then what?
P(⋃_{j=1}^{N} Aj) = Σ_j P(Aj) − Σ_{j1<j2} P(Aj1 Aj2) + Σ_{j1<j2<j3} P(Aj1 Aj2 Aj3) − ... + (−1)^{N+1} P(A1 A2 ... AN)
P(Aj) = ((N − 1)/N)^n
P(Aj1 Aj2) = ((N − 2)/N)^n
...
P(Aj1 Aj2 ... Ajk) = ((N − k)/N)^n
...
P(Aj1 Aj2 ... Aj_{N−1}) = (1/N)^n
P(A1 A2 ... AN) = 0
P(T > n) = P(⋃_{j=1}^{N} Aj)
= Σ_j P(Aj) − Σ_{j1<j2} P(Aj1 Aj2) + Σ_{j1<j2<j3} P(Aj1 Aj2 Aj3) − ... + (−1)^{N+1} P(A1 A2 ... AN)
= N ((N − 1)/N)^n − C(N, 2) ((N − 2)/N)^n + C(N, 3) ((N − 3)/N)^n − ... + (−1)^N C(N, N − 1) (1/N)^n
= Σ_{j=1}^{N−1} C(N, j) ((N − j)/N)^n (−1)^{j+1}
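A Matlab sketch of this formula (a sketch, not from the slides; N = 6 models collecting all six faces of a die):

N = 6; nmax = 100;
PTgt = zeros(1, nmax);                 % PTgt(n) = P(T > n)
for n = 1:nmax
    for j = 1:N-1
        PTgt(n) = PTgt(n) + (-1)^(j+1) * nchoosek(N,j) * ((N-j)/N)^n;
    end
end
PTeq = [1, PTgt(1:end-1)] - PTgt;      % P(T = n) = P(T > n-1) - P(T > n)
sum((1:nmax) .* PTeq)                  % close to E(T) = 6(1 + 1/2 + ... + 1/6) = 14.7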
Cumulative Distribution Function
For a random variable X , its cumulative distribution function F is defined as
F(x) = P{X ≤ x}
F (x) is a non-decreasing function:
If x ≤ y, then F (x) ≤ F (y).
Section 4.2: Discrete Random Variables
Discrete random variable: a random variable X that takes on at most a countable number of possible values.
The probability mass function: p(a) = P{X = a}
If X takes on one of the values x1, x2, ..., then
p(xi) ≥ 0 and Σ_{i=1}^{∞} p(xi) = 1
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 157
Example (4.2.2a): The probability mass function of a random variable X is given by p(i) = c\frac{\lambda^i}{i!}, i = 0, 1, 2, ....

Q: Find

• the relation between c and \lambda;

• P\{X = 0\};

• P\{X > 2\}.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 158
Solution:

• The relation between c and \lambda:

\sum_{i=0}^{\infty} p(i) = 1 \Longrightarrow c\sum_{i=0}^{\infty}\frac{\lambda^i}{i!} = 1 \Longrightarrow ce^{\lambda} = 1 \Longrightarrow c = e^{-\lambda}.

• P\{X = 0\} = c = e^{-\lambda}.

• P\{X > 2\} = 1 - P\{X=0\} - P\{X=1\} - P\{X=2\} = 1 - c - c\lambda - \frac{c\lambda^2}{2} = 1 - e^{-\lambda} - \lambda e^{-\lambda} - \frac{\lambda^2 e^{-\lambda}}{2}.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 159
Section 4.3: Expected Value
Definition:
If X is a discrete random variable having a probability mass function p(x), the
expected value of X is defined by
E(X) = \sum_{x:\,p(x)>0} x\,p(x)
The expected value is a weighted average of possible values that X can take on.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 160
Example: Find E(X) where X is the outcome when we roll a fair die.

E(X) = \sum_{i=1}^{6} i\cdot\frac{1}{6} = \frac{1}{6}\sum_{i=1}^{6} i = \frac{7}{2}

Example: Let I be the indicator variable for event A:

I = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{if } A^c \text{ occurs} \end{cases}

E(I) = ??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 161
Example (3d): A school class of 120 students is driven in 3 buses to a symphony. There are 36 students in one of the buses, 40 in another, and 44 in the third bus. When the buses arrive, one of the 120 students is randomly chosen. Let X denote the number of students on the bus of that randomly chosen student. Find E(X).

Solution:

X = ??   P(X = x) = ??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 162
Example (4a)

P\{X = -1\} = 0.2, \quad P\{X = 0\} = 0.5, \quad P\{X = 1\} = 0.3.

Find E(X^2).

Solution: Let Y = X^2.

P\{Y = 0\} = P\{X = 0\} = 0.5

P\{Y = 1\} = P\{X = 1 \text{ or } X = -1\} = P\{X = 1\} + P\{X = -1\} = 0.3 + 0.2 = 0.5

Therefore

E(X^2) = E(Y) = 0 \times 0.5 + 1 \times 0.5 = 0.5
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 163
Verify:

\sum_{i} x_i^2\,P\{X = x_i\} = (-1)^2 P\{X=-1\} + (0)^2 P\{X=0\} + (1)^2 P\{X=1\} = 0.5 = E(X^2), as expected.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 164
Section 4.4: Expectation of a Function of Random Variable
Proposition 4.1 If X is a discrete random variable that takes on one of the
values xi, i ≥ 1, with respective probabilities p(xi), then for any real-valued
function g
E(g(X)) = \sum_{i} g(x_i)\,p(x_i)
Proof: Group the i’s with the same g(xi).
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 165
Proof:
Group the i’s with the same g(xi).
\sum_{i} g(x_i)p(x_i) = \sum_{j}\sum_{i:\,g(x_i)=y_j} g(x_i)p(x_i) = \sum_{j} y_j\sum_{i:\,g(x_i)=y_j} p(x_i) = \sum_{j} y_j P\{g(X) = y_j\} = E(g(X)).
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 166
Corollary 4.1 If a and b are constants, then
E(aX + b) = aE(X) + b.
Proof Use the definition of E(X).
The nth moment
E(X^n) = \sum_{x:\,p(x)>0} x^n p(x)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 167
Section 4.5: Variance
The expectation E(X) measures the average of possible values of X .
The variance V ar(X) measures the variation, or spread of these values.
Definition If X is a random variable with mean µ, then the variance of X is
defined by
Var(X) = E\left[(X - \mu)^2\right]
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 168
An alternative formula for Var(X):

Var(X) = E(X^2) - (E(X))^2 = E(X^2) - \mu^2

Proof:

Var(X) = E\left[(X-\mu)^2\right] = \sum_{x}(x-\mu)^2 p(x) = \sum_{x}(x^2 - 2\mu x + \mu^2)p(x)
= \sum_{x}x^2 p(x) - 2\mu\sum_{x}x\,p(x) + \mu^2\sum_{x}p(x)
= E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 169
Two important identities

If a and b are constants, then

Var(a) = Var(b) = 0

Var(aX + b) = a^2\,Var(X)
Proof: Use the definitions.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 170
E(aX + b) = aE(X) + b

Var(aX+b) = E\left[\left((aX+b) - E(aX+b)\right)^2\right] = E\left[\left((aX+b) - aE(X) - b\right)^2\right] = E\left[(aX - aE(X))^2\right] = a^2 E\left[(X - E(X))^2\right] = a^2\,Var(X)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 171
Section 4.6: Bernoulli and Binomial Random Variables
Bernoulli random variable: X \sim Bernoulli(p)

P\{X = 1\} = p(1) = p,

P\{X = 0\} = p(0) = 1 - p.

Properties:

E(X) = p

Var(X) = p(1-p)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 172
Binomial random variable
Definition: A random variable X represents the total number of successes that occur in n independent trials, where each trial results in a success with probability p and a failure with probability 1 - p.

X \sim Binomial(n, p).

The probability mass function:

p(i) = P\{X = i\} = \binom{n}{i} p^i (1-p)^{n-i}, \quad i = 0, 1, ..., n

• Proof?

• \sum_{i=0}^{n} p(i) = 1?
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 173
A very useful representation:
X ∼ binomial(n, p) can be viewed as the sum of n independent Bernoulli
random variables with parameter p.
X = \sum_{i=1}^{n} Y_i, \quad Y_i \sim Bernoulli(p)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 174
Example: Five fair coins are flipped. If the outcomes are assumed independent, find the probability mass function of the number of heads obtained.
Solution:
• X = number of heads obtained.
• X ∼ binomial(n, p). n=??, p=??
• P\{X = i\} = ?
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 175
Properties of X ∼ binomial(n, p)
Expectation: E(X) = np.
Variance: Var(X) = np(1 - p).
Proof:??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 176
E(X) = \sum_{i=0}^{n} i\binom{n}{i}p^i(1-p)^{n-i}
= \sum_{i=0}^{n}\frac{i\,n!}{i!(n-i)!}p^i(1-p)^{n-i}
= \sum_{i=1}^{n}\frac{i\,n!}{i!(n-i)!}p^i(1-p)^{n-i}
= np\sum_{i=1}^{n}\frac{(n-1)!}{(i-1)!\,((n-1)-(i-1))!}p^{i-1}(1-p)^{(n-1)-(i-1)}
= np\sum_{j=0}^{n-1}\frac{(n-1)!}{j!\,((n-1)-j)!}p^{j}(1-p)^{(n-1)-j}
= np
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 177
E(X^2) = \sum_{i=0}^{n} i^2\binom{n}{i}p^i(1-p)^{n-i} = \sum_{i=1}^{n}\frac{i^2\,n!}{i!(n-i)!}p^i(1-p)^{n-i}
= np\sum_{i=1}^{n}\frac{i\,(n-1)!}{(i-1)!\,((n-1)-(i-1))!}p^{i-1}(1-p)^{(n-1)-(i-1)}
= np\sum_{j=0}^{n-1}(j+1)\frac{(n-1)!}{j!\,((n-1)-j)!}p^{j}(1-p)^{(n-1)-j}
= np\,E(Y+1), \quad Y \sim binomial(n-1, p)
= np((n-1)p + 1)

Var(X) = E(X^2) - (E(X))^2 = np((n-1)p + 1) - (np)^2 = np(1-p)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 178
What Happens When n → ∞ and np → λ?
X ∼ binomial(n, p), n → ∞ and np → λ
P\{X = i\} = \frac{n!}{(n-i)!\,i!}p^i(1-p)^{n-i}
= \frac{(1-p)^n}{i!}\cdot\frac{p^i}{(1-p)^i}\,n(n-1)(n-2)\cdots(n-i+1)
= \left[\frac{(1-p)^n(np)^i}{i!}\right]\left[\frac{n(n-1)(n-2)\cdots(n-i+1)}{(1-p)^i n^i}\right]
\approx \frac{e^{-\lambda}\lambda^i}{i!}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 179
\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}

For fixed (small) i,

\lim_{n\to\infty}\frac{n(n-1)(n-2)\cdots(n-i+1)}{(n-\lambda)^i} = 1
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 180
Section 4.7: The Poisson Random Variable
Definition: A random variable X, taking on one of the values 0, 1, 2, ..., is a Poisson random variable with parameter \lambda > 0 if

p(i) = P\{X = i\} = e^{-\lambda}\frac{\lambda^i}{i!}, \quad i = 0, 1, 2, 3, ...

Check:

\sum_{i=0}^{\infty} p(i) = \sum_{i=0}^{\infty} e^{-\lambda}\frac{\lambda^i}{i!} = 1??
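The check follows from the Taylor series of e^{\lambda} (a step the slide leaves to the reader):

\sum_{i=0}^{\infty} e^{-\lambda}\frac{\lambda^i}{i!} = e^{-\lambda}\sum_{i=0}^{\infty}\frac{\lambda^i}{i!} = e^{-\lambda}e^{\lambda} = 1.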
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 181
Facts about Poisson Distribution
• Poisson(λ) approximates binomial (n, p), when n large and np ≈ λ.
\binom{n}{i}p^i(1-p)^{n-i} \approx \frac{e^{-\lambda}\lambda^i}{i!}
• Poisson is widely used for modeling rare (p ≈ 0) events (but not too rare).
– The number of misprints on a page of a book
– The number of people in a community living to 100 years of age
– ...
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 182
Compare Binomial with Poisson
[Two bar plots comparing the Binomial and Poisson probability mass functions for n = 20, p = 0.2; x-axis: i, y-axis: p(i).]
Poisson approximation is good
• n does not have to be too large.
• p does not have to be too small.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 183
Matlab Code
function comp_binom_poisson(n,p);
i=0:n; P1=binopdf(i,n,p); P2=poisspdf(i,n*p);    % Poisson parameter lambda = n*p
figure; bar(i,P1,'g'); hold on; grid on;
bar(i,P2,'r'); legend('Binomial','Poisson');
xlabel('i'); ylabel('p(i)');
axis([-0.5 n 0 max(P1)+0.05]);
title(['n = ' num2str(n) ' p = ' num2str(p)]);
figure; bar(i,P2,'r'); hold on; grid on;         % same plot, drawing order reversed
bar(i,P1,'g'); legend('Poisson','Binomial');
xlabel('i'); ylabel('p(i)');
axis([-0.5 n 0 max(P1)+0.05]);
title(['n = ' num2str(n) ' p = ' num2str(p)]);
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 184
Example: Suppose that the number of typographical errors on a single page of a book has a Poisson distribution with parameter \lambda = 1/2.

Q: Calculate the probability that there is at least one error on this page.

P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-1/2} \approx 0.393.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 185
Expectation and Variance of Poisson
X \sim Poisson(\lambda), \quad Y \sim binomial(n, p)

X approximates Y when (1) n is large, (2) p is small, and (3) np \approx \lambda.

E(Y) = np, \quad Var(Y) = np(1-p)

Guess: E(X) = ??, \quad Var(X) = ??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 186
X \sim Poisson(\lambda):

E(X) = \sum_{i=1}^{\infty} i\,e^{-\lambda}\frac{\lambda^i}{i!} = \lambda\sum_{i=1}^{\infty} e^{-\lambda}\frac{\lambda^{i-1}}{(i-1)!} = \lambda\sum_{j=0}^{\infty} e^{-\lambda}\frac{\lambda^j}{j!} = \lambda

E(X^2) = \sum_{i=1}^{\infty} i^2 e^{-\lambda}\frac{\lambda^i}{i!} = \lambda\sum_{i=1}^{\infty} i\,e^{-\lambda}\frac{\lambda^{i-1}}{(i-1)!} = \lambda\sum_{j=0}^{\infty}(j+1)e^{-\lambda}\frac{\lambda^j}{j!} = \lambda(E(X) + 1) = \lambda^2 + \lambda

Var(X) = \lambda^2 + \lambda - \lambda^2 = \lambda
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 187
Section 4.8.1: Geometric Random Variable
Definition: Suppose that independent trials, each having probability p of
being a success, are performed until a success occurs. X is the number of trials
required. Then
P\{X = n\} = (1-p)^{n-1}p, \quad n = 1, 2, 3, ...

Check: \sum_{n=1}^{\infty} P\{X = n\} = \sum_{n=1}^{\infty}(1-p)^{n-1}p = 1??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 188
Expectation: X \sim Geometric(p).

E(X) = \sum_{n=1}^{\infty} n(1-p)^{n-1}p \quad (q = 1-p)
= p + 2qp + 3q^2p + 4q^3p + 5q^4p + ...
= p\left(1 + 2q + 3q^2 + 4q^3 + 5q^4 + ...\right)
= ??

We know

1 + q + q^2 + q^3 + q^4 + ... = \frac{1}{1-q}

How about

I = 1 + 2q + 3q^2 + 4q^3 + 5q^4 + ... = ??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 189
I = 1 + 2q + 3q^2 + 4q^3 + 5q^4 + ... = ??

First trick:

qI = q + 2q^2 + 3q^3 + 4q^4 + 5q^5 + ...

I - qI = 1 + q + q^2 + q^3 + q^4 + ... = \frac{1}{1-q}

\Longrightarrow I = \frac{1}{(1-q)^2}

\Longrightarrow E(X) = pI = \frac{p}{(1-q)^2} = \frac{1}{p}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 190
I = 1 + 2q + 3q^2 + 4q^3 + 5q^4 + ... = ??

Second trick:

I = \frac{d}{dq}\left(q + q^2 + q^3 + q^4 + q^5 + ...\right) = \frac{d}{dq}\,\frac{q}{1-q} = \frac{1}{(1-q)^2}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 191
Variance X ∼ Geometric(p).
E(X^2) = \sum_{n=1}^{\infty} n^2(1-p)^{n-1}p \quad (q = 1-p)
= p\left(1 + 2^2q + 3^2q^2 + 4^2q^3 + 5^2q^4 + ...\right)

J = 1 + 2^2q + 3^2q^2 + 4^2q^3 + 5^2q^4 + ... = ??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 192
J = 1 + 2^2q + 3^2q^2 + 4^2q^3 + 5^2q^4 + ... = ??

Trick 1:

qJ = q + 2^2q^2 + 3^2q^3 + 4^2q^4 + 5^2q^5 + ...

J(1-q) = 1 + (2^2-1^2)q + (3^2-2^2)q^2 + (4^2-3^2)q^3 + ... = 1 + 3q + 5q^2 + 7q^3 + ...

J(1-q)q = q + 3q^2 + 5q^3 + 7q^4 + ...

J(1-q)^2 = 1 + 2q + 2q^2 + 2q^3 + ... = -1 + \frac{2}{1-q} = \frac{1+q}{1-q}

J = \frac{1+q}{(1-q)^3}, \quad E(X^2) = pJ = \frac{p(2-p)}{p^3} = \frac{2-p}{p^2}, \quad Var(X) = \frac{1-p}{p^2}
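A minimal Matlab sketch (ours, not from the slides; p and the trial count are arbitrary) to check E(X) = 1/p and Var(X) = (1-p)/p^2 by simulation:

function geometric_check(p,trials);
% Simulate X = number of independent trials until the first success.
X = zeros(1,trials);
for t = 1:trials
    n = 1;
    while rand >= p, n = n + 1; end   % keep flipping until a success
    X(t) = n;
end
fprintf('mean %.3f (1/p = %.3f), var %.3f ((1-p)/p^2 = %.3f)\n', ...
        mean(X), 1/p, var(X), (1-p)/p^2);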
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 193
J = 1 + 2^2q + 3^2q^2 + 4^2q^3 + 5^2q^4 + ... = ??

Trick 2:

J = \sum_{n=1}^{\infty} n^2 q^{n-1} = \sum_{n=1}^{\infty}\frac{d}{dq}\left(nq^n\right) = \frac{d}{dq}\left[\sum_{n=1}^{\infty} nq^n\right] = \frac{d}{dq}\left[\frac{q}{p}E(X)\right] = \frac{d}{dq}\left[\frac{q}{(1-q)^2}\right] = \frac{1+q}{(1-q)^3}

using \sum_{n=1}^{\infty} nq^n = \frac{q}{p}\sum_{n=1}^{\infty} nq^{n-1}p = \frac{q}{p}E(X) = \frac{q}{(1-q)^2} (with p = 1-q).
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 194
Section 4.8.2: Negative Binomial Distribution
Definition: Independent trials, each having probability p of being a success, are performed until a total of r successes is accumulated. Let X denote the number of trials required. Then X is a negative binomial random variable:

X \sim NB(r, p)

X \sim Geometric(p) is the same as X \sim NB(1, p).

What is the probability mass function of X \sim NB(r, p)?
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 195
P\{X = n\} = \binom{n-1}{r-1}p^r(1-p)^{n-r}, \quad n = r, r+1, r+2, ...

E(X^k) = \sum_{n=r}^{\infty} n^k\binom{n-1}{r-1}p^r(1-p)^{n-r} = \sum_{n=r}^{\infty} n^k\frac{(n-1)!}{(r-1)!(n-r)!}p^r(1-p)^{n-r}

Now what?
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 196
E(X^k) = \sum_{n=r}^{\infty} n^{k-1}\frac{n!}{(r-1)!(n-r)!}p^r(1-p)^{n-r}
= \frac{r}{p}\sum_{n=r}^{\infty} n^{k-1}\frac{n!}{r!(n-r)!}p^{r+1}(1-p)^{n-r} \quad (m = n+1)
= \frac{r}{p}\sum_{m=r+1}^{\infty}(m-1)^{k-1}\frac{(m-1)!}{r!(m-1-r)!}p^{r+1}(1-p)^{m-1-r}
= \frac{r}{p}E\left[(Y-1)^{k-1}\right], \quad Y \sim NB(r+1, p)

E(X) = ??, \quad E(X^2) = ??, \quad Var(X) = ??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 197
E(X^k) = \frac{r}{p}E\left[(Y-1)^{k-1}\right], \quad Y \sim NB(r+1, p)

Let k = 1. Then

E(X) = \frac{r}{p}E\left[(Y-1)^0\right] = \frac{r}{p}

Let k = 2. Then

E(X^2) = \frac{r}{p}E\left[Y-1\right] = \frac{r}{p}\left(\frac{r+1}{p} - 1\right)

Var(X) = E(X^2) - (E(X))^2 = \frac{r(1-p)}{p^2}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 198
Example (4.8.8a) An urn contains N white and M black balls. Balls are
randomly selected, one at a time, until a black one is obtained. Each selected ball
is replaced before the next one is drawn.
Q(a): What is the probability that exactly n draws are needed?
Q(b): What is the expected number of draws needed?
Q(c): What is the probability that at least k draws are needed?
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 199
Solution: This is a geometric random variable with p = \frac{M}{M+N}.

Q(a): What is the probability that exactly n draws are needed?

P(X = n) = (1-p)^{n-1}p = \left(\frac{N}{M+N}\right)^{n-1}\frac{M}{M+N} = \frac{MN^{n-1}}{(M+N)^n}

Q(b): What is the expected number of draws needed?

E(X) = \frac{1}{p} = \frac{M+N}{M} = 1 + \frac{N}{M}

Q(c): What is the probability that at least k draws are needed?

P(X \ge k) = \sum_{n=k}^{\infty}(1-p)^{n-1}p = \frac{p(1-p)^{k-1}}{1-(1-p)} = \left(\frac{N}{M+N}\right)^{k-1}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 200
Example (4.8.8e): A pipe-smoking mathematician carries 2 matchboxes, 1 in his left-hand pocket and 1 in his right-hand pocket. Each time he needs a match, he is equally likely to take it from either pocket. Consider the moment when the mathematician first discovers that one of his matchboxes is empty. If it is assumed that both matchboxes initially contained N matches, what is the probability that there are exactly k matches in the other box, k = 0, 1, ..., N?
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 201
Solution:
1. Assume the left-hand pocket is empty.
2. X = total matches taken at the moment when left-hand pocket is empty.
3. View X as a random variable X ∼ NB(r, p). What is r, p?
4. Remember there are k matches remaining in the right-hand pocket.
5. The desired probability = PX = n. What is n?
6. Adjustment by considering either left-hand or right-hand can be empty.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 202
Solution:

1. Assume the left-hand pocket is empty.

2. X = total matches taken at the moment when the left-hand pocket is found empty.

3. View X as a random variable X \sim NB(r, p). What is r, p?

p = \frac{1}{2}, \quad r = N + 1.

4. Remember there are k matches remaining in the right-hand pocket.

5. The desired probability = P\{X = n\}. What is n?

n = 2N - k + 1.

6. Adjust by considering that either the left-hand or the right-hand box can be the empty one:

2\binom{2N-k}{N}\left(\frac{1}{2}\right)^{2N-k+1}
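A Matlab sketch (our own addition; N, k, and the trial count are arbitrary choices) to check this answer by simulating the matchbox experiment:

function matchbox_check(N,k,trials);
count = 0;
for t = 1:trials
    L = N; R = N;                     % matches remaining in each box
    while true
        if rand < 0.5                 % reach into the left pocket
            if L == 0, other = R; break; end
            L = L - 1;
        else                          % reach into the right pocket
            if R == 0, other = L; break; end
            R = R - 1;
        end
    end
    if other == k, count = count + 1; end
end
fprintf('simulated %.4f, formula %.4f\n', count/trials, ...
        2*nchoosek(2*N-k,N)*(1/2)^(2*N-k+1));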
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 203
Section 4.8.3: Hypergeometric Distribution
Definition: A sample of size n is chosen randomly without replacement from an urn containing N balls, of which m are white and N - m are black. Let X denote the number of white balls selected; then X \sim HG(N, m, n):

P\{X = i\} = \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}, \quad i = 0, 1, ..., n,

where n - (N - m) \le i \le \min(n, m).
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 204
If the n balls are chosen with replacement, then X is binomial\left(n, p = \frac{m}{N}\right).

When m and N are large relative to n,

X \sim HG(N, m, n) \approx binomial\left(n, \frac{m}{N}\right)

E(X) \approx np = \frac{nm}{N}\,??

Var(X) \approx np(1-p) = \frac{nm}{N}\left(1 - \frac{m}{N}\right)\,??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 205
E(X^k) = \sum_{i=1}^{n} i^k\frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}
= \sum_{i=1}^{n} i^{k-1}\frac{m!}{(i-1)!(m-i)!}\cdot\frac{(N-m)!}{(n-i)!\left((N-m)-(n-i)\right)!}\cdot\frac{n!(N-n)!}{N!}
= \sum_{j=0}^{n-1}(j+1)^{k-1}\frac{m!}{j!(m-1-j)!}\cdot\frac{(N-m)!}{(n-1-j)!\left((N-m)-(n-1-j)\right)!}\cdot\frac{n!(N-n)!}{N!}
= \frac{nm}{N}E\left[(Y+1)^{k-1}\right], \quad Y \sim HG(N-1, m-1, n-1)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 206
Section 4.9: Properties of Cumulative Distribution Functions

F(x) = P\{X \le x\}

1. F(x) is a non-decreasing function: If a < b, then F(a) \le F(b)

2. \lim_{x\to\infty} F(x) = 1

3. \lim_{x\to-\infty} F(x) = 0

4. F(x) is right continuous.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 207
Chapter 5: Continuous Random Variables
Definition: X is a continuous random variable if there exists a non-negative function f, defined for all real x \in (-\infty, \infty), having the property that for any set B of real numbers,

P\{X \in B\} = \int_B f(x)\,dx

The function f is called the probability density function of the random variable X.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 208
Some Facts
• P\{X \in (-\infty, \infty)\} = \int_{-\infty}^{\infty} f(x)dx = 1

• P\{X \in (a,b)\} = P\{a \le X \le b\} = \int_a^b f(x)dx

• P\{X = a\} = \int_a^a f(x)dx = 0

• F(a) = P\{X \le a\} = P\{X < a\} = \int_{-\infty}^{a} f(x)dx

• F(\infty) = 1, \quad F(-\infty) = 0

• f(x) = \frac{d}{dx}F(x)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 209
Example 5.1.1a: Suppose that X is a continuous random variable whose
probability density function is given by
f(x) = \begin{cases} C(4x - 2x^2) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}

Q(a): What is the value of C?

Q(b): Find P\{X > 1\}.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 210
Example 5.1.1d: Denote FX and fX the cumulative function and the
density function of a continuous random variable X , respectively. Find the density
function of Y = 2X .
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 211
Example 5.1.1d: Denote FX and fX the cumulative function and the
density function of a continuous random variable X , respectively. Find the density
function of Y = 2X .
Solution:
F_Y(y) = P\{Y \le y\} = P\{2X \le y\} = P\{X \le y/2\} = F_X(y/2)

f_Y(y) = \frac{d}{dy}F_Y(y) = \frac{d}{dy}F_X(y/2) = \frac{1}{2}f_X(y/2)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 212
Derivatives of Common Functions
http://en.wikipedia.org/wiki/Derivative
• Exponential and logarithmic functions:
\frac{d}{dx}e^x = e^x, \quad \frac{d}{dx}a^x = a^x\log a \quad (\text{Notation: } \log x = \log_e x)
\frac{d}{dx}\log x = \frac{1}{x}, \quad \frac{d}{dx}\log_a x = \frac{1}{x\log a}

• Trigonometric functions:
\frac{d}{dx}\sin x = \cos x, \quad \frac{d}{dx}\cos x = -\sin x, \quad \frac{d}{dx}\tan x = \sec^2 x

• Inverse trigonometric functions:
\frac{d}{dx}\arcsin x = \frac{1}{\sqrt{1-x^2}}, \quad \frac{d}{dx}\arccos x = -\frac{1}{\sqrt{1-x^2}}, \quad \frac{d}{dx}\arctan x = \frac{1}{1+x^2}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 213
Integration by Parts
\int_a^b f(x)dx = f(x)x\Big|_a^b - \int_a^b x f'(x)dx

\int_a^b f(x)\,d[g(x)] = f(x)g(x)\Big|_a^b - \int_a^b g(x)f'(x)dx

\int_a^b f(x)h(x)dx = \int_a^b f(x)\,d[H(x)] = f(x)H(x)\Big|_a^b - \int_a^b H(x)f'(x)dx

where h(x) = H'(x).
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 214
Examples:
\int_a^b \log x\,dx = x\log x\Big|_a^b - \int_a^b x[\log x]'dx = b\log b - a\log a - \int_a^b x\cdot\frac{1}{x}dx = b\log b - a\log a - b + a.

\int_a^b x\log x\,dx = \frac{1}{2}\int_a^b \log x\,d[x^2] = \frac{1}{2}x^2\log x\Big|_a^b - \frac{1}{2}\int_a^b x^2[\log x]'dx
= \frac{1}{2}b^2\log b - \frac{1}{2}a^2\log a - \frac{1}{2}\int_a^b x\,dx
= \frac{1}{2}b^2\log b - \frac{1}{2}a^2\log a - \frac{1}{4}\left(b^2 - a^2\right)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 215
\int_a^b x^n e^x dx = \int_a^b x^n\,d[e^x] = x^n e^x\Big|_a^b - n\int_a^b e^x x^{n-1}dx

\int_a^b x^n\cos x\,dx = \int_a^b x^n\,d[\sin x] = x^n\sin x\Big|_a^b - n\int_a^b \sin x\;x^{n-1}dx

\int_a^b \sin^n x\sin 2x\,dx = 2\int_a^b \sin^n x\sin x\cos x\,dx = 2\int_a^b \sin^{n+1}x\,d[\sin x] = \frac{2}{n+2}\sin^{n+2}x\Big|_a^b
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 216
Expectation
Definition: Given a continuous random variable X with density f(x), its expectation is defined as

E(X) = \int_{-\infty}^{\infty} x f(x)\,dx
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 217
Example: The density of X is given by
f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}

Find E\left[e^X\right].

Two solutions:

1. Find the density function of Y = e^X, denoted by f_Y(y); then E(Y) = \int_{-\infty}^{\infty} y f_Y(y)dy.

2. Apply the formula

E[g(X)] = \int_{-\infty}^{\infty} g(x)f(x)dx
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 218
Solution 1: Y = eX .
F_Y(y) = P\{Y \le y\} = P\{e^X \le y\} = P\{X \le \log y\} = \int_0^{\log y} 2x\,dx = (\log y)^2

Therefore,

f_Y(y) = \frac{2}{y}\log y, \quad 1 \le y \le e

E(Y) = \int_{-\infty}^{\infty} y f_Y(y)dy = \int_1^e 2\log y\,dy = 2y\log y\Big|_1^e - \int_1^e 2y\cdot\frac{1}{y}dy = 2
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 219
Solution 2:
E\left[e^X\right] = \int_0^1 2xe^x dx = \int_0^1 2x\,d[e^x] = 2xe^x\Big|_0^1 - 2\int_0^1 e^x dx = 2e - 2(e-1) = 2
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 220
Lemma 5.2.2.1 For a non-negative random variable X ,
E(X) = \int_0^{\infty} P\{X > x\}\,dx
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 221
Lemma 5.2.2.1: For a non-negative random variable X,

E(X) = \int_0^{\infty} P\{X > x\}\,dx

Proof:

\int_0^{\infty} P\{X > x\}dx = \int_0^{\infty}\int_x^{\infty} f_X(y)\,dy\,dx = \int_0^{\infty}\int_0^{y} f_X(y)\,dx\,dy = \int_0^{\infty} f_X(y)\left(\int_0^y dx\right)dy = \int_0^{\infty} f_X(y)\,y\,dy = E(X)
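As a quick check (our addition, not on the slide), apply the lemma to the earlier density f(x) = 2x on [0, 1], for which P\{X > x\} = 1 - x^2:

E(X) = \int_0^1 (1 - x^2)\,dx = \frac{2}{3},

which agrees with the direct computation \int_0^1 x\cdot 2x\,dx = 2/3.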
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 222
X is a continuous random variable.
• E(aX + b) = aE(X) + b, where a and b are constants.

• Var(X) = E\left[(X - E(X))^2\right] = E(X^2) - (E(X))^2.

• Var(aX + b) = a^2\,Var(X)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 223
Section 5.3: The Uniform Random Variable
Definition: X is a uniform random variable on the interval (a, b) if its
probability density function is given by
f(x) = \begin{cases} \frac{1}{b-a} & \text{if } a < x < b \\ 0 & \text{otherwise} \end{cases}

Denote X \sim Uniform(a, b), or simply X \sim U(a, b).
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 224
Cumulative function:

F(x) = \begin{cases} 0 & \text{if } x \le a \\ \frac{x-a}{b-a} & \text{if } a < x < b \\ 1 & \text{if } x \ge b \end{cases}

If X \sim U(0,1), then F(x) = x for 0 \le x \le 1.

Expectation: E(X) = \frac{a+b}{2}

Variance: Var(X) = \frac{(b-a)^2}{12}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 225
Section 5.4: Normal Random Variable
Definition: X is a normal random variable, X \sim N(\mu, \sigma^2), if its density is given by

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 226
• If X ∼ N(µ, σ2), then E(X) = µ and V ar(X) = σ2.
• The density function is a “bell-shaped” curve.
• It was first introduced by the French mathematician Abraham de Moivre in 1733, to approximate the binomial when n is large.

• The distribution became truly famous because of Karl F. Gauss, and hence another common name for it is the "Gaussian distribution."

• It is called the "normal distribution" (a usage probably started by Karl Pearson) because "most people" believed that it was "normal" for any well-behaved data set to follow this distribution.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 227
Check that

\int_{-\infty}^{\infty} f(x)dx = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = 1.

Note that, by letting z = \frac{x-\mu}{\sigma},

\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz = I.

It suffices to show that I^2 = 1.

I^2 = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz \times \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{w^2}{2}}dw = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{1}{2\pi}e^{-\frac{z^2+w^2}{2}}dz\,dw
= \frac{1}{2\pi}\int_0^{\infty}\int_0^{2\pi}e^{-\frac{r^2}{2}}r\,d\theta\,dr = \frac{1}{2\pi}\int_0^{2\pi}d\theta\int_0^{\infty}e^{-\frac{r^2}{2}}d\left[\frac{r^2}{2}\right] = 1.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 228
Expectation and Variance

If X \sim N(\mu, \sigma^2), then E(X) = \mu and Var(X) = \sigma^2.

E(X) = \int_{-\infty}^{\infty} x\,\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx
= \int_{-\infty}^{\infty}(x-\mu)\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx + \int_{-\infty}^{\infty}\mu\,\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx
= \int_{-\infty}^{\infty}(x-\mu)\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}d[x-\mu] + \mu
= \int_{-\infty}^{\infty} z\,\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{z^2}{2\sigma^2}}dz + \mu
= \mu.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 229
Var(X) = E\left[(X-\mu)^2\right] = \int_{-\infty}^{\infty}(x-\mu)^2\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}d[x-\mu]
= \int_{-\infty}^{\infty} z^2\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{z^2}{2\sigma^2}}dz
= \sigma^2\int_{-\infty}^{\infty}\frac{z^2}{\sigma^2}\cdot\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2\sigma^2}}d\left[\frac{z}{\sigma}\right]
= \sigma^2\int_{-\infty}^{\infty} t^2\frac{1}{\sqrt{2\pi}}e^{-\frac{t^2}{2}}dt
= \sigma^2\int_{-\infty}^{\infty} t\,\frac{1}{\sqrt{2\pi}}e^{-\frac{t^2}{2}}d\left[\frac{t^2}{2}\right]
= \sigma^2\int_{-\infty}^{\infty}(-t)\frac{1}{\sqrt{2\pi}}d\left[e^{-\frac{t^2}{2}}\right]
= \sigma^2\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{t^2}{2}}dt - \frac{\sigma^2}{\sqrt{2\pi}}\,te^{-\frac{t^2}{2}}\Big|_{-\infty}^{\infty} = \sigma^2
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 230
Why is \lim_{t\to\infty} te^{-\frac{t^2}{2}} = 0??

\lim_{t\to\infty} te^{-\frac{t^2}{2}} = \lim_{t\to\infty}\frac{t}{e^{t^2/2}} = \lim_{t\to\infty}\frac{[t]'}{\left[e^{t^2/2}\right]'} = \lim_{t\to\infty}\frac{1}{te^{t^2/2}} = 0
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 231
Normal Distribution Approximates Binomial
Suppose X \sim Binomial(n, p). For fixed p, as n \to \infty,

Binomial(n, p) \approx N(\mu, \sigma^2), \quad \mu = np, \quad \sigma^2 = np(1-p).
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 232
[Figure: Binomial(n = 10, p = 0.2) mass function with the approximating normal density; x-axis: x, y-axis: density (mass) function.]
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 233
[Figure: Binomial(n = 20, p = 0.2) mass function with the approximating normal density; x-axis: x, y-axis: density (mass) function.]
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 234
[Figure: Binomial(n = 50, p = 0.2) mass function with the approximating normal density; x-axis: x, y-axis: density (mass) function.]
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 235
[Figure: Binomial(n = 100, p = 0.2) mass function with the approximating normal density; x-axis: x, y-axis: density (mass) function.]
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 236
[Figure: Binomial(n = 1000, p = 0.2) mass function with the approximating normal density; x-axis: x, y-axis: density (mass) function.]
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 237
Matlab code
function NormalApprBinomial(n,p);
mu = n*p; sigma2 = n*p*(1-p);
figure;
bar((0:n),binopdf(0:n,n,p),'g'); hold on; grid on;
x = mu - 3*sqrt(sigma2):0.001:mu + 3*sqrt(sigma2);   % plot over mu +/- 3 standard deviations
plot(x,normpdf(x,mu,sqrt(sigma2)),'r-','linewidth',2);
xlabel('x'); ylabel('Density (mass) function');
title(['n = ' num2str(n) ' p = ' num2str(p)]);
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 238
The Standard Normal Distribution
Z ∼ N(0, 1) is called the standard normal.
• E(Z) = 0, Var(Z) = 1.

• If X \sim N(\mu, \sigma^2), then Y = \frac{X-\mu}{\sigma} \sim N(0, 1).

• For Z \sim N(0, 1), denote the density function f_Z(z) by \phi(z) and the cumulative function F_Z(z) by \Phi(z):

\phi(z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}, \quad \Phi(z) = \int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}e^{-\frac{t^2}{2}}dt

• \phi(-x) = \phi(x), \quad \Phi(-x) = 1 - \Phi(x). Numerical table on Page 222.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 239
Example 5.4.4b: X \sim N(\mu = 3, \sigma^2 = 9). Find

• P\{2 < X < 5\}
• P\{X > 0\}
• P\{|X - 3| > 6\}

Solution:

Let Y = \frac{X-\mu}{\sigma} = \frac{X-3}{3}. Then Y \sim N(0, 1).

P\{2 < X < 5\} = P\left\{\frac{2-3}{3} < \frac{X-3}{3} < \frac{5-3}{3}\right\} = P\left\{-\frac{1}{3} < Y < \frac{2}{3}\right\} = \Phi\left(\frac{2}{3}\right) - \Phi\left(-\frac{1}{3}\right)
= \Phi(0.6667) - 1 + \Phi(0.3333) \approx 0.7475 - 1 + 0.6306 \approx 0.3781
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 240
Section 5.5: Exponential Random Variable
Definition: An exponential random variable X has the probability density function

f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \quad (\lambda > 0)

X \sim EXP(\lambda).

Cumulative function:

F(x) = \int_0^x \lambda e^{-\lambda t}dt = 1 - e^{-\lambda x}, \quad (x \ge 0).

F(\infty) = ??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 241
Moments:
E(X^n) = \int_0^{\infty} x^n\lambda e^{-\lambda x}dx = -\int_0^{\infty} x^n\,d\left[e^{-\lambda x}\right] = \int_0^{\infty} e^{-\lambda x}\,nx^{n-1}dx = \frac{n}{\lambda}E(X^{n-1})

E(X) = \frac{1}{\lambda}, \quad E(X^2) = \frac{2}{\lambda^2}, \quad Var(X) = \frac{1}{\lambda^2}, \quad E(X^n) = \frac{n!}{\lambda^n}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 242
Applications of Exponential Distribution
• The exponential distribution often arises as being the distribution of the
amount of time until some specific event occurs.
• For instance, the amount of time (starting from now) until an earthquake
occurs, or until a new war breaks out, etc.
• The exponential distribution is memoryless.
P\{X > s+t \mid X > t\} = P\{X > s\}, \quad \text{for all } s, t \ge 0
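A one-line verification (not spelled out on the slide), using P\{X > x\} = e^{-\lambda x}:

P\{X > s+t \mid X > t\} = \frac{P\{X > s+t\}}{P\{X > t\}} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda t}} = e^{-\lambda s} = P\{X > s\}.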
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 243
Example 5b: Suppose that the length of a phone call in minutes is an exponential random variable with parameter \lambda = \frac{1}{10}. If someone arrives immediately ahead of you at a public telephone booth, find the probability that you will have to wait

• (a) more than 10 minutes;

• (b) between 10 and 20 minutes.

Solution:

• P\{X > 10\} = e^{-10\lambda} = e^{-1} \approx 0.368.

• P\{10 < X < 20\} = e^{-1} - e^{-2} \approx 0.233
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 244
Section 5.6.1: The Gamma Distribution
Definition: A random variable has a gamma distribution with parameters
(α, λ), where α, λ > 0, if its density function is given by
f(x) = \begin{cases} \frac{\lambda e^{-\lambda x}(\lambda x)^{\alpha-1}}{\Gamma(\alpha)} & x \ge 0 \\ 0 & x < 0 \end{cases}

Denote X \sim Gamma(\alpha, \lambda).

\int_0^{\infty} f(x)dx = 1??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 245
The Gamma function is defined as

\Gamma(\alpha) = \int_0^{\infty} e^{-y}y^{\alpha-1}dy = (\alpha - 1)\Gamma(\alpha - 1).

\Gamma(1) = 1. \quad \Gamma(0.5) = \sqrt{\pi}. \quad \Gamma(n) = (n-1)!.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 246
Connection to exponential distribution
• When α = 1, the gamma distribution reduces to an exponential distribution.
Gamma(α = 1, λ) = EXP (λ).
• A gamma distribution Gamma(n, λ) often arises as the distribution of the
amount of time one has to wait until a total of n events has occurred.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 247
X \sim Gamma(\alpha, \lambda).

Expectation and Variance: E(X) = \frac{\alpha}{\lambda}, \quad Var(X) = \frac{\alpha}{\lambda^2}.

E(X) = \int_0^{\infty} x\,\frac{\lambda e^{-\lambda x}(\lambda x)^{\alpha-1}}{\Gamma(\alpha)}dx = \int_0^{\infty}\frac{e^{-\lambda x}(\lambda x)^{\alpha}}{\Gamma(\alpha)}dx = \frac{\Gamma(\alpha+1)}{\Gamma(\alpha)}\cdot\frac{1}{\lambda}\int_0^{\infty}\frac{\lambda e^{-\lambda x}(\lambda x)^{\alpha}}{\Gamma(\alpha+1)}dx = \frac{\Gamma(\alpha+1)}{\Gamma(\alpha)}\cdot\frac{1}{\lambda} = \frac{\alpha}{\lambda}.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 248
Section 5.6.4: The Beta Distribution
Definition: A random variable X has a beta distribution if its density is given by

f(x) = \begin{cases} \frac{1}{B(a,b)}x^{a-1}(1-x)^{b-1} & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}

Denote X \sim Beta(a, b).

The Beta function:

B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} = \int_0^1 x^{a-1}(1-x)^{b-1}dx
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 249
X \sim Beta(a, b).

Expectation and Variance:

E(X) = \frac{a}{a+b}, \quad Var(X) = \frac{ab}{(a+b)^2(a+b+1)}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 250
Sec. 5.7: Distribution of a Function of a Random Variable
Given the density function fX(x) and cumulative function FX(x) for a random
variable X , what is the general formula for the density function fY (y) of the
random variable Y = g(X)?
Example 5.7.7a: X \sim Uniform(0, 1). Y = g(X) = X^n.

F_Y(y) = P\{Y \le y\} = P\{X^n \le y\} = P\{X \le y^{1/n}\} = F_X(y^{1/n}) = y^{1/n}

f_Y(y) = \frac{1}{n}y^{1/n - 1}, \quad 0 \le y \le 1
= f_X\left(g^{-1}(y)\right)\left|\frac{d}{dy}g^{-1}(y)\right|

y = g(x) = x^n \Longrightarrow g^{-1}(y) = y^{1/n}.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 251
Theorem 7.1: Let X be a continuous random variable having probability density function f_X. Suppose g(x) is a strictly monotone (increasing or decreasing), differentiable function of x. Then the random variable Y = g(X) has a probability density function given by

f_Y(y) = \begin{cases} f_X\left(g^{-1}(y)\right)\left|\frac{d}{dy}g^{-1}(y)\right| & \text{if } y = g(x) \text{ for some } x \\ 0 & \text{if } y \ne g(x) \text{ for all } x \end{cases}
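A Matlab sketch (ours, not from the slides; the exponent and sample size are arbitrary) that checks the theorem on Example 5.7.7a, where f_Y(y) = (1/n)y^{1/n-1} on (0, 1):

function monotone_check(n,trials);
Y = rand(1,trials).^n;                   % X ~ U(0,1), Y = X^n (monotone on (0,1))
[counts, centers] = hist(Y, 50);         % empirical histogram of Y
width = centers(2) - centers(1);
plot(centers, counts/(trials*width), 'g.'); hold on; grid on;
y = 0.01:0.01:1;
plot(y, (1/n)*y.^(1/n - 1), 'r-');       % density from Theorem 7.1
legend('simulated', 'theory'); xlabel('y'); ylabel('f_Y(y)');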
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 252
Example 7b: Y = X^2

F_Y(y) = P\{Y \le y\} = P\{X^2 \le y\} = P\{-\sqrt{y} \le X \le \sqrt{y}\} = F_X(\sqrt{y}) - F_X(-\sqrt{y})

f_Y(y) = \frac{1}{2\sqrt{y}}\left(f_X(\sqrt{y}) + f_X(-\sqrt{y})\right)

Does this example violate Theorem 7.1?
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 253
Example (Cauchy distribution): Suppose a random variable X has density function

f(x) = \frac{1}{\pi(1+x^2)}, \quad -\infty < x < \infty

Q: Find the distribution of \frac{1}{X}.

One can check \int_{-\infty}^{\infty} f(x)dx = 1, but E(|X|) = \infty, so the mean does not exist.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 254
Solution: Denote the cumulative function of X by F_X(x). Let Z = 1/X.

Assume z > 0:

F_Z(z) = P\{1/X \le z\} = P\{X \le 0\} + P\{X \ge 1/z, X > 0\}
= \frac{1}{2} + \int_{1/z}^{\infty}\frac{1}{\pi(1+x^2)}dx
= \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(x)\Big|_{1/z}^{\infty}
= \frac{1}{2} + \frac{1}{\pi}\left(\frac{\pi}{2} - \tan^{-1}(1/z)\right)

Therefore, for z > 0,

f_Z(z) = \frac{1}{\pi}\left[\frac{\pi}{2} - \tan^{-1}(1/z)\right]' = \frac{1}{\pi(1+z^2)}.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 255
Assume z < 0:

F_Z(z) = P\{1/X \le z\} = P\{X \ge 1/z, X \le 0\}
= \int_{1/z}^{0}\frac{1}{\pi(1+x^2)}dx
= \frac{1}{\pi}\tan^{-1}(x)\Big|_{1/z}^{0}
= \frac{1}{\pi}\left(0 - \tan^{-1}(1/z)\right)

Therefore, for z < 0,

f_Z(z) = \frac{1}{\pi}\left[0 - \tan^{-1}(1/z)\right]' = \frac{1}{\pi(1+z^2)}.

Thus, Z and X have the same density function.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 256
Chapter 6: Jointly Distributed Random Variables
Definition: For any two random variables X and Y, the joint cumulative probability distribution is defined as

F(x, y) = P\{X \le x, Y \le y\}, \quad -\infty < x, y < \infty
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 257
F(x, y) = P\{X \le x, Y \le y\}, \quad -\infty < x, y < \infty

The distribution of X is

F_X(x) = P\{X \le x\} = P\{X \le x, Y \le \infty\} = F(x, \infty)

The distribution of Y is

F_Y(y) = P\{Y \le y\} = F(\infty, y)

Example:

P\{X > x, Y > y\} = 1 - F_X(x) - F_Y(y) + F(x, y)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 258
Jointly Discrete Random Variables
X and Y are both discrete random variables.
Joint probability mass function:

p(x, y) = P\{X = x, Y = y\}, \quad \sum_x\sum_y p(x,y) = 1

Marginal probability mass functions:

p_X(x) = P\{X = x\} = \sum_{y:\,p(x,y)>0} p(x,y)

p_Y(y) = P\{Y = y\} = \sum_{x:\,p(x,y)>0} p(x,y)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 259
Example 1a: Suppose that 3 balls are randomly selected from an urn
containing 3 red, 4 white, and 5 blue balls. Let X and Y denote, respectively, the
number of red and white balls chosen.
Q: The joint probability mass function p(x, y) = P\{X = x, Y = y\} = ??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 260
p(0,0) = \frac{\binom{5}{3}}{\binom{12}{3}} = \frac{10}{220}

p(0,1) = \frac{\binom{4}{1}\binom{5}{2}}{\binom{12}{3}} = \frac{40}{220}

p(0,2) = \frac{\binom{4}{2}\binom{5}{1}}{\binom{12}{3}} = \frac{30}{220}

p(0,3) = \frac{\binom{4}{3}}{\binom{12}{3}} = \frac{4}{220}

p_X(0) = p(0,0) + p(0,1) + p(0,2) + p(0,3) = \frac{84}{220}

Check: p_X(0) = \frac{\binom{9}{3}}{\binom{12}{3}} = \frac{84}{220}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 261
Jointly Continuous Random Variables
X and Y are jointly continuous if there exists a function f(x, y) defined for all
real x and y, having the property that for every set C of pairs of real numbers
P\{(X, Y) \in C\} = \iint_{(x,y)\in C} f(x,y)\,dx\,dy
The function f(x, y) is called the joint probability density function.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 262
The joint cumulative function:

F(x,y) = P\{X \in (-\infty, x], Y \in (-\infty, y]\} = \int_{-\infty}^{y}\int_{-\infty}^{x} f(u,v)\,du\,dv

The joint probability density function:

f(x,y) = \frac{\partial^2}{\partial x\,\partial y}F(x,y)

The marginal probability density functions:

f_X(x) = \int_{-\infty}^{\infty} f(x,y)dy, \quad f_Y(y) = \int_{-\infty}^{\infty} f(x,y)dx
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 263
Example 1c: The joint density function is given by
f(x,y) = \begin{cases} Ce^{-x}e^{-2y} & 0 < x < \infty, \; 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}

• (a) Determine C.

• (b) P\{X > 1, Y < 1\}

• (c) P\{X < Y\}

• (d) P\{X < a\}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 264
Solution (a):

1 = \int_0^{\infty}\int_0^{\infty} Ce^{-x}e^{-2y}dx\,dy = C\int_0^{\infty}e^{-x}dx\int_0^{\infty}e^{-2y}dy = C\cdot 1\cdot\frac{1}{2} = \frac{C}{2}

Therefore, C = 2.

Solution (b):

P\{X > 1, Y < 1\} = \int_0^1\int_1^{\infty} 2e^{-x}e^{-2y}dx\,dy = e^{-1}(1 - e^{-2})
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 265
Solution (c):

P\{X < Y\} = \iint_{(x,y):\,x<y} 2e^{-x}e^{-2y}dx\,dy = \int_0^{\infty}\int_0^{y} 2e^{-x}e^{-2y}dx\,dy
= \int_0^{\infty}\left[\int_0^y e^{-x}dx\right]2e^{-2y}dy
= \int_0^{\infty}\left[1 - e^{-y}\right]2e^{-2y}dy
= \int_0^{\infty}\left(2e^{-2y} - 2e^{-3y}\right)dy
= 1 - \frac{2}{3} = \frac{1}{3}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 266
Solution (d):

P\{X < a\} = P\{X < a, Y < \infty\} = \int_0^a\int_0^{\infty} 2e^{-2y}e^{-x}dy\,dx = \int_0^a e^{-x}dx = 1 - e^{-a}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 267
Example 1e: The joint density is given by
f(x,y) = \begin{cases} e^{-(x+y)} & 0 < x < \infty, \; 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}

Q: Find the density function of the random variable X/Y.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 268
Solution: Let Z = X/Y, 0 < Z < \infty.

F_Z(z) = P\{Z \le z\} = P\{X/Y \le z\}
= \iint_{x/y \le z} e^{-(x+y)}dx\,dy
= \int_0^{\infty}\left[\int_0^{yz} e^{-(x+y)}dx\right]dy
= \int_0^{\infty} e^{-y}\left(1 - e^{-yz}\right)dy
= \int_0^{\infty}\left(e^{-y} - e^{-y(1+z)}\right)dy
= 1 - \frac{1}{z+1}

Therefore, f_Z(z) = \left[1 - \frac{1}{z+1}\right]' = \frac{1}{(z+1)^2}.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 269
Alternatively, integrating over y first:

F_Z(z) = P\{Z \le z\} = P\{X/Y \le z\}
= \iint_{x/y \le z} e^{-(x+y)}dx\,dy
= \int_0^{\infty}\left[\int_{x/z}^{\infty} e^{-(x+y)}dy\right]dx
= \int_0^{\infty} e^{-x}e^{-x/z}dx
= \frac{1}{1 + \frac{1}{z}}\cdot\frac{1}{z}\Big/\frac{1}{z} = \frac{1}{1 + \frac{1}{z}} = 1 - \frac{1}{z+1}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 270
Multiple Jointly Distributed Random Variables
Joint cumulative distribution function:

F(x_1, x_2, ..., x_n) = P\{X_1 \le x_1, X_2 \le x_2, ..., X_n \le x_n\}

Suppose X_1, X_2, ..., X_n are discrete random variables. The joint probability mass function is

p(x_1, x_2, ..., x_n) = P\{X_1 = x_1, X_2 = x_2, ..., X_n = x_n\}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 271
X_1, X_2, ..., X_n are jointly continuous if there exists a function f(x_1, x_2, ..., x_n) such that for any set C in n-dimensional space

P\{(X_1, X_2, ..., X_n) \in C\} = \int\int\cdots\int_{(x_1, x_2, ..., x_n)\in C} f(x_1, x_2, ..., x_n)\,dx_1 dx_2\cdots dx_n

f(x_1, x_2, ..., x_n) is the joint probability density function.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 272
Section 6.2: Independent Random Variables
Definition: The random variables X and Y are said to be independent if for any two sets of real numbers A and B

P\{X \in A, Y \in B\} = P\{X \in A\} \times P\{Y \in B\}

equivalently,

P\{X \le x, Y \le y\} = P\{X \le x\} \times P\{Y \le y\}

or, equivalently,

p(x,y) = p_X(x)p_Y(y) for all x, y, when X and Y are discrete

f(x,y) = f_X(x)f_Y(y) for all x, y, when X and Y are continuous
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 273
Example 2b: Suppose the number of people that enter a post office on a
given day is a Poisson random variable with parameter λ.
Each person that enters the post office is male with probability p and female with probability 1 - p.
Let X and Y , respectively, be the number of males and females that enter the
post office.
• (a): Find the joint probability mass function PX = i, Y = j.
• (b): Find the marginal probability mass functions PX = i, PY = j.
• (c): Show that X and Y are independent.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 274
Key: Suppose we know there are N = n people that enter the post office. Then the number of males is a binomial with parameters (n, p):

[X \mid N = n] \sim Binomial(n, p), \quad N = X + Y \sim Poisson(\lambda).

Therefore,

P\{X = i, Y = j\} = \sum_{n=0}^{\infty} P\{X = i, Y = j \mid N = n\}P\{N = n\} = P\{X = i, Y = j \mid N = i+j\}P\{N = i+j\}

because P\{X = i, Y = j \mid N \ne i+j\} = 0.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 275
Solution (a): Conditioning on X + Y = i + j:

P\{X = i, Y = j\} = P\{X = i, Y = j \mid X+Y = i+j\}P\{X+Y = i+j\}
= \left[\binom{i+j}{i}p^i(1-p)^j\right]\left[e^{-\lambda}\frac{\lambda^{i+j}}{(i+j)!}\right]
= p^i(1-p)^j e^{-\lambda}\frac{\lambda^i\lambda^j}{i!\,j!}
= p^i(1-p)^j e^{-\lambda(p + (1-p))}\frac{\lambda^i\lambda^j}{i!\,j!}
= \left[e^{-\lambda p}\frac{(\lambda p)^i}{i!}\right]\left[e^{-\lambda(1-p)}\frac{(\lambda(1-p))^j}{j!}\right]
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 276
Solution (b):

P\{X = i\} = \sum_{j=0}^{\infty} P\{X = i, Y = j\}
= \sum_{j=0}^{\infty}\left[e^{-\lambda p}\frac{(\lambda p)^i}{i!}\right]\left[e^{-\lambda(1-p)}\frac{(\lambda(1-p))^j}{j!}\right]
= \left[e^{-\lambda p}\frac{(\lambda p)^i}{i!}\right]\sum_{j=0}^{\infty}\left[e^{-\lambda(1-p)}\frac{(\lambda(1-p))^j}{j!}\right]
= e^{-\lambda p}\frac{(\lambda p)^i}{i!}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 277
Example 2c: A man and a woman decide to meet at a certain location. Each
person independently arrives at a time uniformly distributed between 12 noon and
1 pm.
Q: Find the probability that the first to arrive has to wait longer than 10 minutes.
Solution:
Let X = number of minutes by which one person arrives later than 12 noon.
Let Y = number of minutes by which the other person arrives later than 12 noon.
X and Y are independent.
The desired probability = P\{|X - Y| \ge 10\}.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 278
P\{|X - Y| \ge 10\} = \iint_{|x-y|>10} f(x,y)\,dx\,dy
= \iint_{|x-y|>10} f_X(x)f_Y(y)\,dx\,dy
= 2\iint_{y<x-10}\frac{1}{60}\cdot\frac{1}{60}\,dx\,dy
= 2\int_{10}^{60}\int_0^{x-10}\frac{1}{3600}\,dy\,dx
= \frac{25}{36}.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 279
A Necessary and Sufficient Condition for Independence

Proposition 2.1: The continuous (discrete) random variables X and Y are independent if and only if their joint probability density (mass) function can be expressed as

f_{X,Y}(x,y) = h(x)g(y), \quad -\infty < x, y < \infty

Proof: By definition, X and Y are independent if f_{X,Y}(x,y) = f_X(x)f_Y(y).
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 280
Example 2f:
(a): Two random variables X and Y with joint density given by
f(x, y) = 6e−2xe−3y, 0 < x, y < ∞
are independent.
(b): Two random variables X and Y with joint density given by

f(x,y) = 24xy, \quad 0 < x, y < 1, \; 0 < x + y < 1

are NOT independent. (Why?)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 281
Independence of More Than Two Random Variables
Definition: The n random variables X_1, X_2, ..., X_n are said to be independent if, for all sets of real numbers A_1, A_2, ..., A_n,

P\{X_1 \in A_1, X_2 \in A_2, ..., X_n \in A_n\} = \prod_{i=1}^{n} P\{X_i \in A_i\}

equivalently,

P\{X_1 \le x_1, X_2 \le x_2, ..., X_n \le x_n\} = \prod_{i=1}^{n} P\{X_i \le x_i\},

for all x_1, x_2, ..., x_n.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 282
Example 2h: Let X, Y, Z be independent and uniformly distributed over (0, 1). Compute P\{X \ge YZ\}.

Solution:

P\{X \ge YZ\} = \iiint_{x \ge yz} f(x,y,z)\,dx\,dy\,dz
= \int_0^1\int_0^1\int_{yz}^1 dx\,dy\,dz
= \int_0^1\int_0^1 (1 - yz)\,dy\,dz
= \int_0^1\left(1 - \frac{z}{2}\right)dz
= \frac{3}{4}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 283
Chapter 6.3: Sum of Independent Random Variables
Suppose that X and Y are independent. Let Z = X + Y.

F_Z(z) = P\{X + Y \le z\} = \iint_{x+y\le z} f_X(x)f_Y(y)\,dx\,dy
= \int_{-\infty}^{\infty}\int_{-\infty}^{z-y} f_X(x)f_Y(y)\,dx\,dy
= \int_{-\infty}^{\infty} f_Y(y)\int_{-\infty}^{z-y} f_X(x)\,dx\,dy
= \int_{-\infty}^{\infty} f_Y(y)F_X(z-y)\,dy

f_Z(z) = \frac{d}{dz}F_Z(z) = \int_{-\infty}^{\infty} f_Y(y)f_X(z-y)\,dy
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 284
The discrete case: If X and Y are two independent discrete random variables, then Z = X + Y has probability mass function

p_Z(z) = \sum_j p_Y(j)p_X(z-j)

——————–

Example: Suppose X, Y \sim Bernoulli(p) are independent. Then

Z = X + Y \sim Binomial(2, p)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 285
Suppose X, Y \sim Bernoulli(p) are independent. Then Z = X + Y \sim Binomial(2, p).

p_Z(0) = \sum_j p_Y(j)p_X(0-j) = p_Y(0)p_X(0) = (1-p)^2 = \binom{2}{0}(1-p)^2

p_Z(1) = \sum_j p_Y(j)p_X(1-j) = p_Y(0)p_X(1) + p_Y(1)p_X(0) = (1-p)p + p(1-p) = \binom{2}{1}p(1-p)

p_Z(2) = \sum_j p_Y(j)p_X(2-j) = p_Y(1)p_X(1) = p^2 = \binom{2}{2}p^2
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 286
Suppose X \sim Binomial(n, p) and Y \sim Binomial(m, p) are independent. Then Z = X + Y \sim Binomial(m + n, p).

p_Z(z) = \sum_j p_Y(j)p_X(z-j).
Note 0 ≤ j ≤ m and z − n ≤ j ≤ z. We can assume m ≤ n.
There are three cases to consider:
1. Suppose 0 ≤ z ≤ m, then 0 ≤ j ≤ z
2. Suppose m < z ≤ n, then 0 ≤ j ≤ m.
3. Suppose n < z ≤ m + n, then z − n ≤ j ≤ m.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 287
Suppose 0 \le z \le m; then 0 \le j \le z:

p_Z(z) = \sum_j p_Y(j)p_X(z-j) = \sum_{j=0}^{z} p_Y(j)p_X(z-j)
= \sum_{j=0}^{z}\binom{m}{j}p^j(1-p)^{m-j}\binom{n}{z-j}p^{z-j}(1-p)^{n-z+j}
= p^z(1-p)^{m+n-z}\sum_{j=0}^{z}\binom{m}{j}\binom{n}{z-j}
= \binom{m+n}{z}p^z(1-p)^{m+n-z}

because we have seen (e.g., using a combinatorial argument)

\sum_{j=0}^{z}\binom{m}{j}\binom{n}{z-j} = \binom{m+n}{z}.
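A two-line Matlab check (ours, not from the slides; m, n, p are arbitrary): the pmf of Z is the discrete convolution of the two pmfs, which should reproduce binopdf with parameters (m + n, p):

m = 4; n = 6; p = 0.3;
pZ = conv(binopdf(0:m,m,p), binopdf(0:n,n,p));   % pmf of Z = X + Y, length m+n+1
disp([pZ' binopdf(0:m+n,m+n,p)']);               % the two columns should agree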
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 288
Sum of Two Independent Uniform Random Variables
X \sim U(0,1) and Y \sim U(0,1) are independent. Let Z = X + Y.

f_Z(z) = \int_{-\infty}^{\infty} f_Y(y)f_X(z-y)\,dy = \int_0^1 f_X(z-y)\,dy

We must have 0 \le y \le 1 and 0 \le z - y \le 1, i.e., z - 1 \le y \le z.

Two cases to consider:

1. If 0 \le z \le 1, then 0 \le y \le z.

2. If 1 < z \le 2, then z - 1 < y \le 1.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 289
If 0 \le z \le 1, then 0 \le y \le z:

f_Z(z) = \int_0^1 f_X(z-y)\,dy = \int_0^z dy = z.

If 1 < z \le 2, then z - 1 \le y \le 1:

f_Z(z) = \int_0^1 f_X(z-y)\,dy = \int_{z-1}^1 dy = 2 - z.

f_Z(z) = \begin{cases} z & 0 < z \le 1 \\ 2 - z & 1 < z < 2 \\ 0 & \text{otherwise} \end{cases}
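A Matlab sketch (our own illustration; the sample size is arbitrary) comparing a histogram of simulated sums with this triangular density:

Z = rand(1,200000) + rand(1,200000);     % sum of two independent U(0,1)
[counts, centers] = hist(Z, 60);
width = centers(2) - centers(1);
plot(centers, counts/(length(Z)*width), 'g.'); hold on; grid on;
z = 0:0.01:2;
plot(z, min(z, 2-z), 'r-');              % f_Z(z) = z on (0,1], 2-z on (1,2)
legend('simulated', 'theory'); xlabel('z'); ylabel('f_Z(z)');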
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 290
Sum of Two Gamma Random Variables
X ∼ Gamma(t, λ) and Y ∼ Gamma(s, λ) are independent.
Let Z = X + Y .
Show that Z ∼ Gamma(s + t, λ).
For an intuition, see notes page 246.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 291
f_Z(z) = \int_{-\infty}^{\infty} f_Y(y)f_X(z-y)\,dy = \int_0^z f_Y(y)f_X(z-y)\,dy
= \int_0^z\left[\frac{\lambda e^{-\lambda y}(\lambda y)^{s-1}}{\Gamma(s)}\right]\left[\frac{\lambda e^{-\lambda(z-y)}(\lambda(z-y))^{t-1}}{\Gamma(t)}\right]dy
= \frac{\lambda e^{-\lambda z}\lambda^{s+t-1}}{\Gamma(s)\Gamma(t)}\int_0^z y^{s-1}(z-y)^{t-1}\,dy
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 292
f_Z(z) = \frac{\lambda e^{-\lambda z}\lambda^{s+t-1}}{\Gamma(s)\Gamma(t)}\int_0^z y^{s-1}(z-y)^{t-1}dy
= \left[\frac{\lambda e^{-\lambda z}(\lambda z)^{s+t-1}}{\Gamma(s+t)}\right]\left[\frac{\Gamma(s+t)}{z^{s+t-1}\Gamma(s)\Gamma(t)}\int_0^z y^{s-1}(z-y)^{t-1}dy\right]
= \left[\frac{\lambda e^{-\lambda z}(\lambda z)^{s+t-1}}{\Gamma(s+t)}\right]\left[\frac{\Gamma(s+t)}{\Gamma(s)\Gamma(t)}\int_0^z\left(\frac{y}{z}\right)^{s-1}\left(1-\frac{y}{z}\right)^{t-1}\frac{dy}{z}\right]
= \left[\frac{\lambda e^{-\lambda z}(\lambda z)^{s+t-1}}{\Gamma(s+t)}\right]\left[\frac{\Gamma(s+t)}{\Gamma(s)\Gamma(t)}\int_0^1\theta^{s-1}(1-\theta)^{t-1}d\theta\right]
= f_W(z)\left[\frac{\Gamma(s+t)}{\Gamma(s)\Gamma(t)}\int_0^1\theta^{s-1}(1-\theta)^{t-1}d\theta\right], \quad W \sim Gamma(s+t, \lambda).

Because f_Z(z) is also a density function, we must have

\frac{\Gamma(s+t)}{\Gamma(s)\Gamma(t)}\int_0^1\theta^{s-1}(1-\theta)^{t-1}d\theta = 1
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 293
Recall the definition of the Beta function in Section 5.6.4:

B(s,t) = \int_0^1\theta^{s-1}(1-\theta)^{t-1}d\theta

and its property (Eq. (6.3) in Section 5.6.4):

B(s,t) = \frac{\Gamma(s)\Gamma(t)}{\Gamma(s+t)}.

Thus,

\frac{\Gamma(s+t)}{\Gamma(s)\Gamma(t)}\int_0^1\theta^{s-1}(1-\theta)^{t-1}d\theta = \frac{\Gamma(s+t)}{\Gamma(s)\Gamma(t)}\cdot\frac{\Gamma(s)\Gamma(t)}{\Gamma(s+t)} = 1
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 294
Generalization to more than two gamma random variables
If X_1, X_2, ..., X_n are independent gamma random variables with respective parameters (s_1, \lambda), (s_2, \lambda), ..., (s_n, \lambda), then

X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i \sim Gamma\left(\sum_{i=1}^{n} s_i, \lambda\right)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 295
Sum of Independent Normal Random Variables
Proposition 3.2: If X_i, i = 1 to n, are independent random variables that are normally distributed with respective parameters \mu_i and \sigma_i^2, i = 1 to n, then

X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i \sim N\left(\sum_{i=1}^{n}\mu_i, \sum_{i=1}^{n}\sigma_i^2\right)

————————

It suffices to show

X_1 + X_2 \sim N\left(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2\right)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 296
X \sim N(\mu_x, \sigma_x^2), Y \sim N(\mu_y, \sigma_y^2). Let Z = X + Y.

f_Z(z) = \int_{-\infty}^{\infty} f_X(z-y)f_Y(y)\,dy
= \int_{-\infty}^{\infty}\left[\frac{1}{\sqrt{2\pi}\sigma_x}e^{-\frac{(z-y-\mu_x)^2}{2\sigma_x^2}}\right]\left[\frac{1}{\sqrt{2\pi}\sigma_y}e^{-\frac{(y-\mu_y)^2}{2\sigma_y^2}}\right]dy
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma_x\sigma_y}e^{-\frac{(z-y-\mu_x)^2}{2\sigma_x^2}-\frac{(y-\mu_y)^2}{2\sigma_y^2}}dy
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma_x\sigma_y}e^{-\frac{(z-y-\mu_x)^2\sigma_y^2 + (y-\mu_y)^2\sigma_x^2}{2\sigma_x^2\sigma_y^2}}dy
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma_x\sigma_y}e^{-\frac{y^2(\sigma_x^2+\sigma_y^2) - 2y\left(\mu_y\sigma_x^2 - (\mu_x - z)\sigma_y^2\right) + (\mu_x - z)^2\sigma_y^2 + \mu_y^2\sigma_x^2}{2\sigma_x^2\sigma_y^2}}dy
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 297
Let A = \sigma_x\sigma_y, \; B = \sigma_x^2\sigma_y^2, \; C = \sigma_x^2 + \sigma_y^2,
D = -2\left(\mu_y\sigma_x^2 - (\mu_x - z)\sigma_y^2\right),
E = (\mu_x - z)^2\sigma_y^2 + \mu_y^2\sigma_x^2.

f_Z(z) = \int_{-\infty}^{\infty} f_X(z-y)f_Y(y)\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}A}e^{-\frac{Cy^2 + Dy + E}{2B}}dy = \frac{1}{\sqrt{2\pi}}\sqrt{\frac{B}{CA^2}}\,e^{\frac{D^2 - 4EC}{8BC}}

because \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(y-\mu)^2}{2\sigma^2}}dy = 1 for any \mu, \sigma^2.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 298
\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}A}e^{-\frac{Cy^2+Dy+E}{2B}}dy
= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}A}e^{-\frac{y^2 + \frac{D}{C}y + \frac{E}{C}}{2B/C}}dy
= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}A}e^{-\frac{\left(y + \frac{D}{2C}\right)^2 - \frac{D^2}{4C^2} + \frac{E}{C}}{2B/C}}dy
= e^{-\frac{-\frac{D^2}{4C^2} + \frac{E}{C}}{2B/C}}\int_{-\infty}^{\infty}\frac{\sqrt{B/C}}{A}\cdot\frac{1}{\sqrt{2\pi}\sqrt{B/C}}e^{-\frac{\left(y+\frac{D}{2C}\right)^2}{2B/C}}dy
= \sqrt{\frac{B}{CA^2}}\,e^{\frac{D^2 - 4EC}{8BC}}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 299
f_Z(z) = \frac{1}{\sqrt{2\pi}}\sqrt{\frac{B}{CA^2}}\,e^{\frac{D^2-4EC}{8BC}}
= \frac{1}{\sqrt{2\pi}\sqrt{\sigma_x^2+\sigma_y^2}}\exp\left\{\frac{\left(\mu_y\sigma_x^2 - (\mu_x - z)\sigma_y^2\right)^2 - \left((\mu_x-z)^2\sigma_y^2 + \mu_y^2\sigma_x^2\right)(\sigma_x^2+\sigma_y^2)}{2\sigma_x^2\sigma_y^2(\sigma_x^2+\sigma_y^2)}\right\}
= \frac{1}{\sqrt{2\pi}\sqrt{\sigma_x^2+\sigma_y^2}}e^{-\frac{(z-\mu_x-\mu_y)^2}{2(\sigma_x^2+\sigma_y^2)}}

Therefore, Z = X + Y \sim N\left(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2\right).
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 300
Sum of Independent Poisson Random Variables
If X_1, X_2, ..., X_n are independent Poisson random variables with respective parameters \lambda_i, i = 1 to n, then

X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i \sim Poisson\left(\sum_{i=1}^{n}\lambda_i\right)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 301
Sum of Independent Binomial Random Variables
If X_1, X_2, ..., X_n are independent binomial random variables with respective parameters (m_i, p), i = 1 to n, then

X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i \sim Binomial\left(\sum_{i=1}^{n} m_i, p\right)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 302
Sections 6.4, 6.5: Conditional Distributions
Recall P(E|F) = \frac{P(EF)}{P(F)}.

If X and Y are discrete random variables, then

p_{X|Y}(x|y) = \frac{p(x,y)}{p_Y(y)}

If X and Y are continuous random variables, then

f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 303
Example 4b: Suppose X and Y are independent Poisson random variables
with respective parameters λx and λy .
Q: Calculate the conditional distribution of X , given that X + Y = n.
—————–
What is the distribution of X + Y ??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 304
Example 4b: Suppose X and Y are independent Poisson random variables
with respective parameters λx and λy .
Q: Calculate the conditional distribution of X , given that X + Y = n.
Solution:

P\{X = k \mid X+Y = n\} = \frac{P\{X = k, X+Y = n\}}{P\{X+Y = n\}} = \frac{P\{X = k, Y = n-k\}}{P\{X+Y = n\}} = \frac{P\{X = k\}P\{Y = n-k\}}{P\{X+Y = n\}}
= \left[\frac{e^{-\lambda_x}\lambda_x^k}{k!}\right]\left[\frac{e^{-\lambda_y}\lambda_y^{n-k}}{(n-k)!}\right]\left[\frac{n!}{e^{-\lambda_x-\lambda_y}(\lambda_x+\lambda_y)^n}\right]
= \binom{n}{k}\left(\frac{\lambda_x}{\lambda_x+\lambda_y}\right)^k\left(1 - \frac{\lambda_x}{\lambda_x+\lambda_y}\right)^{n-k}

Therefore, X \mid X+Y = n \sim Binomial\left(n, p = \frac{\lambda_x}{\lambda_x+\lambda_y}\right).
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 305
Example 5b: The joint density of X and Y is given by
f(x,y) = \begin{cases} \frac{e^{-x/y}e^{-y}}{y} & 0 < x, y < \infty \\ 0 & \text{otherwise} \end{cases}

Q: Find P\{X > 1 \mid Y = y\}.

————————–

P\{X > 1 \mid Y = y\} = \int_1^{\infty} f_{X|Y}(x|y)\,dx
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 306
Solution:

f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)} = \frac{e^{-x/y}e^{-y}/y}{\int_0^{\infty} e^{-x/y}e^{-y}/y\,dx} = \frac{e^{-x/y}}{\int_0^{\infty} e^{-x/y}dx} = \frac{1}{y}e^{-x/y}

P\{X > 1 \mid Y = y\} = \int_1^{\infty} f_{X|Y}(x|y)\,dx = \int_1^{\infty}\frac{1}{y}e^{-x/y}dx = e^{-1/y}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 307
The Bivariate Normal Distribution
The random variables X and Y have a bivariate normal distribution if, for constants \mu_x, \mu_y, \sigma_x, \sigma_y, -1 < \rho < 1, their joint density function is given, for all -\infty < x, y < \infty, by

f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}

If X and Y are independent, then \rho = 0, and

f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y}e^{-\frac{1}{2}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right]} = \frac{1}{\sqrt{2\pi}\sigma_x}e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}} \times \frac{1}{\sqrt{2\pi}\sigma_y}e^{-\frac{(y-\mu_y)^2}{2\sigma_y^2}}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 308
f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy
= \int_{-\infty}^{\infty}\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}dy
= \int_{-\infty}^{\infty}\frac{1}{2\pi\sigma_x\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + t^2 - \frac{2\rho(x-\mu_x)t}{\sigma_x}\right]\right\}dt \quad \left(t = \frac{y-\mu_y}{\sigma_y}\right)
= \int_{-\infty}^{\infty}\frac{1}{2\pi\sigma_x\sqrt{1-\rho^2}}\exp\left\{-\frac{(x-\mu_x)^2 + t^2\sigma_x^2 - 2\rho(x-\mu_x)t\sigma_x}{2(1-\rho^2)\sigma_x^2}\right\}dt
= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}A}e^{-\frac{Ct^2+Dt+E}{2B}}dt = \sqrt{\frac{B}{CA^2}}\,e^{\frac{D^2-4EC}{8BC}} = \frac{1}{\sqrt{2\pi}\sigma_x}e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}}

A = \sqrt{2\pi}\sqrt{1-\rho^2}\,\sigma_x, \; B = (1-\rho^2)\sigma_x^2, \; C = \sigma_x^2, \; D = -2\rho(x-\mu_x)\sigma_x, \; E = (x-\mu_x)^2
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 309
Therefore,

X \sim N(\mu_x, \sigma_x^2), \quad Y \sim N(\mu_y, \sigma_y^2)

The conditional density:

f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)}

A useful trick: Only worry about the terms that are functions of x; the remaining terms are just normalizing constants.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 310
Any (reasonable) non-negative function g(x) can essentially be viewed as a probability density function, because one can always normalize it:

f(x) = \frac{g(x)}{\int_{-\infty}^{\infty} g(x)dx} = Cg(x) \propto g(x), \quad \text{where } C = \frac{1}{\int_{-\infty}^{\infty} g(x)dx},

is a probability density function. (One can verify \int_{-\infty}^{\infty} f(x)dx = 1.)

C is just a (normalizing) constant.

When y is treated as a constant, C(y) is still a constant.

X \mid Y = y means y is a constant.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 311
For example, g(x) = e^{-\frac{(x-1)^2}{2}} is essentially the density function of N(1, 1).

Similarly, g(x) = e^{-\frac{x^2-2x}{2}} and g(x) = e^{-\frac{x^2-2xy}{2}} are both (essentially) the density functions of normals.

————–

\frac{1}{C} = \int_{-\infty}^{\infty} e^{-\frac{x^2-2xy}{2}}dx = \sqrt{2\pi}\,e^{\frac{y^2}{2}}

Therefore,

f(x) = Cg(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{y^2}{2}} \times g(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{(x-y)^2}{2}}

is the density function of N(y, 1), where y is merely a constant.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 312
(X, Y) is a bivariate normal.

f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)} \propto f(x,y)
\propto \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}
\propto \exp\left\{-\frac{(x-\mu_x)^2 - 2\rho(x-\mu_x)(y-\mu_y)\frac{\sigma_x}{\sigma_y}}{2(1-\rho^2)\sigma_x^2}\right\}
\propto \exp\left\{-\frac{\left(x - \mu_x - \rho(y-\mu_y)\frac{\sigma_x}{\sigma_y}\right)^2}{2(1-\rho^2)\sigma_x^2}\right\}

Therefore,

X \mid Y \sim N\left(\mu_x + \rho(y-\mu_y)\frac{\sigma_x}{\sigma_y}, \; (1-\rho^2)\sigma_x^2\right)

Y \mid X \sim N\left(\mu_y + \rho(x-\mu_x)\frac{\sigma_y}{\sigma_x}, \; (1-\rho^2)\sigma_y^2\right)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 313
Example 5d: Consider n + m trials having a common probability of
success. Suppose, however, that this success probability is not fixed in advance
but is chosen from U(0, 1).
Q: What is the conditional distribution of this success probability given that the
n + m trials result in n successes?
Solution:
Let X = trial success probability. X ∼ U(0, 1).
Let N = total number of successes. N |X = x ∼ Binomial(n + m, x).
The desired probability: X |N = n.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 314
Solution:
Let X = trial success probability. X ∼ U(0, 1).
Let N = total number of successes. N |X = x ∼ Binomial(n + m, x).
f_{X|N}(x|n) = \frac{P\{N = n \mid X = x\}f_X(x)}{P\{N = n\}} = \frac{\binom{n+m}{n}x^n(1-x)^m}{P\{N = n\}} \propto x^n(1-x)^m

Therefore, X \mid N = n \sim Beta(n+1, m+1).

Here X \sim U(0, 1) is the prior distribution.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 315
If X \mid N \sim Beta(n+1, m+1), then

E(X \mid N) = \frac{n+1}{(n+1) + (m+1)}

Suppose we do not have prior knowledge of the success probability X. We observe n successes out of n + m trials. The most intuitive guess (estimate) of X would be

\hat{X} = \frac{n}{n+m}

Assuming a uniform prior on X leads to the add-one smoothing.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 316
Section 6.6: Order Statistics
Let X1, X2, ..., Xn be n independent and identically distributed, continuous
random variables having a common density f and distribution function F .
Define
X(1) = smallest of X1, X2, ..., Xn
X(2) = second smallest of X1, X2, ..., Xn
...
X(j) = jth smallest of X1, X2, ..., Xn
...
X(n) = largest of X1, X2, ..., Xn
The ordered values X(1) ≤ X(2) ≤ ... ≤ X(n) are known as the order
statistics corresponding to the random variables X1, X2, ..., Xn.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 317
For example, the CDF of the largest (i.e., X_{(n)}) should be

F_{X_{(n)}}(x) = P(X_{(n)} \le x) = P(X_1 \le x, X_2 \le x, ..., X_n \le x)

because, if the largest is smaller than x, then everyone is smaller than x. Then, by independence,

F_{X_{(n)}}(x) = P(X_1 \le x)P(X_2 \le x)\cdots P(X_n \le x) = F(x)^n,

and the density function is simply

f_{X_{(n)}}(x) = \frac{\partial F^n(x)}{\partial x} = nF^{n-1}(x)f(x)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 318
The CDF of the smallest (i.e., X_{(1)}) should be

F_{X_{(1)}}(x) = P(X_{(1)} \le x) = 1 - P(X_{(1)} > x) = 1 - P(X_1 > x, X_2 > x, ..., X_n > x)

because, if the smallest is larger than x, then everyone is larger than x. Then, by independence,

F_{X_{(1)}}(x) = 1 - P(X_1 > x)P(X_2 > x)\cdots P(X_n > x) = 1 - [1 - F(x)]^n

and the density function is simply

f_{X_{(1)}}(x) = \frac{\partial\left(1 - [1-F(x)]^n\right)}{\partial x} = n[1 - F(x)]^{n-1}f(x)
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 319
Joint density function of order statistics: for x_1 \le x_2 \le ... \le x_n,

f_{X_{(1)},X_{(2)},...,X_{(n)}}(x_1, x_2, ..., x_n) = n!\,f(x_1)f(x_2)\cdots f(x_n).

Intuitively (and heuristically), for n = 2:

P\{X_{(1)} = x_1, X_{(2)} = x_2\} = P\{X_1 = x_1, X_2 = x_2 \text{ or } X_2 = x_1, X_1 = x_2\}
= P\{X_1 = x_1, X_2 = x_2\} + P\{X_2 = x_1, X_1 = x_2\}
= f(x_1)f(x_2) + f(x_1)f(x_2) = 2!\,f(x_1)f(x_2)

Warning: this is a heuristic argument, not a rigorous proof. Please read the book.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 320
Density function of the kth order statistic, X_{(k)}: f_{X_{(k)}}(x).
• Select 1 from the n values X1, X2, ..., Xn. Let it be equal to x.
• Select k − 1 from the remaining n − 1 values. Let them be smaller than x.
• Let the rest be larger than x.
• Total number of choices is ??.
• For a given choice, the probability should be ??
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 321
• Select 1 from the n values X1, X2, ..., Xn. Let it be equal to x.
• Select k − 1 from the remaining n − 1 values. Let them be smaller than x.
• Let the rest be larger than x.
• Total number of choices is \binom{n}{1,\,k-1,\,n-k}.

• For a given choice, the probability should be f(x)[F(x)]^{k-1}[1-F(x)]^{n-k}.

Therefore,

f_{X_{(k)}}(x) = \binom{n}{1,\,k-1,\,n-k}f(x)[F(x)]^{k-1}[1-F(x)]^{n-k}
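A Matlab sketch (our own check, not from the slides; n, k, and the trial count are arbitrary): for X_i ~ U(0,1), f = 1 and F(x) = x, so the formula above reduces to the Beta(k, n-k+1) density, which we compare against simulated order statistics:

n = 10; k = 3; trials = 100000;
U = sort(rand(n,trials), 1);             % order statistics down each column
[counts, centers] = hist(U(k,:), 50);    % histogram of the kth order statistic
width = centers(2) - centers(1);
plot(centers, counts/(trials*width), 'g.'); hold on; grid on;
x = 0:0.01:1;
plot(x, betapdf(x, k, n-k+1), 'r-');     % matches the formula above for U(0,1)
legend('simulated', 'theory'); xlabel('x'); ylabel('density');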
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 322
Cumulative function of the kth order statistic

• Via integrating the density function:

F_{X_{(k)}}(y) = \int_{-\infty}^{y} f_{X_{(k)}}(x)\,dx
= \int_{-\infty}^{y}\binom{n}{1,\,k-1,\,n-k}f(x)[F(x)]^{k-1}[1-F(x)]^{n-k}dx
= \binom{n}{1,\,k-1,\,n-k}\int_{-\infty}^{y} f(x)[F(x)]^{k-1}[1-F(x)]^{n-k}dx
= \binom{n}{1,\,k-1,\,n-k}\int_0^{F(y)} z^{k-1}(1-z)^{n-k}dz
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 323
• Via a direct argument:

F_{X_{(k)}}(y) = P\{X_{(k)} \le y\} = P\{\ge k \text{ of the } X_i\text{'s are} \le y\}
= \sum_{j=k}^{n} P\{j \text{ of the } X_i\text{'s are} \le y\}
= \sum_{j=k}^{n}\binom{n}{j}[F(y)]^j[1-F(y)]^{n-j}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 324
• A more intuitive solution, by dividing into non-overlapping events according to where y falls:

\{X_{(k)} \le y\} = \{y \in [X_{(n)}, \infty)\} \cup \{y \in [X_{(n-1)}, X_{(n)})\} \cup \cdots \cup \{y \in [X_{(k)}, X_{(k+1)})\}

so that

F_{X_{(k)}}(y) = \sum_{j=k}^{n}\binom{n}{j}[F(y)]^j[1-F(y)]^{n-j}

P\{y \in [X_{(n)}, \infty)\} = \binom{n}{n}[F(y)]^n

P\{y \in [X_{(n-1)}, X_{(n)})\} = \binom{n}{n-1}[F(y)]^{n-1}[1-F(y)]

P\{y \in [X_{(k)}, X_{(k+1)})\} = \binom{n}{k}[F(y)]^k[1-F(y)]^{n-k}
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 325
An interesting identity, by letting X \sim U(0, 1):

\sum_{j=k}^{n}\binom{n}{j}y^j[1-y]^{n-j} = \binom{n}{1,\,k-1,\,n-k}\int_0^y x^{k-1}[1-x]^{n-k}dx, \quad 0 < y < 1.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 326
Joint Probability Distribution of Functions of Random Variables

X_1 and X_2 are jointly continuous random variables with probability density function f_{X_1,X_2}.

Y_1 = g_1(X_1, X_2), \quad Y_2 = g_2(X_1, X_2).

The goal: Determine the density function f_{Y_1,Y_2}.
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 327
The formula to be remembered:

f_{Y_1,Y_2}(y_1,y_2) = f_{X_1,X_2}(x_1,x_2)\,|J(x_1,x_2)|^{-1}

where the inverse functions are

x_1 = h_1(y_1,y_2), \quad x_2 = h_2(y_1,y_2)

and the Jacobian is

J(x_1,x_2) = \begin{vmatrix}\frac{\partial g_1}{\partial x_1} & \frac{\partial g_1}{\partial x_2} \\ \frac{\partial g_2}{\partial x_1} & \frac{\partial g_2}{\partial x_2}\end{vmatrix} = \frac{\partial g_1}{\partial x_1}\frac{\partial g_2}{\partial x_2} - \frac{\partial g_2}{\partial x_1}\frac{\partial g_1}{\partial x_2}

Two conditions:

• y_1 = g_1(x_1,x_2) and y_2 = g_2(x_1,x_2) can be uniquely solved:
x_1 = h_1(y_1,y_2), and x_2 = h_2(y_1,y_2)

• The Jacobian J(x_1,x_2) \ne 0 at all points (x_1,x_2).
Example 7a: Let X1 and X2 be jointly continuous random variables with
probability density fX1,X2 . Let Y1 = X1 + X2 and Y2 = X1 − X2.
Q: Find fY1,Y2 in terms of fX1,X2 .
Solution:
$$y_1 = g_1(x_1, x_2) = x_1 + x_2, \qquad y_2 = g_2(x_1, x_2) = x_1 - x_2,$$
$$\Longrightarrow \quad x_1 = h_1(y_1, y_2) = \frac{y_1 + y_2}{2}, \qquad x_2 = h_2(y_1, y_2) = \frac{y_1 - y_2}{2},$$
$$J(x_1, x_2) = \begin{vmatrix} 1 & 1\\ 1 & -1 \end{vmatrix} = -2.$$
Therefore,
$$f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(x_1, x_2)\,|J(x_1, x_2)|^{-1} = \frac{1}{2}\, f_{X_1,X_2}\!\left(\frac{y_1+y_2}{2}, \frac{y_1-y_2}{2}\right).$$
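A Monte Carlo sketch of Example 7a (illustrative; assumes numpy, and takes X1, X2 to be independent standard normals purely for concreteness): the probability of a small box around (a, b) should be about f_{Y1,Y2}(a, b) times the box area.

import numpy as np

rng = np.random.default_rng(2)
x1, x2 = rng.standard_normal((2, 2_000_000))
y1, y2 = x1 + x2, x1 - x2

a, b, h = 0.5, -0.3, 0.1
inside = ((np.abs(y1 - a) < h / 2) & (np.abs(y2 - b) < h / 2)).mean()

def phi(t):                        # standard normal density
    return np.exp(-t * t / 2) / np.sqrt(2 * np.pi)

fY = 0.5 * phi((a + b) / 2) * phi((a - b) / 2)
print(inside / h**2, fY)           # the two numbers should be close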
Example 7b: The joint density in polar coordinates.
$$x = h_1(r, \theta) = r\cos\theta, \qquad y = h_2(r, \theta) = r\sin\theta, \quad \text{i.e.,}$$
$$r = g_1(x, y) = \sqrt{x^2 + y^2}, \qquad \theta = g_2(x, y) = \tan^{-1}(y/x).$$
Q (a): Express fR,Θ(r, θ) in terms of fX,Y (x, y).
Q (b): What if X and Y are independent standard normal random variables?
Solution: First, compute the Jacobian $J(x, y)$:
$$\frac{\partial g_1}{\partial x} = \frac{x}{\sqrt{x^2+y^2}} = \cos\theta, \qquad \frac{\partial g_1}{\partial y} = \frac{y}{\sqrt{x^2+y^2}} = \sin\theta,$$
$$\frac{\partial g_2}{\partial x} = \frac{1}{1+(y/x)^2}\left(\frac{-y}{x^2}\right) = \frac{-y}{x^2+y^2} = \frac{-\sin\theta}{r}, \qquad \frac{\partial g_2}{\partial y} = \frac{1}{1+(y/x)^2}\left(\frac{1}{x}\right) = \frac{x}{x^2+y^2} = \frac{\cos\theta}{r},$$
$$J = \frac{\partial g_1}{\partial x}\frac{\partial g_2}{\partial y} - \frac{\partial g_1}{\partial y}\frac{\partial g_2}{\partial x} = \frac{\cos^2\theta}{r} + \frac{\sin^2\theta}{r} = \frac{1}{r}.$$
$$f_{R,\Theta}(r, \theta) = f_{X,Y}(x, y)\,|J|^{-1} = r\,f_{X,Y}(r\cos\theta, r\sin\theta),$$
where $0 < r < \infty$, $0 < \theta < 2\pi$.

If $X$ and $Y$ are independent standard normals, then
$$f_{R,\Theta}(r, \theta) = r\,f_{X,Y}(r\cos\theta, r\sin\theta) = \frac{r}{2\pi}\,e^{-\frac{x^2+y^2}{2}} = \frac{r\,e^{-r^2/2}}{2\pi}.$$
The joint density factors into a function of $r$ times a constant in $\theta$, so $R$ and $\Theta$ are independent, with $\Theta \sim U(0, 2\pi)$.
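An illustrative check (a sketch, assuming numpy): for independent standard normals, Theta should look U(0, 2*pi), and the density r e^{-r^2/2} implies R^2 is chi-squared with 2 degrees of freedom, so E[R^2] = 2.

import numpy as np

rng = np.random.default_rng(3)
x, y = rng.standard_normal((2, 1_000_000))
r = np.hypot(x, y)
theta = np.arctan2(y, x) % (2 * np.pi)

print(theta.mean(), np.pi)             # ~ pi, as for U(0, 2*pi)
print((r**2).mean(), 2.0)              # E[R^2] = 2
print(np.corrcoef(r, theta)[0, 1])     # ~ 0, consistent with independence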
Example 7d: $X_1$, $X_2$, and $X_3$ are independent standard normal random variables. Let
Y1 = X1 + X2 + X3
Y2 = X1 − X2
Y3 = X1 − X3
Q: Compute the joint density function fY1,Y2,Y3(y1, y2, y3).
Solution: Compute the Jacobian
$$J = \begin{vmatrix} 1 & 1 & 1\\ 1 & -1 & 0\\ 1 & 0 & -1 \end{vmatrix} = 3.$$
Solve for $X_1$, $X_2$, and $X_3$:
$$X_1 = \frac{Y_1 + Y_2 + Y_3}{3}, \qquad X_2 = \frac{Y_1 - 2Y_2 + Y_3}{3}, \qquad X_3 = \frac{Y_1 + Y_2 - 2Y_3}{3}.$$
Therefore,
$$f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = f_{X_1,X_2,X_3}(x_1, x_2, x_3)\,|J|^{-1} = \frac{1}{3(2\pi)^{3/2}}\, e^{-\frac{x_1^2+x_2^2+x_3^2}{2}} = \frac{1}{3(2\pi)^{3/2}}\, e^{-\frac{1}{2}\left[\frac{y_1^2}{3} + \frac{2y_2^2}{3} + \frac{2y_3^2}{3} - \frac{2y_2y_3}{3}\right]}.$$
More about Determinants

$$\begin{vmatrix} a & b & c\\ d & e & f\\ g & h & i \end{vmatrix} = aei + bfg + cdh - ceg - bdi - afh$$

The determinant of a triangular matrix is the product of its diagonal entries:
$$\begin{vmatrix}
x_1 & 0 & 0 & \cdots & 0\\
\times & x_2 & 0 & \cdots & 0\\
\times & \times & x_3 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\times & \times & \times & \cdots & x_n
\end{vmatrix} = x_1 x_2 x_3 \cdots x_n$$
Chapter 7: Properties of Expectation
If X is a discrete random variable with mass function p(x), then
$$E(X) = \sum_x x\,p(x)$$
If $X$ is a continuous random variable with density $f(x)$, then
$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$
If a ≤ X ≤ b, then a ≤ E(X) ≤ b.
Proposition 2.1
If X and Y have a joint probability mass function p(x, y), then
$$E[g(X, Y)] = \sum_y \sum_x g(x, y)\,p(x, y)$$
If $X$ and $Y$ have a joint probability density function $f(x, y)$, then
$$E[g(X, Y)] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$$
What if $g(X, Y) = X + Y$?
Example 2a: An accident occurs at a point X that is uniformly distributed on
a road of length L. At the time of the accident an ambulance is at a location Y
that is also uniformly distributed on the road. Assuming that X and Y are
independent, find the expected distance between the ambulance and the point of
the accident.
Solution: $X \sim U(0, L)$, $Y \sim U(0, L)$, independent; we want $E(|X - Y|)$.
$$E(|X - Y|) = \int\!\!\int |x - y|\,f(x, y)\,dx\,dy$$
$$= \frac{1}{L^2}\int_0^L\!\!\int_0^L |x - y|\,dx\,dy$$
$$= \frac{1}{L^2}\int_0^L\!\!\int_0^x (x - y)\,dy\,dx + \frac{1}{L^2}\int_0^L\!\!\int_x^L (y - x)\,dy\,dx$$
$$= \frac{1}{L^2}\int_0^L \left[xy - \frac{y^2}{2}\right]_0^x dx + \frac{1}{L^2}\int_0^L \left[\frac{y^2}{2} - xy\right]_x^L dx$$
$$= \frac{1}{L^2}\int_0^L \frac{x^2}{2}\,dx + \frac{1}{L^2}\int_0^L \left(\frac{L^2}{2} - xL + \frac{x^2}{2}\right) dx$$
$$= \frac{L}{3}$$
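A quick Monte Carlo confirmation (illustrative; assumes numpy), taking L = 1:

import numpy as np

rng = np.random.default_rng(4)
x, y = rng.uniform(size=(2, 1_000_000))
print(np.abs(x - y).mean(), 1 / 3)   # both should be about 0.3333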
Let $X$ and $Y$ be continuous random variables with joint density $f(x, y)$, and let $g(X, Y) = X + Y$. Then
$$E[g(X, Y)] = E(X + Y) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x + y)\,f(x, y)\,dx\,dy$$
$$= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\,f(x, y)\,dx\,dy + \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y\,f(x, y)\,dx\,dy$$
$$= \int_{-\infty}^{\infty} x \left[\int_{-\infty}^{\infty} f(x, y)\,dy\right] dx + \int_{-\infty}^{\infty} y \left[\int_{-\infty}^{\infty} f(x, y)\,dx\right] dy$$
$$= \int_{-\infty}^{\infty} x\,f_X(x)\,dx + \int_{-\infty}^{\infty} y\,f_Y(y)\,dy = E(X) + E(Y)$$
Linearity of Expectations
$$E(X + Y) = E(X) + E(Y)$$
$$E(X + Y + Z) = E((X + Y) + Z) = E(X + Y) + E(Z) = E(X) + E(Y) + E(Z)$$
$$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$$
Linearity is not affected by correlations among Xi’s.
No assumption of independence is needed.
The Sample Mean
Let X1, X2, ..., Xn be identically distributed random variables having expected
value $\mu$. The quantity $\bar{X}$, defined by
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
is called the sample mean.
$$E(\bar{X}) = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\,E\left[\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{1}{n}\,n\mu = \mu$$
When $\mu$ is unknown, $\bar{X}$ is an unbiased estimator of $\mu$.
Expectation of a Binomial Random Variable
$X \sim$ Binomial$(n, p)$ can be written as
$$X = Z_1 + Z_2 + \cdots + Z_n,$$
where the $Z_i$, $i = 1$ to $n$, are independent Bernoulli with probability of success $p$. Because $E(Z_i) = p$,
$$E(X) = E\left(\sum_{i=1}^{n} Z_i\right) = \sum_{i=1}^{n} E(Z_i) = np.$$
Expectation of a Hypergeometric Random Variable
Suppose n balls are randomly selected, without replacement, from an urn
containing N balls of which m are white. Denote X ∼ HG(N, m, n).
Q: Find E(X).
Solution 1: Let X = X1 + X2 + ... + Xm, where
$$X_i = \begin{cases} 1 & \text{if the } i\text{th white ball is selected}\\ 0 & \text{otherwise} \end{cases}$$
Note that the $X_i$'s are not independent, but we only need the expectation, and linearity does not require independence.
$$E(X_i) = P\{X_i = 1\} = P\{i\text{th white ball is selected}\} = \frac{\binom{1}{1}\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N}$$
Therefore,
$$E(X) = \frac{mn}{N}.$$
Solution 2: Let $X = Y_1 + Y_2 + \cdots + Y_n$, where
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th ball selected is white}\\ 0 & \text{otherwise} \end{cases}$$
$$E(Y_i) = P\{Y_i = 1\} = P\{i\text{th ball selected is white}\} = \frac{m}{N}$$
Therefore,
$$E(X) = \frac{mn}{N}.$$
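An illustrative simulation (a sketch, assuming numpy): draw hypergeometric counts directly and compare the average with mn/N.

import numpy as np

rng = np.random.default_rng(5)
N, m, n = 20, 7, 5
draws = rng.hypergeometric(m, N - m, n, size=100_000)   # white balls per draw
print(draws.mean(), m * n / N)                          # both ~ 1.75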
Expected Number of Matches
A group of N people throw their hats into the center of a room. The hats are
mixed up and each person randomly selects one.
Q: Find the expected number of people who select their own hat.
Solution: Let $X$ denote the number of matches and let
$$X = X_1 + X_2 + \cdots + X_N,$$
where
$$X_i = \begin{cases} 1 & \text{if the } i\text{th person selects his own hat}\\ 0 & \text{otherwise} \end{cases}$$
$$E(X_i) = P\{X_i = 1\} = \frac{1}{N}$$
Thus
$$E(X) = \sum_{i=1}^{N} E(X_i) = \frac{1}{N}\,N = 1.$$
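An illustrative simulation (a sketch, assuming numpy): shuffle the hats and count matches; the mean is 1 for any N (and the variance is also 1, as computed later in these notes).

import numpy as np

rng = np.random.default_rng(6)
N, trials = 10, 100_000
matches = np.array([(rng.permutation(N) == np.arange(N)).sum()
                    for _ in range(trials)])
print(matches.mean(), matches.var())   # both ~ 1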
Expectations May Be Undefined
A More Rigorous Definition: If X is a discrete random variable with mass
function p(x), then
$$E(X) = \sum_x x\,p(x), \quad \text{provided } \sum_x |x|\,p(x) < \infty$$
If $X$ is a continuous random variable with density $f(x)$, then
$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx, \quad \text{provided } \int_{-\infty}^{\infty} |x|\,f(x)\,dx < \infty$$
Example (Cauchy Distribution):
$$f(x) = \frac{1}{\pi}\,\frac{1}{x^2 + 1}, \qquad -\infty < x < \infty$$
Is
$$E(X) = \int_{-\infty}^{\infty} \frac{x}{\pi}\,\frac{1}{x^2 + 1}\,dx = 0\ ???\quad \text{(The answer is NO)}$$
because
$$\int_{-\infty}^{\infty} \frac{|x|}{\pi}\,\frac{1}{x^2 + 1}\,dx = \int_0^{\infty} \frac{2x}{\pi}\,\frac{1}{x^2 + 1}\,dx = \int_0^{\infty} \frac{1}{\pi}\,\frac{1}{x^2 + 1}\,d[x^2 + 1] = \frac{1}{\pi}\log(x^2 + 1)\Big|_0^{\infty} = \infty.$$
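An illustrative demonstration (a sketch, assuming numpy): running means of Cauchy samples never settle down, unlike samples from a distribution with a finite mean.

import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_cauchy(1_000_000)
running = np.cumsum(x) / np.arange(1, x.size + 1)
print(running[[999, 9_999, 99_999, 999_999]])   # keeps jumping around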
Section 7.3: Moments of the Number of Events

$$I_i = \begin{cases} 1 & \text{if } A_i \text{ occurs}\\ 0 & \text{otherwise} \end{cases}$$
$X = \sum_{i=1}^n I_i$: the number of the events $A_i$ that occur. We know
$$E(X) = \sum_{i=1}^n E(I_i) = \sum_{i=1}^n P(A_i).$$
Suppose we are interested in the number of pairs of events that occur, e.g. $A_iA_j$.
$X = \sum_{i=1}^n I_i$: the number of the events $A_i$ that occur.
$\frac{X(X-1)}{2} = \binom{X}{2}$: the number of pairs of events that occur; equivalently, $\sum_{i<j} I_iI_j$ counts the pairs, because
$$I_iI_j = \begin{cases} 1 & \text{if both } A_i \text{ and } A_j \text{ occur}\\ 0 & \text{otherwise} \end{cases}$$
Therefore,
$$E\left(\frac{X(X-1)}{2}\right) = \sum_{i<j} P(A_iA_j), \qquad \text{i.e.,} \qquad E(X^2) - E(X) = 2\sum_{i<j} P(A_iA_j).$$
Second Moment of a Binomial Random Variable

$n$ independent trials, each a success with probability $p$. Let $A_i$ = the event that trial $i$ is a success. If $i \neq j$, then $P(A_iA_j) = p^2$.
$$E(X^2) - E(X) = 2\sum_{i<j} P(A_iA_j) = 2\binom{n}{2} p^2 = n(n-1)p^2$$
$$E(X^2) = n^2p^2 - np^2 + np$$
$$Var(X) = \left[n^2p^2 - np^2 + np\right] - [np]^2 = np(1 - p)$$
Second Moment of a Hypergeometric Random Variable

$n$ balls are randomly selected without replacement from an urn containing $N$ balls, of which $m$ are white. Let $A_i$ = event that the $i$th ball selected is white, and $X$ = number of white balls selected. We have shown
$$E(X) = \sum_{i=1}^n P(A_i) = \frac{nm}{N}.$$
$$P(A_iA_j) = P(A_i)P(A_j|A_i) = \frac{m}{N}\,\frac{m-1}{N-1}, \qquad i \neq j$$
$$E(X^2) - E(X) = 2\sum_{i<j} P(A_iA_j) = 2\binom{n}{2}\frac{m}{N}\,\frac{m-1}{N-1}$$
$$E(X^2) = n(n-1)\,\frac{m}{N}\,\frac{m-1}{N-1} + \frac{nm}{N}$$
$$Var(X) = \frac{nm(N-n)(N-m)}{N^2(N-1)} = n\left(\frac{m}{N}\right)\left(1 - \frac{m}{N}\right)\frac{N-n}{N-1}$$
One can view $\frac{m}{N} = p$. Note that $\frac{N-n}{N-1} \approx 1$ if $N$ is much larger than $n$.
Second Moment of the Matching Problem

Let $A_i$ = event that person $i$ selects his/her own hat in the matching problem, $i = 1$ to $N$.
$$P(A_iA_j) = P(A_i)P(A_j|A_i) = \frac{1}{N}\,\frac{1}{N-1}, \qquad i \neq j$$
$X$ = number of people who select their own hat. We have shown $E(X) = 1$.
$$E(X^2) - E(X) = 2\sum_{i<j} P(A_iA_j) = N(N-1)\,\frac{1}{N(N-1)} = 1$$
$$Var(X) = E(X^2) - E^2(X) = 1$$
Section 7.4: Covariance and Correlations
Proposition 4.1: If X and Y are independent, then
E(XY ) = E(X)E(Y ).
More generally, for any functions h and g,
E[h(X) g(Y )] = E[h(X)] E[g(Y )].
Proof (using independence, $f(x, y) = f_X(x)f_Y(y)$):
$$E[h(X)\,g(Y)] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} h(x)\,g(y)\,f(x, y)\,dx\,dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} h(x)\,g(y)\,f_X(x)f_Y(y)\,dx\,dy$$
$$= \left[\int_{-\infty}^{\infty} h(x)\,f_X(x)\,dx\right]\left[\int_{-\infty}^{\infty} g(y)\,f_Y(y)\,dy\right] = E[h(X)]\,E[g(Y)]$$
Covariance
Definition: The covariance between $X$ and $Y$ is defined by
$$Cov(X, Y) = E[(X - E[X])(Y - E[Y])]$$
Equivalently,
$$Cov(X, Y) = E(XY) - E(X)E(Y)$$
Proof:
$$Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E\big[XY - E[X]Y - E[Y]X + E[X]E[Y]\big]$$
$$= E[XY] - E[X]E[Y] - E[Y]E[X] + E[X]E[Y] = E[XY] - E[X]E[Y]$$
Note that $E(X)$ is a constant, and so is $E(X)E(Y)$.
$X$ and $Y$ independent $\Longrightarrow$ $Cov(X, Y) = E(XY) - E(X)E(Y) = 0$.
However, $X$ and $Y$ being uncorrelated ($Cov(X, Y) = 0$) does not imply that $X$ and $Y$ are independent. For example, if $X \sim U(-1, 1)$ and $Y = X^2$, then $Cov(X, Y) = E(X^3) - E(X)E(X^2) = 0$, yet $Y$ is a function of $X$.
Properties of Covariance: Proposition 4.2
• Cov(X, Y ) = Cov(Y, X)
• Cov(X, X) = V ar(X)
• Cov(aX, Y ) = aCov(X, Y )
These follow directly from the definition $Cov(X, Y) = E(XY) - E(X)E(Y)$.
•
$$Cov\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{m} Cov(X_i, Y_j)$$
Proof:
$$Cov\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{m} Y_j\right) = E\left[\sum_{i=1}^{n} X_i \sum_{j=1}^{m} Y_j\right] - E\left(\sum_{i=1}^{n} X_i\right) E\left(\sum_{j=1}^{m} Y_j\right)$$
$$= E\left[\sum_{i=1}^{n}\sum_{j=1}^{m} X_iY_j\right] - \sum_{i=1}^{n} E(X_i) \sum_{j=1}^{m} E(Y_j)$$
$$= \sum_{i=1}^{n}\sum_{j=1}^{m} E(X_iY_j) - \sum_{i=1}^{n}\sum_{j=1}^{m} E(X_i)E(Y_j)$$
$$= \sum_{i=1}^{n}\sum_{j=1}^{m} \big[E(X_iY_j) - E(X_i)E(Y_j)\big] = \sum_{i=1}^{n}\sum_{j=1}^{m} Cov(X_i, Y_j)$$
$$Cov\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{m} Cov(X_i, Y_j)$$
If $n = m$ and $X_i = Y_i$, then
$$Var\left(\sum_{i=1}^{n} X_i\right) = Cov\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{n} X_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} Cov(X_i, X_j)$$
$$= \sum_{i=1}^{n} Cov(X_i, X_i) + \sum_{i \neq j} Cov(X_i, X_j) = \sum_{i=1}^{n} Var(X_i) + 2\sum_{i<j} Cov(X_i, X_j)$$
If the $X_i$'s are pairwise independent, then
$$Var\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} Var(X_i)$$
Example 4b: $X \sim$ Binomial$(n, p)$ can be written as
$$X = X_1 + X_2 + \cdots + X_n,$$
where the $X_i \sim$ Bernoulli$(p)$ are independent. Therefore
$$Var(X) = Var\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} Var(X_i) = \sum_{i=1}^{n} p(1-p) = np(1-p).$$
Example 4a: Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables having expected value $\mu$ and variance $\sigma^2$. Let $\bar{X}$ be the sample mean and $S^2$ be the sample variance, where
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
Q: Find $Var(\bar{X})$ and $E(S^2)$.
Solution:
$$Var(\bar{X}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\,Var\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{\sigma^2}{n}$$
We have already shown $E(\bar{X}) = \mu$. As the sample size $n$ increases, the variance of $\bar{X}$, the unbiased estimator of $\mu$, decreases.
The Less Clever Approach (different from the book)
$$E(S^2) = E\left(\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2\right) = \frac{1}{n-1}\,E\left(\sum_{i=1}^{n} \left(X_i^2 + \bar{X}^2 - 2\bar{X}X_i\right)\right)$$
$$= \frac{1}{n-1}\,E\left(\sum_{i=1}^{n} X_i^2 + n\bar{X}^2 - 2\bar{X}\sum_{i=1}^{n} X_i\right) = \frac{1}{n-1}\,E\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
$$= \frac{1}{n-1}\left(\sum_{i=1}^{n} E(X_i^2) - n\,E(\bar{X}^2)\right) = \frac{n}{n-1}\left(\mu^2 + \sigma^2 - E(\bar{X}^2)\right)$$
$$E(\bar{X}^2) = \frac{1}{n^2}\,E\left(\sum_{i=1}^{n} X_i\right)^2 = \frac{1}{n^2}\,E\left(\sum_i X_i^2 + 2\sum_{i<j} X_iX_j\right)$$
$$= \frac{1}{n^2}\left(\sum_i E(X_i^2) + 2\sum_{i<j} E(X_iX_j)\right) = \frac{1}{n^2}\left(\sum_i (\mu^2 + \sigma^2) + 2\sum_{i<j} \mu^2\right)$$
$$= \frac{1}{n^2}\left[n\left(\mu^2 + \sigma^2\right) + n(n-1)\mu^2\right] = \frac{\sigma^2}{n} + \mu^2$$
Or, directly,
$$E(\bar{X}^2) = Var(\bar{X}) + E^2(\bar{X}) = \frac{\sigma^2}{n} + \mu^2$$
Therefore,
$$E(S^2) = \frac{n}{n-1}\left(\mu^2 + \sigma^2 - E(\bar{X}^2)\right) = \frac{n}{n-1}\left(\mu^2 + \sigma^2 - \frac{\sigma^2}{n} - \mu^2\right) = \sigma^2$$
The sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is an unbiased estimator of $\sigma^2$.

Why $\frac{1}{n-1}$ instead of $\frac{1}{n}$? $\sum_{i=1}^{n}(X_i - \bar{X})^2$ is a sum of $n$ terms with only $n-1$ degrees of freedom, because $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
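An illustrative simulation (a sketch, assuming numpy; normals with sigma^2 = 4 chosen for concreteness): averaging the sample variance over many samples recovers sigma^2 with the 1/(n-1) divisor, while 1/n systematically underestimates it.

import numpy as np

rng = np.random.default_rng(8)
n, trials, sigma2 = 5, 200_000, 4.0
X = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
print(X.var(axis=1, ddof=1).mean())   # ~ 4.0  (divisor n-1, unbiased)
print(X.var(axis=1, ddof=0).mean())   # ~ 3.2  (divisor n, biased: (n-1)/n * 4)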
The Standard Clever Technique for evaluating $E(S^2)$ (as in the textbook):
$$(n-1)S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2$$
$$= \sum_{i=1}^{n} (X_i - \mu)^2 + \sum_{i=1}^{n} (\bar{X} - \mu)^2 - 2\sum_{i=1}^{n} (X_i - \mu)(\bar{X} - \mu)$$
$$= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2$$
$$E[(n-1)S^2] = (n-1)E(S^2) = E\left[\sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]$$
$$= n\,Var(X_i) - n\,Var(\bar{X}) = n\sigma^2 - n\,\frac{\sigma^2}{n} = (n-1)\sigma^2$$
The Correlation of Two Random Variables
Definition: The correlation of two random variables $X$ and $Y$, denoted by $\rho(X, Y)$, is defined, as long as $Var(X) > 0$ and $Var(Y) > 0$, by
$$\rho(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}}$$
It can be shown that
$$-1 \le \rho(X, Y) \le 1$$
$$-1 \le \rho(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}} \le 1$$
Proof: The basic trick: a variance is always nonnegative, $Var(\cdot) \ge 0$.
$$Var\left(\frac{X}{\sqrt{Var(X)}} + \frac{Y}{\sqrt{Var(Y)}}\right) = \frac{Var(X)}{Var(X)} + \frac{Var(Y)}{Var(Y)} + \frac{2\,Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}} = 2\left(1 + \rho(X, Y)\right) \ge 0$$
Therefore, $\rho(X, Y) \ge -1$.
$$Var\left(\frac{X}{\sqrt{Var(X)}} - \frac{Y}{\sqrt{Var(Y)}}\right) = \frac{Var(X)}{Var(X)} + \frac{Var(Y)}{Var(Y)} - \frac{2\,Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}} = 2\left(1 - \rho(X, Y)\right) \ge 0$$
Therefore, $\rho(X, Y) \le 1$.
• ρ(X, Y ) = +1 =⇒ Y = a + bX , and b > 0.
• ρ(X, Y ) = −1 =⇒ Y = a − bX , and b > 0.
• ρ(X, Y ) = 0 =⇒ X and Y are uncorrelated
• ρ(X, Y ) is close to +1 =⇒ X and Y are highly positively correlated.
• ρ(X, Y ) is close to −1 =⇒ X and Y are highly negatively correlated.
Section 7.5: Conditional Expectation
If $X$ and $Y$ are jointly discrete random variables,
$$E[X|Y = y] = \sum_x x\,P\{X = x \mid Y = y\} = \sum_x x\,p_{X|Y}(x|y)$$
If $X$ and $Y$ are jointly continuous random variables,
$$E[X|Y = y] = \int_{-\infty}^{\infty} x\,f_{X|Y}(x|y)\,dx$$
Example 5b: Suppose the joint density of $X$ and $Y$ is given by
$$f(x, y) = \frac{1}{y}\,e^{-x/y}\,e^{-y}, \qquad 0 < x, y < \infty$$
Q: Compute
$$E[X|Y = y] = \int_{-\infty}^{\infty} x\,f_{X|Y}(x|y)\,dx$$
What should the answer look like? (A function of $y$.)
Solution:
$$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)} = \frac{f(x, y)}{\int_{-\infty}^{\infty} f(x, y)\,dx} = \frac{\frac{1}{y}e^{-x/y}e^{-y}}{\int_0^{\infty} \frac{1}{y}e^{-x/y}e^{-y}\,dx} = \frac{\frac{1}{y}e^{-x/y}e^{-y}}{e^{-y}} = \frac{1}{y}\,e^{-x/y}, \qquad y > 0$$
That is, given $Y = y$, $X$ is exponential with mean $y$:
$$E[X|Y = y] = \int_{-\infty}^{\infty} x\,f_{X|Y}(x|y)\,dx = \int_0^{\infty} \frac{x}{y}\,e^{-x/y}\,dx = y$$
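A Monte Carlo sketch of Example 5b (illustrative; assumes numpy): under this joint density the marginal of Y is Exponential(1) and X given Y = y is Exponential with mean y, so the average of X among samples with Y near y should be about y.

import numpy as np

rng = np.random.default_rng(9)
trials = 2_000_000
Y = rng.exponential(1.0, trials)
X = rng.exponential(Y)            # conditional density (1/y) e^{-x/y}
for y0 in [0.5, 1.0, 2.0]:
    sel = np.abs(Y - y0) < 0.05
    print(y0, X[sel].mean())      # ~ y0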
Computing Expectations by Conditioning
Proposition 5.1: The tower property
E(X) = E[E[X |Y ]]
In some cases, computing E(X) is difficult;
but computing E(X |Y ) may be much more convenient.
Proof: Assume $X$ and $Y$ are continuous. (The book assumes discrete variables.)
$$E[E[X|Y]] = \int_{-\infty}^{\infty} E(X|Y = y)\,f_Y(y)\,dy = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} x\,f_{X|Y}(x|y)\,dx\right] f_Y(y)\,dy$$
$$= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\,\big[f_{X|Y}(x|y)\,f_Y(y)\big]\,dx\,dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\,f(x, y)\,dx\,dy$$
$$= \int_{-\infty}^{\infty} x\left[\int_{-\infty}^{\infty} f(x, y)\,dy\right] dx = \int_{-\infty}^{\infty} x\,f_X(x)\,dx = E(X)$$
Example 5d: Suppose $N$, the number of people entering a store on a given day, is a random variable with mean 50. Suppose further that the amounts of money spent by these customers are independent random variables having a common mean \$8. Assume the amount of money spent by a customer is also independent of the total number of customers to enter the store.
Q: What is the expected amount of money spent in the store on a given day?
Solution:
$N$: the number of people entering the store; $E(N) = 50$.
$X_i$: the amount spent by the $i$th customer; $E(X_i) = 8$.
$X$: the total amount of money spent in the store on a given day; $E(X) = ??$
We cannot use linearity directly:
$$E(X) = E\left(\sum_{i=1}^{N} X_i\right) \overset{?}{=} \sum_{i=1}^{N} E(X_i) = \sum_{i=1}^{N} 8,$$
because $N$, the number of terms in the summation, is itself random. Instead, condition on $N$:
$$E(X) = E\left(\sum_{i=1}^{N} X_i\right) = E\left[E\left(\sum_{i=1}^{N} X_i \,\Big|\, N\right)\right] = E\left[\sum_{i=1}^{N} E(X_i)\right] = E(8N) = 8\,E(N) = 8 \times 50 = \$400$$
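An illustrative simulation (a sketch, assuming numpy; for concreteness N is taken Poisson with mean 50 and the amounts Exponential with mean 8 -- the identity E(X) = E(N) E(X_i) only needs the stated means and independence):

import numpy as np

rng = np.random.default_rng(10)
N = rng.poisson(50, 20_000)
totals = np.array([rng.exponential(8.0, n).sum() for n in N])
print(totals.mean())   # ~ 400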
Example 2e: A game is begun by rolling an ordinary pair of dice.
• If the sum of the dice is 2, 3, or 12, the player loses.
• If the sum is 7 or 11, the player wins.
• If the sum is any other number $i$, the player continues to roll the dice until the sum is either 7 or $i$.
– If it is 7, the player loses;
– If it is $i$, the player wins.
—————————–
Q: Compute $E(R)$, where $R$ = the number of rolls in a game.
Conditioning on $S$, the initial sum,
$$E(R) = \sum_{i=2}^{12} E(R|S = i)\,P(S = i)$$
Let $P(S = i) = P_i$:
$$P_2 = \frac{1}{36},\quad P_3 = \frac{2}{36},\quad P_4 = \frac{3}{36},\quad P_5 = \frac{4}{36},\quad P_6 = \frac{5}{36},\quad P_7 = \frac{6}{36},$$
$$P_8 = \frac{5}{36},\quad P_9 = \frac{4}{36},\quad P_{10} = \frac{3}{36},\quad P_{11} = \frac{2}{36},\quad P_{12} = \frac{1}{36}$$
Equivalently,
$$P_i = P_{14-i} = \frac{i-1}{36}, \qquad i = 2, 3, \ldots, 7$$
Obviously,
$$E(R|S = i) = \begin{cases} 1, & \text{if } i = 2, 3, 7, 11, 12\\ 1 + E(Z), & \text{otherwise} \end{cases}$$
where $Z$ = the number of additional rolls until the sum is either $i$ or 7. Note that $i \notin \{2, 3, 7, 11, 12\}$ at this point. $Z$ is a geometric random variable with success probability $P_i + P_7$, so $E(Z) = \frac{1}{P_i + P_7}$.
Therefore,
$$E(R) = 1 + \sum_{i=4}^{6} \frac{P_i}{P_i + P_7} + \sum_{i=8}^{10} \frac{P_i}{P_i + P_7} = 3.376$$
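An illustrative simulation of the game (a sketch, assuming numpy), confirming E(R) ≈ 3.376:

import numpy as np

rng = np.random.default_rng(11)

def rolls_in_one_game():
    s = rng.integers(1, 7) + rng.integers(1, 7)   # first roll of two dice
    count = 1
    if s in (2, 3, 7, 11, 12):
        return count
    while True:                                   # roll until s or 7 appears
        t = rng.integers(1, 7) + rng.integers(1, 7)
        count += 1
        if t in (s, 7):
            return count

print(np.mean([rolls_in_one_game() for _ in range(100_000)]))   # ~ 3.376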
Computing Probabilities by Conditioning
If $Y$ is discrete, then
$$P(E) = \sum_y P(E|Y = y)\,P(Y = y)$$
If $Y$ is continuous, then
$$P(E) = \int_{-\infty}^{\infty} P(E|Y = y)\,f_Y(y)\,dy$$
The Best Prize Problem: Deal? No Deal!
We are presented with n distinct prizes in sequence. After being presented with a
prize we must decide whether to accept it or to reject it and consider the next
prize. The objective is to maximize the probability of obtaining the best prize.
Assume that all n! orderings are equally likely.
Q: How well can we do? What would be a good strategy?
—————————–
Many related applications: online algorithms, hiring decisions, dating, etc.
A plausible strategy: Fix a number $k$, $0 \le k \le n$. Reject the first $k$ prizes, and then accept the first one that is better than all of the first $k$.
Let $X$ = the position of the best prize, so $1 \le X \le n$.
Let $P_k(\text{best})$ = the probability that the best prize is selected under this strategy. Then
$$P_k(\text{best}) = \sum_{i=1}^{n} P_k(\text{best} \mid X = i)\,P(X = i) = \frac{1}{n}\sum_{i=1}^{n} P_k(\text{best} \mid X = i)$$
Obviously,
$$P_k(\text{best} \mid X = i) = 0, \qquad \text{if } i \le k$$
Assume $i > k$. The strategy wins exactly when the best of the first $i - 1$ prizes falls among the first $k$, so that none of the prizes in positions $k+1, \ldots, i-1$ is accepted:
$$P_k(\text{best} \mid X = i) = P_k(\text{best of the first } i-1 \text{ is among the first } k \mid X = i) = \frac{k}{i-1}$$
Therefore,
$$P_k(\text{best}) = \frac{1}{n}\sum_{i=1}^{n} P_k(\text{best} \mid X = i) = \frac{k}{n}\sum_{i=k+1}^{n} \frac{1}{i-1}$$
Next question: which $k$ to use? We choose the $k$ that maximizes $P_k(\text{best})$. Exhaustive search works well here, since it involves only a linear scan. However, it is often useful to have an "analytical" (albeit approximate) expression.
A useful approximate formula:
$$\sum_{i=1}^{n} \frac{1}{i} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \approx \log n + \gamma,$$
where Euler's constant
$$\gamma = 0.57721566490153286060651209008240243104215933593992\ldots$$
The approximation is asymptotically (as $n \to \infty$) exact.
$$P_k(\text{best}) = \frac{k}{n}\sum_{i=k+1}^{n} \frac{1}{i-1} = \frac{k}{n}\left(\frac{1}{k} + \frac{1}{k+1} + \cdots + \frac{1}{n-1}\right) = \frac{k}{n}\left(\sum_{i=1}^{n-1} \frac{1}{i} - \sum_{i=1}^{k-1} \frac{1}{i}\right)$$
$$\approx \frac{k}{n}\left(\log(n-1) - \log(k-1)\right) = \frac{k}{n}\log\frac{n-1}{k-1}$$
$$P_k(\text{best}) \approx \frac{k}{n}\log\frac{n-1}{k-1} \approx \frac{k}{n}\log\frac{n}{k}$$
$$[P_k(\text{best})]' \approx \frac{1}{n}\log\frac{n}{k} - \frac{k}{n}\,\frac{1}{k} = 0 \implies \log\frac{n}{k} = 1 \implies k = \frac{n}{e}$$
Also, when $k = \frac{n}{e}$, $P_k(\text{best}) \approx \frac{k}{n}\log\frac{n}{k} = \frac{1}{e} \approx 0.36788$, as $n \to \infty$.
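An illustrative simulation of the k = n/e rule (a sketch, assuming numpy; n = 100 chosen for concreteness):

import numpy as np

rng = np.random.default_rng(12)
n, trials = 100, 20_000
k = round(n / np.e)                        # ~ 37

wins = 0
for _ in range(trials):
    prizes = rng.permutation(n)            # larger value = better prize
    threshold = prizes[:k].max()
    picked = next((v for v in prizes[k:] if v > threshold), prizes[-1])
    wins += picked == n - 1                # best prize has value n - 1
print(wins / trials)                       # ~ 0.37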
Conditional Variance
Proposition 5.2:
$$Var(X) = E[Var(X|Y)] + Var(E[X|Y])$$
Proof (from the opposite direction, compared to the book):
$$E[Var(X|Y)] + Var(E[X|Y])$$
$$= E\left(E(X^2|Y) - E^2(X|Y)\right) + \left(E(E^2(X|Y)) - E^2(E[X|Y])\right)$$
$$= \left(E(E(X^2|Y)) - E^2(E[X|Y])\right) + \left(-E(E^2(X|Y)) + E(E^2(X|Y))\right)$$
$$= E(X^2) - E^2(X) = Var(X)$$
Example 5o: Let $X_1, X_2, \ldots, X_N$ be a sequence of independent and identically distributed random variables, and let $N$ be a non-negative integer-valued random variable independent of the $X_i$'s.
Q: Compute $Var\left(\sum_{i=1}^{N} X_i\right)$.

Solution:
$$Var\left(\sum_{i=1}^{N} X_i\right) = E\left[Var\left(\sum_{i=1}^{N} X_i \,\Big|\, N\right)\right] + Var\left(E\left[\sum_{i=1}^{N} X_i \,\Big|\, N\right]\right)$$
$$= E\left(Var(X)\,N\right) + Var\left(N\,E(X)\right) = Var(X)\,E(N) + E^2(X)\,Var(N)$$
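An illustrative check of the random-sum variance formula (a sketch, assuming numpy; concretely, N ~ Poisson(10) and X_i ~ Exponential with mean 2, chosen only for the demonstration):

import numpy as np

rng = np.random.default_rng(13)
N = rng.poisson(10, 50_000)
S = np.array([rng.exponential(2.0, n).sum() for n in N])

EX, VarX = 2.0, 4.0      # Exponential(mean 2): E(X) = 2, Var(X) = 4
EN, VarN = 10.0, 10.0    # Poisson(10)
print(S.var(), VarX * EN + EX**2 * VarN)   # both ~ 80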