probability, random processes and inference - cic …pescamilla/prpi/slides/prpi_1.pdf ·...
TRANSCRIPT
INSTITUTO POLITÉCNICO NACIONAL
CENTRO DE INVESTIGACION EN COMPUTACION
Probability, Random Processes and Inference
Dr. Ponciano Jorge Escamilla [email protected]
http://www.cic.ipn.mx/~pescamilla/
Laboratorio de Ciberseguridad
CIC
Instructor
Dr. Ponciano Jorge Escamilla Ambrosio
http://www.cic.ipn.mx/~pescamilla/
Class meetings
Mondays and Wednesdays 12:00 – 14:00 hrs.
Classroom Aula A3
Course web site:
http://www.cic.ipn.mx/~pescamilla/academy.html
Reader material and homework exercises, etc.
The student will learn the fundamentals of
probability theory: probabilistic models, discrete
and continuous random variables, multiple
random variables and limit theorems as well as an
introduction to more advanced topics such as
random processes and statistical inference. At the
end of the course the student will be able to
develop and analyse probabilistic models in a
manner that combines intuitive understanding and
mathematical precision.
Course Objective
Course content
1. Probability
1.1. What is Probability?
1.1.1. Statistical Probability
1.1.2. Probability as a Measure of Uncertainty
1.2. Sample Space and Probability
1.2.1. Probabilistic Models
1.2.2. Conditional Probability
1.2.3. Total Probability Theorem and Bayes’ Rule
1.2.4. Independence
1.2.5. Counting
1.2.6. The Probabilistic Method
1.3. Discrete Random Variables
1.3.1. Basic Concepts
1.3.2. Probability Mass Functions
1.3.3. Functions of Random Variables
1.3.4. Expectation and Variance
1.3.5. Joint PMFs of Multiple Random Variables
1.3.6. Conditioning
1.3.7. Independence
1.4. General Random Variables
1.4.1. Continuous Random Variables and PDFs
1.4.2. Cumulative Distribution Function
1.4.3. Normal Random Variables
1.4.4. Joint PDFs of Multiple Random Variables
1.4.5. Conditioning
1.4.6. The Continuous Bayes’ Rule
1.4.7. The Strong Law of Large Numbers
2. Introduction to Random Processes
2.1. Markov Chains
2.1.1. Discrete Time Markov Chains
2.1.2. Classification of States
2.1.3. Steady State Behavior
2.1.4. Absorption Probabilities and Expected Time to
Absorption
2.1.5. Continuous Time Markov Chains
2.1.6. Ergodic Theorem for Discrete Markov Chains
2.1.7. Markov Chain Monte Carlo Method
2.1.8. Queuing Theory
3. Statistics
3.2. Classical Statistical Inference
3.2.1. Classical Parameter Estimation
3.2.2. Linear Regression
3.2.3. Analysis of Variance and Regression
3.2.4. Binary Hypothesis Testing
3.2.5. Significance Testing
Course text books
Joseph Blitzstein and Jessica Hwang. Introduction to Probability, CRC Press, 2014. https://www.crcpress.com/Introduction-to-Probability/Blitzstein-Hwang/9781466575578
Dimitri P. Bertsekas and John N. Tsitsiklis. Introduction to probability, 2nd Edition, Athena Scientific, 2008. http://athenasc.com/probbook.html
Géza Schay. Introduction to Probability with Statistical Applications, Birkhauser, Boston, 2007. http://link.springer.com/book/10.1007/978-0-8176-4591-5
William Feller. An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition, Wiley, 1968. http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471257087.html
Grading
Midterm exam 15%
Final exam 15%
Homework assignments 20%
One written departmental exam 50%
1. What is Probability?
1.1.1. Statistical Probability
1.1.2. Probability as a Measure of Uncertainty
In everyday speech, people use the concept of probability to discuss an uncertain situation:
Luck, Coincidence, Randomness,
Uncertainty, Risk, Doubt, Fortune, Chance…
Used in a vague, casual way!
A first approach to define probability is in
terms of frequency of occurrence, as a
percentage of success
What is Probability?
For example, if we toss a coin, and observe
whether it lands head (H) or tail (T) up
What is the probability of either result?
Why?
Example: Flip a coin twice
P(A) = (number of favorable outcomes) / (number of possible outcomes)
Definition 1 (Sample space and event).
The sample space S of an experiment is the set of all its possible outcomes.
An event A is a subset of the sample space S,
and we say that A occurred if the actual
outcome is in A.
Sample space
“Probability is a logical framework for quantifying uncertainty and randomness.” [Blitzstein and Hwang, 2014]
“Probability theory is a branch of
mathematics that deals with repetitive events
whose occurrence or nonoccurrence is
subject to chance variation.” [Schay, 2007]
Probability provides tools for understanding and explaining variation, separating signal from noise, and modeling complex phenomena (an engineer's definition).
There are situations where the frequency interpretation is not appropriate.
Example: A scholar asserts that the Iliad and
the Odyssey were composed by the same
person, with probability 90%
It is based on the scholar’s subjective belief
The theory of probability is useful in a broad
variety of contexts and applications:
Statistics, Physics, Biology, Computer
Science, Meteorology, Gambling, Finance,
Political Science, Medicine, Life.
Assignment 1a: Give an example of the
application of probability theory in each area
Assignment 1b: Read the math review: http://projects.iq.harvard.edu/files/stat110/files/math_review_handout.pdf
The sample space S, which is the set of all
possible outcomes of an experiment.
The probability law, which assigns to a set A of
possible outcomes (also called an event) a
nonnegative number P(A) (called the probability
of A) that encodes our knowledge or belief about
the collective “likelihood” of the elements of A.
The probability law must satisfy certain
properties.
Elements of a Probabilistic Model
The experiment will produce exactly one out of
several possible outcomes.
A subset of the sample space, that is, a collection
of possible outcomes, is called an event.
This means that any collection of possible outcomes, including the entire sample space S and its complement, the empty set ∅, may qualify as an event.
Strictly speaking, however, some sets have to be excluded. In particular, when dealing with probabilistic models involving an uncountably infinite sample space, there are certain unusual subsets for which one cannot associate meaningful probabilities.
Experiments and events
There is no restriction on what constitutes an
experiment.
The events to be considered can be described by
such statements as “a toss of a given coin results
in head,” “a card drawn at random from a regular
52 card deck is an Ace,” or “this book is green.”
Associated with each statement there is a set S of
possibilities, or possible outcomes.
Examples of experiments and events:
Tossing a Coin. For a coin toss, S may be taken to consist of
two possible outcomes, which we may abbreviate as H and T
for head and tail. We say that H and T are the members,
elements or points of S, and write S = {H, T}.
Tossing Two Coins (or one coin twice). In this case S =
{HH, HT, TH, TT}. In this case, for instance, the outcome
“the first coin shows H” is represented by the set {HH, HT},
that is, this statement is true if we obtain HH or HT and false
if we obtain TH or TT.
Tossing a Coin Until an H is Obtained. If we toss a coin
until an H is obtained, we cannot say in advance how many
tosses will be required, and so the natural sample space is S =
{H, TH, TTH, TTTH, . . . }, an infinite set. We can use, of
course, many other sample spaces as well, for instance, we
may be interested only in whether we had to toss the coin
more than twice or not, in which case S = {1 or 2, more than
2} is adequate.
Selecting a Number from an Interval. Sometimes, we need
an uncountable set for a sample space. For instance, if the
experiment consists of choosing a random number between 0
and 1, we may use S = {x : 0 < x < 1}.
Specifies the “likelihood” of any outcome, or of
any set of possible outcomes.
Assigns to every event A, a number P(A), called
the probability of A.
The probability law
Given a sample space S and a certain collection ℱ of its
subsets, called events, an assignment P of a number P(A) to
each event A in ℱ is called a probability measure, and P(A)
the probability of A, if P has the following properties:
1. P(A) ≥ 0 for every A,
2. P(S) = 1, and
3. P(A1 ∪ A2 ∪ · · · ) = P(A1) + P(A2) + · · · for any finite or countably infinite set of mutually exclusive events A1, A2, …
Then, the sample space S together with ℱ and P is called a
probability space.
Probability Space [Schay, 2007]
Definition 1.6.1 (General definition of probability). A probability space consists of a sample space S and a probability function P which takes an event A ⊆ S as input and returns P(A), a real number between 0 and 1, as output. The function P must satisfy the following axioms:
1. P(∅) = 0, P(S) = 1.
2. If A1, A2, . . . are disjoint events, then:
P(A1 ∪ A2 ∪ · · · ) = P(A1) + P(A2) + · · ·
(Saying that these events are disjoint means that they are mutually exclusive: Ai ∩ Aj = ∅ for i ≠ j.)
Probability Space [Blitzstein and Hwang, 2015]
Properties of probabilities
The Probability of the Empty Set Is 0. In any probability space, P(∅) = 0.
Proof:
1 = P(S) = P(S ∪ ∅) = P(S) + P(∅) = 1 + P(∅), and so P(∅) = 0.
The Probability of the Union of Two Events.
For any two events A and B,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof: Write A ∪ B = A ∪ (B ∩ Ac) and B = (A ∩ B) ∪ (B ∩ Ac), in each case as a union of disjoint events. By the additivity axiom, P(A ∪ B) = P(A) + P(B ∩ Ac) and P(B) = P(A ∩ B) + P(B ∩ Ac). Subtracting the second equation from the first gives the result.
Probability of Complements. For any event A,
P(Ac) = 1 − P(A)
Proof: Ac ∩ A = ∅ and Ac ∪ A = S by the definition
of Ac. Thus, by Axiom 3, P(S) = P(Ac ∪ A) = P(Ac)
+ P(A). Now, Axiom 2 says that P(S) = 1, and so,
comparing these two values of P(S), we obtain
P(Ac) + P(A) = 1.
Probability of Subsets. If A ⊂ B,
then P(A) ≤ P(B).
Proof:
If A ⊆ B, then we can write B as the
union of A and B ∩ Ac, where B ∩ Ac is
the part of B not also in A.
Since A and B ∩ Ac are disjoint, we can apply the additivity axiom:
P(B) = P(A∪ (B ∩ Ac)) = P(A) + P(B ∩ Ac)
Probability is nonnegative, so P(B ∩ Ac) ≥ 0, proving that P(B) ≥ P(A).
Discrete Uniform Probability Law
In the special case where the probabilities P(s1), …, P(sn) are all the same, by necessity equal to 1/n in view of the normalization axiom, we obtain:
P(A) = (number of elements of A) / n
Counting
The calculation of probabilities often involves counting the number of outcomes in various events. When the sample space S has a finite number of equally likely outcomes, the discrete uniform probability law applies, and the probability of any event A is:
P(A) = (number of elements of A) / (number of elements of S) = k / n
When an event A consists of a finite number of equally likely outcomes, each of which has an already known probability p, the probability of A is:
P(A) = p · (number of elements of A)
Basic Counting Principle
In how many ways can you dress today if you find:
4 shirts
3 ties
2 jackets
in your closet?
Consider a process that consists of r stages. Suppose
that:
a) There are n1 possible results at the first stage.
b) For every possible result at the first stage, there are n2 possible results at the second stage.
c) More generally, for any sequence of possible results at the first i − 1 stages, there are ni possible results at the ith stage.
Then, the total number of possible results of the r-stage
process is:
n1 · n2 · · · nr
The Multiplication Principle
Example 1. The number of telephone numbers. A
local telephone company number is a 7-digit
sequence, but the first digit has to be different
from 0 or 1. How many distinct telephone
numbers are there?
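Both the dressing example and the telephone example follow directly from the principle; a minimal sketch that checks the arithmetic (count_stages is a hypothetical helper name):

```python
# Multiplication principle: an r-stage process with n_i possible results
# at stage i has n1 * n2 * ... * nr possible results in total.

def count_stages(ns):
    total = 1
    for n in ns:
        total *= n
    return total

# Dressing example: 4 shirts, 3 ties, 2 jackets.
print(count_stages([4, 3, 2]))       # 24

# Telephone numbers: first digit 2-9 (8 choices), then six digits 0-9.
print(count_stages([8] + [10] * 6))  # 8000000
```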
Example 2. The number of subsets of an n-
element set. Consider an n-element set
{s1, s2,…, sn}.
How many subsets does it have, including itself and the empty set?
For example, how many subsets does the set {1, 2, 3} have?
This is a sequential process where we take in turn
each of the n elements and decide whether to
include it in the desired subset or not.
Thus we have n steps, and in each step two choices,
namely yes or no to the question of whether the
element belongs to the desired subset. Therefore the number of subsets is:
2^n
For n = 1, this gives 2^1 = 2 subsets: the empty set and the set itself.
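The subset count can be cross-checked by brute-force enumeration (a minimal sketch; all_subsets is a hypothetical helper name):

```python
from itertools import combinations

def all_subsets(s):
    """Every subset of s, including the empty set and s itself."""
    items = list(s)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

print(len(all_subsets({1, 2, 3})))   # 8, i.e. 2 * 2 * 2
print(len(all_subsets({1})))         # 2: the empty set and {1}
```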
Example 3. Drawing three cards. What is the
number of ways three cards can be drawn one
after the other from a regular 52-card deck
without replacement?
What is this number if we replace each card
before the next one is drawn?
With replacement: n1 = n2 = n3 = 52, so 52^3 = 140,608.
Without replacement: n1 = 52, n2 = 51, n3 = 50, so 52 · 51 · 50 = 132,600.
Involve the selection of k objects out of a
collection of n objects.
If the order of selection matters, the selection is
called a permutation.
If the order of selection does not matter, the
selection is called a combination.
Permutation and Combination
k-permutations
Assume there are n distinct objects, and let k be some positive integer with k ≤ n.
We want to count the number of different ways that we can pick k out of these n objects and arrange them in a sequence, i.e., the number of distinct k-object sequences.
Permutation
In place 1 we can put any of the n objects, which we can write as n − 1 + 1;
in place 2 we can put n − 1 = n − 2 + 1 objects; and so on.
Thus the kth factor will be n − k + 1, and so, for any two positive integers n and k ≤ n:
n(n − 1)(n − 2) · · · (n − k + 1) = Pn,k
In the special case where k = n:
n(n − 1)(n − 2) · · · 3 · 2 · 1 = n!
and the possible sequences are simply called permutations.
From the definitions of n!, (n − k)! and Pn,k we
can obtain the following relation:
n! = [n(n − 1)(n − 2) · · · (n − k + 1)][(n − k)(n − k − 1) · · · 2 · 1]
= Pn,k · (n − k)!
and so:
Pn,k = n! / (n − k)!
with the convention 0! = 1.
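These formulas map directly onto Python's standard library (math.perm is available from Python 3.8); a quick sanity check:

```python
from math import factorial, perm

def permutations_count(n, k):
    """P(n, k) = n! / (n - k)!: ordered selections of k out of n objects."""
    return factorial(n) // factorial(n - k)

print(permutations_count(6, 6))   # 720, i.e. 6!
print(perm(52, 3))                # 132600 = 52 * 51 * 50, the same formula
```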
Example 4. Six rolls of a die. Find the probability
that:
Six rolls of a (six sided) die all give different numbers
Assume all outcomes are equally likely
P(all six rolls give different numbers) = ?
Probability calculation
P(A) = (number of elements of A) / (number of elements of S) = k / n
P(A) = p · (number of elements of A), where p is the probability of each equally likely outcome in A
P(all six rolls give different numbers) = P6,6 / (number of elements of S) = 6! / 6^6 ≈ 0.015
Equivalently, with p = 1/6^6 the probability of each equally likely outcome:
P(A) = p · (number of elements of A) = (1/6^6) · 6!
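The exact value can be cross-checked with a quick Monte Carlo estimate (a sketch; the number of trials is arbitrary):

```python
import random
from math import factorial

exact = factorial(6) / 6**6   # 6!/6^6: all six rolls different
print(round(exact, 4))        # 0.0154

# Monte Carlo cross-check: a set of six rolls has size 6 iff all differ.
random.seed(0)
trials = 100_000
hits = sum(len({random.randint(1, 6) for _ in range(6)}) == 6
           for _ in range(trials))
print(round(hits / trials, 3))   # close to the exact value
```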
Example 5. Dealing Three Cards. In how many
ways can three cards be dealt from a regular deck
of 52 cards?
P52,3 = 52! / (52 − 3)! = 52 · 51 · 50 = 132,600.
Example 6. Birthday problem. There are k people
in a room. Assume each person’s birthday is
equally likely to be any of the 365 days of the
year (we exclude February 29), and that people’s
birthdays are independent (we assume there are
no twins in the room). What is the probability
that two or more people in the group have the
same birthday?
This amounts to sampling the 365 days of the year without replacement, so the number of ways to assign k distinct birthdays is:
365 · 364 · 363 · · · (365 − k + 1), for k ≤ 365
Therefore the probability of no birthday match in a group of k people is:
P(no match) = 365 · 364 · · · (365 − k + 1) / 365^k
and the probability of at least one birthday match is:
P(at least one match) = 1 − P(no match)
Figure: Probability that in a room of k people, at least two were born on the same day. This probability first exceeds 0.5 when k = 23.
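The crossover at k = 23 can be verified directly (a minimal sketch; p_match is a hypothetical helper name):

```python
def p_match(k):
    """P(at least two of k people share a birthday), 365 equally likely days."""
    p_no_match = 1.0
    for i in range(k):
        p_no_match *= (365 - i) / 365
    return 1 - p_no_match

print(round(p_match(22), 3))   # 0.476
print(round(p_match(23), 3))   # 0.507
```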
The number of possible unordered selections of k different
things out of n different ones is denoted by Cn,k, and each such
selection is called a combination of the given things.
If we select k things out of n without regard to order, then, this
can be done in Cn,k ways.
In each case we have k things which can be ordered k! ways.
Thus, by the multiplication principle, the number of ordered
selections is Cn,k · k!
On the other hand, this number is, by definition, Pn,k . Therefore
Cn,k · k! = Pn,k , and so:
Combinations
Cn,k = Pn,k / k! = n! / (k! (n − k)!)
The quantity on the right-hand side is usually abbreviated as (n over k) and is called a binomial coefficient. Thus, for any positive integer n and k = 1, 2, . . . , n:
Cn,k = (n over k) = n(n − 1)(n − 2) · · · (n − k + 1) / k! = n! / (k! (n − k)!)
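Python's math.comb computes these binomial coefficients; a quick check against the factorial formula (c is a hypothetical helper name):

```python
from math import comb, factorial

def c(n, k):
    """C(n, k) = P(n, k) / k! = n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(c(52, 3))      # 22100: unordered 3-card hands from a 52-card deck
print(comb(52, 3))   # 22100, the stdlib equivalent
```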
Binomial probabilities
n ≥ 1 independent coin tosses, with P(H) = p. What is P(k heads)?
Example: P(HTTTHH) = ?
P(particular sequence) = ?
P(particular k-head sequence) = ?
Each particular sequence with k heads has probability p^k (1 − p)^(n−k), and there are (n over k) such sequences, so P(k heads) = (n over k) p^k (1 − p)^(n−k).
A combination can be seen as a partition of the set in two: one part contains k elements and the other contains the remaining n − k elements.
Given an n-element set and nonnegative integers n1, n2, …, nr whose sum is equal to n, consider partitions of the set into r disjoint subsets, with the ith subset containing exactly ni elements.
In how many ways can this be done?
Partitions
There are (n over n1) ways of forming the first subset. Having formed the first subset, there are n − n1 elements left. We need to choose n2 of them in order to form the second subset, and have (n − n1 over n2) choices, and so on.
Thus, using the Counting Principle, the total number of ways is:
(n over n1) (n − n1 over n2) · · · (n − n1 − · · · − n(r−1) over nr)
As several terms cancel, this reduces to:
n! / (n1! n2! · · · nr!)
This is called the multinomial coefficient and is usually denoted by (n over n1, n2, …, nr).
Example 7. Each person gets an ace. There is a 52-
card deck, dealt (fairly) to four players. What is the
probability of each player getting an ace?
The size of the sample space is: 52! / (13! 13! 13! 13!)
Constructing an outcome with one ace for each person:
o Number of different ways of distributing the 4 aces among the 4 players: 4!
o Distribution of the remaining 48 cards, 12 to each player: 48! / (12! 12! 12! 12!)
The desired probability is therefore [4! · 48! / (12! 12! 12! 12!)] / [52! / (13! 13! 13! 13!)] ≈ 0.105.
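This probability can be computed exactly with integer arithmetic (a sketch; multinomial is a hypothetical helper name):

```python
from math import factorial

def multinomial(n, parts):
    """n! / (n1! n2! ... nr!): partitions of n items into subsets of given sizes."""
    assert sum(parts) == n
    out = factorial(n)
    for p in parts:
        out //= factorial(p)
    return out

deals = multinomial(52, [13, 13, 13, 13])                     # size of the sample space
favorable = factorial(4) * multinomial(48, [12, 12, 12, 12])  # one ace per player
print(round(favorable / deals, 4))                            # 0.1055
```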
Conditional probability provides us with a way to
reason about the outcome of an experiment,
based on partial information.
Examples:
A) In an experiment involving two successive rolls of a die, you are told that the sum of the two rolls is 9. How likely is it that the first roll was a 6?
B) In a word guessing game, the first letter of the word is a
“t”. What is the likelihood that the second letter is an “h”?
Conditional Probability
C) How likely is it that a person has a certain disease given that a medical test was negative?
D) A spot shows up on a radar screen. How likely
is it to correspond to an aircraft?
Given:
An experiment
A corresponding sample space
A probability law
We know that the outcome is within some given event
B.
Quantify the likelihood that the outcome also
belongs to some other given event A.
Construct a new probability law that takes into account the available knowledge: a probability law that, for any event A, specifies the conditional probability of A given B, denoted P(A|B).
The conditional probabilities P(A|B) of
different events A should satisfy the
probability axioms.
Example:
Suppose that all six possible outcomes of a fair die
roll are equally likely.
If the outcome is even, then there are only three
possible outcomes: 2, 4 and 6.
What is the probability of the outcome being 6 given
that the outcome is even?
If all possible outcomes are equally likely:
P(A|B) = (number of elements of A ∩ B) / (number of elements of B)
Conditional probability definition:
P(A|B) = P(A ∩ B) / P(B), with P(B) > 0.
Out of the total probability of the elements of B, P(A|B) is the fraction that is assigned to possible outcomes that also belong to A.
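The fair-die example above, computed from the ratio of the two event probabilities (a minimal sketch):

```python
from fractions import Fraction

# Fair die: P(outcome is 6 | outcome is even) = P(A ∩ B) / P(B).
A = {6}
B = {2, 4, 6}

p_B = Fraction(len(B), 6)
p_A_and_B = Fraction(len(A & B), 6)
print(p_A_and_B / p_B)   # 1/3: one of the three equally likely even outcomes
```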
Conditional probabilities satisfy the three probability axioms:
1. P(A|B) ≥ 0 for every event A,
2. P(S|B) = 1,
3. P(A1 ∪ A2 ∪ ·· · |B) = P(A1|B)+ P(A2|B) + ·· · for
any finite or countably infinite number of mutually
exclusive events A1, A2, . . . .
Proofs:
1. In the definition of P(A|B) the numerator is
nonnegative by Axiom 1, and the denominator is
positive by assumption. Thus, the fraction is
nonnegative.
2. Taking A = S in the definition of P(A|B), we get:
P(S|B) = P(S ∩ B) / P(B) = P(B) / P(B) = 1
Knowledge that event B has occurred implies that the outcome of the
experiment is in the set B. In computing P(A|B) we can therefore view the
experiment as now having the reduced sample space B. The event A occurs
in the reduced sample space if and only if the outcome ζ is in A ∩ B. The
equation simply renormalizes the probability of events that occur jointly
with B.
Suppose that we learn that B occurred. Upon obtaining this information, we get rid
of all the pebbles in Bc because they are incompatible with the knowledge that B
has occurred. Then P(A∩B) is the total mass of the pebbles remaining in A. Finally,
we renormalize, that is, divide all the masses by a constant so that the new total
mass of the remaining pebbles is 1. This is achieved by dividing by P(B), the total
mass of the pebbles in B. The updated mass of the outcomes corresponding to
event A is the conditional probability P(A|B) = P(A∩B)/P(B).
If we interpret probability as relative frequency:
P(A|B) should be the relative frequency of the event A ∩ B in experiments where B occurred.
Suppose that the experiment is performed n times, and suppose that event B occurs nB times and that event A ∩ B occurs nA∩B times. The relative frequency of interest is then:
nA∩B / nB = (nA∩B / n) / (nB / n) ≈ P(A ∩ B) / P(B)
where we have implicitly assumed that P(B) > 0.
Example 2. A ball is selected from an urn containing two
black balls, numbered 1 and 2, and two white balls, numbered
3 and 4. The number and color of the ball is noted, so the
sample space is {(1,b),(2,b), (3,w), (4,w)}. Assuming that the
four outcomes are equally likely, find P(A|B) and P(A|C),
where A, B, and C are the following events:
Example 3. From all families with three children,
we select one family at random. What is the
probability that the children are all boys, if we
know that a) the first one is a boy, and b) at least
one is a boy? (Assume that each child is a boy or
a girl with probability 1/2, independently of each
other.)
Example 4. A card is drawn at random from a deck
of 52 cards. What is the probability that it is a King
or a 2, given that it is a face card (J, Q, K)?
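One way to check the answer is to enumerate the reduced sample space of face cards (a sketch; the deck encoding is an assumption):

```python
# P(King or 2 | face card): condition on the reduced sample space of face cards.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = [(rank, suit) for rank in ranks for suit in "SHDC"]

face = [card for card in deck if card[0] in {"J", "Q", "K"}]
king_or_two = [card for card in face if card[0] in {"K", "2"}]

# A 2 is not a face card, so only the 4 Kings survive the conditioning.
print(len(king_or_two), "/", len(face))   # 4 / 12, i.e. probability 1/3
```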
If we multiply both sides of the definition of
P(A|B) by P(B) we obtain:
P(A ∩ B) = P(A|B) P(B)
Similarly, if we multiply both sides of the
definition of P(B|A) by P(A) we obtain:
P(B ∩ A) = P(B|A) P(A)
Total Probability Theorem and Bayes’ Rule
Joint Probability of Two Events. For any events
A and B with positive probabilities:
P(A ∩ B) = P(B) P(A|B) = P(A) P(B|A)
Joint Probability of Three Events
P(A∩B∩C) = P(A) P(B|A) P(C|A∩B)
P(A1∩A2∩A3) = P(A1) P(A2|A1) P(A3|A1∩A2)
Applying this repeatedly, we can generalise to the intersection of n events:
P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) · · · P(An|A1 ∩ · · · ∩ An−1)
Total Probability Theorem
Let A1, . . . , An be disjoint events that partition the sample space, with P(Ai) > 0 for each i. Then, for any event B:
P(B) = P(A1) P(B|A1) + · · · + P(An) P(B|An)
The probability that B occurs is a
weighted average of its conditional
probability under each scenario,
where each scenario is weighted
according to its (unconditional)
probability.
Example 1. Radar detection. If an aircraft is present in a certain area, a radar detects it and generates an
alarm signal with probability 0.99. If an aircraft is
not present, the radar generates a (false) alarm, with
probability 0.10. We assume that an aircraft is
present with probability 0.05.
What is the probability of no aircraft presence and
false alarm?
What is the probability of aircraft presence and no
detection?
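Both quantities follow from the multiplication rule P(A ∩ B) = P(A) P(B|A); a minimal numeric sketch:

```python
p_aircraft = 0.05
p_alarm_given_aircraft = 0.99
p_alarm_given_no_aircraft = 0.10

# P(no aircraft and false alarm) = P(Ac) * P(alarm | Ac)
print(round((1 - p_aircraft) * p_alarm_given_no_aircraft, 4))   # 0.095

# P(aircraft and no detection) = P(A) * (1 - P(alarm | A))
print(round(p_aircraft * (1 - p_alarm_given_aircraft), 4))      # 0.0005
```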
Example 2. Picking Balls from Urns. Suppose we
have two urns, with the first one containing 2 white
and 6 black balls, and the second one containing 2
white and 2 black balls. We pick an urn at random,
and then pick a ball from the chosen urn at random.
What is the probability of picking a white ball?
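Conditioning on which urn was chosen, the total probability theorem gives the answer directly (a sketch using exact fractions):

```python
from fractions import Fraction

# Urn 1: 2 white of 8 balls; Urn 2: 2 white of 4 balls; each urn picked w.p. 1/2.
p_white = (Fraction(1, 2) * Fraction(2, 8)
           + Fraction(1, 2) * Fraction(2, 4))
print(p_white)   # 3/8
```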
Dealing Three Cards. From a deck of 52 cards
three are drawn without replacement.
What is the probability of the event E of getting
two Aces and one King in any order?
Denote the relevant outcomes by A, K and O (for
“other”),
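One way to finish the computation, not worked on the slide, is to compare the sequence view (three positions for the King) with direct counting of unordered hands; a sketch:

```python
from fractions import Fraction
from math import comb

# Sequence view: sum P(one particular order) over the 3 positions of the King.
p_seq = 3 * Fraction(4, 52) * Fraction(3, 51) * Fraction(4, 50)

# Counting view: unordered favorable hands over all 3-card hands.
p_count = Fraction(comb(4, 2) * comb(4, 1), comb(52, 3))

print(p_seq == p_count)        # True: both views agree
print(round(float(p_seq), 6))  # 0.001086
```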
Bayes’ Rule
Let A1, A2, . . . , An be disjoint events that partition the sample space, with P(Ai) > 0 for all i. Then, for any event B with P(B) > 0:
P(Ai|B) = P(Ai) P(B|Ai) / P(B)
To verify Bayes’ rule, note that by the definition of conditional probability, P(Ai) P(B|Ai) and P(Ai|B) P(B) are both equal to P(Ai ∩ B); the expression for P(B) follows from the total probability theorem.
Example 1. Rare disease. A test for a rare disease is assumed
to be correct 95% of the time: if a person has the disease, the
test results are positive with probability 0.95, and if the person
does not have the disease, the results are negative with
probability 0.95. A random person drawn from a certain
population has probability 0.001 of having the disease. Given
that the person just tested positive, what is the probability of
having the disease?
A={“the person has the disease”}
B={“the test results are positive”}
P(A|B)=?
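Plugging the numbers into Bayes' rule, with P(B) from the total probability theorem (a minimal sketch):

```python
p_disease = 0.001
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

# P(positive) via the total probability theorem.
p_pos = (p_disease * p_pos_given_disease
         + (1 - p_disease) * p_pos_given_healthy)
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(round(p_disease_given_pos, 4))   # 0.0187: small despite the "95% correct" test
```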
For such a rare disease we need a much more accurate test: the probability of a false positive result must be of a lower order of magnitude than the fraction of people who have the disease.
Example 2. Random coin. You have one fair
coin, and one biased coin which lands Heads with
probability 3/4. You pick one of the coins at
random and flip it three times. It lands Heads all
three times. Given this information, what is the
probability that the coin you picked is the fair
one?
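A direct computation of the posterior with exact fractions (a sketch):

```python
from fractions import Fraction

prior_fair = Fraction(1, 2)
lik_fair = Fraction(1, 2) ** 3     # P(HHH | fair coin)
lik_biased = Fraction(3, 4) ** 3   # P(HHH | biased coin)

posterior_fair = (prior_fair * lik_fair
                  / (prior_fair * lik_fair + (1 - prior_fair) * lik_biased))
print(posterior_fair)                    # 8/35
print(round(float(posterior_fair), 2))   # 0.23
```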
Before flipping the coin, we thought we were equally likely to have picked
the fair coin as the biased coin: P(F) = P(Fc) = 1/2. Upon observing three
Heads, however, it becomes more likely that we’ve chosen the biased coin
than the fair coin, so P(F|A) is only about 0.23.
Independence of two events. Events A and B are
independent if
P(A ∩ B) = P(A) P(B)
If P(A) > 0 and P(B) > 0, then this is equivalent
to:
P(A|B) = P(A)
and also equivalent to:
P(B|A) = P(B)
Independence
Two events are independent if we can obtain the
probability of their intersection by multiplying
their individual probabilities. Alternatively, A
and B are independent if learning that B occurred
gives us no information that would change our
probabilities for A occurring (and vice versa).
Independence is a symmetric relation: if A is
independent of B, then B is independent of A.
Independence is completely different
from disjointness. If A and B are
disjoint, then P(A∩B) = 0, so disjoint
events can be independent only if P(A)
= 0 or P(B) = 0. Knowing that A occurs
tells us that B definitely did not occur,
so A clearly conveys information about
B, meaning the two events are not
independent (except if A or B already
has zero probability).
If A and B are independent, then A and Bc are
independent, Ac and B are independent, and Ac
and Bc are independent.
Proof. Let A and B be independent. Then
P(Bc|A) = 1 − P(B|A) = 1 − P(B) = P(Bc)
so A and Bc are independent. Swapping the roles of A and B, we
have that Ac and B are independent. Using the fact that A, B
independent implies A, Bc independent, with Ac playing the role
of A, we also have that Ac and Bc are independent.
Independence of three events. Events A, B, and C
are said to be independent if all of the following
equations hold:
P(A ∩ B) = P(A)P(B)
P(A ∩ C) = P(A)P(C)
P(B ∩ C) = P(B)P(C)
P(A ∩ B ∩ C) = P(A)P(B)P(C)
Independence of many events. For n events A1,A2, . . . ,
An to be independent, we require any pair to satisfy:
P(Ai ∩ Aj) = P(Ai)P(Aj) (for i ≠ j),
any triplet to satisfy:
P(Ai ∩ Aj ∩ Ak) = P(Ai)P(Aj)P(Ak) (for i, j, k distinct)
And similarly for all quadruplets, quintuplets, and so on.
For infinitely many events, we say that they are
independent if every finite subset of the events is
independent.
Given an event C, the events A and B are said to
be conditionally independent if:
P(A ∩ B|C) = P(A|C) P(B|C)
Conditional independence
The previous relation states that if C is known to
have occurred, the additional knowledge that B
also occurred does not change the probability of
A.
The independence of two events A and B with
respect to the unconditional probability law, does
not imply conditional independence, and vice
versa.
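A numeric illustration of the distinction, reusing the random-coin setup from Example 2 of Bayes' Rule (a sketch): given the chosen coin, the tosses are conditionally independent, yet unconditionally they are dependent.

```python
from fractions import Fraction

# Pick a coin at random (fair: P(H) = 1/2, biased: P(H) = 3/4), toss it twice.
half, biased = Fraction(1, 2), Fraction(3, 4)

p_h1 = half * half + half * biased           # P(first toss is H) = 5/8; P(H2) equal
p_both = half * half**2 + half * biased**2   # P(H1 ∩ H2) by total probability

print(p_h1 * p_h1)   # 25/64
print(p_both)        # 13/32 = 26/64 ≠ 25/64, so H1 and H2 are dependent
```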