cs626 data analysis and simulation - cs.wm.edu
1
CS626 Data Analysis and Simulation
Today: Probability Primer
Quick Reference: Sheldon Ross, Introduction to Probability Models, 9th Edition, AP, Ch. 1; Berthold, Hand, Intelligent Data Analysis, Springer 99, Chapter 2 by Feelders, Statistics Concepts.
Instructor: Peter Kemper, R 104A, phone 221-3462, email: [email protected]
2
Overview
Random Variable
Definition
PMF p() and CDF F()
Indicator variable
Bernoulli variable
Binomial variable and Binomial distribution
Combinatorics refresher
3
Random variable
S can contain any type of elements. Often we are interested in numerical values that are a function of the outcome in S instead of in events E ⊆ S. Such functions are called random variables.
Purpose: a means to develop methods for experiments that yield numbers, and a compact description of an experiment (S may be too fine grained).
Concerns: Probabilities of elements of S imply probabilities on the elements of the range of a random variable.
The range can be a discrete set of values or continuous. For the continuous setting, P(X ≤ x) needs to be well-defined, so { s | X(s) ≤ x } must be an event, i.e. measurable.
4
Random variable
Definition(s)
(special) A random variable X on a discrete sample space S is a function X : S -> R that assigns a real number X(s) to each sample point s ∈ S.
(in general) A random variable X on a probability space (S,F,P) is a function X : S -> R that assigns a real number X(s) to each sample point s ∈ S, such that for every real number x, the set of sample points {s|X(s) ≤ x} is an event, that is, a member of F.
5
Quick Example
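Let X denote the random variable that is defined as the sum of two fair dice:
P{X = 2} = P{(1,1)} = 1/36
P{X = 3} = P{(1,2), (2,1)} = 2/36
P{X = 7} = 6/36
P{X = 12} = P{(6,6)} = 1/36
In general, P{X = n} = (6 - |n - 7|) / 36 for n = 2, …, 12
Since X must be one of the values 2-12, then:
1 = P{X ∈ {2, …, 12}} = Σ_{n=2}^{12} P{X = n}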
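As a quick check of the dice example, the pmf of the sum can be obtained by enumerating all 36 equally likely outcomes. This is a minimal sketch; the function name `two_dice_pmf` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def two_dice_pmf():
    """Return {sum: probability} for the sum of two fair six-sided dice."""
    pmf = {}
    # Each ordered pair (d1, d2) has probability 1/36.
    for d1, d2 in product(range(1, 7), repeat=2):
        s = d1 + d2
        pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)
    return pmf

pmf = two_dice_pmf()
for value in sorted(pmf):
    print(value, pmf[value])
print("total:", sum(pmf.values()))  # the probabilities sum to 1
```

Exact fractions are used so that the sum over the range 2-12 is exactly 1 rather than a floating-point approximation.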
6
Another Example
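A coin has P(H) = p. Let's toss it till we see the first heads. Let N denote the number of flips required.
Assumption: outcomes of successive flips are independent.
N is a random variable with range {1, 2, 3, …}
P{N = n} = P{(T, …, T, H)} = (1-p)^(n-1) p, n = 1, 2, 3, …
and
Σ_{n=1}^{∞} P{N = n} = p Σ_{n=1}^{∞} (1-p)^(n-1) = p / (1 - (1-p)) = 1 (for p > 0)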
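The coin example can be checked numerically. Independence of the flips gives P{N = n} = (1-p)^(n-1) p, since the first heads on flip n requires n-1 tails followed by one head. The sketch below sums these probabilities; the value p = 0.3 and the name `first_heads_pmf` are illustrative assumptions.

```python
def first_heads_pmf(n, p):
    """P{N = n}: n - 1 tails followed by one head, flips independent."""
    return (1 - p) ** (n - 1) * p

p = 0.3  # illustrative value, not from the slides
# The tail of the geometric series beyond n = 100 is (1 - p)**100, which is negligible here.
partial = sum(first_heads_pmf(n, p) for n in range(1, 101))
print(partial)
```

The partial sum approaches 1 as more terms are added, as the geometric series requires.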
7
An Experiment
Lifetime of a battery, S = [0,∞)
Will it last at least two years?
E: event that the battery lasts two or more years
Random variable I:
I = 1 if E occurs (lifetime ≥ 2 years), I = 0 otherwise
I is called the indicator random variable for event E
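An indicator variable maps each outcome to 1 or 0, so averaging it over observed outcomes gives the relative frequency of the event. The sketch below assumes a handful of made-up lifetimes; neither the data nor the helper name `indicator` comes from the slides.

```python
def indicator(lifetime_years, threshold=2.0):
    """I = 1 if the battery lasts at least `threshold` years, else 0."""
    return 1 if lifetime_years >= threshold else 0

# Hypothetical observed lifetimes in years (made-up values for illustration).
lifetimes = [0.8, 2.0, 3.1, 1.9, 2.7]
values = [indicator(t) for t in lifetimes]
fraction = sum(values) / len(values)  # relative frequency of event E
print(values, fraction)
```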
8
What can we say about a Random Variable ?
Discrete Random Variable: Set of possible values is countable (or even finite)
Continuous Random Variable: Set of possible values is non-denumerable
We need terminology, ways to describe them more to the point …
What are characteristics of a random variable ?
9
Cumulative Distribution Function
Definition: The cumulative distribution function (cdf, also distribution function, probability distribution function) F_X(b) of random variable X for any real number b, -∞ < b < ∞, is
F_X(b) = P{X ≤ b}
Example: the life of a car is within some interval (a, b)
Properties of cdf:
F(b) is a nondecreasing function of b
F(b) → 1 as b → ∞
F(b) → 0 as b → -∞
Omit subscript X if clear
10
Cumulative Distribution Function
Figure panels: Discrete, Continuous, Mixed
Note: cdfs are always of similar shape, although variables differ a lot …
11
Properties of CDF
All probability questions about X can be answered in terms of the cdf F(.)
Example:
P{a < X ≤ b} = F(b) - F(a) for all a < b
The probability that X is strictly smaller than b is
P{X < b} = lim_{h→0+} P{X ≤ b - h} = lim_{h→0+} F(b - h)
Note that P{X < b} does not necessarily equal F(b), since F(b) also includes the probability that X equals b
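The difference between P{X < b} and F(b) = P{X ≤ b} is easiest to see for a discrete variable. The sketch below assumes X is the outcome of one fair six-sided die (an illustrative choice) and also uses the identity P{a < X ≤ b} = F(b) - F(a).

```python
from fractions import Fraction

# pmf of one fair six-sided die (illustrative choice)
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(b):
    """F(b) = P{X <= b}."""
    return sum(prob for x, prob in pmf.items() if x <= b)

def prob_less_than(b):
    """P{X < b}, which excludes the mass at b itself."""
    return sum(prob for x, prob in pmf.items() if x < b)

print(cdf(3))             # 1/2, includes P{X = 3}
print(prob_less_than(3))  # 1/3, excludes P{X = 3}
print(cdf(5) - cdf(2))    # 1/2, equals P{2 < X <= 5}
```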
12
Probability Mass Function of a Discrete Random Variable
For a discrete random variable X, we define the probability mass function (pmf) p_X(a) of X by
p(a) = P{X = a}
If X must assume one of the values x1, x2, … (at most countably many), then
p(xi) > 0 for i = 1, 2, … and p(x) = 0 for all other values of x
Since X must take on one of the values xi,
Σ_i p(xi) = 1
13
Discrete Random Variables
The cumulative distribution function F can be expressed in terms of p(a) by
F(a) = Σ_{all xi ≤ a} p(xi)
Example: suppose X has a probability mass function given by:
Then the cumulative distribution function F of X is
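As a sketch of how a cdf is built from a pmf, the snippet below accumulates the probabilities in order of the values, which yields a step function that jumps by p(xi) at each xi. The pmf values p(1) = 1/2, p(2) = 1/3, p(3) = 1/6 are an assumption chosen for illustration.

```python
from fractions import Fraction
from itertools import accumulate

# Illustrative pmf; these values are an assumption for the example.
values = [1, 2, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

# Running sums give F at each jump point x_i.
cdf_at_jumps = list(accumulate(probs))

def F(a):
    """Step-function cdf: sum of p(x_i) over all x_i <= a."""
    result = Fraction(0)
    for x, c in zip(values, cdf_at_jumps):
        if x <= a:
            result = c
    return result

for a in [0, 1, 2.5, 3]:
    print(a, F(a))
```

With these values F is 0 below 1, 1/2 on [1, 2), 5/6 on [2, 3), and 1 from 3 onward.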
14
Bernoulli Random Variable
Suppose an experiment is either a success or a failure. Let X = 1 if the outcome is a success and X = 0 if it is a failure. Then the probability mass function of X is
p(0) = P{X = 0} = 1 - p
p(1) = P{X = 1} = p
where p is the probability of success.
X is said to be a Bernoulli random variable if its probability mass function is given by the above equations for some p between 0 and 1.
15
Binomial Random Variable
n independent trials of a Bernoulli experiment, each with a probability of success of p and a failure probability of 1-p.
Let X be the number of successes that occur in n trials.
X is a binomial random variable with parameters (n, p).
The probability mass function of X is
p(i) = C(n, i) p^i (1-p)^(n-i), i = 0, 1, …, n
where C(n, i) = n! / ((n-i)! i!) is the number of different groups of i objects that can be chosen out of n objects
16
Combinatorics
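Let's do a consistency check (C(n, i) denotes the binomial coefficient):
Σ_{i=0}^{n} p(i) = Σ_{i=0}^{n} C(n, i) p^i (1-p)^(n-i) = (p + (1-p))^n = 1
Uses the binomial theorem: (x + y)^n = Σ_{i=0}^{n} C(n, i) x^i y^(n-i)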
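The consistency check can also be done numerically. The sketch below evaluates the binomial pmf p(i) = C(n, i) p^i (1-p)^(n-i) and sums it over i = 0, …, n; the parameters n = 10, p = 0.25 and the name `binomial_pmf` are illustrative assumptions.

```python
from math import comb

def binomial_pmf(i, n, p):
    """P{X = i} for a binomial random variable with parameters (n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

n, p = 10, 0.25  # illustrative parameters
total = sum(binomial_pmf(i, n, p) for i in range(n + 1))
print(total)  # equals 1 up to floating-point rounding
```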
17
A combinatorics refresher
Let's consider n objects you can choose from, where r is the number to be chosen.
Permutation (order matters)
Number of permutations with repetition: n^r
Number of permutations without repetition: n! / (n-r)!
Combination (order does not matter)
Number of combinations without repetition: n! / (r! (n-r)!)
(choose r=5 out of n=10 different objects: 10! / (5! 5!) = 252)
Number of combinations with repetition: (n+r-1)! / (r! (n-1)!)
(choose r=3 objects of n=10 different types of objects: 12! / (3! 9!) = 220)
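The four counting rules can be evaluated with Python's standard library (`math.perm` and `math.comb`, available from Python 3.8). This is a small sketch using the n and r values from the two examples above; the variable names are made up for illustration.

```python
from math import comb, perm

n, r = 10, 5

perms_with_rep = n ** r          # ordered, repetition allowed: n^r
perms_without_rep = perm(n, r)   # ordered, no repetition: n! / (n-r)!
combs_without_rep = comb(n, r)   # unordered, no repetition: n! / (r! (n-r)!)

# Unordered with repetition: (n+r-1)! / (r! (n-1)!), here with n = 10 types and r = 3.
combs_with_rep = comb(10 + 3 - 1, 3)

print(perms_with_rep)     # 100000
print(perms_without_rep)  # 30240
print(combs_without_rep)  # 252
print(combs_with_rep)     # 220
```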
18
Example: Binomial random variable
A machine produces items, occasionally with defects. Probability of a defective item: 0.1. Defects are independent of each other.
What is the probability that from a sample of three items at most one will be defective?
Let X be the number of defective items in the sample; then X is a binomial random variable with parameters (3, 0.1), and
P{X ≤ 1} = P{X = 0} + P{X = 1} = C(3,0) (0.1)^0 (0.9)^3 + C(3,1) (0.1)^1 (0.9)^2 = 0.729 + 0.243 = 0.972
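The defect example also lends itself to a simulation check. The sketch below computes P{X ≤ 1} exactly from the binomial pmf and compares it with a Monte Carlo estimate; the trial count, the seed, and the function names are illustrative assumptions.

```python
import random
from math import comb

def exact_at_most_one(n=3, q=0.1):
    """P{X <= 1} for X binomial(n, q): sum of the pmf at 0 and 1."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(2))

def simulate_at_most_one(trials=100_000, n=3, q=0.1, seed=0):
    """Estimate P{X <= 1} by drawing `trials` samples of n independent items."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each item is defective independently with probability q.
        defects = sum(rng.random() < q for _ in range(n))
        if defects <= 1:
            hits += 1
    return hits / trials

exact = exact_at_most_one()
estimate = simulate_at_most_one()
print(exact, estimate)
```

The estimate should land close to the exact value 0.972, with the gap shrinking as the number of trials grows.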
19
Another Example
Failures of airplane engines
Let 1-p be the probability that the i-th engine fails (in mid-flight), so each engine works with probability p.
Assumption: engine failures are independent.
The airplane is operational if at least half of its engines work.
Failures are independent => the number of engines remaining operative is a binomial random variable.
Question: For what values of p is a four-engine plane preferable to a two-engine plane?
Probability that a four-engine plane makes a successful flight:
C(4,2) p^2 (1-p)^2 + C(4,3) p^3 (1-p) + C(4,4) p^4 = 6 p^2 (1-p)^2 + 4 p^3 (1-p) + p^4
Probability that a two-engine plane makes a successful flight:
C(2,1) p (1-p) + C(2,2) p^2 = 2 p (1-p) + p^2
20
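Another Example Continued
So the four-engine plane is safer if
6 p^2 (1-p)^2 + 4 p^3 (1-p) + p^4 ≥ 2 p (1-p) + p^2
Dividing by p and simplifying gives 3 p^3 - 8 p^2 + 7 p - 2 ≥ 0, i.e. (p-1)^2 (3p-2) ≥ 0, which holds when 3p - 2 ≥ 0.
The four-engine plane is thus safer when the engine success probability is at least as large as 2/3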
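The 2/3 threshold can be checked numerically. The sketch below computes the probability that at least half of the engines work, assuming each engine works independently with probability p, and compares the four-engine and two-engine planes at a few values of p. The function name `success_probability` and the chosen test values are illustrative.

```python
from math import comb

def success_probability(engines, p):
    """P(at least half of the engines work), engines independent."""
    need = (engines + 1) // 2  # smallest count that is at least half
    return sum(
        comb(engines, k) * p**k * (1 - p) ** (engines - k)
        for k in range(need, engines + 1)
    )

for p in [0.5, 2 / 3, 0.9]:
    four = success_probability(4, p)
    two = success_probability(2, p)
    print(f"p={p:.3f}  four-engine={four:.4f}  two-engine={two:.4f}")
```

Below p = 2/3 the two-engine plane comes out ahead, at p = 2/3 both probabilities equal 8/9, and above it the four-engine plane is safer.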
21
Summary
Random Variable
Definition
PMF p() and CDF F()
Indicator variable
Bernoulli variable
Binomial variable and Binomial distribution
Combinatorics refresher
Is that all ?
No, there are a few more distributions of random variables we should look at in the near future …