cs626 data analysis and simulation - cs.wm.edu

CS626 Data Analysis and Simulation

Today: Probability Primer

Quick reference: Sheldon Ross, Introduction to Probability Models, 9th edition, Academic Press, Ch. 1; Berthold and Hand (eds.), Intelligent Data Analysis, Springer, 1999, Chapter 2 (by Feelders): Statistical Concepts.

Instructor: Peter Kemper, R 104A, phone 221-3462

Overview

- Random variable: definition
- PMF p() and CDF F()
- Indicator variable
- Bernoulli variable
- Binomial variable and binomial distribution
- Combinatorics refresher

Random variable

The sample space S can contain any type of elements. Often we are interested in numerical values that are a function of the outcome s ∈ S rather than in events E ⊆ S. Such functions are called random variables. They provide a means to develop methods for experiments that yield numbers, and they give a compact description of an experiment (S may be too fine-grained).

Concerns: Probabilities of elements of S imply probabilities on the elements of the range of a random variable. The range can be a discrete set of values or continuous. In the continuous setting, P(X ≤ x) needs to be well defined, i.e., { s | X(s) ≤ x } must be an event (measurable).

Random variable

Definition (special case): A random variable X on a discrete sample space S is a function X : S → R that assigns a real number X(s) to each sample point s ∈ S.

Definition (general case): A random variable X on a probability space (S, F, P) is a function X : S → R that assigns a real number X(s) to each sample point s ∈ S such that, for every real number x, the set of sample points {s | X(s) ≤ x} is an event, that is, a member of F.

Quick Example

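Let X denote the random variable defined as the sum of two fair dice. Then

$$P\{X=2\} = P\{(1,1)\} = \tfrac{1}{36}, \quad P\{X=3\} = P\{(1,2),(2,1)\} = \tfrac{2}{36}, \quad P\{X=4\} = P\{(1,3),(2,2),(3,1)\} = \tfrac{3}{36},$$
$$P\{X=5\} = \tfrac{4}{36}, \quad P\{X=6\} = \tfrac{5}{36}, \quad P\{X=7\} = \tfrac{6}{36}, \quad P\{X=8\} = \tfrac{5}{36}, \quad P\{X=9\} = \tfrac{4}{36},$$
$$P\{X=10\} = \tfrac{3}{36}, \quad P\{X=11\} = \tfrac{2}{36}, \quad P\{X=12\} = P\{(6,6)\} = \tfrac{1}{36}.$$

Since X must take one of the values 2 through 12,

$$1 = P\Big(\bigcup_{n=2}^{12} \{X = n\}\Big) = \sum_{n=2}^{12} P\{X = n\}.$$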
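The pmf above can be checked by brute force: enumerate all 36 equally likely outcomes, map each to X(s) = die1 + die2, and count. A minimal sketch using exact fractions (printed in lowest terms, e.g. 2/36 shows as 1/18):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely sample points (die1, die2) of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# X(s) = die1 + die2; count how many sample points map to each value.
counts = Counter(d1 + d2 for d1, d2 in outcomes)

# pmf: P{X = n} = (#sample points with sum n) / 36
pmf = {n: Fraction(c, len(outcomes)) for n, c in sorted(counts.items())}

for n, prob in pmf.items():
    print(f"P{{X={n}}} = {prob}")

# The events {X = n}, n = 2..12, partition S, so their probabilities sum to 1.
assert sum(pmf.values()) == 1
```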

Another Example

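A coin has P(H) = p. Toss it until the first heads appears, and let N denote the number of flips required. Assumption: outcomes of successive flips are independent. N is a random variable with range {1, 2, 3, …} and

$$P\{N=1\} = P\{H\} = p, \quad P\{N=2\} = P\{(T,H)\} = (1-p)\,p, \quad \dots, \quad P\{N=n\} = (1-p)^{n-1} p, \quad n \ge 1,$$

and, for 0 < p ≤ 1,

$$\sum_{n=1}^{\infty} P\{N=n\} = p \sum_{n=1}^{\infty} (1-p)^{n-1} = \frac{p}{1-(1-p)} = 1.$$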
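A short numerical sketch of this pmf: the partial sums of P{N = n} approach 1, matching the geometric-series argument. The value p = 0.3 is a hypothetical choice for illustration only.

```python
def geometric_pmf(n: int, p: float) -> float:
    """P{N = n}: the first n-1 flips are tails and flip n is heads (independent flips)."""
    if n < 1:
        return 0.0
    return (1 - p) ** (n - 1) * p

p = 0.3  # hypothetical heads probability, chosen only for illustration

# Partial sum over n = 1..100; analytically this equals 1 - (1 - p)**100.
partial = sum(geometric_pmf(n, p) for n in range(1, 101))
print(partial)  # approaches 1 as more terms are included
```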

An Experiment

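Lifetime of a battery (in years), S = [0, ∞). Will it last at least two years?

E: event that the battery lasts two or more years, i.e. E = [2, ∞). Random variable I:

$$I(s) = \begin{cases} 1 & \text{if } s \in E \text{ (i.e., } s \ge 2\text{)} \\ 0 & \text{otherwise} \end{cases}$$

I is called the indicator random variable for event E.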
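To see the indicator at work, the sketch below applies I to simulated lifetimes; the relative frequency of I = 1 estimates P(E). The exponential lifetime model with mean 3 years is a hypothetical assumption (the slide does not specify a distribution); under it, P(E) = e^{-2/3} exactly.

```python
import math
import random

def indicator_E(lifetime_years: float) -> int:
    """I(s) = 1 if the battery lasts at least two years (s in E), else 0."""
    return 1 if lifetime_years >= 2.0 else 0

# Hypothetical assumption: lifetimes are exponential with mean 3 years.
MEAN_YEARS = 3.0
rng = random.Random(42)  # fixed seed for reproducibility
samples = [rng.expovariate(1.0 / MEAN_YEARS) for _ in range(100_000)]

# Relative frequency of {I = 1} over many samples estimates P(E).
estimate = sum(indicator_E(s) for s in samples) / len(samples)
exact = math.exp(-2.0 / MEAN_YEARS)  # P{lifetime >= 2} under the exponential assumption
print(f"estimated P(E) = {estimate:.4f}, exact = {exact:.4f}")
```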

What can we say about a random variable?

Discrete random variable: the set of possible values is countable (possibly finite).

Continuous random variable: the set of possible values is uncountable (non-denumerable).

We need terminology and more precise ways to describe random variables.

What are the characteristics of a random variable?

Cumulative Distribution Function

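Definition: The cumulative distribution function (cdf, also called the distribution function or probability distribution function) F_X(b) of a random variable X is defined for any real number b by

$$F_X(b) = P\{X \le b\}.$$

Example: the probability that the life of a car lies within some interval (a, b] is $P\{a < X \le b\} = F(b) - F(a)$.

Properties of the cdf:
- F is nondecreasing: a < b implies F(a) ≤ F(b)
- $\lim_{b \to \infty} F(b) = 1$
- $\lim_{b \to -\infty} F(b) = 0$
- F is right-continuous

Omit the subscript X if it is clear from context.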
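A small sketch of the interval rule P{a < X ≤ b} = F(b) − F(a) for the car-life example. The exponential cdf and the mean life of 10 years are hypothetical assumptions made only so that F has a concrete formula.

```python
import math

def exp_cdf(b: float, rate: float) -> float:
    """F(b) = P{X <= b} for an exponential random variable with the given rate."""
    return 1.0 - math.exp(-rate * b) if b >= 0 else 0.0

RATE = 1.0 / 10.0  # hypothetical: mean car life of 10 years

def prob_interval(a: float, b: float, rate: float) -> float:
    """P{a < X <= b} = F(b) - F(a), valid for any cdf and a < b."""
    return exp_cdf(b, rate) - exp_cdf(a, rate)

# Probability that the car's life lies in (5, 15] years under these assumptions.
print(prob_interval(5.0, 15.0, RATE))
```

The same two-line rule works for any cdf; only `exp_cdf` would change.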

Cumulative Distribution Function

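[Figure: example cdfs of a discrete, a continuous, and a mixed random variable]

Note that all cdfs share the same basic shape (nondecreasing, rising from 0 to 1), although the underlying random variables can differ a lot.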

Properties of CDF

All probability questions about X can be answered in terms of the cdf F(·). Example:

The probability that X is strictly smaller than b is

$$P\{X < b\} = \lim_{h \to 0^+} P\{X \le b - h\} = \lim_{h \to 0^+} F(b - h).$$

Note that P{X < b} does not necessarily equal F(b), since F(b) also includes the probability that X equals b.
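For the two-dice sum from the earlier example, the difference between P{X < b} and F(b) is exactly the mass at b. A minimal sketch with exact fractions; evaluating F just below b illustrates the left limit in the formula above.

```python
from fractions import Fraction
from itertools import product

# pmf of the sum of two fair dice, as exact fractions
pmf: dict[int, Fraction] = {}
for d1, d2 in product(range(1, 7), repeat=2):
    s = d1 + d2
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)

def cdf(b: float) -> Fraction:
    """F(b) = P{X <= b}."""
    return sum((q for x, q in pmf.items() if x <= b), Fraction(0))

def prob_less_than(b: float) -> Fraction:
    """P{X < b}: excludes the probability mass at b itself."""
    return sum((q for x, q in pmf.items() if x < b), Fraction(0))

print(cdf(7), prob_less_than(7))  # 7/12 5/12
```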

Probability Mass Function of a Discrete Random Variable

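For a discrete random variable X, we define the probability mass function (pmf) p_X(a) of X by

$$p_X(a) = P\{X = a\}.$$

If X must assume one of the values x_1, x_2, … (at most countably many), then

$$p(x_i) \ge 0 \ \text{ for } i = 1, 2, \dots, \qquad p(x) = 0 \ \text{ for all other values of } x.$$

Since X must take on one of the values x_i,

$$\sum_{i=1}^{\infty} p(x_i) = 1.$$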

Discrete Random Variables

The cumulative distribution function F can be expressed in terms of p(a) by

$$F(a) = \sum_{\text{all } x_i \le a} p(x_i).$$

Example: suppose X has a probability mass function given by:

Then the cumulative distribution function F of X is
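As an illustration of F(a) = Σ_{x_i ≤ a} p(x_i), the sketch below builds the step-shaped cdf from a pmf. The pmf values p(1) = 1/2, p(2) = 1/3, p(3) = 1/6 are hypothetical and chosen only for illustration; they are not recovered from the slide.

```python
from fractions import Fraction

# Hypothetical pmf used only for illustration (not taken from the slide).
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# A valid pmf: nonnegative values that sum to 1.
assert all(q >= 0 for q in pmf.values()) and sum(pmf.values()) == 1

def cdf(a: float) -> Fraction:
    """F(a) = sum of p(x_i) over all x_i <= a (a right-continuous step function)."""
    return sum((q for x, q in pmf.items() if x <= a), Fraction(0))

# F jumps by p(x_i) at each x_i and is flat in between.
for a in (0.5, 1, 1.5, 2, 2.5, 3, 4):
    print(a, cdf(a))
```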

Bernoulli Random Variable

Suppose an experiment has two outcomes, success or failure. Let X = 1 if the outcome is a success and X = 0 if it is a failure. Then the probability mass function of X is

$$p(0) = P\{X = 0\} = 1 - p, \qquad p(1) = P\{X = 1\} = p,$$

where p is the probability of success.

X is said to be a Bernoulli random variable if its probability mass function is given by the above equation for some p with 0 ≤ p ≤ 1.
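A minimal sketch of the Bernoulli pmf together with a sampler; the relative frequency of successes should be close to p. The success probability 0.25 and the seed are hypothetical choices.

```python
import random

def bernoulli_pmf(x: int, p: float) -> float:
    """p(1) = p, p(0) = 1 - p, and 0 for any other value."""
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0

def bernoulli_sample(p: float, rng: random.Random) -> int:
    """Return 1 (success) with probability p, else 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(7)  # fixed seed for reproducibility
p = 0.25  # hypothetical success probability
draws = [bernoulli_sample(p, rng) for _ in range(50_000)]
print(sum(draws) / len(draws))  # relative frequency of successes, close to p
```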

Binomial Random Variable

Consider n independent trials of a Bernoulli experiment, each with success probability p and failure probability 1 − p. Let X be the number of successes that occur in the n trials. Then X is a binomial random variable with parameters (n, p), and its probability mass function is

$$p(i) = \binom{n}{i} p^i (1-p)^{n-i}, \qquad i = 0, 1, \dots, n,$$

where

$$\binom{n}{i} = \frac{n!}{(n-i)!\, i!}$$

is the number of different groups of i objects that can be chosen out of n objects.
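The binomial pmf translates directly into code; Python's `math.comb(n, i)` computes the binomial coefficient. The parameters n = 5, p = 0.4 below are hypothetical.

```python
from math import comb

def binomial_pmf(i: int, n: int, p: float) -> float:
    """p(i) = C(n, i) * p**i * (1 - p)**(n - i) for i = 0, ..., n; 0 otherwise."""
    if not 0 <= i <= n:
        return 0.0
    return comb(n, i) * p**i * (1 - p) ** (n - i)

n, p = 5, 0.4  # hypothetical parameters
for i in range(n + 1):
    print(i, round(binomial_pmf(i, n, p), 4))
```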

Combinatorics

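Let's do a consistency check: the probabilities of a binomial random variable sum to 1,

$$\sum_{i=0}^{n} p(i) = \sum_{i=0}^{n} \binom{n}{i} p^i (1-p)^{n-i} = \big(p + (1-p)\big)^n = 1.$$

This uses the binomial theorem, $(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$.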
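The consistency check can also be confirmed with exact rational arithmetic, which rules out floating-point rounding as the reason the sum looks like 1. A small sketch over a few hypothetical (n, p) pairs:

```python
from fractions import Fraction
from math import comb

def binomial_total(n: int, p: Fraction) -> Fraction:
    """Sum over i = 0..n of C(n, i) p^i (1-p)^(n-i); by the binomial theorem this is (p + (1-p))^n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1))

for n in (1, 3, 10):
    for p in (Fraction(1, 10), Fraction(1, 2), Fraction(2, 3)):
        assert binomial_total(n, p) == 1  # exact arithmetic, no rounding error
print("all totals equal 1")
```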

A combinatorics refresher

Consider n objects to choose from, of which r are to be chosen.

Permutations (order matters):
- Number of permutations with repetition: $n^r$
- Number of permutations without repetition: $\frac{n!}{(n-r)!}$

Combinations (order does not matter):
- Number of combinations without repetition: $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$; e.g. choosing r = 5 out of n = 10 different objects gives $\binom{10}{5} = 252$
- Number of combinations with repetition: $\binom{n+r-1}{r} = \frac{(n+r-1)!}{r!\,(n-1)!}$; e.g. choosing r = 3 objects of n = 10 different types gives $\binom{12}{3} = 220$
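All four counts are available through Python's standard `math` module (`perm` and `comb` need Python 3.8 or later). A quick sketch, cross-checked against the factorial formulas and reproducing the two examples:

```python
from math import comb, factorial, perm

n, r = 10, 5

perm_with_rep = n ** r                # ordered, repetition allowed
perm_without_rep = perm(n, r)         # ordered, no repetition: n!/(n-r)!
comb_without_rep = comb(n, r)         # unordered, no repetition: n!/(r!(n-r)!)
comb_with_rep = comb(n + r - 1, r)    # unordered, repetition allowed

# Cross-check the library functions against the factorial formulas.
assert perm_without_rep == factorial(n) // factorial(n - r)
assert comb_without_rep == factorial(n) // (factorial(r) * factorial(n - r))

print(perm_with_rep, perm_without_rep, comb_without_rep, comb_with_rep)
print(comb(10, 5), comb(10 + 3 - 1, 3))  # the two examples: 252 220
```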

Example: Binomial random variable

A machine produces items, occasionally with defects. The probability that an item is defective is 0.1, and defects occur independently of each other. What is the probability that in a sample of three items at most one is defective?

Let X be the number of defective items in the sample; then X is a binomial random variable with parameters (3, 0.1), and

$$P\{X \le 1\} = P\{X=0\} + P\{X=1\} = \binom{3}{0}(0.1)^0(0.9)^3 + \binom{3}{1}(0.1)^1(0.9)^2 = 0.729 + 0.243 = 0.972.$$
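The same computation in code, as a minimal sketch; it also checks the complement route 1 − P{X ≥ 2}.

```python
from math import comb

def binomial_pmf(i: int, n: int, p: float) -> float:
    """P{X = i} for a binomial random variable with parameters (n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

n, p = 3, 0.1  # three sampled items, defect probability 0.1
p_at_most_one = binomial_pmf(0, n, p) + binomial_pmf(1, n, p)

# Complement check: P{X <= 1} = 1 - P{X = 2} - P{X = 3}.
p_complement = 1 - binomial_pmf(2, n, p) - binomial_pmf(3, n, p)

print(round(p_at_most_one, 3))  # 0.972
```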

Another Example

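Failures of airplane engines. Let $F_i$ denote the event that the i-th engine fails in mid-flight, with $P(F_i) = 1 - p$, so each engine works with probability p. Assumptions: engine failures are independent, and an airplane is operational if at least half of its engines work. Since failures are independent, the number of engines that remain operative in an n-engine plane is a binomial random variable with parameters (n, p).

Question: For what values of p is a four-engine plane preferable to a two-engine plane?

Probability that a four-engine plane makes a successful flight:

$$\binom{4}{2}p^2(1-p)^2 + \binom{4}{3}p^3(1-p) + \binom{4}{4}p^4 = 6p^2(1-p)^2 + 4p^3(1-p) + p^4$$

Probability that a two-engine plane makes a successful flight:

$$\binom{2}{1}p(1-p) + \binom{2}{2}p^2 = 2p(1-p) + p^2$$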

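Another Example Continued

So the four-engine plane is safer if

$$6p^2(1-p)^2 + 4p^3(1-p) + p^4 \ge 2p(1-p) + p^2.$$

Expanding both sides and dividing by p > 0, this simplifies to

$$3p^3 - 8p^2 + 7p - 2 \ge 0 \iff (p-1)^2(3p-2) \ge 0 \iff 3p - 2 \ge 0 \quad (\text{for } p < 1).$$

The four-engine plane is thus at least as safe as the two-engine plane when the engine success probability p is at least 2/3.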
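A numerical sketch of the comparison: compute the success probability of an n-engine plane directly from the binomial pmf and compare the difference with the factored form p(p − 1)²(3p − 2) obtained above. The grid of p values is an arbitrary choice for illustration.

```python
from math import comb

def prob_success(n_engines: int, p: float) -> float:
    """P{at least half of the n engines work}; engines independent, each works w.p. p."""
    need = (n_engines + 1) // 2  # smallest k with k >= n/2
    return sum(
        comb(n_engines, k) * p**k * (1 - p) ** (n_engines - k)
        for k in range(need, n_engines + 1)
    )

for p in (0.5, 0.6, 0.7, 0.9):
    four, two = prob_success(4, p), prob_success(2, p)
    # LHS - RHS of the inequality equals p (p - 1)^2 (3p - 2).
    factored = p * (p - 1) ** 2 * (3 * p - 2)
    print(f"p={p}: four={four:.4f}, two={two:.4f}, diff={four - two:+.4f}, factored={factored:+.4f}")
```

The sign of the difference flips at p = 2/3, matching the algebra.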

Summary

- Random variable: definition
- PMF p() and CDF F()
- Indicator variable
- Bernoulli variable
- Binomial variable and binomial distribution
- Combinatorics refresher

Is that all?

No, there are a few more distributions of random variables we will look at in the near future …