Continuous Random Variables - Eric F. Lock (ericfrazerlock.com/continuous_rvs.pdf)


Continuous Random Variables

PUBH 7401: Fundamentals of Biostatistical Inference

Eric F. Lock

UMN Division of Biostatistics, SPH

[email protected]

09/27/2018

PUBH 7401: Fundamentals of Biostatistical Inference Continuous Random Variables


Types of Random Variables

Discrete Random Variable

The support of a discrete random variable either
1. constitutes a finite set, or
2. is an infinite sequence of values in which there is a first element, a second element, and so on (i.e., not an interval)

E.g., D = {0, 1/2, 1} or D = {0, 1, 2, 3, . . . }

Continuous Random Variable

A random variable is continuous if both of the following apply:
1. Its support consists of all numbers in a single interval on the number line
2. No specific value of the variable has positive probability, that is, P(X = c) = 0 for any single value c.

E.g., D = [0, 1]: all values between 0 and 1

E.g., D = (−∞, ∞): any real number


Failure of the pmf for Continuous RVs

- Why can’t we use the same principles that we used for discrete random variables?
- In particular, why can’t we assign a probability to each possible value x?
- If the number of possible values of x is uncountable, there exists no pmf with p(x) > 0 for every x ∈ D such that $\sum_{x \in D} p(x) < \infty$, where D is the support


Motivating Example: Heights

- Suppose we are interested in the heights in inches of all male students at the U
- Suppose we measured everyone’s height and did so to 10 decimal places
- We could provide the proportion of subjects at each recorded height (pmf)
- Likely there would only be one (maybe two) people at each height
- This would not be a very informative summary measure


Density Histogram

[Figure: first 500 observations; x-axis: Height (inches), 65 to 75; y-axis: Probability]


Frequency Histogram

- Instead we could provide the number of students within certain height ranges (from which you could calculate the proportion within each range)
- The height of each bar is proportional to the probability of a height in each range
- The scale of the y-axis changes based on the “bin sizes”, or size of the range
- If we divide each bin size in half, the height of the bars decreases by a factor of 2, on average
- As the bin size gets smaller and smaller, the histogram converges to zero almost everywhere


Frequency Histogram

[Figure: four frequency histogram panels; x-axis: height, 60 to 80; y-axis: Frequency, 0 to 8000]


Density Histogram

- A more informative summary measure would be the density of points
- The solution to that problem is to make the area of each bar proportional to the probability
- Area = length × height. Here length is the bin width, so height must be probability/(bin width).
- In statistics, density = probability/(length of interval)
- The smooth curve which is the limit as the bin size goes to zero is known as the probability density function
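The bin-width argument can be sketched numerically. The block below is a minimal illustration, assuming simulated heights drawn from a normal distribution with mean 70 and standard deviation 3 inches (an assumption made here only for illustration; the data behind the slides are not available). The helper `histogram_summary` is a hypothetical name introduced for this sketch.

```python
import random

random.seed(0)
# Simulated heights in inches. The normal(70, 3) model is an illustrative
# assumption, not the data behind the slides.
heights = [random.gauss(70.0, 3.0) for _ in range(10_000)]


def histogram_summary(data, bin_width, lo=55.0, hi=85.0):
    """Return (counts, densities) for equal-width bins covering [lo, hi)."""
    n_bins = round((hi - lo) / bin_width)
    counts = [0] * n_bins
    for value in data:
        if lo <= value < hi:
            index = min(int((value - lo) / bin_width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    # density = (proportion of observations in the bin) / (bin width)
    densities = [count / (total * bin_width) for count in counts]
    return counts, densities


for width in (1.0, 0.5):
    counts, densities = histogram_summary(heights, width)
    mean_count = sum(counts) / len(counts)
    total_area = sum(d * width for d in densities)
    print(f"bin width {width}: mean count {mean_count:.1f}, total area {total_area:.3f}")
```

Halving the bin width halves the average bar height of the frequency histogram, while the density histogram keeps a total area of 1 at either bin width.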


Density Histogram

[Figure: four density histogram panels; x-axis: height, 60 to 80; y-axis: Density, 0.00 to 0.15]


Definition of Probability Density Function

Probability Density Function (pdf)

Let X be a continuous rv. Then a probability distribution or probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a < b

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

- The probability that X takes on a value in the interval [a, b] is the area under the graph of the density function f(x).
- An implication of this is that P(X = c) = 0 (or more colloquially, P(X = c) approaches zero)


Heuristic for Probability Density

- Think of a pdf as a “smoothed” density histogram
- The density is NOT the probability
- The density does describe the relative likelihood of the values in the support of the RV
- If ε is small, f(x) × ε ≈ the probability of an observation falling in an interval of width ε around the point x.
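The last heuristic can be checked numerically. The sketch below assumes a standard normal density purely as a convenient example (the slides do not name a distribution here) and compares f(x) × ε with the exact probability of an interval of width ε centered at x, computed from the cdf via `math.erf`.

```python
import math


def normal_pdf(x):
    """Standard normal density, used here only as a convenient example."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cdf written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


x = 1.0
for eps in (0.5, 0.1, 0.01):
    # Exact probability of the interval [x - eps/2, x + eps/2]
    exact = normal_cdf(x + eps / 2) - normal_cdf(x - eps / 2)
    approx = normal_pdf(x) * eps
    print(f"eps={eps}: exact={exact:.6f}, f(x)*eps={approx:.6f}")
```

The two columns agree more closely as ε shrinks, which is the sense in which the density describes probability per unit length rather than probability itself.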


Review of Integration

[Figure: “Integration”; a curve f(x) plotted against x, with x from −3 to 3 and f(x) from 0.0 to 0.4]

The definite integral of a function f(x) evaluated from a to b, which we denote as $\int_a^b f(x)\,dx$, gives the area under the curve.

a and b tell us over what interval to evaluate the area; dx tells us with respect to which variable to take the integral.


Review of Integration: Summation

- Think of integration in terms of summation (technically, the limit of a sum).
- Let $x_1, x_2, \ldots, x_{n+1}$ be an increasing sequence of points running from a to b
- $\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} f(x_i)(x_{i+1} - x_i)$
- Think of the integral $\int$ as a sum $\sum$ of the function f(x) evaluated over a sequence of equally spaced points between a and b, and dx as the distance between successive points in the sequence
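The sum above can be sketched in a few lines. This is a minimal illustration with equally spaced points; the function name `riemann_sum` and the test integrand x² are choices made here, not part of the slides.

```python
def riemann_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] by sum_i f(x_i) * (x_{i+1} - x_i).

    Uses n equally spaced left endpoints x_i = a + i * dx.
    """
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))


# Example: the integral of x^2 over [0, 1] is exactly 1/3.
for n in (10, 100, 1000):
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n))
```

As n grows, the printed values approach 1/3, which is the "limit of a sum" view of the integral.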


Properties of the pdf

- f(x) ≥ 0 for all x ∈ (−∞, +∞)
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
- Is f(x) ≤ 1 for all x ∈ (−∞, +∞)?
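One way to probe the last question is with a uniform density on a short interval. The example below is an illustration added here, not taken from the slides.

```latex
f(x) =
\begin{cases}
  2 & \text{if } 0 \le x \le 1/2 \\
  0 & \text{otherwise}
\end{cases}
\qquad\Longrightarrow\qquad
\int_{-\infty}^{\infty} f(x)\,dx = 2 \cdot \tfrac{1}{2} = 1
```

Here f(x) ≥ 0 and the total area is 1, yet f(x) = 2 > 1 on the support, so a density need not be bounded by 1.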


Wind turbine example: 4.4

Let X denote the vibratory stress (psi) on a wind turbine blade at a particular wind speed in a wind tunnel. As a model for the distribution of X we use the Rayleigh distribution with pdf

$$f(x; \theta) = \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}, \quad \text{for } x > 0$$

- Verify that f(x; θ) is a legitimate pdf
- Suppose θ = 100. What is the probability that X is at most 200? Less than 200?
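A numerical check of both parts can be sketched as follows. This is not the analytic verification the exercise asks for; it assumes a simple midpoint rule (the helper `integrate` is a hypothetical name) and uses the finite upper limit 2000 as a stand-in for infinity.

```python
import math


def rayleigh_pdf(x, theta):
    """Rayleigh density f(x; theta) = (x / theta^2) exp(-x^2 / (2 theta^2)) for x > 0."""
    if x <= 0:
        return 0.0
    return (x / theta**2) * math.exp(-x**2 / (2.0 * theta**2))


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


theta = 100.0
# The upper limit 2000 (20 * theta) stands in for infinity; the tail beyond it is negligible.
total_area = integrate(lambda x: rayleigh_pdf(x, theta), 0.0, 2000.0)
prob_at_most_200 = integrate(lambda x: rayleigh_pdf(x, theta), 0.0, 200.0)
print(total_area, prob_at_most_200)
```

The total area comes out close to 1, and the second value is close to 1 − e^{−2}. Because P(X = 200) = 0 for a continuous rv, "at most 200" and "less than 200" have the same probability.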


Example #1: Rayleigh Dist. Density Plot

[Figure: “Rayleigh Distribution”; x-axis: Vibratory Stress (psi), 0 to 500; y-axis: Density, 0.000 to 0.006]


GPA example

The grade point averages (GPAs) for graduating seniors at a college are distributed as a continuous random variable X with pdf

$$f(x) = k\{1 - (x - 3)^2\}, \quad 2 \le x \le 4$$

Find the value of k
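One possible worked solution, using the requirement that a pdf integrates to 1 and the substitution u = x − 3 (a choice made here for convenience):

```latex
1 = \int_2^4 k\{1 - (x-3)^2\}\,dx
  = k \int_{-1}^{1} (1 - u^2)\,du
  = k \left[ u - \frac{u^3}{3} \right]_{-1}^{1}
  = k \left( 2 - \frac{2}{3} \right)
  = \frac{4k}{3}
\qquad\Longrightarrow\qquad k = \frac{3}{4}
```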


Definition of Cumulative Distribution Function

Cumulative Distribution Function

The cumulative distribution function F(x) for a continuous rv X is defined for every number x by

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$$


cdf Example #1

Find the cdf of the Rayleigh distribution with pdf given by

$$f(x; \theta) = \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}, \quad \text{for } x > 0$$
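A sketch of one way to carry out the integration, using the substitution u = y²/(2θ²), so that du = (y/θ²) dy:

```latex
F(x) = \int_0^x \frac{y}{\theta^2} \exp\{-y^2/(2\theta^2)\}\,dy
     = \int_0^{x^2/(2\theta^2)} e^{-u}\,du
     = 1 - \exp\{-x^2/(2\theta^2)\}, \quad x > 0,
\qquad F(x) = 0 \ \text{for } x \le 0
```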


Properties of the cdf

- For all x ∈ (−∞, +∞), 0 ≤ F(x) ≤ 1
- $F(-\infty) = \lim_{x \to -\infty} P(X \le x) = 0$
- $F(\infty) = \lim_{x \to \infty} P(X \le x) = 1$
- If x < y then F(x) ≤ F(y)


Using the cdf to Determine Probabilities Between Two Values

- Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any numbers a ≤ b, P(a ≤ X ≤ b) = F(b) − F(a).
- Because P(X = c) = 0, P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)

[Figure: three panels showing a curve f(x) plotted against x, with x from −3 to 3 and f(x) from 0.0 to 0.4]


cdf Example #2

A family of pdfs that has been used to approximate the distribution of income, city population, and size of firms is the Pareto family. The family has two parameters, k and θ, where k > 0 and θ > 0, and the pdf is given by

$$f(x; k, \theta) = \frac{k\theta^k}{x^{k+1}}, \quad x \ge \theta$$

- Find the cdf.
- For θ = 1 and k = 2, find P(1 ≤ X ≤ 3)
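Integrating the pdf from θ to x gives F(x) = 1 − (θ/x)^k for x ≥ θ (and 0 below θ). The sketch below uses that closed form to compute P(1 ≤ X ≤ 3) as F(3) − F(1) and cross-checks it against a midpoint-rule integral of the pdf; the function names are hypothetical helpers introduced for illustration.

```python
def pareto_pdf(x, k, theta):
    """Pareto density k * theta^k / x^(k+1) for x >= theta, and 0 below theta."""
    if x < theta:
        return 0.0
    return k * theta**k / x ** (k + 1)


def pareto_cdf(x, k, theta):
    """Pareto cdf: F(x) = 1 - (theta / x)^k for x >= theta, and 0 below theta."""
    if x < theta:
        return 0.0
    return 1.0 - (theta / x) ** k


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


k, theta = 2, 1.0
prob_from_cdf = pareto_cdf(3.0, k, theta) - pareto_cdf(1.0, k, theta)
prob_from_pdf = integrate(lambda x: pareto_pdf(x, k, theta), 1.0, 3.0)
print(prob_from_cdf, prob_from_pdf)
```

Both routes give a value close to 8/9, since F(3) = 1 − (1/3)² and F(1) = 0.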


Obtaining the pdf from the cdf

- If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the derivative F′(x) exists, F′(x) = f(x)
- Let F(x) = 1 − exp(−λx). Find the density.
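A sketch of the differentiation, under the assumption (not stated on the slide) that λ > 0 and that this form of F applies for x ≥ 0, with F(x) = 0 for x < 0:

```latex
f(x) = F'(x) = \frac{d}{dx}\left\{ 1 - \exp(-\lambda x) \right\}
     = \lambda \exp(-\lambda x), \quad x > 0,
\qquad f(x) = 0 \ \text{for } x < 0
```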


Obtaining the pdf from the cdf

Find the density of the cdf given by:

$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x^2/4 & \text{if } 0 \le x < 2 \\ 1 & \text{if } 2 \le x \end{cases}$$
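Differentiating each piece suggests f(x) = x/2 for 0 < x < 2 and f(x) = 0 elsewhere, with no derivative at x = 2, where the slope jumps. The sketch below checks this numerically with a central difference; `numerical_derivative` is a hypothetical helper, and the step size h is an arbitrary choice.

```python
def cdf(x):
    """Piecewise cdf: 0 for x < 0, x^2/4 for 0 <= x < 2, 1 for x >= 2."""
    if x < 0:
        return 0.0
    if x < 2:
        return x * x / 4.0
    return 1.0


def numerical_derivative(F, x, h=1e-6):
    """Central-difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)


# Evaluate away from x = 2, where F is not differentiable.
for x in (-1.0, 0.5, 1.0, 1.5, 3.0):
    print(x, numerical_derivative(cdf, x))
```

Inside (0, 2) the printed slopes track x/2, and outside [0, 2] they are 0.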


Definition of Percentile

Think growth charts or standardized tests. We say that a student scored in the 85th percentile of the ACT if she did better than 85% of all other students on the test. The formal definition of a percentile is below.

Percentile Definition

Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by

$$p = F\{\eta(p)\} = \int_{-\infty}^{\eta(p)} f(y)\,dy$$

Defining a percentile for a discrete distribution is problematic. Why?

The median is the 50th percentile.


Percentile Example #1

Find the median of the Rayleigh distribution

Rayleigh pdf

$$f(x; \theta) = \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}, \quad \text{for } x > 0$$
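Integrating this pdf gives the cdf F(x) = 1 − exp{−x²/(2θ²)} for x > 0, and solving p = F{η(p)} yields η(p) = θ √(−2 ln(1 − p)), so the median is θ √(2 ln 2). The sketch below evaluates this for θ = 100, a value borrowed from the wind turbine example only for illustration.

```python
import math


def rayleigh_cdf(x, theta):
    """Rayleigh cdf F(x) = 1 - exp(-x^2 / (2 theta^2)) for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x**2 / (2.0 * theta**2))


def rayleigh_percentile(p, theta):
    """Solve p = F(eta) for eta: eta = theta * sqrt(-2 ln(1 - p)), for 0 < p < 1."""
    return theta * math.sqrt(-2.0 * math.log(1.0 - p))


theta = 100.0  # illustrative value, matching the wind turbine example
median = rayleigh_percentile(0.5, theta)
print(median, rayleigh_cdf(median, theta))
```

Plugging the computed median back into the cdf returns 0.5 up to rounding, which confirms the inversion.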


Definition of Expectation for Continuous RVs

- As with discrete random variables, expectation is a measure of the center of the distribution
- Long-run average of many observations from a distribution

Definition of Expectation

The expected or mean value of a continuous rv X with pdf f(x) is

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

For a continuous random variable, does the expectation have to be in the support?


Connection to Definition with Discrete RVs

Recall if X is discrete

$$E(X) = \sum_{x \in D} x P(X = x)$$

Again, think of the integral as approximating a sum:

$$\int x f(x)\,dx \approx \sum_{i=1}^{n} x_i f(x_i)(x_{i+1} - x_i) \approx \sum_{i=1}^{n} x_i P(x_i \le X \le x_{i+1})$$
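The approximation can be tried on a concrete density. The sketch below assumes f(x) = x/2 on [0, 2] (the derivative of the cdf x²/4 from the earlier pdf-from-cdf example), for which the exact mean is 4/3; the function names are hypothetical.

```python
def density(x):
    """Example density f(x) = x/2 on [0, 2], and 0 elsewhere."""
    return x / 2.0 if 0.0 <= x <= 2.0 else 0.0


def expectation_riemann(f, a, b, n):
    """Approximate E(X) by sum_i x_i f(x_i) (x_{i+1} - x_i) on an equally spaced grid."""
    dx = (b - a) / n
    return sum((a + i * dx) * f(a + i * dx) * dx for i in range(n))


for n in (10, 100, 10_000):
    print(n, expectation_riemann(density, 0.0, 2.0, n))
```

Each term x_i f(x_i) dx plays the role of x_i P(x_i ≤ X ≤ x_{i+1}), and the sums approach 4/3 as the grid is refined.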


Definition of Variance and Other Functions for Continuous RVs

- For any function h(X) of the continuous rv X with pdf f(x), the expectation of h(X) is given by

$$E(h(X)) = \int_{-\infty}^{\infty} h(x) f(x)\,dx$$

- In particular, the variance of the continuous rv X with mean µ and pdf f(x), denoted by V(X), is $E\{(X - \mu)^2\}$
- As with discrete rvs, $V(X) = E(X^2) - \{E(X)\}^2$


Expectation and Variance Example #1

Find the mean and variance of the Pareto distribution with pdf below and θ = 1 and k = 3

$$f(x; k, \theta) = \frac{k\theta^k}{x^{k+1}}, \quad x \ge \theta$$
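One possible worked solution: with θ = 1 and k = 3 the pdf is f(x) = 3/x⁴ for x ≥ 1, and the shortcut V(X) = E(X²) − {E(X)}² gives the variance.

```latex
E(X)   = \int_1^{\infty} x \cdot \frac{3}{x^4}\,dx
       = \left[ -\frac{3}{2x^2} \right]_1^{\infty} = \frac{3}{2},
\qquad
E(X^2) = \int_1^{\infty} x^2 \cdot \frac{3}{x^4}\,dx
       = \left[ -\frac{3}{x} \right]_1^{\infty} = 3,
\qquad
V(X)   = 3 - \left( \frac{3}{2} \right)^2 = \frac{3}{4}
```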
