august 15. in chapter 7: 7.1 normal distributions 7.2 determining normal probabilities 7.3 finding...

Post on 16-Jan-2016


Apr 21, 2023

Chapter 7: Normal Probability Distributions

In Chapter 7:

7.1 Normal Distributions

7.2 Determining Normal Probabilities

7.3 Finding Values That Correspond to Normal Probabilities

7.4 Assessing Departures from Normality

§7.1: Normal Distributions

• Normal random variables are the most common type of continuous random variable

• First described by de Moivre in 1733

• Laplace elaborated the mathematics in 1812

• Describe some (not all) natural phenomena

• More importantly, describe the behavior of means

Normal Probability Density Function

• Recall that continuous random variables are described with smooth probability density functions (pdfs) – Ch 5

• Normal pdfs are recognized by their familiar bell shape

This is the age distribution of a pediatric population. The overlying curve represents its Normal pdf model

Area Under the Curve

• The darker bars of the histogram correspond to ages less than or equal to 9 (~40% of observations)

• The darker area under the curve also corresponds to ages less than or equal to 9 (~40% of the total area)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
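As a rough check on the Normal density formula, the sketch below evaluates it by hand and compares the result with Python's standard-library `statistics.NormalDist`. The function name `normal_pdf` and the values μ = 100 and σ = 15 are illustrative assumptions, not part of the slides at this point.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Evaluate the density at its peak (x = mu) two ways; the results should agree.
peak_by_formula = normal_pdf(100, mu=100, sigma=15)
peak_by_library = NormalDist(mu=100, sigma=15).pdf(100)
print(peak_by_formula, peak_by_library)
```

At x = μ the exponent is zero, so the height of the curve reduces to 1/(σ√(2π)); a larger σ gives a lower, wider bell.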

Parameters μ and σ

• Normal pdfs are a family of distributions

• Family members are identified by parameters μ (mean) and σ (standard deviation)

• μ controls location

• σ controls spread

Mean and Standard Deviation of Normal Density

[Figure: Normal density curve labeled with μ and σ]

Standard Deviation σ

• Points of inflection (where the slope of the curve begins to level) occur one σ below and one σ above μ

• Practice sketching Normal curves to feel inflection points

• Practice labeling the horizontal axis of curves with standard deviation markers (figure)

68-95-99.7 Rule for Normal Distributions

• 68% of the AUC falls within ±1σ of μ

• 95% of the AUC falls within ±2σ of μ

• 99.7% of the AUC falls within ±3σ of μ
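The three percentages in the rule are rounded areas under the standard Normal curve. A minimal sketch of how they could be checked, using `statistics.NormalDist` from the Python standard library (the helper name `area_within` is made up for illustration):

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard Normal curve


def area_within(k):
    """Area under the curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within ±{k}σ: {area:.4f}")
```

The computed areas are about 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.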

Example: 68-95-99.7 Rule

Wechsler adult intelligence scores are Normally distributed with μ = 100 and σ = 15; X ~ N(100, 15). Using the 68-95-99.7 rule:

• 68% of scores fall in μ ± σ = 100 ± 15 = 85 to 115

• 95% of scores fall in μ ± 2σ = 100 ± (2)(15) = 70 to 130

• 99.7% of scores in μ ± 3σ = 100 ± (3)(15) = 55 to 145

Symmetry in the Tails

Because the Normal curve is symmetrical and the total AUC adds to 1, we can determine the AUC in the tails. For example, because 95% of the curve is in μ ± 2σ, 2.5% is in each tail beyond μ ± 2σ.

[Figure: Normal curve with the central 95% of the area labeled]

Example: Male Height

• Male height is approximately Normal with μ = 70.0˝ and σ = 2.8˝

• Because of the 68-95-99.7 rule, 68% of the population is in the range 70.0˝ ± 2.8˝ = 67.2˝ to 72.8˝

• Because the total AUC adds to 100%, 32% are in the tails below 67.2˝ and above 72.8˝

• Because of symmetry, half of this 32% (i.e., 16%) is below 67.2˝ and 16% is above 72.8˝
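The tail reasoning for male height can be checked numerically. This is a sketch using the slide's values μ = 70.0 and σ = 2.8; the variable names are illustrative only.

```python
from statistics import NormalDist

height = NormalDist(mu=70.0, sigma=2.8)  # male height in inches
lower, upper = 70.0 - 2.8, 70.0 + 2.8    # one sigma below and above the mean

below = height.cdf(lower)                      # left tail
above = 1 - height.cdf(upper)                  # right tail
middle = height.cdf(upper) - height.cdf(lower)  # central area

print(f"below {lower:.1f}: {below:.3f}")
print(f"between: {middle:.3f}")
print(f"above {upper:.1f}: {above:.3f}")
```

The two tails come out equal, as symmetry requires, and each is close to the 16% given by the rule.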

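[Figure: Male height example. Normal curve with 67.2, 70, and 72.8 marked on the horizontal axis; 68% of the area lies between 67.2 and 72.8, with 16% in each tail]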

Reexpression of Non-Normal Variables

• Many biostatistical variables are not Normal

• We can reexpress non-Normal variables with a mathematical transformation to make them more Normal

• Examples of mathematical transforms include logarithms, exponents, square roots, and so on

• Let us review the logarithmic transformation

Logarithms

• Logarithms are exponents of their base

• There are two main logarithmic bases

– common log10 (base 10)

– natural ln (base e)

Landmarks:

• log10(1) = 0 (because 10^0 = 1)

• log10(10) = 1 (because 10^1 = 10)

Example: Logarithmic Re-expression

• Prostate specific antigen (PSA) is not Normal in 60-year-olds, but ln(PSA) is approximately Normal with μ = −0.3 and σ = 0.8

• 95% of ln(PSA) falls in μ ± 2σ = −0.3 ± (2)(0.8) = −1.9 to 1.3

• Thus, 2.5% are above ln(PSA) = 1.3; take the anti-log of 1.3: e^1.3 = 3.67

• Since only 2.5% of the population has values greater than 3.67, use this as the cut-point for suspiciously high results
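The PSA cut-point arithmetic can be reproduced in a few lines. This sketch uses the slide's μ = −0.3 and σ = 0.8 on the natural-log scale; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma = -0.3, 0.8          # parameters of ln(PSA)

lower_ln = mu - 2 * sigma      # lower end of the central 95% range
upper_ln = mu + 2 * sigma      # upper end of the central 95% range
cut_point = math.exp(upper_ln)  # anti-log returns to the original PSA scale

# The "2.5% in each tail" figure comes from the rounded 68-95-99.7 rule;
# the area beyond exactly 2 sigma is slightly smaller.
upper_tail = 1 - NormalDist().cdf(2)

print(f"ln(PSA) range: {lower_ln:.1f} to {upper_ln:.1f}")
print(f"cut-point: {cut_point:.2f}")
print(f"area above mu + 2 sigma: {upper_tail:.4f}")
```

Working on the log scale and converting back at the end keeps the Normal calculation valid even though PSA itself is not Normal.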

§7.2: Determining Normal Probabilities

To determine a Normal probability when the value does not fall directly on a ±1σ, ±2σ, or ±3σ landmark, follow this procedure:

1. State the problem

2. Standardize the value (z score)

3. Sketch and shade the curve

4. Use Table B to determine the probability

Example: Normal Probability

Step 1. Statement of Problem

• We want to determine the percentage of human gestations that are less than 40 weeks in length

• We know that uncomplicated human pregnancy from conception to birth is approximately Normally distributed with μ = 39 weeks and σ = 2 weeks. [Note: clinicians measure gestation from last menstrual period to birth, which adds 2 weeks to the μ.]

• Let X represent human gestation: X ~ N(39, 2)

• Statement of the problem: Pr(X ≤ 40) = ?

Standard Normal (Z) Variable

• Standard Normal variable ≡ a Normal random variable with μ = 0 and σ = 1

• Called “Z variables”

• Notation: Z ~ N(0,1)

• Use Table B to look up cumulative probabilities

• Part of Table B shown on next slide…

Example: A Standard Normal (Z) variable with a value of 1.96 has a cumulative probability of .9750.

Normal Probability

Step 2. Standardize

To standardize, subtract μ and divide by σ:

$$z = \frac{x - \mu}{\sigma}$$

For example, the value 40 from X ~ N(39, 2) has

$$z = \frac{40 - 39}{2} = 0.5$$

The z-score tells you the number of σ-units the value falls above or below μ.

Steps 3 & 4. Sketch and Use Table B

3. Sketch and label axes

4. Use Table B to look up Pr(Z ≤ 0.5) = 0.6915
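The gestation example, Pr(X ≤ 40) for X ~ N(39, 2), can also be worked without a printed table. In this sketch the cumulative distribution function of `statistics.NormalDist` stands in for Table B; that substitution is an assumption of the example, not something the slides prescribe.

```python
from statistics import NormalDist

mu, sigma = 39, 2   # gestation length in weeks
x = 40

# Standardize: subtract the mean, divide by the standard deviation.
z = (x - mu) / sigma

# Cumulative probability from the standard Normal curve (the role of Table B).
p_from_z = NormalDist().cdf(z)

# Same probability computed directly on the original scale.
p_direct = NormalDist(mu, sigma).cdf(x)

print(f"z = {z}, Pr(Z <= z) = {p_from_z:.4f}, Pr(X <= {x}) = {p_direct:.4f}")
```

Both routes give the same answer, which is the point of standardizing: one table for Z serves every Normal distribution.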

Probabilities Between Two Points

Let a represent the lower boundary and b represent the upper boundary of a range:

Pr(a ≤ Z ≤ b) = Pr(Z ≤ b) − Pr(Z ≤ a)

Use of this concept will be demonstrated in class and on HW exercises.
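A small sketch of the subtraction rule for a range, again with `statistics.NormalDist` in place of Table B. The helper name `prob_between` and the boundaries ±1.96 are chosen only for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal, mean 0 and standard deviation 1


def prob_between(a, b):
    """Pr(a <= Z <= b) as a difference of two cumulative probabilities."""
    return Z.cdf(b) - Z.cdf(a)


p = prob_between(-1.96, 1.96)
print(f"Pr(-1.96 <= Z <= 1.96) = {p:.4f}")
```

With these boundaries the result is very close to 0.95, matching the cumulative probability of .9750 at z = 1.96 minus .0250 at z = −1.96.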

§7.3 Finding Values Corresponding to Normal Probabilities

1. State the problem.

2. Use Table B to look up the z-percentile value.

3. Sketch.

4. Unstandardize with this formula:

$$x = \mu + z_p \sigma$$

Looking up the z percentile value

Use Table B to look up the z percentile value, i.e., the z score for the probability in question

Look inside the table for the entry closest to the associated cumulative probability.

Then trace the z score to the row and column labels.

Notation: Let z_p represent the z score with cumulative probability p, e.g., z_.975 = 1.96

Suppose you wanted the 97.5th percentile z score. Look inside the table for .9750. Then trace the z score to the margins.

Finding Normal Values - Example

Suppose we want to know what gestational length is shorter than 97.5% of all gestations.

Step 1. State the problem!

Let X represent gestation length

Prior problem established X ~ N(39, 2)

We want the gestation length that is shorter than .975 of all gestations. This is equivalent to the gestation length that is longer than .025 of gestations.

Example, cont.

Step 2. Use Table B to look up the z value. Table B lists only “left tails”: “less than 97.5%” (right tail) = “greater than 2.5%” (left tail).

The z lookup in the table shows z_.025 = −1.96

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

–1.9 .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233

3. Sketch

4. Unstandardize

$$x = \mu + z_p \sigma = 39 + (-1.96)(2) = 35.08 \approx 35$$

“The 2.5th percentile gestation is 35 weeks.”
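The reverse lookup can be sketched with the inverse cumulative distribution function, which plays the role of reading Table B from the inside out. Using `NormalDist.inv_cdf` here is an assumption of this example; the slides use the printed table.

```python
from statistics import NormalDist

mu, sigma = 39, 2   # gestation length in weeks
p = 0.025           # cumulative probability of interest

# z score with cumulative probability p (the table lookup step).
z_p = NormalDist().inv_cdf(p)

# Unstandardize: x = mu + z_p * sigma.
x = mu + z_p * sigma

# The same percentile computed directly on the original scale.
x_direct = NormalDist(mu, sigma).inv_cdf(p)

print(f"z_p = {z_p:.2f}, x = {x:.2f} weeks")
```

The z value rounds to −1.96 and the gestation length to about 35 weeks, in line with the worked example.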

§7.4 Assessing Departures from Normality

Normal “Q-Q” Plot of same distribution

Approximately Normal histogram

The best way to assess Normality is graphically

A Normal distribution will adhere to a diagonal line on the Q-Q plot
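To make the Q-Q idea concrete, the sketch below pairs each sorted observation with the quantile a fitted Normal model would predict for its rank. The data values are made up for illustration, the function name `qq_points` is hypothetical, and the plotting position (i − 0.5)/n is one common convention among several.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pair each sorted observation with its theoretical Normal quantile."""
    n = len(sample)
    model = NormalDist(mean(sample), stdev(sample))  # Normal fitted to the sample
    ordered = sorted(sample)
    # Plotting position (i - 0.5) / n is an assumed convention.
    return [
        (model.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


ages = [3, 5, 6, 7, 8, 9, 9, 10, 11, 12, 13, 15]  # made-up illustrative data
points = qq_points(ages)
for theoretical, observed in points:
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

Plotting the observed values against the theoretical ones gives the Q-Q plot; points that track the diagonal suggest the Normal model is reasonable.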

Negative Skew

A negative skew will show an upward curve on the Q-Q plot

Positive Skew

A positive skew will show a downward curve on the Q-Q plot

Same data as previous slide but with logarithmic transform

A mathematical transform can Normalize a skew

Leptokurtic

A leptokurtic distribution (sharp peak, heavy tails) will show an S-shape on the Q-Q plot
