Asistensi StatProb 2014: StatProb 01, 02, and 03, Friday, 28 March 2014

Upload: aida-safiera

Post on 28-May-2017


Page 1: Asistensi StatProb 2014 PraUTS

Asistensi StatProb 2014

StatProb 01, 02, and 03
Friday, 28 March 2014

Page 2: Asistensi StatProb 2014 PraUTS

The mean or arithmetic average

To calculate the average, or mean, add all values, then divide by the number of individuals. It is the “center of mass.”

Sum of heights is 1598.3; divided by 25 women = 63.9 inches.

Heights of the 25 women (inches):
58.2  59.5  60.7  60.9  61.9  61.9  62.2  62.2  62.4  62.9  63.9  63.1  63.9
64.0  64.5  64.1  64.8  65.2  65.7  66.2  66.7  67.1  67.8  68.9  69.6

Measure of center: the mean

Page 3: Asistensi StatProb 2014 PraUTS

$\bar{x} = \frac{1598.3}{25} = 63.9$

Mathematical notation:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

woman (i)   height (x)       woman (i)   height (x)
i = 1       x1 = 58.2        i = 14      x14 = 64.0
i = 2       x2 = 59.5        i = 15      x15 = 64.5
i = 3       x3 = 60.7        i = 16      x16 = 64.1
i = 4       x4 = 60.9        i = 17      x17 = 64.8
i = 5       x5 = 61.9        i = 18      x18 = 65.2
i = 6       x6 = 61.9        i = 19      x19 = 65.7
i = 7       x7 = 62.2        i = 20      x20 = 66.2
i = 8       x8 = 62.2        i = 21      x21 = 66.7
i = 9       x9 = 62.4        i = 22      x22 = 67.1
i = 10      x10 = 62.9       i = 23      x23 = 67.8
i = 11      x11 = 63.9       i = 24      x24 = 68.9
i = 12      x12 = 63.1       i = 25      x25 = 69.6
i = 13      x13 = 63.9       n = 25      Σ = 1598.3

Learn right away how to get the mean using your calculators.

$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$
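The calculation above can be checked in a few lines of Python. This is only a minimal sketch: the variable names are illustrative and are not part of the slides.

```python
# Heights (inches) of the 25 women listed on the slide.
heights = [
    58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9,
    63.9, 63.1, 63.9, 64.0, 64.5, 64.1, 64.8, 65.2, 65.7, 66.2,
    66.7, 67.1, 67.8, 68.9, 69.6,
]

n = len(heights)        # number of individuals
total = sum(heights)    # x_1 + x_2 + ... + x_n
mean = total / n        # x-bar

print(f"n = {n}, sum = {total:.1f}, mean = {mean:.1f}")
```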

Page 4: Asistensi StatProb 2014 PraUTS

Your numerical summary must be meaningful.

Here the shape of the distribution is wildly irregular. Why?

Could we have more than one plant species or phenotype?

$\bar{x} = 69.6$

The distribution of women’s heights appears coherent and symmetrical. The mean is a good numerical summary.

$\bar{x} = 63.9$

Height of 25 women in a class

Page 5: Asistensi StatProb 2014 PraUTS

Height of Plants by Color

[Figure: histogram of number of plants (0 to 5) against height in centimeters (58 to 84), with bars grouped by color: red, pink, blue]

A single numerical summary here would not make sense.

Group means: $\bar{x} = 63.9$, $\bar{x} = 70.5$, $\bar{x} = 78.3$

Page 6: Asistensi StatProb 2014 PraUTS

Measure of center: the median

The median is the midpoint of a distribution: the number such that half of the observations are smaller and half are larger.

1. Sort observations by size. n = number of observations.

2.a. If n is odd, the median is observation (n+1)/2 down the list.

Data (n = 25):
0.6  1.2  1.6  1.9  1.5  2.1  2.3  2.3  2.5  2.8  2.9  3.3  [3.4]
3.6  3.7  3.8  3.9  4.1  4.2  4.5  4.7  4.9  5.3  5.6  6.1

n = 25, (n+1)/2 = 26/2 = 13, Median = 3.4

2.b. If n is even, the median is the mean of the two middle observations.

Data (n = 24):
0.6  1.2  1.6  1.9  1.5  2.1  2.3  2.3  2.5  2.8  2.9  [3.3]
[3.4]  3.6  3.7  3.8  3.9  4.1  4.2  4.5  4.7  4.9  5.3  5.6

n = 24, n/2 = 12, Median = (3.3 + 3.4)/2 = 3.35

Page 7: Asistensi StatProb 2014 PraUTS

[Figure: three density curves with the mean and median marked: mean and median for a symmetric distribution, and mean and median for skewed distributions (left skew, right skew)]

Comparing the mean and the median

The mean and the median are the same when the distribution is symmetric; in a skewed distribution the mean is pulled toward the long tail.

The median is a measure of center that is resistant to skew and outliers. The mean is not.

Page 8: Asistensi StatProb 2014 PraUTS

Measure of spread: the quartiles

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (it is the median of the lower half of the sorted data, excluding M).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (it is the median of the upper half of the sorted data, excluding M).

Lower half:  0.6  1.2  1.6  1.9  1.5  2.1  2.3  2.3  2.5  2.8  2.9  3.3
M:           3.4
Upper half:  3.6  3.7  3.8  3.9  4.1  4.2  4.5  4.7  4.9  5.3  5.6  6.1

Q1 = first quartile = 2.2
M = median = 3.4
Q3 = third quartile = 4.35
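The median and quartile rules described above can be sketched in Python as follows. The function names `median` and `quartiles` are illustrative, and the quartile rule used is the one stated on the slide (median of each half, excluding M when n is odd); statistical software often uses slightly different conventions.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        # n odd: observation (n+1)/2 down the list
        return sorted_values[mid]
    # n even: mean of the two middle observations
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the 'median of each half' rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # When n is odd, the median itself is excluded from both halves.
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


data = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
        3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

q1, m, q3 = quartiles(data)
print(f"Q1 = {q1:.2f}, M = {m:.2f}, Q3 = {q3:.2f}")
```

Dropping the largest observation gives the even case (n = 24), where the same `median` function averages the two middle values.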

Page 9: Asistensi StatProb 2014 PraUTS

Measure of spread: the standard deviation

The standard deviation “s” is used to describe the variation around the mean. Like the mean, it is not resistant to skew or outliers.

1. First calculate the variance s²:

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

2. Then take the square root to get the standard deviation s:

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

[Figure: distribution with the interval mean ± 1 s.d. marked around $\bar{x}$]

Page 10: Asistensi StatProb 2014 PraUTS

Calculations …

We’ll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator.

$s = \sqrt{\frac{1}{df}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

Mean = 63.4

Sum of squared deviations from mean = 85.2

Degrees of freedom (df) = (n − 1) = 13

s2 = variance = 85.2/13 = 6.55 inches squared

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

i      x_i    x̄       (x_i − x̄)   (x_i − x̄)²
1      59     63.4    -4.4        19.0
2      60     63.4    -3.4        11.3
3      61     63.4    -2.4        5.6
4      62     63.4    -1.4        1.8
5      62     63.4    -1.4        1.8
6      63     63.4    -0.4        0.1
7      63     63.4    -0.4        0.1
8      63     63.4    -0.4        0.1
9      64     63.4    0.6         0.4
10     64     63.4    0.6         0.4
11     65     63.4    1.6         2.7
12     66     63.4    2.6         7.0
13     67     63.4    3.6         13.3
14     68     63.4    4.6         21.6
       Mean 63.4      Sum 0.0     Sum 85.2
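The table above can be reproduced step by step in Python. This is a minimal sketch with illustrative variable names; it follows the same two steps as the slides (variance first, then square root).

```python
import math

# Heights (inches) of the 14 women in the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in heights)

df = n - 1                       # degrees of freedom
variance = sum_sq_dev / df       # s^2, in inches squared
std_dev = math.sqrt(variance)    # s, in inches

print(f"mean = {mean:.1f}, sum of squares = {sum_sq_dev:.1f}")
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and should give matching values.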

Page 11: Asistensi StatProb 2014 PraUTS

General addition rules

General addition rule for any two events A and B:

The probability that A occurs, or B occurs, or both events occur is:

P(A or B) = P(A) + P(B) – P(A and B)

If A and B are disjoint, P(A and B) = 0.

What is the probability of randomly drawing either an ace or a heart from a deck of 52 playing cards? There are 4 aces in the pack and 13 hearts. However, 1 card is both an ace and a heart. Thus:

P(ace or heart) = P(ace) + P(heart) – P(ace and heart)

= 4/52 + 13/52 – 1/52 = 16/52 ≈ 0.31
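As a check on the ace-or-heart example, the sketch below builds a 52-card deck and compares the addition rule with a direct count. The deck representation and the helper `prob` are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs


def prob(event):
    """Probability of an event when every card is equally likely."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))


def is_ace(card):
    return card[0] == "A"


def is_heart(card):
    return card[1] == "hearts"


# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_rule = (prob(is_ace) + prob(is_heart)
          - prob(lambda c: is_ace(c) and is_heart(c)))

# Direct count of cards that are an ace, a heart, or both.
p_count = prob(lambda c: is_ace(c) or is_heart(c))

print(p_rule, p_count, float(p_rule))
```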

Page 12: Asistensi StatProb 2014 PraUTS

Conditional probability

Conditional probabilities reflect how the probability of an event can change if we know that some other event has occurred/is occurring.

– Example: The probability of rain is different if you live in Los Angeles than if you live in Seattle.

– Our brains calculate conditional probabilities, updating our sense of chance with each new piece of evidence.

The conditional probability of event B given event A is (provided that P(A) ≠ 0):

$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$

Page 13: Asistensi StatProb 2014 PraUTS

General multiplication rules

• The probability that any two events, A and B, both occur is:

P(A and B) = P(A)P(B|A)

This is the general multiplication rule.

• If A and B are independent, then P(A and B) = P(A)P(B) (If A and B are independent then P(B|A) is just P(B) .)

What is the probability of randomly drawing the ace of hearts from a deck of 52 playing cards? There are 4 aces in the pack and 13 hearts.

P(heart|ace) = 1/4 P(ace) = 4/52

P(ace and heart) = P(ace)* P(heart|ace) = (4/52)*(1/4) = 1/52

Notice that heart and ace are independent events.
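The conditional probability, the multiplication rule, and the independence claim can all be verified from the card counts. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

n_cards = 52
n_aces = 4
n_hearts = 13
n_ace_of_hearts = 1   # the single card that is both an ace and a heart

p_ace = Fraction(n_aces, n_cards)
p_heart = Fraction(n_hearts, n_cards)
p_both = Fraction(n_ace_of_hearts, n_cards)

# Conditional probability: P(heart | ace) = P(ace and heart) / P(ace)
p_heart_given_ace = p_both / p_ace

# General multiplication rule: P(ace and heart) = P(ace) * P(heart | ace)
p_both_from_rule = p_ace * p_heart_given_ace

# Independence: knowing the card is an ace does not change P(heart).
independent = (p_heart_given_ace == p_heart)

print(p_heart_given_ace, p_both_from_rule, independent)
```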

Page 14: Asistensi StatProb 2014 PraUTS

Discrete random variables

A random variable is a rule which associates a number with

each outcome.

A basketball player shoots three free throws. We define the random

variable X as the number of baskets successfully made.

A discrete random variable X has a set of possible values that

can be listed.

A basketball player shoots three free throws. The number of

baskets successfully made is a discrete random variable (X). X can

only take the values 0, 1, 2, or 3.

Page 15: Asistensi StatProb 2014 PraUTS

The probability distribution of a random variable X associates the

values and their probabilities:

The probabilities pi must add up to 1.

A basketball player shoots three free throws. The random variable X is the number of baskets successfully made. Assume the probability of success is 0.5.

Value of X     0      1      2      3
Probability    1/8    3/8    3/8    1/8
Outcomes       MMM    HMM    HHM    HHH
                      MHM    HMH
                      MMH    MHH

[Figure: tree diagram of the three throws, each branching into H or M, with leaves HHH, HHM, HMH, HMM, …]

Page 16: Asistensi StatProb 2014 PraUTS

Recall, the probability of any event is the sum of the probabilities pi of the values of X that make up the event.

A basketball player shoots three free throws. The random variable X is the number of baskets successfully made.

Value of X     0      1      2      3
Probability    1/8    3/8    3/8    1/8
Outcomes       MMM    HMM    HHM    HHH
                      MHM    HMH
                      MMH    MHH

What is the probability that the player successfully makes at least two baskets?

P(X≥2) = P(X=2) + P(X=3) = 3/8 + 1/8 = 1/2

What is the probability that the player successfully makes fewer than three baskets?

P(X<3) = P(X=0) + P(X=1) + P(X=2) = 1/8 + 3/8 + 3/8 = 7/8 or

P(X<3) = 1 – P(X=3) = 1 – 1/8 = 7/8
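The distribution and the two event probabilities above can be recovered by listing all eight equally likely outcomes. This is a sketch under the slide's assumption that each throw succeeds with probability 0.5; the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All sequences of three throws, e.g. "HHM"; with p = 0.5 each is equally likely.
outcomes = ["".join(seq) for seq in product("HM", repeat=3)]

# X = number of baskets made = number of H's in the sequence.
counts = Counter(outcome.count("H") for outcome in outcomes)
dist = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

# The probability of an event is the sum of p_i over the values in the event.
p_at_least_two = sum(p for x, p in dist.items() if x >= 2)
p_fewer_than_three = sum(p for x, p in dist.items() if x < 3)

print(dist)
print(p_at_least_two, p_fewer_than_three)
```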

Page 17: Asistensi StatProb 2014 PraUTS

A continuous random variable X takes all values in an interval. Ex. There is an infinite number of values between 0 and 1, e.g. .1, .1234, .123456

How do we assign probabilities to events in an infinite sample space? We use density curves and compute probabilities for intervals. The probability of any event is the area under the density curve for the values of X that make up the event.

Continuous random variables

[Figure: a “uniform density curve” for the variable X on the interval from 0 to 1, with height 1]

The probability that X falls between 0.3 and 0.7 is the area under the density curve for that interval:

P(0.3 ≤ X ≤ 0.7) = (0.7 – 0.3)*1 = 0.4

Page 18: Asistensi StatProb 2014 PraUTS

Intervals

The probability of a single value is zero for a continuous random variable:

P(X = 1) = (1 – 1)*1 = 0

Only intervals can have a non-zero probability, represented by the area under the density curve for that interval.

[Figure: uniform density curve for X with height = 1]

P(X < 0.5 or X > 0.8) = P(X < 0.5) + P(X > 0.8) = 1 – P(0.5 < X < 0.8) = 0.7

The probability of an interval is the same whether boundary values are included or excluded:

P(0 ≤ X ≤ 0.5) = (0.5 – 0)*1 = 0.5
P(0 < X < 0.5) = (0.5 – 0)*1 = 0.5
P(0 ≤ X < 0.5) = (0.5 – 0)*1 = 0.5
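For the uniform density curve, every probability is a rectangle area (width times height). The helper `uniform_prob` below is a hypothetical function written for this illustration, assuming X is uniform on the interval from 0 to 1 as in the examples above.

```python
def uniform_prob(a, b, low=0.0, high=1.0):
    """P(a <= X <= b) for X uniform on [low, high]: width * height."""
    a, b = max(a, low), min(b, high)   # clip to the support of X
    if b <= a:
        return 0.0                     # empty interval or a single point
    height = 1.0 / (high - low)
    return (b - a) * height


p_middle = uniform_prob(0.3, 0.7)                            # P(0.3 <= X <= 0.7)
p_tails = uniform_prob(0.0, 0.5) + uniform_prob(0.8, 1.0)    # P(X < 0.5 or X > 0.8)
p_point = uniform_prob(1.0, 1.0)                             # P(X = 1)

print(round(p_middle, 4), round(p_tails, 4), p_point)
```

Because a single point has zero width, the function returns the same value whether or not the boundary values are included.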

Page 19: Asistensi StatProb 2014 PraUTS

Mean of a random variable

The mean, “x-bar”, of a set of observations is their arithmetic average.

The mean µ of a random variable X is a weighted average of the

possible values of X, where the weights are the probabilities of each of

the outcomes.

Value of X     0      1      2      3
Probability    1/8    3/8    3/8    1/8
Outcomes       MMM    HMM    HHM    HHH
                      MHM    HMH
                      MMH    MHH

A basketball player shoots three free throws. The random variable X is the

number of baskets successfully made (“H”).

The mean of a random variable X is also called expected value of X.

Page 20: Asistensi StatProb 2014 PraUTS

Mean of a discrete random variable

For a discrete random variable X with

probability distribution

the mean µ of X is found by multiplying each possible value of X by its

probability, and then adding the products.

Value of X     0      1      2      3
Probability    1/8    3/8    3/8    1/8

The mean µ of X is

µ = (0*1/8) + (1*3/8) + (2*3/8) + (3*1/8)

= 12/8 = 3/2 = 1.5

A basketball player shoots three free throws. The random variable X is the

number of baskets successfully made.

Page 21: Asistensi StatProb 2014 PraUTS

The probability distribution of continuous random variables is described by a density curve. The mean is at the “center of mass”.

Mean of a continuous random variable

The mean lies at the center of symmetric density curves such as the normal curves.

Exact calculations for the mean of a distribution with a skewed density curve involve calculus.

Page 22: Asistensi StatProb 2014 PraUTS

Variance of a random variable

The variance and the standard deviation are the most common measures of spread. In that way they are analogous to the choice of the mean to measure center.

The variance $\sigma_X^2$ of a random variable is a weighted average of the squared deviations $(X - \mu_X)^2$ of the variable X from its mean $\mu_X$. Each outcome is weighted by its probability in order to take into account outcomes that are not equally likely.

The larger the variance of X, the more scattered the values of X on average. The positive square root of the variance gives the standard deviation σ of X.

Page 23: Asistensi StatProb 2014 PraUTS

Variance of a discrete random variable

For a discrete random variable X with probability distribution and mean $\mu_X$, the variance $\sigma^2$ of X is found by multiplying each squared deviation of X by its probability and then adding all the products.

A basketball player shoots three free throws. The random variable X is the number of baskets successfully made. $\mu_X = 1.5$.

Value of X     0      1      2      3
Probability    1/8    3/8    3/8    1/8

The variance $\sigma^2$ of X is

$\sigma^2 = \tfrac{1}{8}(0 - 1.5)^2 + \tfrac{3}{8}(1 - 1.5)^2 + \tfrac{3}{8}(2 - 1.5)^2 + \tfrac{1}{8}(3 - 1.5)^2$

$= 2\left(\tfrac{1}{8}\cdot\tfrac{9}{4}\right) + 2\left(\tfrac{3}{8}\cdot\tfrac{1}{4}\right) = \tfrac{24}{32} = \tfrac{3}{4} = 0.75$
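The mean and variance of the free-throw variable can be computed directly from its probability distribution. The sketch below uses exact fractions so that the results match the hand calculation; the variable names are illustrative.

```python
from fractions import Fraction

# Probability distribution of X = number of baskets made in three throws.
dist = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

# Mean: each value weighted by its probability.
mean = sum(x * p for x, p in dist.items())

# Variance: each squared deviation from the mean weighted by its probability.
variance = sum(p * (x - mean) ** 2 for x, p in dist.items())

# Standard deviation: positive square root of the variance.
std_dev = float(variance) ** 0.5

print(mean, variance, round(std_dev, 3))
```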

Page 24: Asistensi StatProb 2014 PraUTS

Binomial distributions

Binomial distributions are probability models for some categorical variables, often the number of successes in a series of n trials.

The observations must meet these requirements:

– The total number of observations n is fixed in advance.

– Each observation falls into exactly 1 of 2 categories.

– The outcomes of all n observations are independent.

– All observations have the same probability of “success,” p.

We record the next 50 births at a local hospital. Each newborn is either a boy or a

girl; each baby is either born on a Sunday or not.

Page 25: Asistensi StatProb 2014 PraUTS

We express a binomial distribution for the number X of successes among n observations as a function of the parameters n and p.

• The parameter n is the total number of observations.
• The parameter p is the probability of success on each observation.
• The number of successes X can be any integer between 0 and n.

A coin is flipped 10 times. Each outcome is either a head or a tail.

Say the variable X is the number of heads among those 10 flips, our count of

“successes.”

On each flip, the probability of success, “heads,” is 0.5. Assume independent

outcomes. The number X of heads among 10 flips has the binomial

distribution B(n = 10, p = 0.5).
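The slides do not write out the binomial probability formula; the sketch below assumes the standard one, P(X = k) = C(n, k) p^k (1 − p)^(n − k), and evaluates it for the coin example B(n = 10, p = 0.5). The function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), using the standard binomial formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(X = {k:2d}) = {prob:.4f}")
```

The eleven probabilities add up to 1, and the most likely count is 5 heads.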

Page 26: Asistensi StatProb 2014 PraUTS

Binomial mean and standard deviation

The center and spread of the binomial distribution for a count X are defined by the mean µ and standard deviation σ:

$\mu = np, \qquad \sigma = \sqrt{npq} = \sqrt{np(1-p)}$

For small samples, binomial distributions are skewed when p is different from 0.5.

[Figure: effect of changing p when n is fixed. Three bar charts of P(X = x) against number of successes (0 to 10): a) n = 10, p = 0.25; b) n = 10, p = 0.5; c) n = 10, p = 0.75]
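The binomial mean and standard deviation formulas can be evaluated for the three cases a), b) and c). This is a small sketch; `binomial_mean_sd` is a hypothetical helper name.

```python
from math import sqrt


def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) of a B(n, p) count."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return mean, sd


results = {}
for label, p in [("a", 0.25), ("b", 0.5), ("c", 0.75)]:
    results[label] = binomial_mean_sd(10, p)
    mean, sd = results[label]
    print(f"{label}) n = 10, p = {p}: mean = {mean:.2f}, sd = {sd:.2f}")
```

Cases a) and c) have the same spread because p(1 − p) is the same for p = 0.25 and p = 0.75; only the center moves.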

Page 27: Asistensi StatProb 2014 PraUTS

The Poisson Probability Distribution

Siméon Denis Poisson
• “Researches on the Probability of Criminal and Civil Verdicts” (1837).
• Looked at the form of the binomial distribution when the number of trials is large.
• He derived the cumulative Poisson distribution as the limiting case of the binomial when the chance of success tends to zero.

Page 28: Asistensi StatProb 2014 PraUTS

• Poisson Distribution: An approximation to binomial distribution for the SPECIAL CASE when the average number (mean µ) of successes is very much smaller than the possible number n. i.e. µ << n because p << 1.

• This distribution is important for the study of such phenomena as radioactive decay. This distribution is NOT necessarily symmetric! Data are usually bounded on one side & not the other.

An advantage of this distribution is that σ² = µ.

The Poisson Distribution

[Figure: two Poisson distributions, one with µ = 1.67, σ = 1.29 and one with µ = 10.0, σ = 3.16]

Page 29: Asistensi StatProb 2014 PraUTS

The Poisson Distribution Models Counts

If events happen at a constant rate over time, the Poisson distribution gives the probability of X number of events occurring in a time T.

• This distribution tells us the probability of all possible numbers of counts, from 0 to infinity.
• If X = # of counts per second, then the Poisson probability that X = k (a particular count) is:

$p(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$

• Here, λ ≡ the average number of counts per second.

Page 30: Asistensi StatProb 2014 PraUTS

Mean and Variance for the Poisson Distribution

• It’s easy to show that for this distribution, the mean is: $\mu = \lambda$
• Also, it’s easy to show that the variance is: $\sigma^2 = \lambda$
• So, the standard deviation is: $\sigma = \sqrt{\lambda}$

For a Poisson distribution, the variance and mean are equal!

Page 31: Asistensi StatProb 2014 PraUTS

More on the Poisson Distribution

Terminology: A “Poisson Process”
• The Poisson parameter λ can be given as the mean number of events that occur in a defined time period or, equivalently, as a rate, such as λ = 2 events per month. In a physical process (called a “Poisson process”), λ must often be multiplied by a time t:

$P(X = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$

$\mu = \lambda t, \qquad \sigma^2 = \lambda t$

Page 32: Asistensi StatProb 2014 PraUTS

Example

1. If calls to your cell phone are a Poisson process with a constant rate λ = 2 calls per hour, what is the probability that, if you forget to turn your phone off in a 1.5 hour class, your phone rings during that time?

Answer: If X = # calls in 1.5 hours, we want

P(X ≥ 1) = 1 – P(X = 0)

$P(X = 0) = \frac{(2 \cdot 1.5)^0 e^{-2(1.5)}}{0!} = \frac{(3)^0 e^{-3}}{0!} = e^{-3} = 0.05$

P(X ≥ 1) = 1 – 0.05 = 95% chance

2. How many phone calls do you expect to get during the class?

<X> = λt = 2(1.5) = 3
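The phone-call example can be reproduced with a short Poisson function. This is a minimal sketch: `poisson_pmf` is an illustrative name, and the rate and class length are the values given in the example.

```python
from math import exp, factorial


def poisson_pmf(k, rate, t=1.0):
    """P(X = k) for a Poisson process with the given rate observed for time t."""
    mu = rate * t                      # mean number of events in time t
    return mu**k * exp(-mu) / factorial(k)


rate = 2.0    # calls per hour
t = 1.5       # class length in hours

p_zero = poisson_pmf(0, rate, t)   # no call during the class: e^-3
p_ring = 1 - p_zero                # at least one call
expected_calls = rate * t          # mean of the distribution

print(f"P(X = 0) = {p_zero:.4f}")
print(f"P(X >= 1) = {p_ring:.4f}")
print(f"expected calls = {expected_calls}")
```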

Page 33: Asistensi StatProb 2014 PraUTS

Conditions Required for the Poisson Distribution to Hold:

1. The rate λ is a constant, independent of time.
2. Two events never occur at exactly the same time.
3. Each event is independent. That is, the occurrence of one event does not make the next event more or less likely to happen.

Page 34: Asistensi StatProb 2014 PraUTS

Example

• A production line produces 600 parts per hour with an average of 5 defective parts an hour. If you test every part that comes off the line in 15 minutes, what is the probability of finding no defective parts (and incorrectly concluding that your process is perfect)?

λ = (5 defects/hour)*(0.25 hour) = 1.25

$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, where x = given number of defects

$P(x = 0) = \frac{(1.25)^0 e^{-1.25}}{0!} = e^{-1.25} = 0.287 = 28.7\%$

Page 35: Asistensi StatProb 2014 PraUTS

Comparison of Binomial & Poisson Distributions with Mean µ = 1

[Figure: two panels plotting probability against the count for the binomial and Poisson distributions with µ = 1; one panel uses N = 10, p = 0.1 and the other N = 3, p = 1/3]

Clearly, there is not much difference between them!

For N large & µ fixed: Binomial → Poisson
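The comparison above can be made numerical. The sketch below, using the standard binomial and Poisson formulas and illustrative function names, measures the largest gap between the two distributions for a fixed mean µ = 1 as N grows.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    return mu**k * exp(-mu) / factorial(k)


def max_abs_diff(n, mu):
    """Largest |binomial - Poisson| over k = 0..n, with p chosen so that n*p = mu."""
    p = mu / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu))
               for k in range(n + 1))


for n in (3, 10, 100):
    print(f"N = {n:3d}, p = {1.0 / n:.4f}, max difference = {max_abs_diff(n, 1.0):.4f}")
```

The gap shrinks as N increases with µ held fixed, which is the limiting behavior the slide describes.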

Page 36: Asistensi StatProb 2014 PraUTS

Poisson Distribution: As λ (average # counts) gets large, this also approaches a Gaussian.

[Figure: four Poisson distributions with λ = 5, λ = 15, λ = 25, and λ = 35]

Page 37: Asistensi StatProb 2014 PraUTS

We often assume random variables are normally distributed. One particular

normal distribution is shown below.

Example: Probability distribution of women’s heights.

Here since we chose a woman randomly, her height, X, is a random variable.

To calculate probabilities with the normal distribution, we will standardize the random variable (z score) and use Table A.

Normal probability distributions

Page 38: Asistensi StatProb 2014 PraUTS

The Normal Distribution: Overview

• A continuous random variable is said to be normally distributed with mean µ and variance σ² if its probability density function is

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}$

• f(x) is not the same as P(x)
– P(x) would be 0 for every x because the normal distribution is continuous
– However, $P(x_1 < X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$

Page 39: Asistensi StatProb 2014 PraUTS

Reminder: standardizing N(µ, σ)

Standardize normal data by calculating z-scores so that any Normal curve N(µ, σ) can be transformed into the standard Normal curve N(0,1).

$z = \frac{x - \mu}{\sigma}$

[Figure: the curve N(64.5, 2.5) on the x axis is transformed into the standard curve N(0,1) on the z axis, standardized height (no units)]

Page 40: Asistensi StatProb 2014 PraUTS

What is the probability, if we pick one woman at random, that her height will be in some range? For instance, between 68 and 70 inches, P(68 < X < 70)? We assume here that µ = 64.5 and σ = 2.5. Because the woman is selected at random, X is a random variable.

As before, we calculate the z-scores for 68 and 70, using $z = \frac{x - \mu}{\sigma}$.

For x = 68", $z = \frac{68 - 64.5}{2.5} = 1.4$

For x = 70", $z = \frac{70 - 64.5}{2.5} = 2.2$

The area under the curve for the interval [68" to 70"] is 0.9861 − 0.9192 = 0.0669. Thus, the probability that a randomly chosen woman falls into this range is 6.69%.

P(68 < X < 70) = 6.69%

[Figure: the curve N(µ, σ) = N(64.5, 2.5) with the cumulative areas 0.9192 and 0.9861 marked]
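The Table A lookup can be replaced by Python's standard library, which provides the normal cumulative distribution through `statistics.NormalDist` (Python 3.8 or later). The sketch below repeats the two steps of the worked example; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 64.5, 2.5          # assumed mean and standard deviation (inches)

# Step 1: standardize both endpoints.
z_low = (68 - mu) / sigma
z_high = (70 - mu) / sigma

# Step 2: cumulative areas under the standard Normal curve N(0, 1).
standard = NormalDist()        # mean 0, standard deviation 1
area_low = standard.cdf(z_low)
area_high = standard.cdf(z_high)

p_between = area_high - area_low

print(f"z = {z_low:.1f} and {z_high:.1f}")
print(f"areas = {area_low:.4f} and {area_high:.4f}")
print(f"P(68 < X < 70) = {p_between:.4f}")
```

Calling `NormalDist(mu, sigma).cdf(70) - NormalDist(mu, sigma).cdf(68)` gives the same probability without standardizing by hand.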

Page 41: Asistensi StatProb 2014 PraUTS

Inverse problem:

Your favorite chocolate bar is dark chocolate with whole hazelnuts.The weight on the wrapping indicates 8 oz. Whole hazelnuts vary in weight, so how can they guarantee you 8 oz. of your favorite treat? You are a bit skeptical...

To avoid customer complaints and lawsuits, the manufacturer makes sure that 98% of all chocolate bars weigh 8 oz. or more.

The manufacturing process is roughly normal and has a known variability σ = 0.2 oz.

How should they calibrate the machines to produce bars with a mean µ such that P(x < 8 oz.) = 2%?

[Figure: normal curve with σ = 0.2 oz. and unknown mean µ; the lowest 2% of the area lies below x = 8 oz.]

Page 42: Asistensi StatProb 2014 PraUTS

How should they calibrate the machines to produce bars with a mean µ such that P(x < 8 oz.) = 2%?

Here we know the area under the density curve (2% = 0.02) and we know x (8 oz.). We want µ.

In Table A we find that the z for a left area of 0.02 is roughly z = -2.05.

$z = \frac{x - \mu}{\sigma} \;\Rightarrow\; \mu = x - (z \cdot \sigma)$

$\mu = 8 - (-2.05 \times 0.2) = 8.41 \text{ oz.}$

Thus, your favorite chocolate bar weighs, on average, 8.41 oz. Excellent!!!

[Figure: normal curve with σ = 0.2 oz. and unknown mean µ; the lowest 2% of the area lies below x = 8 oz.]
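The inverse problem can also be solved without Table A by using the inverse cumulative distribution function from `statistics.NormalDist` (Python 3.8 or later). This is a sketch of the same calculation with illustrative variable names.

```python
from statistics import NormalDist

sigma = 0.2        # known variability of the process (oz.)
x = 8.0            # weight printed on the wrapper (oz.)
left_area = 0.02   # at most 2% of bars may weigh less than x

# z-score with an area of 0.02 to its left under N(0, 1); roughly -2.05.
z = NormalDist().inv_cdf(left_area)

# Rearranging z = (x - mu) / sigma gives mu = x - z * sigma.
mu = x - z * sigma

# Check: with this mean, the share of bars below 8 oz. should be about 2%.
share_below = NormalDist(mu, sigma).cdf(x)

print(f"z = {z:.2f}, mu = {mu:.2f} oz., P(X < 8) = {share_below:.3f}")
```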
