Stat 31, Section 1, Last Time Independence Special Case of “And” Rule Relation to Mutually Exclusive Random Variables Discrete vs. Continuous Tables of Probabilities for Discrete R.V.s Areas as Probabilities for Continuous R.V.s

Upload: anna-richardson

Post on 17-Jan-2016


Page 1: Stat 31, Section 1, Last Time Independence –Special Case of “And” Rule –Relation to Mutually Exclusive Random Variables –Discrete vs. Continuous –Tables

Stat 31, Section 1, Last Time

• Independence
– Special Case of “And” Rule
– Relation to Mutually Exclusive

• Random Variables
– Discrete vs. Continuous
– Tables of Probabilities for Discrete R.V.s
– Areas as Probabilities for Continuous R.V.s

Means and Variances

(of random variables) Text, Sec. 4.4

Idea: Above population summaries, extended

from populations to probability distributions

Connection: frequentist view

Make repeated draws, $X_1, X_2, \ldots, X_n$, from the distribution

Discrete Prob. Distributions

Recall table summary of distribution:

Values   x1   x2   …   xk
Prob.    p1   p2   …   pk

Values x1, …, xk: taken on by random variable X

Probabilities: P{X = xi} = pi

(note: big difference between X and x!)

Discrete Prob. Distributions

Table summary of distribution:

Values   x1   x2   …   xk
Prob.    p1   p2   …   pk

Recall power of this:

Can compute any prob., by summing the relevant pi

Mean of Discrete Distributions

Frequentist approach to mean:

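$\bar{X} = \frac{X_1 + \cdots + X_n}{n}$

$= \frac{\#\{X_i = x_1\}\, x_1 + \cdots + \#\{X_i = x_k\}\, x_k}{n}$

$= \frac{\#\{X_i = x_1\}}{n}\, x_1 + \cdots + \frac{\#\{X_i = x_k\}}{n}\, x_k$

$\approx p_1 x_1 + \cdots + p_k x_k = \sum_{i=1}^{k} p_i x_i$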

Mean of Discrete Distributions

Frequentist approach to mean:

$\mu_X = \sum_{i=1}^{k} p_i x_i$

a weighted average of values

where weights are probabilities

Mean of Discrete Distributions

E.g. Above Die Rolling Game:

Winning   9     -4    0
Prob.     1/3   1/2   1/6

Mean of distribution = (1/3)(9) + (1/6)(0) + (1/2)(-4) = 3 - 2 = 1

Interpretation: on average (over a large number of plays) winnings per play = $1

Conclusion: should be very happy to play
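The weighted average for this game can be checked with a few lines of Python. This is only a sketch: the names `winnings`, `probs` and `mean` are chosen here for illustration, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

# Distribution of winnings from the die rolling game table.
winnings = [9, -4, 0]
probs = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)]

def mean(values, probabilities):
    """Weighted average of the values, with the probabilities as weights."""
    return sum(p * x for x, p in zip(values, probabilities))

assert sum(probs) == 1           # a valid table of probabilities sums to 1
print(mean(winnings, probs))     # 1
```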

Mean of Discrete Distributions

Terminology: mean is also called:

“Expected Value”

E.g. in above game “expect” $1 (per play)

(caution: on average over many plays)

Expected Value

HW:

4.57

4.60 (2.45)

4.61

Expected Value

An application of Expected Value:

Assess “fairness” of games (e.g. gambling)

Major Caution: Expected Value is not what is

expected on one play, but instead is

average over many plays.

Cannot say what happens in one or a few

plays, only in long run average

Expected Value

E.g. Suppose have $5000, and need $10,000

(e.g. you owe the mafia $5000, so you clean out the safe at work. If you give that money to the mafia, you go to jail, so you decide to try to raise an additional $5000 by gambling)

And can make even-money bets, where P{win} = 0.48

(can really do this, e.g. bets on Red in roulette at a casino)

Expected Value

E.g. Suppose have $5000, and need $10,000 and can make even bets, w/ P{win} = 0.48

Pressing Practical Problem:

• Should you make one large bet?

• Or many small bets?

• Or something in between?

Expected Value

E.g. Suppose have $5000, and need $10,000 and can make even bets, w/ P{win} = 0.48

Expected Value analysis:

E(Amount returned per $1 bet) = P{lose} x $0 + P{win} x $2

= 0.52 x $0 + 0.48 x $2 = $0.96

Thus expect to lose $0.04 for every dollar bet
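The same arithmetic as a small Python sketch. The variable names are illustrative, and the $2 payout is read as "stake returned plus equal winnings" on an even-money bet.

```python
p_win = 0.48                 # from the slide
stake = 1.00                 # one dollar bet
payout_if_win = 2 * stake    # even-money bet: stake back plus equal winnings

# Expected amount returned per bet, and expected net change per bet.
expected_return = (1 - p_win) * 0.0 + p_win * payout_if_win
expected_net = expected_return - stake

print(round(expected_return, 2), round(expected_net, 2))  # 0.96 -0.04
```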

Expected Value

E.g. Suppose have $5000, and need $10,000

and can make even bets, w/ P{win} = 0.48

Expect to lose $0.04 for every dollar bet

• This is why gambling is very profitable

(for the casinos, been to Las Vegas?)

• They play many times

• So expected value works for them

• And after many bets, you will surely lose

• So should make fewer, not more bets?

Expected Value

E.g. Suppose have $5000, and need $10,000

and can make even bets, w/ P{win} = 0.48

Another view:

Strategy            P{get $10,000}
one $5000 bet       0.48 ~ 1/2
two $2500 bets      ~ (0.48)^2 ~ 1/4
four $1250 bets     ~ (0.48)^4 ~ 1/16
“many”              “no chance”
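A short Python sketch of the numbers in this table. It follows the slide's simplification that reaching $10,000 means winning every bet in the sequence, with independent bets, and it ignores strategies that keep betting after a loss. The function name `p_win_all` is made up for illustration.

```python
p_win = 0.48

def p_win_all(n_bets, p=p_win):
    """P{win n_bets independent bets in a row}, by the "and" rule."""
    return p ** n_bets

for n_bets, stake in [(1, 5000), (2, 2500), (4, 1250)]:
    # Prints 0.4800, 0.2304 and 0.0531 respectively.
    print(f"{n_bets} bet(s) of ${stake}: {p_win_all(n_bets):.4f}")
```

The exact values sit slightly below the rounded 1/2, 1/4 and 1/16 because 0.48 is slightly below 1/2.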

Expected Value

E.g. Suppose have $5000, and need $10,000

and can make even bets, w/ P{win} = 0.48

Surprising (?) answer:

• Best to make one big bet

• Not much fun…

• But best chance at winning

Casino Folklore:

• This really happens

• Folks walk in, place one huge bet….

Expected Value

Warning about Expected Value:

Excellent for some things (e.g. if will play many times),

but not for all decisions (e.g. if only play once, so don’t have long run)

Expected Value

Real life decisions against Expected Value:

1. State Lotteries
– State sells tickets
– Keeps about half of $$$
– Gives rest to ~ one (randomly chosen) player
– So Expected Value is clearly negative
– Why do people play? Totally irrational?
– Players buy faint hope of humongous gain
– Could be worth joy of thinking about it

Expected Value

Real life decisions against Expected Value:

1. State Lotteries
– Want one in North Carolina?
– You will be asked to decide

Interesting (and deep) philosophical balances:
– Only totally voluntary tax
– Yet tax burden borne mostly by poor
– Is that fair?
– But we lose revenue to other states…

Expected Value

Real life decisions against Expected Value:

2. Casino Gambling
– Always lose in long run (expected value…)
– Yet people do it. Are they nuts?
– Depends on how many times they play
– If really enjoy being ahead sometimes
– Then could be worth price paid for the thrill
– Serious societal challenge: (some are totally consumed by thrill)

Expected Value

Real life decisions against Expected Value:

3. Insurance
– Everyone pays about 2 x Expected Loss
– Insurance Company keeps the rest!
– So very profitable.
– But e.g. car insurance is required by law!
– Sensible, since if lose, can lose very big
– Yet purchase is totally against Expected Value
– OK, since you only play once (not many times)
– Insurance Co’s play many times (Expected Value works for them)
– So they are an evening out mechanism

And now for something completely different

Interesting Suggestion / Request

By Katie Baer

Well supported with Data / Analysis!

SIMPLE MATH:

• Date of the 2005 NCAA Men’s Basketball Tournament Final: Monday, April 4th, 2005

• Date of the Stat 31 Midterm #2: Tuesday, April 5th, 2005

WHY SHOULD STEVE RESCHEDULE THE EXAM?

STATISTICAL EVIDENCE:

[Figure: histogram “Frequency of Seeds Reaching Final Four”; x-axis: Seed in Tourney (1 to 12, More), y-axis: Frequency]

Bin (seed)   Frequency
1            43
2            23
3            13
4            8
5            4
6            6
7            0
8            5
9            1
10           0
11           1
12           0

Probability of a #1 Seed Reaching the Final Four

Final Four Data: 1979-2004

P{FF} = 43/104 = 0.413

http://cbs.sportsline.com/collegebasketball/mayhem/history/finalfourseeds
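As a quick check on the presentation's arithmetic, the relative frequency can be recomputed from the frequency table. The dictionary name below is illustrative. The figure is the share of Final Four slots filled by #1 seeds, which is how the slide computes P{FF}.

```python
# Final Four appearances by seed, 1979-2004, copied from the frequency table.
final_four_by_seed = {1: 43, 2: 23, 3: 13, 4: 8, 5: 4, 6: 6,
                      7: 0, 8: 5, 9: 1, 10: 0, 11: 1, 12: 0}

total = sum(final_four_by_seed.values())      # 26 tournaments x 4 slots each
p_ff_seed1 = final_four_by_seed[1] / total    # share of slots taken by #1 seeds

print(total, round(p_ff_seed1, 3))  # 104 0.413
```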

How many of these #1 seeds actually win the Tourney?

[Figure: bar chart “NCAA Men's Basketball Champions”; x-axis: Seed Number (1 to 9), y-axis: Frequency]

P{Champ} = 12/25 = 0.48 (48%)

However, this assumes that North Carolina has the same probability of winning the Tourney as the other predicted #1 Seeds (Illinois, Wake Forest, and Boston College)

NBC Sports, msnbc.com

So we all know that…

• Illinois is undefeated

• Illinois beat Wake Forest 91-78 and is ranked #1 in the Big 10

• Wake Forest beat North Carolina 95-82

• North Carolina is ranked #1 in the ACC and is 4-2 versus ranked teams

• Boston College has lost only one game and is #1 in the Big Least, I mean East

How do we determine which team is better?

• RPI is derived from three component factors: Div. I winning percentage (25%), schedule strength (50%), and opponents' schedule strength (25%).

• How do the #1 Seeds’ RPI’s compare to the rest of the Top 25?
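Read as a formula, the three weights above give a weighted sum. The sketch below assumes each component is expressed as a fraction between 0 and 1. The function name `rpi` and the input numbers are hypothetical and are not real team data.

```python
def rpi(win_pct, schedule_strength, opp_schedule_strength):
    """Weighted sum using the 25% / 50% / 25% weights quoted on the slide."""
    return (0.25 * win_pct
            + 0.50 * schedule_strength
            + 0.25 * opp_schedule_strength)

# Hypothetical inputs, for illustration only.
print(rpi(win_pct=0.80, schedule_strength=0.60, opp_schedule_strength=0.55))
```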

[Figure: scatter plot “RPI vs Rank of Top 25 Teams”, R^2 = 0.54; x-axis: Rank, y-axis: RPI]

As expected, teams with higher rankings have higher ranking RPI’s. This indicates that the best teams are going to be at the bottom left corner of the graph.

BUT… RPI’s are not an entirely accurate way of measuring a team’s ability (as seen with the mediocre R^2)

RPI does not take into account factors such as margin of victory, location of game, etc.

A different approach…

• A study found that approximately 62.8% of all college students consume alcohol on a regular basis

http://www.ftc.gov/reports/alcohol/appendixa.htm

*Considering that this percentage does not take into account specific drinking statistics at UNC nor the fact that a national championship is at stake, this is a conservative figure

Number of students in Steve’s Stat. 31 class: 92 (from class exam data)

92*0.628 ≈ 58 people

This number estimates the number of people enrolled in Stat 31, section 1 who consume alcohol on a regular basis

• A study by the NCAA showed that 87% of university students strongly believe that supporting collegiate sports is an integral part of college life

• http://www.ncaa.org/releases/miscellaneous/2004/2004090202ms.htm

Taking into account that watching sports and drinking alcohol are major aspects of college students’ lives, what is the probability that a college student will support college sports AND consume alcohol at the same time?

P{A} = 0.628, P{S} = 0.87

P{A and S} = P{A}*P{S} = 0.628*0.87 = 0.546 (54.6%), treating A and S as independent

THUS, over half the class (approx. 50 people) will probably drink alcohol the night of the final game of the NCAA Tourney
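The calculation above can be reproduced as follows. Multiplying the two probabilities is the "and" rule for independent events, so the sketch assumes that drinking and supporting college sports are independent, which the two surveys do not establish. The variable names are illustrative.

```python
p_alcohol = 0.628   # P{A}, from the slide
p_sports = 0.87     # P{S}, from the slide
class_size = 92     # from the slide

# "And" rule for independent events: P{A and S} = P{A} * P{S}.
# Independence of A and S is an assumption here, not a finding.
p_both = p_alcohol * p_sports

print(round(p_both, 3), round(class_size * p_both))  # 0.546 50
```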

Conclusions:

• Carolina has a considerable chance of reaching the Final Four and winning the NCAA tourney as a #1 seed, as seen in past tournament data

• They have fierce competition for the title, as seen in the graph of RPI vs. Rank

• Over half of the class will probably consume alcohol the night of April 4th, resulting in difficulty in studying for a midterm scheduled the next day

• Note that these figures are very conservative percentages, given that students will most likely drink more when their team is in the final game and especially if it is a close, exciting match-up

PLEASE MOVE THE TEST, STEVE!

GO HEELS!!!

And now for something completely different

Now about that exam change request…

• It is possible

• But we all need to agree

• Some choices:

Thursday, April 7 or Tuesday, April 12

• Please email objections to either

Functions of Expected Value

Important Properties of the Mean:

i. Linearity: $\mu_{aX+b} = a\mu_X + b$

Why?

$\mu_{aX+b} = \sum_i p_i (a x_i + b) = \sum_i a p_i x_i + \sum_i b p_i$

$= a \sum_i p_i x_i + b \sum_i p_i = a\mu_X + b$ (since $\sum_i p_i = 1$)

i. e. mean “preserves linear transformations”
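Linearity can be spot-checked numerically on the die rolling game (winnings 9, -4, 0 with probabilities 1/3, 1/2, 1/6). In this sketch the constants `a = 2` and `b = 3` are arbitrary values chosen only for illustration.

```python
from fractions import Fraction

values = [9, -4, 0]
probs = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)]

def mean(vals, ps):
    """Weighted average of vals, with probabilities ps as weights."""
    return sum(p * x for x, p in zip(vals, ps))

a, b = 2, 3  # hypothetical constants, for illustration only

lhs = mean([a * x + b for x in values], probs)  # mean of aX + b, computed directly
rhs = a * mean(values, probs) + b               # a * mu_X + b

print(lhs, rhs)  # 5 5
```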

Functions of Expected Value

Important Properties of the Mean:

ii. summability: $\mu_{X+Y} = \mu_X + \mu_Y$

Why is harder, so won’t do here

i. e. can add means to get mean of sums

i. e. mean “preserves sums”

Functions of Expected Value

E. g. above game:

Winning   9     -4    0
Prob.     1/3   1/2   1/6

If we “double the stakes”, then want: “mean of 2X”

$\mu_{2X} = 2\mu_X = \$2$

Recall $1 before

i.e. have twice the expected value

Functions of Expected Value

E. g. above game:

Winning   9     -4    0
Prob.     1/3   1/2   1/6

If we “play twice”, then have

$\mu_{X_1 + X_2} = \mu_{X_1} + \mu_{X_2} = \$1 + \$1 = \$2$

Same as above?

But isn’t playing twice different from doubling stake?

Yes, but not in means
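Both means can be computed directly from the game's table. This sketch builds the "play twice" distribution by listing every pair of outcomes. Multiplying the two probabilities assumes the plays are independent, which is needed only to write down the joint table, not for the rule that means add.

```python
from fractions import Fraction
from itertools import product

values = [9, -4, 0]
probs = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)]

def mean(vals, ps):
    """Weighted average of vals, with probabilities ps as weights."""
    return sum(p * x for x, p in zip(vals, ps))

# Double the stakes: every outcome of a single play is doubled.
mean_doubled = mean([2 * x for x in values], probs)

# Play twice: enumerate all 9 pairs of outcomes (independent plays assumed).
pairs = list(product(zip(values, probs), repeat=2))
mean_twice = sum(p1 * p2 * (x1 + x2) for (x1, p1), (x2, p2) in pairs)

print(mean_doubled, mean_twice)  # 2 2
```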

Functions of Expected Value

HW:

4.67

4.68 (70)

Indep. Of Random Variables

Independence: Random Variables X & Y

are independent when knowledge of

value of X does not change chances of

values of Y

Indep. Of Random Variables

HW:

4.64 (Indep., Dep., Dep.)

4.65

Independence

Application: Law of Large Numbers

IF $X_1, \ldots, X_n$ are independent draws from the same distribution, with mean $\mu_X$,

THEN: “$\lim_{n \to \infty}$” $\bar{X} = \mu_X$

(needs more mathematics to make precise, but this is the main idea)
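A small simulation illustrates the idea, using the die rolling game (winnings 9, -4, 0 with probabilities 1/3, 1/2, 1/6, mean 1). This is a sketch, not a proof: the helper `sample_mean` and the seed value are choices made here, and other seeds give different individual numbers with the same overall pattern.

```python
import random

values = [9, -4, 0]
weights = [1/3, 1/2, 1/6]

def sample_mean(n, rng):
    """Average of n independent draws from the game's distribution."""
    draws = rng.choices(values, weights=weights, k=n)
    return sum(draws) / n

rng = random.Random(0)  # fixed seed, only so that reruns give the same numbers
for n in [10, 1_000, 100_000]:
    # Sample means should settle near the distribution mean of 1 as n grows.
    print(n, sample_mean(n, rng))
```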

Independence

Application: Law of Large Numbers

Note: this is the foundation of the

“frequentist view of probability”

Underlying thought experiment is based on

many replications, so limit works….

Variance of Random Variables

Again consider discrete random variables:

Where distribution is summarized by a table,

Values x1 x2 … xk

Prob. p1 p2 … pk

Variance of Random Variables

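Again connect via frequentist approach:

$\mathrm{var}(X_1, \ldots, X_n) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

$= \frac{(X_1 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n-1}$

$= \frac{\#\{X_i = x_1\}\,(x_1 - \bar{X})^2 + \cdots + \#\{X_i = x_k\}\,(x_k - \bar{X})^2}{n-1}$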

Variance of Random Variables

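Again connect via frequentist approach:

$\mathrm{var}(X_1, \ldots, X_n) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

$= \frac{\#\{X_i = x_1\}}{n-1}\,(x_1 - \bar{X})^2 + \cdots + \frac{\#\{X_i = x_k\}}{n-1}\,(x_k - \bar{X})^2$

$\approx p_1 (x_1 - \mu_X)^2 + \cdots + p_k (x_k - \mu_X)^2$

$= \sum_{i=1}^{k} p_i (x_i - \mu_X)^2$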

Variance of Random Variables

So define:

Variance of a distribution (or of a random variable)

As:

$\sigma^2_X = \sum_{j=1}^{k} p_j (x_j - \mu_X)^2$

Variance of Random Variables

E. g. above game:

Winning   9     -4    0
Prob.     1/3   1/2   1/6

$\sigma^2_X = \frac{1}{2}(-4 - 1)^2 + \frac{1}{6}(0 - 1)^2 + \frac{1}{3}(9 - 1)^2$ (using $\mu_X = 1$ from before)

=(1/2)*5^2+(1/6)*1^2+(1/3)*8^2

Note: one acceptable Excel form, e.g. for exam (but there are many)
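The same computation as a Python sketch, as an alternative to the Excel form. Exact fractions are used so the result is not affected by rounding, and the variable names are illustrative.

```python
from fractions import Fraction

values = [9, -4, 0]
probs = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)]

# Mean first, then the probability-weighted squared deviations from it.
mu = sum(p * x for x, p in zip(values, probs))
variance = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

print(mu, variance)  # 1 34
```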

Standard Deviation

Recall standard deviation is square root of

variance (same units as data)

E. g. above game:

Winning   9     -4    0
Prob.     1/3   1/2   1/6

Standard Deviation

=sqrt((1/2)*5^2+(1/6)*1^2+(1/3)*8^2)

Variance of Random Variables

HW:

C14: Find the variance and standard

deviation of the distribution in 4.60.

(1.21, 1.10)

Properties of Variance

i. Linear transformation

$\sigma^2_{aX+b} = a^2 \sigma^2_X$

I.e. “ignore shifts”: var(X + b) = var(X) (makes sense)

And scales come through squared

(recall s.d. on scale of data, var is square)

Properties of Variance

ii. For X and Y independent (important!)

$\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$

I. e. Variance of sum is sum of variances

Here is where variance is “more natural” than standard deviation:

$\sigma_{X+Y} = \sqrt{\sigma^2_X + \sigma^2_Y}$

Properties of Variance

E. g. above game:

Winning   9     -4    0
Prob.     1/3   1/2   1/6

Recall “double the stakes” gave the same mean as “play twice”, but seems different

Doubling: $\sigma^2_{2X} = 2^2 \sigma^2_X = 4\sigma^2_X$

Play twice, independently: $\sigma^2_{X_1 + X_2} = \sigma^2_{X_1} + \sigma^2_{X_2} = 2\sigma^2_X$

Note: playing more reduces uncertainty

(var quantifies this idea, will do more later)
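The two variances can be compared numerically for this game. The sketch below lists all pairs of outcomes for "play twice" and multiplies their probabilities, which is exactly where independence of the two plays is assumed. The helper name `variance` is illustrative.

```python
from fractions import Fraction
from itertools import product

values = [9, -4, 0]
probs = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)]

def variance(vals, ps):
    """Probability-weighted squared deviation from the mean."""
    mu = sum(p * x for x, p in zip(vals, ps))
    return sum(p * (x - mu) ** 2 for x, p in zip(vals, ps))

var_x = variance(values, probs)
var_doubled = variance([2 * x for x in values], probs)   # double the stakes

# Joint table for two plays; multiplying probabilities assumes independence.
pairs = list(product(zip(values, probs), repeat=2))
var_twice = variance([x1 + x2 for (x1, _), (x2, _) in pairs],
                     [p1 * p2 for (_, p1), (_, p2) in pairs])

print(var_x, var_doubled, var_twice)  # 34 136 68
```

Doubling the stakes multiplies the variance by 4, while two independent plays only double it, which matches the note that playing more reduces uncertainty relative to doubling.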

Variance of Random Variables

HW:

4.74 ((a) 550, 5.7, (b) 0, 5.7, (c) 1022, 10.3)

4.75