Stat 155, Section 2, Last Time


Upload: viola

Post on 05-Jan-2016


TRANSCRIPT

Page 1: Stat 155,  Section 2, Last Time

Stat 155, Section 2, Last Time

• Normal Distribution:
  – Interpretation: 68%-95%-99.7% rule
  – Computation of areas (frequencies)
  – Inverse Normal area computation

• Diagnostics (for Normal approximation)
  – Normal Quantile plot (linear?)

• Relations between variables
  – Scatterplots – useful visualization

Page 2: Stat 155,  Section 2, Last Time

Reading In Textbook

Approximate Reading for Today’s Material:

Pages 102-112, 123-127, 132-145

Approximate Reading for Next Class:

Pages 151-163, 173-179, 192-196

Page 3: Stat 155,  Section 2, Last Time

E-mail Special Request

Date: Mon, 29 Jan 2007 19:37:30 -0500

Subject: special request

Professor Marron,

I was wondering if you could do a problem like C5 in class tomorrow? I went through the notes, and I went through the workbook for Excel but it seems as though I just can't seem to get how to make a normal standard distribution. Thank you.

Page 4: Stat 155,  Section 2, Last Time

An Earlier Email

Date: Thu, 25 Jan 2007 08:34:34 -0500 (EST)

> I don't understand how you draw a density curve on excel without having data
> points. In the NORMDIST function, you need data to go along with u, s, and
> False. How do you draw it without, or where is the data?

Right, in the class example that we considered (Stor155Eg8Done.xls), we thought about fitting a normal curve to a data set. We did this by taking the mean and s.d. of the data set, and then using them to generate the appropriate member of the family of normal curves.

Problem C5 is in some sense easier, since basically the work of calculating the mean and s.d. is already done (note they are given as 63.1 and 4.8). So you only need to go through the other steps of generating the graphics input.

I guess that one question that will come up is "what to use for endpoints of the x grid?" You could experiment a bit, but usually

mean +- 3 s.d.

gives a nice looking curve. We will see why in today's class meeting.

Page 5: Stat 155,  Section 2, Last Time

Another Email

Date: Sun, 28 Jan 2007 21:32:36 -0500 (EST)

> Hey Professor Marron, I'm having some problems with C5. Ok, so I opened
> excel spreadsheet, I typed in the mean and median, I did mean+ 3sd mean- 3sd
> For the X-value under NOrdmdist I put in the two x-values and then put in the
> sd and mean, and put in FALSE for the cumulative since we want a height
> distribution, but it gave me back a number. Am I doing the wrong function?

Hmm, sounds like you may not be computing enough points to generate the plot. Basically you should generate a whole column of X-values, and then plug all of those into NORMDIST.

An example of this is available in Class Eg 8, which is linked to page 19 of the notes for 1/23/07.

In that spreadsheet, this grid is in cells E78-E178. The corresponding calls of NORMDIST appear in the range J78-J178.

Then you plot those against each other.
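For readers working outside Excel, the same recipe can be sketched in Python. This is only an illustration, assuming the mean and s.d. of problem C5 (63.1 and 4.8) and a grid of 101 points (the size of the range E78-E178); the function names `normal_density` and `density_grid` are made up for this sketch. The first function computes the curve height that NORMDIST returns when its last argument is FALSE.

```python
import math

def normal_density(x, mean, sd):
    """Height of the normal density curve at x (the value NORMDIST gives with FALSE)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def density_grid(mean, sd, n_points=101):
    """Equally spaced x-values on [mean - 3 sd, mean + 3 sd] with their curve heights."""
    lo = mean - 3.0 * sd
    step = 6.0 * sd / (n_points - 1)
    xs = [lo + i * step for i in range(n_points)]
    ys = [normal_density(x, mean, sd) for x in xs]
    return xs, ys

# Mean and s.d. as given in problem C5.
xs, ys = density_grid(63.1, 4.8)

# Show every 25th grid point; plotting ys against xs draws the curve.
for x, y in list(zip(xs, ys))[::25]:
    print(f"x = {x:6.2f}   height = {y:.5f}")
```

A single call of the density function returns a single number, which is what the student saw; the curve only appears once the whole column of heights is plotted against the whole column of x-values.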

Page 6: Stat 155,  Section 2, Last Time

Decision Problem:

When should I do additional things in class?

When should I send people to Open Tutorial Sessions?

Depends on # benefitted

• 1 or 2: send to Open Tutorials

• Majority: should do in Class

Page 7: Stat 155,  Section 2, Last Time

Your Opinion?

1. Raise hand if you think this is worth class time right now.

2. Raise hand if you find this prospect boring, and want to move on instead.

If we do this, go to Class Example 8:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg8Done.xls

Page 8: Stat 155,  Section 2, Last Time

Variable Relationships

Chapter 2 in Text

Idea: Look beyond single quantities, to how quantities relate to each other.

E.g. How do HW scores “relate” to Exam scores?

Section 2.1: Useful graphical device:

Scatterplot

Page 9: Stat 155,  Section 2, Last Time

Plotting Bivariate Data

Toy Example:

(1,2)

(3,1)

(-1,0)

(2,-1)

[Figure: “Toy Scatterplot, Separate Points”; horizontal axis x from -2 to 4, vertical axis y from -1.5 to 2.5]

Page 10: Stat 155,  Section 2, Last Time

Plotting Bivariate Data

Common Name: “Scatterplot”

A look under the hood:

EXCEL: Chart Wizard (colored bar icon)

• Chart Type: XY (scatter)

• Subtype controls points only, or lines

• Later steps similar to above

(can massage the pic!)

Page 11: Stat 155,  Section 2, Last Time

Important Aspects of Relations

I. Form of Relationship

II. Direction of Relationship

III. Strength of Relationship

Page 12: Stat 155,  Section 2, Last Time

I. Form of Relationship

• Linear: Data approximately follow a line

Previous Class Scores Example
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg10.xls

Final vs. High values of HW is “best”

• Nonlinear: Data follow a different pattern

Nice Example: Bralower’s Fossil Data

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg11.xls

Page 13: Stat 155,  Section 2, Last Time

Bralower’s Fossil Data
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg11.xls

From T. Bralower, formerly of Geological Sci.

Studies Global Climate, millions of years ago:

• Ratios of Isotopes of Strontium

• Reflects Ice Ages, via Sea Level

(50 meter difference!)

• As function of time

• Clearly nonlinear relationship

Page 14: Stat 155,  Section 2, Last Time

II. Direction of Relationship

• Positive Association (slopes upwards)

X bigger ⇒ Y bigger

• Negative Association (slopes down)

X bigger ⇒ Y smaller

E.g. X = alcohol consumption, Y = Driving Ability

Clear negative association

Page 15: Stat 155,  Section 2, Last Time

III. Strength of Relationship

Idea: How close are points to lying on a line?

Revisit Class Scores Example:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg10.xls

• Final Exam is “closely related to HW”

• Midterm 1 less closely related to HW

• Midterm 2 even less closely related to Midterm 1

Page 16: Stat 155,  Section 2, Last Time

Linear Relationship HW

2.3, 2.5, 2.7, 2.11

Page 17: Stat 155,  Section 2, Last Time

Comparing Scatterplots

Additional Useful Visual Tool:

• Overlaying multiple data sets

• Allows comparison

• Use different colors or symbols

• Easy in EXCEL (colors are automatic)

Already done in HW scores example:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg12.xls

Page 18: Stat 155,  Section 2, Last Time

Comparing Scatterplots HW

2.17

Page 19: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

Remember it takes a college degree to fly a plane, but only a high school diploma to fix one.

After every flight, Qantas pilots fill out a form, called a gripe sheet which tells mechanics about problems with the aircraft. The mechanics correct the problems, document their repairs on the form, and then pilots review the gripe sheets before the next flight.

Page 21: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

Never let it be said that ground crews lack a sense of humor. Here are some actual maintenance complaints submitted by Qantas' pilots (marked with a P) and the solutions recorded (marked with an S) by maintenance engineers.

By the way, Qantas is the only major airline that has never, ever, had an accident.

Page 22: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

P: Left inside main tire almost needs replacement.

S: Almost replaced left inside main tire.

Page 23: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

P: Test flight OK, except auto-land very rough.

S: Auto-land not installed on this aircraft.

Page 24: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

P: Dead bugs on windshield.

S: Live bugs on back-order.

Page 25: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

P: Evidence of leak on right main landing gear.

S: Evidence removed.

Page 26: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

P: IFF inoperative in OFF mode.

S: IFF always inoperative in OFF mode.

Page 27: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

P: Number 3 engine missing.

S: Engine found on right wing after brief search.

Page 28: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

P: Noise coming from under instrument panel. Sounds like a midget pounding on something with a hammer.

S: Took hammer away from midget.

Page 29: Stat 155,  Section 2, Last Time

Section 2.2: Correlation

Main Idea: Quantify Strength of Relationship

Context:

– A numerical summary

– In spirit of mean and standard deviation

– But now applies to pairs of variables

Page 30: Stat 155,  Section 2, Last Time

Section 2.2: Correlation

Main Idea: Quantify Strength of Relationship

Specific Goals:

– Near 1: for positive relat’ship & nearly linear

– > 0: for positive relationship (slopes up)

– = 0: for no relationship

– < 0: for negative relationship (slopes down)

– Near -1: for negative relat’ship & nearly linear

Page 31: Stat 155,  Section 2, Last Time

Correlation - Approach

Numerical Approach:

for $(x_i, y_i)$ symmetric around $(0, 0)$,

$\sum_{i=1}^{n} x_i y_i$ has similar properties

Worked out Example:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg13.xls

Page 32: Stat 155,  Section 2, Last Time

Correlation – Graphical View

Plots (a) & (b), illustrating $\sum_{i=1}^{n} x_i y_i$:

• > 0 for positive relationship

• < 0 for negative relationship

• Bigger for data closer to line

Problem 1: Not between -1 & 1

Problem 2: Feels “Scale”, see plot (c)

Problem 3: Feels “Shift” even more, see (d)

(even gets sign wrong!)
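The three problems with the raw sum of products can be checked numerically. The sketch below uses made-up points near the line y = x (they are not the data in Stor155Eg13.xls), and the function name `sum_of_products` is chosen just for this illustration.

```python
def sum_of_products(xs, ys):
    """Raw sum of x_i * y_i, with no centering and no scaling."""
    return sum(x * y for x, y in zip(xs, ys))

# Made-up points scattered closely around the line y = x, centered at (0, 0).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-2.1, -0.9, 0.1, 1.0, 1.9]

base = sum_of_products(xs, ys)

# Same picture in different units: both variables multiplied by 10.
scaled = sum_of_products([10 * x for x in xs], [10 * y for y in ys])

# Same picture moved away from the origin: x shifted down, y shifted up.
shifted = sum_of_products([x - 5 for x in xs], [y + 5 for y in ys])

print("centered at origin:", base)
print("rescaled by 10:    ", scaled)
print("shifted:           ", shifted)
```

The relationship between the variables is identical in all three cases, yet the rescaled sum is 100 times larger and the shifted sum comes out negative, which is the wrong sign for an upward-sloping cloud.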

Page 33: Stat 155,  Section 2, Last Time

Correlation - Approach

Solution to above problems:

Standardize!

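Define Correlation

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$

(where $\bar{x}$, $\bar{y}$ are the means and $s_x$, $s_y$ the standard deviations of the x's and y's)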
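A minimal Python sketch of this definition follows, assuming the standard deviations use the divisor n - 1 (the convention that keeps r between -1 and 1 and matches Excel's CORREL). The helper names are invented for this example; the first data set is the toy example (1,2), (3,1), (-1,0), (2,-1) from the scatterplot slide, and the second is a made-up set of points lying exactly on a line.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Standard deviation with divisor n - 1."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Sum of products of standardized x's and y's, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Toy example from the scatterplot slide: no clear linear pattern.
toy_x = [1, 3, -1, 2]
toy_y = [2, 1, 0, -1]
r_toy = correlation(toy_x, toy_y)

# Made-up points lying exactly on the line y = 2x + 1.
r_line = correlation([1, 2, 3, 4], [3, 5, 7, 9])

print(f"toy example: r = {r_toy:.4f}")
print(f"exact line:  r = {r_line:.4f}")
```

The toy points give an r close to 0, while the points on an upward-sloping line give r = 1, up to rounding.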

Page 34: Stat 155,  Section 2, Last Time

Correlation - Example

Revisit above example
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg13.xls

• r is always same, and ~1, for (a), (c), (d)

• r < 0, and not so close to -1, for (b)

Page 35: Stat 155,  Section 2, Last Time

Correlation - Example

A look under the hood
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg13.xls

• Cols A&B: generated random numbers

(will study later)

• Product versions used SUMPRODUCT

• r computed with CORREL (important)

• r’s same for (a) & (c), since Y’s are “just shifted”

• r’s also same for (d), since x’s and Y’s shifted

(standardization cancels shifts & scales)
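The claim that standardization cancels shifts and scales can be checked directly. The sketch below is written in Python rather than with CORREL, uses made-up data (not the random numbers in columns A and B of the spreadsheet), and assumes the n - 1 divisor for the standard deviation; `standardize` and `correlation` are names chosen for this illustration.

```python
import math

def standardize(values):
    """Subtract the mean and divide by the standard deviation (divisor n - 1)."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]

def correlation(xs, ys):
    """Sum of products of the standardized values, divided by n - 1."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Made-up data with a positive, roughly linear relationship.
xs = [0.5, 1.0, 2.0, 3.5, 4.0, 5.5]
ys = [1.2, 0.8, 2.9, 3.1, 4.8, 5.0]

r_original = correlation(xs, ys)
r_y_shifted = correlation(xs, [y + 100 for y in ys])
r_both_shifted = correlation([x - 50 for x in xs], [y + 100 for y in ys])
r_rescaled = correlation([10 * x for x in xs], [0.1 * y for y in ys])

for label, r in [("original", r_original), ("y shifted", r_y_shifted),
                 ("both shifted", r_both_shifted), ("rescaled", r_rescaled)]:
    print(f"{label:13s} r = {r:.6f}")
```

All four values agree up to rounding, because the shift disappears when the mean is subtracted and a positive scale factor disappears when dividing by the standard deviation.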

Page 36: Stat 155,  Section 2, Last Time

Correlation - Example

Revisit Class Scores Example:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg10.xls

• r is always > 0

• r is biggest for Final vs. HW

• r is smallest for MT2 vs. MT1

Page 37: Stat 155,  Section 2, Last Time

Correlation - Example

Fun Example from Publisher’s Website:

http://bcs.whfreeman.com/ips5e/

Choose

• Statistical Applets

• Correlation and Regression

Gives feeling for how correlation is affected by changing data.

Page 38: Stat 155,  Section 2, Last Time

Correlation - Example

Fun Example from Publisher’s Website:

http://bcs.whfreeman.com/ips5e/

Interesting Exercise:

• Choose points to give correlation r = 0.95

(within 0.01)

• Destroy with a few outliers

Page 39: Stat 155,  Section 2, Last Time

Correlation - HW

HW:

2.23

2.25

2.27a

Page 40: Stat 155,  Section 2, Last Time

Correlation - Outliers

Caution:

Outliers can strongly affect correlation, r

HW:

2.27b

2.30 (big outlier reduces correlation)

Also: recompute correlation with outlier removed
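The effect of a single outlier, and the "recompute with outlier removed" step, can be sketched as follows. The data are made up for illustration and are not the data of HW 2.27b or 2.30; the function below uses the algebraically equivalent form of r as a ratio of sums of centered products.

```python
import math

def correlation(xs, ys):
    """Correlation r, written as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up points lying close to the line y = x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1]
r_without = correlation(xs, ys)

# Add one made-up outlier far below the pattern, then recompute.
r_with = correlation(xs + [9], ys + [0.5])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

One point out of nine is enough to pull a correlation that was nearly 1 down to well under 0.6.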

Page 41: Stat 155,  Section 2, Last Time

And now for something completely different

Recall distribution of majors of students in this course:

[Figure: bar chart “Stat 155, Section 2, Majors”. Vertical axis: Frequency, 0 to 0.4. Categories: Business / Man., Biology, Public Policy / Health, Pharm / Nursing, Journalism / Comm., Env. Sci., Other, Undecided]

Page 42: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

Tried to Google “Public Policy Jokes”

But couldn’t find anything decent.

Next tried “Public Health Jokes”

And came up with…

Page 43: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

Regular Consumption of Guinness

Well now, you see it's like this....

Page 44: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

A herd of buffalo can only move as fast as the slowest buffalo. And when the herd is hunted, it is the slowest and weakest ones at the rear that are killed. This natural selection is good for the herd as a whole because only the fittest survive thus improving the general health and speed of the entire herd.

Page 45: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

In much the same way the human brain only operates as quickly as the slowest of its brain cells. Excessive intake of alcohol kills brain cells, as we all know, and naturally the alcohol attacks the slowest/weakest cells first....

Page 46: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

So it is as plain as the nose on your face that regular consumption of Guinness will eliminate the weaker, slower brain cells thus leaving the remaining cells the best in the brain.

Page 47: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

The end result, of course, is a faster more efficient brain.

If you doubt this at all, tell me, isn't it true that we always feel a bit smarter after a few pints?

Page 48: Stat 155,  Section 2, Last Time

Section 2.3: Linear Regression

Idea:

Fit a line to data in a scatterplot

• To learn about “basic structure”

• To “model data”

• To provide “prediction of new values”

Page 49: Stat 155,  Section 2, Last Time

Linear Regression

Recall some basic geometry:

A line is described by an equation:

y = mx + b

m = slope

b = y-intercept

Varying m & b gives a “family of lines”, indexed by “parameters” m & b

Page 50: Stat 155,  Section 2, Last Time

Basics of Lines

Textbook’s notation:

y = bx + a

b = m (above) = slope

a = b (above) = y-intercept

Page 51: Stat 155,  Section 2, Last Time

Basics of Lines

HW (to review line ideas):

C6: Fred keeps his savings in his mattress. He begins with $500 from his mother, and adds $100 each year. His total savings y, after x years are given by the equation:

y = 500 + 100 x

(a) Draw a graph of this equation.

Page 52: Stat 155,  Section 2, Last Time

Basics of Lines

C6: (cont.)

(b) After 20 years, how much will Fred have?

($2500)

(c) If Fred adds $200 instead of $100 each year to his initial $500, what is the equation that describes his savings after x years? (y = 500 + 200 x)
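The answers to C6 can be checked with a few lines of Python. This is just a sketch; the function name `savings` and its parameter names are invented for the check, with the intercept (500) and slope (100 or 200) taken from the problem.

```python
def savings(years, start=500, per_year=100):
    """Total savings after `years` years: y = start + per_year * x."""
    return start + per_year * years

# Part (b): $100 per year for 20 years.
after_20 = savings(20)

# Part (c): same starting amount, but $200 per year.
after_20_faster = savings(20, per_year=200)

print("after 20 years at $100/year:", after_20)
print("after 20 years at $200/year:", after_20_faster)
```

Changing the yearly amount changes only the slope of the line; the intercept stays at the initial $500.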

Page 53: Stat 155,  Section 2, Last Time

Linear Regression

Approach:

Given a scatterplot of data:

$(x_1, y_1), \ldots, (x_n, y_n)$

Find a & b (i.e. choose a line)

to “best fit the data”

Page 54: Stat 155,  Section 2, Last Time

Linear Regression - Approach

Given a line, $y = bx + a$, “indexed” by $b$ & $a$

Define “residuals” = “data Y” – “Y on line”

= $y_i - (b x_i + a)$

Now choose $b$ & $a$ to make these “small”

[Figure: points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$]
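As a small illustration of the residual definition, the sketch below computes residuals for two hand-picked candidate lines. The data are the toy points (1,2), (3,1), (-1,0), (2,-1) from the scatterplot slide; the two candidate lines and the function name `residuals` are assumptions made only for this example.

```python
def residuals(xs, ys, b, a):
    """Residual for each point: data y minus the y on the line y = b*x + a."""
    return [y - (b * x + a) for x, y in zip(xs, ys)]

# Toy example from the scatterplot slide.
toy_x = [1, 3, -1, 2]
toy_y = [2, 1, 0, -1]

# Candidate 1: horizontal line y = 0.5 (slope b = 0, intercept a = 0.5).
res_flat = residuals(toy_x, toy_y, b=0.0, a=0.5)

# Candidate 2: the line y = x (slope b = 1, intercept a = 0).
res_diag = residuals(toy_x, toy_y, b=1.0, a=0.0)

print("residuals for y = 0.5:", res_flat)
print("residuals for y = x:  ", res_diag)
```

Positive residuals belong to points above the line and negative residuals to points below it, so a line fits well when all of these numbers are small in size.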

Page 55: Stat 155,  Section 2, Last Time

Linear Regression - Approach

Excellent Demo, by Charles Stanton, CSUSB
http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html

• Try choosing points near a line

• Then throw in outlier

• Clear and put points on curve

• Use “Residual Plot” to diagnose that line is not a good fit to data.

Page 56: Stat 155,  Section 2, Last Time

Linear Regression - Approach

JAVA Demo, by David Lane at Rice U.
http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html

• Try drawing lines (to min MSE)

• Experiment with slopes

• And intercepts

• Guess r?

Page 57: Stat 155,  Section 2, Last Time

Linear Regression - Approach

Make Residuals > 0, by squaring

Least Squares: adjust $b$ & $a$ to

Minimize the “Sum of Squared Errors”

$SSE = \sum_{i=1}^{n} \left( y_i - (b x_i + a) \right)^2$
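The SSE criterion can be tried out numerically. In the sketch below, `sse` follows the formula above directly. The function `least_squares` uses the standard closed-form minimizer (slope = sum of centered products over sum of squared centered x's, intercept = mean of y minus slope times mean of x), which is not derived in this section and is included here as an assumption for illustration. The data are the toy points from the scatterplot slide.

```python
def sse(xs, ys, b, a):
    """Sum of squared errors for the line y = b*x + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Standard closed-form slope b and intercept a that minimize SSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return b, a

# Toy example from the scatterplot slide.
toy_x = [1, 3, -1, 2]
toy_y = [2, 1, 0, -1]

b_hat, a_hat = least_squares(toy_x, toy_y)
sse_best = sse(toy_x, toy_y, b_hat, a_hat)

# Compare with a hand-picked line and with small changes to the fitted line.
sse_flat = sse(toy_x, toy_y, b=0.0, a=0.5)
sse_nudged = [sse(toy_x, toy_y, b_hat + db, a_hat + da)
              for db in (-0.1, 0.1) for da in (-0.1, 0.1)]

print(f"least squares line: y = {b_hat:.4f} x + {a_hat:.4f}, SSE = {sse_best:.4f}")
print(f"horizontal line y = 0.5: SSE = {sse_flat:.4f}")
```

Any change to the fitted slope or intercept increases SSE, which is the sense in which the least squares line "best fits" the data.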