Final Examination
• Thursday, April 30, 4:00 – 7:00
• Location: here, Hanes 120
• Suggested Study Strategy: Rework HW
• Out of Class Review Session?
• No, personal Q-A much better use of time
• Thus, instead offer extended Office Hours

Final Examination
Extended Office Hours:
• Monday, April 27, 8:00 – 11:00
• Tuesday, April 28, 12:00 – 2:30
• Wednesday, April 29, 1:00 – 5:00
• Thursday, April 30, 8:00 – 1:00
Last Time
• Comparing Scatterplots
• Measuring Strength of Relationship – Correlation
• Two Sample Inference
  – Paired Sampling
  – Independent Sampling
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 110-135, 560-574
Approximate Reading for Next Class:
None, review only
2 Sample Measurement Error
Easy Case: Paired Differences
Have Treatment 1: $X_1, X_2, \ldots, X_n$
Treatment 2: $Y_1, Y_2, \ldots, Y_n$
Hard case: 2 different (unmatched) samples:
Treatment 1: $X_1, X_2, \ldots, X_{n_X}$
Treatment 2: $Y_1, Y_2, \ldots, Y_{n_Y}$
(sample sizes $n_X, n_Y$ different!)
2 Sample Measurement Error
Hard case: 2 different (unmatched) samples
Notes:
• There are several variations
• For Hypo. Testing, EXCEL works well
• Variations well labelled in TTEST
2 Sample Measurement Error
Hard case: 2 different (unmatched) samples
Main Ideas:
Data: $X_1, X_2, \ldots, X_{n_X}$ and $Y_1, Y_2, \ldots, Y_{n_Y}$
Sample Averages: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n_X}}\right)$, $\bar{Y} \sim N\left(\mu_Y, \frac{\sigma_Y}{\sqrt{n_Y}}\right)$
2 Sample Measurement Error
Hard case: 2 different (unmatched) samples
Base inference on: $\bar{X} - \bar{Y}$
Probability Theory (can show):
$\bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y,\ \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}\right)$
Assumptions:
• Xs & Ys Independent
• Otherwise based on Law of Averages
2 Sample Measurement Error
Step towards statistical inference: 2 sample Z statistic:
$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}} \sim N(0, 1)$
• Just do standardization (usual idea)
• Handle unknown s.d.s???
2 Sample Measurement Error
For unknown s.d.s, use the usual approximation: replace $\sigma_X, \sigma_Y$ with the sample s.d.s $s_X, s_Y$, for the 2 sample t statistic:
$t = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}}$
2 Sample Measurement Error
2 sample t statistic:
$t = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}}$
Probability Distribution:
• 2 sample version of t distribution
• Well modelled by EXCEL using TTEST
• Use this for Hypothesis Testing
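The unequal-variance t statistic above is easy to compute directly. A minimal Python sketch (standard library only; the two samples are made-up numbers, and the function name `welch_t` is ours, since this variation is also known as Welch's test):

```python
import math

def welch_t(xs, ys):
    """2 sample t statistic (unequal variance form):
    (Xbar - Ybar) / sqrt(sX^2/nX + sY^2/nY), testing H0: mu_X = mu_Y."""
    nx, ny = len(xs), len(ys)
    xbar = sum(xs) / nx
    ybar = sum(ys) / ny
    # sample variances (divide by n - 1)
    sx2 = sum((x - xbar) ** 2 for x in xs) / (nx - 1)
    sy2 = sum((y - ybar) ** 2 for y in ys) / (ny - 1)
    se = math.sqrt(sx2 / nx + sy2 / ny)
    return (xbar - ybar) / se

# hypothetical unmatched samples
t = welch_t([10, 12, 11, 13], [14, 15, 13, 16])
```

The P-value then comes from comparing t to the 2 sample version of the t distribution, which is what EXCEL's TTEST does internally.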
2 Sample Measurement Error
Variations on TTEST: Argument "Type"
1. Paired (simple case above)
2. Two sample, equal variance (studied below)
3. Two sample, unequal variance (version derived above)
2 Sample Measurement Error
Variations on TTEST: Argument "Type"
2. Two sample, equal variance
Main Idea: when $\sigma_X = \sigma_Y$:
• Can find an "improved estimate"
• By "pooling data"
• i.e. use a combined (pooled) s from the Xs & Ys
• Won't use in this class
2 Separate Samples
E.g. Old Textbook 7.32:
b. Do separate sample Hypo test,
Class Example 15, Part 3:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg15.xls
• Use type = 3 (don't know common variance)
• P-value = 3.95 x 10^-6
• Interpretation: very strong evidence (either yes-no or gray level)
2 Separate Samples
Suggested HW: 7.81, 7.82
2 Sample Hypo Testing
Comparison of Paired vs. Unmatched Cases
Notes:
• Can always use unmatched procedure
– Just ignore matching…
• Advantage to pairing???
2 Sample Hypo Testing
Comparison of Paired vs. Unmatched Cases
• Advantage to Pairing???
• Recall previous example:
Old Textbook 7.32
– Matched Paired P-value = 1.87 x 10^-5
– Unmatched P-value = 3.95 x 10^-6
• Unmatched better!?! (can happen)
2 Sample Hypo Testing
Comparison of Paired vs. Unmatched Cases
• Advantage to Pairing???
Happens when "variation of diff's", $\sigma_D$, is smaller than "full sample variation", i.e.
$\frac{\sigma_D}{\sqrt{n_D}} < \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$
(whether this happens depends on data)
Paired vs. Unmatched Sampling
Class Example 29:
A new drug is being tested that should boost white blood cell count following chemotherapy. For a set of 4 patients, it was not administered (as a control) for the 1st round of chemotherapy, and then the new drug was tried after the 2nd round of chemotherapy. White blood cell counts were measured one week after each round of chemotherapy.
Paired vs. Unmatched Sampling
Class Example 29:
The resulting white blood cell counts were:
Patient   Without drug   With drug
1         33             35
2         26             27
3         36             39
4         28             30
Paired vs. Unmatched Sampling
Class Example 29:
Does the new drug seem to boost white blood cell counts well enough to be studied further?
• Seems to be some improvement
• But is it statistically significant?
• Only 4 patients…
Paired vs. Unmatched Sampling
Let: $\mu_X$ = Average Blood c'nts w/out drug
$\mu_Y$ = Average Blood c'nts with drug
Set up: $H_0: \mu_X = \mu_Y$ vs. $H_A: \mu_X < \mu_Y$
(want strong evidence of improvement)
Paired vs. Unmatched Sampling
Class Example 29:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg29.xls
Results:
• Matched Pair P-val = 0.00813
– Very strong evidence of improvement
• Unmatched P-val = 0.295
– Not statistically significant

Paired vs. Unmatched Sampling
Conclusions:
• Paired Sampling can give better results
• When diff'ing reduces variation
• Often happens for careful matching
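The Example 29 comparison can be reproduced directly from the four patients' counts. A Python sketch (stdlib only; it computes just the two t statistics, while the P-values quoted in the results come from the corresponding t distributions):

```python
import math

no_drug = [33, 26, 36, 28]    # counts without drug (control round)
with_drug = [35, 27, 39, 30]  # counts with the new drug

def mean(v):
    return sum(v) / len(v)

def svar(v):
    """Sample variance (divide by n - 1)."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

# Paired: work with the differences, one sample of size n
diffs = [w - n for w, n in zip(with_drug, no_drug)]  # [2, 1, 3, 2]
t_paired = mean(diffs) / math.sqrt(svar(diffs) / len(diffs))

# Unmatched: ignore the pairing, 2 sample t statistic
t_unmatched = (mean(with_drug) - mean(no_drug)) / math.sqrt(
    svar(with_drug) / len(with_drug) + svar(no_drug) / len(no_drug))

# Pairing removes the patient-to-patient variation, so t_paired
# is much larger (about 4.90 vs. about 0.57)
```

The large gap between the two t statistics is exactly why the paired P-value (0.00813) is so much smaller than the unmatched one (0.295) here.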
Paired Sampling Visualization
2 Sample Proportions
In text Section 8.2:
• Skip this
• Ideas are only slight variation of above
• Basically mix & match of 2 sample ideas and proportion methods
• If you need it (later), pull out text
• Covered on exams to the extent it is in HW
Research Corner
Example of High Dimensional Visualization:
Microarray Analysis
For a biological tissue sample
(e.g. tumor from a cancer biopsy),
simultaneously measure "gene expression" (activity level)
over all human genes (~38,000)
Data set considered here:
Breast Cancer, ~2500 genes
Section 2.3: Linear Regression
Idea:
Fit a line to data in a scatterplot
• To learn about basic structure
• To model data
• To provide prediction of new values
Linear Regression
Recall some basic geometry:
A line is described by an equation:
y = mx + b
Really {(x, y) : y = mx + b}, "the set of all ordered pairs such that y = mx + b"
m = slope
b = y intercept
Varying m & b gives a family of lines, indexed by parameters m & b
Basics of Lines
Textbook's notation:
y = b0 + b1x = b1x + b0
b1 = m (above) = slope
b0 = b (above) = y-intercept
Basics of Lines
Suggested HW (to review line ideas):
C24: Fred keeps his savings in his mattress. He begins with $500 from his mother, and adds $100 each year. His total savings y, after x years, are given by the equation:
y = 500 + 100x
(a) Draw a graph of this equation.
(b) After 20 years, how much will Fred have? ($2500)
(c) If Fred adds $200 instead of $100 each year to his initial $500, what is the equation that describes his savings after x years? (y = 500 + 200x)
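Parts (b) and (c) of C24 amount to evaluating points on a line; a quick Python check (the function name `savings` is ours):

```python
def savings(x, rate=100, start=500):
    """Fred's savings after x years: y = start + rate * x."""
    return start + rate * x

# (b) after 20 years at $100/year: 500 + 100*20 = 2500
print(savings(20))            # 2500
# (c) at $200/year the line becomes y = 500 + 200x
print(savings(20, rate=200))  # 4500
```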
Linear Regression
Approach:
Given a scatterplot of data: $(x_1, y_1), \ldots, (x_n, y_n)$
Find b0 & b1 (i.e. choose a line) to best fit the data
Linear Regression - Approach
Given a line, y = b1x + b0, indexed by b0 & b1
Define residuals = data Y – Y on line
= $y_i - (b_1 x_i + b_0)$
Now choose b0 & b1 to make these "small"
(sketch: data points $(x_1, y_1), (x_2, y_2), (x_3, y_3)$ around a candidate line)
Linear Regression - Approach
Make Residuals > 0, by squaring
Least Squares: adjust b0 & b1 to minimize the Sum of Squared Errors:
$SSE = \sum_{i=1}^{n} \left( y_i - (b_1 x_i + b_0) \right)^2$
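The SSE criterion can be written out directly. A small Python sketch (the toy xs and ys are made-up points, roughly on the line y = 2x):

```python
def sse(xs, ys, b1, b0):
    """Sum of Squared Errors for the line y = b1*x + b0."""
    return sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))

# toy scatterplot (made-up points)
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]

# Least squares picks the (b1, b0) pair with the smallest SSE;
# compare a sensible candidate line with a clearly worse one:
print(sse(xs, ys, 2.0, 0.0))  # small (points lie near y = 2x)
print(sse(xs, ys, 0.0, 5.0))  # much larger (flat line misses badly)
```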
Linear Regression - Approach
JAVA Demo, by David Lane at Rice U.:
http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html
• Applet gives us scatterplot with data (appears to be randomly generated)
• Try drawing lines (to min MSE)
• Experiment with intercepts, b0
• And slopes, b1
• Guess the correlation, r?

David Lane Demo Applet walkthrough:
• Raw data
• (Deliberately dumb) hand-drawn line: measure fit (bad) by Mean Square Error
• (Hopefully better) hand-drawn line: yes! Improved (smaller) MSE
• Best choice of b0? Try to vertically center, i.e. b0 = Avg(Ys)
• Manual attempt at b0 = Avg(Ys): as expected, improved MSE
• Next try to allow slope, while maintaining intercept (so go through center)
• Slope which direction? (apparent small downward trend?)
• Make an attempt: worse MSE!?! Perhaps too steep? Try less…
• Next attempt: improved MSE!
• Could try to fine tune more, but let's look at best possible next
• Our center point (intercept) was off (too high), but our slope looks pretty good
• Next try to guess correlation: answer
• Try another data set: clearly slopes upwards, apparently stronger correlation
Linear Regression - Approach
Make Residuals > 0, by squaring
Least Squares: adjust b0 & b1 to minimize the Sum of Squared Errors:
$SSE = \sum_{i=1}^{n} \left( y_i - (b_1 x_i + b_0) \right)^2$
(How to Compute?)
Least Squares
Can Show: (math beyond this course)
Least Squares Fit Line:
• Passes through the point $(\bar{x}, \bar{y})$
• Has Slope: $b_1 = r \frac{s_y}{s_x}$
(correction factor uses correlation, r)
(think r = 0, and r < 0)
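These two facts determine the fit line completely, which gives a direct way to compute it. A minimal Python sketch (the helper name `fit_line` is ours):

```python
import math

def fit_line(xs, ys):
    """Least squares fit via the formulas above:
    slope b1 = r * sy / sx, intercept from passing through (xbar, ybar)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b1 = r * sy / sx
    b0 = ybar - b1 * xbar  # line passes through (xbar, ybar)
    return b1, b0

# points exactly on y = 3x + 1 recover slope 3 and intercept 1
b1, b0 = fit_line([0, 1, 2, 3], [1, 4, 7, 10])
```

For perfectly linear data r = 1 and the slope is just sy/sx; with r = 0 the slope is 0 and the fit is the horizontal line through the average.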
Least Squares in Excel
• Could compute manually (using formulas for sX, sY & r)
• But EXCEL provides useful summaries:
– INTERCEPT (computes y-intercept b0)
– SLOPE (computes slope b1)
Suggested HW: 2.59 a, b

Least Squares in Excel
Additional trick: To draw overlay fit line (to existing data plot):
– Right click a data point
– Choose: "Add Trendline" from menu
Least Squares in Excel
Suggested HW: 2.59 c
And now for something completely different
Can you guess the phrase that these pictures intend to convey?
Requires: "Thinking Outside the Box"
Also Called: "Lateral Thinking"
Answers:
• Egg Plant
• Pool Table
• Hole Milk
• Tap Dancers
Linear Regression - Insight
Another Demo, by Charles Stanton, CSUSB:
http://www.math.csusb.edu/faculty/stanton/m262/regress
What University? California State University, San Bernardino
• Now we choose data
• Applet draws fit line
• Study quality of fit, using Residual Plot
Diagnostic for Linear Regression
Recall Normal Quantile plot shows "how well normal curve fits a data set"
A useful visual assessment of how well the regression line fits data is the:
Residual Plot
Just a plot of residuals (on Y axis), versus X's (on X axis)
Charles Stanton Demo Applet walkthrough:
• Add a point by clicking (no line yet…), and another: applet draws line, gives its equation, and plots residuals
• Now add another point (goal: very close to line): equation similar (but not exact); residuals now non-0, & magnify relative differences
• Now add more points along line: residuals magnify differences (note change of scale); major players clearly stand out
• Outliers have a drastic impact: poor fit to data along previous line (shows up clearly)
• Misfit shows up clearly, especially nonlinear relationships (even when hard to see in scatterplot)
Linear Regression - Insight
Another Demo, by Charles Stanton, CSUSB:
http://www.math.csusb.edu/faculty/stanton/m262/regress
• Now we choose data
• Applet draws fit line
• Study quality of fit, using Residual Plot
• Useful visual diagnostic (good at highlighting problems)
Residual Diagnostic Plot
Toy Examples:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg19.xls
1. Generate data to follow a line
• Residuals seem to be randomly distributed, with no apparent structure
• Suggests linear fit is a good model for data
2. Generate data to follow a parabola
• Shows systematic structure
• Pos. – Neg. – Pos. suggests data follow a curve (not linear)
• Suggests that line is a poor fit
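Toy example 2 can be reproduced in a few lines. A sketch (the parabola y = x² and its least squares line y = 4x − 2, worked out from the formulas above, are our own small example):

```python
def residuals(xs, ys, b1, b0):
    """Residuals: data y minus y on the line."""
    return [y - (b1 * x + b0) for x, y in zip(xs, ys)]

# data exactly on the parabola y = x^2, for x = 0..4
xs = list(range(5))
ys = [x ** 2 for x in xs]

# the least squares line for these points is y = 4x - 2
res = residuals(xs, ys, 4, -2)
signs = ["+" if r > 0 else "-" for r in res]
# Pos. - Neg. - Pos.: positive at the ends, negative in the middle,
# the systematic pattern that flags a curved relationship
print(signs)
```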
Residual Diagnostic Plot
Example from text: problem 2.74
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg15.xls
Study (for runners) how Stride Rate depends on Running Speed
(to run faster, need faster strides)
a. & b. Scatterplot & Fit line
c. & d. Residual Plot & Comment
Residual Diagnostic Plot E.g.
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg15.xls
a. & b. Scatterplot & Fit line
• Linear fit looks very good
• Backed up by correlation ≈ 1
• "Low noise" because data are averaged (over 21 runners)
Residual Diagnostic Plot E.g.
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg15.xls
c. & d. Residual Plot & Comment
• Systematic structure: Pos. – Neg. – Pos.
• Not random, but a systematic pattern
• Suggests line can be improved (as a model for these data)
• Residual plot provides a "zoomed in view" (can't see this in raw data)
Residual Diagnostic Plot
Suggested HW: 2.87
Effect of a Single Data Point
Suggested HW: 2.102
Least Squares Prediction
Idea: After finding b0 & b1 (i.e. the fit line),
for a new x, predict the new value of y using b1x + b0,
i.e. "predict by the point on the line"
Least Squares Prediction
EXCEL Prediction: revisit example:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg14.xls
EXCEL offers two functions:
• TREND
• FORECAST
They work similarly; input raw x's and y's (careful about order!)
Least Squares Prediction
Caution: prediction outside range of data is called “extrapolation”
Dangerous, since small errors are magnified
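The predict-by-the-line idea, and why extrapolation is risky, can be sketched in Python (data and helper names are ours; TREND and FORECAST play the analogous role in EXCEL):

```python
def fit_line(xs, ys):
    """Least squares slope & intercept, minimal version."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    return b1, ybar - b1 * xbar

def predict(x_new, b1, b0):
    """Predict by the point on the line."""
    return b1 * x_new + b0

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]  # made-up data, roughly y = 2x
b1, b0 = fit_line(xs, ys)

inside = predict(4.5, b1, b0)   # inside the data range: trustworthy
outside = predict(50, b1, b0)   # extrapolation: any small error in the
                                # slope is multiplied by the distance out
```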
Least Squares Prediction
Suggested HW:
2.67a, b,
2.75 (hint: use the Least Squares formula above, since you don't have raw data)
Interpretation of r squared
Recall correlation r measures "strength of linear relationship"
$r^2$ is the "fraction of variation explained by the line":
• $r^2 \approx 1$ for "good fit"
• $r^2 \approx 0$ for "very poor fit"
$r^2$ measures "signal to noise ratio"
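The quantity r² can be computed directly from the definition of correlation (a stdlib-only sketch; the test data, exactly on a line, are our own):

```python
def r_squared(xs, ys):
    """r^2, the fraction of variation explained by the least squares line.
    The (n-1) factors in r = cov/(sx*sy) cancel, leaving sxy^2/(sxx*syy)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# data exactly on the line y = 2x + 1: r^2 = 1 ("good fit")
print(r_squared([1, 2, 3, 4], [3, 5, 7, 9]))
```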
Interpretation of r squared
Revisit:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg13.xls
(a, c, d) "data near line": high signal to noise ratio ($r^2$ near 1)
(b) "noisier data": low signal to noise ratio
(c) "almost pure noise": nearly no signal ($r^2$ near 0)
Interpretation of r squared
Suggested HW: 2.67 c
Statistical Inference
Show Excel Output