S010Y: Answering Questions with Quantitative Data Class 12/III.4: Conducting Residual Analysis


Page 1

Research Is A Partnership Of Questions And Data

What Types Of Data Are Collected? “Continuous” Data and “Categorical” Data.

What Kinds Of Question Can Be Asked Of Those Data?

Questions That Require Us To Examine Relationships Between Features of the Participants:
• Do people who say they study for more hours also think they’ll finish their doctorate earlier?
• Are computer literates less anxious about statistics?
• … ?
• Are men more likely to study part-time?
• Are women more likely to enroll in CCE?
• … ?

Questions That Require Us To Describe Single Features of the Participants:
• How tall are class members, on average?
• How many hours a week do class members report that they study?
• … ?
• How many members of the class are women?
• What proportion of the class is full-time?
• … ?

© Willett, Harvard University Graduate School of Education, 04/20/23 S010Y/C11 – Slide 1


Page 2

OPTIONS Nodate Pageno=1;
TITLE1 'A010Y: Answering Questions with Quantitative Data';
TITLE2 'Class 11/Handout 1: Dissecting Relationships Between Continuous Variables';
TITLE3 'The Infamous Wallchart Data';
TITLE4 'Data in WALLCHT.txt';

*--------------------------------------------------------------------------------*
 Input data, name and label variables in the dataset
*--------------------------------------------------------------------------------*;
DATA WALLCHT;
  INFILE 'C:\DATA\A010Y\WALLCHT.txt';
  INPUT STATE $ TCHRSAL STRATIO PPEXPEND HSGRADRT;
  LABEL TCHRSAL  = '1988 Average Teacher Salary'
        STRATIO  = '1988 Student/Teacher Ratio'
        PPEXPEND = '1988 Expenditure/Student'
        HSGRADRT = '1988 Statewide H.S. Graduation Rate';
RUN;

*--------------------------------------------------------------------------------*
 Representing the nature of the relationship of HSGRADRT and STRATIO
*--------------------------------------------------------------------------------*;
PROC REG DATA=WALLCHT;
  TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
  MODEL HSGRADRT = STRATIO;
  OUTPUT OUT=DIAGNOSE R=RAWRESID P=PREDVAL;
RUN;

PROC PLOT DATA=WALLCHT;
  TITLE5 'Plot of H.S. Graduation Rates against Student/Teacher Ratios';
  PLOT HSGRADRT*STRATIO / HAXIS = 10 TO 25 BY 5 VAXIS = 50 TO 100 BY 10;
RUN;

Having examined the “smooth” with regression analysis, let’s examine the “rough” with residual analysis …

Here are the PC-SAS data input statements that you’ve come to know and love.

Here’s the OLS regression analysis, using PROC REG, that you’ve seen before (with one additional line, the OUTPUT statement, that we will discuss later).

Standard scatterplot of the HSGRADRT vs. STRATIO relationship.
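PROC REG does the fitting for us. For a single predictor, the ordinary least-squares estimates have a simple closed form, sketched here in Python under the assumption that we only want the point estimates (no standard errors or p-values). The numbers in the usage lines are made up for illustration and are not the WALLCHT data; `ols_fit` is a hypothetical helper name.

```python
# A minimal sketch of the closed-form OLS estimates for ONE predictor.
# It returns point estimates only (no standard errors or p-values).

def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # slope = sum of cross-products of deviations / sum of squared x-deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    slope = s_xy / s_xx
    # the OLS line always passes through the point of means (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up toy numbers for illustration only (NOT the WALLCHT data).
toy_stratio = [14.0, 16.0, 18.0, 20.0, 22.0]
toy_hsgradrt = [80.0, 76.0, 75.0, 71.0, 68.0]
b0, b1 = ols_fit(toy_stratio, toy_hsgradrt)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")  # intercept = 100.10, slope = -1.45
```

Running PROC REG on the real data performs this same least-squares calculation and also reports the standard errors, t values, and p-values that this sketch leaves out.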

Page 3

Dependent Variable: HSGRADRT 1988 Statewide H.S. Graduation Rate

Parameter Estimates

Variable   Label                        DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept  Intercept                    1    93.69187             7.95093          11.78     <.0001
STRATIO    1988 Student/Teacher Ratio   1    -1.12140             0.45516          -2.46     0.0174

Here’s the regression output that you’ve seen before, which specifies the fitted regression line …

These “Parameter Estimates” provide the fitted trend line as the following fitted model:

$$\text{Predicted value of HSGRADRT} = 93.69 + (-1.12)\,(\text{Observed value of STRATIO})$$

where 93.69 is the intercept and -1.12 is the slope.
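Each “t Value” in the Parameter Estimates table is simply the parameter estimate divided by its standard error. This small Python check uses the numbers copied from that output; the helper name `t_ratio` is ours, not SAS’s.

```python
# Each "t Value" = parameter estimate / its standard error.
# Estimates and standard errors are copied from the PROC REG output above.
parameter_estimates = {
    "Intercept": (93.69187, 7.95093),
    "STRATIO": (-1.12140, 0.45516),
}


def t_ratio(estimate: float, std_error: float) -> float:
    """t statistic for testing whether a single parameter equals zero."""
    return estimate / std_error


for name, (estimate, std_error) in parameter_estimates.items():
    print(f"{name:<9} t = {t_ratio(estimate, std_error):.2f}")
```

The printed ratios, 11.78 and -2.46, match the t Value column. The p-values also depend on the t distribution with the model’s error degrees of freedom, which this sketch does not compute.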

Page 4

$$\text{Predicted value of HSGRADRT} = 93.69 + (-1.12)\,(\text{Observed value of STRATIO})$$

The fitted equation tells us PROC REG’s best prediction for HSGRADRT at every value of STRATIO. For instance …

1. When STRATIO = 13.3 (the minimum value of STRATIO),
Predicted value of HSGRADRT = (93.69) + (-1.12)(13.3) = 93.69 – 14.90 ≈ 78.8

2. When STRATIO = 24.7 (the maximum value of STRATIO),
Predicted value of HSGRADRT = (93.69) + (-1.12)(24.7) = 93.69 – 27.66 ≈ 66.0

Plot these values to obtain the fitted trend line.

Here’s the fitted regression model that you recognize …
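The two worked predictions can be reproduced with a tiny Python function. It uses the rounded coefficients shown on the slide (93.69 and -1.12); `predict_hsgradrt` is a hypothetical name chosen for this sketch.

```python
# Fitted line from the slide, with the coefficients rounded to two decimals.
INTERCEPT = 93.69
SLOPE = -1.12


def predict_hsgradrt(stratio: float) -> float:
    """Predicted 1988 H.S. graduation rate (%) for a given student/teacher ratio."""
    return INTERCEPT + SLOPE * stratio


# Minimum and maximum STRATIO values in the data, as given on the slide.
for stratio in (13.3, 24.7):
    print(f"STRATIO = {stratio:.1f} -> predicted HSGRADRT = {predict_hsgradrt(stratio):.1f}")
```

The two printed predictions, 78.8 and 66.0, are the endpoints of the fitted trend line.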

Page 5

This provides us with the “smooth” – where’s the “rough”? …

[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate (vertical axis, 50 to 100) against 1988 Student/Teacher Ratio (horizontal axis, 10 to 25), with the fitted trend line drawn from (13.3, 78.8) to (24.7, 66.0)]

Now, to examine the rough …

Let’s pick a few states, and compare our predictions of HS graduation rate to the actual observed values.

We call this the “analysis of residuals”…

Page 6

Here’s the “rough” for Minnesota …

[Figure: the same scatterplot and fitted trend line, marking Minnesota at STRATIO = 17.1 with its observed (90.9) and predicted (74.5) graduation rates]

How about Minnesota?

• Observed values of the outcome and the predictor:
  • STRATIO = 17.1
  • HSGRADRT = 90.9
• Predicted value of HSGRADRT, obtained from the fitted regression line:

$$\widehat{\text{HSGRADRT}} = 93.69 + (-1.12)(17.1) = 74.5\%$$


Minnesota graduated a higher percentage of its high-school seniors than we would have predicted, given its student/teacher ratio.

Page 7

Here’s the “rough” for Hawaii …

[Figure: the same scatterplot and fitted trend line, marking Hawaii at STRATIO = 21.6 with its observed (69.1) and predicted (69.5) graduation rates]

How about Hawaii?

• Observed values of the outcome and the predictor:
  • HSGRADRT = 69.1
  • STRATIO = 21.6
• Predicted value of HSGRADRT:

$$\widehat{\text{HSGRADRT}} = 93.69 + (-1.12)(21.6) = 69.5\%$$

Hawaii graduated about the percentage of its high-school seniors that we would have predicted, given its student/teacher ratio.


Page 8

Here’s the “rough” for New York State …

[Figure: the same scatterplot and fitted trend line, marking New York State at STRATIO = 15.2 with its observed (62.3) and predicted (76.7) graduation rates]

How about New York State?

• Observed values of the outcome and the predictor:
  • HSGRADRT = 62.3
  • STRATIO = 15.2
• Predicted value of HSGRADRT:

$$\widehat{\text{HSGRADRT}} = 93.69 + (-1.12)(15.2) = 76.7\%$$

New York State graduated a much smaller percentage of its high-school seniors than we would have predicted, based on its student/teacher ratio.


Page 9

On a scatterplot with a fitted regression line, the “vertical distance” between the observed value of HSGRADRT and its predicted value is called the residual …

$$\text{Residual} = \text{Observed value} - \text{Predicted value}$$

State            Residual computation        Conclusion: state graduated HS seniors at a rate that is …
Minnesota        (90.90 – 74.54) = 16.36     … better than predicted, based on STRATIO
Hawaii           (69.10 – 69.50) = -0.40     … about as predicted, based on STRATIO
New York State   (62.30 – 76.67) = -14.37    … worse than predicted, based on STRATIO
etc.
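The three rows of this table can be reproduced in a few lines of Python. This sketch uses the rounded fitted line from the earlier slides together with the STRATIO and HSGRADRT values given for each state; the helper `residual` is a hypothetical name.

```python
# residual = observed - predicted, with predicted values from the rounded
# fitted line (93.69 - 1.12 * STRATIO) used on the earlier slides.
states = {
    # state: (STRATIO, observed HSGRADRT)
    "Minnesota": (17.1, 90.9),
    "Hawaii": (21.6, 69.1),
    "New York State": (15.2, 62.3),
}


def residual(observed: float, predicted: float) -> float:
    """Vertical distance from the fitted line; positive means above the line."""
    return observed - predicted


for name, (stratio, observed) in states.items():
    predicted = 93.69 - 1.12 * stratio
    print(f"{name:<15} ({observed:.2f} - {predicted:.2f}) = {residual(observed, predicted):.2f}")
```

A positive residual (Minnesota) means a state graduated more seniors than its student/teacher ratio predicts; a negative one (New York State) means fewer.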

Residuals can be informative and useful:

• They represent individual deviations from the average trend.
• They tell us about HSGRADRT while taking “into account,” or “controlling for,” STRATIO.

• They tell us whether states are doing “better” or “worse” than we would have predicted, given our knowledge of their student/teacher ratio.

Page 10

<<titling and input lines omitted>>
*------------------------------------------------------------------------*
 Representing the nature of the relationship of HSGRADRT and STRATIO
*------------------------------------------------------------------------*;
PROC REG DATA=WALLCHT;
  TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
  MODEL HSGRADRT = STRATIO;
  OUTPUT OUT=DIAGNOSE R=RAWRESID P=PREDVAL;
RUN;

We don’t have to compute the residuals and predicted values by hand …

You can ask PC-SAS to compute the residuals for you, and to output them into a diagnostic dataset for you to explore.

• OUT = DIAGNOSE: the OUT= option tells PC-SAS that you want to create an OUTput dataset called DIAGNOSE. This dataset also carries along the variables from WALLCHT, such as STATE and STRATIO.
• R = RAWRESID: the R= keyword tells PC-SAS that you want to put the “raw residuals” into the new output dataset, and call them RAWRESID.
• P = PREDVAL: the P= keyword tells PC-SAS that you also want to put the predicted values into the new output dataset, and call them PREDVAL.
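To make concrete what the OUTPUT statement adds, here is a rough Python analogue (a sketch, not SAS itself): each record is copied and given PREDVAL and RAWRESID fields. It uses the full-precision estimates from the Parameter Estimates table and only the three states whose values appear on the slides; `make_diagnose` and `wallcht_subset` are hypothetical names.

```python
# Rough Python analogue (a sketch, not SAS) of:
#   OUTPUT OUT=DIAGNOSE R=RAWRESID P=PREDVAL;
# Every input variable is kept, and PREDVAL and RAWRESID are added.
B0 = 93.69187   # intercept, from the Parameter Estimates table
B1 = -1.12140   # slope on STRATIO


def make_diagnose(records: list[dict]) -> list[dict]:
    """Return copies of the records with PREDVAL and RAWRESID added."""
    diagnose = []
    for rec in records:
        predval = B0 + B1 * rec["STRATIO"]
        diagnose.append({**rec, "PREDVAL": predval, "RAWRESID": rec["HSGRADRT"] - predval})
    return diagnose


# Hypothetical subset: the three states whose values appear on the slides.
wallcht_subset = [
    {"STATE": "MN", "STRATIO": 17.1, "HSGRADRT": 90.9},
    {"STATE": "HI", "STRATIO": 21.6, "HSGRADRT": 69.1},
    {"STATE": "NY", "STRATIO": 15.2, "HSGRADRT": 62.3},
]

for row in make_diagnose(wallcht_subset):
    print(f"{row['STATE']}  PREDVAL = {row['PREDVAL']:.4f}  RAWRESID = {row['RAWRESID']:.4f}")
```

Because the printed estimates are themselves rounded to five decimals, the results agree with the PROC PRINT listing only to within about 0.0001 (for example, Minnesota’s PREDVAL comes out as 74.5159 rather than 74.5160).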

Page 11

*-------------------------------------------------------------------------------*
 Examining the distribution of the raw residuals
*-------------------------------------------------------------------------------*;
PROC UNIVARIATE PLOT DATA=DIAGNOSE;
  TITLE5 'Univariate descriptive statistics on the Raw Residuals';
  VAR RAWRESID;
  ID STATE;
RUN;

PROC PLOT DATA=DIAGNOSE;
  TITLE5 'Plot of the Raw Residuals against the Values of the Predictor, STRATIO';
  PLOT RAWRESID*STRATIO / HAXIS = 10 TO 25 BY 5 VREF = 0;
RUN;

*-------------------------------------------------------------------------------*
 Reranking the States based on the value of their raw residuals
*-------------------------------------------------------------------------------*;
PROC SORT DATA=DIAGNOSE;
  BY DESCENDING RAWRESID;
RUN;

PROC PRINT LABEL DATA=DIAGNOSE;
  TITLE5 'Listing of State Observed, Predicted and Residual Graduation Rates';
  VAR STATE HSGRADRT PREDVAL RAWRESID;
RUN;

Once the residuals and predicted values are output to the DIAGNOSE dataset, you can take a look …

You can use PROC UNIVARIATE to explore the sample distribution of the raw residuals across the states.

You can use PROC PLOT to plot the raw residuals against the predictor.

You can use PROC SORT to sort the states by the value of their raw residuals, in descending order, and then use PROC PRINT to list them all out for inspection, along with the name of each state and the observed and predicted values of HSGRADRT.
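The PROC SORT and PROC PRINT steps amount to ordering the records by residual, largest first. Here is a Python sketch using five states’ observed and predicted values copied from the PROC PRINT listing; the helper `by_descending_residual` is hypothetical.

```python
# Python counterpart (a sketch) of PROC SORT ... BY DESCENDING RAWRESID
# followed by PROC PRINT. Observed and predicted values are copied from the
# PROC PRINT listing for five states.
rows = [
    ("HI", 69.1, 69.4697),
    ("FL", 58.0, 74.1796),
    ("MN", 90.9, 74.5160),
    ("NY", 62.3, 76.6466),
    ("UT", 79.4, 65.9934),
]


def by_descending_residual(data: list[tuple[str, float, float]]) -> list[tuple[str, float, float]]:
    """Sort (state, observed, predicted) tuples by residual, largest first."""
    return sorted(data, key=lambda r: r[1] - r[2], reverse=True)


print(f"{'STATE':<6}{'HSGRADRT':>9}{'PREDVAL':>10}{'RAWRESID':>10}")
for state, observed, predicted in by_descending_residual(rows):
    print(f"{state:<6}{observed:>9.1f}{predicted:>10.4f}{observed - predicted:>10.4f}")
```

Here the residual is recomputed from the two columns. In SAS it is already stored in DIAGNOSE as RAWRESID, so PROC SORT can sort on it directly.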

Page 12

Here are some of the univariate descriptive statistics on the residuals …

Variable: RAWRESID (Residual)

N                 50            Sum Weights        50
Mean              0             Sum Observations   0
Std Deviation     7.38040638    Variance           54.4703983

Basic Statistical Measures
Location                Variability
Mean     0.00000        Std Deviation          7.38041
Median  -0.27000        Variance              54.47040
Mode     .              Range                 32.56358
                        Interquartile Range    8.69773

Quantile       Estimate
100% Max       16.384021
95%            12.101925
75% Q3          4.760352
50% Median     -0.269997
25% Q1         -3.937376
5%            -11.733883
0% Min        -16.179560

Extreme Observations
-------Lowest-------      -------Highest------
Value      STATE  Obs     Value     STATE  Obs
-16.1796   FL      9      10.8684   WY     50
-14.3466   NY     32      11.3262   MT     26
-11.7339   AZ      3      12.1019   ND     34
-11.7217   GA     10      13.4066   UT     44
-11.5460   LA     18      16.3840   MN     23

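The sample mean of the raw residuals is exactly zero! (This always happens when an OLS regression includes an intercept.)

The sample standard deviation of the raw residuals is 7.4. This number can be quite useful as a yardstick for judging how far an individual state lies from the fitted trend line.

The listing of “extreme observations” is useful for identifying states whose observed values of HSGRADRT are wildly different from their predicted values.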
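As a check on the PROC UNIVARIATE output, the mean and standard deviation can be recomputed from the 50 raw residuals in the PROC PRINT listing with Python’s standard `statistics` module. `statistics.stdev` divides by n - 1, which is also PROC UNIVARIATE’s default, so the result should agree with 7.38 up to the four-decimal rounding of the listed residuals.

```python
import statistics

# The 50 raw residuals from the PROC PRINT listing, in the order printed
# (largest to smallest).
rawresid = [
    16.3840, 13.4066, 12.1019, 11.3262, 10.8684, 9.6019, 9.3748, 8.6412,
    6.1227, 6.0933, 6.0604, 4.9211, 4.7604, 3.7776, 3.2898, 2.8748,
    2.8511, 2.6840, 2.6811, 2.2239, 1.1962, 1.1933, 0.6534, 0.5956,
    -0.1703, -0.3697, -0.4160, -0.5923, -0.5975, -1.5252, -1.6495, -2.1119,
    -2.4125, -2.5831, -3.0403, -3.7044, -3.8131, -3.9374, -4.2824, -5.7096,
    -6.5824, -7.0709, -8.7917, -8.9917, -9.8038, -11.5460, -11.7217, -11.7339,
    -14.3466, -16.1796,
]

mean = statistics.mean(rawresid)
sd = statistics.stdev(rawresid)  # divisor n - 1, as in PROC UNIVARIATE's default
print(f"n = {len(rawresid)}, mean = {mean:.4f}, sd = {sd:.4f}")
```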

Page 13

Here are the stem-and-leaf display and boxplot of the raw residuals …

Stem Leaf        #  Boxplot
  16 4           1     |
  14                   |
  12 14          2     |
  10 93          2     |
   8 646         3     |
   6 111         3     |
   4 89          2  +-----+
   2 2779938     7  |     |
   0 6722        4  |  +  |
  -0 6566442     7  *-----*
  -2 9870641     7  +-----+
  -4 73          2     |
  -6 16          2     |
  -8 808         3     |
 -10 775         3     |
 -12                   |
 -14 3           1     |
 -16 2           1     |
     ----+----+----+--

Actually, for the p-values computed in the regression analysis to be correct, the residuals must be normally distributed. You can use stem-and-leaf displays and boxplots to check roughly whether this assumption holds in your analysis … see S-030.
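Alongside the stem-and-leaf display and boxplot, one rough numerical check is to set a few sample quantiles of the residuals, copied from the PROC UNIVARIATE output, beside the quantiles of a normal distribution with the same mean (0) and standard deviation (7.38). This is a swapped-in technique, not the one the slide prescribes, and only a sketch. It uses `statistics.NormalDist` from Python’s standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Selected sample quantiles of the raw residuals, copied from PROC UNIVARIATE.
observed_quantiles = {
    0.05: -11.733883,
    0.25: -3.937376,
    0.50: -0.269997,
    0.75: 4.760352,
    0.95: 12.101925,
}

# Normal distribution with the residuals' mean (0) and standard deviation.
reference = NormalDist(mu=0.0, sigma=7.38040638)

for p, observed in observed_quantiles.items():
    expected = reference.inv_cdf(p)
    print(f"{p:>4.0%} quantile: observed {observed:7.2f}   normal {expected:7.2f}")
```

For these residuals the 5% and 95% quantiles sit close to their normal counterparts (about -11.7 vs. -12.1 and 12.1 vs. 12.1), while the lower quartile lies about one point closer to zero than the normal value (-3.9 vs. -5.0). Large, systematic gaps, especially in the tails, would be the warning sign; a formal assessment is beyond this sketch.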

Page 14

STATE  H.S. Graduation Rate  Predicted Value of HSGRADRT  Residual
MN     90.9                  74.5160                      16.3840
-------------------- +2 sd (+14.76) --------------------
UT     79.4                  65.9934                      13.4066
ND     88.3                  76.1981                      12.1019
MT     87.3                  75.9738                      11.3262
WY     88.3                  77.4316                      10.8684
IA     85.8                  76.1981                      9.6019
WI     84.9                  75.5252                      9.3748
NE     85.4                  76.7588                      8.6412
-------------------- +1 sd (+7.38) --------------------
CT     84.9                  78.7773                      6.1227
OH     79.6                  73.5067                      6.0933
WA     77.1                  71.0396                      6.0604
ID     75.4                  70.4789                      4.9211
NV     75.8                  71.0396                      4.7604
KS     80.2                  76.4224                      3.7776
SD     79.6                  76.3102                      3.2898
PE     78.4                  75.5252                      2.8748
AL     74.9                  72.0489                      2.8511
AR     77.2                  74.5160                      2.6840
IN     76.3                  73.6189                      2.6811
MI     73.6                  71.3761                      2.2239
IL     75.6                  74.4038                      1.1962
CO     74.7                  73.5067                      1.1933
WV     77.3                  76.6466                      0.6534
VT     78.7                  78.1044                      0.5956
OR     73.0                  73.1703                      -0.1703
HI     69.1                  69.4697                      -0.3697
MD     74.1                  74.5160                      -0.4160
NJ     77.4                  77.9923                      -0.5923
NM     71.9                  72.4975                      -0.5975
MO     74.0                  75.5252                      -1.5252
NH     74.1                  75.7495                      -1.6495
CA     65.9                  68.0119                      -2.1119
TN     69.3                  71.7125                      -2.4125
ME     74.4                  76.9831                      -2.5831
OK     71.7                  74.7403                      -3.0403
MA     74.4                  78.1044                      -3.7044
VA     71.6                  75.4131                      -3.8131
DL     71.7                  75.6374                      -3.9374
KY     69.0                  73.2824                      -4.2824
MS     66.9                  72.6096                      -5.7096
NC     66.7                  73.2824                      -6.5824
RI     69.8                  76.8709                      -7.0709
-------------------- -1 sd (-7.38) --------------------
AK     65.5                  74.2917                      -8.7917
TX     65.3                  74.2917                      -8.9917
SC     64.6                  74.4038                      -9.8038
LA     61.4                  72.9460                      -11.5460
GA     61.0                  72.7217                      -11.7217
AZ     61.1                  72.8339                      -11.7339
NY     62.3                  76.6466                      -14.3466
-------------------- -2 sd (-14.76) --------------------
FL     58.0                  74.1796                      -16.1796

Here are the individual states, ordered by their residuals …

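Which are the truly extraordinary states? If the residuals are normally distributed, then the truly extraordinary states may be those that lie more than 2 standard deviations (2 × 7.4 ≈ 14.8 percentage points) above or below the mean. In this listing, only Minnesota (16.38) and Florida (-16.18) lie beyond that band.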

Recall that the mean of the residuals is zero.
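Flagging the states beyond ±2 standard deviations is easy to automate. This Python sketch needs only the ten “Extreme Observations” from PROC UNIVARIATE: the fifth most extreme residual on each side (-11.55 and 10.87) already lies inside the ±14.76 band, so any state outside the band must be among these ten. The function name `beyond` is hypothetical.

```python
# Flag states whose raw residual lies more than k standard deviations from the
# residual mean of zero. The ten "Extreme Observations" from PROC UNIVARIATE
# suffice for k = 2, because the 5th most extreme residual on each side
# (-11.5460 and 10.8684) is already inside the +/-2 sd band.
SD = 7.38040638

extremes = {
    "FL": -16.1796, "NY": -14.3466, "AZ": -11.7339, "GA": -11.7217, "LA": -11.5460,
    "WY": 10.8684, "MT": 11.3262, "ND": 12.1019, "UT": 13.4066, "MN": 16.3840,
}


def beyond(residuals: dict[str, float], k: float, sd: float = SD) -> list[str]:
    """Alphabetical list of states whose |residual| exceeds k * sd."""
    return sorted(state for state, r in residuals.items() if abs(r) > k * sd)


print("Beyond +/-2 sd:", beyond(extremes, 2))
```

With k = 2 this returns Florida and Minnesota, the two states singled out in the enhanced conclusion. New York State (-14.35) falls just inside the band.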

Page 15

[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate against 1988 Student/Teacher Ratio]

An Enhanced Conclusion…

In our investigation of state-level aggregate statistics, the average percentage of seniors graduating from high school is related to the average student/teacher ratio in the state.

With state-wide high-school graduation rate (HSGRADRT) as outcome and state-wide student/teacher ratio (STRATIO) as predictor, the trend line estimated by OLS regression analysis has a slope of –1.12 (p = 0.0174). This suggests that two states whose student/teacher ratios differ by 1 student per teacher will tend to have graduation rates that differ by 1.12 percentage points, with states that enjoy lower student/teacher ratios having higher high-school graduation rates … <<substantive conjecture follows …>>

However, not all states follow the average trend. Some states graduate high-school seniors at rates considerably different from those predicted from knowledge of their student/teacher ratios. In particular, Minnesota has a very large positive residual indicating that its high-school graduation rate is much higher than we would expect, based on its student/teacher ratio. Florida, on the other hand, has a very large negative residual indicating that it is graduating high-school seniors at a rate that is much lower than we would anticipate … <<substantive conjecture follows …>>
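As a worked check of the slope interpretation, take the smallest (13.3) and largest (24.7) student/teacher ratios from the earlier worked example. Their ratios differ by 11.4 students per teacher, so the fitted line predicts graduation rates that differ by about 11.4 × 1.12 percentage points:

$$\widehat{\text{HSGRADRT}}_{13.3} - \widehat{\text{HSGRADRT}}_{24.7} = (-1.12)(13.3 - 24.7) = (-1.12)(-11.4) \approx 12.8$$

This matches the difference between the two predicted values computed earlier, 78.8 - 66.0 = 12.8 percentage points.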