Which of these two relationships is “tighter?”
1
Which of these two relationships is “tighter?”

Factor  Outcome
5       10
5       11
10      -5
2       20
6       8
1       23
6       7
1       22
1       21
9       -3
6       8
7       4
9       -3
5       9
3       17
8       2
3       17
2       20
7       5
9       -2

Factor  Outcome
10      11
9       9
6       6
1       1
-2      -2
3       4
2       2
5       3
10      9
7       7
8       10
2       1
5       4
1       3
8       7
8       9
2       4
10      11
7       6
10      9
2
The relationship on the left appears “tighter” for three reasons:
1. Cognition bias. Simple linear relationships are easier to “eyeball” than complex relationships.
2. Information bias. Rounding masks information.
3. Confirmation bias. Tendency to focus on observations that confirm beliefs and ignore observations that contradict beliefs.
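A quick numerical check makes these biases concrete. The sketch below computes the Pearson correlation for the two data sets on slide 1. The row values are an assumption: they were re-split from the flattened tables in this transcript, where adjacent cells are fused, so the row boundaries are inferred. The names `set_a` (the table that begins Factor 5, Outcome 10) and `set_b` (the table that begins Factor 10, Outcome 11) are illustrative.

```python
from math import sqrt

# Rows re-split from the flattened slide tables (inferred, not guaranteed).
set_a = [(5, 10), (5, 11), (10, -5), (2, 20), (6, 8), (1, 23), (6, 7),
         (1, 22), (1, 21), (9, -3), (6, 8), (7, 4), (9, -3), (5, 9),
         (3, 17), (8, 2), (3, 17), (2, 20), (7, 5), (9, -2)]
set_b = [(10, 11), (9, 9), (6, 6), (1, 1), (-2, -2), (3, 4), (2, 2),
         (5, 3), (10, 9), (7, 7), (8, 10), (2, 1), (5, 4), (1, 3),
         (8, 7), (8, 9), (2, 4), (10, 11), (7, 6), (10, 9)]


def pearson_r(pairs):
    """Pearson correlation coefficient of a list of (factor, outcome) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


r_a = pearson_r(set_a)
r_b = pearson_r(set_b)
print(f"set_a: r = {r_a:.3f}   set_b: r = {r_b:.3f}")
```

On these values, the set whose outcome roughly equals its factor (`set_b`) is the one that is easy to eyeball, yet it has the smaller correlation in absolute value.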
3
[Figure: plot of Outcome against Factor for the data set that begins Factor 10, Outcome 11. Both axes run from -4 to 12.]
4
[Figure: plot of Outcome against Factor for the data set that begins Factor 5, Outcome 10. The Factor axis runs from 0 to 12 and the Outcome axis from -10 to 25.]
5
Lesson #1: Never trust your eyes.
6
Corollary: Don’t trust summary statistics either.
Anscombe’s quartet: Four data sets that yield identical summary statistics.
7
Anscombe's quartet

                 I               II              III              IV
             x      y        x      y        x      y        x      y
            10   8.04       10   9.14       10   7.46        8   6.58
             8   6.95        8   8.14        8   6.77        8   5.76
            13   7.58       13   8.74       13  12.74        8   7.71
             9   8.81        9   8.77        9   7.11        8   8.84
            11   8.33       11   9.26       11   7.81        8   8.47
            14   9.96       14   8.10       14   8.84        8   7.04
             6   7.24        6   6.13        6   6.08        8   5.25
             4   4.26        4   3.10        4   5.39       19  12.50
            12  10.84       12   9.13       12   8.15        8   5.56
             7   4.82        7   7.26        7   6.42        8   7.91
             5   5.68        5   4.74        5   5.73        8   6.89
Mean      9.00   7.50     9.00   7.50     9.00   7.50     9.00   7.50
Stdev     3.32   2.03     3.32   2.03     3.32   2.03     3.32   2.03
Corr          0.82            0.82            0.82            0.82
alpha hat     3.00            3.00            3.00            3.00
beta hat      0.50            0.50            0.50            0.50
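The summary rows can be checked directly. The sketch below recomputes the mean, sample standard deviation, correlation, and least-squares intercept and slope for each of the four data sets, using the values from the slide. The function name `summarize` and the dictionary layout are illustrative choices, not part of the presentation.

```python
from statistics import mean, stdev

x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values for sets I-III
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                      6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics shown on the slide for one data set."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    beta = sxy / sxx            # least-squares slope
    alpha = my - beta * mx      # least-squares intercept
    return {
        "mean_x": mx, "mean_y": my,
        "stdev_x": stdev(x), "stdev_y": stdev(y),
        "corr": sxy / (sxx * syy) ** 0.5,
        "alpha": alpha, "beta": beta,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in summaries.items():
    print(name, {key: round(value, 2) for key, value in s.items()})
```

All four sets round to the same seven numbers, even though set IV, for example, has ten identical x values and a single outlying point.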
8
9
Lesson #1: Never trust your eyes.
(Don’t trust summary statistics either)
Lesson #2: Always employ sanity checks.
10
[Figure: Conventional Mortgage Rates (6.0% to 10.0%) and Mystery Variable from 2 Years Prior (1.5 to 2.5), plotted for 1991 to 2002.]
11
[Figure: Conventional Mortgage Rates (6.0% to 10.0%) and Mystery Variable from 2 Years Prior (1.5 to 2.5), plotted for 1991 to 2002.]
Mystery variable explains 57% of the variation in mortgage rates.
Relationship is: $\text{Rate} = 0.03 + 0.02 \times (\text{Mystery Variable})$
12
Mystery variable is Algeria’s GDP-relative-to-Trade
Spurious Results
An infinite number of factors can attempt to explain a given outcome.
Look hard enough and you are guaranteed to find a perfect predictor.
If the factor is “spurious,” what you are observing is random chance.
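The search for a spurious predictor can be simulated. The sketch below is an illustration under stated assumptions, not the analysis behind the slides: the outcome and every candidate factor are independent random noise, the series length of 12 mirrors the 1991 to 2002 chart, and the count of 2,000 candidates is a hypothetical number of factors someone might try.

```python
import random
from statistics import mean, median


def r_squared(x, y):
    """Share of the variation in y explained by a straight-line fit on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


rng = random.Random(0)      # fixed seed so the run is repeatable
n_years = 12                # assumption: one point per year, as in the chart
n_candidates = 2000         # hypothetical number of unrelated factors tried

outcome = [rng.gauss(0, 1) for _ in range(n_years)]
candidates = [[rng.gauss(0, 1) for _ in range(n_years)]
              for _ in range(n_candidates)]

scores = [r_squared(factor, outcome) for factor in candidates]
best = max(scores)
print(f"typical R^2: {median(scores):.2f}   best of {n_candidates}: {best:.2f}")
```

None of the candidates has any real connection to the outcome, so whatever the best score turns out to be, it is produced by the search alone.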
13
Mystery variable is Algeria’s GDP-relative-to-Trade.
[Figure: Conventional Mortgage Rates (4.0% to 18.0%) and Mystery Variable from 2 Years Prior (1 to 4), plotted for 1977 to 2003.]
By random chance, the mystery variable predicts mortgage rates over this period.
14
If you wait long enough, randomness will tell you anything you want to hear.
[Diagram: four successive mailings. In each mailing, one batch of letters says “DJIA will be down tomorrow!” and an equal batch says “DJIA will be up tomorrow!” The batches are 200,000 letters each in the first mailing, then 100,000, 50,000, and 25,000 letters each.]
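The arithmetic behind the shrinking mailings can be sketched in a few lines. This assumes the usual reading of the diagram, which the transcript itself does not spell out: after each day, only the recipients whose letter turned out to be correct receive the next mailing, and that group is again split evenly between "up" and "down". The function name `mailing_schedule` is illustrative.

```python
def mailing_schedule(first_batch, rounds):
    """Letters sent per message ("up" or "down") in each successive mailing.

    Assumes only recipients of the correct prediction get the next letter,
    and that they are split evenly between the two messages.
    """
    schedule = []
    per_message = first_batch
    for _ in range(rounds):
        schedule.append(per_message)
        per_message //= 2   # the correct half is split in two again
    return schedule


schedule = mailing_schedule(200_000, 4)
print(schedule)  # [200000, 100000, 50000, 25000]

# Whichever way the market moves, one batch per mailing is right, so after
# the last mailing this many people have seen four correct calls in a row.
four_in_a_row = schedule[-1]
print(four_in_a_row)  # 25000
```

No forecasting skill is involved at any step; the streak of correct calls is guaranteed by the way the mailing list is pruned.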
15
[Figure: Number of Sunspots in the Current Year (left axis, 0 to 180) and Number of Republicans in the Senate 1 Year in the Future (right axis, 10 to 60), plotted for 1960 to 1980.]
Source: ftp.ngdc.noaa.gov/stp/solar_data/sunspot_numbers/yearly
www.senate.gov/pagelayout/history/one_item_and_teasers/partydiv.htm
16
Counterargument:
Spurious or not, sunspots would have been useful for predicting the number of Republicans in the Senate.
Fallacy:
We see the correlation in hindsight. To be useful, we need to detect the correlation before it ceases to exist.
17
[Figure: two panels, 1960 to 1980 and 1981 to 2005. Each plots Number of Sunspots in the Current Year (left axis, 0 to 180) against Number of Republicans in the Senate 1 Year in the Future (right axis, 10 to 60 in the 1960 to 1980 panel and 20 to 80 in the 1981 to 2005 panel).]
Source: ftp.ngdc.noaa.gov/stp/solar_data/sunspot_numbers/yearly
www.senate.gov/pagelayout/history/one_item_and_teasers/partydiv.htm
18
19
20
21
22
23
Lesson #1: Never trust your eyes.
(Don’t trust summary statistics either)
Lesson #2: Always employ sanity checks.
Lesson #3: An observation is meaningless.
Corollary: An anecdote is both meaningless and dangerous.
24
Left half of room: Don’t look.
Right half of room: Write what you read.
25
The average person in Benin earns an annual income of $750 (in U.S. dollars).
26
Right half of room: Don’t look.
Left half of room: Write what you read.
27
The average person in Andorra earns an annual income of $40,000 (in U.S. dollars).
28
The average person on planet Earth earns what annual income (in U.S. dollars)?
29
Anchoring: When we see a piece of information, we evaluate subsequent information in light of the first piece of information.
Information: News interview of a single mother working three jobs to support her family.
Policy Question: Do we need welfare reform?
Problem: How common is this example?
30
Left half of room: Don’t look.
Right half of room: Read and answer.
31
Should we require school districts to pay to install seat belts on school buses?
1 2 3 4 5 (1 = Definitely not!, 5 = Absolutely!)
32
Right half of room: Don’t look.
Left half of room: Read and answer.
33
Every year in the U.S., 17,000 children are treated for injuries sustained in school bus accidents. Most of these injuries could have been avoided had the children been wearing seat belts.
Should we require school districts to pay to install seat belts on school buses?
1 2 3 4 5 (1 = Definitely not!, 5 = Absolutely!)
34
Availability: It’s easier to see what’s in front of us than it is to see what isn’t.
Information: News report showing the benefit of school bus seat belts.
Policy Question: Should we require seat belts in school buses?
Problem: What is the expected benefit and what are the tradeoffs?
35
Lesson #1: Never trust your eyes.
Lesson #2: Always employ sanity checks.
Lesson #3: An observation is meaningless.
Corollary: An anecdote is both meaningless and dangerous.
Lesson #4: Not everything that appears random is.
[Figure: a sequence of diagrams relating X1, Y, and X2, shown alongside three regressions.]
Regression of Y on X1: $y = \alpha + \beta X_1 + u$; $\hat{\alpha} = 50.01$ (8.65), $\hat{\beta} = 0.11$ (0.14); $R^2 = 0.01$
Regression of Y on X2: $y = \alpha + \beta X_2 + u$; $\hat{\alpha} = 1.18$ (7.56), $\hat{\beta} = 0.50$ (0.06); $R^2 = 0.55$
Regression of Y on X1 and X2: $y = \alpha + \beta_1 X_1 + \beta_2 X_2 + u$; $\hat{\alpha} = 0.00$ (0.00), $\hat{\beta}_1 = 1.00$ (0.00), $\hat{\beta}_2 = 1.00$ (0.00); $R^2 = 1.00$
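The pattern in these three regressions, where a factor looks unrelated to the outcome on its own but fits perfectly once a second factor is included, can be reproduced with a small hand-built example. The numbers below are synthetic and chosen for illustration only; they are not the data behind the slides. For a one-factor fit, R-squared equals the squared correlation, so the sketch uses the correlation directly.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Synthetic data (an assumption for illustration): z and w are orthogonal.
z = [-1, 1, -1, 1, -1, 1, -1, 1]
w = [-3, -3, -1, -1, 1, 1, 3, 3]

x1 = z
x2 = [wi - zi for wi, zi in zip(w, z)]
y = [a + b for a, b in zip(x1, x2)]      # Y = X1 + X2 exactly

r2_x1_alone = pearson_r(x1, y) ** 2      # X1 by itself looks like noise
r2_x2_alone = pearson_r(x2, y) ** 2      # X2 by itself explains only part

# With both factors and coefficients of 1, nothing is left unexplained.
residuals = [yi - (a + b) for yi, a, b in zip(y, x1, x2)]

print(f"R^2 with X1 alone: {r2_x1_alone:.2f}")
print(f"R^2 with X2 alone: {r2_x2_alone:.2f}")
print(f"largest residual with both: {max(abs(e) for e in residuals)}")
```

Here X1 has zero correlation with Y because its effect is exactly offset by its negative relationship with X2. Only a model that includes both factors reveals that Y is fully determined.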
217
Regression: Why do we do this?
218
A trucking company wants to be able to predict the round-trip travel time of its trucks. Use the data below to predict the round-trip travel time for a truck that will be travelling 200 miles and making 3 deliveries.
Approach #1: Calculate Average Time per Mile
Trucks in the data set required a total of 87 hours to travel a total of 4,000 miles. Dividing hours by miles, we find an average of about 0.02 hours per mile journeyed (87 / 4,000 ≈ 0.022).
Miles Traveled   Deliveries   Travel Time (hours)
500              4            11.3
250              3            6.8
500              4            10.9
500              2            8.5
250              2            6.2
400              2            8.2
375              3            9.4
325              4            8.0
450              3            9.6
450              2            8.1
(0.02 hours per mile) (200 miles) = 4 hours
219
Approach #2: Calculate Average Time per Delivery
Trucks in the data set required a total of 87 hours to make 29 deliveries. Dividing hours by deliveries, we find an average of 3 hours per delivery.
(3 hours per delivery) (3 deliveries) = 9 hours
220
Approach #3: Combine Average Time per Mile and Average Time per Delivery
Trucks in the data set required 0.02 hours per mile journeyed and 3 hours per delivery.
(0.02 hours per mile) (200 miles) + (3 hours per delivery) (3 deliveries) = 13 hours
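The three averaging approaches can be reproduced from the data table. The sketch below uses the ten rows from the slides; the variable names are illustrative, and the per-mile rate is rounded to two decimals before use because that is what the slide arithmetic does.

```python
# Trucking data from the slides: one entry per truck.
miles = [500, 250, 500, 500, 250, 400, 375, 325, 450, 450]
deliveries = [4, 3, 4, 2, 2, 2, 3, 4, 3, 2]
hours = [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1]

trip_miles, trip_deliveries = 200, 3     # the trip to be predicted

# Approach #1: total hours divided by total miles (rounded as on the slide).
hours_per_mile = round(sum(hours) / sum(miles), 2)
approach_1 = hours_per_mile * trip_miles

# Approach #2: total hours divided by total deliveries.
hours_per_delivery = round(sum(hours) / sum(deliveries), 2)
approach_2 = hours_per_delivery * trip_deliveries

# Approach #3: add the two estimates together. Each average already
# attributes all 87 hours to a single factor, so the sum counts time twice.
approach_3 = approach_1 + approach_2

print(hours_per_mile, hours_per_delivery)
print(approach_1, approach_2, approach_3)
```

The spread between the three answers for the same trip is what motivates estimating the separate contributions of miles and deliveries with a regression.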
221
Problems
1. Combining average time per delivery and average time per mile will double-count time if deliveries and miles are correlated.
2. We have ignored a possible fixed effect – an amount of “overhead” time that is required regardless of the number of miles and deliveries.
222
$\text{Time}_i = \beta_0 + \beta_1 (\text{deliveries}_i) + u_i$
$\hat{\beta}_0 = 5.38$, $\hat{\beta}_1 = 1.14$
5.38 hours + (1.14 hours per delivery) (3 deliveries) = 8.8 hours
223
$\text{Time}_i = \beta_0 + \beta_1 (\text{miles}_i) + u_i$
$\hat{\beta}_0 = 3.27$, $\hat{\beta}_1 = 0.01$
3.27 hours + (0.01 hours per mile) (200 miles) = 5.27 hours
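The two one-factor regressions can be reproduced with the closed-form least-squares formulas. The sketch below uses the ten rows from the data table; `fit_line` is an illustrative helper name. One caveat worth noting: the slide prediction multiplies by the per-mile coefficient rounded to 0.01, whereas the unrounded slope is closer to 0.0136, so the unrounded miles-only prediction comes out near 6.0 hours rather than 5.27.

```python
miles = [500, 250, 500, 500, 250, 400, 375, 325, 450, 450]
deliveries = [4, 3, 4, 2, 2, 2, 3, 4, 3, 2]
hours = [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1]


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Time regressed on deliveries only.
b0_deliv, b1_deliv = fit_line(deliveries, hours)
pred_deliv = b0_deliv + b1_deliv * 3

# Time regressed on miles only.
b0_miles, b1_miles = fit_line(miles, hours)
pred_miles = b0_miles + b1_miles * 200

print(f"deliveries: {b0_deliv:.2f} + {b1_deliv:.2f} per delivery -> {pred_deliv:.2f} hours")
print(f"miles:      {b0_miles:.2f} + {b1_miles:.4f} per mile -> {pred_miles:.2f} hours")
```

Carrying more decimal places on the per-mile coefficient matters here because it is multiplied by a large number of miles.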
224
$\text{Time}_i = \beta_0 + \beta_1 (\text{miles}_i) + \beta_2 (\text{deliveries}_i) + u_i$
$\hat{\beta}_0 = 1.13$, $\hat{\beta}_1 = 0.01$, $\hat{\beta}_2 = 0.92$
1.13 hours + (0.01 hours per mile) (200 miles) + (0.92 hours per delivery) (3 deliveries) = 5.89 hours
225
$\text{Time}_i = \beta_1 (\text{miles}_i) + \beta_2 (\text{deliveries}_i) + u_i$
$\hat{\beta}_1 = 0.01$, $\hat{\beta}_2 = 1.07$
(0.01 hours per mile) (200 miles) + (1.07 hours per delivery) (3 deliveries) = 5.21 hours
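The two-factor regressions, with and without a fixed-hours term, can be reproduced by solving the normal equations. The sketch below is a minimal pure-Python version for illustration; `solve` and `ols` are hypothetical helper names, and a numerical library would normally be used instead. As with the one-factor fits, the slide predictions use the per-mile coefficient rounded to 0.01; with unrounded coefficients the predictions come out near 6.3 hours (with fixed hours) and 6.0 hours (without).

```python
miles = [500, 250, 500, 500, 250, 400, 375, 325, 450, 450]
deliveries = [4, 3, 4, 2, 2, 2, 3, 4, 3, 2]
hours = [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients, one per column, via (X'X) b = X'y."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns]
           for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


ones = [1.0] * len(hours)                # column for the fixed-hours term

# With a fixed-hours term (intercept).
b0, b_miles, b_deliv = ols([ones, miles, deliveries], hours)
pred_with_fixed = b0 + b_miles * 200 + b_deliv * 3

# Without a fixed-hours term.
c_miles, c_deliv = ols([miles, deliveries], hours)
pred_without_fixed = c_miles * 200 + c_deliv * 3

print(f"with fixed hours:    {b0:.2f}, {b_miles:.4f}, {b_deliv:.2f} -> {pred_with_fixed:.2f}")
print(f"without fixed hours: {c_miles:.4f}, {c_deliv:.2f} -> {pred_without_fixed:.2f}")
```

Rounded to two decimals, the coefficients match the ones on the slides. The predictions differ only because of how many decimals of the per-mile coefficient are carried into the multiplication.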
226
Hours per Mile   Hours per Delivery   Fixed Hours   Estimated Hours
0.02             -                    -             4.00
-                3.00                 -             9.00
0.02             3.00                 -             13.00
-                1.14                 5.38          8.80
0.01             -                    3.27          5.27
0.01             0.92                 1.13          5.89
0.01             1.07                 -             5.21