Thinking About Data
MMED
African Institute for the Mathematical Sciences
Muizenberg, South Africa
May, 2017
Jim Scott, PhD, MA, MPH
Slide Set Citation: DOI:10.6084/m9.figshare.5044615
The ICI3D Figshare Collection
Goals:
• Recognize that data are better than anecdotes
• Understand that how the data are collected matters
• Be aware of variability – don’t be fooled by it
• Consider confounding and other forms of bias
• Healthy skepticism
Data
Why do we care about data?
Questions
Model
Answers
Data !
Where did these come from???
“Data, data, data!” he cried impatiently. “I can’t make bricks without clay.”
- Sherlock Holmes
Source: Statistics, 3rd ed., Freedman, Pisani, Purves
"The story became credible because it was
published in The Lancet," Alison Singer,
president of the Autism Science Foundation,
said Tuesday. "It was in The Lancet, and we
really rely on these medical journals."
Consequences
https://www.gov.uk/government/news/national-mmr-vaccination-catch-up-programme-announced-in-response-to-increase-in-measles-cases
Commons.wikimedia.org
How the Data are Collected Matters
• “Always do right. This will gratify some people, and astonish the rest”
- Mark Twain
• Beware: Not all data are created equal
Source: Statistics, 3rd ed., Freedman, Pisani, Purves
1936 Presidential Election
• 1936 Literary Digest Poll
• Literary Digest had predicted the winner of every US presidential election since 1916.
• In 1936, Literary Digest mailed questionnaires to 10 million people (25% of voters).
• 2.4 million people responded
Returned questionnaires:
▪ Landon: 1,293,668 (57%)
▪ FDR: 972,897 (43%)
Source: http://historymatters.gmu.edu/d/5168/
Results
• Actual result: Roosevelt 61%, Landon 37%.
• One of the biggest landslides in U.S. history
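The size of the miss can be checked with a few lines of arithmetic. The sketch below uses only the counts and percentages quoted on the slides. The variable names are illustrative. Note that the poll shares are computed from the two-candidate returns, while the actual result is a share of all votes cast, so the comparison is approximate.

```python
# Counts of returned questionnaires, copied from the slides.
landon_returns = 1_293_668
fdr_returns = 972_897
two_candidate_total = landon_returns + fdr_returns

# Poll shares among the Landon and FDR returns only.
poll_landon_share = 100 * landon_returns / two_candidate_total
poll_fdr_share = 100 * fdr_returns / two_candidate_total

# Actual result quoted on the slides (share of all votes cast).
actual_fdr_share = 61.0
poll_error_fdr = actual_fdr_share - poll_fdr_share

print(f"Poll:   Landon {poll_landon_share:.1f}%, FDR {poll_fdr_share:.1f}%")
print(f"Actual: FDR {actual_fdr_share:.1f}%")
print(f"FDR underestimated by about {poll_error_fdr:.0f} percentage points")
```

Even with more than two million responses, the estimate misses by far more than any plausible random error for a sample that large.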
What went wrong?
• How were the data collected?
• Those who received the questionnaire were systematically different from those who didn’t
• 10 million sent out (~25% of voters)
• 2.4 million returned – sample of convenience
• Not representative
• Sampling frame:
• Telephone books
• Automobile registries
Bias and Variability
• Sample size doesn’t matter if the data collection scheme is flawed
Observed value = Truth + Bias + Random Error
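A small simulation can make the point about sample size concrete. The sketch below is not from the slides: the population size, the share of voters covered by the sampling frame, and the support levels inside and outside the frame are all assumed numbers, chosen only so that the overall picture loosely resembles the 1936 example.

```python
import random

rng = random.Random(1936)  # fixed seed so the sketch is reproducible

# Hypothetical population: every number below is an assumption for illustration.
N_VOTERS = 200_000
P_IN_FRAME = 0.25           # share of voters listed in phone books / car registries
P_FDR_IN_FRAME = 0.43       # assumed FDR support among voters in the frame
P_FDR_OUT_OF_FRAME = 0.67   # assumed FDR support among everyone else

voters = []
for _ in range(N_VOTERS):
    in_frame = rng.random() < P_IN_FRAME
    p_fdr = P_FDR_IN_FRAME if in_frame else P_FDR_OUT_OF_FRAME
    votes_fdr = rng.random() < p_fdr
    voters.append((in_frame, votes_fdr))


def fdr_share(sample):
    """Percentage of a sample that votes for FDR."""
    return 100 * sum(votes_fdr for _, votes_fdr in sample) / len(sample)


true_share = fdr_share(voters)

# Very large sample, but drawn only from the flawed sampling frame.
big_biased_sample = [v for v in voters if v[0]]
# Small simple random sample drawn from the whole population.
small_random_sample = rng.sample(voters, 1_000)

big_biased_estimate = fdr_share(big_biased_sample)
small_random_estimate = fdr_share(small_random_sample)

print(f"True FDR share:             {true_share:.1f}%")
print(f"Biased sample (n={len(big_biased_sample)}): {big_biased_estimate:.1f}%")
print(f"Random sample (n={len(small_random_sample)}):  {small_random_estimate:.1f}%")
```

Under these assumptions the small random sample lands much closer to the truth than the far larger sample taken from the biased frame.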
Variability
• “When the facts change, I change my mind. What do you do, sir?”
- John Maynard Keynes
• Variation is everywhere
Observed value = Truth + Bias + Random Error
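The decomposition can be sketched numerically. In the example below the truth, the bias, and the spread of the random error are all hypothetical values, and the random error is assumed to be Gaussian with mean zero. Averaging more observations shrinks the random error, but the average settles on Truth + Bias, not on the truth.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

TRUTH = 61.0     # hypothetical true value
BIAS = -18.0     # hypothetical systematic error from a flawed collection scheme
NOISE_SD = 5.0   # hypothetical standard deviation of the random error


def observe():
    """One observed value = truth + bias + random error."""
    return TRUTH + BIAS + rng.gauss(0.0, NOISE_SD)


def mean_of_observations(n):
    """Average of n independent observed values."""
    return statistics.fmean(observe() for _ in range(n))


for n in (10, 1_000):
    avg = mean_of_observations(n)
    print(f"n={n:>7}: mean = {avg:.2f}, error vs truth = {avg - TRUTH:+.2f}")

large_n_mean = mean_of_observations(100_000)
print(f"n={100_000:>7}: mean = {large_n_mean:.2f}, "
      f"error vs truth = {large_n_mean - TRUTH:+.2f}")
```

More data reduces the random error term only. The bias term stays the same size no matter how many observations are averaged.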
Time Series of four “stock prices”
Don’t be fooled by randomness
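One way to see how convincing pure noise can look is to generate some. The transcript does not say how the four “stock prices” were produced, so the sketch below is an assumption: it builds four random walks whose daily changes are independent, mean-zero noise. The starting price, step size, and number of days are arbitrary.

```python
import random

rng = random.Random(2017)  # fixed seed so the sketch is reproducible


def random_walk(n_steps, start=100.0, step_sd=1.0):
    """Price path whose daily changes are independent noise with mean zero."""
    prices = [start]
    for _ in range(n_steps):
        prices.append(prices[-1] + rng.gauss(0.0, step_sd))
    return prices


# Four hypothetical "stocks", each 250 trading days long.
walks = [random_walk(250) for _ in range(4)]

for i, prices in enumerate(walks, start=1):
    change = prices[-1] - prices[0]
    print(f"series {i}: start {prices[0]:.1f}, end {prices[-1]:.1f}, "
          f"net change {change:+.1f}")
```

Plotting such series typically shows apparent trends, rallies, and crashes, even though no series has any built-in direction.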
Per Capita Expenditures on Road Maintenance(?)
Confounding
Percent of Planes Delayed from City of Origin, January 2009

              Continental               United
Airport       Late   Total   % Late     Late   Total   % Late
Newark         957    3998     23.9      100     399     25.1
LaGuardia       62     356     17.4      113     573     19.7
Pittsburgh       8      60     13.3       17     119     14.3
Detroit         16     145     11.0       16     139     11.5
Totals        1043    4559     22.9      246    1230     20.0

What is the conclusion?
Slide credit: Jeff Witmer; data source: www.bts.gov
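The table can be recomputed directly to see what is going on. The sketch below uses the late and total counts from the table; the dictionary layout and function name are illustrative choices.

```python
# (late, total) departures per airport and airline, copied from the table.
flights = {
    "Newark":     {"Continental": (957, 3998), "United": (100, 399)},
    "LaGuardia":  {"Continental": (62, 356),   "United": (113, 573)},
    "Pittsburgh": {"Continental": (8, 60),     "United": (17, 119)},
    "Detroit":    {"Continental": (16, 145),   "United": (16, 139)},
}


def pct_late(late, total):
    """Percentage of departures that were late."""
    return 100 * late / total


# Compare the two airlines within each airport.
for airport, by_airline in flights.items():
    c = pct_late(*by_airline["Continental"])
    u = pct_late(*by_airline["United"])
    print(f"{airport:<10} Continental {c:4.1f}%   United {u:4.1f}%")

# Pool over airports, as in the Totals row.
totals = {}
for airline in ("Continental", "United"):
    late = sum(by_airline[airline][0] for by_airline in flights.values())
    total = sum(by_airline[airline][1] for by_airline in flights.values())
    totals[airline] = pct_late(late, total)
    print(f"Pooled {airline}: {totals[airline]:.1f}%")

continental_better_everywhere = all(
    pct_late(*a["Continental"]) < pct_late(*a["United"]) for a in flights.values()
)
print("Continental has the lower late rate at every airport:",
      continental_better_everywhere)
```

Continental has the lower late rate at each of the four airports, yet the higher rate overall (22.9% against 20.0%). Most of its flights (3998 of 4559) leave from Newark, the airport with the highest delay rates, so airport confounds the comparison between airlines.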
How about now?
. logistic delay continental
Logistic regression                               Number of obs   =       5789
                                                  LR chi2(1)      =       4.72
                                                  Prob > chi2     =     0.0298
Log likelihood = -3067.3063                       Pseudo R2       =     0.0008
------------------------------------------------------------------------------
       delay | Odds Ratio   Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
 continental |   1.186576    .0943644     2.15   0.031     1.015318    1.38672
------------------------------------------------------------------------------

. logistic delay continental laguardia newark pittsburg
Logistic regression                               Number of obs   =       5789
                                                  LR chi2(4)      =      46.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -3046.5183                       Pseudo R2       =     0.0075
------------------------------------------------------------------------------
       delay | Odds Ratio   Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
 continental |   .9161694    .0859693    -0.93   0.351     .7622593   1.101156
   laguardia |   1.807797    .3722533     2.88   0.004     1.207464   2.706607
      newark |    2.58219    .5031265     4.87   0.000     1.762527   3.783036
   pittsburg |   1.259086     .360524     0.80   0.421     .7183302    2.20692
------------------------------------------------------------------------------
Unadjusted: OR = 1.19, 95% CI = (1.02, 1.39)
Adjusted: OR = 0.92, 95% CI = (0.76, 1.10)
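The unadjusted odds ratio can be reproduced by hand from the pooled 2x2 table of airline by delay. The sketch below does that, with a Wald confidence interval on the log scale. For the adjusted estimate it swaps in a different technique from the one on the slide: a Mantel-Haenszel odds ratio stratified by airport instead of a logistic regression. The two methods answer the same question but are not identical, so only approximate agreement should be expected. The on-time counts are derived as total minus late.

```python
import math

# (late, on_time) counts per airport and airline; on_time = total - late.
strata = {
    "Newark":     {"Continental": (957, 3041), "United": (100, 299)},
    "LaGuardia":  {"Continental": (62, 294),   "United": (113, 460)},
    "Pittsburgh": {"Continental": (8, 52),     "United": (17, 102)},
    "Detroit":    {"Continental": (16, 129),   "United": (16, 123)},
}

# Crude (unadjusted) odds ratio from the pooled 2x2 table.
a = sum(s["Continental"][0] for s in strata.values())  # Continental, late
b = sum(s["Continental"][1] for s in strata.values())  # Continental, on time
c = sum(s["United"][0] for s in strata.values())       # United, late
d = sum(s["United"][1] for s in strata.values())       # United, on time

crude_or = (a * d) / (b * c)

# Wald 95% confidence interval, computed on the log-odds scale.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(crude_or) - 1.96 * se_log_or)
ci_high = math.exp(math.log(crude_or) + 1.96 * se_log_or)

# Mantel-Haenszel odds ratio, stratified by airport.
numerator = 0.0
denominator = 0.0
for s in strata.values():
    a_i, b_i = s["Continental"]
    c_i, d_i = s["United"]
    n_i = a_i + b_i + c_i + d_i
    numerator += a_i * d_i / n_i
    denominator += b_i * c_i / n_i
mh_or = numerator / denominator

print(f"Crude OR = {crude_or:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"Mantel-Haenszel OR (stratified by airport) = {mh_or:.2f}")
```

The crude estimate matches the unadjusted odds ratio of 1.19 with interval (1.02, 1.39). The stratified estimate comes out near 0.92, in line with the adjusted result: once airport is accounted for, the apparent disadvantage for Continental disappears.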
HIV Prevalence vs Condom Use (DHS data)
[Figure: scatter plot of HIV prevalence (y-axis, 0 to 20) against percent using condom at last intercourse (x-axis, 0 to 50), one labeled point per country survey: Burkina Faso 03, Cameroon 04, Ethiopia 05, Ghana 03, Guinea 05, Kenya 03, Liberia 07, Mali 06, Niger 06, Rwanda 05, Senegal 05, Swaziland 07, Zambia 02, Zimbabwe 06]
Obesity in the United States: 1995 - 2006
[Figure: percent of adults that are obese (BMI > 30.0) (y-axis, 15 to 30) against number of gyms in the U.S. (x-axis, 10000 to 30000), one labeled point per year from 1995 to 2006]
Sources: CDC (BRFSS); Merriman, Curhan, & Ford Fitness & Wellness report, 2006
What do the data say?
• Caution:
• This may require thoughtful consideration
What is the relationship?
[Figure: temperature (Celsius) on the y-axis (0 to 15) against year on the x-axis (1960 to 2010)]
Median Global Temperature During the Past 50 Years
[Figure: temperature (Celsius) on the y-axis (13.8 to 14.6) against year on the x-axis (1960 to 2010)]
Love statistics(?)
Are you a believer?
Cat injuries by # of stories fallen
[Figure: injuries (y-axis, 0.5 to 2.5) against stories fallen (x-axis, 2 to 10)]
Cat conundrum
Stories Fallen    # of cats
1                 0
2                 8
3                 14
4                 27
5                 34
6                 21
7-8               9
9-32              13
What do you think?
Skepticism vs Openness
• "It seems to me what is called for is an exquisite balance between two conflicting needs: the most skeptical scrutiny of all hypotheses that are served up to us and at the same time a great openness to new ideas …
• If you are only skeptical, then no new ideas make it through to you …
• On the other hand, if you are open to the point of gullibility and have not an ounce of skeptical sense in you, then you cannot distinguish the useful ideas from the worthless ones."
- Carl Sagan
Summary
• We use data to help form and revise models
• Data collection should not be haphazard
• Data are not easily obtained – especially good data
• Think about how the data were collected and be sure to consider:
• Bias and confounding
• Variability
• How the data are presented
• Be skeptical but open
This presentation is made available through a Creative Commons Attribution license. Details of the license and permitted uses are available at
http://creativecommons.org/licenses/by/3.0/
© 2010 International Clinics on Infectious Disease Dynamics and Data
Scott JC. “Introduction to Thinking About Data” Clinic on the Meaningful Modeling of Epidemiological Data. DOI:10.6084/m9.figshare.5044615.
For further information or modifiable slides please contact [email protected].
See the entire ICI3D Figshare Collection. DOI: 10.6084/m9.figshare.c.3788224.