![Page 1: Something old, something new, something borrowed ...rudeboybert.rbind.io/talk/2018-01-27_Data_Day_Texas.pdf · • Talk is nominally is about how I teach intro statistics and data](https://reader033.vdocument.in/reader033/viewer/2022050510/5f9ac69f9e4fe76fc92e121d/html5/thumbnails/1.jpg)
Something old, something new, something borrowed, something blue
Ways to teach data science (and learn it too!)
Albert Y. Kim
Amherst College (Smith College July 2018)
Slides available at twitter.com/rudeboybert
Background
Focus of Today
• Talk is nominally about how I teach intro statistics and data science courses
• However, the ideas can apply to a broader target demographic
• R-centric, but many of these ideas are language agnostic
Amherst College STAT135
• Course webpage
• Heterogeneous group: backgrounds and socioeconomic status
• Majors: Math, Stats, Econ, Bio, Neuroscience, Psych, Poli Sci, Environmental Studies
• All had high school algebra, most had no coding experience
Question
How can we introduce data and computation novices to:
1. Data science: Data visualization, data wrangling, exploratory data analysis
2. Data modeling: Explanation (causal inference) & prediction (machine learning), correlation
3. Statistical inference: elementary probability theory, sampling distributions, standard errors, confidence intervals, hypothesis/AB testing & p-values
An Introduction to Statistical and Data Sciences via R
• Online textbook available at moderndive.com
• Development version at moderndive.netlify.com
• On GitHub at github.com/moderndive/
Technology in the classroom?
The debate continues…
Analogy: Learning Long Division
Do this a few times: Then rely on this:
ggplot2 via the Grammar of Graphics
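A minimal sketch of the grammar-of-graphics idea, not taken from the slides: a plot is described as data, a mapping of variables to aesthetics, and a geometric object. The `mpg` dataset ships with ggplot2; the choice of variables here is an assumption made purely for illustration.

```r
library(ggplot2)

# Data: mpg (bundled with ggplot2)
# Aesthetics: engine displacement on x, highway mileage on y, car class as color
# Geometry: points, giving a scatterplot
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

p
```

Swapping `geom_point()` for another geom, or changing one aesthetic mapping, is the kind of small tweak the copy/paste/tweak approach relies on.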
Tactile simulation of sampling to teach sampling distributions
Computer simulation of sampling to teach sampling distributions
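Repeated sampling can be mimicked in a few lines of base R. This is a minimal sketch under stated assumptions: the bowl of 2400 balls with 900 red, the sample size of 50, and the 1000 repetitions are hypothetical values chosen for illustration, not figures from the slides.

```r
set.seed(76)

# Hypothetical population: a bowl of 2400 balls, 900 of them red (assumed values)
bowl <- rep(c("red", "white"), times = c(900, 1500))

# Draw 1000 samples of size 50 without replacement and
# record the proportion of red balls in each sample
n <- 50
prop_red <- replicate(1000, mean(sample(bowl, size = n) == "red"))

# The histogram approximates the sampling distribution of the sample proportion
hist(prop_red,
     main = "Simulated sampling distribution (n = 50)",
     xlab = "Proportion red")

# The standard deviation of the simulated proportions estimates the standard error
sd(prop_red)
```

Rerunning with a larger `n` shows the histogram narrowing, which is a hands-on way to introduce standard errors.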
Coding
Cobb (2015) argued there are two possible computational engines for statistics:
Teaching/Learning Code
• Learn how a practitioner would learn: the “Copy/paste/tweak approach”
• Borrow elements of “flipped classroom”: how to use the time we’re all in the same room together?
Teaching Coding: The Battle is Psychological
• “Don’t code from scratch, take the copy/paste/tweak approach!”
• “Computers are stupid!”
• “Learning to code is similar to learning a language”
New Tools Specific for Data Science
DataCamp: Immediate Feedback
• Students can practice failing, but with support.
• Difference with Coursera & Udacity?
• DataCamp will pick off low-hanging fruit. Ex:
1. Matching parentheses
2. Variable name misspellings
3. Linearity of programs
• Examples of “Curse of knowledge”
Without DataCamp: # of Questions on Coding
With DataCamp: # of Questions on Coding
Leverage open source
Open data, such as data in R packages like nycflights13, gapminder, fivethirtyeight
Bechdel test? Original 538 article
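For instance, the Bechdel data can be loaded straight from the fivethirtyeight package. This is a minimal sketch, assuming the package's `bechdel` data frame and its `binary` (PASS/FAIL) column; the tally is only an illustration and not an analysis from the talk.

```r
library(fivethirtyeight)  # open data sets from FiveThirtyEight articles
library(dplyr)

# Tally how many films in the data pass vs. fail the Bechdel test
bechdel_counts <- bechdel %>%
  count(binary)

bechdel_counts
```

Because the data arrive as an ordinary data frame, students can go from an article they have read to their own summary in a couple of lines.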
Leverage open source
New textbook authoring paradigm
“Versions, not editions”
On GitHub at github.com/moderndive/
An Introduction to Statistical and Data Sciences via R
• Available at moderndive.com
• Development version at moderndive.netlify.com
• On GitHub at github.com/moderndive/
v0.3.0 to be released next week! What’s new?
Chapter diagram (inspired by hadley/r4ds):
• 1. Introduction
• 2. Getting Started with Data in R
• Data Science with tidyverse: 3. Data Visualization, 4. Tidy Data, 5. Data Wrangling
• Data Modeling with moderndive: 6. Basic Regression, 7. Multiple Regression
• Statistical Inference with infer: 8. Sampling, 9. Confidence Intervals, 10. Hypothesis Testing
• 11. Inference for Regression
• 12. Thinking with Data
Available at moderndive.com
"If You're Not Embarrassed By The First Version Of Your Product, You’ve Launched Too Late"
Reid Hoffman, founder of LinkedIn
Crowdsourcing Typos
Chapter diagram (inspired by hadley/r4ds), chapters 1 to 12, annotated with development status: Stable, Beta, Alpha
Available at moderndive.com
infer package for tidy statistical inference
http://infer.netlify.com/
Diagram: Specify Hypothesis → Generate Data (from null) → Calculate Statistic (repeat) → Visualize
hypothesize(null) %>% generate(reps) %>% calculate(stat) %>% visualize()
Hypothesis Testing
data → specify() → hypothesize() → generate() → calculate() → visualize()
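As a sketch of how these verbs chain together, the permutation test below compares mean fuel economy between automatic and manual cars in the built-in `mtcars` data. The dataset, the `"diff in means"` statistic, and the 1000 replicates are assumptions chosen for illustration; the slides do not show a specific example.

```r
library(dplyr)
library(infer)

set.seed(2018)

# Recode transmission type (0 = automatic, 1 = manual) as a labeled factor
cars <- mtcars %>%
  mutate(am = factor(am, labels = c("automatic", "manual")))

null_distribution <- cars %>%
  specify(mpg ~ am) %>%                      # response ~ explanatory
  hypothesize(null = "independence") %>%     # null: mpg unrelated to transmission
  generate(reps = 1000, type = "permute") %>% # shuffle labels 1000 times
  calculate(stat = "diff in means",
            order = c("manual", "automatic")) # manual minus automatic

# Histogram of the simulated differences under the null hypothesis
visualize(null_distribution)
```

Each line maps to one box in the diagram, so the code reads in the same order as the reasoning behind the test.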
“Thinking with Data”
Example student work
• Analysis of crime in Chicago
• How many f**ks does Tarantino Give?
• Final projects: Code and data
Albert Y. Kim
Amherst College
Twitter: @rudeboybert
GitHub: rudeboybert
Chester Ismay
DataCamp
Twitter: @old_man_chester
GitHub: ismayc