![Page 1: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/1.jpg)
social data scienceData Visualization
Sebastian BarfortAugust 08, 2016
University of CopenhagenDepartment of Economics
1/86
![Page 2: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/2.jpg)
Who’s ahead in the polls?
2/86
![Page 3: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/3.jpg)
What values are displayed in this chart?
3/86
![Page 4: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/4.jpg)
The answer may surprise you
4/86
![Page 5: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/5.jpg)
goals
What are you trying to accomplish?
1. Who’s the audience?
· Exploratory (use defaults) vs. explanatory (customize)· Raw data vs. model results
2. Graphs should be self explanatory3. A graph is a narrative - should convey key point(s)
5/86
![Page 6: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/6.jpg)
6/86
![Page 7: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/7.jpg)
three basic principles
Schwabish (2014)
1. Show the data (many graphs show too much)2. Reduce the clutter3. Integrate text and graph
7/86
![Page 8: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/8.jpg)
your turn
Discuss in groups
1. Look at the eight transformed graphs in Schwabish (2014). Whatdo you think about the transformations?
2. Are there other objectives of data visualization not mentionedby Schwabish?
3. If you were to improve on one of the eight transformed graphs,which one would you choose and how would you change it?
8/86
![Page 9: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/9.jpg)
other considerations
Color palette
Is your data numeric, binary, catetorical, text?
Should you truncate your y axis?
Remember
Visualizing data is not just a matter of good taste
Basic perceptual processes play a very strong role
9/86
![Page 10: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/10.jpg)
10/86
![Page 11: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/11.jpg)
11/86
![Page 12: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/12.jpg)
12/86
![Page 13: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/13.jpg)
13/86
![Page 14: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/14.jpg)
14/86
![Page 15: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/15.jpg)
some things i think work well
Small multiples
Tufte
“Illustrations of postage-stamp size are indexed by categoryor a label, sequenced over time like the frames of a movie,or ordered by a quantitative variable not used in the singleimage itself.”
15/86
![Page 16: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/16.jpg)
16/86
![Page 17: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/17.jpg)
“lollipops” and “dumbbells”
17/86
![Page 18: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/18.jpg)
slopegraphs
18/86
![Page 19: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/19.jpg)
data visualization software
Static
1. Base R2. lattice3. ggplot2
Interactive
1. D3 (javascript)2. htmlwidgets (R)3. Tableau (commercial)
19/86
![Page 20: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/20.jpg)
ggplot2
20/86
![Page 21: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/21.jpg)
why?
Positives
· Great defaults· Intuitive· Extremely well documented
Negatives
· Slow· Limited functionality
21/86
![Page 22: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/22.jpg)
grammar of graphics
ggplot2 is based on the grammar of graphics, the idea that you canbuild every graph from the same few components: a dataset, a set ofgeoms—visual marks that represent data points, and a coordinatesystem
You start with your data, and then you assign a geometry toelements of that data, such as circle size to population, then youdraw those geometries based upon some scaling of your data. Whenyou think about visualization this way it helps you develop a betterunderstanding of the data itself and think of proper ways tovisualize it.
22/86
![Page 23: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/23.jpg)
a pew visualization of partisanship
The Pew Research Center did an interesting visualization of politicalpolarization in the U.S.
Polarization
The new survey finds that as ideological consistency hasbecome more common, it has become increasingly alignedwith partisanship. Looking at 10 political values questionstracked since 1994, more Democrats now give uniformlyliberal responses, and more Republicans give uniformlyconservative responses than at any point in the last 20 years
Bob Rudis recreated their plot in ggplot2. Let’s see how.
23/86
![Page 24: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/24.jpg)
24/86
![Page 25: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/25.jpg)
basics
ggplot works by building your plot piece by pieceFirst we need to load some data into R (link here)
library(”readr”)gh.link = ”https://raw.githubusercontent.com/”user.repo = ”sebastianbarfort/sds_summer/”branch = ”gh-pages/”link = ”data/polarization.csv”data.link = paste0(gh.link, user.repo, branch, link)df = read_csv(data.link)names(df)
## [1] ”x” ”year” ”party” ”pct”
25/86
![Page 26: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/26.jpg)
inspecting the data
## [1] ”x” ”year” ”party” ”pct”
x year party pct
-10 1994 Dem 0.57-9 1994 Dem 1.60-8 1994 Dem 1.89-7 1994 Dem 3.49-6 1994 Dem 3.96-5 1994 Dem 6.56
26/86
![Page 27: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/27.jpg)
ggplot2 basics
ggplot works by building your plot piece by pieceThen we tell ggplot what pieces of the data frame we are interestedin
We create an object called p containing this informationHere, x = x and y = pct say what will go on the x and the y axesThese are aesthetic mappings that connect pieces of the data tothings we can actually see on a plot.
27/86
![Page 28: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/28.jpg)
library(”ggplot2”)p = ggplot(data = df, aes(x = x, y = pct))
28/86
![Page 29: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/29.jpg)
aesthetics, but no geoms
The plot is so far just a frame with no actual information
0.0
2.5
5.0
7.5
10.0
−10 −5 0 5 10x
pct
29/86
![Page 30: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/30.jpg)
let’s add a line
p = p + geom_line()
30/86
![Page 31: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/31.jpg)
0.0
2.5
5.0
7.5
10.0
−10 −5 0 5 10x
pct
31/86
![Page 32: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/32.jpg)
problem
We need to inform ggplot that Democrats and Republicans are twoseparate groups.
This can be done using fill, groups, shape, size or color.
p = ggplot(data = df,aes(x = x, y = pct,
fill = party, color = party))
32/86
![Page 33: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/33.jpg)
try again
p = p + geom_line()
33/86
![Page 34: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/34.jpg)
0.0
2.5
5.0
7.5
10.0
−10 −5 0 5 10x
pct
party
Dem
REP
34/86
![Page 35: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/35.jpg)
thoughts
1. Shading below the line2. Filling should be transparent3. Custom filling4. Axes should be modified5. Subset by year6. Axis title7. Background8. Legend
35/86
![Page 36: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/36.jpg)
1. shading below the line
p = p + geom_ribbon(aes(ymin = 0, ymax = pct))
36/86
![Page 37: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/37.jpg)
0.0
2.5
5.0
7.5
10.0
−10 −5 0 5 10x
pct
party
Dem
REP
37/86
![Page 38: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/38.jpg)
2. filling should be transparent
p = p + geom_ribbon(aes(ymin = 0, ymax = pct),alpha = .5)
38/86
![Page 39: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/39.jpg)
0.0
2.5
5.0
7.5
10.0
−10 −5 0 5 10x
pct
party
Dem
REP
39/86
![Page 40: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/40.jpg)
3. custom filling
p = p + scale_color_manual(name=NULL,values=c(Dem=”#728ea2”,REP=”#cf6a5d”),labels=c(Dem=”Democrats”, REP=”Republicans”)) +scale_fill_manual(
name=NULL,values=c(Dem=”#728ea2”, REP=”#cf6a5d”),labels=c(Dem=”Democrats”, REP=”Republicans”)) +
guides(color=”none”,fill=guide_legend(override.aes=list(alpha=1)))
40/86
![Page 41: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/41.jpg)
0.0
2.5
5.0
7.5
10.0
−10 −5 0 5 10x
pct Democrats
Republicans
41/86
![Page 42: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/42.jpg)
4. axes should be modified
p = p +scale_x_continuous(
expand = c(0,0), breaks = c(-8, 0, 8),labels= c(”Consistently\nliberal”,
”Mixed”,”Consistently\nconservative”)) +
scale_y_continuous(expand = c(0,0), limits = c(0, 12))
42/86
![Page 43: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/43.jpg)
0
3
6
9
12
Consistentlyliberal
Mixed Consistentlyconservative
x
pct Democrats
Republicans
43/86
![Page 44: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/44.jpg)
5. subset by year
p = p + facet_wrap(~ year, ncol = 2,scales = ”free_x”)
44/86
![Page 45: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/45.jpg)
1994 1999
2001 2004
2014
036912
036912
036912
Consistentlyliberal
MixedConsistentlyconservative
Consistentlyliberal
MixedConsistentlyconservative
Consistentlyliberal
MixedConsistentlyconservative
Consistentlyliberal
MixedConsistentlyconservative
Consistentlyliberal
MixedConsistentlyconservative
x
pct Democrats
Republicans
45/86
![Page 46: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/46.jpg)
6. axis title
p = p + labs(x = NULL,y = NULL,title = ”Polarization, 1994-2014”)
46/86
![Page 47: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/47.jpg)
7. background
p = p + theme_minimal()
47/86
![Page 48: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/48.jpg)
8. legend
p = p +theme(legend.position = c(0.75, 0.1)) +theme(legend.direction = ”horizontal”) +theme(axis.text.y = element_blank())
48/86
![Page 49: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/49.jpg)
output
1994 1999
2001 2004
2014
Consistentlyliberal
Mixed Consistentlyconservative
Consistentlyliberal
Mixed Consistentlyconservative
Consistentlyliberal
Mixed Consistentlyconservative
Consistentlyliberal
Mixed Consistentlyconservative
Consistentlyliberal
Mixed Consistentlyconservative
Democrats Republicans
Polarization, 1994−2014
49/86
![Page 50: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/50.jpg)
visualizing polls
How do you visualize polls?
New polls generate a lot of news stories
But much of it is probably noise
How to deal with this visually?
50/86
![Page 51: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/51.jpg)
berlingske
51/86
![Page 52: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/52.jpg)
alternative
A poll is not really a number
It’s a distribution
Usually, that distribution is normal
So let’s draw the distribution
52/86
![Page 53: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/53.jpg)
read data
library(”readr”)gh.link = ”https://raw.githubusercontent.com/”user.repo = ”sebastianbarfort/sds_summer/”branch = ”gh-pages/”link = ”data/polls_tidy.csv”data.link = paste0(gh.link, user.repo, branch, link)df = read_csv(data.link)
53/86
![Page 54: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/54.jpg)
First five rows/columns
predicted density estimate ci Parti
12.83318 6.60e-06 23.6 2.39159 Socialdemokraterne12.92999 7.90e-06 23.6 2.39159 Socialdemokraterne13.02681 9.50e-06 23.6 2.39159 Socialdemokraterne13.12363 1.14e-05 23.6 2.39159 Socialdemokraterne13.22045 1.36e-05 23.6 2.39159 Socialdemokraterne
54/86
![Page 55: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/55.jpg)
your turn
Inspect the dataset
What do you think the variables predicted and density mean?
55/86
![Page 56: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/56.jpg)
visualization
p = ggplot(df, aes(x = predicted, y = density))
56/86
![Page 57: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/57.jpg)
p = p + geom_line()
57/86
![Page 58: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/58.jpg)
p = ggplot(df, aes(x = predicted, y = density,group = group, color = Parti))
p = p + geom_line()
58/86
![Page 59: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/59.jpg)
p = ggplot(df, aes(x = predicted, y = density,group = group, color = Parti))
p = p +geom_line() +facet_grid(party.ord ~ .,
scales = ”free_y”,switch = ”y”)
59/86
![Page 60: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/60.jpg)
new strategy
df.1 = subset(df, date != max(date))df.2 = subset(df, date == max(date))p = ggplot()p = p +geom_line(data = df.1,
aes(x = predicted, y = density,group = group),
color = ”grey50”, alpha = .25) +facet_grid(party.ord ~ .,
scales = ”free_y”, switch = ”y”)
60/86
![Page 61: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/61.jpg)
add latest poll
p = p + geom_line(data = df.2,aes(x = predicted, y = density,
group = group),colour = ”red”, size = 1)
61/86
![Page 62: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/62.jpg)
details i
p = p + scale_x_continuous(breaks = scales::pretty_breaks(10)) +theme_minimal() +theme(strip.text.y = element_text(angle = 180,
face = ”bold”,vjust = 0.75,hjust = 1),
axis.text.y = element_blank()) +labs(y = NULL, x = NULL)
62/86
![Page 63: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/63.jpg)
output
SOCIALDEMOKRATERNE
DANSK FOLKEPARTI
VENSTRE
ENHEDSLISTEN
LIBERAL ALLIANCE
ALTERNATIVET
RADIKALE VENSTRE
SF
KONSERVATIVE
0 5 10 15 20 25 30 35
63/86
![Page 64: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/64.jpg)
steph curry
http://www.nytimes.com/interactive/2016/04/16/upshot/stephen-curry-golden-state-warriors-3-pointers.
html
64/86
![Page 65: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/65.jpg)
65/86
![Page 66: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/66.jpg)
data
library(”readr”)gh.link = ”https://raw.githubusercontent.com/”user.repo = ”sebastianbarfort/sds_summer/”branch = ”gh-pages/”link = ”data/basket_tidy.csv”data.link = paste0(gh.link, user.repo, branch, link)df = read_csv(data.link)
66/86
![Page 67: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/67.jpg)
MP 3P agg.three agg.minutes games player
35 5 5 35 1 Stephen Curry, 2015-1627 4 9 62 2 Stephen Curry, 2015-1635 8 17 97 3 Stephen Curry, 2015-1628 4 21 125 4 Stephen Curry, 2015-1632 7 28 157 5 Stephen Curry, 2015-16
67/86
![Page 68: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/68.jpg)
plot
p = ggplot(data = df,aes(x = games,
y = agg.three,color = as.factor(steph.ind),group = player))
68/86
![Page 69: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/69.jpg)
add steps
p = p + geom_step(size = 1)
69/86
![Page 70: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/70.jpg)
0
100
200
300
400
0 20 40 60 80games
agg.
thre
e
as.factor(steph.ind)
0
1
2
70/86
![Page 71: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/71.jpg)
fix scales, etc.
p = p + scale_x_continuous(expand = c(0, 0),limits = c(0, 160),breaks = c(0, 20, 40, 60, 80)) +
scale_y_continuous(expand = c(0, 0)) +
theme_minimal()
71/86
![Page 72: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/72.jpg)
0
100
200
300
400
0 20 40 60 80games
agg.
thre
e as.factor(steph.ind)0
1
2
72/86
![Page 73: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/73.jpg)
fix color and axis labels
p = p +scale_color_manual(
values = c(”grey50”, ”grey50”, ”darkblue”)) +labs(x = ”Games”,
y = ”# of threes”,title = ”Stephen Curry’s 3-Point Record”)
73/86
![Page 74: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/74.jpg)
theme changes
p = p +theme(legend.position = ”none”,
axis.ticks.x = element_blank(),axis.line.x = element_blank(),panel.grid.major.x = element_blank(),panel.grid.minor.x = element_blank())
74/86
![Page 75: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/75.jpg)
0
100
200
300
400
0 20 40 60 80Games
# of
thre
esStephen Curry's 3−Point Record
75/86
![Page 77: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/77.jpg)
preparation
data.max = df %>%filter(player ==
”Stephen Curry, 2015-16”) %>%filter(agg.three == max(agg.three))
77/86
![Page 78: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/78.jpg)
label
library(”ggrepel”)p = p +geom_text_repel(
data = data.max,aes(label = player,
color = as.factor(steph.ind)),size = 3,nudge_x = 24,segment.size = 0
)
78/86
![Page 79: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/79.jpg)
output
79/86
![Page 80: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/80.jpg)
firearm prices
Look at this visualization from Bob Rudis.
Data here
80/86
![Page 81: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/81.jpg)
load data
library(”readr”)gh.link = ”https://raw.githubusercontent.com/”user.repo = ”sebastianbarfort/sds_summer/”branch = ”gh-pages/”link = ”data/armslist.csv”data.link = paste0(gh.link, user.repo, branch, link)df = read_csv(data.link)
81/86
![Page 82: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/82.jpg)
create ggplot
p = ggplot(data = df, aes(x = price))
82/86
![Page 83: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/83.jpg)
your turn
Discuss in groups
1. What is interesting about these data?2. How would you visualize the distribution of the price variables3. Search the ggplot2 documentation here and create avisualization
4. Discuss how you would improve your visualization5. Look at Bob Rudis’ plots. Would you change anything? Do youthink they work?
83/86
![Page 84: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/84.jpg)
Maps
84/86
![Page 85: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/85.jpg)
making maps
There are many ways to make maps in R
1. easy: use an existing package2. hard: learn how to work with shapefiles (we don’t have time forthis today, but I strongly recommend reading these notes on thetopic)
85/86
![Page 86: Social Data Science - Data Visualization · goals Whatareyoutryingtoaccomplish? 1. Who’stheaudience? · Exploratory(usedefaults)vs.explanatory(customize) · Rawdatavs.modelresults](https://reader033.vdocument.in/reader033/viewer/2022042321/5f0a989a7e708231d42c68c1/html5/thumbnails/86.jpg)
map packages
There are many useful packages for making maps in R
· maps: all kinds of maps· ggcounty: generate United States county maps· ggmap: extends ggplot2 for maps· mapDK: maps of Denmark
86/86