read in data - john muschellidevices by default, r displays plots in a separate panel. from there,...
TRANSCRIPT
![Page 1: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/1.jpg)
Data VisualizationIntroduction to R for Public Health Researchers
![Page 2: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/2.jpg)
Read in Data
library(readr) death = read_csv( "http://johnmuschelli.com/intro_to_r/data/indicatordeadkids35.csv") death[1:2, 1:5]
# A tibble: 2 x 5 X1 `1760` `1761` `1762` `1763` <chr> <dbl> <dbl> <dbl> <dbl> 1 Afghanistan NA NA NA NA 2 Albania NA NA NA NA
2/90
![Page 3: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/3.jpg)
Read in Data: jhur
jhur::read_mortality()
# A tibble: 197 x 255 X1 `1760` `1761` `1762` `1763` `1764` `1765` `1766` `1767` `1768` <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 Afgh… NA NA NA NA NA NA NA NA NA 2 Alba… NA NA NA NA NA NA NA NA NA 3 Alge… NA NA NA NA NA NA NA NA NA 4 Ango… NA NA NA NA NA NA NA NA NA 5 Arge… NA NA NA NA NA NA NA NA NA 6 Arme… NA NA NA NA NA NA NA NA NA 7 Aruba NA NA NA NA NA NA NA NA NA 8 Aust… NA NA NA NA NA NA NA NA NA 9 Aust… NA NA NA NA NA NA NA NA NA 10 Azer… NA NA NA NA NA NA NA NA NA # … with 187 more rows, and 245 more variables: `1769` <dbl>, # `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>, `1774` <dbl>, # `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>, `1779` <dbl>, # `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>, `1784` <dbl>, # `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>, `1789` <dbl>, # `1790` <dbl>, `1791` <dbl>, `1792` <dbl>, `1793` <dbl>, `1794` <dbl>, # `1795` <dbl>, `1796` <dbl>, `1797` <dbl>, `1798` <dbl>, `1799` <dbl>, # `1800` <dbl>, `1801` <dbl>, `1802` <dbl>, `1803` <dbl>, `1804` <dbl>, # `1805` <dbl>, `1806` <dbl>, `1807` <dbl>, `1808` <dbl>, `1809` <dbl>, # `1810` <dbl>, `1811` <dbl>, `1812` <dbl>, `1813` <dbl>, `1814` <dbl>, # `1815` <dbl>, `1816` <dbl>, `1817` <dbl>, `1818` <dbl>, `1819` <dbl>, # `1820` <dbl>, `1821` <dbl>, `1822` <dbl>, `1823` <dbl>, `1824` <dbl>,
# `1825` <dbl>, `1826` <dbl>, `1827` <dbl>, `1828` <dbl>, `1829` <dbl>,
3/90
![Page 4: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/4.jpg)
Data are not Tidy!
![Page 5: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/5.jpg)
Tidying data: reshape the data
After reshaping the data to long, we can plot the data with one data.frame:
library(tidyverse) long = gather(death, key = year, value = deaths, country) long = long %>% filter(!is.na(deaths)) head(long); # note class year
# A tibble: 6 x 3 country year deaths <chr> <chr> <dbl> 1 Sweden 1760 2.21 2 United Kingdom 1760 2.20 3 Sweden 1761 2.30 4 United Kingdom 1761 2.35 5 Sweden 1762 2.79 6 United Kingdom 1762 2.32
long = long %>% mutate(year = as.numeric(year))
5/90
![Page 6: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/6.jpg)
Plot the long data
swede_long = long %>% filter(country == "Sweden") qplot(x = year, y = deaths, data = swede_long)
6/90
![Page 7: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/7.jpg)
Plot the long data only up to 2012
qplot(x = year, y = deaths, data = swede_long, xlim = c(1760,2012))
7/90
![Page 8: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/8.jpg)
ggplot2
ggplot2 is a package of plotting that is very popular and powerful (using thegrammar of graphics). qplot (“quick plot”), similar to plot
library(ggplot2) qplot(x = year, y = deaths, data = swede_long)
8/90
![Page 9: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/9.jpg)
ggplot2
The generic plotting function is ggplot, which uses aesthetics:
g is an object, which you can adapt into multiple plots!
ggplot(data, aes(args))
g = ggplot(data = swede_long, aes(x = year, y = deaths))
9/90
![Page 10: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/10.jpg)
ggplot2
Common aesthetics:
If you set these in aes, you set them to a variable. If you want to set them for allvalues, set them in a geom.
x
y
colour/color
size
fill
shape
·
·
·
·
·
·
10/90
![Page 11: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/11.jpg)
ggplot2
You can do this most of the time using qplot, but qplot will assume ascatterplot if x and y are specified and histogram if x is specified:
g is an object, which you can adapt into multiple plots!
q = qplot(data = swede_long, x = year, y = deaths) q
11/90
![Page 12: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/12.jpg)
ggplot2: what’s a geom?
g on it’s own can’t be plotted, we have to add layers, usually with geom_commands:
geom_point - add points
geom_line - add lines
geom_density - add a density plot
geom_histogram - add a histogram
geom_smooth - add a smoother
geom_boxplot - add a boxplots
geom_bar - bar charts
geom_tile - rectangles/heatmaps
·
·
·
·
·
·
·
·
12/90
![Page 13: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/13.jpg)
ggplot2: adding a geom and assigning
You “add” things to a plot with a + sign (not pipe!). If you assign a plot to anobject, you must call print to print it.
gpoints = g + geom_point(); print(gpoints) # one line for slides
13/90
![Page 14: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/14.jpg)
ggplot2: adding a geom
Otherwise it prints by default - this time it’s a line
g + geom_line()
14/90
![Page 15: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/15.jpg)
ggplot2: adding a geom
You can add multiple geoms:
g + geom_line() + geom_point()
15/90
![Page 16: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/16.jpg)
ggplot2: adding a smoother
Let’s add a smoother through the points:
g + geom_line() + geom_smooth()
16/90
![Page 17: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/17.jpg)
ggplot2: grouping - using colour
If we want a plot with new data, call ggplot again. Group plots by country usingcolour (piping in the data):
sub = long %>% filter(country %in% c("United States", "United Kingdom", "Sweden", "Afghanistan", "Rwanda")) g = sub %>% ggplot(aes(x = year, y = deaths, colour = country)) g + geom_line()
17/90
![Page 18: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/18.jpg)
Coloring manually
There are many scale_AESTHETICS_* functions andscale_AESTHETICS_manual allows to directly specify the colors:
g + geom_line() + scale_colour_manual(values = c("United States" = "blue", "United Kingdom" = "green", "Sweden" = "black", "Afghanistan" = "red", "Rwanda" = "orange"))
18/90
![Page 19: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/19.jpg)
ggplot2: grouping - using colour
Let’s remove the legend using the guide command:
g + geom_line() + guides(colour = FALSE)
19/90
![Page 21: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/21.jpg)
ggplot2: boxplot
ggplot(long, aes(x = year, y = deaths)) + geom_boxplot()
21/90
![Page 22: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/22.jpg)
ggplot2: boxplot
For different plotting per year - must make it a factor - but x-axis is wrong!
ggplot(long, aes(x = factor(year), y = deaths)) + geom_boxplot()
22/90
![Page 23: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/23.jpg)
ggplot2: boxplot
ggplot(long, aes(x = year, y = deaths, group = year)) + geom_boxplot()
23/90
![Page 24: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/24.jpg)
ggplot2: boxplot with points
geom_jitter plots points “jittered” with noise so not overlapping·
sub_year = long %>% filter( year > 1995 & year <= 2000) ggplot(sub_year, aes(x = factor(year), y = deaths)) + geom_boxplot(outlier.shape = NA) + # don't show outliers will below geom_jitter(height = 0)
24/90
![Page 25: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/25.jpg)
facets: plotting multiple panels
A facet will make a plot over variables, keeping axes the same (out can changethat):
sub %>% ggplot(aes(x = year, y = deaths)) + geom_line() + facet_wrap(~ country)
25/90
![Page 26: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/26.jpg)
facets: plotting multiple panels
sub %>% ggplot(aes(x = year, y = deaths)) + geom_line() + facet_wrap(~ country, ncol = 1)
26/90
![Page 27: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/27.jpg)
facets: plotting multiple panels
You can use facets in qplot
qplot(x = year, y = deaths, geom = "line", facets = ~ country, data = sub)
27/90
![Page 28: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/28.jpg)
facets: plotting multiple panels
You can also do multiple factors with + on the right hand side
sub %>% ggplot(aes(x = year, y = deaths)) + geom_line() + facet_wrap(~ country + x2 + ... )
28/90
![Page 30: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/30.jpg)
Devices
By default, R displays plots in a separate panel. From there, you can export theplot to a variety of image file types, or copy it to the clipboard.
However, sometimes its very nice to save many plots made at one time to onepdf file, say, for flipping through. Or being more precise with the plot size in thesaved file.
R has 5 additional graphics devices: bmp(), jpeg(), png(), tiff(), and pdf()
30/90
![Page 31: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/31.jpg)
Devices
The syntax is very similar for all of them:
Basically, you are creating a pdf file, and telling R to write any subsequent plotsto that file. Once you are done, you turn the device off. Note that failing to turnthe device off will create a pdf file that is corrupt, that you cannot open.
pdf("filename.pdf", width=8, height=8) # inches plot() # plot 1 plot() # plot 2 # etc dev.off()
31/90
![Page 32: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/32.jpg)
Labels and such
xlab/ylab - functions to change the labels; ggtitle - change the title·
q = qplot(x = year, y = deaths, colour = country, data = sub, geom = "line") + xlab("Year of Collection") + ylab("Deaths /100,000") + ggtitle("Mortality of Children over the years", subtitle = "not great") q
32/90
![Page 33: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/33.jpg)
Saving the output:
png("deaths_over_time.png") print(q) dev.off()
quartz_off_screen 2
file.exists("deaths_over_time.png")
[1] TRUE
33/90
![Page 34: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/34.jpg)
Themes
see ?theme_bw - for ggthemes - black and white·
q + theme_bw()
34/90
![Page 35: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/35.jpg)
Themes: change plot parameters
theme - global or specific elements/increase text size·
q + theme(text = element_text(size = 12), title = element_text(size = 20))
35/90
![Page 36: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/36.jpg)
Themes
q = q + theme(axis.text = element_text(size = 14), title = element_text(size = 20), axis.title = element_text(size = 16), legend.position = c(0.9, 0.8)) + guides(colour = guide_legend(title = "Country")) q
36/90
![Page 37: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/37.jpg)
Code for a transparent legend
transparent_legend = theme(legend.background = element_rect( fill = "transparent"), legend.key = element_rect(fill = "transparent", color = "transparent") ) q + transparent_legend
37/90
![Page 39: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/39.jpg)
Histograms again: Changing bins
qplot(x = deaths, data = sub, bins = 200)
39/90
![Page 40: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/40.jpg)
Multiple Histograms
qplot(x = deaths, fill = factor(country), data = sub, geom = c("histogram"))
40/90
![Page 41: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/41.jpg)
Multiple Histograms
Alpha refers to the opacity of the color, less is more opaque
qplot(x = deaths, fill = country, data = sub, geom = c("histogram"), alpha=.7)
41/90
![Page 42: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/42.jpg)
Multiple Densities
We cold also do densities:
qplot(x= deaths, fill = country, data = sub, geom = c("density"), alpha= .7) + guides(alpha = FALSE)
42/90
![Page 43: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/43.jpg)
Multiple Densities
using colour not fill:·
qplot(x = deaths, colour = country, data = sub, geom = c("density"), alpha= .7) + guides(alpha = FALSE)
43/90
![Page 44: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/44.jpg)
Multiple Densities
You can take off the lines of the bottom like this
ggplot(aes(x = deaths, colour = country), data = sub) + geom_line(stat = "density")
44/90
![Page 45: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/45.jpg)
ggplot2
qplot(x = year, y = deaths, colour = country, data = long, geom = "line") + guides(colour = FALSE)
45/90
![Page 46: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/46.jpg)
ggplot2
Let’s try to make it different like base R, a bit. We use tile for the geom:
qtile = qplot(x = year, y = country, fill = deaths, data = sub, geom = "tile") + xlim(1990, 2005) + guides(colour = FALSE)
46/90
![Page 47: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/47.jpg)
ggplot2: changing colors
scale_fill_gradient let’s us change the colors for the fill:
qtile + scale_fill_gradient( low = "blue", high = "red")
47/90
![Page 48: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/48.jpg)
ggplot2
Let’s try categories.
sub$cat = cut(sub$deaths, breaks = c(0, 1, 2, max(sub$deaths))) q2 = qplot(x = year, y = country, fill = cat, data = sub, geom = "tile") + guides(colour = FALSE)
48/90
![Page 49: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/49.jpg)
Colors
It’s actually pretty hard to make a good color palette. Luckily, smart and artisticpeople have spent a lot more time thinking about this. The result is theRColorBrewer package
RColorBrewer::display.brewer.all() will show you all of the palettesavailable. You can even print it out and keep it next to your monitor forreference.
The help file for brewer.pal() gives you an idea how to use the package.
You can also get a “sneak peek” of these palettes at: http://colorbrewer2.org/ .You would provide the number of levels or classes of your data, and then thetype of data: sequential, diverging, or qualitative. The names of theRColorBrewer palettes are the string after ‘pick a color scheme:’
49/90
![Page 50: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/50.jpg)
ggplot2: changing colors
scale_fill_brewer will allow us to use these palettes conveniently
q2 + scale_fill_brewer( type = "div", palette = "RdBu" )
50/90
![Page 51: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/51.jpg)
Bar Plots with a table
cars = read_csv( "http://johnmuschelli.com/intro_to_r/data/kaggleCarAuction.csv", col_types = cols(VehBCost = col_double())) counts < table(cars$IsBadBuy, cars$VehicleAge)
51/90
![Page 52: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/52.jpg)
Bar Plots
Stacked Bar Charts are sometimes wanted to show distributions of data·
barplot(counts, main="Car Distribution by Age and Bad Buy Status", xlab="Vehicle Age"
52/90
![Page 53: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/53.jpg)
Bar Plots
prop.table allows you to convert a table to proportions (depends on margin -either row percent or column percent)
## Use percentages ﴾column percentages﴿ barplot(prop.table(counts, 2), main = "Car Distribution by Age and Bad Buy Status", xlab="Vehicle Age", col=c("darkblue","red"), legend = rownames(counts))
53/90
![Page 54: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/54.jpg)
Bar Plots
ggplot(aes(fill = factor(IsBadBuy), x = VehicleAge), data = cars) + geom_bar()
54/90
![Page 55: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/55.jpg)
Normalized Stacked Bar charts
we must calculate percentages on our own·
perc = cars %>% group_by(IsBadBuy, VehicleAge) %>% tally() %>% ungroup head(perc)
# A tibble: 6 x 3 IsBadBuy VehicleAge n <dbl> <dbl> <int> 1 0 0 2 2 0 1 2969 3 0 2 7942 4 0 3 14601 5 0 4 15149 6 0 5 11061
55/90
![Page 56: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/56.jpg)
Each Age adds to 1
perc_is_bad = perc %>% group_by(VehicleAge) %>% mutate(perc = n / sum(n)) ggplot(aes(fill = factor(IsBadBuy), x = VehicleAge, y = perc), data = perc_is_bad) + geom_bar(stat = "identity")
56/90
![Page 57: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/57.jpg)
Each Bar adds to 1 for bad buy or not
perc_yr = perc %>% group_by(IsBadBuy) %>% mutate(perc = n / sum(n)) ggplot(aes(fill = factor(VehicleAge), x = IsBadBuy, y = perc), data = perc_yr) + geom_bar(stat = "identity")
57/90
![Page 58: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/58.jpg)
Histograms again
We can do histograms again using hist. Let’s do histograms of weight at all timepoints for the chick’s weights. We reiterate how useful these are to show yourdata.
hist(ChickWeight$weight, breaks = 20)
58/90
![Page 59: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/59.jpg)
Multiple Histograms
qplot(x = weight, fill = factor(Diet), data = ChickWeight, geom = c("histogram"))
59/90
![Page 60: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/60.jpg)
Multiple Histograms
Alpha refers tot he opacity of the color, less is
qplot(x = weight, fill = Diet, data = ChickWeight, geom = c("histogram"), alpha=.7)
60/90
![Page 61: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/61.jpg)
Multiple Densities
We cold also do densities
qplot(x= weight, fill = Diet, data = ChickWeight, geom = c("density"), alpha= .7)
61/90
![Page 62: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/62.jpg)
Multiple Densities
qplot(x= weight, colour = Diet, data = ChickWeight, geom = c("density"), alpha=.7)
62/90
![Page 63: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/63.jpg)
Multiple Densities
ggplot(aes(x= weight, colour = Diet), data = ChickWeight) + geom_density(alpha=.7)
63/90
![Page 64: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/64.jpg)
Multiple Densities
You can take off the lines of the bottom like this
ggplot(aes(x = weight, colour = Diet), data = ChickWeight) + geom_line(stat = "density")
64/90
![Page 65: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/65.jpg)
Spaghetti plot
We can make a spaghetti plot by telling ggplot we want a “line”, and each line iscolored by Chick.
qplot(x=Time, y=weight, colour = factor(Chick), data = ChickWeight, geom = "line")
65/90
![Page 66: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/66.jpg)
Spaghetti plot: Facets
In ggplot2, if you want separate plots for something, these are referred to asfacets.
qplot(x = Time, y = weight, colour = factor(Chick), facets = ~Diet, data = ChickWeight, geom = "line")
66/90
![Page 67: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/67.jpg)
Spaghetti plot: Facets
We can turn off the legend (referred to a “guide” in ggplot2). (Note - there isdifferent syntax with the +)
qplot(x=Time, y=weight, colour = factor(Chick), facets = ~ Diet, data = ChickWeight, geom = "line") + guides(colour=FALSE)
67/90
![Page 68: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/68.jpg)
Spaghetti plot: Facets
ggplot(aes(x = Time, y = weight, colour = factor(Chick)), data = ChickWeight) + geom_line() + facet_wrap(facets = ~Diet) + guides(colour = FALSE)
68/90
![Page 69: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/69.jpg)
ggplot2
Let’s try this out on the childhood mortality data used above. However, let’s dosome manipulation first, by using gather on the data to convert to long.
library(tidyverse) long = death long = long %>% gather(year, deaths, country) head(long, 2)
# A tibble: 2 x 3 country year deaths <chr> <chr> <dbl> 1 Afghanistan 1760 NA 2 Albania 1760 NA
69/90
![Page 70: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/70.jpg)
ggplot2
Let’s also make the year numeric, as we did above in the stand-alone yearvariable.
library(stringr) library(dplyr) long$year = long$year %>% str_replace("^X", "") %>% as.numeric long = long %>% filter(!is.na(deaths))
70/90
![Page 71: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/71.jpg)
ggplot2
qplot(x = year, y = deaths, colour = country, data = long, geom = "line") + guides(colour = FALSE)
71/90
![Page 72: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/72.jpg)
ggplot2
Let’s try to make it different like base R, a bit. We use tile for the geometricunit:
qplot(x = year, y = country, colour = deaths, data = long, geom = "tile") + guides(colour = FALSE)
72/90
![Page 73: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/73.jpg)
ggplot2
Useful links:
http://docs.ggplot2.org/0.9.3/index.html
http://www.cookbook-r.com/Graphs/
·
·
73/90
![Page 75: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/75.jpg)
Base Graphics - explore on yourown
![Page 76: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/76.jpg)
Basic Plots
library(dplyr) sweden = death %>% filter(country == "Sweden") %>% select(country) year = as.numeric(colnames(sweden)) plot(as.numeric(sweden) ~ year)
76/90
![Page 77: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/77.jpg)
Base Graphics parameters
Set within most plots in the base ‘graphics’ package:
pch = point shape, http://voteview.com/symbols_pch.htm
cex = size/scale
xlab, ylab = labels for x and y axes
main = plot title
lwd = line density
col = color
cex.axis, cex.lab, cex.main = scaling/sizing for axes marks, axes labels, and title
·
·
·
·
·
·
·
77/90
![Page 78: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/78.jpg)
Basic Plots
The y-axis label isn’t informative, and we can change the label of the y-axis usingylab (xlab for x), and main for the main title/label.
plot(as.numeric(sweden) ~ year, ylab = "# of deaths per family", main = "Sweden", type = "l")
78/90
![Page 79: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/79.jpg)
Basic Plots
Let’s drop any of the projections and keep it to year 2012, and change the pointsto blue.
plot(as.numeric(sweden) ~ year, ylab = "# of deaths per family", main = "Sweden", xlim = c(1760,2012), pch = 19, cex=1.2,col="blue")
79/90
![Page 80: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/80.jpg)
Basic Plots
You can also use the subset argument in the plot() function, only when usingformula notation:
plot(as.numeric(sweden) ~ year, ylab = "# of deaths per family", main = "Sweden", subset = year < 2015, pch = 19, cex=1.2,col="blue")
80/90
![Page 81: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/81.jpg)
Bar Plots
Using the beside argument in barplot, you can get side-by-side barplots.
# Stacked Bar Plot with Colors and Legend barplot(counts, main="Car Distribution by Age and Bad Buy Status", xlab="Vehicle Age", col=c("darkblue","red"), legend = rownames(counts), beside=TRUE)
81/90
![Page 82: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/82.jpg)
Boxplots, revisited
These are one of my favorite plots. They are way more informative than thebarchart + antenna…
boxplot(weight ~ Diet, data=ChickWeight, outline=FALSE) points(ChickWeight$weight ~ jitter(as.numeric(ChickWeight$Diet),0.5))
82/90
![Page 83: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/83.jpg)
Formulas
Formulas have the format of y ~ x and functions taking formulas have a dataargument where you pass the data.frame. You don’t need to use $ or referencingwhen using formulas:
boxplot(weight ~ Diet, data=ChickWeight, outline=FALSE)
83/90
![Page 84: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/84.jpg)
Colors
R relies on color ‘palettes’.
palette("default") plot(1:8, 1:8, type="n") text(1:8, 1:8, lab = palette(), col = 1:8)
84/90
![Page 85: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/85.jpg)
Colors
The default color palette is pretty bad, so you can try to make your own.
palette(c("darkred","orange","blue")) plot(1:3,1:3,col=1:3,pch =19,cex=2)
85/90
![Page 86: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/86.jpg)
Colors
library(RColorBrewer) palette(brewer.pal(5,"Dark2")) plot(weight ~ jitter(Time,amount=0.2),data=ChickWeight, pch = 19, col = Diet,xlab="Time")
86/90
![Page 87: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/87.jpg)
Adding legends
The legend() command adds a legend to your plot. There are tons of argumentsto pass it.
x, y=NULL: this just means you can give (x,y) coordinates, or more commonly justgive x, as a character string:“top”,“bottom”,“topleft”,“bottomleft”,“topright”,“bottomright”.
legend: unique character vector, the levels of a factor
pch, lwd: if you want points in the legend, give a pch value. if you want lines, givea lwd value.
col: give the color for each legend level
87/90
![Page 88: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/88.jpg)
Adding legends
palette(brewer.pal(5,"Dark2")) plot(weight ~ jitter(Time,amount=0.2),data=ChickWeight, pch = 19, col = Diet,xlab="Time") legend("topleft", paste("Diet",levels(ChickWeight$Diet)), col = 1:length(levels(ChickWeight$Diet)), lwd = 3, ncol = 2)
88/90
![Page 89: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/89.jpg)
Coloring by variable
circ = read_csv("http://johnmuschelli.com/intro_to_r/data/Charm_City_Circulator_Ridership.csv"palette(brewer.pal(7,"Dark2")) dd = factor(circ$day) plot(orangeAverage ~ greenAverage, data=circ, pch=19, col = as.numeric(dd)) legend("bottomright", levels(dd), col=1:length(dd), pch = 19)
89/90
![Page 90: Read in Data - John MuschelliDevices By default, R displays plots in a separate panel. From there, you can export the plot to a variety of image file types, or copy it to the clipboard](https://reader035.vdocument.in/reader035/viewer/2022070713/5ed2f8aaa9caa7002f55bc28/html5/thumbnails/90.jpg)
Coloring by variable
dd = factor(circ$day, levels=c("Monday","Tuesday","Wednesday", "Thursday","Friday","Saturday","Sunday")) plot(orangeAverage ~ greenAverage, data=circ, pch=19, col = as.numeric(dd)) legend("bottomright", levels(dd), col=1:length(dd), pch = 19)
90/90