Data visualization and graphic design: Special topics
Allan Just and Andrew Rundle
EPIC Short Course, June 24, 2011
Wickham 2008
Agenda
• Quick hits: layer order in Deducer, bubble charts, ggplot2 quasi-beanplot
• Being on your own with ggplot2 and R: getting unstuck
• Small datasets revisited
• Large datasets
• Displaying uncertainty
• Automated generation of many plots
• Extending ggplot2: direct labels and scatterplot matrices
• New geoms
• More practice exercises!
• Wrap up
A theory about practice…
Getting unstuck…
• Check the str() of your data
• Check the console for error messages
• Look at the call for your plot: is that what you wanted?
• It is easier to start with something that works but is too simple:
1. Simplify the plot until it works
2. Add back components one by one to isolate the problem
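A minimal sketch of that workflow, using the built-in airquality data (the dataset and the particular layers are our illustrative choices): inspect the structure first, get a bare-bones plot working, then add one component at a time so that any error points at the last thing you added. ggplot_build() computes the plot without drawing it, which is a quick way to surface errors.

```r
library(ggplot2)

data(airquality)
str(airquality)   # Month is stored as an integer, not a factor

# Step 1: the simplest plot that works
p <- ggplot(airquality, aes(x = factor(Month), y = Ozone)) +
  geom_point(na.rm = TRUE)

# Step 2: add components back one at a time, checking after each addition
p <- p + geom_boxplot(alpha = 0.3, outlier.shape = NA, na.rm = TRUE)
p <- p + labs(x = "Month", y = "Ozone (ppb)")

# Build the plot without drawing it; any error shows up here
built <- ggplot_build(p)
```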
Reproducible examples and the ggplot2 mailing list
http://groups.google.com/group/ggplot2
Compose your question well and you might figure out the answer in the process!
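One way to make a question self-contained, sketched here with an illustrative slice of airquality, is to paste a small piece of your data as R code via dput() together with the shortest plot call that still shows the problem. Anyone reading the question can then re-create your data exactly.

```r
library(ggplot2)

# Print a small slice of the data as R code that rebuilds it exactly
small <- head(airquality, 5)
dput(small)

# A reader pastes that dput() output back into R...
rebuilt <- eval(parse(text = capture.output(dput(small))))

# ...and runs the minimal plot that shows the problem
p <- ggplot(rebuilt, aes(Temp, Ozone)) + geom_point(na.rm = TRUE)
```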
Data + summary
Loss of information
Data + summary: building this ourselves…
Better than bar charts…
data(airquality)
# open the plot builder and add geom_point
# with x = Month and y = Ozone
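The same data + summary idea can be sketched in code rather than in the plot builder: every reading is shown (jittered sideways so points don't overlap), with the monthly mean overlaid as a large cross. The jitter width, colour, and the choice of the mean as the summary are our assumptions, not necessarily what the slide used.

```r
library(ggplot2)

data(airquality)
p <- ggplot(airquality, aes(x = factor(Month), y = Ozone)) +
  # the data: every daily reading, jittered only along x
  geom_jitter(width = 0.1, height = 0, alpha = 0.6, na.rm = TRUE) +
  # the summary: the monthly mean drawn as a cross (shape 4)
  stat_summary(fun = mean, geom = "point", shape = 4, size = 5,
               colour = "red", na.rm = TRUE) +
  labs(x = "Month", y = "Ozone (ppb)")

# The same monthly means as a table (rows with missing Ozone are dropped)
monthly_means <- aggregate(Ozone ~ Month, data = airquality, FUN = mean)
monthly_means
```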
Pseudo beanplots
library(ggplot2)
g_violin_bean <- ggplot(sleep, aes(x = extra)) +
  # mirrored density curve: the "bean"
  geom_ribbon(aes(ymax = after_stat(density), ymin = -after_stat(density)),
              stat = "density", fill = "black") +
  # one short light line per observation
  geom_segment(aes(y = -0.05, yend = 0.05, xend = extra), colour = "grey90") +
  facet_grid(. ~ group, as.table = FALSE, scales = "free_y") +
  xlab(NULL) +
  theme_bw(base_size = 20) +
  theme(panel.spacing = unit(0, "lines"), axis.text.x = element_blank()) +
  coord_flip() +
  expand_limits(x = c(-5, 9))
g_violin_bean
What about large datasets?
Playing with diamonds…
data(diamonds)
str(diamonds)
With your neighbor: how do we show the data on the relationship between carat and price?
Strategies for large datasets
– Use smaller points - use circles
– Use partial transparency
– Jitter (small random noise) if data take discrete values
– Overlay a smoother to show the trend
– Display a random sample from your data
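A sketch that combines several of these strategies on the diamonds data; the sample size, point size, transparency, and jitter width are illustrative choices. With 5,000 points geom_smooth() picks a GAM smoother by default, which uses the mgcv package when the plot is drawn.

```r
library(ggplot2)

data(diamonds)
set.seed(2011)

# Display a random sample of 5,000 of the ~54,000 diamonds
diamonds_sample <- diamonds[sample(nrow(diamonds), 5000), ]

p_scatter <- ggplot(diamonds_sample, aes(carat, price)) +
  geom_point(shape = 1, size = 0.8, alpha = 0.3) +  # small, open, semi-transparent points
  geom_smooth()                                      # smoother overlaid to show the trend

# Jitter helps when one variable takes discrete values, e.g. cut quality
p_jitter <- ggplot(diamonds_sample, aes(cut, price)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.1)
```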
How do you show 54,000 diamonds?
Partial transparency: alpha = 0.01
Contours for density: alpha = 0.1
Hexagonal bins with legend
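The three panels can be approximated with the calls below, a sketch using current ggplot2 function names. geom_hex() needs the hexbin package installed when the plot is drawn; the fill legend it produces shows the count per hexagon.

```r
library(ggplot2)

data(diamonds)
nrow(diamonds)  # 53940 diamonds

base <- ggplot(diamonds, aes(carat, price))

# Partial transparency: each point is only 1% opaque
p_alpha <- base + geom_point(alpha = 0.01)

# Faint points with 2D density contours drawn on top
p_contour <- base + geom_point(alpha = 0.1) + geom_density_2d()

# Hexagonal bins; the fill legend shows counts (requires hexbin to draw)
p_hex <- base + geom_hex()
```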
Displaying uncertainty
• Confidence intervals (uniformly shaded or bounded)
• Pointwise error bars
• Bayesian simulations
• Resampling-based estimates
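The first two options map onto built-in ggplot2 layers; a sketch using the mpg data for illustration: geom_smooth() draws a uniformly shaded confidence band around a fit, and stat_summary() with ggplot2's mean_se() draws pointwise error bars of the mean plus or minus one standard error.

```r
library(ggplot2)

# Uniformly shaded 95% confidence band around a linear fit
p_band <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", level = 0.95)

# Pointwise error bars: mean +/- 1 standard error of hwy within each class
p_bars <- ggplot(mpg, aes(class, hwy)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  stat_summary(fun = mean, geom = "point")

# mean_se() returns y, ymin, ymax; here se = sd / sqrt(n) = 1 / sqrt(3)
mean_se(c(1, 2, 3))
```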
Models shouldn't extend beyond the range of your data (xkcd.com/605/)
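geom_smooth() already respects this by default (fullrange = FALSE draws the fit only across the observed x values). If you compute predictions yourself, a sketch of the same discipline is to build the prediction grid from the observed range; the model and variables below are illustrative.

```r
library(ggplot2)

fit <- lm(hwy ~ displ, data = mpg)

# Predict only across the observed range of displ, never beyond it
pred <- data.frame(displ = seq(min(mpg$displ), max(mpg$displ), length.out = 50))
pred$hwy <- predict(fit, newdata = pred)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_line(data = pred, colour = "blue")
```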
Graph your uncertainty: informal Bayesian simulation
1. Run a regression
2. Draw random numbers based on the uncertainty of your regression
3. Plot some lines!
4. This uses the sim() function in package “arm”
$$\tilde{\sigma} = \hat{\sigma}\sqrt{\frac{n-k}{X}}, \quad \text{for } X \sim \chi^2_{n-k}$$
Gelman and Hill 2007
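The steps can be sketched by hand, without arm, under these assumptions: an illustrative regression of hwy on displ from ggplot2's mpg data, 20 draws, and MASS::mvrnorm() for the coefficient draws. Each draw first takes sigma as sigma_hat * sqrt((n - k) / X) with X drawn from a chi-squared distribution on n - k degrees of freedom, then draws the coefficients from a normal centred on the estimates with covariance V_beta * sigma^2. This mirrors the procedure Gelman and Hill describe; it is not arm's source code.

```r
library(ggplot2)
library(MASS)  # mvrnorm()

set.seed(2007)
fit <- lm(hwy ~ displ, data = mpg)          # 1. run the regression
n <- nrow(mpg)
k <- length(coef(fit))
V_beta <- summary(fit)$cov.unscaled         # (X'X)^-1
sigma_hat <- summary(fit)$sigma
n_sims <- 20

# 2a. draw sigma: sigma_hat * sqrt((n - k) / X), X ~ chi-squared(n - k)
sigma_sim <- sigma_hat * sqrt((n - k) / rchisq(n_sims, df = n - k))
# 2b. draw coefficients given each sigma: beta ~ N(beta_hat, V_beta * sigma^2)
beta_sim <- t(sapply(sigma_sim, function(s) mvrnorm(1, coef(fit), V_beta * s^2)))
colnames(beta_sim) <- c("intercept", "slope")

# 3. plot some lines: light lines for the draws, a dark line for the fit
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_abline(data = as.data.frame(beta_sim),
              aes(intercept = intercept, slope = slope), colour = "grey70") +
  geom_abline(intercept = coef(fit)[1], slope = coef(fit)[2]) +
  geom_point()
```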
Informal Bayesian simulation
Figure 3. Association between DEP concentrations in personal air and the urinary metabolite MEP concentrations (adjusted for specific gravity) stratified by perfume use using linear regression of log transformed values. Lighter lines represent predictive uncertainty in regression parameters from informal Bayesian simulations (20 simulation draws with uniform priors). Boxplots show the distribution of MEP with means (“X”). Just et al. 2010
Resampling - Spline after bootstrap
Cosma Shalizi 2010
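A minimal sketch of a bootstrap band for a smoothing spline, assuming base R's smooth.spline() with 5 degrees of freedom and 200 resamples (both our choices, not necessarily those behind the figure): refit the spline to each resampled dataset, predict on a common grid, and take pointwise 2.5% and 97.5% quantiles.

```r
library(ggplot2)

set.seed(2010)
grid_x <- seq(min(mpg$displ), max(mpg$displ), length.out = 100)

# Refit the spline to 200 bootstrap resamples; one column of predictions each
boot_fits <- replicate(200, {
  idx <- sample(nrow(mpg), replace = TRUE)
  fit <- smooth.spline(mpg$displ[idx], mpg$hwy[idx], df = 5)
  predict(fit, x = grid_x)$y
})

# Pointwise 95% percentile band across resamples
band <- data.frame(
  displ = grid_x,
  lower = apply(boot_fits, 1, quantile, probs = 0.025),
  upper = apply(boot_fits, 1, quantile, probs = 0.975)
)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_ribbon(data = band, aes(x = displ, ymin = lower, ymax = upper),
              inherit.aes = FALSE, fill = "grey80") +
  geom_point()
```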
How random is random? The Q-Q plot
qreference() from package DAAG
A Q-Q envelope: show the range from 19 draws of random normal data
Venables and Ripley
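A sketch of such an envelope for regression residuals; the regression itself is only an illustrative choice. We simulate 19 standard normal samples of the same size, sort each one, and shade the minimum-to-maximum range at every plotting position. Points that wander outside the grey band are worth a closer look.

```r
library(ggplot2)

set.seed(19)
# Standardized residuals from an illustrative regression
res <- rstandard(lm(hwy ~ displ, data = mpg))
n <- length(res)

# 19 sorted samples of size n from a standard normal (an n x 19 matrix)
sims <- replicate(19, sort(rnorm(n)))

qq <- data.frame(
  theoretical = qnorm(ppoints(n)),
  observed = sort(res),
  lower = apply(sims, 1, min),
  upper = apply(sims, 1, max)
)

p <- ggplot(qq, aes(theoretical, observed)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey85") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_point()
```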
Generating many graphs
Example: suppose we wanted to save a separate plot of mileage for each car manufacturer in "mpg". Start with data formatted so that it is long…

| row | manufacturer | cty | hwy |
|----:|--------------|----:|----:|
| 1 | audi | 18 | 29 |
| 2 | audi | 21 | 29 |
| 25 | chevrolet | 15 | 23 |
| 26 | chevrolet | 16 | 26 |
| 100 | honda | 28 | 33 |
| 101 | honda | 24 | 32 |
Use the magic of R and ggplot2…
• Use d_ply (from the plyr package – also by Hadley Wickham) to split up the dataframe by our subsetting variable
• Define a function to run on subsets; we name these smaller dataframes "dat"
• Call ggplot() and ggsave() within this function to generate and save our plot
library(ggplot2)
library(plyr)  # d_ply() and .()
# d_ply takes a dataframe, splits it apart, and applies a function to each piece
d_ply(mpg, .(manufacturer), function(dat) {
  # create a ggplot2 object named figure using 'dat'
  figure <- ggplot(dat, aes(cty, hwy)) +
    geom_smooth(method = "lm") +
    geom_point(alpha = 0.7, size = 2.5,
               position = position_jitter(height = 0.1, width = 0.1)) +
    annotate("text", x = -Inf, y = Inf, hjust = -0.1, vjust = 1.2,
             label = paste("n =", nrow(dat))) +
    ggtitle(dat$manufacturer[1])  # a unique title can help
  # create a unique filename for each subset (e.g. "MPG_audi.png")
  filename <- paste("MPG_", dat$manufacturer[1], ".png", sep = "")
  # by default this saves to your working directory; see ?getwd
  ggsave(filename, figure, height = 6.5, width = 10)
})
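If you would rather not depend on plyr, the same split-apply pattern can be sketched in base R with split() and lapply(); this is a swapped-in technique, not the slide's code. Keeping the plots in a named list lets you inspect any of them before writing files. save_all() is a hypothetical helper name introduced here for illustration.

```r
library(ggplot2)

# One data frame per manufacturer, then one plot per data frame
plots <- lapply(split(mpg, mpg$manufacturer), function(dat) {
  ggplot(dat, aes(cty, hwy)) +
    geom_point(alpha = 0.7) +
    ggtitle(dat$manufacturer[1])
})

# Hypothetical helper: write every plot in the list to `dir`
save_all <- function(plots, dir = ".") {
  for (m in names(plots)) {
    ggsave(file.path(dir, paste0("MPG_", m, ".png")), plots[[m]],
           height = 6.5, width = 10)
  }
}

names(plots)[1:3]
# save_all(plots)   # uncomment to write the files to the working directory
```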
Extending ggplot2
Let's get some more packages:
install.packages(c("directlabels", "GGally"))
Extending ggplot2: directlabels
# original code adapted from http://learnr.wordpress.com
library(ggplot2)
library(grid)      # grid.layout(), viewport(), pushViewport()
library(reshape2)  # melt()
# define the dataset
df <- structure(list(City = structure(c(2L, 3L, 1L), .Label = c("Minneapolis", "Phoenix", "Raleigh"), class = "factor"),
  January = c(52.1, 40.5, 12.2), February = c(55.1, 42.2, 16.5), March = c(59.7, 49.2, 28.3),
  April = c(67.7, 59.5, 45.1), May = c(76.3, 67.4, 57.1), June = c(84.6, 74.4, 66.9),
  July = c(91.2, 77.5, 71.9), August = c(89.1, 76.5, 70.2), September = c(83.8, 70.6, 60),
  October = c(72.2, 60.2, 50), November = c(59.8, 50, 32.4), December = c(52.5, 41.2, 18.6)),
  .Names = c("City", "January", "February", "March", "April", "May", "June", "July", "August",
             "September", "October", "November", "December"),
  class = "data.frame", row.names = c(NA, -3L))
# and season labels
seasons <- data.frame(month = c(1.5, 4.5, 7.5, 10.5), value = 97,
                      season = c("Winter", "Spring", "Summer", "Autumn"))
# melt the dataset to a long format
dfm <- melt(df, id.vars = "City", variable.name = "month")
levels(dfm$month) <- month.abb
# build the basic plot
p <- ggplot(dfm, aes(month, value, group = City, colour = City))
p1 <- p + geom_line(linewidth = 1)
# label the y axis values with a degree symbol
dgr_fmt <- function(x, ...) {
  parse(text = paste(x, "*degree", sep = ""))
}
none <- element_blank()
p2 <- p1 + theme_bw() +
  scale_y_continuous(labels = dgr_fmt, limits = c(0, 100), expand = c(0, 0)) +
  xlab(NULL) + ylab(NULL) +
  ggtitle(expression("Average Monthly Temperatures (" * degree * "F)")) +
  theme(panel.grid.major = none, panel.grid.minor = none, legend.position = "none",
        panel.background = none, panel.border = none,
        axis.line = element_line(colour = "grey50"))
(p3 <- p2 +
  geom_vline(xintercept = c(2.9, 5.9, 8.9, 11.9), colour = "grey85", alpha = 0.5) +
  geom_hline(yintercept = 32, colour = "grey80", alpha = 0.5) +
  annotate("text", x = 1.2, y = 35, label = "Freezing", colour = "grey80", size = 4) +
  geom_text(data = seasons, aes(label = season, group = NULL), colour = "grey70", size = 4))
(p4 <- p3 + geom_text(data = dfm[dfm$month == "Dec", ], aes(label = City), hjust = 0.7, vjust = 1))
data_table <- ggplot(dfm, aes(x = month, y = factor(City), label = format(value, nsmall = 1), colour = City)) +
  geom_text(size = 3.5) + theme_bw() +
  scale_y_discrete(labels = abbreviate, limits = c("Minneapolis", "Raleigh", "Phoenix")) +
  xlab(NULL) + ylab(NULL) +
  theme(panel.grid.major = none, legend.position = "none", panel.border = none,
        axis.text.x = none, axis.ticks = none,
        plot.margin = unit(c(-0.5, 1, 0, 0.5), "lines"))
# stack the plot and the data table in a two-row page layout
Layout <- grid.layout(nrow = 2, ncol = 1, heights = unit(c(2, 0.25), c("null", "null")))
grid.show.layout(Layout)
vplayout <- function(...) {
  grid.newpage()
  pushViewport(viewport(layout = Layout))
}
subplot <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)
mmplot <- function(a, b) {
  vplayout()
  print(a, vp = subplot(1, 1))
  print(b, vp = subplot(2, 1))
}
mmplot(p4, data_table)
# to save, run the following code (see ?png)
# png("temperature_plot.png")
# mmplot(p4, data_table)
# dev.off()
# note that when we were at the p3 stage we didn't yet have labels for the data
p3
library(directlabels)  # code to put labels into your ggplot2 objects
p3.labelled <- direct.label(p3, list(last.points, hjust = 0.7, vjust = 1))
p3.labelled
A fully polished plot probably took a lot of coding
Extending ggplot2: GGally
Scatterplot matrix: 36 plots showing ~9K measures, bivariate densities and correlations
Making a scatterplot matrix
library(GGally)
data(iris)
head(iris[, 3:5])  # iris columns 3 to 5
# example 1 - defaults
ggpairs(iris[, 3:5])
# example 2 - more customized by data type
ggpairs(iris[, 3:5],
        upper = list(continuous = "density", combo = "box"),
        lower = list(continuous = "points", combo = "dot"),
        diag = list(continuous = "barDiag", discrete = "barDiag"))
# example 3 - some new stuff!!!
dat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
# lower panels: filled 2D density contours, coloured by density level
dens_filled <- function(data, mapping, ...) {
  ggplot(data, mapping) +
    stat_density_2d(aes(fill = after_stat(level)), geom = "polygon")
}
plotmatrix <- GGally::ggpairs(dat, lower = list(continuous = dens_filled),
                              upper = list(continuous = "blank"))
plotmatrix
Thinking about some new geoms
Showing density surfaces from stat_density2d
Let's make a plot of x and y from data.frame dat with stat_density2d
What is the default geom?
In the previous plot, which aesthetic was showing those colors?
What geom would we need to make that plot?
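One possible set of answers, sketched for a current ggplot2 (where stat_density_2d() is the newer spelling of stat_density2d()): the default geom draws contour lines of the density estimate, and the colours in the previous scatterplot-matrix panels came from mapping fill to the computed density level and drawing filled polygons.

```r
library(ggplot2)

set.seed(100)
dat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))

# Default geom: contour lines of the 2D density estimate
p_lines <- ggplot(dat, aes(x, y)) + stat_density_2d()

# Filled version: map fill to the computed level and draw polygons
p_filled <- ggplot(dat, aes(x, y)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon")

# Inspect which geom each layer uses
class(p_lines$layers[[1]]$geom)[1]
class(p_filled$layers[[1]]$geom)[1]
```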
geom_rug to show marginal distribution
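A short sketch with simulated data (our illustrative choice): geom_rug() draws one tick per observation along the axes, so the marginal distributions of x and y appear alongside the scatterplot.

```r
library(ggplot2)

set.seed(100)
dat <- data.frame(x = rnorm(100), y = rnorm(100))

# One tick per observation along the bottom (x) and left (y) axes
p <- ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_rug(alpha = 0.5, sides = "bl")
```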
geom_polygon after computing the convex outer hull, labels at the centroids, moved the legend to the top
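A sketch of those three steps with the built-in iris data (our choice of example): base R's chull() returns the row indices of the points on each group's convex hull, the labels go at each group's mean (the centroid of its points, not of its polygon), and theme(legend.position = "top") moves the legend.

```r
library(ggplot2)

# Convex hull of each species: keep the rows chull() selects, in hull order
hulls <- do.call(rbind, lapply(split(iris, iris$Species), function(d) {
  d[chull(d$Petal.Length, d$Petal.Width), ]
}))

# Label positions: the mean of each species' points
centroids <- aggregate(cbind(Petal.Length, Petal.Width) ~ Species,
                       data = iris, FUN = mean)

p <- ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_polygon(data = hulls, aes(fill = Species), alpha = 0.2) +
  geom_point() +
  geom_text(data = centroids, aes(label = Species), colour = "black") +
  theme(legend.position = "top")
```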
“Hey, what did you learn in that EPIC class you took?”
Recap: Why we did this
Visualization is important for communicating information and promoting your ideas
Effective designs will be noticed
We make many graphs quickly for discovery and choose the best ones to polish for communication
With a theory of visualization we can create sophisticated graphics using basic components
Recap: Designing a good scientific figure
1. Answer a question – usually a comparison
2. Use an appropriate design (emphasize comparisons of position before length, angle, area or color)
3. Make it self-sufficient (annotation & figure legend)
4. Show your data – tell its story
Recap: ggplot2 and R
R is a powerful language for statistics and data analysis
ggplot2 implements a “grammar of graphics”
ggplot2: Builds plots using data,
and layers of geometric objects,
mapping variables to aesthetic features,
which have been transformed by scales,
summarized with statistics,
projected into a coordinate system,
and subset into adjacent plots with facets
Recap: JGR and Deducer
JGR: a graphic interface system for R programming
Deducer: adds menu driven analysis and plotting
Send R code to Console
Deducer: Plot Builder
Save or import .ggp file
View call to see R code
ggsave("plot.png", height = 6.5, width = 10)
Deducer: Plot Builder
Geom
Data
Stat
Order of drawing layers
Mapped vars
More options by component
Switch to map to a var
Right-click to Get info
Right-click to edit, toggle, remove
Adjust position
Set to a constant value
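In code, the drawing order that the Plot Builder lets you rearrange corresponds to the order in which layers are added with +: the first layer is drawn first, and later layers are drawn on top of it. A small sketch, with illustrative data and geoms:

```r
library(ggplot2)

# Points first, then boxplots: the opaque boxes are drawn over (and can hide) the points
boxes_on_top <- ggplot(mpg, aes(class, hwy)) +
  geom_point() +
  geom_boxplot()

# Boxplots first, then points: the points stay visible on top
points_on_top <- ggplot(mpg, aes(class, hwy)) +
  geom_boxplot() +
  geom_point()

# The layers are stored in drawing order
sapply(points_on_top$layers, function(l) class(l$geom)[1])
```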