Science Online 2013: Data Visualization Using R
![Page 1: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/1.jpg)
Data Visualization using R
How to get, manage, and present data to tell a compelling science story
William Gunn @mrgunn Head of Academic Outreach, Mendeley
Access point: NRC Visitor
![Page 2: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/2.jpg)
1. A short history of graphical presentation of data
2. Introduction to R
3. Finding, cleaning, and presenting data
4. Reproducibility and data sharing
![Page 3: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/3.jpg)
Data viz has a long history
John Snow’s cholera map helped communicate the idea that cholera was a water-borne disease.
![Page 4: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/4.jpg)
Florence Nightingale used dataviz
![Page 5: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/5.jpg)
Modernization of dataviz
![Page 6: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/6.jpg)
Chart junk: good, bad, and ugly
Which presentation is better?
![Page 7: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/7.jpg)
![Page 8: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/8.jpg)
It can be elegant…
![Page 9: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/9.jpg)
![Page 10: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/10.jpg)
Tufte
![Page 11: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/11.jpg)
Tufte
![Page 12: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/12.jpg)
How our eyes and brain perceive
It takes 200 ms to initiate an eye movement, but the red dot can be found in 100 ms or less. This is due to pre-attentive processing.
![Page 13: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/13.jpg)
Shape is a little slower than color!
![Page 14: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/14.jpg)
Pre-attentive processing fails!
![Page 15: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/15.jpg)
There are many “primitive” properties which we perceive
• Length • Width • Size • Density • Hue • Color intensity • Depth • 3-D orientation
![Page 16: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/16.jpg)
Length
![Page 17: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/17.jpg)
Width
![Page 18: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/18.jpg)
Density
![Page 19: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/19.jpg)
Hue
![Page 20: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/20.jpg)
Color Intensity
![Page 21: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/21.jpg)
Depth
![Page 22: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/22.jpg)
3-D orientation
![Page 23: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/23.jpg)
![Page 24: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/24.jpg)
Types of color schemes
• Sequential – suited for ordered data that progress from low to high. Use light colors for low values and dark colors for high values.
• Diverging – uses hue to show the breakpoint and intensity to show divergent extremes.
• Qualitative – uses different colors to represent different categories. Beware of using hue/saturation to highlight unimportant categories.
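As a rough sketch of the three scheme types in R: base R's `hcl.colors()` (R 3.6 or later) includes sequential, diverging, and qualitative palettes. The palette names and sizes below are illustrative choices, not ones taken from the talk.

```r
# Sketch only: palette names are illustrative picks from hcl.pals()
seq_cols  <- hcl.colors(5, palette = "Blues 3", rev = TRUE)  # sequential, reversed to run light to dark
div_cols  <- hcl.colors(7, palette = "Blue-Red")             # diverging around a midpoint
qual_cols <- hcl.colors(4, palette = "Dark 3")               # qualitative, one hue per category

# Each result is a character vector of hex color codes
print(seq_cols)
print(div_cols)
print(qual_cols)
```

Any of these vectors can be passed as the `col` argument of a base plotting function.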
![Page 25: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/25.jpg)
Sequential
http://colorbrewer2.org/
![Page 26: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/26.jpg)
Diverging
![Page 27: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/27.jpg)
Qualitative
![Page 28: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/28.jpg)
Tips for maps
• Keep it to 5-7 data classes
• ~8% of men are red-green colorblind
• Diverging schemes don’t do well when printed or photocopied
• Colors will often render differently on different screens, especially low-end LCD screens
• http://colorbrewer2.org
![Page 29: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/29.jpg)
Part 2
Introduction to R
![Page 30: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/30.jpg)
Why R?
• Open source tool
• Huge variety of packages for any kind of analysis
• Saves time repeating data processing steps
• Allows working with more diverse types of data and much larger datasets than Excel
• Processing is much faster than Excel
• Scripts are easily shareable, promoting reproducible work
![Page 31: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/31.jpg)
.csv and .xls / .xlsx
• Excel files are designed to hold the appearance of the spreadsheet in addition to the data.
• R just wants the data, so always save as .csv if you have tabular data
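A minimal sketch of the .csv round trip, assuming a small made-up data frame and a temporary file rather than a real dataset:

```r
# Hypothetical example data; the column names are invented for illustration
measurements <- data.frame(sample = c("a", "b", "c"),
                           value  = c(1.5, 2.0, 3.25))

csv_path <- tempfile(fileext = ".csv")   # temporary file, not a real dataset
write.csv(measurements, csv_path, row.names = FALSE)

# Reading it back returns only the data: no fonts, colors, or formulas
reloaded <- read.csv(csv_path, stringsAsFactors = FALSE)
print(reloaded)
```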
![Page 32: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/32.jpg)
data structures
• x<-c(1,2,3,4,5,6,7,8,9,10)
• x
• length(x)
• x[1]
• x[2]
• x<-c(1:10)
• x
![Page 33: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/33.jpg)
types of data
• y<-c("abc", "def", "g", "h", "i")
• y
• class(y)
• y[2]
• length(y)
• data can be integer (1L, 2L, 3L, …), numeric (1.0, 2.3, …), character ("a", "b", "c", …), logical (TRUE, FALSE), or other types
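The types listed above can be checked with `class()`. This is a small sketch; the variable name `mixed` is only for illustration.

```r
class(1L)      # "integer": the L suffix requests an integer
class(1)       # "numeric": a bare number is stored as a double
class(2.3)     # "numeric"
class("abc")   # "character"
class(TRUE)    # "logical"

# Mixing types in c() coerces every element to the most general type
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"
```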
![Page 34: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/34.jpg)
Vectors
• R can hold data organized a few different ways
• vectors – hold one type of data: c(1,2,3,4) works, but c(1,2,3,"x","y","z") is coerced to all character
• lists – can hold heterogeneous data
– 1
– 2
– a
• x
• arrays – multi-dimensional
• data frames – lists of equal-length vectors – like spreadsheets
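A short sketch of how each structure might be created. All values and names here are made up for illustration.

```r
v  <- c(1, 2, 3, 4)                         # vector: a single type
l  <- list(1, 2, "a")                       # list: mixed types are allowed
a  <- array(1:24, dim = c(2, 3, 4))         # array: 2 x 3 x 4
df <- data.frame(id    = 1:3,               # data frame: each column is a
                 label = c("a", "b", "c"))  # vector of the same length

class(v)      # "numeric"
class(l)      # "list"
dim(a)        # 2 3 4
is.list(df)   # TRUE: a data frame is a list of columns
```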
![Page 35: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/35.jpg)
Vector operations
• x + 1
• x
• sum(x)
• mean(x)
• mean(x+1)
• x[2]<-x[2]+1
• x
• x+c(2:3)
• x[2:10] + c(2:3)
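The last two operations rely on recycling: the shorter vector is repeated to match the longer one. The sketch below starts from a fresh `x <- 1:10`, not the modified `x` from the slide, so the printed values may differ from a live session that ran every step.

```r
x <- 1:10

x + 1          # the single value 1 is recycled across all ten elements
x + c(2, 3)    # c(2, 3) repeats five times: 3 5 5 7 7 9 9 11 11 13

# Nine elements is not a multiple of two: R still recycles, but warns
msg <- tryCatch(x[2:10] + c(2, 3), warning = function(w) conditionMessage(w))
msg            # the warning text about mismatched lengths

partial <- suppressWarnings(x[2:10] + c(2, 3))
partial        # 4 6 6 8 8 10 10 12 12
```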
![Page 36: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/36.jpg)
working with lists
• y<-list(name = "Bob", age = 24)
• y
• y$name
• y[1]
• y[[1]]
• class(y[1])
• class(y[[1]])
• y<-list(y$name, "Sue")
• y$name
• y$age[2]<-list(33)
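A sketch of what these list operations return, using separate variable names (`person`, `unnamed`, `people`) that are hypothetical and chosen so each step can be inspected on its own. The final assignment is one possible way to keep names while adding a second person, not necessarily the approach shown in the talk.

```r
person <- list(name = "Bob", age = 24)

person[1]            # a list of length 1 that contains the name
person[[1]]          # the element itself: "Bob"
class(person[1])     # "list"
class(person[[1]])   # "character"

# Rebuilding the list without names drops them, so $name finds nothing
unnamed <- list(person$name, "Sue")
unnamed$name         # NULL

# Keeping the names while adding a second person
people <- list(name = c(person$name, "Sue"), age = c(person$age, 33))
people$age[2]        # 33
```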
![Page 37: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/37.jpg)
Loading data
• data<-read.csv("C:/Users/William Gunn/Desktop/Dropbox/Scripting/Data/traffic_accidents/accidents2010_all.csv", header = TRUE, stringsAsFactors = FALSE)
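After loading, it helps to check what R actually read. The sketch below uses inline text as a stand-in for a file on disk; the column names and values are invented for illustration and do not come from the accidents dataset.

```r
# Stand-in for a real file: these columns are made up
csv_text <- "accident_id,severity,road_type
1,slight,single carriageway
2,serious,dual carriageway
3,slight,roundabout"

accidents <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

str(accidents)              # column names and types at a glance
head(accidents, 2)          # first two rows
class(accidents$severity)   # "character", because stringsAsFactors = FALSE
```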
![Page 38: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/38.jpg)
Selecting subsets of data
• "["
• "$"
• which
• grep and grepl
• subset
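Each of these selection tools, sketched on a small hypothetical data frame (the cities and counts are made up):

```r
# Hypothetical data for illustration
df <- data.frame(city  = c("Raleigh", "Durham", "Chapel Hill", "Cary"),
                 count = c(12, 7, 3, 9),
                 stringsAsFactors = FALSE)

df[2, ]                   # "[" selects by row and column position
df$count                  # "$" pulls one column out as a vector
which(df$count > 5)       # positions where the condition holds: 1 2 4
grep("ha", df$city)       # positions of strings matching the pattern: 2 3
grepl("ha", df$city)      # the same match as TRUE/FALSE per element

# subset() keeps rows that meet every condition
busy_c <- subset(df, count > 5 & grepl("^C", city))
busy_c
```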
![Page 39: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/39.jpg)
PLOTS
• ggplot2 – an implementation of the “grammar of graphics” in R
• a set of graph types and a way of mapping variables to graph features
• graph types are called “geoms”
• mappings are “aesthetics”
• graphs are built up by layering geoms
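A minimal sketch of layering, assuming the ggplot2 package is installed. The built-in `mtcars` dataset stands in for real data, and the choice of columns is only illustrative.

```r
library(ggplot2)

# mtcars ships with R and is used here purely as example data
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +   # aesthetics: map columns to the axes
  geom_point(aes(colour = factor(cyl))) +     # layer 1: points coloured by cylinder count
  geom_smooth(method = "lm", se = FALSE)      # layer 2: a fitted straight line

print(p)
```

Each `+` adds another layer on top of the same data and aesthetic mappings.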
![Page 40: Science Online 2013: Data Visualization Using R](https://reader034.vdocument.in/reader034/viewer/2022052522/5484e9e85806b5d1588b46a5/html5/thumbnails/40.jpg)
Types of geoms
• point – scatterplot – takes x,y coords of points
• abline – line layer – takes slope, intercept
• line – connect points with a line
• smooth – fit a curve
• bar – bar chart of counts per category; histogram is the binned version for continuous data – takes vector of data
• boxplot – box and whiskers
• density – to show relative distributions
• errorbar – what it says on the tin
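Several of these geoms, sketched with the built-in `mtcars` dataset and assuming ggplot2 is installed. The intercept and slope passed to `geom_abline()` are arbitrary numbers for illustration, not a fitted model.

```r
library(ggplot2)

scatter   <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
with_line <- scatter + geom_abline(intercept = 37, slope = -5)             # arbitrary line
bars      <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()                 # counts per category
histo     <- ggplot(mtcars, aes(mpg)) + geom_histogram(bins = 10)          # binned continuous values
boxes     <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
dens      <- ggplot(mtcars, aes(mpg, fill = factor(cyl))) + geom_density(alpha = 0.4)

print(boxes)   # print any of the objects to draw it
```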