Introduction to ggplot2
Elegant Graphics for Data Analysis
Maik Röder
15.12.2011
RUGBCN and Barcelona Code Meetup
Friday 16 December 2011
Data Analysis Steps
• Prepare data
• e.g. using the reshape framework for restructuring data
• Plot data
• e.g. using ggplot2 instead of base graphics and lattice
• Summarize the data and refine the plots
• Iterative process
ggplot2
grammar of graphics
Grammar
• Oxford English Dictionary:
• The fundamental principles or rules of an art or science
• A book presenting these in methodical form. (Now rare; formerly common in the titles of books.)
• System of rules underlying a given language
• An abstraction which facilitates thinking, reasoning and communicating
The grammar of graphics
• Move beyond named graphics (e.g. “scatterplot”)
• gain insight into the deep structure that underlies statistical graphics
• Powerful and flexible system for
• constructing abstract graphs (set of points) mathematically
• Realizing physical representations as graphics by mapping aesthetic attributes (size, colour) to graphs
• Lacking openly available implementation
Specification
• DATA - data operations that create variables from datasets. Reshaping using an Algebra with operations
• TRANS - variable transformations
• SCALE - scale transformations
• ELEMENT - graphs and their aesthetic attributes
• COORD - a coordinate system
• GUIDE - one or more guides
Concise description of components of a graphic
Birth/Death Rate
Source: http://www.scalloway.org.uk/popu6.htm
Excess birth (vs. death) rates in selected countries
Source: The Grammar of Graphics, p. 13
Grammar of Graphics
DATA: source("demographics")
DATA: longitude, latitude = map(source("World"))
TRANS: bd = max(birth - death, 0)
COORD: project.mercator()
ELEMENT: point(position(lon * lat), size(bd), color(color.red))
ELEMENT: polygon(position(longitude * latitude))
Source: The Grammar of Graphics, p. 13
Specification can be run in GPL implemented in SPSS
Rearrangement of Components
Grammar of Graphics:
• Data
• Trans
• Element
• Scale
• Guide
• Coord
Layered Grammar of Graphics:
• Defaults: Data, Mapping
• Layer: Data, Mapping, Geom, Stat, Position
• Scale
• Coord
• Facet
Layered Grammar of Graphics
w <- world
d <- demographics
d <- transform(d, bd = pmax(birth - death, 0))
p <- ggplot(d, aes(lon, lat))
p <- p + geom_polygon(data = w)
p <- p + geom_point(aes(size = bd), colour = "red")
p <- p + coord_map(projection = "mercator")
p
Implementation embedded in R using ggplot2
ggplot2
• Author: Hadley Wickham
• Open Source implementation of the layered grammar of graphics
• High-level R package for creating publication-quality statistical graphics
• Carefully chosen defaults following basic graphical design rules
• Flexible set of components for creating any type of graphics
ggplot2 installation
• In R console:
install.packages("ggplot2")
library(ggplot2)
qplot
• Quickly plot something with qplot
• for exploring ideas interactively
• Accepts the same options as base plot(), translated to ggplot2
qplot(carat, price, data=diamonds, main = "Diamonds", asp = 1)
Exploring with qplot
First try:
qplot(carat, price, data=diamonds)
Log transform using functions on the variables:
qplot(log(carat), log(price), data=diamonds)
from qplot to ggplot
qplot(carat, price, data=diamonds, main = "Diamonds", asp = 1)
p <- ggplot(diamonds, aes(carat, price))
p <- p + geom_point()
# opts() was later removed from ggplot2; labs() and theme() are its replacements
p <- p + labs(title = "Diamonds") + theme(aspect.ratio = 1)
p
Data and mapping
• If you need to flexibly restructure and aggregate data beforehand, use Reshape
• data is considered an independent concern
• Need a mapping that states which variable goes to which aesthetic
• weight => x, height => y, age => size
• Mappings are defined in scales
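The mapping bullet above can be sketched in code. This is a minimal illustration, assuming a small made-up data frame (`people` and its values are hypothetical and only named after the slide's example); `aes()` declares which variable feeds which aesthetic.

```r
library(ggplot2)

# Hypothetical data: column names follow the slide's example mapping,
# the values are invented for illustration only.
people <- data.frame(
  weight = c(55, 62, 70, 81, 90),
  height = c(160, 168, 172, 180, 185),
  age    = c(23, 35, 41, 52, 67)
)

# weight => x, height => y, age => size
p <- ggplot(people, aes(x = weight, y = height, size = age)) +
  geom_point()
p
```

The data frame stays untouched; only the mapping decides how each column is drawn.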
Statistical Transformations
• a stat transforms data
• can add new variables to a dataset
• that can be used in aesthetic mappings
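A short sketch of a stat adding variables, assuming a recent ggplot2 where `after_stat()` is available (the slides use the older `..density..` spelling for the same idea). The binning stat computes `count` and `density`, and `density` is then mapped to the y aesthetic.

```r
library(ggplot2)

# stat_bin (used by geom_histogram) computes new variables per bin;
# after_stat(density) maps one of them to y.
p <- ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.2)

# Inspect the variables the stat added to the data.
computed <- layer_data(p)
head(computed[, c("x", "count", "density")])
```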
stat_smooth
• Fits a smoother to the data
• Displays a smooth and its standard error
ggplot(diamonds, aes(carat, price)) + geom_point() + geom_smooth()
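To see what the smoother actually computes, the following sketch swaps in a plain linear model (`method = "lm"`) on a random sample of 500 rows. Both choices are assumptions made here to keep the example fast and free of extra dependencies; they are not what the slide's default call does.

```r
library(ggplot2)

set.seed(1)
small <- diamonds[sample(nrow(diamonds), 500), ]

p <- ggplot(small, aes(carat, price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Layer 2 is the smooth: y is the fitted value, ymin/ymax the error band.
fit <- layer_data(p, 2)
head(fit[, c("x", "y", "ymin", "ymax", "se")])
```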
Geometric Object
• Control the type of plot
• A geom can only display certain aesthetics
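One way to check which aesthetics a geom understands is to ask the geom object itself. The sketch below assumes a current ggplot2 release, where geoms are exported as objects such as `GeomPoint` with an `aesthetics()` method; older versions may differ.

```r
library(ggplot2)

# Aesthetics understood by points and by bars.
point_aes <- GeomPoint$aesthetics()
bar_aes <- GeomBar$aesthetics()

point_aes
bar_aes

# Aesthetics that points understand but bars do not.
setdiff(point_aes, bar_aes)
```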
geom_histogram
ggplot(diamonds, aes(carat)) + geom_histogram()
• Distribution of carats shown in a histogram
Position adjustments
• Tweak positioning of geometric objects
• Avoid overlaps
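As a sketch of a position adjustment other than jittering, the same bar layer can be stacked or dodged. The choice of `cut` and `clarity` from the `diamonds` data is an assumption made for illustration.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(cut, fill = clarity))

# Same data and geom; only the position adjustment differs.
stacked <- base + geom_bar(position = "stack")
dodged <- base + geom_bar(position = "dodge")

dodged
```

Dodging places the bars side by side, so each bar becomes narrower instead of sitting on top of its neighbours.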
position_jitter
x <- c(0, 0, 0, 0, 0)
y <- c(0, 0, 0, 0, 0)
overplotted <- data.frame(x, y)
ggplot(overplotted, aes(x, y)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1))
• Avoid overplotting by jittering points
Scales
• Control mapping from data to aesthetic attributes
• One scale per aesthetic
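A minimal sketch of "one scale per aesthetic": the plot below uses three aesthetics (x, y, colour) and sets one scale for each. The particular scales chosen (log axes, a Brewer palette) are illustrative assumptions.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(carat, price, colour = cut)) +
  geom_point() +
  scale_x_log10() +                         # scale for the x aesthetic
  scale_y_log10() +                         # scale for the y aesthetic
  scale_colour_brewer(palette = "Blues")    # scale for the colour aesthetic
p
```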
scale_x_continuous
scale_y_continuous
x <- c(0, 0, 0, 0, 0)
y <- c(0, 0, 0, 0, 0)
overplotted <- data.frame(x, y)
ggplot(overplotted, aes(x, y)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1)) +
  scale_x_continuous(limits = c(-1, 1)) +
  scale_y_continuous(limits = c(-1, 1))
Coordinate System
• Maps the position of objects into the plane
• Affects all position variables simultaneously
• Changes the appearance of geoms (unlike scales)
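A sketch of how a coordinate system changes the appearance of a geom: a single stacked bar becomes a pie chart under polar coordinates. The example plot is an assumption chosen for illustration; the computed data are the same in both plots.

```r
library(ggplot2)

# One stacked bar in Cartesian coordinates.
bars <- ggplot(diamonds, aes(x = factor(1), fill = cut)) +
  geom_bar(width = 1)

# The same layer in polar coordinates: the bar segments become pie slices.
pie <- bars + coord_polar(theta = "y")
pie
```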
coord_map
library("maps")
map <- map("nz", plot=FALSE)[c("x","y")]
m <- data.frame(map)
n <- qplot(x, y, data=m, geom="path")
n
# name the columns x and y so the inherited aes(x, y) mapping can find them
d <- data.frame(x = 0, y = 0)
n + geom_point(data = d, colour = "red")
Faceting
• lay out multiple plots on a page
• split data into subsets
• plot subsets into different panels
Facet Types
• 2D grid of panels
• 1D ribbon of panels wrapped into 2D
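The two facet types correspond to `facet_grid()` and `facet_wrap()`. A minimal sketch, assuming the `diamonds` data and an arbitrary choice of three columns for the wrapped layout:

```r
library(ggplot2)

p <- ggplot(diamonds, aes(carat)) +
  geom_histogram(binwidth = 0.2)

# 2D grid of panels: rows by clarity, columns by cut.
grid_plot <- p + facet_grid(clarity ~ cut)

# 1D ribbon of panels, wrapped into 2D with three columns.
wrap_plot <- p + facet_wrap(~ clarity, ncol = 3)
wrap_plot
```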
Faceting
aesthetics <- aes(carat, ..density..)
p <- ggplot(diamonds, aesthetics)
p <- p + geom_histogram(binwidth = 0.2)
p + facet_grid(clarity ~ cut)
Faceting Formula
no faceting                                   . ~ .
single row, multiple columns                  . ~ a
single column, multiple rows                  b ~ .
multiple rows and columns                     a ~ b
multiple variables in rows and/or columns     . ~ a + b
                                              a + b ~ .
                                              a + b ~ c + d
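The formulas in the table can be tried on any data frame. This sketch assumes the built-in `mtcars` data, with `cyl`, `gear` and `am` standing in for the placeholder variables `a`, `b` and so on.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

one_row <- p + facet_grid(. ~ cyl)       # single row, multiple columns
one_col <- p + facet_grid(gear ~ .)      # single column, multiple rows
both    <- p + facet_grid(gear ~ cyl)    # multiple rows and columns
multi   <- p + facet_grid(. ~ cyl + am)  # multiple variables in columns

multi
```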
Scales in Facets
scales value    free
fixed           -
free            x, y
free_x          x
free_y          y
facet_grid(. ~ cyl, scales="free_x")
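The call above needs a plot to attach to. A minimal sketch, again assuming the built-in `mtcars` data: with `scales = "free_x"` each column of panels gets its own x range, while the y axis stays shared.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Default: x and y ranges are shared by all panels.
fixed <- p + facet_grid(. ~ cyl)

# free_x: only the x range is allowed to vary between panels.
free_x <- p + facet_grid(. ~ cyl, scales = "free_x")
free_x
```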
Layers
• Iteratively update a plot
• change a single feature at a time
• Think about the high level aspects of the plot in isolation
• Instead of choosing a static type of plot, create new types of plots on the fly
• Cure against immobility
• Developers can easily develop new layers without affecting other layers
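A sketch of the iterative workflow, changing one feature per step. The specific layers and settings (transparency, a linear-model smoother, log scales) are assumptions chosen for illustration.

```r
library(ggplot2)

# Start with data and mapping only; nothing is drawn yet.
p <- ggplot(diamonds, aes(carat, price))

# Add one layer at a time and inspect the result after each step.
p <- p + geom_point(alpha = 0.1)
p <- p + geom_smooth(method = "lm", formula = y ~ x)

# Change the scales without touching the layers.
p <- p + scale_x_log10() + scale_y_log10()
p
```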
Hierarchy of defaults
Omitted layer   Default chosen by layer
Stat            Geom
Geom            Stat
Mapping         Plot default
Coord           Cartesian coordinates
Scale           Chosen depending on aesthetic and type of variable
Position        Linear scaling for continuous variables
                Integers for categorical variables
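The first two rows of the table can be checked directly: a histogram geom brings its own binning stat, and the binning stat brings a bar geom. The sketch assumes a current ggplot2, where a layer object exposes its `stat` and `geom`.

```r
library(ggplot2)

# Stat omitted: the geom picks stat_bin.
from_geom <- ggplot(diamonds, aes(carat)) + geom_histogram(binwidth = 0.2)

# Geom omitted: the stat picks a bar geom.
from_stat <- ggplot(diamonds, aes(carat)) + stat_bin(binwidth = 0.2)

class(geom_histogram(binwidth = 0.2)$stat)[1]
class(stat_bin(binwidth = 0.2)$geom)[1]
```

Both plots show the same bin counts; the coordinate system and scales are also filled in by default.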
Thanks!
• Visit the ggplot2 homepage:
• http://had.co.nz/ggplot2/
• Get the ggplot2 book:
• http://amzn.com/0387981403
• Get the Grammar of Graphics book from Leland Wilkinson:
• http://amzn.com/0387245448