Intro to ggplot2 - Sheffield R Users Group, Feb 2015
Introduction to ggplot2
Paul Richards, ScHARR, The University of Sheffield
Thursday, February 26, 2015
Introduction
- ggplot2 is a package written by Hadley Wickham
- Powerful but easy-to-use functions for 2D graphics
- Based on the “Grammar of Graphics” theory by Leland Wilkinson
- Use install.packages("ggplot2") to install the latest version from CRAN
Concepts
- ggplot2 works with data frames (data.frame objects)
- aesthetic = a feature we can see on the graphic (shape, size, colour etc.)
- we map from data to aesthetics (e.g. a different colour per group)
- layer = geometric object + data/aesthetic mappings + statistical transformation
- a graphic will also have scale(s) and a co-ordinate system
- it may also have facets: the plot split into panels by some characteristic(s)
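To make the concept list concrete, here is a sketch that writes every component out explicitly in one ggplot() call. The stat, scales and co-ordinate system shown are the ones ggplot2 would pick by default anyway, and the facet is added only for illustration; the object names `p` and `d` are arbitrary.

```r
library(ggplot2)

# Every component from the concept list, spelled out.
# Object names (p, d) are illustrative only.
p <- ggplot(data = iris,                         # data: a data.frame
            mapping = aes(x = Sepal.Length,      # aesthetics: map columns
                          y = Petal.Length,      # to x, y and colour
                          colour = Species)) +
  geom_point(stat = "identity") +                # layer: geom + stat
  scale_x_continuous() +                         # scale for x
  scale_colour_discrete() +                      # scale for colour
  coord_cartesian() +                            # co-ordinate system
  facet_wrap(~ Species)                          # facets: one panel per species

# layer_data() returns the first layer's data as it will be drawn,
# after aesthetics, stat and facets have been applied.
d <- layer_data(p)
head(d[, c("x", "y", "colour", "PANEL")])
```

Each row of `d` is one flower, with its mapped colour and the facet panel it lands in.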
Quick note on “plot”
- The plot() function in base R is a generic: it dispatches to many different methods
- Its behaviour depends on the class of the object supplied
- It can require some manual tinkering to get it to work as required
- Example: using the iris data, plot petal length vs sepal length
  - a different colour for each species
  - point size varying with sepal width
Scatterplot with “plot”
with(iris, plot(Sepal.Length, Petal.Length, col = Species, cex = Sepal.Width))
[Figure: Petal.Length (y axis) against Sepal.Length (x axis), no legend]
Problems
- Cannot specify the dataset within the function (hence the with() wrapper)
- No legend: it has to be added manually via legend(), which is fiddly to use
- Arguments are a bit “dumb”: we need to rescale Sepal.Width to get sensible point sizes
- Need to use different functions to add new geometric objects, e.g. regression lines via abline()
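For comparison, a sketch of what fixing these problems by hand looks like in base R. The rescaling range of 0.5 to 2.5 for point sizes is an arbitrary choice, and `sw` and `cex_scaled` are illustrative names, not anything from the original slides.

```r
# Rescale Sepal.Width to point sizes between 0.5 and 2.5.
# The target range and the names sw / cex_scaled are illustrative choices.
sw <- iris$Sepal.Width
cex_scaled <- 0.5 + 2 * (sw - min(sw)) / diff(range(sw))

with(iris, plot(Sepal.Length, Petal.Length,
                col = Species,      # factor codes 1, 2, 3 pick palette colours
                cex = cex_scaled,
                pch = 1))

# The legend is built separately and must be kept in sync by hand
# with the colours used above.
legend("topleft",
       legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)),
       pch = 1)
```

Nothing links the legend to the plotted colours: if the `col` argument changes, the legend silently goes out of date.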
qplot
qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Sepal.Width)
[Figure: Petal.Length against Sepal.Length; legends for Species (setosa, versicolor, virginica) and Sepal.Width (2.0 to 4.0)]
qplot()
- qplot() is the nearest equivalent to plot() in ggplot2
- For single-layer plots it is easy enough to use
- Use the geom argument to change the plot type
Boxplot example
qplot(Species, Sepal.Length, data = iris, geom = "boxplot")
[Figure: boxplots of Sepal.Length for each Species (setosa, versicolor, virginica)]
Violin example
qplot(Species, Sepal.Length, data = iris, geom = "violin")
[Figure: violin plots of Sepal.Length for each Species (setosa, versicolor, virginica)]
Histogram example
qplot(Sepal.Length, data = iris, fill = Species)
[Figure: histogram of Sepal.Length (y axis: count), filled by Species]
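qplot() was deprecated in ggplot2 3.4.0, so for current versions a sketch of the ggplot() equivalents of the three qplot() calls may be useful: the geom argument becomes the corresponding geom_*() layer. The object names are illustrative, and bins = 30 simply writes out ggplot2's default bin count.

```r
library(ggplot2)

# ggplot() equivalents of the qplot() boxplot, violin and histogram examples.
# Object names (p_box, p_violin, p_hist) are illustrative only.
p_box    <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()
p_violin <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_violin()

# With only an x variable qplot() draws a histogram;
# here the histogram layer is requested explicitly.
p_hist   <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 30)
```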
ggplot()
- For multilayer plots, or where more flexibility is required
- ggplot() sets up the default data and aesthetic mappings
- Add layers using the “+” operator and the appropriate functions
- All aesthetic mappings are wrapped in the aes() function
- Global settings (e.g. set all points to “red”) go outside aes()
Iris scatterplot again
ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(aes(size = Sepal.Width))
[Figure: Petal.Length against Sepal.Length; legends for Species and Sepal.Width]
Alternative
ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 5)
[Figure: Petal.Length against Sepal.Length, all points at size 5; legend for Species only]
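The difference between setting an aesthetic outside aes() and mapping it inside aes() is easy to check. The sketch below compares the colours ggplot2 actually computes for three versions of the same layer; the third shows a common slip, where a constant placed inside aes() is treated as a one-level variable and coloured from the default palette. Object names are illustrative.

```r
library(ggplot2)

# Object names (base, p_set, p_map, p_oops) are illustrative only.
base <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length))

# Setting: a constant outside aes() applies to every point; no legend.
p_set <- base + geom_point(colour = "red")

# Mapping: a variable inside aes() goes through a scale;
# one colour per species, with a legend.
p_map <- base + geom_point(aes(colour = Species))

# Slip: a constant inside aes() becomes a one-level variable called "red",
# drawn in the default palette colour (not red) and given a legend.
p_oops <- base + geom_point(aes(colour = "red"))
```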
Adding to plots
- ggplots are objects, so you can save them as you go
- You can then add new layers etc. to the saved object
gg1 <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(aes(size = Sepal.Width))
gg2 <- gg1 + geom_smooth()
gg2
Iris plot with loess smoother
[Figure: the iris scatterplot with a loess smoother for each Species; legends for Species and Sepal.Width]
Add contours
gg3 <- gg2 + geom_density2d()
gg3
[Figure: the previous plot with 2D density contours added for each Species; legends for Species and Sepal.Width]
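Because the saved gg1 object can be extended in more than one way, a sketch of two variations may help. Since colour is mapped in ggplot(), every later layer inherits it, so geom_smooth() fits one curve per species; overriding the grouping in a single layer gives one fit for all flowers. The choice of method = "lm" and se = FALSE (real geom_smooth() arguments) is only for illustration, as are the names gg2_lm and gg2_all.

```r
library(ggplot2)

gg1 <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(aes(size = Sepal.Width))

# One straight-line fit per species (colour, and hence grouping, is inherited).
# gg2_lm / gg2_all are illustrative names.
gg2_lm <- gg1 + geom_smooth(method = "lm", se = FALSE)

# group = 1 in this layer only: a single fit across all species,
# drawn in black because a set colour overrides the inherited mapping.
gg2_all <- gg1 + geom_smooth(aes(group = 1), method = "lm", se = FALSE,
                             colour = "black")

length(gg2_lm$layers)   # the saved object now holds two layers
```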
Note on statistical transformations
- The loess smoother and contours in the previous plot are not part of the data
- In base plot we would have to calculate them first
- In ggplot2 the stat_*() functions do such transformations for us
- Often the corresponding geom_*() function does this automatically
- For more flexibility, use the stat_*() function and specify the geom you need
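As one example of pairing a stat with a chosen geom, stat_summary() computes a summary of y at each x and can draw it with any geom. The sketch below overlays the mean Sepal.Length per species, drawn as points, on a boxplot; the choice of mean, point size and colour is purely illustrative, and the `fun` argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Boxplot layer plus a stat_summary() layer that computes the mean
# Sepal.Length per species and draws it with geom = "point".
# Object name p is illustrative.
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", size = 3, colour = "red")

# The transformed data for layer 2: one row (one mean) per species.
layer_data(p, 2)[, c("x", "y")]
```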
Histogram example using stat_bin
- stat_bin() performs a 1D “binning” transformation, i.e. a histogram transformation
ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  stat_bin(binwidth = 1, position = "dodge")
[Figure: histogram of Sepal.Length in bins of width 1 (y axis: count), bars dodged side by side by Species]
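The stat_bin() layer above and geom_histogram() request the same binning: geom_histogram() is a bar geom drawn with stat = "bin". A sketch comparing the two, and mapping the density that stat_bin() also computes onto y via after_stat() (available from ggplot2 3.3.0); object names are illustrative.

```r
library(ggplot2)

# Same binning requested two ways. Object names are illustrative.
p_stat <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  stat_bin(binwidth = 1, position = "dodge")
p_geom <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 1, position = "dodge")

# stat_bin() computes density as well as count; after_stat() maps it to y.
p_dens <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1, position = "dodge")
```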
Density plot with facet
- Use facet_grid() to split a plot into panels by up to 2 variables
- The function takes a formula as its main argument
- Row variable on the left, column variable on the right
- Use . if no variable is needed on that side
ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_density(aes(fill = Species)) +
  facet_grid(Species ~ .)
[Figure: one density panel per Species (setosa, versicolor, virginica), stacked in rows; x axis Sepal.Length, y axis density]
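A sketch of how the formula controls the layout: swapping the sides puts the species in columns instead of rows, and facet_wrap() (one-sided formula) wraps the panels into a grid, which tends to suit a single variable with many levels. The ncol = 2 value and the object names are illustrative.

```r
library(ggplot2)

# Object names are illustrative only.
dens <- ggplot(iris, aes(x = Sepal.Length)) + geom_density(aes(fill = Species))

p_rows <- dens + facet_grid(Species ~ .)          # one row of panels per species
p_cols <- dens + facet_grid(. ~ Species)          # one column per species
p_wrap <- dens + facet_wrap(~ Species, ncol = 2)  # panels wrapped, 2 per row
```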
Scale + axis control
- Adding titles, axis labels etc. is similar to adding layers
- Use the “+” operator with the appropriate functions
ggplot(USArrests, aes(x = Assault, y = Murder)) + geom_point(aes(size = UrbanPop)) +
  labs(title = "Violent Crime Rates by US State, 1973",
       x = "Arrests for Assault (per 100 000)", y = "Arrests for Murder (per 100 000)")
[Figure: “Violent Crime Rates by US State, 1973”. Arrests for Murder (per 100 000) against Arrests for Assault (per 100 000), point size by UrbanPop]
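labs() sets several labels in one call; ggtitle(), xlab() and ylab() are shortcuts that each set one. Legend titles are labels too, so labs(size = ...) renames the UrbanPop legend. A sketch, where the wording "Urban population (%)" is an illustrative choice (UrbanPop is the percentage of the population living in urban areas).

```r
library(ggplot2)

# Same plot as above, using the single-label shortcuts.
# Object name p and the legend wording are illustrative.
p <- ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  ggtitle("Violent Crime Rates by US State, 1973") +
  xlab("Arrests for Assault (per 100 000)") +
  ylab("Arrests for Murder (per 100 000)") +
  labs(size = "Urban population (%)")   # legend title for the size scale
```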
Logged x axis
ggplot(USArrests, aes(x = Assault, y = Murder)) + geom_point(aes(size = UrbanPop)) +
  labs(title = "Violent Crime Rates by US State, 1973",
       x = "Arrests for Assault (per 100 000)", y = "Arrests for Murder (per 100 000)") +
  scale_x_log10()
[Figure: as above, with the x axis (Arrests for Assault) on a log10 scale; only the 100 tick is labelled]
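scale_x_log10() accepts the usual scale arguments, so sparse automatic breaks can be replaced with explicit ones. In the sketch below the break values are an illustrative choice within the range of Assault; it also checks that the scale transformation is applied to the data before drawing, so the stored x values are log10(Assault). Note the “+” at the end of each line: in R, a line starting with “+” would be parsed as a new expression.

```r
library(ggplot2)

# Explicit breaks on the log scale; the values chosen are illustrative.
p <- ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  scale_x_log10(breaks = c(50, 100, 200, 300))

# The data stored for drawing are already on the log10 scale.
head(layer_data(p)$x)
```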
Conclusion
- ggplot2 works best with “long” format data
- One row per observation, rather than different observations in different columns
- See the “reshape2” package for easy conversion between “wide” and “long” data formats
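A sketch of the wide-to-long conversion with reshape2's melt(), assuming reshape2 is installed (it is now retired in favour of tidyr::pivot_longer(), but still works). The column names "measurement" and "cm" and the object names are illustrative choices. Once the four iris measurements are stacked into one column, a single ggplot() call can facet over them.

```r
library(reshape2)
library(ggplot2)

# Wide: one row per flower, one column per measurement.
# Long: one row per (flower, measurement) pair.
# "measurement", "cm" and iris_long are illustrative names.
iris_long <- melt(iris, id.vars = "Species",
                  variable.name = "measurement", value.name = "cm")
head(iris_long)

# In long format, one call plots every measurement in its own panel.
p <- ggplot(iris_long, aes(x = cm, fill = Species)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~ measurement, scales = "free")
```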
Where to learn more
- Web documentation is a good place to start: http://docs.ggplot2.org
- Lots of examples on blogs, Stack Overflow etc.
- We have only scratched the surface here!
- Why not bring some example data visualisations to the next meeting?
- Tweet your plots @Sheffield_R_