R workshop iii -- 3 hours to learn ggplot2 series
DESCRIPTION
NYC Data Science Academy, NYC Open Data Meetup, Big Data, Data Science, NYC, Vivian Zhang, SupStat Inc, R programming, R workshop
TRANSCRIPT
![Page 1: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/1.jpg)
Learn ggplot2 As Kungfu Skills
Given by Kai Xiao (Data Scientist)
Vivian Zhang (Co-founder & CTO)
Contact: vivian zhang@supstat com
![Page 2: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/2.jpg)
• I: Point
• II: Bar
• III: Histogram
• IV: Line
• V: Tile
• VI: Map
![Page 3: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/3.jpg)
Introduction
• ggplot2 is a plotting system for R
• based on "The Grammar of Graphics"
• which tries to take the good parts of base and lattice graphics and none of the bad parts
• It takes care of many of the fiddly details that make plotting a hassle
• It becomes easy to produce complex multi-layered graphics
![Page 4: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/4.jpg)
Why do we love ggplot2?
• control the plot as abstract layers and make creativity become reality;
• get used to structural thinking;
• get beautiful graphics while avoiding complicated details
[Figure: 1973 murder cases in USA]
![Page 5: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/5.jpg)
7 Basic Concepts
• Mapping
• Scale
• Geometric
• Statistics
• Coordinate
• Layer
• Facet
![Page 6: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/6.jpg)
Mapping
Mapping controls the relations between variables.
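A minimal sketch of a mapping, assuming the `mpg` data set that ships with ggplot2 (the sample data used later in this workshop). `aes()` ties the variables `cty` and `hwy` to the x and y positions and `drv` to colour. The object names `p_map` and `built_map` are illustrative only.

```r
library(ggplot2)

# Map city mileage to x, highway mileage to y, and drive type to colour.
p_map <- ggplot(mpg, aes(x = cty, y = hwy, colour = drv)) +
  geom_point()

# layer_data() returns the data ggplot2 computed for a layer:
# one row per car, holding the values each variable was mapped to.
built_map <- layer_data(p_map, 1)
head(built_map[, c("x", "y", "colour")])
```

Printing `p_map` draws the plot; the mapping itself never touches the original data frame.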
![Page 7: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/7.jpg)
Scale
Scale presents the mapping on coordinate scales.
Scale and Mapping are closely related concepts.
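To illustrate how a scale relates to a mapping, here is a small sketch, again assuming the `mpg` data from ggplot2. The mapping says which variable goes to colour and to x; the scales decide which colours and which axis transformation are used. The colour choices and the name `p_scale` are assumptions made for the example.

```r
library(ggplot2)

p_scale <- ggplot(mpg, aes(x = cty, y = hwy, colour = drv)) +
  geom_point() +
  # Scale for the colour mapping: pick one colour per drive type.
  scale_colour_manual(values = c("4" = "steelblue", "f" = "red4", "r" = "darkgreen")) +
  # Scale for the x mapping: put city mileage on a log10 axis.
  scale_x_log10()

# The built data shows the effect of both scales on the mapped values.
built_scale <- layer_data(p_scale, 1)
head(built_scale[, c("x", "colour")])
```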
![Page 8: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/8.jpg)
Geometric
Geom means the graphical elements, such as
points, lines and polygons.
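The three kinds of element named above could be drawn as follows. This is only a sketch: `mpg` comes with ggplot2, while the `triangle` data frame is made up for the polygon example.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy))

p_points <- base + geom_point()   # one point per car
p_lines  <- base + geom_line()    # observations connected in order of x

# A hand-made triangle, used only to show the polygon geom.
triangle  <- data.frame(x = c(0, 1, 0.5), y = c(0, 0, 1))
p_polygon <- ggplot(triangle, aes(x = x, y = y)) +
  geom_polygon(fill = "steelblue")
```

The data and mapping stay the same for `p_points` and `p_lines`; only the geom changes.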
![Page 9: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/9.jpg)
Statistics
Stat enables us to calculate statistical summaries
of the data, such as adding a regression line.
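As a sketch of the regression-line case, the example below adds a linear fit to a scatter plot of the `mpg` data from ggplot2. The choice of `method = "lm"` is an assumption for illustration; the workshop's own examples use the default smoother.

```r
library(ggplot2)

p_stat <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  # stat_smooth() fits a model to the data and draws the fitted line;
  # here the model is an ordinary linear regression of hwy on cty.
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The second layer holds the computed fit, not the raw observations.
fit_line <- layer_data(p_stat, 2)
head(fit_line[, c("x", "y")])
```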
![Page 10: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/10.jpg)
Coordinate
Coord affects how we observe graphical
elements. Transformation of coordinates is useful.
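A short sketch of coordinate changes on one base plot, assuming the `mpg` data from ggplot2. The object names are illustrative.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = cty, y = hwy)) + geom_point()

p_flip  <- base + coord_flip()                        # swap the x and y axes
p_polar <- base + coord_polar()                       # draw in polar coordinates
p_zoom  <- base + coord_cartesian(xlim = c(15, 25))   # zoom in on part of the x range

# Zooming with a coord only changes the view: no observations are dropped.
n_zoomed <- nrow(layer_data(p_zoom, 1))
```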
![Page 11: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/11.jpg)
Layer
Components: data, mapping, geom, stat
Using layers allows users to build plots step by step. It becomes much easier to modify a plot.
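Building a plot step by step might look like the sketch below, assuming the `mpg` data from ggplot2. Each `+` adds one layer, and every intermediate object (`p0`, `p1`, `p2`, names chosen for this example) is a plot that can be printed or modified on its own.

```r
library(ggplot2)

# Step 0: data and default mapping only, no layer yet.
p0 <- ggplot(mpg, aes(x = cty, y = hwy))

# Step 1: first layer, a point geom with the identity stat.
p1 <- p0 + geom_point()

# Step 2: second layer, a smooth geom whose stat fits a linear model.
p2 <- p1 + geom_smooth(method = "lm", formula = y ~ x)

# Each layer carries its own computed data.
points_data <- layer_data(p2, 1)
smooth_data <- layer_data(p2, 2)
```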
![Page 12: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/12.jpg)
Facet
Facet splits data into groups and draws each
group separately. Usually, the groups follow an order.
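A minimal faceting sketch, assuming the `mpg` data from ggplot2, where `year` takes the values 1999 and 2008. Each year gets its own panel, and the panels appear in the sorted order of the values.

```r
library(ggplot2)

p_facet <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  # One panel per year, stacked in a single column.
  facet_wrap(~ year, ncol = 1)

# PANEL records which facet each observation was assigned to.
panels <- layer_data(p_facet, 1)$PANEL
table(panels)
```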
![Page 13: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/13.jpg)
7 Basic Concepts
• Mapping
• Scale
• Geometric
• Statistics
• Coordinate
• Layer
• Facet
![Page 14: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/14.jpg)
Skill I: Point
![Page 15: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/15.jpg)
Sample data -- mpg
• Fuel economy data from 1999 and 2008 for 38 popular models of car
• Details
• displ: engine displacement, in litres
• cyl: number of cylinders
• trans: type of transmission
• drv: front-wheel, rear-wheel drive, 4wd
• cty: city miles per gallon
• hwy: highway miles per gallon
![Page 16: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/16.jpg)
> library(ggplot2)
> str(mpg)
'data.frame': 234 obs. of 11 variables:
$ manufacturer: Factor w/ 15 levels "audi","chevrolet",..:
$ model : Factor w/ 38 levels "4runner 4wd",..:
$ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
$ year : int 1999 1999 2008 2008 1999 1999 2008 1999
$ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
$ trans : Factor w/ 10 levels "auto(av)","auto(l3)",..:
$ drv : Factor w/ 3 levels "4","f","r":
$ cty : int 18 21 20 21 16 18 18 18 16 20 ...
$ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
$ fl : Factor w/ 5 levels "c","d","e","p",..:
$ class : Factor w/ 7 levels "2seater","compact",..:
![Page 17: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/17.jpg)
p <- ggplot(data=mpg, mapping=aes(x=cty, y=hwy))  # aes: aesthetics
p + geom_point()
![Page 18: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/18.jpg)
> summary(p)
data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class [234x11]
mapping: x = cty, y = hwy
faceting: facet_null()

> summary(p + geom_point())
data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class [234x11]
mapping: x = cty, y = hwy
faceting: facet_null()
-----------------------------------
geom_point: na.rm = FALSE
stat_identity:
position_identity: (width = NULL, height = NULL)
![Page 19: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/19.jpg)
p + geom_point(color='red4', size=3)
![Page 20: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/20.jpg)
# add one more mapping -- colour
p <- ggplot(mpg, aes(x=cty, y=hwy, colour=factor(year)))
p + geom_point()
![Page 21: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/21.jpg)
# add one more stat (loess: local polynomial regression)
> p + geom_point() + stat_smooth()
![Page 22: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/22.jpg)
p <- ggplot(data=mpg, mapping=aes(x=cty, y=hwy))
p + geom_point(aes(colour=factor(year))) +
  stat_smooth()
![Page 23: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/23.jpg)
Two equivalent ways to draw
p <- ggplot(mpg, aes(x=cty, y=hwy))
p + geom_point(aes(colour=factor(year))) +
  stat_smooth()

d <- ggplot() +
  geom_point(data=mpg, aes(x=cty, y=hwy, colour=factor(year))) +
  stat_smooth(data=mpg, aes(x=cty, y=hwy))
print(d)
![Page 24: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/24.jpg)
Besides the "white paper" canvas, we find the geom and stat layers.
> summary(d)
data: [0x0]
faceting: facet_null()
-----------------------------------
mapping: x = cty, y = hwy, colour = factor(year)
geom_point: na.rm = FALSE
stat_identity:
position_identity: (width = NULL, height = NULL)

mapping: x = cty, y = hwy
geom_smooth:
stat_smooth: method = auto, formula = y ~ x, se = TRUE, n = 80, fullrange = FALSE, level = 0.95, na.rm = FALSE
position_identity: (width = NULL, height = NULL)
![Page 25: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/25.jpg)
# Using the scale_*() functions, we can control the colours used by a scale.
p + geom_point(aes(colour=factor(year))) +
  stat_smooth() +
  scale_color_manual(values=c('steelblue','red4'))
![Page 26: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/26.jpg)
# We can map "displ" to the size of the points
p + geom_point(aes(colour=factor(year), size=displ)) +
  stat_smooth() +
  scale_color_manual(values=c('steelblue','red4'))
![Page 27: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/27.jpg)
# We solve the problem of overlapping points and points being too small
p + geom_point(aes(colour=factor(year), size=displ), alpha=0.5, position="jitter") +
  stat_smooth() +
  scale_color_manual(values=c('steelblue','red4')) +
  scale_size_continuous(range=c(4, 10))
![Page 28: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/28.jpg)
# We change the coordinate system.
p + geom_point(aes(colour=factor(year), size=displ), alpha=0.5, position="jitter") +
  stat_smooth() +
  scale_color_manual(values=c('steelblue','red4')) +
  scale_size_continuous(range=c(4, 10)) +
  coord_flip()
![Page 29: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/29.jpg)
p + geom_point(aes(colour=factor(year), size=displ), alpha=0.5, position="jitter") +
  stat_smooth() +
  scale_color_manual(values=c('steelblue','red4')) +
  scale_size_continuous(range=c(4, 10)) +
  coord_polar()
![Page 30: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/30.jpg)
p + geom_point(aes(colour=factor(year), size=displ), alpha=0.5, position="jitter") +
  stat_smooth() +
  scale_color_manual(values=c('steelblue','red4')) +
  scale_size_continuous(range=c(4, 10)) +
  coord_cartesian(xlim=c(15, 25), ylim=c(15, 40))
![Page 31: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/31.jpg)
# Using the facet_*() functions, we split the data and draw each group separately
p + geom_point(aes(colour=class, size=displ), alpha=0.5, position="jitter") +
  stat_smooth() +
  scale_size_continuous(range=c(4, 10)) +
  facet_wrap(~ year, ncol=1)
![Page 32: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/32.jpg)
# Add a plot title and specify all the labels you want to add
# (ggtitle() replaces opts(title=...), which current ggplot2 no longer provides)
p <- ggplot(mpg, aes(x=cty, y=hwy))
p + geom_point(aes(colour=class, size=displ), alpha=0.5, position="jitter") +
  stat_smooth() +
  scale_size_continuous(range=c(4, 10)) +
  facet_wrap(~ year, ncol=1) +
  ggtitle('model of car and mpg') +
  labs(y='driving distance per gallon on highway',
       x='driving distance per gallon on city road',
       size='displacement', colour='model')
![Page 33: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/33.jpg)
# scatter plot for the diamonds dataset
p <- ggplot(diamonds, aes(carat, price))
p + geom_point()
![Page 34: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/34.jpg)
# use transparency and small points
p + geom_point(size=0.1, alpha=0.1)
![Page 35: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/35.jpg)
# use a 2D bin chart to observe the density of points
p + stat_bin2d(bins=40)
![Page 36: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/36.jpg)
# estimate the data density
p + stat_density2d(aes(fill = ..level..), geom="polygon") +
  coord_cartesian(xlim=c(0, 1.5), ylim=c(0, 6000))
![Page 37: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/37.jpg)
Skill II: Bar
![Page 38: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/38.jpg)
Skill III: Histogram
![Page 39: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/39.jpg)
Skill IV: Line
![Page 40: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/40.jpg)
Skill V: Tile
![Page 41: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/41.jpg)
Skill VI: Map
![Page 42: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/42.jpg)
Resources
http://learnr.wordpress.com
Redraw all the lattice graphs with ggplot2
![Page 43: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/43.jpg)
Resources
All the examples are done with ggplot2.
![Page 44: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/44.jpg)
Resources
• http://wiki.stdout.org/rcookbook/Graphs/
• http://r-blogger.com
• http://Stackoverflow.com
• http://xccds1977.blogspot.com
• http://r-ke.info/
• http://www.youtube.com/watch?v=vnVJJYi1mbw
![Page 45: R workshop iii -- 3 hours to learn ggplot2 series](https://reader033.vdocument.in/reader033/viewer/2022051612/54c662f54a79594b538b46ce/html5/thumbnails/45.jpg)
Thank you! Come back for more!
Sign up at: www.meetup.com/nyc-open-data
Give feedback at: www.bit.ly/nycopen