lecture 4: what graphs do and making bar...
TRANSCRIPT
Admin G/B/U Few R
Lecture 4:What Graphs Do and Making Bar Charts
February 12, 2018
Admin G/B/U Few R
Overview
Course Administration
Good, Bad and Ugly
Few, Chapters 6
Bar Charts in R
Admin G/B/U Few R
Course Administration
1. Will return proposal comments during programming
2. Rosa has graded problem sets – thank you
3. Grades posted?
4. Missing anything from me?
Admin G/B/U Few R
Next Week’s Good Bad and Ugly
Monday by 9 am. Earlier is ok.
• Adam Brooks
• Gulfishan Khadim
Admin G/B/U Few R
This Week’s Good Bad and Ugly
• Kelsey Wilson
• Nathan Rupp
• Haley Dunn
Admin G/B/U Few R
Kelsey’s Example
Admin G/B/U Few R
Nathan’s Example
Chapter 4 | The Efficient A
SEAN
Scenario
93
4
Figure 4.1 ⊳ Trends in energy intensity and GDP per capita in selected countries, 1980 - 2011
Indonesia
Malaysia
Philippines
Thailand
Vietnam China India
toe per 1 000 dollars (2012, MER)
Dollars per capita (2012, MER)
10 000 8 000 6 000 4 000 2 000
1.6
1.4
1.2
1.0
0.8
0.6
0.4
0.2
OECD Singapore
Japan
Malaysia
10 000 20 000 30 000 40 000 50 000 60 000
0.1
0.2
0.3 1980
1990
2000
2011
Note: GDP is measured at market exchange rates (MER) in year-2012 dollars.
0 0
0 0
© OECD/IEA, 2013
Admin G/B/U Few R
Haley’s Example
Admin G/B/U Few R
Few:Fundamental Variations of Graphs
Admin G/B/U Few R
Today
1. Types of graphs
2. What you can communicate, by graph
Admin G/B/U Few R
1. Types of Graphs
• Points
• Lines
• Bars
• Boxes
• Shapes with varying 2-D areas
• Lines
Admin G/B/U Few R
Why to Avoid 2-D Sizes
Admin G/B/U Few R
Graph Design Solutions
As we go through these, we’ll discuss policy examples
1. Nominal Comparisons
2. Time Series Designs
3. Ranking Designs
4. Part-to-Whole Designs
5. Deviation Designs
6. Distribution Designs
7. Correlation Designs
8. Geospatial Designs
Admin G/B/U Few R
1. Nominal Comparisons
• Use bars: horizontal or vertical
• Or points to compare values
• Possible for not “too many” values
Admin G/B/U Few R
For Example
Admin G/B/U Few R
2. Time Series Designs
• Present data over time: months, days, hours, years, decades,...
• Almost required to use horizontal axis left to right for time
• And usually a connected line, with or without dots
• If time intervals are not consistent, then maybe dots or bars
• Lines indicate connection between observations, so watch outif you’re using them in another context
Admin G/B/U Few R
For Example
��������� ������ ��������
������������ �� ������������������������� ���!"#�$%�&'�%�&'���������������������()*'&+��'�((�'+�+&���,���*-��������*��%./� ���
012345678997999:;<3=4<>9?@6<;A2BC<D
EF<A=GFH;:I22J=;KFBGBA;LMGN22KFBGBA;LOPQO2B;R
O2:;:;<3=4<S:2HT1NJGH;<?A2HU
OG:T;4<3HHG:RVWXY2H;QMPZ[WXYLK;1\78[96]OZP̂6\?86_PE9?99>9?99̀ Dab;B LYFJN Lc2d L OT4AGb 6?e8X]fZ:G4F2 5@?9eEFgRF;=C 9?h8̀KFBGBAFG=B;d<7A2HbG:F<2B<GBCH2:;�))� �))' ���� ���' ���� ���'����������(�(�i�i�������WXYIGH;<T1NJGH;<?A2Hf]=GR2g;:5@999GHGjFBJS:;;JGH;<2BklmIGH;<?6]=GR;:]=GR6b=GR;:JGH;<2BWXYIGH;<7d;NGg;4N;1;<4??? P3b;:OG:F2@8]=GRP3b;:OG:F2@8JGH;2B=FB;G0A4F2BIGH;<IGH;<2BWXY???P3b;:PHG<NK=G<N6]=GRP3b;:PHG<NK=G<N6JGH;2B=FB;GK:;;IGH;<2BWXY??? ]2T;H2B]=GRb2T;H2BJGH;<2B=FB;?I22B;bFAGCg;B43:;424:GFBGBC???KFJN4FBJKFJN4421;A2H;4N;3=4FHG4;dG::F2:FBHGBR2S23:nJN4FBJ??? 0BFH;XG44=;6?6]=GR0BFH;XG44=;6?6JGH;2B=FB;G0A4F2BIGH;<IGH;<2BWXY???WXY[P3HHG:RS2:WXY2H;LMGN22KFBGBA;N44b<[ffnBGBA;?RGN22?A2Hfo324;fWXYfeCGR<GJ2LpF;d4N;1G<FAklm<42ATANG:42BMGN22KFBGBA;?qNGBJ;4N;CG4;:GBJ;7ANG:44Rb;GBCA2HbG:;WXY2H;GJGFB<424N;:A2HbGBF;<?WXYP42AT]:FA;LWXY2H;P42ATr324;>_?P?[QMPZDLOG:T;4sG4ANN44b<[ffddd?HG:T;4dG4AN?A2HfFBg;<4FBJf<42ATfT1NWXY2H;<42ATb:FA;4G:J;4:GF<;C42t89S:2Hth8G4uXqqGbF4G=????WXY2H;1;G4<;vb;A4G4F2B<7;vb;A4<w<2=FCC;HGBCwS2:N23<FBJFB695x????WXY2H;<42ATSGF:gG=3;;<4FHG4;:GF<;C42th9S:2Ht6@G4OWO]G:4B;:<?WXYWXY2H;P42ATr324;LKFBgFj?A2HN44b<[ffnBgFj?A2Hfo324;?G<Nvy4zWXY{GBL6eL5x98[59]O7WXY2H;E;A=G:;<KF:<4r3G:4;:695xEFgFC;BCX3<FB;<<sF:;?{GBL68L5x9x[990O7WXY2H;ZvbGBC<Y;G=4NRY2H;K;G43:;<|}B4:2C3A;<Q;dWdFT<;4~]:2C3A4<X3<FB;<<sF:;?{GBL5\L5x59[e@0O7WXY2H;[klmL_P[ZG:BFBJ<0BG=R<F<[r87695�XR4N;Q3H1;:<[{GB3G:R5\7695xqGbF4G=q31;?WXY6\?86�9?\h>WXY2H;DLO2:BFBJ<4G:ddd?H2:BFBJ<4G:?A2Hf<42AT<fvBR<fT1Nfo324;?N4H=O2:BFBJ<4G::;G=L4FH;<42ATo324;<7J:GbN<7GBCFBC;b;BC;B4GBG=R<F<S2:klmT;;bR23FBS2:H;C?P4GR3b42CG4;dF4NWXY2H;<42ATb:FA;GBCGBG=R<4:G4FBJ<2BO2:BFBJ<4G:?A2H?
q2HbGBRWXY2H;F<GN2H;13F=CFBJA2HbGBR1G<;CFB4FB5\e�G<WG3SHGB�X:2GCFBE;4:2F47OFANFJGB421;4:GC;C2B4N;QMPZG<GN2H;13F=C;:GBCA2HbGBRS:2H69994N:23JN699x?sFTFb;CFGm������������c2<0BJ;=;<7q0����{;SS:;R̂?O;jJ;:>Q2g699@�D������5>xxxDe68L@@h���������Z=FX:2GC��������5\e���������h?e\1F==F2B_PE>695@D]:2n=;<];2b=;G=<2<;G:ANS2:cFBT;C}B KGA;122T ]FB4;:;<4c;BBG:q2:b2:G4F2B ]3=4;I:23b 2̂==X:24N;:< EEF<A=GFH;:
5CGR eCGR 5H2B4N hH2B4N 5R;G: eR;G: HGv��� KFBGBA; OGb< Q;d< PN2bbFBJ O2:; P;44FBJ< 2̂2=<���
Admin G/B/U Few R
3. Ranking Designs
• Like nominal, but ranked
• So use a bar chart
• And sort by value
• Put the item you want to call attention to at top or left
Admin G/B/U Few R
For Example
Courtesy of this site
Admin G/B/U Few R
4. Part-to-Whole Designs
• Comparison of shares
• Use simple bar
• Use stacked bar only when you want to compare acrosscategories
• So use a bar chart
• And sort by value
• Put the item you want to call attention to at top or left
Admin G/B/U Few R
For Example
Courtesy of this site
Admin G/B/U Few R
5. Deviation Designs
• Highlight differences across types
• Paired bars
• Doesn’t work too well with too many comparison categories
• Use stacked bar only when you want to compare acrosscategories
• More sophisticated (not in Few): scatterplot and compare to45 degree line
Admin G/B/U Few R
For Example
Admin G/B/U Few R
6. Distribution Designs
• Distributions can be continuous or by bin
• And you want to display one or many
• Use a bar chart
• Or a line chart
• Or a box plot – not too keen on these
Admin G/B/U Few R
For Example
Courtesy of this site
Admin G/B/U Few R
7. Correlation Designs
• Scatter chart
• Or scatter and trend line
• You can enhance scatter with color and weight variations
• Too many variations are not comprehensible
• No pic as you know this one
Admin G/B/U Few R
8. Geospatial Designs
• Map with
• Color fills
• Lines
• Much more on this later in the course
Admin G/B/U Few R
Bar Charts in R
Admin G/B/U Few R
Today’s Goals
• A few non-graph commands• ifelse• data.frame, c
• For graphing, via ggplot
• geom bar()• geom text(), geom label()• theme()
Admin G/B/U Few R
A Basic Programming Command: ifelse
data$var <- ifelse(test expression, [outcome if
true], [outcome if false])
• var
• Outcome is a variable in the dataframe data• Or something to do, instead of a variable
• test expression
• an expression that is evaluated, e.g. x > y?, a = b?
• After evaluation• if x > y , then you get the outcome if true – the second
element• if X < y , then you get the outcome if false – the third element• you can nest another ifelse in the third one
Admin G/B/U Few R
Get Started and Make Your Own Dataframe: data.frame
dfname <- data.frame(col1 =, col2 =, ...)
• data.frame creates a dataframe called dfname
• write column as name = c("e1", "e2", ... "en")
Make a dataframe
newframe <- data.frame(fruit = c("apples", "bananas","pomegranates"),
price.per.lb = c("2.49", "0.79","6"),
junk = rep(1,length(c("apples",
"bananas","pomegranates"))))
newframe
## fruit price.per.lb junk## 1 apples 2.49 1## 2 bananas 0.79 1## 3 pomegranates 6 1
Admin G/B/U Few R
Meryl’s Example from Last Class
A Dataframe from Last Class’s Bad Graph# load north korean datankd <- data.frame(year = c("2000","2001","2002","2003",
"2004","2005","2006","2007","2008","2009","2010","2011","2012","2013","2014","2015","2016","2017"),
defectors = c("0","0","1","0","0","0","0","0","2","0","1","0","3","0","0","1","1","4"))
nkd
## year defectors## 1 2000 0## 2 2001 0## 3 2002 1## 4 2003 0## 5 2004 0## 6 2005 0## 7 2006 0## 8 2007 0## 9 2008 2## 10 2009 0## 11 2010 1## 12 2011 0## 13 2012 3## 14 2013 0## 15 2014 0## 16 2015 1## 17 2016 1## 18 2017 4
Admin G/B/U Few R
And on to ggplot
For today we’re exploring
• geom bar
• geom text
• theme
Making a Bar Chart
# make a bar chartlibrary(ggplot2)ggplot(nkd, aes(x=defectors)) + geom_bar() +
labs(x = "annual number of defectors",y="number of years")
I call the ggplot libraryI a similar first part to last lecture:
I x axis is determined by quantity of defectorsI tell R we want a bar chart with geom_bar
I default is to count the total number of observations by type(defectors)
I labs() makes the graph comprehensible
What It Looks Like
0
3
6
9
0 1 2 3 4
annual number of defectors
num
ber
of y
ears
Other Options for geom_bar()
I if you want R to use the value in the dataframe, rather thancounting observations, use geom_bar(stat="identity")
I you can control aesthetics within the bar viageom_bar(aes(fill= [something])), useful for stackedgraphs
I you can weight the totalsI zillions more are available
geom_text() to Put Things on Your Chart
I puts variable value (maybe a fruit name) where you say basedon the value of another variable
I very powerful: need to set up data the right way to use thispower
geom_text() Example
Adding text to the chart with geom_text, telling R that themapping for labels is divisions$div.name.
ggplot(data = divisions, aes(x = division, y=ppl.by.cnty)) +geom_bar(stat = "identity") +ggtitle("counties by division using summarized data") +coord_flip() +labs(x="", y="people per county") +geom_text(mapping = aes(label=div.name))
What This Looks Like
New England
Mid−Atlantic
East North Central
West North Central
South Atlantic
East South Central
West South Central
Mountain
Pacific
2.5
5.0
7.5
0e+00 1e+05 2e+05 3e+05
people per county
counties by division using summarized data
Fixing the Previous
# make labels legibledivisions$nada <- c(rep(0, length(divisions$div.name)))ggplot(data = divisions, aes(x = division,
y=ppl.by.cnty)) +geom_bar(stat = "identity") +ggtitle("counties per capita by division
using summarized data") +coord_flip() +labs(x="", y="people per county") +geom_text(mapping = aes(y=nada, label=div.name),
hjust = 0)
I make a new variable that tells R where to put the name
How Does it Look?
New England
Mid−Atlantic
East North Central
West North Central
South Atlantic
East South Central
West South Central
Mountain
Pacific
2.5
5.0
7.5
0e+00 1e+05 2e+05 3e+05
people per county
counties per capita by division using summarized data
ggplot’s theme Commands
I a theme is a set of commands that standardize the look of thegraph
I ggplot has a built-in defaultI you can choose another defaultI or modify the themeI we’ll focus on the latter
Modifying the Default Theme
I there are > 60 different parts of the default theme, includingI axis.ticks.x()I legend.title()I legend.box.margin()
I see them all hereI in this class we mostly get rid of parts by adding the below to
the ggplot commandI theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),axis.ticks.y=element_blank())
Admin G/B/U Few R
Try Today’s Tutorial
• Pay attention to the output of each bit
• Go forth!
Admin G/B/U Few R
Next Lecture
• Turn in PS 4
• Read Few Chapter 6
• R Graphics Cookbook, Chapter 4
• Next policy brief deadline: April 2 for draft