About tables in general The tables package References
Beautiful tables in R: the tables package
Duncan Murdoch
Department of Statistical and Actuarial Sciences
University of Western Ontario
November 29, 2013
Outline
1 About tables in general
2 The tables package
Tables aren’t easy
Gelman (2011) “Why Tables are Really Much Better Than Graphs” is a tongue-in-cheek article defending the use of graphs rather than tables. It presents poor arguments “for” tables, and refutes them in favour of graphs.
Sometimes tables are better than graphs, but it’s not easy to create good tables (or good graphs).
Some quotes from Gelman’s paper
A table is not meant to be read as a narrative, so do not obsess about clarity. It is much more important to put in the exact numbers, as these represent the most important summary of your results...
Some quotes from Gelman’s paper
It is also helpful in a table to have a minimum of four significant digits. A good choice is often to use the default provided by whatever software you have used to fit the model. Software designers have chosen their defaults for a good reason, and I would go with that.
The depressing truth
The depressing truth is that many authors follow the previous pieces of advice (and others in the paper). I would post examples, but I’d rather not embarrass those authors.
Principles of good tables
Ehrenberg (1977) is an excellent paper about producing tables. Some advice:
Round to two significant or effective digits.
Display row and column averages.
Put items to be compared in the same column, one above the other.
Order rows and columns by size.
Don’t insert too much white space: things to be compared should be close to each other, but add gaps every 5 or so rows to help the eye travel across the table.
This advice should be considered, not followed blindly: tables are meant for communication.
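Several of these points can be tried out in base R before reaching for any package. The following is only a rough sketch: the object names `means` and `tab` are made up for illustration, `signif()` stands in for Ehrenberg's "significant or effective digits", and averaging sepal and petal measurements together is done purely to have something to order by.

```r
# Group means of the four iris measurements, one row per species.
means <- aggregate(. ~ Species, data = iris, FUN = mean)
tab <- as.matrix(means[, -1])
rownames(tab) <- as.character(means$Species)

# Order rows and columns by size (largest averages first).
tab <- tab[order(rowMeans(tab), decreasing = TRUE),
           order(colMeans(tab), decreasing = TRUE)]

# Display row and column averages.
tab <- cbind(tab, Average = rowMeans(tab))
tab <- rbind(tab, Average = colMeans(tab))

# Round to two significant digits for display only; tab keeps full precision.
print(signif(tab, 2))
```

The species end up one above the other within each column, which is the comparison Ehrenberg recommends making easy.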
How to produce good tables?
I don’t think authors want to produce bad tables; I think they don’t know better, or don’t know how to do better. So I wrote the R package tables (Murdoch, 2013) to make it easy to produce good tables.
Background of my package
Many years ago, I loved SAS PROC TABULATE, which made it pretty easy to do the computations necessary to produce good tables.
My package tables improves on PROC TABULATE, by working well with Sweave and LaTeX. R is a particularly natural choice for this, much more flexible than SAS.
What is a table?
A rectangular array of numbers or text or pictures.
Labels on the rows and columns. These may cover multiple entries, and may be nested.
A caption.
The formula interface in tables handles the body and the labels. LaTeX can handle the captions.
Fisher’s Iris Data
My examples work with Fisher’s famous iris dataset:
> head(iris,10)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
Group summaries
> booktabs() # Choose "booktabs" style
> latex(tabular(Species ~ (Sepal.Length
+                          + Sepal.Width)
+                         *(mean + sd),
+               data=iris))
Group summaries
\begin{tabular}{lcccc}
\toprule
 & \multicolumn{2}{c}{Sepal.Length} & \multicolumn{2}{c}{Sepal.Width} \\ \cmidrule(lr){2-3}\cmidrule(lr){4-5}
Species & mean & sd & mean & \multicolumn{1}{c}{sd} \\
\midrule
setosa & $5.006$ & $0.3525$ & $3.428$ & $0.3791$ \\
versicolor & $5.936$ & $0.5162$ & $2.770$ & $0.3138$ \\
virginica & $6.588$ & $0.6359$ & $2.974$ & $0.3225$ \\
\bottomrule
\end{tabular}
Group summaries
              Sepal.Length        Sepal.Width
Species       mean     sd         mean     sd
setosa        5.006    0.3525     3.428    0.3791
versicolor    5.936    0.5162     2.770    0.3138
virginica     6.588    0.6359     2.974    0.3225
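The entries in this table can be cross-checked without the tables package. The sketch below uses only base R; the helper `stats` and the objects `by_species` and `check` are illustrative names, not part of the package.

```r
# Summary statistics requested by the formula: mean and sd.
stats <- function(x) c(mean = mean(x), sd = sd(x))

# For each measurement, compute the statistics within each species.
by_species <- lapply(iris[c("Sepal.Length", "Sepal.Width")],
                     function(x) t(sapply(split(x, iris$Species), stats)))

# Bind the two 3 x 2 blocks side by side: mean, sd, mean, sd.
check <- do.call(cbind, by_species)
print(signif(check, 4))
```

This takes noticeably more bookkeeping than the one-line formula, and it produces neither the nested column headings nor the LaTeX markup.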
Fewer digits!
Fewer digits
> latex(tabular(Species ~ Format(digits=2)
+                         *(Sepal.Length
+                           + Sepal.Width)
+                         *(mean + sd),
+               data=iris))
Fewer digits
              Sepal.Length      Sepal.Width
Species       mean    sd        mean    sd
setosa        5.01    0.35      3.43    0.38
versicolor    5.94    0.52      2.77    0.31
virginica     6.59    0.64      2.97    0.32
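Note that digits=2 gives 5.01, not 5.0. Base R's `format()` behaves the same way when several numbers are formatted in one call: it keeps enough decimals for the smallest value to show two significant digits. The sketch below demonstrates this with the first row of the earlier table. It assumes, though the slides do not say so, that `Format(digits=2)` passes the cells it covers to `format()` together; the package vignette describes the actual behaviour. The names `cells`, `together` and `separately` are illustrative.

```r
# First row of the four-digit table: mean, sd, mean, sd for setosa.
cells <- c(5.006, 0.3525, 3.428, 0.3791)

# One call to format(): a common number of decimals is chosen so that
# every value shows at least two significant digits.
together <- format(cells, digits = 2)
print(together)

# One call per value: each value gets its own number of decimals.
separately <- sapply(cells, format, digits = 2)
print(separately)
```

Formatting the values together keeps the decimal places consistent down each column, which helps the eye compare entries.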
Marginal summaries!
Marginal summaries
> latex(tabular(Species + 1 ~ Format(digits=2)
+                             *(Sepal.Length
+                               + Sepal.Width
+                               + 1)
+                             *(mean + sd),
+               data=iris))
Marginal summaries (Oops...)
              Sepal.Length      Sepal.Width       All
Species       mean    sd        mean    sd        mean    sd
setosa        5.01    0.35      3.43    0.38      NA      NA
versicolor    5.94    0.52      2.77    0.31      NA      NA
virginica     6.59    0.64      2.97    0.32      NA      NA
All           5.84    0.83      3.06    0.44      NA      NA
Marginal summaries
> latex(tabular(Species + 1 ~ Format(digits=2)
+                             *(Sepal.Length
+                               + Sepal.Width)
+                             *(mean + sd),
+               data=iris))
Marginal summaries
              Sepal.Length      Sepal.Width
Species       mean    sd        mean    sd
setosa        5.01    0.35      3.43    0.38
versicolor    5.94    0.52      2.77    0.31
virginica     6.59    0.64      2.97    0.32
All           5.84    0.83      3.06    0.44
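The "All" row is computed from all 150 flowers pooled together. It is not an average of the three group rows, which is why its standard deviations are larger than any single species' value. A base R sketch of the same pooled computation follows; `vars` and `all_row` are illustrative names.

```r
# Pool every row of iris, ignoring Species.
vars <- iris[c("Sepal.Length", "Sepal.Width")]

# Mean and sd of each measurement over the whole data set.
all_row <- rbind(mean = sapply(vars, mean), sd = sapply(vars, sd))
print(round(all_row, 2))
```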
Spacing!
Spacing
> latex(tabular(Species
+               + Hline(2:5) + 1
+               ~ Format(digits=2)
+                 *(Sepal.Length
+                   + Sepal.Width)
+                 *(mean + sd),
+               data=iris))
Spacing
              Sepal.Length      Sepal.Width
Species       mean    sd        mean    sd
setosa        5.01    0.35      3.43    0.38
versicolor    5.94    0.52      2.77    0.31
virginica     6.59    0.64      2.97    0.32

All           5.84    0.83      3.06    0.44
Better labels!
Better labels
> names <- paste("\\textit{Iris",
+                levels(iris$Species), "}")
> latex(tabular(Factor(Species, levelnames=names)
+               + Hline(2:5) + 1
+               ~ Format(digits=2)
+                 *(Heading("Sepal length")*Sepal.Length
+                   + Heading("Sepal width")*Sepal.Width)
+                 *(mean + sd),
+               data=iris))
Better labels
                   Sepal length      Sepal width
Species            mean    sd        mean    sd
Iris setosa        5.01    0.35      3.43    0.38
Iris versicolor    5.94    0.52      2.77    0.31
Iris virginica     6.59    0.64      2.97    0.32

All                5.84    0.83      3.06    0.44
What’s in a formula?
Terms in a formula can be:
function names: Summary statistics, e.g. mean.
factors: Categories, e.g. Species.
logical vectors: Subsets.
other vectors: Values to be summarized.
“pseudo-functions”: Things that handle formatting, e.g. Format.
formula functions: Abbreviate formulas, e.g. Hline.
References I
A. S. C. Ehrenberg. Rudiments of numeracy. Journal of the Royal Statistical Society, Series A, 140:277–297, 1977.
Andrew Gelman. Why tables are really much better than graphs. Journal of Computational and Graphical Statistics, 20:3–7, 2011.
Duncan Murdoch. tables: Formula-driven table generation, 2013. R package version 0.7.64, on CRAN.
Read the vignette in tables for lots of details and examples.