Statistical Writing
Tables and Figures
Sven Sandin, Dept. of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm
Scope
Tables and figures - General comments
The primary table: table 1
The work flow
Figure presentations to use and to avoid
Presentations : Figures & Tables
Summarize and focus results
Facilitate reproducing results
Help interpret the results
Avoid busy tables ... not all data are interesting
Tables and figures must be able to stand by themselves
Title: short and clear
Footnotes explaining ALL abbreviations
The underlying model must be clear
Categorical covariates with a p-value: what is the hypothesis?
Presentations: "Primary" table
Allow comparison of treatments (exposures)
Ideally (randomized) these should be "similar" ....
One column for each treatment
One row for each covariate
Confounding ...
Effect modification - sub-tables
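The layout above (one column per treatment, one row per covariate) can be sketched in base R. This is a minimal illustration on simulated data. The variable names (`treatment`, `age`, `male`) and the helpers `fmt_cont` and `fmt_bin` are assumptions made for the example and are not part of the original slides.

```r
# Simulated data: two treatment groups, one continuous and one binary covariate
set.seed(1)
n <- 200
d <- data.frame(
  treatment = rep(c("A", "B"), each = n / 2),
  age = round(rnorm(n, mean = 50, sd = 10)),
  male = rbinom(n, 1, 0.4)
)

# Continuous covariate: median (Q1-Q3)
fmt_cont <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE)
  sprintf("%.0f (%.0f-%.0f)", q[2], q[1], q[3])
}

# Binary covariate: show one of the proportions as n (%)
fmt_bin <- function(x) {
  sprintf("%d (%.0f%%)", sum(x == 1, na.rm = TRUE),
          100 * mean(x == 1, na.rm = TRUE))
}

# One column per treatment, one row per covariate
groups <- split(d, d$treatment)
table1 <- rbind(
  "N" = sapply(groups, function(g) as.character(nrow(g))),
  "Age, median (Q1-Q3)" = sapply(groups, function(g) fmt_cont(g$age)),
  "Male, n (%)" = sapply(groups, function(g) fmt_bin(g$male))
)
print(table1, quote = FALSE)
```

Further covariates are added as further rows. The table is built entirely by code, so it can be regenerated whenever the data change.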
[Diagram: Treatment, Outcome and Confounding covariate. Treatments correspond to table columns, covariates to table rows. A second version of the diagram adds a node labelled M.]
EXAMPLE: "Primary" table
Trolle-Lagerros, Y., Mucci, L. A., Kumle, M., Braaten, T., Weiderpass, E., Hsieh, C.-C., Sandin, S. … Adami, H.-O. (2005). Physical activity as a determinant of mortality in women. Epidemiology, 16(6), 780–785.
EXAMPLE: Table summarizing results
Trolle-Lagerros, Y., Mucci, L. A., Kumle, M., Braaten, T., Weiderpass, E., Hsieh, C.-C., Sandin, S. … Adami, H.-O. (2005). Physical activity as a determinant of mortality in women. Epidemiology, 16(6), 780–785.
Presentations: "Primary" table
Generally, don't test for baseline differences!
If important ---> already in the model ---> no need to test
If not important ---> p-value not important ---> no need to test
If not known ---> no need to test
p-values vs estimates ---> no need to test. Estimate!
Risk of confusing strength of association with importance
Inflation of the overall significance level ...
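The inflation of the overall significance level can be quantified. Assuming k independent tests, each at level alpha, the probability of at least one false positive is 1 - (1 - alpha)^k. A short sketch follows. The function name `overall_alpha` is made up for illustration, and independence is an assumption that rarely holds exactly for baseline covariates.

```r
# Probability of at least one false positive among k independent tests
overall_alpha <- function(k, alpha = 0.05) 1 - (1 - alpha)^k

k <- c(1, 5, 10, 20)
print(data.frame(tests = k, overall_alpha = round(overall_alpha(k), 2)))
```

With 20 baseline covariates each tested at the 5% level, the overall level under these assumptions is about 0.64.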
Presentations: Table work process
One-to-one relation
Data ----> Computer program ---> Table results
Method
Don't point-and-click (choice of software)
Rerun all results each time ... or use a log book
In your draft: make notes about source, date ...
Reproducibility !
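A scripted version of the chain Data ---> Computer program ---> Table results might look like the sketch below. It uses simulated data and a temporary output file. The names `make_table`, `dat` and `table_results.csv` are hypothetical, and in a real project the data would be read from a fixed file rather than simulated.

```r
# Step 1: data (simulated here; in practice read from a fixed data file)
set.seed(42)
dat <- data.frame(
  treatment = rep(c("A", "B"), each = 50),
  outcome = rnorm(100)
)

# Step 2: program. Every number in the table comes from this function.
make_table <- function(d) {
  aggregate(outcome ~ treatment, data = d,
            FUN = function(v) round(mean(v), 2))
}

# Step 3: table results, stamped with source, date and R version
result <- make_table(dat)
out_file <- file.path(tempdir(), "table_results.csv")
con <- file(out_file, open = "w")
writeLines(paste0("# source: simulated data; generated: ",
                  format(Sys.Date()), "; R ",
                  as.character(getRversion())), con)
write.csv(result, con, row.names = FALSE)
close(con)
```

Rerunning the script on the same data reproduces the same table, and the first line of the output file records where and when the numbers were produced.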
Presentations : Tables
Layout
Decimals
Avoid using shading and colors
Measures
Number of missing data must be clear
Survival-type analysis: person-years is the relevant measure
Binary data: show one of the proportions, e.g. males
Continuous
Mean or median (both to show symmetry)
Q1 and Q3 or P10 and P90 etc. instead of Min and Max
SD not useful for asymmetric data
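The measures listed for continuous data can be collected in one helper. The following is a sketch on a small made-up vector. The function name `summarize_continuous` is an assumption for illustration, and `quantile` is used with R's default quantile definition.

```r
# Summary of a continuous variable, with the number of missing values explicit
summarize_continuous <- function(x) {
  obs <- x[!is.na(x)]
  q <- unname(quantile(obs, c(0.10, 0.25, 0.75, 0.90)))
  c(n = length(obs), n_missing = sum(is.na(x)),
    mean = mean(obs), median = median(obs),
    p10 = q[1], q1 = q[2], q3 = q[3], p90 = q[4],
    min = min(obs), max = max(obs))
}

# Made-up, right-skewed data with one missing value
x <- c(1, 2, 2, 3, 3, 4, 5, 8, 20, NA)
s <- summarize_continuous(x)
print(round(s, 1))
```

Here the mean (about 5.3) lies well above the median (3), which signals asymmetry. Q1 and Q3 (2 and 5) describe the bulk of the data better than Min and Max (1 and 20).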
Presentations: Figures
Figures - examples
Continuous - Box plots
Ordinal - Segmented bar charts
Agreement - Altman-Bland (Bland-Altman) plots
Interactions
Confidence intervals
Bar charts with SD errors and other things to avoid
Figures - Box plot
Qualities
Meaningful for any continuous data
Efficient when comparing several groups
Minimizes data reduction
Interpretation
Half of the data between Q1 and Q3
Half above and half below the median
Difference between mean and median indicates lack of symmetry
Whiskers: to what? Tukey's rule (1.5 × IQR) or percentiles
Outliers
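The numbers behind a box plot can be inspected directly. A minimal sketch using `boxplot.stats` on made-up data, with Tukey's rule (`coef = 1.5`) for the whiskers and percentiles as the alternative:

```r
# Made-up data with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# Tukey's rule: whiskers reach the most extreme points within 1.5 x IQR of the box
bs <- boxplot.stats(x, coef = 1.5)
print(bs$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$out)    # points drawn individually as outliers

# Alternative whisker choice: 10th and 90th percentiles
print(quantile(x, c(0.10, 0.90)))
```

Whichever whisker definition is used, it should be stated in the figure legend.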
Figures - Box plot
# Data simulated
n <- 10
g <- gl(n, 100, n*100)
x <- rnorm(n*100) + sqrt(as.numeric(g))
boxplot(split(x, g), notch=TRUE)
Figures - Box plot
Wilcoxon rank sum test
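A box plot comparison of two groups pairs naturally with the Wilcoxon rank sum test, which does not assume symmetric data. A small sketch with made-up values (no ties, so R computes an exact p-value):

```r
# Two made-up groups; b is shifted upwards relative to a
a <- 1:10 + 0.1
b <- 6:15 + 0.2

res <- wilcox.test(a, b)
print(res)

# The matching figure
boxplot(list(a = a, b = b), ylab = "Value")
```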
Figures - Bar chart ± SD
t-test, assuming symmetric data
Bar chart with SD errors
Often misinterpreted: groups are read as "different" or "not different" depending on whether the error bars overlap
Why ± 1 SD? It is 1.96 (or 2) times the SD that is relevant
A lot of ink to represent one (two) numbers: mean and SD
Assumes symmetry and a normal distribution
Use the box plot instead !
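A small numerical illustration of why mean ± SD misleads for asymmetric data. The values are made up, and the point is only the qualitative behaviour:

```r
# Made-up, strictly positive, right-skewed data with one outlier
x <- c(1, 1, 2, 2, 3, 3, 4, 5, 8, 60)

# Mean +/- 1 SD: the lower limit is negative although all data are positive
print(c(mean = mean(x), sd = sd(x),
        lower = mean(x) - sd(x), upper = mean(x) + sd(x)))

# Median and quartiles, as shown by a box plot
print(quantile(x, c(0.25, 0.50, 0.75)))

# Sensitivity to a single outlier: drop it and compare
x2 <- x[x < 60]
print(c(mean_all = mean(x), mean_without = mean(x2),
        median_all = median(x), median_without = median(x2)))
```

Removing one observation more than halves the mean, while the median stays at 3.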
Bar chart vs Box plot
Box plots
Qualities
Meaningful for any continuous data
Efficient when comparing several groups
Minimizes data reduction
Interpretation
Half of the data between Q1 and Q3
Half above and half below the median
Difference between mean and median indicates lack of symmetry
Outliers
Bar chart ± SD
Qualities
NOT for any continuous data
NOT efficient when comparing several groups
BIG data reduction
Interpretation
?
?
?
Can't evaluate lack of symmetry
Extremely sensitive to single outliers
Figures - Ordinal Scale
What do we want to achieve ?
What is an ordinal scale?
Summarize data without reducing it
Evaluate distribution - Also cumulative
Change in distributions
Avoid problem with scattered tables
Integrated part of statistical analysis - test
Binary ?
Nominal ?
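A segmented bar chart shows the full distribution of an ordinal variable per group, and the cumulative proportions can be read off the segment boundaries. A minimal sketch on simulated data follows. The category labels, group names and probabilities are all assumptions for illustration.

```r
# Simulated ordinal scores in two groups
set.seed(2)
levels_ord <- c("none", "mild", "moderate", "severe")
d <- data.frame(
  group = rep(c("A", "B"), each = 150),
  score = factor(
    c(sample(levels_ord, 150, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1)),
      sample(levels_ord, 150, replace = TRUE, prob = c(0.2, 0.3, 0.3, 0.2))),
    levels = levels_ord, ordered = TRUE)
)

counts <- table(d$score, d$group)        # rows = categories, columns = groups
props <- prop.table(counts, margin = 2)  # proportions within each group
cum_props <- apply(props, 2, cumsum)     # cumulative distribution per group
print(round(cum_props, 2))

# Segmented (stacked) bar chart: one bar per group, one segment per category
barplot(props, legend.text = TRUE, ylab = "Proportion",
        main = "Ordinal score by group")
```

No category is collapsed, so the figure summarizes the data without reducing it to a single number.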
Figures - Ordinal Scale
[Figure: ordinal outcome by group. Group labels: IVF fresh, IVF frozen, ICSI fresh, ICSI frozen, ICSI fresh with surgery, ICSI frozen with surgery. Group sizes shown: N=12,775, N=9,457, N=142, N=1,699, N=6,886. A second version of the figure is annotated "Wilcoxon rank sum test".]
Figures - Ordinal Scale
Trolle-Lagerros, Y., Mucci, L. A., Kumle, M., Braaten, T., Weiderpass, E., Hsieh, C.-C., Sandin, S. … Adami, H.-O. (2005). Physical activity as a determinant of mortality in women. Epidemiology, 16(6), 780–785.
Figures - Interaction
Trolle-Lagerros, Y, Mucci, LA, Kumle, M, Braaten, T, Weiderpass, E, Hsieh, CC, Sandin, S … Adami, HO (2005). Physical activity as a determinant of mortality in women. Epidemiology, 16(6), 780–785
Figure - Confidence intervals
Figure - Confidence intervals on log scale
Sandin, S, Nygren, KG, Iliadou, A, Hultman, CM, Reichenberg, A (2013). Autism and mental retardation among offspring born after in vitro fertilization. JAMA, 310(1), 75–84
Figure - Confidence intervals on log scale
Knight, A, Sandin, S, Askling, J (2010). Occupational risk factors for Wegener’s granulomatosis: a case-control study. Annals of the Rheumatic Diseases, 69(4), 737–740
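Ratio measures such as odds ratios are computed on the log scale, so their confidence intervals are symmetric only when drawn on a log axis. A sketch of such a figure in base R follows. The estimates, standard errors and exposure names are hypothetical and do not come from any of the cited studies.

```r
# Hypothetical log odds ratios and standard errors
log_or <- c(exposure_a = 0.40, exposure_b = -0.20, exposure_c = 0.90)
se <- c(0.15, 0.25, 0.40)

# 95% confidence limits, computed on the log scale and back-transformed
or <- exp(log_or)
lower <- exp(log_or - 1.96 * se)
upper <- exp(log_or + 1.96 * se)

# One row per estimate, logarithmic x-axis, reference line at OR = 1
y <- seq_along(or)
plot(or, y, log = "x", xlim = range(lower, upper), pch = 19,
     yaxt = "n", xlab = "Odds ratio (log scale)", ylab = "")
axis(2, at = y, labels = names(or), las = 1)
segments(lower, y, upper, y)
abline(v = 1, lty = 2)
```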
Figure - Confidence intervals
Yang, L, Lof, M, Veierød, MB, Sandin, S, Adami, HO, Weiderpass, E (2011). Ultraviolet exposure and mortality among women in Sweden. Cancer Epidemiology, Biomarkers & Prevention, 20(4), 683–690
Figure - Confidence intervals
Knight, A, Sandin, S, & Askling, J (2010). Increased risk of autoimmune disease in families with Wegener’s granulomatosis. The Journal of Rheumatology, 37(12), 2553–2558
Figure - Confidence intervals
Estimates with overlapping CIs can still be statistically significantly different
Scale: ratio (log) vs absolute (linear)
Tables with several comparisons can be hard to digest
Efficient for picking out single effects
Efficient for picking out statistically significant results
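The first point can be checked with a small calculation. The sketch below uses two made-up, independent estimates with known standard errors and a normal approximation:

```r
# Two hypothetical independent estimates with their standard errors
est <- c(group1 = 10, group2 = 13)
se <- c(1, 1)

# Individual 95% confidence intervals
lower <- est - 1.96 * se
upper <- est + 1.96 * se
print(rbind(lower, upper))

# Do the intervals overlap?
overlap <- lower[["group2"]] < upper[["group1"]]

# Test of the difference (normal approximation, independent estimates)
diff_est <- est[["group2"]] - est[["group1"]]
diff_se <- sqrt(sum(se^2))
z <- diff_est / diff_se
p_value <- 2 * pnorm(-abs(z))
print(c(overlap = overlap, z = z, p_value = p_value))
```

The intervals overlap, yet the difference is significant at the 5% level. The standard error of a difference is sqrt(se1^2 + se2^2), which is smaller than se1 + se2.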
Figures - Altman Bland
The problem
In a lab we have just bought a new robot. It is expected to be a lot more accurate than the old one.
Can we just start using it, or do we need to evaluate it? How?
There are two variables measuring the effect of disease.
Can they be used interchangeably?
Figures - Altman Bland
The problem
Compare two methods
What is our best guess of the truth ?
X and X-Y correlated
Y and X-Y correlated
The Figure
Calculate the mean of X and Y
Calculate the difference X-Y
Plot mean vs difference
Draw a reference line at D=0
Mean and difference are uncorrelated
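The steps above can be sketched in base R on simulated measurements. The sketch assumes two methods with the same measurement error variance, which is what makes the mean and the difference uncorrelated. The dashed bias line and dotted limits of agreement (mean difference ± 1.96 SD) are a common addition to this plot and are not listed on the slide.

```r
# Simulated data: an unobserved truth measured by two methods, X and Y
set.seed(7)
n <- 1000
truth <- rnorm(n, mean = 100, sd = 10)
x <- truth + rnorm(n, sd = 8)
y <- truth + rnorm(n, sd = 8)

avg <- (x + y) / 2   # best guess of the truth
d <- x - y           # difference between the methods

# X is correlated with X-Y; the mean is (nearly) uncorrelated with it
cor_x_d <- cor(x, d)
cor_avg_d <- cor(avg, d)
print(round(c(cor_x_d = cor_x_d, cor_avg_d = cor_avg_d), 2))

# Bias and limits of agreement (an addition, see the text above)
bias <- mean(d)
loa <- bias + c(-1.96, 1.96) * sd(d)

plot(avg, d, xlab = "Mean of X and Y", ylab = "Difference X - Y")
abline(h = 0)                              # reference line at D = 0
abline(h = c(bias, loa), lty = c(2, 3, 3)) # bias and limits of agreement
```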
EXAMPLE : Altman-Bland
Bexelius, C, Löf, M, Sandin, S, Trolle Lagerros, Y, Forsum, E, Litton, JE (2010). Measures of physical activity using cell phones: validation using criterion methods. Journal of Medical Internet Research, 12(1)