How to Produce Statistical Graphics
General Clinical Research Center
August 15, 2005
Rachel Enriquez
What are we going to talk about?
Why should we care about statistical graphics?
What is the theoretical framework for statistical graphics?
When do we make statistical graphics?
How can we produce good quality graphics?
Why do we care about statistical graphics?
Interpretation - good graphs help you understand your data.
Data visualization is part of analysis
Communication of results is the last step in the scientific process.
Many people can comprehend the results better by seeing them in a figure than they can by reading them in a table.
Do you have an opinion?
Graphics can help persuade.
Objectively correct graphics can call attention to the result you WANT the viewer to see.
Get Attention
Can you produce exceptional statistical graphics? (me neither)
Do you want people to know that you are committed to the scientific process?
If people understand your research, they’ll listen to you and do what you tell them to.
The Theory of Statistical Graphics
Data Visualization
Visually encode the data. Viewers decode the picture.
Easy to figure out
Learn something new
See the right comparisons
Hierarchy of Visual Perception
Position along common scale
Position along nonaligned scale
Length
Angle / Slope
Area
Color
Volume
Aesthetics
A personal matter
Unless you ask Tufte
Data / Ink ratio
Avoid 3-D
Fill patterns are bad
Obtain good resolution
Text can be small (in print)
When do we make statistical graphics?
For preliminary analysis
Speed
[Figure: histogram of Age (x-axis, 16 to 72) by Percent (y-axis, 0 to 35)]
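A quick percent histogram like the one on this slide can be sketched in a few lines. This is only a sketch under stated assumptions: the ages are made up for illustration (they are not the data behind the slide), the bin width of 8 years is read off the axis ticks, and `percent_histogram` is a name invented here.

```python
from collections import Counter


def percent_histogram(values, start, width):
    """Return {bin lower edge: percent of values in [edge, edge + width)}."""
    # Map every value to the lower edge of the bin it falls in.
    counts = Counter(start + width * ((v - start) // width) for v in values)
    total = len(values)
    return {edge: 100 * n / total for edge, n in sorted(counts.items())}


# Made-up ages, for illustration only (not data from the talk).
ages = [18, 23, 25, 31, 34, 35, 38, 41, 44, 47, 52, 58, 63, 70]

# Print one text bar per bin: bin range, bar, percent.
for edge, pct in percent_histogram(ages, start=16, width=8).items():
    print(f"{edge:>3}-{edge + 7:<3} {'#' * round(pct):<40} {pct:4.1f}%")
```

For a first look at the data, a rough text plot like this is often enough; polish can wait until the figure is headed for a journal or a talk.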
For Publication in Journals
Data density is good.
Excellent resolution is required.
Color is difficult.
Column width is a consideration.
MS Office is frequently not an option.
Too many tables!
A plot is better
Confounding variable
TABLES - Consider the on-line supplement
Maybe…
[Figure: odds ratios (y-axis, 0 to 35) for esophageal adenocarcinoma, adenocarcinoma of the gastric cardia, and esophageal squamous-cell carcinoma, by frequency of reflux symptoms (1/week, 2-3/week, >3/week), reflux symptom score (1-2, 2.5-4, 4.5-6.5 points), and duration of reflux symptoms (<12 yr, 12-20 yr, >20 yr)]
[Figure: the same odds ratios as a grid of panels, one row per variable (frequency of reflux symptoms, reflux symptom score, duration of reflux symptoms) and one column per cancer type, each with a y-axis from 0 to 35]
The figure should be labeled!
Oral Presentations
HPR223 2004
The boxplot (3)
The space between the mean and the median shows you the data are not normally distributed.
If the data are normally distributed, about 99.3% of the values should be inside the fences (1.5 IQRs beyond the quartiles); nearly all should be inside 3 IQRs.
Data density should be moderate.
Color is available.
LABEL!
Hope you have interesting data.
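The coverage figures on the boxplot slide quoted above can be checked numerically. The following is a minimal sketch, assuming the usual Tukey convention that the fences sit a multiple of the IQR beyond the first and third quartiles (the slide does not spell out its convention) and a standard normal distribution; `fence_coverage` is a name invented here.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
normal = NormalDist()

# Quartiles and interquartile range of the standard normal.
q1 = normal.inv_cdf(0.25)
q3 = normal.inv_cdf(0.75)
iqr = q3 - q1


def fence_coverage(k):
    """Fraction of normal values between Q1 - k*IQR and Q3 + k*IQR."""
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return normal.cdf(upper) - normal.cdf(lower)


print(f"inside 1.5 IQR fences: {fence_coverage(1.5):.4f}")
print(f"inside 3 IQR fences:   {fence_coverage(3.0):.6f}")
```

Under these assumptions the 1.5 IQR fences enclose roughly 99.3% of normally distributed values, so only a few points per thousand would be flagged as outliers.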
Posters
Smaller audience
Experimentation is good.
Graphics will bring you customers!
Experimentation may, or may NOT work.
How Do I do this?
How much time do you have?
It is not easy. There is no perfect, easy to use, cheap
software that is going to solve your problems.
This is not too hard
Books are not very helpful
Software changes quickly.
People use different software.
You want to do it NOW, not after reading for 5 hours.
Surfing the net is frequently useful.
Vector Graphics vs Bitmaps
Vector graphics
A set of instructions that tells the device how to display the document.
Adobe software is the most common way to edit vector graphics.
Bitmaps
Resolution depends on the size of the computer file.
Easy to open and publish on-line.
Generally not accepted for publication.
Vector graphics can be made into bitmaps. Bitmaps cannot be made into vector graphics.
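To make "a set of instructions" concrete, here is a minimal sketch that writes a bar chart as SVG, one common vector format. SVG is chosen here only for illustration (the talk itself names Windows Metafile and Adobe formats), the values are made up, and `bar_chart_svg` is a hypothetical helper.

```python
def bar_chart_svg(values, bar_width=40, gap=10, height=120):
    """Build an SVG bar chart as text: one <rect> instruction per bar.

    Assumes at least one value and a positive maximum.
    """
    top = max(values)
    width = len(values) * (bar_width + gap) + gap
    rects = []
    for i, v in enumerate(values):
        bar_h = height * v / top          # scale bar to the tallest value
        x = gap + i * (bar_width + gap)   # left edge of this bar
        rects.append(
            f'<rect x="{x}" y="{height - bar_h:.1f}" '
            f'width="{bar_width}" height="{bar_h:.1f}" fill="gray"/>'
        )
    # The viewBox lets a viewer redraw the same instructions at any size.
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}">' + "".join(rects) + "</svg>"
    )


svg = bar_chart_svg([3, 5, 2])  # made-up values, for illustration only
print(svg)
```

The file holds no pixels at all, only shapes and coordinates, which is why the device can redraw it sharply at any size.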
Bitmaps, compression, and enlarging
Compression can be ‘lossy’.
We are familiar with the grainy effect of enlargement.
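The grainy effect of enlargement can be shown with a toy bitmap. This sketch assumes the simplest enlargement method, repeating each pixel, and uses a made-up 4x4 one-bit image; real software usually interpolates, which blurs the blocks rather than removing them. `enlarge` is a name invented here, and the sketch does not cover lossy compression.

```python
def enlarge(bitmap, factor):
    """Enlarge a bitmap (a list of pixel rows) by repeating every pixel."""
    out = []
    for row in bitmap:
        # Repeat each pixel horizontally...
        wide = [px for px in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


# A 4x4 one-bit image of a diagonal line (made up for illustration).
small = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

big = enlarge(small, 3)
for row in big:
    print("".join("#" if px else "." for px in row))
```

The enlarged line is a staircase of 3x3 blocks: the file now has nine times as many pixels but no more detail than the original.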
Software
SPSS - Many chart options. Graphics can be edited. Can export vector graphics.
SAS - Known for poor graphics. However, some people produce very good graphs with SAS. Hope SAS improves and use something else for now?
Stata - Any comments?
R - It is free. Produces good graphics that can be exported in various formats. Infinitely customizable. Difficult for the novice statistician / programmer. R clinic.
SyStat
EpiInfo
S+
Spotfire
Prism – also available in GCRC computer lab.
Others…..
Sigma Plot
Can be used with Excel and SPSS
Opens other data formats
Menu driven
Multiple graphics options
Easily produces compound graphics
Exports graphics in multiple formats
MS Office
Windows Metafile is a vector graphic format.
Excel - More control over graphics. Limited selection of graph types. User typically provides the S.E.s and effect estimates.
PowerPoint - Surprisingly good at managing bitmaps.
If you already use it, then improve your graphics by applying aesthetic rules.
For example…
[Figure: two versions of the same chart (y-axis, 0 to 6) comparing negative family history and positive family history for hay fever, asthma, eczema, and current wheeze]
Scanners
Scanned figures are an option.
Good way to clean up figures from journals if you’re proficient in Photoshop.
The bitmap resolution problem remains.
Which file format and program will avoid lossy compression?
Art Software
As a novice graphic preparator, I appreciate the ability to draw on graphs.
Can also ‘cover’ unwanted parts with white shapes.
Group the resulting collection of shapes and save as a picture.
Adobe Illustrator, Adobe Photoshop - These programs may seem counter-intuitive at first use.
Paint, MS Office, etc. - Easy to use. Bitmap products.
Call the experts
The Medical Illustrators at VUMC will improve your graphs.
$50/hr
Average graph is 20 minutes.
Grow your own group ‘expert’.