basics of analytical graph theory handouts/basic… · basics of analytical graph theory • graph...
Post on 12-Oct-2020
3 Views
Preview:
TRANSCRIPT
1
Analytical Graphing
lets start with the best graph ever made
“Probably the best statistical graphic ever drawn, this map by Charles Joseph Minard portrays the losses
suffered by Napoleon's army in the Russian campaign of 1812. Beginning at the Polish-Russian border,
the thick band shows the size of the army at each position. The path of Napoleon's retreat from Moscow
in the bitterly cold winter is depicted by the dark lower band, which is tied to temperature and time
scales.”
“The graph illustrates an amazing point — how an army of 400,000 can dwindle to 10,000 without
losing a single major battle.”
2
When is a graph appropriate?
• Always for data exploration
• Often for data analysis and to develop
predictions (models) and experimental
designs
• Sometimes for presentations
• Less often for publications
Data Exploration
• Is not snooping in the pejorative sense. Exploration is a necessary and desired operation for:
– Checking data for unusual values
– Making sure the data meet the assumptions of the chosen form of analysis
• Eg – normality, homogeneity of variances, linearity (in regression approaches)
– deciding (sometimes) what sort of analysis to do. This hopefully will have been done prior to initiating a study
– To look for patterns that may not be expected or apparent – this is indeed snooping but it is an essential part of hypothesis formation
3
• Checking data for unusual values
• Making sure the data meet the assumptions of the chosen form of analysis
See ourworld - pop_86
Data Exploration
0 50 100 150
0 50 100 150
Population of countries (1986)
0
10
20
30
40
50
Count
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Pro
portio
n p
er B
ar
1.0 10.0 100.0 0
5
10
15
20
Count
0.0
0.1
0.2
0.3
Pro
portio
n p
er B
ar
1.0 10.0 100.0
Population of countries (1986)
Determining distributions and outliers
Will a transformation help??
4
Data Exploration
• Is not snooping in the pejorative sense.
Exploration is a necessary and desired
operation for:
– Checking data for unusual values
– Making sure the data meet the assumptions of
the chosen form of analysis
• Eg – normality, homogeneity of variances, linearity
(in regression approaches)
0 10 20 30 40 50
BIRTH_82
0
10
20
30
DE
AT
H_82
The relationship between birth and death rates (ourworld)
Is it linear, or is there perhaps a more appropriate model
5
0 10 20 30 40 50
BIRTH_82
0
10
20
30
DE
AT
H_82
Clearly not linear using LOESS procedure (locally weighted
scatterplot smoothing): a non-parametric regression method that
combines multiple regression models in a k-nearest-neighbor-based
meta-mode
When is a graph appropriate?
• Often for data analysis (e.g.)
– To understand the nature of interaction terms
(more later)
– To understand the power of a test. Say we
wanted to determine sample size for an
experiment where we thought the response
would be around 10 (alternate Hypothesis =0)
the standard deviation about 8 and we were
willing to relax alpha (from 0.05 to 0.20)
6
Pop. Mean = 10
Alternative = 0
SD = 8
Alpha=.05, .20
Power Curve (Alpha = 0.050)
0 5 10 15 20 25 30 35
Sample Size (per cell)
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Po
we
r
Power Curve (Alpha = 0.200)
0 5 10 15 20
Sample Size (per cell)
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Po
we
r
For example the effect of relaxing alpha on power
When is a graph appropriate?
• Sometimes for presentations
– Idea is to communicate information quickly
– Be sure you know why you are presenting the graph (is
it to convey stats or some other information (we will
talk about this more later)
– Graphs should be simple and not contain too much
information – never have a graph that is not
interpretable
• So many factors involved that no one could figure it out, or
worse
7
I know you can’t really see this but ….”
P OP _1983
PO
P_19
83
P OP _1986 P OP _1990 P OP _2020 B IRTH_82 B IRTH_RT DE A TH_82 DE A TH_RT B A B Y MT82 B A B Y MORT LIFE _E X P GNP _82 GNP _86 GDP _CA P LOG_GDP E DUC_84 E DUC HE A LTH84 HE A LTH
PO
P_19
83
PO
P_19
86
PO
P_19
86
PO
P_19
90
PO
P_19
90
PO
P_20
20
PO
P_20
20
BIR
TH
_82
BIR
TH
_82
BIR
TH
_R
T BIR
TH
_R
T
DEA
TH
_82
DEA
TH
_82
DEA
TH
_R
T
DEA
TH
_R
T
BA
BY
MT
82
BA
BY
MT
82
BA
BY
MO
RT
BA
BY
MO
RT
LIF
E_EX
P
LIF
E_EX
P
GN
P_8
2
GN
P_8
2
GN
P_8
6
GN
P_8
6
GD
P_C
AP
GD
P_C
AP
LO
G_G
DP L
OG
_G
DP
ED
UC
_84 E
DU
C_
84
ED
UC
ED
UC
HEA
LTH
84
HEA
LTH
84
P OP _1983
HEA
LTH
P OP _1986 P OP _1990 P OP _2020 B IRTH_82 B IRTH_RT DE A TH_82 DE A TH_RT B A B Y MT82 B A B Y MORT LIFE _E X P GNP _82 GNP _86 GDP _CA P LOG_GDP E DUC_84 E DUC HE A LTH84 HE A LTH
HEA
LTH
These are usually presented to demonstrate how much work the
researcher has done – really conveys that he or she has not
adequately prepared the presentation
When is a graph appropriate?
• Less often for publications – Idea is to communicate information that is too complex to leave in tables
or text
– They typically depict rather than present information (you have to read across to axes to get numbers). Hence if precise bits of information are important to the argument being made – use tables.
– If a graph is presented it must be important to the argument being made in the text (no fluff graphs)
– Information cannot be presented twice (eg table and figure, text and figure)
– If a graph is presented it must be interpretable
• You should be able to understand the purpose and content of the figure directly from the legend.
8
Basics of analytical graph theory
• Graph types imply a basis of logic and are not always interchangeable
• Even interchangeable graph types are not always equivalent (some are just non-informative)
• Be very clear about what you are trying to convey: models, stats or data structure
• Graph construction (axes, scales etc) may obscure or make clear the points you are trying to make
• Graph trickery is usually just that – and typically subtracts from the depiction
Graph types imply a basis of logic
and are not always interchangeable
• Summary Charts
• Density Charts
• Scatterplots, quantile plots and probability
plots
9
Summary Charts
There are a series of general graphical displays useful for
characterizing the relationship between independent variables
(usually categorical) and summary statistics of dependent
variables (usually continuous).
An example would be a bar graph of the relationship between
education and income (see survey2 data). Some types of
summary charts:
Examples of continuous and
categorical variables
Categorical Continuous
Gender (male, female) Hormone level
Nationality (French, Italian) Location (Latitude, Longitude)
Species (Human, Chimp) Phylogenetic distance
Color (red, green, blue) Color (wavelength)
Age Group (Young, Old) Age (years, days)
Height Group (Short, Medium,
Tall)
Height (cm, inches)
Weight Group (Thin, Obese) Weight (grams, pounds)
Speed (Fast, Slow) Speed (cm/sec)
Temperature (Cold, Warm) Temperature (degrees C)
10
no gra
d hs
hs gra
d
som
e colle
ge
colle
ge gra
d
EDUC
0
10
20
30
40
50
60
70IN
CO
ME
no gra
d hs
hs gra
d
som
e colle
ge
colle
ge gra
d
EDUC
0
10
20
30
40
50
60
70
INC
OM
E
no gra
d hs
hs gra
d
som
e colle
ge
colle
ge gra
d
EDUC
0
10
20
30
40
50
60
70
INC
OM
E
no gra
d hs
hs gra
d
som
e colle
ge
colle
ge gra
d
EDUC
0
10
20
30
40
50
60
70
INC
OM
E
no gra
d hs
hs gra
d
som
e colle
ge
colle
ge gra
d
EDUC
0
10
20
30
40
50
60
70
INC
OM
E
no grad hs
hs grad
some college
college grad
Bar Dot Line
Profile Pyramid Pie
no gra
d hs
hs gra
d
som
e colle
ge
colle
ge gra
d
EDUC
0
10
20
30
40
50
60
70
INC
OM
E
MaleFemale
SEX
Female
no gra
d hs
hs gra
d
som
e colle
ge
colle
ge gra
d
EDUC
Male
SEX
no gra
d hs
hs gra
d
som
e colle
ge
colle
ge gra
d
EDUC
0
10
20
30
40
50
60
70
INC
OM
E
0
10
20
30
40
50
60
70
INC
OM
E
no gra
d hs
hs gra
d
som
e colle
ge
colle
ge gra
d
EDUC
0
10
20
30
40
50
60
70
INC
OM
E
MaleFemale
SEX
Which conveys the information most clearly – how about the
comparisons of interest
11
Density Charts
• The density of a sample is the relative concentration of data points in intervals across the range of the distribution. A histogram is one way to display the density of a quantitative variable; box plots, dot or symmetric dot density, frequency polygons, fuzzygrams, jitter plots, density stripes, and histograms with data-driven bar widths are others.
Histogram
Length (mm)
12
median hinge hinge
25% 25% 25% 25%
Y
outliers
Statistical Range
Features of a BOXPLOT
mean
confidence interval
Smallest 50%
Rather than comparing sample
values to the normal distribution
(mean, standard deviation, and so
on), box plots show robust (what
does this mean) statistics (median,
quartiles, and so on).
Raw Data Plots:e.g. Scatterplots,
• Scatterplots are probably the most common form of graphical display. The key feature of scatterplots is that raw data are plotted (in contrast to summary data as in summary charts). Regression lines with confidence bands or smoothers (e.g. linear, non-linear) can be added to help explain relationships among variables. An example is the relationship between mussel height, and length and mussel height and mussel mass.
• How to estimate length and mass of mussels?
13
Height
Length
Non-linear and linear smoothing
Each point is
a mussel
14
Scatterplots, quantile plots and
probability plots • Quantile plots and probability plots are useful for
studying the distribution of a variable.
• Quantile Plots produces quantile plots, or Q plots. Unlike
probability plots, which compare a sample to a theoretical
probability distribution, a quantile plot compares a sample
to its own quantiles (a one-sample plot) or to another
sample (a two-sample, or Q-Q, plot). The quantile of a
sample is the data point corresponding to a given fraction
of the data.
See ourworld (pop_1986 )
0 50 100 150
POP_1986
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Fra
ction o
f D
ata
Features of a Quantile Plot
Distribution of data
Distribution of quantiles
(should be uniform – but
is subject to sample size)
86% of countries
had populations
less or equal to 50
million people
15
Scatterplots, quantile plots and
probability plots • A Probability Plot plots the values of a variable against
the corresponding percentage points of a theoretical
distribution--normal, chi-square, t, F, uniform, binomial,
logistic, exponential, gamma, Weibull, or Studentized
range. Graphs like this are called probability plots, or P
plots. You can also plot the expected values of one variable
against those of another (P-P plot). These graphs are
very important for determining if data are in need of
transformation.
See ourworld (pop_1986 )
Features of a Probability Plot No transformation Log (base 10) transformation
16
Lets Play
Activity 1: Graph construction
• Draw the most appropriate graph, given the data set and type provided
– Think about the nature of the information and how best to depict the information.
• Label both the x and y axis.
• Use appropriate scales for both axes. Think about the number of ticks
on axis and labeling of tick marks
• Make sure the elements (points, bars, lines etc), are crafted in a way
that simplifies interpretation (think about, color, pattern, shape of
elements, whether or not to depict a trend)
• Provide a figure legend that is descriptive: the reader should be able to
interpret the figure based on the graph and legend
17
Average size of Seastars
(Pisaster) over time Age (years) Diameter (mm)
1 35
2 65
3 92
4 116
5 138
6 158
7 176
8 192
9 206
10 219
Total commercial abalone landings
(pounds)over time in California
Year Abalone Landings
1973 3,187,706
1974 2,587,108
1975 2,128,545
1976 1,730,111
1977 1,434,305
1978 1,292,517
1979 989,124
1980 1,238,495
1981 1,109,463
1982 1,240,443
1983 840,025
1984 826,514
1985 823,931
1986 614,962
1987 762,951
1988 568,716
1989 741,150
1990 523,942
1991 380,593
1992 514,840
1993 461,340
1994 302,596
1995 262,314
1996 229,379
1997 112,323
18
VO2 max (oxygen
consumption,
mL/(kg·min) ) Runtime (minutes per
mile)
59.57 8.17
60.06 8.63
54.3 8.65
54.63 8.92
49.16 8.95
49.87 9.22
48.67 9.4
45.44 9.63
50.55 9.93
46.67 10
45.31 10.07
50.39 10.08
50.54 10.13
46.77 10.25
51.86 10.33
45.79 10.47
47.47 10.5
47.27 10.6
49.09 10.85
40.84 10.95
45.12 11.08
44.75 11.12
46.08 11.17
44.61 11.37
47.92 11.5
44.81 11.63
45.68 11.95
39.41 12.63
39.2 12.88
39.44 13.08
37.39 14.03
The relationship between time to run a mile and maximum oxygen consumption
Size distribution of Limpets
Limpet size
(mm)
10
10
10
20
20
20
20
30
30
30
30
30
30
40
40
40
50
50
60
70
19
Two variables: Number of Blue whales
as a function of period and location
Southern
Hemisphere
North
PacificNorth Atlantic
Pre-
whaling
~175,000 4,900 1,300
Current ~2,000 ~2,500 ~500
Extra slides
20
Basics of analytical graph theory
• Graph types imply a basis of logic and are not always interchangeable
• Even interchangeable graph types are not always equivalent (some are just non-informative)
• Be very clear about what you are trying to convey: models, stats or data structure
• Graph construction (axes, scales etc) may obscure or make clear the points you are trying to make
• Graph trickery is usually just that – and typically subtracts from the depiction
The underlying basis of the graph
• There are two general bases for any data graph
that will be presented or published.
– To display data (hopefully in the most efficient way)
– To convey information about statistics associated with
the data
– Both of the above
Although these may not appear to present a conflict
often times there is – here is an example
21
Error Bars
• Be very Careful - error bars convey meaning - at least two sorts
– Estimate of variability for subjects in that category, irrespective of strata or statistical assumptions
• Of use for showing spread in sampled data
• Of no use for conveying inferential statistics
– Estimate of variability for subjects in that category, with respect to strata and statistical assumptions
• Of no use for showing spread in sampled data
• Of use for conveying inferential statistics
See typing
How and why are these two graphs different?
Least Squares Means
electric
plain old
word p
roce
ss
EQUIPMNT
38
48
58
68
78
SP
EE
D
electric
plain old
word p
roce
ss
EQUIPMNT
40
50
60
70
80
SP
EE
D
with respect to strata and
statistical assumptions
without respect to strata and
statistical assumptions
22
Basics of analytical graph theory
• Graph types imply a basis of logic and are not always interchangeable
• Even interchangeable graph types are not always equivalent (some are just non-informative)
• Be very clear about what you are trying to convey: models, stats or data structure
• Graph construction (axes, scales etc) may obscure or make clear the points you are trying to make
• Graph trickery is usually just that – and typically subtracts from the depiction
no gra
d hs
hs gra
d
som
e colle
ge
colle
ge gra
d
EDUC
0
10
20
30
40
50
INC
OM
E
MaleFemale
SEX
Which is best?
top related