stats 330: lecture 4 - department of statisticsstats330/lects/330_lecture4_2015.pdf · type of...
TRANSCRIPT
STATS 330: Lecture 4Trellis Graphics
26.07.2015
Housekeeping
I Contact details
Office auckland.ac.nz hoursSteffen Klaere 303.219 s.klaere 9:30–10:30, Thu+Fri
Arden Miller 303.229C a.miller Wed 9–10, Thu 12–1
I Class representativesCourse aucklanduni.ac.nz
Jessica Courtney 330 jcou608Monica Hill 330 mhil084
??? 762 ???
I Assignment 1 is due August 10
I Lecturer evaluation before mid semester break in Tutorial.
Today’s Lecture: More on Trellis graphics
Aim of the lecture
I To give you an idea of the scope of Trellis graphics
I To discuss several examples and show how Trellis graphicsreveal important insights into the data.
Recall last time
I Last lecture we discussed coplots
I Coplots show how the relationship between two variables, xand y , changes as the value of a third variable, z , changes
I We can generalise this to more than one variable z , i.e.conditioning on more than one variable
Coplots: syntax
plot(y ∼ x |z ∗ w)
I plot is the plot type, either xyplot, dotplot or bwplot
I x and y are the relationship variables
I z and w are the conditioning variables, optional
Conditioning on two variables
I Suppose we have two conditioning variables, Z and W .
I No problem if both are categorical
I If one or both are continuous variables, we turn them intocategorical variables by using subranges, e.g.,
I Turn ages into 10 year age groupsI Turn marks into grades
Conditioning on two variables: Example
Gender is already categorical
Age not, so divide the age rangeaccording to TV rating range
This gives a 4× 2 table with 8cells.
Female Male
0− 17
18− 34
35− 49
50+
Conditioning on two variables: Example
In each of the 8 cells of the table, we can draw a graph thatillustrates the relationship between x and y for individuals havingthat age and gender.
Type of graph will depend onvariable type of x and y
Both continuous: scatterplot
One continuous: boxplots,dotplots, etc
Both categorical: mosaic plots
Female Male
0− 17
18− 34
35− 49
50+
x and y continuous: xyplot
xyplot(y~x|gender*age)
x
y
10
12
14
16
18
20
18 20 22 24
● ●●●
●●
●
●
●●●
●●●
●
●
●●
●
●
●
●
●
0−17F
●
● ●●
●●
● ●●
●
● ●
●●
●
●
●
●
●
●
18−34F
18 20 22 24
●●
●●
●●
●
●
●● ●
●
●
●
●
●●
●●
●
●●
●
●
●
●●●
●
35−49F
●●
●●
●
●● ●
●
●●
●
●●
●
●
●
●●
●
●
● ●
●
●
●
●
●●
●
●
50+F
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●
● ●●
●
0−17M
18 20 22 24
●
●
●●
●
●●
●
●
●
●
●
● ●
●●
●
●
●●●
●● ●●
●
●
●
●
●
●
18−34M
●
●●
●●
●
●●
●
●●
●●
●
●●●
●
●
●
35−49M
18 20 22 24
10
12
14
16
18
20
●●
●
●
●
●
●●
●●●●
●● ●●
●
●
●●
●
●●
50+M
x categorical, y continuous: dotplot
dotplot(y~x|gender*age)
y
2
4
6
8
10
12
A B
●
●
●
●●
●●
●
●●●
●
●●
●
●●
●
●●●
●
●
0−17F
A B
●
●●●
●●
●
●●
●●● ●
●
●
●
●
●
●
●
18−34F
A B
●●
●
●
●●●●
●●
●●
●●
●●
●●
●●
●●●●●
●●●
●
35−49F
A B
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
50+F
●
●
●
●
●
●
●
● ●●●
●
●
●
●
●●
●
●●
●
●
●
0−17M
●●●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●●●
● ●
●●
●
●
●
18−34M
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●●
●
35−49M
2
4
6
8
10
12
●●●
●
●
● ●●
●●●●
●●
●●
●
●
●
●
●● ●
50+M
x categorical, y continuous: bwplot
bwplot(y~x|gender*age)
y
2
4
6
8
10
12
A B
●
●
0−17F
A B
●
●
18−34F
A B
●
●●●
35−49F
A B
●
●
50+F
●
●
●
0−17M
●
●
18−34M
●
●
35−49M
2
4
6
8
10
12
●●
●
●
●
50+M
x categorical, y categorical: mosaicplot
mosaicplot(table(age=age,sex=gender,x=x,y=y),
col=c("red","blue"),main=NA)
age
sex
0−17 18−34 35−49 50+
FM
A B
ab
ab
A B A B A B
x categorical, y categorical: mosaicplot
mosaicplot(table(age=age,sex=gender,x=x,y=y),
col=c("red","blue"),main=NA)
age
sex
0−17 18−34 35−49 50+
FM
A B
ab
ab
A B A B A B
In summary
I The conditioning variablesdetermine the layout of thecells
I The x/y variables determinethe kind of graph to draw ineach cell
x
y
10
12
14
16
18
20
18 20 22 24
● ●●●
●●
●
●
●●●
●●●
●
●
●●
●
●
●
●
●
0−17F
●
● ●●
●●
● ●●
●
● ●
●●
●
●
●
●
●
●
18−34F
18 20 22 24
●●
●●
●●
●
●
●● ●
●
●
●
●
●●
●●
●
●●
●
●
●
●●●
●
35−49F
●●
●●
●
●● ●
●
●●
●
●●
●
●
●
●●
●
●
● ●
●
●
●
●
●●
●
●
50+F
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●
● ●●
●
0−17M
18 20 22 24
●
●
●●
●
●●
●
●
●
●
●
● ●
●●
●
●
●●●
●● ●●
●
●
●
●
●
●
18−34M
●
●●
●●
●
●●
●
●●
●●
●
●●●
●
●
●
35−49M
18 20 22 24
10
12
14
16
18
20
●●
●
●
●
●
●●
●●●●
●● ●●
●
●
●●
●
●●
50+M
age
sex
0−17 18−34 35−49 50+
FM
A B
ab
ab
A B A B A B
What to look for
I Does the value range for x and y change across panels?
I Do we observe any kind of relationship between x and y?
I Does the relationship between x and y change for differentlevels combinations of z ∗ w?
Example: Sports
I In a study on athletes at the Australian Institute of Sport,various physical measurements were made.
I In this example we look at the relationship between body fatand BMI, and how it differs between athletes of either genderplaying different sports.
BMI =weight (kg)
(height (m))2
Body fat vs. BMI conditioned on Gender and Sport
Percent body fat
BM
I
20
25
30
35
5 10 15 20 25
●●
●●
●●●
●
●
●●
●
●
●
●●
●
●
●●●●
Athleticsfemale
●●● ●
●● ●
●●
●
●
● ●
BBallfemale
5 10 15 20 25
●
●● ●●
●●●●
●
●
●●●●
● ●●
●●●
●
Rowfemale
●●
●
●●
●●●
●
Swimfemale
5 10 15 20 25
●
●●●
●●●
Tennisfemale
●●●●
●
●
●●
●●●●●●●
●●
●●
●
●
●
●
●●●
●●●
●
●● ●●
●
●●
●
●
●●
Athleticsmale
5 10 15 20 25
●
●
●●●
●●
●●●●
●
BBallmale
●
●●●●●●
●
●●●
●●● ●
Rowmale
5 10 15 20 25
●●●●●
●●●
●
●
●●
●
Swimmale
20
25
30
35
●● ●
●
Tennismale
Conclusions: Sport data
I Male athletes have lower percent body fat than females (wellknown).
I Male BMI tends to be larger than female BMI.
I Largest range of BMI values for athletes (quite diversenumber of disciplines there).
I Apart from female athletics there doesn’t seem to be anyrelationship between percent body fat and BMI.
I Initial scatterplot indicated positive relationship between BMIand percent body fat which goes away when looking at sportsseparately.
Example: engines
I In a study of engine emissions, a test engine was run underdifferent conditions and the amount of nitrogen oxide (NOx)emitted was measured.
I The conditions involved different settings of the compressionratio C , and the equivalence ratio, E (related to fuel/airmixture).
I How does NOx relate to E?Does the relationshipdepend on C?
I There are only 5 settings ofC (7.5, 9.0, 12.0, 15.0,18.0) so we condition onthese.
I How does NOx relate to E?Does the relationshipdepend on C?
I There are only 5 settings ofC (7.5, 9.0, 12.0, 15.0,18.0) so we condition onthese.
NOx vs E given C
Equivalence ratio
Nitr
ogen
oxi
de c
once
ntra
tion
1
2
3
4
0.6 0.8 1.0 1.2
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
7.5
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
9
0.6 0.8 1.0 1.2
●
●
●
●
●
●
●
●
●
●●
●●
●
12
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
15
0.6 0.8 1.0 1.2
1
2
3
4
●
●●
●
●
●
●
●
●●
●
● ●
●●
●
18
NOx vs E given C
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●●
●●
●
0.6 0.7 0.8 0.9 1.0 1.1 1.2
12
34
Equivalence ratio
Nitr
ogen
oxi
de c
once
ntra
tion
●
●
●
●
●
7.59121518
Conclusions: Ethanol data
I NOx and E have a sinoid relationship.
I For E between 0.5 and about 0.9 the relationship is increasingand from 0.9 to 1.3 it is decreasing.
I The impact of C on the relationship is reflected by a largervariation for E between 0.5 and 0.9.
Example: yarn
In an experiment to test the strength of different yarns, lengths ofyarn are repeatedly stressed until they break (cycles to failure). Itis desired to see how this variable is related to the length of theyarn samples, the amplitude and the load (two variables related tothe amount of stress). The experiment involved using 3amplitudes, 3 lengths and 3 loads, for a total of 27 = 3× 3× 3different experimental conditions. (Coursebook p.9)
Testing procedure
Cycles to failure: number of pushes before yarn breaks.
length
amplitude
Yarn length vs. yarn cycles
cycles
leng
th
low
med
high
0 1000 2000 3000
●
●
●
amp:lowload:low
●
●
●
amp:medload:low
0 1000 2000 3000
●
●
●
amp:highload:low
low
med
high
●
●
●
amp:lowload:med
●
●
●
amp:medload:med
●
●
●
amp:highload:med
low
med
high
●
●
●
amp:lowload:high
0 1000 2000 3000
●
●
●
amp:medload:high
●
●
●
amp:highload:high
Conclusions
I For longer lengths, the cycles to failure are higher. (less likelyto break)
I High loads reduce the cycles to failure. (more likely to break)
I High amplitudes reduce the cycles to failure. (more likely tobreak)
I Most likely to break when load and amplitude are high andlength is low.
Summary
I Trellis graphics are a powerful tool to assess the influence ofmultiple variables at once!
I Permits to compare different types of variables.
I Can detect confounding between variables!
I Can help to untangle difficult relationships.
Thank you