
STATS 330: Lecture 4: Trellis Graphics

26.07.2015

Housekeeping

- Contact details

  Name            Office    Email (@auckland.ac.nz)  Office hours
  Steffen Klaere  303.219   s.klaere                 9:30–10:30, Thu+Fri
  Arden Miller    303.229C  a.miller                 Wed 9–10, Thu 12–1

- Class representatives

  Name              Course  ID (@aucklanduni.ac.nz)
  Jessica Courtney  330     jcou608
  Monica Hill       330     mhil084
  ???               762     ???

- Assignment 1 is due August 10.

- Lecturer evaluation before the mid-semester break, in tutorial.

Today’s Lecture: More on Trellis graphics

Aim of the lecture

- To give you an idea of the scope of Trellis graphics

- To discuss several examples and show how Trellis graphics reveal important insights into the data

Recall last time

- Last lecture we discussed coplots.

- Coplots show how the relationship between two variables, x and y, changes as the value of a third variable, z, changes.

- We can generalise this to more than one variable z, i.e. conditioning on more than one variable.

Coplots: syntax

plot(y ~ x | z * w)

- plot is the plot type, either xyplot, dotplot or bwplot

- x and y are the relationship variables

- z and w are the conditioning variables (optional)
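As a minimal sketch of this syntax, the calls below use R's built-in mtcars data. That data set is not part of the lecture and is chosen here only for illustration, and the lattice package is assumed to be installed. The numeric variables cyl and am are wrapped in factor() so that each distinct value gets its own panel.

```r
library(lattice)

# No conditioning: a single scatterplot of fuel economy against weight.
p0 <- xyplot(mpg ~ wt, data = mtcars)

# One conditioning variable: one panel per number of cylinders.
p1 <- xyplot(mpg ~ wt | factor(cyl), data = mtcars)

# Two conditioning variables: one panel per cylinder/transmission combination.
p2 <- xyplot(mpg ~ wt | factor(cyl) * factor(am), data = mtcars)

print(p2)  # lattice plots are objects; printing draws them
```

Swapping xyplot for dotplot or bwplot keeps the same formula structure and changes only what is drawn inside each panel.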

Conditioning on two variables

- Suppose we have two conditioning variables, Z and W.

- No problem if both are categorical.

- If one or both are continuous variables, we turn them into categorical variables by using subranges, e.g.,

  - Turn ages into 10-year age groups

  - Turn marks into grades
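One way to form such subranges in R is the base function cut(). The sketch below uses made-up ages, with break points chosen to give the four age groups that appear in the example that follows; both are illustrative assumptions rather than the lecture's own code.

```r
# Hypothetical ages, for illustration only.
age <- c(5, 17, 18, 34, 35, 49, 50, 80)

# Cut the continuous ages into four groups.
# right = FALSE makes the intervals [0,18), [18,35), [35,50), [50,Inf).
age_group <- cut(age,
                 breaks = c(0, 18, 35, 50, Inf),
                 labels = c("0-17", "18-34", "35-49", "50+"),
                 right = FALSE)

table(age_group)  # counts per age group
```

The result is a factor, so it can be used directly as a conditioning variable.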

Conditioning on two variables: Example

Gender is already categorical.

Age is not, so divide the age range according to the age ranges used for TV ratings.

This gives a 4 × 2 table with 8 cells.

         Female  Male
0−17
18−34
35−49
50+

Conditioning on two variables: Example

In each of the 8 cells of the table, we can draw a graph that illustrates the relationship between x and y for individuals having that age and gender.

The type of graph will depend on the variable types of x and y:

Both continuous: scatterplot

One continuous: boxplots, dotplots, etc.

Both categorical: mosaic plots


x and y continuous: xyplot

xyplot(y ~ x | gender * age)

[Figure: 4 × 2 grid of scatterplots of y (10–20) against x (18–24), one panel per combination of age group (0−17, 18−34, 35−49, 50+) and gender (F, M).]
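The call above can be tried on simulated data. In the sketch below the data frame dat and all of its values are invented stand-ins (the lecture's data are not reproduced here), and the lattice package is assumed to be installed.

```r
library(lattice)

set.seed(330)  # arbitrary seed so the simulated data are reproducible
n <- 200

# Simulated stand-in data: two categorical and two continuous variables.
dat <- data.frame(
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  age    = cut(runif(n, 0, 80), breaks = c(0, 18, 35, 50, Inf),
               labels = c("0-17", "18-34", "35-49", "50+"), right = FALSE),
  x      = rnorm(n, mean = 21, sd = 1.5),
  y      = rnorm(n, mean = 15, sd = 2)
)

# One scatterplot per gender/age combination.
p <- xyplot(y ~ x | gender * age, data = dat)
print(p)
```

Because x and y are simulated independently, the panels should show no systematic relationship; real data would be inspected for differences between panels.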

x categorical, y continuous: dotplot

dotplot(y ~ x | gender * age)

[Figure: 4 × 2 grid of dotplots of y (2–12) for the x categories A and B, one panel per combination of age group (0−17, 18−34, 35−49, 50+) and gender (F, M).]

x categorical, y continuous: bwplot

bwplot(y ~ x | gender * age)

[Figure: 4 × 2 grid of box-and-whisker plots of y (2–12) for the x categories A and B, one panel per combination of age group (0−17, 18−34, 35−49, 50+) and gender (F, M).]
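The dotplot and bwplot calls take the same formula and differ only in the display used within each panel. A minimal sketch on simulated data follows; the data frame dat and its values are invented for illustration, and lattice is assumed to be installed.

```r
library(lattice)

set.seed(330)  # arbitrary seed for reproducible simulated data
n <- 200

# Simulated stand-in data with a two-level categorical x and a continuous y.
dat <- data.frame(
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  age    = factor(sample(c("0-17", "18-34", "35-49", "50+"), n, replace = TRUE)),
  x      = factor(sample(c("A", "B"), n, replace = TRUE)),
  y      = rnorm(n, mean = 7, sd = 2)
)

# Same formula, two display types: individual points vs. box-and-whisker summaries.
p_dot <- dotplot(y ~ x | gender * age, data = dat)
p_box <- bwplot(y ~ x | gender * age, data = dat)

print(p_dot)
print(p_box)
```

Dotplots show every observation, which suits small cells; boxplots summarise each cell and are easier to read when cells hold many points.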

x categorical, y categorical: mosaicplot

mosaicplot(table(age = age, sex = gender, x = x, y = y),
           col = c("red", "blue"), main = NA)

[Figure: mosaic plot split by age (0−17, 18−34, 35−49, 50+), sex (F, M), x (A, B) and y (a, b).]

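A mosaic plot of this kind can be sketched with base R alone, since mosaicplot() belongs to the graphics package rather than lattice. The four variables below are simulated stand-ins with the same level names as on the slide, and main = "" is used here simply to suppress the title.

```r
set.seed(330)  # arbitrary seed for reproducible simulated data
n <- 400

# Simulated stand-in data: four categorical variables.
age    <- factor(sample(c("0-17", "18-34", "35-49", "50+"), n, replace = TRUE))
gender <- factor(sample(c("F", "M"), n, replace = TRUE))
x      <- factor(sample(c("A", "B"), n, replace = TRUE))
y      <- factor(sample(c("a", "b"), n, replace = TRUE))

# Four-way contingency table; the order of the arguments sets the split order.
tab <- table(age = age, sex = gender, x = x, y = y)

# Tile areas are proportional to the cell counts.
mosaicplot(tab, col = c("red", "blue"), main = "")
```

With independent simulated variables the tiles within each age/sex block should have roughly equal areas; marked differences in tile size would point to an association.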

In summary

- The conditioning variables determine the layout of the cells.

- The x/y variables determine the kind of graph to draw in each cell.

[Figures repeated from the earlier slides: the 4 × 2 grid of scatterplots and the mosaic plot.]

What to look for

- Does the value range for x and y change across panels?

- Do we observe any kind of relationship between x and y?

- Does the relationship between x and y change across the level combinations of z and w?

Example: Sports

- In a study on athletes at the Australian Institute of Sport, various physical measurements were made.

- In this example we look at the relationship between body fat and BMI, and how it differs between athletes of either gender playing different sports.

$$\mathrm{BMI} = \frac{\text{weight (kg)}}{(\text{height (m)})^2}$$
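The BMI formula translates directly into a one-line R function. The function name bmi and the example athlete are hypothetical; heights recorded in centimetres would need dividing by 100 first.

```r
# BMI from weight in kilograms and height in metres.
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Hypothetical athlete: 80 kg and 2.0 m tall.
bmi(80, 2.0)  # 80 / 4 = 20
```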

Body fat vs. BMI conditioned on Gender and Sport

[Figure: scatterplots of BMI (20–35) against percent body fat (5–25), one panel per combination of sport (Athletics, BBall, Row, Swim, Tennis) and gender (female, male).]

Conclusions: Sport data

- Male athletes have lower percent body fat than female athletes (well known).

- Male BMI tends to be larger than female BMI.

- The largest range of BMI values occurs in athletics (which covers quite a diverse set of disciplines).

- Apart from female athletics there doesn’t seem to be any relationship between percent body fat and BMI.

- The initial scatterplot indicated a positive relationship between BMI and percent body fat, which goes away when looking at the sports separately.
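The last point, a pooled relationship that disappears within groups, can be reproduced with a few toy numbers. The values below are invented and are not the AIS measurements: each group's points form a square, so x and y are uncorrelated within a group, and the second group is simply shifted up and to the right.

```r
# Toy numbers, not the AIS measurements: four points on a square,
# so x and y are uncorrelated within each group.
square <- data.frame(x = c(-1, -1, 1, 1), y = c(-1, 1, -1, 1))

toy <- rbind(
  data.frame(group = "g1", x = square$x,     y = square$y),
  data.frame(group = "g2", x = square$x + 5, y = square$y + 5)  # shifted group
)

# Pooled correlation: driven entirely by the shift between the groups.
pooled <- cor(toy$x, toy$y)

# Within-group correlations: zero in both groups.
within_group <- sapply(split(toy, toy$group), function(d) cor(d$x, d$y))

pooled        # 50/58, about 0.86
within_group
```

Plotting such data in a single panel suggests a strong positive relationship; conditioning on the group shows that it comes only from the difference between groups.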

Example: engines

- In a study of engine emissions, a test engine was run under different conditions and the amount of nitrogen oxide (NOx) emitted was measured.

- The conditions involved different settings of the compression ratio, C, and the equivalence ratio, E (related to the fuel/air mixture).

- How does NOx relate to E? Does the relationship depend on C?

- There are only 5 settings of C (7.5, 9.0, 12.0, 15.0, 18.0), so we condition on these.
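The lattice package ships a data set named ethanol with the variables NOx, C and E and the same five settings of C. Assuming this is the data behind the example (the slides do not name the R object), a conditioned plot and a single-panel grouped alternative might be sketched as follows; the axis labels are chosen here for readability.

```r
library(lattice)

# ethanol ships with lattice: 88 runs with columns NOx, C and E.
sort(unique(ethanol$C))  # the five compression-ratio settings

# Condition on C: one NOx-vs-E scatterplot per compression ratio.
p_cond <- xyplot(NOx ~ E | factor(C), data = ethanol,
                 xlab = "Equivalence ratio",
                 ylab = "Nitrogen oxide concentration")

# Alternative: a single panel with the C settings shown as groups.
p_grp <- xyplot(NOx ~ E, groups = factor(C), data = ethanol,
                auto.key = list(columns = 5),
                xlab = "Equivalence ratio",
                ylab = "Nitrogen oxide concentration")

print(p_cond)
print(p_grp)
```

The conditioned version makes it easy to compare the shape of the curve across settings of C, while the grouped version makes it easier to compare NOx levels at a given E.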


NOx vs E given C

[Figure: scatterplots of nitrogen oxide concentration (1–4) against equivalence ratio (0.6–1.2), one panel per compression ratio C (7.5, 9, 12, 15, 18).]

NOx vs E given C

[Figure: single scatterplot of nitrogen oxide concentration (1–4) against equivalence ratio (0.6–1.2), with a legend for the compression ratios C (7.5, 9, 12, 15, 18).]

Conclusions: Ethanol data

- NOx and E have a sinusoid-like relationship.

- For E between 0.5 and about 0.9 the relationship is increasing, and from 0.9 to 1.3 it is decreasing.

- The impact of C on the relationship is reflected by a larger variation for E between 0.5 and 0.9.

Example: yarn

In an experiment to test the strength of different yarns, lengths of yarn are repeatedly stressed until they break (cycles to failure). It is desired to see how this variable is related to the length of the yarn samples, the amplitude and the load (two variables related to the amount of stress). The experiment involved using 3 amplitudes, 3 lengths and 3 loads, for a total of 27 = 3 × 3 × 3 different experimental conditions. (Coursebook p.9)
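The 27 experimental conditions can be enumerated with expand.grid(). This is only a sketch of the design: the level labels low, med and high follow the labels on the lecture's plot, and no measured cycles to failure are included.

```r
# The three levels of each factor.
lev <- c("low", "med", "high")

# Full factorial design: every combination of length, amplitude and load.
design <- expand.grid(length    = factor(lev, levels = lev),
                      amplitude = factor(lev, levels = lev),
                      load      = factor(lev, levels = lev))

nrow(design)  # 3 * 3 * 3 = 27
```

Conditioning on amplitude and load then gives a 3 × 3 grid of panels, each showing how cycles to failure vary with length.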

Testing procedure

Cycles to failure: number of pushes before yarn breaks.

[Diagram of the testing procedure, with the yarn length and the amplitude marked.]

Yarn length vs. yarn cycles

[Figure: length (low, med, high) against cycles to failure (0–3000), one panel per combination of amplitude (low, med, high) and load (low, med, high).]

Conclusions

- For longer lengths, the cycles to failure are higher (less likely to break).

- High loads reduce the cycles to failure (more likely to break).

- High amplitudes reduce the cycles to failure (more likely to break).

- Most likely to break when load and amplitude are high and length is low.

Summary

- Trellis graphics are a powerful tool to assess the influence of multiple variables at once!

- They allow different types of variables to be compared.

- They can detect confounding between variables!

- They can help to untangle difficult relationships.

Thank you
