stats 330: lecture 4 - department of statisticsstats330/lects/330_lecture4_2015.pdf · type of...

32
STATS 330: Lecture 4 Trellis Graphics 26.07.2015

Upload: trinhphuc

Post on 20-Aug-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

STATS 330: Lecture 4Trellis Graphics

26.07.2015

Page 2: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Housekeeping

I Contact details

Office auckland.ac.nz hoursSteffen Klaere 303.219 s.klaere 9:30–10:30, Thu+Fri

Arden Miller 303.229C a.miller Wed 9–10, Thu 12–1

I Class representativesCourse aucklanduni.ac.nz

Jessica Courtney 330 jcou608Monica Hill 330 mhil084

??? 762 ???

I Assignment 1 is due August 10

I Lecturer evaluation before mid semester break in Tutorial.

Page 3: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Today’s Lecture: More on Trellis graphics

Aim of the lecture

I To give you an idea of the scope of Trellis graphics

I To discuss several examples and show how Trellis graphicsreveal important insights into the data.

Page 4: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Recall last time

I Last lecture we discussed coplots

I Coplots show how the relationship between two variables, xand y , changes as the value of a third variable, z , changes

I We can generalise this to more than one variable z , i.e.conditioning on more than one variable

Page 5: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Coplots: syntax

plot(y ∼ x |z ∗ w)

I plot is the plot type, either xyplot, dotplot or bwplot

I x and y are the relationship variables

I z and w are the conditioning variables, optional

Page 6: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Conditioning on two variables

I Suppose we have two conditioning variables, Z and W .

I No problem if both are categorical

I If one or both are continuous variables, we turn them intocategorical variables by using subranges, e.g.,

I Turn ages into 10 year age groupsI Turn marks into grades

Page 7: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Conditioning on two variables: Example

Gender is already categorical

Age not, so divide the age rangeaccording to TV rating range

This gives a 4× 2 table with 8cells.

Female Male

0− 17

18− 34

35− 49

50+

Page 8: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Conditioning on two variables: Example

In each of the 8 cells of the table, we can draw a graph thatillustrates the relationship between x and y for individuals havingthat age and gender.

Type of graph will depend onvariable type of x and y

Both continuous: scatterplot

One continuous: boxplots,dotplots, etc

Both categorical: mosaic plots

Female Male

0− 17

18− 34

35− 49

50+

Page 9: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

x and y continuous: xyplot

xyplot(y~x|gender*age)

x

y

10

12

14

16

18

20

18 20 22 24

● ●●●

●●

●●●

●●●

●●

0−17F

● ●●

●●

● ●●

● ●

●●

18−34F

18 20 22 24

●●

●●

●●

●● ●

●●

●●

●●

●●●

35−49F

●●

●●

●● ●

●●

●●

●●

● ●

●●

50+F

●●

●●

●●

●●

● ●●

0−17M

18 20 22 24

●●

●●

● ●

●●

●●●

●● ●●

18−34M

●●

●●

●●

●●

●●

●●●

35−49M

18 20 22 24

10

12

14

16

18

20

●●

●●

●●●●

●● ●●

●●

●●

50+M

Page 10: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

x categorical, y continuous: dotplot

dotplot(y~x|gender*age)

y

2

4

6

8

10

12

A B

●●

●●

●●●

●●

●●

●●●

0−17F

A B

●●●

●●

●●

●●● ●

18−34F

A B

●●

●●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●●

35−49F

A B

●●

●●

●●

●●

●●

50+F

● ●●●

●●

●●

0−17M

●●●

●●

●●

●●

●●

●●●

● ●

●●

18−34M

●●●●

●●

35−49M

2

4

6

8

10

12

●●●

● ●●

●●●●

●●

●●

●● ●

50+M

Page 11: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

x categorical, y continuous: bwplot

bwplot(y~x|gender*age)

y

2

4

6

8

10

12

A B

0−17F

A B

18−34F

A B

●●●

35−49F

A B

50+F

0−17M

18−34M

35−49M

2

4

6

8

10

12

●●

50+M

Page 12: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

x categorical, y categorical: mosaicplot

mosaicplot(table(age=age,sex=gender,x=x,y=y),

col=c("red","blue"),main=NA)

age

sex

0−17 18−34 35−49 50+

FM

A B

ab

ab

A B A B A B

Page 13: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

x categorical, y categorical: mosaicplot

mosaicplot(table(age=age,sex=gender,x=x,y=y),

col=c("red","blue"),main=NA)

age

sex

0−17 18−34 35−49 50+

FM

A B

ab

ab

A B A B A B

Page 14: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

In summary

I The conditioning variablesdetermine the layout of thecells

I The x/y variables determinethe kind of graph to draw ineach cell

x

y

10

12

14

16

18

20

18 20 22 24

● ●●●

●●

●●●

●●●

●●

0−17F

● ●●

●●

● ●●

● ●

●●

18−34F

18 20 22 24

●●

●●

●●

●● ●

●●

●●

●●

●●●

35−49F

●●

●●

●● ●

●●

●●

●●

● ●

●●

50+F

●●

●●

●●

●●

● ●●

0−17M

18 20 22 24

●●

●●

● ●

●●

●●●

●● ●●

18−34M

●●

●●

●●

●●

●●

●●●

35−49M

18 20 22 24

10

12

14

16

18

20

●●

●●

●●●●

●● ●●

●●

●●

50+M

age

sex

0−17 18−34 35−49 50+

FM

A B

ab

ab

A B A B A B

Page 15: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

What to look for

I Does the value range for x and y change across panels?

I Do we observe any kind of relationship between x and y?

I Does the relationship between x and y change for differentlevels combinations of z ∗ w?

Page 16: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Example: Sports

I In a study on athletes at the Australian Institute of Sport,various physical measurements were made.

I In this example we look at the relationship between body fatand BMI, and how it differs between athletes of either genderplaying different sports.

BMI =weight (kg)

(height (m))2

Page 17: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between
Page 18: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Body fat vs. BMI conditioned on Gender and Sport

Percent body fat

BM

I

20

25

30

35

5 10 15 20 25

●●

●●

●●●

●●

●●

●●●●

Athleticsfemale

●●● ●

●● ●

●●

● ●

BBallfemale

5 10 15 20 25

●● ●●

●●●●

●●●●

● ●●

●●●

Rowfemale

●●

●●

●●●

Swimfemale

5 10 15 20 25

●●●

●●●

Tennisfemale

●●●●

●●

●●●●●●●

●●

●●

●●●

●●●

●● ●●

●●

●●

Athleticsmale

5 10 15 20 25

●●●

●●

●●●●

BBallmale

●●●●●●

●●●

●●● ●

Rowmale

5 10 15 20 25

●●●●●

●●●

●●

Swimmale

20

25

30

35

●● ●

Tennismale

Page 19: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Conclusions: Sport data

I Male athletes have lower percent body fat than females (wellknown).

I Male BMI tends to be larger than female BMI.

I Largest range of BMI values for athletes (quite diversenumber of disciplines there).

I Apart from female athletics there doesn’t seem to be anyrelationship between percent body fat and BMI.

I Initial scatterplot indicated positive relationship between BMIand percent body fat which goes away when looking at sportsseparately.

Page 20: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Example: engines

I In a study of engine emissions, a test engine was run underdifferent conditions and the amount of nitrogen oxide (NOx)emitted was measured.

I The conditions involved different settings of the compressionratio C , and the equivalence ratio, E (related to fuel/airmixture).

Page 21: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

I How does NOx relate to E?Does the relationshipdepend on C?

I There are only 5 settings ofC (7.5, 9.0, 12.0, 15.0,18.0) so we condition onthese.

Page 22: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

I How does NOx relate to E?Does the relationshipdepend on C?

I There are only 5 settings ofC (7.5, 9.0, 12.0, 15.0,18.0) so we condition onthese.

Page 23: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

NOx vs E given C

Equivalence ratio

Nitr

ogen

oxi

de c

once

ntra

tion

1

2

3

4

0.6 0.8 1.0 1.2

7.5

●●

9

0.6 0.8 1.0 1.2

●●

●●

12

15

0.6 0.8 1.0 1.2

1

2

3

4

●●

●●

● ●

●●

18

Page 24: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

NOx vs E given C

●●

●●

●●

●●

●●

●●●●

●●

0.6 0.7 0.8 0.9 1.0 1.1 1.2

12

34

Equivalence ratio

Nitr

ogen

oxi

de c

once

ntra

tion

7.59121518

Page 25: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Conclusions: Ethanol data

I NOx and E have a sinoid relationship.

I For E between 0.5 and about 0.9 the relationship is increasingand from 0.9 to 1.3 it is decreasing.

I The impact of C on the relationship is reflected by a largervariation for E between 0.5 and 0.9.

Page 26: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Example: yarn

In an experiment to test the strength of different yarns, lengths ofyarn are repeatedly stressed until they break (cycles to failure). Itis desired to see how this variable is related to the length of theyarn samples, the amplitude and the load (two variables related tothe amount of stress). The experiment involved using 3amplitudes, 3 lengths and 3 loads, for a total of 27 = 3× 3× 3different experimental conditions. (Coursebook p.9)

Page 27: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Testing procedure

Cycles to failure: number of pushes before yarn breaks.

length

amplitude

Page 28: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between
Page 29: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Yarn length vs. yarn cycles

cycles

leng

th

low

med

high

0 1000 2000 3000

amp:lowload:low

amp:medload:low

0 1000 2000 3000

amp:highload:low

low

med

high

amp:lowload:med

amp:medload:med

amp:highload:med

low

med

high

amp:lowload:high

0 1000 2000 3000

amp:medload:high

amp:highload:high

Page 30: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Conclusions

I For longer lengths, the cycles to failure are higher. (less likelyto break)

I High loads reduce the cycles to failure. (more likely to break)

I High amplitudes reduce the cycles to failure. (more likely tobreak)

I Most likely to break when load and amplitude are high andlength is low.

Page 31: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Summary

I Trellis graphics are a powerful tool to assess the influence ofmultiple variables at once!

I Permits to compare different types of variables.

I Can detect confounding between variables!

I Can help to untangle difficult relationships.

Page 32: STATS 330: Lecture 4 - Department of Statisticsstats330/lects/330_lecture4_2015.pdf · Type of graph will depend on ... I Initial scatterplot indicated positive relationship between

Thank you