stats 330: lecture 4


Upload: natara

Post on 03-Feb-2016


Page 1: STATS 330: Lecture 4

04/22/23 330 Lecture 4 1

STATS 330: Lecture 4

Page 2: STATS 330: Lecture 4

Housekeeping

My contact details….

Plus much else on the course web page www.stat.auckland.ac.nz/~lee/330/ or via Cecil

Page 3: STATS 330: Lecture 4

Page 4: STATS 330: Lecture 4

Today’s lecture: R for graphics

Aim of the lecture:

To show you how to use R to produce the plots shown in the last few lectures

Page 5: STATS 330: Lecture 4

Getting data into R

In 330, as in many cases, data comes in 2 main forms:
• As a text file
• As an Excel spreadsheet

We need to convert from these formats to R.

Data in R is organized in data frames:
• Row by column arrangement of data (as in Excel)
• Variables are columns
• Rows are cases (individuals)

Page 6: STATS 330: Lecture 4

Text files to R

Suppose we have the data in the form of a text file.

Edit the text file (use Notepad or similar) so that
• The first row consists of the variable names
• Each row of data (i.e. data on a complete case) corresponds to one line of the file

Suppose data fields are separated by spaces and/or tabs.

Then, to create a data frame containing the data, we use the R function read.table
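The steps above can be tried without the course file. The sketch below writes R's built-in trees data (31 black cherry trees) to a temporary text file and reads it back with read.table. The column names diameter, height and volume and the temporary file path are assumptions standing in for the real cherry.txt.

```r
# Stand-in for cherry.txt: the built-in 'trees' data, renamed to the
# column names used in these slides (an assumption about the course file).
cherry.txt <- file.path(tempdir(), "cherry.txt")
write.table(setNames(trees, c("diameter", "height", "volume")),
            file = cherry.txt, quote = FALSE, row.names = FALSE)

# The first line holds the variable names, so header = TRUE.
# The default sep = "" splits fields on any run of spaces or tabs.
cherry.df <- read.table(cherry.txt, header = TRUE)

dim(cherry.df)     # number of rows (cases) and columns (variables)
names(cherry.df)   # variable names taken from the first line of the file
```

If the first line of the file did not hold the variable names, header = FALSE (the default) would be used instead and R would name the columns V1, V2, and so on.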

Page 7: STATS 330: Lecture 4

Example: the cherry tree data

Suppose we have a text file called cherry.txt (probably created using Notepad or maybe Word, but saved as a text file)

First line: variable names

Data for each tree on a separate line, separated by “white space” (spaces or tabs)

Page 8: STATS 330: Lecture 4

Creating the data frame

In R, type

cherry.df = read.table(file.choose(), header=TRUE)

and press the return key.

This brings up the dialog to select the file cherry.txt containing the data.

[Screenshot: file chooser dialog. Select the file, then click to load the data.]

Page 9: STATS 330: Lecture 4

Check all is OK!
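A few standard commands are usually enough for this check. The sketch below is one possible routine, not necessarily the one shown on the original slide; it uses the built-in trees data, renamed, as an assumed stand-in for cherry.df.

```r
# Stand-in for the data frame read from cherry.txt (an assumption).
cherry.df <- setNames(trees, c("diameter", "height", "volume"))

str(cherry.df)       # variable names, types and the first few values
head(cherry.df)      # first six rows
summary(cherry.df)   # ranges: quick way to spot impossible values
n.missing <- sum(is.na(cherry.df))   # count of missing entries
n.missing
```

A variable that should be numeric but shows up as character in str() usually means a stray symbol or a misplaced header in the file.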

Page 10: STATS 330: Lecture 4

Getting data from a spreadsheet (1)

Create the spreadsheet in Excel

Save it as Comma Delimited Text (CSV)

This is a text file with all cells separated by commas

File is called cherry.csv

Page 11: STATS 330: Lecture 4

Getting data from a spreadsheet (2)

In R, type

cherry.df = read.table(file.choose(), header=TRUE, sep=",")

and proceed as before
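As a rough illustration, the sketch below writes a CSV file to a temporary location and reads it back two ways. The file is built from the built-in trees data with assumed column names, and read.csv is a base R shortcut that is not mentioned on the slide; it is read.table with header=TRUE and sep="," as defaults.

```r
# Stand-in for cherry.csv (an assumption): built-in 'trees' data, renamed.
cherry.csv <- file.path(tempdir(), "cherry.csv")
write.csv(setNames(trees, c("diameter", "height", "volume")),
          file = cherry.csv, row.names = FALSE)

# The form used on the slide, with an explicit path instead of file.choose()
via.table <- read.table(cherry.csv, header = TRUE, sep = ",")

# Shortcut with the comma separator and header already set
via.csv <- read.csv(cherry.csv)

all.equal(via.table, via.csv)   # compare the two results
```

Using an explicit path rather than file.choose() makes the script repeatable, at the cost of having to know where the file lives.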

Page 12: STATS 330: Lecture 4

Getting data from the R330 package

The package R330 contains several data sets used in the course, including the cherry tree data

To access the data frame:
• Install the R330 package (see Appendix A.10 of the coursebook)
• In R, type

> library(R330)
> data(cherry.df)

Page 13: STATS 330: Lecture 4

Data frames and variables

Suppose we have read in data and made a data frame

At this point R doesn’t know about the variables in the data frame, so we can’t use e.g. the variable diameter in R commands

We need to say attach(cherry.df)

to make the variables in cherry.df visible to R.

Alternatively, say cherry.df$diameter (better)
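The sketch below shows the $ form next to with(), a base R function not mentioned on the slide that evaluates an expression inside a data frame without attaching it. The data frame is an assumed stand-in built from the built-in trees data.

```r
# Stand-in for cherry.df (an assumption): built-in 'trees' data, renamed.
cherry.df <- setNames(trees, c("diameter", "height", "volume"))

# Explicit form: always clear which data frame the variable comes from
m1 <- mean(cherry.df$diameter)

# with() looks up 'diameter' inside cherry.df for this one expression only
m2 <- with(cherry.df, mean(diameter))

c(m1, m2)
```

One reason the $ form is often preferred over attach is that attached variables can be masked by objects of the same name in the workspace, which is easy to miss.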

Page 14: STATS 330: Lecture 4

Scatterplots

In R, there are 2 distinct sets of functions for graphics: one for ordinary graphics, one for Trellis.

E.g. for scatterplots, we use either plot (ordinary R) or xyplot (Trellis)

In the next few slides, we discuss plot.

Page 15: STATS 330: Lecture 4

Simple plotting

plot(cherry.df$height,
     cherry.df$volume,
     xlab="Height (feet)",
     ylab="Volume (cubic feet)",
     main="Volume versus height for 31 black cherry trees")

i.e. label axes (give units if possible), give a title

Page 16: STATS 330: Lecture 4

[Figure: scatterplot titled “Volume versus height for 31 black cherry trees”; x-axis Height (feet), y-axis Volume (cubic feet)]

Page 17: STATS 330: Lecture 4

Alternative form of plot

plot(volume ~ height,
     xlab="Height (feet)",
     ylab="Volume (cubic feet)",
     main="Volume versus height for 31 black cherry trees",
     data=cherry.df)

Don’t need to use the $ notation with this form. Note the reversal of x and y: the response (y) goes on the left of the ~.
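To make the reversal concrete, the sketch below draws the same scatterplot both ways, side by side. The data frame is an assumed stand-in built from the built-in trees data, and the name fmla is introduced here only for illustration.

```r
# Stand-in for cherry.df (an assumption): built-in 'trees' data, renamed.
cherry.df <- setNames(trees, c("diameter", "height", "volume"))

old.par <- par(mfrow = c(1, 2))   # two plots side by side

# x, y form: horizontal variable first, vertical variable second
plot(cherry.df$height, cherry.df$volume,
     xlab = "Height (feet)", ylab = "Volume (cubic feet)",
     main = "x, y form")

# Formula form: vertical variable (response) on the left of the ~
fmla <- volume ~ height
plot(fmla, data = cherry.df,
     xlab = "Height (feet)", ylab = "Volume (cubic feet)",
     main = "Formula form")

par(old.par)                      # restore the previous layout

all.vars(fmla)   # variables in the formula, response first
```

The formula form matches the way models are specified in functions such as lm, which may make it the more convenient habit in a regression course.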

Page 18: STATS 330: Lecture 4

Colours, points, etc

par(bg="darkblue")
plot(cherry.df$height, cherry.df$volume,
     xlab="Height (feet)",
     ylab="Volume (cubic feet)",
     main="Volume versus height for 31 black cherry trees",
     pch=19, fg="white",
     col.axis="lightblue", col.main="white", col.lab="white",
     col="white", cex=1.3)

Type ?par for more info

Page 19: STATS 330: Lecture 4

[Figure: “Volume versus height for 31 black cherry trees” drawn with the colour settings above; x-axis Height (feet), y-axis Volume (cubic feet)]

Page 20: STATS 330: Lecture 4

Lines

Suppose we want to join up the points for each rat on the rats plot (see data on the next slide).

We could try

plot(rats.df$day, rats.df$growth, type="l")

but this won’t work: points are plotted in the order they appear in the data frame, and each point is joined to the next

Page 21: STATS 330: Lecture 4

Rats: the data

> rats.df
   growth group rat change day
1     240     1   1      1   1
2     250     1   1      1   8
3     255     1   1      1  15
4     260     1   1      1  22
5     262     1   1      1  29
6     258     1   1      1  36
7     266     1   1      2  43
8     266     1   1      2  44
9     265     1   1      2  50
10    272     1   1      2  57
11    278     1   1      2  64
12    225     1   2      1   1
13    230     1   2      1   8

... More data

Page 22: STATS 330: Lecture 4

[Figure: line plot of growth (y-axis) against day (x-axis) produced by the command above]

Don’t want this!

Page 23: STATS 330: Lecture 4

Solution

Various solutions, but one is to plot each line separately, using subsetting

plot(day, growth, type="n")     # draw axes, labels only
lines(day[rat==1], growth[rat==1])
lines(day[rat==2], growth[rat==2])

and so on …. (boring!), or (better)

for(j in 1:16){
  lines(day[rat==j], growth[rat==j])
}
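The sketch below repeats the idea on a small made-up data set, so it runs without rats.df and without attach. The data frame toy.df and all its values are hypothetical; only the long layout (one row per rat per day) mirrors the course data.

```r
# Hypothetical toy data in the same long layout as rats.df (values made up).
toy.df <- data.frame(
  rat    = rep(1:3, each = 4),
  day    = rep(c(1, 8, 15, 22), times = 3),
  growth = c(240, 250, 255, 260,
             225, 230, 230, 232,
             245, 250, 260, 255)
)

# Why type = "l" on the whole data frame fails: day jumps backwards
# each time a new rat starts, and the line follows those jumps.
backward.jumps <- sum(diff(toy.df$day) < 0)
backward.jumps   # 2: the day counter restarts twice

# Draw axes and labels only, then add one line per rat
plot(toy.df$day, toy.df$growth, type = "n", xlab = "day", ylab = "growth")
for (j in unique(toy.df$rat)) {
  one.rat <- toy.df[toy.df$rat == j, ]   # rows for rat j only
  lines(one.rat$day, one.rat$growth)
}
```

Looping over unique(toy.df$rat) rather than a fixed range such as 1:16 means the code keeps working if the number of rats changes.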

Page 24: STATS 330: Lecture 4

Indicating groups

Want to plot the litters with different colours, add a legend:

Rats 1-8 are litter 1, 9-12 litter 2, 13-16 litter 3

plot(day, growth, type="n")

for(j in 1:8) lines(day[rat==j], growth[rat==j], col="white")     # litter 1
for(j in 9:12) lines(day[rat==j], growth[rat==j], col="yellow")   # litter 2
for(j in 13:16) lines(day[rat==j], growth[rat==j], col="purple")  # litter 3

The col argument sets the colour of each line

Page 25: STATS 330: Lecture 4

legend

legend(13, 380,
       legend=c("Litter 1", "Litter 2", "Litter 3"),
       col=c("white","yellow","purple"),
       lwd=c(2,2,2), horiz=TRUE, cex=0.7)

(Type ?legend for a full explanation of these parameters)
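A possible variation, sketched below on made-up data, stores the litter in the data frame and looks the colour up from a vector, so one loop handles every group. The data frame toy.df, its values, the litter column and the colour choices are all hypothetical; the colours are chosen to show on the default white background rather than the dark one used in the slides.

```r
# Hypothetical toy data (values made up): 4 rats in 3 litters.
toy.df <- data.frame(
  rat    = rep(1:4, each = 3),
  litter = rep(c(1, 1, 2, 3), each = 3),
  day    = rep(c(1, 8, 15), times = 4),
  growth = c(240, 250, 255,
             225, 230, 235,
             410, 415, 425,
             520, 530, 540)
)

litter.cols <- c("black", "blue", "red")   # colour for litters 1, 2, 3

plot(toy.df$day, toy.df$growth, type = "n", xlab = "day", ylab = "growth")
for (j in unique(toy.df$rat)) {
  one.rat <- toy.df[toy.df$rat == j, ]
  # Every row of one.rat has the same litter, so use the first
  lines(one.rat$day, one.rat$growth,
        col = litter.cols[one.rat$litter[1]], lwd = 2)
}

# A keyword position avoids having to pick x, y coordinates by eye
legend("topleft", legend = paste("Litter", 1:3),
       col = litter.cols, lwd = 2, horiz = TRUE, cex = 0.7)
```

Keeping the colours in one vector means the lines and the legend cannot drift out of step with each other.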

Page 26: STATS 330: Lecture 4

[Figure: growth (y-axis) against day (x-axis), one line per rat, coloured by litter, with a horizontal legend: Litter 1, Litter 2, Litter 3]

Page 27: STATS 330: Lecture 4

Points and text

x=1:25
y=1:25
plot(x, y, type="n")
points(x, y, pch=1:25, col="red", cex=1.2)

Page 28: STATS 330: Lecture 4

[Figure: the plotting symbols pch=1 to 25 drawn in red along the line y = x]

Page 29: STATS 330: Lecture 4

Points and text (3)

x=1:26
y=1:26
plot(x, y, type="n")
text(x, y, letters, col="blue", cex=1.2)

Page 30: STATS 330: Lecture 4

[Figure: the letters a to z drawn in blue along the line y = x]

Page 31: STATS 330: Lecture 4

Use of pos

x = 1:10
y = 1:10
plot(x, y)

position = rep(c(2,4), 5)               # pos=2: left of the point, pos=4: right
mytext = rep(c("Left","Right"), 5)
text(x, y, mytext, pos=position)
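The pos argument accepts four values. The small sketch below labels one point with each of them; the object names and the axis limits are chosen here only for illustration.

```r
# Four points in a row, each labelled using a different pos value
x <- 1:4
y <- rep(1, 4)
plot(x, y, xlim = c(0, 5), ylim = c(0, 2), pch = 19)

pos.labels <- c("1 = below", "2 = left", "3 = above", "4 = right")
text(x, y, pos.labels, pos = 1:4)
```

The extra room given by xlim and ylim keeps the labels at the edges from being clipped.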

Page 32: STATS 330: Lecture 4

Page 33: STATS 330: Lecture 4

Trellis

Must load trellis library first

library(lattice)

General form of trellis plots

xyplot(y~x|W*Z, data=some.df)

Don’t need to use the $ form: trellis functions can pick out the variables, given the data frame

Page 34: STATS 330: Lecture 4

Main trellis functions

dotplot       for dotplots; use when X is categorical, Y is continuous
bwplot        for boxplots; use when X is categorical, Y is continuous
xyplot        for scatter plots; use when both X and Y are continuous
equal.count   use to turn a continuous conditioning variable into groups
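The sketch below calls the three plotting functions on an assumed stand-in for cherry.df built from the built-in trees data. The categorical variable size is hypothetical: it is made here by cutting height at its median, purely so that dotplot and bwplot have a categorical X to work with.

```r
library(lattice)

# Stand-in for cherry.df (an assumption): built-in 'trees' data, renamed.
cherry.df <- setNames(trees, c("diameter", "height", "volume"))

# Hypothetical categorical variable: short vs tall trees
cherry.df$size <- cut(cherry.df$height,
                      breaks = c(-Inf, median(cherry.df$height), Inf),
                      labels = c("short", "tall"))

p.dot <- dotplot(volume ~ size, data = cherry.df)   # X categorical
p.box <- bwplot(volume ~ size, data = cherry.df)    # X categorical
p.xy  <- xyplot(volume ~ height, data = cherry.df)  # both continuous

# Trellis functions return objects; printing one draws it
print(p.box)
```

At the R prompt a trellis plot is drawn automatically, but inside a function, a loop or a sourced script the explicit print() is needed.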

Page 35: STATS 330: Lecture 4

Changing background colour

To change trellis background to white

trellis.par.set(background = list(col="white"))

To change plotting symbols

trellis.par.set(plot.symbol = list(pch=16, col="red", cex=1))

Page 36: STATS 330: Lecture 4

Equal.count

xyplot(volume~height|diameter, data=cherry.df)

[Figure: trellis plot of volume against height with a separate panel for each distinct value of diameter]

Page 37: STATS 330: Lecture 4

Equal.count (2)

diam.gp<-equal.count(diameter,number=4,overlap=0)
xyplot(volume~height|diam.gp, data=cherry.df)

[Figure: trellis plot of volume against height in four panels, one for each diam.gp interval]
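The sketch below repeats the equal.count step on an assumed stand-in for cherry.df (the built-in trees data, renamed), using cherry.df$diameter so that it runs without attach, and prints the intervals that define the groups.

```r
library(lattice)

# Stand-in for cherry.df (an assumption): built-in 'trees' data, renamed.
cherry.df <- setNames(trees, c("diameter", "height", "volume"))

# Four non-overlapping diameter groups of roughly equal size
diam.gp <- equal.count(cherry.df$diameter, number = 4, overlap = 0)
levels(diam.gp)   # the diameter interval covered by each group

p <- xyplot(volume ~ height | diam.gp, data = cherry.df)
print(p)          # one panel per group
```

With overlap greater than 0, neighbouring groups would share some trees, which smooths the change from panel to panel but makes the panels harder to read as separate groups.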

Page 38: STATS 330: Lecture 4

Changing plotting symbols

To change plotting symbols

trellis.par.set(plot.symbol = list(pch=16, col="red", cex=1))

Page 39: STATS 330: Lecture 4

[Figure: trellis plot of volume against height in four diam.gp panels]

Page 40: STATS 330: Lecture 4

Non-trellis version

coplot(volume~height|diameter, data=cherry.df)

[Figure: coplot of volume against height, given diameter]

Page 41: STATS 330: Lecture 4

Non-trellis version (2)

coplot(volume~height|diameter, data=cherry.df, number=4, overlap=0)

[Figure: coplot of volume against height in four non-overlapping diameter groups]
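To see which diameter ranges the panels correspond to, the sketch below calls co.intervals, the base R function that computes conditioning intervals from number and overlap, before drawing the coplot. The data frame is an assumed stand-in built from the built-in trees data.

```r
# Stand-in for cherry.df (an assumption): built-in 'trees' data, renamed.
cherry.df <- setNames(trees, c("diameter", "height", "volume"))

# Lower and upper diameter limits for each of the four groups
diam.int <- co.intervals(cherry.df$diameter, number = 4, overlap = 0)
diam.int

coplot(volume ~ height | diameter, data = cherry.df,
       number = 4, overlap = 0)
```

The defaults are number = 6 and overlap = 0.5, which is why the first coplot call in these slides produces more panels, with some trees appearing in more than one of them.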

Page 42: STATS 330: Lecture 4

Other useful functions

Regular R
• scatterplot3d (3d scatter plot, load library scatterplot3d)
• contour, persp (draws contour plots, surfaces)
• pairs

Trellis
• cloud (3d scatter plot)
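Two of these functions can be tried without installing anything beyond a standard R installation. The sketch below uses an assumed stand-in for cherry.df built from the built-in trees data; the choice of volume as the vertical axis in cloud is an illustration only.

```r
library(lattice)

# Stand-in for cherry.df (an assumption): built-in 'trees' data, renamed.
cherry.df <- setNames(trees, c("diameter", "height", "volume"))

# Scatterplot matrix: every variable plotted against every other
pairs(cherry.df)

# Trellis 3d scatter plot: z ~ x * y
p3d <- cloud(volume ~ diameter * height, data = cherry.df)
print(p3d)
```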

Page 43: STATS 330: Lecture 4

Rotating plots

You need to install the R330 package

Create a data frame e.g. called data.df with the response in the first column

Then, type

reg3d(data.df)