

1

Experimental epidemiology analyses with R and R commander

Lars T. Fadnes

Centre for International Health

University of Bergen


3

How to install R commander?

- install.packages("Rcmdr", dependencies=TRUE)

- Download the necessary web files and put them in a folder:

http://statistics.fadnes.net/epi

– Fieldtrials12.RData

– Exercises-for-the-exp-epi-2012-03.rtf

4

Installation of RcmdrPlugin.Cih.Epi

• Install R (if not done already)

– http://www.r-project.org/

• Install package Rcmdr (if not done already)

– Packages -> Install package(s) -> scroll down to Rcmdr -> click OK

• Install package Epi

– Packages -> Install package(s) -> scroll down to Epi -> click OK

• Download RcmdrPlugin.Cih.Epi from http://statistics.fadnes.net/epi/

• Install package(s) from local zip file (choose the package you just downloaded)

• Open the program

– Load package Rcmdr

– Tools -> Load Rcmdr plug-in(s) -> RcmdrPlugin.Cih.Epi

• Click OK and Yes

• Now you are ready
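Before going through the installation steps, it can help to see which of the packages are already present. The following is a minimal sketch, assuming the three package names from this slide; the variable names `needed` and `missing_pkgs` are made up for illustration, and the install call is left commented out because it needs an internet connection.

```r
# Package names taken from the slide; the plug-in is installed from a local
# file, so it is only checked here, never downloaded.
needed <- c("Rcmdr", "Epi", "RcmdrPlugin.Cih.Epi")

# installed.packages() lists everything in the current R library
already_there <- needed %in% rownames(installed.packages())
missing_pkgs <- needed[!already_there]
print(missing_pkgs)

# Uncomment to install the missing CRAN packages (requires internet access):
# install.packages(setdiff(missing_pkgs, "RcmdrPlugin.Cih.Epi"),
#                  dependencies = TRUE)
```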

5

Aim for this session

• Introduce a brilliant tool

• Analyse a dataset with the tool

• Guide to further knowledge

6

What is R?

• R is a free software environment that includes a set of base packages for graphics, math, and statistics.

• You can make use of specialized packages contributed by R users or write your own new functions.

7

Why R?

• Very powerful

• Developing extremely quickly

• Working on different platforms (not only Microsoft Windows…)

• Free of all costs

8

Why doesn’t everyone use R?

9

What is R commander?

11

Why R commander?

• Powerful

• Free of all costs

• Working on different platforms (not only Microsoft Windows…)

• Easy to learn and to use…

12

How to install R?

For Windows:

• http://cran.r-project.org/bin/windows/base/

– Easy

• If you are here at UiB, the IT department will add it for you if you just ask them

For Linux (Ubuntu etc.):

- Very easy

- Just search for r-base-core in Synaptic Package Manager and add it

- http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html (also contains a good description for installation on Mac)

13

How to install R commander?

- Is already installed on UiB computers

- If not:

install.packages("Rcmdr", dependencies=TRUE)

14

Some things to note first

• R is case-sensitive – help, Help, HELP and HELF are different…

– Recommendation: choose one style and stick to it

• If there is something you don’t know:

– There is lots of good information on the web, particularly for R
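The case-sensitivity point can be tried directly in the console. This is a small sketch with made-up object names (`childage`, `ChildAge`), showing that two spellings that differ only in capitalisation are two separate objects.

```r
childage <- 12   # all lower case
ChildAge <- 24   # a different object, because the capitalisation differs

print(childage)
print(ChildAge)

# No object was ever created under this third spelling
print(exists("CHILDAGE"))
```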

15

How to start?

• Open R

– Load package Rcmdr

– or write library(Rcmdr)

16

Menu

• File menu

– Items for loading and saving script files, for saving output and the R workspace, and for exiting.

• Edit menu

– Items (Cut, Copy, Paste, etc.) for editing the contents of the script and output windows.

• Data

– Submenus containing menu items for reading and manipulating data.

• Statistics

– Submenus containing menu items for a variety of basic statistical analyses.

17

Menu

• Graphs

– Menu items for creating simple statistical graphs.

• Models

– Menu items and submenus for obtaining numerical summaries, confidence intervals, hypothesis tests, diagnostics, and graphs for a statistical model, and for adding diagnostic quantities, such as residuals, to the data set.

• Distributions

– Probabilities, quantiles, and graphs of standard statistical distributions (to be used, for example, as a substitute for statistical tables) and samples from these distributions.

• Tools

– Menu items for loading R packages unrelated to the Rcmdr package (e.g., to access data saved in another package), and for setting some options.

• Help menu

– Items to obtain information about the R Commander (including its introductory manual). In addition, each R Commander dialog box has a Help button.

18

• Script Window

– R commands generated by the R Commander

– You can also type R commands directly into the script window or the R Console

– The main purpose of the R Commander, however, is to avoid having to type commands.

• Output Window

– Printed output

• Messages Window

– Displays error messages, warnings, and notes

• Graphics Device window

– When you create graphs, these will appear in a separate window outside of the main R Commander window.

19

Available functions:


21

Save dataset under your documents folder

Files and documents are available at

http://statistics.fadnes.net/epi

22

Let’s get started…

• Change directory (under File)

– Find the folder where you placed your data file

• Import data

– Give it the name: fieldtrials

• Save workspace as

– Give a name to your file

– The file contains the dataset and any models you might have generated
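The same three steps can also be written as commands. The sketch below is only an illustration: it uses a temporary folder and a tiny made-up data frame instead of the course file, and the file names `toy-fieldtrials.RData` and `my-workspace.RData` are hypothetical.

```r
# Stand-in for the folder where you placed your data file (hypothetical)
data_dir <- tempdir()
old_dir <- setwd(data_dir)      # same effect as Change directory under File

# Toy data set standing in for the course data
fieldtrials <- data.frame(id = 1:4, childage = c(8, 14, 20, 30))
save(fieldtrials, file = "toy-fieldtrials.RData")
rm(fieldtrials)

# load() restores the saved objects and returns their names
loaded <- load("toy-fieldtrials.RData")
print(loaded)

# Save everything in the workspace (data sets and any models) to one file
save.image("my-workspace.RData")

setwd(old_dir)                  # go back to the previous folder
```

Working in a temporary folder keeps the sketch from touching your real files; in practice you would point `setwd()` at your own documents folder.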

23

Data Types

• Vectors

– Quantitative difference (one vs. two apples)

– Include continuous (numerical) variables

– Numeric variables are coded as vectors by default

• Factors

– Qualitative difference (apples vs. pears)

– Categorical

– Text variables are coded as factors by default

• Matrices, lists, arrays and data frames

http://www.statmethods.net/input/datatypes.html
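The difference between a numeric vector and a factor can be inspected with `class()` and `levels()`. The values below are made up for illustration. One caveat worth hedging: since R 4.0.0, `data.frame()` keeps text columns as character unless `stringsAsFactors = TRUE` is given, so the conversion is requested explicitly here.

```r
childage <- c(8, 14, 20, 30)                              # numeric vector
gender <- factor(c("male", "female", "female", "male"))   # factor

print(class(childage))
print(class(gender))
print(levels(gender))     # the categories, in alphabetical order

# Arithmetic makes sense for a numeric vector, not for a factor
print(mean(childage))

# Ask explicitly for text columns to become factors
d <- data.frame(childage,
                gender = c("male", "female", "female", "male"),
                stringsAsFactors = TRUE)
str(d)
```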

24

Variables in dataset - define the data types

id                  ID number
gender              Male/female (factor)
treatmentarm        Treatment (1 = zinc, 0 = placebo) (factor)
childage            Age of the child in months (vector)
breastfed           Is the child breastfed? (factor)
lentils             Does the child eat lentils? (0 = no, 1 = yes) (factor)
meat                Does the child eat meat? (0 = no, 1 = yes) (factor)
duration            Duration of diarrhoea in days (vector)
diarsev             Severe diarrhoea, ≥10 stools per day (factor)
fever               Did the child have fever at enrollment? (factor)
clusterzn2/4/8/16   Cluster variables identifying living areas (coded as vector, but need to be transformed into factors)

25

How to save?

• Save R workspace as…

– This will save your data (in the R format)

• Save output as…

– This will save your output

– Another strategy is to cut and paste what you want to save

• Always save the commands (syntax)

– Essential if you want to re-run the analyses later

– WordPad is a better option than Word etc. (it does not autocorrect, e.g. change letters to upper case)

26

How to write and run a command?

• Simply write the command in the script window, mark it and click ’Submit’ or press Ctrl+R

27

Nice to know:

• When writing comments in the syntax, start the line with the sign #

– R will then not consider the line as a command

• If you are uncertain about a function, use Google or help(name-of-function)

28

– The cluster variables contain numbers and are by default coded as vectors, but they need to be recoded into factors (categorical variables used for grouping etc.)
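In command form, the conversion is a single call to `as.factor()`. This is a sketch on made-up values; the column name `clusterzn4` is assumed from the variable list (clusterzn2/4/8/16) and may differ in the actual data file.

```r
# Toy data: cluster numbers stored as ordinary numbers
fieldtrials <- data.frame(clusterzn4 = c(1, 2, 3, 4, 1, 2))
print(class(fieldtrials$clusterzn4))

# Convert to a factor so the numbers are treated as group labels
fieldtrials$clusterzn4 <- as.factor(fieldtrials$clusterzn4)
print(levels(fieldtrials$clusterzn4))
print(table(fieldtrials$clusterzn4))   # number of children per cluster
```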

29

Now we’re ready to answer some scientific questions…

30

Compute new variable

• Child age in years

• Child age is now given in months

• childageyear = childage / 12
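Written as a command, the new variable is added as a column of the data set. The sketch below uses a toy data frame with made-up ages rather than the course data.

```r
# Toy data: child age in months
fieldtrials <- data.frame(childage = c(6, 12, 18, 30))

# New variable: child age in years
fieldtrials$childageyear <- fieldtrials$childage / 12
print(fieldtrials)
```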

31

Does the new variable look reasonable?

• View data set

32

Summarize variable

• Calculate mean, median and standard deviation for childageyear

– for each intervention arm (treatmentarm)

• Use: Numerical summaries
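For comparison, the same summaries can be computed in base R. This is a sketch on made-up values, not the command the R Commander dialog itself generates: the ages are split by arm and the three statistics are computed for each group.

```r
fieldtrials <- data.frame(
  treatmentarm = factor(c("placebo", "zinc", "placebo", "zinc", "placebo", "zinc")),
  childageyear = c(0.5, 1.0, 1.5, 2.0, 2.5, 0.75)
)

# One vector of ages per treatment arm
by_arm <- split(fieldtrials$childageyear, fieldtrials$treatmentarm)

# Rows: statistics; columns: treatment arms
summ <- sapply(by_arm, function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
})
print(summ)
```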

33

Doing calculations for subsets by generating new datasets

34

• Placebo: treatmentarm == "placebo"

• Zinc: treatmentarm == "zinc"

Make the other subset by changing the syntax and run ('Submit') the syntax

zinc <- subset(fieldtrials3, subset=treatmentarm=="zinc")

placebo <- subset(fieldtrials3, subset=treatmentarm=="placebo")

35

– You can now easily change between the datasets

36

Make a histogram of childageyear

• First for the 'zinc' dataset

• Then for the 'placebo' dataset

• Do they look similar?

Note:

– The histograms will be printed in the R window (not inside R commander)

– Right click on the graph and you can copy it as a metafile to paste it into a document, print it or save it
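The two histograms can also be drawn with the base R function `hist()`. The data below are randomly generated stand-ins for the `zinc` and `placebo` subsets, so the shapes will not match the course data; the titles and axis labels are illustrative.

```r
set.seed(1)   # make the random toy data reproducible

# Stand-ins for the two subsets: 50 children each, aged 0.5 to 2.5 years
zinc <- data.frame(childageyear = runif(50, min = 0.5, max = 2.5))
placebo <- data.frame(childageyear = runif(50, min = 0.5, max = 2.5))

# hist() draws the plot and also returns the bin counts
h_zinc <- hist(zinc$childageyear,
               main = "Zinc arm", xlab = "Child age (years)")
h_placebo <- hist(placebo$childageyear,
                  main = "Placebo arm", xlab = "Child age (years)")

print(h_zinc$counts)
print(h_placebo$counts)
```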

37

• Does it look normally distributed?

[Figure: histogram of placebo$childageyear; x-axis from 0.5 to 2.5, y-axis: frequency from 0 to 40]

38

Box plot

• Box plot for childageyear

– By treatmentarm

– (first remember to select the complete fieldtrials dataset)

[Figure: box plot of childageyear (y-axis, 0.5 to 2.5) by treatmentarm (x-axis: placebo, zinc)]
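A box plot by group can be sketched in base R with the formula interface of `boxplot()`. The ages below are made up for illustration and only the variable names follow the slides.

```r
fieldtrials <- data.frame(
  treatmentarm = factor(rep(c("placebo", "zinc"), each = 5)),
  childageyear = c(0.6, 1.1, 1.4, 1.9, 2.4, 0.7, 1.0, 1.5, 1.8, 2.3)
)

# One box per treatment arm; the return value holds the plotted statistics
bp <- boxplot(childageyear ~ treatmentarm, data = fieldtrials,
              xlab = "treatmentarm", ylab = "childageyear")
print(bp$names)
print(bp$stats)   # per box: lower whisker, hinges, median, upper whisker
```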

39

Is the baseline child age different in the zinc and placebo arms?

• This can be checked with a robust test that does not assume a normal distribution

• Check with a non-parametric test

» Two-sample Wilcoxon test (rank-sum test)

» Use the ’Exact’ test
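In command form, the two-sample Wilcoxon test is available as `wilcox.test()` in base R. This is a sketch on made-up ages chosen without ties, because R cannot compute an exact p-value when values are tied.

```r
fieldtrials <- data.frame(
  treatmentarm = factor(rep(c("placebo", "zinc"), each = 6)),
  childageyear = c(0.6, 0.9, 1.3, 1.7, 2.1, 2.4,
                   0.7, 1.0, 1.2, 1.6, 2.0, 2.3)
)

# Compare child age between the two arms; exact = TRUE asks for the exact test
res <- wilcox.test(childageyear ~ treatmentarm, data = fieldtrials,
                   exact = TRUE)
print(res)
```

A large p-value here would suggest no evidence of a baseline age difference between the arms; with the real data the result may of course differ.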

40

Recoding variables

Diarrhoea duration can be recoded into an additional categorised variable (diarlong)

– Data -> Manage variables in active data set -> Recode variables

• value = "factor"

– The factor can be either a number or a word

• value, value, value = "factor"

– Values listed with commas

• value:value = "factor"

– A range from the lowest to the highest value

• else: all other values

• NA: missing
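The idea of recoding a duration into categories can also be sketched in base R with `cut()`. Note that this is a swapped-in technique, not the command the recode dialog generates, and both the cut-off of 7 days and the labels "short" and "long" are assumptions made for illustration.

```r
# Toy data: duration of diarrhoea in days, with one missing value
fieldtrials <- data.frame(duration = c(2, 5, 7, 10, 14, NA))

# Up to 6 days becomes "short", 7 days or more becomes "long" (assumed cut-off)
fieldtrials$diarlong <- cut(fieldtrials$duration,
                            breaks = c(-Inf, 6, Inf),
                            labels = c("short", "long"))
print(fieldtrials)
print(table(fieldtrials$diarlong, useNA = "ifany"))
```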


42

Are there differences in syntax between R and R commander?

• A few:

– Commands that extend over more than one line should have the second and subsequent lines indented by one or more spaces or tabs; all lines of a multiline command must be submitted simultaneously for execution.

– Commands that include an assignment arrow (<-) will not generate printed output, even if such output would normally appear had the command been entered in the R Console [the command print(x <- 10), for example]. On the other hand, assignments made with the equals sign (=) produce printed output even when they normally would not (e.g., x = 10).

– Commands that produce normally invisible output will occasionally cause output to be printed in the output window. This behaviour can be modified by editing the entries of the log-exceptions.txt file in the R Commander’s etc directory.

– Blocks of commands enclosed by braces, i.e., {}, are not handled properly unless each command is terminated with a semicolon (;). This is poor R style, and implies that the script window is of limited use as a programming editor. For serious R programming, it would be preferable to use the script editor provided by the Windows version of R itself, or – even better – a programming editor.

43

True or false quiz

• R commander only works on MS Windows?

• R is case sensitive (difference with small and large letters)

• R is built by a few people with a secret source-code?

• R was the program that gave their shareholders most profit last year?

• There are a lot of enthusiastic people working with R providing help to their peers in the R forum?

44

R help forum:

• http://r.789695.n4.nabble.com/

45

Further reading:

• The R Commander: A Basic-Statistics Graphical User Interface to R, John Fox, 2005

– http://www.jstatsoft.org/v14/i09/paper

• Getting started with the R Commander: a basic-statistics graphical user interface to R

– http://socserv.mcmaster.ca/jfox/Getting-Started-with-the-Rcmdr.pdf

• Quick-R: magnificent guide

– http://www.statmethods.net/

• R manuals

– http://cran.r-project.org/manuals.html

46

• You have learnt some basic skills and can now experiment with the program yourself

47

Questions and comments
