Page 1: R: Statistics? Programme? and Who are You? -- An ABC introduction to R Presented by Guohui Ding R&D, SIBS, CAS 8 Sept, 2004 For Fudan University

[Title-slide figures: "Simple Use of Color In a Plot" (label: "Just a Whisper of a Label"); "sin and cos" against Phase Angle; "Edgar Anderson's Iris Data" scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width; "Math can be beautiful ..."]

R: Statistics? Programme? and Who are You?

-- An ABC introduction to R

Presented by Guohui Ding

R&D, SIBS, CAS
8 Sept, 2004
For Fudan University

Page 2

Main Topics Today

• What is R?

• How to administer R?

• How does R work?

• How to apply R to statistical problems?

• How to program your own R functions?

• ………

Page 3

What is R?

A brief history of R

[Figures: "Maunga Whau Volcano" (axes x, y); "A Topographic Map of Maunga Whau"]

Page 4

The legend of R

• R started in the early 1990s as a project by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, intended to provide a statistical environment in their teaching lab. The lab had Macintosh computers, for which no suitable commercial environment was available.

[Photos: Robert Gentleman, Ross Ihaka]

Page 5

R’s Parents (1)

• The S language

– S: an interactive environment for data analysis developed at Bell Laboratories since 1976

– Exclusively licensed by AT&T/Lucent to Insightful Corporation, Seattle WA. Product name: “S-plus”.

You can learn more from: http://cm.bell-labs.com/cm/ms/departments/sia/S/history.html

My father is S, my mother is Scheme, but why is my name “R”?

Page 6

R’s Parents (2)

• The Scheme language
Scheme is a statically scoped and properly tail-recursive dialect of the Lisp programming language invented by Guy Lewis Steele Jr. and Gerald Jay Sussman.

Learn more: http://swiss.csail.mit.edu/projects/scheme/

• Scheme’s underlying semantics + S’s syntax = R

“We have named our language R, in part to acknowledge the influence of S and in part to celebrate our own efforts.”

-- Ihaka R. & Gentleman R., 1996
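
The Scheme side of R's parentage shows up most clearly in static (lexical) scoping: a function remembers the environment in which it was created. A minimal sketch of that idea; make_counter is a hypothetical name used only for illustration.

```r
# make_counter is a hypothetical helper, not part of R.
make_counter <- function() {
  count <- 0                 # lives in the environment of this particular call
  function() {
    count <<- count + 1      # updates the enclosing variable, not a global one
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()
counter_a()
counter_a()
first_b <- counter_b()       # counter_b keeps its own, independent count
```

Each call to make_counter() creates a fresh environment, which is why the two counters do not interfere.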

Page 7

R Now

• Since mid-1997 there has been a core group who can modify the R source code CVS archive.

• The R package system CRAN (the Comprehensive R Archive Network)

http://www.r-project.org

Page 8

The characteristics of R

• R is “GNU S”: a language and environment for data manipulation, calculation and graphical display.
– That is, R is Free Software (or open-source software).
(Here, Free refers to freedom, not price, although R is free in that sense as well.)

• The core of R is an interpreted computer language.
– A mosaic of procedural programming and object-oriented programming
– Good interface to procedures written in C, C++, FORTRAN and other languages
– A flexible data exchange mechanism for accessing relational databases: ODBC, PostgreSQL, MySQL and so on.

-- A negotiation between a thief and a robber

Page 9

R and Statistics

• Most packages deal with statistics and data analysis.

• Powerful statistical graphics.

• Exchanges data well with other statistical software.

• Most R users are statistical experts. You can learn modern analysis methods from them by email.

• When you come across a problem that nobody has solved before, you can write the solution yourself.

Page 10

Install and administer R

Focus on Windows(MS)

[Figure: volcano (axes: row, column)]

Page 11

How do I get R?

Using Precompiled Binary Distributions

• The informational web site: http://www.r-project.org/
• CRAN, the Comprehensive R Archive Network.
– The primary site is http://cran.r-project.org/. Mirror sites are available for many countries.
– CRAN sites have binary distributions for Windows 95, 98, ME, NT4, 2000 and XP on Intel, for the Macintosh (System 8.6 to 9.1 and MacOS X), and for several Linux distributions.
• New releases occur frequently, about every 3 months. Be prepared to re-install frequently.
• You can also get it from your friends, teachers, etc.

Download it! It is about 20.6 MB in size.

Page 12

Installing R

• Double-click “rw1091.exe”. That is all: it installs like any other standard Windows program.

Page 13

R Console/RGui in Windows(MS)

[Screenshot labels: Command box, Graphics box, Menu, Icons]

Page 14

Several concepts in administering R

• Workspace
– xxx.RData
• History
– xxx.Rhistory
• Package
• Object
• Session
• Console

[Menu labels: Run your R code; Load/save workspace; Load/save history; Change your working directory]


Page 15

Add a new package

• Commands:
– library() load an installed package into the session
– detach(package:xxx) detach a package

• All of this can be done in the GUI (except detach())

[Menu labels: Load a local package; Install packages from the internet or from local files; Update local packages from the internet]
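
A minimal sketch of the load/detach cycle described above. It uses the tools package only because it ships with every R installation; any installed package would do.

```r
library(tools)                                    # attach an installed package
attached <- "package:tools" %in% search()         # search() lists what is attached
detach("package:tools")                           # take it off the search path again
still_attached <- "package:tools" %in% search()

# Installing from CRAN needs a network connection, so it is not run here:
# install.packages("xxx")
```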

Page 16

Packages in R Environment

• Basic packages
– "package:methods" "package:stats" "package:graphics" "package:utils" "package:base"

• Recommended packages
– grid; lattice; e1071…

• Contributed packages (more than 366 packages nowadays)
– ……

You can see which packages are loaded now with the command search().

Page 17

Don’t lose your way!

• Three useful system commands
– getwd() Get Working Directory
– setwd() Set Working Directory
– list.files() List the Files in a Directory/Folder
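
A short sketch of the three commands working together. The scratch directory and the file name example.txt are assumptions made for the demonstration only.

```r
old_dir <- getwd()              # remember where we started
setwd(tempdir())                # move to a scratch directory (any writable folder works)
file.create("example.txt")      # hypothetical file, created only for this demo
files_here <- list.files()      # the listing now includes "example.txt"
file.remove("example.txt")
setwd(old_dir)                  # go back to the original working directory
```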

Page 18

Show the Demonstrations of the Packages/Functions

• Commands
– demo() Demonstrations of R Functionality
– example() Run an Examples Section from the Online Help

Page 19

Getting Help

• Several commands
– help.start()
– help() or ?
– help.search()
– apropos()

• Internet searching
– I like it very much. It seems omnipotent.

Page 20

Quit R

• Command
– q() Terminate an R Session

Page 21

How does R work?

Basic R Structure and data manipulation

[Figure: clusplot(clara(x = xclara, k = 3, keep.data = FALSE)); axes Component 1 and Component 2. These two components explain 100 % of the point variability.]

Page 22

Basic R workflow (object orientation)

[Diagram (label: package)]

-- R for Beginners. Emmanuel Paradis

Page 23

Object orientation

• Object: a collection of atomic variables and/or other objects that belong together

• Parlance:
– class: the “abstract” definition of it
– object: a concrete instance
– method: another word for ‘function’
– slot: a component of an object
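
The four terms above can be sketched with the formal (S4) classes from the methods package. The class name Gene, its slots and the generic describe are hypothetical, chosen only to illustrate the vocabulary.

```r
library(methods)

# class: the abstract definition, with two slots
setClass("Gene", representation(name = "character", length = "numeric"))

# object: a concrete instance of the class
g <- new("Gene", name = "abc1", length = 1200)

# method: a function attached to a generic for this class
setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "Gene", function(obj) {
  paste(obj@name, obj@length, "bp")   # slots are read with @
})

describe(g)
```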

Page 24

Types of Data in R

• The basic data object is a vector of elements of type:
– numeric: numbers, either floating point or integer
– character: each element is a character string
– logical: each element is TRUE or FALSE
– list: elements can be any type of object, including other lists

• Components of the S language, such as functions, are also objects with a mode.
• Any vector can include the missing data marker NA as an element.
• All vectors have a length and a mode. The functions length and mode return this information, as does the str function.
• A structure consists of a data object plus additional information. Matrices (or arrays, in general) and time series are examples of structures.
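
A small sketch of these ideas, using made-up values: each vector has a mode and a length, NA can sit inside any of them, and a matrix is a vector carrying an extra dim attribute.

```r
x <- c(1.5, NA, 3)            # numeric vector with a missing value
s <- c("a", "b")              # character vector
b <- c(TRUE, FALSE, NA)       # logical vector
l <- list(x, s, list(b))      # a list can hold any objects, including other lists

mode(x); mode(s); mode(b); mode(l)
length(x)                     # the NA still counts as an element
str(l)                        # compact display of the whole structure

m <- matrix(1:6, nrow = 2)    # a "structure": the data 1:6 plus a dim attribute
attributes(m)
```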

Page 25

Operators

Page 26

Vectors, Matrices and Arrays

• Commands:
– array(data = NA, dim = length(data), dimnames = NULL)
– matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
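
A brief sketch of both constructors with made-up data, showing what byrow and dimnames do.

```r
# byrow = TRUE fills the matrix row by row: 1 2 3 / 4 5 6
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
m["r2", "b"]                         # 5

# A three-dimensional array: 2 rows, 3 columns, 4 layers
a <- array(1:24, dim = c(2, 3, 4))
dim(a)
a[2, 3, 4]                           # 24, the last element
```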

Page 27

Lists

• List vs. Vector
– list: an ordered collection of data of arbitrary types.
– vector: an ordered collection of data of the same type.
– Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string). But both types support both access methods.
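
A sketch of both access styles on a vector and a list; the element names are invented for the example.

```r
v <- c(first = 10, second = 20)                # vector: every element is numeric
p <- list(name = "iris", n = 150L, ok = TRUE)  # list: mixed types

v[2]; v["second"]          # vector element by index or by name
p[["n"]]; p$n; p[[2]]      # the same list element, three ways
class(p["n"])              # single brackets return a (shorter) list
```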

Page 28

Factors

• Factors: classification variables

• If the levels of a factor are numeric (e.g. the treatments are labelled “1”, “2”, and “3”) it is important to ensure that the data are actually stored as a factor and not as numeric data. Always check this by using summary.
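
A sketch of the check suggested above, with made-up treatment codes: summary() reports quantiles for numeric data but counts per level for a factor.

```r
treatment_num <- c(1, 2, 3, 1, 2, 3)      # treatment labels stored as numbers
treatment_fac <- factor(treatment_num)    # the same labels stored as a factor

summary(treatment_num)    # min, quartiles, mean, max: treated as measurements
summary(treatment_fac)    # counts per level: treated as categories
levels(treatment_fac)
```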

Page 29

Data frames

• A data frame represents the typical data table that researchers come up with, like a spreadsheet.

– It is a rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types. (Internally it is a list of columns.)
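
A minimal sketch with invented columns, showing that each column keeps its own type and that the data frame is a list underneath.

```r
df <- data.frame(
  id     = 1:3,
  gene   = c("a", "b", "c"),
  active = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE    # keep text as character in every R version
)

str(df)               # one line per column, with its type
is.list(df)           # TRUE: a data frame is a list of equal-length columns
sapply(df, class)     # the class of each column
```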

Page 30

Subsetting

Individual elements of a vector, matrix, array or data frame are accessed with “[ ]” by specifying their index, or their name
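
A few hedged examples of “[ ]” on a vector, a matrix and a data frame. The small objects are made up; iris is the built-in data set shown on the title slide.

```r
v <- c(a = 1, b = 2, c = 3)
v[2]; v["b"]          # by index, by name
v[-1]                 # everything except the first element
v[v > 1]              # by logical condition

m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("x", "y", "z")))
m[1, ]                # first row
m[, "y"]              # column by name

# Rows by condition, columns by name
setosa <- iris[iris$Species == "setosa", c("Sepal.Length", "Species")]
head(setosa, 3)
```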

Page 31

Using R on Windows(MS)

Basic statistical analysis with R

[Figure: "Death Rates in Virginia" bar plot by age group (50-54 to 70-74) for Rural Male, Rural Female, Urban Male and Urban Female, with "Faked 95 percent error bars"; group means 60.35, 40.4, 25.88, 16.93 and 11.05]

Page 32

Data Input

• From the keyboard, one by one
– c(); scan()

• From a file
– read.table(); read.csv(); read.csv2(); read.dta(); read.spss(); … (read.dta() and read.spss() are in package foreign)

• By a spreadsheet
– data.entry()
– edit()
– fix()
– ……
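
A sketch of the first two input routes. So that it runs without typing or an existing file, scan() reads from a string and the CSV file is a temporary, made-up one.

```r
x <- c(2.1, 3.4, 5.0)                        # values typed in directly
y <- scan(text = "1 2 3 4", quiet = TRUE)    # scan() usually reads the keyboard or a file

csv_file <- tempfile(fileext = ".csv")       # hypothetical input file
writeLines(c("id,height", "1,1.62", "2,1.75"), csv_file)

heights <- read.csv(csv_file)                              # comma-separated, header assumed
tab     <- read.table(csv_file, header = TRUE, sep = ",")  # the same via read.table()
unlink(csv_file)                                           # clean up the temporary file
```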

Page 33

Data Edit

• Commands
– edit()
– fix()

Tip: edit() can open a Notepad-like editor in the RGui!

Page 34

Data Description

• Commands
– summary()
– mean()
– sd()
– hist()
– boxplot()
– ……

Page 35

Probability Distribution

Page 36

Four useful prefixes for probability distribution functions

• dxxx for the density

• pxxx for the CDF

• qxxx for the quantile function

• rxxx for simulation (random deviates)

Repeated calls to rxxx give different results! The seed is set by the system. You can set the seed yourself with set.seed().
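
A sketch of the four prefixes for the normal distribution (xxx = norm), plus the effect of set.seed(). The seed value 42 is arbitrary.

```r
dnorm(0)              # density of N(0, 1) at 0
pnorm(1.96)           # CDF: P(Z <= 1.96)
qnorm(0.975)          # quantile: the inverse of pnorm

set.seed(42)
first  <- rnorm(3)    # three random deviates
set.seed(42)
second <- rnorm(3)    # same seed, so the same three numbers
third  <- rnorm(3)    # seed not reset, so the stream continues with new numbers
identical(first, second)
```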

Page 37

Statistical Inference

• Commands
– qxxx() for the quantile function
– t.test()
– wilcox.test() (package stats)
– kruskal.test() (package stats)
– var.test(); shapiro.test(); qqnorm(); qqline()
– ……
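
A sketch of a few of these tests on the built-in sleep data set (two groups of ten measurements). The choice of data set is an assumption made for illustration; it is not taken from the talk.

```r
extra <- sleep$extra
group <- sleep$group

qt(0.975, df = 9)                            # critical t value for a 95% interval, 9 df

tt <- t.test(extra ~ group)                  # Welch two-sample t-test
tt$p.value

wt <- wilcox.test(extra ~ group, exact = FALSE)  # rank-based alternative (normal approximation)
kt <- kruskal.test(extra ~ group)                # generalises to more than two groups
vt <- var.test(extra ~ group)                    # compare the two variances
st <- shapiro.test(extra)                        # check normality of the pooled values
```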

Page 38

Analysis of variance and Regression Analysis

• Commands
– anova()
– lm()
– ……
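
A sketch of lm() and anova() on two built-in data sets, cars and PlantGrowth. These data sets are chosen here for illustration and are not from the talk.

```r
# Simple linear regression: stopping distance against speed
fit <- lm(dist ~ speed, data = cars)
coef(fit)                        # intercept and slope
summary(fit)$r.squared

# One-way analysis of variance: plant weight across three groups
fit_aov <- lm(weight ~ group, data = PlantGrowth)
aov_table <- anova(fit_aov)      # one row for group, one for residuals
aov_table
```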

Page 39

Experimental Design

• Commands
– sample()
– power.t.test()
– ……
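
A sketch of both commands: random allocation of ten units to two arms, and the sample size a two-sample t-test would need. The effect size, standard deviation and power are made-up inputs.

```r
set.seed(1)                                             # arbitrary seed, for reproducibility
arms <- rep(c("control", "treated"), each = 5)
assignment <- sample(arms)                              # random permutation = random allocation

# Sample size per group to detect a difference of 1 SD with 80% power
pw <- power.t.test(delta = 1, sd = 1, power = 0.8, sig.level = 0.05)
n_per_group <- ceiling(pw$n)                            # round up to whole subjects
```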

Page 40

Save Object/Data

• Every R object can be stored into and restored from a file with the commands “save” and “load”.

> save(x, file = "x.Rdata")
> load("x.Rdata")

• Importing and exporting data with rectangular tables in the form of tab-delimited text files.

> write.table(x, file = "x.txt", sep = "\t")
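
A round-trip sketch of both approaches. The data frame is made up, and the files go to a temporary directory so that nothing is left behind in the working directory.

```r
x <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
rdata_file <- file.path(tempdir(), "x.Rdata")
txt_file   <- file.path(tempdir(), "x.txt")

save(x, file = rdata_file)     # binary R format, keeps the object exactly
x_copy <- x
rm(x)
load(rdata_file)               # recreates x under its original name

# Tab-delimited text, readable by spreadsheets and other programs
write.table(x, file = txt_file, sep = "\t", row.names = FALSE, quote = FALSE)
x_back <- read.table(txt_file, header = TRUE, sep = "\t")
```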

Page 41

Graphics with R

Page 42

A Friendly R Environment -- Rcmdr

If you don’t like a command-line environment, the package Rcmdr may be a good choice!

Page 43

R programming (.R)

Program your own R code

[Figures: panels of Cube Root Ozone (cube root ppb) against Wind Speed (mph), Temperature (F) and radiation; a plot with the axis label "sines"]

Page 44

Control Flow

• if(cond) expr

• if(cond) cons.expr else alt.expr

• for(var in seq) expr

• while(cond) expr

• repeat expr

• break

• next

Page 45

Loops

• The main loop construct in R is for. The commonest use, as in C and other languages, is to count from 1 to n.
– for (i in 1:n) {
      ## do something
  }

Page 46

Leaving loops

• The break and next commands allow the flow of a loop to be altered
– break jumps out of the loop
– next jumps to the next iteration of the loop
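
A small sketch of next and break inside for and repeat loops. The numbers are arbitrary.

```r
evens <- c()
for (i in 1:10) {
  if (i %% 2 == 1) next      # odd number: skip to the next iteration
  if (i > 8) break           # past 8: leave the loop entirely
  evens <- c(evens, i)       # collects 2, 4, 6, 8
}

n <- 0
repeat {                     # repeat has no condition, so it needs a break
  n <- n + 1
  if (2^n > 100) break       # stops at the first power of 2 above 100
}
```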

Page 47

Avoiding Iteration

• The canonical bad R program looks like this
    ## multiply two vectors
    for (i in 1:n) {
        d[i] <- a[i] * b[i]
    }
    ## compute the inner product
    s <- 0
    for (i in 1:n) {
        s <- s + d[i]
    }

• The right way to do this is
– s <- sum(a * b)

• apply(); lapply(); sapply()
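
A runnable sketch comparing the loop with the vectorised form, followed by one call each to sapply(), apply() and lapply(). The vectors are made up.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Loop version, as in the "bad" program
s_loop <- 0
for (i in seq_along(a)) {
  s_loop <- s_loop + a[i] * b[i]
}

# Vectorised version: element-wise product, then sum
s_vec <- sum(a * b)                                    # 32

squares   <- sapply(1:3, function(i) i^2)              # apply a function to each element
col_means <- apply(matrix(1:6, nrow = 2), 2, mean)     # 2 = work column by column
lens      <- lapply(list(1:3, letters), length)        # always returns a list
```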

Page 48

Writing an R function

A function definition looks like

median <- function(x, na.rm = FALSE)
{
    …lots of code...
    ## a return value
}
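
To make the skeleton concrete, here is one possible way to fill it in. my_median is a hypothetical re-implementation written for this example, not the code of R's own median(); in practice use the built-in function.

```r
# my_median: hypothetical illustration of a function with a default argument.
my_median <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]                  # drop missing values on request
  } else if (any(is.na(x))) {
    return(NA_real_)                   # otherwise a missing value propagates
  }
  n <- length(x)
  if (n == 0) return(NA_real_)
  sorted <- sort(x)
  half <- (n + 1) %/% 2
  if (n %% 2 == 1) {
    sorted[half]                       # odd length: the middle value
  } else {
    mean(sorted[half + 0:1])           # even length: mean of the two middle values
  }
}

my_median(c(4, 1, 3, 2))
my_median(c(1, NA, 3), na.rm = TRUE)
```

The value of the last evaluated expression is returned, so an explicit return() is only needed for early exits.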

Page 49

More

• Packages

• Objects and methods

• Debugging and optimisation

• Connecting to other packages

• Interfaces to other programming languages or databases

R++? ++R!

Page 50

Some Resources

• A Course (the ppt is shown with the R Development Core Group)
– http://faculty.washington.edu/tlumley/Rcourse/

• A Paper (for citing R in a publication)
– Ihaka R. & Gentleman R. 1996. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 5: 299–314.

• Two URLs
– http://www.r-project.org
– http://www.ats.ucla.edu/stat/

• Several Books
– Using R for Data Analysis and Graphics: An Introduction. J.H. Maindonald
– An Introduction to R. The R Development Core Team
– simpleR: Using R for Introductory Statistics. John Verzani
– R for Beginners. Emmanuel Paradis
– The R Reference Manual: Base Package. The R Development Core Team

Page 51

Acknowledgements

PhD. Qi Liu
Prof. Naiqing Zhao
Prof. Gang Pei
Prof. Yixue Li
Everyone Here

Any Questions?