Introduction to R Programming

Post on 09-May-2015

Category:

Technology

DESCRIPTION

Quantitative Data Analysis - Part I: Introduction to R programming - Ma

TRANSCRIPT

Quantitative Data Analysis

Working with R

What is R

A computer language, with orientation toward statistical applications

Advantages

Completely free; just download it from the Internet

Many add-on packages for specialized uses

Open source

Getting Started: Installing R

Have an Internet connection
Go to http://cran.r-project.org/
On the "R for Windows" screen, click "base"
Find and click the "Download R" link
Click Run, OK, or Next on all screens
You end up with an R icon on the desktop

At http://cran.r-project.org/

Downloading Base R

Click on Windows
Then, in the next screen, click on "base"
Then click through the screens for Run, OK, or Next
And finally "Finish", which will put an R icon on the desktop

Rgui and R Console, ending with the R prompt (>)

The R prompt (>)

> This is the "R prompt." It says R is ready to take your command.

Enter these after the prompt and observe the output:

>2+3

>2^3+(5)

>6/2+(8+5)

>2 ^ 3 + (5)
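As a rough check on operator precedence, the commands above can be run as a short script. The values in the comments follow from ordinary arithmetic rules (powers first, then division, then addition).

```r
2 + 3            # 5
2^3 + (5)        # 13: the power is evaluated before the addition
6/2 + (8 + 5)    # 16: 6/2 is 3, and the bracket is 13
2 ^ 3 + (5)      # 13 again: extra spaces do not change the result
```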

Installing Packages and Libraries

install.packages("akima")
install.packages("chron")
install.packages("lme4")
install.packages("mcmc")
install.packages("odesolve")
install.packages("spdep")
install.packages("spatstat")
install.packages("tree")
install.packages("lattice")

Installing Packages and Libraries

R.version
installed.packages()
update.packages()
setRepositories()

Help

help(mean)
?mean
help will not find a function in a package unless you install the package and load it with library()
help.search("aspline") will find functions in packages that are installed but not loaded
apropos("lm")

Help

For help on a whole package:
help(package=akima)

To list the objects in a loaded package:
library("akima")
objects(grep("akima",search()))

The same, step by step:
my.packages <- search()
aki <- grep("akima",my.packages)
my.objects <- objects(aki)

Help

example(mean)

demo()
demo(package = .packages(all.available = TRUE))
demo(graphics)

vignette(all=TRUE)
V <- vignette("sp")
print(V)
edit(V)

Maintenance

ls() / objects()
search()
class(a)
rm(a,b,c)
rm(list=ls())

Maintenance

getwd()
setwd()
source("myprogram.R")
save(list = ls(all=TRUE), file= "all.Rdata")
load("all.Rdata")
save.image()
savehistory()

To cite use of R

To cite the use of R for statistical work, R documentation recommends the following: R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

Get the latest citation by typing citation() at the prompt.

Email Support Lists

http://r-project.org under "mailing lists"
r-help is the most general one
Before posting, read: http://www.R-project.org/posting-guide.html
Send the smallest possible example of your problem (generated data is handy)
sessionInfo() will list your computer and R details to cut and paste into your question

Quantitative Data Analysis

Programming with R

Basic concepts

Code
Commands
Programs
Objects
Types
Functions
Operators

Assignment

a <- 1
assign("b", 2)

Mathematical operators

+ - * / ^    arithmetic
> >= < <= == !=    relational
! & |    logical
$    list indexing (the 'element name' operator)
:    create a sequence
~    model formulae

Logical operators

!    logical NOT
&    logical AND
|    logical OR
<    less than
<=    less than or equal to
>    greater than
>=    greater than or equal to
==    logical equals (double =)
!=    not equal
&&    AND with IF
||    OR with IF
xor(x,y)    exclusive OR
isTRUE(x)    an abbreviation of identical(TRUE,x)
all(x)
any(x)
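A small sketch of how the element-wise operators differ from the single-value forms used with if. The vectors x and y are made up for illustration.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

x & y        # element-wise AND: TRUE FALSE FALSE
x | y        # element-wise OR:  TRUE TRUE TRUE
xor(x, y)    # exclusive OR:     FALSE TRUE TRUE
!x           # NOT:              FALSE TRUE FALSE

any(x)       # TRUE: at least one element is TRUE
all(x)       # FALSE: not every element is TRUE

# && and || work on single values, which is what if() expects
mixed <- any(x) && !all(x)
if (mixed) print("x contains both TRUE and FALSE")
```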

Mathematical functions

log(x)    log to base e of x
exp(x)    antilog of x (e^x)
log(x,n)    log to base n of x
log10(x)    log to base 10 of x
sqrt(x)    square root of x

factorial(x)    x!
choose(n,x)    binomial coefficients n!/(x!(n−x)!)
gamma(x)    Γ(x) for real x; (x−1)! for integer x
lgamma(x)    natural log of gamma(x)

Mathematical functions

floor(x)    greatest integer <= x
ceiling(x)    smallest integer >= x
trunc(x)    closest integer to x between x and 0
round(x, digits=0)    round the value of x to an integer
abs(x)    the absolute value of x, ignoring the minus sign if there is one
signif(x, digits=6)    give x to 6 significant digits
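The rounding functions differ mainly for negative numbers. A brief sketch with an arbitrary value:

```r
x <- -1.5
floor(x)      # -2: rounds toward minus infinity
ceiling(x)    # -1: rounds toward plus infinity
trunc(x)      # -1: rounds toward zero

r <- round(2.567, digits = 1)        # 2.6
s <- signif(123456.789, digits = 3)  # 123000: three significant digits kept
```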

Trigonometrical functions

cos(x)    cosine of x in radians
sin(x)    sine of x in radians
tan(x)    tangent of x in radians
acos(x), asin(x), atan(x)    inverse trigonometric transformations of real or complex numbers
acosh(x), asinh(x), atanh(x)    inverse hyperbolic trigonometric transformations of real or complex numbers

Infinity and Things that Are Not a Number

Inf (is.finite, is.infinite)

3/0

2 / Inf

exp(-Inf)

(0:3)^Inf

NaN (is.nan)

0/0
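The expressions on this slide evaluate as follows; the last three lines are additions for illustration.

```r
3/0            # Inf
2/Inf          # 0
exp(-Inf)      # 0
(0:3)^Inf      # 0 1 Inf Inf
0/0            # NaN

Inf - Inf                    # NaN: the difference is undefined
is.finite(c(1, Inf, NaN))    # TRUE FALSE FALSE
is.nan(c(1, Inf, NaN))       # FALSE FALSE TRUE
```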

Vectors

a <- c(1,2,3,4,5)
a <- 1:5
a <- scan()
a <- seq(1,10,2)
b <- 1:4
a <- seq(1,10,along=b)
x <- runif(10)
which(a == 2)
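A short sketch of what some of these calls return. The variable names odd, spread, and a5 are illustrative only, and scan() is left out because it waits for keyboard input.

```r
odd <- seq(1, 10, 2)                  # 1 3 5 7 9: from 1 to 10 in steps of 2
b <- 1:4
spread <- seq(1, 10, along.with = b)  # 1 4 7 10: as many values as b has elements

a5 <- 1:5
pos <- which(a5 == 2)                 # 2: the position where the condition holds

set.seed(1)       # fixes the random numbers so the run is repeatable
x <- runif(10)    # ten uniform random numbers between 0 and 1
```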

Plotting functions

x<-seq(-10,10,0.1)
y<-x^3
plot(x,y,type="l")

Vector functions

max(x)    maximum value in x
min(x)    minimum value in x
sum(x)    total of all the values in x
sort(x)    a sorted version of x
rank(x)    vector of the ranks of the values in x
order(x)    an integer vector containing the permutation to sort x into ascending order
range(x)    vector of min(x) and max(x)

More functions

cumsum(x)    vector containing the sum of all of the elements up to that point
cumprod(x)    vector containing the product of all of the elements up to that point
cummax(x)    vector of non-decreasing numbers which are the cumulative maxima of the values in x up to that point
cummin(x)    vector of non-increasing numbers which are the cumulative minima of the values in x up to that point
pmax(x,y,z)    vector, of length equal to the longest of x, y or z, containing the maximum of x, y or z for the ith position in each
pmin(x,y,z)    vector, of length equal to the longest of x, y or z, containing the minimum of x, y or z for the ith position in each
rowSums(x)    row totals of dataframe or matrix x
colSums(x)    column totals of dataframe or matrix x
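To make the cumulative and parallel functions concrete, here is a sketch on small made-up vectors.

```r
x <- c(1, 3, 2, 5)
y <- c(4, 2, 6, 1)

cs <- cumsum(x)    # 1 4 6 11
cp <- cumprod(x)   # 1 3 6 30
cx <- cummax(x)    # 1 3 3 5
cn <- cummin(y)    # 4 2 2 1

px <- pmax(x, y)   # 4 3 6 5: the larger value at each position
pn <- pmin(x, y)   # 1 2 2 1: the smaller value at each position

m <- rbind(x, y)   # a 2 x 4 matrix with x and y as rows
rs <- rowSums(m)   # x = 11, y = 13
```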

Functions

Geometric mean (p.49)

geometric<-function (x) exp(mean(log(x)))

Harmonic mean (p.51)

harmonic<-function (x) 1/mean(1/x)
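A quick usage sketch for the two functions, with inputs chosen so the results are easy to verify by hand. The definitions are repeated so the block runs on its own.

```r
geometric <- function(x) exp(mean(log(x)))
harmonic  <- function(x) 1/mean(1/x)

g <- geometric(c(1, 10, 100))   # about 10: the cube root of 1 * 10 * 100
h <- harmonic(c(1, 2, 4))       # 3 / (1 + 1/2 + 1/4), about 1.714
```

Both functions assume strictly positive inputs; log(0) and 1/0 would give -Inf and Inf.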

Exercises

Find the value in a vector that is closest to a specified value

closest<-function(xv,sv){
  xv[which(abs(xv-sv)==min(abs(xv-sv)))]
}

Calculate a trimmed mean of x which ignores both the smallest and largest values

trimmed.mean <- function (x) {
  mean(x[-c(which(x==min(x)),which(x==max(x)))])
}
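The two exercise functions can be tried on small made-up vectors. The definitions are repeated here so the block is self-contained.

```r
closest <- function(xv, sv) {
  xv[which(abs(xv - sv) == min(abs(xv - sv)))]
}

trimmed.mean <- function(x) {
  mean(x[-c(which(x == min(x)), which(x == max(x)))])
}

near <- closest(c(1, 4, 7, 10), 6)         # 7: distance 1 is the smallest
tm   <- trimmed.mean(c(1, 2, 3, 4, 100))   # 3: mean of 2, 3, 4
```

Note that both functions treat ties literally: closest() returns every equally close value, and trimmed.mean() drops every copy of the minimum and maximum.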

Sets

union(x,y)
intersect(x,y)
setdiff(x,y)
setequal(x,y)
is.element(el,set)
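A minimal sketch of the set functions on two made-up vectors:

```r
x <- c(1, 2, 3)
y <- c(3, 4)

u <- union(x, y)       # 1 2 3 4
i <- intersect(x, y)   # 3
d <- setdiff(x, y)     # 1 2: elements of x that are not in y

setequal(x, c(3, 2, 1))   # TRUE: order does not matter
is.element(4, x)          # FALSE
```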

Matrices

X<-matrix(c(1,0,0,0,1,0,0,0,1),nrow=3)
dim(X)
is.matrix(X)

vector<-c(1,2,3,4,4,3,2,1)
V<-matrix(vector,byrow=T,nrow=2)
dim(vector) <- c(2,4)
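The two ways of shaping the same numbers into a 2 x 4 matrix do not give the same layout, which the following sketch makes visible. The name v is used instead of vector only to avoid masking the built-in function.

```r
v <- c(1, 2, 3, 4, 4, 3, 2, 1)

# byrow = TRUE fills the matrix row by row
V <- matrix(v, byrow = TRUE, nrow = 2)
V[1, ]   # 1 2 3 4
V[2, ]   # 4 3 2 1

# setting dim() fills column by column, the default for matrix()
W <- v
dim(W) <- c(2, 4)
W[1, ]   # 1 3 4 2
W[2, ]   # 2 4 3 1
```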

Matrices

X<-rbind(X,apply(X,2,mean))
X<-cbind(X,apply(X,1,var))

sweep

matdata<-read.table("data\\sweepdata.txt")
cols<-apply(matdata,2,mean)
sweep(matdata,2,cols)
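Since sweepdata.txt is not included here, the following sketch uses a small made-up matrix to show what sweep does: by default it subtracts the supplied statistic from each column (margin 2), which centres the columns on zero.

```r
m <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3)   # illustrative data only
cols <- apply(m, 2, mean)                       # column means: 2 and 20

centred <- sweep(m, 2, cols)   # default FUN is "-"
centred[, 1]                   # -1 0 1
centred[, 2]                   # -10 0 10
colMeans(centred)              # 0 0
```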

Lists

person <- list()
person$name <- "Alberto"
person$age <- 37
person$nationality <- "Spain"
class(person)
[1] "list"

> person
$name
[1] "Alberto"

$age
[1] 37

$nationality
[1] "Spain"

names(person)
[1] "name" "age" "nationality"

Strings

phrase<-"the quick brown fox jumps over the lazy dog"
letras <- table(strsplit(phrase,split=character(0)))
numwords<-1+table(strsplit(phrase,split=character(0)))[1]

words <- unlist(strsplit(phrase,split=" "))
words[grep("o",words)]
"fox" %in% unlist(strsplit(phrase,split=" "))
unlist(strsplit(phrase,split=" ")) %in% c("fox","dog")

Strings

nchar(words)
paste(words[1],words[2])
toupper(words)

Regular expressions

grep("^t", words)
words[grep("^t", words)]
words[grep("s$", words)]
gsub("o","O",words)
regexpr()
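The string and pattern functions can be checked against the example phrase. This sketch rebuilds words so that it runs on its own; the result names are illustrative.

```r
phrase <- "the quick brown fox jumps over the lazy dog"
words <- unlist(strsplit(phrase, split = " "))

nchar(words)                            # 3 5 5 3 5 4 3 4 3

starts.t <- words[grep("^t", words)]    # "the" "the": ^ anchors at the start
ends.s   <- words[grep("s$", words)]    # "jumps": $ anchors at the end
swapped  <- gsub("o", "O", words)       # every "o" replaced by "O"

# regexpr() gives the position of the first match, or -1 if there is none
first.o <- as.integer(regexpr("o", words))   # -1 -1 3 2 -1 1 -1 -1 2
```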

Dataframes

lista <- data.frame()
lista[1,1] = "Alberto"
lista[1,2] = 37
lista[2,1] = "Ana"
lista[2,2] = 23
names(lista) <- c("Name", "Age")
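Filling a data frame cell by cell works, but a data frame is more commonly built from whole columns. A minimal sketch of that alternative, where the object name people and the column names Name and Age are illustrative choices:

```r
people <- data.frame(Name = c("Alberto", "Ana"),
                     Age  = c(37, 23),
                     stringsAsFactors = FALSE)   # keep names as character strings

nrow(people)                        # 2
people$Age                          # 37 23
people[people$Age > 30, "Name"]     # "Alberto"
mean.age <- mean(people$Age)        # 30
```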

Missing values

NA (is.na)

x<-c(1:8,NA)
mean(x)
mean(x,na.rm=T)
which(is.na(x))
as.vector(na.omit(x))
x[!is.na(x)]
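The effect of a single missing value on these calls, as a short sketch:

```r
x <- c(1:8, NA)

mean(x)                     # NA: one missing value makes the mean missing
m <- mean(x, na.rm = TRUE)  # 4.5: the NA is dropped first
w <- which(is.na(x))        # 9: the position of the missing value
clean <- x[!is.na(x)]       # 1 2 3 4 5 6 7 8
```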

Dates and Times in R

date()
date <- as.POSIXlt(Sys.time())
unlist(unclass(date))
difftime()
excel.dates <- c("27/02/2004", "27/02/2005", "14/01/2003", "28/06/2005", "01/01/1999")
strptime(excel.dates,format="%d/%m/%Y")
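A sketch of how the parsed dates can be used afterwards. The tz = "UTC" argument is an addition here, chosen so the result does not depend on the local time zone.

```r
excel.dates <- c("27/02/2004", "27/02/2005", "14/01/2003", "28/06/2005", "01/01/1999")
d <- strptime(excel.dates, format = "%d/%m/%Y", tz = "UTC")

class(d)                 # "POSIXlt" "POSIXt"
years <- d$year + 1900   # 2004 2005 2003 2005 1999 (year is stored as years since 1900)

# difftime() needs two time points; 2004 was a leap year, so the gap is 366 days
gap <- difftime(d[2], d[1], units = "days")
as.numeric(gap)          # 366
```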

Testing and Coercing in R

if

if (y > 0) print(1) else print (-1)
z <- ifelse (y < 0, -1, 1)
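The difference between the two forms is that if tests a single value while ifelse works through a whole vector. A sketch with made-up values (in recent versions of R, giving if a condition longer than one element is an error):

```r
y <- 3
if (y > 0) print(1) else print(-1)   # prints 1

yv <- c(-2, 0, 3)
z <- ifelse(yv < 0, -1, 1)           # -1 1 1: one answer per element
```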

Loops and Repeats

for (i in 1:10) print(i^2)

i <- 1
while (i <= 10) {
  print(i^2)
  i <- i + 1
}

i <- 1
repeat {
  if (i > 10) break
  print(i^2)
  i <- i + 1
}

Exercise

Compute the Fibonacci series 1, 1, 2, 3, 5, 8

fibonacci<-function(n) {
  a<-1
  b<-0
  while(n>0) {
    swap<-a
    a<-a+b
    b<-swap
    n<-n-1
  }
  b
}
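The function returns only the n-th term, so one way to get the series asked for in the exercise is to apply it to 1 through 6. The definition is repeated so the block runs on its own.

```r
fibonacci <- function(n) {
  a <- 1
  b <- 0
  while (n > 0) {
    swap <- a      # remember the current term
    a <- a + b     # next term is the sum of the last two
    b <- swap
    n <- n - 1
  }
  b
}

series <- sapply(1:6, fibonacci)   # 1 1 2 3 5 8
```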

Avoid loops

x<-runif(10000000)

system.time(max(x))

pc<-proc.time()

cmax<-x[1]

for (i in 2:length(x)) {

if(x[i]>cmax) cmax<-x[i]

}

proc.time()-pc

switch

central<-function(y, measure) {switch(measure,

Mean = mean(y),

Geometric = exp(mean(log(y))),

Harmonic = 1/mean(1/y),

Median = median(y),

stop("Measure not included"))

}
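A usage sketch for central, with inputs picked so the answers are easy to check. The definition is repeated so the block is self-contained, and tryCatch is used only to show the error message without stopping the script.

```r
central <- function(y, measure) {
  switch(measure,
         Mean = mean(y),
         Geometric = exp(mean(log(y))),
         Harmonic = 1/mean(1/y),
         Median = median(y),
         stop("Measure not included"))   # unnamed last item is the fallback
}

v <- c(1, 10, 100)
central(v, "Mean")        # 37
central(v, "Geometric")   # about 10
central(v, "Median")      # 10

msg <- tryCatch(central(v, "Mode"), error = function(e) conditionMessage(e))
msg                       # "Measure not included"
```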

Quantitative Data Analysis

Working with datasets

Help for Datasets

To list built-in datasets:

data()
data(package = .packages(all.available = TRUE))
data(swiss)

For help on a dataset:

help(swiss)

"Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888."

The attach Command

To access individual variables, do this:
> attach(swiss)
Now try:
> mean(Fertility)
> detach(swiss)

Using R Functions: Simple Stuff

rownames(swiss)
colnames(swiss)
summary(swiss)

Applying functions

mean(swiss$Fertility)

sd(swiss$Fertility)

apply(swiss,2,max)
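These calls can be combined into a short script. As a sketch of an alternative to attach, with() evaluates an expression inside the data frame without changing the search path.

```r
data(swiss)
dim(swiss)                                  # 47 rows, 6 columns

m1 <- mean(swiss$Fertility)                 # column picked with $
m2 <- with(swiss, mean(Fertility))          # same value, no attach() needed

col.max <- apply(swiss, 2, max)             # one maximum per column
col.sd  <- sapply(swiss, sd)                # one standard deviation per column
```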

Factors

class(Detergent)
nlevels(Detergent)
levels(Detergent)
as.factor()

Working with your dataset

fix(swiss)
hist(Agriculture)
plot(Catholic,Fertility)

Working with your own datasets

write.table(swiss, "swiss.txt")
swiss2 <- read.table("swiss.txt")

data<-read.table(file.choose(),header=T)

readLines()
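A sketch of the write and read round trip. It uses a temporary file rather than swiss.txt, an assumption made here so the working directory is left untouched.

```r
tmp <- tempfile(fileext = ".txt")   # throwaway file name

write.table(swiss, tmp)             # row names and a header line are written by default
swiss2 <- read.table(tmp)           # the header is detected from the shorter first line

dim(swiss2)                         # 47 6, same as the original
head(readLines(tmp), 2)             # the raw text: header line, then the first province

unlink(tmp)                         # remove the temporary file
```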

Reading data from files

read.table(file)    reads a file in table format and creates a data frame from it; the default separator sep="" is any whitespace; use header=TRUE to read the first line as a header of column names; use as.is=TRUE to prevent character vectors from being converted to factors; use comment.char="" to prevent "#" from being interpreted as a comment; use skip=n to skip n lines before reading data; see the help for options on row naming, NA treatment, and others

read.csv("filename", header=TRUE)    the same, but with defaults set for reading comma-delimited files

read.delim("filename", header=TRUE)    the same, but with defaults set for reading tab-delimited files

read.fwf(file,widths)    reads a table of fixed-width formatted data into a data.frame; widths is an integer vector giving the widths of the fixed-width fields

Example

data<-read.table(".\\data\\daphnia.txt",header=T)
names(data)
attach(data)
table(Detergent)
tapply(Growth.rate,Detergent,mean)
aggregate(Growth.rate,list(Detergent), mean)
tapply(Growth.rate,list(Water,Daphnia),median)
with(data,boxplot(Growth.rate ~ Detergent))
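The daphnia.txt file is not reproduced here, so the following sketch uses a tiny made-up data frame with the same column names to show what tapply and aggregate return. The values and the brand labels are invented for illustration and are not the real data.

```r
toy <- data.frame(
  Growth.rate = c(2, 4, 3, 5),                                   # invented values
  Detergent   = factor(c("BrandA", "BrandA", "BrandB", "BrandB"))
)

table(toy$Detergent)            # 2 rows per detergent

# tapply returns a named vector: one mean per factor level
means <- tapply(toy$Growth.rate, toy$Detergent, mean)   # BrandA 3, BrandB 4

# aggregate returns a data frame with one row per group
agg <- aggregate(Growth.rate ~ Detergent, data = toy, FUN = mean)
```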
