![Page 1: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/1.jpg)
Training on R for 3rd and 4th Year Honours Students, Dept. of Statistics, RU
Empowered by
Higher Education Quality Enhancement Project (HEQEP)
Department of Statistics
Rajshahi University, Rajshahi-6205, Bangladesh
March 21-23, 2013
Installation and Data Structures of R
![Page 2: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/2.jpg)
History of R
• 1976: The statistical programming language S is developed at Bell Labs.
• 1983: Licensed as S-Plus.
• Early 1990s: R, an open-source program similar to S, is developed by Robert Gentleman and Ross Ihaka (Auckland, NZ).
• 1997: The international "R-core" team is formed.
• Updated versions are released every few months.
• For more: http://cran.r-project.org/mirrors.html
![Page 3: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/3.jpg)
Advantages of R
• R is a free programming language developed by statisticians.
• It is open-source and runs on Windows, Linux and Macintosh.
• R has excellent graphing capabilities.
• R has an excellent built-in help system.
• R's language has a powerful, easy-to-learn syntax with many built-in statistical functions.
• The language is easy to extend with user-written functions.
![Page 4: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/4.jpg)
To obtain and install R on your computer
1. Go to http://cran.r-project.org/mirrors.html and choose a mirror near you.
2. Click on your operating system (Windows, Linux, or Mac).
3. Download and install from the "base" link.
To install additional packages
1. Start R on your computer.
2. Choose the appropriate item from the "Packages" menu.
Here, CRAN = Comprehensive R Archive Network.
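The "Packages" menu has a command-line counterpart, install.packages(). The sketch below shows one way to install a package only when it is missing. The helper name ensure_package is hypothetical (it is not part of R), and the mirror URL is an assumption that can be replaced with any CRAN mirror near you.

```r
# Hypothetical helper: install a package from CRAN only if it is missing.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)  # needs an internet connection
  }
  requireNamespace(pkg, quietly = TRUE)   # TRUE if the package can be loaded
}

# "stats" ships with every R installation, so nothing is downloaded here.
ok <- ensure_package("stats")
print(ok)
```

For an add-on package, the same call would be made with that package's name in place of "stats".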
![Page 5: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/5.jpg)
To obtain and install R on your computer
![Page 6: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/6.jpg)
To obtain and install R on your computer
![Page 7: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/7.jpg)
To obtain and install R on your computer
![Page 8: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/8.jpg)
Double Click
To obtain and install R on your computer
![Page 9: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/9.jpg)
To obtain and install R on your computer
![Page 10: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/10.jpg)
To obtain and install R on your computer
![Page 11: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/11.jpg)
The R Environment
• Menu bar
• Toolbar
• Command prompt
![Page 12: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/12.jpg)
The R Environment
• To clear the console screen: Ctrl + L
![Page 13: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/13.jpg)
>
Creating a Script File
![Page 14: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/14.jpg)
Working in R: As Calculator
Numeric Operators
Operator          Symbol
Addition          +
Subtraction       -
Multiplication    *
Division          /
Power             ^ or **
Examples: 4 + 2 = 6, 4 - 2 = 2, 4 * 2 = 8, 4 / 2 = 2, 4 ^ 2 = 16
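The operators in the table can be tried directly at the prompt. The following is a small sketch using the same numbers as the slide; the variable names x, y and results are chosen only for illustration, and the last two lines show the integer-division and remainder operators, which the slide does not list.

```r
x <- 4
y <- 2

# Collect the five basic operations in one named vector.
results <- c(sum        = x + y,
             difference = x - y,
             product    = x * y,
             quotient   = x / y,
             power      = x ^ y)   # x ** y is accepted as well
print(results)

# Not on the slide: integer division and remainder.
print(7 %/% 2)
print(7 %% 2)
```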
![Page 15: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/15.jpg)
Variables & Assignment Operator
Data types:
• Numeric: 5, 5.76, etc.
• Logical: values corresponding to TRUE or FALSE
• Character strings: sequences of characters ("blue", "male", "Rahim", etc.)
Variables are assigned with the operator <- or =. The data type need not be declared.
> a = 5          # or: a <- 5
> b = "blue"
> c = a^2 + 5
> c > a          # a logical comparison
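Because types are not declared, it can help to ask R what type a variable currently holds. The sketch below uses class() for that; the names total and is_larger are illustrative choices, not from the slides.

```r
a <- 5
b <- "blue"
total <- a^2 + 5
is_larger <- total > a

print(class(a))          # numeric
print(class(b))          # character
print(class(is_larger))  # logical

# The same name can later hold a value of a different type.
a <- "five"
print(class(a))
```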
![Page 16: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/16.jpg)
Data Structures
• Vectors
• Matrices
• Arrays
• Factors
• Lists
• Data frames
![Page 17: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/17.jpg)
Vector
Here we introduce three functions, c(), rep(), and seq(), that are used to create vectors in various situations.
• c() concatenates elements or sub-vectors
• rep() repeats elements or patterns
• seq() generates sequences
> c(2, 7, 9)
[1] 2 7 9
> a = c(2, 7, 9)
> b = c(3, 5, 8, a)
> b
[1] 3 5 8 2 7 9
rep(value(s), number of repetitions)
> rep(5, 10)
 [1] 5 5 5 5 5 5 5 5 5 5
> rep(c(2, 4, 6), 3)
[1] 2 4 6 2 4 6 2 4 6
seq(initial value, final value, increment)
> seq(2, 10, 2)
[1]  2  4  6  8 10
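Both rep() and seq() accept named arguments that cover a few more situations. The sketch below illustrates three of them (each, length.out, and a negative by); the variable names are illustrative only.

```r
# Repeat every element twice instead of repeating the whole pattern.
each_twice <- rep(c(2, 4, 6), each = 2)

# Ask for a fixed number of equally spaced points instead of an increment.
five_points <- seq(0, 1, length.out = 5)

# A negative increment counts downwards.
countdown <- seq(10, 2, by = -2)

print(each_twice)
print(five_points)
print(countdown)
```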
![Page 18: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/18.jpg)
Vector
> h = c(21, 25, 19, 22, 23, 20)        # Numeric vector
> h
[1] 21 25 19 22 23 20
> name = c("Rahim", "Rani", "Raju")    # Character vector
> name
[1] "Rahim" "Rani"  "Raju"
> c = h > 22                           # Logical vector
> c
[1] FALSE  TRUE FALSE FALSE  TRUE FALSE
> a = c(1, 2, 3, 4, 5)
> a
[1] 1 2 3 4 5
> a = 1:5                              # the colon operator gives the same sequence
> a
[1] 1 2 3 4 5
![Page 19: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/19.jpg)
Vector
Indexing
> w = c(1, 3, 5, 2, 10)
> w[3]                 # the third element of w
[1] 5
> w[3:5]               # the third to fifth elements of w, inclusive
[1]  5  2 10
> w[w > 3]             # elements in w greater than 3
[1]  5 10
> w[-2]                # all except the second element
[1]  1  5  2 10
> w[w > 2 & w <= 5]    # greater than 2 and less than or equal to 5
[1] 3 5
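Indexing can also report positions, pick several elements at once, and be used on the left-hand side of an assignment. The following sketch reuses the vector w from the slide; the other names are illustrative.

```r
w <- c(1, 3, 5, 2, 10)

positions <- which(w > 3)            # positions (not values) of elements > 3
first_last <- w[c(1, length(w))]     # first and last element
without_first_two <- w[-(1:2)]       # drop the first two elements

# Indexing on the left-hand side replaces the selected elements.
w_capped <- w
w_capped[w_capped > 3] <- 3

print(positions)
print(first_last)
print(without_first_two)
print(w_capped)
```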
![Page 20: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/20.jpg)
Vector
Vectors used in functions
> w = c(1, 3, 5, 2, 10)
Commonly used functions: length(w), sum(w), cumsum(w), min(w), max(w), range(w), mean(w), median(w), var(w), sd(w), summary(w), abs(10 - 50), sort(w), sort(w, decreasing = TRUE), etc.
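As a quick check of what these functions compute, the sketch below recomputes the sample variance by hand and compares it with var(); in R the standard deviation function is sd(). The name manual_var is illustrative.

```r
w <- c(1, 3, 5, 2, 10)
n <- length(w)

# Sample variance computed from its definition (divisor n - 1).
manual_var <- sum((w - mean(w))^2) / (n - 1)

print(c(manual = manual_var, var = var(w), sd = sd(w)))
print(cumsum(w))                    # running totals
print(range(w))                     # smallest and largest value
print(sort(w, decreasing = TRUE))   # largest first
```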
![Page 21: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/21.jpg)
Working in R: Using help
Help on a specific R keyword:
> help(keyword)
> ?keyword
> ?mean            # information on the mean command
> help(mean)
> help(median)
Full manual in HTML form:
> help.start()
Finding a "vague" topic:
> help.search("topic")
> ??topic
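Two further helpers are useful when only part of a function name is remembered. The sketch below uses apropos() to list objects by name pattern and args() to show a function's arguments; both ship with a standard R installation.

```r
# Objects whose names start with "median".
matches <- apropos("^median")
print(matches)

# Argument list of sd(), without opening the help page.
print(args(sd))
```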
![Page 22: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/22.jpg)
Array & Matrix
A matrix in mathematics is just a two-dimensional array of numbers. Matrices and arrays are represented as vectors with dimensions:
# Generate a 3 by 4 array
> x <- 1:12
> dim(x) <- c(3, 4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
The dim assignment function sets or changes the dimension attribute of x, causing R to treat the vector of 12 numbers as a 3 × 4 matrix.
Notice that the storage is column-major; that is, the elements of the first column are followed by those of the second, etc.
# Generate a 4 by 5 array
> A <- array(1:20, dim = c(4, 5))
> A
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
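The same idea extends beyond two dimensions. The sketch below builds a three-dimensional array (the name A3 and its dimensions are illustrative) and shows that column-major storage still applies: the first index varies fastest.

```r
# A 2 x 3 x 4 array filled with 1..24 in column-major order.
A3 <- array(1:24, dim = c(2, 3, 4))

print(dim(A3))
print(A3[, , 1])     # the first 2 x 3 slice holds the numbers 1 to 6
print(A3[1, 2, 3])   # element in row 1, column 2 of the third slice
```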
![Page 23: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/23.jpg)
Array & Matrix
# 3 x 2 matrix of zeros
> Y <- matrix(0, nrow = 3, ncol = 2)
> Y
     [,1] [,2]
[1,]    0    0
[2,]    0    0
[3,]    0    0
# Generate a 3 by 4 matrix, filled row by row
> A = matrix(1:12, nrow = 3, byrow = TRUE)
> A
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> A[ , 2]      # 2nd column of matrix A
[1]  2  6 10
> A[3, ]       # 3rd row of matrix A
[1]  9 10 11 12
> A[2, 2]      # (2, 2)th element of matrix A
[1] 6
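Row and column indices can also be vectors, which extracts a sub-matrix. The sketch below uses the same matrix A; note that selecting a single column returns a plain vector unless drop = FALSE is given. The names sub, col2 and col2_mat are illustrative.

```r
A <- matrix(1:12, nrow = 3, byrow = TRUE)

sub <- A[1:2, c(1, 4)]             # rows 1 to 2, columns 1 and 4
col2 <- A[, 2]                     # a plain vector, dimensions are dropped
col2_mat <- A[, 2, drop = FALSE]   # stays a 3 x 1 matrix

print(sub)
print(dim(col2))       # NULL, because col2 is no longer a matrix
print(dim(col2_mat))
```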
![Page 24: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/24.jpg)
Basic operations – Matrix
R command    Purpose (output)
A + B        addition of matrices A and B
A * B        element-by-element product
A %*% B      matrix product of A and B
t(A)         transpose of matrix A
solve(A)     inverse of matrix A (A must be square and nonsingular)
cbind()      forms matrices by binding matrices together horizontally, or column-wise
rbind()      forms matrices by binding matrices together vertically, or row-wise
![Page 25: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/25.jpg)
Basic operations – Matrix
> A.mat <- matrix(c(19, 8, 11, 2, 18, 17, 15, 19, 10), nrow = 3)
> A.mat
     [,1] [,2] [,3]
[1,]   19    2   15
[2,]    8   18   19
[3,]   11   17   10
> inv.A <- solve(A.mat)    # inverse of matrix A.mat
> t(A.mat)                 # transpose of matrix A.mat
> A.mat %*% inv.A          # the 3 x 3 identity matrix, up to rounding error
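A computed inverse is easiest to check numerically. The sketch below compares A.mat %*% inv.A with the identity matrix using all.equal(), which tolerates rounding error, and also shows solve() with a right-hand side; the vector rhs is a hypothetical example, not from the slides.

```r
A.mat <- matrix(c(19, 8, 11, 2, 18, 17, 15, 19, 10), nrow = 3)
inv.A <- solve(A.mat)

# Compare with the identity matrix, allowing for rounding error.
is_identity <- isTRUE(all.equal(A.mat %*% inv.A, diag(3)))
print(is_identity)

# solve(A, b) solves the linear system A x = b directly.
rhs <- c(1, 2, 3)          # hypothetical right-hand side
x <- solve(A.mat, rhs)
print(A.mat %*% x)         # reproduces rhs
```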
![Page 26: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/26.jpg)
Basic operations – Matrix
> a = matrix(1:9, nrow = 3)
> b = matrix(2:10, nrow = 3)
> a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> b
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
[3,]    4    7   10
> cbind(a, b)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    4    7    2    5    8
[2,]    2    5    8    3    6    9
[3,]    3    6    9    4    7   10
> rbind(a, b)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
[4,]    2    5    8
[5,]    3    6    9
[6,]    4    7   10
> Cov.matrix = cov(b)
> Cor.matrix = cor(b)
> Row.mean = apply(b, 1, mean)
> Col.mean = apply(b, 2, mean)
NOTE: apply(X, MARGIN, FUN) applies the function FUN to the rows (MARGIN = 1) or the columns (MARGIN = 2) of X.
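The sketch below runs apply() on the matrix b from the slide and compares the results with the built-in shortcuts rowMeans() and colMeans(). The last line passes a small user-written function as FUN; the name col_spread is illustrative.

```r
b <- matrix(2:10, nrow = 3)

row_mean <- apply(b, 1, mean)   # one value per row
col_mean <- apply(b, 2, mean)   # one value per column
print(row_mean)
print(col_mean)

# FUN can be any function, including one written on the spot.
col_spread <- apply(b, 2, function(v) max(v) - min(v))
print(col_spread)
```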
![Page 27: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/27.jpg)
List
vector: an ordered collection of data of the same type.
> a = c(7, 5, 1)
> a[2]
[1] 5
list: an ordered collection of data of arbitrary types.
> a = list(Name = "Rahim", age = c(12, 23, 10), Married = FALSE)
> a
$Name
[1] "Rahim"
$age
[1] 12 23 10
$Married
[1] FALSE
Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string).
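List elements can be reached in several ways. The sketch below uses the list from the slide and shows $, double brackets and single brackets; the added City element is a hypothetical field used only to show how a list grows.

```r
a <- list(Name = "Rahim", age = c(12, 23, 10), Married = FALSE)

second_age <- a$age[2]      # element of a vector stored inside the list
name_value <- a[["Name"]]   # double brackets return the element itself
name_list  <- a["Name"]     # single brackets return a shorter list

print(second_age)
print(class(name_value))
print(class(name_list))

# Assigning to a new name appends an element.
a$City <- "Rajshahi"        # hypothetical extra field
print(names(a))
```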
![Page 28: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/28.jpg)
Data frames
• A data frame represents the typical data table that researchers work with, like a spreadsheet.
• It is a rectangular table of rows and columns in which all columns have the same length. Data within each column has the same type (e.g. number, text, logical), but different columns may have different types.
Example:
> a
  localisation tumorsize progress
1     proximal       6.3    FALSE
2       distal       8.0     TRUE
3     proximal      10.0    FALSE
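A table like the one in the example could be built as follows. This is a sketch: the slide does not show how a was created, so the construction is an assumption, and stringsAsFactors = FALSE is set so the text column stays character in every R version.

```r
a <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

print(a)
print(dim(a))              # number of rows and columns
print(sapply(a, class))    # one type per column
```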
![Page 29: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/29.jpg)
Making data frames
We illustrate how to construct a data frame from the following car data.
Make        Model          Cylinder  Weight  Mileage  Type
Honda       Civic          V4        2170    33       Sporty
Chevrolet   Beretta        V4        2655    26       Compact
Ford        Escort         V4        2345    33       Small
Eagle       Summit         V4        2560    33       Small
Volkswagen  Jetta          V4        2330    26       Small
Buick       Le Sabre       V6        3325    23       Large
Mitsubishi  Galant         V4        2745    25       Compact
Dodge       Grand Caravan  V6        3735    18       Van
Chrysler    New Yorker     V6        3450    22       Medium
Acura       Legend         V6        3265    20       Medium
![Page 30: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/30.jpg)
Making data frames
> Make <- c("Honda", "Chevrolet", "Ford", "Eagle", "Volkswagen", "Buick", "Mitsubishi",
+           "Dodge", "Chrysler", "Acura")
> Model <- c("Civic", "Beretta", "Escort", "Summit", "Jetta", "Le Sabre", "Galant",
+            "Grand Caravan", "New Yorker", "Legend")
> Cylinder <- c(rep("V4", 5), "V6", "V4", rep("V6", 3))
> Weight <- c(2170, 2655, 2345, 2560, 2330, 3325, 2745, 3735, 3450, 3265)
> Mileage <- c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20)
> Type <- c("Sporty", "Compact", rep("Small", 3), "Large", "Compact", "Van",
+           rep("Medium", 2))
![Page 31: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/31.jpg)
Making data frames
Now the data.frame() function combines the six vectors into a single data frame.
> Car <- data.frame(Make, Model, Cylinder, Weight, Mileage, Type)
> Car
         Make         Model Cylinder Weight Mileage    Type
1       Honda         Civic       V4   2170      33  Sporty
2   Chevrolet       Beretta       V4   2655      26 Compact
3        Ford        Escort       V4   2345      33   Small
4       Eagle        Summit       V4   2560      33   Small
5  Volkswagen         Jetta       V4   2330      26   Small
6       Buick      Le Sabre       V6   3325      23   Large
7  Mitsubishi        Galant       V4   2745      25 Compact
8       Dodge Grand Caravan       V6   3735      18     Van
9    Chrysler    New Yorker       V6   3450      22  Medium
10      Acura        Legend       V6   3265      20  Medium
![Page 32: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/32.jpg)
Making data frames
> names(Car)
[1] "Make"     "Model"    "Cylinder" "Weight"   "Mileage"  "Type"
> Car[1, ]                # first row
   Make Model Cylinder Weight Mileage   Type
1 Honda Civic       V4   2170      33 Sporty
> Car[10, 4]              # row 10, column 4
[1] 3265
> Car$Mileage             # one column, by name
 [1] 33 26 33 33 26 23 25 18 22 20
> mean(Car$Mileage)       # average mileage of the 10 vehicles
[1] 25.9
> min(Car$Weight)
[1] 2170
![Page 33: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/33.jpg)
Making data frames
> table(Car$Type)             # gives a frequency table
Compact   Large  Medium   Small  Sporty     Van
      2       1       2       3       1       1
> table(Car$Make, Car$Type)   # cross tabulation
             Compact Large Medium Small Sporty Van
  Acura            0     0      1     0      0   0
  Buick            0     1      0     0      0   0
  Chevrolet        1     0      0     0      0   0
  Chrysler         0     0      1     0      0   0
  Dodge            0     0      0     0      0   1
  Eagle            0     0      0     1      0   0
  Ford             0     0      0     1      0   0
  Honda            0     0      0     0      1   0
  Mitsubishi       1     0      0     0      0   0
  Volkswagen       0     0      0     1      0   0
![Page 34: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/34.jpg)
Making data frames
> Make.Small <- Car$Make[Car$Type == "Small"]   # makes of the small cars
> summary(Car$Mileage)                          # gives summary statistics
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.00   22.25   25.50   25.90   31.25   33.00
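Logical conditions can select whole rows as well as single columns. The sketch below rebuilds a reduced version of the car data (only four of the six columns, to keep it short) and shows subset(), tapply() and which.min(); the result names are illustrative.

```r
Car <- data.frame(
  Make    = c("Honda", "Chevrolet", "Ford", "Eagle", "Volkswagen",
              "Buick", "Mitsubishi", "Dodge", "Chrysler", "Acura"),
  Type    = c("Sporty", "Compact", rep("Small", 3), "Large",
              "Compact", "Van", rep("Medium", 2)),
  Weight  = c(2170, 2655, 2345, 2560, 2330, 3325, 2745, 3735, 3450, 3265),
  Mileage = c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20),
  stringsAsFactors = FALSE
)

# Rows for small cars, keeping two columns.
small <- subset(Car, Type == "Small", select = c(Make, Mileage))
print(small)

# Makes with mileage above 30.
economical <- Car$Make[Car$Mileage > 30]
print(economical)

# Average mileage within each type.
mean_by_type <- tapply(Car$Mileage, Car$Type, mean)
print(mean_by_type)

# Make of the lightest car.
lightest <- Car$Make[which.min(Car$Weight)]
print(lightest)
```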
![Page 35: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/35.jpg)
Making data frames
> b = data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))   # random values; yours will differ
> b
            x            y           z
1  -1.7651180  0.462309932  0.09230914
2  -0.7340731 -1.681826091  0.66648791
3  -0.4968900  1.728658405 -0.68281664
4  -1.3217873  0.307030157  0.24192745
5  -0.2070019  0.003892192  1.19591807
6  -0.9633084  0.060328696 -1.40424843
7  -1.1323626  1.079521099  1.63552915
8  -0.7301976 -1.422012899 -0.16695860
9   0.2979073  0.528152338  0.65995778
10 -0.5759655  0.655296337 -0.39156127
> cor(b)                # correlation matrix of the columns
             x             y           z
x 1.0000000000  0.0007151043  0.12151913
y 0.0007151043  1.0000000000 -0.05770153
z 0.1215191317 -0.0577015345  1.00000000
> apply(b, 1, var)      # variance within each row
 [1] 1.42472853 1.39573092 1.80047438 0.85041478 0.57226442 0.56454121
 [7] 2.14379987 0.39516798 0.03357767 0.44098693
![Page 36: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/36.jpg)
Making data frames
Regression using the data frame b created above:
> lm.D9 <- lm(y ~ x, data = b)         # regression of y on x
> lm.D90 <- lm(y ~ x - 1, data = b)    # omitting the intercept
> anova(lm.D9)
> summary(lm.D9)
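A reproducible version of such a regression can be sketched as follows. The data are simulated, and the relationship y = 2 + 3x plus noise is an assumption made only for this illustration; set.seed() fixes the random numbers so the run can be repeated.

```r
set.seed(1)                                # make the random numbers repeatable
d <- data.frame(x = rnorm(10))
d$y <- 2 + 3 * d$x + rnorm(10, sd = 0.5)   # hypothetical true relationship

fit  <- lm(y ~ x, data = d)        # regression of y on x, with intercept
fit0 <- lm(y ~ x - 1, data = d)    # same model without the intercept

print(coef(fit))
print(coef(fit0))
print(anova(fit))
```

Passing the data frame through the data argument avoids attach(), so the columns are found without changing the search path.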
![Page 37: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/37.jpg)
Data Entry using Data Editor
• R has a Data Editor with a spreadsheet-like interface.
• The interface is quite useful for small data sets.
Suppose we want to construct a data frame based on the following data:
Roll   Bstat101   Bstat102
4701   78         80
4702   75         65
4703   60         70
4704   72         68
![Page 38: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/38.jpg)
Data Entry using Data Editor
To do this, type:
> result <- data.frame(Roll = integer(0), Bstat101 = numeric(0),
+                      Bstat102 = numeric(0))
> result <- edit(result)
Then enter the data in the Data Editor and close the editor.
> result                    # to see the data
> result <- edit(result)    # to modify the data
![Page 39: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/39.jpg)
Reading data from File
An entire data frame can be read directly with the read.table() function.
# Reading data from an Excel .csv file (two equivalent ways)
> data1 <- read.table(file = "d:/RFiles/data1.csv", header = TRUE, sep = ",")
> data1 <- read.csv(file = "d:/RFiles/data1.csv", header = TRUE)
> data1
# Reading data from a tab-separated text file
> data2 <- read.table(file = "d:/RFiles/data3.txt", header = TRUE, sep = "\t")
> data2
> attach(data1)     # make the columns of data1 available by name
> detach(data1)     # undo attach()
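The round trip can be tried without preparing a file by hand. The sketch below writes the small marks table to a temporary .csv file and reads it back; using tempfile() instead of a fixed path such as d:/RFiles is a choice made here so the example runs on any machine.

```r
result <- data.frame(Roll     = 4701:4704,
                     Bstat101 = c(78, 75, 60, 72),
                     Bstat102 = c(80, 65, 70, 68))

csv_path <- tempfile(fileext = ".csv")           # temporary file location
write.csv(result, csv_path, row.names = FALSE)   # write without row numbers

data1 <- read.csv(csv_path, header = TRUE)       # read it back
print(data1)

mean101 <- mean(data1$Bstat101)
print(mean101)

unlink(csv_path)                                 # remove the temporary file
```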
![Page 40: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/40.jpg)
Importing from other statistical systems
The package foreign on CRAN provides import facilities for files produced by the following statistical software:
> read.mtp       # imports a "Minitab Portable Worksheet"
> read.xport     # reads a file in SAS transport (XPORT) format
> read.spss      # reads files created by SPSS
The package Rstreams on CRAN contains the functions:
> readSfile      # reads binary objects produced by S-PLUS
> data.restore   # reads S-PLUS data dumps (created by data.dump)
![Page 41: Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU](https://reader035.vdocument.in/reader035/viewer/2022062520/5681607d550346895dcfa7fd/html5/thumbnails/41.jpg)
Thanks