Confidential | Copyright 2013 Trend Micro Inc.
David Chiu
R Language Tutorial
04/11/2023
Background of R
What is R?
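• GNU project; an implementation of the S language, which John Chambers and colleagues developed at Bell Labs (R itself was created by Ross Ihaka and Robert Gentleman)
• Free software environment for statistical computing and graphics
• Functional programming language, written primarily in C, Fortran, and R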
R Language
• R is a functional programming language
• R is an interpreted language
• R is an object-oriented language
Why Use R
• Statistical analysis on the fly
• Built-in mathematical functions and graphics modules
• FREE! & Open Source! – http://cran.r-project.org/src/base/
Kaggle
http://www.kaggle.com/
R is the language most widely used by Kaggle participants
Data Scientists at These Companies Use R
What is your programming language of choice, R, Python or something else?
“I use R, and occasionally matlab, for data analysis. There is a large, active and extremely knowledgeable R community at Google.”
http://simplystatistics.org/2013/02/15/interview-with-nick-chamandy-statistician-at-google/
“Expert knowledge of SAS (With Enterprise Guide/Miner) required and candidates with strong knowledge of R will be preferred”
http://www.kdnuggets.com/jobs/13/03-29-apple-sr-data-scientist.html?utm_source=twitterfeed&utm_medium=facebook&utm_campaign=tfb&utm_content=FaceBook&utm_term=analytics#.UVXibgXOpfc.facebook
Commercial support for R
• Since 2007, Revolution Analytics has provided commercial support for Revolution R
  – http://www.revolutionanalytics.com/products/revolution-r.php
  – http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php
• Oracle Big Data Appliance, which integrates R, Apache Hadoop, Oracle Enterprise Linux, and a NoSQL database with the Exadata hardware
  – http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html
Revolution R
• Community version is free
  – http://www.revolutionanalytics.com/downloads/
  – http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
Benchmark            Base R 2.14.2 64   Revolution R (1-core)   Revolution R (4-core)   Speedup (4 core)
Matrix Calculation   17.4 sec           2.9 sec                 2.0 sec                 7.9x
Matrix Functions     10.3 sec           2.0 sec                 1.2 sec                 7.8x
Program Control      2.7 sec            2.7 sec                 2.7 sec                 Not Appreciable
IDE
RStudio
• http://www.rstudio.com/
RGUI
• http://www.r-project.org/
Web App Development
Shiny makes it super simple for R users like you to turn analyses into interactive web applications that anyone can use
http://www.rstudio.com/shiny/
Package Management
• CRAN (Comprehensive R Archive Network)
Repository     URL
CRAN           http://cran.r-project.org/web/packages/
Bioconductor   http://www.bioconductor.org/packages/release/Software.html
R-Forge        http://r-forge.r-project.org/
R Basics
Basic Commands
• help()
  – help(demo)
• demo()
  – demo(is.things)
• q()
• ls()
• rm()
  – rm(x)
Basic Objects
• Vector
• List
• Factor
• Array
• Matrix
• Data Frame
Objects & Arithmetic
• Scalar
  – x=3; y<-5; x+y
• Vectors
  – x = c(1,2,3,7); y = c(2,3,5,1); x+y; x*y; x-y; x/y
  – x = seq(1,10); y = 2:11; x+y
  – x = seq(1,10,by=2); y = seq(1,10,length=2)
  – rep(c(5,8), 3)
  – x = c(1,2,3); length(x)
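The vector operations above all work element by element. A minimal sketch of that behaviour follows; the variable names are arbitrary, and the recycling example is an addition that does not appear on the slide.

```r
# Element-wise arithmetic on two vectors of equal length
x <- c(1, 2, 3, 7)
y <- c(2, 3, 5, 1)
total <- x + y
product <- x * y

# seq() builds regular sequences, either by step or by number of points
odds <- seq(1, 10, by = 2)
ends <- seq(1, 10, length.out = 2)

# Recycling (not on the slide): the shorter vector is reused to match the longer one
recycled <- c(1, 2, 3, 4) + c(10, 20)

# rep() repeats the whole vector the given number of times
repeated <- rep(c(5, 8), 3)

print(total)
print(product)
print(recycled)
print(repeated)
```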
Summaries and Subscripting
• Summary
  – x = c(1,2,3,4,5,6,7,8,9,10)
  – mean(x); min(x); median(x); max(x); var(x)
  – summary(x)
• Subscripting
  – x = c(1,2,3,4,5,6,7,8,9,10)
  – x[1:3]; x[c(1,3,5)]
  – x[c(1,3,5)] * 2 + x[c(2,2,2)]
  – x[-(1:6)]
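The subscripting expressions above can be evaluated step by step, as in this sketch. The intermediate names (lhs, rhs) are made up for illustration, and the logical subscript is an addition not shown on the slide.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Positive indices select elements; negative indices drop them
first_three <- x[1:3]
without_head <- x[-(1:6)]

# The combined expression from the slide, one piece at a time
lhs <- x[c(1, 3, 5)] * 2   # elements 1, 3, 5 doubled
rhs <- x[c(2, 2, 2)]       # element 2 selected three times
combined <- lhs + rhs

# Logical subscripting (an addition): keep elements that satisfy a condition
evens <- x[x %% 2 == 0]

print(without_head)
print(combined)
print(evens)
```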
Lists
• Contain a heterogeneous selection of objects
  – e <- list(thing="hat", size="8.25"); e
  – l <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
  – l$j
  – man = list(name="Qoo", height=183); man$name
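List elements can be reached in more than one way. The sketch below reuses the man list from the slide; the hobbies field is hypothetical and only there to show that elements can be added later.

```r
man <- list(name = "Qoo", height = 183)

# $ and [[ ]] both return the element itself
by_dollar <- man$name
by_double <- man[["height"]]

# Single brackets return a sub-list, not the element
sub <- man["height"]

# Elements of any type can be added after creation (hypothetical field)
man$hobbies <- c("reading", "hiking")

print(class(sub))
print(names(man))
```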
Factor
• Collection of items used to represent categorical values
• Different values that the factor can take are called levels
• Factors
  – phone = factor(c('iphone', 'htc', 'iphone', 'samsung', 'iphone', 'samsung'))
  – levels(phone)
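A short sketch of what a factor stores, using the phone example above: the levels are the distinct values, and each item is kept internally as an integer code pointing at a level. The names counts and codes are arbitrary.

```r
phone <- factor(c("iphone", "htc", "iphone", "samsung", "iphone", "samsung"))

# Levels are the distinct values, sorted by default
lv <- levels(phone)

# table() counts how often each level occurs
counts <- table(phone)

# Internally each item is an integer index into the levels
codes <- as.integer(phone)

print(lv)
print(counts)
print(codes)
```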
Matrices & Array
• Array
  – An extension of a vector to two or more dimensions
  – a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12),dim=c(3,4))
• Matrices
  – A vector extended to two dimensions (a 2-D array)
  – x = c(1,2,3); y = c(4,5,6); rbind(x,y); cbind(x,y)
  – x = rbind(c(1,2,3),c(4,5,6)); dim(x)
  – x <- matrix(c(1,2,3,4,5,6),nrow=3)
  – x <- matrix(c(1,2,3,4,5,6),nrow=3,byrow=TRUE)
  – x <- matrix(c(1,2,3,4),nrow=2); y <- matrix(c(5,6),nrow=2); x%*%y
  – t(matrix(c(1,2,3,4),nrow=2))
  – solve(matrix(c(1,2,3,4),nrow=2))
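To make the matrix calls above concrete, this sketch checks that solve() really returns the inverse and shows how byrow changes the fill order. The three-dimensional array is an added illustration; all variable names are arbitrary.

```r
# matrix() fills column by column unless byrow = TRUE
m <- matrix(c(1, 2, 3, 4), nrow = 2)
by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)

# solve() with one argument returns the inverse; m %*% m_inv should be the identity
m_inv <- solve(m)
identity_check <- m %*% m_inv

# An array with three dimensions (an addition to the slide)
a <- array(1:24, dim = c(2, 3, 4))

print(by_row)
print(round(identity_check, 10))
print(dim(a))
```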
Data Frame
• Useful way to represent tabular data
• Essentially a matrix with named columns; it may also include non-numerical variables
• Example
  – df = data.frame(a=c(1,2,3,4,5),b=c(2,3,4,5,6)); df
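A sketch of common data frame operations on the example above. The label column and its values are hypothetical, added only to show a non-numerical variable sitting next to numeric ones.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 4, 5, 6))

# Add a non-numerical column (hypothetical values)
df$label <- c("p", "q", "r", "s", "t")

# Each column keeps its own type
col_types <- sapply(df, class)

# Rows are filtered with a logical condition, columns selected by name
big <- df[df$a > 3, ]
means <- colMeans(df[, c("a", "b")])

print(col_types)
print(big)
print(means)
```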
Function
• Function
  – `%myop%` <- function(a, b) {2*a + 2*b}; 1 %myop% 1
  – f <- function(x) {return(x^2 + 3)}
  – create.vector.of.ones <- function(n) {
      return.vector <- NA
      for (i in 1:n) { return.vector[i] <- 1 }
      return.vector
    }
  – create.vector.of.ones(3)
• Control Structures
  – if … else …
  – repeat, for, while
• Catch errors
  – tryCatch
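The control structures and error handling listed above might be combined as in this sketch. safe_log is a hypothetical helper written for this example, not a function from R or from the slides.

```r
# repeat runs until an explicit break
total <- 0
i <- 1
repeat {
  total <- total + i
  i <- i + 1
  if (i > 5) break
}

# Hypothetical helper: returns a fallback value instead of stopping
safe_log <- function(x) {
  tryCatch(
    {
      if (!is.numeric(x)) stop("x must be numeric")
      log(x)
    },
    warning = function(w) {
      message("caught warning: ", conditionMessage(w))
      NaN
    },
    error = function(e) {
      message("caught error: ", conditionMessage(e))
      NA
    }
  )
}

print(total)
print(safe_log(1))
print(safe_log("a"))
print(safe_log(-1))
```

The handlers receive a condition object, and conditionMessage() extracts its text; whatever the handler returns becomes the value of the tryCatch() call.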
Anonymous Function
• Functional language characteristic
  – apply.to.three <- function(f) {f(3)}
  – apply.to.three(function(x) {x * 7})
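Because functions are ordinary values, they can be passed in, created inline, and returned. A sketch building on apply.to.three from the slide; make.multiplier is a hypothetical name used only here.

```r
apply.to.three <- function(f) { f(3) }

# An anonymous function passed directly as an argument
result <- apply.to.three(function(x) { x * 7 })

# Anonymous functions are common with the apply family
squares <- sapply(1:4, function(n) n^2)

# A function that returns a function (hypothetical helper)
make.multiplier <- function(k) {
  function(x) x * k
}
triple <- make.multiplier(3)

print(result)
print(squares)
print(triple(5))
```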
Objects and Classes
• All R code manipulates objects.
• Every object in R has a type
• In assignment statements, R will copy the object, not just the reference to the object
• Objects can carry attributes
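The points above about types, copying, and attributes can be checked directly. A minimal sketch with arbitrary variable names:

```r
# Assignment gives b copy semantics: changing b leaves a untouched
a <- c(1, 2, 3)
b <- a
b[1] <- 100

# Every object has a type
a_type <- typeof(a)
f_type <- typeof(mean)

# Attributes are extra data attached to an object; setting dim makes a matrix
m <- 1:6
dim(m) <- c(2, 3)

print(a)
print(b)
print(a_type)
print(attributes(m))
```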
S3 & S4 Object
• Many R functions were implemented using S3 methods
• In S version 4 (hence S4), formal classes and methods were introduced that allowed
  – Multiple arguments
  – Abstract types
  – Inheritance
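For contrast with S4, here is a sketch of the informal S3 style: a class is just an attribute, and a method is a function named generic.class. The describe generic and the student and graduate classes are hypothetical names chosen for this example.

```r
# A generic function dispatches on the class attribute of its first argument
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "an object"
describe.student <- function(x, ...) paste(x$name, "scored", x$score)

# An S3 object is an ordinary list with a class attribute
stu <- structure(list(name = "david", score = 80), class = "student")

# Inheritance: the class vector is searched from left to right
grad <- structure(list(name = "andy", score = 90), class = c("graduate", "student"))

print(describe(stu))
print(describe(grad))
print(describe(42))
```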
OOP of S4
• S4 OOP Example
  – setClass("Student", representation(name = "character", score = "numeric"))
  – studenta = new("Student", name="david", score=80)
  – studentb = new("Student", name="andy", score=90)
  – setMethod("show", signature("Student"), function(object) { cat(object@score+100) })
  – setGeneric("getscore", function(object) standardGeneric("getscore"))
  – studenta
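The slide declares the getscore generic but does not attach a method to it. One way to complete it is sketched below; the method body (returning the score slot unchanged) is an assumption about the intent, not something the slide states.

```r
library(methods)

setClass("Student", representation(name = "character", score = "numeric"))
setGeneric("getscore", function(object) standardGeneric("getscore"))

# Assumed implementation: simply return the score slot
setMethod("getscore", signature("Student"), function(object) object@score)

studenta <- new("Student", name = "david", score = 80)
studentb <- new("Student", name = "andy", score = 90)
scores <- c(getscore(studenta), getscore(studentb))

# Slots are type-checked when the object is created
bad <- tryCatch(
  new("Student", name = "eve", score = "high"),
  error = function(e) "rejected"
)

print(scores)
print(bad)
```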
Packages
• A package is a related set of functions, help files, and data files that have been bundled together.
• Basic Commands
  – library(rpart)
  – CRAN
  – Install: install.packages("e1071")
  – (.packages())
Packages Used in Machine Learning for Hackers
Apply
• Apply
  – Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.
  – data <- cbind(c(1,2),c(3,4))
  – data.rowsum <- apply(data,1,sum)
  – data.colsum <- apply(data,2,sum)
  – data
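In the apply calls above, the second argument is the margin: 1 means rows and 2 means columns. A sketch that also passes an anonymous function; the name data.rowrange is made up for illustration.

```r
# cbind() puts each vector in its own column
data <- cbind(c(1, 2), c(3, 4))

data.rowsum <- apply(data, 1, sum)   # one value per row
data.colsum <- apply(data, 2, sum)   # one value per column

# Any function can be applied, including an anonymous one
data.rowrange <- apply(data, 1, function(r) max(r) - min(r))

print(data)
print(data.rowsum)
print(data.colsum)
print(data.rowrange)
```

For plain sums, rowSums() and colSums() give the same results and are usually faster.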
Apply
• lapply – returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.
• sapply – is a user-friendly version and wrapper of lapply, by default returning a vector or matrix where possible.
• vapply – is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.
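The difference between the three functions shows up in the type of the result. A sketch with an arbitrary list of words; the empty-input case illustrates why vapply can be the safer choice.

```r
words <- list("r", "apply", "family")

lengths.list <- lapply(words, nchar)               # always a list
lengths.vec <- sapply(words, nchar)                # simplified to a vector
lengths.checked <- vapply(words, nchar, integer(1)) # type and length enforced

# With empty input, sapply falls back to a list while vapply keeps its type
empty.sapply <- sapply(list(), nchar)
empty.vapply <- vapply(list(), nchar, integer(1))

print(lengths.list)
print(lengths.vec)
print(class(empty.sapply))
print(class(empty.vapply))
```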
File IO
• Save and Load
  – x = USPersonalExpenditure
  – save(x, file="~/test.RData")
  – rm(x)
  – load("~/test.RData")
  – x
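The save and load round trip above can be tried without touching the home directory. This sketch writes to temporary files instead of ~/test.RData (a substitution made for the example) and adds saveRDS()/readRDS() as an alternative for a single object.

```r
x <- USPersonalExpenditure

# save() stores named objects; load() restores them under the same names
path <- tempfile(fileext = ".RData")
save(x, file = path)
rm(x)
load(path)

# saveRDS()/readRDS() store one object and let the caller choose the name
rds <- tempfile(fileext = ".rds")
saveRDS(USPersonalExpenditure, rds)
y <- readRDS(rds)

print(identical(x, y))
```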
Charts and Graphics
Plotting Example
  – xrange = range(as.numeric(colnames(USPersonalExpenditure)))
  – yrange = range(USPersonalExpenditure)
  – plot(xrange, yrange, type="n", xlab="Year", ylab="Category")
  – for(i in 1:5) {
      lines(as.numeric(colnames(USPersonalExpenditure)), USPersonalExpenditure[i,], type="b", lwd=1.5)
    }
IRIS Dataset
• data()
IRIS Dataset
• The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by Sir Ronald Fisher (1936) as an example of discriminant analysis. It is sometimes called Anderson's Iris data set.
  – http://en.wikipedia.org/wiki/Iris_flower_data_set
[Figure: Iris setosa, Iris versicolor, Iris virginica]
Classification of IRIS
• Classification Example
  – install.packages("e1071"); library(e1071)
  – pairs(iris[1:4], main="Iris Data (red=setosa,green=versicolor,blue=virginica)", pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])
  – classifier <- naiveBayes(iris[,1:4], iris[,5])
  – table(predict(classifier, iris[,-5]), iris[,5])
  – classifier <- svm(iris[,1:4], iris[,5])
  – table(predict(classifier, iris[,-5]), iris[,5])
  – prediction = predict(classifier, iris[,1:4])
• http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/Na%C3%AFve_Bayes
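The table(predicted, actual) calls above produce a confusion table. The sketch below shows how to read one using base R only: a hand-written threshold rule stands in for the naiveBayes and svm classifiers, so the thresholds and the resulting accuracy are illustrative assumptions and say nothing about e1071.

```r
actual <- iris$Species

# Stand-in classifier (an assumption for this example): two fixed thresholds
predicted <- factor(
  ifelse(iris$Petal.Length < 2.5, "setosa",
         ifelse(iris$Petal.Width < 1.75, "versicolor", "virginica")),
  levels = levels(actual)
)

# Rows are predictions, columns are true species; the diagonal holds correct cases
confusion <- table(predicted, actual)
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

Because the rule is evaluated on the same rows it was tuned on, this is a training accuracy; a held-out test set would give a fairer estimate.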
Performance Tips
• Use Built-in Math Functions
• Use Environments for Lookup Tables
• Use a Database to Query Large Data Sets
• Preallocate Memory
• Monitor How Much Memory You Are Using
• Cleaning Up Objects
• Functions for Big Data Sets
• Parallel Computation with R
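Three of the tips above (built-in functions, preallocation, environments as lookup tables) can be sketched briefly. The function names grow, prealloc, and vectorised and the lookup keys are hypothetical; no timings are claimed here.

```r
# Growing a vector one element at a time forces repeated reallocation
grow <- function(n) {
  v <- c()
  for (i in 1:n) v[i] <- i^2
  v
}

# Preallocating reserves the full length once
prealloc <- function(n) {
  v <- numeric(n)
  for (i in seq_len(n)) v[i] <- i^2
  v
}

# Built-in vectorised arithmetic avoids the loop entirely
vectorised <- function(n) seq_len(n)^2

# An environment used as a hashed lookup table
lookup <- new.env(hash = TRUE)
assign("htc", 1, envir = lookup)
lookup[["iphone"]] <- 2
htc_code <- get("htc", envir = lookup)
has_nokia <- exists("nokia", envir = lookup, inherits = FALSE)

# object.size() reports how much memory an object uses
size_bytes <- object.size(prealloc(1000))

print(vectorised(5))
print(htc_code)
print(has_nokia)
print(size_bytes)
```

system.time() can be wrapped around each variant to compare them on a given machine.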
R for Machine Learning
Getting Help on a Topic
• ?read.delim – # Access a function's help file
• ??base::delim – # Search for 'delim' in all help files for functions in 'base'
• help.search("delimited") – # Search for 'delimited' in all help files
• RSiteSearch("parsing text") – # Search for the term 'parsing text' on the R site.
Sample Code of Chapter 1
• https://github.com/johnmyleswhite/ML_for_Hackers.git
References & Resources
Study Material
• R in a Nutshell
Online Reference
Community Resources for R help
Resources
• Websites
  – Stack Overflow
  – Cross Validated
  – R-help
  – R-devel
  – R-sig-*
  – Package-specific mailing lists
• Blog
  – R-bloggers
• Twitter
  – https://twitter.com/#rstats
• Quora
  – http://www.quora.com/R-software
Resources (Cont'd)
• Conferences
  – useR!
  – R in Finance
  – R in Insurance
  – Others
  – Joint Statistical Meetings
  – Royal Statistical Society Conference
• Local User Groups
  – http://blog.revolutionanalytics.com/local-r-groups.html
• Taiwan R User Group
  – http://www.facebook.com/Tw.R.User
  – http://www.meetup.com/Taiwan-R/
Thank You!