Introduction to R
J. Charles Victor – Intro to R

Workshop Plan
- The R interface
  - The Console
  - The Script Editor
  - The "Workspace"
  - R programming rules…
- How does R 'think'
  - R Objects
  - The data frame
- Importing Data
- Data Manipulation
- Simple Analyses

What is R?
- Programming environment
  - Useful for statistics, with powerful graphing capabilities
  - But you will be programming, not clicking and pointing
- Free, 'open' software
  - Users create programs which are made available to other users via the web and an installation interface
- Based on S, S-Plus programming

The R Console
- The main window
  - Commands are written and submitted
  - Log of progress recorded
  - Output (except graphs) produced
  - Similar to STATA interface and function
- Prompt '>' indicates R is waiting for a command
- Try the following:
    > x <- c(1,2,3,4,5) [ENTER]
    > mean(x) [ENTER]
- You should find the following result
    [1] 3
- R is telling you the mean of [1,2,3,4,5] is 3

The Script Editor
- Accessible from the File menu item
- Used to create a series of commands (i.e. a program) that can be saved and run at a later date
  - Similar to the DO editor in STATA
  - Will make SAS and SPSS syntax users more comfortable
  - Write commands, highlight them and click on the submit button
- Try opening the Script editor ('New Script') and repeating the same commands as before
    x <- c(1,2,3,4,5)
    mean(x)
- Now highlight this code and click on the submit button

Script Editor
- Nothing fancy, but VERY useful
- Saves programs

The Workspace
- The 'Workspace' is the R data and objects
- When exiting R, saving the workspace saves your data and work
- Let's see our work thus far
  - Type: ls()
  - What do you see?
- Try saving your work thus far
  - File -> Save Workspace

R Programming
General Programming
- R is case-sensitive
- Character strings must be in quotes, either double (" ") or single (' '), as long as the opening and closing quotes match
- Hitting ENTER submits a command
  - If a command is still incomplete when you hit ENTER (for example, a bracket is still open), R shows a '+' prompt and waits for the rest. You do not type the '+' yourself.
  - Try the following:
    > newy <- c(0,0,1,
    + 1,0)
- Use 'comments' to identify what you have done
  - Comments begin with "#"
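As a small illustration of these rules (a sketch only; the object name Newy is made up here to show case-sensitivity), the following can be pasted into the script editor:

```r
# Comments start with '#'; R ignores the rest of the line.
# The first line below is incomplete (open bracket), so the command
# continues on the next line. At the console R would show a '+' prompt.
newy <- c(0, 0, 1,
          1, 0)

# Names are case-sensitive: Newy is a different object from newy.
Newy <- 'single or double quotes both work'

length(newy)
```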

How does R think?
- R thinks of data elements as 'objects'
- Objects can be:
  - Single variables
  - Arrays of variables
  - Entire datasets
  - Results from analyses (if saved as an object)
- When you save the 'Workspace' you save all of these objects
- So in a small sense, R works like Excel.

OK, I don't understand this Object thing…
- For data analysts it is usually easiest to start by treating the term 'object' as meaning 'variable'
- We have already created one variable called 'x'
- We can create another variable (object) called 'y' that has the values (20, 27, 18, 50, 99)
    > y <- c(20,27,18,50,99)
- To see all of the variables (objects) in memory we can use the 'list' command
    ls()
  - Or click on MISC -> LIST OBJECTS
- What do you see?
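A quick sketch of what to expect, assuming a session in which x and y are created as on the slides (any other objects you have made will be listed too):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(20, 27, 18, 50, 99)

# ls() returns the names of the objects currently in memory
objs <- ls()
objs

# Typing an object's name prints its values
y
```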

Creating Data
- How? No Spreadsheet??
  - Create your own
- Class of 5 students, need average test score

    John       Smith        58  M
    Jaysharee  Singh        82  F
    Emily      Xu           90  F
    Ute        VanDroglen   65  F
    Charles    Victor       90  M

First Attempt to Enter Data
- Many ways to create and edit data in R
  - First create variables (objects)
  - Then compile the data set from the variables
- Creating variables: 2 main ways
  1) Relatively few values
      VARIABLENAME <- c(VALUE1,VALUE2,VALUE3….)
     Character values go in quotes:
      z <- c("ABC","DEF")
  2) Many values
      VARIABLENAME <- scan() ENTER
      VALUE1 VALUE2 VALUE3 VALUE4 …. VALUE8 ENTER
      VALUE9 VALUE10 ….. ENTER
      ENTER
      >

Try entering this data in
- Use method 1 for first name, last name and sex
- Use method 2 for exam mark

    John       Smith        58
    Jaysharee  Singh        82
    Emily      Xu           90
    Ute        VanDroglen   65
    Charles    Victor       90

- After you create each variable, look at the variable to see that it is correct by typing the variable name at the command prompt
    > firstname
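One way the exercise might look is sketched below. At the console, method 2 would be mark <- scan() followed by typing the values; here scan(text = ...) is used instead (an assumption made only so the example runs without keyboard input).

```r
# Method 1: c() with character values in quotes
firstname <- c("John", "Jaysharee", "Emily", "Ute", "Charles")
lastname  <- c("Smith", "Singh", "Xu", "VanDroglen", "Victor")
sex       <- c("M", "F", "F", "F", "M")

# Method 2: scan(). Reading from a text string stands in for typing
# the values at the prompt.
mark <- scan(text = "58 82 90 65 90")

# Check each variable by typing its name
firstname
mark

# The class average
avg <- mean(mark)
avg
```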

A few notes on entering values
1) Variable names can contain letters, digits, '.' and '_', and must start with a letter (or with a '.' that is not followed by a digit). Other special characters are not allowed.
2) Missing values should be coded as
    NA
3) To create a variable whose values are a sequential list of numbers, use a colon (:)
    studentid <- c(1:5)
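A short sketch of notes 2 and 3. The vector mark_with_gap is hypothetical (it is not part of the workshop data) and only shows how NA behaves in a calculation:

```r
# A sequence of integers with the colon operator
studentid <- 1:5

# NA marks a missing value; most summaries return NA unless told to skip it
mark_with_gap <- c(58, NA, 90, 65, 90)
m_all  <- mean(mark_with_gap)                # NA
m_skip <- mean(mark_with_gap, na.rm = TRUE)  # mean of the 4 known values
m_all
m_skip
```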

Creating the Dataset
- Currently we just have 5 variables (objects)
  - These objects are independent of each other (i.e. the first name John is not linked with the last name Smith)
- To 'link' these objects we need to compile these variables together in a dataset, which R calls a 'data frame'
  - In R a data frame is an object just like a variable, and thus it is created in a similar fashion
    DATA_NAME <- data.frame(VARIABLE1,VARIABLE2,VARIABLE3)
- Note: All variables must have the same number of observations
- Now take a look at the data by typing the dataset name
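For the class example this could look as follows (a sketch that re-creates the five variables so it runs on its own; the dataset name class is the one used later in the workshop):

```r
studentid <- 1:5
firstname <- c("John", "Jaysharee", "Emily", "Ute", "Charles")
lastname  <- c("Smith", "Singh", "Xu", "VanDroglen", "Victor")
mark      <- c(58, 82, 90, 65, 90)
sex       <- c("M", "F", "F", "F", "M")

# Compile the variables into one data frame; each becomes a column
class <- data.frame(studentid, firstname, lastname, mark, sex)

# Typing the dataset name prints it
class
dim(class)   # rows, columns
```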

Back to 'Objects'
- Look at the objects now in memory
  - ls() or click MISC -> List objects
- You should see all of the variables + the dataset
- You can now use the dataset similar to how we have used variables
  - To see a variable, type the variable name
  - To see the dataset, type the dataset name

BUT…
- Once compiled onto a dataset, the variables (studentid, firstname, lastname, mark, sex) are different from the 'objects' in R's memory
- So we have
  - The object: mark
  - The variable mark on the class dataset
- You may want to get rid of the 'objects' now that you have compiled them onto the dataset (any changes made to the objects will not be reflected on the dataset)
    rm(studentid, firstname, lastname, mark, sex)
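The difference between the object and the dataset variable can be seen in a small sketch (a one-column version of the class dataset, built here only for illustration):

```r
mark  <- c(58, 82, 90, 65, 90)
class <- data.frame(mark)

# Change the free-standing object...
mark[1] <- 0

# ...the copy on the dataset is untouched
on_dataset <- class$mark[1]
on_dataset

# Remove the object; class$mark is still available
rm(mark)
class$mark
```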

Importing Existing Data into R
- R has not been very foreign data friendly
  - But this is changing, rapidly
- Optimally datasets need to be in the form of:
  - ASCII text
  - Tab delimited
  - Comma delimited
- Best to convert Excel data into one of these formats

Importing: ASCII text
- Use command: read.table
    OBJECT <- read.table("C:\\My Document\\FILE.TXT", header=TRUE)
- Note: backslashes in a pathway have to be doubled (\\); forward slashes (/) also work
- If variable names are on the first row
  - Use the header=TRUE option (T is an accepted shorthand for TRUE)
  - Otherwise variables will be named V1 V2 V3…
- Try to import the heart_rx dataset
- If you are unsure of the pathway you can use the command file.choose() nested in the read.table
  - This will cause R to bring up a GUI to choose your file
    OBJECT <- read.table(file.choose(), header=FALSE)
- Try to import the heart_rx_noheader dataset this way
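To see what the header option does without needing the workshop files, here is a sketch that writes a tiny made-up text file to a temporary location and reads it back both ways (the columns id and hr are invented for the example):

```r
# Write a small space-delimited file with variable names on the first row
tmp <- tempfile(fileext = ".txt")
writeLines(c("id hr", "1 72", "2 88"), tmp)

# header=TRUE: the first row supplies the variable names
with_header <- read.table(tmp, header = TRUE)
names(with_header)

# header=FALSE: the first row is treated as data and columns get V1, V2, ...
no_header <- read.table(tmp, header = FALSE)
names(no_header)
```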

Importing: Tab Delimited or Comma Separated or Database File
- Tab Delimited
  - Use command: read.delim
    OBJECT <- read.delim("C:\\My Document\\FILE.TXT", header=TRUE, sep="\t")
- Comma Separated Value (CSV)
  - Use command: read.csv
    OBJECT <- read.csv("C:\\My Document\\FILE.CSV", header=TRUE, sep=",")
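A round-trip sketch with temporary files (the small marks dataset is made up for the example). Note that header=TRUE and the separator are already the defaults for read.delim and read.csv, so they can be left out:

```r
marks <- data.frame(id = 1:3, mark = c(58, 82, 90))

# Tab-delimited file
tab_file <- tempfile(fileext = ".txt")
write.table(marks, tab_file, sep = "\t", row.names = FALSE, quote = FALSE)
from_tab <- read.delim(tab_file)

# Comma-separated file
csv_file <- tempfile(fileext = ".csv")
write.csv(marks, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)

from_tab
from_csv
```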

Importing: Access, SPSS, Stata etc
- Best method: third-party software to convert data to a delimited or CSV file
  - DBMS Copy is very popular
  - Stat Transfer is very good
- Some users have created
  - read.spss
  - read.xport (for SAS files)
  - read.dta (for STATA files)
  - But these commands live in an add-on package ('foreign'), which has to be installed, if it is not already, and loaded before use (more on that later)

Importing: R Dataset
- If a workspace has been saved from a previous session, simply load the workspace by 'clicking and pointing'
- Or use the load command
    load("PATHWAY\\FILENAME.Rdata")
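The same idea works for individual objects. A sketch using save() and a temporary file (the object scores is invented for the example; save.image() would save the whole workspace instead):

```r
scores <- c(58, 82, 90)

# Save one object to an .RData file
rdata_file <- tempfile(fileext = ".RData")
save(scores, file = rdata_file)

# Remove it from memory, then bring it back with load()
rm(scores)
load(rdata_file)
scores
```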

Creating a Dataset from a Dataset
- If you want to create a copy of a current dataset, this is a simple function in R.
  - Simply create a new object (i.e. with a different name) from the existing dataset
    NEWDATA <- OLDDATA
- To create a new dataset from an edited version of an old dataset
    NEWDATA <- edit(OLDDATA)
  - This will bring up the data editor (more on this later), and any changes will be attributed to NEWDATA, but not to OLDDATA

DATA MANIPULATION
99% of the work
(don't underestimate)

Data Manipulation: General
- Most of your time should be spent in this phase
- R is probably not the 'best' package
- Data manipulation includes (among other things)
  - Renaming variables
  - Getting rid of variables
  - Creating variables
  - Changing variables (e.g. categorising age)
  - Changing values of specific observations (e.g. someone reports age of 180)
  - Getting rid of observations
  - Merging datasets

A couple of things first….
- R has MANY ways of accomplishing similar tasks due to its open software construction
- When referring to variables on a dataset you must either:
  - Use: d_name$v_name
  OR
  - "Attach" the dataset
      attach(d_name)
- But attaching the dataset only allows the use of its variables, not their manipulation: changes made to an attached variable are not written back to the dataset

What is he talking about??
- Let's create a new dataset with two variables x and y
  - x will be the numbers 1 to 20
  - y will be 20 random values from a normal distribution
    x <- c(1:20)
    y <- rnorm(20)
    testdata <- data.frame(x,y)
- Remove the x and y objects
    rm(x,y)
- Print the dataset, and then x and y
    testdata
    x
    y
- Notice we could not access x and y this way. Try:
    testdata$x
    testdata$y
- That worked, but is a lot of typing. So we could also:
    attach(testdata)
    x
    y
- That worked too! So attaching a dataset allows us to access the variables on the dataset without using the $ format, but only for visualizing and analysing, not editing (so I don't like to do it)
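If the $ format feels like too much typing but you would rather not attach, base R's with() is another option. It is not covered on the slides, so treat this as a side note; the sketch also shows that editing always goes through $:

```r
testdata <- data.frame(x = 1:20, y = rnorm(20))

# Access with the $ format
mean_dollar <- mean(testdata$x)

# with() evaluates an expression using the dataset's variables,
# without attaching the dataset
mean_with <- with(testdata, mean(x))

# Editing (creating or changing variables) is done with $
testdata$x2 <- testdata$x * 2
names(testdata)
```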

Renaming Variables
- Occasionally we need to rename a variable
- Many ways
  - We can edit the data like a spreadsheet
      fix(d_name)
  - Create a copy of the class dataset, and "fix" it
      NEWDATA <- edit(d_name)
  - OR we can create a new variable from the old one (and then delete the old one)
      d_name$new_v_name <- d_name$old_v_name
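Two non-interactive ways to rename, sketched on a tiny made-up dataset d. The first is the copy-then-delete approach from the slide; the second uses names(), which is another of the "many ways" and is not shown on the slides:

```r
d <- data.frame(old_v_name = 1:3, other = c("a", "b", "c"))

# Way 1: copy to a new variable, then delete the old one
d1 <- d
d1$new_v_name <- d1$old_v_name
d1$old_v_name <- NULL
names(d1)   # the new variable ends up as the last column

# Way 2: change the name in place with names()
d2 <- d
names(d2)[names(d2) == "old_v_name"] <- "new_v_name"
names(d2)   # column order is preserved
```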

Deleting and Creating Variables
- To delete a variable, set the variable to NULL
    d_name$v_name <- NULL
- To create a variable, just set the new variable equal to some value; we use a similar construct as before
    d_name$v_name <- SOME_VALUE OR EXPRESSION

Creating Variables
- Suppose we want a variable identifying the day the exam was written and a variable identifying the maximum value for the exam
    class$test_day <- c("Monday")
    class$test_max <- c(100)

Creating Variables
- We can also create variables based on other variables
  - Imagine that we now want to calculate the students' percentage on the exam
      d_name$newv_name <- expression
  - For example:
      class$prct <- (class$mark / class$test_max)*100
- Remember rules of BEDMAS

A Note on Mathematic Functions
    +                  = addition
    -                  = subtraction
    *                  = multiplication
    /                  = division
    ( )                = brackets
    ^ or **            = to the exponent
    abs( x )           = absolute value of x
    trunc( x )         = integer part of x (R has no int( ) function; as.integer( x ) also works)
    log( x )           = natural log of x (i.e. Ln to non-math types)
    log10( x )         = log base 10 of x (i.e. Log to non-math types)
    sqrt( x )          = square root of x
    round( x, value )  = round x to value decimals
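A few of these in action (a sketch with arbitrary numbers; results are stored in made-up names so they can be inspected):

```r
p1 <- 2^3                # exponent
p2 <- 2**3               # same thing; R reads ** as ^
a  <- abs(-4.2)          # absolute value
t  <- trunc(-4.7)        # integer part (drops the decimals, toward zero)
l  <- log(exp(2))        # natural log undoes exp()
l10 <- log10(1000)       # log base 10
s  <- sqrt(16)           # square root
r  <- round(3.14159, 2)  # round to 2 decimals

# BEDMAS: brackets first, then division and multiplication
prct <- (58 / 100) * 100

c(p1, p2, a, t, l, l10, s, r, prct)
```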

Changing Variables
- Let's change the existing prct variable into letter grades
  - Map out which letter grades apply to which percents

    Below 50   = F
    50 - 59    = D
    60 - 69    = C
    70 - 79    = B
    80 - 100   = A

Changing Variables - Recoding
Two ways
1) Only for numeric variables
- Using Base R
- cut function
    d_name$new_v_name <-
      cut(d_name$old_v_name,
          breaks = c(breakpoints) OR breaks = #breaks,
          labels = c("LABEL1", "LABEL2",….) )
- E.g.
    class$lettergrd <- cut(class$prct, breaks = c(-Inf,49,59,69,79,100),
                           labels = c("F","D","C","B","A") )
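By default cut() treats each interval as closed on the right, which works for whole-number percentages but would put 49.5 in the D range. The sketch below is one possible variation, not the slide's exact call: it assumes you want intervals closed on the left (right = FALSE), so the breakpoints are the lower bounds from the grade table. The percentages are made up.

```r
prct <- c(58, 82, 90, 65, 49.5, 50, 100)

# Intervals: [-Inf,50) [50,60) [60,70) [70,80) [80,100]
# include.lowest = TRUE keeps the value 100 in the top interval
lettergrd <- cut(prct,
                 breaks = c(-Inf, 50, 60, 70, 80, 100),
                 labels = c("F", "D", "C", "B", "A"),
                 right = FALSE, include.lowest = TRUE)

data.frame(prct, lettergrd)
```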

Recoding variables – Second Method
- There is a "recode" function, but it has been developed outside of the original Base R
  - We can incorporate programs that have been written by other people
  - Often these programs are compiled into a group of programs that are used for a similar purpose
  - These groups of programs are called "Packages"

Installing a Package (to get a function that you do not have)
- First, note that you do not have 'recode'
    help(recode)
- Now (after searching google) you find out that a special function called 'recode' is available in the package called 'car'
  - Click PACKAGES -> INSTALL PACKAGE(S)
  - R will ask you to set a CRAN Mirror (site from which to download packages)
    - Choose CANADA (ON)
  - R will now ask which package you want to download
    - Choose "car" (package names are case-sensitive)
  - R will now download the 'car' package
- BUT the car package has just been installed, it has not yet been loaded
  - Click PACKAGES -> LOAD PACKAGE(S)
  - R will ask which package to load from all that you have installed
    - Choose "car"
- You can now use the recode function
  - Type help(recode)

Recoding – Second Method
- Now that the 'car' package is installed and loaded, we can use 'recode'
    d_name$new_v_name <- recode(d_name$old_v_name, recodes)
- Where recodes can be in the form of:
  - specific values: "c(99,999) = NA; c(1)='Y'"
  - range of values: "lo:49='F'; 50:59='D'"
    class$lettergrd2 <- recode(class$prct, "lo:49='F'; 50:59='D';…..")

Combining Conditional Statements to Change Values within Observations
- Your TA informs you that Jim Smith was sick for the Monday exam; instead he was given a makeup exam, out of 98
- To identify observations using conditional statements, we use the R function ifelse
    ifelse(condition/expression, value if true, value if false)

    class$test_max <- ifelse(class$firstname == 'Jim' & class$lastname == 'Smith', 98, class$test_max)

More complex…
- You are then informed that the twins (Joan and John Smith) cheated, you have to give them zeros:
    class$mark <- ifelse((class$firstname == 'Joan' | class$firstname == 'John') & class$lastname == 'Smith', 0, class$mark)
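Both conditional changes can be tried on a small stand-in dataset. The four rows below are hypothetical (they include a Jim and a Joan so that each condition matches someone) and are not the workshop's class data:

```r
class <- data.frame(
  firstname = c("Jim", "Joan", "John", "Emily"),
  lastname  = c("Smith", "Smith", "Smith", "Xu"),
  mark      = c(60, 71, 58, 90),
  test_max  = 100
)

# One student wrote a makeup exam out of 98; everyone else keeps their value
class$test_max <- ifelse(class$firstname == "Jim" & class$lastname == "Smith",
                         98, class$test_max)

# The brackets make sure the two first names are combined (OR) before
# the last name is checked (AND)
is_twin <- (class$firstname == "Joan" | class$firstname == "John") &
  class$lastname == "Smith"
class$mark <- ifelse(is_twin, 0, class$mark)

class
```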

Logical Statements
    <         = Less than
    <=        = Less than or equal to
    >         = Greater than
    >=        = Greater than or equal to
    !=        = Not equal to
    ==        = Equal to
    & or &&   = Intersection (AND) boolean operator
    | or ||   = Union (OR) boolean operator
- Use the single forms (& and |) when comparing whole variables, as in ifelse; the double forms (&& and ||) are meant for a single TRUE/FALSE value
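A short sketch of how the operators behave on whole variables (the marks are made up):

```r
mark <- c(58, 82, 90, 65, 90)

# Comparisons return one TRUE/FALSE per observation
passed   <- mark >= 60
top      <- mark == 90
mid_band <- mark >= 60 & mark < 90     # AND, element by element
extremes <- mark < 60 | mark == 90     # OR, element by element

# && works on single values, e.g. a check on the whole variable
all_valid <- (min(mark) >= 0) && (max(mark) <= 100)

passed
mid_band
extremes
all_valid
```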

Deleting Observations (or Subsetting)
- Suppose we want to look at only the female students
- We need to either delete the males or keep the females
- Better to create a new dataset with only females than to delete observations from our original dataset
- Many ways: use the subset command
    New_d_name <- subset(old_d_name, condition, select=variables wanted)

    Females <- subset(class, class$sex == 'F')
- Note, we can also select out certain variables only
    Males <- subset(class, class$sex == 'M', select=c(firstname,lastname,lettergrd) )
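A self-contained sketch using the five students from earlier (without the lettergrd variable, so select keeps firstname and mark instead). Inside subset() the variables can be named directly, so sex == "F" is equivalent to class$sex == "F":

```r
class <- data.frame(
  firstname = c("John", "Jaysharee", "Emily", "Ute", "Charles"),
  sex       = c("M", "F", "F", "F", "M"),
  mark      = c(58, 82, 90, 65, 90)
)

# Keep only the female students (all variables)
Females <- subset(class, sex == "F")

# Keep only the male students, and only two of the variables
Males <- subset(class, sex == "M", select = c(firstname, mark))

Females
Males
```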

Data Merge
- Two important types of merge
  - Concatenation
    - Adding new observations to a set of old observations
  - Matched merge
    - Adding new variables (values) to an existing dataset with the same observations (e.g. we need to add mid-term marks to our exam database)

Concatenation
- Easy
- Use rbind function, and add all datasets
    new_d_name <- rbind(d_name1, d_name2,…)
- But all datasets must have the same number (and names) of variables!
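For instance, stacking two small made-up datasets that share the same variables:

```r
section1 <- data.frame(id = 1:2, mark = c(58, 82))
section2 <- data.frame(id = 3:5, mark = c(90, 65, 90))

# Same variable names in both, so the rows are simply stacked
all_sections <- rbind(section1, section2)
all_sections
```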

Matched Merge
- A little more complex
- Use merge function
- If there is a common variable on which to merge:
    New_d_name <- merge(d_name1, d_name2, by = "ID", all=TRUE)
- If the matching variables have different names
    New_d_name <- merge(d_name1, d_name2, by.x="IDX", by.y="IDY", all=TRUE)
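A sketch of both forms with made-up exam and mid-term datasets. With all=TRUE, students who appear in only one dataset are kept and get NA for the variables they are missing:

```r
exam    <- data.frame(ID = c(1, 2, 3), exam = c(58, 82, 90))
midterm <- data.frame(ID = c(2, 3, 4), midterm = c(70, 85, 60))

# Common matching variable
merged <- merge(exam, midterm, by = "ID", all = TRUE)
merged

# Matching variables with different names (IDX and IDY)
examx    <- data.frame(IDX = c(1, 2, 3), exam = c(58, 82, 90))
midtermy <- data.frame(IDY = c(2, 3, 4), midterm = c(70, 85, 60))
merged2  <- merge(examx, midtermy, by.x = "IDX", by.y = "IDY", all = TRUE)
merged2   # the key column keeps the name from the first dataset
```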