8/11/2019 R Course 2014: Lecture 3
Lecture 3: String Functions
Ben Fanson, Simon Lisovski
Quick Refresher
1) R data types (major ones)
numeric
character
factor [think of as a hybrid of numeric and character]
2) R classes (major ones)
vector: c(), factor(), length()
data frame: data.frame() or read.table()/read.csv()/read.xls(), dim()
list: list(), str()
3) subsetting
vector_name[ position_number ]
data.frame_name[ row_number, column_number ] or
data.frame_name$column_name
list_name[[ item_number ]] or list_name$item_name
(list_name[ item_number ] with single brackets returns a list containing the item, not the item itself)
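The subsetting forms above can be tried on small objects. This is a minimal sketch in base R; the object names `v`, `df` and `lst` are made up for illustration.

```r
# Hypothetical example objects (not from the lecture data)
v   <- c(10, 20, 30)
df  <- data.frame(id = 1:3, wt = c(4.2, 5.1, 3.8))
lst <- list(a = 1:3, b = "text")

v[2]        # second element of the vector
df[2, 2]    # row 2, column 2 of the data frame
df$wt       # the whole 'wt' column
lst[1]      # a list of length 1 that contains item 'a'
lst[[1]]    # item 'a' itself
lst$b       # item 'b' by name
```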
4) useful functions that we have encountered
class() # get the class of an object
names() # get column names
unique() # unique items in a vector
rep() # repeat something a given number of times
seq() # create a sequence of numbers
1:5 # ':' is a shortcut for a sequence of integers
as.numeric(), as.character(), as.factor() # convert a vector to another data type (if possible)
mean(), sd(), median(), min(), max() # summary statistics
summary() # gives some statistics
log(), sqrt(), x^2
5) importing files
read.table(filename, header=TRUE, sep='\t') # tab-delimited files
read.csv(filename, header=TRUE) # comma-delimited files
read.xls(filename, sheet=1) # needs library(gdata); used for reading Excel files
Lecture Outline
1) sorting (Ben)
2) stringr package (Ben)
3) regular expressions (Ben)
4) dates (Simeon)
Helpful references
- http://cran.r-project.org/web/packages/stringr/stringr.pdf
- http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Wickham.pdf
Sorting in R
1) for vectors, use sort()
returns sorted items
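A short base R sketch of sort() on a numeric and a character vector; the values are invented for illustration.

```r
x <- c(3, 1, 2)
sort(x)                     # 1 2 3
sort(x, decreasing = TRUE)  # 3 2 1

spp <- c("tryoni", "capitata", "ludens")
sort(spp)                   # alphabetical order
```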
2) for data.frames, use order()
returns ordered row numbers
dataframe_name[ row_order, ]
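As a minimal sketch of the pattern above: order() gives the row numbers in sorted order, and those row numbers are then used to index the data frame. The data frame `flies` and its columns are hypothetical.

```r
# Hypothetical data frame
flies <- data.frame(species = c("tryoni", "capitata", "ludens"),
                    weight  = c(12.1, 9.8, 10.5),
                    stringsAsFactors = FALSE)

row_order    <- order(flies$weight)   # row numbers, lightest first
flies_sorted <- flies[row_order, ]    # reorder the rows, keep all columns
flies_sorted
```

Wrapping the column in order(..., decreasing = TRUE) reverses the sort.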
String functions
String functions have two main purposes
1) cleaning and preparing data tables for analysis
Example: split one column into two
name
Bactrocera tryoni
Bactrocera meloni
Ceratitis capitata
Anastrepha ludens
after the split:
genus        species
Bactrocera   tryoni
Bactrocera   meloni
Ceratitis    capitata
Anastrepha   ludens
Example: split and strip off specific text
model output:
          Estimate
trtAwk1   0.23
trtAwk2   0.45
trtBwk1   0.12
trtBwk2   0.0001
after the split:
trt   Week   Estimate
A     1      0.23
A     2      0.45
B     1      0.12
B     2      0.0001
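One way this split-and-strip step could be done with the stringr functions covered later in the lecture is sketched below. It assumes the stringr package is installed; the data frame `out` and its column `term` are hypothetical stand-ins for the model output.

```r
library(stringr)

# Hypothetical model output with the coefficient names in a column
out <- data.frame(term     = c("trtAwk1", "trtAwk2", "trtBwk1", "trtBwk2"),
                  Estimate = c(0.23, 0.45, 0.12, 0.0001),
                  stringsAsFactors = FALSE)

stripped <- str_replace(out$term, "^trt", "")  # strip the leading 'trt'
parts    <- str_split_fixed(stripped, "wk", 2) # split at 'wk' into 2 columns

out$trt  <- parts[, 1]
out$Week <- as.numeric(parts[, 2])
out[, c("trt", "Week", "Estimate")]
```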
Example: replace '__' with ' '
name
Bactrocera__tryoni
Bactrocera__meloni
Ceratitis__capitata
Anastrepha__ludens
after the replacement:
name
Bactrocera tryoni
Bactrocera meloni
Ceratitis capitata
Anastrepha ludens
String functions have two main purposes
1) cleaning and preparing data tables for analysis
2) writing R scripts, especially generic scripts
automating file paths/names
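As an illustration of automating file names, the sketch below builds a set of paths from a vector of years using base R only. The folder name `data` and the `survey_` prefix are invented for the example.

```r
years      <- 2012:2014
file_names <- paste0("survey_", years, ".csv")  # one file name per year
file_paths <- file.path("data", file_names)     # prepend the folder
file_paths
```

The resulting vector could then be looped over, for example with lapply(file_paths, read.csv), if files with those names exist.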
paste()
1) one of the more important functions that you will use
2) concatenates two or more objects
genus        species
Bactrocera   tryoni
Bactrocera   meloni
Ceratitis    capitata
Anastrepha   ludens
after pasting:
name
Bactrocera tryoni
Bactrocera meloni
Ceratitis capitata
Anastrepha ludens
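A minimal base R sketch of paste() using the genus and species values from the table; the vector names are chosen for illustration.

```r
genus   <- c("Bactrocera", "Bactrocera", "Ceratitis", "Anastrepha")
species <- c("tryoni", "meloni", "capitata", "ludens")

name <- paste(genus, species)      # default separator is a single space
name

paste(genus, species, sep = "_")   # choose a different separator
paste(species, collapse = ", ")    # collapse a vector into one string
```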
stringr package
functions start with 'str_'
Note - As with most things in R, there are many other functions that do similar things. stringr tries to add consistency and have all the main functions in one place (can just start typing 'str_').
stringr - Basic string operations
str_c( object1, object2, ... ) # same as paste(..., sep='') or paste0()
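A small sketch of str_c(), assuming the stringr package is installed; the inputs are invented.

```r
library(stringr)

wk <- str_c("wk", 1:3)                             # no separator by default
wk

full <- str_c("Bactrocera", "tryoni", sep = " ")   # separator set explicitly
full
```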
str_split_fixed(object1, pattern, num_splits) # break apart a string by a pattern
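The genus/species split from earlier in the lecture can be sketched with str_split_fixed(), which returns a character matrix with one column per piece. This assumes stringr is installed; the vector `name` is illustrative.

```r
library(stringr)

name  <- c("Bactrocera tryoni", "Ceratitis capitata")
parts <- str_split_fixed(name, " ", 2)  # split at the space into 2 pieces

genus   <- parts[, 1]
species <- parts[, 2]
parts
```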
str_length( object ) # gets the number of characters in each element of a vector
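A brief sketch, assuming stringr is installed. Note the difference from length(), which counts elements rather than characters.

```r
library(stringr)

spp <- c("tryoni", "capitata")
str_length(spp)   # characters per element: 6 8
length(spp)       # number of elements: 2
```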
str_sub( object, start, end ) # substring - extract part of a string by position
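A minimal sketch of str_sub(), assuming stringr is installed. Negative positions count from the end of the string; the example strings are taken from the tables earlier in the lecture.

```r
library(stringr)

x <- "Bactrocera tryoni"
str_sub(x, 1, 3)     # first three characters: "Bac"
str_sub(x, -6, -1)   # last six characters: "tryoni"

terms <- c("trtAwk1", "trtBwk2")
str_sub(terms, 4, 4) # the treatment letter at position 4: "A" "B"
```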
str_trim(object, side) # remove leading and/or trailing whitespace
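A short sketch of str_trim(), assuming stringr is installed. The `side` argument accepts "both" (the default), "left" or "right".

```r
library(stringr)

x <- "  tryoni  "
str_trim(x)                  # "tryoni"
str_trim(x, side = "left")   # "tryoni  "
str_trim(x, side = "right")  # "  tryoni"
```

Stray whitespace like this often comes from spreadsheets and makes otherwise identical values look different to unique().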
stringr - Pattern matching
str_detect( object, pattern ) # determine if the string contains the pattern
                              # returns TRUE or FALSE
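A minimal sketch of str_detect(), assuming stringr is installed. Because the result is a logical vector, it can be used directly for subsetting.

```r
library(stringr)

name <- c("Bactrocera tryoni", "Ceratitis capitata", "Bactrocera meloni")

is_bactrocera <- str_detect(name, "Bactrocera")
is_bactrocera          # TRUE FALSE TRUE
name[is_bactrocera]    # keep only the matching names
```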
str_replace(string, pattern, replace_with) # replace pattern with another string
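The '__' example from earlier in the lecture can be sketched as follows, assuming stringr is installed. str_replace() changes only the first match in each string; str_replace_all(), also part of stringr, changes every match.

```r
library(stringr)

name  <- c("Bactrocera__tryoni", "Ceratitis__capitata")
clean <- str_replace(name, "__", " ")
clean

str_replace("a_b_c", "_", " ")      # first match only: "a b_c"
str_replace_all("a_b_c", "_", " ")  # every match: "a b c"
```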
Regular expressions
computer language built to find stuff in strings
'^' starts with
'$' ends with
'[0-9]' find strings with numbers
'[a-z]' find strings with specific letters
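The four patterns above can be tried with str_detect(). This is a sketch assuming stringr is installed; the test strings are invented.

```r
library(stringr)

x <- c("trtAwk1", "control", "wk2trt", "Site 7")

str_detect(x, "^trt")      # starts with 'trt':  TRUE FALSE FALSE FALSE
str_detect(x, "trt$")      # ends with 'trt':    FALSE FALSE TRUE FALSE
str_detect(x, "[0-9]")     # contains a digit:   TRUE FALSE TRUE TRUE
str_detect(x, "^[a-z]+$")  # only lowercase letters from start to end
```

The last pattern combines '^', '$' and a character class; '+' means one or more of the preceding item.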
Advanced (putting it all together):
Output for publication tables
library(rtf)
Dates in R
Three date/time classes are built into R
Date
POSIXct - Portable Operating System Interface [for Unix] calendar time
POSIXlt - Portable Operating System Interface [for Unix] local time
Class Date
This is the class you could use if you have only dates, BUT no times.
Symbol   Explanation
%a       Abbreviated weekday name
%A       Full weekday name
%b       Abbreviated month name
%B       Full month name (e.g. January)
...
%d       Day of the month (01-31)
...
%H       Hours as decimal number (00-23)
...
%j       Day of the year (001-366)
...
Create a date:
Non-standard formats must be specified:
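The two cases above can be sketched in base R as follows. The dates are invented; `%m` (month as a number) and `%Y` (four-digit year) are standard format codes, although `%Y` is not in the table shown.

```r
# Standard format (year-month-day) needs no format string
d1 <- as.Date("2014-03-05")

# Non-standard format: describe the layout with format codes
d2 <- as.Date("05/03/2014", format = "%d/%m/%Y")

d1 == d2   # TRUE, both are 5 March 2014
d1 + 30    # dates support arithmetic in days: "2014-04-04"
class(d1)  # "Date"
```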
Class POSIXlt
This class enables easy extraction of specific components of a time.
Create some POSIXlt objects:
Internal integer representation:
Extract components of a time object:
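A minimal base R sketch of these three steps. The time stamp is invented and the time zone is fixed to UTC so that the result does not depend on the local machine. Note that POSIXlt counts months from 0 and years from 1900.

```r
# Create a POSIXlt object
t1 <- as.POSIXlt("2014-03-05 14:30:00", tz = "UTC")

# Integer representation: seconds since 1970-01-01 (via POSIXct)
secs <- as.numeric(as.POSIXct(t1))

# Extract components
t1$hour         # 14
t1$min          # 30
t1$mday         # day of the month: 5
t1$mon + 1      # month: 3 (stored as 0-11)
t1$year + 1900  # year: 2014 (stored as years since 1900)
t1$yday + 1     # day of the year: 64 (stored as 0-365)
```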
Function format()
Format R Objects (not specific to POSIXct and POSIXlt objects)
Symbol   Explanation
%a       Abbreviated weekday name
%A       Full weekday name
%b       Abbreviated month name
%B       Full month name (e.g. January)
...
%d       Day of the month (01-31)
...
%H       Hours as decimal number (00-23)
...
%j       Day of the year (001-366)
...
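A short base R sketch of format() applied to date and time objects. The values are invented; weekday and month names depend on the locale of the machine, so no output is shown for those.

```r
d <- as.Date("2014-03-05")

format(d, "%d/%m/%Y")  # "05/03/2014"
format(d, "%j")        # day of the year: "064"
format(d, "%A %d %B")  # weekday and month names (locale dependent)

t1 <- as.POSIXct("2014-03-05 14:30:00", tz = "UTC")
format(t1, "%H:%M")    # "14:30"
```

format() always returns character strings, so the result should be converted back with as.Date() or as.POSIXct() before doing date arithmetic.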
R Package lubridate
"Dates and Times Made Easy"
- http://www.jstatsoft.org/v40/i03/paper
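A brief sketch of what lubridate offers, assuming the package is installed. The parsing functions are named after the order of the date parts, so no format string is needed; the dates are invented.

```r
library(lubridate)

d1 <- ymd("2014-03-05")   # year-month-day
d2 <- dmy("05/03/2014")   # day-month-year

d1 == d2        # TRUE
year(d1)        # 2014
month(d1)       # 3
day(d1)         # 5
d1 + days(10)   # "2014-03-15"
```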
NAs or Missing Data
In R, missing values are represented by the symbol NA (not available). Impossible values (e.g., dividing zero by zero) are represented by the symbol NaN (not a number).
Testing for missing values:
Excluding missing data from analysis
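Both steps can be sketched in base R as follows; the vector and data frame are invented for illustration.

```r
x <- c(2, NA, 4)

# Testing for missing values
is.na(x)               # FALSE TRUE FALSE
sum(is.na(x))          # number of missing values: 1

# Excluding missing data from analysis
mean(x)                # NA, because one value is missing
mean(x, na.rm = TRUE)  # 3

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"))
df_complete <- na.omit(df)  # drop every row that contains an NA
nrow(df_complete)           # 2

is.nan(0 / 0)          # TRUE: 0/0 is NaN
```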
Next Week
Data frame manipulation
grammar of data manipulation (dplyr package)
restructuring dataframes
Lecture 3: Hands-on Section
Getting Started
1) download zip file from http://github.com/bfanson/Rcourse_pr
2) move 'R programs/Lecture3.R' into your 'Rcourse_proj/R programs
3) move 'data/' folder and replace all your data files