R Course 2014: Lecture 3

Upload: gceid

Post on 02-Jun-2018


  • 8/11/2019 R Course 2014: Lecture 3


    Lecture 3: String Function

    Ben Fanson, Simon Lisovski


    Quick Refresher

    1) R data types (major ones)
       numeric
       character
       factor [think of as a hybrid of numeric and character]

    2) R classes (major ones)
       vector: c(), factor(), length()
       data frame: data.frame() or read.table()/read.csv()/read.xls(), dim()
       list: list(), str()
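A minimal sketch of the types and classes listed above. The object names (x_num, x_chr, x_fac, df, lst) and values are made up for illustration.

```r
# Three common vector types
x_num <- c(1.5, 2, 3)                      # numeric
x_chr <- c("tryoni", "meloni")             # character
x_fac <- factor(c("low", "high", "low"))   # factor: integer codes + text labels

class(x_num)        # "numeric"
class(x_chr)        # "character"
levels(x_fac)       # "high" "low" (levels are sorted alphabetically by default)
as.integer(x_fac)   # 2 1 2 (the underlying integer codes)

# Common classes
length(x_num)       # 3
df <- data.frame(genus = c("Bactrocera", "Ceratitis"),
                 count = c(10, 4))
dim(df)             # 2 2 (rows, columns)
lst <- list(species = x_chr, n = 2)
str(lst)            # compact overview of each item in the list
```

The factor lines show why a factor is a "hybrid": it prints as text but is stored as integer codes.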


    Quick Refresher

    3) subsetting
       vector_name[ position_number ]
       data.frame_name[ row_number, column_number ] or data.frame_name$column_name
       list_name[ item_number ] returns a smaller list; list_name[[ item_number ]] or list_name$item_name returns the item itself
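A short sketch of the three subsetting forms, using made-up objects (v, df, lst). The list part shows the difference between single and double square brackets.

```r
# Vector: position in square brackets
v <- c(10, 20, 30)
v[2]                 # 20

# Data frame: [row, column] or $column_name
df <- data.frame(genus = c("Bactrocera", "Ceratitis", "Anastrepha"),
                 count = c(10, 4, 7))
df[2, 1]             # "Ceratitis"
df$count             # 10 4 7

# List: [ ] keeps the list wrapper, [[ ]] and $ return the item itself
lst <- list(site = "A", counts = c(1, 2, 3))
class(lst[2])        # "list"
lst[[2]]             # 1 2 3
lst$counts           # 1 2 3
```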


    Quick Refresher

    4) useful functions that we have encountered
       class()   # get the class of an object (use str() to see its structure)
       names()   # get column names
       unique()  # list the unique values in a vector
       rep()     # repeat something a given number of times
       seq()     # create a sequence of numbers
       1:5       # ':' is a shortcut for an integer sequence, same as seq(1, 5)
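A few quick calls to the helpers above; the inputs are arbitrary examples.

```r
rep("a", times = 3)             # "a" "a" "a"
rep(c(1, 2), times = 2)         # 1 2 1 2
rep(c(1, 2), each = 2)          # 1 1 2 2
seq(0, 10, by = 2.5)            # 0.0 2.5 5.0 7.5 10.0
seq(1, 2, length.out = 3)       # 1.0 1.5 2.0
1:5                             # 1 2 3 4 5
unique(c("A", "B", "A", "C"))   # "A" "B" "C" (in order of first appearance)
names(data.frame(trt = "A", week = 1))  # "trt" "week"
```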


    Quick Refresher

    4) useful functions that we have encountered (continued)
       as.numeric(), as.character(), as.factor()  # convert a vector to another data type (if possible)
       mean(), sd(), median(), min(), max()       # summary statistics
       summary()                                  # quick summary (for a numeric vector: min, quartiles, median, mean, max)
       log(), sqrt(), x^2                         # common transformations
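A small sketch of type conversion and summary statistics on made-up vectors. It also shows a common pitfall: a factor converted straight to numeric gives its internal codes, not the printed values.

```r
x <- c("1.5", "2", "abc")
num <- suppressWarnings(as.numeric(x))  # "abc" cannot be converted, so it becomes NA
num                                     # 1.5 2.0 NA

# Converting a factor directly returns the integer codes
f <- factor(c("10", "5", "10"))         # levels are "10" "5" (sorted as text)
as.numeric(f)                           # 1 2 1  (codes)
as.numeric(as.character(f))             # 10 5 10 (actual values)

y <- c(1, 4, 9, 16)
mean(y); median(y); min(y); max(y)      # 7.5 6.5 1 16
sqrt(y)                                 # 1 2 3 4
summary(y)
```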


    Quick Refresher

    5) importing files
       read.table(filename, header=TRUE, sep='\t')  # tab-delimited files
       read.csv(filename, header=TRUE)              # comma-delimited files
       read.xls(filename, sheet=1)                  # Excel files; needs library(gdata)
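A self-contained sketch of the two text-file readers. It writes a small made-up table to temporary files with tempfile() and then reads them back, so no real data files are assumed. read.xls() is left out because it depends on the gdata package and its external setup.

```r
# Write a small table to temporary files, then read it back in
df <- data.frame(genus = c("Bactrocera", "Ceratitis"), count = c(10, 4))

csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)
df_csv <- read.csv(csv_file, header = TRUE)          # comma-delimited

txt_file <- tempfile(fileext = ".txt")
write.table(df, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)
df_txt <- read.table(txt_file, header = TRUE, sep = "\t")  # tab-delimited

str(df_csv)
```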


    Lecture Outline

    1) sorting (Ben)
    2) stringr package (Ben)
    3) regular expressions (Ben)
    4) dates (Simeon)

    Helpful references
    - http://cran.r-project.org/web/packages/stringr/stringr.pdf
    - http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Wickham.pdf


    Sorting in R

    1) for vectors, use sort(); it returns the sorted items
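A sketch of sort() on a made-up vector, showing the decreasing and na.last arguments:

```r
x <- c(3, 1, 2, NA)
sort(x)                       # 1 2 3   (NAs are dropped by default)
sort(x, decreasing = TRUE)    # 3 2 1
sort(x, na.last = TRUE)       # 1 2 3 NA
sort(c("meloni", "tryoni", "capitata"))  # "capitata" "meloni" "tryoni"
```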


    Sorting in R

    2) for data.frames, use order(); it returns the row numbers in sorted order

       then use those row numbers to reorder the rows: dataframe_name[ row_order, ]
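A minimal sketch of the order() then dataframe_name[ row_order, ] pattern on a made-up data frame. The second call sorts by two columns; putting a minus sign on a numeric column reverses its order.

```r
df <- data.frame(genus = c("Ceratitis", "Bactrocera", "Anastrepha", "Bactrocera"),
                 count = c(4, 10, 7, 2))

row_order <- order(df$count)        # row numbers: 4 1 3 2
df[row_order, ]                     # rows sorted by count, smallest first

# Sort by genus, then by count (largest first) within each genus
df[order(df$genus, -df$count), ]
```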


    String functions

    String functions have two main purposes:

    1) cleaning and preparing data tables for analysis

       e.g. split one 'name' column into 'genus' and 'species' columns

       name                             genus        species
       Bactrocera tryoni    --split-->  Bactrocera   tryoni
       Bactrocera meloni                Bactrocera   meloni
       Ceratitis capitata               Ceratitis    capitata
       Anastrepha ludens                Anastrepha   ludens
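The split above can be done in base R with strsplit(). This is a sketch that assumes every name has exactly one space between genus and species.

```r
name <- c("Bactrocera tryoni", "Bactrocera meloni",
          "Ceratitis capitata", "Anastrepha ludens")

# strsplit() returns a list with one character vector per name;
# rbind the pieces into a two-column matrix
parts <- do.call(rbind, strsplit(name, " ", fixed = TRUE))
spp <- data.frame(genus = parts[, 1], species = parts[, 2],
                  stringsAsFactors = FALSE)
spp
```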


       e.g. split and strip off specific text from model output

       model output                            trt   Week   Estimate
       (row name)   Estimate    --split-->
       trtAwk1      0.23                       A     1      0.23
       trtAwk2      0.45                       A     2      0.45
       trtBwk1      0.12                       B     1      0.12
       trtBwk2      0.0001                     B     2      0.0001
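A base R sketch of that split. It assumes every term has the same fixed layout: "trt", one treatment letter, "wk", then the week number. The column name term is hypothetical; in a real fitted model these labels would usually be the row names of the coefficient table.

```r
est <- data.frame(term = c("trtAwk1", "trtAwk2", "trtBwk1", "trtBwk2"),
                  Estimate = c(0.23, 0.45, 0.12, 0.0001),
                  stringsAsFactors = FALSE)

# Fixed positions work because every term has the same layout:
# character 4 = treatment letter, character 7 onwards = week number
est$trt  <- substr(est$term, 4, 4)
est$Week <- as.numeric(substr(est$term, 7, nchar(est$term)))
est[, c("trt", "Week", "Estimate")]
```

If the labels varied in length, a pattern-based approach such as the regular expressions later in this lecture would be more robust.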


       e.g. replace '__' with ' '

       name                             name
       Bactrocera__tryoni   ------->    Bactrocera tryoni
       Bactrocera__meloni               Bactrocera meloni
       Ceratitis__capitata              Ceratitis capitata
       Anastrepha__ludens               Anastrepha ludens
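In base R, this replacement is a single gsub() call. fixed = TRUE treats "__" as literal text rather than as a regular expression.

```r
name <- c("Bactrocera__tryoni", "Bactrocera__meloni",
          "Ceratitis__capitata", "Anastrepha__ludens")
clean <- gsub("__", " ", name, fixed = TRUE)  # replace every "__" with a space
clean
```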


    2) writing R scripts, especially generic scripts

       e.g. automating file paths/names
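A sketch of building file paths and names from pieces, so that one script can handle many files. The folder "data" and the "counts_"/"results_" naming scheme are hypothetical, chosen only for illustration.

```r
# Hypothetical folder and file-name scheme, for illustration only
data_dir <- "data"
species  <- c("tryoni", "capitata")

in_files  <- file.path(data_dir, paste0("counts_", species, ".csv"))
out_files <- paste0("results_", species, ".csv")
in_files     # "data/counts_tryoni.csv" "data/counts_capitata.csv"
out_files    # "results_tryoni.csv" "results_capitata.csv"

# A generic script could then loop over the names, e.g.
# for (i in seq_along(in_files)) { dat <- read.csv(in_files[i]); write.csv(dat, out_files[i]) }
```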


    paste()

    1) one of the more important functions that you will use
    2) concatenates two or more objects into character strings (separated by a space by default)

       genus        species                 name
       Bactrocera   tryoni     --paste-->   Bactrocera tryoni
       Bactrocera   meloni                  Bactrocera meloni
       Ceratitis    capitata                Ceratitis capitata
       Anastrepha   ludens                  Anastrepha ludens
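A sketch of paste() on the genus/species example, plus the sep and collapse arguments and paste0():

```r
genus   <- c("Bactrocera", "Bactrocera", "Ceratitis", "Anastrepha")
species <- c("tryoni", "meloni", "capitata", "ludens")

name <- paste(genus, species)        # default sep = " ": "Bactrocera tryoni" ...
paste(genus, species, sep = "_")     # "Bactrocera_tryoni" ...
paste0("trt", c("A", "B"), "wk", 1)  # "trtAwk1" "trtBwk1" (shorter inputs are recycled)
paste(species, collapse = ", ")      # one string: "tryoni, meloni, capitata, ludens"
```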


    stringr package

    functions start with 'str_'

    Note - as with most tasks in R, there are many other functions that do similar things. stringr aims to add consistency and to have all the main string functions in one place (you can just start typing s…)

    stringr - Basic string operations

    str_c( object1, object2, ... )  # same as paste(..., sep='') or paste0()
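A sketch of str_c(), assuming the stringr package is installed (install.packages("stringr")). The last line shows one difference from paste0(): str_c() returns NA when an input is NA.

```r
library(stringr)

genus   <- c("Bactrocera", "Ceratitis")
species <- c("tryoni", "capitata")

str_c(genus, "_", species)        # "Bactrocera_tryoni" "Ceratitis_capitata"
str_c(genus, species, sep = " ")  # "Bactrocera tryoni" "Ceratitis capitata"
str_c("a", NA)                    # NA, whereas paste0("a", NA) gives "aNA"
```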

    str_split_fixed( string, pattern, n )  # break a string apart at a pattern into n pieces (returns a matrix)
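A sketch of str_split_fixed() on the species names, assuming stringr is installed. n is the number of pieces to return, and any remaining text stays in the last piece.

```r
library(stringr)

name  <- c("Bactrocera tryoni", "Ceratitis capitata", "Anastrepha ludens")
parts <- str_split_fixed(name, " ", 2)   # character matrix: 3 rows x 2 columns
parts[, 1]   # genus
parts[, 2]   # species

# With n = 2, everything after the first "_" stays together
str_split_fixed("trt_A_wk_1", "_", 2)    # "trt" "A_wk_1"
```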


    str_length( string )  # number of characters in each element of a vector
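A quick sketch, assuming stringr is installed. An empty string has length 0 and NA stays NA.

```r
library(stringr)

x <- c("tryoni", "capitata", "", NA)
str_length(x)                    # 6 8 0 NA
nchar(c("tryoni", "capitata"))   # base R equivalent: 6 8
```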

    str_sub( string, start, end )  # substring - extract part of a string by position
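A sketch of str_sub() on the model-term labels from earlier, assuming stringr is installed. Negative positions count back from the end of the string.

```r
library(stringr)

term <- c("trtAwk1", "trtBwk2")
str_sub(term, 4, 4)     # "A" "B"     (4th character only)
str_sub(term, -1)       # "1" "2"     (last character)
str_sub(term, 1, 3)     # "trt" "trt"
```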

    str_trim( string, side )  # remove leading and/or trailing whitespace; side = "both" (default), "left" or "right"
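A sketch with made-up padded strings, assuming stringr is installed:

```r
library(stringr)

x <- c("  tryoni", "capitata  ", "  ludens  ")
str_trim(x)                  # "tryoni" "capitata" "ludens"
str_trim(x, side = "left")   # "tryoni" "capitata  " "ludens  "
```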

    stringr - Pattern matching

    str_detect( string, pattern )  # determine whether each string contains the pattern
                                   # returns TRUE or FALSE
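A sketch of a typical use of str_detect(): filtering a vector with the logical result. This assumes stringr is installed; grepl() is the base R counterpart.

```r
library(stringr)

name <- c("Bactrocera tryoni", "Bactrocera meloni", "Ceratitis capitata")
has_bac <- str_detect(name, "Bactrocera")   # TRUE TRUE FALSE
name[has_bac]                               # keep only the Bactrocera entries
sum(has_bac)                                # number of matches: 2
grepl("Bactrocera", name)                   # base R equivalent
```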


    str_replace( string, pattern, replacement )  # replace the first match of pattern with another string (str_replace_all() replaces every match)
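A sketch contrasting str_replace() and str_replace_all() on made-up strings, assuming stringr is installed:

```r
library(stringr)

x <- c("Bactrocera__tryoni", "a_b_c")
str_replace(x, "_", " ")         # only the first "_": "Bactrocera _tryoni" "a b_c"
str_replace_all(x, "__", " ")    # every "__": "Bactrocera tryoni" "a_b_c"
```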


    Regular expressions

    a computer language built to find patterns in strings

    '^'      starts with
    '$'      ends with
    '[0-9]'  any digit (finds strings with numbers)
    '[a-z]'  any one character from a set, here any lowercase letter (finds strings with specific letters)
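A sketch applying each pattern above with str_detect() to a few made-up strings. It assumes stringr is installed. str_extract() returns the first match, or NA if there is none.

```r
library(stringr)

x <- c("Bactrocera tryoni", "Ceratitis capitata", "trtAwk1", "site 12")

str_detect(x, "^Bac")      # starts with "Bac":  TRUE FALSE FALSE FALSE
str_detect(x, "i$")        # ends with "i":      TRUE FALSE FALSE FALSE
str_detect(x, "[0-9]")     # contains a digit:   FALSE FALSE TRUE TRUE
str_detect(x, "[xyz]")     # contains x, y or z: TRUE FALSE FALSE FALSE
str_detect(c("TRT", "trt"), "[a-z]")  # any lowercase letter: FALSE TRUE
str_extract(x, "[0-9]+")   # first run of digits: NA NA "1" "12"
```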

    Advanced (putting it all together):

    Output for publication tabmod

    library(rtf)


    Dates in R

    Three date/time classes are built into R:

       Date    - calendar dates without times
       POSIXct - Portable Operating System Interface [for Unix] calendar time
       POSIXlt - Portable Operating System Interface [for Unix] local time
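A sketch of the three classes and how each one is stored. It uses dates near 1970-01-01 so the internal numbers are easy to check, and it sets tz = "UTC" to avoid depending on the computer's time zone.

```r
d  <- as.Date("1970-01-11")
ct <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
lt <- as.POSIXlt("1970-01-01 00:01:00", tz = "UTC")

class(d)        # "Date"
class(ct)       # "POSIXct" "POSIXt"
class(lt)       # "POSIXlt" "POSIXt"

as.numeric(d)   # 10 -> days since 1970-01-01
as.numeric(ct)  # 60 -> seconds since 1970-01-01 00:00:00 UTC
lt$min          # 1  -> POSIXlt stores components (sec, min, hour, ...) separately
```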

    Class Date

    This is the class to use if you have only dates but no times.

    Symbol  Explanation
    %a      Abbreviated weekday name (e.g. Mon)
    %A      Full weekday name (e.g. Monday)
    %b      Abbreviated month name (e.g. Jan)
    %B      Full month name (e.g. January)
    %d      Day of the month as a decimal number (01-31)
    %m      Month as a decimal number (01-12)
    %H      Hours as a decimal number (00-23)
    %M      Minutes as a decimal number (00-59)
    %j      Day of the year as a decimal number (001-366)
    ...     (see ?strptime for the full list)

    Create a date:

    Non-standard formats must be specified:
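A sketch of creating dates with as.Date(). The ISO year-month-day form needs no format. Other layouts need a format string built from the symbols in the table. The dates are arbitrary examples.

```r
d1 <- as.Date("2014-08-11")                        # standard format: year-month-day
d2 <- as.Date("11/08/2014", format = "%d/%m/%Y")   # non-standard: day/month/year
d3 <- as.Date("11 08 14", format = "%d %m %y")     # %y = two-digit year
d1 == d2                    # TRUE
d1 == d3                    # TRUE
d1 - as.Date("2014-01-01")  # Time difference of 222 days
format(d1, "%j")            # "223" (day of the year)
```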


    Class POSIXlt

    This class enables easy extraction of specific components of a time.

    Create some POSIXlt objects:

    Internal integer representation:

    Extract components of a time object:
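A sketch of extracting components from a POSIXlt object. The date is arbitrary and tz = "UTC" is set so the result does not depend on the local time zone. Several components use offsets that often catch people out: months and day of the year count from 0, and years count from 1900.

```r
lt <- as.POSIXlt("2014-03-15 14:30:00", tz = "UTC")

lt$hour    # 14
lt$min     # 30
lt$mday    # 15   day of the month
lt$mon     # 2    months are counted from 0, so March = 2
lt$year    # 114  years are counted from 1900
lt$yday    # 73   day of the year, counted from 0
lt$wday    # 6    day of the week, 0 = Sunday (15 March 2014 was a Saturday)
```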

    function format()

    Format R objects (not specific to POSIXct and POSIXlt objects). For dates and times it uses the same conversion symbols as the table for class Date (see ?strptime for the full list).
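A sketch of format() turning Date and POSIXct values into text. The values are arbitrary, and tz = "UTC" keeps the output independent of the local time zone. The last line shows format() on a plain number.

```r
d  <- as.Date("2014-08-11")
ct <- as.POSIXct("2014-08-11 09:05:00", tz = "UTC")

format(d, "%d/%m/%Y")           # "11/08/2014"
format(d, "%Y")                 # "2014"
format(ct, "%H:%M")             # "09:05"
format(ct, "%Y-%m-%d %H:%M")    # "2014-08-11 09:05"
format(2.5, nsmall = 2)         # "2.50" (format() also works on numbers)
```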

    R package lubridate

    "Dates and Times Made Easy"
    - http://www.jstatsoft.org/v40/i03/paper

    NAs OR Missing Data

    In R, missing values are represented by the symbol NA (not available). Impossible values (e.g., 0/0) are represented by the symbol NaN (not a number); dividing a non-zero number by zero gives Inf rather than NaN.

    Testing for missing values:

    Excluding missing data from analysis:
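A sketch of testing for and excluding missing values, using made-up vectors and a small data frame:

```r
x <- c(2, NA, 4, 0/0)
is.na(x)               # FALSE TRUE FALSE TRUE  (TRUE for both NA and NaN)
is.nan(x)              # FALSE FALSE FALSE TRUE (TRUE only for NaN)
1/0                    # Inf, not NaN

y <- c(2, NA, 4)
mean(y)                # NA: a single missing value makes the result missing
mean(y, na.rm = TRUE)  # 3: drop missing values first
y[!is.na(y)]           # 2 4

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
na.omit(df)            # keeps only row 1, the one complete row
```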

    Next Week


    Data frame manipulation

    grammar of data manipulation (dplyr package)

    restructuring dataframes

    Lecture 3: Hands-on Section


    Getting Started


    1) download zip file from http://github.com/bfanson/Rcourse_pr

    2) move 'R programs/Lecture3.R' into your 'Rcourse_proj/R programs

    3) move 'data/' folder and replace all your data files