R Introduction

R Intro, Week 1

Scott Chamberlain [modified from Haldre Rogers]

September 9, 2011

Uploaded by: schamber

Posted on 13-May-2015

Don’t just listen to me! Other intros to R:

• http://www.stat.duke.edu/programs/gcc/ResourcesDocuments/RTutorial.pdf
• http://www.cyclismo.org/tutorial/R/
• http://www.r-tutor.com/r-introduction
• Quick R: http://www.statmethods.net/
• http://www.bioconductor.org/help/course-materials/2011/CSAMA/Monday/Morning%20Talks/R_intro.pdf

R user frameworks

• R from the command line: OS X and PC
  – Just type “R” at the command line, and have fun!
• R itself
  – http://www.r-project.org/
• RStudio (a good choice)
  – http://www.rstudio.org/
• RevolutionR [free academic version]: this is sort of the SAS-ised version of R
  – http://www.revolutionanalytics.com/downloads/free-academic.php
  – Uses a proprietary .xdf file format that speeds up computation times
• Many other ways to use R, including GUIs, other IDEs, and a huge variety of text editors
  – https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
• If you are afraid of the code interface, use Rattle, R Commander, Deducer, or Red R
  – With these interfaces, you can learn which code does what after pressing buttons

R user frameworks, cont.

• R from Python
  – RPy: http://rpy.sourceforge.net/
• C++ from R: Rcpp package
  – http://cran.r-project.org/web/packages/Rcpp/index.html
  – http://dirk.eddelbuettel.com/code/rcpp.html
  – Can hugely speed up computation times by writing the slow parts of your R functions in C++. The R function then calls the compiled C++ code instead of running the equivalent R code.
  – E.g., http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html and http://dirk.eddelbuettel.com/code/rcpp.examples.html
• Excel from R: XLConnect package
  – http://cran.r-project.org/web/packages/XLConnect/index.html
• And more… see for yourself

R Tips

• R can crash! Do not use R’s built-in text editor or write code solely in the R console. Instead, use any text editor that integrates with R and save your code in script files. See here for links:
  – https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
• When asking for help on listservs/help websites, use BRIEF and REPRODUCIBLE examples
  – Otherwise, people won't want to help you!
• R automatically overwrites files with the same file name!!!! (e.g., write.csv() and save() give no warning)
  – Make sure you want to overwrite a file before doing so
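Both tips above can be sketched in a few lines of base R. This is a minimal illustration, not the author's code: `dput()` prints code that recreates a small dataframe exactly, which is a handy way to paste data into a help request, and `file.exists()` guards against silently overwriting a file. The dataframe `toy` and the file name `toy.csv` are hypothetical, and the file is written to a temporary directory so nothing real is touched.

```r
# A small, self-contained example that anyone can paste into R and run.
set.seed(42)                                 # makes the random numbers reproducible
toy <- data.frame(group = c("a", "a", "b", "b"),
                  value = round(rnorm(4), 2),
                  stringsAsFactors = FALSE)
dput(toy)                                    # prints code that recreates `toy` exactly

# write.csv() overwrites silently, so check first.
out_file <- file.path(tempdir(), "toy.csv")  # hypothetical output file
if (file.exists(out_file)) {
  stop("toy.csv already exists; pick another name or delete it first")
} else {
  write.csv(toy, out_file, row.names = FALSE)
}
```

Stopping with an error is one design choice; you could instead pick a new file name automatically.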

Style

Not this kind of style…

This kind of style!!!

Style

Style is important so YOU and OTHERS can read your code and actually use it

• Google style guide:
  – http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html#generallayout
• Henrik Bengtsson style guide:
  – http://www1.maths.lth.se/help/R/RCC/
• Hadley Wickham's style guide:
  – https://github.com/hadley/devtools/wiki/Style
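As a quick illustration of the kind of rules these guides have in common (spaces around operators, `<-` for assignment, descriptive names, one statement per line), here is the same computation written two ways. The variable names are made up, and the guides differ in their details, so treat this as a sketch rather than a summary of any one guide.

```r
# Hard to read: cramped, one-letter names, several statements on one line
x<-c(1,2,3);y=x*2;if(mean(y)>3)print("big")

# Easier to read: descriptive names, spaces around operators,
# <- for assignment, one statement per line, braces around the if body
leaf_lengths <- c(1, 2, 3)
doubled_lengths <- leaf_lengths * 2
if (mean(doubled_lengths) > 3) {
  print("big")
}
```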

Preparing your data for R

• What makes clean data?
  – Correct spelling
  – Identical capitalization (e.g. Premna vs premna), because R is case-sensitive
    • If myvector <- c(3, 4, 5), calling Myvector does not work!
  – No spaces between words (read.csv() turns spaces in column names into “.”)
    • Generally try to avoid spaces; use underscores instead
  – NA or blank (if using csv) for missing values
    • Blank cells are read as NA only in numeric or logical columns; in text columns they become empty strings
• Find and replace to get rid of spaces after words (otherwise “Premna ” and “Premna” are different values)
• I generally keep both an .xls and a .csv file, so you can always recreate your work in R with the .csv file and still modify the .xls file
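The clean-data checklist above can be seen in action with a tiny, made-up CSV read from a string. This is a sketch under assumptions: the column names, species values, and numbers are hypothetical, and `read.csv(text = ...)` stands in for reading a real file. It shows case sensitivity, spaces in column names becoming dots, a blank numeric cell becoming `NA`, and one way (`tolower()`) to fix a capitalization typo.

```r
# R is case-sensitive:
myvector <- c(3, 4, 5)
exists("Myvector")            # FALSE, because only myvector was created

# Hypothetical messy data: a space in each column name, a capitalization
# typo, and one blank (missing) numeric cell.
raw <- "plant species,leaf length
Premna,3.1
premna,
Premna,2.7"
plants <- read.csv(text = raw, stringsAsFactors = FALSE)

names(plants)                 # "plant.species" "leaf.length": spaces became "."
plants$leaf.length            # the blank cell in this numeric column is NA
unique(plants$plant.species)  # "Premna" "premna": counted as two different values

# One way to fix the capitalization typo after reading the data in:
plants$plant.species <- tolower(plants$plant.species)
unique(plants$plant.species)  # "premna"
```

Fixing typos in the spreadsheet itself is usually better, since the .csv then stays clean for next time.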

Bringing data into R

• Create a csv file
  – One worksheet only
  – No special formatting, filters, comments, etc.
  – Copy only the columns and rows that contain your data to the CSV, as R will sometimes read in empty columns
• Name your variables well: self-explanatory, unique, lowercase, short-ish, one-word names
• In R, set the working directory
  – setwd("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro")
  – What is the working directory? getwd()
  – What is in the working directory? dir()
• Read in data
  – CSV files: iris.df <- read.csv("iris_df.csv", header = TRUE)
  – Clipboard: read.delim("clipboard") reads in data copied from a spreadsheet, just like cutting and pasting it (works on Windows; on a Mac use read.delim(pipe("pbpaste")))
  – From the web: read.csv("http://explore.data.gov/download/pwaj-zn2n/CSV")
  – From Excel files (using the XLConnect package): iris.df <- readWorksheetFromFile("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro/iris_df.xlsx", sheet = "Sheet1")
• Write data
  – write.csv(dataframe, "dataframename.csv"), OR
  – save(iris, file = "iris.RData") [and load("iris.RData") to open in R]
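Here is a minimal round trip through the steps above, assuming you work in a temporary directory (via `tempdir()`) instead of the author's Dropbox folder, and use R's built-in `iris` data instead of a real spreadsheet. Note that `save()` needs the file name passed as `file =`, and that `load()` restores the object under its original name.

```r
# Work in a temporary directory so nothing real gets overwritten.
old_wd <- setwd(tempdir())   # setwd() returns the previous working directory
getwd()                      # where R will read and write files
dir()                        # what is already there

# Write the built-in iris data to CSV, then read it back in.
# row.names = FALSE stops write.csv() from adding a column of row numbers.
write.csv(iris, "iris_df.csv", row.names = FALSE)
iris.df <- read.csv("iris_df.csv", header = TRUE)

# Save an R object to an .RData file, remove it, then load it back.
save(iris.df, file = "iris.RData")
rm(iris.df)
load("iris.RData")           # recreates iris.df in the workspace

setwd(old_wd)                # go back to the original working directory
```

Keeping `old_wd` and restoring it at the end is a small habit that keeps scripts from leaving you in an unexpected directory.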

R data structures

• Scalar:
  – Object with a single value, either numeric or character (in R, really a vector of length 1)
• Vector:
  – Sequence of values of one type, e.g., numeric or character, which may include NA (mixing types coerces all values to a single type)
• List:
  – Arbitrary collection of objects of any type: a very useful R object
• Character:
  – Text, e.g., “this is some text”
• Factor:
  – Like character vectors, but only with values in predefined “levels”
• Matrix:
  – Two-dimensional; all values must be of the same type (usually numeric)
• Dataframe:
  – Each column can be of a different class
• Immutable dataframe:
  – Special dataframe used in the plyr package (idata.frame()) for faster dataframe manipulation; it references the original dataframe instead of copying it, for faster calculations
• Function
• Environment
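Each structure in the list above can be created in one line. This sketch uses made-up values and only base R; it shows that a "scalar" is a length-1 vector, that mixing types in `c()` coerces everything to character, and that a dataframe keeps a separate class for each column.

```r
x <- 42                          # a "scalar" is really a length-1 vector
length(x)                        # 1

v <- c(1, 2, NA)                 # numeric vector with a missing value
mixed <- c(1, "a")               # mixing types coerces everything to character
class(mixed)                     # "character"

f <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(f)                        # "low" "high"

m <- matrix(1:6, nrow = 2)       # every element of a matrix has the same type
cm <- matrix(c("a", "b", "c", "d"), nrow = 2)  # character matrices work too

l <- list(n = 3, name = "Premna", values = v)  # a list can hold anything

df <- data.frame(id = 1:3, species = c("a", "b", "c"),
                 size = c(2.5, 3, 1.2), stringsAsFactors = FALSE)
sapply(df, class)                # each column keeps its own class

sq <- function(z) z^2            # functions are objects too
e <- new.env()                   # and so are environments
```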

Exploring dataframes

• str(dataframe) gives column formats and dimensions
• head(dataframe) and tail(dataframe) give the first and last 6 rows
• names(dataframe) gives column names
• row.names(dataframe) gives row names
• attributes(dataframe) gives column and row names and the object class
• summary(dataframe) gives a lot of good information
  – Make sure variables are in the appropriate form: character/string, numeric, factor, integer, logical
  – Make sure mins, maxes, means, etc. seem right
  – Make sure you don’t have typing errors, so that Premna and premna do not become two separate factor levels
• Use unique(iris$Species) to see all the unique values of a column (note the capital S; R is case-sensitive)
• Or use levels(spider$species) to see the different levels of a factor
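The exploration functions above can be tried on R's built-in `iris` data. This short sketch also shows why the column name's capitalization matters: `iris$species` (lowercase) silently returns `NULL` rather than an error.

```r
str(iris)              # column classes and dimensions
head(iris)             # first 6 rows
tail(iris)             # last 6 rows
names(iris)            # column names
summary(iris)          # mins, maxes, means, and counts per factor level

iris$species           # NULL: there is no column called "species"
unique(iris$Species)   # the unique values of the Species column
levels(iris$Species)   # "setosa" "versicolor" "virginica"
```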

To attach or not to attach… that is the question

• Some like to use attach() to make dataframe variables accessible by name within the R session
• Generally, attach() is frowned upon by R junkies: the attached columns are copies, so later changes to the dataframe are not seen, and they can mask other objects with the same names
• Instead, use dataframe$y, or data = dataframe, or dataframe[, "y"], or dataframe[, 2]
• To detach the object, use detach()

I recommend: do not use attach, but do what you want
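The alternatives to `attach()` listed above look like this on the built-in `iris` data. The `with()` function is an extra base R option not mentioned on the slide: it evaluates an expression inside the dataframe only for that one call, without changing the search path.

```r
# Ways to reach one column without attach():
pl1 <- iris$Petal.Length
pl2 <- iris[, "Petal.Length"]
pl3 <- iris[, 3]                        # by position: breaks if columns are reordered
avg <- with(iris, mean(Petal.Length))   # evaluated inside iris, for this call only

# Many modelling functions take a data = argument instead:
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
coef(fit)
```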

R Packages

• 3,262 packages (at the time of this talk)!!!!
• Packages are extensions written by anyone for any purpose, usually loaded by:
  – install.packages("packagename") (only needed once), then
  – library(packagename) or require(packagename) (in each R session)
  – Use ?functionname for help on any function in base R or in R packages
  – In RStudio, just press Tab inside the parentheses after a function name to see the function's options!!!
• Explore packages at the CRAN site:
  – http://cran.r-project.org/web/packages/
• Inside-R package reference:
  – http://www.inside-r.org/packages
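A common pattern that combines the two steps above is to install a package only if it is missing, then load it. The helper name `use_package()` is hypothetical, not part of base R or any package; it relies on the real base functions `requireNamespace()`, `install.packages()`, and `library(..., character.only = TRUE)`.

```r
# Hypothetical helper: load a package, installing it from CRAN first if needed.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # needs an internet connection
  }
  library(pkg, character.only = TRUE)   # pkg is a string, not a bare name
  invisible(TRUE)
}

use_package("stats")   # a base package, so nothing needs installing
# ?lm                  # would open the help page for lm()
```

In scripts you share, some people prefer a plain `library()` call so that installing software is left to the user.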

Data manipulation

• Packages: plyr, data.table, doBy, sqldf, reshape2, and more
• Comparison of packages
  – Modified from code from the Recipes, scripts and Genomics blog: https://gist.github.com/878919
  – data.table is by far the fastest!!!
  – BUT, for ease of use and flexibility, plyr may be the better choice? See for yourself…
• Also, see the examples in the tutorial code for the reshape2 package for neat data manipulation tricks
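Before reaching for a package, it can help to see the split-apply-combine task these packages speed up. This sketch uses only base R (`tapply()`, `aggregate()`, and `split()` + `lapply()` + `rbind()`), not plyr or data.table, to compute mean petal length per species in the built-in `iris` data; the last version spells out by hand roughly what plyr's `ddply()` wraps up in one call.

```r
# Mean petal length for each species, three base R ways.
means_t <- tapply(iris$Petal.Length, iris$Species, mean)

means_a <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)

# Split-apply-combine by hand:
pieces  <- split(iris, iris$Species)                  # split
results <- lapply(pieces, function(d) {               # apply
  data.frame(Species = d$Species[1], mean_pl = mean(d$Petal.Length))
})
means_s <- do.call(rbind, results)                    # combine
```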

Visualizations

• A few different approaches:
  – Base graphics
  – Lattice graphics
  – Grid graphics
  – ggplot2 graphics
  – Further reading: http://www.slideshare.net/dataspora/a-survey-of-r-graphics
• An example: [figure not included in this transcript]
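Since the slide's example figure is not available, here is a stand-in sketch using base graphics (the first approach listed) on the built-in `iris` data; it is not the plot from the original slide. It writes to a temporary PDF so it also runs without a screen; drop the `pdf()` and `dev.off()` lines to plot interactively.

```r
# Draw to a temporary PDF file so this runs even without a display.
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(Sepal.Length ~ Petal.Length, data = iris,
     col = as.integer(iris$Species), pch = 19,   # colour points by species
     xlab = "Petal length (cm)", ylab = "Sepal length (cm)")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)
invisible(dev.off())   # close the PDF device and write the file
```

The same plot in ggplot2 would map `Species` to colour inside `aes()` and build the legend automatically.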

More on ggplot2 graphics

• There are classes taught by Hadley Wickham here at Rice if you want to learn more!
  – Data visualization (Stat645): http://had.co.nz/stat645/
  – Statistical computing (Stat405): http://had.co.nz/stat405/
• Hadley’s website is really helpful:
  – http://had.co.nz/ggplot2/
• The ggplot2 Google Groups site:
  – https://groups.google.com/forum/#!forum/ggplot2

QUICK RSTUDIO RUN THROUGH

Keyboard shortcuts!! http://www.rstudio.org/docs/using/keyboard_shortcuts

USE CASE HERE [see intro_usecase.R file]