OS Tools for Spatial Ecological modeling, University of Basilicata, Italy – May 2010. Introduction to the R Project for Statistical Computing. Stefano CASALEGNO, Ph.D. www.spatial-ecology.net

Page 1: Introduction to the R Project for Statistical Computing

Master in GIS and Remote Sensing, University of Zaragoza, Spain

Introduction to the R Project for Statistical Computing

March 2010

Stefano CASALEGNO, Ph.D.

Page 2:

Topics for this lecture

1. Introducing the R Project for Statistical Computing: what and why?

2. Getting help: resources for learning R

3. Applications: Using R for Spatial Ecological modelling

4. Editing scripts with KATE

Page 3:

1. GENERAL INTRODUCTION

The R Project for Statistical Computing: what and why?

www.spatial-ecology.net

Page 4:

What?

R is a language and environment for statistical computing and graphics. 

It is a GNU project: free, open source software developed through mass collaboration.

R is based on, and similar to, the S language and environment, which was developed at Bell Laboratories (formerly AT&T) by John Chambers and colleagues (the same laboratories where C and UNIX were developed).

Page 5:

Software or Environment?

Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented.

R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hardcopy.

Page 6:

The R environment

The term "environment" is intended to characterize R as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. 

Page 7:

What does the R environment include?

an effective data handling and storage facility,

a suite of operators for calculations on arrays, in particular matrices,

a large, coherent, integrated collection of intermediate tools for data analysis,

graphical facilities for data analysis and display either on-screen or on hardcopy, and

a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
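As a minimal sketch of two items from this list (array operators and the programming language), the snippet below uses only base R. The values and the function name `factorial_rec` are made up for illustration and are not part of the original lecture.

```r
# A small matrix and a vector (illustrative values only)
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column
v <- c(1, 2, 3)

m %*% v                      # matrix-vector product
t(m)                         # transpose
m * 2                        # element-wise arithmetic

# A user-defined recursive function with a conditional
# (factorial_rec is a hypothetical name chosen for this example)
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop that prints the first five factorials
for (i in 1:5) {
  cat(i, "! = ", factorial_rec(i), "\n", sep = "")
}
```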

Page 8:

WHY?

Page 9:

Peculiarity

In S (and hence in R), a statistical analysis is normally done as a series of steps, with intermediate results being stored in objects.

Thus, whereas SAS and SPSS will give copious output from a regression or discriminant analysis, R will give minimal output and store the results in a fit object for subsequent interrogation by further R functions.
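The idea of a fit object can be sketched with a simple linear regression. This example is not from the original slides; it assumes the `cars` dataset that ships with R and uses only base functions.

```r
# Fit a linear regression; the result is stored in an object
fit <- lm(dist ~ speed, data = cars)

fit                        # printing gives minimal output: call and coefficients
class(fit)                 # the object has a class ("lm")

# Interrogate the fit object with further R functions
coef(fit)                  # estimated coefficients
head(residuals(fit))       # first few residuals
summary(fit)$r.squared     # R-squared taken from the summary object
predict(fit, newdata = data.frame(speed = 15))  # prediction for a new value
```

Each step returns another object, so results can be reused in later steps instead of being read off a long printed report.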

Page 10:

Advantages of R

FREE: There are no restrictions on access or use.

Scientifically robust: It is the product of international collaboration between top computational statisticians and computer language designers.

It runs on almost all operating systems.

It allows statistical analysis and modelling of high sophistication: you are not limited to one method of accomplishing a given computation or graphical presentation.

Page 11:

Advantages of R (2)

It can work on objects of unlimited size and complexity (cluster processing).

It can exchange data (CSV, GDAL) and interact with other working environments (shell, GRASS).

It is supported by comprehensive online technical documentation and user-contributed community material.

Repetitive tasks can be written as "scripts".

Source code is published and available.
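As a small sketch of data exchange through CSV, the snippet below writes a data frame to a temporary file and reads it back with base R. The site names and elevations are invented for illustration only. Exchange through GDAL or GRASS relies on additional packages and is not shown here.

```r
# Illustrative data only: three hypothetical sampling sites
sites <- data.frame(
  site        = c("A", "B", "C"),
  elevation_m = c(120, 450, 980)
)

# Write to a temporary CSV file, then read it back
csv_file <- tempfile(fileext = ".csv")
write.csv(sites, csv_file, row.names = FALSE)

sites_back <- read.csv(csv_file)
str(sites_back)            # inspect the structure of the imported data
```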

Page 12:

Disadvantages of R

You work at the command line.

You must learn the S language.

You must approach a new way of thinking about data: as objects, each with its type, which in turn supports a set of methods.

R works in Random Access Memory (RAM), so the objects in a session must fit in the memory available. RAM is a type of physical memory that can be read from and written to.
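The "objects, types and methods" way of thinking can be sketched as follows, using base R only. The values are made up for illustration; the point is that the same generic function (`summary`) behaves differently depending on the class of its argument, and that every object occupies memory.

```r
# Two objects of different types (illustrative values)
x <- c(2.5, 3.1, 4.7)                 # numeric vector
f <- factor(c("oak", "pine", "oak"))  # factor (categorical data)

class(x)
class(f)

# The same generic function dispatches to a different method per class
summary(x)    # quantiles and mean for numbers
summary(f)    # counts per level for factors

# Objects live in RAM; object.size() reports an estimate of the memory used
object.size(x)
```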

Page 13:

2. Resources for learning R

http://www.r-project.org/

Introductions and tutorials

Textbooks, manuals

Web

R News, Mailing lists, user’s conference

... R help
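For the "R help" item, a minimal sketch of the built-in help facilities is given below. The functions looked up (`lm`, `read.*`) are arbitrary examples chosen for illustration, not topics prescribed by the lecture.

```r
# Open the help page of a function (equivalent to typing ?lm)
help("lm")

# List objects on the search path whose names start with "read."
hits <- apropos("^read\\.")
hits

# Show the arguments of a function without opening its help page
args(read.csv)
```

In an interactive session, `help.search("some topic")` can also be used to search the installed documentation by keyword.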

Page 14:

Introductions and tutorials 

Venables, W. N.; Smith, D. M.; R Development Core Team, 2007. An Introduction to R (Notes on R: A Programming Environment for Data Analysis and Graphics), Version 2.5.0 (2007-04-23). ISBN 3-900051-12-7. http://www.cran.r-project.org

Hornik, K., 2007. R FAQ: Frequently Asked Questions on R. Version 2.5.2007-04-23. ISBN 3-900051-08-9.

Rossiter, D. G., 2007. Introduction to the R Project for Statistical Computing for use at ITC. Revision 2.95. International Institute for Geo-information Science & Earth Observation (ITC), Enschede (NL), 129 pp. http://www.itc.nl/personal/rossiter/teach/R/RIntro_ITC.pdf

Page 15:

Textbooks

Dalgaard, P. 2002. Introductory Statistics with R. Springer-Verlag.

Venables, W. N. & Ripley, B. D. 2002. Modern Applied Statistics with S, 4th edition. New York: Springer-Verlag.

Everitt, B. S. & Hothorn, T. 2006. A Handbook of Statistical Analyses Using R. Chapman & Hall.

Soetaert, K. & Herman, P. M. J. 2008. A Practical Guide to Ecological Modelling: Using R as a Simulation Platform. Springer.

Spector, P. 2009. Data Manipulation with R. Springer.

Page 16:

Web

R Wiki: http://wiki.r-project.org/rwiki/doku.php

Help at UCLA: http://www.ats.ucla.edu/stat/r/

Help on packages: http://astrostatistics.psu.edu/datasets/R/html/index.html

Ecological Models and Data in R (Princeton University Press): http://www.zoology.ufl.edu/bolker/emdbook/

RSeek search engine: http://www.rseek.org/

Multi-site search engine: http://www.dangoldstein.com/search_r.html

Page 17:

R News, Mailing lists, user’s conference

● MAILING LISTS: http://www.r-project.org/mail.html

R-sig-geo: R Special Interest Group on using Geographical data and Mapping: https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Help in Spanish: https://stat.ethz.ch/mailman/listinfo/r-help-es

● NEWSLETTER: http://cran.r-project.org/doc/Rnews/Rnews_2001-3.pdf

● CONFERENCES: http://www2.agrocampus-ouest.fr/math/useR-2009/

Page 18:

3. APPLICATION

Using R for Spatial Ecological modelling

Page 19:

Packages

The basic R environment comes with 8 “standard” packages.

Packages include: functions, data, examples and manuals.

Package repository on the Internet: http://cran.r-project.org
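A minimal sketch of working with packages from the R console is shown below. It assumes only a standard R installation; the package named in the commented `install.packages()` line is an arbitrary example and needs Internet access to CRAN.

```r
# Names of the packages installed on this machine
installed <- rownames(installed.packages())
head(installed)

# Attach a package so that its functions and data become available
library(stats)

# Packages currently attached to the session
search()

# Installing a contributed package from CRAN (not run here;
# "sp" is only an example and requires an Internet connection)
# install.packages("sp")
```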

Page 20:

packages

Page 21:

Spatial data and R

R has dedicated data structures and methods for specific kinds of data (e.g. time series data, spatial data, ecological modelling)

A large number of packages provide spatial statistical methods or interfaces to GIS, and many of them provide data structures and e.g. plotting methods for spatial data.  
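To illustrate what a dedicated data structure with its own methods looks like, the sketch below builds a time series with base R. The monthly values are invented for illustration. The last part is an assumption: it uses the contributed `sp` package as one example of a spatial data structure, and runs only if that package happens to be installed.

```r
# Illustrative data only: 24 made-up monthly values starting January 2008
rain <- ts(c(80, 65, 70, 55, 40, 20, 10, 15, 45, 75, 95, 90,
             85, 60, 72, 50, 38, 25, 12, 18, 48, 70, 99, 88),
           start = c(2008, 1), frequency = 12)

class(rain)                              # "ts": a dedicated time series class
frequency(rain)                          # 12 observations per year
rain_2009 <- window(rain, start = c(2009, 1))  # subset by time, not by index
rain_2009

# Spatial example, assuming the contributed package 'sp' is installed
if (requireNamespace("sp", quietly = TRUE)) {
  # Two hypothetical point locations (x, y)
  pts <- sp::SpatialPoints(cbind(x = c(10.1, 10.5), y = c(45.2, 45.9)))
  print(sp::bbox(pts))                   # bounding box of the points
}
```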

Page 22:

Editing R scripts using KATE (KDE Advanced Text Editor)

Page 23:

Editing R scripts

Many editors exist for scripting in different programming languages, for instance:

http://www.activestate.com/komodo_edit/

http://www.gnu.org/software/emacs/

...

Editors help with programming through syntax highlighting, tight integration with console commands, extensive help and other options.

Page 24:

KATE as editor for R scripting

Kate (KDE Advanced Text Editor) is an advanced text editor. http://kate-editor.org/

It is an easy tool for helping with scripting in R.

KDE is a network-transparent, contemporary desktop environment for UNIX workstations. KDE seeks to fulfill the need for an easy-to-use desktop for UNIX workstations.

A script can be run with the source method, or its commands can be pasted into the R console.
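Running a script with `source()` can be sketched as follows. To keep the example self-contained, the script is first written to a temporary file; in practice it would be a file saved from the editor. The script contents and the variable name `total` are made up for illustration.

```r
# Create a small script file (a stand-in for a file saved from the editor)
script_file <- tempfile(fileext = ".R")
writeLines(c(
  "# example script: sum of the integers 1 to 10",
  "total <- sum(1:10)"
), script_file)

# Run every command in the script, as if typed at the console
source(script_file)

total   # the object created by the script is now available in the session
```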

Page 25:

Hands on

Learning and discovering R by practicing

Open KATE and edit ~/ost4sem/exercise/basic_r/basic_R.R
