Basic introduction into R

Upload: richard-zijdeman

Post on 16-Apr-2017

TRANSCRIPT

Page 1: Basic introduction into R

Quantitative research methods
Data analysis workflow
Statistical Software
Installing R and RStudio
Getting help

Introduction into R
Part 1A

Richard L. Zijdeman

2016-06-15

Richard L. Zijdeman Introduction into R

Page 2: Basic introduction into R

1 Quantitative research methods

2 Data analysis workflow

3 Statistical Software

4 Installing R and RStudio

5 Getting help

Page 3: Basic introduction into R

Quantitative research methods

Page 4: Basic introduction into R

Why

To answer descriptive and explanatory questions on populations

Page 5: Basic introduction into R

Workflow: PTE

problem (research question)
theory (hypothesis)
empirical test
... with loops between T-E and P-T-E

Page 6: Basic introduction into R

Research Questions

descriptive (to what extent ...)
comparative (comparing two entities)
trend (comparison over time)
explanatory (focus on mechanism at hand)

Page 7: Basic introduction into R

Theory

deductive reasoning
explanans
  general mechanism
  condition
explanandum (hypothesis)

Page 8: Basic introduction into R

Empirical test

sample vs. population
random vs. stratified samples
testing technique, e.g.:
  t-test, correlation, regression

Software required for faster analysis

Page 9: Basic introduction into R

Data analysis workflow

Page 10: Basic introduction into R

Empirical testing has its own workflow

Grolemund & Wickham, 2016, Creative Commons Attribution-NonCommercial-NoDerivs 4.0.

Page 11: Basic introduction into R

Statistical Software

Page 12: Basic introduction into R

The dangers of analysing with spreadsheets (e.g. MS Excel)

tempting to input and clean data and analyse in the same sheet
difficult to track cleaning rules
defaults mess up your data (e.g. 01200 -> 1200)
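The leading-zero problem is not unique to spreadsheets: R's import functions also guess column types unless told otherwise. A minimal sketch, assuming a made-up two-row file with a hypothetical `postcode` column:

```r
# Hypothetical data: postal codes with a leading zero
csv_text <- "id,postcode\n1,01200\n2,98765"

# Default import: postcode is guessed to be a number, so the zero is lost
guessed <- read.csv(text = csv_text)
guessed$postcode

# Explicit column types keep the codes as text
kept <- read.csv(text = csv_text,
                 colClasses = c(id = "integer", postcode = "character"))
kept$postcode
```

The difference with a spreadsheet is that the rule (`colClasses`) is written down in the script, so it can be checked and repeated.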

Page 13: Basic introduction into R

Why use syntax (scripting)

Efficiency (really)
Quality (error checking)
Replicability
Communication

Page 14: Basic introduction into R

R

R is open source, which is good and bad:
  anybody can contribute (check, improve, create code)
  free of charge
but: R depends on collective action
  cannot ‘demand’ support
  sprawl of packages

Page 15: Basic introduction into R

RStudio

browser for R
provides easy access to:
  scripts
  data
  plots
  manual

Page 16: Basic introduction into R

Installing R and RStudio

Page 17: Basic introduction into R

Download R

Instructions via http://www.r-project.org
Choose a CRAN mirror: http://cran.r-project.org/mirrors.html
  close, but active too!
  Romania hasn’t gone (yet!)
Click on ‘Download R for Windows’
Follow usual installation procedure
Double click on R
  You should now have a working session!
  Close the session, do not save workspace image

Page 18: Basic introduction into R

Packages and libraries

base R (core product)
additional packages
  CRAN repository
    spread through ‘mirrors’
    choose a local, but active mirror
  GitHub
    packages not on CRAN
    development versions of CRAN libraries

Page 19: Basic introduction into R

RStudio

RStudio is found on http://www.rstudio.com
Download the version for your OS (e.g. Windows): http://www.rstudio.com/products/rstudio/download/
Install by double clicking on the downloaded file
Start RStudio by double clicking on the icon
You do not need to start R before starting RStudio

Page 20: Basic introduction into R

Getting help

Page 21: Basic introduction into R

Built-in help: “?”

?[function] / ?[package]
  e.g. “?plot” or “?graphics”
check the index for user guides and vignettes
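Besides `?`, base R offers a few other routes into the help system. The calls below are a short sketch using functions that ship with R; what they display depends on the installed packages.

```r
# Different routes into the built-in help (all part of a standard R install)
help("plot")                 # same as ?plot
help(package = "graphics")   # index of a package and its documentation
hits <- apropos("^read\\.")  # objects whose name starts with "read."
print(hits)
help.search("boxplot")       # same as ??boxplot: search help pages by keyword
vignette(package = "grid")   # list the vignettes that come with a package
```

`apropos()` is handy when you remember only part of a function name, `help.search()` when you know the topic but not the function.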

Page 22: Basic introduction into R

CRAN website

Manuals
R FAQ
R Journal

Page 23: Basic introduction into R

Online communities

Stack Overflow
  instance of Stack Exchange
  reputation-based Q&A
Specific lists for packages, e.g.:
  ggplot2
  R-sig-mixed-models

Page 24: Basic introduction into R

Asking a question, getting an answer

Search the web: others must have had this problem too
If you raise a question:
  be polite
  be concise
  short background
  reproducible example
  debrief your efforts so far
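What a reproducible example could look like in practice is sketched below. It is one possible recipe, not a fixed standard: it uses the built-in `mtcars` data as a stand-in for your own data, and `dput()` to turn a small object into code that others can paste.

```r
# 1. Use a small built-in dataset (or a few rows of your own data)
small <- head(mtcars[, c("mpg", "hp")], 3)

# 2. dput() prints R code that recreates the object; paste that in your question
dput(small)

# 3. Show the exact call that puzzles you, and say what you expected instead
result <- mean(small$mpg)
print(result)

# 4. Mention your R version (and loaded packages, see sessionInfo())
print(R.version.string)
```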

Page 25: Basic introduction into R

Introducing RStudio and R

Introducing base R

Data visualization using ggplot2

Introduction into R

Part 1B

Richard L. Zijdeman

2016-06-15


Page 26: Basic introduction into R

1 Introducing RStudio and R

2 Introducing base R

3 Data visualization using ggplot2

Page 27: Basic introduction into R

Introducing RStudio and R

Page 28: Basic introduction into R

RStudio

RStudio is sort of a ‘viewer’ on R
helps to organize input and output:
  editor (upper left)
  console (lower left)
  environment (upper right)
  output (lower right)

Page 29: Basic introduction into R

R script

series of commands to manipulate data
always save your script, NEVER change your data
  original data + script = reproducible research

Page 30: Basic introduction into R

Packages

Build your R system using packages
‘Base R’ is basic. Add packages for your specific needs
Packages are found on servers, called ‘mirrors’
  Make sure to select a mirror first
  https://cran.r-project.org/mirrors.html

## To set the mirror, type (put this line in your .Rprofile to make it permanent):
options(repos = c(CRAN = "http://cran.xl-mirror.nl"))
## replace http://... with your favorite mirror

Page 31: Basic introduction into R

Packages for book (see 1.4.2)

pkgs <- c(
  "broom", "dplyr", "ggplot2", "jpeg", "jsonlite",
  "knitr", "Lahman", "microbenchmark", "png", "pryr",
  "purrr", "rcorpora", "readr", "stringr", "tibble",
  "tidyr"
)
install.packages(pkgs)

Page 32: Basic introduction into R

R Session

contains scripts, data, functions
can be saved as a ‘workspace image’
prefer not to:
  sessions are usually cluttered
  only useful if running script takes time
Suggested tweak:
  Options: uncheck “Restore .RData into workspace at startup”
  Options: “Save workspace to .RData on exit”, select ‘Never’

Page 33: Basic introduction into R

Introducing base R

Page 34: Basic introduction into R

base R: assignment and print()

‘attach’ values to an object (e.g. a variable)

x <- 5
y <- 4
z <- x * y
print(z)

## [1] 20

Page 35: Basic introduction into R

base R: assignment and print() (II)

Try and imagine the potential of assignment

x <- c(4, 3, 2, 1, 0, 27, 34, 35)
# 'c' for concatenate values
y <- -1
z <- x * y
print(z)

## [1] -4 -3 -2 -1 0 -27 -34 -35

Page 36: Basic introduction into R

base R: data.frame

basically a table
contains columns (variables)
contains rows (cases)
“flat table” in Kees’ terminology

my.df <- data.frame(x, z)
str(my.df) # show STRucture

## 'data.frame': 8 obs. of 2 variables:
## $ x: num 4 3 2 1 0 27 34 35
## $ z: num -4 -3 -2 -1 0 -27 -34 -35

There’s much more, but let’s keep that for tomorrow
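As a small preview of that "much more", the sketch below rebuilds the same `my.df` and shows a few base R ways to look at its columns (variables) and rows (cases). It only uses functions from base R.

```r
x <- c(4, 3, 2, 1, 0, 27, 34, 35)
z <- x * -1
my.df <- data.frame(x, z)

dim(my.df)             # number of rows (cases) and columns (variables)
names(my.df)           # variable names
my.df$x                # one column, as a vector
my.df[2, ]             # the second row (case)
my.df[my.df$x > 20, ]  # only the rows that satisfy a condition
```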

Page 37: Basic introduction into R

Data visualization using ggplot2

Page 38: Basic introduction into R

Visualizing your data

Not just for analyses!
Data quality
  representativeness
  missing data

Page 39: Basic introduction into R

plot() in base R

library(help = "datasets") # all datasets in R
?mtcars # show help on mtcars dataset
df <- mtcars # mtcars is a dataset, not a function: no parentheses
str(mtcars) # display STRucture of an object
plot(mtcars$hp, mtcars$mpg)
plot(df)

Page 40: Basic introduction into R

plot() is like ...

plot() is like LaTeX:
  forge it in any way you want
  heterogeneous approach though
  takes quite some time to get it right

Page 41: Basic introduction into R

ggplot() as alternative

ggplot is but one of many graph packages
ggplot is nice because of:
  similar approach to various types of graphs
  easy build up for basic graphs
  can get quite complex too
  (but cannot do it all)

Page 42: Basic introduction into R

ggplot() and the canvas metaphor

ggplot() consists of two elements
  canvas
  (multiple) layers of paint

Page 43: Basic introduction into R

mapping and geom layers

ggplot() consists of two elements
  canvas:
    data
    mapping (aesthetic)
  (multiple) layers of paint
    geom layers

ggplot(data = <DATASET>,
       mapping = aes(x = <X-VAR>, y = <Y-VAR>)) +
  geom_<TYPE>()

Page 44: Basic introduction into R

our first ggplot

install.packages("ggplot2") # 1 time only
library(ggplot2)
df <- mtcars
ggplot(data = df, aes(x = hp, y = mpg)) +
  geom_point()

Page 45: Basic introduction into R

geom_ features

?geom_point

install.packages("ggplot2") # 1 time only
library(ggplot2)
df <- mtcars
ggplot(data = df, aes(x = hp, y = mpg)) +
  geom_point(fill = "white", colour = "blue",
             shape = 21, size = 4)

Page 46: Basic introduction into R

Adding characteristics to your plot

Add variables to explain a pattern

ggplot(data = df, aes(x = hp, y = mpg)) +
  geom_point(aes(colour = wt), size = 4)

NB: notice the difference?

ggplot(data = df, aes(x = hp, y = mpg)) +
  geom_point(aes(colour = wt, size = 4))
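The difference between the two calls is where `size = 4` sits. Outside `aes()` it sets a fixed property of the layer; inside `aes()` it is mapped like a variable and passed through the size scale, which typically adds a legend entry. A hedged sketch that inspects both plot objects, assuming ggplot2 is installed (the object names `p_set` and `p_map` are made up here):

```r
library(ggplot2)
df <- mtcars

# Setting: size is a fixed property of the layer, every point gets size 4
p_set <- ggplot(data = df, aes(x = hp, y = mpg)) +
  geom_point(aes(colour = wt), size = 4)

# Mapping: inside aes() the 4 is treated like a (constant) variable,
# so it goes through the size scale instead of being used literally
p_map <- ggplot(data = df, aes(x = hp, y = mpg)) +
  geom_point(aes(colour = wt, size = 4))

names(p_set$layers[[1]]$mapping)  # aesthetics mapped in the first plot
names(p_map$layers[[1]]$mapping)  # aesthetics mapped in the second plot
```

Rule of thumb: variables from the data go inside `aes()`, fixed values go outside it.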

Page 47: Basic introduction into R

Multiple geoms

Add variables to explain a pattern

ggplot(data = df, aes(x = hp, y = mpg)) +
  geom_point(aes(colour = as.factor(am)),
             size = 6) + # increase size because of overlap
  geom_point(aes(shape = as.factor(vs)),
             size = 3)
# vs: engine shape, V-shaped (0) or straight (1), see ?mtcars

Page 48: Basic introduction into R

Adding facets

Facets help reduce complexity

ggplot(data = df, aes(x = hp, y = mpg)) +
  geom_point(aes(colour = as.factor(am)),
             size = 4) +
  facet_wrap(~ vs)

Page 49: Basic introduction into R

Things to consider with geom(_point)

fill only works where shape actually can be filled
consider order of geoms
mind overlap:
  decrease size
  use alpha
  use ‘open’ shapes
  geom_jitter
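The overlap remedies listed above can be tried side by side. The sketch below assumes ggplot2 is installed and uses `cyl` from `mtcars` on the x axis, because it takes only three values and therefore produces a lot of overlap; the plot names are made up for illustration.

```r
library(ggplot2)
df <- mtcars

# cyl takes only three values, so many points share the same x position
p_alpha <- ggplot(df, aes(x = cyl, y = mpg)) +
  geom_point(alpha = 0.3, size = 4)      # transparency reveals stacked points

p_open <- ggplot(df, aes(x = cyl, y = mpg)) +
  geom_point(shape = 1, size = 4)        # shape 1 is an open circle

p_jitter <- ggplot(df, aes(x = cyl, y = mpg)) +
  geom_jitter(width = 0.2, height = 0)   # small random horizontal shift only

print(p_alpha)
print(p_open)
print(p_jitter)
```

Setting `height = 0` in `geom_jitter()` keeps the mpg values where they are, so only the horizontal position is disturbed.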

Page 50: Basic introduction into R

ggplot and titles

Various ways to add titles to axes and stuff
Can get quite complex
Here are the basics

ggplot(data = df, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Nice graph", x = "Horse Power",
       y = "Miles per Gallon")

Page 51: Basic introduction into R

Themes and size

ggplot(data = df, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Nice graph", x = "Horse Power",
       y = "Miles per Gallon") +
  theme_bw(base_size = 16)

Page 52: Basic introduction into R

Much more to learn

not just about ggplot()
  axes
  legend (guides)
  geoms
also about dataviz in general
  general do’s and don’ts
  which problem fits which graph
  it’s a science! (Graph theory)

Page 53: Basic introduction into R

Data wrangling

bit about NA

Introduction into R

Part 2A, 2B

Richard L. Zijdeman

2016-06-16


Page 54: Basic introduction into R

1 Data wrangling

2 bit about NA

Page 55: Basic introduction into R

Data wrangling

Page 56: Basic introduction into R

Grolemund & Wickham, 2016, Creative Commons Attribution-NonCommercial-NoDerivs 4.0.

Page 57: Basic introduction into R

dplyr package

# install.packages("dplyr") # 1 time only
library(dplyr)
install.packages("nycflights13") # 1 time only
library(nycflights13)
print(flights)

Page 58: Basic introduction into R

tibble or data_frame vs data.frame

str(mtcars)
class(mtcars)
mtcars_tbl <- as_tibble(mtcars) # as_data_frame() is the older, deprecated name
str(mtcars_tbl)
class(mtcars_tbl)

Page 59: Basic introduction into R

filter

filter(mtcars, am == 1, vs == 0)
some.cars <- filter(mtcars, am == 1, vs == 0)
some.cars
(some.cars2 <- filter(mtcars, am == 1, vs == 0)) # outer parentheses: assign and print

Page 60: Basic introduction into R

filter and using or

filter(mtcars, gear == 3 | gear == 4)
# !! not like this:
filter(mtcars, gear == 3 | 4)
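Why the second form goes wrong can be shown in base R without dplyr. In `gear == 3 | 4` the `4` is never compared with `gear`; as a logical value any non-zero number counts as TRUE, so the condition holds for every row. A minimal sketch (the object names are made up):

```r
# The bare 4 is coerced to TRUE, so the whole condition is always TRUE
wrong <- mtcars$gear == 3 | 4
all(wrong)

# Spell out both comparisons ...
right <- mtcars$gear == 3 | mtcars$gear == 4
sum(right)

# ... or use %in% to test against several values at once
also_right <- mtcars$gear %in% c(3, 4)
identical(right, also_right)
```

The same `%in%` test works inside `filter()`, e.g. `filter(mtcars, gear %in% c(3, 4))`.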

Page 61: Basic introduction into R

bit about NA

Page 62: Basic introduction into R

Arrange

arrange(flights, dep_time)
arrange(flights, year, month, day) # ascending order
arrange(flights, desc(day))
# NB: missing values come at end
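A few base R lines illustrate how missing values (NA) behave, which is the background to the note on sorting. This is a sketch with a made-up vector, not part of the original slides:

```r
x <- c(3, NA, 1)

NA == NA                 # NA: "unknown equals unknown" is itself unknown
is.na(x)                 # the way to test for missing values
mean(x)                  # NA, because one of the values is unknown
mean(x, na.rm = TRUE)    # drop missing values before computing
sort(x, na.last = TRUE)  # missing values at the end, comparable to arrange()
```

Because comparisons with NA return NA rather than TRUE or FALSE, use `is.na()` instead of `== NA` when filtering.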

Page 63: Basic introduction into R

Select

df <- select(flights, year, month, day)
names(flights)
df <- select(flights, tailnum:dest)
df <- select(flights, -(tailnum:dest))
df
df <- select(flights, starts_with("arr_"))
df <- select(flights, ends_with("e"))
df <- select(flights, contains("a"))

Page 64: Basic introduction into R

rename, mutate and transmute

df <- rename(flights, Y_ear = year)
df <- mutate(flights, year1 = year + 1)
select(df, year, year1)
df <- mutate(flights, year1 = year + 1, year2 = year1 + 1)
select(df, contains("year"))
df <- transmute(flights, year1 = year + 1, year2 = year1 + 1)
# only maintains the newly created variables

Page 65: Basic introduction into R

group_by and summarise

by_day <- group_by(flights, year, month, day)
summarise(by_day)
cars <- mtcars
cars <- as_tibble(mtcars) # as_data_frame() is the older, deprecated name
summarise(cars, mean_hp = mean(hp, na.rm = TRUE))
mean(cars$hp, na.rm = TRUE)

Page 66: Basic introduction into R

the pipe: %>%

cars_grp <- group_by(cars, carb)
class(cars)
class(cars_grp)
summarise(cars_grp, mmpg = mean(mpg, na.rm = TRUE))

cars_grp_sum <- summarise(cars_grp,
                          mmpg = mean(mpg, na.rm = TRUE),
                          count = n())
cars_grp_sum

plot <- ggplot(cars_grp_sum,
               aes(x = carb, y = mmpg,
                   label = carb)) +
  geom_point(aes(size = count)) +
  geom_text(colour = "cyan")
plot

# the same steps, chained with the pipe
cars_grp_sum2 <- cars %>%
  group_by(carb) %>%
  summarise(mmpg = mean(mpg, na.rm = TRUE),
            count = n())

ggplot(cars_grp_sum2, aes(x = carb, y = mmpg, label = carb)) +
  geom_point(aes(size = count)) +
  geom_text(colour = "cyan") +
  labs(title = "figure with %>%")

Page 67: Basic introduction into R

more pipe, adding a filter

cars_grp_sum3 <- cars %>%
  group_by(carb) %>%
  summarise(mmpg = mean(mpg, na.rm = TRUE),
            count = n()) %>%
  filter(count > 3)

ggplot(cars_grp_sum3, aes(x = carb, y = mmpg, label = carb)) +
  geom_point(aes(size = count)) +
  geom_text(colour = "cyan") +
  labs(title = "figure with %>% and count > 3")

Page 68: Basic introduction into R

Session management

Basic data manipulation

Introduction into R

Part 3A

Richard L. Zijdeman

2016-06-17


Page 69: Basic introduction into R

1 Session management

2 Basic data manipulation

Page 70: Basic introduction into R

Session management

Page 71: Basic introduction into R

Maintaining your workspace

Grolemund & Wickham, 2016, Creative Commons Attribution-NonCommercial-NoDerivs 4.0.

Page 72: Basic introduction into R

Setting up a session

clear your Environment
check sessionInfo() for loaded packages
detach obsolete packages under ‘other attached packages’
set your directory (“\” on Windows, written as “\\” in R strings, and “/” for Linux/Mac)
load libraries (install new ones)
load your data

Page 73: Basic introduction into R

Example session setup

rm(list = ls())
sessionInfo() # check for 'other attached packages'
detach("package:nycflights13", unload = TRUE)
setwd("/Users/RichardZ/Dropbox/Summer school 2016/Richard Zijdeman/")
getwd() # to see whether you're in the right directory
dir() # shows what's in your directory

Page 74: Basic introduction into R

Loading your data

read.table() (generic function)
read.csv()
library(foreign) # e.g. SPSS and Stata
library(readxl) # fast excel-package

Page 75: Basic introduction into R

Reading in data

Different functions for different files:
Base R: read.table() (read.csv())
foreign package: read.spss(), read.dta(), read.dbf()
readxl package: read_excel()
alternative packages:
  xlsx (Java required)
  gdata (perl-based)
  openxlsx package: read.xlsx()

Page 76: Basic introduction into R

read.csv()

file: your file, including directory
header: variable names or not?
sep: separator
  read.csv default: “,”
  read.csv2 default: “;”
skip: number of rows to skip
nrows: maximum number of rows to read
stringsAsFactors
encoding (e.g. “latin1” or “UTF-8”)
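To see several of these arguments at work without needing a file on disk, the sketch below feeds made-up content to `read.csv()` through its `text` argument. The names and numbers are invented for illustration only.

```r
# Hypothetical file content: two junk lines, then a ";"-separated table
raw <- "exported from some system
source: example
name;age
Anna;31
Bram;45
Cor;28"

people <- read.csv(text = raw,
                   sep = ";",                # or use read.csv2()
                   skip = 2,                 # skip the two junk lines
                   header = TRUE,            # first remaining line holds names
                   nrows = 2,                # read only the first two records
                   stringsAsFactors = FALSE) # keep text as character
str(people)
```

With a real file, replace `text = raw` by the path to the file as the first argument.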

Page 77: Basic introduction into R

read_excel from readxl package

path: your file, including directory
sheet: name or number of sheet
col_names: col names in 1st row?
col_types: specify type
na: what’s the sign for missing values
skip: how many rows to skip before data starts

Page 78: Basic introduction into R

Example session loading your csv data

# setwd() to set your working directory
hmar100 <- read.csv("./Datafiles_HSN/HSN_marriages.csv",
                    stringsAsFactors = FALSE,
                    encoding = "latin1",
                    header = TRUE,
                    nrows = 100) # just first 100 rows

Page 79: Basic introduction into R

Example session loading your excel data

# setwd() to set your working directory
install.packages("readxl") # 1 time only
library("readxl")
hmar <- read_excel("./Datafiles_HSN/HSN_marriages_awful.xlsx",
                   col_names = TRUE,
                   skip = 3) # empty lines not counted!!!

Page 80: Basic introduction into R

Basic data manipulation

Page 81: Basic introduction into R

Change case of text

tolower()
toupper()

tolower("CaN we pleASe jUSt have LOWER cases?")
names(hmar) <- tolower(names(hmar))

Page 82: Basic introduction into R

length()

Used to count how many instances there are

length(names(hmar))
# shows number of variables in hmar

Page 83: Basic introduction into R

Basic statistical techniques

Introduction into R

Part 3B

Richard L. Zijdeman

2016-06-17


Page 84: Basic introduction into R

1 Basic statistical techniques

Page 85: Basic introduction into R

Basic statistical techniques

Page 86: Basic introduction into R

Box and whisker plot

Distribution of data
Median: 50% of the cases above and below
Box: 1st and 3rd quartile
Interquartile range (IQR): Q3 - Q1
Outliers (Tukey, 1977):
  x < Q1 - 1.5 * IQR
  x > Q3 + 1.5 * IQR
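The outlier rule can be worked through by hand in base R. The values below are made up for illustration; `quantile()` is used with its default settings, which is one of several common quartile definitions.

```r
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)   # made-up values, one suspicious

q1  <- quantile(x, 0.25, names = FALSE)
q3  <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1                        # same as IQR(x)

lower <- q1 - 1.5 * iqr               # lower fence
upper <- q3 + 1.5 * iqr               # upper fence
outliers <- x[x < lower | x > upper]
outliers
```

Here Q1 = 4 and Q3 = 8, so IQR = 4 and the fences lie at -2 and 14; only the value 30 falls outside them.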

Page 87: Basic introduction into R

p <- ggplot(hmar, aes(sign_groom, age_groom))
p + geom_boxplot()

Page 88: Basic introduction into R

hmar <- mutate(hmar, sign_groomD = (sign_groom == "h" & !(is.na(sign_groom))))
p <- ggplot(hmar, aes(sign_groomD, age_groom))
p + geom_boxplot()

Page 89: Basic introduction into R

hmar <- mutate(hmar, sign_groomD = (sign_groom == "h" & !(is.na(sign_groom))))
p <- ggplot(hmar, aes(sign_groomD, age_groom))
p + geom_boxplot() + geom_jitter(shape = 24, width = 0.2)

Page 90: Basic introduction into R

library(stats)
var.test(age_groom ~ sign_groomD, data = hmar)
t.test(age_groom ~ sign_groomD, data = hmar)
# NB: always check whether the variances are equal
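Since the HSN data are not included here, the same two tests can be tried on the built-in `mtcars` data as a stand-in (an assumption for illustration: `mpg` compared between automatic and manual cars). The sketch also shows why the variance check matters: `t.test()` uses the Welch variant by default, which does not assume equal variances.

```r
# am: 0 = automatic, 1 = manual (see ?mtcars)
vt <- var.test(mpg ~ am, data = mtcars)   # F test: are the two variances equal?
vt$p.value

# Default: Welch test, no assumption of equal variances
tt_welch <- t.test(mpg ~ am, data = mtcars)

# Only if the variances look equal is the pooled test an option
tt_pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

tt_welch$method
tt_pooled$method
```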

Page 91: Basic introduction into R

A small PTE project

Look at the variables in the HSN files
Think of a research question
Provide a general mechanism and hypothesis
Plot your results